Tag Archives: Troubleshooting

Unable to connect to named SQL instance on remote machine

Recently I was installing K2 5.2 on Azure VMs, with a SQL Server named instance hosted on a separate Azure VM. I created a SQL Server alias on the K2 VM but then ran into an issue – neither K2 Setup Manager nor SSMS was able to connect to SQL through the alias. I next tried a direct connection via server\instance name, which also failed. SSMS showed me the following error:

Could not open a connection to SQL Server. Microsoft SQL Server, Error: 53

I first focused on network connectivity between the VMs:

  • Confirmed that I could ping the SQL Server VM from the K2 Server VM
  • Confirmed that no firewall was enabled on the VMs and that the Azure VMs were on the same network with nothing blocking connectivity between them
  • Tried to use telnet to test port 1433 – it failed

This is what kept me focused on the network connectivity layer a bit longer than necessary. But after confirming with netstat -na | find "1433" that SQL Server was not listening on port 1433, it became quite clear that the focus should be on SQL Server configuration. First of all, by default a named instance listens on a dynamic port, and you actually need to have the SQL Server Browser service enabled to be able to connect to a named instance without specifying a port while dynamic ports are in use. But in my case it was not that, as in the SQL Server configuration a custom port was explicitly specified (SQL Server Configuration Manager > Protocols for %INSTANCE_NAME% > TCP/IP Properties > TCP Dynamic Ports – if you have anything other than 0 in the IPAll section for this setting you are not using dynamic ports). When your problem is dynamic ports and a disabled SQL Server Browser service, the error message from SSMS looks as follows:

SQL Network Interfaces, error: 26 – Error Locating Server/Instance Specified

As you can see, the error message explicitly tells you “Error Locating Server/Instance Specified”. To fix this, either set 0 for the TCP Dynamic Ports setting and enable the SQL Server Browser service, or specify some port number there. You are sort of choosing your dependency here – either the Browser service (may fail to start) or a custom port (may be hijacked by another service). It seems that the Browser service is the better approach.

So in my case I was confused by expecting the named instance to listen on the default port, which was, to put it simply, a wrong expectation. Here is how you can check which port your instance is listening on:

Commands which you can use to find out your SQL Server instance ports
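
The original screenshot with the commands is not reproduced here; a minimal PowerShell sketch along those lines (run it on the SQL Server VM; it assumes a single sqlservr.exe process) would be:

    $sqlPid = (Get-Process -Name sqlservr | Select-Object -First 1).Id   # PID of the SQL Server instance process
    $pattern = '\s{0}$' -f $sqlPid                                       # netstat lines ending with that PID
    netstat -ano | Select-String 'LISTENING' | Select-String $pattern    # listening ports owned by SQL Server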

But obviously, having access to the SQL Server, you can get this data from SQL Server Configuration Manager too: SQL Server Configuration Manager > Protocols for %INSTANCE_NAME% > TCP/IP Properties. Just keep in mind that you need to check the TCP Dynamic Ports value both for the specific address and for the IPAll section. But like I said, in my case the problem was not about ports. Once I found out the instance port I noticed that I still could not connect to it using telnet, simply because the IP address was not enabled in SQL Server Configuration Manager > Protocols for %INSTANCE_NAME% > TCP/IP Properties (meaning it had Enabled=0). I corrected that and the telnet connectivity test succeeded.

Still, when I got back to SSMS I was getting the same error – “Could not open a connection to SQL Server. Microsoft SQL Server, Error: 53”. The reason? With SQL Server 2016 and the latest versions of SQL Server, I keep forgetting that even the latest and greatest version of SSMS still reads alias settings from the x86 registry hive (meaning you need to configure the SQL alias using cliconfg.exe from C:\Windows\SysWOW64) – I have a hard time getting used to it. Interestingly, a completely missing x86 alias triggers the error message “Could not open a connection to SQL Server. Microsoft SQL Server, Error: 53”, while one configured with a non-existing server or instance name will give you “SQL Network Interfaces, error: 26 – Error Locating Server/Instance Specified”.
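
If you prefer to check this without opening cliconfg.exe twice, the sketch below compares the two registry locations where the 64-bit and 32-bit aliases are stored (these key paths are the standard SQL client alias locations, not anything K2-specific):

    # 64-bit aliases (cliconfg.exe from System32) vs. 32-bit aliases (cliconfg.exe from SysWOW64)
    'HKLM:\SOFTWARE\Microsoft\MSSQLServer\Client\ConnectTo',
    'HKLM:\SOFTWARE\WOW6432Node\Microsoft\MSSQLServer\Client\ConnectTo' |
        ForEach-Object { "--- $_"; Get-ItemProperty -Path $_ -ErrorAction SilentlyContinue }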

Anyhow, your key takeaways from this post should be:

  • Know your instance port
  • Make sure that the IP address is enabled
  • We still need to configure the alias twice (x86/x64) to avoid unpleasant surprises from apps reading settings from the non-configured location

I hope this post may save some troubleshooting time for someone.


“Method not found” error when attempting to add a new item into a K2 Studio project

The other day I was about to build some tiny K2 test project using the old-fashioned thick designer UI of K2 Studio, which I still prefer whenever I need to build a little K2 process. Unfortunately I bumped into this error:

The error message politely informs us about this:

Method not found: ‘Microsoft.Build.BuildEngine.BuildItem SourceCode.ProjectSystem.ProjectBuildItem.get_BuildItem()’.

That’s not very obvious, right? But trust me, it just complains that some DLL is not properly added to the GAC, or that there is a mismatch between the DLL version in the GAC and in some other location.

Solution? If you run a K2 environment without coldfixes, just run a Repair from the installation media. But most likely you’ve applied some coldfix recently. Verify whether the files which were supposed to go to the GAC were added to that location correctly. Remember that some files go to the K2 installation directory, while some others may go to “[Program Files]\Reference Assemblies\SourceCode\v4.0\” and into the v4 GAC, i.e. into “C:\Windows\Microsoft.NET\assembly\GAC_MSIL\”. In my case the error was caused by the fact that the SourceCode.Workflow.Authoring.dll assembly was not updated in the v4 GAC.
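
A small sketch of the kind of check I mean (the paths are illustrative for a default install and the DLL name is just the one from my case – adjust both to your coldfix):

    $dllName = 'SourceCode.Workflow.Authoring.dll'
    $paths = @('C:\Program Files (x86)\K2 blackpearl',
               'C:\Windows\Microsoft.NET\assembly\GAC_MSIL')
    foreach ($p in $paths) {
        # list every copy of the DLL with its file version so mismatches stand out
        Get-ChildItem -Path $p -Recurse -Filter $dllName -ErrorAction SilentlyContinue |
            Select-Object FullName, @{Name='FileVersion'; Expression={$_.VersionInfo.FileVersion}}
    }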

When something is wrong with the aforementioned DLL you will also see the same error when trying to deploy something with PnD:

So in case you are getting this type of error, you now know what to check.


Unable to start SSRS service after upgrading a SQL Server instance

I recently spent quite a bit of time trying to figure out why my SSRS service failed to start after performing an upgrade from SQL Server 2012 to SQL Server 2016. The service was failing to start with the quite generic error “System error 5 has occurred. Access is denied”, which is a bit too broad, but after a while I looked into the instance bin directory, which contains a log file where I saw some errors indicating that the configuration file was garbled somehow. At this point I reverted to the reinstall-once-again solution, and this time I noticed a warning which I had probably missed during the initial instance upgrade attempt:

TITLE: Microsoft SQL Server 2016 Setup

——————————

The following error has occurred:

Index was out of range. Must be non-negative and less than the size of the collection.

Parameter name: index

For help, click: http://go.microsoft.com/fwlink?LinkID=20476&ProdName=Microsoft%20SQL%20Server&EvtSrc=setup.rll&EvtID=50000&ProdVer=13.0.1711.0&EvtType=0x35C10BED%25400xF538E98B

I proceeded with the upgrade and saw exactly the same problem – I couldn’t start the SSRS service after the upgrade completed. This warning essentially led me to a thread on TechNet containing the right solution. It seems that the upgrade process somehow can’t handle SSL bindings for the SSRS service, or handles them differently (produces a malformed XML configuration for them?), and what you need to do is remove the SSL bindings for SSRS in Reporting Services Configuration Manager (see the screenshot below).

Just to save some time which may otherwise be wasted on troubleshooting, remove your SSL bindings (I did it for the Web Service URL & Web Portal URL) before starting the upgrade of your instance. You may add these bindings back after the upgrade.
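
If you want to note down the existing bindings before removing them, something like the sketch below should work; it uses the Reporting Services WMI provider, and the namespace path is an assumption for a default SSRS 2016 instance (RS_MSSQLSERVER, v13) – adjust it to yours:

    $ns = 'root\Microsoft\SqlServer\ReportServer\RS_MSSQLSERVER\v13\Admin'
    $rs = Get-WmiObject -Namespace $ns -Class MSReportServer_ConfigurationSetting
    # ListSSLCertificateBindings returns parallel arrays describing each binding
    $rs.ListSSLCertificateBindings(1033) | Select-Object Application, IPAddress, Port, CertificateHash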


Restore missing K2 Server Console Mode Shortcut

Recently I’ve seen a failed K2 blackpearl installation which went seemingly well but logged an error at the Configuration Analysis stage for the K2 blackpearl Server Running task. The warning message was quite helpful, listing all possible reasons which may cause issues for K2 service startup: a disabled service or service account, wrong credentials, missing log on as a service rights for the K2 service account, or conflicting port configuration (required ports are busy with something else). And we all know that we have detailed setup logs to check what went wrong during installation, right?

Nonetheless, assuming you are focused on the inability to start the K2 service – as it fails even before initializing itself and its logging, which would be the case in the scenario with the aforementioned warning in K2 blackpearl Configuration Analysis (the Windows Services snap-in will show you the “Service on local computer has been started then stopped” message) – your first action should be an attempt to run it in console mode and see what exactly is wrong/what it complains about 🙂

Unfortunately, in my specific case Setup Manager also failed to create the K2 Server Console Mode Shortcut, which should give you a hint that this is likely a credentials issue. In case the K2 Server Console Mode Shortcut is missing it can be created manually using the following command:
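
The original post shows the command as an image; as a hedged sketch, the shortcut normally just points at K2HostServer.exe in the host server Bin folder (the path below assumes a default installation – check the K2 help topic referenced below for the exact target your version uses), so you can recreate it, for example, like this:

    $target   = 'C:\Program Files (x86)\K2 blackpearl\Host Server\Bin\K2HostServer.exe'
    $shortcut = (New-Object -ComObject WScript.Shell).CreateShortcut("$env:Public\Desktop\K2 blackpearl Server (Console Mode).lnk")
    $shortcut.TargetPath       = $target
    $shortcut.WorkingDirectory = Split-Path $target
    $shortcut.Save()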

Refer to the related section of the K2 help for details. Simply by looking at the command above, there are not a lot of things which can go wrong here – a missing service executable file (that should definitely lead to more warnings at the final stages of setup) or wrong credentials (those, in theory, can get locked out during the final stages of setup, at the stage when it tries to register the service/create the shortcut). Anyhow, if you run into something like this, just restore the console mode shortcut to investigate your problem with service startup further.


System.IO.IOException : The requested operation could not be completed due to a file system limitation

I recently had a support case thanks to which I discovered a rather cool way of checking for big files in a specific directory, which I will describe later in this post.

Under certain conditions you may see the following issue in K2: very high CPU usage and, by extension, overall sluggishness of K2 applications, accompanied by “System.IO.IOException: The requested operation could not be completed due to a file system limitation.”

As in most cases, the error message itself indicates what is wrong here, and “The requested operation could not be completed due to a file system limitation” should ring a bell that some file or files have run amok and grown beyond file system limits, or something along these lines. If you read your logs even more closely they may even give away the specific culprit, indicating the name of the log file which is responsible for this.

K2 has broad logging capabilities for monitoring and troubleshooting purposes (a quite good overview of K2 logging can be found here), but in terms of logging volume the main suspects are: SmO logging (the only logging which can’t be capped in terms of file size), ADUM logs (very voluminous, especially at debug logging level; file size can be limited by adjusting configurable settings, meaning that you have to go the extra mile if you want to allow an unhealthily big file size) and, lastly, debug assemblies you may receive from K2 support. Debug assemblies are usually quickly built ad-hoc troubleshooting tools for investigating a specific issue and may well have no log file size limit while writing super detailed logging (= voluminous log files). As such they are supposed to be removed upon completion of your troubleshooting effort, but in reality they can be left applied for a while, which gradually evolves into forever…

Anyhow, the exception “System.IO.IOException: The requested operation could not be completed due to a file system limitation.” in the K2 host server log is in most cases caused by an abnormally large log file, which becomes so big that it exceeds RAM size, making it difficult to open and append to for writing, and then you have that slippery slope situation with degraded performance and a high CPU moment, followed by the “aha, I forgot to disable/remove the unneeded logging” moment.

Now to my takeaway from this case (though what is said above is also worth noting): how to quickly check for huge files in a specific directory. Just use this PS script:
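
The script itself was embedded as an image in the original post; a minimal equivalent sketch (the path is illustrative – point it at the folder you suspect) is:

    Get-ChildItem -Path 'C:\Program Files (x86)\K2 blackpearl' -Recurse -File -ErrorAction SilentlyContinue |
        Sort-Object -Property Length -Descending |
        Select-Object FullName, @{Name='SizeMB'; Expression={[math]::Round($_.Length / 1MB, 2)}}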

You may add the “-First 10” parameter right after Select-Object in the script above to minimize the output, which is especially useful when you are primarily interested in identifying the largest file or files.

Here is how the result looks for a healthy K2 folder (by healthy I mean one without strangely big log files):

Large files search

As you can see, normally you should not have anything with a size of 1 gigabyte or more, whereas the above mentioned exception is usually caused by a 10-20 GB log file which will feature prominently at the top of the output.

See also the related K2 community KB: Exception – The requested operation could not be completed due to a file system limitation.

Memory Leaks Everywhere

K2 host server eats up my RAM! :) Oh, really?

One of the frequent types of issues I have to work on is high RAM usage by the K2 host server service (normally the description of such a problem is accompanied by the phrase “with no apparent reason”). Most of the time I’m trying to create meaningful K2 community KB articles based on the support cases I work on, but not everything I want to say always fits into the Click2KB format. So to discuss the “K2 host server eats up my RAM/I think I see a memory leak here” issue in detail I decided to write this blog post.

The common symptom and starting point here is that you notice abnormally high RAM usage by the K2 host server service, which maybe even leads to a service crash or total unresponsiveness of your K2 platform. What’s next and what are the possibilities here?

Of course, it all depends on what exactly you see.

I think it is quite expected to see that immediately after a server reboot K2 service memory consumption is lower than after the server has been working for a while: once you reboot your server it starts clean – all threads and allocated memory are clear, hence low RAM usage. But as the server warms up it starts checking whether it has tasks to process, and it fires other activities like user and group resolution by the ADUM manager, recording data in the identity cache table and so on. The more task processing threads are active, the more memory is required. And keep in mind your host server threads configuration: if you increased your default thread pool limits, you should realize that this allows the server to use more of the available resources on your server.

An empty (no deployed processes, no active users) K2 host server service has a really tiny memory footprint:

K2 empty server with default thread pool settings

As you can see it uses less than 300 MB of RAM. And even if you double the default thread pool settings (and I heard that for those, resources are allocated upfront), memory usage stays the same, at least on a box without any load.

Now we are switching to the interesting stuff, i.e. what could it be if RAM usage of the K2 service is abnormally high?

And here comes an important point: if your process design or custom code has any design flaws, or the hardware is poorly sized for the intended workload, the processing queue starts growing and it may lead to resource overuse. I.e. it is not a memory leak but a bottleneck caused by such things as (and I’m listing them based on the probability of being the cause of your issue):

1) Custom code or process design. Easy proof that this is the cause is the fact that you are unable to reproduce this “memory leak” on an empty platform with no running processes. It does tell you that there is no memory leak in the K2 platform base code, in a way.

You can refer to process design best practices as a starting point here: http://help.k2.com/kb000352

I’ve seen enough cases where high memory usage was caused by inefficient process design choices (something like mass uploads to a DB, or updating properties of 20 MS Word documents in a row designed so that the file is downloaded/uploaded from SharePoint 20 times instead of doing a batch update with one download/upload of the file).

Also, next time you see this high memory usage state, before doing a reboot execute the following queries against the K2 database:

A) Check how many processes are running at the same time right now and whether any of them constantly stays in the running state:
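
The query from the original post is not reproduced here; a hedged sketch of the idea (the Server.ProcInst table and its column names are my assumption about the K2 database schema – verify against your version; Invoke-Sqlcmd needs the SqlServer PowerShell module) is:

    # Status = 1 means a running instance; the number of returned rows is the number of running processes
    $query = 'SELECT ID, Status, StartDate FROM [Server].[ProcInst] WITH (NOLOCK) WHERE Status = 1 ORDER BY StartDate;'
    Invoke-Sqlcmd -ServerInstance 'SQLSERVER\INSTANCE' -Database 'K2' -Query $query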

It will give you the number of running processes at a specific point in time. Constantly having 20 processes or more in status 1 may indicate a problem, but more importantly, execute this query multiple times with a 1-2 minute interval and see if some of the process instances with the same ID stay running constantly or for a very long time. That will likely be your “offending” process and you will want to check at which step it is so slow, and so on.

B) Check for processes with abnormally high state size:
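
Again a hedged sketch only – the assumption that the serialized state lives in a [Data] column of [Server].[ProcInst] is mine, not from the original post, so verify the table/column names against your K2 version first:

    # report instances whose stored state is 1 MB or more, largest first
    $query = 'SELECT TOP 200 ID, Status, StartDate, DATALENGTH([Data])/1048576.0 AS StateSizeMB ' +
             'FROM [Server].[ProcInst] WITH (NOLOCK) WHERE DATALENGTH([Data]) >= 1048576 ORDER BY StateSizeMB DESC;'
    Invoke-Sqlcmd -ServerInstance 'SQLSERVER\INSTANCE' -Database 'K2' -Query $query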

This query will return up to 200 processes with a state size of 1 MB or greater (if you have any). So if this query brings back some results, those are the problematic processes which cause abnormally high memory usage/performance problems (most likely due to the use of looping within the process).

Just an illustrative example of what else can be wrong (and the possibilities are huge here 🙂): my colleague ran into an issue where K2 service process memory usage suddenly started growing at a rate of ~16 GB per day, and in the end the reason was that every 10 seconds K2 smartactions tried to process an email which was sent to the K2 service account mailbox, which is the same account under which smartactions were configured; this led to a sort of cycle where each sending attempt ate up a couple of MB of memory. It was only possible to see this with full logging level and during the night, when there were no other activities on the server cluttering the log files.

2) Slow response/high latency of external systems or the network. Depending on the design of your workflows they may have dependencies on external systems (SQL, SharePoint), and it could be the case that a slow response from their side causes growth of the queue on the K2 side along with memory usage growth (a sort of vicious circle or something like a race condition can be in play here, and it is often difficult to untangle this and isolate the root cause).

In such a scenario it is better to:

A) At the time of the issue, check the K2 host server logs and ADUM logs to see if there are any timeouts or communication-type errors/exceptions.

B) Check all the servers which comprise your environment (K2, SQL, SharePoint, IIS) and watch out for resource usage spikes and errors in Event Viewer (leverage the “Administrative Events” view). K2 relies heavily on the SQL Server where the K2 DB is hosted, and if it is undersized or overloaded (scheduled execution of some SSIS packages, a scheduled antivirus scan or backup) and slow to respond, you may see memory usage growth/slowness on the K2 server side.

If your servers are virtualized, confirm your K2 vServers placement with the virtualization platform admins – K2 and the K2 DB SQL instance should not coexist on the same vHost with I/O intensive apps (especially Exchange, SharePoint).

You should pay special attention to the ADUM logs – if there are loads of errors, those have to be addressed, as the K2 server may constantly waste resources on futile attempts to resolve some no longer existing SharePoint group provider (the site collection was deleted, but the group provider is still in K2) or on resolving objects from a non-working domain (failed connectivity or trust configuration). These resolution attempts eat up resources and may prevent ADUM from refreshing, in a timely manner, things which are needed for running processes, thereby making the situation worse (a growing queue).

IMPORTANT NOTE: It never works in large organizations if you just ask your colleagues (SQL admins/virtualization admins) whether all is OK on their side – you will always get the response that all is OK 🙂 You have to ask specific questions and get explicit confirmation of things like VM placement and whether your K2 DB SQL instance is shared with any other I/O intensive apps. You want to have a list and go through it, eliminating possibilities.

I personally worked with one client who spent months troubleshooting performance, reviewing their K2 solutions inside out and searching for a leak, while in the end it was solved by moving the K2 DB to a dedicated SQL Server instance; in hindsight they realized that previously the K2 DB had coexisted with some obscure integration DB which was not heavily used but had an SSIS package firing twice a day and maxing out SQL resources for a couple of hours, causing prolonged and varied disruptions to their K2 system. Checking SQL was suggested from the very beginning and the answer was “we don’t have issues on the SQL side”, even after they asked their SQL admins twice.

3) Inadequate hardware sizing. To get an idea about how to size your K2 server you can look at this table:

Scale out

This may look a bit controversial to you, but this table is from the Performance and Capacity Planning document from the K2 COE and it illustrates how you have to scale out based on the total number of users and the number of concurrent users, with a base configuration of 1 server with 8 GB of RAM. Depending on your current hardware configuration this may or may not support your idea of scaling up.

Also see these documents on sizing and performance: K2 blackpearl Performance Testing Results and K2 blackpearl Performance Testing White Paper

Also see this K2 community KB: K2 Host Service CPU usage close to 100%

4) Memory leak. This is rather unlikely, as K2 code (like the code of any other mature commercial software) goes through strict QA and testing, and, personally, I have seen no more than 3 cases where there was a memory leak type of issue which had to be fixed in K2 – all of them in old versions and in very specific, infrequent scenarios.

If what you observe is not prolonged memory usage spikes which don’t go away by themselves, but rather your K2 service just at times maxing out resource usage and then everything going back to normal with no intervention from your side (such as a K2 service/server restart), then it looks like an insufficient hardware type of situation (though the other issues I mentioned previously may still have an influence here). A memory leak rather implies that you need to stop the service or something like that to resolve it.

If after checking all the points mentioned above you still suspect that there could be some memory leak, I would recommend that you open a K2 support case and prepare all K2 logs along with memory dumps collected in the low and high memory usage states (you can obtain instructions on collecting memory dumps from K2 support).


AD DS infrastructure failures and K2

I recently worked on a number of cases where clients complained about errors on the K2 side caused by failures on the AD DS side. Specifically, there were some suggestions that K2 was unable to handle a partial outage of AD DS, namely the failure of a single DC while other DCs were available. So based on the recent cases I saw, I did some research and you may find the results of this research below. It is a rather long-form write-up which may require some updates/edits afterwards, but I decided to post it to share this information with the wider community as well as to keep all these notes for my own reference in case it becomes necessary for me to revisit these findings.

DISCLAIMER: Some flaws in the interpretation of system behavior described below are possible; those will be edited/corrected when/if necessary.

Symptoms/What you may see in your K2 environment when there are some issues with your AD DS infrastructure

Most of the applications used in Microsoft Active Directory networks have a certain degree of dependency on the availability of Active Directory Domain Services (AD DS) for purposes of authentication and for obtaining required data about directory objects (users, groups, etc.).

In case you have failures or other availability issues with your AD DS infrastructure, you may observe symptoms/problems on the K2 side similar to those described below.

Example scenario 1 (WAN link outage/no DCs are reachable to serve queries against a remote domain)

You may observe a growing queue of AD SmO queries on the IIS side, to the point at which all of the queries sent from K2 smartforms to AD DS fail/no longer return any information, and after a long delay the following error message is thrown:

A referral was returned from the server.

This error comes from AD DS (more specifically, from the DC which serves K2 app/K2 server queries to the specific domain) and is most likely caused by the fact that there is no DC available to serve this query for that domain at all.

Example scenario 2 (a single DC failed, other DCs are available)

You receive the following error on the K2 server:

System.DirectoryServices.ActiveDirectory.ActiveDirectoryServerDownException: The server is not operational. Name: “DC-XYZ.domain.com” —> System.Runtime.InteropServices.COMException: The server is not operational.

You have confirmed that the DC mentioned in the error message is down, but there are other DCs up and running in this domain.

Example scenario 3 (it could be 2b, but you see that only K2 smartforms are affected)

You may see the same error message as in scenario 2, i.e.:

System.DirectoryServices.ActiveDirectory.ActiveDirectoryServerDownException: The server is not operational. Name: “DC-XYZ.domain.com” —> System.Runtime.InteropServices.COMException: The server is not operational.

But you also see that both K2 workspace and the base OS are working just fine and using an alternate DC, while K2 smartforms keep throwing an error which mentions the failed DC (which is indeed failed).

All described scenarios are slightly different, but in all of these cases it may seem that K2 didn’t switch to an alternative available DC for the specific domain. The key question/requirement here is to switch to another available DC without any downtime or with minimum downtime (no K2 server or K2 service restart).

Research and general recommendations

First of all, it is necessary to understand what kind of dependency on AD DS we have from the K2 side. The most obvious things are AD Service SmOs and the User Role Manager (URM) service – both of them depend on AD DS availability, but in different ways. AD Service SmOs query AD DS directly (so it is a good test to check whether AD DS queries can be served without issues), whereas the URM service relies on the K2 identity cache and returns cached data from the K2 database. The URM service returns data from multiple security providers registered in K2 and stores cached data in the Identity.Identity table in the K2 database. The URM service depends on AD DS only at the time of cached data refresh, thus it may allow you not to notice an AD DS failure if your AD DS cache has not expired yet.

At the beginning of this blog post we mentioned two major scenarios for AD DS failure (with a third type which can be qualified as a sub-case of (2)):

1) WAN link failure, when no DCs are available to serve K2 requests to a specific domain because all DCs are behind the WAN link. This is applicable to multi-domain environments with remote domains.

2) Failure of the specific DC to which the K2 server is connected for querying a specific domain.

Given AD DS design best practices, none of these scenarios should present any problems for applications dependent on AD DS:

(1) It is best practice to have extra DCs placed on remote sites so that there is no dependency on the WAN link, both to preserve link bandwidth and to safeguard against availability issues. At the very least an RODC should be present locally on the site for any remote domain if for some reason you cannot place an RWDC locally on each remote site.

NOTE: in a link failure scenario when there are no locally available DCs, there is nothing that can be done from the K2 side; it is a question of restoring the WAN link or placing a locally available RWDC/RODC to mitigate against this scenario.

(2) The golden rule and requirement for any production AD DS deployment is to have no less than two DCs per domain. So the failure of one domain controller should not present any issues.

Now separately on scenario (3), when you get the same error as in scenario (2): “System.DirectoryServices.ActiveDirectory.ActiveDirectoryServerDownException: The server is not operational. Name: “DC-XYZ.domain.com””, but you clearly see that your base OS and K2 workspace are using an alternate available DC, whereas K2 smartforms keep throwing an error which mentions the failed DC. With high probability you may see this error with K2 4.6.8/4.6.9.

In this specific scenario you clearly see that K2 workspace works fine at the time when you have this issue with K2 smartforms. This is because the Designer, Runtime and ViewFlow web applications in K2 use the newer WindowsSTS redirect implementation (http://k2.denalilx.com/Identity/STS/Windows) which was introduced in 4.6.8, whereas K2 Workspace still uses “Windows Authentication”.

I.e. you may see that K2 workspace uses Windows authentication and in its web.config file ADConnectionString is configured as “LDAP://domain.com”, whereas for WindowsSTS the K2 label is used, i.e. “LDAP://dc=domain,dc=com”.

You may see aforementioned error occurring on the redirect to “http://k2.domain.com/Identity/STS/Windows/”

There is also a known issue with the Windows STS implementation in K2 where an exception on GetGroups causes user authentication to fail on Windows STS; this was fixed in 4.6.10, but there is still an open request to improve error handling with the aim of catching exceptions caused by temporary unavailability of a DC and then having STS retry again, so that in cases where the DC is inaccessible for a short interval for unknown reasons the retry will then connect successfully.

So in scenario (3) you will likely see that DC locator has switched to an alternate DC, but Windows STS is not performing the switch/retry after a temporary DC failure. It is something I need to research more, but it seems that in this case you have to restart the K2 service to get back to normal operation of K2 smartforms.

Irrespective of the scenario (maybe apart from scenario (3)), the first point to check when you see any such issue is to work with your AD DS team to clarify which specific issue you have on the AD DS side and whether it has been fixed/addressed or not. There is no use in making any attempts to fix things from the K2 side if the AD DS issue is not addressed, unless it is an issue with a specific DC and there are other locally available DCs. The only possible thing is to temporarily remove the connection string to some extra domain if you can afford this (and if it is a less important/additional domain which has the issue).

You may get confirmation from your AD DS administration/support team that they have an issue with one specific DC which has failed or is down for maintenance (the latter should be a very rare/exceptional case of planned maintenance during business hours) and that there are other locally available DCs to serve requests from the K2 server. If this is the case you can try the following things:

1) Use an AD Service SmO to check that you can query the affected domain – if it works you should not have any issues in K2; if not, proceed with further checks.

2) Use the following command to verify which DC is currently being used by the K2 server for the specific domain:
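
Per the recommendations section further down, the command in question is nltest’s dsgetdc query, run on the K2 server (domain.com is a placeholder):

    nltest /dsgetdc:domain.com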

If this command returns the failed DC then this is an issue with your DC locator service/AD DS infrastructure, or to put it another way, a problem external to K2.

In general, AD DS as a technology with decades of evolution and a high adoption rate is very stable, and there are no well-known cases where DC locator fails to switch to an alternative available DC. But depending on the configuration and issues of specific environments, as well as the implementation of the application code which interacts with AD DS, there can be some cases when DC locator switching does not work properly.

3) If on the 2nd step you are getting a failed/unavailable DC, try to use the following command:
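
That is the forced variant of the same nltest query (again, domain.com is a placeholder):

    nltest /dsgetdc:domain.com /force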

This will force a DC locator cache refresh and may help you to switch to another DC. Note that sometimes it is necessary to run this a few times until another DC is selected.

4) If step 3 does not help you to switch to another available DC, you may try to restart the Netlogon service, as the DC locator cache is implemented as a part of this service. Here is an example of how to do it with PowerShell:
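
A minimal example (run in an elevated PowerShell session on the K2 server):

    Restart-Service -Name Netlogon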

Once this is done, verify whether you have switched to an available DC using the following command:
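
I.e. the same locator query as in step 2:

    nltest /dsgetdc:domain.com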

5) If you see that after DC locator has switched to an available DC the K2 AD Service SmOs still do not work, consider a K2 service restart/server reboot. This is most likely scenario (3), where the K2 workspace/base OS works well but K2 smartforms are “stuck” with the server down exception.

Note that the only valid test here is use of AD Service SmOs to query the domain – if it works then there is no need to do anything else from the K2 side. In case you see an issue in the areas depending on the URM User service, it may simply be the case that the cached data has expired and new data is still building up. Sometimes it may be necessary to force an identity cache refresh and wait till the cache builds up completely (this can take a very long time in large-scale production environments).

Additional details and recommendations

K2 performs a bind with the DirectoryEntry class, e.g.:
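
The actual snippet is not shown in this copy of the post; as an illustration only (not K2’s real code), a bind of this kind looks roughly like this, with the connection string shape differing per component as described above:

    $entry = New-Object System.DirectoryServices.DirectoryEntry('LDAP://DC=domain,DC=com')
    $entry.RefreshCache()   # forces the actual bind to whichever DC the locator selected
    $entry.Path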

This process relies on the Domain Controller Locator, which is an algorithm that runs in the context of the Net Logon service. Essentially, the Domain Controller Locator is a sort of AD DS client part which is responsible for selecting a specific DC for a specific domain. The Domain Controller Locator has its own cache: the Net Logon service caches the domain controller information so that it is not necessary to repeat the discovery process for subsequent requests. Caching this information encourages the consistent use of the same domain controller and, thus, a consistent view of Active Directory.

NOTE: as you may notice in the explanations for scenario 3, K2 Workspace and K2 smartforms perform the bind to AD differently; at least the connection strings they use are different.

Refer to the Microsoft documentation for details:

Domain Controller Location Process

Domain Controller Locator

Recommendations

1) Reconfigure K2 to use GC instead of LDAP.

The global catalog is a distributed data repository that contains a searchable, partial representation of every object in every domain in a multidomain Active Directory Domain Services (AD DS) forest. So essentially your GC placed in the local domain can serve part of the queries which would otherwise have to go to DCs in another domain, potentially over a WAN link.

From a purely AD DS point of view the GC has the following benefits:

– Forest-wide searches. The global catalog provides a resource for searching an AD DS forest.

– User logon. In a forest that has more than one domain, a GC can be used during logon for universal group membership enumeration (Windows 2000 native DFL or higher) and for resolving the UPN name when a UPN is used at logon.

– Universal Group Membership Caching. In a forest that has more than one domain, in sites that have domain users but no global catalog server, Universal Group Membership Caching can be used to enable caching of logon credentials so that the global catalog does not have to be contacted for subsequent user logons. This feature eliminates the need to retrieve universal group memberships across a WAN link from a global catalog server in a different site. Essentially, you may enable this feature to make the use of GC even more efficient.

To reconfigure K2 to use GC you have to edit the RoleInit XML field of the HostServer.SecurityLabel table and replace “LDAP://” with “GC://”, with a subsequent restart of the K2 service.
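
A hedged sketch to review that XML before touching it (back up the K2 database first; the LDAP:// to GC:// replacement is then done on the returned XML and written back):

    Invoke-Sqlcmd -ServerInstance 'SQLSERVER\INSTANCE' -Database 'K2' -Query 'SELECT * FROM [HostServer].[SecurityLabel];'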

From a K2 perspective it should improve responsiveness of AD SmartObjects as well as slightly decrease reliance on the WAN link/the number of queries to DCs outside of the local domain.

2) Try the Domain Controller Locator cache refresh for example scenario 2 (see details above: nltest /dsgetdc:DomainName /force) and verify whether it is a viable workaround. Use “nltest /dsgetdc:DomainName” to confirm which specific DC is being used by the K2 server and verify the status and availability of this specific DC with your infrastructure team.

3) In scenario 3, try to restart the K2 service, but first confirm that DC locator uses a working DC.

4) There is also an existing feature request to investigate the possibility of building some DC failure detection/switching capabilities into K2 code in future versions of the product.


How to check whether the UPA properties are populated correctly for a specific user

Certain SharePoint 2013 features, as well as K2 for SharePoint, need to have the User Profile Application (UPA) working and its database populated with correct data.

Sometimes it is difficult to confirm whether or not UPA is correctly configured, as the SharePoint UI does not show you all the properties for the users. Moreover, even if UPA is not configured properly, users can still log in to SharePoint and successfully get an OAuth token, and this fact complicates troubleshooting.

As a quick way to confirm that UPA is populated correctly for a particular user, you may ask them to log in to SharePoint and navigate to the following page:
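
The URL is not preserved in this copy of the post; a page commonly used for this check (an assumption on my part, not necessarily the exact one from the original screenshot) is userdisp.aspx with the Force parameter:

    http://yoursharepointsite/_layouts/15/userdisp.aspx?Force=True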

It will return all UPA properties for the user. For OAuth tokens to work correctly, the following properties should be populated: SPS-ClaimID, SPS-ClaimProviderID, SPS-ClaimProviderType, and SPS-UserPrincipalName.
