Thursday, August 1, 2013

Restore Master to Different Instance

While setting up a warm standby instance for one our job servers, I encountered an issue with decryption of credential passwords.  I have included the error text from the popup message box below.

An error occurred during decryption. (Microsoft SQL Server, Error: 15466)

This error occurred either during creation of a new credential or modification of an existing credential.
In our environment, we have elected to run all user jobs from a separate server, the job server referenced above.  A second instance has been brought online and we are configuring them for warm standby for each other.  Each instance will have the same configuration as the other; i.e. jobs, proxies, packages etc.  However, the jobs will only be enabled on the primary node for that job.

In an attempt to get the two instances in sync, the master and msdb databases from the original node were restored to the new node.  Restoring of the databases went well and it wasn’t until testing of the jobs to verify the functionality that the issue was uncovered.
The job failed and logged the error:

Error authenticating proxy domain\userid, system error: Logon failure: unknown
user name or bad password.)

Having seen this error in the past when researching other job failures, I assumed that for some reason the password had been corrupted during the restore of the master database.

After opening the credential in question and rekeying the password, the error above was displayed.  I tried again with the same results. I then decided just to delete the credential and re-add with the correct password, failure again.  Attempting to add a new credential that had not existed on the server before ended with the same error.
After the repeated failures, I opened my browser to see what others had done to resolve this issue.  To my surprise, I couldn’t find another case of this happening.  I did find a couple of mentions of this perhaps pointing to a corrupt Server Master Key for the new instance.

Since this instance had been restored from another instance, I tried to import the Service Master Key from the original instance.   This attempt also failed.  When reviewing the articles above, the mention of the SQL Service Account got me to thinking.   What would happen if I changed the service account on my new instance to match the original instance?

Using SQL Configuration Manager, I made that change and restarted the instance.  Going back to SSMS I was able to create the proxy with the correct password, and I was able to run the job to success.
However, leaving the server in this configuration violated our security policy.  I hesitantly changed the service account back to the correct domain id, restarted the instance and then executed the job.  Success was once again reported by the job.

Below are the steps that were taken to resolve the issue:
  • Restore master database
  • Change SQLService Account to original account from source server
  • Restart SQL Services
  • Change SQLService Account to correct id for this instance
  • Restart SQLServices

I hope that by documenting the issue, troubleshooting steps taken, and the successful resolution,  it may save someone else some time and headaches.

Wednesday, July 31, 2013

SSIS ForEach continuation after Error

ForEach in SQLServer Integration Services

Integration Services has built in constructs that provide the ability to loop through a result set.  This functionality can be used to loop through files in a directory, items returned from a query, or a list generated with another method.
One issue with the ForEach container, when using the default configuration, is that if one loop iteration fails the loop fails and processing stops.  Below is a technique that can be used to log the error and proceed to the next value in the list.
The package shown uses a ‘ForEach File Enumerator’ to loop through a directory, in this example c:\temp.  Three files exist in the directory that match the criteria.  LoopDemo.txt, LoopReadOnly.txt, and LoopSequential.txt.  LoopReadOnly.txt has the read only attribute set to true.
With the default options set for the ForEach loop construct the first file is deleted, LoopDemo.txt. When deletion on the second file is attempted, LoopReadOnly.txt, an error is produced and the loop fails after displaying the error.

By setting the ‘MaximumErrorCount’ to 0 on the ForEach container the failure will occur and the message displayed, however the loop will continue through the remaining file.

Errors produced by the task can be logged in any manner that is available, email notification, logging to a table etc.

Tuesday, January 8, 2013

Server in Script Upgrade Mode

When one of our SQL Server 2008R2 clusters failed over the amount of time needed between when the server came online and when clients could connect to the instance seemed to be overly long.  After some research into the logs I found this error occuring multiple times for different ids, most of the ids were automated processes that connect as quickly as possible.

Login Failed for user 'xxxxx'.  Reason: Server is in script upgrade mode.  Only administrator can connect at this time.

Going back over the upgrade logs the SQL Instance in question hadn't been patched in quite some time.  So what was the reason for the upgrade script running?

I googled for answer and came up empty as to why this would be happening after the patch was successfully installed.

After some additional research the issue was found,  SP1 had been installed on the primary node but not the secondary node of the cluster.  Each time the instance was failed over the SQL Engine had to perform and upgrade/downgrade of the system databases.

Lesson Learned:  Patch one node and schedule the other node upgrade.