Wednesday, 30 January 2013

Adventures with Powershell, first steps...

I keep hearing about this mysterious product called PowerShell that is now bundled with SQL Server in various guises.

I've made the occasional foray in its direction, only to find a DOS prompt with all the functionality that should be in a DOS prompt - in fact, something surprisingly like a Unix command line but in DOS. Of course, I'll probably find a horse's head next to me in the morning for suggesting such heresy.

Anyway, to date I had found no practical purpose for me or the environment that I manage, because everything it seems to offer I can do already via SQL Agent without having to learn arcane PowerShell scripting and magic words.

But no, I have one nagging issue.

Prior to some recent data centre activity we DBA types had configured a job that collected, on a daily basis, stats from each SQL Server and stored them at a single point. This was entirely hand carved using T-SQL, SSIS & the SQL Agent and worked beautifully on SQL Server 2000, 2005 & 2008R2. The only flaw was that the method as first coded (and, due to time constraints, never altered) worked in an entirely linear and fault-intolerant fashion - so if server A was uncontactable, the SSIS step would fail and the whole process failed for the day. This would have been fine if it wasn't for service companies who seem to make it their mission to make networks about as unreliable and poorly performing as possible, keeping themselves in work while not having to tax themselves too hard to be forced into remedying the situation.

So, four years of living with an unreliable but generally working situation became the accepted norm due to massive volumes of actual work - why bother monitoring or checking when there are dozens of calls a day telling us how bad things are.

This cannot continue.

Enter the next service provider, who promises everything will be fabulous; they look after these babies in their sleep. Contract signed, and responsibility for managing services on Windows servers duly ensconced with said service provider, who has the tools to manage the situation.

Said tools, yours truly & colleagues, don't cost the service provider a penny but do provide a regular parade of helpdesk calls along the lines of "service x isn't working, the users are jumping on our heads, why didn't you notice as they've started wearing studs". This is of course good for the service desk because it keeps their volume of calls up whilst they can file under first time fix because the same said tools have had to resort to managing the services themselves.

This toolset is actually pretty good, because you can find extensions of the toolset sitting in other teams who merrily report directly to the biggest tools that server x is down again. Naturally they won't report it to the helpdesk, because they don't actually help: a change to the environment (bringing the server back up) is a change request, and investigating why it went down is complex enough to be handed to third line, but because it mentions "database" somewhere in the text they shunt it into the DBA queue. That puts us all in limbo, because the DBA queue is not actively monitored since the same service provider decided that staff of the client in non application support roles can't use the ticketing system.

I'm feeling better now.

So the fundamental problem is that we need to know when servers or services are down before anyone from the user community has a chance to nip home for their stilettos. We cannot rely on the service company for anything. What we need is a means of checking server status and service status whilst minimising the need to connect to SQL boxes but still keeping control of the entire process within the SQL world so that we can stop anyone else scuppering us further.

Enter Powershell (I got there in the end).

There is an interesting article I came across recently that introduces something very close to what I need written by Jugal Shah on MSSQLTIPS.

The principle works nicely. PowerShell appears rather wordy and performance isn't great (30 seconds to check 4 servers), so it's not going to be running every few seconds, but anything around an hour will do fine for our purposes, and I suspect there must be some ways to reduce the volume of text and hence the run times.
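To give a flavour of the principle, here is a minimal sketch - the server names are placeholders, and the service names assume a default instance (MSSQLSERVER and SQLSERVERAGENT), so adjust for named instances:

```powershell
# Sketch only - server names below are placeholders for our estate.
$servers = "SQLBOX01", "SQLBOX02", "SQLBOX03", "SQLBOX04"

foreach ($server in $servers) {
    # Is the box contactable at all?
    if (Test-Connection -ComputerName $server -Count 1 -Quiet) {
        # Report the state of the SQL Server services on the box
        Get-Service -ComputerName $server -Name "MSSQLSERVER", "SQLSERVERAGENT" |
            Select-Object @{Name="Server";Expression={$server}}, Name, Status
    }
    else {
        Write-Output "$server : unreachable"
    }
}
```

The Test-Connection check is what stops one dead server taking the whole run down with it - exactly the fault intolerance that sank the old SSIS version.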

What it doesn't do is put any control and coordination around the principle, which we would expect in our environment, so next time I'll post our implementation.

Tuesday, 22 January 2013

Progress Updates

A colleague not too long ago introduced me to sys.dm_exec_requests which has a percent_complete column and has significantly changed our view of longer processes and lends a degree of confidence where before we would be sitting there wondering if it was time to reboot.
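A minimal query along those lines (percent_complete is populated for operations such as RESTORE, BACKUP and DBCC commands):

```sql
-- Progress of long-running operations such as RESTORE, BACKUP and DBCC
SELECT r.session_id,
       r.command,
       r.percent_complete,
       r.estimated_completion_time / 1000 / 60 AS est_minutes_remaining,
       DATEADD(ms, r.estimated_completion_time, GETDATE()) AS est_finish_time
FROM sys.dm_exec_requests r
WHERE r.percent_complete > 0;
```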

I have, however, encountered a notable problem: on a reasonably sized database restore (this will depend on infrastructure, but I've seen it above 100Gb), progress sits at 0% for what seems like an eternity.

In fact, the timing before progress starts getting logged has seemed to me to be about how long it would take to create the file and space required and do almost all of the restore process.

I have now discovered more detail that might be useful next time I encounter the "problem", because it appears that recovery is split into 5 phases:

  1. Copy of data from the backup to database pages.
  2. Discovery 
  3. Analysis 
  4. Redo 
  5. Undo 
where on Enterprise edition (2005+) the database is available to users from the end of phase 4 (using Fast Recovery), and on older versions & other editions the database is available after the undo phase (more on that here: fast-recovery-with-sql-server).

I think I will find that SQL is reporting 0% while the data is being copied back to the data files and, given that this will be happening as a transaction, it remains 0% until completed, which explains the behaviour being seen.

It seems that the view sys.dm_tran_database_transactions will give further detail during the undo phase, where the database_transaction_log_bytes_reserved column will show decreasing volumes for the given database.

Further, where there are several transactions to undo the database_transaction_next_undo_lsn column will provide additional detail.
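Putting those two columns together, a sketch of the sort of query I have in mind (the database name is a placeholder):

```sql
-- During the undo phase, log bytes reserved should fall towards zero;
-- next_undo_lsn shows where each transaction's undo has got to.
SELECT t.transaction_id,
       t.database_transaction_log_bytes_reserved,
       t.database_transaction_next_undo_lsn
FROM sys.dm_tran_database_transactions t
WHERE t.database_id = DB_ID('YourDatabase');  -- placeholder name
```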

Monday, 14 January 2013

Memory issues

I've a SQL script that produces cumulative waits (sys.dm_os_wait_stats), key items for review (same source & gives % of all waits to each type but ignore "benign" wait types) and then reproduces the same information but as a delta so current waits can be seen and finally looks at current activities with waits so it is possible to see what is hurting (via sys.dm_os_waiting_tasks and sys.dm_exec_sessions / sys.dm_exec_requests).

This gives me a broad summary of what's going on and is a good indicator of where to look when there are problems. What's more, it generates a good viewpoint without taking an age to run, which is always good news as the only boxes this needs to be executed against are those with problems.
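I won't reproduce the whole script here, but the delta part works along these lines (the 30 second delay is arbitrary - pick whatever window suits):

```sql
-- Snapshot the cumulative waits, pause, then diff against a second
-- read so that only waits accruing right now are shown.
SELECT wait_type, wait_time_ms, waiting_tasks_count
INTO #waits_before
FROM sys.dm_os_wait_stats;

WAITFOR DELAY '00:00:30';

SELECT w.wait_type,
       w.wait_time_ms - b.wait_time_ms AS delta_wait_ms,
       w.waiting_tasks_count - b.waiting_tasks_count AS delta_tasks
FROM sys.dm_os_wait_stats w
JOIN #waits_before b ON b.wait_type = w.wait_type
WHERE w.wait_time_ms - b.wait_time_ms > 0
ORDER BY delta_wait_ms DESC;

DROP TABLE #waits_before;
```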

What I also have is an accumulated list of wait types and meanings; I'm not sure where I picked up the initial list, but as time has gone on I've added to it and clarified meanings. One area I have been looking at lately is memory grants, because I had a server which continually has outstanding memory grants - within the last 12 months we've gone from 16Gb to 40Gb and I think the machine would still like more. The core database suffers a range of problems including the wrong fields / no fields being indexed, uniqueidentifiers, non-unique primary keys and a failure to agree an archive policy prior to implementation, but that's a little out of scope here. What I do know is that the machine is now largely stable but could do with more memory (EDIT: 10 days on, performance problems have persisted and more RAM would definitely be good).

During the course of investigating I have collected further information about the RESOURCE_SEMAPHORE wait type, which links this wait type to memory shortages and ties in with the outstanding grants issue.

In this article, MSSQL Tips: Resource_Semaphore waits, the requested_memory_kb column from sys.dm_exec_query_memory_grants is highlighted, showing how much memory has been requested; it then talks about joining sys.dm_exec_query_memory_grants to sys.dm_exec_sql_text(sql_handle) to look at the largest memory requests.

Having been really rather instructive the article then fizzles out and concludes the answer is to look at indexes or lack thereof.

That is a good starting point, and all too often basic indexes are the fix, but indexes can be a minefield. Vendors are often very sensitive to making changes, and if they discover changes have been made without agreement they then blame all future problems on that one thing - I had a vendor once offer to accept the idea of creating two indexes (which I had tested, and which transformed their system performance) but, having received the creation statements from me, wanted to charge £800 to allow us to run them (I would have been the one running them in the production instance). Further, changing indexes on a mature system can often lead to unexpected effects that are difficult to predict without testing the application in greater depth.

Looking further at the memory requested, I would suggest there is a case to consider other factors too - yes, they will be equally contentious with vendors, but adding indexes might be a lazy solution:

  1. Are joins of the correct type (can that outer be replaced with an inner) and are all join criteria specified?
  2. Are functions being executed against fields in the where clause that might impact index usage? (Perhaps only in the future - always worth checking, as comparisons against variables might be alterable to modify the variable instead.)
  3. Is the query loading tables and fields that add no value?
  4. Are there varchar fields involved? The query optimiser will usually estimate that a varchar takes half its defined length for memory requirements - have a look at "Estimated Row Size" in the query plan. If your varchar fields frequently use a lot less than (or more than) half their defined size, the estimates of row size might be a long way out, especially for longer fields. You can test this for yourself: turn on the actual query plan and try:

    select cast('hello' as varchar(30)) from dbo.AnyTable -- any table will do; the name is a placeholder

    Right click on the "Compute Scalar" item and have a look at the estimated row size (I see 29 Bytes), then change the varchar(30) to text, varchar(max), nvarchar(30) or anything else and see what sizes you get. For varchar(max) I get an estimate of 4035B - in the table I picked I have 129 rows, so the difference is (129 * 4035) - (129 * 29) - that's 504Kb difference on just 129 rows; it'll soon add up.
  5. How many rows does the query plan report will be retrieved? I've just been looking at a query that said it was retrieving 233,131 rows, which didn't match the 136 rows I could see in the result set. Looking further at the query and the context from Profiler, I can see that the query is generating a parameter list for an SSRS report - still too many rows, though, but the "distinct" reveals why I see so few records. More examination clarifies that the number of records reported is because the developer wants to get the "types" of something that are used, and is achieving that by an inner join between the types table (225 rows) and the "items" table that has 15m rows - replacing the inner join with an exists reduces the impact somewhat, with the runtime reduced from 12 seconds to less than 1.
I'm not saying this is the be all and end all of query optimisation - it's a massive subject, and I am grateful the optimiser does a brilliant job most of the time - I am, though, trying to highlight other potential issues.
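To illustrate point 5, a sketch of the rewrite (the table and column names here are made up, not the vendor's):

```sql
-- Original shape: the join multiplies rows (225 types x 15m items),
-- then distinct collapses them back down again.
SELECT DISTINCT t.TypeName
FROM dbo.Types t
INNER JOIN dbo.Items i ON i.TypeId = t.TypeId;

-- Rewrite: the scan of Items can stop at the first matching row
-- per type, so far less work is done for the same result.
SELECT t.TypeName
FROM dbo.Types t
WHERE EXISTS (SELECT 1 FROM dbo.Items i WHERE i.TypeId = t.TypeId);
```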

Sunday, 13 January 2013

3NF reminder

Normalisation can be a minefield for some, but I find that when I try to normalise data I don't go through 3 key steps to get to 3NF - I just end up with a single operation that weeds out all three things in one go.

I am conscious that the 3 steps become more relevant in unfamiliar situations and for newbies, but the descriptions I learned:

  1. Remove repeating groups
  2. Full functional dependency
  3. Transitive dependency
are pretty useless for succinctly describing the purpose - I've recently tried to explain to a newbie what these mean and ended up with a less than brilliant explanation that made little sense to me, let alone someone starting out.

However, I recently came across this meaningful summary:

  1. No repeating elements or groups of elements
  2. No partial dependencies on a concatenated key
  3. No dependencies on non-key attributes
which explains everything in a memorable form.
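As a quick illustration of all three rules in one go (a made-up orders example): a flat table of (OrderId, ItemId, CustomerName, CustomerCity, Item1, Item2) has repeating groups (Item1/Item2), a partial dependency (CustomerName depends on OrderId alone, not the full OrderId + ItemId key) and a transitive dependency (CustomerCity depends on the customer, a non-key attribute). Normalised to 3NF it becomes:

```sql
CREATE TABLE Customer  (CustomerId int PRIMARY KEY,
                        CustomerName varchar(50),
                        CustomerCity varchar(50));   -- city moves with the customer

CREATE TABLE [Order]   (OrderId int PRIMARY KEY,
                        CustomerId int REFERENCES Customer(CustomerId));

CREATE TABLE OrderItem (OrderId int REFERENCES [Order](OrderId),
                        ItemId int,
                        Quantity int,
                        PRIMARY KEY (OrderId, ItemId));  -- one row per item, no repeats
```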

Saturday, 12 January 2013

Configure Perfmon with counters loaded

Configuring a set of counters in Perfmon is pretty easy, but saving them for reuse normally means having to create a "scheduled" trace, run it, then open the data - but here is a documented means of saving settings for easier access:

Monday, 7 January 2013

Statistics Updates

In the relatively recent past I have been redeveloping index rebuild functionality that I set up for use by an employer.

We now have a weekly routine that rebuilds anything > 40% fragmentation and a daily routine for anything above 90% fragmentation. I would like to lower those numbers, but our SAN crashes when too much disk activity hits it and the SAN boys are at a loss to resolve it, so I need to manage the volume of activity a little; it's not a great compromise but it helps.
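For reference, a sketch of the sort of check that drives those thresholds (the 100-page floor is a common rule of thumb for ignoring tiny indexes, not part of our routine):

```sql
-- Indexes in the current database above the weekly rebuild threshold
SELECT OBJECT_NAME(ps.object_id) AS table_name,
       i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps
JOIN sys.indexes i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE ps.avg_fragmentation_in_percent > 40
  AND ps.page_count > 100  -- ignore tiny indexes
ORDER BY ps.avg_fragmentation_in_percent DESC;
```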

What we also do at the end of the index rebuild process is EXEC SP_UPDATESTATS to force an update of any statistics that need it. SP_UPDATESTATS only operates on stats that need updating, but it still has to check them all; usefully, it outputs a list of everything and the operation undertaken. The alternative statement, UPDATE STATISTICS, does an update regardless, so would significantly increase the overall load. I've just found yet more useful information on statistics updates on Kimberly's site, which talks about parameter sniffing much more succinctly than anything I have seen elsewhere.

The most recent change I have had included is that, for each table, the first action is to rebuild any non-unique clustered indexes, which are common in one system I use (and invariably easy to fix, as I've tried to explain). This is because (certainly in SQL Server 2005 and above) if a non-unique clustered index is rebuilt, all other indexes on that table are rebuilt too - so we check for the need to rebuild those non-unique clusters first and, if that happens, ignore all other indexes on the given table to avoid the risk of duplicate activity.
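A sketch of that check (the table and index names are placeholders; index_id = 1 is always the clustered index):

```sql
-- If the table has a non-unique clustered index, rebuilding it rebuilds
-- every index on the table, so the rest can be skipped for this table.
IF EXISTS (SELECT 1
           FROM sys.indexes
           WHERE object_id = OBJECT_ID('dbo.SomeTable')
             AND index_id = 1        -- the clustered index
             AND is_unique = 0)
BEGIN
    ALTER INDEX CIX_SomeTable ON dbo.SomeTable REBUILD;
END
```

The rebuild cascades because the clustered key (including its hidden uniquifier) is the row locator stored in every non-clustered index.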

Work outstanding :

  1. I have spotted how to identify the sample size set by the designer on statistics, so I would like to alter the code to reuse that sample size so that design decisions remain in place, as we have become increasingly conscious of the value of appropriate sample sizes whilst wanting to avoid 100%.
  2. Identify a means of deciding, per database / table, if automatic stats updates are allowable so reducing the risk of updates during operational hours - this might mean some tracing to identify if any of the problems we are encountering are caused by this activity.
  3. Review use of online index rebuilds and tempdb rebuilds - in testing these have worked really well.
  4. Review use of reorganisation; at one point we were not reorganising because sometimes it took longer than a rebuild, but because of its page-by-page nature it can be stopped and will work better alongside other activity, so it is possible we want to up the percentage that is acceptable for a rebuild - I think that needs deciding server by server but needs better consideration.

Sunday, 6 January 2013

Clustered Indexes

I've been rummaging further on Kimberley's site, one area I've spent time trying to explain to developers & 3rd parties is how to use clustered indexes more effectively.

What I encounter is a lack of understanding of basic clusters; BOL does not reveal everything, but reading it carefully does give the key points. The main culprits are not having a clustered key where one is obvious, and putting the cluster on the wrong field.

One provider I work with has a table with around 100 million rows and the cluster on a field that is NEVER used to reference the data. But I am not allowed to modify the database in any way, so we have to copy the updates from the table every day, create our own version of the table and then index it properly - reducing our nightly processing run from around 10 hours to 3 (I'm not claiming the process is perfect, but having the correct clustered index makes a notable difference)....if only indexed views could work inter-database [sigh].
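The shape of that nightly copy, as a sketch (all the names here are invented for illustration, not the vendor's):

```sql
-- Rebuild our reporting copy from scratch each night
IF OBJECT_ID('Reporting.dbo.BigTable') IS NOT NULL
    DROP TABLE Reporting.dbo.BigTable;

SELECT AccountId, EventDate, Amount
INTO Reporting.dbo.BigTable
FROM VendorDb.dbo.BigTable;

-- Cluster on the column actually used to reference the data
CREATE CLUSTERED INDEX CIX_BigTable_AccountId
    ON Reporting.dbo.BigTable (AccountId);
```

Creating the clustered index after the load avoids maintaining it row by row during the insert.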

However, Kimberly has provided more detail that I was unaware of which adds further thoughts to creating a clustered index.

Saturday, 5 January 2013

Duplicated Indexes

I've just started looking in more detail at duplicated indexes and have been trying to figure out a means of properly identifying duplicates - from my perspective, because of the irrelevance of the order of included columns - but luckily someone has already done the work for me:
Kimberley L Tripp, Duplicated Indexes

Thursday, 3 January 2013

Still here

Time to resurrect this as a store of useful information rather than having notes buried in paper notebooks.