Welcome to Era of Data
home || about || contact || blog ||





Archive for the ‘SQL Server’ Category

Making the SQL Server error log easier to view

Saturday, September 24th, 2011

The SQL Server Error log

At some point every dba will have had to look at the current SQL Server error log using their favourite text editor.

However, as the default error log file is literally just called errorlog (with no file extension), there’s the awkward process of selecting the application with which to open the file every time you need to view it, with no option to associate errorlog with your favourite text editor thanks to the lack of file extension. If it’s a 3am callout and the server is down, you can’t use Management Studio to view and filter the log so your only resort is to maually load the file and anything that makes this easier is always helpful.

Like all good solutions, the answer is deceptively simple and I use it whenever I build a SQL Server instance (and it also has an added benefit I’ll describe later); change the default error log filename to have an .txt or .log extension. It’s an old trick I learnt from my Sybase days and is still very much relevant to the latest and greatest versions of SQL Server (2008 R2, as of the time of writing).

It’s achieved by editing the command line in SQL Server Configuration Manager (SSCM) and editing the default error log location hard-coded in that command line to something more user-friendly. Double-click the SQL Server instance you want to edit in SSCM, under the SQL Server Services folder to bring up the properties, click on the Advanced tab and the command line can be edited under Startup Parameters. Before you do anything, save the current parameters to a text file.

The default SQL Server command line has 3 basic components: the name and location of the master database primary data file, the name and location of the master database transaction log file, and the name and location of the error log. These are represented by the -d (data file) -l (log file) and -e (error file) parameters in the command line. There are other command line options like -E for large page extents and -T for trace flags, as well as options for starting in single user mode and specifying named instances etc etc but they’re all well outside the scope of this post (I’m not mentioning them to show off what I know about the command line, but just to highlight the most likely additional parameters you may come across on any of your servers that you want to make this modification on).

I know most dbas are terrified of playing around with the SQL Server command line, but all we’re interested in is modifying the -e parameter, which by default will be something like ‘-eC:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Log\ERRORLOG;’. All we’re interested in is modifying the bit between -e and the semi-colon. Simply adding .LOG or .TXT (see Fig. 1) will make your life so much easier when you have to peruse the log file contents of either the current log or rolled-over logs as you can bypass SSMS and just use your preferred text editor.

Figure 1: The SSCM Console

Inevitably, a restart will be required for the change to take effect, so don’t do this on production systems until you’ve tested on development and staging systems first.

I also prefix the name of the host/instance to the error log name as it makes it so much easier to identify the server concerned if you copy log files down to your desktop machine (as I tend to do quite often).

That added benefit? Not only can you alter the filename of the error log, but you can also alter the path, which means you can keep the SQL Server error logs, SQL Server Agent error logs and default trace files in a separate location, as it is this folder that you specify for the error log that dictates where these other files get written to.

Disclaimer: Do this entirely at your own risk. I’ve carried these instructions out on production systems on places I have worked out and am comfortable with making these changes. Test them on development/staging servers first if you’re doing this for the first time or are unsure about making changes.

Working out how long a SQL Server backup/restore will take

Wednesday, August 3rd, 2011

Whilst messing around with SQL Server backup performance testing today I remembered a little-mentioned column in the sys.dm_exec_requests output called percent_complete. This is a rather useful column which reports the progress of certain operations and was introduced in SQL Server 2005.

Combined with the output of another column in this DMV called estimated_completion_time we can create a query which neatly displays when a backup or restore operation started, how long SQL Server expects it to take and therefore when it will complete, with a percentage completion column thrown in for good measure.

Here’s the query I put together whilst waiting for my backup performance tests to complete:


select convert(nvarchar(22),db_name(database_id)) as [database],
case command when 'BACKUP DATABASE' then 'DB' when 'RESTORE DATABASE' then 'DB RESTORE' when 'RESTORE VERIFYON' then 'DB RESTORE/VERIFYING' else 'LOG' end as [type],
start_time as [started],
dateadd(mi,estimated_completion_time/60000,getdate()) as [finishing],
datediff(mi, start_time, (dateadd(mi,estimated_completion_time/60000,getdate()))) - wait_time/60000 as [mins left],
datediff(mi, start_time, (dateadd(mi,estimated_completion_time/60000,getdate()))) as [total wait mins (est)],
convert(varchar(5),cast((percent_complete) as decimal (4,1))) as [% complete],
getdate() as [current time]
from sys.dm_exec_requests
where command in ('BACKUP DATABASE','BACKUP LOG','RESTORE DATABASE','RESTORE VERIFYON')

This will return the following:

Figure 1: Backup Progress script output
The database name has been converted to varchar so the output of the query looks ok when result output is set to text mode, otherwise the database name gets padded with blank characters up to the width of the sysname datatype.
It comes in rather useful when running a backup or restore operation where the WITH STATS option has not been used – I’m actually finding that this script is preferrable to the output the WITH STATS option provides, as it can be run on demand and the estimate completion time will get updated with the latest predicted time whenever it is run.
Just bear in mind that if instant file initialisation is off the first few minutes of a new restore will be spent initialising the database files so the predicted end time will be wildly pessimistic. Once the file initialisation is complete the predicted end time will return more accurate information.
I have also found that during a large restore the end time is not accurate for the first 5 minutes or so.

SQL Server wait stats

Monday, February 21st, 2011

The single biggest clue to the source of a performance problem will be from something referred to as the wait stats.

What are waitstats?

In short wait stats are statistics gathered on what resources SQL Server is waiting on internally whilst executing queries.

Why are they useful?

They can often provide very quick and objective evidence of a performance bottleneck by showing cumulative wait times that SQL Server spent on getting access to the CPU, memory or disk access as well as numerous other internal resources.
This post is only going to concentrate on IO related waits.

Gathering this information on at least a daily basis can provide invaluable data on a system’s performance over time and can reveal sudden changes in behaviour caused by e.g. a recent change to an application. Speaking from personal experience, using historically captured wait stats and contrasting them with sudden changes to the topmost wait types is an invaluable way of proving that recent application changes were the trigger for a change in behaviour.

How do I gather these wait stats?

That’s fairly straightforward – it’s a T-SQL statement. First, the SQL Server ‘version independent’ method:

DBCC SQLPERF (WAITSTATS)

The sample output from my local SQL Server 2005 Express installation is below:

Wait Type                        Requests      Wait Time     Signal Wait Time
-------------------------------- ------------- ------------- ----------------
MISCELLANEOUS                    0             0             0
PAGEIOLATCH_NL                   0             0             0
PAGEIOLATCH_KP                   0             0             0
PAGEIOLATCH_SH                   178           327           0
PAGEIOLATCH_UP                   16            436           0
PAGEIOLATCH_EX                   14            0             0
PAGEIOLATCH_DT                   0             0             0
IO_COMPLETION                    173           1279          0
ASYNC_IO_COMPLETION              1             1294          0
CHKPT                            1             202           0
BACKUPIO                         0             0             0
DISKIO_SUSPEND                   0             0             0
IMPPROV_IOWAIT                   0             0             0
WRITELOG                         29            639           15

From SQL Server 2005, the wait stats data has been split into two separate dynamic management views (DMVs) sys.dm_latch_wait_stats and sys.dm_os_wait_stats. The latch wait stats are counters for internal waits within the SQL Server database engine and the OS wait stats accumulate data on waits for external resources such as CPU, disk and memory.

As we’re concentrating on IO waits, its the sys.dm_os_wait_stats we’re interested in.

The following query will grab the IO wait types output:

select * from sys.dm_os_wait_stats
where
wait_type like 'PAGEIO%' or
wait_type like '%IO_COMPLETION' or
wait_type like 'DISKIO%' or
wait_type like 'BACKUPIO%'or
wait_type like 'WRITE%'
order by wait_time_ms

wait_type                                                    waiting_tasks_count  wait_time_ms         max_wait_time_ms     signal_wait_time_ms
------------------------------------------------------------ -------------------- -------------------- -------------------- --------------------
PAGEIOLATCH_SH                                               20892941             110133998            17895                332331
PAGEIOLATCH_EX                                               392948               7270457              2395                 10353
IO_COMPLETION                                                719058               1061552              916                  6929
PAGEIOLATCH_UP                                               4332                 81307                3635                 1243
BACKUPIO                                                     3998                 16393                240                  18
ASYNC_IO_COMPLETION                                          133                  14771                805                  0
PAGEIOLATCH_DT                                               0                    0                    0                    0
PAGEIOLATCH_NL                                               0                    0                    0                    0
PAGEIOLATCH_KP                                               0                    0                    0                    0
DISKIO_SUSPEND                                               0                    0                    0                    0

Incidentally, these queries are only meant to show how to grab the data, if you hunt around on the web you’ll find some elegant queries for processing this data.

This KB from MS discusses these wait types (and more).

SQL Server 2008 R2 has even more granular wait-types to isolate IO related waits, but the above query has been limited to allow it to work from SQL Server 2005.

wait_time_ms is the total accumulated wait time (since that SQL Server instance was last restarted) spent on accessing that resource.
max_wait_time_ms is the peak wait time for one of these requests.
signal_wait_time_ms signal wait time is how long was spent waiting to get access to the runnable queue for that resource, and can indicate CPU pressure if this is high.

If these IO related wait types are high in the list of overall waits (see the Waits and Queues article from Microsoft lower down), then you have an IO issue.
This is either down to the performance of the IO subsytem, or the physical design of the database. The design issues could be due to the file/filegroup layout of the database concerned, but also related to the layout of tempdb (in a previous role it never ceased to amaze me how much thought went into the layout of an application database at various customer sites, whilst tempdb was left e.g. at it’s 8MB starting size with (just one) data file and log file sharing the same drives and/or sharing application database drives).

Take a close look at the particular wait type concerned. If the WRITELOG wait type is dominant, you know it’s the log drive. If it’s BACKUPIO, we can infer that it’s the backup drive(s).
We can’t always blame the disks, however. For example, if the PAGEIOLATCH_SH is high in the list, we know it’s the data drive(s), but this could also point to database design issues – if some of the tables had better indexes the query optimizer can come up with a more efficient query plan and retrieve the same data with far fewer IOs.

These are just a few examples, if you gather this data regularly a clear picture can be built up of what performance issues are affecting your systems, and by picking upon trends you can get advance warning of impending issues.

For further information on wait types and what all the output columns mean, there is an excellent Waits and Queues document on the Microsoft website which goes through them and is highly recommended reading.

The most telling output can be from identifying IO stalls.
SQL Server 2000 had fn_virtual_filestats and this has now been incorporated into sys.dm_io_virtual_file_stats which is coming up in the next post…