Archive for January, 2013

Column store solutions

Column store indexes arrived in the enterprise edition of SQL Server 2012. Two of the limitations are as follows:

Only nonclustered column store indexes can be defined (which means you still need to allocate storage space for the underlying heap or clustered B-tree data), and

Once a nonclustered column store index has been defined, the parent table or partition can only be used for read-only purposes.

Both of these limitations are rumored to be lifted in the next version of the product.
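
In practice the read-only restriction is usually handled by disabling the columnstore index before a data load and rebuilding it afterwards (or by switching in a freshly loaded partition). A minimal sketch of the disable/load/rebuild pattern, using a hypothetical dbo.FactSales table, staging table and index name:

-- Assumes a nonclustered columnstore index named ncsi_FactSales already exists on dbo.FactSales
ALTER INDEX ncsi_FactSales ON dbo.FactSales DISABLE
GO

-- With the columnstore index disabled the table is writable again, so new rows can be loaded
INSERT INTO dbo.FactSales (custid, prodid, sale)
SELECT custid, prodid, sale
FROM dbo.FactSales_staging
GO

-- Rebuilding the index re-enables it and returns the table to its read-only state
ALTER INDEX ncsi_FactSales ON dbo.FactSales REBUILD
GO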

Putting these and many other technical limitations aside, purely on the licensing front many businesses simply won't want to incur the huge cost of the enterprise edition for a feature they expect to be core to a modern database management system. Granted, the current product offers us the best of both worlds, allowing us to combine row-based data with column-based data, but will that be enough of a selling point? Particularly when many other vendors are offering pure column-based storage with high-performance methods (similar to bulk copy/insert) to import data directly into their read/write column stores.

One of the many vendors in the column store space is Actian Corporation (formerly Ingres Corp.) (http://www.actian.com/). Their Vectorwise analytic database product is well ranked in the TPC-H Top Ten Performance Results (http://www.tpc.org/tpch/results/tpch_perf_results.asp). From a SQL Server perspective it's worth comparing this product's capabilities with column store in SQL Server 2012. To this end the free 30-day trial 64-bit Windows version was downloaded and installed on a virtual machine configured with 10GB disk space, 8GB RAM and 2 virtual CPUs. The download is available at http://www.actian.com/downloads/vectorwise#trial. One of the prerequisites for one of the graphical administration tools is the Java Runtime Environment, so this was downloaded and installed as well.

Coming from an Ingres heritage, it was pleasing to see that ANSI SQL is used as the standard query language for Vectorwise. The Actian Director GUI (the equivalent of SQL Server Management Studio) was used to create a test database, create a table and populate it with 46+ million rows. The table structures used were the same as the ones posted by Gavin Payne (Microsoft Certified Architect, Microsoft Certified Master at Coeo, UK) at this link:

http://gavinpayneuk.com/2012/07/22/this-is-columnstore-part-1/

create table tblfact (custid integer not null, prodid integer not null, sale money not null) with structure=vectorwise

create table dimprod (prodid integer not null,prodname varchar(200) not null) with structure=vectorwise

insert into dimprod (prodid, prodname) 
values 
(1, 'Car'),(2, 'Van'),(3, 'Truck'), 
(4, 'Taxi'), (5, 'Horse'), (6, 'Helicopter'), 
(7, 'BizJet'),(8, 'JumboJet'),(9, 'Ferry'), 
(10, 'Hovercraft')

-- 100 different customers, 10 different products, sales in the range 1 to 1000000 
-- This insert was run manually ten times...
insert into tblfact (custid, prodid, sale) 
select (randomf()*100)+1, (randomf()*10)+1,(randomf()*1000000)+1

-- ...and then this was run multiple times to 'bulk up' the volume to 46,137,344 rows
insert into tblfact (custid, prodid, sale) 
select (randomf()*100)+1, (randomf()*10)+1,(randomf()*10000)+1
from tblfact

-- Running this query returned the sum of sales for all ten products in less than 2 seconds
select dimprod.prodname, sum(tblfact.sale),count(*) 
from dimprod
inner join tblfact
on tblfact.prodid = dimprod.prodid 
group by dimprod.prodname

-- Left joins (although not required in this case) also work. (In SQL Server the columnstore index is still used for this query, although the execution plan uses row mode rather than batch mode processing.)
select dimprod.prodname,sum(tblfact.sale),count(*)
from dimprod
left join tblfact
on tblfact.prodid = dimprod.prodid 
group by dimprod.prodname

Vectorwise querying

The on-disk sizes of the column stores were as follows:

Directory of C:\Program Files\Actian\VectorwiseVW\ingres\data\vectorwise\testingdb\CBM\default

27/01/2013  07:35    <DIR>          .
27/01/2013  07:35    <DIR>          ..
26/01/2013  19:04                64 db_key
27/01/2013  00:54             4,096 _serveradminSdimprod__prodid_00000036
27/01/2013  00:54             4,096 _serveradminSdimprod__prodname_00000037
27/01/2013  00:50        44,515,328 _serveradminStblfact__custid_00000033
27/01/2013  00:50        28,516,352 _serveradminStblfact__prodid_00000034
27/01/2013  00:50       373,329,920 _serveradminStblfact__sale_00000035
6 File(s)    446,369,856 bytes

So 46,137,344 rows, each 16 bytes in size (704MB of raw data), compressed down to 426MB of column store data files. From the file names you can tell which columns take the most space. Not surprisingly the sale column is the largest and the prodid column, with a spread of only ten products, is the smallest.
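
For reference, the arithmetic behind those figures, checked here with a simple scalar SELECT (shown in T-SQL):

select 46137344 * 16 / (1024.0 * 1024.0) as raw_mb,         -- 704.0
       446369856 / (1024.0 * 1024.0)     as column_store_mb -- approx. 425.7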

Both JDBC and ODBC can be used to query the database. A rich and broad range of SQL constructs is supported, including, for example, windowing (analytic) functions.
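
As an illustration of the windowing support, a ranking of customers by total sales within each product could be written in standard ANSI SQL against the test tables above (a sketch only, not a query from the original test):

-- Rank customers by their total sales within each product
select prodid,
       custid,
       sum(sale) as total_sales,
       rank() over (partition by prodid order by sum(sale) desc) as sales_rank
from tblfact
group by prodid, custid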

To compare with SQL Server 2012 the same tables were created and populated with the same number of rows. One minor difference was that a clustered primary key on a rowid identity column was added to the tblfact table. This was done to minimise the size of the columnstore index (heap tables produce larger columnstore indexes than clustered B-trees – see ‘Why is a columnstore index built from a heap larger than a columnstore index built on the same data from a clustered B-tree?’ at http://social.technet.microsoft.com/wiki/contents/articles/3540.sql-server-columnstore-index-faq-en-us.aspx).
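
The SQL Server DDL wasn't shown above, but the tblfact side of the test would look something like the following (a sketch; the identity column, constraint and index names are assumptions):

CREATE TABLE dbo.tblfact
(
    rowid int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    custid int NOT NULL,
    prodid int NOT NULL,
    sale money NOT NULL
) WITH (DATA_COMPRESSION = PAGE)
GO

-- Created after the table has been populated; from this point the table is read-only
CREATE NONCLUSTERED COLUMNSTORE INDEX ncsi_tblfact
ON dbo.tblfact (custid, prodid, sale)
GO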

The following ‘bulk up’ query was used to fill the tblfact table:

insert into tblfact (custid, prodid, sale)
SELECT TOP 2000000
(ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT))%100)+1,
(ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT))%10)+1,
(ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT))%1000000)+1
from tblfact

The compressed size (using data_compression=page) of this tblfact table was 686MB. The columnstore index on (custid, prodid, sale) was 261MB.

-- Queries to fetch column store size

select lob_reserved_page_count/128.0
from sys.dm_db_partition_stats
where index_id<>1
and object_id = object_id('tblfact')

select sum(css.on_disk_size)/(1024*1024.0)
from sys.column_store_segments css
inner join sys.partitions p
on css.hobt_id = p.hobt_id
where p.object_id = object_id('tblfact')
-- and column_id = 4

SQL Server columnstore

SQL Server columnstore

Outside the Microsoft world there are many products available that do blazingly fast business analytics with multi-billion row data stored purely as column stores. Many of these products have column store as a core feature rather than a costly enterprise-only feature. With the current limitations of the functionality, the next release some way off, and other vendors' feature-rich products already available, it makes sense for businesses to evaluate all options.


Collection of Powershell clustering commands

Here is a collection of Powershell Windows clustering commands that I’ve found to be quite useful:

Start by opening a Powershell console under Administrator rights and running

import-module failoverclusters

And then to see the cluster resources on a cluster named SQLCLUSTER run

get-clusterresource -cluster sqlcluster

pscluster1

With the file share witness we have a node and file share majority quorum configuration. Let’s see where the file share is located:

get-clusterresource -cluster sqlcluster | where-object {$_.ResourceType -like "File Share Witness"} | get-clusterparameter

pscluster2

Visually correlating the virtual server and instance names to form a server\instance name pair is quite difficult. Let’s see if we can do this with a script:

get-clusterresource -cluster sqlcluster | where-object {$_.ResourceType -like "SQL Server"} | get-clusterparameter -cluster sqlcluster VirtualServerName,InstanceName

pscluster3

This is better, but we can get the combined name pair by grouping on the first column:

get-clusterresource -cluster sqlcluster | where-object {$_.ResourceType -like "SQL Server"} | get-clusterparameter -cluster sqlcluster VirtualServerName,InstanceName | group-object ClusterObject | select-object @{Name = "SQLInstance";Expression = {[string]::join("\",($_.Group | select-object -expandproperty Value))}}

pscluster4

We’d like to see the nodes that the SQL Servers are on and their current state:

get-clusterresource -cluster sqlcluster | where-object {$_.ResourceType -like "SQL Server"} | foreach-object {($ownernode,$state) = ($_.OwnerNode,$_.State); $_ | get-clusterparameter -cluster sqlcluster -Name VirtualServerName,InstanceName | group-object ClusterObject | select @{Name="SQLInstance";Expression = {[string]::join("\",($_.Group | select-object -expandproperty Value))}},@{Name="OwnerNode";Expression={$ownernode}},@{Name="State";Expression={$state}}}

pscluster5

Turning to the availability groups in the first screenshot, it seems each one gets its own resource group. Let’s filter just the availability groups and their resources (some may have availability group listeners defined, others may not):

get-clusterresource -cluster sqlcluster -inputobject (get-clusterresource -cluster sqlcluster | where-object {$_.ResourceType -like "SQL Server Availability Group"}).OwnerGroup

pscluster6

Here in the first column you can see the availability group name (MyHAGroup), the listener’s IP address (192.168.81.10) and the listener name (SQLHA1). This matches all the details you would see within the SQL Management Studio under the Availability Group configuration.

These quick one-liners may be useful if you’re dealing with multiple SQL instances consolidated onto the same Windows cluster.


DBCC CheckDB internal snapshot size

There are various ways to see the on-disk size of manually created database snapshots. For example, sys.dm_io_virtual_file_stats returns a size_on_disk_bytes column, which you can monitor to see the effect of ‘copy on write’ on the sparse files that make up the database snapshot.

One type of snapshot we would also like to see details for is the internal snapshot created when DBCC CheckDB or DBCC CheckTable is run. We know that the DBCC internal snapshot files are created in the root of each data file’s home volume, and that we have to provision enough space for them. Unfortunately, sys.dm_io_virtual_file_stats does not show details for these files. We also know that their size will depend on the amount of write activity against the checked database while the check runs. CheckDBs against large databases can run for quite some time, and concurrent user activity can give rise to large DBCC internal snapshots as a result of the ‘copy on write’ effect. In an ideal world we’d like to move the CheckDB operational maintenance window so that the internal snapshots are as small as possible, but without knowing their exact size this becomes quite difficult.
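
For a manually created snapshot, by contrast, the size_on_disk_bytes figure can be read directly. A minimal sketch, assuming a hypothetical snapshot database named MySnapshot:

-- On-disk size (in MB) of each sparse file belonging to the snapshot
select DB_NAME(vfs.database_id) as snapshot_name,
       mf.physical_name,
       vfs.size_on_disk_bytes / 1048576.0 as size_on_disk_mb
from sys.dm_io_virtual_file_stats(DB_ID('MySnapshot'), NULL) as vfs
inner join sys.master_files as mf
    on mf.database_id = vfs.database_id
    and mf.file_id = vfs.file_id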

For the DBCC internal snapshots, one simple but indirect way of examining their size is to monitor the reduction in free space on each data file’s volume (assuming the data files themselves are not growing). The reduction would be equivalent to the size of the hidden snapshot files.
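
A rough sketch of that approach (assuming SQL Server 2008 R2 SP1 or later, where sys.dm_os_volume_stats is available, and a database named MyCheckDBTest): sample this query before and during the CheckDB and compare the available_bytes figures.

-- Free space on each data file's volume; the drop while CheckDB runs
-- approximates the combined size of the hidden snapshot files
select mf.file_id,
       mf.physical_name,
       vs.volume_mount_point,
       vs.available_bytes / 1048576.0 as free_mb
from sys.master_files as mf
cross apply sys.dm_os_volume_stats(mf.database_id, mf.file_id) as vs
where mf.database_id = DB_ID('MyCheckDBTest')
and mf.type_desc = 'ROWS'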

Since SQL Server itself exposes no figures for the DBCC internal snapshot files, the functionality within Windows was examined to identify anything related to the size of compressed or sparse files. Searching around, a very good reply by Jeffrey Snover [MSFT] was found in this StackOverflow thread:

http://stackoverflow.com/questions/554010/how-to-get-the-actual-size-on-disk-of-a-file-from-powershell

That Powershell snippet was included in the following script:

add-type -TypeDefinition @'
using System;
using System.Runtime.InteropServices;
using System.ComponentModel;

namespace Win32Functions{

public class ExtendedFileInfo
{
[DllImport("kernel32.dll", SetLastError=true, EntryPoint="GetCompressedFileSize")]
static extern uint GetCompressedFileSizeAPI(string lpFileName, out uint lpFileSizeHigh);

public static ulong GetCompressedFileSize(string filename)
{
uint high;
uint low;

low = GetCompressedFileSizeAPI(filename, out high);

int error = Marshal.GetLastWin32Error();

if (high == 0 && low == 0xFFFFFFFF && error != 0)
throw new Win32Exception(error);
else
return ((ulong)high << 32) + low;
}
}
}
'@

$datafiles = @("\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf","\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf","\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf")

while (1)
{
Start-Sleep -Seconds 1

$thistime = (Get-Date).DateTime

$streams = $datafiles | foreach-object { $_ + ":" + (get-item $_ -stream * | where-object {$_.Stream -ne ':$DATA'}).Stream }

$streams | where-object {$_ -match 'MSSQL_DBCC'} | foreach-object { $size = [Win32Functions.ExtendedFileInfo]::GetCompressedFileSize($_); "$thistime,$size,$_" }  | out-file -Append "c:\temp\dbccstreams.txt"
}

The $datafiles array holds the names of the database data files. Inside the while loop we enumerate any NTFS streams, excluding the main data stream ($DATA), that exist for these data files. The script uses the get-item -stream parameter that is new in Powershell 3.0 (I recommend this be installed on your monitoring workstation or server and not your production SQL Server). The GetCompressedFileSize function is used to obtain the on-disk size in bytes of each stream. The while loop samples the sizes every second.

In a typical execution you would run the Powershell script on your monitoring server and then invoke DBCC CheckDB on your SQL Server. When this completes you’d hit Ctrl-Break on the Powershell script to exit and then examine the text results in the c:\temp\dbccstreams.txt file.

You can see from the hard-coded elements of the script that my database consisted of three data files on three separate volumes (S:, U: and V:). The data files were sized at 1024MB each. The database consisted of one very simple table with sample data as follows:


-- Create test table. Note, it's a bad idea to cluster on NEWID(). This is done purely for testing.
CREATE TABLE dbo.TestTable
(
id uniqueidentifier NOT NULL DEFAULT NEWID() PRIMARY KEY,
val char(100) NOT NULL DEFAULT REPLICATE('abcd',25)
)
GO

-- Populate with data
SET NOCOUNT ON

DECLARE @i int = 1

WHILE (1=1)
BEGIN
INSERT INTO dbo.TestTable DEFAULT VALUES

IF (@i % 100 = 0)
BEGIN
-- Exit if database is over 40% full
IF (SELECT SUM(FILEPROPERTY(name,'spaceused')/128.0)/(SUM(size)/128.0) FROM sys.database_files WHERE type_desc <> 'LOG') > 0.40 BREAK
END

SET @i = @i + 1

END

When DBCC CheckDB was run against this data with no concurrent user activity the output of the Powershell monitoring script was as follows:

01 January 2013 09:00:59,131072,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:00:59,65536,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:00:59,65536,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:01:00,131072,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:01:00,65536,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:01:00,65536,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:01:01,131072,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:01:01,65536,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:01:01,65536,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:01:02,131072,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:01:02,65536,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:01:02,65536,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:01:03,131072,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:01:03,65536,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:01:03,65536,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:01:04,131072,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:01:04,65536,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:01:04,65536,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:01:05,131072,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:01:05,65536,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:01:05,65536,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:01:06,131072,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:01:06,65536,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:01:06,65536,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:01:07,131072,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:01:07,65536,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:01:07,65536,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:01:08,131072,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:01:08,65536,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:01:08,65536,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:01:10,131072,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:01:10,65536,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:01:10,65536,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8

Charting this data in a pivot chart in Excel we see this:

dbcc_with_no_user_activity

You can see the name format of the DBCC internal snapshot files (streams, to be precise) and the fact that they are always a multiple of 64K in size.

Now for DBCC CheckDB with concurrent user activity (in my case, just one connection continuing the default-values population of dbo.TestTable from 40% up to 60% database space used), the following results were obtained from the Powershell monitoring script:

01 January 2013 09:07:36,65536,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:07:36,65536,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:07:36,65536,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:07:37,3080192,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:07:37,2555904,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:07:37,2752512,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:07:38,3473408,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:07:38,3145728,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:07:38,3276800,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:07:39,3735552,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:07:39,3997696,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:07:39,3538944,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:07:40,4390912,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:07:40,4456448,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:07:40,4128768,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:07:41,4718592,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:07:41,5242880,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:07:41,4587520,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:07:42,5373952,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:07:42,5898240,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:07:42,5177344,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:07:43,5636096,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:07:43,6684672,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:07:43,5701632,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:07:44,5898240,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:07:44,7143424,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:07:44,6160384,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:07:45,6553600,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:07:45,7995392,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:07:45,6815744,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:07:47,6881280,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:07:47,8388608,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:07:47,7536640,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:07:48,7340032,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:07:48,9568256,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:07:48,8192000,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8
01 January 2013 09:07:49,8978432,\\SQLVSERVER1\s$\MSSQL\Data\MyCheckDBTest_data1.mdf:MSSQL_DBCC8
01 January 2013 09:07:49,11272192,\\SQLVSERVER1\u$\MSSQL\Data\MyCheckDBTest_data2.ndf:MSSQL_DBCC8
01 January 2013 09:07:49,10092544,\\SQLVSERVER1\v$\MSSQL\Data\MyCheckDBTest_data3.ndf:MSSQL_DBCC8

Pivot charting this in Excel shows:
dbcc_with_user_activity

The vertical axis is size in bytes so you can see that for my very small database we’re not seeing huge snapshot files. Again, the sizes are always multiples of 64K. We can see the increase in size on all three data files as a result of the ‘copy on write’ effect.

Running the Powershell monitoring script against larger databases with high volumes of concurrent update activity will show snapshot file sizes in the hundreds of megabytes to gigabytes range. If your DBCC CheckDB execution is scheduled, you may observe smaller snapshot file sizes by shifting the start time. When you originally set the schedule for the DBCC execution, what you assumed was a quiet time for your database may well have been a busy time.