sqlscope
Homepage: https://sqlscope.wordpress.com
SQL deployments on Kubernetes
Posted in Kubernetes on April 20, 2019
Set up SQL Server on Kubernetes running on a Mac, using minikube for a single-node cluster.
Prerequisites: use Homebrew to install minikube and kubernetes-cli
brew cask install minikube
brew install kubernetes-cli
Start minikube specifying the hyperkit driver.
Use these commands to manually create a SQL instance on minikube:
minikube start --vm-driver=hyperkit
kubectl run mssql --image=microsoft/mssql-server-linux --port=1433 --env ACCEPT_EULA=Y --env SA_PASSWORD=P455w0rd
kubectl expose deployment mssql --type=LoadBalancer
minikube service mssql --url
Use these commands to remove the SQL instance:
kubectl delete service mssql
kubectl delete deployment mssql
Using YAML files
Create the persistent volume claims:
kubectl apply -f ~/MinikubeApps/mssql1/mssql1pvc.yaml
kubectl apply -f ~/MinikubeApps/mssql2/mssql2pvc.yaml
kubectl describe pvc mssql1pvc
kubectl describe pvc mssql2pvc
Examine the persistent volumes using the VOLUME name:
kubectl get pv pvc-e5f45a8c-5ac1-11e9-93b8-16777a0068b3
Create secret to hold the sa password:
kubectl create secret generic mssql --from-literal=SA_PASSWORD="P455w0rd123"
Create the deployments:
kubectl apply -f ~/MinikubeApps/mssql1/mssql1deployment.yaml
kubectl apply -f ~/MinikubeApps/mssql2/mssql2deployment.yaml
Load balancer service:
kubectl apply -f ~/MinikubeApps/mssql1/mssql1service.yaml
kubectl apply -f ~/MinikubeApps/mssql2/mssql2service.yaml
kubectl describe deployment mssql1deploy
kubectl describe deployment mssql2deploy
kubectl describe service mssql1service
kubectl describe service mssql2service
Connect using the IP and port number:
minikube service mssql1service --url
minikube service mssql2service --url
Examine the database files
The persistent volumes are in the node's /tmp folder, so SSH into the node:
minikube ssh
ls /tmp/hostpath-provisioner/
ls /tmp/hostpath-provisioner/pvc-70c9d10d-5ac2-11e9-93b8-16777a0068b3/data
ls /tmp/hostpath-provisioner/pvc-70c9d10d-5ac2-11e9-93b8-16777a0068b3/log
Examine logs
kubectl logs deployment/mssql1deploy
kubectl logs deployment/mssql2deploy
kubectl logs --tail 5 --selector app=mssql1app
kubectl logs --tail 5 --selector app=mssql2app
Referenced YAML files:
~/MinikubeApps/mssql1/mssql1pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mssql1pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
~/MinikubeApps/mssql2/mssql2pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mssql2pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
~/MinikubeApps/mssql1/mssql1deployment.yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: mssql1deploy
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: mssql1app
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mssql1
        image: microsoft/mssql-server-linux
        ports:
        - containerPort: 1433
        securityContext:
          privileged: true
        env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: SA_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mssql
              key: SA_PASSWORD
        volumeMounts:
        - name: mssql1mount
          mountPath: /var/opt/mssql
      volumes:
      - name: mssql1mount
        persistentVolumeClaim:
          claimName: mssql1pvc
~/MinikubeApps/mssql2/mssql2deployment.yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: mssql2deploy
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: mssql2app
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mssql2
        image: microsoft/mssql-server-linux
        ports:
        - containerPort: 1433
        securityContext:
          privileged: true
        env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: SA_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mssql
              key: SA_PASSWORD
        volumeMounts:
        - name: mssql2mount
          mountPath: /var/opt/mssql
      volumes:
      - name: mssql2mount
        persistentVolumeClaim:
          claimName: mssql2pvc
~/MinikubeApps/mssql1/mssql1service.yaml
apiVersion: v1
kind: Service
metadata:
  name: mssql1service
spec:
  selector:
    app: mssql1app
  ports:
  - protocol: TCP
    port: 1433
    targetPort: 1433
  type: LoadBalancer
~/MinikubeApps/mssql2/mssql2service.yaml
apiVersion: v1
kind: Service
metadata:
  name: mssql2service
spec:
  selector:
    app: mssql2app
  ports:
  - protocol: TCP
    port: 1433
    targetPort: 1433
  type: LoadBalancer
Docker volumes on Mac
Posted in Docker on April 5, 2019
The Docker daemon on MacOS runs within a Hyperkit virtual machine. When you create a named volume using, for example,
docker volume create mssqlsystem
this volume in a Linux environment would actually be mapped to
/var/lib/docker/volumes/
On the macOS host machine you won't find this path; it resides within the Hyperkit virtual machine. To access the path, use the screen command:
screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
Having entered the VM session, change directory to the volume path:
cd /var/lib/docker/volumes/
Within this you see a folder for each of the Docker volumes that you have created. For the example mentioned above I see persisted SQL files in
/var/lib/docker/volumes/mssqlsystem/_data/data
/var/lib/docker/volumes/mssqlsystem/_data/log
/var/lib/docker/volumes/mssqlsystem/_data/secrets
For the example I mapped the Docker mssqlsystem volume to the /var/opt/mssql directory within the SQL Server container.
You exit the screen session by typing CTRL-a followed by either d, which detaches the session, or k, which terminates it. To list any detached sessions you use
screen -ls
You can re-enter a detached session if you have only one of them using
screen -r
or
screen -r <yourlistedttyfullname>
if you have more than one listed name.
A good reference for screen is https://linuxize.com/post/how-to-use-linux-screen/
Contrast volumes with bind mounts which map directly to a folder on the host:
docker run --name testsqlcontainer --volume "/Users/macuser/Documents/ContainerShare":/MyShared --env 'ACCEPT_EULA=Y' --env 'SA_PASSWORD=P4$$w0rd' --publish 1400:1433 --detach mcr.microsoft.com/mssql/server:2019-CTP2.4-ubuntu
OSTRESS doesn’t push SQL Server as hard as you think
Posted in Benchmarking on July 29, 2018
Many folks recommend using the RML Utilities OSTRESS.EXE as a benchmarking/stress testing tool. The problem is that there appears to be an internal bottleneck limiting the tool to only about 500 SQL calls per second per thread. The following illustrates the problem.
Create this simple no-op stored procedure in your testing database
CREATE PROCEDURE dbo.noopproc AS BEGIN RETURN END
Now use OSTRESS to start one thread calling the stored procedure 20,000 times
ostress -Syoursqlserver -dtestingdb -E -Q"dbo.noopproc" -n1 -r20000
Output:
07/29/18 12:11:50.276 [0x00001ED0] I/O Completion manager started
07/29/18 12:11:50.277 [0x00001558] Attempting DOD5015 removal of [C:\Users\xxx\AppData\Local\Temp\output\ostress.log]
07/29/18 12:11:50.280 [0x00001558] OSTRESS, A Generic ODBC-Based Stress/Replay Utility.
Version 9.04.0051 built for x64.
Copyright © 1997-2014 Microsoft. All Rights Reserved
07/29/18 12:11:50.281 [0x00001558] Computer: xxx
07/29/18 12:11:50.282 [0x00001558] Base Module: C:\Program Files\Microsoft Corporation\RMLUtils\ostress.exe
07/29/18 12:11:50.282 [0x00001558] Process Id: 1400
07/29/18 12:11:50.283 [0x00001558] Active proc mask(0): 0x000000FF
07/29/18 12:11:50.283 [0x00001558] Architecture: 9
07/29/18 12:11:50.284 [0x00001558] Page size: 4096
07/29/18 12:11:50.284 [0x00001558] CPUs: 8
07/29/18 12:11:50.285 [0x00001558] Processor groups: 1
07/29/18 12:11:50.285 [0x00001558] Highest node: 0
07/29/18 12:11:50.286 [0x00001558] Proximity: 00 Node: 00
07/29/18 12:11:50.287 [0x00001558] —————————————
07/29/18 12:11:50.288 [0x00001558] Group: 0
07/29/18 12:11:50.288 [0x00001558] —————————————
07/29/18 12:11:50.288 [0x00001558] Processor(s): 0x00000003 Function units: Shared
07/29/18 12:11:50.289 [0x00001558] Processor(s): 0x0000000C Function units: Shared
07/29/18 12:11:50.290 [0x00001558] Processor(s): 0x00000030 Function units: Shared
07/29/18 12:11:50.290 [0x00001558] Package mask: 0x000000FF
07/29/18 12:11:50.291 [0x00001558] Processor(s): 0x000000C0 Function units: Shared
07/29/18 12:11:50.291 [0x00001558] Processor(s): 0x000000FF assigned to Numa node: 0
07/29/18 12:11:50.292 [0x00001558] Current time bias: -60 minutes -1.00 hours DST Adjusted GMT Daylight Time
07/29/18 12:11:50.292 [0x00001558] WARNING: -o parameter was not supplied; using default output path of <temporary directory>\output [C:\Users\xxx\AppData\Local\Temp\output\]
07/29/18 12:11:50.293 [0x00001558] Max threads setting: 10000
07/29/18 12:11:50.293 [0x00001558] Arguments:
07/29/18 12:11:50.294 [0x00001558] -Sxxx
07/29/18 12:11:50.294 [0x00001558] -dtestingdb
07/29/18 12:11:50.295 [0x00001558] -E
07/29/18 12:11:50.295 [0x00001558] -Qdbo.noopproc
07/29/18 12:11:50.295 [0x00001558] -n1
07/29/18 12:11:50.296 [0x00001558] -r20000
07/29/18 12:11:50.298 [0x00001558] Using language id (LCID): 1024 [English_United States.1252] for character formatting with NLS: 0x0006020F and Defined: 0x0006020F
07/29/18 12:11:50.299 [0x00001558] Default driver: SQL Server Native Client 11.0
07/29/18 12:11:50.299 [0x00001558] Attempting DOD5015 removal of [C:\Users\xxx\AppData\Local\Temp\output\query.out]
07/29/18 12:11:50.302 [0x00001558] Starting query execution…
07/29/18 12:11:50.304 [0x00001558] BETA: Custom CLR Expression support enabled.
07/29/18 12:11:50.305 [0x00001558] Creating 1 thread(s) to process queries
07/29/18 12:11:50.305 [0x00001558] Worker threads created, beginning execution…
07/29/18 12:12:28.335 [0x00001558] Total IO waits: 0, Total IO wait time: 0 (ms)
07/29/18 12:12:28.336 [0x00001558] OSTRESS exiting normally, elapsed time: 00:00:38.037
So 38 seconds to call the no-op stored procedure 20,000 times from one thread using OSTRESS.EXE. That's not very fast.
Compare to a simple PowerShell script making the same 20,000 calls.
$connstring = 'Server=yoursqlserver;Database=testingdb;Trusted_Connection=True'
$connection = New-Object System.Data.SqlClient.SqlConnection $connstring
$connection.Open()
$command = New-Object System.Data.SqlClient.SqlCommand('dbo.noopproc', $connection)
$command.CommandType = [System.Data.CommandType]'StoredProcedure'
$starttime = Get-Date
(1..20000) | ForEach-Object { $ret = $command.ExecuteNonQuery() }
Write-Host "Duration: " (New-TimeSpan -Start $starttime -End (Get-Date)).TotalSeconds
$connection.Close() | Out-Null
$command.Dispose() | Out-Null
$connection.Dispose() | Out-Null
Output:
Duration: 1.570418
So 1.57 seconds to make the same 20,000 stored procedure calls.
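A quick back-of-envelope check of the two runs (plain Python, not part of the original tests) confirms the per-thread ceiling of roughly 500 calls per second for OSTRESS:

```python
# Back-of-envelope throughput comparison using the numbers from the runs above
calls = 20_000

ostress_secs = 38.037
powershell_secs = 1.570418

ostress_rate = calls / ostress_secs        # calls/sec achieved by one OSTRESS thread
powershell_rate = calls / powershell_secs  # calls/sec achieved by the PowerShell loop

print(round(ostress_rate))      # ~526, matching the ~500/sec per-thread ceiling
print(round(powershell_rate))   # ~12735
print(round(powershell_rate / ostress_rate, 1))  # PowerShell is ~24x faster here
```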
Performance monitor showing the difference in batch requests per second
This was tested against a SQL 2017 Developer Edition instance patched to Cumulative Update 9.
The recommendation is to not use OSTRESS as your benchmarking/stress testing tool.
Number of copies based retention for database backups
Posted in Backups on December 26, 2016
Ola Hallengren’s DatabaseBackup stored procedure implements retention of database backups based on the age of backups in hours. For example, if you specify the @CleanupTime parameter as 24 then, when the next backup runs, any existing backup older than 24 hours will be deleted. If you want to retain two backups you would specify 48.
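The mapping between the two retention styles is simple when backups run on a fixed schedule. A small illustrative sketch (Python, with an assumed daily backup interval):

```python
def cleanup_time_for_copies(copies, backup_interval_hours=24):
    """Translate a desired number of retained copies into the
    age-in-hours value that an @CleanupTime-style parameter expects."""
    return copies * backup_interval_hours

print(cleanup_time_for_copies(1))  # 24
print(cleanup_time_for_copies(2))  # 48
```

The point of the modification below is to avoid this translation entirely and specify the copy count directly.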
For certain cases a more natural way of specifying retention would be through number of copies. The attached version of Ola’s DatabaseBackup stored procedure implements this via the @CleanupCopies and @MirrorCleanupCopies parameters. The modified version of Ola’s stored procedure calls dbo.CleanupCopies to do the actual deletions:
CREATE PROCEDURE [dbo].[CleanupCopies]
    @DirectoryPath nvarchar(4000),
    @CleanupCopies int,
    @NumberOfFiles int,
    @NumberOfDirectories int
AS
BEGIN
    SET NOCOUNT ON

    DECLARE @RetainFiles int = @CleanupCopies * @NumberOfFiles / @NumberOfDirectories,
            @CurrentFileID int,
            @CurrentFileName nvarchar(4000),
            @ReturnCode int = 0

    DECLARE @CurrentFiles TABLE (
        ID int IDENTITY (1,1) PRIMARY KEY NOT NULL,
        FileName varchar(4000) NULL,
        Depth int NULL,
        IsFile int NULL,
        CleanupCompleted bit NULL,
        CleanupOutput int NULL
    )

    INSERT @CurrentFiles (FileName,Depth,IsFile)
    EXEC [master].dbo.xp_dirtree @DirectoryPath,0,1

    SET @ReturnCode = @@ERROR
    IF @ReturnCode <> 0 RETURN @ReturnCode

    -- Remove the newest @RetainFiles entries from the list; what remains will be deleted
    DELETE FROM @CurrentFiles
    WHERE ID IN (
        SELECT TOP(@RetainFiles) ID
        FROM @CurrentFiles
        ORDER BY FileName DESC
    )

    UPDATE @CurrentFiles SET CleanupCompleted = 0, CleanupOutput = 0

    -- Do the deletions
    WHILE EXISTS (SELECT * FROM @CurrentFiles WHERE CleanupCompleted = 0)
    BEGIN
        SELECT TOP 1 @CurrentFileID = ID,
                     @CurrentFileName = @DirectoryPath + N'\' + FileName
        FROM @CurrentFiles
        WHERE CleanupCompleted = 0
        ORDER BY ID ASC

        EXECUTE @ReturnCode = [master].dbo.xp_delete_file 0, @CurrentFileName

        UPDATE @CurrentFiles
        SET CleanupCompleted = 1, CleanupOutput = @ReturnCode
        WHERE ID = @CurrentFileID
    END

    SELECT @ReturnCode = ISNULL(MAX(CleanupOutput),0) FROM @CurrentFiles

    RETURN @ReturnCode
END
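The core of dbo.CleanupCopies is "keep the newest N files, delete the rest", relying on the timestamp embedded in Ola-style backup file names so that a descending lexicographic sort puts the newest files first. The same logic in a Python sketch (file names here are hypothetical):

```python
def files_to_delete(filenames, retain):
    """Keep the 'retain' newest files; return the rest for deletion.
    Timestamped backup names sort chronologically, so a reverse
    lexicographic sort puts the newest files first."""
    newest_first = sorted(filenames, reverse=True)
    return newest_first[retain:]

backups = [
    "WIN2016_DBAdmin_FULL_20161110_213244.bak",
    "WIN2016_DBAdmin_FULL_20161111_213244.bak",
    "WIN2016_DBAdmin_FULL_20161112_213244.bak",
]
print(files_to_delete(backups, retain=2))
# ['WIN2016_DBAdmin_FULL_20161110_213244.bak']
```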
Caveats:
Retention based on number of copies is not available for transaction log backups. Retention based on age is best for these.
It only caters for native backups (i.e. it is not implemented for LiteSpeed, Red Gate and Idera backups), because the solution makes use of xp_delete_file, which only handles native backups.
Table driven procedure parameter values
Posted in Maintenance on November 12, 2016
This posting shows one approach for the lookup of stored procedure parameter values. The example use case will be the very popular SQL Server Maintenance Solution authored by Ola Hallengren (https://ola.hallengren.com/).
The solution consists of three parameterised stored procedures, one for database backups, one for database integrity checks and one for index optimisations. These procedures can be run as SQL Agent TSQL job steps. The stored procedure calls along with their parameter values can be encoded in these job steps. But maintenance can become a bit of a problem.
Suppose you have many databases on your SQL Server, and multiple backup job steps that are identical except for one stored procedure parameter: the directory they write their backups to. Now suppose you are low on storage and a new volume has been assigned. You need to identify the job steps referring to the nearly full volumes and change their directory parameter to the newly presented volume. This can be done manually using the GUI, or you could hack the msdb.dbo.sysjobsteps table, updating the command column. As another example, suppose you have multiple job steps implementing different index optimisation parameters. Again you would be left editing the sysjobsteps table.
A far better approach would be to store the parameter values in a lookup table. You would have three lookup tables, one for each of the maintenance operations – backup, integrity check and index optimisation. You list all the variations of the parameters as separate rows in these three tables, with each row having a descriptive tag name. This tag name column would be the primary key for each table, so a given tag name refers to all the parameter values for an execution of that operation.
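The idea in miniature: each tag maps to a complete set of parameter values, so a job step only needs to carry the tag. A Python sketch of the lookup (tag names and values here are hypothetical):

```python
# Each row of a Params table, keyed by its descriptive tag name
backup_params = {
    "FullBackups": {"Directory": r"c:\dbadmin\backup", "BackupType": "FULL", "Compress": "Y"},
    "DiffBackups": {"Directory": r"d:\dbadmin\backup", "BackupType": "DIFF", "Compress": "Y"},
}

# Moving backups to a new volume is now a single update to the lookup,
# not an edit of every SQL Agent job step
backup_params["FullBackups"]["Directory"] = r"e:\dbadmin\backup"

print(backup_params["FullBackups"]["Directory"])  # e:\dbadmin\backup
```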
Taking the dbo.DatabaseBackup stored procedure as an example these are the parameters (as of the time of writing):
SELECT *
FROM INFORMATION_SCHEMA.PARAMETERS
WHERE SPECIFIC_CATALOG = db_name()
  AND SPECIFIC_SCHEMA = 'dbo'
  AND SPECIFIC_NAME = 'DatabaseBackup'
ORDER BY ORDINAL_POSITION
The screenshot doesn’t show the complete list of parameters. There are 36 in total for this procedure. With the exception of @Databases (which will be dealt with separately later) we want to pivot so that PARAMETER_NAME values (with the leading @ symbol ignored) become our column names. In addition, the first column name will be named ParamTag and be the primary key. The second column will be a fixed value, the stored procedure name. Here’s a procedure that accomplishes this task:
/* First create a common 'parent' table for all three procedures. Storing the ParamTag value is sufficient for the data model */
/* This common table will be required when permitted operations for groups of databases is discussed */
CREATE TABLE dbo.OperationParams (
    OperationType sysname NOT NULL
        CONSTRAINT CK_OperationParams CHECK (OperationType IN ('DatabaseBackup','DatabaseIntegrityCheck','IndexOptimize')),
    ParamTag nvarchar(255) NOT NULL,
    CONSTRAINT PK_OperationParams PRIMARY KEY CLUSTERED (OperationType,ParamTag)
)
GO
CREATE PROCEDURE [dbo].[CreateOperationParamTable]
    @procname sysname
AS
BEGIN
    DECLARE @sqlstmt nvarchar(max)

    -- Build the column list first
    SET @sqlstmt = (
        SELECT ',[' + STUFF(PARAMETER_NAME,1,1,'') + '] ' + DATA_TYPE +
               CASE
                   WHEN CHARACTER_MAXIMUM_LENGTH IS NULL THEN ''
                   WHEN CHARACTER_MAXIMUM_LENGTH = -1 THEN '(max)'
                   ELSE '(' + STR(CHARACTER_MAXIMUM_LENGTH) + ')'
               END + ' NULL'
        FROM INFORMATION_SCHEMA.PARAMETERS
        WHERE SPECIFIC_CATALOG = db_name()
          AND SPECIFIC_SCHEMA = 'dbo'
          AND SPECIFIC_NAME = @procname
          AND PARAMETER_NAME <> '@Databases'
        ORDER BY ORDINAL_POSITION
        FOR XML PATH('')
    )

    -- Add the create table and constraints
    DECLARE @quotedtablename sysname = QUOTENAME(@procname + 'Params')

    SET @sqlstmt = N'IF OBJECT_ID(''dbo.' + @quotedtablename + ''') IS NOT NULL DROP TABLE dbo.' + @quotedtablename + ';' + CHAR(13) + CHAR(10) +
        N'CREATE TABLE dbo.' + @quotedtablename + ' (' + CHAR(13) + CHAR(10) +
        N'[ParamTag] nvarchar(255) NOT NULL CONSTRAINT ' + STUFF(@quotedtablename,2,0,'PK_') + N' PRIMARY KEY CLUSTERED,' + CHAR(13) + CHAR(10) +
        N'[OperationType] sysname NOT NULL CONSTRAINT ' + STUFF(@quotedtablename,2,0,'CK_') + N' CHECK ([OperationType] IN (' + QUOTENAME(@procname,CHAR(39)+CHAR(39)) + N')) CONSTRAINT ' + STUFF(@quotedtablename,2,0,'DF_') + N' DEFAULT (' + QUOTENAME(@procname,CHAR(39)+CHAR(39)) + N')' + CHAR(13) + CHAR(10) +
        @sqlstmt + CHAR(13) + CHAR(10) +
        N',CONSTRAINT ' + STUFF(@quotedtablename,2,0,'FK_') + N' FOREIGN KEY ([OperationType],[ParamTag]) REFERENCES dbo.OperationParams ([OperationType],[ParamTag]))'

    -- Run it
    PRINT 'Running:' + CHAR(13) + CHAR(10) + @sqlstmt
    EXEC sp_executesql @sqlstmt
END
The procedure is then called as follows to create the Params tables for each of Ola’s three stored procedures:
EXEC [dbo].[CreateOperationParamTable] 'DatabaseBackup'
EXEC [dbo].[CreateOperationParamTable] 'DatabaseIntegrityCheck'
EXEC [dbo].[CreateOperationParamTable] 'IndexOptimize'

/* Insert some param values */
INSERT INTO dbo.OperationParams VALUES ('DatabaseBackup','FullBackups')
INSERT INTO dbo.DatabaseBackupParams (ParamTag,Directory,BackupType,[Compress],[LogToTable])
VALUES ('FullBackups','c:\dbadmin\backup','FULL','Y','Y')

INSERT INTO dbo.OperationParams VALUES ('DatabaseIntegrityCheck','CheckDBs')
INSERT INTO dbo.DatabaseIntegrityCheckParams (ParamTag,CheckCommands,[LogToTable])
VALUES ('CheckDBs','CHECKDB','Y')

INSERT INTO dbo.OperationParams VALUES ('IndexOptimize','ReindexReorgs')
INSERT INTO dbo.IndexOptimizeParams (ParamTag,FragmentationLow,[LogToTable])
VALUES ('ReindexReorgs','INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE','Y')

SELECT * FROM dbo.OperationParams
SELECT * FROM dbo.DatabaseBackupParams
SELECT * FROM dbo.DatabaseIntegrityCheckParams
SELECT * FROM dbo.IndexOptimizeParams
This results in creation of the DatabaseBackupParams table, the DatabaseIntegrityCheckParams table and the IndexOptimizeParams table:
With the Databases parameter removed we now have parameter-tagged rows that can be applied to one or more groups of databases. Ola’s solution allows the Databases parameter to accept keywords such as USER_DATABASES, SYSTEM_DATABASES or a comma separated list of database names. Having omitted the Databases column from the above three tables, we now need these tables to allow us to define groups of databases and the permitted operations for the groups:
CREATE TABLE [dbo].[DatabaseGroupParent] (
    [GroupName] [nvarchar](40) NOT NULL,
    CONSTRAINT [PK_DatabaseGroupParent] PRIMARY KEY CLUSTERED ([GroupName] ASC)
)
CREATE TABLE [dbo].[DatabaseGroup] (
    [GroupName] [nvarchar](40) NOT NULL
        CONSTRAINT [FK_DatabaseGroup] FOREIGN KEY REFERENCES [dbo].[DatabaseGroupParent] ([GroupName]),
    [DatabaseName] [sysname] NOT NULL,
    CONSTRAINT [PK_DatabaseGroup] PRIMARY KEY CLUSTERED ([GroupName] ASC,[DatabaseName] ASC)
)
CREATE TABLE [dbo].[DatabaseGroupOperations] (
    [GroupName] [nvarchar](40) NOT NULL
        CONSTRAINT [FK_DatabaseGroupOperations_1] FOREIGN KEY REFERENCES [dbo].[DatabaseGroupParent] ([GroupName]),
    [OperationType] [sysname] NOT NULL,
    [ParamTag] [nvarchar](255) NOT NULL,
    CONSTRAINT [PK_DatabaseGroupOperations] PRIMARY KEY CLUSTERED ([GroupName] ASC,[OperationType] ASC,[ParamTag] ASC),
    CONSTRAINT [FK_DatabaseGroupOperations_2] FOREIGN KEY ([OperationType], [ParamTag]) REFERENCES [dbo].[OperationParams] ([OperationType], [ParamTag])
)
GO
One issue with Ola’s solution concerns the defaults for the IndexOptimize @FragmentationMedium and @FragmentationHigh parameters. If you don’t specify these parameters the values used are ‘INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE’ and ‘INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE’ respectively, whereas the default for @FragmentationLow is NULL. Explicitly passing NULL for any of these parameters signifies that no reindex/reorganise is performed. In this table driven solution a stored NULL is translated to DEFAULT when the stored procedures are called, which presents a problem for @FragmentationMedium and @FragmentationHigh – there is no way to express an explicit NULL. It’s not a problem for @FragmentationLow, as NULL is its default. The workaround for @FragmentationMedium and @FragmentationHigh is to use a user defined keyword – for example, NO_INDEX_MAINT. Rather than hardcoding this string in the procedures, the solution stores it in the following table:
CREATE TABLE dbo.OperationParamsNullKeyword (
    OperationType sysname NOT NULL,
    ParamName sysname NOT NULL,
    NullKeyword nvarchar(max) NOT NULL,
    CONSTRAINT PK_OperationParamsNullKeyword PRIMARY KEY CLUSTERED (OperationType,ParamName)
)
GO
INSERT INTO dbo.OperationParamsNullKeyword VALUES
('IndexOptimize','FragmentationMedium','NO_INDEX_MAINT'),
('IndexOptimize','FragmentationHigh','NO_INDEX_MAINT')
GO
In effect, the processing stored procedure will translate these keywords to 'NULL' and not 'DEFAULT'. Ola has example 'C. Update statistics on all user databases' on his Index and Statistics page. The IndexOptimizeParams table containing
FragmentationLow  FragmentationMedium  FragmentationHigh  UpdateStatistics
NULL              NO_INDEX_MAINT       NO_INDEX_MAINT     ALL
will be translated to
EXECUTE dbo.IndexOptimize
@Databases = 'USER_DATABASES',
@FragmentationLow = NULL,
@FragmentationMedium = NULL,
@FragmentationHigh = NULL,
@UpdateStatistics = 'ALL'
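The translation rule can be sketched in a few lines of Python (illustrative only, not the actual T-SQL implementation): a stored NULL is rendered as DEFAULT, while a stored NO_INDEX_MAINT keyword is rendered as an explicit NULL.

```python
# Parameters that use a user-defined keyword standing in for an explicit NULL
NULL_KEYWORDS = {("IndexOptimize", "FragmentationMedium"): "NO_INDEX_MAINT",
                 ("IndexOptimize", "FragmentationHigh"): "NO_INDEX_MAINT"}

def render_param(operation, name, value):
    """Render one stored column value as a T-SQL argument fragment."""
    if value is None:
        return f"@{name} = DEFAULT"      # missing value -> let the procedure default apply
    if NULL_KEYWORDS.get((operation, name)) == value:
        return f"@{name} = NULL"         # user-defined keyword -> explicit NULL
    return f"@{name} = '{value}'"

print(render_param("IndexOptimize", "FragmentationLow", None))
# @FragmentationLow = DEFAULT
print(render_param("IndexOptimize", "FragmentationMedium", "NO_INDEX_MAINT"))
# @FragmentationMedium = NULL
print(render_param("IndexOptimize", "UpdateStatistics", "ALL"))
# @UpdateStatistics = 'ALL'
```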
Now on to the wrapper stored procedure. Given a specified operation (backup, integrity check or index optimize), a specified group name (or specific database name(s), or one of Ola’s database keywords) and a specific operation parameter tag, it runs Ola’s actual stored procedure. Here’s the stored procedure that does that:
/* Helper stored procedure that creates a string such as ",@Directory='c:\backups',@BackupType='FULL',@Verify='Y'"
   depending on your actual values in the corresponding Params table */
CREATE PROCEDURE [dbo].[GetOperationParams]
    @Operation sysname,
    @OperationParamTag nvarchar(255),
    @OperationParams nvarchar(max) OUTPUT
AS
BEGIN
    DECLARE @sqlstmt nvarchar(max)

    SET @sqlstmt = STUFF(
    (
        SELECT '+'',@'+COLUMN_NAME+'=''+COALESCE('+
               CASE WHEN ParamName IS NULL THEN ''
                    ELSE 'CASE WHEN ['+COLUMN_NAME+']='''+NullKeyword+''' THEN ''NULL'' ELSE NULL END,'
               END +
               CASE WHEN CHARACTER_SET_NAME IS NOT NULL THEN 'QUOTENAME(['+COLUMN_NAME+'],CHAR(39)+CHAR(39))'
                    ELSE 'CAST(['+COLUMN_NAME+'] AS NVARCHAR(2000))'
               END +
               ',''DEFAULT'')'
        FROM INFORMATION_SCHEMA.COLUMNS
        LEFT OUTER JOIN dbo.OperationParamsNullKeyword
            ON OperationType = @Operation AND ParamName = COLUMN_NAME
        WHERE TABLE_SCHEMA = 'dbo'
          AND TABLE_NAME = @Operation + N'Params'
          AND ORDINAL_POSITION > 2
        FOR XML PATH('')
    ), 1,1,'')

    SET @sqlstmt = N'SELECT @OperationParams = ' + @sqlstmt + N' FROM dbo.' + QUOTENAME(@Operation + 'Params') +
                   ' WHERE ParamTag = ' + QUOTENAME(@OperationParamTag,CHAR(39)+CHAR(39))

    EXEC sp_executesql @sqlstmt,N'@OperationParams varchar(max) OUTPUT',@OperationParams = @OperationParams OUTPUT
END
GO
/* The wrapper stored procedure that you would use in your SQL Agent TSQL job step */
CREATE PROCEDURE [dbo].[RunOperation]
    @Operation sysname,
    @DatabaseGroup nvarchar(40) = NULL,
    @DatabaseName sysname = NULL,
    @OperationParamTag nvarchar(255)
AS
BEGIN
    DECLARE @OperationParams nvarchar(max) = NULL
    EXEC dbo.GetOperationParams @Operation = @Operation,@OperationParamTag = @OperationParamTag,@OperationParams = @OperationParams OUTPUT

    DECLARE @Databases nvarchar(max) = NULL
    IF @DatabaseName IS NOT NULL
        SET @Databases = @DatabaseName
    ELSE
        SET @Databases = STUFF(
        (
            SELECT ',' + dg.DatabaseName
            FROM dbo.DatabaseGroup dg
            WHERE dg.GroupName = @DatabaseGroup
            /* Check that the group operation is permitted */
            AND EXISTS (
                SELECT 1
                FROM dbo.DatabaseGroupOperations dgo
                WHERE dgo.GroupName = @DatabaseGroup
                  AND dgo.OperationType = @Operation
                  AND dgo.ParamTag = @OperationParamTag
            )
            ORDER BY dg.DatabaseName
            FOR XML PATH('')
        ), 1,1,'')

    SET @Databases = '@Databases=''' + @Databases + ''''

    DECLARE @sqlstmt nvarchar(max)
    SET @sqlstmt = 'EXEC dbo.' + QUOTENAME(@Operation) + ' ' + @Databases + @OperationParams

    PRINT 'Running: ' + @sqlstmt
    EXEC sp_executesql @sqlstmt
END
GO
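The dynamic SQL above ultimately concatenates one ",@Name=value" fragment per column onto the @Databases argument. The shape of that string building, sketched in Python (illustrative values; not the generated T-SQL):

```python
def build_exec(proc, databases, params):
    """Assemble an EXEC statement in the same shape as dbo.RunOperation:
    string values quoted, missing values passed as DEFAULT."""
    parts = [f"@Databases='{databases}'"]
    for name, value in params.items():
        if value is None:
            parts.append(f"@{name}=DEFAULT")
        elif isinstance(value, str):
            parts.append(f"@{name}='{value}'")
        else:
            parts.append(f"@{name}={value}")
    return f"EXEC dbo.[{proc}] " + ",".join(parts)

stmt = build_exec("DatabaseBackup", "DBAdmin,TestDB",
                  {"Directory": r"c:\dbadmin\backup", "BackupType": "FULL",
                   "Verify": None, "LogToTable": "Y"})
print(stmt)
# EXEC dbo.[DatabaseBackup] @Databases='DBAdmin,TestDB',@Directory='c:\dbadmin\backup',@BackupType='FULL',@Verify=DEFAULT,@LogToTable='Y'
```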
An example wrapper call would be
/* Create a database group */
INSERT INTO dbo.DatabaseGroupParent VALUES ('GroupUserDBs1')
INSERT INTO dbo.DatabaseGroup VALUES
('GroupUserDBs1','TestDB'),
('GroupUserDBs1','DBAdmin')

/* Populate the permitted group operations */
INSERT INTO dbo.DatabaseGroupOperations VALUES ('GroupUserDBs1','DatabaseBackup','FullBackups')

/* Now the actual call to backup the database group */
EXEC dbo.RunOperation @Operation = 'DatabaseBackup',@DatabaseGroup='GroupUserDBs1',@OperationParamTag = 'FullBackups'
And the results would be
Running: EXEC dbo.[DatabaseBackup] @Databases='DBAdmin,TestDB',@Directory='c:\dbadmin\backup',@BackupType='FULL',@Verify=DEFAULT,@CleanupTime=DEFAULT,@CleanupMode=DEFAULT,@Compress='Y',@CopyOnly=DEFAULT,@ChangeBackupType=DEFAULT,@BackupSoftware=DEFAULT,@CheckSum=DEFAULT,@BlockSize=DEFAULT,@BufferCount=DEFAULT,@MaxTransferSize=DEFAULT,@NumberOfFiles=DEFAULT,@CompressionLevel=DEFAULT,@Description=DEFAULT,@Threads=DEFAULT,@Throttle=DEFAULT,@Encrypt=DEFAULT,@EncryptionAlgorithm=DEFAULT,@ServerCertificate=DEFAULT,@ServerAsymmetricKey=DEFAULT,@EncryptionKey=DEFAULT,@ReadWriteFileGroups=DEFAULT,@OverrideBackupPreference=DEFAULT,@NoRecovery=DEFAULT,@URL=DEFAULT,@Credential=DEFAULT,@MirrorDirectory=DEFAULT,@MirrorCleanupTime=DEFAULT,@MirrorCleanupMode=DEFAULT,@AvailabilityGroups=DEFAULT,@Updateability=DEFAULT,@LogToTable='Y',@Execute=DEFAULT

Date and time: 2016-11-12 21:32:44
Server: WIN2016
Version: 13.0.1601.5
Edition: Developer Edition (64-bit)
Procedure: [DBAdmin].[dbo].[DatabaseBackup]
Parameters: @Databases = 'DBAdmin,TestDB', @Directory = 'c:\dbadmin\backup', @BackupType = 'FULL', @Verify = 'N', @CleanupTime = NULL, @CleanupMode = 'AFTER_BACKUP', @Compress = 'Y', @CopyOnly = 'N', @ChangeBackupType = 'N', @BackupSoftware = NULL, @CheckSum = 'N', @BlockSize = NULL, @BufferCount = NULL, @MaxTransferSize = NULL, @NumberOfFiles = NULL, @CompressionLevel = NULL, @Description = NULL, @Threads = NULL, @Throttle = NULL, @Encrypt = 'N', @EncryptionAlgorithm = NULL, @ServerCertificate = NULL, @ServerAsymmetricKey = NULL, @EncryptionKey = NULL, @ReadWriteFileGroups = 'N', @OverrideBackupPreference = 'N', @NoRecovery = 'N', @URL = NULL, @Credential = NULL, @MirrorDirectory = NULL, @MirrorCleanupTime = NULL, @MirrorCleanupMode = 'AFTER_BACKUP', @AvailabilityGroups = NULL, @Updateability = 'ALL', @LogToTable = 'Y', @Execute = 'Y'
Source: https://ola.hallengren.com

Date and time: 2016-11-12 21:32:44
Database: [DBAdmin]
Status: ONLINE
Standby: No
Updateability: READ_WRITE
User access: MULTI_USER
Is accessible: Yes
Recovery model: SIMPLE
Encrypted: No
Differential base LSN: 34000000250700126
Last log backup LSN: N/A

Date and time: 2016-11-12 21:32:44
Command: DECLARE @ReturnCode int EXECUTE @ReturnCode = [master].dbo.xp_create_subdir N'c:\dbadmin\backup\WIN2016\DBAdmin\FULL' IF @ReturnCode <> 0 RAISERROR('Error creating directory.', 16, 1)
Outcome: Succeeded
Duration: 00:00:00
Date and time: 2016-11-12 21:32:44

Date and time: 2016-11-12 21:32:44
Command: BACKUP DATABASE [DBAdmin] TO DISK = N'c:\dbadmin\backup\WIN2016\DBAdmin\FULL\WIN2016_DBAdmin_FULL_20161112_213244.bak' WITH NO_CHECKSUM, COMPRESSION
Processed 488 pages for database 'DBAdmin', file 'DBAdmin' on file 1.
Processed 7 pages for database 'DBAdmin', file 'DBAdmin_log' on file 1.
BACKUP DATABASE successfully processed 495 pages in 0.026 seconds (148.512 MB/sec).
Outcome: Succeeded
Duration: 00:00:00
Date and time: 2016-11-12 21:32:44

Date and time: 2016-11-12 21:32:44
Database: [TestDB]
Status: ONLINE
Standby: No
Updateability: READ_WRITE
User access: MULTI_USER
Is accessible: Yes
Recovery model: SIMPLE
Encrypted: No
Differential base LSN: 34000011863500037
Last log backup LSN: N/A

Date and time: 2016-11-12 21:32:44
Command: DECLARE @ReturnCode int EXECUTE @ReturnCode = [master].dbo.xp_create_subdir N'c:\dbadmin\backup\WIN2016\TestDB\FULL' IF @ReturnCode <> 0 RAISERROR('Error creating directory.', 16, 1)
Outcome: Succeeded
Duration: 00:00:00
Date and time: 2016-11-12 21:32:44

Date and time: 2016-11-12 21:32:44
Command: BACKUP DATABASE [TestDB] TO DISK = N'c:\dbadmin\backup\WIN2016\TestDB\FULL\WIN2016_TestDB_FULL_20161112_213244.bak' WITH NO_CHECKSUM, COMPRESSION
Processed 4424 pages for database 'TestDB', file 'TestDB' on file 1.
Processed 2 pages for database 'TestDB', file 'TestDB_log' on file 1.
BACKUP DATABASE successfully processed 4426 pages in 0.168 seconds (205.778 MB/sec).
Outcome: Succeeded
Duration: 00:00:00
Date and time: 2016-11-12 21:32:44
The table driven approach allows you to easily report out which options your solution is using as well as allowing you to change these options without having to make any changes to the SQL Agent jobs themselves. You can apply these options to user defined groups of databases.
Charting virtual file stats using R
Posted in R on March 22, 2015
This blog post shows how to retrieve data using R and chart the results in a number of ways. The example data consists of virtual file stats. The source is the virtual file stats DMV so the ‘measures’ are ever increasing values since the last restart of the SQL Server. The deltas could be calculated using SQL windowing functions but I show how to use the R LAG function to achieve the same result. The raw cumulative data is as follows:
The TestDB consists of four data files and the virtual file stats DMV was sampled every 10 minutes. The above shows 30 rows out of a total of 2304 rows (from 1st March to 4th March inclusive). I’m only interested in the ‘number of bytes read’ measure. The same analysis could be done for any of the other measure columns.
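The delta calculation is the same LAG pattern whether it is done with SQL windowing functions or R's lag(). As a language-neutral illustration, here is the per-group delta idea in a short Python sketch (sample values are made up):

```python
from itertools import groupby

def deltas_per_file(rows):
    """rows: (physical_name, cumulative_bytes_read) tuples ordered by file then time.
    Returns per-sample deltas, mirroring a lag() within each file-name group.
    The first sample per group has no predecessor, so its delta is None."""
    out = []
    for name, grp in groupby(rows, key=lambda r: r[0]):
        prev = None
        for _, value in grp:
            out.append((name, None if prev is None else value - prev))
            prev = value
    return out

samples = [("file1", 100), ("file1", 150), ("file1", 300),
           ("file2", 50), ("file2", 90)]
print(deltas_per_file(samples))
# [('file1', None), ('file1', 50), ('file1', 150), ('file2', None), ('file2', 40)]
```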
Here’s the R code to read from the table and chart the results:
# Use the install.packages lines if this is the first use of the libraries
# install.packages("RODBC")
# install.packages("dplyr")
# install.packages("ggplot2")
# install.packages("scales")
library(RODBC)
library(dplyr)
library(ggplot2)
library(scales)

# Connect to the database
dbhandle <- odbcDriverConnect('driver={SQL Server};server=MySQLServer;database=TestDB;trusted_connection=true')

# Cast the sampletime as separate date and time columns
res <- sqlQuery(dbhandle, "select CAST(sampletime AS date) AS DateValue,CAST(sampletime AS time) AS TimeValue,physical_name,num_of_bytes_read from virtualfilestats order by physical_name,sampletime")

# Load results into a dplyr data frame
df <- tbl_df(res)

# Cast and format the time column
df$TimeValue <- as.POSIXct(df$TimeValue, format = "%H:%M:%S")

# Group the data frame by file name so the lag function works per file name.
# Also, ensure the proper ordering within each group so the delta values are calculated correctly using lag
df <- df %>% group_by(physical_name) %>% arrange(DateValue, TimeValue)

# Use the lag function to calculate the deltas
df <- df %>% mutate(num_of_bytes_read = num_of_bytes_read - lag(num_of_bytes_read))

# One chart per day with colours representing the different file names
ggplot(df, aes(x = TimeValue, y = num_of_bytes_read, colour = physical_name)) +
  geom_line() +
  facet_grid(~DateValue) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  scale_x_datetime(breaks = date_breaks("1 hour"), labels = date_format("%H:%M")) +
  scale_y_continuous(labels = comma)

# One chart per day per file name. Independent y axis for each file name (scales = "free_y")
ggplot(df, aes(x = TimeValue, y = num_of_bytes_read)) +
  geom_line() +
  facet_grid(physical_name ~ DateValue, scales = "free_y") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  scale_x_datetime(breaks = date_breaks("1 hour"), labels = date_format("%H:%M")) +
  scale_y_continuous(labels = comma)

# One chart per day per file name. Same scale for every y axis
ggplot(df, aes(x = TimeValue, y = num_of_bytes_read)) +
  geom_line() +
  facet_grid(physical_name ~ DateValue) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  scale_x_datetime(breaks = date_breaks("1 hour"), labels = date_format("%H:%M")) +
  scale_y_continuous(labels = comma)
And here are the three charts that result from the ggplot commands:
The first one showing a chart for each day:
The next two are better. The only difference between the two is the independent y axis for the first:
You can see that most reads occur between 7am and 7pm, with more data being read from data files 1 and 2 (perhaps because of proportional fill with different file sizes).
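As an aside for readers more at home in Python, the same per-file delta calculation can be sketched with pandas (made-up sample data; the column names mirror the query above):

```python
import pandas as pd

# Made-up samples for two files, already ordered by file name and time
df = pd.DataFrame({
    "physical_name": ["data1", "data1", "data1", "data2", "data2", "data2"],
    "num_of_bytes_read": [100, 250, 600, 50, 50, 125],
})

# Equivalent of group_by(physical_name) + mutate(x - lag(x)) in dplyr:
# diff() within each group gives the per-interval delta (NaN for the first sample)
df["delta"] = df.groupby("physical_name")["num_of_bytes_read"].diff()
print(df)
```

Like the lag approach, the first sample per file has no delta, so it comes back as a missing value rather than a misleading zero.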
R scripting is great for this type of DBA work. Using the ggplot2 library you have full control of the elements that appear on your visualisations. The dplyr library complements this with a wealth of windowing functions for the kinds of calculations DBAs most often need.
Programmatically generating charts
Posted in R on March 15, 2015
Excel is all well and good when you want to manually plot your data: you load your data, create a bar chart, format the axes, change the alignment of the labels, add a title and save it as a file for your team to review. But in an age where everyone is using tools such as PowerShell or Python for their administration tasks, it makes sense to use a scripting tool to generate charts too. After all, as data professionals we all know how to programmatically read from a CSV file or from a SQL Server table. This blog post shows just how easily and elegantly you can generate a publication-quality chart using the ggplot2 library in R.
R is an established data analysis tool with a wealth of libraries for all sorts of statistical analysis. Many primitives that exist in SQL Server have counterparts in R. For example, a table can be represented as a data frame. For data frame operations (ie selecting, filtering, grouping, ordering, windowing functions) the dplyr library implements most of the SQL-type operations. This blog post does not concern dplyr, but one of its authors (Hadley Wickham) has also authored the ggplot2 library, which implements “the grammar of graphics”. You programmatically generate your chart using the functions within this library. An example making use of this library now follows.
Suppose you have the following CSV file containing summary results for eight different SQL Server configurations. The raw data for each configuration consists of average transactions per second, end-to-end total batch time, disk service time for the transaction log volume, average seconds per write for the transaction log volume:
config,avgtps,totalbatchtime,disktime,avgdisk
instance2,375,7.33,1.27,0.414
instance3highcpuplan,794,4.08,1.00,0.333
instance4,391,8.01,1.28,0.426
instance5,296,10.6,1.76,0.585
instance6,351,8.95,1.99,0.658
instance2highcpuplan,822,4.04,0.9896,0.325
instance1san,589,5.42,1.10,0.365
instance1ssd,939,3.49,0.10,0.038
You want to visualise the effect of SSD on reducing total batch time compared to the effect of the high performance CPU power plan.
Assuming you have R and RStudio installed on your workstation, this is all the code you need to generate a publication-quality chart for the results:
# Install the packages if this is the first time they are being used by the user
# install.packages('dplyr')
# install.packages('ggplot2')
# install.packages('reshape2')

# Use the libraries
library(dplyr)
library(ggplot2)
library(reshape2)

# Read the raw data into a native R data frame
# You could just as easily read the raw data from SQL Server using the RODBC library (see a future blog post)
rawdata <- read.csv('c:\\temp\\data.csv')

# Load the raw data into a dplyr data frame and display the "wide" format results
df <- tbl_df(rawdata)
df

# The df data frame holds its contents in what's known as "wide" format. That is, with columns for the four measures.
# Charting works most naturally with "narrow" format data so we unpivot the results using the reshape2 melt function (author Hadley Wickham again)
plotdata <- melt(df, id = c("config"), measure = c("avgtps", "totalbatchtime", "disktime", "avgdisk"))

# Display the "narrow" format results. This is the format that ggplot will operate on
plotdata

# Plot the results
ggplot(plotdata, aes(x = config, y = value)) +
  geom_bar(stat = "identity") +
  facet_wrap(~variable, scales = "free_y", ncol = 1) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  geom_text(aes(label = value, y = value, vjust = -1), size = 3)
The output is as follows:
df in “wide” format:
Source: local data frame [8 x 5]

                config avgtps totalbatchtime disktime avgdisk
1            instance2    375           7.33   1.2700   0.414
2 instance3highcpuplan    794           4.08   1.0000   0.333
3            instance4    391           8.01   1.2800   0.426
4            instance5    296          10.60   1.7600   0.585
5            instance6    351           8.95   1.9900   0.658
6 instance2highcpuplan    822           4.04   0.9896   0.325
7         instance1san    589           5.42   1.1000   0.365
8         instance1ssd    939           3.49   0.1000   0.038
plotdata in “narrow” format:
                 config       variable    value
1             instance2         avgtps 375.0000
2  instance3highcpuplan         avgtps 794.0000
3             instance4         avgtps 391.0000
4             instance5         avgtps 296.0000
5             instance6         avgtps 351.0000
6  instance2highcpuplan         avgtps 822.0000
7          instance1san         avgtps 589.0000
8          instance1ssd         avgtps 939.0000
9             instance2 totalbatchtime   7.3300
10 instance3highcpuplan totalbatchtime   4.0800
11            instance4 totalbatchtime   8.0100
12            instance5 totalbatchtime  10.6000
13            instance6 totalbatchtime   8.9500
14 instance2highcpuplan totalbatchtime   4.0400
15         instance1san totalbatchtime   5.4200
16         instance1ssd totalbatchtime   3.4900
17            instance2       disktime   1.2700
18 instance3highcpuplan       disktime   1.0000
19            instance4       disktime   1.2800
20            instance5       disktime   1.7600
21            instance6       disktime   1.9900
22 instance2highcpuplan       disktime   0.9896
23         instance1san       disktime   1.1000
24         instance1ssd       disktime   0.1000
25            instance2        avgdisk   0.4140
26 instance3highcpuplan        avgdisk   0.3330
27            instance4        avgdisk   0.4260
28            instance5        avgdisk   0.5850
29            instance6        avgdisk   0.6580
30 instance2highcpuplan        avgdisk   0.3250
31         instance1san        avgdisk   0.3650
32         instance1ssd        avgdisk   0.0380
The bar charts make use of the facet command to show each measure as a separate chart and “free_y” scales to give each measure an independent y axis. The x axis labels are rotated 90 degrees and each measure value is placed above its bar.
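The wide-to-narrow unpivot step has a direct counterpart in pandas; a minimal sketch using two of the rows above:

```python
import io
import pandas as pd

csv = """config,avgtps,totalbatchtime,disktime,avgdisk
instance2,375,7.33,1.27,0.414
instance1ssd,939,3.49,0.10,0.038
"""

wide = pd.read_csv(io.StringIO(csv))

# Unpivot the four measure columns into (config, variable, value) rows,
# matching the reshape2 melt call in the R script
narrow = wide.melt(id_vars=["config"],
                   value_vars=["avgtps", "totalbatchtime", "disktime", "avgdisk"])
print(narrow)
```

Two input rows and four measures give eight narrow rows, one per (config, measure) pair, which is exactly the shape a faceted plotting call wants.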
Visualising the results you can see that although instance1 on SSD had dramatically reduced transaction log latency compared to instance1 on SAN, in terms of reducing total batch time the bigger “bang for the buck” came from instance2 going from a balanced CPU power plan to a high performance CPU power plan. The total batch time went down by almost 3.3 seconds and the average transactions per second more than doubled. This is interesting but is not the point of this blog post.
Using the “grammar of graphics” approach implemented in the ggplot2 library you have complete programmatic control of the elements you wish to see presented and the style of each element. And the great thing is that you can send your end users not just the image but the command you used to generate it as well. This command is a piece of script, so it can be treated as source code that you version control and maintain history for. You can’t do any of this with an Excel-based solution.
You don’t need to be an R coding expert to use the scripting language. The learning curve is far lower than that for PowerShell. And since R is a data language whose operations tend to be vector and matrix operations, it’s a natural fit for professionals who are already familiar with set-based programming in SQL.
With Microsoft’s acquisition of Revolution Analytics there’s a high chance that R will become more prominent in the toolset that Microsoft provide to SQL Server professionals.
Column store storage for perfmon data
Posted in Column store on January 11, 2015
Existing solutions for storing perfmon data in a SQL Server database typically use traditional b-tree type tables to house the data (for example, the PAL tool and RELOG.EXE when the destination is a SQL table). This article shows a very simple schema for storing perfmon data in a column store table.
Because perfmon data mostly consists of well-defined repeating values it is an ideal candidate for compressed storage. Depending on the sampling period, the amount of data being collected is likely to be very high, so a high-capacity storage structure that can be queried very quickly is also desirable. Lastly, because the data is not modified after it is collected, it is effectively read-only within the database. All these attributes are a perfect match for column store storage.
Here’s one possible schema that can be used for storing perfmon data in a column store table:
CREATE TABLE [dbo].[factCounterData](
    [ServerID] [int] NOT NULL,
    [DateID] [int] NOT NULL,
    [TimeID] [int] NOT NULL,
    [CounterID] [int] NOT NULL,
    [CounterValue] [float] NOT NULL
)
GO

CREATE CLUSTERED COLUMNSTORE INDEX [cci_factCounterData]
ON [dbo].[factCounterData] WITH (DROP_EXISTING = OFF)
GO
There are various ways in which you may be collecting perfmon data. This article won’t discuss those options but will assume the data is present in a series of perfmon BLG files. These files can be imported into a database in a number of ways, but the simplest method is to use the RELOG command. You create a system ODBC data source name (DSN) and specify this when running the command. For example,
RELOG.EXE \\MyServerName\d$\perflogs\admin\perfdata*.blg -f SQL -o SQL:PerfmonDb!Win2012r2
For this example, the machine that runs RELOG.EXE has an ODBC system DSN named PerfmonDb that specifies the target database.
On first execution RELOG.EXE creates these tables in the target database:
CREATE TABLE [dbo].[CounterData](
    [GUID] [uniqueidentifier] NOT NULL,
    [CounterID] [int] NOT NULL,
    [RecordIndex] [int] NOT NULL,
    [CounterDateTime] [char](24) NOT NULL,
    [CounterValue] [float] NOT NULL,
    [FirstValueA] [int] NULL,
    [FirstValueB] [int] NULL,
    [SecondValueA] [int] NULL,
    [SecondValueB] [int] NULL,
    [MultiCount] [int] NULL,
    PRIMARY KEY CLUSTERED
    (
        [GUID] ASC,
        [CounterID] ASC,
        [RecordIndex] ASC
    )
)
GO

CREATE TABLE [dbo].[CounterDetails](
    [CounterID] [int] IDENTITY(1,1) NOT NULL,
    [MachineName] [varchar](1024) NOT NULL,
    [ObjectName] [varchar](1024) NOT NULL,
    [CounterName] [varchar](1024) NOT NULL,
    [CounterType] [int] NOT NULL,
    [DefaultScale] [int] NOT NULL,
    [InstanceName] [varchar](1024) NULL,
    [InstanceIndex] [int] NULL,
    [ParentName] [varchar](1024) NULL,
    [ParentObjectID] [int] NULL,
    [TimeBaseA] [int] NULL,
    [TimeBaseB] [int] NULL,
    PRIMARY KEY CLUSTERED
    (
        [CounterID] ASC
    )
)
GO

CREATE TABLE [dbo].[DisplayToID](
    [GUID] [uniqueidentifier] NOT NULL,
    [RunID] [int] NULL,
    [DisplayString] [varchar](1024) NOT NULL,
    [LogStartTime] [char](24) NULL,
    [LogStopTime] [char](24) NULL,
    [NumberOfRecords] [int] NULL,
    [MinutesToUTC] [int] NULL,
    [TimeZoneName] [char](32) NULL,
    PRIMARY KEY CLUSTERED
    (
        [GUID] ASC
    ),
    UNIQUE NONCLUSTERED
    (
        [DisplayString] ASC
    )
)
GO
Subsequent executions of the RELOG.EXE command append to the data that exists in the dbo.CounterData table. Sample data from each of the tables is as follows:
DisplayToID:
CounterDetails:
CounterData:
Now if you forget the column store table and simply query the three tables that RELOG.EXE creates you would have a short term workable solution for your perfmon reporting needs. However, as the volume of data in CounterData grows you will find that your storage requirements grow and your reporting queries take longer to run.
In data warehousing terms CounterData is the fact table. This will become very large. The two other tables are the dimension tables. These will never become really large. The column store solution will focus on the CounterData table.
The columns of interest in CounterData are GUID (a link to the DisplayToID dimension table, which holds the DisplayString name – the server name), CounterID (a link to the CounterDetails dimension table, which holds the counter names), CounterDateTime and CounterValue. The column store table adopts best practice and stores the dimension links as integer columns. You can see from the structure of the factCounterData table at the start of this article that the GUID is not stored; instead an integer ServerID is stored. Also, the CounterDateTime value is split into DateID and TimeID. Here’s the code that populates the column store table:
-- Dimension table to hold the GUID to ServerID mapping
CREATE TABLE [dbo].[dimServer](
    [ServerID] [int] IDENTITY(1,1) NOT NULL,
    [ServerName] [varchar](1024) NOT NULL,
    [GUID] [uniqueidentifier] NOT NULL,
    CONSTRAINT [PK_dimServer] PRIMARY KEY CLUSTERED
    (
        [ServerID] ASC
    )
)
-- Routine to populate dimServer. Insert row if it's the first time we're seeing the DisplayString
CREATE PROCEDURE [dbo].[procInsertDimServer]
AS
BEGIN
    SET NOCOUNT ON

    INSERT INTO dbo.dimServer (ServerName, GUID)
    SELECT DisplayString, GUID
    FROM dbo.DisplayToID d
    WHERE NOT EXISTS (SELECT * FROM dbo.dimServer s WHERE s.ServerName = d.DisplayString)
END
CREATE PROCEDURE [dbo].[procInsertFactCounterData]
AS
BEGIN
    SET NOCOUNT ON

    -- Insert the server row if we've not inserted it during previous executions
    EXEC dbo.procInsertDimServer

    -- Now insert the fact data. Split the datetime value into separate date and time columns
    INSERT INTO dbo.factCounterData
    (
        ServerID,
        DateID,
        TimeID,
        CounterID,
        CounterValue
    )
    SELECT
        s.ServerID,
        CAST(REPLACE(LEFT(c.CounterDateTime,10),'-','') AS int),
        DATEDIFF(second,'00:00:00',SUBSTRING(c.CounterDateTime,12,8)),
        c.CounterID,
        c.CounterValue
    FROM dbo.CounterData c
    INNER JOIN dbo.dimServer s ON c.GUID = s.GUID
END
-- CounterData is now just a staging table so can be cleared down before running RELOG.EXE
CREATE PROCEDURE [dbo].[procClearDownCounterData]
AS
BEGIN
    SET NOCOUNT ON

    TRUNCATE TABLE dbo.CounterData
END
With this code in place the typical execution steps are:
- Execute dbo.procClearDownCounterData
- Run the RELOG.EXE command to populate CounterData, CounterDetails and DisplayToID
- Execute dbo.procInsertFactCounterData
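To make the DateID/TimeID encoding used by procInsertFactCounterData concrete, here is the same split sketched in Python (the sample CounterDateTime value is hypothetical; the procedure does this in T-SQL):

```python
from datetime import datetime, timedelta

# RELOG stores CounterDateTime as char(24), e.g. '2015-01-11 16:46:41.000'
raw = "2015-01-11 16:46:41.000"
ts = datetime.strptime(raw[:19], "%Y-%m-%d %H:%M:%S")

# Same transformation as the INSERT above: a yyyymmdd integer
# plus a seconds-since-midnight integer
date_id = int(ts.strftime("%Y%m%d"))
time_id = ts.hour * 3600 + ts.minute * 60 + ts.second

# And the reverse, as procGetCounterValues does when rebuilding CounterTime
restored = datetime.strptime(str(date_id), "%Y%m%d") + timedelta(seconds=time_id)
print(date_id, time_id, restored)
```

Two small integers round-trip losslessly back to the original timestamp, and both compress far better in a column store than the original char(24) value.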
In terms of storage costs, for one server capturing one hundred different performance counters every two seconds, I found that over 650 million rows in factCounterData occupied only two gigabytes of database space. The compression achieved depends on the number of repeating values, but you should find impressive storage reductions over conventional b-tree storage.
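The rough arithmetic behind those numbers (my own figures, so treat this as order-of-magnitude only):

```python
rows = 650_000_000
columnstore_bytes = 2 * 1024**3          # the two gigabytes observed

# An uncompressed factCounterData row is four 4-byte ints plus an 8-byte float,
# ignoring all row and page overhead
uncompressed_bytes = rows * (4 * 4 + 8)

bytes_per_row = columnstore_bytes / rows            # roughly 3.3 bytes per row
reduction = uncompressed_bytes / columnstore_bytes  # roughly 7x, before b-tree overhead
print(round(bytes_per_row, 1), round(reduction, 1))
```

The real saving against a b-tree table would be larger still, since the b-tree also pays row headers, page overhead and index structure on top of the raw column widths.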
In terms of reporting, here’s a parameterised stored procedure that can be used to query a server for a specified counter. Start and end time ranges can be supplied when calling the procedure:
USE [perfmondb]
GO
/****** Object: StoredProcedure [dbo].[procGetCounterValues] Script Date: 13/01/2015 23:23:16 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE [dbo].[procGetCounterValues]
    @ServerName varchar(1024),
    @ObjectName varchar(1024) = NULL,
    @CounterName varchar(1024) = NULL,
    @InstanceName varchar(1024) = NULL,
    @Start datetime = NULL,
    @End datetime = NULL
AS
BEGIN
    SET NOCOUNT ON

    DECLARE @StartDateID int = CONVERT(char(8),@Start,112)
    DECLARE @StartTimeID int = DATEDIFF(second,'00:00:00',CAST(@Start AS time))
    DECLARE @EndDateID int = CONVERT(char(8),@End,112)
    DECLARE @EndTimeID int = DATEDIFF(second,'00:00:00',CAST(@End AS time))

    SELECT
        s.ServerName,
        DATEADD(second,f.TimeID,CAST(f.DateID AS char(8))) AS CounterTime,
        c.ObjectName,
        c.CounterName,
        c.InstanceName,
        f.CounterValue
    FROM dbo.factCounterData f
    INNER JOIN dbo.dimServer s ON s.ServerID = f.ServerID
    INNER JOIN dbo.CounterDetails c ON c.CounterID = f.CounterID
    WHERE s.ServerName = @ServerName
    AND (@ObjectName IS NULL OR c.ObjectName = @ObjectName)
    AND (@CounterName IS NULL OR c.CounterName = @CounterName)
    AND (@InstanceName IS NULL OR c.InstanceName = @InstanceName)
    AND
    (
        @StartDateID IS NULL
        OR f.DateID > @StartDateID
        OR (f.DateID = @StartDateID AND f.TimeID >= @StartTimeID)
    )
    AND
    (
        @EndDateID IS NULL
        OR f.DateID < @EndDateID
        OR (f.DateID = @EndDateID AND f.TimeID <= @EndTimeID)
    )
    ORDER BY CounterTime
END
GO
An example call:
EXECUTE [dbo].[procGetCounterValues]
    @ServerName = 'Win2012r2',
    @CounterName = '% Processor Time',
    @InstanceName = '_Total',
    @Start = '20141130 16:46:41',
    @End = '20141130 16:46:59'
Output is this:
So ten rows out of a total of over one million rows for my data set. And the plan showed batch mode execution for the far right column store index scan:
Even querying the table when it held over 650 million rows I found my queries returned in less than three seconds.
So storage space and query response is optimised when using column store storage for performance monitor data.
Failed full backup can impact next differential backup
Posted in Backups on December 6, 2014
For any of your very large databases you are probably taking once-a-week full backups followed by daily differential backups. We know that differential change map (DCM) pages hold references to the extents that have changed since the last full backup, and it is these pages that are examined when a differential backup is run. The DCM pages allow the changed extents to be quickly identified.
Using your maintenance routines you are probably finding that your full database backup takes a couple of hours to complete and backs up a terabyte worth of data. Your daily differential backups complete in minutes and back up a hundred gigabytes worth of data by the time the next full backup is due. This is all good and your maintenance fits into the windows you have for administration.
But what if you run the once-a-week full backup on a Sunday, it fails, and you only notice on Monday morning when your maintenance window is closed? In terms of the backup chain the full backup failure probably doesn’t matter. You keep taking differential backups, with these being based on the previous week’s full database backup, so you have no problem in terms of recovery. But what you’ll be surprised to find is that the first and subsequent differential backups after the failed full backup take substantially longer to complete, particularly when compared to the previous week’s elapsed times. This is the case even though the volume of data changes hitting your database hasn’t changed since the previous week. The differential backup duration may increase several fold compared to the previous week.
The reason for the increase in differential backup time is that the backup process now examines every single data page for changes. It won’t be until the next full backup successfully completes that the differential backup returns to its optimised mode (the mode that uses the DCM pages to identify changed extents). Here is a repro script illustrating the issue:
-- Create a test table in one of your testing databases
-- Created so that one row occupies one page so that we can easily make sense of the page numbers that are reported
-- in the backup output
CREATE TABLE testdb.dbo.mytable
(
    id int NOT NULL,
    val char(7000) NOT NULL
)
GO

-- Populate the table with 50,000 rows (50,000 data pages)
SET NOCOUNT ON
GO
INSERT INTO testdb.dbo.mytable VALUES (1,'hello')
GO 50000

-- Take a full backup
BACKUP DATABASE testdb TO DISK='C:\SQL2014\Backup\testdb.bak' WITH INIT,NO_COMPRESSION
GO
/*
Processed 50656 pages for database 'testdb', file 'testdb' on file 1.
Processed 1 pages for database 'testdb', file 'testdb_log' on file 1.
BACKUP DATABASE successfully processed 50657 pages in 4.865 seconds (81.346 MB/sec).
*/
-- You can see that roughly 50,000 pages were backed up

-- That previous backup was successful. Let's see how quick the next differential backup is.
-- It should be fast as there have been no data changes since the last full backup.
BACKUP DATABASE testdb TO DISK='C:\SQL2014\Backup\testdbdiff.bak' WITH INIT,DIFFERENTIAL,NO_COMPRESSION
GO
/*
Processed 56 pages for database 'testdb', file 'testdb' on file 1.
Processed 1 pages for database 'testdb', file 'testdb_log' on file 1.
BACKUP DATABASE WITH DIFFERENTIAL successfully processed 57 pages in 0.015 seconds (29.687 MB/sec).
*/
-- And, yes, it is very quick with only 56 database pages being examined.

-- Now run the full backup again but interrupt it before completion to simulate an error you may get in your production system
BACKUP DATABASE testdb TO DISK='C:\SQL2014\Backup\testdb.bak' WITH INIT,NO_COMPRESSION
GO
/*
Msg 3204, Level 16, State 1, Line 69
The backup or restore was aborted.
Msg 3013, Level 16, State 1, Line 69
BACKUP DATABASE is terminating abnormally.
Query was cancelled by user.
*/

-- Now run the differential backup. It should be quick.
BACKUP DATABASE testdb TO DISK='C:\SQL2014\Backup\testdbdiff.bak' WITH INIT,DIFFERENTIAL,NO_COMPRESSION
GO
/*
Processed 50656 pages for database 'testdb', file 'testdb' on file 1.
Processed 1 pages for database 'testdb', file 'testdb_log' on file 1.
BACKUP DATABASE WITH DIFFERENTIAL successfully processed 50657 pages in 1.890 seconds (209.392 MB/sec).
*/
50,000+ database pages were examined for the differential backup that followed the failed full backup. That equates to every page in the database. For terabyte sized databases that will result in a very high elapsed time. It won’t be until the next successful full backup that the differential backup will return to its ‘optimised’ behaviour:
BACKUP DATABASE testdb TO DISK='C:\SQL2014\Backup\testdb.bak' WITH INIT,NO_COMPRESSION
GO
/*
Processed 50656 pages for database 'testdb', file 'testdb' on file 1.
Processed 2 pages for database 'testdb', file 'testdb_log' on file 1.
BACKUP DATABASE successfully processed 50658 pages in 3.784 seconds (104.587 MB/sec).
*/

BACKUP DATABASE testdb TO DISK='C:\SQL2014\Backup\testdbdiff.bak' WITH INIT,DIFFERENTIAL,NO_COMPRESSION
GO
/*
Processed 40 pages for database 'testdb', file 'testdb' on file 1.
Processed 1 pages for database 'testdb', file 'testdb_log' on file 1.
BACKUP DATABASE WITH DIFFERENTIAL successfully processed 41 pages in 0.016 seconds (20.019 MB/sec).
*/
These observations suggest that the DCM page content is examined at the start of the full backup and cleared as the backup progresses. If the backup fails, the DCM page content is not rolled back to its pre-backup state. Subsequent differential backups somehow detect that the previous full backup failed (some sort of flag in the boot page or the DCM pages, perhaps) and run in a mode where they examine all database pages. These cases will show up as high reads in your virtual file stats, your perfmon disk reads and your SAN disk reads.
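The observed behaviour can be illustrated with a toy model of a DCM bitmap (purely illustrative; this is not a description of SQL Server internals):

```python
# One bit per extent: set means the extent changed since the last good full backup
EXTENTS = 1000

dcm = [False] * EXTENTS
for i in (3, 42, 97):            # a few extents modified since the last full backup
    dcm[i] = True

def extents_read_by_differential(dcm, last_full_ok):
    if last_full_ok:
        return sum(dcm)          # optimised path: changed extents only
    return len(dcm)              # fallback: scan everything, as seen in the repro

print(extents_read_by_differential(dcm, last_full_ok=True))
print(extents_read_by_differential(dcm, last_full_ok=False))
```

The gap between the two paths grows with database size, which is why a terabyte database shows such a dramatic jump in differential backup duration.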
So for the differential backups of your big databases, look out for these being impacted when your full database backups fail.
Reading extended event files using client side tools only
Posted in Tracing on November 15, 2014
For any serious tracing work you’ll be using extended events and sending the output to XEL target files. Various options exist when it comes to reading these files. You can open the files using the file open function in Management Studio, or you can use the sys.fn_xe_file_target_read_file function to have your SQL Server database engine read the files.
If you want to read the XEL files programmatically and don’t want to be constrained by having a database connection, you can use a .Net program and the extended event reader functionality in Microsoft.SqlServer.XEvent.Linq.dll. For example, this post describes an approach using C#:
If you prefer to use PowerShell, here’s one solution:
# If doing this on a client that just has the client tools and no database engine then you have to use 32 bit Powershell as Microsoft
# have not provided a 64 bit assembly for the client tools distribution.
# If you are running this on a server that has the database engine then you don't need to use 32 bit Powershell as the 64 bit assembly
# is present in the C:\Program Files\Microsoft SQL Server\120\Shared directory

# For SQL 2014 you have to add the reference to Microsoft.SqlServer.XE.Core.dll. You don't need this for SQL 2012
Add-Type -Path 'C:\Program Files (x86)\Microsoft SQL Server\120\Tools\Binn\ManagementStudio\Extensions\Application\Microsoft.SqlServer.XE.Core.dll'
Add-Type -Path 'C:\Program Files (x86)\Microsoft SQL Server\120\Tools\Binn\ManagementStudio\Extensions\Application\Microsoft.SqlServer.XEvent.Linq.dll'

$events = New-Object Microsoft.SqlServer.XEvent.Linq.QueryableXEventData("c:\temp\test.xel")

$events |
    Foreach-Object { $_.Actions | Where-Object { $_.Name -eq 'client_hostname' } } |
    Group-Object Value
This can be used if, for example, you’re tracing the sql_batch_completed event and have added the client_hostname action. The script outputs the counts for each host name in the XEL file.
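The grouping step itself is trivial once the events are parsed; here is a Python sketch of the same count-by-hostname logic over hypothetical parsed events (the event and action shapes below are made up for illustration, not the XEvent reader's actual object model):

```python
from collections import Counter

# Hypothetical parsed events, each carrying the client_hostname action
events = [
    {"name": "sql_batch_completed", "actions": {"client_hostname": "APP01"}},
    {"name": "sql_batch_completed", "actions": {"client_hostname": "APP02"}},
    {"name": "sql_batch_completed", "actions": {"client_hostname": "APP01"}},
]

counts = Counter(e["actions"]["client_hostname"] for e in events)
print(counts.most_common())
```

This is the same shape as the PowerShell pipeline: project the client_hostname action out of each event, then group and count by its value.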
As can be seen in the comments, it’s unfortunate that for the client-side-only solution we have no choice but to use the 32 bit assemblies. This is still the case for the SQL 2014 client tools.