SQL deployments on Kubernetes

Set up SQL Server on Kubernetes running on a Mac. Uses minikube for a single-node cluster.

Prerequisites: use Homebrew to install minikube and kubernetes-cli

brew cask install minikube
brew install kubernetes-cli

Start minikube specifying the HyperKit driver. Use these commands to manually create a minikube SQL instance:

minikube start --vm-driver=hyperkit

kubectl run mssql --image=microsoft/mssql-server-linux --port=1433 --env ACCEPT_EULA=Y --env SA_PASSWORD=P455w0rd

kubectl expose deployment mssql --type=LoadBalancer

minikube service mssql --url

These commands to remove the sql instance:

kubectl delete service mssql
kubectl delete deployment mssql

Using YAML files

Create the persistent volume claims:

kubectl apply -f ~/MinikubeApps/mssql1/mssql1pvc.yaml
kubectl apply -f ~/MinikubeApps/mssql2/mssql2pvc.yaml

kubectl describe pvc mssql1pvc
kubectl describe pvc mssql2pvc

Examine the persistent volumes using the VOLUME name:

kubectl get pv pvc-e5f45a8c-5ac1-11e9-93b8-16777a0068b3

Create secret to hold the sa password:

kubectl create secret generic mssql --from-literal=SA_PASSWORD="P455w0rd123"
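The deployment YAML files (referenced below but not reproduced here) would consume this secret rather than embedding the password. A minimal sketch of how a container spec might reference it — the container and image names are assumptions, only the secret name `mssql` and key `SA_PASSWORD` come from the command above:

```yaml
# Fragment of an assumed deployment spec: the SA password is read
# from the 'mssql' secret created above instead of being hardcoded.
containers:
  - name: mssql
    image: microsoft/mssql-server-linux
    ports:
      - containerPort: 1433
    env:
      - name: ACCEPT_EULA
        value: "Y"
      - name: SA_PASSWORD
        valueFrom:
          secretKeyRef:
            name: mssql
            key: SA_PASSWORD
```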

Create the deployments:

kubectl apply -f ~/MinikubeApps/mssql1/mssql1deployment.yaml
kubectl apply -f ~/MinikubeApps/mssql2/mssql2deployment.yaml

Load balancer service:

kubectl apply -f ~/MinikubeApps/mssql1/mssql1service.yaml
kubectl apply -f ~/MinikubeApps/mssql2/mssql2service.yaml

kubectl describe deployment mssql1deploy
kubectl describe deployment mssql2deploy

kubectl describe service mssql1service
kubectl describe service mssql2service

Connect using the IP and port number

minikube service mssql1service --url
minikube service mssql2service --url

Examine the database files

The persistent volumes are in the node /tmp folder so ssh into node:

minikube ssh

ls /tmp/hostpath-provisioner/

ls /tmp/hostpath-provisioner/pvc-70c9d10d-5ac2-11e9-93b8-16777a0068b3/data
ls /tmp/hostpath-provisioner/pvc-70c9d10d-5ac2-11e9-93b8-16777a0068b3/log

Examine logs

kubectl logs deployment/mssql1deploy
kubectl logs deployment/mssql2deploy

kubectl logs --tail 5 --selector app=mssql1app
kubectl logs --tail 5 --selector app=mssql2app

Referenced YAML files can be found here:




Docker volumes on Mac

The Docker daemon on macOS runs within a HyperKit virtual machine. When you create a named volume using, for example,

docker volume create mssqlsystem

this volume in a Linux environment would actually be mapped to

/var/lib/docker/volumes/mssqlsystem/_data
On the MacOS host machine you won’t find this path. It resides within the Hyperkit virtual machine. To access the path you use the screen command:

screen ~/library/containers/com.docker.docker/data/vms/0/tty

Having entered the VM session you now change directory to the volume path

cd /var/lib/docker/volumes/

Within this you see a folder for each of the Docker volumes that you have created. For the example mentioned above I see persisted SQL files in

/var/lib/docker/volumes/mssqlsystem/_data

For the example I mapped the Docker mssqlsystem volume to the /var/opt/mssql directory within the SQL Server container.

You exit the screen session by typing CTRL-a followed either by d which detaches the session or k which terminates the session. To list any detached sessions you use

screen -ls

You can re-enter a detached session if you have only one of them using

screen -r

or, if you have more than one listed name,

screen -r <yourlistedttyfullname>

A good reference for screen is https://linuxize.com/post/how-to-use-linux-screen/


Contrast volumes with bind mounts which map directly to a folder on the host:

docker run --name testsqlcontainer --volume "/Users/macuser/Documents/ContainerShare":/MyShared --env 'ACCEPT_EULA=Y' --env 'SA_PASSWORD=P4$$w0rd' --publish 1400:1433 --detach mcr.microsoft.com/mssql/server:2019-CTP2.4-ubuntu



OSTRESS doesn’t push SQL Server as hard as you think

Many folks recommend using the RML Utilities OSTRESS.EXE as a benchmarking/stress testing tool. The problem is there seems to be an internal bottleneck limiting the tool to make only about 500 SQL calls per second per thread. The following illustrates the problem.

Create this simple no-op stored procedure in your testing database:

CREATE PROCEDURE dbo.noopproc
AS
RETURN 0
Now use OSTRESS to start one thread calling the stored procedure 20,000 times

ostress -Syoursqlserver -dtestingdb -E -Q"dbo.noopproc" -n1 -r20000


07/29/18 12:11:50.276 [0x00001ED0] I/O Completion manager started
07/29/18 12:11:50.277 [0x00001558] Attempting DOD5015 removal of [C:\Users\xxx\AppData\Local\Temp\output\ostress.log]
07/29/18 12:11:50.280 [0x00001558] OSTRESS, A Generic ODBC-Based Stress/Replay Utility.
Version 9.04.0051 built for x64.
Copyright © 1997-2014 Microsoft. All Rights Reserved
07/29/18 12:11:50.281 [0x00001558] Computer: xxx
07/29/18 12:11:50.282 [0x00001558] Base Module: C:\Program Files\Microsoft Corporation\RMLUtils\ostress.exe
07/29/18 12:11:50.282 [0x00001558] Process Id: 1400
07/29/18 12:11:50.283 [0x00001558] Active proc mask(0): 0x000000FF
07/29/18 12:11:50.283 [0x00001558] Architecture: 9
07/29/18 12:11:50.284 [0x00001558] Page size: 4096
07/29/18 12:11:50.284 [0x00001558] CPUs: 8
07/29/18 12:11:50.285 [0x00001558] Processor groups: 1
07/29/18 12:11:50.285 [0x00001558] Highest node: 0
07/29/18 12:11:50.286 [0x00001558] Proximity: 00 Node: 00
07/29/18 12:11:50.287 [0x00001558] —————————————
07/29/18 12:11:50.288 [0x00001558] Group: 0
07/29/18 12:11:50.288 [0x00001558] —————————————
07/29/18 12:11:50.288 [0x00001558] Processor(s): 0x00000003 Function units: Shared
07/29/18 12:11:50.289 [0x00001558] Processor(s): 0x0000000C Function units: Shared
07/29/18 12:11:50.290 [0x00001558] Processor(s): 0x00000030 Function units: Shared
07/29/18 12:11:50.290 [0x00001558] Package mask: 0x000000FF
07/29/18 12:11:50.291 [0x00001558] Processor(s): 0x000000C0 Function units: Shared
07/29/18 12:11:50.291 [0x00001558] Processor(s): 0x000000FF assigned to Numa node: 0
07/29/18 12:11:50.292 [0x00001558] Current time bias: -60 minutes -1.00 hours DST Adjusted GMT Daylight Time
07/29/18 12:11:50.292 [0x00001558] WARNING: -o parameter was not supplied; using default output path of <temporary directory>\output [C:\Users\xxx\AppData\Local\Temp\output\]
07/29/18 12:11:50.293 [0x00001558] Max threads setting: 10000
07/29/18 12:11:50.293 [0x00001558] Arguments:
07/29/18 12:11:50.294 [0x00001558] -Sxxx
07/29/18 12:11:50.294 [0x00001558] -dtestingdb
07/29/18 12:11:50.295 [0x00001558] -E
07/29/18 12:11:50.295 [0x00001558] -Qdbo.noopproc
07/29/18 12:11:50.295 [0x00001558] -n1
07/29/18 12:11:50.296 [0x00001558] -r20000
07/29/18 12:11:50.298 [0x00001558] Using language id (LCID): 1024 [English_United States.1252] for character formatting with NLS: 0x0006020F and Defined: 0x0006020F
07/29/18 12:11:50.299 [0x00001558] Default driver: SQL Server Native Client 11.0
07/29/18 12:11:50.299 [0x00001558] Attempting DOD5015 removal of [C:\Users\xxx\AppData\Local\Temp\output\query.out]
07/29/18 12:11:50.302 [0x00001558] Starting query execution…
07/29/18 12:11:50.304 [0x00001558] BETA: Custom CLR Expression support enabled.
07/29/18 12:11:50.305 [0x00001558] Creating 1 thread(s) to process queries
07/29/18 12:11:50.305 [0x00001558] Worker threads created, beginning execution…
07/29/18 12:12:28.335 [0x00001558] Total IO waits: 0, Total IO wait time: 0 (ms)
07/29/18 12:12:28.336 [0x00001558] OSTRESS exiting normally, elapsed time: 00:00:38.037

So 38 seconds to call the no-op stored procedure 20,000 times from one thread using OSTRESS.EXE. That's not very fast.

Compare this to a simple PowerShell script making the same 20,000 calls.

$connstring = 'Server=yoursqlserver;Database=testingdb;Trusted_Connection=True'
$connection = New-Object System.Data.SqlClient.SqlConnection $connstring
$command = New-Object System.Data.SqlClient.SqlCommand('dbo.noopproc', $connection)
$command.CommandType = [System.Data.CommandType]'StoredProcedure'
$connection.Open()
$starttime = Get-Date

(1..20000) | ForEach-Object { $ret = $command.ExecuteNonQuery() }

Write-Host "Duration: " (New-TimeSpan -Start $starttime -End (Get-Date)).TotalSeconds

$connection.Close() | Out-Null
$command.Dispose() | Out-Null
$connection.Dispose() | Out-Null


Duration: 1.570418

So 1.57 seconds to make the same 20,000 stored procedure calls.
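The per-call overhead implied by these two timings can be checked with a little arithmetic — it lines up with the roughly 500 calls per second per thread ceiling mentioned above:

```python
# Per-call cost implied by the two measured elapsed times above.
ostress_seconds = 38.037
powershell_seconds = 1.570418
calls = 20000

ostress_ms_per_call = ostress_seconds / calls * 1000
powershell_ms_per_call = powershell_seconds / calls * 1000

print(f"OSTRESS:    {ostress_ms_per_call:.3f} ms/call, {calls / ostress_seconds:.0f} calls/sec")
print(f"PowerShell: {powershell_ms_per_call:.3f} ms/call, {calls / powershell_seconds:.0f} calls/sec")
```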

Performance monitor showing the difference in batch requests per second


This was tested against a SQL 2017 Developer Edition instance patched to Cumulative Update 9.

The recommendation is to not use OSTRESS as your benchmarking/stress testing tool.



Number of copies based retention for database backups

Ola Hallengren's DatabaseBackup stored procedure implements retention of database backups based on the age of backups in hours. For example, if you specify the @CleanupTime parameter as 24 then, when the next backup runs, any existing backup older than 24 hours is deleted. If you want to retain two backups you would specify 48.
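The arithmetic behind this age-based workaround is simple — the equivalent @CleanupTime value is the copy count multiplied by the backup interval in hours. A hypothetical sketch:

```python
def cleanup_time_hours(copies, backup_interval_hours=24):
    """Age-based @CleanupTime needed to retain the given number of copies,
    assuming backups run on a fixed schedule (daily by default)."""
    return copies * backup_interval_hours

print(cleanup_time_hours(1))      # keep one daily backup
print(cleanup_time_hours(2))      # keep two daily backups
print(cleanup_time_hours(3, 12))  # keep three twice-daily backups
```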

For certain cases a more natural way of specifying retention would be through number of copies. The attached version of Ola’s DatabaseBackup stored procedure implements this via the @CleanupCopies and @MirrorCleanupCopies parameters. The modified version of Ola’s stored procedure calls dbo.CleanupCopies to do the actual deletions:

CREATE PROCEDURE [dbo].[CleanupCopies]
@DirectoryPath nvarchar(4000),
@CleanupCopies int,
@NumberOfFiles int,
@NumberOfDirectories int
AS

DECLARE @RetainFiles int = @CleanupCopies * @NumberOfFiles / @NumberOfDirectories,
@CurrentFileID int,
@CurrentFileName nvarchar(4000),
@ReturnCode int = 0

DECLARE @CurrentFiles TABLE (
ID int IDENTITY(1,1) PRIMARY KEY,
FileName varchar(4000) NULL,
Depth int NULL,
IsFile int NULL,
CleanupCompleted bit NULL,
CleanupOutput int NULL)

INSERT @CurrentFiles (FileName,Depth,IsFile)
EXEC [master].dbo.xp_dirtree @DirectoryPath,0,1

SET @ReturnCode = @@ERROR

IF @ReturnCode <> 0 RETURN @ReturnCode

/* Remove the newest @RetainFiles entries from the work list so they are kept.
   The backup file names embed a timestamp, so a descending sort puts the newest first. */
DELETE FROM @CurrentFiles
WHERE ID IN (
SELECT TOP(@RetainFiles) ID
FROM @CurrentFiles
ORDER BY FileName DESC)

UPDATE @CurrentFiles
SET CleanupCompleted = 0,
CleanupOutput = 0

-- Do the deletions
WHILE EXISTS (SELECT * FROM @CurrentFiles WHERE CleanupCompleted = 0)
BEGIN
SELECT TOP 1 @CurrentFileID = ID,
@CurrentFileName = @DirectoryPath + N'\' + FileName
FROM @CurrentFiles
WHERE CleanupCompleted = 0

EXECUTE @ReturnCode = [master].dbo.xp_delete_file 0, @CurrentFileName

UPDATE @CurrentFiles
SET CleanupCompleted = 1,
CleanupOutput = @ReturnCode
WHERE ID = @CurrentFileID
END

SELECT @ReturnCode = ISNULL(MAX(CleanupOutput),0) FROM @CurrentFiles

RETURN @ReturnCode


Retention based on number of copies is not available for transaction log backups. Retention based on age is best for these.

It only caters for native backups (i.e. it is not implemented for LiteSpeed, Red Gate or Idera backups) because the solution makes use of xp_delete_file, which only handles native backups.
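The copies-based logic — list the backup files, keep the newest N, delete the rest — can be sketched outside SQL Server. A hypothetical Python equivalent, relying on the fact that Ola's backup file names embed a yyyymmdd_hhmmss timestamp and therefore sort chronologically:

```python
def files_to_delete(filenames, retain_copies):
    """Return the backup files to delete, keeping the newest retain_copies.

    The file names embed a yyyymmdd_hhmmss timestamp, so a plain
    descending sort puts the newest files first.
    """
    newest_first = sorted(filenames, reverse=True)
    return newest_first[retain_copies:]

backups = [
    "WIN2016_DBAdmin_FULL_20161110_213244.bak",
    "WIN2016_DBAdmin_FULL_20161111_213244.bak",
    "WIN2016_DBAdmin_FULL_20161112_213244.bak",
]
print(files_to_delete(backups, 2))  # only the oldest (20161110) backup is deleted
```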



Table driven procedure parameter values

This posting shows one approach for the lookup of stored procedure parameter values. The example use case will be the very popular SQL Server Maintenance Solution authored by Ola Hallengren (https://ola.hallengren.com/).

The solution consists of three parameterised stored procedures, one for database backups, one for database integrity checks and one for index optimisations. These procedures can be run as SQL Agent TSQL job steps. The stored procedure calls along with their parameter values can be encoded in these job steps. But maintenance can become a bit of a problem.

Suppose you have many databases on your SQL Server. You have multiple backup job steps that are identical except for the directory they write their backups to. Suppose you are low on storage, so a new volume has been assigned. You now need to identify and update those job steps that refer to the nearly full volumes, changing the directory name parameter to the newly presented volume. This can be done manually using the GUI, or you could hack the msdb.dbo.sysjobsteps table, updating the command column. As another example, suppose you have multiple job steps implementing different index optimisation parameters. Again you could refer to the sysjobsteps table.

A far better approach would be to store the parameter values in a lookup table. You would have three lookup tables, one for each of the maintenance operations – backup, integrity check and index optimisation. You list all the variations of the parameters as separate rows in these three tables, with each row having a descriptive tag name. This tag name column would be the primary key for that table. So a given tag name would refer to all the parameter values for an execution of that operation.
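The idea can be sketched in a few lines: each tag maps to a full parameter set, and retargeting backups to a new volume means updating one row rather than many job steps. A hypothetical illustration (the tag and parameter names mirror the examples later in this post):

```python
# Hypothetical tag -> parameter lookup, mirroring one row per ParamTag.
backup_params = {
    "FullBackups": {"Directory": r"c:\dbadmin\backup", "BackupType": "FULL", "Compress": "Y"},
}

def build_exec(tag):
    """Render the EXECUTE statement for a given parameter tag."""
    params = backup_params[tag]
    args = ",".join(f"@{k}='{v}'" for k, v in params.items())
    return f"EXECUTE dbo.DatabaseBackup {args}"

print(build_exec("FullBackups"))

# Storage move: one update retargets every job that uses the tag.
backup_params["FullBackups"]["Directory"] = r"e:\dbadmin\backup"
print(build_exec("FullBackups"))
```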

Taking the dbo.DatabaseBackup stored procedure as an example, these are the parameters (as of the time of writing):

SELECT PARAMETER_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.PARAMETERS
WHERE SPECIFIC_SCHEMA = 'dbo'
AND SPECIFIC_NAME = 'DatabaseBackup'
ORDER BY ORDINAL_POSITION



The screenshot doesn’t show the complete list of parameters. There are 36 in total for this procedure. With the exception of @Databases (which will be dealt with separately later) we want to pivot so that PARAMETER_NAME values (with the leading @ symbol ignored) become our column names. In addition, the first column name will be named ParamTag and be the primary key. The second column will be a fixed value, the stored procedure name. Here’s a procedure that accomplishes this task:

/* First create a common 'parent' table for all three procedures. Storing the ParamTag value is sufficient for the data model */
/* This common table will be required when permitted operations for groups of databases is discussed */
CREATE TABLE dbo.OperationParams (
OperationType sysname NOT NULL CONSTRAINT CK_OperationParams CHECK (OperationType IN ('DatabaseBackup','DatabaseIntegrityCheck','IndexOptimize')),
ParamTag nvarchar(255) NOT NULL,
CONSTRAINT PK_OperationParams PRIMARY KEY CLUSTERED (OperationType,ParamTag)
)
CREATE PROCEDURE [dbo].[CreateOperationParamTable]
    @procname sysname
AS

DECLARE @sqlstmt nvarchar(max)

-- Build the column list first
-- (DATA_TYPE alone is used here; any length/precision handling is omitted)
SET @sqlstmt = (
    SELECT ',[' + STUFF(PARAMETER_NAME,1,1,'') + '] ' + DATA_TYPE + ' NULL'
    FROM INFORMATION_SCHEMA.PARAMETERS
    WHERE SPECIFIC_SCHEMA = 'dbo'
    AND SPECIFIC_NAME = @procname
    AND PARAMETER_NAME <> '@Databases'
    ORDER BY ORDINAL_POSITION
    FOR XML PATH('')
    )
-- Add the create table and constraints
DECLARE @quotedtablename sysname = QUOTENAME(@procname + 'Params')
SET @sqlstmt = N'IF OBJECT_ID(''dbo.' + @quotedtablename + ''') IS NOT NULL DROP TABLE dbo.' + @quotedtablename + ';' + CHAR(13) + CHAR(10) +
    N'CREATE TABLE dbo.' + @quotedtablename + ' (' + CHAR(13) + CHAR(10) + 
    N'[ParamTag] nvarchar(255) NOT NULL CONSTRAINT ' + STUFF(@quotedtablename,2,0,'PK_') + N' PRIMARY KEY CLUSTERED,' + CHAR(13) + CHAR(10) + 
    N'[OperationType] sysname NOT NULL CONSTRAINT ' + STUFF(@quotedtablename,2,0,'CK_') + N' CHECK ([OperationType] IN (' + QUOTENAME(@procname,CHAR(39)+CHAR(39)) + N')) CONSTRAINT ' + STUFF(@quotedtablename,2,0,'DF_') + N' DEFAULT (' + QUOTENAME(@procname,CHAR(39)+CHAR(39)) + N')' + CHAR(13) + CHAR(10) + 
    @sqlstmt + CHAR(13) + CHAR(10) + 
    N',CONSTRAINT ' + STUFF(@quotedtablename,2,0,'FK_') + N' FOREIGN KEY ([OperationType],[ParamTag]) REFERENCES dbo.OperationParams ([OperationType],[ParamTag]))'

-- Run it
PRINT 'Running:' + CHAR(13) + CHAR(10) + @sqlstmt
EXEC sp_executesql @sqlstmt


The procedure is then called as follows to create the Params tables for each of Ola’s three stored procedures:

EXEC [dbo].[CreateOperationParamTable] 'DatabaseBackup'
EXEC [dbo].[CreateOperationParamTable] 'DatabaseIntegrityCheck'
EXEC [dbo].[CreateOperationParamTable] 'IndexOptimize'

/* Insert some param values */
INSERT INTO dbo.OperationParams VALUES ('DatabaseBackup','FullBackups')
INSERT INTO dbo.DatabaseBackupParams (ParamTag,Directory,BackupType,[Compress],[LogToTable]) values ('FullBackups','c:\dbadmin\backup','FULL','Y','Y')
INSERT INTO dbo.OperationParams VALUES ('DatabaseIntegrityCheck','CheckDBs')
INSERT INTO dbo.DatabaseIntegrityCheckParams (ParamTag,CheckCommands,[LogToTable]) values ('CheckDBs','CHECKDB','Y')
INSERT INTO dbo.OperationParams VALUES ('IndexOptimize','ReindexReorgs')
INSERT INTO dbo.IndexOptimizeParams (ParamTag,FragmentationLow,[LogToTable]) values ('ReindexReorgs','INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE','Y')

SELECT * FROM dbo.OperationParams
SELECT * FROM dbo.DatabaseBackupParams
SELECT * FROM dbo.DatabaseIntegrityCheckParams
SELECT * FROM dbo.IndexOptimizeParams

This results in creation of the DatabaseBackupParams table, the DatabaseIntegrityCheckParams table and the IndexOptimizeParams table:


With the Databases parameter removed we now have parameter-tagged rows that can be applied to one or more groups of databases. Ola's solution allows the Databases parameter to accept keywords such as USER_DATABASES and SYSTEM_DATABASES or a comma-separated list of database names. Having omitted the Databases column from the above three tables, we now need tables that allow us to define groups of databases and the permitted operations for the groups:

CREATE TABLE [dbo].[DatabaseGroupParent] (
    [GroupName] [nvarchar](40) NOT NULL,
    CONSTRAINT [PK_DatabaseGroupParent] PRIMARY KEY CLUSTERED ([GroupName] ASC)
)

CREATE TABLE [dbo].[DatabaseGroup] (
    [GroupName] [nvarchar](40) NOT NULL CONSTRAINT [FK_DatabaseGroup] FOREIGN KEY REFERENCES [dbo].[DatabaseGroupParent] ([GroupName]),
    [DatabaseName] [sysname] NOT NULL,
    CONSTRAINT [PK_DatabaseGroup] PRIMARY KEY CLUSTERED ([GroupName] ASC,[DatabaseName] ASC)
)

CREATE TABLE [dbo].[DatabaseGroupOperations] (
    [GroupName] [nvarchar](40) NOT NULL CONSTRAINT [FK_DatabaseGroupOperations_1] FOREIGN KEY REFERENCES [dbo].[DatabaseGroupParent] ([GroupName]),
    [OperationType] [sysname] NOT NULL,
    [ParamTag] [nvarchar](255) NOT NULL,
    CONSTRAINT [PK_DatabaseGroupOperations] PRIMARY KEY CLUSTERED ([GroupName] ASC,[OperationType] ASC,[ParamTag] ASC),
    CONSTRAINT [FK_DatabaseGroupOperations_2] FOREIGN KEY ([OperationType], [ParamTag]) REFERENCES [dbo].[OperationParams] ([OperationType], [ParamTag])
)
One issue with Ola's solution is the defaults for the IndexOptimize @FragmentationMedium and @FragmentationHigh parameters. If you don't specify these parameters the values used are 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE' and 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE' respectively. This compares to @FragmentationLow, whose default is NULL. Explicitly specifying NULL for any of these parameters signifies that reindex/reorganise is not used. In this table-driven solution a NULL column value is translated to DEFAULT when the stored procedures are called, which presents a problem for @FragmentationMedium and @FragmentationHigh (it's not a problem for @FragmentationLow, as NULL is its DEFAULT value). The workaround for @FragmentationMedium and @FragmentationHigh is to use a user-defined keyword – for example, NO_INDEX_MAINT. Rather than hardcoding this string in the procedures, the solution uses the following table:

CREATE TABLE dbo.OperationParamsNullKeyword (
OperationType sysname NOT NULL,
ParamName sysname NOT NULL,
NullKeyword nvarchar(max) NOT NULL,
CONSTRAINT PK_OperationParamsNullKeyword PRIMARY KEY CLUSTERED (OperationType,ParamName)
)

INSERT INTO dbo.OperationParamsNullKeyword
VALUES ('IndexOptimize','FragmentationMedium','NO_INDEX_MAINT'),
       ('IndexOptimize','FragmentationHigh','NO_INDEX_MAINT')
In effect, the processing stored procedure will translate these keywords to ‘NULL’ and not ‘DEFAULT’. Ola has example ‘C. Update statistics on all user databases’ on his Index and Statistics page. The IndexOptimizeParams table containing

FragmentationLow  FragmentationMedium  FragmentationHigh  UpdateStatistics
NULL              NO_INDEX_MAINT       NO_INDEX_MAINT     ALL

will be translated to

EXECUTE dbo.IndexOptimize
@Databases = 'USER_DATABASES',
@FragmentationLow = NULL,
@FragmentationMedium = NULL,
@FragmentationHigh = NULL,
@UpdateStatistics = 'ALL'
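The translation rule behind this can be stated compactly: a missing (NULL) value becomes DEFAULT, a stored NullKeyword value becomes the literal NULL, and anything else is quoted. A sketch of that mapping (function name hypothetical):

```python
def render_param(value, null_keyword=None):
    """Render one stored parameter value for the generated EXECUTE statement."""
    if value is None:
        return "DEFAULT"          # let the procedure's own default apply
    if null_keyword is not None and value == null_keyword:
        return "NULL"             # user-defined keyword standing in for NULL
    return f"'{value}'"           # ordinary value, quoted

print(render_param(None))                                # DEFAULT
print(render_param("NO_INDEX_MAINT", "NO_INDEX_MAINT"))  # NULL
print(render_param("ALL"))                               # 'ALL'
```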


Now on to the wrapper stored procedure that for a specified operation (backup, integrity check or index optimize), a specified group name or specific database name(s) or one of Ola’s database keywords and for a specific operation parameter tag runs Ola’s actual stored procedure. Here’s the stored procedure that does that:

/* Helper stored procedure that creates a string such as ",@Directory='c:\backups',@BackupType='FULL',@Verify='Y'" depending on your actual values in the corresponding Params table */

CREATE PROCEDURE [dbo].[GetOperationParams]
@Operation sysname,
@OperationParamTag nvarchar(255),
@OperationParams nvarchar(max) OUTPUT
AS

DECLARE @sqlstmt nvarchar(max)

/* For each parameter column build an expression that quotes the stored value,
   emits DEFAULT for a NULL value and translates any NullKeyword value to NULL.
   (Reconstructed - the original listing was incomplete.) */
SET @sqlstmt = STUFF((
    SELECT '+'',@' + COLUMN_NAME + '=''+COALESCE(' +
        ISNULL('CASE WHEN ['+COLUMN_NAME+']='''+NullKeyword+''' THEN ''NULL'' ELSE NULL END,','') +
        '''''''''+CAST(['+COLUMN_NAME+'] AS nvarchar(max))+'''''''',''DEFAULT'')'
    FROM INFORMATION_SCHEMA.COLUMNS
    LEFT OUTER JOIN dbo.OperationParamsNullKeyword
    ON OperationType = @Operation
    AND ParamName = COLUMN_NAME
    WHERE TABLE_NAME = @Operation + N'Params'
    AND COLUMN_NAME NOT IN (N'ParamTag',N'OperationType')
    FOR XML PATH('')
    ),1,1,'')

SET @sqlstmt = N'SELECT @OperationParams = ' + @sqlstmt + N' FROM dbo.' + QUOTENAME(@Operation + 'Params') + ' WHERE ParamTag = ' + QUOTENAME(@OperationParamTag,CHAR(39)+CHAR(39))

EXEC sp_executesql @sqlstmt,N'@OperationParams nvarchar(max) OUTPUT',@OperationParams = @OperationParams OUTPUT


/* The wrapper stored procedure that you would use in your SQL Agent TSQL job step */

CREATE PROCEDURE [dbo].[RunOperation]
@Operation sysname,
@DatabaseGroup nvarchar(40) = NULL,
@DatabaseName sysname = NULL,
@OperationParamTag nvarchar(255)
AS

DECLARE @OperationParams nvarchar(max) = NULL

EXEC dbo.GetOperationParams @Operation = @Operation,@OperationParamTag = @OperationParamTag,@OperationParams = @OperationParams OUTPUT

DECLARE @Databases nvarchar(max) = NULL

IF @DatabaseName IS NOT NULL
    SET @Databases = @DatabaseName
ELSE
    SET @Databases = STUFF((
        SELECT ',' + dg.DatabaseName
        FROM dbo.DatabaseGroup dg
        WHERE dg.GroupName = @DatabaseGroup
        /* Check that the group operation is permitted */
        AND EXISTS (
            SELECT 1 FROM dbo.DatabaseGroupOperations dgo
            WHERE dgo.GroupName = @DatabaseGroup
            AND dgo.OperationType = @Operation
            AND dgo.ParamTag = @OperationParamTag)
        ORDER BY dg.DatabaseName
        FOR XML PATH('')
        ),1,1,'')

SET @Databases = '@Databases=''' + @Databases + ''''

DECLARE @sqlstmt nvarchar(max)

SET @sqlstmt = 'EXEC dbo.' + QUOTENAME(@Operation) + ' ' + @Databases + @OperationParams

PRINT 'Running: ' + @sqlstmt

EXEC sp_executesql @sqlstmt



An example wrapper call would be

/* Create a database group */
INSERT INTO dbo.DatabaseGroupParent
VALUES ('GroupUserDBs1')

INSERT INTO dbo.DatabaseGroup
VALUES ('GroupUserDBs1','DBAdmin'),('GroupUserDBs1','TestDB')
/* Populate the permitted group operations */
INSERT INTO dbo.DatabaseGroupOperations VALUES ('GroupUserDBs1','DatabaseBackup','FullBackups')

/* Now the actual call to backup the database group */
EXEC dbo.RunOperation @Operation = 'DatabaseBackup',@DatabaseGroup='GroupUserDBs1',@OperationParamTag = 'FullBackups'


And the results would be

Running: EXEC dbo.[DatabaseBackup] @Databases='DBAdmin,TestDB',@Directory='c:\dbadmin\backup',@BackupType='FULL',@Verify=DEFAULT,@CleanupTime=DEFAULT,@CleanupMode=DEFAULT,@Compress='Y',@CopyOnly=DEFAULT,@ChangeBackupType=DEFAULT,@BackupSoftware=DEFAULT,@CheckSum=DEFAULT,@BlockSize=DEFAULT,@BufferCount=DEFAULT,@MaxTransferSize=DEFAULT,@NumberOfFiles=DEFAULT,@CompressionLevel=DEFAULT,@Description=DEFAULT,@Threads=DEFAULT,@Throttle=DEFAULT,@Encrypt=DEFAULT,@EncryptionAlgorithm=DEFAULT,@ServerCertificate=DEFAULT,@ServerAsymmetricKey=DEFAULT,@EncryptionKey=DEFAULT,@ReadWriteFileGroups=DEFAULT,@OverrideBackupPreference=DEFAULT,@NoRecovery=DEFAULT,@URL=DEFAULT,@Credential=DEFAULT,@MirrorDirectory=DEFAULT,@MirrorCleanupTime=DEFAULT,@MirrorCleanupMode=DEFAULT,@AvailabilityGroups=DEFAULT,@Updateability=DEFAULT,@LogToTable='Y',@Execute=DEFAULT
Date and time: 2016-11-12 21:32:44
Server: WIN2016
Version: 13.0.1601.5
Edition: Developer Edition (64-bit)
Procedure: [DBAdmin].[dbo].[DatabaseBackup]
Parameters: @Databases = 'DBAdmin,TestDB', @Directory = 'c:\dbadmin\backup', @BackupType = 'FULL', @Verify = 'N', @CleanupTime = NULL, @CleanupMode = 'AFTER_BACKUP', @Compress = 'Y', @CopyOnly = 'N', @ChangeBackupType = 'N', @BackupSoftware = NULL, @CheckSum = 'N', @BlockSize = NULL, @BufferCount = NULL, @MaxTransferSize = NULL, @NumberOfFiles = NULL, @CompressionLevel = NULL, @Description = NULL, @Threads = NULL, @Throttle = NULL, @Encrypt = 'N', @EncryptionAlgorithm = NULL, @ServerCertificate = NULL, @ServerAsymmetricKey = NULL, @EncryptionKey = NULL, @ReadWriteFileGroups = 'N', @OverrideBackupPreference = 'N', @NoRecovery = 'N', @URL = NULL, @Credential = NULL, @MirrorDirectory = NULL, @MirrorCleanupTime = NULL, @MirrorCleanupMode = 'AFTER_BACKUP', @AvailabilityGroups = NULL, @Updateability = 'ALL', @LogToTable = 'Y', @Execute = 'Y'
Source: https://ola.hallengren.com
Date and time: 2016-11-12 21:32:44
Database: [DBAdmin]
Status: ONLINE
Standby: No
Updateability: READ_WRITE
User access: MULTI_USER
Is accessible: Yes
Recovery model: SIMPLE
Encrypted: No
Differential base LSN: 34000000250700126
Last log backup LSN: N/A
Date and time: 2016-11-12 21:32:44
Command: DECLARE @ReturnCode int EXECUTE @ReturnCode = [master].dbo.xp_create_subdir N'c:\dbadmin\backup\WIN2016\DBAdmin\FULL' IF @ReturnCode <> 0 RAISERROR('Error creating directory.', 16, 1)
Outcome: Succeeded
Duration: 00:00:00
Date and time: 2016-11-12 21:32:44
Date and time: 2016-11-12 21:32:44
Command: BACKUP DATABASE [DBAdmin] TO DISK = N'c:\dbadmin\backup\WIN2016\DBAdmin\FULL\WIN2016_DBAdmin_FULL_20161112_213244.bak' WITH NO_CHECKSUM, COMPRESSION
Processed 488 pages for database 'DBAdmin', file 'DBAdmin' on file 1.
Processed 7 pages for database 'DBAdmin', file 'DBAdmin_log' on file 1.
BACKUP DATABASE successfully processed 495 pages in 0.026 seconds (148.512 MB/sec).
Outcome: Succeeded
Duration: 00:00:00
Date and time: 2016-11-12 21:32:44
Date and time: 2016-11-12 21:32:44
Database: [TestDB]
Status: ONLINE
Standby: No
Updateability: READ_WRITE
User access: MULTI_USER
Is accessible: Yes
Recovery model: SIMPLE
Encrypted: No
Differential base LSN: 34000011863500037
Last log backup LSN: N/A
Date and time: 2016-11-12 21:32:44
Command: DECLARE @ReturnCode int EXECUTE @ReturnCode = [master].dbo.xp_create_subdir N'c:\dbadmin\backup\WIN2016\TestDB\FULL' IF @ReturnCode <> 0 RAISERROR('Error creating directory.', 16, 1)
Outcome: Succeeded
Duration: 00:00:00
Date and time: 2016-11-12 21:32:44
Date and time: 2016-11-12 21:32:44
Command: BACKUP DATABASE [TestDB] TO DISK = N'c:\dbadmin\backup\WIN2016\TestDB\FULL\WIN2016_TestDB_FULL_20161112_213244.bak' WITH NO_CHECKSUM, COMPRESSION
Processed 4424 pages for database 'TestDB', file 'TestDB' on file 1.
Processed 2 pages for database 'TestDB', file 'TestDB_log' on file 1.
BACKUP DATABASE successfully processed 4426 pages in 0.168 seconds (205.778 MB/sec).
Outcome: Succeeded
Duration: 00:00:00
Date and time: 2016-11-12 21:32:44
Date and time: 2016-11-12 21:32:44


The table driven approach allows you to easily report out which options your solution is using as well as allowing you to change these options without having to make any changes to the SQL Agent jobs themselves. You can apply these options to user defined groups of databases.



Charting virtual file stats using R

This blog post shows how to retrieve data using R and chart the results in a number of ways. The example data consists of virtual file stats. The source is the virtual file stats DMV so the ‘measures’ are ever increasing values since the last restart of the SQL Server. The deltas could be calculated using SQL windowing functions but I show how to use the R LAG function to achieve the same result. The raw cumulative data is as follows:


The TestDB consists of four data files and the virtual file stats DMV was sampled every 10 minutes. The above shows 30 rows out of a total of 2304 rows (from 1st March to 4th March inclusive). I’m only interested in the ‘number of bytes read’ measure. The same analysis could be done for any of the other measure columns.

Here’s the R code to read from the table and chart the results:

# Use the install.packages lines if this is the first use of the libraries
# install.packages("RODBC")
# install.packages("dplyr")
# install.packages("ggplot2")
# install.packages("scales")

library(RODBC)
library(dplyr)
library(ggplot2)
library(scales)
# Connect to the database
dbhandle <- odbcDriverConnect('driver={SQL Server};server=MySQLServer;database=TestDB;trusted_connection=true')

# Cast the sampletime as separate date and time columns
res <- sqlQuery(dbhandle, "select CAST(sampletime AS date) AS DateValue,CAST(sampletime AS time) AS TimeValue,physical_name,num_of_bytes_read from virtualfilestats order by physical_name,sampletime")

# Load results into a dplyr data frame
df <- tbl_df(res)
# Cast and format the time column
df$TimeValue<-as.POSIXct(df$TimeValue,format = "%H:%M:%S")

# Group the data frame by file name so the lag function works per file name.
# Also, ensure the proper ordering within each group so the delta values are calculated correctly using lag
df <- df %>% group_by(physical_name) %>% arrange(physical_name, DateValue, TimeValue)
# Use the lag function to calculate the deltas
df <- df %>% mutate(num_of_bytes_read = num_of_bytes_read - lag(num_of_bytes_read))

# One chart per day with colours representing the different file names
ggplot(df,aes(x=TimeValue,y=num_of_bytes_read,colour=physical_name)) + geom_line() + facet_grid(~DateValue) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_x_datetime(breaks=date_breaks("1 hour"), labels=date_format("%H:%M")) + scale_y_continuous(labels = comma)

# One chart per day per file name. Independent y axis for each file name (scales = "free_y")
ggplot(df,aes(x=TimeValue,y=num_of_bytes_read)) + geom_line() + facet_grid(physical_name~DateValue,scales="free_y") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_x_datetime(breaks=date_breaks("1 hour"), labels=date_format("%H:%M")) + scale_y_continuous(labels = comma)

# One chart per day per file name. Same scale for every y axis
ggplot(df,aes(x=TimeValue,y=num_of_bytes_read)) + geom_line() + facet_grid(physical_name~DateValue) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_x_datetime(breaks=date_breaks("1 hour"), labels=date_format("%H:%M")) + scale_y_continuous(labels = comma)
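The grouped lag/delta step above is the crux of turning cumulative DMV counters into per-interval values. The same computation can be sketched in plain Python with toy stand-ins for the DMV rows, to make the per-group behaviour concrete:

```python
from itertools import groupby

# (file, sample order, cumulative bytes read) -- toy stand-ins for the DMV rows
rows = [
    ("file1", 1, 100), ("file1", 2, 160), ("file1", 3, 250),
    ("file2", 1, 50),  ("file2", 2, 90),
]

# Within each file, the delta is the current cumulative value minus the
# previous one (lag); the first sample per group has no previous value.
deltas = []
for name, grp in groupby(rows, key=lambda r: r[0]):
    prev = None
    for _, order, cum in grp:
        deltas.append((name, order, None if prev is None else cum - prev))
        prev = cum

print(deltas)
```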

And here are the three charts that result from the ggplot commands:

The first one showing a chart for each day:


The next two are better. The only difference between the two is the independent y axis for the first:



You can see that the most reads occur between 7am and 7pm with more data being read from data file 1 and 2 (perhaps because of proportional fill with different file sizes).

R scripting is great for this type of DBA work. You have full control of the elements that appear on your visualisations using the ggplot2 library. Coupled with this the dplyr library contains a wealth of windowing functions to deal with the most common data trends seen by DBAs.



Programmatically generating charts

Excel is all well and good when you want to manually plot your data. You manually load your data, create a bar chart, format axes, change the alignment for labels, add a title and save as a file for your team to review. But in an age where everyone is using tools such as Powershell or Python for their administration tasks it makes sense to use a scripting tool to generate charts. After all, as data professionals we all know how to programmatically read from a CSV file or from an SQL Server table. This blog post shows just how easily and elegantly you can generate a publication quality chart using the ggplot2 library in R.

R is an established data analysis tool. There is a wealth of libraries for all sorts of statistical analysis. Many primitives that exist in SQL Server have counterparts in R. For example, a table can be represented as a data frame. In terms of data frame operations (i.e. selecting, filtering, grouping, ordering, windowing functions) a library that implements most of the SQL-type operations is the dplyr library. This blog post does not concern dplyr, but one of its authors (Hadley Wickham) has also authored the ggplot2 library. This library implements "the grammar of graphics" functionality. You programmatically generate your chart using the functions within this library. An example making use of this library now follows.

Suppose you have the following CSV file containing summary results for eight different SQL Server configurations. The raw data for each configuration consists of the average transactions per second, the end-to-end total batch time, the disk service time for the transaction log volume, and the average seconds per write for the transaction log volume:
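The CSV itself is not reproduced here, but based on the wide-format data frame printed further down it would look something like this (the column headers match the data frame; the exact file layout is an assumption):

config,avgtps,totalbatchtime,disktime,avgdisk
instance2,375,7.33,1.27,0.414
instance3highcpuplan,794,4.08,1.00,0.333
instance4,391,8.01,1.28,0.426
instance5,296,10.60,1.76,0.585
instance6,351,8.95,1.99,0.658
instance2highcpuplan,822,4.04,0.9896,0.325
instance1san,589,5.42,1.10,0.365
instance1ssd,939,3.49,0.10,0.038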


You want to visualise the effect of SSD on reducing total batch time compared to the effect of the high performance CPU power plan.

Assuming you have R and RStudio installed on your workstation this is all the code you need to generate a publication quality chart for the results:

# Install the packages if this is the first time they are being used by the user

# install.packages('dplyr')
# install.packages('ggplot2')
# install.packages('reshape2')

# Use the libraries
library(dplyr)
library(ggplot2)
library(reshape2)

# Read the raw data into a native R data frame (file name is illustrative)
# You could just as easily read the raw data from SQL Server using the RODBC library (see a future blog post)
rawdata <- read.csv("results.csv")

# Load the raw data into a dplyr data frame and display the "wide" format results
df <- tbl_df(rawdata)
df

# The df data frame holds its contents in what's known as "wide" format. That is, with columns for the four measures.
# Charting works most naturally with "narrow" format data so we unpivot the results using the reshape2 melt function (author Hadley Wickham again)
plotdata <- melt(df, id = c("config"), measure = c("avgtps", "totalbatchtime", "disktime", "avgdisk"))

# Display the "narrow" format results. This is the format that ggplot will operate on
plotdata

# Plot the results
ggplot(plotdata, aes(x = config, y = value)) +
  geom_bar(stat = "identity") +
  facet_wrap(~variable, scales = "free_y", ncol = 1) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  geom_text(aes(label = value, y = value, vjust = -1), size = 3)

The output is as follows:

df in “wide” format:

Source: local data frame [8 x 5]

                config avgtps totalbatchtime disktime avgdisk
1            instance2    375           7.33   1.2700   0.414
2 instance3highcpuplan    794           4.08   1.0000   0.333
3            instance4    391           8.01   1.2800   0.426
4            instance5    296          10.60   1.7600   0.585
5            instance6    351           8.95   1.9900   0.658
6 instance2highcpuplan    822           4.04   0.9896   0.325
7         instance1san    589           5.42   1.1000   0.365
8         instance1ssd    939           3.49   0.1000   0.038

plotdata in “narrow” format:

                 config       variable    value
1             instance2         avgtps 375.0000
2  instance3highcpuplan         avgtps 794.0000
3             instance4         avgtps 391.0000
4             instance5         avgtps 296.0000
5             instance6         avgtps 351.0000
6  instance2highcpuplan         avgtps 822.0000
7          instance1san         avgtps 589.0000
8          instance1ssd         avgtps 939.0000
9             instance2 totalbatchtime   7.3300
10 instance3highcpuplan totalbatchtime   4.0800
11            instance4 totalbatchtime   8.0100
12            instance5 totalbatchtime  10.6000
13            instance6 totalbatchtime   8.9500
14 instance2highcpuplan totalbatchtime   4.0400
15         instance1san totalbatchtime   5.4200
16         instance1ssd totalbatchtime   3.4900
17            instance2       disktime   1.2700
18 instance3highcpuplan       disktime   1.0000
19            instance4       disktime   1.2800
20            instance5       disktime   1.7600
21            instance6       disktime   1.9900
22 instance2highcpuplan       disktime   0.9896
23         instance1san       disktime   1.1000
24         instance1ssd       disktime   0.1000
25            instance2        avgdisk   0.4140
26 instance3highcpuplan        avgdisk   0.3330
27            instance4        avgdisk   0.4260
28            instance5        avgdisk   0.5850
29            instance6        avgdisk   0.6580
30 instance2highcpuplan        avgdisk   0.3250
31         instance1san        avgdisk   0.3650
32         instance1ssd        avgdisk   0.0380


The bar chart makes use of the facet_wrap command to show each measure as a separate panel and "free_y" scales to give each panel an independent y axis. The x axis labels are rotated 90 degrees and the measure value is placed above each bar.


Example ggplot chart


Visualising the results you can see that, although instance1 with SSD had dramatically reduced transaction log latency compared to instance1 on SAN, in terms of reduction of total batch time the bigger "bang for the buck" was seen for instance2 going from a balanced CPU power plan to a high performance CPU power plan: the total batch time went down by almost 3.3 seconds and the average transactions per second more than doubled. This is interesting but is not the point of this blog post.

Using the "grammar of graphics" approach implemented in the ggplot2 library you have complete programmatic control of the elements you wish to see presented and the style of each element. And the great thing is that you can send your end users not just the image but also the command you used to generate it. This command is a piece of script, so it can be treated as source code that you can version control and maintain history for. You can't do any of this with an Excel-based solution.

You don’t need to be an R coding expert to use the scripting language. The learning curve is far lower than that for PowerShell. And since R is a data language, and the operations that are performed tend to be vector and matrix type operations, it’s a natural fit for those professionals who are already familiar with set-based programming in SQL.

With Microsoft’s acquisition of Revolution Analytics there’s a high chance that R will become more prominent in the toolset that Microsoft provide to SQL Server professionals.

