
Column store storage for perfmon data

Existing solutions for storing perfmon data in a SQL Server database typically use traditional b-tree type tables to house the data (for example, the PAL tool and RELOG.EXE when the destination is a SQL table). This article shows a very simple schema for storing perfmon data in a column store table.

Because perfmon data mostly consists of well-defined repeating values it is an ideal candidate for compressed storage. Depending on the sampling period, the volume of data being collected is likely to be very high, so a high-capacity storage structure that can be queried quickly is also desirable. Lastly, because the data is not modified after it is collected, it is effectively read-only within the database. All these attributes are a perfect match for column store storage.

Here’s one possible schema that can be used for storing perfmon data in a column store table:

CREATE TABLE [dbo].[factCounterData](
[ServerID] [int] NOT NULL,
[DateID] [int] NOT NULL,
[TimeID] [int] NOT NULL,
[CounterID] [int] NOT NULL,
[CounterValue] [float] NOT NULL
)
GO

CREATE CLUSTERED COLUMNSTORE INDEX [cci_factCounterData] ON [dbo].[factCounterData] WITH (DROP_EXISTING = OFF)
GO

There are various ways in which you may be collecting perfmon data. This article won't discuss those options but will assume the data is present in a series of perfmon BLG files. These files can be imported into a database in a number of ways, but the simplest method is to use the RELOG command. You create a system ODBC data source name (DSN) and specify it when running the command. For example,

RELOG.EXE \\MyServerName\d$\perflogs\admin\perfdata*.blg -f SQL -o SQL:PerfmonDb!Win2012r2

For this example, the machine that runs RELOG.EXE has an ODBC system DSN named PerfmonDb that specifies the target database.

On first execution RELOG.EXE creates these tables in the target database:

CREATE TABLE [dbo].[CounterData](
[GUID] [uniqueidentifier] NOT NULL,
[CounterID] [int] NOT NULL,
[RecordIndex] [int] NOT NULL,
[CounterDateTime] [char](24) NOT NULL,
[CounterValue] [float] NOT NULL,
[FirstValueA] [int] NULL,
[FirstValueB] [int] NULL,
[SecondValueA] [int] NULL,
[SecondValueB] [int] NULL,
[MultiCount] [int] NULL,
PRIMARY KEY CLUSTERED
(
[GUID] ASC,
[CounterID] ASC,
[RecordIndex] ASC
)
)
GO

CREATE TABLE [dbo].[CounterDetails](
[CounterID] [int] IDENTITY(1,1) NOT NULL,
[MachineName] [varchar](1024) NOT NULL,
[ObjectName] [varchar](1024) NOT NULL,
[CounterName] [varchar](1024) NOT NULL,
[CounterType] [int] NOT NULL,
[DefaultScale] [int] NOT NULL,
[InstanceName] [varchar](1024) NULL,
[InstanceIndex] [int] NULL,
[ParentName] [varchar](1024) NULL,
[ParentObjectID] [int] NULL,
[TimeBaseA] [int] NULL,
[TimeBaseB] [int] NULL,
PRIMARY KEY CLUSTERED
(
[CounterID] ASC
)
)
GO

CREATE TABLE [dbo].[DisplayToID](
[GUID] [uniqueidentifier] NOT NULL,
[RunID] [int] NULL,
[DisplayString] [varchar](1024) NOT NULL,
[LogStartTime] [char](24) NULL,
[LogStopTime] [char](24) NULL,
[NumberOfRecords] [int] NULL,
[MinutesToUTC] [int] NULL,
[TimeZoneName] [char](32) NULL,
PRIMARY KEY CLUSTERED
(
[GUID] ASC
),
UNIQUE NONCLUSTERED
(
[DisplayString] ASC
)
)
GO

Subsequent executions of the RELOG.EXE command append to the data that exists in the dbo.CounterData table. Sample data from each of the tables is as follows:

DisplayToID:

[Screenshot: sample DisplayToID rows]

CounterDetails:

[Screenshot: sample CounterDetails rows]

CounterData:

[Screenshot: sample CounterData rows]
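
If you'd rather inspect the raw data yourself than rely on screenshots, a few simple queries against the three tables will show the same thing (assuming RELOG.EXE has already populated them):

-- Quick look at the raw tables RELOG.EXE populates
SELECT TOP (10) * FROM dbo.DisplayToID
SELECT TOP (10) * FROM dbo.CounterDetails
SELECT TOP (10) * FROM dbo.CounterData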

Now, if you forgo the column store table and simply query the three tables that RELOG.EXE creates, you have a workable short-term solution for your perfmon reporting needs. However, as the volume of data in CounterData grows you will find that your storage requirements grow and your reporting queries take longer to run.

In data warehousing terms CounterData is the fact table. This will become very large. The two other tables are the dimension tables. These will never become really large. The column store solution will focus on the CounterData table.

The columns of interest in CounterData are GUID (a link to the DisplayToID dimension table, which holds the DisplayString name, i.e. the server name), CounterID (a link to the CounterDetails dimension table, which holds the counter names), CounterDateTime and CounterValue. Following data warehousing best practice, the column store table stores the dimension links as integer columns. You can see from the structure of the factCounterData table at the start of this article that the GUID is not stored; instead an integer ServerID is stored. Also, the CounterDateTime value is split into separate DateID and TimeID columns. Here's the code that populates the column store table:


-- Dimension table to hold the GUID to ServerID mapping
CREATE TABLE [dbo].[dimServer](
[ServerID] [int] IDENTITY(1,1) NOT NULL,
[ServerName] [varchar](1024) NOT NULL,
[GUID] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_dimServer] PRIMARY KEY CLUSTERED
(
[ServerID] ASC
)
)
GO
-- Routine to populate dimServer. Insert row if it's the first time we're seeing the DisplayString
CREATE PROCEDURE [dbo].[procInsertDimServer] AS
BEGIN
SET NOCOUNT ON

INSERT INTO dbo.dimServer (ServerName,GUID)
SELECT DisplayString,GUID
FROM dbo.DisplayToID d
WHERE NOT EXISTS (SELECT * FROM dbo.dimServer s WHERE s.ServerName = d.DisplayString)
END
GO
CREATE PROCEDURE [dbo].[procInsertFactCounterData]
AS
BEGIN
SET NOCOUNT ON

-- Insert the server row if we've not inserted it during previous executions
EXEC dbo.procInsertDimServer

-- Now insert the fact data. Split the datetime value into separate date and time columns
INSERT INTO dbo.factCounterData (
                ServerID,
                DateID,
                TimeID,
                CounterID,
                CounterValue
)
SELECT
                s.ServerID,
                CAST(REPLACE(LEFT(c.CounterDateTime,10),'-','') AS int),
                DATEDIFF(second,'00:00:00',SUBSTRING(c.CounterDateTime,12,8)),
                c.CounterID,
                c.CounterValue
FROM dbo.CounterData c
INNER JOIN dbo.dimServer s
ON c.GUID = s.GUID
END
GO
-- CounterData is now just a staging table so can be cleared down before running RELOG.EXE
CREATE PROCEDURE [dbo].[procClearDownCounterData]
AS
BEGIN
SET NOCOUNT ON
TRUNCATE TABLE dbo.CounterData
END
GO
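
To make the date and time split concrete, here's a standalone illustration (using a made-up sample value in the char(24) format the staging table holds) of how a CounterDateTime string maps to the DateID and TimeID columns:

-- Illustration only: splitting a sample CounterDateTime value
DECLARE @CounterDateTime char(24) = '2014-11-30 16:46:41.123'

SELECT
    CAST(REPLACE(LEFT(@CounterDateTime,10),'-','') AS int)       AS DateID, -- 20141130
    DATEDIFF(second,'00:00:00',SUBSTRING(@CounterDateTime,12,8)) AS TimeID  -- 60401 seconds past midnight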

With this code in place the typical execution steps are as follows (a short SQL sketch of steps 1 and 3 appears after the list):

  1. Execute dbo.procClearDownCounterData
  2. Run the RELOG.EXE command to populate CounterData, CounterDetails and DisplayToID
  3. Execute dbo.procInsertFactCounterData
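
For example, the SQL side of the workflow (steps 1 and 3) is just two procedure calls, with the RELOG.EXE run happening in between:

-- Step 1: clear down the staging table
EXEC dbo.procClearDownCounterData

-- Step 2: run the RELOG.EXE command from the command line to reload dbo.CounterData

-- Step 3: move the newly loaded rows into the column store fact table
EXEC dbo.procInsertFactCounterData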

In terms of storage costs, for one server capturing one hundred different performance counters every two seconds I found that over 650 million rows in the factCounterData table occupied only two gigabytes of database space. The compression achieved depends on the number of repeating values, but you should find impressive storage reductions over conventional b-tree storage.
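
If you want to check the compression you're getting on your own data, one option (on SQL Server 2014 or later) is to query the sys.column_store_row_groups catalog view for the fact table:

-- Approximate compressed size and state of the column store row groups
SELECT
    state_description,
    COUNT(*)                       AS row_groups,
    SUM(total_rows)                AS total_rows,
    SUM(size_in_bytes) / 1048576.0 AS size_mb
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.factCounterData')
GROUP BY state_description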

In terms of reporting, here’s a parameterised stored procedure that can be used to query a server for a specified counter. Start and end time ranges can be supplied when calling the procedure:


USE [perfmondb]
GO

/****** Object:  StoredProcedure [dbo].[procGetCounterValues]    Script Date: 13/01/2015 23:23:16 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE [dbo].[procGetCounterValues]
@ServerName varchar(1024),
@ObjectName varchar(1024) = NULL,
@CounterName varchar(1024) = NULL,
@InstanceName varchar(1024) = NULL,
@Start datetime = NULL,
@End datetime = NULL
AS
BEGIN

SET NOCOUNT ON

DECLARE @StartDateID int = CONVERT(char(8),@Start,112)
DECLARE @StartTimeID int = DATEDIFF(second,'00:00:00',CAST(@Start AS time))
DECLARE @EndDateID   int = CONVERT(char(8),@End,112)
DECLARE @EndTimeID   int = DATEDIFF(second,'00:00:00',CAST(@End AS time))

SELECT
s.ServerName,
DATEADD(second,f.TimeID,CAST(f.DateID AS char(8))) AS CounterTime,
c.ObjectName,
c.CounterName,
c.InstanceName,
f.CounterValue
FROM dbo.factCounterData f
INNER JOIN dbo.dimServer s
ON s.ServerID = f.ServerID
INNER JOIN dbo.CounterDetails c
ON c.CounterID = f.CounterID
WHERE s.ServerName = @ServerName
AND (@ObjectName IS NULL OR c.ObjectName = @ObjectName)
AND (@CounterName IS NULL OR c.CounterName = @CounterName)
AND (@InstanceName IS NULL OR c.InstanceName = @InstanceName)
AND (
@StartDateID IS NULL
OR f.DateID > @StartDateID
OR (f.DateID = @StartDateID AND f.TimeID >= @StartTimeID)
)
AND (
@EndDateID IS NULL
OR f.DateID < @EndDateID
OR (f.DateID = @EndDateID AND f.TimeID <= @EndTimeID)
)
ORDER BY CounterTime
END

GO

An example call:

EXECUTE [dbo].[procGetCounterValues]
@ServerName = 'Win2012r2',
@CounterName = '% Processor Time',
@InstanceName = '_Total',
@Start = '20141130 16:46:41',
@End = '20141130 16:46:59'

Output is this:

[Screenshot: the ten rows returned by the query]

So that's ten rows returned out of a total of over one million rows in my data set, and the plan showed batch mode execution for the far-right column store index scan:

[Screenshot: execution plan showing batch mode execution for the column store index scan]


Even when the table held over 650 million rows I found my queries returned in less than three seconds.
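
If you want to verify the response times against your own data, one simple approach is to wrap the procedure call in SET STATISTICS TIME (the parameter values here are just the ones from the earlier example):

SET STATISTICS TIME ON

EXECUTE [dbo].[procGetCounterValues]
@ServerName = 'Win2012r2',
@CounterName = '% Processor Time',
@InstanceName = '_Total'

SET STATISTICS TIME OFF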

So both storage space and query response times are optimised when using column store storage for performance monitor data.
