Charting virtual file stats using R

This blog post shows how to retrieve data using R and chart the results in a number of ways. The example data consists of virtual file stats. The source is the virtual file stats DMV so the ‘measures’ are ever increasing values since the last restart of the SQL Server. The deltas could be calculated using SQL windowing functions but I show how to use the R LAG function to achieve the same result. The raw cumulative data is as follows:

virtualfilestats1

The TestDB consists of four data files and the virtual file stats DMV was sampled every 10 minutes. The above shows 30 rows out of a total of 2304 rows (from 1st March to 4th March inclusive). I’m only interested in the ‘number of bytes read’ measure. The same analysis could be done for any of the other measure columns.

Here’s the R code to read from the table and chart the results:


# Use the install.packages lines if this is the first use of the libraries
# install.packages("RODBC")
# install.packages("dplyr")
# install.packages("ggplot2")
# install.packages("scales")

library(RODBC)
library(dplyr)
library(ggplot2)
library(scales)

# Connect to the database
dbhandle <- odbcDriverConnect('driver={SQL Server};server=MySQLServer;database=TestDB;trusted_connection=true')

# Cast the sampletime as separate date and time columns
res <- sqlQuery(dbhandle, "select CAST(sampletime AS date) AS DateValue,CAST(sampletime AS time) AS TimeValue,physical_name,num_of_bytes_read from virtualfilestats order by physical_name,sampletime")

# Load results into a dplyr data frame
df<-tbl_df(res)

# Cast and format the time column
df$TimeValue<-as.POSIXct(df$TimeValue,format = "%H:%M:%S")

# Group the data frame by file name so the lag function works per file name.
# Also, ensure the proper ordering within each group so the delta values are calculated correctly using lag
df<-df%>%group_by(physical_name)%>%arrange(DateValue,TimeValue)

# Use the lag function to calculate the deltas
df<-df%>%mutate(num_of_bytes_read=num_of_bytes_read - lag(num_of_bytes_read))

# One chart per day with colours representing the different file names
ggplot(df,aes(x=TimeValue,y=num_of_bytes_read,colour=physical_name)) + geom_line() + facet_grid(~DateValue) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_x_datetime(breaks=date_breaks("1 hour"), labels=date_format("%H:%M")) + scale_y_continuous(labels = comma)

# One chart per day per file name. Independent y axis for each file name (scales = "free_y")
ggplot(df,aes(x=TimeValue,y=num_of_bytes_read)) + geom_line() + facet_grid(physical_name~DateValue,scales="free_y") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_x_datetime(breaks=date_breaks("1 hour"), labels=date_format("%H:%M")) + scale_y_continuous(labels = comma)

# One chart per day per file name. Same scale for every y axis
ggplot(df,aes(x=TimeValue,y=num_of_bytes_read)) + geom_line() + facet_grid(physical_name~DateValue) + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_x_datetime(breaks=date_breaks("1 hour"), labels=date_format("%H:%M")) + scale_y_continuous(labels = comma)

And here are the three charts that result from the ggplot commands:

The first one showing a chart for each day:

virtualfilestats2

The next two are better. The only difference between the two is the independent y axis for the first:

virtualfilestats3

virtualfilestats4

You can see that the most reads occur between 7am and 7pm with more data being read from data file 1 and 2 (perhaps because of proportional fill with different file sizes).

R scripting is great for this type of DBA work. You have full control of the elements that appear on your visualisations using the ggplot2 library. Coupled with this the dplyr library contains a wealth of windowing functions to deal with the most common data trends seen by DBAs.

 

Advertisements
  1. Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: