Archive for April, 2013

SQLIO write buffer type

When doing a search for SQLIO you will find many blog posts showing example executions with the -BH parameter. According to the help for SQLIO the possible values for the -B parameter are H for hardware buffering, S for software buffering, Y for hardware and software buffering and N for no buffering. The use of -BH for the SQLIO read examples may be fine but I would suggest the use of -BH for the write examples is incorrect if you are trying to match SQL Server’s write access method. Using Process Monitor we can identity the write buffering method that SQL Server uses when, for example, we do simple inserts into a table:

sqlio1

You can see the sequential writes to the transaction log made by the log writer process after each commit and the random writes to the data file made by the checkpoint process. In both cases the write type is a Non-cached, Write Through write.

Now checking Process Monitor when the following SQLIO command is run using the -BH parameter

sqlio -kW -s5 -dD -o8 -fsequential -b8 -BH -LS -Fparam.txt

sqlio v1.5.SG
using system counter for latency timings, 2597685 counts per second
parameter file used: param.txt
file d:\testfile.dat with 4 threads (0-3) using mask 0x0 (0)
4 threads writing for 5 secs to file d:\testfile.dat
using 8KB sequential IOs
enabling multiple I/Os per thread with 8 outstanding
buffering set to use hardware disk cache (but not file cache)
using specified size: 100 MB for file: d:\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 28807.29
MBs/sec:   225.05
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 1
Max_Latency(ms): 3
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:  0 99  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

we see this:

sqlio2

Note the missing Write-Through.

Specifying -BN in place of -BH we match the write type that SQL Server is actually using:

sqlio -kW -s5 -dD -o8 -fsequential -b8 -BN -LS -Fparam.txt

sqlio v1.5.SG
using system counter for latency timings, 2597685 counts per second
parameter file used: param.txt
file d:\testfile.dat with 4 threads (0-3) using mask 0x0 (0)
4 threads writing for 5 secs to file d:\testfile.dat
using 8KB sequential IOs
enabling multiple I/Os per thread with 8 outstanding
buffering set to not use file nor disk caches (as is SQL Server)
using specified size: 100 MB for file: d:\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 28794.08
MBs/sec:   224.95
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 0
Max_Latency(ms): 3
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:  1 99  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

sqlio3

The difference in results for the two parameter types may be quite large if you are doing benchmarks against different types of storage infrastructure. For example, replicated SAN storage.

Advertisements

2 Comments

Next version features

It’s no secret that the next version of SQL Server will have two features that many customers are looking forward to seeing: firstly, read write clustered column store index and second, a fast, latch free, in-memory technology for OLTP workloads (code name Hekaton). If you’re working purely in the Microsoft world these additions seem to be cutting edge. That’s because over 90% of the product uses decades old features such as row based data formats, conventional disk storage, uncompressed data and scale-up only solutions.

If you look outside the Microsoft world you’ll find that many of the features that will only start to appear in next version of SQL Server are already at the core of some competitor products. To give you one such example purely in terms of utilising modern technologies, consider SAP HANA (http://www.saphana.com/welcome). This system has in-memory, compressed columnstore and rowstore at its core. Large data volumes are handled through scale-out (hash distribution partitioning of data across large number of nodes). The traditional query optimisation approaches (introduction of numerous indexes, store the data more than once using materialised/indexed views or using a separate OLAP database for the aggregates) are unnecessary as query performance across terabytes of in-memory, real-time, raw data is already blazingly fast. The system comes as an appliance (Parallel Data Warehouse like) but unlike PDW V2 it already makes use of modern solid state technologies for the persistence of its in-memory data.

In term of compliance the SQL language is ANSI compliant. Some nice additions results from its in-built version store approach for data changes. For example, see the SQL reference guide for SELECT (http://help.sap.com/hana/html/sql_select.html). You’ll see this under the ‘Time Travel’ section:

select * from x as of utctimestamp '2012-01-02 02:00:00'; // return one record (1,1)
select * from x as of utctimestamp '2012-01-03 03:00:00'; // return two records (1,1) and (2,2)
select * from x as of utctimestamp '2012-01-04 04:00:00'; // return nothing

Although it may not sound like it, this blog entry is not supposed to be an advert for SAP HANA. Admittedly the architecture and performance results are impressive. The reason for the posting is that from a Microsoft SQL Server user’s point of view it’s great that our own product’s feature set is being added to. But it’s only by reviewing what already exists in the non-Microsoft market can we put in proper perspective the functionality, limitations, cost and benefits of any new features.

Leave a comment