Archive for December, 2014

Failed full backup can impact next differential backup

For any of your very large databases you are probably taking once a week full backups followed by daily differential backups. We know that differential change map (DCM) pages hold references to the extents that have changed since the last full backup and it is these that are examined when a differential backup is run. The DCM pages allow the changed extents to be quickly identified.

Using your maintenance routines you are probably finding that your full database backup takes a couple of hours to complete and backs up a terabyte worth of data. Your daily differential backups complete in minutes and backup a hundred gigabytes worth of data by the time the next full backup is due. This is all good and your maintenance fits into the windows you have for administration.

But what if you run the once a week full backup on a Sunday, it fails and you notice on a Monday morning when your maintenance window is closed ? It’s probably the case that in terms of the backup chain the full backup failure doesn’t matter. You keep on taking differential backups with these being based on the previous week’s full database backup. You have no problem in terms of recovery. But what you’ll be surprised to find is that the first and subsequent differential backups since the failed full backup will take substantially longer to complete particularly when compared to the previous week’s elapsed times. This will be the case even though the volume of data changes hitting your database hasn’t changed since the previous week. The differential backup duration may increase several fold compared to the previous week.

The reason for the increase in the differential backup time is that the backup process now examines ever single data page for changes. It won’t be until the next full backup successfully completes that the optimised mode of the differential backup will run (the mode that uses the DCM pages to identify changed extents). Here is a repro script illustrating the issue:


-- Create a test table in one of your testing databases
-- Created so that one row occupies one page so that we can easily makes sense of the page numbers that are reported
-- in the backup output

CREATE TABLE testdb.dbo.mytable (
id int NOT NULL,
val char(7000) NOT NULL
)
GO

-- Populate the table with 50,000 rows (50,000 data pages)
SET NOCOUNT ON
GO

INSERT INTO testdb.dbo.mytable
VALUES (1,'hello')
GO 50000

-- Take a full backup
BACKUP DATABASE testdb TO DISK='C:\SQL2014\Backup\testdb.bak' WITH INIT,NO_COMPRESSION
GO

/*
Processed 50656 pages for database 'testdb', file 'testdb' on file 1.
Processed 1 pages for database 'testdb', file 'testdb_log' on file 1.
BACKUP DATABASE successfully processed 50657 pages in 4.865 seconds (81.346 MB/sec).
*/

-- You can see that roughly 50,000 pages were backed up

-- Now that previous backup was successful. Let's see how quick the next differential backup is.
-- It should be fast as there have been no data changes since the last full backup.
BACKUP DATABASE testdb TO DISK='C:\SQL2014\Backup\testdbdiff.bak' WITH INIT,DIFFERENTIAL,NO_COMPRESSION
GO

/*
Processed 56 pages for database 'testdb', file 'testdb' on file 1.
Processed 1 pages for database 'testdb', file 'testdb_log' on file 1.
BACKUP DATABASE WITH DIFFERENTIAL successfully processed 57 pages in 0.015 seconds (29.687 MB/sec).
*/

-- And, yes, it is very quick with only 56 database pages being examined.

-- Now run the full backup again but interrupt it before completion to simulate an error you may get in your production system
BACKUP DATABASE testdb TO DISK='C:\SQL2014\Backup\testdb.bak' WITH INIT,NO_COMPRESSION
GO

/*
Msg 3204, Level 16, State 1, Line 69
The backup or restore was aborted.
Msg 3013, Level 16, State 1, Line 69
BACKUP DATABASE is terminating abnormally.
Query was cancelled by user.
*/

-- Now run the differential backup. It should be quick.
BACKUP DATABASE testdb TO DISK='C:\SQL2014\Backup\testdbdiff.bak' WITH INIT,DIFFERENTIAL,NO_COMPRESSION
GO

/*
Processed 50656 pages for database 'testdb', file 'testdb' on file 1.
Processed 1 pages for database 'testdb', file 'testdb_log' on file 1.
BACKUP DATABASE WITH DIFFERENTIAL successfully processed 50657 pages in 1.890 seconds (209.392 MB/sec).
*/

50,000+ database pages were examined for the differential backup that followed the failed full backup. That equates to every page in the database. For terabyte sized databases that will result in a very high elapsed time. It won’t be until the next successful full backup that the differential backup will return to its ‘optimised’ behaviour:

BACKUP DATABASE testdb TO DISK='C:\SQL2014\Backup\testdb.bak' WITH INIT,NO_COMPRESSION
GO

/*
Processed 50656 pages for database 'testdb', file 'testdb' on file 1.
Processed 2 pages for database 'testdb', file 'testdb_log' on file 1.
BACKUP DATABASE successfully processed 50658 pages in 3.784 seconds (104.587 MB/sec).
*/

BACKUP DATABASE testdb TO DISK='C:\SQL2014\Backup\testdbdiff.bak' WITH INIT,DIFFERENTIAL,NO_COMPRESSION
GO

/*
Processed 40 pages for database 'testdb', file 'testdb' on file 1.
Processed 1 pages for database 'testdb', file 'testdb_log' on file 1.
BACKUP DATABASE WITH DIFFERENTIAL successfully processed 41 pages in 0.016 seconds (20.019 MB/sec).
*/

These observations would suggest that the DCM page content is examined at the start of the full backup and cleared as the backup progresses. If the backup fails it seems that the DCM page content is not rolled back to its pre-full backup state. Subsequent differential backups somehow detect that the previous full backup has failed (some sort of flag in the boot page of DCM pages, perhaps) and run in a mode where they examines all database pages. These cases will show up as high reads in your virtual file stats, your perfmon disk reads and your SAN disk reads.

So for the differential backups of your big databases look out for these being impacted when your full database backups fail.

Advertisements

Leave a comment