25 September 2012

Let's Break Something

Here's a fun little exercise for the bored DBA.  Break your database.

When I say your database, I don't mean your company's database, or anything anywhere remotely within earshot of a production machine.  But I have some sample databases on an eval version of SQL Server 2012 that would be prime sacrificial lamb material.  The goal is to simulate a worst case scenario and recover from it.  I've done this once before, but it has been long enough that I've forgotten exactly how I did it, so we'll give it a bash.

First, take a backup.

Then, reread that first step and make sure you did it.  I'll wait.

...

Backup taken?  Are you sure?  OK then, so next I'll need you to open You really did take a backup right?  Let's just make sure of that.  OK, I'll trust you at this point.  So next you'll need to open SQL Server Configuration Manager and shut your instance down.  Once the server is switched off (you will find it under SQL Server Services, right-click and click Stop on the relevant SQL Server instance, if more than one), you can have unfettered access to the data files, and we can do all sorts of unmentionable atrocities to them.

No time like the present, so browse to your data file, right click the database THAT YOU HAVE PREVIOUSLY TAKEN A BACKUP OF, AND THAT IS NOT AN IMPORTANT DATABASE IN ANY RESPECT, AND THAT IS NOT ON ANYTHING AKIN TO A PRODUCTION SERVER, and open in a text editor.  Highlight some text (any text!) and replace with something pithy...I have chosen "My hovercraft is full of eels".  Be creative, muck up as much of the data as you wish.

Now on to the fruits of our labours...restart the service in Configuration Manager.  Then launch Management Studio.  Look at that...our database with a "Recovery Pending" note affixed.  Data is unaccessible.  Hmmm!  Seems different slightly than how things reacted in 2008R2, so let's play around and see how it goes.

In the error logs:
Error: 824, Severity: 24, State: 2.
SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:9; actual 0:0). It occurred during a read of page (1:9) in database ID 8 at offset 0x00000000012000 in file '[path to data file]'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

So it knows what we did.  Alright, fine, SQL Server, I'll do what you say.  dbcc checkdb returns "Database [name] cannot be opened due to inaccessible files or insufficient memory or disk space." Dandy! Thank you SQL Server for telling me to do what you then tell me I can't do.  It's data loss acceptance time, folks.  A quick restore of a full backup and we're back up.

Let's have some more "fun"!  This time, I repeated the experiment, but went in and changed one single character to another character.  Pretty mild change, innit?  Maybe SQL Server won't notice?  Same error, same severity level, same unrecoverable database.  So restored from backup again, and this time, I monkey with the log file LDF instead of the actual data file.  No errors reported and no loss of functionality (at least, apparently!).  But, my database was in simple recovery model.  Wondering if that had anything to do with it, I switched to full recovery model and corrupted the logfile again.  That did it, the reassuring "Recovery Pending".

Error: 9004, Severity: 21, State: 1.
An error occurred while processing the log for database [name].  If possible, restore from backup. If a backup is not available, it might be necessary to rebuild the log.

So, a slightly smaller severity number, good sign eh?  So first I execute ALTER DATABASE [name] SET EMERGENCY.  This puts it in an "emergency" state and turns the tan database icon to an attractive shade of pale red, indicating that SOMETHIN' IS WRONG RIGHTCHERE!!!   I tried running CHECKDB with the REPAIR_ALLOW_DATA_LOSS parameter, and it complained about it needing to be in single user mode.  So ALTER DATABASE [name] SET SINGLE_USER, and then DBCC CHECKDB ([name], REPAIR_ALLOW_DATA_LOSS).  Here's the relevant result:

Warning: The log for database [name] has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were. You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode. When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.

OK, so it rebuilt the log for me!  Switched it out of single user mode, and voila, we're back to normal.  Note that (as one might expect) the restore chain is broken, so in "real life" this would be a good time to take a full backup and start fresh with your log backups.

I've poked at SQL Server in an inappropriate manner enough this morning, so I shall give it a rest for the time being. 

No comments: