10 February 2015

Finding Orphaned Data Files

So Kendra Little's post about how dropping a database that is in an offline state will not clean up the MDF and LDF files got me thinking. I hadn't really considered this before, and I wanted an easy way of seeing whether I was wasting space on large MDF files we thought we'd dropped years ago...a query across CMS (Central Management Servers) should fit the bill.

Here's what my code does:
  • Extract the folder path and file name for every data file on the instance from sys.master_files.
  • For each distinct folder that holds data files, use xp_cmdshell to list the files stored there.
  • Report any MDF or NDF files in those listings that aren't present in sys.master_files.

Here's the code:

SET NOCOUNT ON;

DECLARE @files TABLE (pth VARCHAR(500), fname VARCHAR(500));
DECLARE @cmdOutput TABLE (txt VARCHAR(1000) NULL);
DECLARE @cmd VARCHAR(1000), @MDFpath VARCHAR(500);

-- Split each data file's physical_name into folder path and file name
INSERT INTO @files
SELECT
    LEFT(physical_name, LEN(physical_name) - PATINDEX('%\%', REVERSE(physical_name)) + 1),
    RIGHT(physical_name, PATINDEX('%\%', REVERSE(physical_name)) - 1)
FROM sys.master_files
WHERE type = 0;

-- For each distinct data file folder, grab a bare directory listing
DECLARE pathcurs CURSOR FOR SELECT DISTINCT pth FROM @files;
OPEN pathcurs;
FETCH NEXT FROM pathcurs INTO @MDFpath;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Quote the path in case it contains spaces
    SET @cmd = 'dir "' + @MDFpath + '" /a:-d /b';
    INSERT INTO @cmdOutput EXECUTE master..xp_cmdshell @cmd;
    FETCH NEXT FROM pathcurs INTO @MDFpath;
END
CLOSE pathcurs;
DEALLOCATE pathcurs;

-- Any MDF or NDF file on disk that sys.master_files doesn't know about is orphaned
SELECT c.txt AS DetachedDataFiles
FROM @cmdOutput c
LEFT JOIN @files f ON UPPER(c.txt) = UPPER(f.fname)
WHERE UPPER(c.txt) LIKE '%.[MN]DF%' AND f.fname IS NULL;


Note: on SQL 2005 you'll probably see references to distmdl.mdf and mssqlsystemresource.mdf, the data files for the replication distribution template and Resource databases, respectively. You can ignore these, or alter the script above to exclude them.
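One way to exclude them is a NOT IN filter on the final query (a sketch; extend the list with any other system files you run into):

```sql
SELECT c.txt AS DetachedDataFiles
FROM @cmdOutput c
LEFT JOIN @files f ON UPPER(c.txt) = UPPER(f.fname)
WHERE UPPER(c.txt) LIKE '%.[MN]DF%'
  AND f.fname IS NULL
  -- Known system files that are expected to have no entry in sys.master_files
  AND UPPER(c.txt) NOT IN ('DISTMDL.MDF', 'MSSQLSYSTEMRESOURCE.MDF');
```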

02 February 2015

PRINT Statement "Churn"

So I was sitting in one of those post-mortem type meetings, the sort fraught with tearful recriminations and teeth-gnashing (or at least, perhaps, they ought to be) and I was amused to observe the birth of a silly superstition. It was a mix of technical and nontechnical folk, business analysts, developers, business VPs, etc. We were discussing the performance of a particular SSIS package that runs a stored proc, appears to be complete in terms of data, but then sits and runs on the server for hours afterward, when it should have completed and closed. One developer coined the term "churn", as in, how butter is made, to describe what SQL Server was doing.

It was baffling to see how readily and happily the non-technical folk (and some of the technicals, for shame) accepted this as a good enough answer. Instead of asking, why is this happening and how do we fix it, the execs just came to view "The Churn" as a necessary part of the data processing. "So, how long do we expect The Churn to take with this large a file?" "Can you get me an ETA for when The Churn will be finished?" Soon everybody was getting in on it...even people on the business side could understand The Churn. I mean, what could be more natural than that, you run an SSIS package, it works, and then you have a few hours for the DTEXEC process to sit and "churn"?

This kind of silly superstition, where we take something we don't understand and invent a name for it as a kind of acceptance, hobbles an IT organization, papering over ignorance instead of exposing it and digging deeper. It turned out in this case, as our appdev manager surmised, that the culprit was massively looped PRINT statements (completely unnecessary, of course) in the stored procedures.

To demonstrate, I created a sample stored procedure that loops 100,000 times:

CREATE PROC usp_CHURN
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @i INT = 1;
    DECLARE @text VARCHAR(200);

    WHILE @i <= 100000
    BEGIN
        SELECT @text = 'You are at increment number ' + CAST(@i AS VARCHAR(20));
        --PRINT @text
        SET @i += 1;
    END

    -- TempTable is just a scratch table created beforehand;
    -- the insert records when the proc actually finished
    INSERT INTO TempTable SELECT GETDATE();
END


As is, with the PRINT statement commented out, this runs very fast (subsecond), as you would expect. The INSERT just lets you know when the proc is actually done.
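For completeness, here's a minimal harness for trying it yourself (TempTable is just a scratch table; the name and column are my choices, nothing special):

```sql
-- One-time setup: the proc writes its completion time here
CREATE TABLE TempTable (finished_at DATETIME);

EXEC usp_CHURN;

-- Compare this timestamp against when you kicked the proc off
SELECT finished_at FROM TempTable;
```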

So I created a DTSX package that does nothing more than run this process, and from a different server, created a batch file that outputs the time, then runs the package, then outputs the time again. Running the proc as is, well, as I say, there's nothing to it, it's just memory getting moved around. But when I uncomment the PRINT statement, things take a bit of a nosedive.

Comparing the package start time (10:41:33.49) to the timestamp logged in the table (10:41:35.010), we see that less than 2 seconds elapsed in the actual execution of the procedure. However, the package didn't complete until 10:45:23.38. So two seconds to do the work, and almost four minutes for SSIS to handle all those PRINT statements getting pushed back up the chain...roughly 2 milliseconds of overhead per PRINT. Now, imagine that scaled up to millions or billions of iterations, and you can understand why we had an apparently complete SSIS package chewing up one of the vCPUs on a box for days. Yes, days. And taking the PRINT statements out fixed it.

Tactical moral of the story: be very sparing and cautious when sticking PRINT statements into heavily looped sections of your code, unless they're absolutely necessary. Strategic moral of the story: stop accepting surface-level explanations and lazy superstitions...dig in and figure it out.