21 July 2016

Estimating Size When Dropping a Clustered Columnstore Index

SQL Server 2014 introduced clustered columnstore indexes which have proven to be very handy in our shop for large data warehouse "fact" tables. The compression algorithm they use is very efficient and (dependent on data of course) can compress data down at ratios that make conventional B-tree row and page compression look pretty anemic in comparison. But, there's an ironic downside to this great compression performance...

Clustered columnstore indexes aren't great for running mass updates. They are best for relatively static data. So our BI team tends to drop the columnstore index and build conventional indexes (clustered or non-clustered) to support an update on one of these tables, then rebuild the columnstore after the update. It's a slow, painful process, but it works. However, every now and again somebody forgets just how great columnstore compression is and drops the columnstore index, effectively building a massive uncompressed B-tree heap. This operation can easily fill a drive if it is a large enough table and drive space has not been attended to. So estimating the size of the table uncompressed becomes essential.

Microsoft walks you through the process for Heaps, Clustered Indexes, and Non-Clustered Indexes. The non-leaf level index estimation requires a bit more complexity, so you're generally best doing it by hand, but for basic leaf-level/heap estimation, I decided to automate the process so you can just plug in your table name, and estimate the size of the uncompressed heap by examining the columns. It isn't ideal for a variety of things...obscure data types, sparse columns, and if you have a lot of variable data types you're better off computing the AVG(LEN(column)) manually to get a more accurate length of that column, on average. But for our tables, mostly with static length columns, it is pretty handy.

----ALTER THESE VALUES, RUN IN CORRECT DB----DECLARE @SchName SYSNAME = 'dbo';DECLARE @TblName SYSNAME = 'FACT_CLAIM_HISTORY';---------------------------------------------
DECLARE @IxName SYSNAME, @ExistingPages INT, @ExistingMB INT;
SELECT @IxName = name FROM sys.indexesWHERE OBJECT_SCHEMA_NAME(object_id)=@SchName AND OBJECT_NAME(object_id)=@TblNameAND index_id=1 and [type]=5;
SELECT @ExistingPages = SUM(a.total_pages) FROM sys.partitions pINNER JOIN sys.allocation_units a ON p.[partition_id] = a.[container_id]WHERE OBJECT_SCHEMA_NAME(object_id)=@SchName AND OBJECT_NAME(object_id)=@TblName;
SELECT @ExistingMB = (8 * @ExistingPages) / 1024;
IF (@IxName IS NULL)BEGIN
PRINT
'No clustered columnstore index on that table.';END
ELSE
BEGIN
   DECLARE
      
@TotalRows INT,
      
@TotalColumns INT,
      
@StatCols INT,
      
@DynCols INT,
      
@StatColBytes INT,
      
@DynColBytes INT,
      
@DynColBytesTracking INT,
      
@NullBitmapBytes INT,
      
@RowBytes INT,
      
@RowsPerPage INT,
      
@TotalPages INT,
      
@TotalSizeMB INT;

  
PRINT 'Table Name: ' + @SchName+'.'+@TblName;
  
PRINT 'Index Name: ' + @IxName;

  
--Total Rows
  
SELECT @TotalRows = SUM([rows]) FROM sys.partitions
  
WHERE OBJECT_SCHEMA_NAME(object_id)=@SchName AND OBJECT_NAME(object_id)=@TblName;
  
PRINT 'Total Rows: ' + CAST(@TotalRows AS VARCHAR(100));

  
--Total Columns
  
SELECT @TotalColumns = COUNT(*) FROM sys.columns
  
WHERE OBJECT_SCHEMA_NAME(object_id)=@SchName AND OBJECT_NAME(object_id)=@TblName;
  
PRINT 'Total Columns: ' + CAST(@TotalColumns AS VARCHAR(100));

  
--Total Static Length Columns
  
SELECT @StatCols = COUNT(*) FROM sys.columns c
  
INNER JOIN sys.types t ON t.system_type_id = c.system_type_id
  
WHERE OBJECT_SCHEMA_NAME(c.object_id)=@SchName AND OBJECT_NAME(c.object_id)=@TblName
  
AND t.name NOT LIKE 'var%' AND t.name NOT LIKE '%text';
  
PRINT 'Total Static Length Columns: ' + CAST(@StatCols AS VARCHAR(100));

  
--Total Variable Length Columns
  
SELECT @DynCols = COUNT(*) FROM sys.columns c
  
INNER JOIN sys.types t ON t.system_type_id = c.system_type_id
  
WHERE OBJECT_SCHEMA_NAME(c.object_id)=@SchName AND OBJECT_NAME(c.object_id)=@TblName
  
AND ( t.name LIKE 'var%' OR t.name LIKE '%text');
  
PRINT 'Total Variable Length Columns: ' + CAST(@DynCols AS VARCHAR(100));

  
--Total Static Length Column Bytes
  
SELECT @StatColBytes = ISNULL(SUM(c.max_length),0) FROM sys.columns c
  
INNER JOIN sys.types t ON t.system_type_id = c.system_type_id
  
WHERE OBJECT_SCHEMA_NAME(c.object_id)=@SchName AND OBJECT_NAME(c.object_id)=@TblName
  
AND t.name NOT LIKE 'var%' AND t.name NOT LIKE '%text';
  
PRINT 'Total Static Length Column Bytes: ' + CAST(@StatColBytes AS VARCHAR(100));

  
--Max Variable Length Column Bytes
  
SELECT @DynColBytes = ISNULL(SUM(c.max_length),0) FROM sys.columns c
  
INNER JOIN sys.types t ON t.system_type_id = c.system_type_id
  
WHERE OBJECT_SCHEMA_NAME(c.object_id)=@SchName AND OBJECT_NAME(c.object_id)=@TblName
  
AND ( t.name LIKE 'var%' OR t.name LIKE '%text');
  
PRINT 'Maximum Variable Length Column Bytes: ' + CAST(@DynColBytes AS VARCHAR(100));

  
--Total Variable Column Bytes with Tracking Bytes
  
SELECT @DynColBytesTracking = CASE WHEN (@DynCols > 0) THEN (2 + (@DynCols * 2) + @DynColBytes) ELSE 0 END;
  
PRINT 'Total Variable Length Column Bytes With Tracking: ' + CAST(@DynColBytesTracking AS VARCHAR(100));

  
--Null Bitmap Bytes
  
SELECT @NullBitmapBytes = 2 + ((@TotalColumns + 7) / 8);
  
PRINT 'Null Bitmap Bytes: ' + CAST(@NullBitmapBytes AS VARCHAR(100));

  
--Row Size Bytes (including header bytes)
  
SELECT @RowBytes = @StatColBytes + @DynColBytesTracking + @NullBitmapBytes + 4;
  
PRINT 'Row Bytes: ' + CAST(@RowBytes AS VARCHAR(100));

  
--Rows Per Page
  
SELECT @RowsPerPage = 8096 / (@RowBytes + 2);
  
PRINT 'Rows Per Page: ' + CAST(@RowsPerPage AS VARCHAR(100));

  
--Total Pages
  
SELECT @TotalPages = ROUND((@TotalRows * 1.0) / @RowsPerPage, 0);
  
PRINT 'Total Pages: ' + CAST(@TotalPages AS VARCHAR(100));

  
--Heaap size in megabytes
  
SELECT @TotalSizeMB = (8 * @TotalPages) / 1024;
  
PRINT 'Total Size, MB: ' + CAST(@TotalSizeMB AS VARCHAR(100));

  
--Current (compressed) values
  
PRINT 'Current (compressed) Pages: ' + CAST(@ExistingPages AS VARCHAR(100));
  
PRINT 'Current (compressed) Size, MB: ' + CAST(@ExistingMB AS VARCHAR(100));

  
--Compression Ratio, for fun...
  
PRINT 'Compression Ratio: ' + CAST((@TotalSizeMB*1.0/@ExistingMB) AS VARCHAR(100));

END


Here's some sample output:
Table Name: dbo.FACT_CLAIM_HISTORY
Index Name: CCIX_FACT_CLAIM_HISTORY
Total Rows: 282749658
Total Columns: 79
Total Static Length Columns: 79
Total Variable Length Columns: 0
Total Static Length Column Bytes: 405
Maximum Variable Length Column Bytes: 0
Total Variable Length Column Bytes With Tracking: 0
Null Bitmap Bytes: 12
Row Bytes: 421
Rows Per Page: 19
Total Pages: 14881561
Total Size, MB: 116262
Current (compressed) Pages: 3211431
Current (compressed) Size, MB: 25089
Compression Ratio: 4.633983020447





No comments: