01 June 2017

Vigenere Cipher ala T-SQL

I started thinking about how this would work and got a bit obsessed...the Vigenere cipher is a very old cipher (but relatively secure for its time, with resistance to basic frequency analysis) that uses a shared key and a Vigenere table, basically a grid of letters.

It uses a repeating pattern of your keyword as a sort of keystream, similar to a one-time pad (which is superior, having no repeating pattern). Out comes the T-SQL hammer, because this looks like a nail to me:
DECLARE @TextToEncrypt VARCHAR(500) = 'This is an especially secret message that I wish to encrypt with a Vigenere cipher.';

--Strip out non-alpha chars
WHILE PATINDEX('%[^a-z]%', @TextToEncrypt) > 0
SET @TextToEncrypt = STUFF(@TextToEncrypt, PATINDEX('%[^a-z]%',@TextToEncrypt),1,'');
--Set to upper case
SET @TextToEncrypt = UPPER(@TextToEncrypt);

--Shared secret key
DECLARE @Key VARCHAR(500) = 'MySecretKey';  --example value; use any shared secret
--Let's make sure to "clean" that one too:
--Strip out non-alpha chars
WHILE PATINDEX('%[^a-z]%', @Key) > 0
SET @Key = STUFF(@Key, PATINDEX('%[^a-z]%',@Key),1,'');
--Set to upper case
SET @Key = UPPER(@Key);

--Pad out key to length of cleartext
DECLARE @KeyPad VARCHAR(500);
SELECT @KeyPad=LEFT(REPLICATE(@Key,(LEN(@TextToEncrypt)/LEN(@Key))+1),LEN(@TextToEncrypt));

--Vigenere Cipher Encryption and Output
DECLARE @CipherText VARCHAR(500)='';
DECLARE @i INT = 1;
WHILE @i <= LEN(@TextToEncrypt)
BEGIN
SET @CipherText = @CipherText +
CHAR(65+((ASCII(SUBSTRING(@TextToEncrypt, @i, 1))-65+ASCII(SUBSTRING(@KeyPad, @i, 1))-65) % 26));
SET @i=@i+1;
END
PRINT @CipherText;


The basic math for encryption is, for each character (using 0-25 for A-Z), (plaintext + key) mod 26 = ciphertext. The output is thoroughly indecipherable, lacking the key (or a great deal of spare time): RVCVVWVRVYJIUAYZFBFIXVVZGIKKYUYWUEOMNOMLLGCBWULTOAZZBENAESHHEIXMGNYV
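That per-character arithmetic is easy to sanity-check outside of T-SQL. Here's a quick sketch in Python (the `encrypt` helper is hypothetical, just mirroring the (plaintext + key) mod 26 math above):

```python
def encrypt(plaintext: str, key: str) -> str:
    """Vigenere encryption: (plaintext + key) mod 26, with A-Z mapped to 0-25."""
    out = []
    for i, ch in enumerate(plaintext):
        p = ord(ch) - 65                    # plaintext letter as 0-25
        k = ord(key[i % len(key)]) - 65     # repeating keystream letter
        out.append(chr(65 + (p + k) % 26))  # shift and map back to A-Z
    return "".join(out)

print(encrypt("HELLO", "KEY"))  # RIJVS
```

Feeding in a cleaned, upper-cased plaintext and key gives the same output as the T-SQL loop.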

So to decrypt...the math is (ciphertext – key) mod 26 = plaintext. The thing that goes awry is that T-SQL's % operator keeps the sign of the dividend, so when the cipher value is less than the key value the difference goes negative and 26 has to be added back in. Anyway, here's the code with an IF/ELSE workaround:
--Strike that...reverse it...

--Shared secret key (@Key is already cleaned and upper-cased from the encryption step above)

--Pad out key to length of ciphertext
SELECT @KeyPad=LEFT(REPLICATE(@Key,(LEN(@CipherText)/LEN(@Key))+1),LEN(@CipherText));

--Vigenere Cipher Decryption and Output
DECLARE @ClearText VARCHAR(500)='';
DECLARE @ClearChar CHAR(1);
SET @i = 1;  --reset the counter from the encryption loop
WHILE @i <= LEN(@CipherText)
BEGIN
--This is a cheating IF/ELSE because T-SQL's % keeps the sign of a negative dividend...but she works
IF (ASCII(SUBSTRING(@CipherText, @i, 1))>=ASCII(SUBSTRING(@KeyPad, @i, 1)))
SET @ClearChar=CHAR(65+(((ASCII(SUBSTRING(@CipherText, @i, 1))-65)-(ASCII(SUBSTRING(@KeyPad, @i, 1))-65)) % 26));
ELSE
SET @ClearChar=CHAR(65+(26+(((ASCII(SUBSTRING(@CipherText, @i, 1))-65)-(ASCII(SUBSTRING(@KeyPad, @i, 1))-65)) % 26)));
SET @ClearText = @ClearText + @ClearChar;
SET @i=@i+1;
END
PRINT @ClearText;
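For what it's worth, the IF/ELSE workaround exists because of T-SQL's signed modulo. Python's % always returns a non-negative result for a positive modulus, so a sketch of the same decryption there needs no special case (`decrypt` is a hypothetical helper, not from the script above):

```python
def decrypt(ciphertext: str, key: str) -> str:
    """Vigenere decryption: (ciphertext - key) mod 26; Python's % never goes negative here."""
    out = []
    for i, ch in enumerate(ciphertext):
        c = ord(ch) - 65
        k = ord(key[i % len(key)]) - 65
        out.append(chr(65 + (c - k) % 26))  # no +26 fixup needed in Python
    return "".join(out)

print(decrypt("RIJVS", "KEY"))  # HELLO
```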



There you are...it's a very simple solution, and nowhere near as secure as modern computer-based symmetric algorithms, but kind of fun to understand and a good way to practice string manipulation.

Synchronous One-Time Password Token via T-SQL

While studying for the CISSP exam...which I passed last Saturday...I had this problem rankling around in my brain: how exactly do time-based synchronization tokens, like the cryptocard devices we use at work for two-factor authentication, work? T-SQL isn't the best way to do this, of course (although if you could carry around an entire instance of SQL Server inside a tiny crypto token, that'd be pretty cool), but it's what I know. After I kept failing to get something working with ENCRYPTBYPASSPHRASE and its decrypting cousin, I was out jogging in the rain and realized that a hash function is actually how you could do it. Tokens like these depend on a combination of a secret key tied to the user, known only to the cryptocard itself and the authentication server, and a time-based counter, which is how they avoid replay attacks (sniffing the wire and grabbing the credentials to reuse later). Instead of using the secret key to encrypt the timestamp, or (somewhat oddly, but possible) the timestamp to encrypt the secret key, you can just concatenate it all together and hash it. A hash is a one-way function, so you can't reconstitute the original text and find the secret key.

So imagine this code running on your token device when you push the button. The user ID and secret key are fixed in the device's firmware/ROM before it is distributed to the user. (The user ID probably isn't even necessary, but it makes a nice addition.) You take the user ID, key, and timestamp with the seconds stripped off, and concatenate them into a long, fairly unique string. Then hash it. Because nobody wants to type a 160-bit hash into a window, we simplify it by converting to an int, taking the absolute value, and keeping the first 8 digits...which we output to the user on our little LCD panel.

/*This is the CryptoCard Side*/
DECLARE @UserID VARCHAR(10) = 'MyUserID';
DECLARE @SecretKey VARCHAR(100) = 'd4&g!29Betcetcetc';
DECLARE @TokenValue CHAR(8);

--Concatenate user ID, secret key, and date (floored to minute)
--Hash with SHA1
--Convert to integer and ABS negative values
--Pull first 8 characters as token value
SELECT @TokenValue = LEFT(CAST(ABS(CAST(HASHBYTES('SHA1',
@UserID + @SecretKey + CONVERT(CHAR(16), GETDATE(), 120)) AS INT)) AS VARCHAR(20)), 8);

SELECT @TokenValue;

Then, to authenticate, we put in our username and password (basic single-factor authentication) and are prompted to enter this token value (providing the multifactor, something-you-have plus something-you-know, authentication).

The server obviously validates the user's password against the directory service or authentication server. To validate the token, it can do something like this: first grab the stored secret key for the user seeking authentication, make a small table variable for token values, and then insert the 'integerized' hash value for every possible timestamp (floored to the minute) within the last 5 minutes, with a couple of future minutes thrown in just in case clocks get off a bit. It's a fuzzy match, but so are biometrics, for that matter. If the provided token value is in that table variable, congrats, you've been authenticated. If not, sorry, no dice.

Now, nobody would use SQL Server to do this, nor should they, but to a DBA that has a hammer, everything looks like a nail, I suppose.
/*This is the Authentication Server Side*/
--User Provides User ID and token value
DECLARE @ProvidedUserID VARCHAR(10) = 'MyUserID';
DECLARE @ProvidedTokenValue CHAR(8) = '82650027';

--At this point, the system would retrieve the stored shared secret key into memory for that user from some sort of table or structure
--However to simplify this example, we're just going to hardcode it:
DECLARE @SecretKey VARCHAR(100) = 'd4&g!29Betcetcetc';

DECLARE @PossibleRecentValues TABLE (TokenValue CHAR(8));

--Compute possible values for last 5 minutes, plus two future to account for time sync issues
DECLARE @m INT = -5;
WHILE @m <= 2
BEGIN
INSERT INTO @PossibleRecentValues (TokenValue)
SELECT LEFT(CAST(ABS(CAST(HASHBYTES('SHA1',
@ProvidedUserID + @SecretKey + CONVERT(CHAR(16), DATEADD(minute, @m, GETDATE()), 120)) AS INT)) AS VARCHAR(20)), 8);
SET @m = @m + 1;
END

IF EXISTS (SELECT 1 FROM @PossibleRecentValues WHERE TokenValue=@ProvidedTokenValue)
PRINT 'User is authenticated';
ELSE
PRINT 'Token expired or invalid';
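The whole scheme fits in a few lines of Python as well, which makes the two halves easy to see side by side. This is a sketch under my own assumptions (hypothetical helper names, SHA-1 via hashlib, timestamp floored to the minute, first 8 digits of the hash treated as an integer), not the exact T-SQL arithmetic:

```python
import hashlib
from datetime import datetime, timedelta

def token_value(user: str, secret: str, when: datetime) -> str:
    """Concatenate user + secret + minute-floored timestamp, hash, keep 8 digits."""
    stamp = when.strftime("%Y-%m-%d %H:%M")        # floored to the minute
    digest = hashlib.sha1((user + secret + stamp).encode()).digest()
    return str(int.from_bytes(digest, "big"))[:8]  # "integerized" hash, first 8 digits

def server_validates(user: str, secret: str, provided: str, now: datetime) -> bool:
    """Fuzzy match: last 5 minutes plus 2 future minutes, to absorb clock drift."""
    window = (token_value(user, secret, now + timedelta(minutes=m)) for m in range(-5, 3))
    return provided in window

now = datetime(2017, 6, 1, 12, 0)
tok = token_value("MyUserID", "d4&g!29Betcetcetc", now)
print(server_validates("MyUserID", "d4&g!29Betcetcetc", tok, now))  # True
```

A token generated more than five minutes ago falls out of the window and fails validation, which is the replay protection in action.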

16 August 2016

How You Can Crack Another Session's Temp Table, And Why You Might Want To

One of the understood behaviors of a SQL Server temporary table is that (aside from global temp tables) you can't peek into them from another session. Your temp table is your own and nobody else can access it (although sysadmins can at least find row counts in tempdb, which can be handy). But every now and then you run into an odd situation where it would be incredibly handy to find out what's inside of a temporary table that you didn't create.

For instance, say you have a batch process that has been running for hours, and seems to be doing nothing of interest. You look into the code that is running, and it has a while loop based on a count of the rows in a temp table. The exact methods can and do vary considerably; this is a completely contrived example.

SET NOCOUNT ON;
CREATE TABLE #TempTable ( Col1 INT, Col2 CHAR(2000));
INSERT INTO #TempTable VALUES (-42691,'Odd Record'),(0,'A'),(1,'A'),(2,'A'),(3,'A'),(4,'A'),(5,'A'),(6,'A'),(7,'A'),(8,'A'),(9,'A'),(10,'A');

DECLARE @i INT = 0;
WHILE (SELECT COUNT(*) FROM #TempTable) > 0
BEGIN
WAITFOR DELAY '00:00:01';

--Do something with the record

DELETE FROM #TempTable WHERE Col1 = @i;
SET @i = @i+1;
END

Now you may look at that and see the obvious logical flaw here, but if the table population wasn't visible to you, it might not be so clear. So you see that it is spinning forever in this while loop, and you can tell from various DMVs that the table exists with one row. But you can't see inside the table to find out why. OR CAN YOU? First let's grab the object ID...note the percent wildcard, important because SQL Server appends a bunch of silly stuff on the name:

SELECT name, [object_id], type_desc, create_date FROM tempdb.sys.tables WHERE name LIKE '#TempTable%'

name object_id type_desc create_date
#TempTable____________________000000011F66 -1590458413 USER_TABLE 2016-08-16 11:55:02

Make sure the create_date matches up to what you would expect (in case you grab a wrong but similarly named table) and take the object_id value and plug it into either DBCC IND, or my current favorite, sys.dm_db_database_page_allocations:

SELECT index_id, allocation_unit_type_desc, allocated_page_page_id, page_type, page_type_desc, page_level, extent_file_id FROM sys.dm_db_database_page_allocations (DB_ID(), -1590458413, NULL, NULL, 'DETAILED') WHERE is_allocated = 1 ORDER BY index_id, page_type;

index_id allocation_unit_type_desc allocated_page_page_id page_type page_type_desc page_level extent_file_id
0 IN_ROW_DATA 939 10 IAM_PAGE 0 1

In this example, we can ignore the IAM page and go straight to the one (currently) allocated data page. Note both the file ID and page ID...often very important for tempdb, where multiple files are more the norm.


If you're new to DBCC PAGE, just Google Paul Randal and DBCC PAGE and you'll be off in no time. Just don't confuse it with DBCC WRITEPAGE or it will be a bad, bad day for you. But once you run DBCC PAGE against that file and page and scan down through the output, you can see your record: Col1 holds a negative value, and our oddly written increment logic (starting from zero) will never hit it. So that's where our endless loop came from!

ParentObject Object Field VALUE
PAGE HEADER: Slot 0 Offset 0x60 Length 2011 Record Type PRIMARY_RECORD
PAGE HEADER: Slot 0 Offset 0x60 Length 2011 Record Attributes  NULL_BITMAP
PAGE HEADER: Slot 0 Offset 0x60 Length 2011 Record Size 2011
Slot 0 Offset 0x60 Length 2011 Memory Dump @0x000000003078A060 0000000000000000:   1000d807 3d59ffff 4f646420 5265636f 72642020  ..Ø.=YÿÿOdd Record  
Slot 0 Offset 0x60 Length 2011 Memory Dump @0x000000003078A060 0000000000000014:   20202020 20202020 20202020 20202020 20202020                      
extra rows…
Slot 0 Offset 0x60 Length 2011 Memory Dump @0x000000003078A060 00000000000007D0:   20202020 20202020 020000                             
Slot 0 Offset 0x60 Length 2011 Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4 Col1 -42691
Slot 0 Offset 0x60 Length 2011 Slot 0 Column 2 Offset 0x8 Length 2000 Length (physical) 2000 Col2 Odd Record         

As seemingly obscure as this may appear, I have had to use it to find root cause of a production issue. Kudos to the Pauls (both White and Randal)...the former's article helped me last year when I needed to try this out the first time, and the latter's various blog posts and SQLskills training has helped me a lot in sorting out how and when to use these useful undocumented commands like DBCC PAGE.

Happy data hacking!

15 August 2016

Estimating Completion Times of a SQL Process

This is the sort of query I've written from scratch a hundred times, or done manually using my trusty old friend calc.exe. You have a running process, you know it has to chew through so many rows, and you have a count query (possibly with a NOLOCK hint) that can see how many rows it has processed, or how many remain to process. Well, I finally wrote an adaptable version of this query that I can use as a template when I'm trying to quickly gauge how long a process will take. Basically, you adjust the two "count" queries for your specific scenario, set either a target count query or a fixed target value (which can be zero if you're counting backwards), and set a delay time between the two count samples.

DECLARE
@t INT,                 --Target value
@c1 INT,                --First count value
@c2 INT,                --Second count value
@tm1 DATETIME,          --First time value
@tm2 DATETIME;          --Second time value

--Alter delay value depending how patient you are.

DECLARE @SampleDelay CHAR(8) = '00:01:00';

--Determine Target Count
--Alter the following based on the specifics of the situation, or use a hard coded value:
SELECT @t=
--COUNT(*) FROM someTable WITH (NOLOCK) WHERE ThisValue='this';

--Determine First Count

SELECT @tm1=GETDATE(), @c1=
--Alter the following based on the specifics of the situation:
--COUNT(*) FROM someOtherTable WITH (NOLOCK) WHERE ThisValue='that';

--Variable Delay
WAITFOR DELAY @SampleDelay;
--Determine Second Count
SELECT @tm2=GETDATE(), @c2=
--Alter the following based on the specifics of the situation...should be same as first count query barring variable names
--COUNT(*) FROM someOtherTable WITH (NOLOCK) WHERE ThisValue='that';

--Compute Estimates

PRINT 'Target Count:         ' + CAST(@t AS VARCHAR(100));
PRINT 'Initial Count:        ' + CAST(@c1 AS VARCHAR(100)) + ' at ' + CAST(@tm1 AS VARCHAR(100));
PRINT 'Second Count:         ' + CAST(@c2 AS VARCHAR(100)) + ' at ' + CAST(@tm2 AS VARCHAR(100));
PRINT 'Elapsed seconds:      ' + CAST(DATEDIFF(second,@tm1,@tm2) AS VARCHAR(100));
PRINT 'Records processed:    ' + CAST(ABS(@c2-@c1) AS VARCHAR(100));
PRINT 'Rate per second:      ' + CAST(((@c2-@c1)*1.0)/DATEDIFF(second,@tm1,@tm2) AS VARCHAR(100));
PRINT 'Records remaining:    ' + CAST(ABS(@t - @c2) AS VARCHAR(100));
PRINT 'Seconds remaining:    ' + CAST(CAST((@t - @c2)/(((@c2-@c1)*1.0)/DATEDIFF(second,@tm1,@tm2)) AS INT) AS VARCHAR(100));
PRINT 'Estimated completion: ' + CAST(DATEADD(second,(@t - @c2)/(((@c2-@c1)*1.0)/DATEDIFF(second,@tm1,@tm2)), GETDATE()) AS VARCHAR(100));

Gives you rate information, time and records remaining, and calculates an estimated completion time. Here's a sample of the output:

Target Count: 20000
Initial Count: 1346 at Aug 15 2016 2:05PM
Second Count: 1498 at Aug 15 2016 2:06PM
Elapsed seconds: 60
Records processed: 152
Rate per second: 2.533333333333
Records remaining: 18502
Seconds remaining: 7303
Estimated completion: Aug 15 2016 4:08PM
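The arithmetic behind those PRINTs is simple enough to check by hand, or in a few lines of Python using the sample numbers above (hypothetical variable names, same rate and remaining-time math as the T-SQL):

```python
target, c1, c2 = 20000, 1346, 1498  # target and the two sampled counts
elapsed = 60                        # seconds between the two samples
rate = (c2 - c1) / elapsed          # records per second
remaining = abs(target - c2)        # records still to process
seconds_left = int(remaining / rate)
print(rate)          # 2.5333...
print(remaining)     # 18502
print(seconds_left)  # 7303
```

7303 seconds is just over two hours, which lines up with the estimated completion of roughly 4:08 PM in the sample output.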

21 July 2016

Estimating Size When Dropping a Clustered Columnstore Index

SQL Server 2014 introduced clustered columnstore indexes which have proven to be very handy in our shop for large data warehouse "fact" tables. The compression algorithm they use is very efficient and (dependent on data of course) can compress data down at ratios that make conventional B-tree row and page compression look pretty anemic in comparison. But, there's an ironic downside to this great compression performance...

Clustered columnstore indexes aren't great for running mass updates. They are best for relatively static data. So our BI team tends to drop the columnstore index and build conventional indexes (clustered or non-clustered) to support an update on one of these tables, then rebuild the columnstore after the update. It's a slow, painful process, but it works. However, every now and again somebody forgets just how great columnstore compression is and drops the columnstore index, effectively rebuilding the table as a massive uncompressed heap. This operation can easily fill a drive if the table is large enough and drive space hasn't been attended to. So estimating the uncompressed size of the table becomes essential.

Microsoft walks you through the process for heaps, clustered indexes, and non-clustered indexes. The non-leaf-level index estimation requires a bit more complexity, so you're generally best doing that by hand, but for basic leaf-level/heap estimation I decided to automate the process: plug in your table name and estimate the size of the uncompressed heap by examining the columns. It isn't ideal for everything...obscure data types, sparse columns...and if you have a lot of variable-length columns you're better off computing AVG(LEN(column)) manually to get a more accurate average length for each. But for our tables, mostly with static-length columns, it is pretty handy.

----ALTER THESE VALUES, RUN IN CORRECT DB----
DECLARE @SchName SYSNAME = 'dbo';
DECLARE @TblName SYSNAME = 'FACT_CLAIM_HISTORY';
---------------------------------------------
DECLARE @IxName SYSNAME, @ExistingPages INT, @ExistingMB INT;

SELECT @IxName = name FROM sys.indexes
WHERE OBJECT_SCHEMA_NAME(object_id)=@SchName AND OBJECT_NAME(object_id)=@TblName
AND index_id=1 AND [type]=5;

SELECT @ExistingPages = SUM(a.total_pages) FROM sys.partitions p
INNER JOIN sys.allocation_units a ON p.[partition_id] = a.[container_id]
WHERE OBJECT_SCHEMA_NAME(p.object_id)=@SchName AND OBJECT_NAME(p.object_id)=@TblName;

SELECT @ExistingMB = (8 * @ExistingPages) / 1024;

IF @IxName IS NULL
BEGIN
PRINT 'No clustered columnstore index on that table.';
RETURN;
END
DECLARE
@TotalRows INT,
@TotalColumns INT,
@StatCols INT,
@DynCols INT,
@StatColBytes INT,
@DynColBytes INT,
@DynColBytesTracking INT,
@NullBitmapBytes INT,
@RowBytes INT,
@RowsPerPage INT,
@TotalPages INT,
@TotalSizeMB INT;

PRINT 'Table Name: ' + @SchName+'.'+@TblName;
PRINT 'Index Name: ' + @IxName;

--Total Rows
SELECT @TotalRows = SUM([rows]) FROM sys.partitions
WHERE OBJECT_SCHEMA_NAME(object_id)=@SchName AND OBJECT_NAME(object_id)=@TblName;
PRINT 'Total Rows: ' + CAST(@TotalRows AS VARCHAR(100));

--Total Columns
SELECT @TotalColumns = COUNT(*) FROM sys.columns
WHERE OBJECT_SCHEMA_NAME(object_id)=@SchName AND OBJECT_NAME(object_id)=@TblName;
PRINT 'Total Columns: ' + CAST(@TotalColumns AS VARCHAR(100));

--Total Static Length Columns
SELECT @StatCols = COUNT(*) FROM sys.columns c
INNER JOIN sys.types t ON t.user_type_id = c.user_type_id --join on user_type_id so alias types like sysname aren't double-counted
WHERE OBJECT_SCHEMA_NAME(c.object_id)=@SchName AND OBJECT_NAME(c.object_id)=@TblName
AND t.name NOT LIKE 'var%' AND t.name NOT LIKE '%text';
PRINT 'Total Static Length Columns: ' + CAST(@StatCols AS VARCHAR(100));

--Total Variable Length Columns
SELECT @DynCols = COUNT(*) FROM sys.columns c
INNER JOIN sys.types t ON t.user_type_id = c.user_type_id
WHERE OBJECT_SCHEMA_NAME(c.object_id)=@SchName AND OBJECT_NAME(c.object_id)=@TblName
AND ( t.name LIKE 'var%' OR t.name LIKE '%text');
PRINT 'Total Variable Length Columns: ' + CAST(@DynCols AS VARCHAR(100));

--Total Static Length Column Bytes
SELECT @StatColBytes = ISNULL(SUM(c.max_length),0) FROM sys.columns c
INNER JOIN sys.types t ON t.user_type_id = c.user_type_id
WHERE OBJECT_SCHEMA_NAME(c.object_id)=@SchName AND OBJECT_NAME(c.object_id)=@TblName
AND t.name NOT LIKE 'var%' AND t.name NOT LIKE '%text';
PRINT 'Total Static Length Column Bytes: ' + CAST(@StatColBytes AS VARCHAR(100));

--Max Variable Length Column Bytes
SELECT @DynColBytes = ISNULL(SUM(c.max_length),0) FROM sys.columns c
INNER JOIN sys.types t ON t.user_type_id = c.user_type_id
WHERE OBJECT_SCHEMA_NAME(c.object_id)=@SchName AND OBJECT_NAME(c.object_id)=@TblName
AND ( t.name LIKE 'var%' OR t.name LIKE '%text');
PRINT 'Maximum Variable Length Column Bytes: ' + CAST(@DynColBytes AS VARCHAR(100));

--Total Variable Column Bytes with Tracking Bytes
SELECT @DynColBytesTracking = CASE WHEN (@DynCols > 0) THEN (2 + (@DynCols * 2) + @DynColBytes) ELSE 0 END;
PRINT 'Total Variable Length Column Bytes With Tracking: ' + CAST(@DynColBytesTracking AS VARCHAR(100));

--Null Bitmap Bytes
SELECT @NullBitmapBytes = 2 + ((@TotalColumns + 7) / 8);
PRINT 'Null Bitmap Bytes: ' + CAST(@NullBitmapBytes AS VARCHAR(100));

--Row Size Bytes (including header bytes)
SELECT @RowBytes = @StatColBytes + @DynColBytesTracking + @NullBitmapBytes + 4;
PRINT 'Row Bytes: ' + CAST(@RowBytes AS VARCHAR(100));

--Rows Per Page
SELECT @RowsPerPage = 8096 / (@RowBytes + 2);
PRINT 'Rows Per Page: ' + CAST(@RowsPerPage AS VARCHAR(100));

--Total Pages
SELECT @TotalPages = CEILING((@TotalRows * 1.0) / @RowsPerPage); --round up, per the Microsoft estimation method
PRINT 'Total Pages: ' + CAST(@TotalPages AS VARCHAR(100));

--Heap size in megabytes
SELECT @TotalSizeMB = (8 * @TotalPages) / 1024;
PRINT 'Total Size, MB: ' + CAST(@TotalSizeMB AS VARCHAR(100));

--Current (compressed) values
PRINT 'Current (compressed) Pages: ' + CAST(@ExistingPages AS VARCHAR(100));
PRINT 'Current (compressed) Size, MB: ' + CAST(@ExistingMB AS VARCHAR(100));

--Compression Ratio, for fun...
PRINT 'Compression Ratio: ' + CAST((@TotalSizeMB*1.0/@ExistingMB) AS VARCHAR(100));


Here's some sample output:
Total Rows: 282749658
Total Columns: 79
Total Static Length Columns: 79
Total Variable Length Columns: 0
Total Static Length Column Bytes: 405
Maximum Variable Length Column Bytes: 0
Total Variable Length Column Bytes With Tracking: 0
Null Bitmap Bytes: 12
Row Bytes: 421
Rows Per Page: 19
Total Pages: 14881561
Total Size, MB: 116262
Current (compressed) Pages: 3211431
Current (compressed) Size, MB: 25089
Compression Ratio: 4.633983020447
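Plugging the sample numbers back into the heap-size formulas reproduces the output above. Here's that arithmetic in Python as a cross-check (hypothetical variable names; same null bitmap, row header, and slot array constants as the T-SQL):

```python
import math

total_rows, total_cols = 282749658, 79
static_bytes, var_bytes_tracked = 405, 0  # all 79 columns are fixed-length here

null_bitmap = 2 + (total_cols + 7) // 8   # 2 + 10 = 12 bytes
row_bytes = static_bytes + var_bytes_tracked + null_bitmap + 4  # +4 header bytes
rows_per_page = 8096 // (row_bytes + 2)   # +2 for the slot array entry
total_pages = math.ceil(total_rows / rows_per_page)
total_mb = (8 * total_pages) // 1024      # 8 KB pages

print(row_bytes, rows_per_page, total_pages, total_mb)  # 421 19 14881561 116262
```

Those four values match the Row Bytes, Rows Per Page, Total Pages, and Total Size lines in the sample output, and 116262 / 25089 gives the ~4.63 compression ratio.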