It was baffling to see how readily and happily the non-technical folk (and some of the technicals, for shame) accepted this as a good enough answer. Instead of asking, why is this happening and how do we fix it, the execs just came to view "The Churn" as a necessary part of the data processing. "So, how long do we expect The Churn to take with this large a file?" "Can you get me an ETA for when The Churn will be finished?" Soon everybody was getting in on it...even people on the business side could understand The Churn. I mean, what could be more natural than that, you run an SSIS package, it works, and then you have a few hours for the DTEXEC process to sit and "churn"?
This kind of silly superstition, where we take something we don't understand and invent a name for it as a kind of acceptance, hobbles an IT organization, papering over ignorance instead of exposing it and digging deeper. It turns out in this case, as our appdev manager surmised, it was massively looped PRINT statements (completely unnecessary of course) in the stored procedures.
To demonstrate, I created a sample stored procedure that loops 100,000 times:
CREATE PROC usp_CHURN
AS
BEGIN
SET NOCOUNT ON;
DECLARE @i INT = 1;
DECLARE @text VARCHAR(200);
WHILE(@i < 100000)
BEGIN
SELECT @text='You are at increment number ' + CAST(@i AS VARCHAR(20));
--PRINT @text
SET @i += 1;
END
INSERT INTO TempTable SELECT GETDATE()
END
As is, with the PRINT statement commented out, this runs very fast (subsecond), as you would expect. The INSERT just lets you know when the proc is actually done.
So I created a DTSX package that does nothing more than run this process, and from a different server, created a batch file that outputs the time, then runs the package, then outputs the time again. Running the proc as is, well, as I say, there's nothing to it, it's just memory getting moved around. But when I uncomment the PRINT statement, things take a bit of a nosedive.
Comparing the package start time (10:41:33.49) to the timestamp logged in the table (10:41:35.010) we see that less than 2 seconds elapsed in the actual execution of the procedure. However, the package completed at 10:45:23.38. So two seconds to do the work, and almost four minutes for SSIS to handle all those PRINT statements getting pushed back up the chain. Now, imagine that scaled up to millions or billions, and you can understand why we had an apparently complete SSIS package chewing up one of the vCPUs on a box for days. Yes, days. And taking the PRINT statements out fixed it.
Tactical moral of the story: be very sparing and cautious when sticking PRINT statements into heavily looped sections of your code, unless absolutely necessary. Strategic moral of the story: stop accepting surface level explanations and lazy superstitions...dig in and figure it out.
No comments:
Post a Comment