Slow While Loop, Query Improvement Assistance
I am working on creating a data warehouse.
I have created a Time Dimension (Dim_Time) at 5-minute intervals. Hour aggregation rows will have [Minute] = NULL.
For the purpose of this example:
CREATE TABLE [dbo].[Dim_Time](
[TimeID] [int] IDENTITY(1,1) NOT NULL,
[StartDateTime] [datetime] NULL,
[Hour] [int] NULL,
[Minute] [int] NULL,
CONSTRAINT [PK_Dim_Time] PRIMARY KEY CLUSTERED
([TimeID] ASC)
) ON [PRIMARY]
GO
Then I have an Incoming Table, which is updated every 5 minutes from the OLTP Database.
CREATE TABLE [dbo].[Stg_IncomingQueue](
[IncomingID] [int] IDENTITY(1,1) NOT NULL,
[CustomerID] [int] NOT NULL,
[TimeID] [int] NULL,
[InsertTime] [datetime] NULL,
CONSTRAINT [PK_IncomingQueueMonitor] PRIMARY KEY CLUSTERED
([IncomingID] ASC)
) ON [PRIMARY]
GO
I then have the following WHILE loop. Its purpose is to get the correct 5-minute time slot (TimeID) that relates to a particular incoming row:
DECLARE @IncomingID int, @RowInsertTime datetime;
-- Process one unassigned row per iteration until none remain
WHILE 0 < (SELECT COUNT(*) FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL)
BEGIN
SELECT TOP 1 @IncomingID = IncomingID, @RowInsertTime = InsertTime
FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL
-- Latest 5-minute slot that starts before the row's InsertTime
;WITH DimTime
AS (
SELECT MAX(TimeID) AS MaxTimeID FROM [dba_local].[dbo].[Dim_Time]
WHERE StartDateTime < @RowInsertTime AND [Minute] IS NOT NULL
)
UPDATE [dba_local].[dbo].[Stg_IncomingQueue]
SET TimeID = (SELECT MaxTimeID FROM DimTime)
WHERE IncomingID = @IncomingID
END
It's such a simple process, and yet I cannot figure out a simpler way to update the TimeID. As per the CTE SELECT in the loop, I need to get the MAX(TimeID) where the StartDateTime is less than the row's InsertTime.
Because time is the only relationship, I am struggling to find a way to do this in one query without the loop, but I feel it is possible.
Please can someone help me out here either with a better option or confirming that this is the simplest way.
Thank you very much for your time and assistance.
Wade
sql-server t-sql sql-server-2016
2
Why is this a while loop in the first place? When you think "I need to do x to each row" try to change your thinking to "I need to do x to all of the rows."
– Aaron Bertrand♦
53 mins ago
edited 53 mins ago
Aaron Bertrand♦
asked 1 hour ago
WadeH
1 Answer
I created the following minimal, complete, and verifiable example, based on the two tables in your original question. It uses the LEAD window function to obtain a time range from the dbo.Dim_Time table, which can be compared to the incoming rows quite easily.
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_Time;
CREATE TABLE dbo.Dim_Time(
TimeID int IDENTITY(1,1) NOT NULL,
StartDateTime time(0) NULL,
CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
(TimeID ASC)
) ON [PRIMARY]
GO
;WITH src AS
(
SELECT TOP (10) sv.number
FROM master.dbo.spt_values sv
WHERE sv.type = N'P'
ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
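-- 288 five-minute slots cover one day; a 289th row would wrap around to 00:00:00 and duplicate the first slot
SELECT TOP(288) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))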
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));
This piece replaces your entire WHILE loop with a single UPDATE statement, which is both more efficient and easier to understand.
UPDATE iq
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime
-- the last slot of the day has no following row, so LEAD returns NULL there
AND (CONVERT(time(0), iq.InsertTime) < t.EndDateTime OR t.EndDateTime IS NULL);
The results, compared side-by-side with the Dim_Time table:
SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;
The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1 ║ 1 ║ 271 ║ 1875-06-30 22:31:49.000 ║ 271 ║ 22:30:00 ║
║ 2 ║ 2 ║ 116 ║ 1857-07-01 09:38:59.000 ║ 116 ║ 09:35:00 ║
║ 3 ║ 3 ║ 218 ║ 1854-09-18 18:08:39.000 ║ 218 ║ 18:05:00 ║
║ 4 ║ 4 ║ 221 ║ 1860-05-31 18:22:25.000 ║ 221 ║ 18:20:00 ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
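As a quick sanity check after running the update, a query along these lines can list any staged rows that did not land in a slot. This is only a sketch against the example tables above; ideally it returns no rows, and anything it does return points at an InsertTime that fell outside every range produced by LEAD.

```sql
-- Rows the UPDATE left without a time slot; an empty result means every row matched.
SELECT iq.IncomingID
     , iq.CustomerID
     , iq.InsertTime
FROM dbo.Stg_IncomingQueue iq
WHERE iq.TimeID IS NULL
ORDER BY iq.InsertTime;
```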
Assuming there isn't a massive number of incoming rows, this may work fairly well. Be aware that I'm using CONVERT() to convert the incoming datetime column into a time(0) value, which comes at the cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the update statement shows this warning:
Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue to include a persisted computed column, as in:
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
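If the staging table already exists and holds data, dropping and recreating it may not be practical. A minimal sketch of the same idea using ALTER TABLE follows; it assumes dbo.Stg_IncomingQueue exists as defined in the question and does not yet have an InsertTime0 column.

```sql
-- Assumption: dbo.Stg_IncomingQueue already exists without an InsertTime0 column.
-- Adds the persisted computed column in place instead of recreating the table.
ALTER TABLE dbo.Stg_IncomingQueue
ADD InsertTime0 AS CONVERT(time(0), InsertTime) PERSISTED;
GO

-- Spot-check that the computed value matches the time portion of InsertTime.
SELECT TOP (10) iq.IncomingID
     , iq.InsertTime
     , iq.InsertTime0
FROM dbo.Stg_IncomingQueue iq;
GO
```

Because the column is persisted, the conversion is done when rows are written rather than each time the update runs.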
The update statement then becomes:
UPDATE iq
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON iq.InsertTime0 >= t.StartDateTime
-- the last slot of the day has no following row, so LEAD returns NULL there
AND (iq.InsertTime0 < t.EndDateTime OR t.EndDateTime IS NULL);
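Since LEAD orders dbo.Dim_Time by StartDateTime while the table is clustered on TimeID, an index on StartDateTime might let the optimizer avoid a sort. The following is a sketch only: the index name is made up for illustration, and whether it helps should be confirmed by comparing the actual execution plans before and after.

```sql
-- Hypothetical index name; supports ORDER BY StartDateTime in the LEAD window.
-- TimeID is the clustered key, so it is carried in the index automatically.
CREATE NONCLUSTERED INDEX IX_Dim_Time_StartDateTime
ON dbo.Dim_Time (StartDateTime);
GO
```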
This looks great, thank you. I am busy testing it now. I removed the CONVERT(time(0), ...) as I need to compare the entire DateTime values, since I have a few years of data to load. In addition, I added "WHERE iq.TimeID IS NULL" so that rows are not re-processed. Performance is way better, as expected. Thank you.
– WadeH
39 mins ago
Your question seems to imply you only need the 5-minute time-slot for each row, which is date-independent. That's the purpose of the conversion to a time(0) value.
– Max Vernon
37 mins ago
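For the full-datetime case raised in these comments, a set-based statement that mirrors the original loop's logic could look like the sketch below. It assumes dbo.Dim_Time as defined in the question (a datetime StartDateTime spanning the whole load period, with [Minute] NULL on the hourly rows) and touches only rows that have no TimeID yet. It keeps the loop's strict "<" comparison, so a row stamped exactly on a slot boundary goes to the previous slot; "<=" may be what is actually wanted.

```sql
-- Sketch: one statement in place of the WHILE loop, comparing full datetime values.
-- Assumes Dim_Time.StartDateTime is datetime and [Minute] IS NULL marks hourly rows.
UPDATE iq
SET TimeID = (
        SELECT MAX(dt.TimeID)
        FROM dbo.Dim_Time dt
        WHERE dt.StartDateTime < iq.InsertTime   -- same comparison as the loop
          AND dt.[Minute] IS NOT NULL            -- 5-minute rows only
    )
FROM dbo.Stg_IncomingQueue iq
WHERE iq.TimeID IS NULL;                         -- skip rows already processed
```

Like the loop, this relies on TimeID increasing along with StartDateTime; if that does not hold, ordering by StartDateTime and taking the top row would be the safer formulation.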
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f230890%2fslow-while-loop-query-improvment-assistance%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
I created the following minimally complete and verifiable example, based on the two tables in your original question. It uses the LEAD T-SQL statement to obtain a time range from the dbo.Dim_Time table, which can be compared to the incoming rows quite easily.
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_time;
CREATE TABLE dbo.Dim_Time(
TimeID int IDENTITY(1,1) NOT NULL,
StartDateTime time(0) NULL,
CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
(TimeID ASC)
) ON [PRIMARY]
GO
;WITH src AS
(
SELECT TOP (10) sv.number
FROM master.dbo.spt_values sv
WHERE sv.type = N'P'
ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));
This piece replaces your entire WHILE loop, with a single UPDATE statement, which is both more efficient, and easier to understand.
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;
The results, compared side-by-side with the Dim_Time table:
SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;
The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1 ║ 1 ║ 271 ║ 1875-06-30 22:31:49.000 ║ 271 ║ 22:30:00 ║
║ 2 ║ 2 ║ 116 ║ 1857-07-01 09:38:59.000 ║ 116 ║ 09:35:00 ║
║ 3 ║ 3 ║ 218 ║ 1854-09-18 18:08:39.000 ║ 218 ║ 18:05:00 ║
║ 4 ║ 4 ║ 221 ║ 1860-05-31 18:22:25.000 ║ 221 ║ 18:20:00 ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
Assuming there isn't a massive amount of incoming rows, this may work fairly well. Be aware, I'm using CONVERT() to convert the incoming datetime column into a time(0) value, which comes at a cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the insert statement shows this warning:
Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue to include a persisted computed column, as in:
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
The update statement then becomes:
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON iq.InsertTime0 >= t.StartDateTime AND iq.InsertTime0 < t.EndDateTime;
This looks great thank you. I am busy testing it now. I removed the CONVERT(time(0) as I need to compare that entire DateTime values as I have a few years of data to load. In addition I added "WHERE iq.TimeID IS NOT NULL" so that rows are not re-processed. Performance is way better as expected. Thank you.
– WadeH
39 mins ago
Your question seems to imply you only need the 5-minute time-slot for each row, which is date-independent. That's the purpose of the conversion to atime(0)value.
– Max Vernon
37 mins ago
add a comment |
I created the following minimally complete and verifiable example, based on the two tables in your original question. It uses the LEAD T-SQL statement to obtain a time range from the dbo.Dim_Time table, which can be compared to the incoming rows quite easily.
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_time;
CREATE TABLE dbo.Dim_Time(
TimeID int IDENTITY(1,1) NOT NULL,
StartDateTime time(0) NULL,
CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
(TimeID ASC)
) ON [PRIMARY]
GO
;WITH src AS
(
SELECT TOP (10) sv.number
FROM master.dbo.spt_values sv
WHERE sv.type = N'P'
ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));
This piece replaces your entire WHILE loop, with a single UPDATE statement, which is both more efficient, and easier to understand.
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;
The results, compared side-by-side with the Dim_Time table:
SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;
The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1 ║ 1 ║ 271 ║ 1875-06-30 22:31:49.000 ║ 271 ║ 22:30:00 ║
║ 2 ║ 2 ║ 116 ║ 1857-07-01 09:38:59.000 ║ 116 ║ 09:35:00 ║
║ 3 ║ 3 ║ 218 ║ 1854-09-18 18:08:39.000 ║ 218 ║ 18:05:00 ║
║ 4 ║ 4 ║ 221 ║ 1860-05-31 18:22:25.000 ║ 221 ║ 18:20:00 ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
Assuming there isn't a massive amount of incoming rows, this may work fairly well. Be aware, I'm using CONVERT() to convert the incoming datetime column into a time(0) value, which comes at a cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the insert statement shows this warning:
Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue to include a persisted computed column, as in:
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
The update statement then becomes:
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON iq.InsertTime0 >= t.StartDateTime AND iq.InsertTime0 < t.EndDateTime;
This looks great thank you. I am busy testing it now. I removed the CONVERT(time(0) as I need to compare that entire DateTime values as I have a few years of data to load. In addition I added "WHERE iq.TimeID IS NOT NULL" so that rows are not re-processed. Performance is way better as expected. Thank you.
– WadeH
39 mins ago
Your question seems to imply you only need the 5-minute time-slot for each row, which is date-independent. That's the purpose of the conversion to atime(0)value.
– Max Vernon
37 mins ago
add a comment |
I created the following minimally complete and verifiable example, based on the two tables in your original question. It uses the LEAD T-SQL statement to obtain a time range from the dbo.Dim_Time table, which can be compared to the incoming rows quite easily.
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_time;
CREATE TABLE dbo.Dim_Time(
TimeID int IDENTITY(1,1) NOT NULL,
StartDateTime time(0) NULL,
CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
(TimeID ASC)
) ON [PRIMARY]
GO
;WITH src AS
(
SELECT TOP (10) sv.number
FROM master.dbo.spt_values sv
WHERE sv.type = N'P'
ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));
This piece replaces your entire WHILE loop, with a single UPDATE statement, which is both more efficient, and easier to understand.
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;
The results, compared side-by-side with the Dim_Time table:
SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;
The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1 ║ 1 ║ 271 ║ 1875-06-30 22:31:49.000 ║ 271 ║ 22:30:00 ║
║ 2 ║ 2 ║ 116 ║ 1857-07-01 09:38:59.000 ║ 116 ║ 09:35:00 ║
║ 3 ║ 3 ║ 218 ║ 1854-09-18 18:08:39.000 ║ 218 ║ 18:05:00 ║
║ 4 ║ 4 ║ 221 ║ 1860-05-31 18:22:25.000 ║ 221 ║ 18:20:00 ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
Assuming there isn't a massive amount of incoming rows, this may work fairly well. Be aware, I'm using CONVERT() to convert the incoming datetime column into a time(0) value, which comes at a cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the insert statement shows this warning:
Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue to include a persisted computed column, as in:
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
The update statement then becomes:
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON iq.InsertTime0 >= t.StartDateTime AND iq.InsertTime0 < t.EndDateTime;
I created the following minimally complete and verifiable example, based on the two tables in your original question. It uses the LEAD T-SQL statement to obtain a time range from the dbo.Dim_Time table, which can be compared to the incoming rows quite easily.
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_time;
CREATE TABLE dbo.Dim_Time(
TimeID int IDENTITY(1,1) NOT NULL,
StartDateTime time(0) NULL,
CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
(TimeID ASC)
) ON [PRIMARY]
GO
;WITH src AS
(
SELECT TOP (10) sv.number
FROM master.dbo.spt_values sv
WHERE sv.type = N'P'
ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));
This piece replaces your entire WHILE loop, with a single UPDATE statement, which is both more efficient, and easier to understand.
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;
The results, compared side-by-side with the Dim_Time table:
SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;
The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1 ║ 1 ║ 271 ║ 1875-06-30 22:31:49.000 ║ 271 ║ 22:30:00 ║
║ 2 ║ 2 ║ 116 ║ 1857-07-01 09:38:59.000 ║ 116 ║ 09:35:00 ║
║ 3 ║ 3 ║ 218 ║ 1854-09-18 18:08:39.000 ║ 218 ║ 18:05:00 ║
║ 4 ║ 4 ║ 221 ║ 1860-05-31 18:22:25.000 ║ 221 ║ 18:20:00 ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
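One way to sanity-check the mapping is to recompute the expected slot arithmetically and list any rows that disagree. This is only a sketch: it assumes Dim_Time.TimeID is contiguous, with TimeID = 1 at 00:00:00 and exactly one row per 5-minute slot, which holds for the sample load above but is not guaranteed for a real dimension table. ExpectedTimeID is a name introduced here for illustration.

```sql
-- Sketch only: assumes TimeID = 1 is the 00:00:00 slot and that TimeID
-- increases by 1 for each 5-minute slot, as in the sample load above.
SELECT iq.IncomingID
    , iq.InsertTime
    , iq.TimeID
    , ExpectedTimeID = DATEPART(hour, iq.InsertTime) * 12
                     + DATEPART(minute, iq.InsertTime) / 5
                     + 1
FROM dbo.Stg_IncomingQueue AS iq
WHERE iq.TimeID IS NULL                 -- rows that were never assigned a slot
    OR iq.TimeID <> DATEPART(hour, iq.InsertTime) * 12
                  + DATEPART(minute, iq.InsertTime) / 5
                  + 1;                  -- rows assigned to an unexpected slot
```

For the four sample rows shown above, the arithmetic gives 271, 116, 218 and 221, which agrees with the assigned TimeID values, so the query should return no rows for them.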
Assuming there isn't a massive number of incoming rows, this may work fairly well. Be aware that I'm using CONVERT() to convert the incoming datetime column into a time(0) value, which comes at the cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the update statement shows this warning:
Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue to include a persisted computed column, as in:
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
The update statement then becomes:
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
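) t ON iq.InsertTime0 >= t.StartDateTime
    -- the last slot of the day has no following row, so LEAD() returns NULL for it
    AND (iq.InsertTime0 < t.EndDateTime OR t.EndDateTime IS NULL);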
answered 54 mins ago, edited 33 mins ago
Max Vernon
This looks great, thank you. I am busy testing it now. I removed the CONVERT(time(0), ...) because I need to compare the entire datetime values, as I have a few years of data to load. In addition, I added "WHERE iq.TimeID IS NULL" so that rows are not re-processed. Performance is way better, as expected. Thank you.
– WadeH
39 mins ago
Your question seems to imply you only need the 5-minute time-slot for each row, which is date-independent. That's the purpose of the conversion to a time(0) value.
– Max Vernon
37 mins ago
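The variant described in the comments above, comparing full datetime values and skipping rows that already have a TimeID, might look roughly like the following. This is a sketch under assumptions, not the asker's actual statement: it assumes the Dim_Time definition from the question, where StartDateTime is a datetime and hourly aggregate rows have [Minute] IS NULL.

```sql
-- Sketch only: assumes the question's Dim_Time (StartDateTime is a full
-- datetime; hourly aggregate rows are marked by [Minute] IS NULL).
UPDATE iq
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue AS iq
INNER JOIN (
    SELECT dt.TimeID
        , dt.StartDateTime
        , EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
    FROM dbo.Dim_Time AS dt
    WHERE dt.[Minute] IS NOT NULL       -- ignore the hourly aggregate rows
) AS t ON iq.InsertTime >= t.StartDateTime
    -- the newest slot has no following row, so treat it as open-ended
    AND (iq.InsertTime < t.EndDateTime OR t.EndDateTime IS NULL)
WHERE iq.TimeID IS NULL;                -- only rows not yet assigned a slot
```

One difference from the original loop is worth noting: the loop used StartDateTime < @RowInserTime, so a row inserted exactly on a slot boundary went to the previous slot, whereas this sketch assigns it to the slot that starts at that instant.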
Why is this a while loop in the first place? When you think "I need to do x to each row" try to change your thinking to "I need to do x to all of the rows."
– Aaron Bertrand♦
53 mins ago