Any way to speed up this UPDATE?

Question 1

SQL:

CREATE FUNCTION dbo.fnRandomForeNames ()
RETURNS VARCHAR(50)
AS
BEGIN
    RETURN (
        SELECT TOP 1 [FirstName]
        FROM [tmp_ForeNames]
        ORDER BY (SELECT new_id FROM GetNewID)
    )
END
GO

Similar functions for dbo.fnRandomSurNames() etc.

UPDATE Table1
SET firstname = dbo.fnRandomForeNames(),
 lastname = dbo.fnRandomSurNames(),
 address1 = dbo.fnRandomAddress1(),
 address2 = dbo.fnRandomAddress2(),
 address3 = dbo.fnRandomAddress3(),
 birthdate = DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 3650), '1990-01-01')

My C# Code:

    private void RunThis(string connString, StreamReader sr)
    {
        sr.BaseStream.Position = 0;
        string sqlQuery = sr.ReadToEnd();
        using (SqlConnection connection = new SqlConnection(connString))
        {
            Server server = new Server(new ServerConnection(connection));
            server.ConnectionContext.StatementTimeout = 4200;
            server.ConnectionContext.ExecuteNonQuery(sqlQuery);
        }
        sr.Close();
    }

........

    RunThis(e.Argument.ToString(), _updateClaim);

Where e.Argument.ToString() is the connection string.

The CREATE FUNCTION scripts are run earlier and take very little time. The names are stored in tmp tables (such as tmp_ForeNames), which are populated from arrays in C#; this also takes very little time.

Table1 contains approximately 140,000 rows, and the UPDATE takes approximately 14 minutes to complete.

I have also tried using parameterised SQL queries, skipping the tmp tables and SQL functions and instead creating the SQL query and executing it from the code, such as the following:

UPDATE Table1
SET lastname = '{0}',
 firstname = '{1}',
 birthdate = DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 3650), '1990-01-01'),
 address1 = '{2}',
 address2 = '{3}',
 address3 = '{4}'
 WHERE u_id = '{6}'

And some C#:

    using (SqlConnection connection = new SqlConnection(connString))
    {
        connection.Open();
        for (int i = 0; i < arraySize; ++i)
        {
            string updateString = string.Format(updateString2, GetRandomSurname(), GetRandomForeName(), GetRandomAddress1(), GetRandomAddress2(), GetRandomAddress3(), "", ids[i]);
            SqlCommand cmd = new SqlCommand(updateString, connection);
            cmd.CommandType = CommandType.Text;
            cmd.ExecuteNonQuery();
        }
    }

The latter method also takes upwards of 14 minutes.

Any ideas on how to cut down the time it takes to update the table?

Comment 1

Just a note: what you call a "parameterised query" is not actually one. In a parameterised query you refer to parameters such as @id and add SqlParameter objects when running it, rather than using string.Format.

Comment 2

To add to @ElDog's comment: using actual parameterised queries is a good idea; using string.Format() this way is not.

Comment 3

Additionally, using a real parameterised query is likely to give a significant speed-up, because SQL Server only has to calculate the execution plan the first time the query is run. With string.Format building, it has to recalculate the plan every time the values change. Parsing the query is likely to be the most expensive step of executing it in this case.

Comment 4

Just as an update: I have tried using parameterised queries (correctly this time) but have run into the limit of 2100 parameters per command. I tried to get around this by breaking the work up into separate queries, but I couldn't get it to work. Will post code later. Thanks for the replies.
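
One way to sidestep the 2100-parameter limit might be to avoid per-row parameters altogether: load the generated values into a staging table and apply them with a single set-based UPDATE. This is only a sketch under assumptions. The #NewValues table, its column types, and the idea of bulk-loading it from the C# side (for example with SqlBulkCopy) are not from the question, and u_id is assumed here to be an INT.

```sql
-- Hypothetical staging table holding one row of generated values per u_id.
-- Column types are assumptions; match them to Table1's real definition.
CREATE TABLE #NewValues (
    u_id      INT          NOT NULL PRIMARY KEY,
    firstname VARCHAR(50)  NOT NULL,
    lastname  VARCHAR(50)  NOT NULL,
    address1  VARCHAR(100) NOT NULL,
    address2  VARCHAR(100) NOT NULL,
    address3  VARCHAR(100) NOT NULL
);

-- ... bulk-load #NewValues from the client here, on the same connection ...

-- One statement updates every matching row, instead of one round trip per row.
UPDATE t
SET firstname = s.firstname,
    lastname  = s.lastname,
    address1  = s.address1,
    address2  = s.address2,
    address3  = s.address3,
    birthdate = DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 3650), '1990-01-01')
FROM Table1 AS t
JOIN #NewValues AS s ON s.u_id = t.u_id;
```

Because a local temp table is scoped to the session that created it, the load and the UPDATE would need to run on the same open connection.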

Eugene Ryabtsev · Answer 1 · 2012-09-20 09:22:34Z

Not sure what that ORDER BY (SELECT new_id from GetNewID) does, but comparing the following two approaches, the second is much faster and spends most of its time in COUNT(*), which could be pre-calculated.

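-- Approach 1: sort the whole table by a random key and take the first row
SELECT TOP 1 name FROM master.sys.all_objects ORDER BY NEWID()

-- Approach 2: pick a random row number, then fetch only that row
DECLARE @n int
-- RAND() is in [0, 1), so add 1 to land in 1..COUNT(*), matching ROW_NUMBER()
SELECT @n = 1 + FLOOR(RAND() * (SELECT COUNT(*) FROM master.sys.all_objects))
SELECT name FROM (
 SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) as n, name
 FROM master.sys.all_objects
) AS names
WHERE n = @n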

I guess you could make it even faster by materializing a sequential integer id in your names tables and creating a clustered index on it.
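
A minimal sketch of that last idea, assuming tmp_ForeNames is a heap with no existing clustered index and that no rows are deleted after the id is added (so the ids have no gaps). The column name name_id and the index name are hypothetical, chosen only for illustration.

```sql
-- Add a sequential integer id; existing rows are numbered automatically.
ALTER TABLE tmp_ForeNames ADD name_id INT IDENTITY(1,1) NOT NULL;
GO

-- Cluster on the id so a lookup by id is a single index seek.
CREATE UNIQUE CLUSTERED INDEX IX_tmp_ForeNames_name_id
    ON tmp_ForeNames (name_id);
GO

-- Count once and reuse the value rather than recounting per lookup.
DECLARE @count INT = (SELECT COUNT(*) FROM tmp_ForeNames);

-- Random id in 1..@count; the modulo is applied before ABS to avoid overflow.
DECLARE @n INT = 1 + ABS(CHECKSUM(NEWID()) % @count);

SELECT FirstName FROM tmp_ForeNames WHERE name_id = @n;
```

The lookup replaces a full sort of the table with a seek on the clustered index, which is where the saving would come from.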

radarbob · Answer 2 · 2012-08-23 14:37:43Z

Indexes! Put an index on new_id.

You say you're using temp tables, so I assume you're populating them all at once. Run UPDATE STATISTICS after you fill them.

Finally, why can't you say something like this?

 select firstName from tmp_ForeNames where new_id = getNewId()

ORDER BY takes time, so you should avoid it if possible.
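
The first two suggestions could look roughly like this. It assumes the names table actually has a new_id column, which the question does not show (there, new_id comes from the GetNewID view), and the index name is hypothetical.

```sql
-- Assumption: tmp_ForeNames has a new_id column to search on.
CREATE INDEX IX_tmp_ForeNames_new_id ON tmp_ForeNames (new_id);

-- After (re)filling the table, refresh the optimizer's statistics.
UPDATE STATISTICS tmp_ForeNames;
```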
