3
\$\begingroup\$

SQL:

CREATE FUNCTION dbo.fnRandomForeNames ()
RETURNS VARCHAR(50) 
AS
BEGIN
RETURN (
 SELECT TOP 1 [FirstName]
 FROM [tmp_ForeNames] 
 ORDER BY (SELECT new_id from GetNewID)
 )
END
GO

Similar functions for dbo.fnRandomSurNames() etc.

UPDATE Table1
SET firstname = dbo.fnRandomForeNames(),
 lastname = dbo.fnRandomSurNames(),
 address1 = dbo.fnRandomAddress1(),
 address2 = dbo.fnRandomAddress2(),
 address3 = dbo.fnRandomAddress3(),
 birthdate = DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 3650), '1990-01-01')

My C# Code:

 private void RunThis(string connString, StreamReader sr)
 {
 sr.BaseStream.Position = 0;
 string sqlQuery = sr.ReadToEnd();
 using (SqlConnection connection = new SqlConnection(connString))
 {
 Server server = new Server(new ServerConnection(connection));
 server.ConnectionContext.StatementTimeout = 4200;
 server.ConnectionContext.ExecuteNonQuery(sqlQuery);
 }
 sr.Close();
 }

........

 RunThis(e.Argument.ToString(), _updateClaim);

Where e.Argument.ToString() is the connection string.

The CREATE FUNCTION scripts are run earlier, take very little time to run. Also, names are stored in tmp databases, these are entered in C# via arrays. These also take very little time to run.

Table1 contains approx 140,000 rows and takes approx. 14 mins to complete.

I have also tried using parameterised SQL queries, skipping the tmp tables and SQL functions and instead creating the SQL query and executing it from the code, such as the following:

UPDATE Table1
SET lastname = '{0}',
 firstname = '{1}',
 birthdate = DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 3650), '1990-01-01'),
 address1 = '{2}',
 address2 = '{3}',
 address3 = '{4}'
 WHERE u_id = '{6}'

And some C#:

 using (SqlConnection connection = new SqlConnection(connString))
 {
 connection.Open();
 for (int i = 0; i < arraySize; ++i)
 {
 string updateString = string.Format(updateString2, GetRandomSurname(), GetRandomForeName(), GetRandomAddress1(), GetRandomAddress2(), GetRandomAddress3(), "", ids[i]);
 SqlCommand cmd = new SqlCommand(updateString, connection);
 cmd.CommandType = CommandType.Text;
 cmd.ExecuteNonQuery();
 }
 }

The latter method also taking upwards of 14 minutes.

Any ideas on how to cut down the time it takes to update the table?

Phrancis
20.5k6 gold badges69 silver badges155 bronze badges
asked Aug 23, 2012 at 9:43
\$\endgroup\$
4
  • 3
    \$\begingroup\$ Just a note: what you call a "parametrised query" is not actually it. In parametrised query you refer to parameters like @id, and add SqlParameter objects when running, not use string.Format. \$\endgroup\$ Commented Aug 23, 2012 at 14:22
  • 1
    \$\begingroup\$ To add to @ElDog's comment, using actual parametrized queries is a good idea, using string.Format() this way is not. \$\endgroup\$ Commented Aug 23, 2012 at 14:46
  • 1
    \$\begingroup\$ Additionally, using a real parameterized query is likely to cause a significant speed-up, because SQL will only have to calculate the execution plan the first time it is run. With string.Format building, it has to re-calculate every time the values change. Parsing the query in this case is likely to be the most expensive step of executing the query. \$\endgroup\$ Commented Aug 23, 2012 at 18:08
  • \$\begingroup\$ Just as an update to this, I have tried using parameterised queries(correctly) but have come into the issue of only being allowed a max of 2100 queries per execution, I have tried to get around this by breaking it up into seperate queries but I couldn't get it to work. Will post code later. Thanks for replies. \$\endgroup\$ Commented Aug 27, 2012 at 10:39

2 Answers 2

3
\$\begingroup\$

Not sure what that ORDER BY (SELECT new_id from GetNewID), but comparing the following approaches, second is much faster and spends most of the time in COUNT(*), which could be pre-calculated.

SELECT TOP 1 name FROM master.sys.all_objects ORDER BY NEWID()
DECLARE @n int
SELECT @n = RAND() * (SELECT COUNT(*) FROM master.sys.all_objects)
SELECT name FROM (
 SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) as n, name
 FROM master.sys.all_objects
) AS names
WHERE n = @n

I guess you could make it even faster by materializing integer sequential id inside your names tables and making a clustered index on that.

answered Sep 20, 2012 at 9:22
\$\endgroup\$
2
\$\begingroup\$

indexes! Index on new_id.

You say you're using temp tables, so I assume you're populating them all at once. Do a update statistics after you fill them.

Finally, why cant you say something like this?

 select firstName from tmp_ForeNames where new_id = getNewId()

order by takes time so you should avoid it if possible.

answered Aug 23, 2012 at 14:37
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.