SQL:
CREATE FUNCTION dbo.fnRandomForeNames ()
RETURNS VARCHAR(50)
AS
BEGIN
    RETURN (
        SELECT TOP 1 [FirstName]
        FROM [tmp_ForeNames]
        ORDER BY (SELECT new_id from GetNewID)
    )
END
GO
Similar functions exist for dbo.fnRandomSurNames() etc.
UPDATE Table1
SET firstname = dbo.fnRandomForeNames(),
lastname = dbo.fnRandomSurNames(),
address1 = dbo.fnRandomAddress1(),
address2 = dbo.fnRandomAddress2(),
address3 = dbo.fnRandomAddress3(),
birthdate = DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 3650), '1990-01-01')
My C# Code:
private void RunThis(string connString, StreamReader sr)
{
    sr.BaseStream.Position = 0;
    string sqlQuery = sr.ReadToEnd();
    using (SqlConnection connection = new SqlConnection(connString))
    {
        Server server = new Server(new ServerConnection(connection));
        server.ConnectionContext.StatementTimeout = 4200;
        server.ConnectionContext.ExecuteNonQuery(sqlQuery);
    }
    sr.Close();
}
........
RunThis(e.Argument.ToString(), _updateClaim);
Where e.Argument.ToString() is the connection string.
The CREATE FUNCTION scripts are run earlier and take very little time to run.
Also, the names are stored in tmp tables, which are populated from arrays in C#.
Populating them also takes very little time.
Table1 contains approx. 140,000 rows, and the update takes approx. 14 minutes to complete.
I have also tried using parameterised SQL queries, skipping the tmp tables and SQL functions and instead creating the SQL query and executing it from the code, such as the following:
UPDATE Table1
SET lastname = '{0}',
firstname = '{1}',
birthdate = DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 3650), '1990-01-01'),
address1 = '{2}',
address2 = '{3}',
address3 = '{4}'
WHERE u_id = '{6}'
And some C#:
using (SqlConnection connection = new SqlConnection(connString))
{
    connection.Open();
    for (int i = 0; i < arraySize; ++i)
    {
        string updateString = string.Format(updateString2, GetRandomSurname(), GetRandomForeName(), GetRandomAddress1(), GetRandomAddress2(), GetRandomAddress3(), "", ids[i]);
        SqlCommand cmd = new SqlCommand(updateString, connection);
        cmd.CommandType = CommandType.Text;
        cmd.ExecuteNonQuery();
    }
}
The latter method also takes upwards of 14 minutes.
Any ideas on how to cut down the time it takes to update the table?
2 Answers
Not sure what that ORDER BY (SELECT new_id from GetNewID) does, but comparing the following two approaches, the second is much faster and spends most of its time in COUNT(*), which could be pre-calculated.
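-- Approach 1: sort every row by a random key and take the first
SELECT TOP 1 name FROM master.sys.all_objects ORDER BY NEWID()

-- Approach 2: pick a random row number in 1..COUNT(*) (ROW_NUMBER starts at 1)
DECLARE @n int
SELECT @n = 1 + FLOOR(RAND() * (SELECT COUNT(*) FROM master.sys.all_objects))
SELECT name FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS n, name
    FROM master.sys.all_objects
) AS names
WHERE n = @n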
I guess you could make it even faster by materializing a sequential integer id inside your names tables and putting a clustered index on it.
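A rough sketch of that idea, assuming tmp_ForeNames has no integer key or clustered index yet. The column name name_id and the index name are made up for illustration. The IDENTITY column numbers the existing rows 1..N, so a random integer in that range should hit exactly one row with an index seek instead of a sort:

```sql
-- Hypothetical column: a sequential integer key on the names table.
ALTER TABLE tmp_ForeNames ADD name_id INT IDENTITY(1,1) NOT NULL
GO
-- Hypothetical index name; assumes the table has no clustered index yet.
CREATE UNIQUE CLUSTERED INDEX IX_tmp_ForeNames_name_id ON tmp_ForeNames (name_id)
GO
-- Count once; this is the part that could be pre-calculated and reused.
DECLARE @count int, @n int
SELECT @count = COUNT(*) FROM tmp_ForeNames
-- RAND() is in [0, 1), so @n falls in 1..@count.
SELECT @n = 1 + FLOOR(RAND() * @count)
SELECT FirstName FROM tmp_ForeNames WHERE name_id = @n
```

This is written as a plain batch; I have not tried wrapping it in the scalar functions from the question.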
Indexes! Put an index on new_id.
You say you're using temp tables, so I assume you're populating them all at once. Run UPDATE STATISTICS after you fill them.
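Something along these lines, assuming tmp_ForeNames actually has a new_id column (the question only shows FirstName) and with a made-up index name:

```sql
-- Assumes a new_id column exists on tmp_ForeNames; the index name is hypothetical.
CREATE INDEX IX_tmp_ForeNames_new_id ON tmp_ForeNames (new_id)
GO
-- Refresh statistics once the bulk load from C# has finished.
UPDATE STATISTICS tmp_ForeNames
GO
```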
Finally, why can't you say something like this?
select firstName from tmp_ForeNames where new_id = getNewId()
ORDER BY takes time, so you should avoid it if possible.
string.Format() this way is not.