I am attempting to speed up the processing of 5000 rows received from the database, which I then render as a Crystal Report, export to a byte stream, and save in a database table.

Currently I am using Parallel.For on a DataTable. This starts 40 parallel tasks, each of which sequentially iterates over 125 (i.e. 5000 / 40) of the 5000 rows.

It takes approximately 5 minutes to do 5000 records at the moment. Is there any way I can speed this up?
DataTable dtBills = new DataTable("dtBills"); // I used an OdbcDataAdapter to fill this with 5000 records

private void ParallelProcess()
{
    int numbills = 5000;
    int numprocesses = 40;
    int numsequential = numbills / numprocesses;
    int p = 0;
    if (numbills < numprocesses)
        p = numbills;
    else
        p = numprocesses;
    Parallel.For(1, p + 1, new ParallelOptions { MaxDegreeOfParallelism = 40 }, i =>
    {
        SequentialProcess(numsequential, i);
    });
}

private void SequentialProcess(int batch, int num)
{
    ReportDocument cryRpt = new ReportDocument();
    cryRpt.Load(Application.StartupPath + "\\" + "Bill.rpt");
    foreach (DataRow drow in dtBills.Rows.Cast<System.Data.DataRow>().Skip((num - 1) * batch).Take(batch).CopyToDataTable().Rows)
    {
        cryRpt.SetParameterValue(..... // here I set Crystal Report parameter values
        Stream stream = cryRpt.ExportToStream(CrystalDecisions.Shared.ExportFormatType.PortableDocFormat);
        byte[] contents = new byte[stream.Length];
        stream.Read(contents, 0, (int)stream.Length);
        stream.Close();
        if (!string.IsNullOrEmpty(sconnstr))
        {
            using (SqlConnection sconn = new SqlConnection(sconnstr))
            {
                String qry = @"INSERT INTO .....";
                using (SqlCommand cmd = new SqlCommand(qry, sconn))
                {
                    cmd.CommandType = CommandType.Text;
                    cmd.Parameters.AddWithValue(// Set values for insert here, one of which will be contents
                    cmd.Connection = sconn;
                    sconn.Open();
                    cmd.ExecuteNonQuery();
                    sconn.Close();
                }
            }
        }
    }
    cryRpt.Close();
    cryRpt.Dispose();
}
2 Answers
The first thing here: you shouldn't be reading the results directly from the DataTable; use the CreateDataReader method to create a DataReader from it. The difference is that a DataTable will check back on the database server on each iteration, while the DataReader will get a snapshot of your data and not check back any more. That saves a lot of time when dealing with a big number of records.
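A minimal sketch of what that looks like against the dtBills table from the question ("BillID" is a hypothetical column name, used here only for illustration):

using (DataTableReader reader = dtBills.CreateDataReader())
{
    while (reader.Read())
    {
        // "BillID" is an assumed column; read whatever the report parameters need
        int billId = reader.GetInt32(reader.GetOrdinal("BillID"));
        // ... set report parameters and export for this row, as in the original loop ...
    }
}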
Along the same lines, don't issue each INSERT directly; build the results up in memory first, in an array or similar (I think there is also something in the database classes that manages this in memory and commits later, but I don't remember its name). If you stop dealing with the database one row at a time and work in chunks instead, the speed of your code will improve.
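A hedged sketch of that chunked approach using SqlBulkCopy, which may be the in-memory-then-commit class the answer is alluding to; the dbo.BillPdfs table and its column names are assumptions, not from the original post:

// Buffer results in a second DataTable instead of inserting one row at a time.
// Note: DataTable is not thread-safe, so when filling it from parallel tasks,
// either lock around Rows.Add or buffer per task and merge afterwards.
DataTable dtResults = new DataTable("dtResults");
dtResults.Columns.Add("BillID", typeof(int));   // hypothetical columns
dtResults.Columns.Add("Pdf", typeof(byte[]));

// ... inside the export loop: dtResults.Rows.Add(billId, contents); ...

using (SqlConnection sconn = new SqlConnection(sconnstr))
{
    sconn.Open();
    using (SqlBulkCopy bulk = new SqlBulkCopy(sconn))
    {
        bulk.DestinationTableName = "dbo.BillPdfs"; // hypothetical destination table
        bulk.WriteToServer(dtResults);              // one round trip for all buffered rows
    }
}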
- "The difference is that a DataTable will check back on the database server on each iteration" is not true, as a DataTable is disconnected from the database. – Heslacher, Nov 27, 2014 at 14:37
- @Heslacher That disconnected bit is not totally true, because a DataTable still has events for RowChanged, ColumnChanged, and TableCleared, and allows inserting columns and rows, while a DataReader has no events; it is a read-only data structure. – fernando.reyes, Nov 27, 2014 at 14:52
- But the DataTable does not check back on the database server on each iteration. – Heslacher, Nov 27, 2014 at 14:54
- I am also not sure what you mean by "The difference is that a DataTable will check back on the database server on each iteration". However, I do agree with you on the inserts, so I will try to bulk insert the data back to the database. Thanks for taking the time to assist. – CoDeGiRl, Nov 27, 2014 at 16:15
- A question, though: do you all think another DataTable consisting of 5000 records, each having a byte[] and other data, is feasible? I already have one DataTable with 5000 records. – CoDeGiRl, Nov 27, 2014 at 16:22
I would like to point out some other things too (see the sketch after this list):

- MaxDegreeOfParallelism sets a maximum on the number of concurrent tasks; it is a limiter and does not give you what you want, which is exactly 40 tasks. From my experience, just let the .NET runtime decide how many tasks to run concurrently based on the underlying hardware; see ParallelOptions.MaxDegreeOfParallelism for more info.
- I assume the "open connection, execute, close connection" pattern is not used only here, so why not create a reusable general-purpose function that accepts a query, a connection string, and parameters?
- You could wrap the report instance in a using block. What happens if something goes wrong in the foreach loop and an exception is thrown? The report is never disposed.
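A minimal sketch of those three points together; ExecuteNonQuery and the method bodies shown are assumptions for illustration, not from the original post:

// Reusable "open, execute, close" helper
private void ExecuteNonQuery(string query, string connStr, params SqlParameter[] args)
{
    using (SqlConnection conn = new SqlConnection(connStr))
    using (SqlCommand cmd = new SqlCommand(query, conn))
    {
        cmd.Parameters.AddRange(args);
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}

private void SequentialProcess(int batch, int num)
{
    // using guarantees Dispose even if the loop throws
    using (ReportDocument cryRpt = new ReportDocument())
    {
        cryRpt.Load(Application.StartupPath + "\\Bill.rpt");
        // ... export each row and call ExecuteNonQuery(...) as before ...
    }
}

private void ParallelProcess()
{
    int numbills = dtBills.Rows.Count;
    int numprocesses = 40;
    int numsequential = numbills / numprocesses;
    int p = Math.Min(numbills, numprocesses);
    // No ParallelOptions: let the runtime decide the degree of parallelism
    Parallel.For(1, p + 1, i => SequentialProcess(numsequential, i));
}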
- Parallel.For might not use MaxDegreeOfParallelism threads, but it will always perform all p iterations.
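A quick way to see that for yourself (a standalone sketch, not from the post):

// All 40 iterations run even though at most 2 execute at the same time;
// MaxDegreeOfParallelism caps concurrency, not the total iteration count.
Parallel.For(1, 41, new ParallelOptions { MaxDegreeOfParallelism = 2 }, i =>
{
    Console.WriteLine("iteration " + i + " on thread " + Thread.CurrentThread.ManagedThreadId);
});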