I am attempting to speed up the processing of 5000 rows received from the database, which I then render as a Crystal Report, export to a byte stream, and save in a database table.

Currently I am using Parallel.For on a DataTable. This starts 40 parallel tasks, each of which sequentially iterates over 125 (i.e. 5000 / 40) of the 5000 rows.

It takes approximately 5 minutes to do 5000 records at the moment. Is there any way I can speed this up?
DataTable dtBills = new DataTable("dtBills"); // I used an OdbcDataAdapter to fill this with 5000 records

private void ParallelProcess()
{
    int numbills = 5000;
    int numprocesses = 40;
    int numsequential = numbills / numprocesses;
    int p = 0;
    if (numbills < numprocesses)
        p = numbills;
    else
        p = numprocesses;
    Parallel.For(1, p + 1, new ParallelOptions { MaxDegreeOfParallelism = 40 }, i =>
    {
        SequentialProcess(numsequential, i);
    });
}

private void SequentialProcess(int batch, int num)
{
    ReportDocument cryRpt = new ReportDocument();
    cryRpt.Load(Application.StartupPath + "\\" + "Bill.rpt");
    foreach (DataRow drow in dtBills.Rows.Cast<System.Data.DataRow>().Skip((num - 1) * batch).Take(batch).CopyToDataTable().Rows)
    {
        cryRpt.SetParameterValue(..... // here I set Crystal Report parameter values
        Stream stream = cryRpt.ExportToStream(CrystalDecisions.Shared.ExportFormatType.PortableDocFormat);
        byte[] contents = new byte[stream.Length];
        stream.Read(contents, 0, (int)stream.Length);
        stream.Close();
        if (!string.IsNullOrEmpty(sconnstr))
        {
            using (SqlConnection sconn = new SqlConnection(sconnstr))
            {
                String qry = @"INSERT INTO .....";
                using (SqlCommand cmd = new SqlCommand(qry, sconn))
                {
                    cmd.CommandType = CommandType.Text;
                    cmd.Parameters.AddWithValue(// Set values for insert here, one of which will be contents
                    cmd.Connection = sconn;
                    sconn.Open();
                    cmd.ExecuteNonQuery();
                    sconn.Close();
                }
            }
        }
    }
    cryRpt.Close();
    cryRpt.Dispose();
}
2 Answers
The first thing here: you shouldn't be reading the results directly from the DataTable; use the CreateDataReader method to create a DataReader from it. The difference is that a DataTable will check back on the database server on each iteration, while the DataReader will get a snapshot of your data and not check back any more. That saves a lot of time when dealing with a big number of records.
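A minimal sketch of what that looks like against the dtBills table from the question ("BillID" is a hypothetical column name, used here only for illustration):

using (DataTableReader reader = dtBills.CreateDataReader())
{
    while (reader.Read())
    {
        // "BillID" is an assumed column; read whatever the report parameters need
        int billId = reader.GetInt32(reader.GetOrdinal("BillID"));
        // ... set report parameters and export for this row, as in the original loop ...
    }
}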
Along the same lines, don't issue each INSERT directly; build the results up in memory first, in an array or similar (I think there is also something in the database classes that manages this in memory and commits later, but I don't remember its name). If you stop dealing with the database one row at a time and work in chunks instead, the speed of your code will improve.
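A hedged sketch of that chunked approach using SqlBulkCopy, which may be the in-memory-then-commit class the answer is alluding to; the dbo.BillPdfs table and its column names are assumptions, not from the original post:

// Buffer results in a second DataTable instead of inserting one row at a time.
// Note: DataTable is not thread-safe, so when filling it from parallel tasks,
// either lock around Rows.Add or buffer per task and merge afterwards.
DataTable dtResults = new DataTable("dtResults");
dtResults.Columns.Add("BillID", typeof(int));   // hypothetical columns
dtResults.Columns.Add("Pdf", typeof(byte[]));

// ... inside the export loop: dtResults.Rows.Add(billId, contents); ...

using (SqlConnection sconn = new SqlConnection(sconnstr))
{
    sconn.Open();
    using (SqlBulkCopy bulk = new SqlBulkCopy(sconn))
    {
        bulk.DestinationTableName = "dbo.BillPdfs"; // hypothetical destination table
        bulk.WriteToServer(dtResults);              // one round trip for all buffered rows
    }
}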
- "The difference is that a DataTable will check back on the database server on each iteration" is not true, as a DataTable is disconnected from the database. – Heslacher, Nov 27, 2014 at 14:37
- @Heslacher That disconnected bit is not totally true, because a DataTable still has events for RowChanged, ColumnChanged, and TableCleared, and allows inserting columns and rows, while a DataReader has no events; it is a read-only data structure. – fernando.reyes, Nov 27, 2014 at 14:52
- But the DataTable does not check back on the database server on each iteration. – Heslacher, Nov 27, 2014 at 14:54
- I am also not sure what you mean by "The difference is that a DataTable will check back on the database server on each iteration". However, I do agree with you on the inserts, so I will try to bulk insert the data back to the database. Thanks for taking the time to assist. – CoDeGiRl, Nov 27, 2014 at 16:15
- A question, though: do you all think another DataTable consisting of 5000 records, each having a byte[] and other data, is feasible? I already have one DataTable with 5000 records. – CoDeGiRl, Nov 27, 2014 at 16:22
I would like to point out some other things too (see the sketch after this list):

- MaxDegreeOfParallelism sets a maximum on the number of concurrent tasks; it is a limiter and does not give you what you want, which is exactly 40 tasks. From my experience, just let the .NET runtime decide how many tasks to run concurrently based on the underlying hardware; see ParallelOptions.MaxDegreeOfParallelism for more info.
- I assume the "open connection, execute, close connection" pattern is not used only here, so why not create a reusable general-purpose function that accepts a query, a connection string, and parameters?
- You could wrap the report instance in a using block. What happens if something goes wrong in the foreach loop and an exception is thrown? The report is never disposed.
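A minimal sketch of those three points together; ExecuteNonQuery and the method bodies shown are assumptions for illustration, not from the original post:

// Reusable "open, execute, close" helper
private void ExecuteNonQuery(string query, string connStr, params SqlParameter[] args)
{
    using (SqlConnection conn = new SqlConnection(connStr))
    using (SqlCommand cmd = new SqlCommand(query, conn))
    {
        cmd.Parameters.AddRange(args);
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}

private void SequentialProcess(int batch, int num)
{
    // using guarantees Dispose even if the loop throws
    using (ReportDocument cryRpt = new ReportDocument())
    {
        cryRpt.Load(Application.StartupPath + "\\Bill.rpt");
        // ... export each row and call ExecuteNonQuery(...) as before ...
    }
}

private void ParallelProcess()
{
    int numbills = dtBills.Rows.Count;
    int numprocesses = 40;
    int numsequential = numbills / numprocesses;
    int p = Math.Min(numbills, numprocesses);
    // No ParallelOptions: let the runtime decide the degree of parallelism
    Parallel.For(1, p + 1, i => SequentialProcess(numsequential, i));
}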
- Parallel.For might not use MaxDegreeOfParallelism threads, but it will always perform all p iterations.
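A quick way to see that for yourself (a standalone sketch, not from the post):

// All 40 iterations run even though at most 2 execute at the same time;
// MaxDegreeOfParallelism caps concurrency, not the total iteration count.
Parallel.For(1, 41, new ParallelOptions { MaxDegreeOfParallelism = 2 }, i =>
{
    Console.WriteLine("iteration " + i + " on thread " + Thread.CurrentThread.ManagedThreadId);
});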