Scenario: Data is received and written to a database with timestamps. I need to process the raw data in the order it was received, based on the timestamp, and write it back to the database in a different table, again maintaining the timestamp order.

I came up with the following design: two queues, one for storing raw data read from the database and another for storing processed data before it is written back to the DB. I have two threads, one reading from the database into the Initial queue and another reading from the Result queue. In between, I spawn multiple threads to process data from the Initial queue and write it to the Result queue.

I have experimented with SortedList (with manual locking) and BlockingCollection. I have tried two approaches to processing in parallel: Parallel.For/Parallel.ForEach and Task.Factory.StartNew.

Each unit of data may take a variable amount of time to process, depending on several factors. One thread can still be processing the first data point while other threads have already finished three or four data points each, which breaks the timestamp order.

I recently found out about OrderablePartitioner and thought it would solve the problem, but following MSDN's example I can see that it does not sort the underlying collection either. Maybe I need to implement a custom partitioner to order my collection of complex data types? Or maybe there is a better way of approaching the problem?

Any suggestions and/or links to articles discussing similar problems are highly appreciated.

MPelletier
asked Apr 14, 2011 at 17:50

3 Answers


Personally, I would at least try to start with using a BlockingCollection<T> for the input and a ConcurrentQueue<T> instance for the results.

I would use Parallel Linq to process the results. In order to preserve the order during your processing, you could use AsOrdered() on the PLINQ statement.
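As a rough illustration of the AsOrdered() suggestion, here is a minimal sketch. The `RawPoint` and `ProcessedPoint` types and the `Process` method are hypothetical stand-ins, since the question does not show the real data types, and the input is assumed to arrive already sorted by timestamp. The work still runs in parallel and may finish out of order, but the query yields results in source order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical record types; the real ones are not shown in the question.
public sealed class RawPoint
{
    public DateTime Timestamp { get; set; }
    public int Value { get; set; }
}

public sealed class ProcessedPoint
{
    public DateTime Timestamp { get; set; }
    public int Result { get; set; }
}

public static class OrderedProcessing
{
    // Stand-in for the real, variable-cost processing step.
    public static ProcessedPoint Process(RawPoint raw)
    {
        // Simulate uneven work: even values take longer than odd ones.
        Thread.Sleep(raw.Value % 2 == 0 ? 20 : 1);
        return new ProcessedPoint { Timestamp = raw.Timestamp, Result = raw.Value * 2 };
    }

    // Assumes sortedInput is already ordered by timestamp
    // (for example via ORDER BY in the SELECT that fills the input queue).
    public static List<ProcessedPoint> ProcessInOrder(IEnumerable<RawPoint> sortedInput)
    {
        return sortedInput
            .AsParallel()
            .AsOrdered()                  // results come back in source order
            .Select(raw => Process(raw))  // runs on multiple threads
            .ToList();
    }

    public static void Main()
    {
        var start = new DateTime(2011, 4, 14, 17, 50, 0);
        var input = Enumerable.Range(0, 8)
            .Select(i => new RawPoint { Timestamp = start.AddSeconds(i), Value = i })
            .ToList();

        foreach (var p in ProcessInOrder(input))
            Console.WriteLine($"{p.Timestamp:HH:mm:ss} -> {p.Result}");
    }
}
```

The trade-off is that an ordered query has to buffer results that finish early until their predecessors are done, so it can use more memory than an unordered one.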

answered Apr 14, 2011 at 18:02

2 Comments

I would assume that calling the .AsParallel() method in PLINQ would give me a thread-safe collection of items? Or should I handle locking myself?
@Dimitri: AsParallel() called on an IEnumerable will handle the enumeration correctly, but you still need to handle any internal locking required while processing the data.
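
To make the point about internal locking concrete, here is a small sketch under assumed names. `SharedStateDemo`, `CountsByBucket` and the bucket logic are invented for illustration. PLINQ takes care of handing items to worker threads, but any shared state touched inside the processing delegate still needs its own synchronization, here a plain `lock`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SharedStateDemo
{
    // Hypothetical shared state that every worker thread updates.
    private static readonly object Gate = new object();
    private static readonly Dictionary<int, int> CountsByBucket = new Dictionary<int, int>();

    // Called concurrently by PLINQ; Dictionary is not thread-safe,
    // so the update is guarded by a lock.
    public static int Process(int value)
    {
        int bucket = value % 3;
        lock (Gate)
        {
            int seen;
            CountsByBucket.TryGetValue(bucket, out seen);
            CountsByBucket[bucket] = seen + 1;
        }
        return value * value;
    }

    public static int[] Run(IEnumerable<int> input)
    {
        lock (Gate) { CountsByBucket.Clear(); }
        return input.AsParallel().AsOrdered().Select(v => Process(v)).ToArray();
    }

    public static int CountFor(int bucket)
    {
        lock (Gate)
        {
            int n;
            return CountsByBucket.TryGetValue(bucket, out n) ? n : 0;
        }
    }

    public static void Main()
    {
        int[] squares = Run(Enumerable.Range(0, 300));
        Console.WriteLine(squares.Length);
        Console.WriteLine(CountFor(0));
    }
}
```

If the processing step touches no shared state, no extra locking is needed.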

Have you considered PLINQ and AsOrdered()? It might be helpful for what you're trying to achieve. http://msdn.microsoft.com/en-us/library/dd460719.aspx

answered Apr 14, 2011 at 18:01

1 Comment

Thank you for the link, but I'm going to have to mark Reed Copsey's reply as the answer.

Maybe you've considered these things, but...

Why not just pass the timestamp to the database and then either let the database do the ordering or fix the ordering in the database after all processing threads have returned? Do the SQL statements have to be executed sequentially?

PLINQ is great but I would try to avoid thread synchronization requirements and simply pass more ordering data to the database if you can.
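
One way to read this suggestion, sketched on the application side with made-up names (`Sample`, `Processed` and `ProcessThenSort` are hypothetical): let the parallel stage finish in whatever order it likes, keep the timestamp attached to every result, and restore the order with a single sort afterwards. The same timestamp column could equally be stored in the target table and used in an ORDER BY when the data is read back.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types; each result carries the timestamp of its input.
public sealed class Sample
{
    public DateTime Timestamp { get; set; }
    public int Value { get; set; }
}

public sealed class Processed
{
    public DateTime Timestamp { get; set; }
    public int Result { get; set; }
}

public static class SortAfterProcessing
{
    public static List<Processed> ProcessThenSort(IEnumerable<Sample> input)
    {
        // Unordered parallel stage: no ordering guarantees, no ordering overhead.
        List<Processed> unordered = input
            .AsParallel()
            .Select(s => new Processed { Timestamp = s.Timestamp, Result = s.Value + 1 })
            .ToList();

        // Order is restored once, after all workers have returned.
        return unordered.OrderBy(p => p.Timestamp).ToList();
    }

    public static void Main()
    {
        var start = new DateTime(2011, 4, 14, 18, 27, 0);
        var input = Enumerable.Range(0, 5)
            .Select(i => new Sample { Timestamp = start.AddSeconds(i), Value = i });

        foreach (var p in ProcessThenSort(input))
            Console.WriteLine($"{p.Timestamp:HH:mm:ss} -> {p.Result}");
    }
}
```

This only helps when a whole batch can be collected before writing; it does not remove the need for one insert per row.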

answered Apr 14, 2011 at 18:27

1 Comment

When selecting, I do use ordering in the SQL statement, but the output data cannot be copied back to the DB using a bulk insert. Each point has to go back as an individual insert, with the ID returned to the application for further processing. I would love to minimize the stress on SQL as much as possible.
