6
\$\begingroup\$

I have a use case where I need to process about 80,000 emails per day. The client has an existing process that takes about 1/sec per email and is running out of time before the next batch comes in. I implemented a solution using the MailKit library and I am trying to implement some parallel tasks using TPL. I tried both a Pop3 client and IMAP client and the IMAP client seems to be faster. Plus, since IMAP supports multiple clients connecting at the same time so I figured that would be better for TPL.

I tested the code with a small sample of 100 emails and trying different numbers of parallel tasks. The results are below.

I'd appreciate any advice on my code below or other improvement suggestions on how to make it faster.

private static int _num_of_threads = 1;
private static int _num_of_messages = 100;
public override async void DoWork()
{
 try
 {
 IList<UniqueId> uids = new List<UniqueId>();
 uids = GetUids();
 //shrink the list for now
 uids = uids.Take(_num_of_messages).ToList();
 //create {_num_of_threads} lists
 List<List<UniqueId>> list_of_list_of_uids = new List<List<UniqueId>>();
 list_of_list_of_uids = SplitUids(uids);
 // create the TPL tasks
 Task[] tasks = new Task[_num_of_threads];
 for (int index = 0; index < tasks.Length; index++)
 {
 tasks[index] = Task.Factory.StartNew((Object obj) =>
 {
 int i = (int)obj;
 GetMessages(list_of_list_of_uids[i]);
 }, index);
 }
 Task.WaitAll(tasks);
 }
 catch (Exception ex)
 {
 }
}
private List<UniqueId> GetUids()
{
 IList<UniqueId> uids = new List<UniqueId>();
 using (var client = new ImapClient())
 {
 try
 {
 client.Connect("<emailserver>", 993, SecureSocketOptions.SslOnConnect);
 client.Authenticate(<username>, <password>);
 client.Inbox.Open(FolderAccess.ReadOnly);
 uids = client.Inbox.Search(SearchQuery.All);
 }
 finally
 {
 if (client.IsConnected)
 {
 client.Disconnect(true);
 }
 }
 }
 return uids.ToList();
}
private List<List<UniqueId>> SplitUids(IList<UniqueId> uids)
{
 List<List<UniqueId>> list_of_list_of_uids = new List<List<UniqueId>>();
 for (int list_index = 0; list_index < _num_of_threads; list_index++)
 {
 List<UniqueId> list_of_uids = new List<UniqueId>();
 list_of_list_of_uids.Add(list_of_uids);
 }
 // should have {_num_of_threads} lists now
 //now spread the uids across the lists
 int list_iterator = 0;
 for (int uid_index = 0; uid_index < uids.Count(); uid_index++)
 {
 list_of_list_of_uids[list_iterator].Add(uids[uid_index]);
 list_iterator++;
 if (list_iterator == _num_of_threads)
 {
 list_iterator = 0;
 }
 }
 return list_of_list_of_uids;
}
private async Task<int> GetMessages(IList<UniqueId> uids)
{
 int count = 0;
 using (var client = new ImapClient())
 {
 try
 {
 client.Connect("<emailserver>", 993, SecureSocketOptions.SslOnConnect);
 client.Authenticate(<username>, <password>);
 client.Inbox.Open(FolderAccess.ReadOnly);
 foreach (UniqueId uid in uids)
 {
 var message = client.Inbox.GetMessage(uid);
 if (message != null)
 {
 // process the email
 // process aggregates a bunch of data and sends the result off to a web service to be put into a DB.
 count++;
 }
 }
 }
 finally
 {
 if (client.IsConnected)
 {
 client.Disconnect(true);
 }
 }
 }
 return count;
}

Best Result:

1 Thread = ~54 seconds

2 Threads = ~21 seconds

3 Threads = ~15 seconds

4 Threads = ~14 seconds

10 Threads = ~14 seconds

t3chb0t
44.6k9 gold badges84 silver badges190 bronze badges
asked Jul 26, 2016 at 17:33
\$\endgroup\$
7
  • \$\begingroup\$ Thanks for doing the legwork to figure out how to efficiently parallelize this, Jay! \$\endgroup\$ Commented Jul 26, 2016 at 17:54
  • 1
    \$\begingroup\$ 80k emails per day? It looks like someone is abusing emails as an error log, isn't he? \$\endgroup\$ Commented Jul 26, 2016 at 18:27
  • \$\begingroup\$ I suggest you do some load testing before you run with it. A sample of 100 when you get 80K a day may not be significant. 1M might be a better test. \$\endgroup\$ Commented Jul 26, 2016 at 19:46
  • \$\begingroup\$ @-t3chb0t - Not an error log. The client has 80,000+ devices deployed in the field and they are sending status updates every day. Data like counters, measurements, etc. It's unfortunate they weren't designed to just send the data to a web service to be processed, but it is what it is for now. \$\endgroup\$ Commented Jul 27, 2016 at 19:59
  • \$\begingroup\$ @pacmaninbw - thanks, will do some testing on larger data sets as well. \$\endgroup\$ Commented Jul 27, 2016 at 20:00

2 Answers 2

2
\$\begingroup\$

My suggestions

  1. Use var. For example in this piece of code List<List<UniqueId>> list_of_list_of_uids = new List<List<UniqueId>>(); is will be much more readable
  2. I think the code under is redundant. Because using keyword (when it is compiled into IL code) already contains finally statement with Dispose inside. I am pretty sure that Dispose of ImapClient class will disconnect(if needed) itself, so you don't need to do it once again.

    finally
    {
     if (client.IsConnected)
     {
     client.Disconnect(true);
     }
    }
    
  3. I do not see how to significantly improve performance of your code. Only minor suggestion. The capacity of your lists in method SplitUids can be calculated in advance, so use constructor with capacity here IList<UniqueId> uids = new List<UniqueId>(); It will improve performance a little bit

If performance is critical in your case you should paste your for this // process aggregates a bunch of data and sends the result off to a web service to be put into a DB. I suppose these operations are quite time-consuming(because of database and web service) and might be improved.

Shelby115
1,96111 silver badges20 bronze badges
answered Jul 27, 2016 at 9:39
\$\endgroup\$
3
  • \$\begingroup\$ Thanks! The code behind the '// process' is actually a black box DLL call for now. I can't change it (yet), but I do think there is a lot of opportunity there as well. \$\endgroup\$ Commented Jul 27, 2016 at 20:04
  • \$\begingroup\$ @Disappointed Do you have a source for the finally/ImapClient.Dispose suggestion? I can't seem to find anything that explicitly states it disconnects. I would naturally assume this myself if it weren't for the Disconnect(true) parameter which sends a LOGOUT command. Which I would guess Dispose probably doesn't do that. mimekit.net/docs/html/… \$\endgroup\$ Commented Jul 27, 2016 at 20:41
  • \$\begingroup\$ @Shelby115 No, i haven't source. \$\endgroup\$ Commented Jul 27, 2016 at 20:51
1
\$\begingroup\$

Just some minor suggestions to add.

Double Initialization

You're initializing your lists twice. First on declaration, then when you call the function on the next line. Just initialize to the functions. e.g. var uids = GetUids();

var

(As Disappointed suggested) This is mostly preference but List<UniqueId> list_of_uids = new List<UniqueId>(); is less keystrokes and simpler as var list_of_uids = new List<UniqueId>(); without reducing any information/readability.

Naming

Typically in C# it is convention to use PascalCasing for class names and members and camelCasing for local variables.

Consistency is always helpful. So since you're already calling a(n) IList<UniqueId> 'uids' then why include those ugly underscores?

answered Jul 27, 2016 at 12:21
\$\endgroup\$
0

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.