I'm trying to write code that reads data from a CSV file and pushes it to a Salesforce application through its API. My code processes the data in a for loop, but the function takes a long time (about 3 hours) to run. What can I do to make it run faster?
Here's an example of my code, which reads patient diagnosis data from a flatfile of more than 200k records. Inside the for loop I query the patient list, which has 100k+ records, transform each object, then add it to a list for bulk processing. My code looks like this:
Iterating over ptdiags, which contains the flatfile data:
for (int i = 0; i < ptdiags.Count; i += BATCH_SIZE)
{
    var batchContents = SFToBTMapping.Bulk_PtDiag_Content(ptdiags.Skip(i).Take(BATCH_SIZE).ToList(), sfPatients);
    var batch = BulkUpsert(job.Id, batchContents);
}
The function that transforms the object. Here I query sfpatients to link a patient Id to the diagnosis object:
public static string Bulk_PtDiag_Content(List<Ptdiag> ptdiags, List<SfPatient__c> sfpatients)
{
    string res = "Patient__c,DiagKey__c,NickName__c" +
        ",Sequence__c,ShortDescr__c,PTDiagKey__c" + Environment.NewLine;
    foreach (var d in ptdiags)
    {
        var sfd = Map_BTSQL_Patientdiag_To_SF_Patientdiag(d);
        sfd.Patient__c = sfpatients.FirstOrDefault(c => c.PatientKey__c == d.Ptkey.ToString())?.Id;
        res += string.Join(",", sfd.Patient__c, sfd.DiagKey__c, sfd.NickName__c,
            sfd.Sequence__c, sfd.ShortDescr__c.Replace(",", ""), sfd.PTDiagKey__c);
        if (ptdiags.Last() != d)
            res += Environment.NewLine;
    }
    return res;
}
The method that creates the mapping for Ptdiag:
public static SfPatientDiag__c Map_BTSQL_Patientdiag_To_SF_Patientdiag(Ptdiag d)
{
    return new SfPatientDiag__c
    {
        DiagKey__c = d.Diagkey.ToString(),
        Diagnosis__r = new SfDiagnosis__c { Diagnosis_Key__c = d.Diagkey.ToString() },
        NickName__c = d.Nickname,
        Patient__r = new SfPatient__c { PatientKey__c = d.Ptkey.ToString() },
        Sequence__c = d.Sequence != null ? Convert.ToDouble(d.Sequence) : 0,
        ShortDescr__c = d.Shortdescr,
        PTDiagKey__c = d.Ptdiagkey.ToString()
    };
}
1 Answer
In addition to the changes from @aepot's post, I made the following change, which reduced the completion time significantly.
I used a dictionary instead of a list to look up the patient Id. Searching a large list of objects inside the loop is what was really slowing down the processing. Here's what the code looks like now:
// Build the patient lookup once, before the loop, rather than rebuilding it for every batch.
var patientsByKey = sfPatients.ToDictionary(p => p.PatientKey__c);
for (int i = 0; i < ptdiags.Count; i += BATCH_SIZE)
{
    var batchContents = SFToBTMapping.Bulk_PtDiag_Content(ptdiags.Skip(i).Take(BATCH_SIZE).ToList(), patientsByKey);
    var batch = BulkUpsert(job.Id, batchContents);
}
public static string Bulk_PtDiag_Content(List<Ptdiag> ptdiags, Dictionary<string, SfPatient__c> sfpatients)
{
    var sb = new StringBuilder();
    sb.AppendLine("Patient__c,DiagKey__c,NickName__c,Sequence__c,ShortDescr__c,PTDiagKey__c");
    foreach (Ptdiag d in ptdiags)
    {
        SfPatientDiag__c sfd = Map_BTSQL_Patientdiag_To_SF_Patientdiag(d);
        sfd.Patient__c = sfpatients.GetValueOrDefault(d.Ptkey.ToString()) != null ? sfpatients[d.Ptkey.ToString()].Id : "";
        sb.Append(string.Join(",", sfd.Patient__c, sfd.DiagKey__c, sfd.NickName__c,
            sfd.Sequence__c, sfd.ShortDescr__c.Replace(",", ""), sfd.PTDiagKey__c));
        if (ptdiags.Last() != d)
            sb.AppendLine();
    }
    return sb.ToString();
}
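One caveat worth noting here: ToDictionary throws an ArgumentException when two patients share the same PatientKey__c. If duplicates are possible in the source data, a sketch along these lines builds the same lookup while keeping the first occurrence of each key (Dictionary.TryAdd, like the GetValueOrDefault extension used above, requires .NET Core 2.0 or later):

// Sketch: tolerate duplicate PatientKey__c values. TryAdd keeps the first
// patient seen and ignores later duplicates, where ToDictionary would throw.
var patientsByKey = new Dictionary<string, SfPatient__c>();
foreach (var p in sfPatients)
    patientsByKey.TryAdd(p.PatientKey__c, p);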
Comments:
- try-catch is a bad practice in this case; use Dictionary.TryGetValue instead. – aepot, May 15, 2021 at 11:36
- Thanks for the suggestion. I have updated my code to use a dictionary null check. – Ahmed Mujtaba, May 15, 2021 at 13:53
- sfpatients.GetValueOrDefault(d.Ptkey)?.Id ?? "" would be enough. One attempt to read the value is twice as fast as two. – aepot, May 15, 2021 at 13:57
- Did you try commenting out the BulkUpsert method and running your code to see the results?
- Please share the BulkUpsert code as well.
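Applying aepot's last suggestion, the lookup line in Bulk_PtDiag_Content collapses to a single dictionary read. A sketch of the changed line (the ToString() call is kept because the dictionary in the answer's code is keyed by string):

// One lookup instead of two: GetValueOrDefault returns null when the key
// is absent, and ?. together with ?? turn that into an empty string.
sfd.Patient__c = sfpatients.GetValueOrDefault(d.Ptkey.ToString())?.Id ?? "";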