I'm trying to write code that reads data from a CSV file and pushes it to a Salesforce application through its API. My code processes the data in a for loop, but the function takes a long time (about 3 hours) to run. What can I do to make it run faster?
Here's an example of my code, which reads patient diagnosis data from a flatfile of more than 200k records. Inside the for loop I query the patient list, which has 100k+ records, transform each object, then add it to a list for bulk processing. My code looks like this:
Iterating over ptdiags, which contains the flatfile data:
for (int i = 0; i < ptdiags.Count; i += BATCH_SIZE)
{
    var batchContents = SFToBTMapping.Bulk_PtDiag_Content(ptdiags.Skip(i).Take(BATCH_SIZE).ToList(), sfPatients);
    var batch = BulkUpsert(job.Id, batchContents);
}
The function that transforms the object. Here I query sfpatients to link a patient Id to the diagnosis object:
public static string Bulk_PtDiag_Content(List<Ptdiag> ptdiags, List<SfPatient__c> sfpatients)
{
    string res = "Patient__c,DiagKey__c,NickName__c" +
        ",Sequence__c,ShortDescr__c,PTDiagKey__c" + Environment.NewLine;
    foreach (var d in ptdiags)
    {
        var sfd = Map_BTSQL_Patientdiag_To_SF_Patientdiag(d);
        sfd.Patient__c = sfpatients.FirstOrDefault(c => c.PatientKey__c == d.Ptkey.ToString())?.Id;
        res += string.Join(",", sfd.Patient__c, sfd.DiagKey__c, sfd.NickName__c,
            sfd.Sequence__c, sfd.ShortDescr__c.Replace(",", ""), sfd.PTDiagKey__c);
        if (ptdiags.Last() != d)
            res += Environment.NewLine;
    }
    return res;
}
The method that creates the mapping for Ptdiag:
public static SfPatientDiag__c Map_BTSQL_Patientdiag_To_SF_Patientdiag(Ptdiag d)
{
    return new SfPatientDiag__c
    {
        DiagKey__c = d.Diagkey.ToString(),
        Diagnosis__r = new SfDiagnosis__c { Diagnosis_Key__c = d.Diagkey.ToString() },
        NickName__c = d.Nickname,
        Patient__r = new SfPatient__c { PatientKey__c = d.Ptkey.ToString() },
        Sequence__c = d.Sequence != null ? Convert.ToDouble(d.Sequence) : 0,
        ShortDescr__c = d.Shortdescr,
        PTDiagKey__c = d.Ptdiagkey.ToString()
    };
}
1 Answer
In addition to the changes from @aepot's post, I made the following change, which reduced the completion time significantly.
I used a dictionary instead of a list to look up the patient Id. Searching a large list of objects inside the loop is what was really slowing down the processing. Here's what the code looks like now:
// Build the patient lookup once, before the loop, rather than rebuilding it for every batch.
var patientsByKey = sfPatients.ToDictionary(p => p.PatientKey__c);
for (int i = 0; i < ptdiags.Count; i += BATCH_SIZE)
{
    var batchContents = SFToBTMapping.Bulk_PtDiag_Content(ptdiags.Skip(i).Take(BATCH_SIZE).ToList(), patientsByKey);
    var batch = BulkUpsert(job.Id, batchContents);
}
public static string Bulk_PtDiag_Content(List<Ptdiag> ptdiags, Dictionary<string, SfPatient__c> sfpatients)
{
    var sb = new StringBuilder();
    sb.AppendLine("Patient__c,DiagKey__c,NickName__c,Sequence__c,ShortDescr__c,PTDiagKey__c");
    foreach (Ptdiag d in ptdiags)
    {
        SfPatientDiag__c sfd = Map_BTSQL_Patientdiag_To_SF_Patientdiag(d);
        sfd.Patient__c = sfpatients.GetValueOrDefault(d.Ptkey.ToString()) != null ? sfpatients[d.Ptkey.ToString()].Id : "";
        sb.Append(string.Join(",", sfd.Patient__c, sfd.DiagKey__c, sfd.NickName__c,
            sfd.Sequence__c, sfd.ShortDescr__c.Replace(",", ""), sfd.PTDiagKey__c));
        if (ptdiags.Last() != d)
            sb.AppendLine();
    }
    return sb.ToString();
}
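One caveat worth noting here: ToDictionary throws an ArgumentException when two patients share the same PatientKey__c. If duplicates are possible in the source data, a sketch along these lines builds the same lookup while keeping the first occurrence of each key (Dictionary.TryAdd, like the GetValueOrDefault extension used above, requires .NET Core 2.0 or later):

// Sketch: tolerate duplicate PatientKey__c values. TryAdd keeps the first
// patient seen and ignores later duplicates, where ToDictionary would throw.
var patientsByKey = new Dictionary<string, SfPatient__c>();
foreach (var p in sfPatients)
    patientsByKey.TryAdd(p.PatientKey__c, p);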
Comments:
- try-catch is a bad practice in this case; use Dictionary.TryGetValue instead. – aepot, May 15, 2021 at 11:36
- Thanks for the suggestion. I have updated my code to use a dictionary null check. – Ahmed Mujtaba, May 15, 2021 at 13:53
- sfpatients.GetValueOrDefault(d.Ptkey)?.Id ?? "" would be enough. One attempt to read the value is twice as fast as two. – aepot, May 15, 2021 at 13:57
- Did you try commenting out the BulkUpsert method and running your code to see the results?
- Please share the BulkUpsert code as well.
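Applying aepot's last suggestion, the lookup line in Bulk_PtDiag_Content collapses to a single dictionary read. A sketch of the changed line (the ToString() call is kept because the dictionary in the answer's code is keyed by string):

// One lookup instead of two: GetValueOrDefault returns null when the key
// is absent, and ?. together with ?? turn that into an empty string.
sfd.Patient__c = sfpatients.GetValueOrDefault(d.Ptkey.ToString())?.Id ?? "";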