I have a piece of code to migrate data from an old app to a new app. The data represents the user's websites.
In the old app the user only had one section for websites, to which you could add any website, including social links.
In the new app, the user can add personal websites, social links (only Facebook, Twitter and LinkedIn are considered social links) and what we call custom social networks (e.g. Reddit, Meetup, Tumblr, etc.).
This is the code:
public async Task ImportWebsites(int userId)
{
    string[] socialNetworksStr =
    {
        "linkedin.",
        "twitter.",
        "facebook.",
    };

    string[] customSocialNetworksStr =
    {
        "deviantart.",
        "viadeo.",
        "reddit.",
        "xing.",
        "pinterest.",
        "tumblr.",
    };

    List<UserLinkDTO> personalWebsites = new List<UserLinkDTO>();
    List<UserSocialNetworkDTO> socialNetworks = new List<UserSocialNetworkDTO>();
    List<UserLinkDTO> customSocialNetworks = new List<UserLinkDTO>();
    List<UserLinkDTO> personalWebsitesRemoved = new List<UserLinkDTO>();

    using (OldContext ctx = new OldContext())
    {
        if (await (ctx.UserSettings.AnyAsync(u => u.UserId == userId)) == false)
            throw new RelatedEntityNotFoundException($"{userId} not be found!");

        personalWebsites = await ctx.PersonalWebsite.AsNoTracking()
            .Where(w => w.UserId == userId)
            .Select(w => new UserLinkDTO
            {
                UserId = w.UserId,
                Description = w.PersonalWebsiteDescription,
                Url = w.PersonalWebsiteURL,
                Name = w.PersonalWebsiteName
            }).ToListAsync().ConfigureAwait(false);
    }

    for (int i = 0; i < personalWebsites.Count; i++)
    {
        string url = personalWebsites[i].Url;
        if (!url.StartsWith("http://"))
        {
            personalWebsites[i].Url = "http://" + url;
        }
    }

    if (personalWebsites.Any())
    {
        foreach (var personalWebsite in personalWebsites)
        {
            foreach (var socialNetwork in socialNetworksStr)
            {
                if (personalWebsite.Url.Contains(socialNetwork))
                {
                    socialNetworks.Add(new UserSocialNetworkDTO()
                    {
                        SocialNetworkId = (short)(Array.IndexOf(socialNetworksStr, socialNetwork) + 1),
                        Account = personalWebsite.Url
                    });
                    personalWebsitesRemoved.Add(personalWebsite);
                }
            }

            foreach (var customSocialNetwork in customSocialNetworksStr)
            {
                if (personalWebsite.Url.Contains(customSocialNetwork))
                {
                    customSocialNetworks.Add(personalWebsite);
                    personalWebsitesRemoved.Add(personalWebsite);
                }
            }
        }

        personalWebsites = personalWebsites.Except(personalWebsitesRemoved).ToList();

        var personalInfo = await PersonalInfoService.GetPersonalInfo(userId);
        personalInfo.Result.PersonalWebsites = personalWebsites;
        if (socialNetworks.Any()) personalInfo.Result.SocialNetworks = socialNetworks;
        if (customSocialNetworks.Any()) personalInfo.Result.CustomSocialNetworks = customSocialNetworks;

        var response = await PersonalInfoService.SavePersonalInfo(personalInfo.Result);
    }
}
My code does its job, but it takes some time in those foreach loops, and I think it's a little redundant and could be optimized. How would you improve this?
PS: I think I included all the necessary info; sorry if I forgot something.
- Could you post the complete method? I don't see any recursion there. – t3chb0t, Nov 10, 2016 at 18:20
- If I'm understanding it correctly, your foreach loops are O(n^2). – EJoshuaS - Stand with Ukraine, Nov 10, 2016 at 19:24
- @t3chb0t there's not much more left, I've edited the code. Thanks for the interest. – Dimitri, Nov 11, 2016 at 9:31
2 Answers
Reducing loops
You iterate over the personalWebsites twice: first in the for loop

for (var i = 0; i < personalWebsites.Count; i++)

and once again in the foreach

foreach (var personalWebsite in personalWebsites)

but this is not necessary. You can do everything with only one loop by adding the http:// prefix inside the second loop, so that personalWebsites is iterated only once.
Btw, I'll remove this check first because it's not necessary:

if (personalWebsites.Any())

If the collection is empty, the loop simply won't run even once. You use a list here, so Any() doesn't do any harm, but if you had a deferred IEnumerable then Any() would execute it, and if it performs some heavy work this could be very bad for the performance.
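For instance (a contrived sketch; ExpensiveLookup is a made-up placeholder for some costly call, not anything from the original code), a deferred sequence gets enumerated once by Any() and a second time by the foreach:

var ids = new[] { 1, 2, 3 };

// The Select is deferred, so nothing runs until the sequence is enumerated.
IEnumerable<string> urls = ids.Select(id => ExpensiveLookup(id));

if (urls.Any())                   // starts enumerating: ExpensiveLookup already runs here
{
    foreach (var url in urls)     // enumerates again from the start
    {
        Console.WriteLine(url);
    }
}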
I'll also remove personalWebsitesRemoved because you don't need it either. You can save the results directly into a final list that I call websitesWithoutSocial, which only receives the websites that didn't match any social network. Once you've found the url you can break out of the inner loop; there's no point in checking the other addresses.
var websitesWithoutSocial = new List<UserLinkDTO>();

foreach (var personalWebsite in personalWebsites)
{
    var url = personalWebsite.Url;
    if (!url.StartsWith("http://"))
    {
        url = $"http://{url}";
        personalWebsite.Url = url; // keep the normalized url on the DTO
    }

    var isSocial = false;
    foreach (var socialNetwork in socialNetworksStr)
    {
        if ((isSocial = url.Contains(socialNetwork)))
        {
            socialNetworks.Add(new UserSocialNetworkDTO()
            {
                SocialNetworkId = (short)(Array.IndexOf(socialNetworksStr, socialNetwork) + 1),
                Account = personalWebsite.Url
            });
            break;
        }
    }
    if (isSocial) { continue; }

    foreach (var customSocialNetwork in customSocialNetworksStr)
    {
        if ((isSocial = url.Contains(customSocialNetwork)))
        {
            customSocialNetworks.Add(personalWebsite);
            break;
        }
    }
    if (isSocial) { continue; }

    websitesWithoutSocial.Add(personalWebsite);
}
Now that everything is already separated into multiple lists, you don't need the Except either and you can save the results right away:
var personalInfo = (await PersonalInfoService.GetPersonalInfo(userId)).Result;
personalInfo.PersonalWebsites = websitesWithoutSocial;
personalInfo.SocialNetworks = socialNetworks;
personalInfo.CustomSocialNetworks = customSocialNetworks;
var response = await PersonalInfoService.SavePersonalInfo(personalInfo);
Using Parallel.ForEach
If this is still slow even with the simple improvements, you can try Parallel.ForEach, but be careful: you need to use different collections that are thread-safe. This should further improve the performance.
Here's an example:
var websitesWithoutSocial = new BlockingCollection<UserLinkDTO>();
var socialNetworks = new BlockingCollection<UserSocialNetworkDTO>();
var customSocialNetworks = new BlockingCollection<UserLinkDTO>();

Parallel.ForEach(personalWebsites, personalWebsite =>
{
    var url = personalWebsite.Url;
    if (!url.StartsWith("http://"))
    {
        url = $"http://{url}";
        personalWebsite.Url = url;
    }

    foreach (var socialNetwork in socialNetworksStr)
    {
        if (url.Contains(socialNetwork))
        {
            socialNetworks.Add(new UserSocialNetworkDTO()
            {
                SocialNetworkId = (short)(Array.IndexOf(socialNetworksStr, socialNetwork) + 1),
                Account = personalWebsite.Url
            });
            return;
        }
    }

    foreach (var customSocialNetwork in customSocialNetworksStr)
    {
        if (url.Contains(customSocialNetwork))
        {
            customSocialNetworks.Add(personalWebsite);
            return;
        }
    }

    websitesWithoutSocial.Add(personalWebsite);
});
You'll find the BlockingCollection<T> in the System.Collections.Concurrent namespace.
With the Parallel.ForEach loop you don't need the isSocial helper, because the body is an Action (an anonymous method, a delegate), so you can safely return and skip the rest of it.
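A quick illustration of that behaviour (just a toy example with numbers, unrelated to the original code): return inside the Parallel.ForEach body only ends the current iteration, exactly like continue in a normal foreach.

Parallel.ForEach(Enumerable.Range(1, 10), n =>
{
    if (n % 2 == 0)
    {
        return; // behaves like "continue": only skips the current item
    }
    Console.WriteLine(n);
});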
RelatedEntityNotFoundException
if (await (ctx.UserSettings.AnyAsync(u => u.UserId == userId)) == false)
{
    throw new RelatedEntityNotFoundException($"{userId} not be found!");
}

This method shouldn't check the user id. If it doesn't exist, the method won't do anything useful anyway. Check the user id before calling ImportWebsites.
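At the call site it could look something like this (a sketch only; userService, UserExistsAsync and importer are placeholder names, not part of the original code):

// Hypothetical call site: validate the user first, then import.
if (!await userService.UserExistsAsync(userId))
{
    throw new RelatedEntityNotFoundException($"User {userId} could not be found!");
}

await importer.ImportWebsites(userId);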
Any
Lastly, you can rewrite the inner loops and the isSocial workaround with LINQ:

if (socialNetworksStr.Any(x => url.Contains(x)))
{
    var socialNetwork = socialNetworksStr.First(x => url.Contains(x));
    socialNetworks.Add(new UserSocialNetworkDTO()
    {
        SocialNetworkId = (short)(Array.IndexOf(socialNetworksStr, socialNetwork) + 1),
        Account = personalWebsite.Url
    });
    continue;
}

if (customSocialNetworksStr.Any(x => url.Contains(x)))
{
    customSocialNetworks.Add(personalWebsite);
    continue;
}

websitesWithoutSocial.Add(personalWebsite);
If you use this snippet in the Parallel.ForEach, you may also replace the continue with return.
- Great answer, complete and educational. Do you need the if (isSocial) { continue; } if it's a parallel foreach? (Which would be if (isSocial) { return; } in a parallel foreach.) – Dimitri, Nov 17, 2016 at 8:30
- @SGN you're right, you don't need it in the Parallel.ForEach because the body is an Action (an anonymous method), but I suggest you use the Any solution without the inner loops. I find it easier to understand, and even there in the parallel loop you can use return. I'll fix the answer. – t3chb0t, Nov 17, 2016 at 8:36
- @SGN I hope you could improve the performance with this at least a little bit ;-) – t3chb0t, Nov 17, 2016 at 8:41
- @SGN yes, this can happen because there's a lot of synchronization inside it. It's hard to predict how well or badly it will perform because it depends on how the various urls are distributed. I included it for completeness so that you can try it out. I think there is no easier way to split those urls into the various categories programmatically. Should it have to be super-fast, you'll need to change the database models to store the relevant url part as a column and join it with another table that has them categorized. This would outperform everything else, but I don't know if it's an option for you. – t3chb0t, Nov 17, 2016 at 8:49
The following code is O(n^2). More precisely, it's Theta(n^2), because it will always take exactly n^2 steps, not just at most n^2.
foreach (var personalWebsite in personalWebsites)
{
    // You can use a HashSet to eliminate the need for the inner loops, which
    // would reduce your computational complexity from Theta(n^2) to Theta(n) - see below
    foreach (var socialNetwork in socialNetworksStr)
    {
        // You could "convert" both of these to some kind of "normal form" (see below)
        // to eliminate the need for the relatively inefficient "Contains" and enable
        // the use of a HashSet
        if (personalWebsite.Url.Contains(socialNetwork))
        {
            socialNetworks.Add(new UserSocialNetworkDTO()
            {
                SocialNetworkId = (short)(Array.IndexOf(socialNetworksStr, socialNetwork) + 1),
                Account = personalWebsite.Url
            });
            personalWebsitesRemoved.Add(personalWebsite);
            // If you do stick with a loop here, presumably each personal web site only
            // matches one social media site, so you can break out of the loop here to
            // improve your computational complexity for this bit from Theta(n) to O(n)
        }
    }

    // Same comments as above
    foreach (var customSocialNetwork in customSocialNetworksStr)
    {
        if (personalWebsite.Url.Contains(customSocialNetwork))
        {
            customSocialNetworks.Add(personalWebsite);
            personalWebsitesRemoved.Add(personalWebsite);
        }
    }
}
You can reduce this to O(n) by reducing the URLs to some kind of "normal form" (e.g. http://www.linkedin.com/in/some.person would just be "normalized" to "linkedin.com") and building a Dictionary (hash table) or HashSet. This would reduce the computational complexity from Theta(n^2) to Theta(n).
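Roughly, the idea could look like this (a sketch only; the normalizeHost helper and the exact id mapping are assumptions for illustration, not something from the original code):

// Map "normalized" hosts straight to ids / categories, so the per-website
// check is a hash lookup instead of a scan over all network names.
var socialNetworkIds = new Dictionary<string, short>(StringComparer.OrdinalIgnoreCase)
{
    ["linkedin.com"] = 1,
    ["twitter.com"] = 2,
    ["facebook.com"] = 3,
};
var customSocialNetworkHosts = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "deviantart.com", "viadeo.com", "reddit.com", "xing.com", "pinterest.com", "tumblr.com",
};

// "http://www.linkedin.com/in/some.person" -> "linkedin.com"
// (assumes the url already carries the http:// prefix added earlier)
Func<string, string> normalizeHost = url =>
{
    var host = new Uri(url).Host;
    return host.StartsWith("www.") ? host.Substring(4) : host;
};

foreach (var personalWebsite in personalWebsites)
{
    var host = normalizeHost(personalWebsite.Url);
    short socialNetworkId;

    if (socialNetworkIds.TryGetValue(host, out socialNetworkId))
    {
        socialNetworks.Add(new UserSocialNetworkDTO
        {
            SocialNetworkId = socialNetworkId,
            Account = personalWebsite.Url
        });
        personalWebsitesRemoved.Add(personalWebsite);
    }
    else if (customSocialNetworkHosts.Contains(host))
    {
        customSocialNetworks.Add(personalWebsite);
        personalWebsitesRemoved.Add(personalWebsite);
    }
}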
Also, even if you do stick with the "inner" foreach loops, there's no need to iterate over all of the social network sites every single time; I assume that each personal web site can only match one social media site, so once you find a match you can break out of the loop, which reduces the computational complexity from Theta(n^2) to O(n^2).
- Thank you very much. I do admit I don't know how to use a HashSet, or how to get O(n^2) from Theta(n^2), but everything sounds valid. I will get back to you once I've applied your recommendations. – Dimitri, Nov 11, 2016 at 8:17
- @SGN It's very similar to using a hash table/dictionary. In terms of the computational complexity, remember that Theta(n) means "exactly n" whereas O(n) means "at worst n". – EJoshuaS - Stand with Ukraine, Nov 11, 2016 at 15:54