For a list of treeDataGroups
with 5 TreeDataGroup
objects, each treeDataGroup
object contains a maximum 100 dataAndVar
and 100 dataNotVar
. The execution of removeFullyUnexpectedData
takes 11466 ms. I would like to reduce that and I know the only way to reduce that the adopted algorithm and the way that I write the code. The treeDataGroups
parameter contains sorted elements. Can someone suggest how I can improve this algorithm? My goal is to reduce this execution time even just for few seconds only.
What I am trying to do is:
I have list of VG object (it is a TreeDataGroup
object), let us say:
I have 5 elements VG in my lTree
list: VG0
, VG1
, VG2
, VG3
, VG4
objects. Each object contains an andVar
String list and a notVar
String list. The firstListIsCompletelyContainedInMainList
method allows to check if list2
(the second parameter of firstListIsCompletelyContainedInMainList function
) contains list1
, which is the first parameter of the firstListIsCompletelyContainedInMainList
function.
Know that list1
(the first parameter of the firstListIsCompletelyContainedInMainList
function) can be null or empty and list2
(the second parameter of the firstListIsCompletelyContainedInMainList
function) as well can be null or empty. So if list1
is null or empty, method 1 returns true, else if list2
is null or empty, we return false. Otherwise, I need to check if list2
contains all elements of list1
. I need to compare each element of my list, with all other elements of my list except to the current element itself.
And if the current element if different from the other element:
If I start from the end of my list:
Let us say, for VG4
and VG3
:
- For x equal to 4, I need to compare
VG4
andVG3
--> check ifVG4.andVar
containsVG3.andVar
, then check ifVG3.notVar
containsVG4.notVar
. If both conditions are true, then I removeVG4
from thetreeDataGroups
list. - If the current element is
VG4
, I need to check if theVG4.andVar
string list contains allVG3.andVar
string list (for that I need to check if both lists are not null or empty as I explained above)). If yes (true), I will need to check if theVG3.notVar
string list contains allVG4.notVar
string lists. For that, I need to check if both lists are not null or empty as I explained above)). If yes (true), I need to removeVG4
from mytreeDataGroups
list. - Then x equal to 3, I need to compare
VG3
andVG4
--> check ifVG3.andVar
containsVG4.andVar
then check ifVG4.notVar
containsVG3.notVar
. If both conditions are true, then I removeVG3
fromtreeDataGroupslist
.
Then x equal to 2 I need to compare (VG2 and VG3).....
x equal to 2 I need to compare (VG2 and VG4)
Then x equal to 1 I need to compare (VG1 and VG2)
x equal to 1 I need to compare (VG1 and VG3)
x equal to 1 I need to compare (VG1 and VG4)
Then x equal to 0 I need to compare (VG0 and VG1)
x equal to 0 I need to compare (VG0 and VG2)
x equal to 0 I need to compare (VG0 and VG3)
x equal to 0 I need to compare (VG0 and VG4)
public List<TreeDataGroup> removeFullyUnexpectedData(List<TreeDataGroup> treeDataGroups) {
for (int x = treeDataGroups.size()-1; x >= 0; x--) {
TreeDataGroup treeDg1 = treeDataGroups.get(x);
for (int y = treeDataGroups.size()-1; y >= 0; y--) {
if (y != x) {
TreeDataGroup treeDg2 = treeDataGroups.get(y);
//treeDg1.getDataAndVar and treeDg1.getDataNotVar are a list of String
if (firstListIsCompletelyContainedInMainList(treeDg2.getDataAndVar(), treeDg1.getDataAndVar) {
if (firstListIsCompletelyContainedInMainList(treeDg1.getDataNotVar(), treeDg2.getDataNotVar) {
treeDataGroups.remove(x);
break;
}
}
}
}
}
return treeDataGroups;
}
// Are the items in the search list completely contained in the main list...
public static boolean firstListIsCompletelyContainedInMainList(List<String> list1, List<String> list2) {
if (isStringListNullOrEmpty(list1)) {
return true;
}
if (isStringListNullOrEmpty(list2)) {
return false;
}
for (String item : list1) {
if (!list2.contains(item)) {
return false;
}
}
return true;
}
public static boolean isStringListNullOrEmpty(List<String> stringList) {
if ((stringList == null || stringList.isEmpty())) {
return true;
} else {
return false;
}
}
-
1\$\begingroup\$ Could you please change the title to what your code is actually doing? Like what is the purpose of this code? The optimization part is kind of given here on Code Review. \$\endgroup\$holroy– holroy2017年04月20日 15:35:02 +00:00Commented Apr 20, 2017 at 15:35
-
\$\begingroup\$ @Jade_Layyne Just write a short summary of what your code is doing in your title please. (to ping people use @) \$\endgroup\$πάντα ῥεῖ– πάντα ῥεῖ2017年04月20日 16:29:23 +00:00Commented Apr 20, 2017 at 16:29
-
1\$\begingroup\$ I'm gonna say, this is a very low-level explanation. And wordy as hell. Most of it i could already figure out more easily by reading the code. :P You'd do well to zoom out a bit and tell us semantically what you're doing.. \$\endgroup\$cHao– cHao2017年04月20日 19:28:54 +00:00Commented Apr 20, 2017 at 19:28
2 Answers 2
getData
is no better thandoStuff()
as the name of a function.getDataNotVar
andgetDataAndVar
are only slightly better, and are useless to me, as i don't know whatData
orVar
you'reget
ting. :PfirstListIsCompletelyContainedInMainList
is reinventing the wheel. Collections have acontainsAll
method. Use that.public static boolean firstListIsCompletelyContainedInMainList(List<String> list1, List<String> list2) { if (isStringListNullOrEmpty(list1)) return true; if (isStringListNullOrEmpty(list1)) return false; return list2.containsAll(list1); }
But we can do better.
You're using a suboptimal type for the type of operation you're doing here. If you stuff the items into a
HashSet<String>
, and check against that, your check goes from O(NM) to O(N+M).public static boolean firstListIsCompletelyContainedInMainList(List<String> list1, List<String> list2) { if (isStringListNullOrEmpty(list1)) return true; if (isStringListNullOrEmpty(list1)) return false; Set<String> mainContents = new HashSet<>(list2); return mainContents.containsAll(list1); }
Ideally, you'd be using
Set<String>
rather thanList<String>
anyway, if this is the main operation you're performing on these "lists". I can't say whether that's a good idea in your particular case, though, as you haven't shown anything else you're doing.When you're returning the results of conditional tests, do it directly. Don't say
if (x) return true; else return false;
. Justreturn x;
.public static boolean isStringListNullOrEmpty(List<String> stringList) { return (stringList == null || stringList.isEmpty()); }
Ideally, though, you'd make it so
getDataAndVar()
andgetDataNotVar()
always return an empty collection rather thannull
. (It feels a bit hinky when you don't know whether you have an object or not.) Once you do that, this function can go away entirely...as can the null checks infirstListIsCompletelyContainedInMainList
. (You might keep an.isEmpty()
check if you're trying to optimize, but any savings might not be worth the extra code. If you care, measure it.)By the way, your names are extremely wordy. Maybe that's just a Java thing. :P But
firstListIsCompletelyContainedInMainList
could be replaced by, say,listContainsList
if you swaplist1
andlist2
around.
-
\$\begingroup\$ Thank you very much cHao but is there any way to get rid of this nested for loop (2 levels of for loop)? It is very expensive \$\endgroup\$Jade_Layyne– Jade_Layyne2017年04月20日 19:24:11 +00:00Commented Apr 20, 2017 at 19:24
-
\$\begingroup\$ You mean the one in
removeFullyUnexpectedData
, that i haven't bothered with yet because my ADD won't let me read and understand your description? :) Zoom out and describe the process, not the code. \$\endgroup\$cHao– cHao2017年04月20日 19:32:05 +00:00Commented Apr 20, 2017 at 19:32 -
\$\begingroup\$ My list contains of VG objects and a VG has andVar and notVar attributes(String list). My goal is to remove a VG element from my list if the following 2 checks above are satisfied 1- if andVar String list of each element contains andVar string list of one element in my list 2- if condition 1 is satisfied, I need to make sure that the notVar string list of my current element is contained in one notVar list of one VG element of my list. \$\endgroup\$Jade_Layyne– Jade_Layyne2017年04月20日 20:17:17 +00:00Commented Apr 20, 2017 at 20:17
-
\$\begingroup\$ @Jade_Layyne: Dunno what a "VG object" is. But without some idea of what the "andVar" and "notVar" lists are for/about, and the nature of the items within them, or what you're trying to do rather than how you're trying to do it, i can't say how to optimize your algorithm any further. It appears that your comparison operation is not commutative (ie: you can't assume a OP b == b OP a), which disqualifies a couple of possible optimizations. Further optimization would probably have to involve the
TreeDataGroup
class. \$\endgroup\$cHao– cHao2017年04月20日 20:48:26 +00:00Commented Apr 20, 2017 at 20:48 -
1\$\begingroup\$ What i care about right now are the higher-level details. Do these lists have to be lists? Are they sorted? Does order matter? Can they have duplicate values? What's the meaning of the comparison you're doing (ie: what does it tell you)? etc. \$\endgroup\$cHao– cHao2017年04月20日 21:15:57 +00:00Commented Apr 20, 2017 at 21:15
First of all, let me point out that removing elements from a list that you're iterating over is pretty dangerous in general.
The outer loop itself isn't a problem, since you're going over the indices starting at the end and going to the start. Removing the item at the current index doesn't have any effect on the indices that will be handled in the next iterations.
However, when we nest 2 for loops iterating over all the elements we would run into problems.
Let's say you have the list {A,B,C,D}. We ignore comparing D to D. Next we compare D to C and conclude that C can be removed. Next we compare D to B and conclude B can be removed. Finally compare D to A and do nothing. Now the next x will be 2. But since we removed B and C from our list the function list.get(2)
will throw an IndexOutOfBoundsException.
It seems you tried to solve this problem by using a break
statement to jump out of the inner for
loop whenever you remove an item. But this has introduced a new mistake.
Let's say in the same list you are comparing C to D and conclude that D can be removed. You now break out of the inner loop and continue with B. But by doing so you skipped comparing C to B and comparing C to A.
So then what can we do to solve this correctly? Instead of directly removing the VG's from the list we should just mark the VG's for removal in the first pass.
Then after the outer for loop has ended, we loop over all the marked VG's again and safely remove them from the list.
We can also simplify the loops a little bit if we check the contains both ways at once. The code will look like this:
List<TreeDataGroup> markedForRemoval = new ArrayList<>();
for (int x = treeDataGroups.size()-1; x >= 0; x--) {
TreeDataGroup treeDg1 = treeDataGroups.get(x);
//**important** notice the y>x instead of y>=0
for (int y = treeDataGroups.size()-1; y > x; y--) {
TreeDataGroup treeDg2 = treeDataGroups.get(y);
if (treeDg1 fully contains treeDg2) {
markedForRemoval.add(treeDg2);
}
if (treeDg2 fully contains treeDg1) {
markedForRemoval.add(treeDg1);
}
}
}
for(TreeDataGroup group : markedForRemoval){
treeDataGroups.remove(group);
}
If I understood your initial explanation correctly the lists inside the datagroups are sorted. So I think we can do both checks at the same time with something like this: (disclaimer: I did not test this code, you might have to correct it to get it to work). Note that I'm also assuming the lists are non-null but default to an empty list like @cHao suggested in his answer.
private void markGroupForRemoval(TreeDataGroup group1, TreeDataGroup group2, List<TreeDataGroup> markedGroups){
boolean markg1 = true;
boolean markg2 = true;
int i1 = 0;
int i2 = 0;
while(i1 < group1.getDataAndVar().length
&& i2 < group2.getDataAndVar().length
&& (markg1 || markg2)){
if(group1.getDataAndVar().get(i1) == group2.getDataAndVar().get(i2)){
i1++;
i2++;
} else if(group1.getDataAndVar().get(i1) < group2.getDataAndVar().get(i2)){ //note, you should change this to a string comparison I believe
//group 1 contains an element that group2 does not.
markg1 = false;
i1 ++;
} else {
//group 2 contains an element that group1 does not.
markg2 = false;
i2 ++;
}
}
//do the same thing for getDataNotVar, but mark the oposite.
i1 = 0;
i2 = 0;
while(i1 < group1.getDataNotVar().length
&& i2 < group2.getDataNotVar getDataNotVar().length
&& (markg1 || markg2)){
if(group1.getDataNotVar().get(i1) == group2.getDataNotVar().get(i2)){
i1++;
i2++;
} else if(group1.getDataNotVar().get(i1) < group2.getDataNotVar().get(i2)){ //note, you should change this to a string comparison I believe
//group 1 contains a not-element that group2 does not.
markg2 = false;
i1 ++;
} else {
//group 1 contains a not-element that group2 does not.
markg1 = false;
i2 ++;
}
}
if(markg1){
markedGroups.add(group1);
}
if(markg2){
markedGroups.add(group2);
}
}
Notice here that as soon as we marked both groups as still needed (so both markg1 and markg2 are false) then we quit the while loop early.
If you can get this to work it should speed things up a bit (or a lot, if the early exit while checking happens fast consistently).
You'll need to at least replace the <
with the correct comparison check. (StringComparitor?). Maybe group2.getDataAndVar().length should be group2.getDataAndVar().size() instead. And check the logic because I might have switched the cases around by accident.
If you're familiar with merge sort this algorithm should be relatively easy to understand.
EDIT: as per suggestion by @cHao, we're now comparing VG's even if they're already marked for removal. Since the checks themselves are rather expensive it's better to avoid that.
My first solution for this would be something like this:
List<TreeDataGroup> markedForRemoval = new ArrayList<>();
for (int x = treeDataGroups.size()-1; x >= 0; x--) {
TreeDataGroup treeDg1 = treeDataGroups.get(x);
for (int y = treeDataGroups.size()-1; y > x; y--) {
TreeDataGroup treeDg2 = treeDataGroups.get(y);
if(marmarkedForRemoval.contains(treeDG2) {
continue;
}
markGroupForRemoval(treeDG1, treeDG2, markedForRemoval);
if(markedForRemoval.contains(treeDG1){
break;
}
}
}
Where the check for treeDG2 results in skipping that one in the inner loop only (if we would break out of the loop we get the same missing checks like before). And the check for treeDG1 right after the marking means that since the VG of the outer loop is marked, there's no reason to continue checking that one against the rest of the inner loop.
But we can actually do a little better than this. Remember that I said that removing in a single for loop is safe? Let's use that fact and place the removal inside the outer for loop:
List<TreeDataGroup> markedForRemoval = new ArrayList<>();
for (int x = treeDataGroups.size()-1; x >= 0; x--) {
TreeDataGroup treeDg1 = treeDataGroups.get(x);
for (int y = treeDataGroups.size()-1; y > x; y--) {
TreeDataGroup treeDg2 = treeDataGroups.get(y);
markGroupForRemoval(treeDG1, treeDG2, markedForRemoval);
if(markedForRemoval.contains(treeDG1){
break;
}
}
for(TreeDataGroup tdg : markedForRemoval){
treeDataGroups.remove(tdg);
}
markedForRemoval.clear();
}
Now we're sure that we check all combinations but without doing any unnecessary checks.
I did however spot a mistake thanks to @cHao's comment. If the 2 VG's are exactly the same, both will be marked for removal. This is not what we want. So let's fix this at the end of the markGroupForRemoval
method
if(markg1){
markedGroups.add(group1);
return; //important. If both are equal, we shouldn't remove them both.
}
if(markg2){
markedGroups.add(group2);
}
}
-
\$\begingroup\$ For your first bit, i was worried about the outer loop falling off the end too. But the inner loop breaks when it's found something to remove, so you're removing a max of one item per inner-loop run. It's pretty ugly, but it's not as broken as it looks. :) \$\endgroup\$cHao– cHao2017年04月20日 22:44:56 +00:00Commented Apr 20, 2017 at 22:44
-
\$\begingroup\$ @cHao I did explain this in the post didn't I? That it is indeed not that broken, but you might be skipping some combinations that could result in removing groups. In practice this might indeed not be that bad if removing groups is rather rare. \$\endgroup\$Imus– Imus2017年04月21日 06:27:26 +00:00Commented Apr 21, 2017 at 6:27
-
\$\begingroup\$ "But it's the nested loop that might cause problems." \$\endgroup\$cHao– cHao2017年04月21日 14:24:10 +00:00Commented Apr 21, 2017 at 14:24
-
\$\begingroup\$ Is the problem not clear in my answer? In that case I should reword it, but I need to find out what is bad about it first. My answer contains 2 parts. The first assumes not having the
break
statement which can result in anIndexOutOfBoundsException
. The second part takes note of the break statement ensuring the code won't crash, but now we don't consider all possible combinations of VG objects anymore. Thus the result isn't entirely correct. \$\endgroup\$Imus– Imus2017年04月21日 14:32:19 +00:00Commented Apr 21, 2017 at 14:32 -
\$\begingroup\$ The bit about the
IndexOutOfBoundsException
is off, i think, and maybe out of place. The way it's worded makes it seem like that is an actual issue, which it isn't here. \$\endgroup\$cHao– cHao2017年04月21日 14:55:17 +00:00Commented Apr 21, 2017 at 14:55