So I have the following code that takes the input of two arrays, and apply some queries to match elements from DBpediaClassesArray
with elements from YagoClassesArray
, then it returns the number of elements that are similar in the two ArrayLists.
public static void get_ND_Matches() throws IOException{
@SuppressWarnings("rawtypes")
List<String> DBpediaClassesArray = new ArrayList<String>();
List<String> YagoClassesArray = new ArrayList<String>();
DBpediaClassesArray = new ArrayList<String>( ClassesRetrieval.getDBpediaClasses());
YagoClassesArray = new ArrayList<String>( ClassesRetrieval.fileToArrayListYago());
String YagoClassWithMaxMatches="";
HashMap<String,Integer> mappingYagoLabelMaxValue = new HashMap<String,Integer>();
int NumberOfCommonInstances;
HashMap<String,ArrayList<String>> MappingYagoClassWithInstances = new HashMap<>();
for(String yagoClass:YagoClassesArray){
MappingYagoClassWithInstances.put(yagoClass, GetYagooClassInstances(yagoClass));
System.out.println("Done for : "+yagoClass );
}
for(String dbClass:DBpediaClassesArray){
ArrayList<String> DBpediaInstancesArray = GetDBpediaClassInstances(dbClass);
for(Map.Entry<String, ArrayList<String>> YagoMapEntry : MappingYagoClassWithInstances.entrySet()){
String yagoClass=YagoMapEntry.getKey();
Set<String> IntersectionSet =Sets.intersection(Sets.newHashSet(DBpediaInstancesArray), Sets.newHashSet(YagoMapEntry.getValue()));
//System.out.println(dbClass + " and "+ yagoClass+ " = "+ IntersectionSet.size());
NumberOfCommonInstances = IntersectionSet.size();
mappingYagoLabelMaxValue.put(yagoClass, NumberOfCommonInstances);
}
int MaximumValueOfIntersection=(Collections.max(mappingYagoLabelMaxValue.values()));
for(Entry<String, Integer> YagoLabelToValueEntry:mappingYagoLabelMaxValue.entrySet()){
if(YagoLabelToValueEntry.getValue()==MaximumValueOfIntersection && MaximumValueOfIntersection != 0){
YagoClassWithMaxMatches = YagoLabelToValueEntry.getKey();
}
if(MaximumValueOfIntersection==0){
YagoClassWithMaxMatches = "Nothing in yago";
}
}
System.out.println("-------------------------------");
System.out.println(dbClass+" Corresponds to "+ YagoClassWithMaxMatches+" with : "+MaximumValueOfIntersection);
System.out.println("-------------------------------");
}
}
This code returns for example:
Actor from DBPEDIA Corresponds to Yago_Actor
Album from DBPEDIA Corresponds to Yago_Album
SomeClass from DBPEDIA Corresponds to nothing in Yago
Etc..
Behind the scenes, this code uses getDBpediaClasses
and then applies GetDBpediaClassInstances
method to get an ArrayList
of results for each class. Each ArrayList resulted is then compared to another ArrayList
generated by GetYagooClassInstances
for each class of fileToArrayListYago()
.
Now, because of all the calculations made in the background (there are millions of elements in each array), this process takes hours to execute, because it runs each comparison after another, while it would be better to get them all done at once and then just get the YagoClassWithMaxMatches
after all of them have been computed.
I was thinking of concurrency, but I first wanted to see if there is an easier way.
Are there any suggestions to make the code run faster?
2 Answers 2
First I must encourage you to follow java conventions. Local variables and methods should be typed in lowerCamelCase
. You shouldn't also assign a variable twice if you can avoid it for example you shouldn't do this
List<String> DBpediaClassesArray = new ArrayList<String>(); DBpediaClassesArray = new ArrayList<String>( ClassesRetrieval.getDBpediaClasses());
You could also guive a better name to this variable, the array sufix isn't doing much there, I would prefer dbpediaClasses. You should do this instead:
List<String> dbpediaClasses= new ArrayList<String>( ClassesRetrieval.getDBpediaClasses());
About your implementation:
You are in the wrong path! It seems to me that you aren't writing code in the simplest way to achieve the expected behaviour. You are creating way to many maps in order to simply find a max number of intersections.
I wouldn't put my hands on fire for it, because I can't test it, but I think that your code is equivalent to this:
public static void getNdMatches() throws IOException{
List<String> dbpediaClasses = new ArrayList<String>( ClassesRetrieval.getdbpediaClasses());
List<String> yagoClasses = new ArrayList<String>( ClassesRetrieval.fileToArrayListYago());
HashMap<String, Set<String>> cachedYagos = new HashMap<String, Set<String>>();
String yagoClassWithMaxMatches="";
for(String dbClass:dbpediaClasses){
int max = 0;
Set<String> dbClassSet = Sets.newHashSet(GetDBpediaClassInstances(dbClass));
for(String yagoClass:yagoClasses){
Set<String> yagoSet = cachedYagos.get(yagoClass);
if(yagoSet == null){
yagoSet = Sets.newHashSet(GetYagooClassInstances(yagoClass);
cachedYagos.put(yagoClass, yagoSet);
}
Set<String> IntersectionSet = Sets.intersection(dbClassSet, yagoSet);
if(IntersectionSet.size() > max){
max = IntersectionSet.size();
yagoClassWithMaxMatches = yagoClass
}
}
if(max == 0){
System.out.println(dbClass+" Corresponds to nothing");
}else{
System.out.println(dbClass+" Corresponds to "+ yagoClassWithMaxMatches+" with : "+max);
}
}
}
-
\$\begingroup\$ Actually your method is less performant because it has to repeat
GetYagooClassInstances(yagoClass)
for each loop ofdbClass
\$\endgroup\$callback– callback2014年09月07日 13:18:36 +00:00Commented Sep 7, 2014 at 13:18 -
\$\begingroup\$ @user1350162 if that takes time then you should create an hashmap to cache the results. The point of this review was to increase the readibility of your code. You didn't ever mention which the methods that were troubling your execution time so I couldn't guess and take that into account in my answer. Should I update my answer acoordingly? Or do you know how to solve the problem? \$\endgroup\$Bruno Costa– Bruno Costa2014年09月07日 13:26:07 +00:00Commented Sep 7, 2014 at 13:26
-
\$\begingroup\$ Thanks replying! It would e great if you could update your answer in a way that stores the results of that function in a map! \$\endgroup\$callback– callback2014年09月07日 13:32:56 +00:00Commented Sep 7, 2014 at 13:32
-
\$\begingroup\$ @user1350162 voila! \$\endgroup\$Bruno Costa– Bruno Costa2014年09月07日 13:38:49 +00:00Commented Sep 7, 2014 at 13:38
I am not going to review the logic of your code because it is very difficult to read for a Java developer.
Please use basic Java conventions. It might seem like a small detail to you, but it really does make the code nearly unreadable to a Java developer. Quick list: camel case notation, spacing, avoiding static methods and using the most general type when defining variables (
Map
instead ofHashMap
).Give more thought to variable names. Again, this seems like a small detail, but naming things is actually a huge part of clean software development.
Break things down using OO, or at least methods. This looks like imperative C code and it is painful to read for a Java developer.
Small details:
@SuppressWarnings("rawtypes")
?You set some values to
DBpediaClassesArray
andYagoClassesArray
, and you overwrite them the following line.
Sorry for being harsh, but it really is difficult for me to follow the logic of your code because of the code style.
Explore related questions
See similar questions with these tags.