I'd use a simple List
or ArrayList
here. Vector is considered obsolete Vector is considered obsolete.
I'd use a simple List
or ArrayList
here. Vector is considered obsolete.
I'd use a simple List
or ArrayList
here. Vector is considered obsolete.
- First of all, two things to read or search for: Cluster analysis (just a hint, I'm not an expert about that) and Linked: The New Science of Networks by Albert-László Barabási.
Barabási in his book shows that networks usually have some nodes which have a lot more connections than the others. The distribution is not the same in the real world as the sample shell script generates.
The code is quite good, I like your variable and method names and separated methods. I wonder why haven't anyone reviewed it yet.
Vector<Set<String>> friendBuckets = bucketizeEmployees(employees);
I'd use a simple List
or ArrayList
here. Vector is considered obsolete.
- In the
getConnectedFriends
method the
Set<String> queuedFriends = new LinkedHashSet<String>();
could be a Queue. It has a poll
method. As far as I tested it's faster than the currently used iterator-based remove.
1.
public class Main {
Main
isn't a good class name. Everyone can have a Main
. What's it purpose? Try to find a more descriptive name.
1.
static HashMap<String, Set<String>> friendShips;
HashMap<...>
reference types should be simply Map<...>
. See: Effective Java, 2nd edition, Item 52: Refer to objects by their interfaces
Should I always use the private access modifier for class fields?; Item 13 of Effective Java 2nd Edition: Minimize the accessibility of classes and members.
1.
static HashMap<String, Set<String>> friendShips;
Instead of Map<String, Set<String>>
you could use Guava's Multimap
(doc, javadoc) which was designed exactly for that. It would reduce the size of the mapFriends
method:
public static void mapFriends(final String friendA, final String friendB,
final Multimap<String, String> friendsShipMap) {
friendsShipMap.put(friendA, friendB);
}
So, it could be removed.
1.
public static Vector<Set<String>> bucketizeEmployees(Set<String> employees) { ... }
This method calls getConnectedFriends(employee)
, which is the following:
private static Set<String> getConnectedFriends(String friend) { ... }
It's confusing: what is the difference between an employee and friend? Are they the same?
1.
if (friendShips.get(friend) != null) {
The following is the same:
if (friendShips.containsKey(friend)) {
A guard clause would be even better.
if (!friendShips.containsKey(friend)) {
return connectedFriends;
}
if (!connectedFriends.contains(directFriend) && !queuedFriends.contains(directFriend)) { queuedFriends.add(directFriend); }
The !queuedFriends.contains(directFriend)
condition is unnecessary, it's a set which can't contain elements twice and adding an already added element to a LinkedHashSet
doesn't modify anything. From the javadoc:
Note that insertion order is not affected if an element is re-inserted into the set.
The following pattern occurs more than once in the code:
if (map.containsKey(key)) { String value = map.get(key); ... } ...
It might be a microoptimization, but if you profile the code and the results shows it as a bottleneck the following structure is the same:
String value = map.get(key);
if (value != null) {
...
}
...
A few guard clause would help to make
getConnectedFriends
more flatten:while (!queuedFriends.isEmpty()) { final String poppedFriend = getHeadElement(queuedFriends); connectedFriends.add(poppedFriend); if (!friendShips.containsKey(poppedFriend)) { continue; } for (final String directFriend: friendShips.get(poppedFriend)) { if (connectedFriends.contains(directFriend)) { continue; } queuedFriends.add(directFriend); } }