This is in continuation to a question I asked here. Given the total number of nodes (employees) and the adjacency list (friendship amongst employees), I need to find all the connected components.
public class Main {
static HashMap<String, Set<String>> friendShips;
public static void main(String[] args) throws IOException {
BufferedReader in= new BufferedReader(new InputStreamReader(System.in));
String dataLine = in.readLine();
String[] lineParts = dataLine.split(" ");
int employeeCount = Integer.parseInt(lineParts[0]);
int friendShipCount = Integer.parseInt(lineParts[1]);
friendShips = new HashMap<String, Set<String>>();
for (int i = 0; i < friendShipCount; i++) {
String friendShipLine = in.readLine();
String[] friendParts = friendShipLine.split(" ");
mapFriends(friendParts[0], friendParts[1], friendShips);
mapFriends(friendParts[1], friendParts[0], friendShips);
}
Set<String> employees = new HashSet<String>();
for (int i = 1; i <= employeeCount; i++) {
employees.add(Integer.toString(i));
}
Vector<Set<String>> friendBuckets = bucketizeEmployees(employees);
System.out.println(friendBuckets.size());
}
public static void mapFriends(String friendA, String friendB, Map<String, Set<String>> friendsShipMap) {
if (friendsShipMap.containsKey(friendA)) {
friendsShipMap.get(friendA).add(friendB);
} else {
Set<String> friends = new HashSet<String>();
friends.add(friendB);
friendsShipMap.put(friendA, friends);
}
}
public static Vector<Set<String>> bucketizeEmployees(Set<String> employees) {
Vector<Set<String>> friendBuckets = new Vector<Set<String>>();
while (!employees.isEmpty()) {
String employee = getHeadElement(employees);
Set<String> connectedEmployeesBucket = getConnectedFriends(employee);
friendBuckets.add(connectedEmployeesBucket);
employees.removeAll(connectedEmployeesBucket);
}
return friendBuckets;
}
private static Set<String> getConnectedFriends(String friend) {
Set<String> connectedFriends = new HashSet<String>();
connectedFriends.add(friend);
Set<String> queuedFriends = new LinkedHashSet<String>();
if (friendShips.get(friend) != null) {
queuedFriends.addAll(friendShips.get(friend));
}
while (!queuedFriends.isEmpty()) {
String poppedFriend = getHeadElement(queuedFriends);
connectedFriends.add(poppedFriend);
if (friendShips.containsKey(poppedFriend))
for (String directFriend : friendShips.get(poppedFriend)) {
if (!connectedFriends.contains(directFriend) && !queuedFriends.contains(directFriend)) {
queuedFriends.add(directFriend);
}
}
}
return connectedFriends;
}
private static String getHeadElement(Set<String> setFriends) {
Iterator<String> iter = setFriends.iterator();
String head = iter.next();
iter.remove();
return head;
}
}
I have tested my code using the following script, the results of which I consume as sdtIn
#!/bin/bash
echo "100000 100000"
for i in {1..100000}
do
r1=$(( $RANDOM % 100000 ))
r2=$(( $RANDOM % 100000 ))
echo "$r1 $r2"
done
While I was able to verify (for trivial inputs) that my answer is correct, when I try with huge inputs as with the above script, I see that the run takes long (~20s).
Is there anything I can do better in my implementation ?
-
\$\begingroup\$ Use a profiler to profile the code and find out where you spent the time. It is hard to guess, because a lot is happening there. \$\endgroup\$tb-– tb-2013年03月16日 10:20:57 +00:00Commented Mar 16, 2013 at 10:20
-
\$\begingroup\$ The code is pretty fast, the server JDK (using command line parameter -server) is 2 times faster on my PC (C2D E7200) 12sec vs 24sec. \$\endgroup\$cat_baxter– cat_baxter2013年04月22日 12:02:29 +00:00Commented Apr 22, 2013 at 12:02
1 Answer 1
First of all, two things to read or search for: Cluster analysis (just a hint, I'm not an expert about that) and Linked: The New Science of Networks by Albert-László Barabási.
Barabási in his book shows that networks usually have some nodes which have a lot more connections than the others. The distribution is not the same in the real world as the sample shell script generates.
The code is quite good, I like your variable and method names and separated methods. I wonder why haven't anyone reviewed it yet.
Vector<Set<String>> friendBuckets = bucketizeEmployees(employees);
I'd use a simple
List
orArrayList
here. Vector is considered obsolete.In the
getConnectedFriends
method theSet<String> queuedFriends = new LinkedHashSet<String>();
could be a Queue. It has a
poll
method. As far as I tested it's faster than the currently used iterator-based remove.public class Main {
Main
isn't a good class name. Everyone can have aMain
. What's it purpose? Try to find a more descriptive name.static HashMap<String, Set<String>> friendShips;
HashMap<...>
reference types should be simplyMap<...>
. See: Effective Java, 2nd edition, Item 52: Refer to objects by their interfacesShould I always use the private access modifier for class fields?; Item 13 of Effective Java 2nd Edition: Minimize the accessibility of classes and members.
static HashMap<String, Set<String>> friendShips;
Instead of
Map<String, Set<String>>
you could use Guava'sMultimap
(doc, javadoc) which was designed exactly for that. It would reduce the size of themapFriends
method:public static void mapFriends(final String friendA, final String friendB, final Multimap<String, String> friendsShipMap) { friendsShipMap.put(friendA, friendB); }
So, it could be removed.
public static Vector<Set<String>> bucketizeEmployees(Set<String> employees) { ... }
This method calls
getConnectedFriends(employee)
, which is the following:private static Set<String> getConnectedFriends(String friend) { ... }
It's confusing: what is the difference between an employee and friend? Are they the same?
if (friendShips.get(friend) != null) {
The following is the same:
if (friendShips.containsKey(friend)) {
A guard clause would be even better.
if (!friendShips.containsKey(friend)) { return connectedFriends; }
if (!connectedFriends.contains(directFriend) && !queuedFriends.contains(directFriend)) { queuedFriends.add(directFriend); }
The
!queuedFriends.contains(directFriend)
condition is unnecessary, it's a set which can't contain elements twice and adding an already added element to aLinkedHashSet
doesn't modify anything. From the javadoc:Note that insertion order is not affected if an element is re-inserted into the set.
The following pattern occurs more than once in the code:
if (map.containsKey(key)) { String value = map.get(key); ... } ...
It might be a microoptimization, but if you profile the code and the results shows it as a bottleneck the following structure is the same:
String value = map.get(key); if (value != null) { ... } ...
A few guard clause would help to make
getConnectedFriends
more flatten:while (!queuedFriends.isEmpty()) { final String poppedFriend = getHeadElement(queuedFriends); connectedFriends.add(poppedFriend); if (!friendShips.containsKey(poppedFriend)) { continue; } for (final String directFriend: friendShips.get(poppedFriend)) { if (connectedFriends.contains(directFriend)) { continue; } queuedFriends.add(directFriend); } }
Explore related questions
See similar questions with these tags.