BFS implementation to find connected components taking too long

Question 1

This is in continuation to a question I asked here. Given the total number of nodes (employees) and the adjacency list (friendship amongst employees), I need to find all the connected components.

public class Main {
 static HashMap<String, Set<String>> friendShips;
 public static void main(String[] args) throws IOException {
 BufferedReader in= new BufferedReader(new InputStreamReader(System.in));
 String dataLine = in.readLine();
 String[] lineParts = dataLine.split(" ");
 int employeeCount = Integer.parseInt(lineParts[0]);
 int friendShipCount = Integer.parseInt(lineParts[1]);
 friendShips = new HashMap<String, Set<String>>();
 for (int i = 0; i < friendShipCount; i++) {
 String friendShipLine = in.readLine();
 String[] friendParts = friendShipLine.split(" ");
 mapFriends(friendParts[0], friendParts[1], friendShips);
 mapFriends(friendParts[1], friendParts[0], friendShips);
 }
 Set<String> employees = new HashSet<String>();
 for (int i = 1; i <= employeeCount; i++) {
 employees.add(Integer.toString(i));
 }
 Vector<Set<String>> friendBuckets = bucketizeEmployees(employees);
 System.out.println(friendBuckets.size());
 }
 public static void mapFriends(String friendA, String friendB, Map<String, Set<String>> friendsShipMap) {
 if (friendsShipMap.containsKey(friendA)) {
 friendsShipMap.get(friendA).add(friendB);
 } else {
 Set<String> friends = new HashSet<String>();
 friends.add(friendB);
 friendsShipMap.put(friendA, friends);
 }
 }
 public static Vector<Set<String>> bucketizeEmployees(Set<String> employees) {
 Vector<Set<String>> friendBuckets = new Vector<Set<String>>();
 while (!employees.isEmpty()) {
 String employee = getHeadElement(employees);
 Set<String> connectedEmployeesBucket = getConnectedFriends(employee);
 friendBuckets.add(connectedEmployeesBucket);
 employees.removeAll(connectedEmployeesBucket);
 }
 return friendBuckets;
 }
 private static Set<String> getConnectedFriends(String friend) {
 Set<String> connectedFriends = new HashSet<String>();
 connectedFriends.add(friend);
 Set<String> queuedFriends = new LinkedHashSet<String>();
 if (friendShips.get(friend) != null) {
 queuedFriends.addAll(friendShips.get(friend));
 }
 while (!queuedFriends.isEmpty()) {
 String poppedFriend = getHeadElement(queuedFriends);
 connectedFriends.add(poppedFriend);
 if (friendShips.containsKey(poppedFriend))
 for (String directFriend : friendShips.get(poppedFriend)) {
 if (!connectedFriends.contains(directFriend) && !queuedFriends.contains(directFriend)) {
 queuedFriends.add(directFriend);
 }
 }
 }
 return connectedFriends;
 }
 private static String getHeadElement(Set<String> setFriends) {
 Iterator<String> iter = setFriends.iterator();
 String head = iter.next();
 iter.remove();
 return head;
 }
}

I have tested my code using the following script, the results of which I consume as sdtIn

#!/bin/bash
echo "100000 100000"
for i in {1..100000}
do
 r1=$(( $RANDOM % 100000 ))
 r2=$(( $RANDOM % 100000 ))
 echo "$r1 $r2"
 done

While I was able to verify (for trivial inputs) that my answer is correct, when I try with huge inputs as with the above script, I see that the run takes long (~20s).

Is there anything I can do better in my implementation ?

Question 2

Use a profiler to profile the code and find out where you spent the time. It is hard to guess, because a lot is happening there.

Question 3

The code is pretty fast, the server JDK (using command line parameter -server) is 2 times faster on my PC (C2D E7200) 12sec vs 24sec.

Question 4

First of all, two things to read or search for: Cluster analysis (just a hint, I'm not an expert about that) and Linked: The New Science of Networks by Albert-László Barabási.

Barabási in his book shows that networks usually have some nodes which have a lot more connections than the others. The distribution is not the same in the real world as the sample shell script generates.
The code is quite good, I like your variable and method names and separated methods. I wonder why haven't anyone reviewed it yet.

Vector<Set<String>> friendBuckets = bucketizeEmployees(employees);

I'd use a simple List or ArrayList here. Vector is considered obsolete.

In the getConnectedFriends method the
```
Set<String> queuedFriends = new LinkedHashSet<String>();
```
could be a Queue. It has a poll method. As far as I tested it's faster than the currently used iterator-based remove.
```
public class Main {
```
Main isn't a good class name. Everyone can have a Main. What's it purpose? Try to find a more descriptive name.
```
static HashMap<String, Set<String>> friendShips; 
```
HashMap<...> reference types should be simply Map<...>. See: Effective Java, 2nd edition, Item 52: Refer to objects by their interfaces

Should I always use the private access modifier for class fields?; Item 13 of Effective Java 2nd Edition: Minimize the accessibility of classes and members.

 static HashMap<String, Set<String>> friendShips;

Instead of Map<String, Set<String>> you could use Guava's Multimap (doc, javadoc) which was designed exactly for that. It would reduce the size of the mapFriends method:

public static void mapFriends(final String friendA, final String friendB,
 final Multimap<String, String> friendsShipMap) {
 friendsShipMap.put(friendA, friendB);
}

So, it could be removed.

```
public static Vector<Set<String>> bucketizeEmployees(Set<String> employees) {
 ...
}
```
This method calls getConnectedFriends(employee), which is the following:
```
private static Set<String> getConnectedFriends(String friend) {
 ...
}
```
It's confusing: what is the difference between an employee and friend? Are they the same?

if (friendShips.get(friend) != null) {

The following is the same:

if (friendShips.containsKey(friend)) {

A guard clause would be even better.

if (!friendShips.containsKey(friend)) {
 return connectedFriends;
}

```
if (!connectedFriends.contains(directFriend) && !queuedFriends.contains(directFriend)) {
 queuedFriends.add(directFriend);
}
```
The !queuedFriends.contains(directFriend) condition is unnecessary, it's a set which can't contain elements twice and adding an already added element to a LinkedHashSet doesn't modify anything. From the javadoc:

Note that insertion order is not affected if an element is re-inserted into the set.
The following pattern occurs more than once in the code:
```
if (map.containsKey(key)) {
 String value = map.get(key);
 ...
}
...
```
It might be a microoptimization, but if you profile the code and the results shows it as a bottleneck the following structure is the same:
```
String value = map.get(key); 
if (value != null) {
 ...
}
...
```

A few guard clause would help to make getConnectedFriends more flatten:

while (!queuedFriends.isEmpty()) {
 final String poppedFriend = getHeadElement(queuedFriends);
 connectedFriends.add(poppedFriend);
 if (!friendShips.containsKey(poppedFriend)) {
 continue;
 }
 for (final String directFriend: friendShips.get(poppedFriend)) {
 if (connectedFriends.contains(directFriend)) {
 continue;
 }
 queuedFriends.add(directFriend);
 }
}

palacsint palacsint 30.3k9 gold badges82 silver badges157 bronze badges · Answer 1 · 2014-03-05 22:49:53Z

First of all, two things to read or search for: Cluster analysis (just a hint, I'm not an expert about that) and Linked: The New Science of Networks by Albert-László Barabási.

Barabási in his book shows that networks usually have some nodes which have a lot more connections than the others. The distribution is not the same in the real world as the sample shell script generates.
The code is quite good, I like your variable and method names and separated methods. I wonder why haven't anyone reviewed it yet.

Vector<Set<String>> friendBuckets = bucketizeEmployees(employees);

I'd use a simple List or ArrayList here. Vector is considered obsolete.

In the getConnectedFriends method the
```
Set<String> queuedFriends = new LinkedHashSet<String>();
```
could be a Queue. It has a poll method. As far as I tested it's faster than the currently used iterator-based remove.
```
public class Main {
```
Main isn't a good class name. Everyone can have a Main. What's it purpose? Try to find a more descriptive name.
```
static HashMap<String, Set<String>> friendShips; 
```
HashMap<...> reference types should be simply Map<...>. See: Effective Java, 2nd edition, Item 52: Refer to objects by their interfaces

Should I always use the private access modifier for class fields?; Item 13 of Effective Java 2nd Edition: Minimize the accessibility of classes and members.

 static HashMap<String, Set<String>> friendShips;

Instead of Map<String, Set<String>> you could use Guava's Multimap (doc, javadoc) which was designed exactly for that. It would reduce the size of the mapFriends method:

public static void mapFriends(final String friendA, final String friendB,
 final Multimap<String, String> friendsShipMap) {
 friendsShipMap.put(friendA, friendB);
}

So, it could be removed.

```
public static Vector<Set<String>> bucketizeEmployees(Set<String> employees) {
 ...
}
```
This method calls getConnectedFriends(employee), which is the following:
```
private static Set<String> getConnectedFriends(String friend) {
 ...
}
```
It's confusing: what is the difference between an employee and friend? Are they the same?

if (friendShips.get(friend) != null) {

The following is the same:

if (friendShips.containsKey(friend)) {

A guard clause would be even better.

if (!friendShips.containsKey(friend)) {
 return connectedFriends;
}

```
if (!connectedFriends.contains(directFriend) && !queuedFriends.contains(directFriend)) {
 queuedFriends.add(directFriend);
}
```
The !queuedFriends.contains(directFriend) condition is unnecessary, it's a set which can't contain elements twice and adding an already added element to a LinkedHashSet doesn't modify anything. From the javadoc:

Note that insertion order is not affected if an element is re-inserted into the set.
The following pattern occurs more than once in the code:
```
if (map.containsKey(key)) {
 String value = map.get(key);
 ...
}
...
```
It might be a microoptimization, but if you profile the code and the results shows it as a bottleneck the following structure is the same:
```
String value = map.get(key); 
if (value != null) {
 ...
}
...
```

A few guard clause would help to make getConnectedFriends more flatten:

while (!queuedFriends.isEmpty()) {
 final String poppedFriend = getHeadElement(queuedFriends);
 connectedFriends.add(poppedFriend);
 if (!friendShips.containsKey(poppedFriend)) {
 continue;
 }
 for (final String directFriend: friendShips.get(poppedFriend)) {
 if (connectedFriends.contains(directFriend)) {
 continue;
 }
 queuedFriends.add(directFriend);
 }
}

Stack Exchange Network

BFS implementation to find connected components taking too long

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

BFS implementation to find connected components taking too long

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions