Read Twitter users from a text file

Question 1

I have this code for reading Twitter users from a text file represented by integers in the format "int (space) int", where the first int is the ID of the user doing the following and the second int is the user being followed. It works for small files but takes a really long time with larger files. What can I do to make it more efficient?

users is an ArrayList

public void open() {
 try {
 if (file.exists()) {
 FileReader fileReader = new FileReader(file);
 BufferedReader bufferedReader = new BufferedReader(fileReader);
 String line;
 String[] strings;
 while ((line = bufferedReader.readLine()) != null) {
 strings = line.split(" ");
 String user1 = strings[0];
 String user2 = strings[1];
 if (!users.toString().contains(user1)) {
 TwitterUser tUser1 = new TwitterUser(
 Integer.parseInt(user1));
 users.add(tUser1);
 TwitterUser tUser2 = new TwitterUser(
 Integer.parseInt(user2));
 tUser1.follow(tUser2);
 } else {
 TwitterUser target = null;
 for (TwitterUser i : users)
 if (i.toString().equals(user1))
 target = i;
 TwitterUser tUser3 = new TwitterUser(
 Integer.parseInt(user2));
 target.follow(tUser3);
 }
 }
 fileReader.close();
 }
 } catch (IOException e) {
 e.printStackTrace();
 }
}

Question 2

Welcome to Code Review! I hope you get some helpful answers.

Question 3

Please include the definition of users.

Question 4

private ArrayList<TwitterUser> users = new ArrayList<TwitterUser>();

Question 5

users.toString().contains(user1)

This is not a very optimal way of checking if a user from the file is in your users object. For starters, ArrayList.toString() will generate the String representation every time, and you may run into false positives. For example, if the toString() representation of a TwitterUser contains a comma, which is the delimiter used by ArrayList:

String user1 = user1.toString(); // = "John, Doe"
String user2 = user2.toString(); // = "Jane, Doe"
// Just to make sure we are using the ArrayList implementation
List<String> users = new ArrayList<>(Arrays.asList(user1, user2));
String allUsers = users.toString(); // = "[John, Doe, Jane, Doe]"
allUsers.contains("Doe, Jane"); // = true

Sure, you are actually comparing numeric values, but consider this then:

String user11 = user11.toString(); // = "11"
String user22 = user22.toString(); // = "22"
// Just to make sure we are using the ArrayList implementation
List<String> users = new ArrayList<>(Arrays.asList(user11, user22));
String allUsers = users.toString(); // = "[11, 22]"
allUsers.contains("1"); // = true

The point here is that ideally, your TwitterUser class should be constructed first so that you can use one of its method to test for equivalence, using equals(Object) for example:

public class TwitterUser {
 // ...
 public boolean equals(Object o) {
 // assuming id is a primitive value
 return o instanceof TwitterUser && ((TwitterUser) o).id == id;
 }
}

Again, you should avoid using toString()-based comparison below for testing equivalence due to the 'false positives' point. Furthermore, you should not skimp on braces ({ ... }), and break early once you have a match:

for (TwitterUser i : users) {
 TwitterUser twitterUser = // ...
 if (i.equals(twitterUser)) {
 target = i;
 break;
 }
}

Hold your horses! If you have implemented TwitterUser.equals() to compare by the ID, you wouldn't even need the for-loop:

TwitterUser user = new TwitterUser(Integer.parseInt(user1));
if (!users.contains(user)) {
 users.add(user);
 user.follow(new TwitterUser(Integer.parseInt(user2)));
} else {
 TwitterUser currentUser = users.get(users.indexOf(user));
 currentUser.follow(new TwitterUser(Integer.parseInt(user2)));
}

To make the performance better, you can consider using a Set instead of a List.

To avoid the huge nested if-block after if (file.exists()), consider return-ing early:

public void open() {
 if (!file.exists()) {
 return;
 }
 try {
 // ...
 }
}

Ideally, the File object should be passed to this method so that the method callers know what exact file is being opened here.

edit: If you claim that the processing still tends to infinity, consider reworking your process:

Read all the lines from the file (even if it runs into the millions, it's just two numbers per line).
Use a number of background threads to process the lines concurrently. You'll need a Runnable/Callable implementation, an ExecutorService and a thread-safe Collection implementation. This is probably more suited for another question/answer though...

Question 6

I tried using your code and my program takes still takes infinite time to run. Could you show me an example of what the final code should look like? Thanks.

Question 7

Define infinite. How large is your file? Millions of lines?

Question 8

I'ts a 160MB text file. I'm saying infinite because it never actually runs, it keeps on loading forever.

Question 9

See my edit. You may want to rework your process.

Question 10

Then you should seriously consider switching to a Set, if possible. Checking if a Set contains an object is O(1), whereas it's O(n) for List. For more info, see this StackOverflow answer.

h.j.k. h.j.k. 19.4k3 gold badges37 silver badges93 bronze badges · Answer 1 · 2015-10-19 00:58:59Z

users.toString().contains(user1)

This is not a very optimal way of checking if a user from the file is in your users object. For starters, ArrayList.toString() will generate the String representation every time, and you may run into false positives. For example, if the toString() representation of a TwitterUser contains a comma, which is the delimiter used by ArrayList:

String user1 = user1.toString(); // = "John, Doe"
String user2 = user2.toString(); // = "Jane, Doe"
// Just to make sure we are using the ArrayList implementation
List<String> users = new ArrayList<>(Arrays.asList(user1, user2));
String allUsers = users.toString(); // = "[John, Doe, Jane, Doe]"
allUsers.contains("Doe, Jane"); // = true

Sure, you are actually comparing numeric values, but consider this then:

String user11 = user11.toString(); // = "11"
String user22 = user22.toString(); // = "22"
// Just to make sure we are using the ArrayList implementation
List<String> users = new ArrayList<>(Arrays.asList(user11, user22));
String allUsers = users.toString(); // = "[11, 22]"
allUsers.contains("1"); // = true

The point here is that ideally, your TwitterUser class should be constructed first so that you can use one of its method to test for equivalence, using equals(Object) for example:

public class TwitterUser {
 // ...
 public boolean equals(Object o) {
 // assuming id is a primitive value
 return o instanceof TwitterUser && ((TwitterUser) o).id == id;
 }
}

Again, you should avoid using toString()-based comparison below for testing equivalence due to the 'false positives' point. Furthermore, you should not skimp on braces ({ ... }), and break early once you have a match:

for (TwitterUser i : users) {
 TwitterUser twitterUser = // ...
 if (i.equals(twitterUser)) {
 target = i;
 break;
 }
}

Hold your horses! If you have implemented TwitterUser.equals() to compare by the ID, you wouldn't even need the for-loop:

TwitterUser user = new TwitterUser(Integer.parseInt(user1));
if (!users.contains(user)) {
 users.add(user);
 user.follow(new TwitterUser(Integer.parseInt(user2)));
} else {
 TwitterUser currentUser = users.get(users.indexOf(user));
 currentUser.follow(new TwitterUser(Integer.parseInt(user2)));
}

To make the performance better, you can consider using a Set instead of a List.

To avoid the huge nested if-block after if (file.exists()), consider return-ing early:

public void open() {
 if (!file.exists()) {
 return;
 }
 try {
 // ...
 }
}

Ideally, the File object should be passed to this method so that the method callers know what exact file is being opened here.

edit: If you claim that the processing still tends to infinity, consider reworking your process:

Read all the lines from the file (even if it runs into the millions, it's just two numbers per line).
Use a number of background threads to process the lines concurrently. You'll need a Runnable/Callable implementation, an ExecutorService and a thread-safe Collection implementation. This is probably more suited for another question/answer though...

I tried using your code and my program takes still takes infinite time to run. Could you show me an example of what the final code should look like? Thanks.
I'ts a 160MB text file. I'm saying infinite because it never actually runs, it keeps on loading forever.
Then you should seriously consider switching to a Set, if possible. Checking if a Set contains an object is O(1), whereas it's O(n) for List. For more info, see this StackOverflow answer.

Stack Exchange Network

Read Twitter users from a text file

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Read Twitter users from a text file

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions