4
\$\begingroup\$

I have this code for reading Twitter users from a text file represented by integers in the format "int (space) int", where the first int is the ID of the user doing the following and the second int is the user being followed. It works for small files but takes a really long time with larger files. What can I do to make it more efficient?

users is an ArrayList

public void open() {
 try {
 if (file.exists()) {
 FileReader fileReader = new FileReader(file);
 BufferedReader bufferedReader = new BufferedReader(fileReader);
 String line;
 String[] strings;
 while ((line = bufferedReader.readLine()) != null) {
 strings = line.split(" ");
 String user1 = strings[0];
 String user2 = strings[1];
 if (!users.toString().contains(user1)) {
 TwitterUser tUser1 = new TwitterUser(
 Integer.parseInt(user1));
 users.add(tUser1);
 TwitterUser tUser2 = new TwitterUser(
 Integer.parseInt(user2));
 tUser1.follow(tUser2);
 } else {
 TwitterUser target = null;
 for (TwitterUser i : users)
 if (i.toString().equals(user1))
 target = i;
 TwitterUser tUser3 = new TwitterUser(
 Integer.parseInt(user2));
 target.follow(tUser3);
 }
 }
 fileReader.close();
 }
 } catch (IOException e) {
 e.printStackTrace();
 }
}
RubberDuck
31.1k6 gold badges73 silver badges176 bronze badges
asked Oct 19, 2015 at 0:14
\$\endgroup\$
3
  • \$\begingroup\$ Welcome to Code Review! I hope you get some helpful answers. \$\endgroup\$ Commented Oct 19, 2015 at 0:17
  • \$\begingroup\$ Please include the definition of users. \$\endgroup\$ Commented Oct 19, 2015 at 0:24
  • \$\begingroup\$ private ArrayList<TwitterUser> users = new ArrayList<TwitterUser>(); \$\endgroup\$ Commented Oct 19, 2015 at 0:33

1 Answer 1

4
\$\begingroup\$
users.toString().contains(user1)

This is not a very optimal way of checking if a user from the file is in your users object. For starters, ArrayList.toString() will generate the String representation every time, and you may run into false positives. For example, if the toString() representation of a TwitterUser contains a comma, which is the delimiter used by ArrayList:

String user1 = user1.toString(); // = "John, Doe"
String user2 = user2.toString(); // = "Jane, Doe"
// Just to make sure we are using the ArrayList implementation
List<String> users = new ArrayList<>(Arrays.asList(user1, user2));
String allUsers = users.toString(); // = "[John, Doe, Jane, Doe]"
allUsers.contains("Doe, Jane"); // = true

Sure, you are actually comparing numeric values, but consider this then:

String user11 = user11.toString(); // = "11"
String user22 = user22.toString(); // = "22"
// Just to make sure we are using the ArrayList implementation
List<String> users = new ArrayList<>(Arrays.asList(user11, user22));
String allUsers = users.toString(); // = "[11, 22]"
allUsers.contains("1"); // = true

The point here is that ideally, your TwitterUser class should be constructed first so that you can use one of its method to test for equivalence, using equals(Object) for example:

public class TwitterUser {
 // ...
 public boolean equals(Object o) {
 // assuming id is a primitive value
 return o instanceof TwitterUser && ((TwitterUser) o).id == id;
 }
}

Again, you should avoid using toString()-based comparison below for testing equivalence due to the 'false positives' point. Furthermore, you should not skimp on braces ({ ... }), and break early once you have a match:

for (TwitterUser i : users) {
 TwitterUser twitterUser = // ...
 if (i.equals(twitterUser)) {
 target = i;
 break;
 }
}

Hold your horses! If you have implemented TwitterUser.equals() to compare by the ID, you wouldn't even need the for-loop:

TwitterUser user = new TwitterUser(Integer.parseInt(user1));
if (!users.contains(user)) {
 users.add(user);
 user.follow(new TwitterUser(Integer.parseInt(user2)));
} else {
 TwitterUser currentUser = users.get(users.indexOf(user));
 currentUser.follow(new TwitterUser(Integer.parseInt(user2)));
}

To make the performance better, you can consider using a Set instead of a List.

To avoid the huge nested if-block after if (file.exists()), consider return-ing early:

public void open() {
 if (!file.exists()) {
 return;
 }
 try {
 // ...
 }
}

Ideally, the File object should be passed to this method so that the method callers know what exact file is being opened here.

edit: If you claim that the processing still tends to infinity, consider reworking your process:

  1. Read all the lines from the file (even if it runs into the millions, it's just two numbers per line).
  2. Use a number of background threads to process the lines concurrently. You'll need a Runnable/Callable implementation, an ExecutorService and a thread-safe Collection implementation. This is probably more suited for another question/answer though...
answered Oct 19, 2015 at 0:58
\$\endgroup\$
6
  • \$\begingroup\$ I tried using your code and my program takes still takes infinite time to run. Could you show me an example of what the final code should look like? Thanks. \$\endgroup\$ Commented Oct 19, 2015 at 1:32
  • \$\begingroup\$ Define infinite. How large is your file? Millions of lines? \$\endgroup\$ Commented Oct 19, 2015 at 1:33
  • \$\begingroup\$ I'ts a 160MB text file. I'm saying infinite because it never actually runs, it keeps on loading forever. \$\endgroup\$ Commented Oct 19, 2015 at 1:35
  • \$\begingroup\$ See my edit. You may want to rework your process. \$\endgroup\$ Commented Oct 19, 2015 at 1:36
  • 1
    \$\begingroup\$ Then you should seriously consider switching to a Set, if possible. Checking if a Set contains an object is O(1), whereas it's O(n) for List. For more info, see this StackOverflow answer. \$\endgroup\$ Commented Oct 19, 2015 at 1:57

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.