I have this code for reading Twitter users from a text file represented by integers in the format "int (space) int", where the first int
is the ID of
the user doing the following and the second int
is the user being followed. It works for small files but takes a really long time with larger files. What can I do to make it more efficient?
users
is an ArrayList
public void open() {
try {
if (file.exists()) {
FileReader fileReader = new FileReader(file);
BufferedReader bufferedReader = new BufferedReader(fileReader);
String line;
String[] strings;
while ((line = bufferedReader.readLine()) != null) {
strings = line.split(" ");
String user1 = strings[0];
String user2 = strings[1];
if (!users.toString().contains(user1)) {
TwitterUser tUser1 = new TwitterUser(
Integer.parseInt(user1));
users.add(tUser1);
TwitterUser tUser2 = new TwitterUser(
Integer.parseInt(user2));
tUser1.follow(tUser2);
} else {
TwitterUser target = null;
for (TwitterUser i : users)
if (i.toString().equals(user1))
target = i;
TwitterUser tUser3 = new TwitterUser(
Integer.parseInt(user2));
target.follow(tUser3);
}
}
fileReader.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
1 Answer 1
users.toString().contains(user1)
This is not a very optimal way of checking if a user from the file is in your users
object. For starters, ArrayList.toString()
will generate the String
representation every time, and you may run into false positives. For example, if the toString()
representation of a TwitterUser
contains a comma, which is the delimiter used by ArrayList
:
String user1 = user1.toString(); // = "John, Doe"
String user2 = user2.toString(); // = "Jane, Doe"
// Just to make sure we are using the ArrayList implementation
List<String> users = new ArrayList<>(Arrays.asList(user1, user2));
String allUsers = users.toString(); // = "[John, Doe, Jane, Doe]"
allUsers.contains("Doe, Jane"); // = true
Sure, you are actually comparing numeric values, but consider this then:
String user11 = user11.toString(); // = "11"
String user22 = user22.toString(); // = "22"
// Just to make sure we are using the ArrayList implementation
List<String> users = new ArrayList<>(Arrays.asList(user11, user22));
String allUsers = users.toString(); // = "[11, 22]"
allUsers.contains("1"); // = true
The point here is that ideally, your TwitterUser
class should be constructed first so that you can use one of its method to test for equivalence, using equals(Object)
for example:
public class TwitterUser {
// ...
public boolean equals(Object o) {
// assuming id is a primitive value
return o instanceof TwitterUser && ((TwitterUser) o).id == id;
}
}
Again, you should avoid using toString()
-based comparison below for testing equivalence due to the 'false positives' point. Furthermore, you should not skimp on braces ({ ... }
), and break
early once you have a match:
for (TwitterUser i : users) {
TwitterUser twitterUser = // ...
if (i.equals(twitterUser)) {
target = i;
break;
}
}
Hold your horses! If you have implemented TwitterUser.equals()
to compare by the ID, you wouldn't even need the for
-loop:
TwitterUser user = new TwitterUser(Integer.parseInt(user1));
if (!users.contains(user)) {
users.add(user);
user.follow(new TwitterUser(Integer.parseInt(user2)));
} else {
TwitterUser currentUser = users.get(users.indexOf(user));
currentUser.follow(new TwitterUser(Integer.parseInt(user2)));
}
To make the performance better, you can consider using a Set
instead of a List
.
To avoid the huge nested if
-block after if (file.exists())
, consider return
-ing early:
public void open() {
if (!file.exists()) {
return;
}
try {
// ...
}
}
Ideally, the File
object should be passed to this method so that the method callers know what exact file is being opened here.
edit: If you claim that the processing still tends to infinity, consider reworking your process:
- Read all the lines from the file (even if it runs into the millions, it's just two numbers per line).
- Use a number of background threads to process the lines concurrently. You'll need a
Runnable
/Callable
implementation, anExecutorService
and a thread-safeCollection
implementation. This is probably more suited for another question/answer though...
-
\$\begingroup\$ I tried using your code and my program takes still takes infinite time to run. Could you show me an example of what the final code should look like? Thanks. \$\endgroup\$shishi– shishi2015年10月19日 01:32:40 +00:00Commented Oct 19, 2015 at 1:32
-
\$\begingroup\$ Define infinite. How large is your file? Millions of lines? \$\endgroup\$h.j.k.– h.j.k.2015年10月19日 01:33:18 +00:00Commented Oct 19, 2015 at 1:33
-
\$\begingroup\$ I'ts a 160MB text file. I'm saying infinite because it never actually runs, it keeps on loading forever. \$\endgroup\$shishi– shishi2015年10月19日 01:35:50 +00:00Commented Oct 19, 2015 at 1:35
-
\$\begingroup\$ See my edit. You may want to rework your process. \$\endgroup\$h.j.k.– h.j.k.2015年10月19日 01:36:55 +00:00Commented Oct 19, 2015 at 1:36
-
1\$\begingroup\$ Then you should seriously consider switching to a
Set
, if possible. Checking if aSet
contains an object isO(1)
, whereas it'sO(n)
forList
. For more info, see this StackOverflow answer. \$\endgroup\$h.j.k.– h.j.k.2015年10月19日 01:57:46 +00:00Commented Oct 19, 2015 at 1:57
users
. \$\endgroup\$