I am creating a mock Twitter project which loads user data from a somewhat large text file containing ~3.6 million lines formatted like this:
0 12
0 32
1 9
1 54
2 33
etc...
The first string token is the userId
and the second is the followId
.
The first half of this helper method takes in the current user's ID, checks to see if it exists and creates a new user if necessary. After that, the followId
is added to this new or existing user's following list of type ArrayList<Integer>
.
With ~3.6 million lines to read, this doesn't take long (9868 ms).
Now the second half creates or finds the followed user (followId
) and adds the userId
to their followers list, but this additional code extends the amount of time to read the file exponentially (172744 ms).
I tried using the same TwitterUser
object throughout the method. All of the adding methods (follow
, addFollower
) are simple ArrayList.add()
methods. Is there anything I can do to make this method more efficient?
Please note: While this is school-related, I'm not asking for an answer to my solution. My professor permitted this slow object initialization, but I'd like to understand how I can make it faster.
private Map<Integer, TwitterUser> twitterUsers = new HashMap<Integer, TwitterUser>();
private void AddUser(int userId, int followId){
TwitterUser user = getUser(userId);
if (user == null){
user = new TwitterUser(userId);
user.follow(followId);
twitterUsers.putIfAbsent(userId, user);
} else{
user.follow(followId);
}
//adding the code below, slows the whole process enormously
user = getUser(followId);
if (user == null){
user = new TwitterUser(followId);
user.addFollower(userId);
twitterUsers.putIfAbsent(followId, user);
} else{
user.addFollower(userId);
}
}
private TwitterUser getUser(int id){
if (twitterUsers.isEmpty()) return null;
return twitterUsers.get(id);
}
1 Answer 1
If putIfAbsent(int, User)
does what you would expect it to do, that is: checking if it's there before inserting, why do you use it within an if
block whose condition already checks if the user is there?
In other words, if fetching a user returned a null
value you can safely assume that the user was not there.
Now I'm not too sure about the internal workings of the *putIfAbsent*
method (probably it would loop through the set of the keys in the map), but intuitively I would expect a normal put(int, User)
to perform better, even more with a map that gets as large as yours as the input file gets scanned through.
Therefore I would suggest to try something like:
user = getUser(followId);
if (user == null){
user = new TwitterUser(followId);
user.addFollower(userId);
twitterUsers.put(followId, user);
} else{
user.addFollower(userId);
}
which would apply to the first half as well.
twitterUsers.isEmpty()
in#getUser
,Map#get
returnsnull
if the specified key is not found. This won't substantially reduce runtime but it is redundant. Could you also post theTwitterUser#follow
andTwitterUser#addFollower
code?putIfAbsent
is a HashMap method andfollow
is simply: 1.) check if !ArrayList.contains(int) ... 2.) ArrayList.add(int) @JonnyHenlyisEmpty()
check.. silly of meputIfAbsent
was a method ofHashMap
. Have you considered using anotherHashMap
instead of anArrayList
, since the lookup time for anArrayList
is linear?