- If I'm right you could eliminate most of the threading-logic with an
ExecutorService
and anawaitTermination
call. The linked javadoc has some usage examples too. (Don't forget to check the return value ofawaitTermination
.)
(Effective Java, 2nd edition, Item 47: Know and use the libraries)
int max
is a little bit mysterious. What does it maximize? I'd put that into the name.urlQqueue
is used from multiple threads without proper synchronization (urlQqueue.poll()
), it's not thread-safe.Queue
has other, thread-safe implementations, I'd use one of them instead ofLinkedList
.
You could use a BlockingQueue
to avoid loops with Thread.yield()
(which is rarely approprite, according to the javadoc):
while (bot.currentURLNum < bot.MAXURL) { String url = bot.urlQqueue.poll(); if (url == null) { Thread.yield(); continue; }
currentURLNum
is also not properly synchronized, it is read by thewhile
statement without synchronization.
[...] synchronization has no effect unless both read and write operations are synchronized.
From Effective Java, 2nd Edition, Item 66: Synchronize access to shared mutable data.
- The reference type could be simply
Map<String, Integer>
here:
private HashMap<String, Integer> wordFrequency = new HashMap<String, Integer>();
See also: Effective Java, 2nd edition, Item 52: Refer to objects by their interfaces
- Actually, you could use a
Multiset
for counting the words. (Here is a similar example Here is a similar example)
You need some more synchronization for
WordExtractor
, since the underlyingHashMap
is not thread safe.printTopWords
might not see your latest data.foundWord
is alsopublic
, so if anyone call he or she could get weird results too.Gauava has a helper method (
copyHighestCountFirst
) for multisets which might be very similar to the algorithm in theprintTopWords
method.
(Effective Java, 2nd edition, Item 47: Know and use the libraries The author mentions only the JDK's built-in libraries but I think the reasoning could be true for other libraries too.)
- If I'm right you could eliminate most of the threading-logic with an
ExecutorService
and anawaitTermination
call. The linked javadoc has some usage examples too. (Don't forget to check the return value ofawaitTermination
.)
(Effective Java, 2nd edition, Item 47: Know and use the libraries)
int max
is a little bit mysterious. What does it maximize? I'd put that into the name.urlQqueue
is used from multiple threads without proper synchronization (urlQqueue.poll()
), it's not thread-safe.Queue
has other, thread-safe implementations, I'd use one of them instead ofLinkedList
.
You could use a BlockingQueue
to avoid loops with Thread.yield()
(which is rarely approprite, according to the javadoc):
while (bot.currentURLNum < bot.MAXURL) { String url = bot.urlQqueue.poll(); if (url == null) { Thread.yield(); continue; }
currentURLNum
is also not properly synchronized, it is read by thewhile
statement without synchronization.
[...] synchronization has no effect unless both read and write operations are synchronized.
From Effective Java, 2nd Edition, Item 66: Synchronize access to shared mutable data.
- The reference type could be simply
Map<String, Integer>
here:
private HashMap<String, Integer> wordFrequency = new HashMap<String, Integer>();
See also: Effective Java, 2nd edition, Item 52: Refer to objects by their interfaces
- Actually, you could use a
Multiset
for counting the words. (Here is a similar example)
You need some more synchronization for
WordExtractor
, since the underlyingHashMap
is not thread safe.printTopWords
might not see your latest data.foundWord
is alsopublic
, so if anyone call he or she could get weird results too.Gauava has a helper method (
copyHighestCountFirst
) for multisets which might be very similar to the algorithm in theprintTopWords
method.
(Effective Java, 2nd edition, Item 47: Know and use the libraries The author mentions only the JDK's built-in libraries but I think the reasoning could be true for other libraries too.)
- If I'm right you could eliminate most of the threading-logic with an
ExecutorService
and anawaitTermination
call. The linked javadoc has some usage examples too. (Don't forget to check the return value ofawaitTermination
.)
(Effective Java, 2nd edition, Item 47: Know and use the libraries)
int max
is a little bit mysterious. What does it maximize? I'd put that into the name.urlQqueue
is used from multiple threads without proper synchronization (urlQqueue.poll()
), it's not thread-safe.Queue
has other, thread-safe implementations, I'd use one of them instead ofLinkedList
.
You could use a BlockingQueue
to avoid loops with Thread.yield()
(which is rarely approprite, according to the javadoc):
while (bot.currentURLNum < bot.MAXURL) { String url = bot.urlQqueue.poll(); if (url == null) { Thread.yield(); continue; }
currentURLNum
is also not properly synchronized, it is read by thewhile
statement without synchronization.
[...] synchronization has no effect unless both read and write operations are synchronized.
From Effective Java, 2nd Edition, Item 66: Synchronize access to shared mutable data.
- The reference type could be simply
Map<String, Integer>
here:
private HashMap<String, Integer> wordFrequency = new HashMap<String, Integer>();
See also: Effective Java, 2nd edition, Item 52: Refer to objects by their interfaces
- Actually, you could use a
Multiset
for counting the words. (Here is a similar example)
You need some more synchronization for
WordExtractor
, since the underlyingHashMap
is not thread safe.printTopWords
might not see your latest data.foundWord
is alsopublic
, so if anyone call he or she could get weird results too.Gauava has a helper method (
copyHighestCountFirst
) for multisets which might be very similar to the algorithm in theprintTopWords
method.
(Effective Java, 2nd edition, Item 47: Know and use the libraries The author mentions only the JDK's built-in libraries but I think the reasoning could be true for other libraries too.)
- If I'm right you could eliminate most of the threading-logic with an
ExecutorService
and anawaitTermination
call. The linked javadoc has some usage examples too. (Don't forget to check the return value ofawaitTermination
.)
(Effective Java, 2nd edition, Item 47: Know and use the libraries)
int max
is a little bit mysterious. What does it maximize? I'd put that into the name.urlQqueue
is used from multiple threads without proper synchronization (urlQqueue.poll()
), it's not thread-safe.Queue
has other, thread-safe implementations, I'd use one of them instead ofLinkedList
.
You could use a BlockingQueue
to avoid loops with Thread.yield()
(which is rarely approprite, according to the javadoc):
while (bot.currentURLNum < bot.MAXURL) { String url = bot.urlQqueue.poll(); if (url == null) { Thread.yield(); continue; }
currentURLNum
is also not properly synchronized, it is read by thewhile
statement without synchronization.
[...] synchronization has no effect unless both read and write operations are synchronized.
From Effective Java, 2nd Edition, Item 66: Synchronize access to shared mutable data.
- The reference type could be simply
Map<String, Integer>
here:
private HashMap<String, Integer> wordFrequency = new HashMap<String, Integer>();
See also: Effective Java, 2nd edition, Item 52: Refer to objects by their interfaces
- Actually, you could use a
Multiset
for counting the words. (Here is a similar example)
You need some more synchronization for
WordExtractor
, since the underlyingHashMap
is not thread safe.printTopWords
might not see your latest data.foundWord
is alsopublic
, so if anyone call he or she could get weird results too.Gauava has a helper method (
copyHighestCountFirst
) for multisets which might be very similar to the algorithm in theprintTopWords
method.
(Effective Java, 2nd edition, Item 47: Know and use the libraries The author mentions only the JDK's built-in libraries but I think the reasoning could be true for other libraries too.)