Skip to main content
Code Review

Return to Answer

replaced http://codereview.stackexchange.com/ with https://codereview.stackexchange.com/
Source Link
  1. If I'm right you could eliminate most of the threading-logic with an ExecutorService and an awaitTermination call. The linked javadoc has some usage examples too. (Don't forget to check the return value of awaitTermination.)

(Effective Java, 2nd edition, Item 47: Know and use the libraries)

  1. int max is a little bit mysterious. What does it maximize? I'd put that into the name.

  2. urlQqueue is used from multiple threads without proper synchronization (urlQqueue.poll()), it's not thread-safe. Queue has other, thread-safe implementations, I'd use one of them instead of LinkedList.

You could use a BlockingQueue to avoid loops with Thread.yield() (which is rarely approprite, according to the javadoc):

while (bot.currentURLNum < bot.MAXURL) {
 String url = bot.urlQqueue.poll();
 if (url == null) {
 Thread.yield();
 continue;
 }
  1. currentURLNum is also not properly synchronized, it is read by the while statement without synchronization.

[...] synchronization has no effect unless both read and write operations are synchronized.

From Effective Java, 2nd Edition, Item 66: Synchronize access to shared mutable data.

  1. The reference type could be simply Map<String, Integer> here:
private HashMap<String, Integer> wordFrequency = new HashMap<String, Integer>();

See also: Effective Java, 2nd edition, Item 52: Refer to objects by their interfaces

  1. Actually, you could use a Multiset for counting the words. (Here is a similar example Here is a similar example)
  1. You need some more synchronization for WordExtractor, since the underlying HashMap is not thread safe. printTopWords might not see your latest data. foundWord is also public, so if anyone call he or she could get weird results too.

  2. Gauava has a helper method (copyHighestCountFirst) for multisets which might be very similar to the algorithm in the printTopWords method.

(Effective Java, 2nd edition, Item 47: Know and use the libraries The author mentions only the JDK's built-in libraries but I think the reasoning could be true for other libraries too.)

  1. If I'm right you could eliminate most of the threading-logic with an ExecutorService and an awaitTermination call. The linked javadoc has some usage examples too. (Don't forget to check the return value of awaitTermination.)

(Effective Java, 2nd edition, Item 47: Know and use the libraries)

  1. int max is a little bit mysterious. What does it maximize? I'd put that into the name.

  2. urlQqueue is used from multiple threads without proper synchronization (urlQqueue.poll()), it's not thread-safe. Queue has other, thread-safe implementations, I'd use one of them instead of LinkedList.

You could use a BlockingQueue to avoid loops with Thread.yield() (which is rarely approprite, according to the javadoc):

while (bot.currentURLNum < bot.MAXURL) {
 String url = bot.urlQqueue.poll();
 if (url == null) {
 Thread.yield();
 continue;
 }
  1. currentURLNum is also not properly synchronized, it is read by the while statement without synchronization.

[...] synchronization has no effect unless both read and write operations are synchronized.

From Effective Java, 2nd Edition, Item 66: Synchronize access to shared mutable data.

  1. The reference type could be simply Map<String, Integer> here:
private HashMap<String, Integer> wordFrequency = new HashMap<String, Integer>();

See also: Effective Java, 2nd edition, Item 52: Refer to objects by their interfaces

  1. Actually, you could use a Multiset for counting the words. (Here is a similar example)
  1. You need some more synchronization for WordExtractor, since the underlying HashMap is not thread safe. printTopWords might not see your latest data. foundWord is also public, so if anyone call he or she could get weird results too.

  2. Gauava has a helper method (copyHighestCountFirst) for multisets which might be very similar to the algorithm in the printTopWords method.

(Effective Java, 2nd edition, Item 47: Know and use the libraries The author mentions only the JDK's built-in libraries but I think the reasoning could be true for other libraries too.)

  1. If I'm right you could eliminate most of the threading-logic with an ExecutorService and an awaitTermination call. The linked javadoc has some usage examples too. (Don't forget to check the return value of awaitTermination.)

(Effective Java, 2nd edition, Item 47: Know and use the libraries)

  1. int max is a little bit mysterious. What does it maximize? I'd put that into the name.

  2. urlQqueue is used from multiple threads without proper synchronization (urlQqueue.poll()), it's not thread-safe. Queue has other, thread-safe implementations, I'd use one of them instead of LinkedList.

You could use a BlockingQueue to avoid loops with Thread.yield() (which is rarely approprite, according to the javadoc):

while (bot.currentURLNum < bot.MAXURL) {
 String url = bot.urlQqueue.poll();
 if (url == null) {
 Thread.yield();
 continue;
 }
  1. currentURLNum is also not properly synchronized, it is read by the while statement without synchronization.

[...] synchronization has no effect unless both read and write operations are synchronized.

From Effective Java, 2nd Edition, Item 66: Synchronize access to shared mutable data.

  1. The reference type could be simply Map<String, Integer> here:
private HashMap<String, Integer> wordFrequency = new HashMap<String, Integer>();

See also: Effective Java, 2nd edition, Item 52: Refer to objects by their interfaces

  1. Actually, you could use a Multiset for counting the words. (Here is a similar example)
  1. You need some more synchronization for WordExtractor, since the underlying HashMap is not thread safe. printTopWords might not see your latest data. foundWord is also public, so if anyone call he or she could get weird results too.

  2. Gauava has a helper method (copyHighestCountFirst) for multisets which might be very similar to the algorithm in the printTopWords method.

(Effective Java, 2nd edition, Item 47: Know and use the libraries The author mentions only the JDK's built-in libraries but I think the reasoning could be true for other libraries too.)

Source Link
palacsint
  • 30.3k
  • 9
  • 82
  • 157
  1. If I'm right you could eliminate most of the threading-logic with an ExecutorService and an awaitTermination call. The linked javadoc has some usage examples too. (Don't forget to check the return value of awaitTermination.)

(Effective Java, 2nd edition, Item 47: Know and use the libraries)

  1. int max is a little bit mysterious. What does it maximize? I'd put that into the name.

  2. urlQqueue is used from multiple threads without proper synchronization (urlQqueue.poll()), it's not thread-safe. Queue has other, thread-safe implementations, I'd use one of them instead of LinkedList.

You could use a BlockingQueue to avoid loops with Thread.yield() (which is rarely approprite, according to the javadoc):

while (bot.currentURLNum < bot.MAXURL) {
 String url = bot.urlQqueue.poll();
 if (url == null) {
 Thread.yield();
 continue;
 }
  1. currentURLNum is also not properly synchronized, it is read by the while statement without synchronization.

[...] synchronization has no effect unless both read and write operations are synchronized.

From Effective Java, 2nd Edition, Item 66: Synchronize access to shared mutable data.

  1. The reference type could be simply Map<String, Integer> here:
private HashMap<String, Integer> wordFrequency = new HashMap<String, Integer>();

See also: Effective Java, 2nd edition, Item 52: Refer to objects by their interfaces

  1. Actually, you could use a Multiset for counting the words. (Here is a similar example)
  1. You need some more synchronization for WordExtractor, since the underlying HashMap is not thread safe. printTopWords might not see your latest data. foundWord is also public, so if anyone call he or she could get weird results too.

  2. Gauava has a helper method (copyHighestCountFirst) for multisets which might be very similar to the algorithm in the printTopWords method.

(Effective Java, 2nd edition, Item 47: Know and use the libraries The author mentions only the JDK's built-in libraries but I think the reasoning could be true for other libraries too.)

lang-java

AltStyle によって変換されたページ (->オリジナル) /