I would simplify your solution a little bit. Some classes are unnecessary.

## Suggestion for `CrawlerApp`

```java
import java.io.IOException;
import java.net.URL;
import java.util.Properties;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CrawlerApp {

    private static final int WORKER_LIMIT = 10;
    private static final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    // Number of tasks currently crawling (incremented/decremented by each task).
    private static final AtomicInteger NUMBER_ACTIVE_THREADS = new AtomicInteger(0);
    private static final ExecutorService executor = new ThreadPoolExecutor(
            WORKER_LIMIT, WORKER_LIMIT, 0L, TimeUnit.MILLISECONDS, queue);
    private static Crawler crawler;

    public static void main(String[] args) throws InterruptedException {
        crawler = new Crawler();
        initializeApp();
        startCrawling();
    }

    private static void startCrawling() throws InterruptedException {
        // Poll until no task is running and none is queued, then shut down.
        // (There is a small window between a task being dequeued and it
        // incrementing the counter, so this check is pragmatic, not airtight.)
        while (NUMBER_ACTIVE_THREADS.intValue() > 0 || !queue.isEmpty()) {
            Thread.sleep(100);
        }
        executor.shutdown();
        executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
    }

    private static void initializeApp() {
        Properties config = new Properties();
        try {
            config.load(CrawlerApp.class.getClassLoader()
                    .getResourceAsStream("url-horizon.properties"));
            String[] horizon = config.getProperty("urls").split(",");
            for (String link : horizon) {
                URL url = new URL(link);
                // Don't forget to increment/decrement NUMBER_ACTIVE_THREADS in the task.
                executor.submit(new CrawlerTask(url, crawler, executor, NUMBER_ACTIVE_THREADS));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```
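
The counter only works if every `CrawlerTask` increments it on entry and reliably decrements it on exit. The original `CrawlerTask` class is not shown in the question, so the following is only a sketch of how its `run()` could maintain the counter; the field names and overall structure are my assumptions:

```java
import java.net.URL;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: everything beyond the constructor signature used above is assumed.
class CrawlerTask implements Runnable {
    private final URL url;
    private final Crawler crawler;
    private final ExecutorService executor;   // for submitting newly discovered links
    private final AtomicInteger activeThreads;

    CrawlerTask(URL url, Crawler crawler, ExecutorService executor, AtomicInteger activeThreads) {
        this.url = url;
        this.crawler = crawler;
        this.executor = executor;
        this.activeThreads = activeThreads;
    }

    @Override
    public void run() {
        activeThreads.incrementAndGet();      // task became active
        try {
            crawlTask();                      // the method discussed below
        } finally {
            activeThreads.decrementAndGet();  // decrement even if scraping throws
        }
    }

    private void crawlTask() { /* see the CrawlerTask section below */ }
}
```

The `try`/`finally` is the important part: if a failing scrape ever skipped the decrement, `startCrawling()` would spin forever.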

There is a huge contention problem in `CrawlerTask`: reduce the synchronized area. It is better to visit a page twice than to block all execution on a single lock.

## Suggestion for `CrawlerTask`

```java
private void crawlTask() {
    // Unsynchronized check: in the worst case two tasks race past it and the
    // same page is scraped twice -- far cheaper than serializing on one lock.
    if (crawler.getUrlVisited().contains(url)) {
        return;
    }
    new Scraper().scrape(url);
    crawler.addURLToVisited(url);
}
```
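
If you can change `Crawler`'s internals (the original class is not shown), the `synchronized` block can disappear entirely: back the visited set with a concurrent collection so that `add()` becomes an atomic claim on the URL. A sketch under that assumption:

```java
import java.net.URL;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Assumed internals for Crawler; only its accessors appear in the code above.
class Crawler {
    // A thread-safe set view over ConcurrentHashMap -- no synchronized needed.
    private final Set<URL> visited = ConcurrentHashMap.newKeySet();

    /** Atomically claims the URL; returns false if it was already visited. */
    boolean markVisited(URL url) {
        return visited.add(url);
    }
}
```

`crawlTask()` then collapses to `if (!crawler.markVisited(url)) return;` followed by the scrape, and because `add()` is atomic, even the "visit twice" window goes away. (Beware that `URL.equals()` can trigger DNS lookups; keying the set by `url.toString()` avoids that.)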
