2
\$\begingroup\$

This method in my code uses a producer-consumer design to read many files and process them for use in an NLP algorithm. The completion service collects files as they are processed, which means that docId != index. Given that I need to retrieve probabilities elsewhere where docID == index, I've implemented a priority queue to sort. However, I do this by sequentially retrieving documents from a list of futures and then putting them into the queue.

I know this implementation isn't great. I keep staring at the concurrentskiplistset thinking I should be able to implement it for better performance and not have these issues. I'm brand new to concurrency and Java, so I'm sure there's a lot to critique.

How can I make this multithreading more efficient?

Vectorizer:

public class Vectorizer {
public Vectorizer(){
}
public PriorityQueue<Document> readAll(File fileDir) 
throws InterruptedException, ExecutionException, IOException{
 //read each file. When each file is vectorized, put it in a minibatch.
 //producer-consumer threading structure.
 int NUM_THREADS = Runtime.getRuntime().availableProcessors();
 BlockingQueue<LinkedList<String>> queue = new ArrayBlockingQueue<>(50);
 ExecutorService service = Executors.newFixedThreadPool(NUM_THREADS);
 CompletionService<Document> completionService = 
 new ExecutorCompletionService<Document>(service);
 //TODO: is list the most efficient structure to handle future?
 List<Future<Document>> docs = new ArrayList<Future<Document>>();
 for (int i = 0; i < (NUM_THREADS - 1); i++) {
 docs.add(completionService.submit(new DocumentConsumer(queue)));
 }
 // Wait for ReadFile to complete
 service.submit(new ReadFile(queue, fileDir)).get();
 service.shutdownNow(); // interrupt CPUTasks
 // Wait for DocumentConsumer to complete
 service.awaitTermination(365, TimeUnit.DAYS);
 //do things with processed docs.
 //should I be doing this, though?
 PriorityQueue<Document> Documents = new PriorityQueue<Document>();
 for(Future<Document> d : docs){
 try{
 Document doc = d.get();
 Documents.add(doc);
 System.out.println(Integer.toString(doc.Cj));
 } catch(ExecutionException e) {
 e.getCause();e.printStackTrace();
 }
 }
 return Documents;
}

Document object (without compare and overload):

public class Document implements Comparator<Document>{
int docId, Cj;
HashMap<String,Integer> termTable;
public Document(int id,int Cj, HashMap<String,Integer> termMap){
 this.docId = id;
 this.Cj = Cj;
 this.termTable = termMap;
}

docId is set during file reading. I read in a file and append the count to the first line, which is then set during processing.

The rest of the code is available here.

Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Aug 5, 2014 at 3:48
\$\endgroup\$

1 Answer 1

2
\$\begingroup\$

I didn't look at the specific performance issue you described. However, my general review:

Readability

for (int i = 0; i < (NUM_THREADS - 1); i++) {
 docs.add(completionService.submit(new DocumentConsumer(queue)));
}

Starting at 0 and ending at somevalue - 1. Why don't you start at 1? You don't even do anything with the index, so it's not like it matters.

 e.getCause();e.printStackTrace();

Don't put multiple statements on the same line. It turns your code into a wall of code. You think understanding a wall of text is hard? Try a wall of code...

Naming

Use camelCase for methods and variables. Use PascalCase for types. Use ALL_CAPS for constants. This means that...

int NUM_THREADS = Runtime.getRuntime().availableProcessors();

should be final and

PriorityQueue<Document> Documents = new PriorityQueue<Document>();

should be renamed to documents.

public Document(int id,int Cj, HashMap<String,Integer> termMap){

Should rename Cj to cj. ... And even better would be to rename it to something that explains what it is. Right now I have no idea what a cj is.

answered Aug 7, 2014 at 8:17
\$\endgroup\$
0

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.