I'm currently working on optimizing a Barnes-Hut implementation (connected to a previous post of mine) and I need help optimizing it further.
After some profiling, the main bottleneck appears to be the insertion algorithm, which takes roughly 3-10 times as long as the second longest part of the program.
The program is built around a number of worker threads, each of which does the following while building the tree:
// particle[] is an array that is set in the constructor
public void run() {
    boolean run = true;
    while (run) {
        i = barnesHut.getIndex();
        if (i < particle.length) {
            tree.insertThreaded(particle[i], thread);
        } else {
            run = false;
        }
    }
}
The getIndex function can be found in the BarnesHut class, which handles the start of the program and the insertion algorithm. It's synchronized so that it works as a shared counter:
public synchronized int getIndex() {
    return index++;
}
Now, the insertion algorithm looks like this:
// entryLock and quadLock are two mutex locks
// childs[][] holds the children (leaves) of this node in the Barnes-Hut tree structure
public void insertThreaded(BHBody body, int threadID) {
    entryLock.lock();
    numberOfParticles++;
    if (numberOfParticles == 1) {
        // First particle: this node becomes a leaf holding it.
        existingParticle = body;
        entryLock.unlock();
    } else if (numberOfParticles == 2) {
        // Second particle: split the leaf and push both particles down.
        quadLock.lock();
        entryLock.unlock();
        int[] position = getQuadForBody(existingParticle);
        if (childs[position[0]][position[1]] == null) {
            startQuad(position[0], position[1]);
        }
        quadLock.unlock();
        childs[position[0]][position[1]].insertThreaded(existingParticle, threadID);
        quadLock.lock();
        position = getQuadForBody(body);
        if (childs[position[0]][position[1]] == null) {
            startQuad(position[0], position[1]);
        }
        quadLock.unlock();
        childs[position[0]][position[1]].insertThreaded(body, threadID);
    } else if (numberOfParticles > 2) {
        // Interior node: just push the new particle down into its quadrant.
        entryLock.unlock();
        quadLock.lock();
        int[] position = getQuadForBody(body);
        if (childs[position[0]][position[1]] == null) {
            startQuad(position[0], position[1]);
        }
        quadLock.unlock();
        childs[position[0]][position[1]].insertThreaded(body, threadID);
    }
}
The locks make sure that no other thread gets in the way while a thread is creating something vital, such as a new leaf, or while it's updating the number of particles.
The startQuad() function creates a new leaf for the current node:
public void startQuad(int x, int y) {
    if (x == 0 && y == 0)
        childs[0][0] = new BHnode(xStart, xMid, yMid, yEnd, debug, range, correction);
    if (x == 0 && y == 1)
        childs[0][1] = new BHnode(xStart, xMid, yStart, yMid, debug, range, correction);
    if (x == 1 && y == 0)
        childs[1][0] = new BHnode(xMid, xEnd, yMid, yEnd, debug, range, correction);
    if (x == 1 && y == 1)
        childs[1][1] = new BHnode(xMid, xEnd, yStart, yMid, debug, range, correction);
}
The getQuadForBody() function returns the quadrant that a body should be inserted into:
public int[] getQuadForBody(BHBody body) {
    int x = 0, y = 0;
    if (body.getX() > xMid) {
        x++;
    }
    if (body.getY() < yMid) {
        y++;
    }
    int[] a = {x, y};
    if (debug) {
        // Note: with debug enabled, this 2-second sleep per call dwarfs all other costs.
        System.out.println("setting (" + body.getX() + "," + body.getY() + ") at (" + x + "," + y + ") with mid " + xMid + "," + yMid);
        helper.sleepSeconds(2);
    }
    return a;
}
If anyone has any suggestions on how to improve this code, I would be grateful. I've tried a lot of different things, from completely rewriting parts of the code to reordering operations.
1 Answer
You should first find out what the bottleneck is. Do you have just two locks altogether? That doesn't sound good, as each part of the tree can be worked on independently, which means you could run efficiently on hundreds of CPUs.
The operations you do under the locks are rather simple, which means that the overhead of locking can be significant. Can't you separate the tree into a few parts and assign one CPU to each part?
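A minimal sketch of that idea, assuming the BHBody, BHnode, and getQuadForBody names from the question; childFor and insertSequential (a plain single-threaded insert that takes no locks) are hypothetical helpers:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public void buildPartitioned(BHBody[] particles, BHnode root) throws InterruptedException {
    // Bucket the particles by top-level quadrant first...
    List<List<BHBody>> buckets = new ArrayList<>();
    for (int q = 0; q < 4; q++) {
        buckets.add(new ArrayList<>());
    }
    for (BHBody b : particles) {
        int[] p = root.getQuadForBody(b);
        buckets.get(p[0] * 2 + p[1]).add(b);
    }
    // ...then build each subtree on its own thread. No locks are needed,
    // because no two threads ever touch the same subtree.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int q = 0; q < 4; q++) {
        List<BHBody> bucket = buckets.get(q);
        BHnode subtree = root.childFor(q);   // hypothetical accessor for the q-th child
        pool.submit(() -> {
            for (BHBody b : bucket) {
                subtree.insertSequential(b); // hypothetical lock-free insert
            }
        });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
}

Splitting one level deeper gives 16 independent tasks, which balances the load better when the particles are clustered in a few quadrants.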
Now some boring general comments:
boolean run = true;
while (run) {
    i = barnesHut.getIndex();
    if (i < particle.length) {
        tree.insertThreaded(particle[i], thread);
    } else {
        run = false;
    }
}
Where is i defined? Anyway, unobfuscate to
while (true) {
    int i = barnesHut.getIndex();
    if (i >= particle.length) break;
    tree.insertThreaded(particle[i], thread);
}
public synchronized int getIndex() {
    return index++;
}
That's what AtomicInteger#getAndIncrement() is for.
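A minimal sketch, assuming the field keeps the name index from the question:

import java.util.concurrent.atomic.AtomicInteger;

private final AtomicInteger index = new AtomicInteger();

// Same semantics as the synchronized `return index++;` (returns the old value),
// but without contending on the monitor of the whole BarnesHut object.
public int getIndex() {
    return index.getAndIncrement();
}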
Use better spacing: if (i < particle.length) { instead of if(i < particle.length){.
- As far as I know, the locks are unique for every instance of the node, so the locks should be different. I will look into it. (Gemstone, Sep 10, 2014 at 19:06)
- To my knowledge I've moved the locks so that they should be unique for each new instance of the node class that is created (the class the insertion method is a part of). (Gemstone, Sep 11, 2014 at 11:29)
- @Gemstone This is good for parallelism, but locking a lot may be expensive. If you want to optimize, you should do proper benchmarks. Get JMH or Caliper, add the results... (maaartinus, Sep 11, 2014 at 11:32)
- @Gemstone What do you use entryLock for? You lock it and immediately unlock it. It looks wrong; maybe it's alright, and maybe it can be optimized. (maaartinus, Sep 13, 2014 at 15:29)
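A minimal JMH harness along the lines maaartinus suggests might look like this (the class and field names are hypothetical; it compares the synchronized counter against the atomic one):

import java.util.concurrent.atomic.AtomicInteger;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class IndexBench {
    final AtomicInteger atomicIndex = new AtomicInteger();
    int plainIndex;

    @Benchmark
    public int atomic() {
        return atomicIndex.getAndIncrement(); // lock-free CAS increment
    }

    @Benchmark
    public synchronized int locked() {
        return plainIndex++;                  // contends on the object monitor
    }
}

Run it with several threads (for example, -t 8 on the JMH command line) to see how much the contention costs.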