I'm currently working on optimizing a Barnes-Hut implementation (connected to a previous post of mine) and I need help optimizing it further.
After some profiling, the main bottleneck appears to be the insertion algorithm, which takes roughly 3-10 times as long as the second longest part of the program.
The program is built around a number of worker threads, each of which does the following while building the tree:
// particle[] is an array that is set in the constructor
public void run() {
    boolean run = true;
    while (run) {
        i = barnesHut.getIndex();
        if (i < particle.length) {
            tree.insertThreaded(particle[i], thread);
        } else {
            run = false;
        }
    }
}
The getIndex function can be found in the BarnesHut class, which handles the start of the program and the insertion algorithm. It's synchronized so that it works as a shared counter:
public synchronized int getIndex() {
    return index++;
}
Now, the insertion algorithm looks like this:
// entryLock and quadLock are two mutex locks
// childs[][] holds the children (leaves) of this node in the Barnes-Hut tree structure
public void insertThreaded(BHBody body, int threadID) {
    entryLock.lock();
    numberOfParticles++;
    if (numberOfParticles == 1) {
        // First particle: this node becomes a leaf holding it.
        existingParticle = body;
        entryLock.unlock();
    } else if (numberOfParticles == 2) {
        // Second particle: split the leaf and push both particles down.
        quadLock.lock();
        entryLock.unlock();
        int[] position = getQuadForBody(existingParticle);
        if (childs[position[0]][position[1]] == null) {
            startQuad(position[0], position[1]);
        }
        quadLock.unlock();
        childs[position[0]][position[1]].insertThreaded(existingParticle, threadID);
        quadLock.lock();
        position = getQuadForBody(body);
        if (childs[position[0]][position[1]] == null) {
            startQuad(position[0], position[1]);
        }
        quadLock.unlock();
        childs[position[0]][position[1]].insertThreaded(body, threadID);
    } else if (numberOfParticles > 2) {
        // Interior node: just push the new particle down into its quadrant.
        entryLock.unlock();
        quadLock.lock();
        int[] position = getQuadForBody(body);
        if (childs[position[0]][position[1]] == null) {
            startQuad(position[0], position[1]);
        }
        quadLock.unlock();
        childs[position[0]][position[1]].insertThreaded(body, threadID);
    }
}
The locks make sure that no other thread gets in the way while a thread is creating something vital, such as a new leaf, or while it's updating the number of particles.
The startQuad() function creates a new leaf for the current node:
public void startQuad(int x, int y) {
    if (x == 0 && y == 0)
        childs[0][0] = new BHnode(xStart, xMid, yMid, yEnd, debug, range, correction);
    if (x == 0 && y == 1)
        childs[0][1] = new BHnode(xStart, xMid, yStart, yMid, debug, range, correction);
    if (x == 1 && y == 0)
        childs[1][0] = new BHnode(xMid, xEnd, yMid, yEnd, debug, range, correction);
    if (x == 1 && y == 1)
        childs[1][1] = new BHnode(xMid, xEnd, yStart, yMid, debug, range, correction);
}
The getQuadForBody() function returns the quadrant that a body should be inserted into:
public int[] getQuadForBody(BHBody body) {
    int x = 0, y = 0;
    if (body.getX() > xMid) {
        x++;
    }
    if (body.getY() < yMid) {
        y++;
    }
    int[] a = {x, y};
    if (debug) {
        // Note: with debug enabled, this 2-second sleep per call dwarfs all other costs.
        System.out.println("setting (" + body.getX() + "," + body.getY() + ") at (" + x + "," + y + ") with mid " + xMid + "," + yMid);
        helper.sleepSeconds(2);
    }
    return a;
}
If anyone has any suggestions on how to improve this code, I would be grateful. I've tried a lot of different things, from completely rewriting parts of the code to reordering operations.
1 Answer
You should first find out what the bottleneck is. Do you have just two locks altogether? That doesn't sound good, as each part of the tree can be worked on independently, which means you could run efficiently on hundreds of CPUs.
The operations you do under the locks are rather simple, which means that the overhead of locking can be significant. Can't you separate the tree into a few parts and assign one CPU to each part?
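A minimal sketch of that idea, assuming the BHBody, BHnode, and getQuadForBody names from the question; childFor and insertSequential (a plain single-threaded insert that takes no locks) are hypothetical helpers:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public void buildPartitioned(BHBody[] particles, BHnode root) throws InterruptedException {
    // Bucket the particles by top-level quadrant first...
    List<List<BHBody>> buckets = new ArrayList<>();
    for (int q = 0; q < 4; q++) {
        buckets.add(new ArrayList<>());
    }
    for (BHBody b : particles) {
        int[] p = root.getQuadForBody(b);
        buckets.get(p[0] * 2 + p[1]).add(b);
    }
    // ...then build each subtree on its own thread. No locks are needed,
    // because no two threads ever touch the same subtree.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int q = 0; q < 4; q++) {
        List<BHBody> bucket = buckets.get(q);
        BHnode subtree = root.childFor(q);   // hypothetical accessor for the q-th child
        pool.submit(() -> {
            for (BHBody b : bucket) {
                subtree.insertSequential(b); // hypothetical lock-free insert
            }
        });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
}

Splitting one level deeper gives 16 independent tasks, which balances the load better when the particles are clustered in a few quadrants.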
Now some boring general comments:
boolean run = true;
while (run) {
    i = barnesHut.getIndex();
    if (i < particle.length) {
        tree.insertThreaded(particle[i], thread);
    } else {
        run = false;
    }
}
Where is i defined? Anyway, unobfuscate to
while (true) {
    int i = barnesHut.getIndex();
    if (i >= particle.length) break;
    tree.insertThreaded(particle[i], thread);
}
public synchronized int getIndex() {
    return index++;
}
That's what AtomicInteger#getAndIncrement() is for.
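A minimal sketch, assuming the field keeps the name index from the question:

import java.util.concurrent.atomic.AtomicInteger;

private final AtomicInteger index = new AtomicInteger();

// Same semantics as the synchronized `return index++;` (returns the old value),
// but without contending on the monitor of the whole BarnesHut object.
public int getIndex() {
    return index.getAndIncrement();
}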
Use better spacing: if (i < particle.length) { instead of if(i < particle.length){.
- As far as I know, the locks are unique for every instance of the node, so the locks should be different. I will look into it. (Gemstone, Sep 10, 2014 at 19:06)
- To my knowledge I've moved the locks so that they should be unique for each new instance of the node class that is created (the class the insertion method is a part of). (Gemstone, Sep 11, 2014 at 11:29)
- @Gemstone This is good for parallelism, but locking a lot may be expensive. If you want to optimize, you should do proper benchmarks. Get JMH or Caliper, add the results... (maaartinus, Sep 11, 2014 at 11:32)
- @Gemstone What do you use entryLock for? You lock it and immediately unlock it. It looks wrong; maybe it's alright, and maybe it can be optimized. (maaartinus, Sep 13, 2014 at 15:29)
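A minimal JMH harness along the lines maaartinus suggests might look like this (the class and field names are hypothetical; it compares the synchronized counter against the atomic one):

import java.util.concurrent.atomic.AtomicInteger;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class IndexBench {
    final AtomicInteger atomicIndex = new AtomicInteger();
    int plainIndex;

    @Benchmark
    public int atomic() {
        return atomicIndex.getAndIncrement(); // lock-free CAS increment
    }

    @Benchmark
    public synchronized int locked() {
        return plainIndex++;                  // contends on the object monitor
    }
}

Run it with several threads (for example, -t 8 on the JMH command line) to see how much the contention costs.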