- 
  Notifications
 You must be signed in to change notification settings 
- Fork 58
Efficiently solving a variant of the N-body problem #281
-
This is a continuation of a google groups conversation https://groups.google.com/g/arrayfire-users/c/NFBMVR8vu5w 
To restate my problem quite briefly for this thread, I am trying to a variant of the N-body problem (a simulation of n particles each interacting with every other particle)
Specifically I am making a type of simulation called "particle life" ("plife") loosely inspired by Conway's Game of Life. I've written a bit about it here if you'd like to read more about it.
My current AF implementation is creating two N x N arrays such that each 2D coordinate corresponds with a particular pair, and then doing all of my calculations on those matrices. After doing some research I have found that while the N x N matrix is used to solve this problem, one of the dimensions operates in time instead of space. This image from NVidia's N-body CUDA website illustrates what I mean. I'm unsure how to implement this using AF, as any loop I create will run on the CPU instead of the GPU. This particular simulation would also benefit greatly from being able to skip most of the computation, since most particles are far enough away from most other particles not to have any effect, but without branching I'm unsure how to achieve this.
Here is some of my code to better illustrate what I am currently doing (my project will likely be open source soon, but I am working on it by myself just for now):
fn get_velocities(&self) -> Array<Complex<f32>> { let p = tile(&self.positions, Dim4::new(&[1, self.num_points, 1, 1])); let q = tile( &transpose(&self.positions, false), Dim4::new(&[self.num_points, 1, 1, 1]), ); let mut delta = q - p; // implementation details, all operating on the N x N delta matrix let force = &outside_minimum * if_outside + !&outside_minimum * if_inside; sum_nan(&(delta / dist * force * !&skip), 1, 0.0) }
Any help or suggestions would be greatly appreciated! :)
Beta Was this translation helpful? Give feedback.
All reactions
Replies: 1 comment 4 replies
-
This is a continuation of a google groups conversation https://groups.google.com/g/arrayfire-users/c/NFBMVR8vu5w
To restate my problem quite briefly for this thread, I am trying to a variant of the N-body problem (a simulation of n particles each interacting with every other particle)
Specifically I am making a type of simulation called "particle life" ("plife") loosely inspired by Conway's Game of Life. I've written a bit about it here if you'd like to read more about it.
My current AF implementation is creating two N x N arrays such that each 2D coordinate corresponds with a particular pair, and then doing all of my calculations on those matrices. After doing some research I have found that while the N x N matrix is used to solve this problem, one of the dimensions operates in time instead of space. This image from NVidia's N-body CUDA website illustrates what I mean. I'm unsure how to implement this using AF, as any loop I create will run on the CPU instead of the GPU. This particular simulation would also benefit greatly from being able to skip most of the computation, since most particles are far enough away from most other particles not to have any effect, but without branching I'm unsure how to achieve this.
Here is some of my code to better illustrate what I am currently doing (my project will likely be open source soon, but I am working on it by myself just for now):
fn get_velocities(&self) -> Array<Complex> {
let p = tile(&self.positions, Dim4::new(&[1, self.num_points, 1, 1]));
let q = tile(
&transpose(&self.positions, false),
Dim4::new(&[self.num_points, 1, 1, 1]),
);
let mut delta = q - p;
// implementation details, all operating on the N x N delta matrix
let force = &outside_minimum * if_outside + !&outside_minimum * if_inside;
sum_nan(&(delta / dist * force * !&skip), 1, 0.0)
}
Any help or suggestions would be greatly appreciated! :)
Thank you for moving the discussion here. I will look into it and let you know.
Beta Was this translation helpful? Give feedback.
All reactions
-
Initially I wasn't sure if you were doing the same thing as gravity_sim example from ArrayFire project. It seems like your problem and gravity_sim kind of belong to same category of problems.
https://github.com/arrayfire/arrayfire/blob/master/examples/graphics/gravity_sim.cpp
The code snippet you shared seems fine to me. Looks like you omitted some code here -  // implementation details, all operating on the N x N delta matrix . Perhaps something in the omitted code is giving you performance issues.
I can really suggest perf improvements once you share some code snippets you think are slowing down your application.
If you are looking for an general overall idea on how to write n-body algorithm using ArrayFire. Kindly take a look at our gravity_sim C++ example. Porting that example to rust shouldn't be hard since we have all functions available via rust wrapper too.
Beta Was this translation helpful? Give feedback.
All reactions
-
I took a look at the included gravity_sim example, and it is indeed much faster than my program, probably for a number of reasons (I'm sure my code is not the most optimized, and plife simulations are just more taxing than gravity sims in general due to there being more array accesses), but the example still does not meet my needs in terms of performance. I increased total_particles from 4000 to 16000, and both programs (the example program and my program) simply segfault upon opening. (I am running this on a GTX 1660S with 6GB of vram, but even doing the math 16000^2*3 32 bit floats is only 2.8 gigabytes of memory) I was hoping for performance in-line with this simulator written in Rust and using wgpu-rs which defaults to running 200,000 particles without slowdown, and which I have just run with 2,000,000 particles  (this would take 48 TB of vram using the example program). It seems infeasible for AF to achieve this performance by simply operating on matrices of N^2 elements. Is there perhaps any way to push a small recurring set of AF function calls to the GPU as a shader program?
Beta Was this translation helpful? Give feedback.
All reactions
-
gravity_sim is an example code to show case certain aspects of ArrayFire, namely graphics and the possibility of doing such a simulation. I doubt it would have been super optimized with respect to memory usage and performance. You have to observe that gravity_sim is not an inbuilt-in function and it's performance and memory usage are dependent on how it is written. So there is a scope of improvement in that particular example if you want to address it. Sometimes, certain portions of algorithms may need custom hand written kernels to address unique access patterns. I don't think it is fair to compare an illustration code to a simulator which is written to do n-body simulations only. I think you should first explore writing your logic in an optimized way using ArrayFire (i.e. vectorize your algorithm) and then compare and improve in the areas necessary. We can help you with any questions you may have on how to use ArrayFire in your code. While working on your algorithm, If you can come up with features that are generic enough to be a function by themselves, feel free to raise a feature request in our project and we will look into it.
Coming back to your final question of using shaders in ArrayFire, no that is not possible. ArryaFire is for general purpose GPU programming, which is far more useful and performance oriented that what a shader can do. If you think a shader can address your problem and current example logic is lacking in performance, then that definitely indicates that the current logic is not optimized because a shader can do a very very limited set of things a GPU program can do.
Beta Was this translation helpful? Give feedback.
All reactions
-
@MightyAlex200 Have you narrowed which section of the code is causing the slowdown ?
Beta Was this translation helpful? Give feedback.