I've come across some strange behaviour while experimenting with some simple machine learning code.
I have two compiled versions of a model run that differ only in whether they include a normal-distribution initialisation step, i.e. random number generation (pseudo-random, so always starting from the same seed).
Running the version without that step any number of times, I see the expected results.
If I then run another binary (a different model) once, the version without the step invariably outputs incorrect results from then on.
Running the version with the normal-distribution step then appears to reset something, and afterwards the version without it gives correct results again.
I surmise that the GPU itself is in some way retaining state between program executions.
Is there anything more that can be done through AF, other than 'init()', to put the GPU into a known state? (It's an AMD/OpenCL card here.)
-
The only state ArrayFire retains across sessions is the kernel binaries saved to disk. These avoid recompilation on the target machine at every startup of the application (this caching started with the recent 3.7.2 release), and they don't hard-code any runtime information. Every session (i.e. every time the ArrayFire library is loaded into memory) starts afresh; the only global state populated on startup is the information about which devices are accessible.
I don't think even vendor drivers retain any application-specific state other than cached kernel binaries, which are kept for speed reasons. I know that NVIDIA does this; I'm not sure about AMD, but they probably cache kernel binaries too. Nevertheless, none of these store any app-specific info.
From what you describe, this seems to be a seed-related issue, but I can't be certain without looking at some reproducible code, preferably a standalone example.
-
I can't speak for drivers with 100% certainty, but ArrayFire for sure doesn't keep any global state that persists across sessions.
A short code snippet that reproduces the issue would be great. As you said in the original description, if it is related to random number generation, we have only a couple of random number generation functions. Have you tried using those functions in standalone code, as a bottom-up approach? That is, keep gradually adding your application logic (the code surrounding the random number generation) to this standalone program to find which addition breaks the output consistency. That should help narrow down the problem too.
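To make that consistency check repeatable while narrowing things down, something like the following could help. It is only a minimal sketch using the Python standard library, not anything ArrayFire provides: the helper names (`output_digest`, `is_consistent`, `changes_after`) are made up for illustration, and it assumes each binary prints its results to stdout.

```python
import hashlib
import subprocess


def output_digest(cmd):
    """Run cmd once and return a SHA-256 digest of its stdout (hypothetical helper)."""
    result = subprocess.run(cmd, capture_output=True, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def is_consistent(cmd, runs=5):
    """Return True if every run of cmd produces byte-identical stdout."""
    digests = {output_digest(cmd) for _ in range(runs)}
    return len(digests) == 1


def changes_after(cmd, other_cmd):
    """Return True if cmd's stdout differs after other_cmd has been run once.

    This mirrors the sequence in the original report: run the program,
    run a different binary in between, then run the program again.
    """
    before = output_digest(cmd)
    # The intermediate program's own output and exit status are ignored here.
    subprocess.run(other_cmd, capture_output=True, check=False)
    after = output_digest(cmd)
    return before != after
```

For example, `is_consistent(["./model_a"])` followed by `changes_after(["./model_a"], ["./model_b"])` would show whether running the second binary is what alters the first one's output (both paths are placeholders). Re-running the same two checks after each piece of application logic is added to the standalone program should point at the addition that breaks consistency.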
-
@progman1 Have you figured out what the problem is? If it is something related to system setup, please share your workaround; it would be helpful for any future users who face a similar issue. Thank you.
-
FYI - If by chance this has anything to do with arrayfire/arrayfire#2980, a couple of randu- and randn-related issues are being handled in that upstream PR.
Closing due to inactivity.