I have a helper struct called Scratch that serves as a calculation 'scratch pad'. Generally I need only one of these objects, but nested calculations need one Scratch per level of nesting, and making the calculation thread safe means each thread needs one or more Scratch objects of its own.
For simple cases a singleton Scratch object works fine and I achieve 38 steps per second. If I switch from the singleton to instantiating a Scratch object each time, I drop to 10 steps per second.
Using a non-thread-safe version of the Pool below (with the mutex lock and unlock calls commented out) got me back up to about 35-38 steps per second. But the thread-safe version as listed below knocks me down to 16 steps per second. So, to make threading worth it, I'd first have to overcome that initial performance hit.
I guess that theoretically two cores should almost break even and three or more should make the lock cost worth it, but I haven't tried that yet.
Any suggestions, comments, or criticisms are greatly appreciated.
    typedef struct Pool {
        Scratch** scratches;
        int n;
        int i;
        pthread_mutex_t mutex;
    } Pool;

    Pool* pool_;

    Pool* AEPoolCreate() {
        Pool* pool = (Pool*)malloc(sizeof(Pool));
        pool->scratches = (Scratch**)malloc(sizeof(Scratch*));
        pool->scratches[0] = AEScratchCreate();
        pool->n = 1;
        pool->i = 0;
        pthread_mutex_init(&pool->mutex, NULL);
        return pool;
    }

    Scratch* AEPoolGet(Pool* pool) {
        pthread_mutex_lock(&pool->mutex);
        if (pool->i == pool->n) {
            pool->n *= 2;
            printf("[ Pool ===================== ]\n");
            printf(" increased to a size of %d\n", pool->n);
            printf("[ =========================== ]\n\n");
            pool->scratches = (Scratch**)realloc(pool->scratches, sizeof(Scratch*)*pool->n);
            for (int i=pool->i;i<pool->n;i++)
                pool->scratches[i] = AEScratchCreate();
        }
        Scratch* scratch = pool->scratches[pool->i++];
        pthread_mutex_unlock(&pool->mutex);
        return scratch;
    }

    void AEPoolPut(Pool* pool, Scratch* scratch) {
        pthread_mutex_lock(&pool->mutex);
        scratch->cp = 0;
        scratch->vp = 0;
        scratch->sp = 0;
        pool->i--;
        pool->scratches[pool->i] = scratch;
        pthread_mutex_unlock(&pool->mutex);
    }

Here is the Scratch object as requested:
    // Scratch =
    typedef struct Scratch {
        Obj* stack;
        byte sp;
        byte cp;
        byte vp;
    } Scratch;

    Scratch* AEScratchCreate() {
        Scratch* scratch = (Scratch*)malloc(sizeof(Scratch));
        scratch->stack = (Obj*)malloc(sizeof(Obj)*10);
        scratch->cp = 0;
        scratch->vp = 0;
        scratch->sp = 0;
        return scratch;
    }

    void AEScratchRelease(Scratch* scratch) {
        if (scratch == 0) return;
        free(scratch->stack);
        free(scratch);
    }

    // Dim =====
    typedef union {
        long n;
        double x;
        void* p;
    } Dim;

    // Obj =====
    typedef struct {
        Dim a;
        Dim b;
        Dim c;
        // byte type;
    } Obj;

1 Answer
One small improvement would be to not acquire the lock in AEPoolPut until right before you access pool (i.e., clear out the fields in scratch before acquiring the lock).
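As a minimal sketch of that reordering: the scratch being returned is still owned only by the calling thread, so its fields can be reset before the mutex is taken, and the lock then covers just the shared pool bookkeeping. The struct definitions here are trimmed stand-ins for the ones in the question, the `byte` typedef is an assumption (the question does not show it), and the `assert` is an added sanity check rather than part of the original code.

```c
#include <assert.h>
#include <pthread.h>

typedef unsigned char byte;   /* assumption: byte is an 8-bit unsigned type */

/* Trimmed stand-in for the question's Scratch (stack buffer omitted). */
typedef struct Scratch {
    byte sp;
    byte cp;
    byte vp;
} Scratch;

typedef struct Pool {
    Scratch** scratches;
    int n;                    /* capacity of scratches[] */
    int i;                    /* index of the next scratch to hand out */
    pthread_mutex_t mutex;
} Pool;

void AEPoolPut(Pool* pool, Scratch* scratch) {
    /* Only the caller can see this scratch right now, so no lock is needed. */
    scratch->cp = 0;
    scratch->vp = 0;
    scratch->sp = 0;

    /* The mutex protects only the shared pool state. */
    pthread_mutex_lock(&pool->mutex);
    assert(pool->i > 0);      /* added check: a Put must follow a Get */
    pool->i--;
    pool->scratches[pool->i] = scratch;
    pthread_mutex_unlock(&pool->mutex);
}
```

The critical section shrinks by only three stores, so the gain is likely small unless the lock is heavily contended.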
I don't really see anything else here that could be changed to help, assuming the number of scratches required is more or less stable, so that AEPoolGet will quickly reach the point where it no longer has to allocate any more scratches. There are some tweaks that might be applicable if the pool continually grows.
In the code for the threads that use this pool, you can implement some sort of thread-local scratch caching mechanism, so that when a thread finishes using a scratch object it hangs on to it locally and reuses it. This will bypass all of this code. If there are too many of these "free" scratch objects, you can return the excess to the pool. Depending on your usage, you might only need to cache one scratch object.
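One way such a cache might look, as a rough sketch rather than a drop-in replacement: each thread keeps at most one scratch in C11 thread-local storage and only falls back to the locked pool when its cache is empty (on borrow) or already occupied (on return). The names `ScratchBorrow`, `ScratchReturn`, `SharedPoolGet`, and `SharedPoolPut` are hypothetical, the fixed-size shared pool is a simplified stand-in for `AEPoolGet`/`AEPoolPut`, and `Scratch` is trimmed to its three counters.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Trimmed stand-in for the question's Scratch (stack buffer omitted). */
typedef struct Scratch { unsigned char sp, cp, vp; } Scratch;

/* Simplified locked pool, standing in for AEPoolGet/AEPoolPut. */
#define POOL_MAX 16
static Scratch* pool_items[POOL_MAX];
static int pool_count = 0;
static pthread_mutex_t pool_mutex = PTHREAD_MUTEX_INITIALIZER;

static Scratch* SharedPoolGet(void) {
    Scratch* s = NULL;
    pthread_mutex_lock(&pool_mutex);
    if (pool_count > 0) s = pool_items[--pool_count];
    pthread_mutex_unlock(&pool_mutex);
    /* Pool was empty: allocate outside the lock (may return NULL). */
    if (s == NULL) s = calloc(1, sizeof *s);
    return s;
}

static void SharedPoolPut(Scratch* s) {
    pthread_mutex_lock(&pool_mutex);
    if (pool_count < POOL_MAX) { pool_items[pool_count++] = s; s = NULL; }
    pthread_mutex_unlock(&pool_mutex);
    free(s);   /* pool full: drop the excess; free(NULL) is a no-op */
}

/* At most one cached scratch per thread; never touched by other threads. */
static _Thread_local Scratch* cached = NULL;

Scratch* ScratchBorrow(void) {
    Scratch* s = cached;
    if (s != NULL) {          /* fast path: no lock taken */
        cached = NULL;
        return s;
    }
    return SharedPoolGet();   /* slow path: nested use or first call */
}

void ScratchReturn(Scratch* s) {
    assert(s != NULL);
    s->cp = s->vp = s->sp = 0;
    if (cached == NULL) {     /* fast path: keep it for this thread */
        cached = s;
        return;
    }
    SharedPoolPut(s);         /* cache already full: hand back the excess */
}
```

With one level of nesting, every borrow after the first is served from the cache and never touches the mutex. A thread that exits would need to hand its cached scratch back to the pool first, which this sketch does not do.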
Ya, generally the number of scratches created is going to be very small. For my current test only one gets created. But I think you've triggered the solution for me. I just need to store one Pool per thread and then I can remove the locks while maintaining thread safety and speed. Thanks! – aepryus, Apr 9, 2018 at 16:00
[…] Scratch typedef? How big is it? Did you realize that AEScratchCreate() is making a copy of that struct?
[…] AEPoolGet and AEPoolPut called per step?