I am never forget the day I first meet the great Lobachevsky. In one word he told me secret of success in NumPy: Vectorize!
Vectorize!
In every loop inspect your i's,
Remember why the good Lord made your eyes!
So don't shade your eyes,
But vectorize, vectorize, vectorize—
Only be sure always to call it please 'refactoring'.
import numpy as np
from scipy import stats

def kde_integration(m1, m2, sample_size=100, epsilon=1e-6):
    values = np.vstack((m1, m2))
    kernel = stats.gaussian_kde(values, bw_method=None)
    iso = kernel(values)                  # density at each data point
    # Draw sample_size resampled points per data point, all in one call.
    sample = kernel.resample(size=sample_size * len(m1))
    # For each data point, the fraction of its resampled points whose
    # density falls below that point's density.
    insample = kernel(sample).reshape(sample_size, -1) < iso.reshape(1, -1)
    return np.maximum(insample.mean(axis=0), epsilon)
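To try it out you need a pair of coupled measurement vectors. The measure helper I use in the timings below isn't shown here, so take this stand-in (modelled on the example in the scipy.stats.gaussian_kde documentation) as an assumption about what it looks like:

def measure(n):
    # Stand-in measurement model: two coupled noisy measurements (assumed).
    m1 = np.random.normal(size=n)
    m2 = np.random.normal(scale=0.5, size=n)
    return m1 + m2, m1 - m2

m1, m2 = measure(100)
probs = kde_integration(m1, m2)   # one probability estimate per data point
print(probs.shape)                # (100,)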
Update
I find that my kde_integration is about ten times as fast as yours with sample_size=100:
>>> m1, m2 = measure(100)
>>> from timeit import timeit
>>> timeit(lambda:kde_integration1(m1, m2), number=1) # yours
0.7005664870084729
>>> timeit(lambda:kde_integration2(m1, m2), number=1) # mine (above)
0.07272820601065177
But the advantage disappears with larger sample sizes:
>>> timeit(lambda:kde_integration1(m1, m2, sample_size=1024), number=1)
1.1872510590037564
>>> timeit(lambda:kde_integration2(m1, m2, sample_size=1024), number=1)
1.2788789629994426
This suggests that there is a "sweet spot" for sample_size * len(m1) (perhaps related to the computer's cache size), so you'll get the best results by processing the samples in chunks of that length.
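Here's one way that chunking might look. This is only a sketch (the chunk parameter, its default value, and the loop structure are my guesses, not something I've benchmarked), but it computes the same estimate as kde_integration above:

def kde_integration_chunked(m1, m2, sample_size=100, epsilon=1e-6, chunk=16):
    # Same estimator as kde_integration, but the sample_size resamples per
    # data point are drawn and evaluated `chunk` rows at a time.
    values = np.vstack((m1, m2))
    kernel = stats.gaussian_kde(values, bw_method=None)
    iso = kernel(values)
    n = len(m1)
    counts = np.zeros(n)
    remaining = sample_size
    while remaining > 0:
        c = min(chunk, remaining)
        sample = kernel.resample(size=c * n)   # chunk of c resamples per point
        below = kernel(sample).reshape(c, -1) < iso.reshape(1, -1)
        counts += below.sum(axis=0)
        remaining -= c
    return np.maximum(counts / sample_size, epsilon)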
You don't say exactly how you're testing it, but different results are surely to be expected, since scipy.stats.gaussian_kde.resample
"Randomly samples a dataset from the estimated pdf".