I am never forget the day I first meet the great Lobachevsky. In one word he told me secret of success in NumPy: Vectorize!

Vectorize!
In every loop inspect your i's
Remember why the good Lord made your eyes!
So don't shade your eyes,
But vectorize, vectorize, vectorize—
Only be sure always to call it please 'refactoring'.

import numpy as np
from scipy import stats

def kde_integration(m1, m2, sample_size=100, epsilon=0.000001):
    # Fit a Gaussian KDE to the paired measurements.
    values = np.vstack((m1, m2))
    kernel = stats.gaussian_kde(values, bw_method=None)
    iso = kernel(values)           # density at each data point
    # Monte Carlo: for each data point, estimate the fraction of resampled
    # points whose density falls below that point's density.
    sample = kernel.resample(size=sample_size * len(m1))
    insample = kernel(sample).reshape(sample_size, -1) < iso.reshape(1, -1)
    return np.maximum(insample.mean(axis=0), epsilon)

Update

I find that my kde_integration is about ten times as fast as yours with sample_size=100:

>>> m1, m2 = measure(100)
>>> from timeit import timeit
>>> timeit(lambda:kde_integration1(m1, m2), number=1) # yours
0.7005664870084729
>>> timeit(lambda:kde_integration2(m1, m2), number=1) # mine (above)
0.07272820601065177

But the advantage disappears with larger sample sizes:

>>> timeit(lambda:kde_integration1(m1, m2, sample_size=1024), number=1)
1.1872510590037564
>>> timeit(lambda:kde_integration2(m1, m2, sample_size=1024), number=1)
1.2788789629994426

This suggests that there is a "sweet spot" for sample_size * len(m1) (perhaps related to the computer's cache size), so you'll get the best results by processing the samples in chunks of roughly that length; a sketch of that approach follows below.
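
Here, for what it's worth, is a sketch of what that chunking might look like. kde_integration_chunked and chunk_size are names I've made up for illustration, and the best value of chunk_size is something you'd have to measure on your own machine:

import numpy as np
from scipy import stats

def kde_integration_chunked(m1, m2, sample_size=100, epsilon=0.000001,
                            chunk_size=8192):
    # chunk_size is a tuning knob: the target number of resampled points
    # evaluated per batch.
    values = np.vstack((m1, m2))
    kernel = stats.gaussian_kde(values, bw_method=None)
    iso = kernel(values)
    n = len(m1)
    per_batch = max(1, chunk_size // n)   # resamples per data point per batch
    counts = np.zeros(n)
    drawn = 0
    while drawn < sample_size:
        k = min(per_batch, sample_size - drawn)
        sample = kernel.resample(size=k * n)
        insample = kernel(sample).reshape(k, -1) < iso.reshape(1, -1)
        counts += insample.sum(axis=0)
        drawn += k
    return np.maximum(counts / sample_size, epsilon)

The statistics are the same as in the all-at-once version (sample_size still controls the total number of draws per data point), but each call to kernel now evaluates at most roughly chunk_size resampled points (or len(m1), if that is larger).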

You don't say exactly how you're testing it, but different results are surely to be expected, since scipy.stats.gaussian_kde.resample "Randomly samples a dataset from the estimated pdf".
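
If you want the comparison between the two implementations to be repeatable, one option is to fix NumPy's global random state before each call. This is only a sketch: it assumes that resample falls back to NumPy's global generator when no seed is passed (which is how I understand current SciPy to behave), and it uses synthetic data in place of your measure(100):

import numpy as np

m1, m2 = np.random.normal(size=(2, 100))  # synthetic stand-in for measure(100)

# Seed the global generator before each call so that resample makes
# identical draws both times.
np.random.seed(0)
a = kde_integration(m1, m2)
np.random.seed(0)
b = kde_integration(m1, m2)
assert np.allclose(a, b)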
