You can get minor performance improvements if you use the += syntax of numpy, e.g.

mainarr[0, lats, lons] = mainarr[0, lats, lons] + c550

would reduce to

mainarr[0, lats, lons] += c550

Though it is not really in-place, you still gain some speed.
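
To see what "not really in-place" means here, consider a toy example (my own illustration, not part of the benchmark below): with an index array, a[idx] += v expands to a read-modify-write through a temporary, so repeated indices are only counted once. numpy.add.at, shown purely for contrast, does an unbuffered, truly in-place accumulation; note that the benchmark keeps the buffered semantics to match your original code.

import numpy as np

idx = np.array([0, 0, 2])    # note the repeated index 0
vals = np.ones(3)

a = np.zeros(5)
a[idx] += vals               # buffered: read a[idx], add, write back
print(a)                     # [1. 0. 1. 0. 0.] -- repeated 0 counted once

b = np.zeros(5)
np.add.at(b, idx, vals)      # unbuffered, truly in-place accumulation
print(b)                     # [2. 0. 1. 0. 0.]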

Another idea I had was to use numpy.clip, as it might be faster than setting the upper bound for your indices manually. You may be able to gain further improvements using numpy.put with mode='clip' instead of really clipping the values. Positive effects for numpy.take (which is the opposite of numpy.put) are described here.
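
As a minimal sketch of that idea (toy 1D data and index values of my own choosing): numpy.put with mode='clip' clips out-of-range indices on the fly, which lets you skip the explicit clipping pass over the index arrays.

import numpy as np

grid = np.zeros(1800)
lats = np.array([5, 42, 2000])    # the last index is out of range
vals = np.array([1.0, 2.0, 3.0])

# Variant 1: clip the index array first, then assign.
grid[np.clip(lats, 0, 1799)] = vals

# Variant 2: let numpy.put clip out-of-range indices on the fly.
grid2 = np.zeros(1800)
np.put(grid2, lats, vals, mode='clip')

assert np.allclose(grid, grid2)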

I put together a little program to test the gain compared to your proposed code. As you did not provide much additional information, I had to make some assumptions, e.g. about what your data looks like (see mock_data).

The results look something like the following:

Experiment (10 files, 10000 points)
Original: 10.093s Refactored: 9.964s
Experiment (10 files, 1000 points)
Original: 2.399s Refactored: 2.362s
Experiment (10 files, 100 points)
Original: 0.349s Refactored: 0.342s
Experiment (100 files, 10000 points)
Original: 47.823s Refactored: 47.672s
Experiment (100 files, 1000 points)
Original: 10.888s Refactored: 10.781s
Experiment (100 files, 100 points)
Original: 3.306s Refactored: 3.195s
Experiment (1000 files, 10000 points)
Original: 423.727s Refactored: 420.922s
Experiment (1000 files, 1000 points)
Original: 58.546s Refactored: 56.579s
Experiment (1000 files, 100 points)
Original: 20.227s Refactored: 18.260s

Conclusion

In my opinion, you will not be able to gain a great speed-up by fiddling around with numpy here. I made a small test, not included in the listing above, with 1 file and 1M points:

Experiment (1 files, 1000000 points)
Original: 51.245s Refactored: 48.703s

Because of this, I think numpy is not really the problem in your case. If your mask hits a lot of points, it might be a good idea to remove the corresponding indices from lats and lons with numpy.delete, because both arrays will get shorter and mainarr[0][0] will not be messed up with all the values you put there.
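
A small sketch of the numpy.delete idea (mock data of my own, mirroring the pts computation from the test script below):

import numpy as np

lats = np.array([10, 20, 30, 40])
lons = np.array([1, 2, 3, 4])
c550 = np.array([0.1, 0.2, 0.3, 0.4])
masker = np.array([1, 0, 1, 0])       # mock mask, 0 = rejected point

pts = (masker != 1).nonzero()[0]      # indices of rejected points
lats = np.delete(lats, pts)
lons = np.delete(lons, pts)
c550 = np.delete(c550, pts)
# mainarr[0, lats, lons] += c550 now touches only valid cells, and
# mainarr[0][0] no longer collects the values of the masked points.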

As the operations on one file do not really depend on the other files, one could come up with the idea of splitting up the process and merging the results later. Your task would be to re-implement your function so that you can run it in parallel in different processes on parts of your file list. But even this might not help you that much if your I/O capacity is the bottleneck.
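
A rough sketch of this idea (the chunking and the parallel helper are my own, and it assumes the refactored function from the test script below): each worker processes a slice of the file list, and because every cell of mainarr is a plain sum, merging is just adding the partial grids.

import numpy as np
from multiprocessing import Pool

# Assumes `refactored` (see the test script below) is defined at
# module level so the worker processes can pickle and import it.
def parallel(files, n_workers=4):
    """Run refactored() on chunks of the file list in parallel."""
    chunk_size = -(-len(files) // n_workers)    # ceiling division
    chunks = [files[i:i + chunk_size]
              for i in range(0, len(files), chunk_size)]
    with Pool(n_workers) as pool:
        partial_grids = pool.map(refactored, chunks)
    # Summing the per-process grids yields the same result as
    # processing all files in a single process.
    return np.sum(partial_grids, axis=0)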

Test script

import numpy as np
import timeit


def mock_data(n_samples=1000):
    """Mock data generation"""
    lats = np.random.uniform(-90.0, 90.0, (n_samples, ))
    lons = np.random.uniform(-180.0, 180.0, (n_samples, ))
    c550 = np.random.rand(n_samples)
    c670 = np.random.rand(n_samples)
    c870 = np.random.rand(n_samples)
    cldy = np.random.rand(n_samples)
    return lats, lons, c550, c670, c870, cldy


def mock_files(n_files=500, n_data=1000):
    """Mock files for benchmarking"""
    np.random.seed(42)
    return [mock_data(n_data) for _ in range(n_files)]


def getmask(cldy, tresh, max_val, min_val):
    """Mock getmask"""
    return cldy > (tresh / 2.0)


def original(files):
    """Original function by @Simon"""
    mainarr = np.zeros((4, 1800, 3600))
    i = 0
    # Loop over all input files
    for infile in files:
        lats, lons, c550, c670, c870, cldy = infile
        lats = np.array(np.round((lats + 90) * 10), dtype=np.int16)
        lons = np.array(np.round((lons + 180) * 10), dtype=np.int16)
        lats[(lats >= 1800).nonzero()] = 1799
        lons[(lons >= 3600).nonzero()] = 3599
        # The function below is already optimized
        masker = getmask(cldy, 1.0, 1.0, 0.0)
        pts = (masker != 1).nonzero()
        lats[pts] = 0
        lons[pts] = 0
        mainarr[0, lats, lons] = mainarr[0, lats, lons] + c550
        mainarr[1, lats, lons] = mainarr[1, lats, lons] + c670
        mainarr[2, lats, lons] = mainarr[2, lats, lons] + c870
        mainarr[3, lats, lons] = mainarr[3, lats, lons] + 1
        i = i + 1
    return mainarr


def refactored(files):
    """Refactored function by @Alex Vorndran"""
    mainarr = np.zeros((4, 1800, 3600))
    # Loop over all input files
    for i, infile in enumerate(files):
        lats, lons, c550, c670, c870, cldy = infile
        lats = np.array(np.round((lats + 90) * 10), dtype=np.int16)
        lons = np.array(np.round((lons + 180) * 10), dtype=np.int16)
        np.clip(lats, 0, 1799, out=lats)
        np.clip(lons, 0, 3599, out=lons)
        # The function below is already optimized
        masker = getmask(cldy, 1.0, 1.0, 0.0)
        pts = (masker != 1).nonzero()
        lats[pts] = 0
        lons[pts] = 0
        # make use of "in-place" additions
        mainarr[0, lats, lons] += c550
        mainarr[1, lats, lons] += c670
        mainarr[2, lats, lons] += c870
        mainarr[3, lats, lons] += 1.0
    return mainarr


def validate_refactoring(n_files, n_data):
    """Validate the refactoring with comparison by value"""
    files = mock_files(n_files, n_data)
    mainarr_o = original(files)
    mainarr_r = refactored(files)
    return np.allclose(mainarr_o, mainarr_r)


def test_functions():
    """Test different versions"""
    n_runs = 100
    # a list of tuples where the first element is the number of files
    # and the second one is the number of points per file
    files_and_points = [(10, 10000), (10, 1000), (10, 100),
                        (100, 10000), (100, 1000), (100, 100),
                        (1000, 10000), (1000, 1000), (1000, 100)]
    assert validate_refactoring(10, 100)
    imports = 'from __main__ import original, refactored, mock_files;'
    for n_files, n_data in files_and_points:
        data = 'files = mock_files({}, {});'.format(n_files, n_data)
        time_original = timeit.timeit(
            'original(files)', setup=imports + data, number=n_runs)
        time_refactored = timeit.timeit(
            'refactored(files)', setup=imports + data, number=n_runs)
        print('Experiment ({} files, {} points)'.format(n_files, n_data))
        print('Original: {:.3f}s Refactored: {:.3f}s\n'.format(
            time_original, time_refactored))


if __name__ == '__main__':
    test_functions()