Hi, I'm a fairly heavy user of matplotlib (to plot results from plasma physics simulations) and my use requires the display of fairly large images. Having done some testing I've discovered (after bypassing anything slow from the python code) that for large images where there image size approaches the available memory the main performance bar seems to be the conversion of the raw data to the _image.Image class. The way in which the conversion takes place - with data being taken non-sequentially from many points in a floating point source array and then converted to an 1 byte integer - is slow and if swapping becomes involved even slower. To overcome this problem I suggest implementing c++ code to allow the creation of the image from a buffer (with each rgba pixel as 4 bytes) rather than a floating point array. Where image data is being generated elsewhere (in my case in Fortran code) it's trivial to output to a different format and doing so means that the size of the input data can be significantly smaller and that the data in the source array is accessed sequentially (it's likely that a compiler will also be able to optimise a copy of this data more effectively). The image can then be scaled and over plotted as with any existing image. I've attempted to implement this code myself (see attached patch to src/_image.cpp) but I'm not a regular c++ or even c programmer so it's fairly likely there will be memory leaks in the code. For a 1024x2048 array using the GTKAgg backend and with plenty of memory free this change results in show() taking <0.7s rather than >4.6s; if there is a memory shortage and swapping becomes involved the change is much more noticeable. I haven't made any decent Python wrapping code yet - but would be happy do do so if someone familiar with c++ could tidy up my attachment. Hope this is useful to others, Nicholas Young
>>>>> "Nicholas" == Nicholas Young <su...@su...> writes: Nicholas> I've attempted to implement this code myself (see Nicholas> attached patch to src/_image.cpp) but I'm not a regular Nicholas> c++ or even c programmer so it's fairly likely there Nicholas> will be memory leaks in the code. For a 1024x2048 array Nicholas> using the GTKAgg backend and with plenty of memory free Nicholas> this change results in show() taking <0.7s rather than Nicholas> >4.6s; if there is a memory shortage and swapping Nicholas> becomes involved the change is much more noticeable. I Nicholas> haven't made any decent Python wrapping code yet - but Nicholas> would be happy do do so if someone familiar with c++ Nicholas> could tidy up my attachment. Hi Nicholas, Thanks for the suggestions and patch. I incorporated frombuffer and have been testing it. I've been testing the performance of frombuffer vs fromarray, and have seen some 2-3x speedups but nothing like the numbers you are reporting. [Also, I don't see any detectable memory leaks so I don't think you have any worries there] Here is the test script I am using - does this look like a fair test? You can uncomment report_memory on unix like systems to get a memory report on each pass through the loop, and switch out fromarray vs frombuffer to compare your function with mine On a related note, below I'm pasting in a representative section the code I am currently using in fromarray for MxNx3 and MxNx4 arrays -- any obvious performance gains to be had here numerix gurus? Another suggestion for Nicholas -- perhaps you want to support MxN, MxNx3 and MxNx4 arrays in your frombuffer function? And a final question -- how are you getting your function into the matplotlib image pipeline. Did you alter the image.py AxesImage.set_data function to test whether A is a buffer object? If so, you might want to post these changes to the codebase as well. // some fromarray code //PyArrayObject *A = (PyArrayObject *) PyArray_ContiguousFromObject(x.ptr(), PyArray_DOUBLE, 2, 3); PyArrayObject *A = (PyArrayObject *) PyArray_FromObject(x.ptr(), PyArray_DOUBLE, 2, 3); int rgba = A->dimensions[2]==4; double r,g,b,alpha; int offset =0; for (size_t rownum=0; rownum<imo->rowsIn; rownum++) { for (size_t colnum=0; colnum<imo->colsIn; colnum++) { offset = rownum*A->strides[0] + colnum*A->strides[1]; r = *(double *)(A->data + offset); g = *(double *)(A->data + offset + A->strides[2] ); b = *(double *)(A->data + offset + 2*A->strides[2] ); if (rgba) alpha = *(double *)(A->data + offset + 3*A->strides[2] ); else alpha = 1.0; *buffer++ = int(255*r); // red *buffer++ = int(255*g); // green *buffer++ = int(255*b); // blue *buffer++ = int(255*alpha); // alpha } } ## ... and here is the profile script .... import sys, os, time, gc from matplotlib._image import fromarray, fromarray2, frombuffer from matplotlib.numerix.mlab import rand from matplotlib.numerix import UInt8 def report_memory(i): pid = os.getpid() a2 = os.popen('ps -p %d -o rss,sz' % pid).readlines() print i, ' ', a2[1], return int(a2[1].split()[1]) N = 1024 #X2 = rand(N,N) #X3 = rand(N,N,3) X4 = rand(N,N,4) start = time.time() b4 = (X4*255).astype(UInt8).tostring() for i in range(50): im = fromarray(X4, 0) #im = frombuffer(b4, N, N, 0) #val = report_memory(i) end = time.time() print 'elapsed: %1.3f'%(end-start)
On Thu, 2005年04月14日 at 16:00 -0500, John Hunter wrote: > Thanks for the suggestions and patch. I incorporated frombuffer and > have been testing it. I've been testing the performance of frombuffer > vs fromarray, and have seen some 2-3x speedups but nothing like the > numbers you are reporting. [Also, I don't see any detectable memory > leaks so I don't think you have any worries there] That kind of speed up is probably more realistic - I probably made a greater number of optimisations to the python buffer code than to the numerix code (which I only remembered after posting my first message). Performance do gains seem greater where memory is limited though as the reduced memory consumption means less swapping, in cases where the reduced memory consumption avoids swapping altogether they are, of course, huge. > Here is the test script I am using - does this look like a fair test? It seems to be a fair test - I'd have created the string buffer outside of the timing (as in reality you wouldn't go through that step) but as it should be a fairly quick conversion it shouldn't matter too much. > Another suggestion for Nicholas -- perhaps you want to support MxN, > MxNx3 and MxNx4 arrays in your frombuffer function? I could do - the main reason I don't particularly want to is that a compiler should be able to more easily optimise a simple for(i,i<j,i++) loop than a series of nested loops and other instructions. As this code is only really of use where performance is particularly important I'd rather keep performance as high as possible - it's easy to generate buffers in whatever format is necessary. My suggestion for improved performance would be to allow the Image class to work directly on the buffer passed to the generator function - I've started implementing this in c++. Going with this approach should improve speed further and again conserve memory for very large images - and (without making other changes to the Image class) would require that rgba rather than rgb or intensity was used as an input. > And a final question -- how are you getting your function into the > matplotlib image pipeline. Did you alter the image.py > AxesImage.set_data function to test whether A is a buffer object? If > so, you might want to post these changes to the codebase as well. I made some alterations to these functions - but they are currently limited to my own situation. I will make them general once I've played around with writable buffers a bit - this will be fairly easily but I don't want to spend time writing wrapper code until I'm happy with what I'm wrapping. Nicholas
On Fri, 2005年04月15日 at 12:27 +0100, I wrote: > My suggestion for improved performance would be to allow the Image class > to work directly on the buffer passed to the generator function - I've > started implementing this in c++. Going with this approach should > improve speed further and again conserve memory for very large images - > and (without making other changes to the Image class) would require that > rgba rather than rgb or intensity was used as an input. Having tried this (patch attached) I get the following results on running a slightly modified version of John Hunters test script (attached). In the original script it turned out that a significant amount of time was being taken up with creating some of the test environment - hence a smaller improvement was being shown. Running with 1024x1024: Starting array: Array set up: resident stack size: 39716, size: 10914 Tests done: resident stack size: 43836, size: 11938 Elapsed: 9.363 Starting buffer: Buffer set up: resident stack size: 15192, size: 4823 Tests done: resident stack size: 15200, size: 4824 Elapsed: 0.690 Fractional improvement: 12.577 Running with 2048x2048: Starting array: Array set up: resident stack size: 146276, size: 37592 Tests done: resident stack size: 158572, size: 40664 Elapsed: 38.544 Starting buffer: Buffer set up: resident stack size: 39784, size: 10968 Tests done: resident stack size: 39784, size: 10967 Elapsed: 2.044 Fractional improvement: 17.855 Running with 4096x4096: Starting array: Array set up: resident stack size: 564076, size: 142041 Tests done: resident stack size: 613252, size: 154329 Elapsed: 170.495 Starting buffer: Buffer set up: resident stack size: 67100, size: 35544 Tests done: resident stack size: 133060, size: 35544 Elapsed: 8.474 Fractional improvement: 19.120 As you can see - in both cases a big improvement. In the case of large data sets its a change from a noticeable delay 3.4s per plot to the very usable 0.17s per plot (none of these plots required swapping - although the set functions did). If you don't have you data in the form of a writable python buffer it's necessary to wrap the input buffer in a Python array.array to get this performance (a read-only buffer is still accepted but will be slightly slower). Performance using a non-modifiable buffer is slightly lower even with the overhead of an additional Python function call (I guess it implements some more intelligent buffer creation strategy). > I made some alterations to these functions - but they are currently > limited to my own situation. I will make them general once I've played > around with writable buffers a bit - this will be fairly easily but I > don't want to spend time writing wrapper code until I'm happy with what > I'm wrapping. Changes in the patch I've attached (and were even simpler than I imagined). Nich
Hi, I made a suggestion for improving imshow performance for plotting an image already in byte string form a while ago; some of the results are currently in CVS. I seems that other changes made to CVS at the same time or since mean that the floating point buffer source code is now much faster and I'd suggest taking anything sourced from my code back out. Nick
>>>>> "Nicholas" == Nicholas Young <su...@su...> writes: Nicholas> Hi, I made a suggestion for improving imshow performance Nicholas> for plotting an image already in byte string form a Nicholas> while ago; some of the results are currently in CVS. I Nicholas> seems that other changes made to CVS at the same time or Nicholas> since mean that the floating point buffer source code is Nicholas> now much faster and I'd suggest taking anything sourced Nicholas> from my code back out. Maybe that explains why I was never able to get the same performance benefits you were seeing.... But I was able to get 1.5x - 2.5x faster with your patch, which is pretty damned good. And the memory benefit of your patch could be substantial as well. Why pull it? What are your latest profiling numbers? Your patch came in just at the time I was leaving for Paris for a meeting, after which I experienced a rather nasty total hard drive failure that set me back in my mpl maintenance. I am not sure I am ready to give up on the ideas you introduced.... Can you remind me -- did I manage to incorporate all of the matplotlib.axes and matplotlib.image patches I needed to take advantage of your patches to the _image extension code? JDH
I sent this message sometime on Friday but as it doesn't seem to have made it to the sourceforge list yet I'm assuming it's not going to and trying again. On Thu, 2005年05月26日 at 21:15 -0500, John Hunter wrote: > >>>>> "Nicholas" == Nicholas Young <su...@su...> writes: > > Nicholas> Hi, I made a suggestion for improving imshow performance > Nicholas> for plotting an image already in byte string form a > Nicholas> while ago; some of the results are currently in CVS. I > Nicholas> seems that other changes made to CVS at the same time or > Nicholas> since mean that the floating point buffer source code is > Nicholas> now much faster and I'd suggest taking anything sourced > Nicholas> from my code back out. > > Maybe that explains why I was never able to get the same performance > benefits you were seeing.... But I was able to get 1.5x - 2.5x faster > with your patch, which is pretty damned good. And the memory > benefit of your patch could be substantial as well. Why pull it? When I first tried the new CVS code (other than my own) I was rather rushed trying to get a paper finished and tested with too small an image and thus saw too small a speedup for it to be worth it. I'm also using non-contiguous arrays which make the relative speed increase smaller. I also didn't think the interface was particularly user friendly. I'm now sure why my code was faster (a mixture of data copying using optimised functions and no need do floating point to integer conversion) and have written some new code which is slightly faster than the old and has a nicer interface (patch to CVS attached). > What are your latest profiling numbers? Profiling with my latest version and comparing to CVS I get: Running with 1024x1024: Starting array: Array set up: resident stack size: 37408, size: 12571 Tests done: resident stack size: 41536, size: 13596 Elapsed: 7.674 Starting byte: Byte set up: resident stack size: 12876, size: 6429 Tests done: resident stack size: 12880, size: 6428 Elapsed: 0.521 Fractional improvement: 13.724 Running with 2048x2048: Starting array: Array set up: resident stack size: 143956, size: 39197 Tests done: resident stack size: 156252, size: 42269 Elapsed: 29.577 Starting byte: Byte set up: resident stack size: 37464, size: 12573 Tests done: resident stack size: 37464, size: 12572 Elapsed: 2.098 Fractional improvement: 13.100 Running with 4096x4096: Starting array: Array set up: resident stack size: 561756, size: 143646 Tests done: resident stack size: 609388, size: 155963 Elapsed: 127.401 Starting byte: Byte set up: resident stack size: 66376, size: 37178 Tests done: resident stack size: 132280, size: 37177 Elapsed: 8.526 Fractional improvement: 13.943 > Your patch came in just at the time I was leaving for Paris for a > meeting, after which I experienced a rather nasty total hard drive > failure that set me back in my mpl maintenance. I am not sure I am > ready to give up on the ideas you introduced.... Can you remind me -- > did I manage to incorporate all of the matplotlib.axes and > matplotlib.image patches I needed to take advantage of your patches to > the _image extension code? I think it's partly in there in a non-functional form, the patch I've attached removes it and adds a new function to the Axes class called directshow. This accepts the same syntax as imshow (where relevant) rather than adding options to imshow; I chose to do this as my old syntax for passing through imshow wasn't that easy to understand didn't makes the different functionality clear. This function calls a class call DirectImage which inherits from AxesImage. I've also rewritten my c++ image object creation function called frombyte to take an unsigned byte array as input rather than a buffer. By using the std::memcopy function rather than a loop for copying the speed advantage of passing data in as a buffer disappears and using arrays is generally more intuitive. The function still only takes x*y*4 arrays as input at the moment as the processor time decrease from not using loops to copy is fairly significant. Nick
>>>>> "Nicholas" == Nicholas Young <su...@su...> writes: Nicholas> I think it's partly in there in a non-functional form, Nicholas> the patch I've attached removes it and adds a new Nicholas> function to the Axes class called directshow. This Nicholas> accepts the same syntax as imshow (where relevant) Nicholas> rather than adding options to imshow; I chose to do this Nicholas> as my old syntax for passing through imshow wasn't that Nicholas> easy to understand didn't makes the different Nicholas> functionality clear. This function calls a class call Nicholas> DirectImage which inherits from AxesImage. I have mixed feelings about making this a separate class / separate function. The current axes imshow / image.AxesImage is already overloaded (handling MxN, MxNx3, MxNx3, _image.Image and PIL images. What is the logic of making a special functions/classes case for MxNx4 with directshow. On the one hand, I appreciate the desire to simplify the code by pulling it into separate classes and functions. On the other hand, I wonder if it will confuse users to have one separate function for UInt8 images and another for everything else. Another worry I have about the separate DirectImage class is that it copies much of the image resize functionality from AxesImage, including the broken handling of aspect='preserve'. This too argues for keeping as much functionality in AxesImage as possible, testing for A.typecode() where appropriate. What do you think? Also, note the matplotlib CVS has added several new image interpolation methods. Some of these need the parameters filternorm=1, filterrad=4.0 as described in the imshow docstring. Since directimage exposes the interpolation method, shouldn't it also expose these new params. Nicholas> I've also rewritten my c++ image object creation Nicholas> function called frombyte to take an unsigned byte array Nicholas> as input rather than a buffer. By using the Nicholas> std::memcopy function rather than a loop for copying the Nicholas> speed advantage of passing data in as a buffer Nicholas> disappears and using arrays is generally more intuitive. Nicholas> The function still only takes x*y*4 arrays as input at Nicholas> the moment as the processor time decrease from not using Nicholas> loops to copy is fairly significant. Looks good to me. I'll hold off on applying these changes until I hear from you on the issues above. Thanks! JDH
On Tue, 2005年05月31日 at 10:28 -0500, John Hunter wrote: > >>>>> "Nicholas" == Nicholas Young <su...@su...> writes: > I have mixed feelings about making this a separate class / separate > function. The current axes imshow / image.AxesImage is already > overloaded (handling MxN, MxNx3, MxNx3, _image.Image and PIL images. > What is the logic of making a special functions/classes case for MxNx4 > with directshow. On the one hand, I appreciate the desire to simplify > the code by pulling it into separate classes and functions. On the > other hand, I wonder if it will confuse users to have one separate > function for UInt8 images and another for everything else. Another > worry I have about the separate DirectImage class is that it copies > much of the image resize functionality from AxesImage, including the > broken handling of aspect='preserve'. This too argues for keeping as > much functionality in AxesImage as possible, testing for A.typecode() > where appropriate. What do you think? I'm happy for it be be called from imshow but I can't think of a syntax to do so which isn't going to be confusing. The existing variations to imshow are all based upon the type of the input image and this won't work here - the case of someone who wants to plot a MxNx4 UInt8 array and have the values it contains normalised being the problem. Another keyword parameter could be added which only takes effect in this case; I thought this likely to confuse but if you prefer it I'm happy. Nick
On Wed, 2005年06月01日 at 12:39 +0100, Nicholas Young wrote: > On Tue, 2005年05月31日 at 10:28 -0500, John Hunter wrote: > > >>>>> "Nicholas" == Nicholas Young <su...@su...> writes: > > I have mixed feelings about making this a separate class / separate > > function. The current axes imshow / image.AxesImage is already > > overloaded (handling MxN, MxNx3, MxNx3, _image.Image and PIL images. > > What is the logic of making a special functions/classes case for MxNx4 > > with directshow. On the one hand, I appreciate the desire to simplify > > the code by pulling it into separate classes and functions. On the > > other hand, I wonder if it will confuse users to have one separate > > function for UInt8 images and another for everything else. Another > > worry I have about the separate DirectImage class is that it copies > > much of the image resize functionality from AxesImage, including the > > broken handling of aspect='preserve'. This too argues for keeping as > > much functionality in AxesImage as possible, testing for A.typecode() > > where appropriate. What do you think? > > I'm happy for it be be called from imshow but I can't think of a syntax > to do so which isn't going to be confusing. The existing variations to > imshow are all based upon the type of the input image and this won't > work here - the case of someone who wants to plot a MxNx4 UInt8 array > and have the values it contains normalised being the problem. Another > keyword parameter could be added which only takes effect in this case; I > thought this likely to confuse but if you prefer it I'm happy. Sorry to reply to my own message but I realised just after sending it that it didn't make any sense (as MxNx4 images aren't normalised anyway). I now realise that it makes sense to assume that an MxNx4 UInt8 image will contain pixel data as John suggested and I'll rewrite my code to implement this. It also occurred to me that it would make sense to change the PIL code to avoid the UInt8 -> Float -> UInt transition. I'll try implementing this after making the alterations John suggested. Nick
On Wed, 2005年06月01日 at 13:14 +0100, Nicholas Young wrote: > On Wed, 2005年06月01日 at 12:39 +0100, Nicholas Young wrote: > > On Tue, 2005年05月31日 at 10:28 -0500, John Hunter wrote: > > > >>>>> "Nicholas" == Nicholas Young <su...@su...> writes: > > > I have mixed feelings about making this a separate class / separate > > > function. The current axes imshow / image.AxesImage is already > > > overloaded (handling MxN, MxNx3, MxNx3, _image.Image and PIL images. > > > What is the logic of making a special functions/classes case for MxNx4 > > > with directshow. On the one hand, I appreciate the desire to simplify > > > the code by pulling it into separate classes and functions. On the > > > other hand, I wonder if it will confuse users to have one separate > > > function for UInt8 images and another for everything else. Another > > > worry I have about the separate DirectImage class is that it copies > > > much of the image resize functionality from AxesImage, including the > > > broken handling of aspect='preserve'. This too argues for keeping as > > > much functionality in AxesImage as possible, testing for A.typecode() > > > where appropriate. What do you think? > > Sorry to reply to my own message but I realised just after sending it > that it didn't make any sense (as MxNx4 images aren't normalised > anyway). I now realise that it makes sense to assume that an MxNx4 > UInt8 image will contain pixel data as John suggested and I'll rewrite > my code to implement this. > > It also occurred to me that it would make sense to change the PIL code > to avoid the UInt8 -> Float -> UInt transition. I'll try implementing > this after making the alterations John suggested. Now done in the attached patch, I also added support for MxNx3 images; as I suspected these are noticeably slower than MxNx4 images but I changed my mind and decided it was worth supporting them. I changed the PIL image output to output from PIL RGBX (alpha is 255) or RGBA images only when converting to a string and then array. This will use more memory but, as the conversion to RGBA is done no more than once and by PIL code (which is probably optimised), should be the fastest option. I changed the conversion code to always attempt to convert to RGBA whenever an unsupported format is given and to raise an error on catching a ValueError (rather than before trying the conversion); this adds support for several image types. I'd also suggest changing the error raised here to a ValueError rather than a SystemError (as the error is due to an unsupported value) but haven't in case others are catching the SystemError. Additionally I changed the docstring for the imshow function to state that floating point MxNx3 and MxNx4 arrays aren't normalised (this was the original source of my earlier confusion). Nick
>>>>> "Nicholas" == Nicholas Young <su...@su...> writes: Nicholas> Now done in the attached patch, I also added support for Nicholas> MxNx3 images; as I suspected these are noticeably slower Nicholas> than MxNx4 images but I changed my mind and decided it Nicholas> was worth supporting them. Great, I just committed this to CVS. Andrew, if you get a minute to update from CVS, can you make sure that the PIL changes don't cause a problem for your PIL using mpl scripts? Thanks! JDH
John Hunter wrote: >>>>>>"Nicholas" == Nicholas Young <su...@su...> writes: > > > Nicholas> Now done in the attached patch, I also added support for > Nicholas> MxNx3 images; as I suspected these are noticeably slower > Nicholas> than MxNx4 images but I changed my mind and decided it > Nicholas> was worth supporting them. > > Great, I just committed this to CVS. > > Andrew, if you get a minute to update from CVS, can you make sure that > the PIL changes don't cause a problem for your PIL using mpl scripts? I didn't have much of a check, but image_demo3.py still works. This may be the apparently-longstanding imshow resizing bug (that I haven't followed), but this demo shows some storage behavior when resizing -- horizontal stretching of the window allows the aspect ratio of the image to change to horizontally stretched, but vertical stretching of the window appears to scale the image to maintain its original aspect ratio. Cheers! Andrew
On Wed, 2005年06月01日 at 15:49 -0700, Andrew Straw wrote: > John Hunter wrote: > >>>>>>"Nicholas" == Nicholas Young <su...@su...> writes: > > > > > > Nicholas> Now done in the attached patch, I also added support for > > Nicholas> MxNx3 images; as I suspected these are noticeably slower > > Nicholas> than MxNx4 images but I changed my mind and decided it > > Nicholas> was worth supporting them. > > > > Great, I just committed this to CVS. > > > > Andrew, if you get a minute to update from CVS, can you make sure that > > the PIL changes don't cause a problem for your PIL using mpl scripts? > > I didn't have much of a check, but image_demo3.py still works. > > This may be the apparently-longstanding imshow resizing bug (that I > haven't followed), but this demo shows some storage behavior when > resizing -- horizontal stretching of the window allows the aspect ratio > of the image to change to horizontally stretched, but vertical > stretching of the window appears to scale the image to maintain its > original aspect ratio. I don't think this is connected to the changes I've made; as I've changed none of the resizing/streaching/fixed aspect code. I've not followed any prior discussion (and so don't know why the problem hasn't been fixed) but if I'm reading it correctly it's fairly clear from the general imshow code that problem is going to manifest itself. Nick