Hi everyone,

Quite recently I started learning Python, IPython, SciPy and Matplotlib, to see if I could replace some data analysis software that was previously written in LabVIEW. Slowly I'm getting the hang of it, but I've run into a few problems, and I'm now looking around for some advice...

The main issue is Matplotlib's performance. I'm trying to plot a current trace from a physics experiment, containing about 300,000 data points. In LabVIEW, one can easily browse through a data set like this, but I haven't yet been able to get comparable performance with IPython+Matplotlib. Scrolling/panning through the data in particular is sluggish. (Does anyone know how to add a scrollbar for this instead of panning with the mouse, by the way?)

I know this is an extremely vague description of the problem, but perhaps someone can give some general advice on improving performance? I read something about data clipping, but this doesn't work in my version of Matplotlib --- the Line2D object doesn't seem to have a "data_clipped" property anymore. Also, I read that someone had written an EEG display program (mentioned both in the user's guide and on the examples page), which certainly looked very good. So it must be possible... :)

In any case, some more details about my system: Windows XP SP2, 2.8 GHz dual-core processor with 1 GB RAM; running Python 2.5.1, IPython 0.8.2, Matplotlib 0.90.1, NumPy 1.0.5, SciPy 0.6.0.0006.

Very much looking forward to a small discussion on the subject... Thanks a lot in advance!

Best regards,
-- Onno Broekmans
On Feb 29, 2008, mat...@on... apparently wrote:
> I'm trying to plot a current trace from a physics
> experiment, containing about 300,000 data points.

You may find this off topic, since you seem to mean by "plot a current trace" something different from what I'm familiar with. Suppose I have a 1280×1024 monitor and I'm willing to have really tiny 9-pixel points represent each observation. Then I can display 1280×1024/9 ≈ 145,000 distinct points, about half as many as you want, and cover the entire monitor.

So perhaps your question is really about the appropriate way to radically down-sample your data?

Cheers,
Alan Isaac
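Alan's arithmetic can be checked directly. The display resolution and marker size below are his assumed numbers for the sake of the estimate, not anything matplotlib enforces:

```python
# Rough screen-capacity estimate: how many distinct tiny markers fit
# on one monitor?  (1280x1024 display and 3x3-pixel markers are the
# assumptions from the post above.)
width_px, height_px = 1280, 1024
pixels_per_marker = 9  # a "tiny" 3x3 marker

max_distinct_points = (width_px * height_px) // pixels_per_marker
print(max_distinct_points)  # 145635 -- about half of the 300,000 points
```

So even in the best case, more than half of a 300,000-point trace cannot be visually distinguished on screen, which is the motivation for down-sampling.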
On Fri, Feb 29, 2008 at 8:27 AM, <mat...@on...> wrote:
> Main issue is Matplotlib's performance. I'm trying to plot a current
> trace from a physics experiment, containing about 300,000 data points.
> In LabVIEW, one can easily browse through a data set like this, but I
> haven't been able yet to get such a good performance with
> IPython+Matplotlib. Especially scrolling/panning through the data is
> sluggish. (Anyone knows how to add a scrollbar for this instead of
> panning with the mouse, btw?)

http://matplotlib.sf.net/examples/embedding_in_gtk3.py shows an example using a scrolled window.

You could also use the "clipped line" approach: pass in a custom class that only plots the data in the current view limits defined by timemin, timemax. See http://matplotlib.sf.net/examples/clippedline.py. This example changes the marker and line style depending on how many points are in the view port, but you could expand on this idea to do downsampling when the number of points is too large.

BTW, I wrote the EEG viewer you referred to (it currently lives as pbrain in the nipy project), and it was a bit pokey, but with some judicious use of custom classes like the one in clippedline.py you can probably get acceptable performance. In earlier versions of mpl we tried to build some of these smarts in (e.g. data_clipped), but they were rarely used and of questionable performance benefit, so we pulled them. We may want to revisit the issue...

Hope this helps,
JDH
Hi Alan,

>> I'm trying to plot a current trace from a physics
>> experiment, containing about 300,000 data points.

AGI> You may find this off topic, since you seem to mean by "plot
AGI> a current trace" something different than I'm familiar with.
AGI> [snip]
AGI> So perhaps your question is really about the appropriate way
AGI> to radically down-sample your data?

Thanks for your reply! I agree that under normal circumstances, downsampling would be a good thing to do. However, in this case, it's really about the tiny details in the trace, so I'd like to zoom in on a small part of the data and then scroll through it. This is already possible with Matplotlib, but it is very slow...

Best regards,
-- Onno Broekmans
On Fri, Feb 29, 2008 at 10:21 AM, Onno Broekmans <mat...@on...> wrote:
> Thanks for your reply! I agree that under normal circumstances,
> downsampling would be a good thing to do. However, in this case, it's
> really about the tiny details in the trace, so I'd like to zoom in on
> a small part of the data, and then scroll through it. This is already
> possible with Matplotlib, but is very slow...

Note that with the clipped line approach I suggested, you can have the best of both worlds: downsample when N > 20000 (or some other appropriate number), and plot the full data when you zoom in.

JDH
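JDH's suggestion -- downsample only while the visible point count is large, and show full resolution once zoomed in -- can be sketched outside matplotlib with plain NumPy. The 20,000-point threshold and the naive strided decimation below are illustrative choices, `visible_data` is a hypothetical helper name, and x is assumed sorted:

```python
import numpy as np

def visible_data(x, y, xmin, xmax, max_points=20000):
    """Return the (possibly decimated) slice of (x, y) inside [xmin, xmax].

    x must be sorted.  If more than max_points fall inside the view,
    keep only every step-th point; once zoomed in far enough, the full
    resolution is returned.  (Naive striding -- no anti-alias filter.)
    """
    i0, i1 = np.searchsorted(x, [xmin, xmax])
    n = i1 - i0
    if n > max_points:
        step = n // max_points + 1
        return x[i0:i1:step], y[i0:i1:step]
    return x[i0:i1], y[i0:i1]

x = np.linspace(0.0, 100.0, 300000)
y = np.sin(2 * np.pi * x)
xv, yv = visible_data(x, y, 10.0, 20.0)    # zoomed out: decimated
xz, yz = visible_data(x, y, 10.0, 10.01)   # zoomed in: full resolution
```

A custom Line2D subclass (like the ones later in this thread) would call such a function from its draw() method, so re-slicing happens automatically on every pan or zoom.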
JH> Note that with the clipped line approach I suggested, you can have the
JH> best of both worlds. Downsample when N > 20000 or some appropriate
JH> number, and plot the full data when you zoom.

Hm... good point. I'll try to implement that. Thanks!

Best regards,
-- Onno Broekmans
> > Main issue is Matplotlib's performance. I'm trying to plot a current
> > trace from a physics experiment, containing about 300,000 data points.
> > In LabVIEW, one can easily browse through a data set like this, but I
> > haven't been able yet to get such a good performance with
> > IPython+Matplotlib. Especially scrolling/panning through the data is
> > sluggish. (Anyone knows how to add a scrollbar for this instead of
> > panning with the mouse, btw?)
>
> http://matplotlib.sf.net/examples/embedding_in_gtk3.py shows an
> example using a scrolled window.
>
> You could also use the "clipped line" approach to pass in a custom
> class that only plots the data in the current view limits defined by
> timemin, timemax. See
> http://matplotlib.sf.net/examples/clippedline.py. This example
> changes the marker and line style depending on how many points are in
> the view port, but you could expand on this idea to do downsampling
> when the number of points is too large.

Hi Onno and JDH,

JDH, I have just started using matplotlib and love it. Thanks so much for your work.

I have come across the same performance issues. My vote is for bringing clipped line back and even making it the default. A check may be needed in the constructor to make sure the data is sorted, but I think it is worth it. If the program is used for its primary original intent (plotting), the vast majority of data sets are going to be increasing in x.

I am including a class based on ClippedLine that does decimation. Please reply if you have improvements, and please consider putting something like it in the code. It probably should not be the default, though, because it may not be what the user expects. For example, if Onno is looking for very short-duration spikes, they will not get plotted. That is the nature of the decimation beast. Also, the filter requires the x data to be equally spaced.
With decimation you not only get a performance increase, but you also get rid of the smearing that occurs if the data is not monotonic, so you can actually see something. Here are the performance results on my computer:

    it took 0.511511087418 seconds for matplotlib.lines.Line2D to draw()
    it took 0.4196870327 seconds for __main__.ClippedLine to draw()
    downsampling plotted line...
    it took 0.11829996109 seconds for __main__.DecimatedClippedLine to draw()

    from matplotlib.lines import Line2D
    import numpy as npy
    from pylab import figure, show, draw
    import scipy.signal
    import time

    # adjusted from /usr/share/doc/matplotlib-0.91.2/examples/clippedline.py
    class ClippedLine(Line2D):
        """
        Clip the xlimits to the axes view limits -- this example assumes x is sorted.
        """
        def __init__(self, ax, *args, **kwargs):
            Line2D.__init__(self, *args, **kwargs)
            ## axes the line is plotted in
            self.ax = ax

        def set_data(self, *args, **kwargs):
            Line2D.set_data(self, *args, **kwargs)
            ## what is plotted pre-clipping
            self.xorig = npy.array(self._x)
            self.yorig = npy.array(self._y)

        def draw(self, renderer):
            xlim = self.ax.get_xlim()
            ind0, ind1 = npy.searchsorted(self.xorig, xlim)
            self._x = self.xorig[ind0:ind1]
            self._y = self.yorig[ind0:ind1]
            Line2D.draw(self, renderer)


    class DecimatedClippedLine(Line2D):
        """
        Decimate and clip the data so it does not take as long to plot.
        Assumes data is sorted and equally spaced.
        """
        def __init__(self, ax, *args, **kwargs):
            """
            *Parameters*:
                ax: axes the line is plotted on
                *args, **kwargs: Line2D args
            """
            Line2D.__init__(self, *args, **kwargs)
            ## axes the line is plotted in
            self.ax = ax

        def set_data(self, *args, **kwargs):
            Line2D.set_data(self, *args, **kwargs)
            ## data pre-clipping and decimation
            self.xorig = npy.array(self._x)
            self.yorig = npy.array(self._y)

        def draw(self, renderer):
            bb = self.ax.get_window_extent()
            width = bb.width()
            xlim = self.ax.get_xlim()
            ind0, ind1 = npy.searchsorted(self.xorig, xlim)

            if self.ax.get_autoscale_on():
                ylim = self.ax.get_ylim()
                self.ax.set_ylim(min([ylim[0], self._y.min()]),
                                 max([ylim[1], self._y.max()]))

            self._x = self.xorig[ind0:ind1]
            self._y = self.yorig[ind0:ind1]

            if width / float(ind1 - ind0) < 0.4:
                # number of points to plot is much greater than the pixels
                # in the plot: low-pass filter, then take every step-th point
                b, a = scipy.signal.butter(5, width / float(ind1 - ind0))
                print 'downsampling plotted line...'
                filty = scipy.signal.lfilter(b, a, self._y)
                step = int((ind1 - ind0) / width)
                self._x = self._x[::step]
                self._y = filty[::step]

            Line2D.draw(self, renderer)


    t = npy.arange(0.0, 100.0, 0.0001)
    s = npy.sin(2*npy.pi*t)
    s += (npy.random.rand(len(t)) - 0.5)*3.0

    for i in xrange(3):
        starttime = time.time()
        fig = figure(i)
        ax = fig.add_subplot(111, autoscale_on=False)
        if i == 0:
            line = Line2D(t, s, color='g', ls='-', lw=2)
        elif i == 1:
            line = ClippedLine(ax, t, s, color='g', ls='-', lw=2)
        elif i == 2:
            line = DecimatedClippedLine(ax, t, s, color='g', ls='-', lw=2)
        ax.add_line(line)
        ax.set_xlim(10, 20)
        ax.set_ylim(-3.3, 3.3)
        ax.set_title(str(line.__class__).replace('_', '\_'))
        draw()
        endtime = time.time()
        print 'it took', endtime - starttime, 'seconds for', str(line.__class__), 'to draw()'

    show()
thewtex wrote:
>>> Main issue is Matplotlib's performance. I'm trying to plot a current
>>> trace from a physics experiment, containing about 300,000 data points.
>>> In LabVIEW, one can easily browse through a data set like this, but I
>>> haven't been able yet to get such a good performance with
>>> IPython+Matplotlib. Especially scrolling/panning through the data is
>>> sluggish. (Anyone knows how to add a scrollbar for this instead of
>>> panning with the mouse, btw?)
>>
>> http://matplotlib.sf.net/examples/embedding_in_gtk3.py shows an
>> example using a scrolled window.
>>
>> You could also use the "clipped line" approach to pass in a custom
>> class that only plots the data in the current view limits defined by
>> timemin, timemax. See
>> http://matplotlib.sf.net/examples/clippedline.py. This example
>> changes the marker and line style depending on how many points are in
>> the view port, but you could expand on this idea to do downsampling
>> when the number of points is too large.
>
> Hi Onno and JDH,
>
> JDH, I have just started using matplotlib and love it. Thanks so much
> for your work.
>
> I have come across the same performance issues. My vote is for bringing
> clipped line back and even making it the default. A check may be needed
> in the constructor to make sure it is sorted, but I think it is worth
> it. If the program is used for its primary original intent (plotting),
> the vast majority are going to be increasing in x.
>
> I am including a class based on ClippedLine that does decimation.
> Please reply if you have improvements and please consider putting
> something like it in the code. This probably should not be used as
> default, though, because it may not be what the user expects. For
> example, if Onno is looking for very short duration spikes, they will
> not get plotted. That is the nature of the decimation beast. And, the
> filter requires the x data to be equally spaced.
>
> With decimation you not only get performance increases, but you also
> get rid of the smearing that occurs if the data is not monotonic so you
> can actually see something.

I agree that exploration of large data sets is an important application, and that we need to speed it up. A couple of days ago I added automatic subsetting (but not decimation, although this could be added easily) to image drawing, and that made a big difference for panning and zooming using imshow or pcolorfast with regular grids.

An easy, built-in interface makes sense for line/marker plotting as well, but it will take some thought to figure out exactly what that interface should be. The line plotting case (including things like scatter) is more complicated than the image case. Probably optimizations should be specified via kwargs, not by default.

Clipping should not be to points inside the xlim, but should include one more point on each side so that lines go to the edge of the box.

Eric
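The subsetting Eric describes for images amounts to cropping the array to the current view limits before it reaches the renderer. Done by hand on a regular grid, it might look like the sketch below; this is an illustration of the idea, not matplotlib's actual internals, and `subset_image` is a hypothetical helper name:

```python
import numpy as np

def subset_image(z, x, y, xlim, ylim):
    """Crop a regularly gridded image z (shape (len(y), len(x))) to the
    current view limits, so drawing only touches the visible pixels."""
    j0, j1 = np.searchsorted(x, xlim)
    i0, i1 = np.searchsorted(y, ylim)
    return z[i0:i1, j0:j1], x[j0:j1], y[i0:i1]

x = np.linspace(0.0, 1.0, 2000)
y = np.linspace(0.0, 1.0, 2000)
z = np.add.outer(y, x)                            # a 2000x2000 test image
zs, xs, ys = subset_image(z, x, y, (0.25, 0.5), (0.25, 0.5))
# zs is now roughly 500x500 -- far cheaper for imshow/pcolorfast to draw
```

Panning or zooming would recompute the crop from the new view limits, which is why this makes interactive browsing of large images responsive.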
> I agree that exploration of large data sets is an important application,
> and that we need to speed it up. A couple days ago I added automatic
> subsetting (but not decimation--although this could be added easily) to
> image drawing, and that made a big difference for panning and zooming
> using imshow or pcolorfast with regular grids.

Cool. Low-pass filtering is more work to implement and takes away from some of the computational gains, but it's necessary to prevent aliasing, per the Nyquist-Shannon sampling theorem.

> An easy, built-in interface makes sense for line/marker plotting as
> well, but it will take some thought to figure out exactly what that
> interface should be. The line plotting case (including things like
> scatter) is more complicated than the image. Probably optimizations
> should be specified via kwargs, not by default.

True.

> Clipping should not be to points inside the xlim, but should include one
> more point on each side so that lines go to the edge of the box.

Good point. As I understand npy.searchsorted(), it should then be

    ind0 = npy.searchsorted(self.xorig, xlim[0], side='left')
    ind1 = npy.searchsorted(self.xorig, xlim[1], side='right')

instead of

    ind0, ind1 = npy.searchsorted(self.xorig, xlim)
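One caveat on the snippet above: searchsorted's side='left'/'right' only changes the result when a view limit coincides exactly with a data value, so by itself it does not include a point beyond each edge. To guarantee the extra point Eric asks for, the indices need to be widened by one. A sketch (`clip_indices` is a hypothetical helper name):

```python
import numpy as np

def clip_indices(x, xmin, xmax):
    """Indices into sorted x covering [xmin, xmax] plus one point on
    each side, so clipped line segments extend to the axes box edges."""
    i0 = max(np.searchsorted(x, xmin, side='right') - 1, 0)
    i1 = min(np.searchsorted(x, xmax, side='left') + 1, len(x))
    return i0, i1

x = np.arange(0.0, 10.0, 1.0)      # 0.0, 1.0, ..., 9.0
i0, i1 = clip_indices(x, 2.5, 6.5)
print(x[i0:i1])                    # [2. 3. 4. 5. 6. 7.] -- one point past each limit
```

The max/min guards keep the indices valid when the view extends beyond the data.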