Hi everyone,

Quite recently I started learning Python, IPython, SciPy and Matplotlib, to see if I could replace some data analysis software that was previously written in LabVIEW. Slowly I'm getting the hang of it, but I've run into a few problems, and I'm now looking around for some advice...

The main issue is Matplotlib's performance. I'm trying to plot a current trace from a physics experiment, containing about 300,000 data points. In LabVIEW, one can easily browse through a data set like this, but I haven't yet been able to get comparable performance with IPython+Matplotlib. Scrolling/panning through the data in particular is sluggish. (Does anyone know how to add a scrollbar for this instead of panning with the mouse, by the way?)

I know this is an extremely vague description of the problem, but perhaps someone can give some general advice on improving performance? I read something about data clipping, but this doesn't work in my version of Matplotlib --- the Line2D object doesn't seem to have a "data_clipped" property anymore. Also, I read that someone had written an EEG display program (mentioned both in the user's guide and on the examples page), which certainly looked very good. So it must be possible... :)

In any case, some more details about my system: Windows XP SP2, 2.8 GHz dual-core processor with 1 GB RAM; running Python 2.5.1, IPython 0.8.2, Matplotlib 0.90.1, NumPy 1.0.5, SciPy 0.6.0.0006.

Very much looking forward to a small discussion on the subject... Thanks a lot in advance!

Best regards,
-- Onno Broekmans
On Feb 29, 2008, mat...@on... apparently wrote:
> I'm trying to plot a current trace from a physics
> experiment, containing about 300,000 data points.

You may find this off topic, since you seem to mean by "plot a current trace" something different from what I'm familiar with. Suppose I have a 1280×1024 monitor and I'm willing to have really tiny 9-pixel points represent each observation. Then I can display 1280×1024/9 ≈ 145,000 distinct points, about half as many as you want, and cover the entire monitor.

So perhaps your question is really about the appropriate way to radically down-sample your data?

Cheers,
Alan Isaac
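Alan's arithmetic can be checked directly. The display resolution and marker size below are his assumed numbers for the sake of the estimate, not anything matplotlib enforces:

```python
# Rough screen-capacity estimate: how many distinct tiny markers fit
# on one monitor?  (1280x1024 display and 3x3-pixel markers are the
# assumptions from the post above.)
width_px, height_px = 1280, 1024
pixels_per_marker = 9  # a "tiny" 3x3 marker

max_distinct_points = (width_px * height_px) // pixels_per_marker
print(max_distinct_points)  # 145635 -- about half of the 300,000 points
```

So even in the best case, more than half of a 300,000-point trace cannot be visually distinguished on screen, which is the motivation for down-sampling.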
On Fri, Feb 29, 2008 at 8:27 AM, <mat...@on...> wrote:
> Main issue is Matplotlib's performance. I'm trying to plot a current
> trace from a physics experiment, containing about 300,000 data points.
> In LabVIEW, one can easily browse through a data set like this, but I
> haven't been able yet to get such a good performance with
> IPython+Matplotlib. Especially scrolling/panning through the data is
> sluggish. (Anyone knows how to add a scrollbar for this instead of
> panning with the mouse, btw?)

http://matplotlib.sf.net/examples/embedding_in_gtk3.py shows an example using a scrolled window.

You could also use the "clipped line" approach: pass in a custom class that only plots the data in the current view limits defined by timemin, timemax. See http://matplotlib.sf.net/examples/clippedline.py. This example changes the marker and line style depending on how many points are in the view port, but you could expand on this idea to do downsampling when the number of points is too large.

BTW, I wrote the EEG viewer you referred to (it currently lives as pbrain in the nipy project), and it was a bit pokey, but with some judicious use of custom classes like the one in clippedline.py you can probably get acceptable performance. In earlier versions of mpl we tried to build some of these smarts in (e.g. data_clipped), but they were rarely used and of questionable performance benefit, so we pulled them. We may want to revisit the issue...

Hope this helps,
JDH
Hi Alan,

>> I'm trying to plot a current trace from a physics
>> experiment, containing about 300,000 data points.

AGI> You may find this off topic, since you seem to mean by "plot
AGI> a current trace" something different than I'm familiar with.
AGI> [snip]
AGI> So perhaps your question is really about the appropriate way
AGI> to radically down-sample your data?

Thanks for your reply! I agree that under normal circumstances, downsampling would be a good thing to do. However, in this case, it's really about the tiny details in the trace, so I'd like to zoom in on a small part of the data and then scroll through it. This is already possible with Matplotlib, but it is very slow...

Best regards,
-- Onno Broekmans
On Fri, Feb 29, 2008 at 10:21 AM, Onno Broekmans <mat...@on...> wrote:
> Thanks for your reply! I agree that under normal circumstances,
> downsampling would be a good thing to do. However, in this case, it's
> really about the tiny details in the trace, so I'd like to zoom in on
> a small part of the data, and then scroll through it. This is already
> possible with Matplotlib, but is very slow...

Note that with the clipped line approach I suggested, you can have the best of both worlds: downsample when N > 20000 (or some other appropriate number), and plot the full data when you zoom in.

JDH
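JDH's suggestion -- downsample only while the visible point count is large, and show full resolution once zoomed in -- can be sketched outside matplotlib with plain NumPy. The 20,000-point threshold and the naive strided decimation below are illustrative choices, `visible_data` is a hypothetical helper name, and x is assumed sorted:

```python
import numpy as np

def visible_data(x, y, xmin, xmax, max_points=20000):
    """Return the (possibly decimated) slice of (x, y) inside [xmin, xmax].

    x must be sorted.  If more than max_points fall inside the view,
    keep only every step-th point; once zoomed in far enough, the full
    resolution is returned.  (Naive striding -- no anti-alias filter.)
    """
    i0, i1 = np.searchsorted(x, [xmin, xmax])
    n = i1 - i0
    if n > max_points:
        step = n // max_points + 1
        return x[i0:i1:step], y[i0:i1:step]
    return x[i0:i1], y[i0:i1]

x = np.linspace(0.0, 100.0, 300000)
y = np.sin(2 * np.pi * x)
xv, yv = visible_data(x, y, 10.0, 20.0)    # zoomed out: decimated
xz, yz = visible_data(x, y, 10.0, 10.01)   # zoomed in: full resolution
```

A custom Line2D subclass (like the ones later in this thread) would call such a function from its draw() method, so re-slicing happens automatically on every pan or zoom.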
JH> Note that with the clipped line approach I suggested, you can have the
JH> best of both worlds. Downsample when N > 20000 or some appropriate
JH> number, and plot the full data when you zoom.

Hm... good point. I'll try to implement that. Thanks!

Best regards,
-- Onno Broekmans
> > Main issue is Matplotlib's performance. I'm trying to plot a current
> > trace from a physics experiment, containing about 300,000 data points.
> > In LabVIEW, one can easily browse through a data set like this, but I
> > haven't been able yet to get such a good performance with
> > IPython+Matplotlib. Especially scrolling/panning through the data is
> > sluggish. (Anyone knows how to add a scrollbar for this instead of
> > panning with the mouse, btw?)
>
> http://matplotlib.sf.net/examples/embedding_in_gtk3.py shows an
> example using a scrolled window.
>
> You could also use the "clipped line" approach to pass in a custom
> class that only plots the data in the current view limits defined by
> timemin, timemax. See
> http://matplotlib.sf.net/examples/clippedline.py. This example
> changes the marker and line style depending on how many points are in
> the view port, but you could expand on this idea to do downsampling
> when the number of points is too large.

Hi Onno and JDH,

JDH, I have just started using matplotlib and love it. Thanks so much for your work.

I have come across the same performance issues. My vote is for bringing clipped line back and even making it the default. A check may be needed in the constructor to make sure the data is sorted, but I think it is worth it. If the program is used for its primary original intent (plotting), the vast majority of data sets are going to be increasing in x.

I am including a class based on ClippedLine that does decimation. Please reply if you have improvements, and please consider putting something like it in the code. It probably should not be the default, though, because it may not be what the user expects. For example, if Onno is looking for very short-duration spikes, they will not get plotted. That is the nature of the decimation beast. Also, the filter requires the x data to be equally spaced.
With decimation you not only get a performance increase, but you also get rid of the smearing that occurs if the data is not monotonic, so you can actually see something. Here are the performance results on my computer:

    it took 0.511511087418 seconds for matplotlib.lines.Line2D to draw()
    it took 0.4196870327 seconds for __main__.ClippedLine to draw()
    downsampling plotted line...
    it took 0.11829996109 seconds for __main__.DecimatedClippedLine to draw()

    from matplotlib.lines import Line2D
    import numpy as npy
    from pylab import figure, show, draw
    import scipy.signal
    import time

    # adjusted from /usr/share/doc/matplotlib-0.91.2/examples/clippedline.py
    class ClippedLine(Line2D):
        """
        Clip the xlimits to the axes view limits -- this example assumes x is sorted.
        """
        def __init__(self, ax, *args, **kwargs):
            Line2D.__init__(self, *args, **kwargs)
            ## axes the line is plotted in
            self.ax = ax

        def set_data(self, *args, **kwargs):
            Line2D.set_data(self, *args, **kwargs)
            ## what is plotted pre-clipping
            self.xorig = npy.array(self._x)
            self.yorig = npy.array(self._y)

        def draw(self, renderer):
            xlim = self.ax.get_xlim()
            ind0, ind1 = npy.searchsorted(self.xorig, xlim)
            self._x = self.xorig[ind0:ind1]
            self._y = self.yorig[ind0:ind1]
            Line2D.draw(self, renderer)


    class DecimatedClippedLine(Line2D):
        """
        Decimate and clip the data so it does not take as long to plot.
        Assumes data is sorted and equally spaced.
        """
        def __init__(self, ax, *args, **kwargs):
            """
            *Parameters*:
                ax: axes the line is plotted on
                *args, **kwargs: Line2D args
            """
            Line2D.__init__(self, *args, **kwargs)
            ## axes the line is plotted in
            self.ax = ax

        def set_data(self, *args, **kwargs):
            Line2D.set_data(self, *args, **kwargs)
            ## data pre-clipping and decimation
            self.xorig = npy.array(self._x)
            self.yorig = npy.array(self._y)

        def draw(self, renderer):
            bb = self.ax.get_window_extent()
            width = bb.width()
            xlim = self.ax.get_xlim()
            ind0, ind1 = npy.searchsorted(self.xorig, xlim)

            if self.ax.get_autoscale_on():
                ylim = self.ax.get_ylim()
                self.ax.set_ylim(min([ylim[0], self._y.min()]),
                                 max([ylim[1], self._y.max()]))

            self._x = self.xorig[ind0:ind1]
            self._y = self.yorig[ind0:ind1]

            if width / float(ind1 - ind0) < 0.4:
                # number of points to plot is much greater than the pixels
                # in the plot: low-pass filter, then take every step-th point
                b, a = scipy.signal.butter(5, width / float(ind1 - ind0))
                print 'downsampling plotted line...'
                filty = scipy.signal.lfilter(b, a, self._y)
                step = int((ind1 - ind0) / width)
                self._x = self._x[::step]
                self._y = filty[::step]

            Line2D.draw(self, renderer)


    t = npy.arange(0.0, 100.0, 0.0001)
    s = npy.sin(2*npy.pi*t)
    s += (npy.random.rand(len(t)) - 0.5)*3.0

    for i in xrange(3):
        starttime = time.time()
        fig = figure(i)
        ax = fig.add_subplot(111, autoscale_on=False)
        if i == 0:
            line = Line2D(t, s, color='g', ls='-', lw=2)
        elif i == 1:
            line = ClippedLine(ax, t, s, color='g', ls='-', lw=2)
        elif i == 2:
            line = DecimatedClippedLine(ax, t, s, color='g', ls='-', lw=2)
        ax.add_line(line)
        ax.set_xlim(10, 20)
        ax.set_ylim(-3.3, 3.3)
        ax.set_title(str(line.__class__).replace('_', '\_'))
        draw()
        endtime = time.time()
        print 'it took', endtime - starttime, 'seconds for', str(line.__class__), 'to draw()'

    show()
thewtex wrote:
>>> Main issue is Matplotlib's performance. I'm trying to plot a current
>>> trace from a physics experiment, containing about 300,000 data points.
>>> In LabVIEW, one can easily browse through a data set like this, but I
>>> haven't been able yet to get such a good performance with
>>> IPython+Matplotlib. Especially scrolling/panning through the data is
>>> sluggish. (Anyone knows how to add a scrollbar for this instead of
>>> panning with the mouse, btw?)
>>
>> http://matplotlib.sf.net/examples/embedding_in_gtk3.py shows an
>> example using a scrolled window.
>>
>> You could also use the "clipped line" approach to pass in a custom
>> class that only plots the data in the current view limits defined by
>> timemin, timemax. See
>> http://matplotlib.sf.net/examples/clippedline.py. This example
>> changes the marker and line style depending on how many points are in
>> the view port, but you could expand on this idea to do downsampling
>> when the number of points is too large.
>
> Hi Onno and JDH,
>
> JDH, I have just started using matplotlib and love it. Thanks so much
> for your work.
>
> I have come across the same performance issues. My vote is for bringing
> clipped line back and even making it the default. A check may be needed
> in the constructor to make sure it is sorted, but I think it is worth
> it. If the program is used for its primary original intent (plotting),
> the vast majority are going to be increasing in x.
>
> I am including a class based on ClippedLine that does decimation.
> Please reply if you have improvements and please consider putting
> something like it in the code. This probably should not be used as
> default, though, because it may not be what the user expects. For
> example, if Onno is looking for very short duration spikes, they will
> not get plotted. That is the nature of the decimation beast. And, the
> filter requires the x data to be equally spaced.
>
> With decimation you not only get performance increases, but you also
> get rid of the smearing that occurs if the data is not monotonic so you
> can actually see something.

I agree that exploration of large data sets is an important application, and that we need to speed it up. A couple of days ago I added automatic subsetting (but not decimation, although this could be added easily) to image drawing, and that made a big difference for panning and zooming using imshow or pcolorfast with regular grids.

An easy, built-in interface makes sense for line/marker plotting as well, but it will take some thought to figure out exactly what that interface should be. The line plotting case (including things like scatter) is more complicated than the image case. Probably optimizations should be specified via kwargs, not by default.

Clipping should not be to points inside the xlim, but should include one more point on each side so that lines go to the edge of the box.

Eric
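The subsetting Eric describes for images amounts to cropping the array to the current view limits before it reaches the renderer. Done by hand on a regular grid, it might look like the sketch below; this is an illustration of the idea, not matplotlib's actual internals, and `subset_image` is a hypothetical helper name:

```python
import numpy as np

def subset_image(z, x, y, xlim, ylim):
    """Crop a regularly gridded image z (shape (len(y), len(x))) to the
    current view limits, so drawing only touches the visible pixels."""
    j0, j1 = np.searchsorted(x, xlim)
    i0, i1 = np.searchsorted(y, ylim)
    return z[i0:i1, j0:j1], x[j0:j1], y[i0:i1]

x = np.linspace(0.0, 1.0, 2000)
y = np.linspace(0.0, 1.0, 2000)
z = np.add.outer(y, x)                            # a 2000x2000 test image
zs, xs, ys = subset_image(z, x, y, (0.25, 0.5), (0.25, 0.5))
# zs is now roughly 500x500 -- far cheaper for imshow/pcolorfast to draw
```

Panning or zooming would recompute the crop from the new view limits, which is why this makes interactive browsing of large images responsive.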
> I agree that exploration of large data sets is an important application,
> and that we need to speed it up. A couple days ago I added automatic
> subsetting (but not decimation--although this could be added easily) to
> image drawing, and that made a big difference for panning and zooming
> using imshow or pcolorfast with regular grids.

Cool. Low-pass filtering is more work to implement and takes away from some of the computational gains, but it's necessary to prevent aliasing, per the Nyquist-Shannon sampling theorem.

> An easy, built-in interface makes sense for line/marker plotting as
> well, but it will take some thought to figure out exactly what that
> interface should be. The line plotting case (including things like
> scatter) is more complicated than the image. Probably optimizations
> should be specified via kwargs, not by default.

True.

> Clipping should not be to points inside the xlim, but should include one
> more point on each side so that lines go to the edge of the box.

Good point. As I understand npy.searchsorted(), it should then be

    ind0 = npy.searchsorted(self.xorig, xlim[0], side='left')
    ind1 = npy.searchsorted(self.xorig, xlim[1], side='right')

instead of

    ind0, ind1 = npy.searchsorted(self.xorig, xlim)
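One caveat on the snippet above: searchsorted's side='left'/'right' only changes the result when a view limit coincides exactly with a data value, so by itself it does not include a point beyond each edge. To guarantee the extra point Eric asks for, the indices need to be widened by one. A sketch (`clip_indices` is a hypothetical helper name):

```python
import numpy as np

def clip_indices(x, xmin, xmax):
    """Indices into sorted x covering [xmin, xmax] plus one point on
    each side, so clipped line segments extend to the axes box edges."""
    i0 = max(np.searchsorted(x, xmin, side='right') - 1, 0)
    i1 = min(np.searchsorted(x, xmax, side='left') + 1, len(x))
    return i0, i1

x = np.arange(0.0, 10.0, 1.0)      # 0.0, 1.0, ..., 9.0
i0, i1 = clip_indices(x, 2.5, 6.5)
print(x[i0:i1])                    # [2. 3. 4. 5. 6. 7.] -- one point past each limit
```

The max/min guards keep the indices valid when the view extends beyond the data.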