I need to generate and save thousands of scatter plots that are identical except for the "y" variable. What is the fastest way to do this?
I thought about creating the Figure and Axes instances once and simply clearing them between plots, like this:
import matplotlib.pyplot as plt
import numpy as np
data = np.random.random((100, 1000))
x = list(range(100))
fig = plt.figure()
ax = fig.add_subplot(111)
for i in range(data.shape[1]):
    ax.scatter(x, data[:, i])
    fig.savefig("%d.png" % i, dpi=100)
    ax.cla()
This still takes a fair amount of time, so is there a better/faster way to do this? Each image in this example is about 15 kB, so I'm assuming writing to disk is not limiting the speed too much.
2 Answers
One option is to use multiple processes.
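import matplotlib.pyplot as plt
import numpy as np
import multiprocessing
data = np.random.random((100, 1000))
x = list(range(100))
fig = plt.figure()
ax = fig.add_subplot(111)
def save_plot(i):
    # Each worker process draws on its own fig/ax, so clearing is safe.
    ax.scatter(x, data[:, i])
    fig.savefig("%d.png" % i, dpi=100)
    ax.cla()
if __name__ == "__main__":
    # The guard is needed on platforms that spawn workers instead of forking.
    with multiprocessing.Pool(4) as p:
        p.map(save_plot, range(data.shape[1]))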
Try creating your figures without the GUI. I find that much faster when creating and saving many figures.
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure
fig = Figure()  # built directly, so pyplot and its GUI figure manager are not involved
ax = fig.add_subplot(111)
ax.plot(range(5))
canvas = FigureCanvas(fig)  # the Agg canvas renders to a raster buffer, not a window
canvas.print_figure('sample.png')
Something similar can be found at http://www.dalkescientific.com/writings/diary/archive/2005/04/23/matplotlib_without_gui.html
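To connect this with the loop in the question, here is a minimal sketch that builds the figure without the GUI and reuses one scatter artist for every image. The helper name `save_scatter_series` and its `out_dir` parameter are made up for illustration. Moving the points with `set_offsets` instead of calling `ax.cla()` and `ax.scatter()` each time is my own assumption about what might save time, not something measured here. It also changes the output slightly: the y-limits are fixed once for all plots rather than autoscaled per plot.

```python
import os

import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def save_scatter_series(x, data, out_dir, dpi=100):
    """Save one scatter plot per column of `data`, reusing a single figure.

    Hypothetical helper for illustration; returns the list of written paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    fig = Figure()
    canvas = FigureCanvasAgg(fig)  # non-GUI Agg canvas, as in the answer above
    ax = fig.add_subplot(111)
    # Draw the scatter once; later iterations only move the points.
    points = ax.scatter(x, data[:, 0])
    # Fix the y-limits up front, because set_offsets does not autoscale.
    ax.set_ylim(data.min(), data.max())
    paths = []
    for i in range(data.shape[1]):
        # set_offsets expects an (N, 2) array of x, y pairs.
        points.set_offsets(np.column_stack([x, data[:, i]]))
        path = os.path.join(out_dir, "%d.png" % i)
        canvas.print_figure(path, dpi=dpi)
        paths.append(path)
    return paths
```

With the data from the question this would be called as `save_scatter_series(np.arange(100), data, "plots")`. Whether it is actually faster than clearing the axes depends on the matplotlib version and the size of the plots, so it is worth timing both on a small subset first.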