2
\$\begingroup\$

I think the arrays here are slowing down my code due to incorrect usage.

There is a lot of stuff going on here, it's understandable if nobody responds. Just looking for some pointers. Thanks.

for MN in xrange(RowModel,TotalModels):
 if(MN%5000==0):print 'Examining models', MN,'to', (MN+5000), '...'
 fobs=fobstemp
 fobserr=fobstemp*0.1 #10% error
 y=np.array([])
 fmod=np.array([])

I was reading that it can be memory exhaustive to run for loops and range functions over numpy arrays and it is better instead to run np.arange[]. Can anyone validate this statement? I'm not sure how to implement it here where RowModel and TotalModels are separate arrays. And assuming they are the same dimensions.

Here's the section of that code (which starts off with the above snippet). Ultimately the values at the end are used in a statistical calculation. The code works but it's insanely slow at this portion.

for MN in xrange(RowModel,TotalModels):
 if(MN%5000==0):print 'Examining models', MN,'to', (MN+5000), '...'
 fobs=fobstemp
 fobserr=fobstemp*0.1 #10% error
 y=np.array([])
 fmod=np.array([])
#Set the scale factor
#units are already embedded in scalemult [mJy]
 if(modcalibrate<=13): 
 scale= (10**(convertStr(data[RowStar][calibrate])/-2.5)*10**(0.4*tau_mag[choice-1])*scalemult)/((convertStr(model[MN][modcalibrate])*1e+03*Fbol*1e+23)/(c/wavelengthmodel[modcalibrate-10]))
 else: 
 scale= (convertStr(data[RowStar][calibrate])*10**(0.4*tau_mag[choice-1]))/(convertStr(model[MN][modcalibrate])*Fbol*1e+029/(c/wavelengthmodel[modcalibrate-10])) 
#Convert the model data into fluxes and insert into array fmod
#
 for i in xrange(10+len(wavelength)-2):#2 repeats
 y=np.append(y,model[MN][i])
 for i in xrange(10,10+len(wavelength)-2): #2 repeats
 Alambda=Ak/wavelength[i-10]
 fmod=np.append(fmod,convertStr(y[i])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i-10]))
 if i== 11:
 fmod=np.append(fmod,convertStr(y[i])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i-10]))
 if i== 13:
 fmod=np.append(fmod,convertStr(y[i])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i-10]))
#Delete the components that are not shared between fobs and fmod
#
 j=0
 for i in xrange(len(wavelength)):
 if j<=len(fobs)-1:
 if -10<fobs[j]<-9:
 fobs=np.delete(fobs,j,None)
 fobserr=np.delete(fobserr,j,None)
 fmod=np.delete(fmod,j,None)
 else: 
 j=j+1
#Calculate Chi^2 for the model and insert into an array
# 
 scalefactor=np.append(scalefactor,scale)
 chi2_new=np.sum(((fobscaled-fmod)/(fobserr))**2)/(len(fobscaled)-1)
 chi2=np.append(chi2,chi2_new)
 if chi2_new<ChiThresh: #User input required here for Chi threshold maximum. 
 BestFitsModel=np.append(BestFitsModel,MN)
 BestFitsChi=np.append(BestFitsChi,chi2_new)
 BestFitsScale=np.append(BestFitsScale,scale)
 counter=counter+1
 temp=np.sort(chi2)

wavelengthmodel is defined as an array of size 8. (8 entries, all floats).

wavelength is defined as an array of size 11 (11 entries, all floats).

modcalibrate is an index array which corresponds to a list imported elsewhere.

MN is a numerical array starting at 0 and ending at some large number. This corresponds to TopModels and RowModel which starts at 0 (RowModel value) and ends at n (TopModels value).

Anything else undefined in this script-section is an empty array until filled here, except 'ChiThresh' which is an integer of user input.

asked Jul 16, 2013 at 18:53
\$\endgroup\$
5
  • \$\begingroup\$ Where did you read that? \$\endgroup\$ Commented Jul 16, 2013 at 19:05
  • \$\begingroup\$ Looks like you wrote it...maybe I misunderstood? Here's the link \$\endgroup\$ Commented Jul 16, 2013 at 19:12
  • \$\begingroup\$ The concern is less memory and more processing time. Using arange instead of range won't really help. You need to avoid for loops. You can get help doing that here, but you'll need to actual show working code. i.e. show us code that does the right thing, even if its slow. \$\endgroup\$ Commented Jul 16, 2013 at 19:23
  • \$\begingroup\$ I was afraid of that. I'll post the code via github and link it in the original post. It's long, ugly, and very slow. \$\endgroup\$ Commented Jul 16, 2013 at 19:30
  • \$\begingroup\$ see the FAQ, you aren't allowed to just link code. You must at least include some portion of it in the question. \$\endgroup\$ Commented Jul 16, 2013 at 19:32

1 Answer 1

4
\$\begingroup\$

You should almost never be using np.append, numpy arrays are designed to be used as fixed size arrays. np.append creates a whole new array each time and will thus fairly expensive.

The first thing is that you should create the array at its full size. Then assign to the elements:

y = np.empty(10+len(wavelength)-2, int) # I don't know the type, guessing here
for i in xrange(10+len(wavelength)-2):#2 repeats
 y[i] = model[MN][i]

The second thing is that you should look for numpy operations to do the same thing your loop does. In this case, I can actually replace that code by:

y = model[MN, :10 + len(wavelength) - 2]

This assumes that model is a numpy array. If it is isn't, you should make it one.

Let's look at the code for fmod

for i in xrange(10,10+len(wavelength)-2): #2 repeats
 Alambda=Ak/wavelength[i-10]
 fmod=np.append(fmod,convertStr(y[i])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i-10]))
 if i== 11:
 fmod=np.append(fmod,convertStr(y[i])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i-10]))
 if i== 13:
 fmod=np.append(fmod,convertStr(y[i])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i-10]))

It is not as obvious how to vectorize this code. Firstly, we can clean up the code a bit. Eliminate the unused Alambda, eliminate duplicate pieces of code, and change the loop to start with zero. (When trying to vectorize you pretty much always want to start by changing your loops to start with 0).

for i in xrange(len(wavelength)-2): #2 repeats
 value = convertStr(y[i+ 10])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i])
 fmod=np.append(fmod, value)
 if i == 11 or i == 13:
 fmod = np.append(fmod, value)

It is still not clear how to break up this loop, so we apply a general strategy of breaking up a loop. If a loop does multiple things, we break the loop up so that the loop only does one thing. Here is the first loop.

value = numpy.empty( len(wavelength) - 2)
for i in xrange(len(wavelength) - 2):
 value[i] = convertStr(y[i+10])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i])

Once we have the loop down to the simple level of assigning to each element, we want to vectorize it. I don't know what convertStr does, but I'll assume it can be adapted to vectorize whatever I pass into it. So I can replace this loop by:

value = convertStr(y[10:])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel)

Let's move onto the second loop

for i in xrange(len(wavelength)-2): #2 repeats
 fmod=np.append(fmod, value[i])
 if i == 11 or i == 13:
 fmod = np.append(fmod, value[i])

This essentially copies value, but introduces repeats. There are few ways this could be accomplished. Probability the simplest would be:

fmod = np.empty( len(wavelength), int)
fmod[0:11] = value[0:11]
fmod[11:13] = value[11]
fmod[13] = value[12]
fmod[14:17] = value[13]
fmod[17:] = value[14:]

Hopefully that gives you an idea of how to approach improving the performance.

answered Jul 16, 2013 at 21:23
\$\endgroup\$
3
  • \$\begingroup\$ Wow, thanks. I have to absorb all of this for sure. I'd give you a +1 but my minuscule reputations prevents it. THANKS!!! \$\endgroup\$ Commented Jul 16, 2013 at 21:31
  • \$\begingroup\$ Question: In your first breakdown of my code where you show how to create the y array, would I still need to declare y as an array like I've done? Or is that created assigned automatically with np.empty? \$\endgroup\$ Commented Jul 16, 2013 at 21:33
  • 1
    \$\begingroup\$ @Matt, in python the syntax y = ? always throws away whatever was in y. So y = numpy.empty(?) creates a new array and doesn't need you to have created an array in y. \$\endgroup\$ Commented Jul 16, 2013 at 21:38

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.