What would be the optimal way to address these arrays

Question 1

I think the arrays here are slowing down my code due to incorrect usage.

There is a lot of stuff going on here, it's understandable if nobody responds. Just looking for some pointers. Thanks.

for MN in xrange(RowModel,TotalModels):
 if(MN%5000==0):print 'Examining models', MN,'to', (MN+5000), '...'
 fobs=fobstemp
 fobserr=fobstemp*0.1 #10% error
 y=np.array([])
 fmod=np.array([])

I was reading that it can be memory exhaustive to run for loops and range functions over numpy arrays and it is better instead to run np.arange[]. Can anyone validate this statement? I'm not sure how to implement it here where RowModel and TotalModels are separate arrays. And assuming they are the same dimensions.

Here's the section of that code (which starts off with the above snippet). Ultimately the values at the end are used in a statistical calculation. The code works but it's insanely slow at this portion.

for MN in xrange(RowModel,TotalModels):
 if(MN%5000==0):print 'Examining models', MN,'to', (MN+5000), '...'
 fobs=fobstemp
 fobserr=fobstemp*0.1 #10% error
 y=np.array([])
 fmod=np.array([])
#Set the scale factor
#units are already embedded in scalemult [mJy]
 if(modcalibrate<=13): 
 scale= (10**(convertStr(data[RowStar][calibrate])/-2.5)*10**(0.4*tau_mag[choice-1])*scalemult)/((convertStr(model[MN][modcalibrate])*1e+03*Fbol*1e+23)/(c/wavelengthmodel[modcalibrate-10]))
 else: 
 scale= (convertStr(data[RowStar][calibrate])*10**(0.4*tau_mag[choice-1]))/(convertStr(model[MN][modcalibrate])*Fbol*1e+029/(c/wavelengthmodel[modcalibrate-10])) 
#Convert the model data into fluxes and insert into array fmod
#
 for i in xrange(10+len(wavelength)-2):#2 repeats
 y=np.append(y,model[MN][i])
 for i in xrange(10,10+len(wavelength)-2): #2 repeats
 Alambda=Ak/wavelength[i-10]
 fmod=np.append(fmod,convertStr(y[i])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i-10]))
 if i== 11:
 fmod=np.append(fmod,convertStr(y[i])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i-10]))
 if i== 13:
 fmod=np.append(fmod,convertStr(y[i])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i-10]))
#Delete the components that are not shared between fobs and fmod
#
 j=0
 for i in xrange(len(wavelength)):
 if j<=len(fobs)-1:
 if -10<fobs[j]<-9:
 fobs=np.delete(fobs,j,None)
 fobserr=np.delete(fobserr,j,None)
 fmod=np.delete(fmod,j,None)
 else: 
 j=j+1
#Calculate Chi^2 for the model and insert into an array
# 
 scalefactor=np.append(scalefactor,scale)
 chi2_new=np.sum(((fobscaled-fmod)/(fobserr))**2)/(len(fobscaled)-1)
 chi2=np.append(chi2,chi2_new)
 if chi2_new<ChiThresh: #User input required here for Chi threshold maximum. 
 BestFitsModel=np.append(BestFitsModel,MN)
 BestFitsChi=np.append(BestFitsChi,chi2_new)
 BestFitsScale=np.append(BestFitsScale,scale)
 counter=counter+1
 temp=np.sort(chi2)

wavelengthmodel is defined as an array of size 8. (8 entries, all floats).

wavelength is defined as an array of size 11 (11 entries, all floats).

modcalibrate is an index array which corresponds to a list imported elsewhere.

MN is a numerical array starting at 0 and ending at some large number. This corresponds to TopModels and RowModel which starts at 0 (RowModel value) and ends at n (TopModels value).

Anything else undefined in this script-section is an empty array until filled here, except 'ChiThresh' which is an integer of user input.

Question 2

Where did you read that?

Question 3

Looks like you wrote it...maybe I misunderstood? Here's the link

Question 4

The concern is less memory and more processing time. Using arange instead of range won't really help. You need to avoid for loops. You can get help doing that here, but you'll need to actual show working code. i.e. show us code that does the right thing, even if its slow.

Question 5

I was afraid of that. I'll post the code via github and link it in the original post. It's long, ugly, and very slow.

Question 6

see the FAQ, you aren't allowed to just link code. You must at least include some portion of it in the question.

Question 7

You should almost never be using np.append, numpy arrays are designed to be used as fixed size arrays. np.append creates a whole new array each time and will thus fairly expensive.

The first thing is that you should create the array at its full size. Then assign to the elements:

y = np.empty(10+len(wavelength)-2, int) # I don't know the type, guessing here
for i in xrange(10+len(wavelength)-2):#2 repeats
 y[i] = model[MN][i]

The second thing is that you should look for numpy operations to do the same thing your loop does. In this case, I can actually replace that code by:

y = model[MN, :10 + len(wavelength) - 2]

This assumes that model is a numpy array. If it is isn't, you should make it one.

Let's look at the code for fmod

for i in xrange(10,10+len(wavelength)-2): #2 repeats
 Alambda=Ak/wavelength[i-10]
 fmod=np.append(fmod,convertStr(y[i])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i-10]))
 if i== 11:
 fmod=np.append(fmod,convertStr(y[i])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i-10]))
 if i== 13:
 fmod=np.append(fmod,convertStr(y[i])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i-10]))

It is not as obvious how to vectorize this code. Firstly, we can clean up the code a bit. Eliminate the unused Alambda, eliminate duplicate pieces of code, and change the loop to start with zero. (When trying to vectorize you pretty much always want to start by changing your loops to start with 0).

for i in xrange(len(wavelength)-2): #2 repeats
 value = convertStr(y[i+ 10])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i])
 fmod=np.append(fmod, value)
 if i == 11 or i == 13:
 fmod = np.append(fmod, value)

It is still not clear how to break up this loop, so we apply a general strategy of breaking up a loop. If a loop does multiple things, we break the loop up so that the loop only does one thing. Here is the first loop.

value = numpy.empty( len(wavelength) - 2)
for i in xrange(len(wavelength) - 2):
 value[i] = convertStr(y[i+10])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i])

Once we have the loop down to the simple level of assigning to each element, we want to vectorize it. I don't know what convertStr does, but I'll assume it can be adapted to vectorize whatever I pass into it. So I can replace this loop by:

value = convertStr(y[10:])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel)

Let's move onto the second loop

for i in xrange(len(wavelength)-2): #2 repeats
 fmod=np.append(fmod, value[i])
 if i == 11 or i == 13:
 fmod = np.append(fmod, value[i])

This essentially copies value, but introduces repeats. There are few ways this could be accomplished. Probability the simplest would be:

fmod = np.empty( len(wavelength), int)
fmod[0:11] = value[0:11]
fmod[11:13] = value[11]
fmod[13] = value[12]
fmod[14:17] = value[13]
fmod[17:] = value[14:]

Hopefully that gives you an idea of how to approach improving the performance.

Question 8

Wow, thanks. I have to absorb all of this for sure. I'd give you a +1 but my minuscule reputations prevents it. THANKS!!!

Question 9

Question: In your first breakdown of my code where you show how to create the y array, would I still need to declare y as an array like I've done? Or is that created assigned automatically with np.empty?

Question 10

@Matt, in python the syntax y = ? always throws away whatever was in y. So y = numpy.empty(?) creates a new array and doesn't need you to have created an array in y.

Winston Ewert Winston Ewert 30.7k4 gold badges52 silver badges79 bronze badges · Answer 1 · 2013-07-16 21:23:21Z

You should almost never be using np.append, numpy arrays are designed to be used as fixed size arrays. np.append creates a whole new array each time and will thus fairly expensive.

The first thing is that you should create the array at its full size. Then assign to the elements:

y = np.empty(10+len(wavelength)-2, int) # I don't know the type, guessing here
for i in xrange(10+len(wavelength)-2):#2 repeats
 y[i] = model[MN][i]

The second thing is that you should look for numpy operations to do the same thing your loop does. In this case, I can actually replace that code by:

y = model[MN, :10 + len(wavelength) - 2]

This assumes that model is a numpy array. If it is isn't, you should make it one.

Let's look at the code for fmod

for i in xrange(10,10+len(wavelength)-2): #2 repeats
 Alambda=Ak/wavelength[i-10]
 fmod=np.append(fmod,convertStr(y[i])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i-10]))
 if i== 11:
 fmod=np.append(fmod,convertStr(y[i])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i-10]))
 if i== 13:
 fmod=np.append(fmod,convertStr(y[i])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i-10]))

It is not as obvious how to vectorize this code. Firstly, we can clean up the code a bit. Eliminate the unused Alambda, eliminate duplicate pieces of code, and change the loop to start with zero. (When trying to vectorize you pretty much always want to start by changing your loops to start with 0).

for i in xrange(len(wavelength)-2): #2 repeats
 value = convertStr(y[i+ 10])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i])
 fmod=np.append(fmod, value)
 if i == 11 or i == 13:
 fmod = np.append(fmod, value)

It is still not clear how to break up this loop, so we apply a general strategy of breaking up a loop. If a loop does multiple things, we break the loop up so that the loop only does one thing. Here is the first loop.

value = numpy.empty( len(wavelength) - 2)
for i in xrange(len(wavelength) - 2):
 value[i] = convertStr(y[i+10])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel[i])

Once we have the loop down to the simple level of assigning to each element, we want to vectorize it. I don't know what convertStr does, but I'll assume it can be adapted to vectorize whatever I pass into it. So I can replace this loop by:

value = convertStr(y[10:])*Fbol*scale*1e+023*1e+03/(c/wavelengthmodel)

Let's move onto the second loop

for i in xrange(len(wavelength)-2): #2 repeats
 fmod=np.append(fmod, value[i])
 if i == 11 or i == 13:
 fmod = np.append(fmod, value[i])

This essentially copies value, but introduces repeats. There are few ways this could be accomplished. Probability the simplest would be:

fmod = np.empty( len(wavelength), int)
fmod[0:11] = value[0:11]
fmod[11:13] = value[11]
fmod[13] = value[12]
fmod[14:17] = value[13]
fmod[17:] = value[14:]

Hopefully that gives you an idea of how to approach improving the performance.

Wow, thanks. I have to absorb all of this for sure. I'd give you a +1 but my minuscule reputations prevents it. THANKS!!!
Question: In your first breakdown of my code where you show how to create the y array, would I still need to declare y as an array like I've done? Or is that created assigned automatically with np.empty?
@Matt, in python the syntax y = ? always throws away whatever was in y. So y = numpy.empty(?) creates a new array and doesn't need you to have created an array in y.

Stack Exchange Network

What would be the optimal way to address these arrays

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

What would be the optimal way to address these arrays

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions