I want to create a new array from specific elements of the original array. I created a minimal example in which this works, but on the actual data it doesn't, and I can't figure out the essential difference.

First the minimal example: for every row, I want to take the numbers that stand under the letters A-C and save them in the array B_feat.

 import numpy as np
 years = 5 #A-E
 yearsf = 3 #A-C
 B_new =(['0','A','B','C','D','E','A','B','C','D','E'],
         ['X','2','3','3','3','4','6','5','4','3','4'],
         ['Y','3','4','6','7','3','2','4','7','9','8'],
         ['Z','3','4','6','3','4','6','9','1','4','7'])
 B_feat = np.zeros((3,2*yearsf))
 i=0
 for row in B_feat:
     j=0
     k=0
     for element in row:
         B_feat[i][j:int(j+yearsf)]=B_new[i+1][k+1:int(k+yearsf+1)]
         j+=yearsf
         k+=years
     i+=1
 print(B_feat)

and I receive

[[ 2. 3. 3. 6. 5. 4.]
 [ 3. 4. 6. 2. 4. 7.]
 [ 3. 4. 6. 6. 9. 1.]]

Now with the actual data I have:

 years = 9
 yearsf = 4
 np.shape(B_new) = (244, 181)
 np.shape(B_feat) = (243, 76)

I want to have a new array B_feat that ignores the first row and column of B_new, then skips 9 columns, and from each following block of 9 columns extracts the first 4 elements of every row.

 import numpy as np
 i=0
 for row in B_feat:
     j=0
     k=0
     for element in row:
         B_feat[i][j:int(j+yearsf)]=B_new[i+1][int(k+1+years):int(k+years+yearsf+1)]
         j+=yearsf
         k+=years
     i+=1

When running the code, I receive the following error:

 IndexError: index 80 is out of bounds for axis 0 with size 76

I don't really understand this error, since I thought axis 0 runs down the rows (where I have 243 for B_feat), and I couldn't figure out where the index goes up to 80.

As I'm new to Python and this forum, please let me know if I can improve my question or if anything is unclear.

asked Mar 9, 2016 at 13:37

1 Answer

Short answer: you are going out of range, as the error suggests, and this is also true for the minimal example that seems to work. I do not know why it works for the minimal example in the first place.

Detailed answer: In the minimal example, B_feat has 6 columns indexed from 0 to 5. Your inner loop iterates once per column of B_feat, so it runs 6 times, and each iteration increments j by yearsf, which is 3. At the third iteration, j is 6, which is larger than the maximum index 5. You have exactly the same issue with the index k: at the third iteration it is 10, so the slice of B_new starts at 11, past its last index 10.
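A possible explanation for why the minimal example still runs, offered as a sketch rather than a diagnosis of the real data: Python lists and NumPy arrays clip slice bounds that lie past the end, so an out-of-range slice is simply empty, and assigning an empty sequence to an empty slice does nothing. Only a single integer index past the end raises IndexError. The names `row` and `source` below are illustrative stand-ins for one row of B_feat and one row of B_new.

```python
import numpy as np

row = np.zeros(6)  # stand-in for one row of B_feat in the minimal example
source = ['X', '2', '3', '3', '3', '4', '6', '5', '4', '3', '4']  # one row of B_new

# Slices that start past the end are clipped to an empty range instead of raising.
empty_target = row[6:9]
empty_source = source[11:14]
print(empty_target.shape)  # (0,)
print(empty_source)        # []

# Assigning an empty sequence to an empty slice is a silent no-op,
# which is what the third and later inner iterations do.
row[6:9] = source[11:14]
print(row)

# A single integer index past the end is what actually raises IndexError.
try:
    row[6]
except IndexError as exc:
    print("IndexError:", exc)
```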

Suggestion: The number of iterations in your inner loop must be the number of groups of columns to process, 2 in your minimal example. By group of columns, I simply mean a set of columns below 'A-C'. Your loops can be turned into something like this:

# B_new must be a NumPy array here; a tuple of lists has no .shape
i=0
for row in B_feat:
    for j,k in zip( range(0,B_feat.shape[1], yearsf),
                    range(1,B_new.shape[1], years)):
        print('i =', i, ', j =', j, ', k =', k)
        B_feat[i][j:int(j+yearsf)]=B_new[i+1][k:int(k+yearsf)]
    i+=1

Notice that I removed the +1 in the slice of B_new and set k to start at 1. I would also advise you to turn the outer loop into something like for i in range(B_feat.shape[0]) and remove the statement i+=1.
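Put together, the advice above might look like the following sketch on the minimal example. It assumes B_new is converted to a NumPy array so that .shape is available, and it converts each slice to float explicitly. The names `n_groups`, `body` and `vectorized` are introduced here for illustration, and the reshape-based version at the end is an alternative technique, not part of the loop suggested above.

```python
import numpy as np

years = 5    # columns per block in B_new (A-E)
yearsf = 3   # columns to keep from each block (A-C)

# Converted to an array so .shape works (the question used a tuple of lists).
B_new = np.array([
    ['0', 'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
    ['X', '2', '3', '3', '3', '4', '6', '5', '4', '3', '4'],
    ['Y', '3', '4', '6', '7', '3', '2', '4', '7', '9', '8'],
    ['Z', '3', '4', '6', '3', '4', '6', '9', '1', '4', '7'],
])

n_groups = (B_new.shape[1] - 1) // years  # 2 blocks of A-E after the label column
B_feat = np.zeros((B_new.shape[0] - 1, n_groups * yearsf))

for i in range(B_feat.shape[0]):
    targets = range(0, B_feat.shape[1], yearsf)  # j = 0, 3
    sources = range(1, B_new.shape[1], years)    # k = 1, 6
    for j, k in zip(targets, sources):
        B_feat[i, j:j + yearsf] = B_new[i + 1, k:k + yearsf].astype(float)

print(B_feat)

# Alternative without explicit loops: drop the header row and label column,
# view each row as (block, year), keep the first yearsf years of every block.
body = B_new[1:, 1:].astype(float)
vectorized = body.reshape(body.shape[0], n_groups, years)[:, :, :yearsf]
vectorized = vectorized.reshape(body.shape[0], -1)
print(np.array_equal(B_feat, vectorized))
```

For the real data, where the first block of 9 columns is skipped, the source range would presumably start at 1 + years instead of 1. With the shapes given in the question, 181 columns hold 20 blocks of 9 after the label column, so skipping one leaves 19 blocks, and 19 * 4 = 76 matches the width of B_feat.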

answered Mar 9, 2016 at 14:34

2 Comments

Great, I adjusted the code in the following way and it worked out! Thanks a lot for your help! Unfortunately I can't post the code here in a nice format, but I also added the external loop;)
@Dave, there is absolutely no need to post your final code here. You solved your problem and accepted an answer, which helps those who have a similar problem in the future.
