A 2d array consists of 2 axes: axis=0 represents the rows and axis=1 represents the columns.

import numpy as np
aa = np.random.randn(10, 2)  # a 2d array: 10 rows along axis 0, 2 columns along axis 1
array([[ 0.6999521 , -0.17597954],
 [ 1.70622947, -0.85919459],
 [-0.90019284, 0.80774052],
 [-1.42953238, 0.19727917],
 [-0.03416532, 0.49584749],
 [-0.28981586, -0.77484498],
 [-1.31129122, 0.423833 ],
 [-0.43920016, -1.93541758],
 [-0.06667634, 2.09925218],
 [ 1.24633485, -0.04153847]])

Why, when I want to scatter the points, do I only use the first and the second column (the two entries along axis=1)? Do "dimensions" mean columns when plotting and axes at other times? Can you please explain the reasons for doing it this way? If there are good references on dimensions in this context, I would like to read them.

plt.scatter(x[:,0], x[:,1]) # does this refer to dimensions or columns?
Why x[:,0], x[:,1] and not x[0,:], x[:,1]?
asked Dec 23, 2022 at 19:35
  • x[:,0] is a 1d array. For scatter it doesn't matter whether the 1d array is made directly with np.array([1,2,3]) or indirectly from the columns or rows of the 2d array. Commented Dec 23, 2022 at 22:52
  • x[0,:] has 2 elements, the 1st 'row'. x[:,1] has 10. You can't 'scatter' 10 against 2. scatter wants 2 1d arrays that match in length. Commented Dec 23, 2022 at 22:58

2 Answers

It can be difficult to visualize this, especially in multiple dimensions.

The indices inside the [] operator correspond to the dimensions, in order. Your first dimension is the rows. The first row is array[0]. Your second dimension is the columns. The entire second column is array[:,1] -- the ":" is numpy notation that means "take all of this dimension". array[2,1] refers to the second column in the third row, which is a single element.

plt.scatter expects the x coordinate values as its first parameter, and the y coordinate values as its second parameter. plt.scatter(x[:,0], x[:,1]) means "take all of column 0" and "take all of column 1", which is the way scatter wants them.
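
The indexing described above can be checked on a small array. This is only a sketch: the name `arr` and its values are made up for illustration so that the results are easy to follow, and it prints shapes instead of drawing a plot.

```python
import numpy as np

# A deterministic (10, 2) array: row i is [2*i, 2*i + 1].
arr = np.arange(20).reshape(10, 2)

first_row = arr[0]       # index on axis 0 only        -> shape (2,)
second_col = arr[:, 1]   # ":" keeps all of axis 0     -> shape (10,)
element = arr[2, 1]      # third row, second column    -> one value

print(first_row.shape, second_col.shape, element)
# (2,) (10,) 5

# The two arrays that would be handed to plt.scatter are the two columns,
# and both have one entry per row (10 entries each).
xs, ys = arr[:, 0], arr[:, 1]
print(xs.shape == ys.shape)
# True
```

Each position in `xs` pairs with the same position in `ys`, which is the pairing a scatter plot needs.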

answered Dec 23, 2022 at 19:43

5 Comments

yes you didn't answer my question, why take the columns x[:,0], x[:,1] and not take rows and columns which represent the 2 axes/dims of the 2d array?
Because those are the arguments that plt.scatter expects. It wants two arrays, where each element in the first array matches the corresponding element in the second array. It could have been written differently, but it wasn't.
aha, where can I learn more about this? I can't find this in linear algebra, are there any good references that explain dimensionality? because I also have problems understanding which axis to choose to do mean/sum!!
numpy arrays are more general (or more abstract) than linear algebra matrices and vectors. Don't try to find 'meaning' elsewhere.
Axis 0 is dimension 0 -- the rows in your case. Axis 1 is dimension 1 -- the columns in your case. sum(aa,axis=0) would produce two elements, the sums of the columns. sum(aa,axis=1) would produce 10 elements, the sum of the coordinates for each point. Playing with it is probably the best way to learn.

With this randn call you make a 2d array with the specified shape. The dimensions, 10 and 2, don't represent anything - that's an abstract (10,2) array. Meaning comes from how you use it.

In [50]: aa = np.random.randn(10, 2)
In [51]: aa
Out[51]: 
array([[-0.26769106, 0.09882999],
 [-1.5605514 , -1.38614473],
 [ 1.23312852, 0.86838848],
 [ 1.2603898 , 2.19895989],
 [-1.66937976, 0.79666952],
 [-0.15596669, 1.47848784],
 [ 1.74964902, 0.39280584],
 [-1.0982447 , 0.46888408],
 [ 0.84396231, -0.34809148],
 [-0.83489678, -1.8093045 ]])

That's a display - with rows and columns.

Rather than pass the slices directly to scatter, let's assign them to variables:

In [52]: x = aa[:,0]; y = aa[:,1]; x,y
Out[52]: 
(array([-0.26769106, -1.5605514 , 1.23312852, 1.2603898 , -1.66937976,
 -0.15596669, 1.74964902, -1.0982447 , 0.84396231, -0.83489678]),
 array([ 0.09882999, -1.38614473, 0.86838848, 2.19895989, 0.79666952,
 1.47848784, 0.39280584, 0.46888408, -0.34809148, -1.8093045 ]))

We now have two 1d arrays with shape (10,) (that's a 1 element tuple). We can then plot them with:

In [53]: plt.scatter(x,y)

I could just as well have used

x = np.arange(10); y = np.random.randn(10)

to make two 1d arrays.

The dimensions of the aa array have nothing to do with the axes of a scatter plot.

I could select a 'row' of aa, but will only get a (2,) shape array. That can't be plotted against a (10,) array:

In [53]: aa[0,:]
Out[53]: array([-0.26769106, 0.09882999])

As for the meaning of dimensions in sum/mean, why not experiment?

Sum all values:

In [54]: aa.sum()
Out[54]: 2.2598841819604134

sum down the columns, resulting in one value per column:

In [55]: aa.sum(axis=0)
Out[55]: array([-0.49960074, 2.75948492])

It can help to use keepdims, producing a (1,2) array:

In [56]: aa.sum(axis=0, keepdims=True)
Out[56]: array([[-0.49960074, 2.75948492]])

or a (10,1) array:

In [57]: aa.sum(axis=1, keepdims=True)
Out[57]: 
array([[-0.16886107],
 [-2.94669614],
 [ 2.101517 ],
 [ 3.45934969],
 [-0.87271024],
 [ 1.32252115],
 [ 2.14245486],
 [-0.62936062],
 [ 0.49587083],
 [-2.64420128]])

There's some ambiguity when talking about summing along rows or columns when dealing with 2d arrays. It becomes clearer when we apply sum to 1d arrays (where there is only one axis to sum over), or to 3d arrays.

For example, note which dimension is missing when I do:

In [58]: np.arange(24).reshape(2,3,4).sum(axis=1).shape
Out[58]: (2, 4)

or

In [59]: np.arange(24).reshape(2,3,4).sum(axis=2)
Out[59]: 
array([[ 6, 22, 38],
 [54, 70, 86]])
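
As a rough check of the pattern in the two examples above, the loop below sums over each axis in turn and confirms that the summed axis is the one missing from the result's shape. `shape_without_axis` is a hypothetical helper written for this sketch, not a numpy function.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

def shape_without_axis(shape, axis):
    """Hypothetical helper: drop one entry from a shape tuple."""
    return shape[:axis] + shape[axis + 1:]

for axis in range(a.ndim):
    summed = a.sum(axis=axis)
    # The axis that was summed over disappears; the others are unchanged.
    assert summed.shape == shape_without_axis(a.shape, axis)
    print(axis, summed.shape)
# 0 (3, 4)
# 1 (2, 4)
# 2 (2, 3)
```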

Again - dimensions of numpy arrays are abstract things. An array can have 0, 1, 2 or more dimensions (up to 32 in NumPy 1.x, 64 since NumPy 2.0). Most of linear algebra deals with 2d arrays: matrices and "vectors". You can do LA with numpy, but numpy is used for much more.

edit

You could think of your aa as 10 2-element points. Then aa[:,0] are all the x coordinates. A mean with axis=0 would be the "center of mass" of those points.

In [60]: np.mean(aa, axis=0)
Out[60]: array([-0.04996007, 0.27594849])

Mean on axis=1 may not make sense, though you could calculate the norm of the points (sqrt(x^2+y^2)), or the length of the vectors represented by the points.

In [61]: np.linalg.norm(aa, axis=1)
Out[61]: 
array([0.28535218, 2.08727523, 1.50821235, 2.53456249, 1.84973271,
 1.48669159, 1.79320052, 1.19414978, 0.91292938, 1.99264533])

For the direction of these points I'd use:

np.arctan2(aa[:,1], aa[:,0])

(np.arctan2 takes y first, then x, so this gives the angle measured from the x axis.)
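
To see these per-point operations on values that can be checked by hand, here is a small sketch. The array `pts` is made up for illustration, and it assumes the same layout as `aa`: one point per row, with columns (x, y).

```python
import numpy as np

# Four hand-picked points, one per row; columns are (x, y).
pts = np.array([[3.0, 4.0],
                [0.0, 1.0],
                [-1.0, 0.0],
                [2.0, -5.0]])

center = pts.mean(axis=0)               # one value per column -> shape (2,)
lengths = np.linalg.norm(pts, axis=1)   # one value per point  -> shape (4,)
# np.arctan2 takes (y, x); the angle is measured from the positive x axis.
angles = np.arctan2(pts[:, 1], pts[:, 0])

print(center)                   # the mean x and mean y: 1.0 and 0.0
print(lengths[:3])              # the first three lengths are 5, 1 and 1
print(np.degrees(angles[1:3]))  # 90 and 180 degrees
```

Reducing over axis 0 combines the points with each other, while reducing over axis 1 combines the coordinates within each point. That is a consequence of the layout chosen here, not a numpy rule.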

answered Dec 23, 2022 at 23:30

13 Comments

thanks for the prompt response, I understand this but I want to be able to find the relation between the dimensions and axes and the operations made among them. but which type of problems suit taking the sum across the 1st dim(axis=0)/2nd(axis=1), 3rd(axis=-1) ex, calculating the distances between points, it's always calculated across the second axis, why? when should I calculate across the rows, (axis=0)? I'd really appreciate your answer
What are you calling points? Are you thinking of the aa array as 10 points, with each point represented as 2 values? What the "dimensions" mean is in your head, not in numpy. What if you made a (2,10) array, and plotted bb[0,:] against bb[1,:]? Either can represent 10 2-element points. 3 dimensions could represent something else, for example a (400,600,3) array might be a 400x600 color image.
if we take the scatter plot of bb[0,:] and bb[1,:] we would have plotted the points projected on the x axis
I think of aa as 10 points with coordinates on 2 axes (dimensions: x, y) if we are in a 2d array but what are the types of problems that require us to calculate the distances between points on the second axis and not on the first axis? when I have something like this (400,600,3) or with batches (6,400,600,3) what are the reasons that make me average across the first axis/2nd/3rd? like is it a rule that you always take the mean across the first dim (axis=0) a rule to take differences.sum(axis=1) between points across the first dim?
What operation makes sense for a particular array axis depends on what meaning YOU assigned to the axis. numpy by itself doesn't determine that. I added to my answer a norm taken on axis 1. atan2 could be used to get the angle, direction, of those 10 'points'.
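
The (2,10) layout mentioned in this thread can be tried directly. In the sketch below, `bb` is taken to be the transpose of `aa`, which is an assumption about what was meant, and the random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed, only for repeatability
aa = rng.standard_normal((10, 2))   # 10 points, one per row
bb = aa.T                           # the same 10 points, one per column

# The x and y coordinate arrays are identical in both layouts, so
# scattering bb[0, :] against bb[1, :] would draw the same picture
# as scattering aa[:, 0] against aa[:, 1].
assert np.array_equal(aa[:, 0], bb[0, :])
assert np.array_equal(aa[:, 1], bb[1, :])

# Per-point operations move to the other axis when the layout changes.
assert np.allclose(np.linalg.norm(aa, axis=1), np.linalg.norm(bb, axis=0))
assert np.allclose(aa.mean(axis=0), bb.mean(axis=1))

print(bb.shape, bb[0, :].shape)
# (2, 10) (10,)
```

Which axis to reduce over follows from how the data was laid out, so there is no fixed rule such as "always use axis=1 for distances".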