A 2d array consists of 2 axes: axis=0, which represents the rows, and axis=1, which represents the columns.
aa = np.random.randn(10, 2) # a 2d array: 10 rows along the first axis, 2 columns along the second
array([[ 0.6999521 , -0.17597954],
[ 1.70622947, -0.85919459],
[-0.90019284, 0.80774052],
[-1.42953238, 0.19727917],
[-0.03416532, 0.49584749],
[-0.28981586, -0.77484498],
[-1.31129122, 0.423833 ],
[-0.43920016, -1.93541758],
[-0.06667634, 2.09925218],
[ 1.24633485, -0.04153847]])
Why, when I want to scatter the points, do I only use the first and second columns, i.e. the dimension along axis=1? Do "dimensions" mean columns when plotting and axes at other times? Can you please explain the reasons for doing it this way? Any good references on dimensions in this context would also help.
plt.scatter(x[:,0], x[:,1]) # does this refer to dimensions or to columns?
Why x[:,0], x[:,1] and not x[0,:], x[:,1]?
2 Answers
It can be difficult to visualize this, especially in multiple dimensions.
The parameters to the [] operator represent the dimensions. Your first dimension is the rows. The first row is array[0]. Your second dimension is the columns. The entire second column is called array[:,1] -- the ":" is a numpy notation that means "take all of this dimension". array[2,1] refers to the second column in the third row.
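To make the indexing rules above concrete, here is a minimal sketch on a small made-up array (the name arr and its values are only for illustration, they are not from the question):

```python
import numpy as np

# Hypothetical array with easy-to-read values: 4 rows, 2 columns.
arr = np.arange(8).reshape(4, 2)
# [[0 1]
#  [2 3]
#  [4 5]
#  [6 7]]

first_row = arr[0]          # index on axis 0 only -> the first row
second_column = arr[:, 1]   # ":" keeps all of axis 0, 1 picks the second column
single_value = arr[2, 1]    # third row, second column

print(first_row.shape)      # (2,)
print(second_column.shape)  # (4,)
print(single_value)         # 5
```

Each position inside the brackets belongs to one axis, so the length of whatever comes back depends on which axis you kept with ":".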
plt.scatter expects the x coordinate values as its first parameter, and the y coordinate values as its second parameter. plt.scatter(x[:,0], x[:,1]) means "take all of column 0" and "take all of column 1", which is the way scatter wants them.
5 Comments
numpy arrays are more general (or more abstract) than linear algebra matrices and vectors. Don't try to find 'meaning' elsewhere.
sum(aa,axis=0) would produce two elements, the sums of the columns. sum(aa,axis=1) would produce 10 elements, the sum of the coordinates for each point. Playing with it is probably the best way to learn.
With this randn call you make a 2d array with the specified shape. The dimensions, 10 and 2, don't represent anything; that's an abstract (10,2) array. Meaning comes from how you use it.
In [50]: aa = np.random.randn(10, 2)
In [51]: aa
Out[51]:
array([[-0.26769106, 0.09882999],
[-1.5605514 , -1.38614473],
[ 1.23312852, 0.86838848],
[ 1.2603898 , 2.19895989],
[-1.66937976, 0.79666952],
[-0.15596669, 1.47848784],
[ 1.74964902, 0.39280584],
[-1.0982447 , 0.46888408],
[ 0.84396231, -0.34809148],
[-0.83489678, -1.8093045 ]])
That's a display - with rows and columns.
Rather than pass the slices directly to scatter, let's assign them to variables:
In [52]: x = aa[:,0]; y = aa[:,1]; x,y
Out[52]:
(array([-0.26769106, -1.5605514 , 1.23312852, 1.2603898 , -1.66937976,
-0.15596669, 1.74964902, -1.0982447 , 0.84396231, -0.83489678]),
array([ 0.09882999, -1.38614473, 0.86838848, 2.19895989, 0.79666952,
1.47848784, 0.39280584, 0.46888408, -0.34809148, -1.8093045 ]))
We now have two 1d arrays with shape (10,) (that's a 1 element tuple). We can then plot them with:
In [53]: plt.scatter(x,y)
I could just as well have used
x = np.arange(10); y = np.random.randn(10)
to make two 1d arrays.
The dimensions of the aa array have nothing to do with the axes of a scatter plot.
I could select a 'row' of aa, but will only get a (2,) shape array. That can't be plotted against a (10,) array:
In [53]: aa[0,:]
Out[53]: array([-0.26769106, 0.09882999])
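A quick way to see the mismatch is to compare shapes before plotting. This is only a sketch: can_pair is a hypothetical helper written for illustration, not something matplotlib provides, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed, just for repeatability
aa = rng.standard_normal((10, 2))    # same shape as the array above

col0, col1 = aa[:, 0], aa[:, 1]      # two (10,) arrays: a usable x/y pair
row0 = aa[0, :]                      # a (2,) array: one point, not a list of x values


def can_pair(x, y):
    """Hypothetical check: True if x and y are 1d arrays of equal length."""
    return x.ndim == 1 and y.ndim == 1 and x.shape == y.shape


print(col0.shape, col1.shape, row0.shape)  # (10,) (10,) (2,)
print(can_pair(col0, col1))                # True
print(can_pair(row0, col1))                # False
```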
As for the meaning of dimensions in sum/mean, why not experiment?
Sum all values:
In [54]: aa.sum()
Out[54]: 2.2598841819604134
Sum down the columns, resulting in one value per column:
In [55]: aa.sum(axis=0)
Out[55]: array([-0.49960074, 2.75948492])
It can help to use keepdims, producing a (1,2) array:
In [56]: aa.sum(axis=0, keepdims=True)
Out[56]: array([[-0.49960074, 2.75948492]])
or a (10,1) array:
In [57]: aa.sum(axis=1, keepdims=True)
Out[57]:
array([[-0.16886107],
[-2.94669614],
[ 2.101517 ],
[ 3.45934969],
[-0.87271024],
[ 1.32252115],
[ 2.14245486],
[-0.62936062],
[ 0.49587083],
[-2.64420128]])
Talking about summing along rows or columns is somewhat ambiguous for 2d arrays. It becomes clearer when we apply sum to 1d arrays (where there is only one axis to sum over) or to 3d arrays.
For example, note which dimension is missing when I do:
In [58]: np.arange(24).reshape(2,3,4).sum(axis=1).shape
Out[58]: (2, 4)
or
In [59]: np.arange(24).reshape(2,3,4).sum(axis=2)
Out[59]:
array([[ 6, 22, 38],
[54, 70, 86]])
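If it helps, the same experiment can be run over every axis at once. A small sketch, reusing the (2,3,4) array from above, that prints the shape with and without keepdims:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

for axis in range(a.ndim):
    reduced = a.sum(axis=axis)               # the summed axis disappears
    kept = a.sum(axis=axis, keepdims=True)   # the summed axis stays, with size 1
    print(axis, reduced.shape, kept.shape)
# 0 (3, 4) (1, 3, 4)
# 1 (2, 4) (2, 1, 4)
# 2 (2, 3) (2, 3, 1)
```

The axis you name is the one that gets collapsed; the others are untouched.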
Again - dimensions of numpy arrays are abstract things. An array can have 0, 1, 2 or more dimensions (up to 32 in older releases; NumPy 2.0 raised the limit to 64). Most of linear algebra deals with 2d arrays, matrices and "vectors". You can do LA with numpy, but numpy is used for much more.
edit
You could think of your aa as 10 2-element points. Then aa[:,0] are all the x coordinates. A mean with axis=0 would be the "center of mass" of those points.
In [60]: np.mean(aa, axis=0)
Out[60]: array([-0.04996007, 0.27594849])
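With random numbers the "center" is hard to check by eye, so here is the same idea on made-up points where the answer is obvious (the corners of a square; the values are purely illustrative):

```python
import numpy as np

# Corners of a hypothetical square centred on (1, 2), one point per row.
corners = np.array([[0.0, 1.0],
                    [2.0, 1.0],
                    [2.0, 3.0],
                    [0.0, 3.0]])

# axis=0 averages over the points, leaving one value per coordinate.
centre = corners.mean(axis=0)
print(centre)        # x = 1.0, y = 2.0
print(centre.shape)  # (2,)
```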
Mean on axis=1 may not make sense, though you could calculate the norm of each point, sqrt(x^2+y^2), which is the length of the vector that the point represents.
In [61]: np.linalg.norm(aa, axis=1)
Out[61]:
array([0.28535218, 2.08727523, 1.50821235, 2.53456249, 1.84973271,
1.48669159, 1.79320052, 1.19414978, 0.91292938, 1.99264533])
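As a sanity check, the norm on axis=1 can be compared with the formula written out by hand. A sketch with made-up points whose lengths are easy to verify:

```python
import numpy as np

# Hypothetical points: (3, 4) has length 5, (0, 2) has length 2, (-6, 8) has length 10.
pts = np.array([[3.0, 4.0],
                [0.0, 2.0],
                [-6.0, 8.0]])

by_hand = np.sqrt(pts[:, 0]**2 + pts[:, 1]**2)   # sqrt(x^2 + y^2) per point
by_norm = np.linalg.norm(pts, axis=1)            # same thing, reduced over axis 1

print(by_hand)                          # lengths 5, 2 and 10
print(np.allclose(by_hand, by_norm))    # True
```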
For the direction of these points I'd use:
np.arctan2(aa[:,1], aa[:,0])
(arctan2 takes the y values first and the x values second, so this is the angle measured from the x axis).
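The argument order is easy to test on points whose direction is known in advance. A minimal sketch, with made-up points and the result converted to degrees for readability:

```python
import numpy as np

# Hypothetical points on the axes and on one diagonal, one point per row.
pts = np.array([[1.0, 0.0],    # along +x  ->   0 degrees
                [0.0, 1.0],    # along +y  ->  90 degrees
                [-1.0, 0.0],   # along -x  -> 180 degrees
                [1.0, 1.0]])   # diagonal  ->  45 degrees

# np.arctan2(y, x): column 1 holds y, column 0 holds x.
angles = np.degrees(np.arctan2(pts[:, 1], pts[:, 0]))
print(angles)
```

Swapping the two arguments would measure the angle from the y axis instead.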
13 Comments
aa array as 10 points, with each point represented as 2 values? What the "dimensions" mean is in your head, not in numpy. What if you made a (2,10) array, and plotted bb[0,:] against bb[1,:]? Either can represent 10 2-element points. 3 dimensions could represent something else, for example a (400,600,3) array might be a 400x600 color image. numpy by itself doesn't determine that. I added to my answer a norm taken on axis 1. atan2 could be used to get the angle, direction, of those 10 'points'.
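A short sketch of that point, assuming bb is simply the transpose of aa (the comment does not say how bb would be built, so that part is an assumption), plus an all-zero stand-in for the image:

```python
import numpy as np

rng = np.random.default_rng(1)       # arbitrary seed
aa = rng.standard_normal((10, 2))    # one point per row
bb = aa.T                            # assumed layout: shape (2, 10), one point per column

# The same two coordinate lists come out of either layout.
print(np.array_equal(aa[:, 0], bb[0, :]))   # True
print(np.array_equal(aa[:, 1], bb[1, :]))   # True

# A (400, 600, 3) array read as an image: rows, columns, colour channels.
img = np.zeros((400, 600, 3), dtype=np.uint8)
print(img[:, :, 0].shape)            # (400, 600): one channel
print(img.mean(axis=(0, 1)).shape)   # (3,): one average per channel
```

Nothing in numpy says which reading is right; the (10,2) and (2,10) arrays hold the same numbers, and only the indexing you choose gives them a role.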
x[:,0] is a 1d array. For scatter it doesn't matter whether the 1d array is made directly with np.array([1,2,3]) or indirectly from the columns or rows of the 2d array. x[0,:] has 2 elements, the 1st 'row'. x[:,1] has 10. You can't 'scatter' 10 against 2. scatter wants 2 1d arrays that match in length.