Lack of knowledge: what do dimensions really represent?

Question 1

A 2d array consists of 2 axes: axis=0, which represents the rows, and axis=1, which represents the columns.

aa = np.random.randn(10, 2) # a 2d array: the first axis has 10 rows, the second axis has 2 columns
array([[ 0.6999521 , -0.17597954],
 [ 1.70622947, -0.85919459],
 [-0.90019284, 0.80774052],
 [-1.42953238, 0.19727917],
 [-0.03416532, 0.49584749],
 [-0.28981586, -0.77484498],
 [-1.31129122, 0.423833 ],
 [-0.43920016, -1.93541758],
 [-0.06667634, 2.09925218],
 [ 1.24633485, -0.04153847]])

Why, when I want to scatter the points, do I only use the first column and the second column, both taken along axis=1? Do "dimensions" mean columns when plotting and axes at other times? Can you please explain the reasons for doing it this way? I would also appreciate good references on dimensions as they relate to this.

plt.scatter(x[:,0], x[:,1]) # does this also mean dimensions, or columns?
Why x[:,0], x[:,1] and not x[0,:], x[:,1]?

Question 2

x[:,0] is a 1d array. For scatter it doesn't matter whether the 1d array is made directly with np.array([1,2,3]) or indirectly from the columns or rows of the 2d array.

Question 3

x[0,:] has 2 elements, the 1st 'row'. x[:,1] has 10. You can't 'scatter' 10 against 2. scatter wants 2 1d arrays that match in length.
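
A minimal sketch of the length mismatch described above, assuming a (10, 2) array like the one in the question. The seeded generator and the names col0, col1 and row0 are illustrative and do not come from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded only so the sketch is repeatable
x = rng.standard_normal((10, 2))    # same shape as the array in the question

col0 = x[:, 0]   # every row, column 0 -> 10 values
col1 = x[:, 1]   # every row, column 1 -> 10 values
row0 = x[0, :]   # row 0, every column -> 2 values

print(col0.shape, col1.shape, row0.shape)   # (10,) (10,) (2,)

# plt.scatter(col0, col1) pairs col0[i] with col1[i], so the lengths must match.
# plt.scatter(row0, col1) would try to pair 2 values with 10, which matplotlib rejects.
lengths_match = col0.shape == col1.shape       # True
row_vs_col_match = row0.shape == col1.shape    # False
```

The point is only about lengths: any two 1d arrays of equal length are acceptable to scatter, wherever they came from.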


Answer 1 · Tim Roberts · 2022-12-23 19:43:02Z


It can be difficult to visualize this, especially in multiple dimensions.

The parameters to the [] operator represent the dimensions. Your first dimension is the rows. The first row is array[0]. Your second dimension is the columns. The entire second column is called array[:,1] -- the ":" is a numpy notation that means "take all of this dimension". array[2,1] refers to the second column in the third row.

plt.scatter expects the x coordinate values as its first parameter, and the y coordinate values as its second parameter. plt.scatter(x[:,0], x[:,1]) means "take all of column 0" and "take all of column 1", which is the way scatter wants them.
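
The indexing rules above can be tried on a small array whose values make the positions easy to see. This is only an illustrative sketch: the array arr and its shape are made up for the example.

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)   # 4 rows, 3 columns, values 0..11
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]

first_row = arr[0]        # same as arr[0, :]
second_col = arr[:, 1]    # ":" keeps the whole first dimension
one_value = arr[2, 1]     # third row, second column

print(first_row)    # [0 1 2]
print(second_col)   # [ 1  4  7 10]
print(one_value)    # 7
```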


edited Dec 23, 2022 at 20:01

answered Dec 23, 2022 at 19:43


5 Comments

user4556432 · Comment · 2022-12-23T19:52:31.193Z

Yes, you didn't answer my question: why take the columns x[:,0], x[:,1] rather than a row and a column, which represent the 2 axes/dims of the 2d array?

Frank Yellin · Comment · 2022-12-23T20:00:03.153Z

Because those are the arguments that plt.scatter expects. It wants two arrays, where each element in the first array matches the corresponding element in the second array. It could have been written differently, but it wasn't.

user4556432 · Comment · 2022-12-23T20:18:42.577Z

Aha, where can I learn more about this? I can't find this in linear algebra. Are there any good references that explain dimensionality? I also have problems understanding which axis to choose when taking a mean or sum!

hpaulj · Comment · 2022-12-23T22:55:38.843Z

numpy arrays are more general than linear algebra matrices and vectors (or more abstract). Don't try to find 'meaning' elsewhere.

Tim Roberts · Comment · 2022-12-23T23:24:26.847Z

Axis 0 is dimension 0 -- the rows in your case. Axis 1 is dimension 1 -- the columns in your case. sum(aa,axis=0) would produce two elements, the sums of the columns. sum(aa,axis=1) would produce 10 elements, the sum of the coordinates for each point. Playing with it is probably the best way to learn.

Answer 2 · hpaulj · 2022-12-23 23:30:15Z

With this randn call you make a 2d array with the specified shape. The dimensions, 10 and 2, don't represent anything - that's an abstract (10,2) array. Meaning comes from how you use it.

In [50]: aa = np.random.randn(10, 2)
In [51]: aa
Out[51]: 
array([[-0.26769106, 0.09882999],
 [-1.5605514 , -1.38614473],
 [ 1.23312852, 0.86838848],
 [ 1.2603898 , 2.19895989],
 [-1.66937976, 0.79666952],
 [-0.15596669, 1.47848784],
 [ 1.74964902, 0.39280584],
 [-1.0982447 , 0.46888408],
 [ 0.84396231, -0.34809148],
 [-0.83489678, -1.8093045 ]])

That's a display - with rows and columns.

Rather than pass the slices directly to scatter, let's assign them to variables:

In [52]: x = aa[:,0]; y = aa[:,1]; x,y
Out[52]: 
(array([-0.26769106, -1.5605514 , 1.23312852, 1.2603898 , -1.66937976,
 -0.15596669, 1.74964902, -1.0982447 , 0.84396231, -0.83489678]),
 array([ 0.09882999, -1.38614473, 0.86838848, 2.19895989, 0.79666952,
 1.47848784, 0.39280584, 0.46888408, -0.34809148, -1.8093045 ]))

We now have two 1d arrays with shape (10,) (that's a 1 element tuple). We can then plot them with:

In [53]: plt.scatter(x,y)

I could just as well have used

x = np.arange(10); y = np.random.randn(10)

to make two 1d arrays.

The dimensions of the aa array have nothing to do with the axes of a scatter plot.

I could select a 'row' of aa, but will only get a (2,) shape array. That can't be plotted against a (10,) array:

In [53]: aa[0,:]
Out[53]: array([-0.26769106, 0.09882999])

As for the meaning of dimensions in sum/mean, why not experiment?

Sum all values:

In [54]: aa.sum()
Out[54]: 2.2598841819604134

Sum down the columns, resulting in one value per column:

In [55]: aa.sum(axis=0)
Out[55]: array([-0.49960074, 2.75948492])

It can help to use keepdims=True, producing a (1,2) array:

In [56]: aa.sum(axis=0, keepdims=True)
Out[56]: array([[-0.49960074, 2.75948492]])

or a (10,1) array:

In [57]: aa.sum(axis=1, keepdims=True)
Out[57]: 
array([[-0.16886107],
 [-2.94669614],
 [ 2.101517 ],
 [ 3.45934969],
 [-0.87271024],
 [ 1.32252115],
 [ 2.14245486],
 [-0.62936062],
 [ 0.49587083],
 [-2.64420128]])

There's some ambiguity when talking about summing along rows or columns when dealing with 2d arrays. It becomes clearer when we apply sum to 1d arrays (where there is only one axis to sum over), or to 3d arrays.

For example, note which dimension is missing when I do:

In [58]: np.arange(24).reshape(2,3,4).sum(axis=1).shape
Out[58]: (2, 4)

or

In [59]: np.arange(24).reshape(2,3,4).sum(axis=2)
Out[59]: 
array([[ 6, 22, 38],
 [54, 70, 86]])
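
A short sketch of the pattern these two examples point at: summing over an axis removes that axis from the shape, and keepdims=True leaves it in place with length 1. The helper shape_after_reduce is hypothetical, written only for this illustration, and handles non-negative axis values only.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

for axis in range(a.ndim):
    reduced = a.sum(axis=axis)
    kept = a.sum(axis=axis, keepdims=True)
    print(axis, reduced.shape, kept.shape)
# 0 (3, 4) (1, 3, 4)
# 1 (2, 4) (2, 1, 4)
# 2 (2, 3) (2, 3, 1)

# axis=-1 is shorthand for the last axis, here axis=2
same_as_last = np.array_equal(a.sum(axis=-1), a.sum(axis=2))


def shape_after_reduce(shape, axis):
    """Hypothetical helper: the shape left after reducing over one axis."""
    return tuple(n for i, n in enumerate(shape) if i != axis)


print(shape_after_reduce(a.shape, 1))   # (2, 4)
```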

Again - dimensions of numpy arrays are abstract things. An array can have 0, 1, 2 or more (up to 32) dimensions. Most of linear algebra deals with 2d arrays, matrices and "vectors". You can do LA with numpy, but numpy is used for much more.

edit

You could think of your aa as 10 2-element points. Then aa[:,0] are all the x coordinates. A mean with axis=0 would be the "center of mass" of those points.

In [60]: np.mean(aa, axis=0)
Out[60]: array([-0.04996007, 0.27594849])

Mean on axis=1 may not make sense, though you could calculate the norm of the points (sqrt(x^2+y^2)), or the length of the vectors represented by the points.

In [61]: np.linalg.norm(aa, axis=1)
Out[61]: 
array([0.28535218, 2.08727523, 1.50821235, 2.53456249, 1.84973271,
 1.48669159, 1.79320052, 1.19414978, 0.91292938, 1.99264533])

For the direction of these points I'd use:

np.arctan2(aa[:,1], aa[:,0])

(arctan2 takes the y values first and the x values second, so this gives the angle measured from the positive x axis).
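
To make the per-point and per-coordinate reductions concrete, here is a small sketch with three hand-picked points whose lengths are easy to check. The array pts is made up for the example. It assumes the convention used in this answer, one point per row with x in column 0 and y in column 1, and it calls np.arctan2 with y first and x second, which is the argument order that function expects.

```python
import numpy as np

# Three hand-picked points, one per row: (x, y)
pts = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [3.0, 4.0]])

lengths = np.linalg.norm(pts, axis=1)        # one value per point
by_hand = np.sqrt((pts ** 2).sum(axis=1))    # the same norm spelled out
angles = np.arctan2(pts[:, 1], pts[:, 0])    # arctan2(y, x), in radians
centroid = pts.mean(axis=0)                  # one value per coordinate

print(lengths)    # [1. 1. 5.]
print(angles)
print(centroid)
```

Reducing over axis=1 collapses the coordinates and leaves one number per point. Reducing over axis=0 collapses the points and leaves one number per coordinate.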

Thanks for the prompt response. I understand this, but I want to be able to find the relation between the dimensions and axes and the operations made along them. Which types of problems suit taking the sum across the 1st dim (axis=0), the 2nd (axis=1), or the 3rd (axis=-1)? For example, when calculating the distances between points, it's always calculated across the second axis. Why? When should I calculate across the rows (axis=0)? I'd really appreciate your answer.

What are you calling points? Are you thinking of the aa array as 10 points, with each point represented as 2 values? What the "dimensions" mean is in your head, not in numpy. What if you made a (2,10) array, and plotted bb[0,:] against bb[1,:]? Either can represent 10 2-element points. 3 dimensions could represent something else; for example, a (400,600,3) array might be a 400x600 color image.

If we take the scatter plot of bb[0,:] and bb[1,:], we would have plotted the points projected on the x axis.

I think of aa as 10 points with coordinates on 2 axes (dimensions: x, y) if we are in a 2d array. But what are the types of problems that require us to calculate the distances between points on the second axis and not on the first axis? When I have something like (400,600,3), or with batches (6,400,600,3), what are the reasons that make me average across the first, 2nd or 3rd axis? Is it a rule that you always take the mean across the first dim (axis=0), or a rule to take differences.sum(axis=1) between points across the first dim?

What operation makes sense for a particular array axis depends on what meaning YOU assigned to the axis. numpy by itself doesn't determine that. I added to my answer a norm taken on axis 1. atan2 could be used to get the angle, direction, of those 10 'points'.
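
The point that the meaning of an axis is assigned by the user can be sketched as follows. Everything here is an assumption made for illustration: the seeded data, the variable names, and the reading of the 4d array as (batch, height, width, channel). A smaller (6, 40, 60, 3) array stands in for the (6, 400, 600, 3) batch mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

aa = rng.standard_normal((10, 2))   # layout A: one point per row
bb = aa.T                           # layout B: one point per column, shape (2, 10)

# The same x and y coordinates come out of either layout.
same_x = np.array_equal(aa[:, 0], bb[0, :])
same_y = np.array_equal(aa[:, 1], bb[1, :])

# The "per point" reduction simply moves to the other axis.
len_a = np.linalg.norm(aa, axis=1)   # shape (10,)
len_b = np.linalg.norm(bb, axis=0)   # shape (10,)

# Assumed meaning of the axes: (batch, height, width, channel)
batch = rng.random((6, 40, 60, 3))
mean_image = batch.mean(axis=0)               # average over the batch    -> (40, 60, 3)
gray = batch.mean(axis=-1)                    # average over the channels -> (6, 40, 60)
channel_means = batch.mean(axis=(0, 1, 2))    # one mean per channel      -> (3,)

print(mean_image.shape, gray.shape, channel_means.shape)
```

There is no fixed rule such as "always use axis=0". The axis to reduce over is whichever one holds the thing to be averaged away: the points, the coordinates, the images in a batch, or the color channels.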
