Standardize the Samples (Compute the z-Score)

Question 1

Given a list of floating point numbers, standardize it.

Details

A list \$x_1,x_2,\ldots,x_n\$ is standardized if the mean of all values is 0, and the standard deviation is 1. One way to compute this is by first computing the mean \$\mu\$ and the standard deviation \$\sigma\$ as $$ \mu = \frac1n\sum_{i=1}^n x_i \qquad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^n (x_i -\mu)^2} ,$$ and then computing the standardization by replacing every \$x_i\$ with \$\frac{x_i-\mu}{\sigma}\$.
You can assume that the input contains at least two distinct entries (which implies \$\sigma \neq 0\$).
Note that some implementations use the sample standard deviation, which is not equal to the population standard deviation \$\sigma\$ we are using here.
There is a CW answer for all trivial solutions.
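
For reference when reading the golfed answers, here is a minimal ungolfed sketch of the definition above in plain Python. The function name `standardize` is my own choice for illustration and is not part of the challenge.

```python
import math

def standardize(xs):
    """Return the z-scores of xs using the population standard deviation."""
    n = len(xs)
    mu = sum(xs) / n
    # Population variance: divide by n, not n - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return [(x - mu) / sigma for x in xs]

print(standardize([1, 2]))  # [-1.0, 1.0]
```

The input is assumed to contain at least two distinct values, as the challenge guarantees, so `sigma` is never zero.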

Examples

[1,2,3] -> [-1.224744871391589,0.0,1.224744871391589]
[1,2] -> [-1,1]
[-3,1,4,1,5] -> [-1.6428571428571428,-0.21428571428571433,0.8571428571428572,-0.21428571428571433,1.2142857142857144]

(These examples have been generated with this script.)

Question 2

R, 37 bytes (previously 51, 45, 38)

Thanks to Giuseppe and J.Doe!

function(x)scale(x)/(1-1/sum(x|1))^.5

Try it online!
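
Why the correction factor works: R's scale divides by the sample standard deviation s (denominator n - 1), and s = σ·sqrt(n/(n-1)), so dividing the scaled values by sqrt(1 - 1/n) recovers the population z-score. The Python sketch below illustrates that identity; the helper `scale` here is a hypothetical stand-in for what R's scale does to a plain vector, not R's actual implementation.

```python
import math
import statistics

def scale(xs):
    """Center xs and divide by the sample standard deviation (n - 1 denominator)."""
    mu = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample standard deviation
    return [(x - mu) / s for x in xs]

def zscore_from_scale(xs):
    """Undo the n - 1 denominator: divide by sqrt(1 - 1/n) = sqrt((n - 1) / n)."""
    n = len(xs)
    correction = math.sqrt(1 - 1 / n)
    return [v / correction for v in scale(xs)]

print(zscore_from_scale([1, 2, 3]))
```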

Question 3

Beat me by 2 bytes and 1 minute

Question 4

CW for all trivial entries

Python 3 + scipy, 31 bytes

from scipy.stats import*
zscore

Try it online!

Octave / MATLAB, 15 bytes

@(x)zscore(x,1)

Try it online!

Question 5

APL (Dyalog Classic), 19 bytes (previously 21, 20)

(×ばつ≢)×ばつ≢

Try it online!

⊢÷⌹ is sum of squares

⊢÷⌹×≢ is sum of squares divided by length

Question 6

Wow. I shouldn't be surprised anymore, but I am every time

Question 7

MATL, 10 bytes

tYm-t&1Zs/

Try it online!

Explanation

t % Implicit input
 % Duplicate
Ym % Mean
- % Subtract, element-wise
t % Duplicate
&1Zs % Standard deviation using normalization by n
/ % Divide, element-wise
 % Implicit display

Question 8

APL+WIN, 30 bytes (previously 41, 32)

9 bytes saved thanks to Erik + 2 more thanks to ngn

x←v-(+/v)÷⍴v←⎕⋄x÷(+/x×x÷⍴v)*.5

Prompts for a vector of numbers, then calculates the mean, the standard deviation, and the standardised elements of the input vector.

Question 9

Can't you assign x←v-(+/v)÷⍴v←⎕ and then do x÷((+/x*2)÷⍴v)*.5?

Question 10

I can indeed. Thanks.

Question 11

does apl+win do singleton extension (1 2 3+,4 ←→ 1 2 3+4)? if yes, you could rewrite (+/x*2)÷⍴v as +/x×x÷⍴v

Question 12

@ngn That works for another 2 bytes. Thanks.

Question 13

R + pryr, 52 bytes (previously 53)

-1 byte using sum(x|1) instead of length(x) as seen in @Robert S.'s solution

pryr::f((x-(y<-mean(x)))/(sum((x-y)^2)/sum(x|1))^.5)

For a language built for statisticians, I'm amazed that this doesn't have a built-in function. At least not one that I could find. Even the function mosaic::zscore doesn't yield the expected results. This is likely because it uses the sample standard deviation, whereas the challenge asks for the population standard deviation.

Try it online!

Question 14

You can change the <- into a = to save 1 byte.

Question 15

@J.Doe nope, I used the method I commented on Robert S.'s solution. scale is neat!

Question 16

@J.Doe since you only use n once you can use it directly for 38 bytes

Question 17

@RobertS. here on PPCG we tend to encourage allowing flexible input and output, including outputting more than is required, with the exception of challenges where the precise layout of the output is the whole point of the challenge.

Question 18

Of course R built-ins wouldn't use "population variance". Only confused engineers would use such a thing (hence the Python and Matlab answers ;))

Question 19

Tcl, 115 bytes

proc S L {lmap c $L {expr ($c-[set m ([join $L +])/[set n [llength $L]].])/sqrt((([join $L -$m)**2+(]-$m)**2)/$n)}}

Try it online!

Question 20

Jelly, 10 bytes

_ÆmμL½÷ÆḊ×

Try it online!

It's not any shorter, but Jelly's determinant function ÆḊ also calculates vector norm.

_Æm         x - mean(x)
   μ        then:
    L½      Square root of the Length
      ÷ÆḊ   divided by the norm
         ×  Multiply by that value
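
The norm-based approach rests on the identity σ = ‖x − μ‖ / sqrt(n), so the z-score is the centered vector multiplied by sqrt(n) / ‖x − μ‖. A rough Python sketch of the same steps, with a function name made up for illustration:

```python
import math

def standardize_via_norm(xs):
    n = len(xs)
    mu = sum(xs) / n
    centered = [x - mu for x in xs]                   # x - mean(x)
    norm = math.sqrt(sum(v * v for v in centered))    # Euclidean norm of the centered vector
    factor = math.sqrt(n) / norm                      # sqrt(length) divided by the norm
    return [v * factor for v in centered]             # multiply by that value

print(standardize_via_norm([1, 2, 3]))
```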

Question 21

Hey, nice alternative! Unfortunately, I can't see a way to shorten it.

Question 22

Mathematica, 25 bytes

Mean[(a=#-Mean@#)a]^-.5a&

Pure function. Takes a list of numbers as input and returns a list of machine-precision numbers as output. Note that the built-in Standardize function uses the sample variance by default.

Question 23

J, 22 bytes

-1 byte thanks to Cows quack!

(-%[:%:1#.-*-%#@[)+/%#

Try it online!

J, 23 bytes (previously 31)

(-%[:%:#@[%~1#.-*-)+/%#

Try it online!

 +/%# - mean (sum (+/) divided (%) by the number of samples (#)) 
( ) - the list is a left argument here (we have a hook)
 - - the difference between each sample and the mean
 * - multiplied by 
 - - the difference between each sample and the mean
 1#. - sum by base-1 conversion
 %~ - divided by
 #@[ - the length of the samples list
 %: - square root
 [: - convert to a fork (function composition) 
 - - subtract the mean from each sample
 % - and divide it by sigma

Question 24

Rearranging it gives 22 [:(%[:%:1#.*:%#)]-+/%# tio.run/##y/qfVmyrp2CgYKVg8D/…, I think one of those caps could be removed, but haven't had any luck so far, EDIT: a more direct byteshaving is (-%[:%:1#.-*-%#@[)+/%# also at 22

Question 25

@Cows quack Thanks!

Question 26

APL (Dyalog Unicode), 29 bytes (previously 33)

{d÷.5*⍨l÷⍨+/×⍨d←⍵-(+/⍵)÷l←≢⍵}

-4 bytes thanks to @ngn

Try it online!

Question 27

you could assign ⍵-m to a variable and remove m← like this: {d÷.5*⍨l÷⍨+/×⍨d←⍵-(+/⍵)÷l←≢⍵}

Question 28

@ngn Ah, nice, thanks, I didn't see that duplication somehow

Question 29

Haskell, 68 bytes (previously 80, 75)

t x=k(/sqrt(f$sum$k(^2)))where k g=g.(-f(sum x)+)<$>x;f=(/sum(1<$x))

Thanks to @flawr for the suggestions to use sum(1<$x) instead of sum[1|_<-x] and to inline the mean, @xnor for inlining the standard deviation and other reductions.

Expanded:

-- Standardize a list of values of any floating-point type.
standardize :: Floating a => [a] -> [a]
standardize input = eachLessMean (/ sqrt (overLength (sum (eachLessMean (^2)))))
 where
 -- Map a function over each element of the input, less the mean.
 eachLessMean f = map (f . subtract (overLength (sum input))) input
 -- Divide a value by the length of the input.
 overLength n = n / sum (map (const 1) input)

Question 30

You can replace [1|_<-x] with (1<$x) to save a few bytes. That is a great trick for avoiding the fromIntegral, that I haven't seen so far!

Question 31

By the way: I like using tryitonline, you can run your code there and then copy the preformatted answer for posting here!

Question 32

And you do not have to define m.

Question 33

You can write (-x+) for (+(-x)) to avoid parens. Also it looks like f can be pointfree: f=(/sum(1<$x)), and s can be replaced with its definition.

Question 34

@xnor Ooh, (-x+) is handy, I’m sure I’ll be using that in the future

Question 35

MathGolf, 7 bytes

▓-_²▓√/

Try it online!

Explanation

This is literally a byte-for-byte recreation of Kevin Cruijssen's 05AB1E answer, but I save some bytes from MathGolf having 1-byters for everything needed for this challenge. Also the answer looks quite good in my opinion!

▓ get average of list
 - pop a, b : push(a-b)
 _ duplicate TOS
 ² pop a : push(a*a)
 ▓ get average of list
 √ pop a : push(sqrt(a)), split string to list
 / pop a, b : push(a/b), split strings

Question 36

Jelly, 10 bytes

_Æm÷²Æm½Ɗ$

Try it online!

Question 37

JavaScript (ES7), 79 bytes (previously 80)

a=>a.map(x=>(x-g(a))/g(a.map(x=>(x-m)**2))**.5,g=a=>m=eval(a.join`+`)/a.length)

Try it online!

Commented

a =>                        // given the input array a[]
  a.map(x =>                // for each value x in a[]:
    (x - g(a)) /            //   compute (x - mean(a)) divided by
    g(                      //   the standard deviation:
      a.map(x =>            //     for each value x in a[]:
        (x - m) ** 2        //       compute (x - mean(a))²
      )                     //     compute the mean of this array
    ) ** .5,                //   and take the square root
    g = a =>                //   g = helper function taking an array a[],
      m = eval(a.join`+`)   //     computing the mean
      / a.length            //     and storing the result in m
  )                         // end of outer map()

Question 38

Python 3 + numpy, 46 bytes

lambda a:(a-mean(a))/std(a)
from numpy import*

Try it online!

Question 39

Haskell, 59 bytes

(%)i=sum.map(^i)
f l=[(0%l*y-1%l)/sqrt(2%l*0%l-1%l^2)|y<-l]

Try it online!

Doesn't use libraries.

The helper function % computes the sum of ith powers of a list, which lets us get three useful values.

0%l is the length of l (call this n)
1%l is the sum of l (call this s)
2%l is the sum of squares of l (call this v)

We can express the z-score of an element y as

(n*y-s)/sqrt(n*v-s^2)

(This is the expression (y-s/n)/sqrt(v/n-(s/n)^2) simplified a bit by multiplying the top and bottom by n.)

We can insert the expressions 0%l, 1%l, 2%l without parens because the % we define has higher precedence than the arithmetic operators.

(%)i=sum.map(^i) is the same length as i%l=sum.map(^i)l. Making it more point-free doesn't help. Defining it like g i=... loses bytes when we call it. Although % works for any list, we only ever call it with the problem input list; still, there's no byte loss in passing l every time, because a two-argument call i%l is no longer than a one-argument call g i.
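
The power-sum formula can be sketched in Python as follows. The names `power_sum` and `standardize_power_sums` are mine, chosen for illustration; the variables n, s and v follow the answer's notation.

```python
import math

def power_sum(i, xs):
    """Sum of the i-th powers of xs (i = 0 gives the length, since x ** 0 == 1)."""
    return sum(x ** i for x in xs)

def standardize_power_sums(xs):
    n = power_sum(0, xs)  # length
    s = power_sum(1, xs)  # sum
    v = power_sum(2, xs)  # sum of squares
    denom = math.sqrt(n * v - s * s)
    return [(n * y - s) / denom for y in xs]

print(standardize_power_sums([1, 2]))  # [-1.0, 1.0]
```

One caveat outside the golfing context: n*v - s^2 subtracts two potentially large, nearly equal quantities, so this form can lose precision when the values are large relative to their spread.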

Question 40

We do have \$\LaTeX\$ here:)

Question 41

I really like the % idea! It looks just like the discrete version of the statistical moments.

Question 42

K (oK), 23 bytes (previously 33)

-10 bytes thanks to ngn!

{t%%(+/t*t:x-/x%#x)%#x}

Try it online!

First attempt at coding (I don't dare to name it "golfing") in K. I'm pretty sure it can be done much better (too many variable names here...)

Question 43

nice! you can replace the initial (x-m) with t (tio)

Question 44

the inner { } is unnecessary - its implicit parameter name is x and it has been passed an x as argument (tio)

Question 45

another -1 byte by replacing x-+/x with x-/x. the left argument to -/ serves as initial value for the reduction (tio)
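
For readers who don't know K, the seeded reduction can be mimicked in Python with functools.reduce and an initial value. This is only an analogy under my own naming (`subtract_mean`), not a translation of how oK evaluates it: starting from the whole list x, each term x[i]/n is subtracted from every element in turn, which leaves x minus its mean.

```python
from functools import reduce

def subtract_mean(xs):
    n = len(xs)
    terms = [x / n for x in xs]  # corresponds to x%#x in the K code
    # Seed the reduction with xs itself, then subtract each term from every element.
    return reduce(lambda acc, t: [a - t for a in acc], terms, xs)

print(subtract_mean([1, 2, 3]))
```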

Question 46

@ngn Thank you! Now I see that the first 2 golfs are obvious; the last one is beyond my current level :)

Question 47

MATLAB, 26 bytes

Trivial-ish, std(,1) for using population standard deviation

f=@(x)(x-mean(x))/std(x,1)

Question 48

TI-Basic (83 series), 11 bytes (previously 14)

Ans-mean(Ans
Ans/√(mean(Ans²

Takes input in Ans. For example, if you type the above into prgmSTANDARD, then {1,2,3}:prgmSTANDARD will return {-1.224744871,0.0,1.224744871}.

Previously, I tried using the 1-Var Stats command, which stores the population standard deviation in σx, but it's less trouble to compute it manually.

Question 49

05AB1E, 9 bytes

ÅA-DnÅAt/

Port of @Arnauld's JavaScript answer, so make sure to upvote him!

Try it online or verify all test cases.

Explanation:

ÅA # Calculate the mean of the (implicit) input
 # i.e. [-3,1,4,1,5] → 1.6
 - # Subtract it from each value in the (implicit) input
 # i.e. [-3,1,4,1,5] and 1.6 → [-4.6,-0.6,2.4,-0.6,3.4]
 D # Duplicate that list
 n # Take the square of each
 # i.e. [-4.6,-0.6,2.4,-0.6,3.4] → [21.16,0.36,5.76,0.36,11.56]
 ÅA # Pop and calculate the mean of that list
 # i.e. [21.16,0.36,5.76,0.36,11.56] → 7.84
 t # Take the square-root of that
 # i.e. 7.84 → 2.8
 / # And divide each value in the duplicated list with it (and output implicitly)
 # i.e. [-4.6,-0.6,2.4,-0.6,3.4] and 2.8 → [-1.6428571428571428,
 # -0.21428571428571433,0.8571428571428572,-0.21428571428571433,1.2142857142857144]

Question 50

Pyth, 19 bytes (previously 21)

mc-dJ.OQ@.Om^-Jk2Q2

Try it online here.

mc-dJ.OQ@.Om^-Jk2Q2Q   Implicit: Q=eval(input())
                       Trailing Q inferred
    J.OQ               Take the average of Q, store the result in J
           m     Q     Map the elements of Q, as k, using:
             -Jk         Difference between J and k
            ^   2        Square it
         .O            Find the average of the result of the map
        @         2    Square root it
                       - this is the standard deviation of Q
m                  Q   Map elements of Q, as d, using:
  -dJ                    d - J
 c                       Float division by the standard deviation
                       Implicit print result of map

Edit: after seeing Kevin's answer, changed to use the average builtin for the inner results. Previous answer: mc-dJ.OQ@csm^-Jk2QlQ2

Question 51

SNOBOL4 (CSNOBOL4), 229 bytes

	DEFINE('Z(A)')
Z	X =X + 1
	M =M + A<X>	:S(Z)
	N =X - 1.
	M =M / N
D	X =GT(X) X - 1	:F(S)
	A<X> =A<X> - M	:(D)
S	X =LT(X,N) X + 1	:F(Y)
	S =S + A<X> ^ 2 / N	:(S)
Y	S =S ^ 0.5
N	A<X> =A<X> / S
	X =GT(X) X - 1	:S(N)
	Z =A	:(RETURN)

Try it online!

Link is to a functional version of the code which constructs an array from STDIN given its length and then its elements, then runs the function Z on that, and finally prints out the values.

Defines a function Z which returns an array.

The 1. on line 4 is necessary to do the floating point arithmetic properly.

Question 52

Julia 0.7, 37 bytes

a->(a-mean(a))/std(a,corrected=false)

Try it online!

Question 53

Charcoal, 19 bytes (previously 25)

≧−∕ΣθLθθI∕θ₂∕ΣXθ²Lθ

Try it online! Link is to verbose version of code. Explanation:

       θ Input array
≧        Update each element
 −       Subtract
   Σ     Sum of
    θ    Input array
  ∕      Divided by
     L   Length of
      θ  Input array

Calculate \$\mu\$ and vectorised subtract it from each \$x_i\$.

  θ         Updated array
 ∕          Vectorised divided by
   ₂        Square root of
     Σ      Sum of
       θ    Updated array
      X     Vectorised to power
        ²   Literal 2
    ∕       Divided by
         L  Length of
          θ Array
I           Cast to string
            Implicitly print each element on its own line.

Calculate \$\sigma\$, vectorised divide each \$x_i\$ by it, and output the result.

Edit: Saved 6 bytes thanks to @ASCII-only for a) using SquareRoot() instead of Power(0.5) b) fixing vectorised Divide() (it was doing IntDivide() instead) c) making Power() vectorise.

Question 54

crossed out 25 = no bytes? :P (Also, you haven't updated the TIO link yet)

Question 55

@ASCII-only Oops, thanks!

Question 56

Factor, 34 bytes

[ dup demean swap 0 std-ddof v/n ]

Try it online!

Sadly, while Factor has the z-score word, it uses the sample standard deviation instead of the population standard deviation.

Question 57

CASIO BASIC (CASIO fx-9750GIII), 20 bytes

?→List1
1-Variable List1
(List1-x̄)÷σx

builtins

Question 58

APL(NARS), 26 chars

{m←÷≢⍵⋄d×ばつ⍨d×ばつ+/⍵}

test:

 f←{m←÷≢⍵⋄d×ばつ⍨d×ばつ+/⍵}
 f 1 2
¯1 1
 f 1 2 3
¯1.224744871 0 1.224744871
 f ¯3 1 4 1 5
¯1.642857143 ¯0.2142857143 0.8571428571 ¯0.2142857143 1.214285714

Accepted answer: the 37-byte R answer above, by Robert S. (answered Dec 14, 2018 at 19:34, edited Dec 14, 2018 at 22:10). The comment "Beat me by 2 bytes and 1 minute" was posted by Sumner18 on Dec 14, 2018 at 19:38.
