Given a list of floating point numbers, standardize it.
Details
- A list \$x_1,x_2,\ldots,x_n\$ is standardized if the mean of all values is 0, and the standard deviation is 1. One way to compute this is by first computing the mean \$\mu\$ and the standard deviation \$\sigma\$ as $$ \mu = \frac1n\sum_{i=1}^n x_i \qquad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^n (x_i -\mu)^2} ,$$ and then computing the standardization by replacing every \$x_i\$ with \$\frac{x_i-\mu}{\sigma}\$.
- You can assume that the input contains at least two distinct entries (which implies \$\sigma \neq 0\$).
- Note that some implementations use the sample standard deviation, which is not equal to the population standard deviation \$\sigma\$ we are using here.
- There is a CW answer for all trivial solutions.
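For reference, the definition above can be sketched directly in ungolfed Python. The function name `standardize` is only illustrative and does not come from the challenge; the sketch assumes the input has at least two distinct entries, as the challenge guarantees.

```python
import math

def standardize(xs):
    """Shift xs to mean 0 and scale it to population standard deviation 1."""
    n = len(xs)
    mu = sum(xs) / n
    # Population standard deviation: divide by n, not n - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return [(x - mu) / sigma for x in xs]
```

For example, `standardize([1, 2])` returns `[-1.0, 1.0]`, matching the second example below.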
Examples
[1,2,3] -> [-1.224744871391589,0.0,1.224744871391589]
[1,2] -> [-1,1]
[-3,1,4,1,5] -> [-1.6428571428571428,-0.21428571428571433,0.8571428571428572,-0.21428571428571433,1.2142857142857144]
(These examples have been generated with this script.)
28 Answers
R, 37 bytes (previously 51, 45, 38)
Thanks to Giuseppe and J.Doe!
function(x)scale(x)/(1-1/sum(x|1))^.5
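The division by (1-1/n)^.5 appears to work because R's scale divides by the sample standard deviation s (normalized by n-1), and the population value satisfies σ = s·sqrt(1-1/n). A rough Python sketch of the same correction, using only the standard library (the function name is hypothetical):

```python
import math
import statistics

def standardize_via_sample_sd(xs):
    """Standardize with the sample SD, then correct to the population SD."""
    n = len(xs)
    mu = statistics.fmean(xs)
    s = statistics.stdev(xs)            # sample SD, divides by n - 1
    correction = math.sqrt(1 - 1 / n)   # sigma = s * sqrt(1 - 1/n)
    return [(x - mu) / s / correction for x in xs]
```

The result should agree, up to rounding, with dividing by `statistics.pstdev(xs)` directly.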
Beat me by 2 bytes and 1 minute – Sumner18, Dec 14, 2018 at 19:38
CW for all trivial entries
Python 3 + scipy, 31 bytes
from scipy.stats import*
zscore
Octave / MATLAB, 15 bytes
@(x)zscore(x,1)
APL (Dyalog Classic), 19 bytes (previously 21, 20)
(×ばつ≢)×ばつ≢
⊢÷⌹ is sum of squares
×ばつ≢ is sum of squares divided by length
Wow. I shouldn't be surprised anymore, but I am every time – Quintec, Dec 16, 2018 at 16:19
MATL, 10 bytes
tYm-t&1Zs/
Explanation
t % Implicit input
% Duplicate
Ym % Mean
- % Subtract, element-wise
t % Duplicate
&1Zs % Standard deviation using normalization by n
/ % Divide, element-wise
% Implicit display
APL+WIN, 30 bytes (previously 41, 32)
9 bytes saved thanks to Erik + 2 more thanks to ngn
x←v-(+/v)÷⍴v←⎕⋄x÷(+/x×x÷⍴v)*.5
Prompts for a vector of numbers and calculates the mean, the standard deviation, and the standardised elements of the input vector.
Can't you assign x←v-(+/v)÷⍴v←⎕ and then do x÷((+/x*2)÷⍴v)*.5? – Erik the Outgolfer, Dec 14, 2018 at 19:11
I can indeed. Thanks. – Graham, Dec 14, 2018 at 19:24
does apl+win do singleton extension (1 2 3+,4 ←→ 1 2 3+4)? if yes, you could rewrite (+/x*2)÷⍴v as +/x×x÷⍴v – ngn, Dec 16, 2018 at 10:24
@ngn That works for another 2 bytes. Thanks. – Graham, Dec 16, 2018 at 12:47
R + pryr, 52 bytes (previously 53)
-1 byte using sum(x|1) instead of length(x) as seen in @Robert S.'s solution
pryr::f((x-(y<-mean(x)))/(sum((x-y)^2)/sum(x|1))^.5)
For a language built for statisticians, I'm amazed that R doesn't have a built-in function for this. At least not one that I could find. Even the function mosaic::zscore doesn't yield the expected results, most likely because it uses the sample standard deviation rather than the population standard deviation the challenge asks for.
You can change the <- into a = to save 1 byte. – Robert S., Dec 14, 2018 at 19:52
@J.Doe nope, I used the method I commented on Robert S.'s solution. scale is neat! – Giuseppe, Dec 14, 2018 at 19:57
@RobertS. here on PPCG we tend to encourage allowing flexible input and output, including outputting more than is required, with the exception of challenges where the precise layout of the output is the whole point of the challenge. – ngm, Dec 14, 2018 at 21:09
Of course R built-ins wouldn't use "population variance". Only confused engineers would use such a thing (hence the Python and Matlab answers ;)) – ngm, Dec 14, 2018 at 21:12
Jelly, 10 bytes
_ÆmμL½÷ÆḊ×
It's not any shorter, but Jelly's determinant function ÆḊ also calculates vector norm.
_Æm x - mean(x)
μ then:
L½ Square root of the Length
÷ÆḊ divided by the norm
× Multiply by that value
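The norm trick relies on σ = ‖x − μ‖ / √n, so multiplying the centered vector by √n / ‖x − μ‖ is the same as dividing by σ. A small Python sketch of that identity (the function name is made up for illustration, and this is not a translation of the Jelly built-ins):

```python
import math

def standardize_via_norm(xs):
    """Standardize by scaling the centered vector with sqrt(n) / norm."""
    n = len(xs)
    mu = sum(xs) / n
    centered = [x - mu for x in xs]
    norm = math.sqrt(sum(c * c for c in centered))  # Euclidean norm of x - mean
    factor = math.sqrt(n) / norm                    # equals 1 / sigma
    return [c * factor for c in centered]
```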
Hey, nice alternative! Unfortunately, I can't see a way to shorten it. – Erik the Outgolfer, Dec 14, 2018 at 22:34
Mathematica, 25 bytes
Mean[(a=#-Mean@#)a]^-.5a&
Pure function. Takes a list of numbers as input and returns a list of machine-precision numbers as output. Note that the built-in Standardize function uses the sample variance by default.
J, 22 bytes
-1 byte thanks to Cows quack!
(-%[:%:1#.-*-%#@[)+/%#
J, 23 bytes (previously 31)
(-%[:%:#@[%~1#.-*-)+/%#
+/%# - mean (sum (+/) divided (%) by the number of samples (#))
( ) - the list is a left argument here (we have a hook)
- - the difference between each sample and the mean
* - multiplied by
- - the difference between each sample and the mean
1#. - sum by base-1 conversion
%~ - divided by
#@[ - the length of the samples list
%: - square root
[: - convert to a fork (function composition)
- - subtract the mean from each sample
% - and divide it by sigma
Rearranging it gives 22: [:(%[:%:1#.*:%#)]-+/%# tio.run/##y/qfVmyrp2CgYKVg8D/…, I think one of those caps could be removed, but haven't had any luck so far. EDIT: a more direct byte-shaving is (-%[:%:1#.-*-%#@[)+/%# also at 22 – user41805, Dec 15, 2018 at 13:12
@Cows quack Thanks! – Galen Ivanov, Dec 15, 2018 at 14:30
you could assign ⍵-m to a variable and remove m← like this: {d÷.5*⍨l÷⍨+/×⍨d←⍵-(+/⍵)÷l←≢⍵} – ngn, Dec 16, 2018 at 10:39
@ngn Ah, nice, thanks, I didn't see that duplication somehow – Quintec, Dec 16, 2018 at 16:15
Haskell, 68 bytes (previously 80, 75)
t x=k(/sqrt(f$sum$k(^2)))where k g=g.(-f(sum x)+)<$>x;f=(/sum(1<$x))
Thanks to @flawr for the suggestions to use sum(1<$x) instead of sum[1|_<-x] and to inline the mean, @xnor for inlining the standard deviation and other reductions.
Expanded:
-- Standardize a list of values of any floating-point type.
standardize :: Floating a => [a] -> [a]
standardize input = eachLessMean (/ sqrt (overLength (sum (eachLessMean (^2)))))
where
-- Map a function over each element of the input, less the mean.
eachLessMean f = map (f . subtract (overLength (sum input))) input
-- Divide a value by the length of the input.
overLength n = n / sum (map (const 1) input)
You can replace [1|_<-x] with (1<$x) to save a few bytes. That is a great trick for avoiding the fromIntegral, that I haven't seen so far! – flawr, Dec 16, 2018 at 10:54
By the way: I like using tryitonline, you can run your code there and then copy the preformatted answer for posting here! – flawr, Dec 16, 2018 at 10:57
And you do not have to define m. – flawr, Dec 16, 2018 at 11:02
You can write (-x+) for (+(-x)) to avoid parens. Also it looks like f can be pointfree: f=(/sum(1<$x)), and s can be replaced with its definition. – xnor, Dec 16, 2018 at 20:00
@xnor Ooh, (-x+) is handy, I'm sure I'll be using that in the future – Jon Purdy, Dec 16, 2018 at 21:15
MathGolf, 7 bytes
▓-_²▓√/
Explanation
This is literally a byte-for-byte recreation of Kevin Cruijssen's 05AB1E answer, but I save some bytes from MathGolf having 1-byters for everything needed for this challenge. Also the answer looks quite good in my opinion!
▓ get average of list
- pop a, b : push(a-b)
_ duplicate TOS
² pop a : push(a*a)
▓ get average of list
√ pop a : push(sqrt(a)), split string to list
/ pop a, b : push(a/b), split strings
JavaScript (ES7), 79 bytes (previously 80)
a=>a.map(x=>(x-g(a))/g(a.map(x=>(x-m)**2))**.5,g=a=>m=eval(a.join`+`)/a.length)
Commented
a => // given the input array a[]
a.map(x => // for each value x in a[]:
(x - g(a)) / // compute (x - mean(a)) divided by
g( // the standard deviation:
a.map(x => // for each value x in a[]:
(x - m) ** 2 // compute (x - mean(a))²
) // compute the mean of this array
) ** .5, // and take the square root
g = a => // g = helper function taking an array a[],
m = eval(a.join`+`) // computing the mean
/ a.length // and storing the result in m
) // end of outer map()
Haskell, 59 bytes
(%)i=sum.map(^i)
f l=[(0%l*y-1%l)/sqrt(2%l*0%l-1%l^2)|y<-l]
Doesn't use libraries.
The helper function % computes the sum of ith powers of a list, which lets us get three useful values.
- 0%l is the length of l (call this n)
- 1%l is the sum of l (call this s)
- 2%l is the sum of squares of l (call this v)
We can express the z-score of an element y as
(n*y-s)/sqrt(n*v-s^2)
(This is the expression (y-s/n)/sqrt(v/n-(s/n)^2) simplified a bit by multiplying the top and bottom by n.)
We can insert the expressions 0%l, 1%l, 2%l without parens because the % we define has higher precedence than the arithmetic operators.
(%)i=sum.map(^i) is the same length as i%l=sum.map(^i)l. Making it more point-free doesn't help. Defining it like g i=... loses bytes when we call it. Although % works for any list and we only ever call it with the problem input list, there's no byte loss in passing the argument l every time, because a two-argument call i%l is no longer than a one-argument call g i.
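The same power-sum formulation can be sketched in Python for readers who don't read Haskell. The names `power_sum` and `standardize_power_sums` are illustrative only. One caveat worth hedging: the difference n*v - s^2 may lose precision to cancellation when the values are large relative to their spread, so this is a sketch of the algebra rather than a numerically robust routine.

```python
import math

def power_sum(i, xs):
    """Sum of the i-th powers of xs; i = 0 gives the length."""
    return sum(x ** i for x in xs)

def standardize_power_sums(xs):
    n = power_sum(0, xs)  # length
    s = power_sum(1, xs)  # sum
    v = power_sum(2, xs)  # sum of squares
    denom = math.sqrt(n * v - s * s)
    # (n*y - s) / sqrt(n*v - s^2) equals (y - mean) / sigma
    return [(n * y - s) / denom for y in xs]
```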
We do have \$\LaTeX\$ here:) – flawr, Dec 16, 2018 at 9:59
I really like the % idea! It looks just like the discrete version of the statistical moments. – flawr, Dec 16, 2018 at 10:02
K (oK), 23 bytes (previously 33)
-10 bytes thanks to ngn!
{t%%(+/t*t:x-/x%#x)%#x}
First attempt at coding (I don't dare to name it "golfing") in K. I'm pretty sure it can be done much better (too many variable names here...)
@ngn Thank you! Now I see that the first 2 golfs are obvious; the last one is beyond my current level :) – Galen Ivanov, Dec 16, 2018 at 10:14
MATLAB, 26 bytes
Trivial-ish, std(,1) for using population standard deviation
f=@(x)(x-mean(x))/std(x,1)
TI-Basic (83 series), 11 bytes (previously 14)
Ans-mean(Ans
Ans/√(mean(Ans²
Takes input in Ans. For example, if you type the above into prgmSTANDARD, then {1,2,3}:prgmSTANDARD will return {-1.224744871,0.0,1.224744871}.
Previously, I tried using the 1-Var Stats command, which stores the population standard deviation in σx, but it's less trouble to compute it manually.
05AB1E, 9 bytes
ÅA-DnÅAt/
Port of @Arnauld's JavaScript answer, so make sure to upvote him!
Try it online or verify all test cases.
Explanation:
ÅA # Calculate the mean of the (implicit) input
# i.e. [-3,1,4,1,5] → 1.6
- # Subtract it from each value in the (implicit) input
# i.e. [-3,1,4,1,5] and 1.6 → [-4.6,-0.6,2.4,-0.6,3.4]
D # Duplicate that list
n # Take the square of each
# i.e. [-4.6,-0.6,2.4,-0.6,3.4] → [21.16,0.36,5.76,0.36,11.56]
ÅA # Pop and calculate the mean of that list
# i.e. [21.16,0.36,5.76,0.36,11.56] → 7.84
t # Take the square-root of that
# i.e. 7.84 → 2.8
/ # And divide each value in the duplicated list with it (and output implicitly)
# i.e. [-4.6,-0.6,2.4,-0.6,3.4] and 2.8 → [-1.6428571428571428,
# -0.21428571428571433,0.8571428571428572,-0.21428571428571433,1.2142857142857144]
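The intermediate values in the walkthrough above can be reproduced with a short Python sketch that returns each step. The function name and the returned tuple layout are assumptions made for illustration; floating-point results may differ from the rounded figures (1.6, 7.84, 2.8) in the last few digits.

```python
import math

def standardize_steps(xs):
    """Return (mean, variance, sigma, standardized list) for inspection."""
    mean = sum(xs) / len(xs)
    centered = [x - mean for x in xs]        # subtract the mean from each value
    squares = [c * c for c in centered]      # square each deviation
    variance = sum(squares) / len(squares)   # mean of the squares
    sigma = math.sqrt(variance)              # population standard deviation
    return mean, variance, sigma, [c / sigma for c in centered]
```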
Pyth, 19 bytes (previously 21)
mc-dJ.OQ@.Om^-Jk2Q2
Try it online here.
mc-dJ.OQ@.Om^-Jk2Q2Q Implicit: Q=eval(input())
Trailing Q inferred
J.OQ Take the average of Q, store the result in J
m Q Map the elements of Q, as k, using:
-Jk Difference between J and k
^ 2 Square it
.O Find the average of the result of the map
@ 2 Square root it
- this is the standard deviation of Q
m Q Map elements of Q, as d, using:
-dJ d - J
c Float division by the standard deviation
Implicit print result of map
Edit: after seeing Kevin's answer, changed to use the average builtin for the inner results. Previous answer: mc-dJ.OQ@csm^-Jk2QlQ2
SNOBOL4 (CSNOBOL4), 229 bytes
DEFINE('Z(A)')
Z X =X + 1
M =M + A<X> :S(Z)
N =X - 1.
M =M / N
D X =GT(X) X - 1 :F(S)
A<X> =A<X> - M :(D)
S X =LT(X,N) X + 1 :F(Y)
S =S + A<X> ^ 2 / N :(S)
Y S =S ^ 0.5
N A<X> =A<X> / S
X =GT(X) X - 1 :S(N)
Z =A :(RETURN)
Link is to a functional version of the code which constructs an array from STDIN given its length and then its elements, then runs the function Z on that, and finally prints out the values.
Defines a function Z which returns an array.
The 1. on line 4 is necessary to do the floating point arithmetic properly.
Charcoal, 19 bytes (previously 25)
≧⁻∕ΣθLθθI∕θ₂∕ΣXθ²Lθ
Try it online! Link is to verbose version of code. Explanation:
θ Input array
≧ Update each element
⁻ Subtract
Σ Sum of
θ Input array
∕ Divided by
L Length of
θ Input array
Calculate \$\mu\$ and vectorised subtract it from each \$x_i\$.
θ Updated array
∕ Vectorised divided by
₂ Square root of
Σ Sum of
θ Updated array
X Vectorised to power
² Literal 2
∕ Divided by
L Length of
θ Array
I Cast to string
Implicitly print each element on its own line.
Calculate \$\sigma\$, vectorised divide each \$x_i\$ by it, and output the result.
Edit: Saved 6 bytes thanks to @ASCII-only for a) using SquareRoot() instead of Power(0.5) b) fixing vectorised Divide() (it was doing IntDivide() instead) c) making Power() vectorise.
crossed out 25 = no bytes? :P (Also, you haven't updated the TIO link yet) – ASCII-only, Dec 25, 2018 at 10:59
@ASCII-only Oops, thanks! – Neil, Dec 25, 2018 at 14:32
CASIO BASIC (CASIO fx-9750GIII), 20 bytes
?→List1
1-Variable List1
(List1-x̄)÷σx
builtins
APL(NARS), 26 chars
{m←÷≢⍵⋄d×ばつ⍨d×ばつ+/⍵}
test:
f←{m←÷≢⍵⋄d×ばつ⍨d×ばつ+/⍵}
f 1 2
¯1 1
f 1 2 3
¯1.224744871 0 1.224744871
f ¯3 1 4 1 5
¯1.642857143 ¯0.2142857143 0.8571428571 ¯0.2142857143 1.214285714