This is just a simple median calculator that calculates the median for a given array/list.
def median(lst):
nl = []
nl = sorted(lst)
e = int(len(nl))
if e % 2 != 0: # True if odd # of #'s
ind = int(((e / 2.0) - 0.5)) # index of the median
c = nl[ind]
print c
else: # even # of #'s
ind_left = int(e / 2.0) - 1 # index left of median
ind_right = int(e / 2.0) # index right of median
z = (nl[ind_left] + nl[ind_right]) / 2.0
print z
median([4, 4, 5, 5, 2.5])
6 Answers 6
Please name your variables appropriately. Single-letter variables are in most cases sub-optimal and make it needlessly hard for the you-in-six-weeks to figure out what it does.
I'm fairly sure your algorithm is sub-par as well, although I'm not that proficient in algorithms myself. I suspect a function like this already exists in the Python library, I'll update this answer when I've found it.
The following is double work:
nl = []
nl = sorted(lst)
The following will suffice:
nl = sorted(lst)
If you'd be using Python 3.4 or higher, statistics.median
would solve your problem.
In Python 2.7 you'll have to install NumPy, an often used library for large number calculations. Now use numpy.median
like this:
import numpy as np
def median(lst):
return np.median(np.array(lst))
print(median([4, 4, 5, 5, 2.5]))
Python usually has basic functions like this already available to you. If it's already available in commonly used Python libraries, it's usually a good idea not to re-invent the wheel yourself.
-
\$\begingroup\$ You’re right, the algorithm is sub-par. In fact, a partial sorting of the lower half of the list is sufficient., since we don’t care about the remainder of the list. In fact, we only care about the correct positioning of the one (or two) elements in the middle, but to achieve this we need to sort the lower half. \$\endgroup\$Konrad Rudolph– Konrad Rudolph2016年12月09日 16:55:09 +00:00Commented Dec 9, 2016 at 16:55
Like @Mast mentioned already mentioned you should use proper variable named in your program. A good variable name makes the program so much easier to read, specially if you're reading it after couple of months.
There are only two hard things in Computer Science: cache invalidation and naming things. — Phil Karlton
e = int(len(nl))
The call to
int()
is redundant, length is always an integer.Given integer operands you can get floored division in both Python 2 and 3 using
//
operator. This prevents theint()
conversions as we are not dealing with floats here.Return your result instead of printing it. That way you can import this function in other modules as well, making it re-usable.
New version of your program:
def median(seq):
new_list = sorted(seq)
length = len(seq)
if length % 2 != 0:
index = length // 2
median = new_list[index]
else:
index_right = length // 2
index_left = index_right - 1
median = (new_list[index_left] + new_list[index_right]) / 2.0
return median
-
\$\begingroup\$ I was getting bored so I did a beautiful one-liner from your function:
def median(seq): return sorted(seq)[len(seq) // 2] if len(seq) % 2 != 0 else (sorted(seq)[len(seq) // 2 - 1] + sorted(seq)[len(seq) // 2]) / 2.0
\$\endgroup\$Grajdeanu Alex– Grajdeanu Alex2016年12月09日 14:55:01 +00:00Commented Dec 9, 2016 at 14:55
If you're willing to upgrade to Python 3 you can do this with statistics.median
.
I'd recommend that you upgrade.
And would make your code:
from statistics import median
print(median([4, 4, 5, 5, 2.5]))
As for your code:
- You don't need to create a list
nl
to assign a new list to it. - You don't need to
int(len())
, it's an int already. - It's easier to understand
e % 2 == 1
overe % 2 != 0
. - You can just floor divide if the list is even or odd, this can be done with
//
. - You don't need to assign to variables if you can readably
return
. - You don't need a variable to index both the left and right of the center of the list.
- Keep your numbers as integers, don't use
2.0
orfloat
. This removes the need toint
. - Don't
print
,return
. n
would be a better single variable name rater thane
.i
would also be an easy to read variable name instead ofind_right
. Which is good as you use it twice.nl
is strange at best,d
ordata
are better variable names.- You don't error if the size of the input array is 0. Instead raise a
ValueError
.
This can get you:
def median(data):
data = sorted(data)
n = len(data)
if n == 0:
raise ValueError("no median for empty data")
if n%2 == 0:
return data[n//2]
else:
i = n//2
return (data[i - 1] + data[i]) / 2.0
Your variable naming really needs improvements. 1 simple rule I usually follow: if you have trouble finding a good name for a variable, then you don't need it (e, z, c, ind, ).
def median(elements):
sorted_elements = sorted(elements)
div, mod = divmod(len(elements), 2)
return (sorted_elements[div+mod-1] + sorted_elements[div]) / 2
assert median([1, 2, 3]) == 2
assert median([1, 2, 3, 4]) == 2.5
assert median([4, 4, 5, 5, 2.5, 1]) == 4
assert median([4, 4, 5, 5, 2.5, 1, 6]) == 4
-
1\$\begingroup\$ Your code doesn't work with
median([1, 2, 3]) -> 2.5
andmedian([1, 2, 3, 4]) -> 3
. \$\endgroup\$2016年12月09日 23:20:33 +00:00Commented Dec 9, 2016 at 23:20 -
\$\begingroup\$ @Peilonrayz: you are right, I fixed it (hopefully) \$\endgroup\$Guillaume– Guillaume2016年12月14日 13:52:51 +00:00Commented Dec 14, 2016 at 13:52
-
1\$\begingroup\$ Looks correct. It was a one-off error, and so you could have just added,
-1
to the second index, ;Preturn (sorted_elements[div] + sorted_elements[div+mod-1]) / 2
\$\endgroup\$2016年12月14日 17:07:21 +00:00Commented Dec 14, 2016 at 17:07 -
\$\begingroup\$ Right once again :) And that is actually what I wanted to do from the beginning :p \$\endgroup\$Guillaume– Guillaume2016年12月15日 09:51:24 +00:00Commented Dec 15, 2016 at 9:51
Apart from what @Mast already said, here is another small suggestion regarding your actual code:
e = int(len(nl))
The above is always going to be an int as you can't have a float
number of items in a list. So:
e = len(nl)
will do as well.
-
1\$\begingroup\$ Good catch, I hadn't realized yet there was a useless cast going on. \$\endgroup\$2016年12月09日 14:46:10 +00:00Commented Dec 9, 2016 at 14:46
I am assuming that you want to keep this function general, or that you are intentionally re-inventing the wheel (otherwise you would likely want to use numpy's median function, or the statistics median function for python 3.4+).
First:
def median(lst):
I don't see any length checking on the list. If the user supplies an empty list, I would expect a meaningful exception.
Alternatively, the function could provide an optional default value (similar to the built-in sum
), but this would be less intuitive to me.
Second:
nl = []
nl = sorted(lst)
e = int(len(nl))
Initializing n1 is unnecessary here. The result of len
also does not need to be converted - the result is always an int. The variable names should also be more descriptive to explain their contents or intent.
Third:
if e % 2 != 0: # True if odd # of #'s
ind = int(((e / 2.0) - 0.5)) # index of the median
c = nl[ind]
print c
In any language, I am always very careful when doing rounding and/or floating point operations. The correctness of this looks questionable to me (especially without comments). Luckily, python has integer division (//
), which rounds down and makes simpler operations like this one clearer.
This could should also return the value, rather than printing it, in order to keep the IO portions separate from the calculations. Separating the print
will make it easier to maintain and re-use the function.
Fourth:
else: # even # of #'s
ind_left = int(e / 2.0) - 1 # index left of median
ind_right = int(e / 2.0) # index right of median
z = (nl[ind_left] + nl[ind_right]) / 2.0
print z
An obvious optimization here is to avoid recalculating the value in ind_left
to get ind_right
(ind_left == ind_right - 1
).
Taking all of these suggestions will result in code that is something like this:
def median(lst):
if not lst:
raise ValueError('Cannot find median of an empty list')
sorted_items = sorted(lst)
# Determine if even number of items (True) or odd (False)
is_even = (len(sorted_items) % 2) == 0
median_index_l = len(sorted_items) // 2
if is_even:
# Even number of items -- median is average of two middle items
return (sorted_items[median_index_l] + sorted_items[median_index_l + 1]) / 2.0
else:
# Odd number of items -- median is middle item
return sorted_items[median_index_l]
-
\$\begingroup\$ Since you only use is_even once, why not just write
if lem(sorted_items) % 2:
? \$\endgroup\$Oscar Smith– Oscar Smith2016年12月12日 14:39:41 +00:00Commented Dec 12, 2016 at 14:39 -
\$\begingroup\$ @OscarSmith For readability. If I wanted to optimize it, I could use one of the two library functions, write it in C, etc. \$\endgroup\$Casey Kuball– Casey Kuball2016年12月12日 16:15:54 +00:00Commented Dec 12, 2016 at 16:15
-
\$\begingroup\$ In retrospect looking back at this, I can see a default return of
None
for an empty list being acceptable (though it's passing an error downstream - not recommended). \$\endgroup\$Casey Kuball– Casey Kuball2022年02月16日 18:23:04 +00:00Commented Feb 16, 2022 at 18:23
Explore related questions
See similar questions with these tags.