This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in Python's Developer Guide.
Created on 2015-02-25 18:40 by Journeyman08, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Pull Requests | | |
|---|---|---|
| URL | Status | Linked |
| PR 23842 | closed | rhettinger, 2020-12-18 17:57 |
| PR 30174 | merged | mark.dickinson, 2021-12-17 15:56 |
| PR 30221 | merged | miss-islington, 2021-12-21 12:43 |
| PR 30220 | merged | miss-islington, 2021-12-21 12:46 |
| Messages (11) | | | |
|---|---|---|---|
| msg236612 | Author: Jake (Journeyman08) | Date: 2015-02-25 18:40 | |
In the statistics module documentation (https://docs.python.org/3/library/statistics.html), there is a note that states: "The mean is strongly affected by outliers and is not a robust estimator for central location: the mean is not necessarily a typical example of the data points. For more robust, although less efficient, measures of central location, see median() and mode()". While I appreciate the intention, this is quite misleading. The implication is that the mean, median and mode are different ways to estimate one "central location"; in reality they are different quantities (albeit ones that refer to a similar notion). The sample mean is an unbiased estimator of the true mean, but it need not be unbiased as an estimator of the true median or mode, and vice versa for the median and mode. To make this clearer I would rephrase to: "The mean is strongly affected by outliers and is not necessarily representative of the central tendency of the data. For cases with large outliers or very low sample size, see median() and mode()". Apologies if this is seen as frivolous, but statistics is hard enough to keep clear even when the words are used precisely. |
|||
| msg383228 | Author: Irit Katriel (iritkatriel) (Python committer) | Date: 2020-12-17 11:03 | |
I agree with Jake's comment, but I think the solution is to remove that Note altogether. This document is a software manual, not a statistics textbook, and as such it should just state clearly what the statistics module does. If someone doesn't know whether they need the mean or the median, they really need to read a more fundamental text before writing their code. |
|||
| msg383335 | Author: Steven D'Aprano (steven.daprano) (Python committer) | Date: 2020-12-19 00:12 | |
I strongly oppose this change, and I dispute the characterisation of this as a misleading note. It is not misleading, and I argue that every word of it is factually correct. Jake, if you disagree, then please provide some citations. Irit: it is ridiculous to describe a two paragraph (nine line) note as "a statistics textbook". That is an exaggerated position that doesn't help the discussion. It's not a textbook, it is a short note that helps users whose knowledge of statistics is naive to understand which statistic is better for them. "If someone doesn't know whether they need the mean or the median, they really need to read a more fundamental text before writing their code." I totally disagree. This module is not intended only for statisticians and experts, and the user who isn't sure which average to use shouldn't have to read a textbook on the fundamentals of statistics. |
|||
| msg383361 | Author: Raymond Hettinger (rhettinger) (Python committer) | Date: 2020-12-19 05:42 | |
Steven, you are the module maintainer. So if you're sure about the current wording, go ahead and close this. |
|||
| msg383362 | Author: Steven D'Aprano (steven.daprano) (Python committer) | Date: 2020-12-19 06:32 | |
I'm willing to give Irit and Jake the opportunity to make their case, particularly if they can demonstrate that I got my facts wrong. I'm going to close the ticket, but if anyone feels strongly enough to respond with a good argument, or better still citations demonstrating that the comment is factually wrong, I am open to revising or removing the wording. |
|||
| msg383368 | Author: Steven D'Aprano (steven.daprano) (Python committer) | Date: 2020-12-19 10:21 | |
Sorry Raymond, I missed this before closing the task: "FWIW, Allen Downey also had concerns about this wording." I don't recognise the name. Who is Allen Downey, and what concerns does he have? |
|||
| msg408551 | Author: Guido van Rossum (gvanrossum) (Python committer) | Date: 2021-12-14 17:03 | |
Couldn't we just change the first occurrence of "central location" in the note to "a central location" and the second to "(different) central locations"? That would leave Steven's intention intact but satisfy those who read it as referring to *a single* central location. |
|||
| msg408698 | Author: Steven D'Aprano (steven.daprano) (Python committer) | Date: 2021-12-16 12:23 | |
Prompted by Guido's reopening of the ticket, I have given it some more thought, and have softened my views. Jake, if you're still around, perhaps there is more to what you said than I initially thought, and I just needed fresh eyes to see it. Sorry for being so slow to come around. People may be less likely to wrongly imagine there is a single central location of the data if we use the term "central tendency" instead of "location". I think we should also drop the reference to mode(), since it only works with discrete data and is not suitable for continuous data. "The mean is strongly affected by outliers and is not necessarily a typical example of the data points. For a more robust, although less efficient, measure of central tendency, see median()" How do we feel about linking to Wikipedia? I'd like to link both "outliers" and "central tendency" to the appropriate Wikipedia entries. |
|||
| msg408726 | Author: Guido van Rossum (gvanrossum) (Python committer) | Date: 2021-12-16 18:56 | |
Great! I will leave it to Steven and Mark D to work out an acceptable solution. PS. Allen Downey is a computer scientist who has written at least one book about Python. |
|||
| msg408732 | Author: Mark Dickinson (mark.dickinson) (Python committer) | Date: 2021-12-16 19:35 | |
> "The mean is strongly affected by outliers and is not necessarily a typical example of the data points. For a more robust, although less efficient, measure of central tendency, see median()" That wording sounds fine to me. I don't think we can reasonably expect to hear from Jake again, but from my understanding of his post, this addresses his concerns. FWIW, I share those concerns. My brain can't parse "robust estimator for central location", because the term "estimator" has a precise and well-defined meaning in (frequentist) statistics, and what I expect to see after "estimator for" is a description of a parameter of a statistical model - as in for example "estimator for the population mean", or "estimator for the Weibull shape parameter". "central location" doesn't really fit in that slot. > How do we feel about linking to Wikipedia? I can't think of any good reason not to. We have plenty of other external links in the docs, and the Wikipedia links are probably at lower risk of becoming stale than most of the others. |
|||
| msg408794 | Author: Mark Dickinson (mark.dickinson) (Python committer) | Date: 2021-12-17 15:58 | |
Steven: I've made a PR at https://github.com/python/cpython/pull/30174. Does this match what you had in mind? |
|||
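For readers following the thread, the behaviour under discussion can be illustrated with a minimal sketch. The sample values and variable names below are made up for illustration and do not come from the issue; only `statistics.mean`, `statistics.median` and `statistics.mode` are real standard-library functions.

```python
from statistics import mean, median, mode

# Hypothetical sample: five typical values, then the same values plus one outlier.
typical = [10, 11, 12, 13, 14]
with_outlier = typical + [1000]

mean_clean = mean(typical)
median_clean = median(typical)
mean_outlier = mean(with_outlier)
median_outlier = median(with_outlier)

# Without the outlier, both statistics agree.
print(f"clean:   mean={mean_clean}, median={median_clean}")

# A single extreme value drags the mean far from the bulk of the data,
# while the median barely moves.
print(f"outlier: mean={mean_outlier:.2f}, median={median_outlier}")

# mode() is meaningful for discrete data such as (hypothetical) survey ratings.
ratings = [3, 4, 4, 5, 4, 2]
print(f"most common rating: {mode(ratings)}")
```

With these numbers the mean jumps from 12 to roughly 176.67 once the outlier is added, while the median only moves from 12 to 12.5. This is the sense in which the note calls the median "more robust". The example deliberately applies `mode()` only to discrete data, in line with the concern raised in msg408698.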
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:13 | admin | set | github: 67710 |
| 2021-12-21 14:16:23 | mark.dickinson | set | status: open -> closed stage: patch review -> resolved |
| 2021-12-21 12:46:29 | miss-islington | set | pull_requests: + pull_request28445 |
| 2021-12-21 12:43:46 | miss-islington | set | nosy: + miss-islington pull_requests: + pull_request28444 |
| 2021-12-17 15:58:01 | mark.dickinson | set | messages: + msg408794 |
| 2021-12-17 15:56:38 | mark.dickinson | set | stage: patch review pull_requests: + pull_request28390 |
| 2021-12-16 19:35:19 | mark.dickinson | set | messages: + msg408732 |
| 2021-12-16 18:56:38 | gvanrossum | set | messages: + msg408726 |
| 2021-12-16 14:00:23 | mark.dickinson | set | nosy: + mark.dickinson |
| 2021-12-16 12:23:34 | steven.daprano | set | stage: resolved -> (no value) messages: + msg408698 versions: + Python 3.6, Python 3.7, Python 3.8, Python 3.9, Python 3.10, Python 3.11, - Python 3.4 |
| 2021-12-14 17:03:45 | gvanrossum | set | status: closed -> open nosy: + gvanrossum messages: + msg408551 resolution: rejected -> |
| 2020-12-19 10:21:10 | steven.daprano | set | messages: + msg383368 |
| 2020-12-19 07:24:04 | steven.daprano | set | status: open -> closed resolution: rejected stage: patch review -> resolved |
| 2020-12-19 06:32:31 | steven.daprano | set | messages: + msg383362 |
| 2020-12-19 05:42:47 | rhettinger | set | messages: + msg383361 |
| 2020-12-19 00:58:59 | rhettinger | set | messages: - msg383343 |
| 2020-12-19 00:57:51 | rhettinger | set | assignee: docs@python -> steven.daprano |
| 2020-12-19 00:57:36 | rhettinger | set | messages: + msg383343 |
| 2020-12-19 00:12:12 | steven.daprano | set | messages: + msg383335 |
| 2020-12-18 17:57:06 | rhettinger | set | keywords: + patch nosy: + rhettinger pull_requests: + pull_request22703 stage: patch review |
| 2020-12-17 11:03:49 | iritkatriel | set | nosy: + iritkatriel messages: + msg383228 |
| 2015-02-25 19:14:14 | SilentGhost | set | nosy: + steven.daprano |
| 2015-02-25 18:40:46 | Journeyman08 | create | |