
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Mercurial robots.txt should let robots crawl landing pages.
Type: enhancement
Stage: needs patch
Components: None
Versions:

process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: Ivaylo.Popov, barry, benjamin.peterson, emily.zhao, ezio.melotti, georg.brandl, pitrou
Priority: normal
Keywords: easy

Created on 2012-02-01 22:29 by Ivaylo.Popov, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (6)
msg152446 - Author: Ivaylo Popov (Ivaylo.Popov) Date: 2012-02-01 22:29
http://hg.python.org/robots.txt currently disallows all robots from all paths. This means that the site doesn't show up in Google search results for queries seeking, for instance, browsing access to the Python source:
https://www.google.com/search?ie=UTF-8&q=python+source+browse
https://www.google.com/search?ie=UTF-8&q=python+repo+browse
https://www.google.com/search?ie=UTF-8&q=hg+python+browse
etc...
Instead, robots.txt should allow access to the landing page, http://hg.python.org/, and the landing pages of hosted projects, e.g. http://hg.python.org/cpython/, while prohibiting access to the */rev/*, */shortlog/*, ..., directories.
This change would be very easy, would cost virtually nothing, and would let users find the Mercurial repository viewer from search engines. Note that http://svn.python.org/ does show up in search results, as an illustration of how convenient this is.
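A minimal robots.txt along these lines, assuming only the original standard's User-agent and Disallow directives with prefix matching, might look like the following sketch (only the cpython paths are taken from this issue; every other hosted repository would need its own lines):

User-agent: *
# Block the expensive deep views named above for the cpython repository.
# Anything not disallowed is crawlable by default, so the landing pages
# at / and /cpython/ stay visible to search engines.
Disallow: /cpython/rev/
Disallow: /cpython/shortlog/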
msg152457 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-02 13:26
Can you propose a robots.txt file?
msg219976 - Author: Emily Zhao (emily.zhao) * Date: 2014-06-07 21:12
I don't know too much about robots.txt, but how about:
Disallow: */rev/*
Disallow: */shortlog/*
Allow:
Are there any other directories we'd like to exclude?
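Filled out, that proposal might read as the sketch below. A leading wildcard in Disallow is not part of the original robots.txt standard, although major crawlers such as Googlebot honor it, and a bare Allow: line adds nothing, since crawling is already the default:

User-agent: *
# Sketch assuming Googlebot-style wildcard matching; the directory list
# is illustrative, and anything not matched by a Disallow rule remains
# crawlable by default.
Disallow: /*/rev/
Disallow: /*/shortlog/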
msg220003 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2014-06-07 23:54
Unfortunately, I don't think it will be that easy because I don't think robots.txt supports wildcard paths like that. Possibly, we should just whitelist a few important repositories.
msg220109 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-06-09 19:06
Yes, I think we should whitelist rather than blacklist. The problem with letting engines index the repositories is the sheer resource cost when they fetch many heavy pages (such as annotate, etc.).
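A whitelist along those lines could look like the sketch below, assuming crawler support for the nonstandard Allow directive and the $ end-of-path anchor (both Googlebot extensions) together with longest-match precedence; only cpython is named in this issue, and further repositories would each get their own Allow line:

User-agent: *
# Block everything by default, then re-allow only the site and
# repository landing pages. Under longest-match semantics the more
# specific Allow rules take precedence over the blanket Disallow.
Allow: /$
Allow: /cpython/$
Disallow: /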
msg275898 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2016-09-12 00:17
Two things: is it worth fixing this bug given the impending move to GitHub? Also, why is this reported here and not on the pydotorg tracker? https://github.com/python/pythondotorg/issues
Given that the last comment was from 2014, I'm going to go ahead and close this issue.
History
Date                 User               Action  Args
2022-04-11 14:57:26  admin              set     github: 58132
2016-09-12 00:17:26  barry              set     status: open -> closed
                                                nosy: + barry
                                                messages: + msg275898
                                                resolution: wont fix
2014-06-09 19:06:07  pitrou             set     messages: + msg220109
2014-06-07 23:54:14  benjamin.peterson  set     nosy: + benjamin.peterson
                                                messages: + msg220003
2014-06-07 21:12:46  emily.zhao         set     nosy: + emily.zhao
                                                messages: + msg219976
2013-08-17 14:53:04  ezio.melotti       set     keywords: + easy
                                                stage: needs patch
2012-02-02 14:42:52  ezio.melotti       set     nosy: + ezio.melotti
2012-02-02 13:26:24  pitrou             set     nosy: + georg.brandl, pitrou
                                                messages: + msg152457
2012-02-01 22:29:55  Ivaylo.Popov       create
