Nofollow: Difference between revisions
Revision as of 20:44, 24 January 2005
nofollow is a possible value of the attribute rel in the <A> markup of HTML. Many search engines, such as Google, rank pages based on the number of other websites that link to them, but ignore links that contain rel="nofollow". This one attribute determines whether Wikipedia has any influence on the search engine ranking of any page linked to by Wikipedia. This article is an attempt to gauge community opinion on whether or not Wikipedia should use the nofollow attribute.
NOTE: The primary purpose of adding rel="nofollow" to MediaWiki is to reduce spam effectiveness on unattended and poorly-maintained third-party wikis by enabling it by default in the installation. On a wiki with a strong community, linkspam is generally quickly removed. The attribute could either be turned off on a site like (the active languages of) Wikipedia with little ill effect, or some alternate scheme of deciding when to add it could be used. --brion 07:24, 22 Jan 2005 (UTC)
Usage
Link spam or comment spam is the practice of someone putting links to their own pages in wikis, blogs and other places in order to raise their page's ranking in search engines by creating bogus links. It may be part of an en:google bomb campaign.
On 18 January 2005 the Google blog entry "Preventing comment spam" declared that Google would henceforth respect a rel="nofollow" attribute on hyperlinks. Their page ranking algorithm now ignores links with this attribute when ranking the destination page. The intended result is that site administrators can modify user-posted links such that the attribute is present, and thus an attempt to googlebomb by posting a link on such a site would yield no increase from that link.
Here is an example quoted from that entry:
- Q: How does a link change?
- A: Any link that a user can create on your site automatically gets a new "nofollow" attribute. So if a blog spammer previously added a comment like
Visit my <a href="http://www.example.com/">discount pharmaceuticals</a> site.
- That comment would be transformed to
Visit my <a href="http://www.example.com/" rel="nofollow">discount pharmaceuticals</a> site.
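The transformation quoted above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code Google or MediaWiki uses: the function name `add_nofollow` is made up for this example, and the regular expressions are a deliberately simple stand-in for a real HTML parser.

```python
import re

# Matches an opening <a ...> tag. A simple pattern for illustration only;
# it is not a full HTML parser.
_A_TAG = re.compile(r'<a\b([^>]*)>', re.IGNORECASE)
# Detects an existing rel attribute inside the tag's attribute text.
_REL_ATTR = re.compile(r'\brel\s*=', re.IGNORECASE)


def add_nofollow(html: str) -> str:
    """Add rel="nofollow" to every <a> tag that has no rel attribute yet."""
    def rewrite(match):
        attrs = match.group(1)
        if _REL_ATTR.search(attrs):
            return match.group(0)  # leave tags with an existing rel untouched
        return f'<a{attrs} rel="nofollow">'
    return _A_TAG.sub(rewrite, html)


comment = 'Visit my <a href="http://www.example.com/">discount pharmaceuticals</a> site.'
print(add_nofollow(comment))
```

Run on the spam comment from the quote, this prints the second form shown above, with rel="nofollow" appended to the link.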
Previous discussion
At 18:37, 19 Jan 2005 (UTC), Rfc1394 suggested that Wikipedia follow that policy, at en:Wikipedia:Village pump (proposals)#Stopping Link Spam / Comment Spam. The issue was brought up again by Dwheeler at 00:18, 2005 Jan 20 (UTC) on en:Wikipedia:Village pump (technical)#Reduce link spamming: Support Google's "nofollow" approach. Discussion ensued, and there was no consensus that this feature should be turned on. At 02:30, Jan 22, 2005 (UTC), Ben Brockert posted that it had already been turned on.
- The real previous discussion was a discussion among developers, all who commented agreeing that it was an excellent idea. I was one of them. It was probably read about and implemented before the post you referenced. For background, I'm the person who has been doing most of the spam blacklist updating and an experienced anti-spam programmer, one of those involved in the development of an anti-spam program with around 100,00 users. Jamesday 07:44, 22 Jan 2005 (UTC)
Reasoning
This is a major change in the way Wikipedia works, and was implemented across all Wikipedias with no discussion on meta or en. The change should be reverted so that the community has time to come to a consensus on the issue.
The change does not harm the literal functioning of Wikipedia, and if it is turned off after discussion the links will be restored.
Reasons this is a bad thing:
- We already have a spam link blacklist that is regularly updated.
- Wikipedia will no longer be contributing to the Google rank of pages it links to. Links are as rigorously verified as the rest of the information in Wikipedia; having the links but nullifying their effect is second-guessing the validity of every article on Wikipedia.
- Presuming most links in Wikipedia are valid, and point to useful, scholarly resources, we want to boost their rankings. This is the same means by which Wikipedia itself becomes visible on Google. Our valid links help good, scholarly resources compete in the Google rankings.
- If we use "nofollow", Wikipedia itself will not be known to Google as a "backward link" from any of these articles.
- It's just poor net etiquette.
Reasons this is a good thing:
- The spam blacklist is only for serious repeat offenders for whom normal blocks have proved ineffective. It's a very high overhead process.
- It strongly decreases the reward for spamming
- It saves substantial time for those involved in fighting link spam to use on other tasks to improve the quality of the projects.
- It's good netiquette not to reward spammers and, just like the block on email relaying, this is likely to rapidly become standard practice.
Discussion:
You're quite right that we want to reward good sites, even though that's only incidental to our purpose (an encyclopedia or other project). This is a first implementation, not a final implementation. It's likely that there will be some system in place soon enough to accept some links without the nofollow while still using it for the vast majority of spam link additions. I suggest waiting 6 months and looking to see what the system we have then is like. Jamesday 07:44, 22 Jan 2005 (UTC)
- Waiting half a year would be a good way to make the code permanent. I think it's better to evaluate it before adding it, rather than after; or in this case, re-evaluate it after it's added without public discussion. —Ben Brockert 06:01, 23 Jan 2005 (UTC)
- That discussion was public - around 100 people present, all with the opportunity to comment either way. The code is already a permanent part of the MediaWiki software, including the ability for each site to turn it on or off. Whether this wiki chooses to continue wasting the time of its contributors by rewarding spamming is the question. Jamesday 07:55, 24 Jan 2005 (UTC)
"...wikipedia itself probably should turn the feature off." —Jimbo Wales, [1]
- Wikipedia is the one where the spam wastes most time and which provides (because of its high page rank) the greatest reward and target for spammers. It's likely to be the biggest factor in justifying the work of supporting MediaWiki in spamming tools. Wikipedia isn't a web directory - we could remove every link without doing anything to reduce the quality of the encyclopedia entries. Perhaps we should, since we're not really in the business of being a directory. Jamesday 07:55, 24 Jan 2005 (UTC)
For it to actually reduce the hand-weeding of spam, the spammers have to understand that inserting a link in WP will not gain them anything. Few will be reading our policy pages to figure that out -- will we add yet another advisory to every edit page, and would it make a difference? It complicates the idea of turning it on for some and off for others, too -- nofollow can work for something like LiveJournal because a spammer can learn by reputation that "linkspam doesn't work at LJ", but if nofollow is used on some of our wikis and not on others, you can't get that same reputation effect, that "linkspam doesn't work on MediaWiki". Not to mention all the other wiki software out there -- who knows how many spammers actually know the difference between MediaWiki and MoinMoin and so on...? There will never be a full belief that "linkspam doesn't work on wikis at all." 18:50, 23 Jan 2005 (UTC)
Preliminary vote
This vote is purely to gauge user opinion without requiring each user to be involved in the discussion. It is not binding. Please sign your vote; mention your Wikipedia username if you don't have a Wikimedia username.
Get rid of nofollow on all projects, whether their community wants it or not
- —Ben Brockert 05:25, 22 Jan 2005 (UTC)
- As some people have pointed out in the discussions linked above, this will do little to deter linkspam, which is already dealt with by users. It penalizes good resources, and may lower Wikipedia's own Google rankings. --Slowking Man 07:13, Jan 22, 2005 (UTC)
- brion's idea sounds like an effective compromise. --Slowking Man 07:29, Jan 22, 2005 (UTC)
- How does it lower the rankings of Wikipedia, Wiktionary, Wikibooks, Commons, September 11th or another project?
- Okay, I was making an assumption. Does Google use referred links in its pagerank formula or not? If not, then it doesn't, and I was incorrect in stating so. --Slowking Man 08:02, Jan 22, 2005 (UTC)
- I would move this vote to the section at the bottom created by brion, but "think about alternatives" seems a bit ambiguous. I would like them turned off on the en Wikipedia. I don't really care about other projects. Maybe this vote should be put up on each project separately? --Slowking Man 08:02, Jan 22, 2005 (UTC)
- Link spam is infrequent and reverted in a timely manner, whereas in blogs, they usually don't have the same editing standards and Wikipedia's "watchful eyes." Further, using "nofollow" effectively drops Wikipedia from participating in the global web process of determining link relevancy and popularity via search engine mechanisms. The arguments in favor just haven't been persuasive. --Stevietheman 07:17, 22 Jan 2005 (UTC)
- OK, I think I'm liking the ideas in the third section. Thanks to those who are using their noggins on this. I just feel that deploying the original idea on the English Wikipedia would end up becoming a bit of a mini-disaster for searchability on the web. --Stevietheman 00:25, 23 Jan 2005 (UTC)
- Implementing this on en is anti-web. We have a strong enough user base to not worry about it. If it can be turned on/off per wiki then I suggest it be turned off EN immediately. Think about this: Google JUST implemented this feature, and chances are the implementation on their server has problems with it - Google keeps crap in "beta" forever, tending to have all of us be their beta testers rather than internal testing. Who knows what we are doing to the web by implementing on this. I'd be interested in a MySQL query to see how many external links we have....this problem is amplified with Yahoo and MSN search. --Alterego 03:51, 23 Jan 2005 (UTC)
- We have ~387,000 external links...PLEASE PLEASE turn this off immediately guys... --Alterego 03:52, 23 Jan 2005 (UTC)
- This is absolutely inappropriate. Absolutely. (I would also accept the Keep nofollow on unattended wikis, think about alternatives for active projects option, though, particularly Patrice's comments). blankfaze 08:40, 23 Jan 2005 (UTC)
- Bad idea, especially for the major wikis. The "Keep nofollow on unattended wikis" is acceptable. However, any alternative for active projects will be difficult, if not impractical due to overhead costs. Of course, maybe I'm wrong and not being creative enough :-D Ashlux 09:17, 24 Jan 2005 (UTC)
- I would also accept keeping nofollow on unattended projects, but only for the duration of their "unattended status". We should compile a list of said unattended Wikis--Oldak Quill 16:24, 24 Jan 2005 (UTC)
Keep nofollow on all projects whether their community wants it or not
- It's obviously useful, in its present state and is likely to become significantly refined over time. Jamesday 07:44, 22 Jan 2005 (UTC)
- Keep it on. email spam seemed harmless five years ago, now it threatens to make email useless. Link spam is one of the most serious long-range threats to open Wikis. It baffles me why anyone would be opposed to this. Links are there for the convenience of readers who wish to manually click on them to find information; they are not intended to be a means whereby Wikipedia as a whole "votes with its links" to influence search engine ranking. 68.160.145.238 22:07, 22 Jan 2005 (UTC)
- But it isn't that hard to spam Wikipedia if you know what you are doing. Always create a login name, only add one or two links per user name, use different IP addresses by logging in at your local public library and having subs to services like Earthlink and AOL that give you a different IP address every time, only add one or two links a day, etc. There are some successful spam sites in Wikipedia. The serious ones learn from their mistakes and figure out how to edit here and get away with it. Further, I am aware of several sites that are on the spam filter because a competitor spammed the hell out of them to get them banned from Wikipedia. They know about the spam filter and are using this method to get their competition out. The actual site owners are innocent but even legit adds from these sites are filtered out. Having a no follow tag on links will end most spamming. Most good sites don't need the help of the Wikis to rank well... 172.135.76.241 01:40, 23 Jan 2005 (UTC)
Keep nofollow on unattended wikis, think about alternatives for active projects
- Link spam is a big problem in general, and pollutes both the affected wikis and search engine databases. Cleaning up our act by using rel="nofollow" helps keep search engines clean (which are how we find stuff on the net). We can think about and look at further improvements for active projects, but the tags stay in the default installation and the many many unattended and lightly-used wikis that we and third-party users operate. brion 07:48, 22 Jan 2005 (UTC)
- Strongly agree, see my wikitech-l posts. -- Tim Starling 07:57, 22 Jan 2005 (UTC)
- Very strongly agree. --Daniel Mayer 08:16, 22 Jan 2005 (UTC)
- Yes, but turn it off now on the most active Wikis and don't wait until a better solution is found. --Patrice 14:17, 22 Jan 2005 (UTC)
- As Patrice says, turn off on most actives. Also provide a tool labeled "Pages with recently added external links" (or even "Recently added external links", using Diff) to assist policing link spam. [belated sign-on & sig --Jerzy 18:28, 22 Jan 2005 (UTC)]
- Agree. MarkSweep 17:12, 22 Jan 2005 (UTC)
- Agree. Julianortega 17:31, 22 Jan 2005 (UTC)
- BesigedB 19:33, 22 Jan 2005 (UTC)
- Keep nofollow tags for anonymously contributed external links only. en:Wikipedia doesn't seem to have too much of a linkspam problem, though. Wikibooks does, though, and I would strongly want it kept on wikibooks. I imagine there is some on lesser watched articles and wikis, though, and wastes peoples' time. - Omegatron 06:20, 22 Jan 2005 (UTC)
- There's not a good way to determine what was "anonymously contributed" on a page. If you want to create a scheme like this you'll need to flesh it out in more detail. --64.165.228.112 01:48, 23 Jan 2005 (UTC)
- Every edit made to the wiki has a user associated with it. How is it not possible to only nofollow the external links added by anons? - Omegatron 04:02, 23 Jan 2005 (UTC)
- Every edit is a complete self-contained blob of text with the entire article. Figuring out who first added which bit of text requires comparing multiple revisions (potentially thousands). If you would like to do this efficiently, you'll need to flesh out the scheme in much more detail. --brion 06:28, 23 Jan 2005 (UTC)
- So just start doing it now, and assume older external links were good ones. I don't know. I'm not a programmer, but it certainly seems feasible to me. All the necessary information is there, it's just a matter of writing software to process it. If that's the problem, then I don't know what to say. It's an idea. - Omegatron 19:59, 23 Jan 2005 (UTC)
- I appreciate that it seems feasible to you, but then you freely admit that you don't know what you're talking about. ;) I'm sorry to say it's not as easy as it sounds the way you describe it.
- Even if we did know who contributed a given link, forever dooming a particular link just because it was added by an anonymous editor will have annoying consequences. We'd see people removing links and re-adding them to try to remove the nofollow, producing further editing congestion. And remember, linkspam bots are capable of creating accounts and logging in; they regularly do so today on wikis and blogs where no anonymous contributions are allowed.
- A slightly more feasible alternative is to compare the prior revision with the current revision and to mark as nofollow any external URL links which do not appear in the prior revision. (This doubles the expense of rendering a page by requiring a load and parse of the previous revision.) This however would still be subject to spamming, as spambots may make multiple passes on a given wiki, leaving delicious spam links from prior revisions. --brion 23:44, 23 Jan 2005 (UTC)
- As was said, linkspam isn't much of a problem on well-patrolled wikis like en:wikipedia, but IMO nofollow should be used for ones that are not as closely watched. bdesham 01:20, 23 Jan 2005 (UTC)
- "Unattended" or "active" is a matter of degree. One solution could be for the software to use "nofollow" when the latest revision of an article is younger than a certain age, which site administrators could configure. Another possibility is to recognise the "patrolled" flag for those sites that have it enabled. --Zigger 02:56, 2005 Jan 23 (UTC)
- 216.177.2.144 03:41, 23 Jan 2005 (UTC) (en:User:Poccil)
- --APPER 04:14, 23 Jan 2005 (UTC)
- A compromise (that would require some programming work) would be to mark all new links as 'nofollow' and after a while (e.g. a week) convert them to regular links. That assumes that after a week every spam-link is detected and removed, so there is no incentive for spammers anymore. --217.226.21.178 12:56, 23 Jan 2005 (UTC)
- Do not jail 99% of the links because of 1% of short-lived spam. Wikipedia does not need this, although some other wikis might. --Avsa 16:48, 23 Jan 2005 (UTC)
- Thue 18:02, 23 Jan 2005 (UTC)
- Enable nofollow for all wikis until a quorum of users can be had, which requests otherwise. -Fennec 19:08, 23 Jan 2005 (UTC)
- Minh Nguyễn (talk, blog) – Maybe we should also look into ways to mark a link with nofollow manually, a la Robert Scoble and the carpet store. Possibly with an extra pipe, like we do with extra attributes for images. –21:05, 23 Jan 2005 (UTC)
- David Gerard 23:56, 23 Jan 2005 (UTC)
- I agree. This is the best compromise. Add the feature to MediaWiki, but it likely does not have to be enabled for Wikipedia itself. ALTHOUGH, I have seen dubious commercial links before on some of the education pages here (on pages like learning management system I think) that were there for months. So I also agree with Minh Nguyễn's suggestion above that on Wikipedia we be able to manually tag a link as nofollow when we aren't sure if a link should be outright deleted. (one way to do this now is to prepend Google's redirect URL to a link: http://www.google.com/url?sa=D&q= It has the same effect of denying added pagerank to the link.)
- --194.47.181.149 17:09, 24 Jan 2005 (UTC) (proposal below seems very interesting too....)
- Arj 17:55, 24 Jan 2005 (UTC)
- Michael Snow 18:31, 24 Jan 2005 (UTC)
- There are some resources I've found that were linked almost nowhere, and are very useful. Their google rank should be boosted if they're relevant external links. "Wikipedia isn't a web directory - we could remove every link without doing anything to reduce the quality of the encyclopedia entries. Perhaps we should, since we're not really in the business of being a directory. Jamesday 07:55, 24 Jan 2005 (UTC)" Not true; sometimes there are copyrighted sources that we can't include under any circumstances, but another site has them, and they add significantly to the article. Think Mickey Mouse in a country without fair use, or a rare and very informative photo for which the copyright owner pursues infringements vigorously, and would take us to court over fair use. --68.204.254.4 18:43, 24 Jan 2005 (UTC) (w:User:SPUI)
- We should absolutely discontinue the use of nofollow in the english wikipedia; we should be building the web, by linking to good sites and giving them credit, just like they should link to us. The option of rel="nofollow" should be in mediawiki, though, and we could think of using it for low-traffic wikis (and possibly all the Sandboxes?) ✏ Sverdrup 20:44, 24 Jan 2005 (UTC)
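brion's "slightly more feasible alternative" in the thread above (compare the prior revision with the current one and mark as nofollow any external URL that is new) can be sketched roughly as follows. This is a minimal illustration, not MediaWiki code: the URL pattern and the function names `external_links` and `links_needing_nofollow` are assumptions made for this example.

```python
import re

# Simplified pattern for external URLs in wikitext (an assumption for this sketch).
URL_PATTERN = re.compile(r'https?://[^\s\]<>"|]+')


def external_links(wikitext: str) -> set[str]:
    """Collect the external URLs that appear in one revision's text."""
    return set(URL_PATTERN.findall(wikitext))


def links_needing_nofollow(previous_text: str, current_text: str) -> set[str]:
    """URLs present in the current revision but absent from the prior one."""
    return external_links(current_text) - external_links(previous_text)


previous = "See [http://example.org/history History]."
current = "See [http://example.org/history History] and [http://spam.example.com/ pills]."
print(links_needing_nofollow(previous, current))

# A later, unrelated edit makes the spam link part of the "prior" revision,
# so it is no longer flagged.
later = current + " Minor copyedit."
print(links_needing_nofollow(current, later))
```

The second call returns an empty set, which is the weakness brion points out: once any further revision is saved, a spam link left in place looks like an established link.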
Possible compromise
I was thinking about possible compromises in this matter. One thing we could do is dynamically include nofollow on young links, while links more than, say, a week old would not have it. Then, short-lived links which are spam we haven't gotten around to removing yet would not affect rank, while "accepted" links that have stuck around will.
This would be a little tricky to do in the software, but it would basically amount to associating each link with the edit that created or edited that link, so that its date can be examined by the HTML translator. Dcoetzee 20:19, 23 Jan 2005 (UTC)
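As a rough sketch of the age-based idea above, assuming the software already knows when each link was added (the hard part, which this example does not solve), the renderer's decision could look like this. The name `render_external_link` and the seven-day default are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone
from html import escape

# Hypothetical default probation period; "say, a week" in the proposal above.
PROBATION = timedelta(days=7)


def render_external_link(url, label, added_at, now, probation=PROBATION):
    """Render an external link, adding rel="nofollow" while the link is still young."""
    is_young = (now - added_at) < probation
    rel = ' rel="nofollow"' if is_young else ''
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(label)}</a>'


now = datetime(2005, 1, 24, tzinfo=timezone.utc)
young = render_external_link("http://www.example.com/", "example", now - timedelta(days=2), now)
old = render_external_link("http://www.example.com/", "example", now - timedelta(days=10), now)
print(young)
print(old)
```

Because the decision is taken at render time, a cached page keeps whatever attribute it was rendered with until the page is rendered again.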
- Not sure if it's technically workable, but conceptually it sounds good. The only caveat I see is that the time period for acceptance should be a configuration setting for the wiki, so that it would cover the not so well attended wikis. --Stevietheman 22:12, 23 Jan 2005 (UTC)
- I do like that idea - David Gerard 23:56, 23 Jan 2005 (UTC)
- This sticks you with a big ugly link association table (doable if necessary) and either requires adding a cache expiration, an intermittent garbage collection process, or allowing pages to not necessarily remove the nofollow attribute until re-rendering of the page. (Google for instance would see the same cached HTML if a page isn't edited between runs, potentially including the expired rel="nofollow" attributes.) --brion 00:18, 24 Jan 2005 (UTC)
- That has unpleasant effects on cachability and places which display cached versions of pages. Something like noticing whether the edit is by an admin or a long-registered user with few admin reverts (reverts are an admin feature used for removing vandalism only) would make the link active immediately and we can presumably trust admins and such users not to add spam. Could also notice if the previous version of the page had the link and keep the tag around if no new link was added. That way the nofollow tag would only be present when little-known accounts or IPs added the link. "Long-registered" would presumably be configurable. Jamesday 07:55, 24 Jan 2005 (UTC)
This sort of thing has already been discussed and could be done. So could a variety of other things. What's hard to understand about this being an initial implementation, not the final one? Jamesday 07:55, 24 Jan 2005 (UTC)