Acceptable use of the Diseases Database web site

Summary

The Diseases Database does not require registration, CAPTCHAs or similar hurdles before access. This open access policy exposes the Diseases Database site to potentially disabling levels of non-human ("robot") traffic.

We have mechanisms to minimise the impact of dozens of (otherwise) sustained automated "attacks" a day. We cannot always prevent an occasional human becoming caught in the net.

Collateral damage?

You are probably reading this because you are a responsible Diseases Database user whose access has become blocked by accident. Please contact us if you wish to be whitelisted.

Site defences may block innocent users who share a proxy server or netblock with bot-infected machines, hackers or abusers. Further explanation follows below.

Automated web page downloading applications (robots or 'bots') can cause sites like ours serious difficulty. Our service levels depend on blocking them. Protection mechanisms can deny access to entire internet service providers or institutions (including prominent educational and healthcare institutions) when these are the origin of denial of service attacks.
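
As a purely hypothetical illustration (we do not publish our real configuration, and the netblock shown is a reserved documentation range), denying a whole netblock in a web server such as nginx takes only a few lines:

    # Hypothetical nginx fragment: refuse every address in an
    # abusive netblock; 192.0.2.0/24 is a documentation range.
    location / {
        deny 192.0.2.0/24;
        allow all;
    }

The blunt scope of such a rule is exactly why innocent users who share an address range with abusers can be caught in it.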

The Diseases Database generates over 100,000 unique pages, and bandwidth on our side is limited. Attempts to rapidly download all (or a large proportion) of this content impair access for all other users.
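
A rough worked example, using purely illustrative figures rather than measurements: at an average of 30 KB per page, 100,000 pages amount to roughly 3 GB. A single client saturating a 10 Mbit/s uplink to fetch all of it would degrade the site for everyone else for about 40 minutes (3 GB × 8 bits/byte ÷ 10 Mbit/s ≈ 2,400 seconds).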

Our site would suffer dozens of outages per day if unprotected. This issue is not unique to smaller sites: the 'brown-outs' and intermittent losses of service often experienced on prominent sites are frequently the consequence of malware robots.

Denial of service is seldom the intent of badly designed robot software but is often the outcome. It is not clever to bring down web sites, and doing so might even hinder the malware author's purpose. In the unlikely event a miscreant reads this, we ask them please to desist. The average home broadband user will typically have more bandwidth than our small business web server. [ISPs demand a premium for reliability / service level agreements, which makes the cost per kilobit/sec far higher than that of standard domestic broadband.]


Offline readers

Please do not attempt to download the Diseases Database web site for offline use.

See also http://en.wikipedia.org/wiki/Offline_reader

Spam bots

We block the bots behind comment spam, referrer spam and spam sent to our published e-mail addresses. The spam itself is antisocial, but the main problem is that the associated bots rip through the entire site while either harvesting addresses or delivering payloads.

Web accelerators / prefetchers

This class of software pre-emptively downloads all links from a page 'in case' you want to look at them: see http://en.wikipedia.org/wiki/Link_prefetching
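
Browsers honour explicit prefetch hints such as the standard markup below (the URL is a placeholder); accelerator software goes further and fetches every link on a page, hinted or not:

    <!-- Standard HTML prefetch hint; href is a placeholder -->
    <link rel="prefetch" href="http://example.com/next-page">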

The principle of accelerators is arguably bogus: if you have good bandwidth you don't need one; if you have poor bandwidth an accelerator wastes it, and it often wrongly anticipates the user's next action.

Please turn pre-fetchers off and do yourself, us and the web in general a favour. The Diseases Database is engineered to be highly responsive to individual requests. While our pages are small, some have several hundred internal links, and downloading all of them near-simultaneously can saturate our bandwidth.

Anonymous and open proxies

Several anonymous and/or open proxy services are blocked by default.

Please access the Diseases Database site directly or via conventional proxy servers. If your government or employer objects to our content, abandon your computer and flee now - you have bigger problems to deal with =:-0

We acknowledge legitimate uses for anonymous proxies. However, audits of Diseases Database logs consistently reveal that nearly all of the many accesses from open and/or anonymous proxies (for example Tor) are made by illegitimate bots: largely comment or referrer spammers.

Illegitimate and dysfunctional search engine spiders

The major search engine robots (Google, Yahoo, Bing, Ask etc.) seldom consume excessive bandwidth, for two reasons:

  1. They employ very moderate page impression rates and/or respect Crawl-delay directives in robots.txt.
  2. They obey the robots meta tag directives embedded in each of our web pages; these shrink the pool of indexable pages in the Diseases Database from over 100,000 to under 10,000. (Both mechanisms are illustrated below.)
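
For illustration, the two mechanisms look like this; the directive names are standard, but the values shown are hypothetical rather than our actual settings. In robots.txt, a crawl-rate request:

    User-agent: *
    Crawl-delay: 10

And in a page's HTML head, a robots meta tag that keeps that page out of search indexes while still letting crawlers follow its links:

    <meta name="robots" content="noindex, follow">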

Spiders which we consider irrelevant, or which misbehave, are banned politely via the robots.txt file, but only when their operator maintains a web site explicitly stating the 'name' their robot recognises in robots.txt. Otherwise (or if robots.txt is ignored) blocking mechanisms apply.
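
A polite robots.txt ban takes the form below; 'ExampleBot' is a hypothetical user-agent token, not a real crawler we name:

    User-agent: ExampleBot
    Disallow: /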

Accidental blocks and whitelisting

PCs can be infected by malware. Thank you for ensuring your PC is not host to it.

Otherwise, if you are blocked, you probably follow in the wake of abuse by another user or machine. Normal browsing activity should seldom trigger our defences.

Some language translation services appear to "read ahead" like web accelerators and thus trigger defences. Similarly, proxy servers may be configured to pre-fetch (some proxies for mobile phones appear to do this). However, both facilities are frequently hijacked as portals for hacking and malware attacks, making it hard to distinguish legitimate requests.

We welcome reports of "false positive" blocks. We have both the means and strong motivation to fix or work around these. [Contact e-mail address displayed as an image.]

©1998-2023 MOOSe Technology. Last updated 6th December 2023.
