[bitfolk] Preventing spidering of web applications

Author: Andy Smith
Date:  
To: users
Subject: [bitfolk] Preventing spidering of web applications

Hello,

What are the list's preferred techniques for preventing spidering of a
web application (in this case MediaWiki) by misguided web robots?

A robots.txt is already in place, but of course they ignore it.
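
(For reference, I mean a blanket file along these lines; this is a
simplified sketch rather than the exact contents:

    # Ask all crawlers to stay out of the whole site
    User-agent: *
    Disallow: /

Well-behaved crawlers honour that; these robots don't.)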

Ideally Apache-based.

I don't particularly care whether they are still able to download
the content or not; I just don't want them taking up every single
process slot and thus impacting non-abusive 'real' web clients. So a
rate-limiting solution would be acceptable.
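
For instance, something like mod_evasive seems to be the sort of
thing. A rough sketch only (module and thresholds picked purely for
illustration, not a config I am actually running):

    <IfModule mod_evasive20.c>
        # Temporarily refuse clients that request the same URL more
        # than 5 times in a second, or more than 50 URLs site-wide
        # in a second, for 60 seconds.
        DOSHashTableSize    3097
        DOSPageCount        5
        DOSPageInterval     1
        DOSSiteCount        50
        DOSSiteInterval     1
        DOSBlockingPeriod   60
    </IfModule>

though I'd be interested to hear what has actually worked for people
here.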

Cheers,
Andy