Preventing spidering of web applications - BitFolk Users

3 Jan 2013

Hello,

What's the list's preferred techniques for preventing spidering of a
web application (in this case Mediawiki) by misguided web robots?

robots.txt already in place, but they ignore that of course.

Ideally Apache-based.

I don't particularly care if they are still able to download the
content or not, I just don't want them taking up every single
process slot thus impacting non-abusive 'real' web clients. So a
rate-limiting solution would be acceptable.

Cheers,
Andy