Re: [bitfolk] Preventing spidering of web applications

Author: Simon
Date:  
To: users
Subject: Re: [bitfolk] Preventing spidering of web applications
On 03/01/13 06:24, Andy Smith wrote:
> Hello,
>
> What are the list's preferred techniques for preventing spidering of a
> web application (in this case MediaWiki) by misguided web robots?
>
> robots.txt already in place, but they ignore that of course.
>
> Ideally Apache-based.
>
> I don't particularly care if they are still able to download the
> content or not, I just don't want them taking up every single
> process slot thus impacting non-abusive 'real' web clients. So a
> rate-limiting solution would be acceptable.
>
> Cheers,
> Andy
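
Since rate limiting would be acceptable, here is a minimal Python sketch of the per-client sliding-window check that such a limiter applies. It is not an Apache module. The class name, the limits, and keying on client IP are illustrative assumptions. Inside Apache the counters would also have to be shared between worker processes rather than held in one Python dict.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window` seconds for each client key.

    Illustrative sketch only: keying on client IP and the limits used
    are assumptions, and state lives in this one process.
    """

    def __init__(self, max_requests, window, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        # client key -> timestamps of that client's recently allowed requests
        self.hits = defaultdict(deque)

    def allow(self, client):
        now = self.clock()
        recent = self.hits[client]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            # Over the limit: the server would answer e.g. 429 or 503
            # instead of tying up a process slot on a full page render.
            return False
        recent.append(now)
        return True
```

With SlidingWindowLimiter(3, 10.0), a fourth request from the same address inside ten seconds is refused, other addresses are unaffected, and the client is allowed again once the window has passed.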

I have read some great stuff on perishablepress.com about blocking
misguided/bad robots using Apache and mod_rewrite, if I remember correctly.
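
In mod_rewrite the usual shape is a RewriteCond on %{HTTP_USER_AGENT} with the [NC] (case-insensitive) flag, followed by a RewriteRule with the [F] (403 Forbidden) flag. The Python below only sketches that matching logic. It is not the perishablepress.com rule set, and the bot names in it are hypothetical placeholders.

```python
import re

# Hypothetical placeholder names. A real list would hold the User-Agent
# substrings actually seen misbehaving in your access logs.
BAD_AGENTS = ["ExampleBot", "BadCrawler"]

# Case-insensitive alternation, like a RewriteCond pattern with [NC].
BAD_AGENT_RE = re.compile(
    "|".join(re.escape(agent) for agent in BAD_AGENTS), re.IGNORECASE
)


def status_for(user_agent):
    """Return 403 for a blocklisted User-Agent (like RewriteRule .* - [F]), else 200."""
    if user_agent and BAD_AGENT_RE.search(user_agent):
        return 403
    return 200
```

Note that a robot which already ignores robots.txt can just as easily send a browser-like User-Agent, so this only catches the ones that identify themselves. Rate limiting still applies to those that do not.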
--