On 03/01/13 06:24, Andy Smith wrote:
Hello,
What are the list's preferred techniques for preventing spidering of a
web application (in this case MediaWiki) by misguided web robots?
robots.txt already in place, but they ignore that of course.
Ideally Apache-based.
I don't particularly care whether they are still able to download the
content; I just don't want them taking up every single process slot
and thereby impacting non-abusive 'real' web clients. So a
rate-limiting solution would be acceptable.
Cheers,
Andy
I have read some great stuff on perishablepress.com about blocking
misguided/bad robots, using Apache and mod_rewrite if I remember correctly.
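From memory the approach there boils down to matching known-bad
User-Agent strings with mod_rewrite and returning 403. A rough,
untested sketch of that idea (the bot names below are just
placeholders; substitute whatever actually shows up in your access
logs):

    # Deny requests whose User-Agent matches a list of known-bad bots.
    # Bot names are placeholders; adjust to match your own logs.
    <IfModule mod_rewrite.c>
        RewriteEngine On
        RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilCrawler|GreedySpider) [NC]
        RewriteRule .* - [F,L]
    </IfModule>

That blocks outright rather than rate-limits, but it does stop the
named bots from tying up process slots with full page renders.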
--