This page will (hopefully!) tell you everything you need to know to set up Anti-Hammer protection on your web site. It's usually straightforward.
If you need help with any aspect of the setup, I am an email away.
Ensure your server is running at least PHP 5.1
Unzip the Anti-Hammer package
And drop the whole anti-hammer directory into your site somewhere, maybe inside /inc/ or something like that, though the root is just fine, too.
Make the Anti-Hammer directories writable
If you run PHP as CGI/*suexec, you can probably get off with doing nothing, so long as the directory is owned by your user account.
When running PHP as an Apache module, the easiest method is probably via FTP; simply set the permissions to world-writable (777). Or else, in a shell..
chmod -R 777 /path/to/anti-hammer/lists
chmod -R 777 /path/to/anti-hammer/sessions
NOTE: There is nothing inherently insecure about having a writable directory, even a world-writable directory. Anyone who tells you this is, by itself, a security issue on a modern web server, is deluded.
There are dozens of world-writable directories here at corz.org, and there have been for many years (I even have a public upload facility!). If this were an issue, the onslaught of "hacking" attempts that followed the erroneous mention of corz.org in the Moroccan national press (I'm talking thousands of attempts per day) would have been a total disaster. As it happened, the site did not blink.
Also note: There is no lists/ directory in the FREE version.
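Whichever method you use, it can be worth confirming that PHP itself can write to those directories. The following is only a sketch; the function name check_writable_dirs() and the example paths are placeholders of mine, not part of Anti-Hammer, so substitute your real paths.

```php
<?php
// Hypothetical helper (NOT part of Anti-Hammer): report whether PHP can
// write to each of the given directories.
function check_writable_dirs(array $dirs) {
    $report = array();
    foreach ($dirs as $dir) {
        // is_dir() catches typos in the path; is_writable() tests the
        // permissions as seen by the user account PHP runs under.
        $report[$dir] = is_dir($dir) && is_writable($dir);
    }
    return $report;
}

// Placeholder paths; replace with the real location of your anti-hammer directory.
$report = check_writable_dirs(array(
    '/path/to/anti-hammer/lists',
    '/path/to/anti-hammer/sessions',
));

foreach ($report as $dir => $ok) {
    echo $dir, ': ', ($ok ? 'writable' : 'NOT writable'), "\n";
}
```

Run it through the web server (not just from a shell), as the web server's PHP may run under a different user. With the FREE version, drop the lists entry, as that directory does not exist.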
Set your Anti-Hammer preferences
They live inside anti-hammer.php. Edit them in a decent text editor, by which I mean one with syntax highlighting.
Set up PHP auto_prepend
Anti-Hammer needs to run as a PHP "auto-prepend", so it runs before your pages do. To achieve this magic, add the following command to your site's main (root) .htaccess file..
php_value auto_prepend_file "/full/real/server/path/to/anti-hammer.php"
..replacing the path with the actual path, of course.
If PHP runs as CGI/*suexec/FastCGI on your site (or if the .htaccess method brings up a 500 error!), or you have global control, do this in your site's global/local php.ini file (or .user.ini file, in a per-site configuration), instead..
auto_prepend_file = "/full/real/server/path/to/anti-hammer.php"
If you don't have a php.ini (or .user.ini), simply create one!
NOTE: You usually need to use the FULL, REAL path on the server*. If your site is in /var/www/vhosts/mydomain.com/httpdocs/ then you need to add ALL that. Run a phpinfo(); command on your site to discover the path to your web site.
* Some servers won't mind if you use a local path, e.g. "./path/to/anti-hammer.php", but as they say, YMMV.
If that sounds too complex, or you just prefer better, more interesting methods, grab (and use) debug-report.zip.
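Yet another way to get at the full, real path is to let PHP tell you where a file actually lives. This is a rough sketch only; prepend_path() is a hypothetical helper of mine (not something shipped with Anti-Hammer), and it assumes you save the file inside your anti-hammer directory.

```php
<?php
// Hypothetical helper (NOT part of Anti-Hammer): build the value you would
// give to auto_prepend_file, starting from a directory on the server.
function prepend_path($dir, $file = 'anti-hammer.php') {
    // realpath() resolves symlinks and ../ segments, and returns false
    // when the directory does not exist.
    $real = realpath($dir);
    if ($real === false) {
        return false;
    }
    return rtrim($real, '/\\') . DIRECTORY_SEPARATOR . $file;
}

// __DIR__ is the directory this very file lives in.
echo prepend_path(__DIR__), "\n";
```

Load it in your browser, copy the path it prints, then delete the file; there is no need to leave it lying around.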
Once auto_prepend is in place, before any PHP file on your site is served to a client (web browser, spider, bot, any client), Anti-Hammer runs, interrogating the client's hammer status, and acting accordingly, either passing control directly back to the requested page, or halting the request in its tracks, with a terse warning.
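To make that pass-or-halt decision a little more concrete, here is a heavily simplified sketch. It is NOT Anti-Hammer's actual code; hammer_check() and its return values are my own inventions for illustration, and it assumes the minimum gap between hits is given in hundredths of a second, as the hammer_time preference is.

```php
<?php
// Simplified, hypothetical decision logic; NOT Anti-Hammer's real code.
// $last_hit and $now are timestamps in seconds (e.g. from microtime(true));
// $hammer_time is the minimum gap between hits, in hundredths of a second.
function hammer_check($last_hit, $now, $hammer_time) {
    if ($last_hit === null) {
        return 'pass';                      // first request seen from this client
    }
    $elapsed = ($now - $last_hit) * 100;    // seconds -> hundredths of a second
    return ($elapsed < $hammer_time) ? 'halt' : 'pass';
}

echo hammer_check(null, 10.0, 100), "\n";   // first visit
echo hammer_check(10.0, 10.4, 100), "\n";   // 0.4 seconds after the last hit
echo hammer_check(10.0, 11.5, 100), "\n";   // 1.5 seconds after the last hit
```

With a hammer_time of 100 (one second), the first and third requests pass and the second is halted. The real script does more than this (it keeps a hammer count per client, for one thing), but the basic shape is the same: check first, then either hand control back to the page or stop.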
To test all this, simply install Anti-Hammer and load your front page, then refresh it repeatedly, over and over, quickly, like bots do. Careful now! You will get banned!
Anti-Hammer also comes with a handy hammer-test page you can use to check everything is working as expected.
(allowing certain known clients special privileges)
The big advantage of preventing bots (and people!) from clobbering your website and overloading your server, is that you have more resources freed up for valid clients..
If you want, you can choose to allow certain clients (usually known friendly spiders and bots) to bypass Anti-Hammer altogether, or alternatively, hammer at a faster rate. If you do, you will be using exemptions.ini, which lives in the exemptions/ directory (along with the IP lists). It is a standard plain text .ini file containing a list of pairs; a known User Agent string, and the text file in which to find its IP/Mask information.
Here's a slightly chopped-down example version..
[exemptions]
Mozilla/5.0 (compatible; Googlebot=google.txt
Googlebot=google.txt
gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA=google.txt
msnbot=msn.txt
MSNBOT=msn.txt
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search=msn.txt
Scooter/3.3Y!CrawlX=altavista.txt
Scooter=inktomi.txt
Yahoo=inktomi.txt
slurp=inktomi.txt
Excite=excite.txt
Infoseek=infoseek.txt
Lycos_Spider=lycos.txt
NorthernLight=northernlight.txt
Mozilla/2.0 (compatible; Ask=askjeeves.txt
teoma_agent1=askjeeves.txt
On the left (of the "=" sign), is the expected User Agent string. This can be a partial match, but it must match from the very first character of the client's user agent string. Ideally, you want to roll as many variations as possible into a single line, without being so generic as to pull in every client under the Sun and create needless processing overhead (certain Yahoo! and msn bots post only "Mozilla/4.0", for example. They can meet the Anti-Hammer like everyone else!), but still retain enough information to positively identify a particular client.
For example, the string "Yahoo" will match all the following bots:
Yahoo-Blogs/v3.9 (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.com/help/us/ysearch/crawling/crawling-02.html )
Yahoo-MMAudVid/1.0 (mms dash mmaudvidcrawler dash support at yahoo dash inc dot com)
Yahoo-MMCrawler/3.x (mms dash mmcrawler dash support at yahoo dash inc dot com)
YahooFeedSeeker/1.0 (compatible; Mozilla 4.0; MSIE 5.5; my.yahoo.com/s/publishers.html)
YahooSeeker-Testing/v3.9 (compatible; Mozilla 4.0; MSIE 5.5; http://search.yahoo.com/)
YahooSeeker/1.1 (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.com/help/us/shop/merchant/)
YahooSeeker/1.2 (compatible; Mozilla 4.0; MSIE 5.5; yahooseeker at yahoo-inc dot com ; http://help.yahoo.com/help/us/shop/merchant/)
YahooSeeker/CafeKelsa-dev (compatible; Konqueror/3.2; FreeBSD ;email@example.com ) (KHTML, like Gecko)
Similarly, many Googlebots are matched against the simple word, "Googlebot". If your user agent string is a tad generic, and matches against a client that isn't the expected bot, it's not a problem; Anti-Hammer won't find them in the specified IP list and continues as normal. It's designed this way to catch clients pretending to be known bots, of which there are a surprising number.
NOTE: User agent strings are checked in order, and ini file processing halts as soon as a match is found. Note the two "Scooter" entries; if the Yahoo! version were listed before the AltaVista version, the AltaVista bot would never be allowed an exemption, as Anti-Hammer would always be looking inside inktomi.txt for its IP information.
NOTE: Matches are CaSe SeNsITiVE! If you want to match "msnbot" and "MSNBOT", you need two entries. Why? Because in tests, a case-insensitive match is at least three times slower than a Case Sensitive match. So make a second entry!
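The three matching rules above (match from the first character, first match wins, case-sensitive) can be sketched in a few lines. This is an illustration only, not the code inside anti-hammer.php; find_exemption_list() is a hypothetical function, and the hand-built array stands in for a few lines of exemptions.ini.

```php
<?php
// Illustrative stand-in for a few exemptions.ini entries, kept IN ORDER.
$exemptions = array(
    array('Mozilla/5.0 (compatible; Googlebot', 'google.txt'),
    array('Googlebot',                          'google.txt'),
    array('msnbot',                             'msn.txt'),
    array('MSNBOT',                             'msn.txt'),
    array('Scooter/3.3Y!CrawlX',                'altavista.txt'),
    array('Scooter',                            'inktomi.txt'),
    array('Yahoo',                              'inktomi.txt'),
);

// Hypothetical lookup: return the IP list file for the first entry whose
// string matches the START of the client's user agent, or null if none does.
function find_exemption_list($user_agent, array $exemptions) {
    foreach ($exemptions as $pair) {
        list($prefix, $list_file) = $pair;
        if ($prefix === '') {
            continue;   // an empty entry would otherwise match every client
        }
        // strncmp() is case-sensitive and compares only the first
        // strlen($prefix) bytes, i.e. a match from the very first character.
        if (strncmp($user_agent, $prefix, strlen($prefix)) === 0) {
            return $list_file;   // first match wins; stop looking
        }
    }
    return null;
}

echo find_exemption_list('Scooter/3.3Y!CrawlX', $exemptions), "\n";
echo find_exemption_list('YahooSeeker/1.1 (compatible; Mozilla 4.0; MSIE 5.5)', $exemptions), "\n";
```

Swap the two "Scooter" rows and the first echo would give inktomi.txt, which is exactly the ordering trap described above. A match here only selects which IP list to consult; the client's IP still has to be found in that list.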
On the right is the text file to look at for IP/Mask information; that is, where the specified user agent is expected to be making requests FROM. It's the standard Spider IP list format, one IP/Mask per line.
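For the curious, checking an IP against an IP/Mask entry boils down to a bitwise AND. The sketch below is my own and makes assumptions worth stating loudly: I am guessing that an entry looks like either 66.249.64.0/19 (CIDR) or 66.249.64.0/255.255.224.0 (dotted mask), it handles IPv4 only, it assumes 64-bit PHP, and ip_matches() and ip_in_list() are hypothetical names, not functions from Anti-Hammer.

```php
<?php
// Hypothetical IPv4 check (NOT Anti-Hammer's code). ASSUMED entry formats:
//   "a.b.c.d"            exact address
//   "a.b.c.d/19"         CIDR prefix length
//   "a.b.c.d/m.m.m.m"    dotted netmask
function ip_matches($ip, $entry) {
    $parts = explode('/', trim($entry), 2);
    $net   = ip2long($parts[0]);
    $addr  = ip2long($ip);
    if ($net === false || $addr === false) {
        return false;                        // not a valid IPv4 address
    }
    if (!isset($parts[1])) {
        return $addr === $net;               // no mask: exact match only
    }
    if (strpos($parts[1], '.') !== false) {
        $mask = ip2long($parts[1]);          // dotted netmask
        if ($mask === false) {
            return false;
        }
    } elseif (preg_match('/^\d+$/', $parts[1]) && (int) $parts[1] <= 32) {
        $bits = (int) $parts[1];             // CIDR prefix length
        $mask = ($bits === 0) ? 0 : ((~0 << (32 - $bits)) & 0xFFFFFFFF);
    } else {
        return false;                        // unrecognised mask
    }
    // Same network if the masked bits are identical.
    return ($addr & $mask) === ($net & $mask);
}

// Scan a list (one entry per line), skipping blank lines.
function ip_in_list($ip, array $lines) {
    foreach ($lines as $line) {
        if (trim($line) !== '' && ip_matches($ip, $line)) {
            return true;
        }
    }
    return false;
}

var_dump(ip_in_list('66.249.66.1', array('', '66.249.64.0/19')));
```

Check the list files in your own copy for the format they really use before borrowing any of this.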
I've included the most recent lists in the Anti-Hammer zip package (and have started to add to and improve them with updated information), in place and ready-to-go, along with an exemptions.ini file already set up to handle the major friendly spiders.
Remember, you don't need to add all the bots, or even any bots; only bots, spiders, and other clients that you wish to give special privileges to. Even they shouldn't be hammering, really!
If you wish to set a special rate for known clients, rather than allow them to simply bypass Anti-Hammer, all you do is switch the "true" in your allow_bots preference (which can be considered "infinite hammer_time") for an integer (aka. plain number) representing hundredths of a second, just like the regular hammer_time..
$anti_hammer['allow_bots'] = 50;
A value of 50 would enable two-hits-per-second spidering, but nothing faster; that is half the normal hammer_time of one second ($anti_hammer['hammer_time'] = 100;).
Effectively we have two available hammer rates; one for known good clients, and one for everyone else.
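The arithmetic is simple enough: divide the preference by 100 to get the minimum gap in seconds. Here is a small sketch of how the two rates might be selected. min_interval_seconds() is a hypothetical helper of mine, and how the real script treats an allow_bots of false is an assumption (here it simply falls back to the regular rate).

```php
<?php
// The two preferences from above, in hundredths of a second.
$anti_hammer = array(
    'hammer_time' => 100,   // everyone else: one hit per second
    'allow_bots'  => 50,    // known bots: two hits per second (or true to bypass)
);

// Hypothetical helper (NOT Anti-Hammer's code): minimum gap between hits,
// in seconds, for a given client. 0.0 means "no limit at all".
function min_interval_seconds(array $prefs, $is_known_bot) {
    if ($is_known_bot) {
        if ($prefs['allow_bots'] === true) {
            return 0.0;                           // full bypass
        }
        if (is_int($prefs['allow_bots'])) {
            return $prefs['allow_bots'] / 100.0;  // special bot rate
        }
        // ASSUMPTION: anything else (e.g. false) means no special treatment.
    }
    return $prefs['hammer_time'] / 100.0;         // regular rate
}

echo min_interval_seconds($anti_hammer, true), "\n";    // known bot
echo min_interval_seconds($anti_hammer, false), "\n";   // everyone else
```

With the values shown, a known bot must leave half a second between hits and everyone else a full second.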
While I'm here I should add, there's also the facility to enable one correctly configured browser to bypass Anti-Hammer at all times. This is designed for busy webmasters who sometimes, in the course of their daily activities, will need to hammer their own site. I know I do!
This setting ("admin_agent_string"), along with many other settings, can be found in the preferences section inside anti-hammer.php. Essentially, you tag a unique string onto the end of your browser's User Agent string (perhaps with user-agent-switcher), so that Anti-Hammer can recognize you as you. It's not high-security, but it is handy. I've used a similar approach to avoid logging my own hits for years.
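In spirit, the check is nothing more than "does this User Agent end with my tag?". The sketch below assumes exactly that (a suffix comparison); is_admin_agent() and the example tag are made up for illustration, and the real script may well compare differently.

```php
<?php
// Hypothetical check (NOT Anti-Hammer's code): does the user agent END
// with the admin's unique tag? Case-sensitive.
function is_admin_agent($user_agent, $admin_agent_string) {
    $len = strlen($admin_agent_string);
    if ($len === 0) {
        return false;   // an empty tag must never match every client
    }
    return substr($user_agent, -$len) === $admin_agent_string;
}

$tag = 'my-secret-tag-4711';   // made-up example; pick your own unique string

var_dump(is_admin_agent('Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0 ' . $tag, $tag));
var_dump(is_admin_agent('Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0', $tag));
```

Anyone who learns the tag can send it too, which is why this is a convenience and not a security feature.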
Not requiring that the client send back the ID potentially has one undesirable side-effect..
If two clients share the same IP (perhaps a proxy) and are using perfectly identical browsers (in every way, down to the user's locale), and are browsing your site at the exact same time, and view a page within one second of each other (or whatever you set the hammer_time to), it is possible that they may unwittingly increment each other's hammer count!
Clearly this would be a rare situation, but still, good to know.
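To see why, imagine the ID being derived purely from what the server can observe about a request. The following is a guess at the general idea, not the real thing; client_id(), the choice of md5() and the exact ingredients (IP, User Agent, Accept-Language) are all assumptions of mine.

```php
<?php
// Hypothetical ID derivation (NOT Anti-Hammer's code): hash together the
// things a server can see without the client sending anything back.
function client_id($ip, $user_agent, $accept_language) {
    return md5($ip . '|' . $user_agent . '|' . $accept_language);
}

$ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0';

// Two different people behind the same proxy, with identical browsers and locale..
$alice = client_id('203.0.113.7', $ua, 'en-GB');
$bob   = client_id('203.0.113.7', $ua, 'en-GB');

// ..and a third with a different locale.
$carol = client_id('203.0.113.7', $ua, 'fr-FR');

var_dump($alice === $bob);     // same ID, so they would share a hammer count
var_dump($alice === $carol);   // any difference at all gives a separate ID
```

Any scheme along these lines cannot tell the first two clients apart, hence the shared hammer count.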
Upgrading from Free to Pro
If you are using a recent version of Anti-Hammer FREE (0.9.3+), it's a simple drop-in replacement.
You will need to copy over your preferences from the old version, which should only take a minute or two.
If you are using an older version of Anti-Hammer FREE, you will need to check your sessions path preference, to ensure it is pointing to the correct directory. Everything else should work as expected (once you copy over your prefs).
If you have a question, feel free to leave a comment, below. I don't expect it to get too busy; Anti-Hammer usually just works. If you think you have found a bug, please mail me about it, with full details, preferably attaching your script to the mail. Thanks!