No one's getting past..

Anti-Hammer!

Automatically ban web site hammers! Protect your valuable server resources for genuine clients.

Anti-Hammer is a php script that runs before your pages do, watching. As requests arrive, Anti-Hammer checks how long it's been since that client's last request. If a reasonable amount of time has passed, the page is served as usual. But if not, their "Hammer Count" is increased. Uh oh!

When the hammer count reaches preset levels, their hammering is suspended, and instead of the page, they get a cute message (read: warning), and must wait X seconds before trying again.

The more they hammer, the longer they have to wait, incrementally. Simple.

You can even set an absolute cut-off point, beyond which they simply get a blank page, nothing, until their ban lifts (hours later).
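
To make the idea concrete, here's a rough sketch of the counting logic described above. It is not Anti-Hammer's actual code; the function name, the array keys and the trigger, cut-off and step numbers are all invented for illustration. Times are in 1/100ths of a second.

```php
<?php
// Hypothetical sketch of the hammer-count idea; NOT the real Anti-Hammer source.
// $client is array('last' => time of last request, 'count' => hammer count).
// $now and $hammer_time are in 1/100ths of a second; $step is in seconds.
function check_hammer(array $client, $now, $hammer_time = 100, $trigger = 3, $cut_off = 10, $step = 5)
{
    $too_soon = ($now - $client['last']) < $hammer_time;
    $client['last'] = $now;

    if (!$too_soon) {
        return array($client, 'serve', 0);      // reasonable gap: serve as usual
    }

    $client['count']++;                         // too soon: that's a hammer

    if ($client['count'] >= $cut_off) {
        return array($client, 'blank', 0);      // absolute cut-off: blank page
    }
    if ($client['count'] >= $trigger) {
        // the wait grows by one $step for every hammer past the trigger
        $wait = ($client['count'] - $trigger + 1) * $step;
        return array($client, 'warn', $wait);
    }
    return array($client, 'serve', 0);          // below the trigger: let it slide
}
```

With these made-up numbers, the third too-quick request earns a five second wait, the fourth ten seconds, and so on. Actually enforcing the wait (and lifting the ban later) is left out of the sketch.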

Everything is configurable.
 

No Way Around Anti-Hammer!

Anti-Hammer uses its own php-session-like client tracking mechanism..

This works very much like php sessions, except it works for ALL clients, regardless of their advertised capabilities, and regardless of whether or not they have cookies enabled. Oh yes! You can even Anti-Hammer the GoogleBot! Not that you would want or need to; it's a rather well-behaved bot.

Rather than wait for some session id to come back (that would be on the second request, you see, and we haven't even sent one yet), Anti-Hammer uses a mix of available client properties to create a unique client id there-and-then, and from that point, recognizes the client by this id (which is an MD5 of all that data concatenated together). It's pretty similar to the way a php_session is created, except Anti-Hammer doesn't need the browser to send it back.
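
As a rough illustration of that id-making step, something like the following would do the job. Exactly which client properties the real script mixes together isn't listed here, so the three used below (IP address, User Agent, accepted language) are an assumption, as is the function name.

```php
<?php
// Hypothetical sketch; the choice of properties is an ASSUMPTION,
// not taken from the Anti-Hammer source.
function make_client_id(array $server)
{
    $keys = array('REMOTE_ADDR', 'HTTP_USER_AGENT', 'HTTP_ACCEPT_LANGUAGE');
    $data = '';
    foreach ($keys as $key) {
        // a missing header simply contributes nothing to the mix
        $data .= isset($server[$key]) ? $server[$key] : '';
    }
    return md5($data);   // 32 hex characters
}

// In a real request you would pass in the $_SERVER array:
// $id = make_client_id($_SERVER);
```

Nothing needs to come back from the browser; the same client sending the same details gets the same id on every request.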

Anti-Hammer's storage mechanism (a serialized array in a flat file) is the same as a php session, too. And like a php session, it is anonymous; aside from the hammer time info, we store no other data server-side.
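
In php terms, that sort of storage can be as small as this. It's a minimal sketch only; the function names, the one-file-per-client layout and the 'last'/'count' keys are assumptions made for the example.

```php
<?php
// Hypothetical sketch of php-session-style flat-file storage.
// One file per client id is an ASSUMPTION for this example.
function save_client($dir, $id, array $data)
{
    // LOCK_EX stops two simultaneous requests scribbling over each other
    return file_put_contents($dir . '/' . $id, serialize($data), LOCK_EX) !== false;
}

function load_client($dir, $id)
{
    $fresh = array('last' => 0, 'count' => 0);   // a client we've never seen
    $file = $dir . '/' . $id;
    if (!is_file($file)) {
        return $fresh;
    }
    $data = unserialize(file_get_contents($file));
    return is_array($data) ? $data : $fresh;     // damaged file: start over
}
```

Only the hammer times and counts go into the file, filed under an MD5 id; nothing else about the client is kept.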

Unless you want that..

Anti-Hammer also comes with a mechanism to allow certain bots and other friendly spidering entities (matching specific criteria, including a known IP address/range), usually search engine spiders, to pass clean through Anti-Hammer, if required, or alternatively, allow them a faster hammer rate.

Did I mention everything is configurable?
 

Quick-Start Guide:

 

Unzip the Anti-Hammer package..

And drop anti-hammer.php and the anti-hammer directory into your site somewhere together, maybe inside /includes/ or /inc/ or something like that.
 

Make the anti-hammer/ directory writable..

If you run php as a cgi/*suexec, you can probably get away with doing nothing, so long as the directory is owned by your user account. For everyone else, the easiest method is probably via ftp; simply set its permissions to world-writable (777). Or else in a shell..
chmod 777 /path/to/anti-hammer
 

Set your Anti-Hammer preferences..

They live inside anti-hammer.php. Open it in a decent text editor, by which I mean one with syntax highlighting.
 

Setup php auto_prepend..

Anti-Hammer needs to run as a php "auto-prepend", so it runs before your pages do. To achieve this magic, add the following command to your site's main (root) .htaccess file..

php_value auto_prepend_file "/full/real/server/path/to/anti-hammer.php"

..replacing the path with the actual path, of course. If php runs as cgi/*suexec on your site, or you have global control, do this in your site's global/local php.ini, instead ..

auto_prepend_file = "/full/real/server/path/to/anti-hammer.php"

NOTE: You need to use the FULL, REAL path on the server. If your site is in /var/www/vhosts/mydomain.com/httpdocs/ then you need to add ALL that. Run a phpinfo(); command on your site to find the real path ("DOCUMENT_ROOT").
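
If you'd rather not wade through phpinfo() output, a tiny throwaway script can work out the lines for you. This is just a sketch, not part of the Anti-Hammer package; the prepend_lines() helper is made up, and it assumes you drop the file into the same directory as anti-hammer.php.

```php
<?php
// Throwaway helper (hypothetical, not shipped with Anti-Hammer).
// ASSUMES this file sits in the same directory as anti-hammer.php.
function prepend_lines($script_path)
{
    return 'php_value auto_prepend_file "' . $script_path . '"' . "\n"   // for .htaccess
         . 'auto_prepend_file = "' . $script_path . '"' . "\n";          // for php.ini
}

if (!headers_sent()) {
    header('Content-Type: text/plain');   // keep the line breaks in a browser
}
// __DIR__ is the full, real server path of this file's directory
echo prepend_lines(__DIR__ . '/anti-hammer.php');
```

Load it once in your browser, copy whichever line suits your setup, then delete the file.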

If that sounds too complex, or you just prefer better, more interesting methods, grab (and use) debug-report.zip, from here..

 

You're done!

Once the auto_prepend is in place, before any php file on your site is served to a client (web browser, spider, bot, any client), Anti-Hammer runs, interrogating the client's hammer status, and acting accordingly, either passing control directly back to the requested page, or halting the request in its tracks, with a terse warning.

To test all this, simply install Anti-Hammer and load your front page, refresh it repeatedly, over and over like bots do, quickly. Careful now! You will get banned!

If you really must, you can test it here at corz.org (yes, of course it's running here!), preferably some low content page, like this.
 

exemptions.ini
(allowing certain known clients special privileges)

The big advantage of preventing bots (and people!) from clobbering your website and overloading your server, is that you have more resources freed up for valid clients..

If you want, you can choose to allow certain clients (usually known friendly spiders and bots) to bypass Anti-Hammer altogether, or alternatively, hammer at a faster rate. If you do, you will be utilizing exemptions.ini.

exemptions.ini, which lives in the exemptions/ directory (along with the IP lists), is a standard plain text .ini file containing a list of pairs of known User Agent strings and the text file in which to find their IP/Mask information.

Here's a slightly chopped-down example version..
 
	[exemptions]

	Mozilla/5.0 (compatible; Googlebot=google.txt
	Googlebot=google.txt
	gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA=google.txt

	msnbot=msn.txt
	MSNBOT=msn.txt
	Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search=msn.txt

	Scooter/3.3Y!CrawlX=altavista.txt

	Scooter=inktomi.txt
	Yahoo=inktomi.txt
	slurp=inktomi.txt

	Excite=excite.txt
	Infoseek=infoseek.txt

	Lycos_Spider=lycos.txt

	NorthernLight=northernlight.txt

	Mozilla/2.0 (compatible; Ask=askjeeves.txt
	teoma_agent1=askjeeves.txt
 

How exemptions.ini works..

On the left (of the "=" sign) is the expected User Agent string. This can be a partial match, but it must match from the very first character of the client's user agent string. Ideally, you want to roll as many variations as possible into a single line, while still retaining enough information to positively identify a particular client. Don't be so generic as to pull in every client under the Sun and create needless processing overhead (certain Yahoo! and msn bots post only "Mozilla/4.0", for example. They can meet the Anti-Hammer like everyone else!).

For example, the string "Yahoo" will match all the following bots:
Yahoo! Mindset
Yahoo-Blogs/v3.9 (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.com/help/us/ysearch/crawling/crawling-02.html )
Yahoo-MMAudVid/1.0 (mms dash mmaudvidcrawler dash support at yahoo dash inc dot com)
Yahoo-MMCrawler/3.x (mms dash mmcrawler dash support at yahoo dash inc dot com)
YahooFeedSeeker/1.0 (compatible; Mozilla 4.0; MSIE 5.5; my.yahoo.com/s/publishers.html)
YahooSeeker-Testing/v3.9 (compatible; Mozilla 4.0; MSIE 5.5; http://search.yahoo.com/)
YahooSeeker/1.1 (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.com/help/us/shop/merchant/)
YahooSeeker/1.2 (compatible; Mozilla 4.0; MSIE 5.5; yahooseeker at yahoo-inc dot com ; http://help.yahoo.com/help/us/shop/merchant/)
YahooSeeker/CafeKelsa-dev (compatible; Konqueror/3.2; FreeBSD ;cafekelsa-dev-webmaster@yahoo-inc.com ) (KHTML, like Gecko)
YahooVideoSearch www.yahoo.com/
YahooYSMcm/2.0.0
 
Similarly, many Googlebots are matched against the simple word, "Googlebot". If your user agent string is a tad generic, and matches against a client that isn't the expected bot, it's not a problem; Anti-Hammer won't find them in the specified IP list and continues as normal. It's designed this way to catch clients pretending to be known bots, of which there are a surprising number.

NOTE: User agent strings are checked in order, and ini file processing halts as soon as a match is found. Note the two "Scooter" entries; if the Yahoo! version was before the AltaVista version, the AltaVista bot would never be allowed an exemption, as Anti-Hammer would always be looking inside inktomi.txt for its IP information.

NOTE: Matches are CaSe SeNsITiVE! If you want to match "msnbot" and "MSNBOT", you need two entries. Why? Because in tests, a case-insensitive match is at least three times slower than a Case Sensitive match. So make a second entry!
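
Those rules (match from the first character, case-sensitive, first match wins) boil down to something like the following. It's a minimal sketch rather than the real code; find_ip_list() is a made-up name, and the list is hard-coded instead of being read from exemptions.ini.

```php
<?php
// Hypothetical sketch of the matching rules; NOT the real Anti-Hammer source.
// $exemptions is an ordered list of array(user agent prefix, IP list file).
function find_ip_list(array $exemptions, $user_agent)
{
    foreach ($exemptions as $pair) {
        list($prefix, $file) = $pair;
        // strncmp() is case-sensitive and compares from the very first character
        if (strncmp($user_agent, $prefix, strlen($prefix)) === 0) {
            return $file;   // first match wins; stop looking
        }
    }
    return null;            // no exemption for this client
}

$exemptions = array(
    array('Scooter/3.3Y!CrawlX', 'altavista.txt'),   // must come before plain "Scooter"
    array('Scooter', 'inktomi.txt'),
    array('Yahoo', 'inktomi.txt'),
    array('msnbot', 'msn.txt'),
);
```

Here "YahooSeeker/1.1 (compatible; ..." lands on inktomi.txt, while "MSNBOT/0.1" gets nothing at all, because only the lower-case "msnbot" is in this cut-down list.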
 
On the right is the text file to look in for IP Mask information; that is, where the specified user agent is expected to be making requests FROM. It's the standard Spider IP list format, one IP/Mask per line, as found here..
http://www.iplists.com/
http://www.iplists.com/nw/ <- updated, reorganised, with msnbot & more.

A blog URI is listed on that page, where updates are posted (maybe two or three times a year).
I've included the most recent lists in the Anti-Hammer zip package (and have started to add to and improve them with updated information), in place and ready-to-go, along with an exemptions.ini file already setup to handle the major friendly spiders.
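
Checking a client's IP against one of these lists could look roughly like this. Big caveat: the sketch assumes each line holds either a full address or just its leading octets (e.g. "66.249.64"), with "#" starting a comment line. That is my reading of the format for the purposes of the example, not something confirmed here, and ip_in_list() is a made-up name.

```php
<?php
// Hypothetical sketch; the list format handled here is an ASSUMPTION:
// one full IP or leading-octets mask per line, "#" lines are comments.
function ip_in_list($ip, array $lines)
{
    foreach ($lines as $line) {
        $mask = trim($line);
        if ($mask === '' || $mask[0] === '#') {
            continue;   // skip blank lines and comments
        }
        // compare whole octets, so "66.249.6" does not match "66.249.64.1"
        $mask = rtrim($mask, '.') . '.';
        if (strncmp($ip . '.', $mask, strlen($mask)) === 0) {
            return true;
        }
    }
    return false;
}

// Typical use, with php's file() reading the list into an array of lines:
// $ok = ip_in_list($_SERVER['REMOTE_ADDR'], file('exemptions/google.txt'));
```

A client claiming to be a known bot but calling from an address that isn't in the list gets no special treatment.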

Remember, you don't need to add all the bots, or even any bots; only bots, spiders, and other clients that you wish to give special privileges to. Even they shouldn't be hammering, really!

If you wish to set a special rate for known clients, rather than allow them to simply bypass Anti-Hammer, all you do is switch the "true" in your allow_bots preference (which can be considered "infinite hammer_time") for an integer (aka. plain number) representing hundredths of a second, just like the regular hammer_time preference, e.g..

$anti_hammer['allow_bots'] = 50;

A value of 50 (half a second) would allow known spiders up to two hits per second, but nothing faster. That's half the normal hammer_time of one second ($anti_hammer['hammer_time'] = 100;).

Effectively we have two available hammer rates; one for known good clients, and one for everyone else.
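
Put another way, the two preferences could combine something like this. It's a sketch of the idea only; hammer_time_for() is a made-up helper, and treating a false allow_bots as "no special treatment" is an assumption.

```php
<?php
// Hypothetical sketch of how the two rates relate; NOT the real source.
$anti_hammer = array(
    'hammer_time' => 100,   // everyone else: at most one hit per second
    'allow_bots'  => 50,    // known bots: two hits per second (true = bypass)
);

// Returns the minimum gap between requests in 1/100ths of a second,
// or null when the client may bypass Anti-Hammer altogether.
function hammer_time_for(array $prefs, $is_known_bot)
{
    if (!$is_known_bot || $prefs['allow_bots'] === false) {
        return $prefs['hammer_time'];   // ASSUMPTION: false means no exemptions
    }
    if ($prefs['allow_bots'] === true) {
        return null;                    // clean through, no limit
    }
    return (int) $prefs['allow_bots'];  // the special rate for known clients
}
```

So with the values above, a verified spider is held to a 50 (half second) gap, and everyone else to the usual 100.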
 

I, Admin.

While I'm here I should add, there's also the facility to enable one correctly configured browser to bypass Anti-Hammer at all times. This is designed for busy webmasters who sometimes, in the course of their daily activities, will need to hammer their own site. I know I do!

This setting ("admin_agent_string"), along with many other settings, can be found in the preferences section inside anti-hammer.php. Essentially, you tag a unique string onto the end of your browser's User Agent string, so that Anti-Hammer can recognize you as you. It's not high-security, but it is handy. I've used a similar approach to avoid logging my own site hits for years.
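
The check itself needn't be anything fancy. Here's a minimal sketch, assuming the match is made against the very end of the User Agent string; the function name and the example tag are invented.

```php
<?php
// Hypothetical sketch; NOT the real Anti-Hammer source.
// ASSUMES the admin string is matched at the END of the User Agent.
function is_admin_agent($user_agent, $admin_agent_string)
{
    if ($admin_agent_string === '') {
        return false;   // nothing set: nobody gets the bypass
    }
    $len = strlen($admin_agent_string);
    // case-sensitive comparison against the tail of the User Agent string
    return substr($user_agent, -$len) === $admin_agent_string;
}

// e.g. with a made-up tag of "[its-me-1234]" added to your browser's User Agent:
// is_admin_agent($_SERVER['HTTP_USER_AGENT'], '[its-me-1234]');
```

Anyone who learns your tag can use it too, hence "not high-security".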
 

Caveats:

One-Way Sessions..

Not requiring that the client send back the ID potentially has one undesirable side-effect..

If two clients share the same IP (perhaps a proxy) and are using an identical browser (in every way, down to the user's locale), and are browsing your site at the exact same time, and view a page within one second of each other (or whatever you set the hammer_time to), it is possible that they may unwittingly increment each other's hammer count!

Clearly this would be a rare situation, but still, good to know.

 

Source Code & Download..


You can view the php source code here..


 

And download a ready-to-go zip package, right here..


 

Thank You!

If you want to show your appreciation, you can do that here..

 

Bye Now!

If you have any problems at all, installing or using Anti-Hammer, PLEASE DO leave a comment below, or contact me some other way, let me know about it, so I can fix it, ta.

Have fun!

;o) Cor

 
 
cbparser powered comments..

cor - 30.10.09 5:29 pm

Hopefully this marks the beginning of a new trend; each of my "wee scripts" deserves a page of its own, with usage instructions, comments and all that. So here we are.

Have fun!

;o) Cor


astro - 05.12.09 12:56 am

I am attempting to install this, but when I add the php_value line to my htaccess, I get 500 errors on my site.
I have set permissions for the php file, and have verified the path info. Does something need to be set in the server php.ini file? I would need to contact the host and ask them to set that if so.


cor - 05.12.09 1:33 am

astro, it sounds like your server runs some kind of php suexec (where php runs as a cgi). If that's so, you would need to add the directive into a local or global php.ini file, instead of .htaccess.

The format is slightly different, see any php.ini file for specifics. This devblog entry explains the difference between regular and cgi flavours of php, and demonstrates how to add your php_value type statements with full examples.

;o) Cor


Laurent - 14.12.09 5:23 pm

Sorry, I normally speak French,

Thank you for the valuable advice, read over your website: the more I find interesting.
I am still a child before rewriting in PHP.
May I thank you very much all your explanations!

Laurent - Geneva - Switzerland


Don - 22.12.09 5:46 pm

Your website was recommended over on the forums at phpfreaks.com and I've already bookmarked a half dozen pages. Awesome stuff here!!


Miauw - 16.01.10 5:25 pm

Can't get it to work. No error is shown, it just doesn't work; I can hold refresh and nothing happens. Rechecked the paths 100 times and seems to be all correct.


Katica - 20.01.10 10:10 am

Your site is very interesting. Tried out anti-hammer, but get the same problem as Miauw. Checked the path 100 times. I've created the folder for the log and made it writable, but no log is created. I tried rewriting anti-hammer.php so it just outputs a sentence; nothing happened. It seems that auto-prepend is not working at all.
Any idea or advice? Thanks in advance.


Matt Lewandowsky - 30.01.10 12:28 am

This script looks interesting, but before I even think about trying it, I'm curious what its impact may be on a fancy, heavy "Web 2.0" site which can potentially have a few hundred objects on a single page. Normal users in such a case can easily end up requesting a few hundred objects every few seconds, if someone keeps clicking (for example) "Next Page" and their browser's cache is somehow broken, causing every image to be re-requested.

Also, have you tested this method with non-Apache servers, particularly those which use FastCGI PHP? I've got sites running with PHP-FPM, so I'd be curious to know if you've actually tried it with a custom per-instance php.ini.


 
