<?php // ۞// text { encoding:utf-8 ; bom:no ; linebreaks:unix ; tabs:4sp ; }
										$clever_404['version'] = '1.9.15';
/*

	NOTE: Clever 404 has been deprecated (though still works great!).
	For more power, and a single-file solution, check out Active Error Pages:

		http://corz.org.uk/server/tools/active-errors/


	Clever 404

	The corz.org intelligent php 404 error handler

	404 pages are more important than we generally think. Most visitors, when
	presented with a broken link, will just go elsewhere. Bye! A decent 404 page
	will give you a "second-shot", keep them hanging around, and show you care,
	which many webmasters don't. Hit a 404 at corz.org for a wee demo.


	what it does..

		As well as the usual valid link to your site root and email address that
		folks can click to give feedback, 404.php provides an intelligent
		response to your site's missing pages..

		First 404.php does automatic redirection for all your moved pages.
		Whenever you move a page, simply add it to 404.php's "catcher" list, and
		have all visitors and search engines automatically and permanently
		redirected, without any fuss or .htaccess hacking.

		For genuinely missing documents, 404.php goes on to do a (very*) quick
		scan of your site, looking for similar items, and returns a list of any
		matching files, as links. 404 is capable of "fuzzy" matching, so will
		usually catch typos in hand-typed URLs.

			* usually 0.01 seconds or less.

		If there is only a single matching document, 404.php can (optionally)
		jump the visitor directly to that document, nifty. This auto-jumping can
		be either by meta-refresh, or proper 302/301 http headers. The latter is
		preferable; though 404 does look cool, its purpose is to get users to
		the correct page ASAP, if at all possible..

			http://corz.org/physical

		Finally, if no matches are found, 404.php (optionally) presents the user
		with a corzoogle search form, the important part of their query already
		inserted into the search field, enabling them to perform a full content
		search of your web site.

		TADA!


	To use..

	i.
		Edit the preferences section (inside error-settings.php) with your own
		details, name, etc..

	ii.
		Direct your site's 404 errors to this script..

		This is achieved by editing either your master httpd.conf or
		main .htaccess file, inserting lines something like..

			ErrorDocument 400 /err/400.php
			ErrorDocument 401 /err/401.php
			ErrorDocument 403 /err/403.php
			ErrorDocument 404 /err/404.php
			ErrorDocument 410 /err/410.php
			ErrorDocument 500 /err/500.php
			ErrorDocument 503 /err/503.php

		..the fourth line of which directs all 404 errors to
		http://yoursite/err/404.php, which is where I happen to keep *my* copy
		of this script; that is, inside a folder called "err" in the top level
		of the site.

		You will have noticed 404 comes with matching 401 and 403 pages, and so
		on. Use what you want/need.

		For more information on .htaccess files, see here..

			http://corz.org/serv/tricks/htaccess.php

		I'll likely include an example .htaccess inside the zip distribution.

		On some web hosts you can choose your error pages through the CP (control
		panel) or site admin page. As a last resort, ask your website
		hosts/sysadmin how to achieve this.

	iii.
		The optional corzoogle search form (for *very* missing documents) will
		obviously need to have corzoogle installed somewhere on your site to be
		of any use. Clearly 404.php is best instructed to use the corzoogle
		search engine in the 'root', or 'top level' of your site. You can get
		corzoogle here..

			http://corz.org/corzoogle/download.php

	iv.
		The "catchers" part does automatic redirection of your moved pages.

		The idea is, when you move a page, for whatever reason, you add it to
		the catchers list. 404.php will then permanently (301) direct visitors
		straight to the new page, bypassing the 404 altogether; they won't even
		realize they *got* a 404!

		As well as catching your real moved pages, this can be useful for
		catching known mis-spelt inward links (forums!), hot-linkers, and more.

		See inside "moved.ini" for the actual catchers list. Basically a list
		of old="new" entries, something like this..

			weegary.php="/fun/weegary.php"
			/pj_demo="/serv/security/demo/"

		etc.

	v.
		Lastly, if you want the spam-bot-foiling email address mashing function
		to work, you will need to include the mail-mash function somewhere. If
		you download the zip of this script, you'll find I have thoughtfully
		done this for you. If you move it somewhere, you'll need to edit the
		location in the preferences (inside error-settings.php).


	That's it!
	You keep your lost visitors now!

	;o) Cor

	(c) corz.org 2004 -> tomorrow!

*/


// grab settings/prefs..
include ('error-settings.php');


if (substr($_SERVER['REQUEST_URI'], -1) == "?") {
	die("improper request for non-existent page");
}

// first we do any "catchers", for pages that we have moved/redirected
// gotta do it first, we are sending http "headers"
// using output buffering on a 404 just *feels* wrong, somehow. :/
// (each() is gone as of php 8, so we foreach instead)
foreach ((array) $clever_404['catchers'] as $old_page => $new_page) {
	if (stristr($_SERVER['REQUEST_URI'], (string) $old_page)) {

		// wait for x seconds (usleep wants whole microseconds)..
		usleep((int) ($clever_404['time_to_jump'] * 1000000));

		if ($clever_404['redirect_testing']) {
			// 302 is officially "Found"; "Temporary Redirect" belongs to 307
			header("HTTP/1.1 302 Found");
		} else {
			header("HTTP/1.1 301 Moved Permanently");
		}
		header('Location: http://'.$clever_404['domain'].$new_page);
		die();
	}
}
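

// a sketch only (nothing in this script calls it): one way the old="new"
// entries in "moved.ini" could be read into a catchers array. This assumes
// error-settings.php hasn't already filled $clever_404['catchers'] for you
// (it most likely has!). The function name is made up for illustration;
// parse_ini_file() is standard php, though some hosts disable it..
function load_catchers_sketch($ini_file) {

	if (!is_readable($ini_file)) return array();

	// each old="new" line becomes an 'old' => 'new' array entry..
	$catchers = @parse_ini_file($ini_file);

	// parse_ini_file() returns false on failure; hand back an empty list instead
	return is_array($catchers) ? $catchers : array();
}
// usage might look something like..
//	$clever_404['catchers'] = load_catchers_sketch(dirname(__FILE__).'/moved.ini');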

// ok, we got a real 404 here.
// probably..


/*
	let's search for the document..   */

// init..
$level = 0;
$count = 0;
$links_array = array();
$full_name = '';
$meta_refresh = '';
$links_page = '';
$no_scan = false;

// transform scan_path into an array..
$clever_404['scan_path'] = array($clever_404['scan_path']);


// grab the filename parts of the URL string, to be used later..
$insert = rawurldecode(substr($_SERVER['REQUEST_URI'], (strrpos($_SERVER['REQUEST_URI'], '/')+1)));
if ($insert == '') $insert = basename($_SERVER['REQUEST_URI']);
if (strlen($insert) > 255)  $insert = substr($insert, 0, 255); // for levenshtein (i.e. some joker is having a laugh!)
$insert_no_ext = substr($insert, 0, strrpos($insert, '.'));
if ($insert_no_ext == '') $insert_no_ext = $insert; // folders, etc


// attempt a scan-lock, and begin the scan..
if(scan_lock($clever_404['lock_file'])) {
	scan_site();
	scan_unlock();
} else {
	$no_scan = true;
	$clever_404['message_found_NO_matches'] = $clever_404['still_scanning'];
}


// jump on single hit right now?
if (($count == 1) and ($clever_404['jump_on_single_hit'])) {

	// loose comparison, so 301 and '301' both match..
	switch ($clever_404['jump_method']) {

		case '301':
			sleep((int) $clever_404['time_to_jump']);
			header("HTTP/1.1 301 Moved Permanently");
			header("Location: $full_name");
			die();

		// don't use 307 unless you know what you are doing (passes POST variables onward, and many entities don't GET it!)
		case '302':
		case '307':
			sleep((int) $clever_404['time_to_jump']);
			// 302 is officially "Found"; only 307 is "Temporary Redirect"
			$reason = ($clever_404['jump_method'] == '302') ? 'Found' : 'Temporary Redirect';
			header('HTTP/1.1 '.$clever_404['jump_method'].' '.$reason);
			header("Location: $full_name");
			die();

		// 'meta' (or anything unrecognized) falls back to a meta-refresh..
		default:
			$meta_refresh = '<meta http-equiv="refresh" content="'.round($clever_404['time_to_jump'], 0).';URL='.$full_name.'" />';
	}
}


/*
	Begin Page..
					*/

if (!$clever_404['embedded']) {
	begin_header();
	echo '
<title>another beautifully caught "page not found" by the 404.php, the intelligent error handler v',$clever_404['version'],'..</title>
<meta name="description" content="',$clever_404['domain'],' 404 page.. intelligent 404 handling with seek-and-return. The non-existent file file" />
<meta name="keywords" content="404,php,404 error,error handler,auto-scan,auto-find,source code available at corz.org" />';
	finish_header();
}
  // you may want to put your header here
echo '
<!--beautifully caught by 404, the non-existent file file, from corz.org-->
<div class="content-wide">
	<div class="two-column">

		<div class="left-column">
			<h1>',$clever_404['message_404'],'</h1>
			If you\'re certain that a page <em>should</em>&nbsp;&nbsp;be here, please <a href="',$clever_404['email_address'],'?subject=404%20-%20',rawurlencode($_SERVER['REQUEST_URI']),'" title="your valuable feedback is appreciated. thanks">tell ',$clever_404['webmaster'],'</a> about it. Alternatively, click <a href="/"
			title="up to the site root">here</a> for some real links.
		</div>

		<div class="right-column">
			<div class="error">404</div>
		</div>

	</div>
	<div class="clear">&nbsp;</div>';

if ($meta_refresh) echo $meta_refresh;

do_result('out');

if ($links_page != '') {
	echo '
	<h2 id="found_matches">',$clever_404['message_found_matches'],'</h2>
		',$links_page,'
		<div class="tiny-space">&nbsp;</div>';
	if ($clever_404['corzoogle_always'] == true and !empty($clever_404['cz_location'])) corzoogle_box();
} else {
	echo '
	<div id="found_NO_matches">
		<h2>',$clever_404['message_found_NO_matches'],'</h2>
	</div>';
	if (!$no_scan and $clever_404['cz_location']) corzoogle_box();
}
echo'
<div class="half-space">&nbsp;</div>
</div>';


end_error_page();




// show the corzoogle search form..
function corzoogle_box() {
global $clever_404, $insert_no_ext;
	$insert_no_ext = strip_stuff(urldecode($insert_no_ext));
	echo '
<h4>',$clever_404['message_do_a_search'],'</h4>

<div class="centered">
	<a href="http://corz.org/corzoogle/" onclick="window.open(this.href); return false;" title="corzoogle locates! (opens in a new window - Apple|Ctrl|whatever-click for a new tab instead)">
	<img src="',$clever_404['cz_img_location'],'" alt="corzoogle locates!" /></a><br />
	<br />
	<form method="get" action="',$clever_404['cz_location'],'">
	<div class="form">
		<input type="text" name="q" size="21" maxlength="256" value="',htmlspecialchars(stripslashes($insert_no_ext), ENT_QUOTES),'" />
		&nbsp;
		<input type="submit" value="do it!" />
	</div>
	</form>
	<div class="small-space">&nbsp;</div>
</div>';
}


// attempt to achieve a scan lock.
// return true if successful..
function scan_lock($lock_file) {

	clearstatcache();
	//$lock_age = @filectime($lock_file);
	// check existence of lock file..
	if (file_exists($lock_file)) {
		$lock_age = time() - filectime($lock_file);

		// if exists, check date/time
		if ($lock_age > 60) {
			// if older than one minute, delete it..
			// (something bad must have happened elsewhere)
			unlink($lock_file);
		} else {
			return false;
		}
	}

	// set lock file..
	$fp = fopen($lock_file, 'wb');
	if (is_writable($lock_file)) {
		if ($fp) {
			$GLOBALS['locked'] = flock($fp, LOCK_EX);
			if ($GLOBALS['locked']) {
				// clearer than fputs, same function..
				fwrite($fp, '1');		// could put their IP in here. hmm. perhaps a lock "folder" one lock for each IP, or 1 file per IP
				//flock ($fp, LOCK_UN);	// but then system /tmp/ may not allow folder creation. hmm.
			}
			fclose($fp); // this releases the file lock!
		}
	}

	// if all is well, return success..
	return file_exists($lock_file);
}
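

// a sketch only, not wired into the script: scan_lock() above checks for the
// lock file and then creates it in two separate steps, so two requests landing
// together could, in theory, both think they got the lock. Assuming fopen()'s
// standard 'x' mode (create the file, or fail if it already exists), the
// check-and-create can be done in one go. The function name and the $max_age
// parameter are made up for illustration..
function scan_lock_atomic_sketch($lock_file, $max_age = 60) {

	clearstatcache();

	// a lock older than $max_age seconds is taken to be stale; bin it..
	if (file_exists($lock_file) and (time() - @filectime($lock_file)) > $max_age) {
		@unlink($lock_file);
	}

	// 'x' makes fopen() fail (return false) if the lock file already exists..
	$fp = @fopen($lock_file, 'xb');
	if (!$fp) return false;

	fwrite($fp, '1');
	fclose($fp);
	return true;
}
// it would be used just like scan_lock(), something like..
//	if (scan_lock_atomic_sketch($clever_404['lock_file'])) { scan_site(); scan_unlock(); }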


/*
function:scan_site()
for more comments, see corzoogle.php  spider() */
function scan_site() {
global $clever_404, $insert, $insert_no_ext, $level;


	if (!$clever_404['exact_match']) $insert = $insert_no_ext;
	for ($search=0,$search_path=''; $search <= $level; $search++) {
		$search_path .= $clever_404['scan_path'][$search];
		$search_path = str_replace($clever_404['ignore_folders'], '', $search_path);
	}

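	$dirhandle = opendir($search_path);
	if (!$dirhandle) return; // unreadable folder; skip it rather than choke

	// strict test, so a file named "0" doesn't end the scan early..
	while (false !== ($file = readdir($dirhandle))) {

		// square brackets; the old {0} string offset syntax is gone as of php 8
		if ($file[0] != '.') {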

			if (is_file($search_path.$file)) {
				$fext = substr($file,strrpos($file,'.'));
				$itsname = basename($file);
				$short_name = substr($itsname, 0, 0 - strlen($fext));

				if (($clever_404['partial_match']) and (in_array($fext, $clever_404['allowed_extensions']) and (@stristr($file, $insert)))) {
						do_result($search_path.$file);

				} elseif ($clever_404['fuzzy_match']) {
					if (in_array($fext, $clever_404['allowed_extensions'])
						// first we test if a single change gives a match
						and (similar_text($short_name, $insert) == strlen($short_name)-1)
							// and test that it's a single replacement..
							and levenshtein($short_name, $insert) <= $clever_404['fuzziness_level']) {
							// using two tests allows us to match for dodgy, non-letter
							// characters and makes things more accurate.
						do_result($search_path.$file);
					}
				} else {

					// non-fuzzy or partial match..
					if (in_array($fext, $clever_404['allowed_extensions']) and (@stristr($itsname, $insert))) {
						do_result($search_path.$file);
					}
				}
			} elseif (is_dir($search_path.$file)) {
				if ($clever_404['match_dirs'] and (!in_array($search_path.$file, $clever_404['ignore_folders'])) and @stristr($search_path.$file, $insert)) do_result($search_path.$file);
				$clever_404['scan_path'][++$level] = ($file.'/');
				scan_site();
				$level--;
			}
		}
	}
	closedir($dirhandle);
}/*	end function:scan_site()
*/
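

// a sketch only (scan_site() doesn't call it): the two fuzzy tests above,
// pulled out into one function so they can be tried in isolation. The function
// name and its arguments are made up for illustration; similar_text() and
// levenshtein() are standard php..
function fuzzy_match_sketch($short_name, $wanted, $fuzziness = 1) {

	// all but one character of the file's name must line up with the request..
	if (similar_text($short_name, $wanted) != strlen($short_name) - 1) return false;

	// ..and the edit distance must be within the allowed fuzziness
	return (levenshtein($short_name, $wanted) <= $fuzziness);
}
// with the examples from the change log, below..
//	fuzzy_match_sketch('g_dip', 'g-dip')              // true
//	fuzzy_match_sketch('tempx_piles', 'tempz_piles')  // true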



function scan_unlock() {			// Don't lock, so that we can read it later for time info!!!!! r8? :/
global $clever_404;

	// the flock() taken in scan_lock() was already released by its fclose(),
	// and that file handle doesn't exist in here, so there's nothing to unlock.
	// delete lock file
	$deleted = @unlink($clever_404['lock_file']); // @ in case (and this has happened) the system cleaned up the lock file during the scan
						// the irony is, it manages this because, once written, we don't actually "lock" the lock file!
}						//  this is by design.		Actually, I'm re-thinking this, testing lock hold 7-12-08



/*
function do_result()	*/
function do_result($file) {
global $clever_404, $count, $full_name, $links_page, $links_array;

	if ($file == 'out') {
		// output the page
		foreach($links_array as $link) {
			$links_page .= $link;
		}
	} else {
		$count++;
		$display_name = $title = basename($file);
		$full_name = str_replace($clever_404['scan_path'][0],'http://'.$clever_404['domain'].'/',$file);
		if ($clever_404['links_are_full']) { $display_name = $full_name; }
		array_push($links_array, '<a href="'.htmlspecialchars($full_name, ENT_QUOTES).'" title="'.htmlspecialchars($display_name, ENT_QUOTES).'">'.htmlspecialchars($display_name, ENT_QUOTES)."</a><br />\n");
	}
}/*	end function do_result()
*/


/*
function strip_stuff() 	*/
function strip_stuff($string) {

	$nonos = array('.','..',' .','. ',',',';','[',']','*','~','#','&','?','$','%','+','=','»','«');
	$stripped = str_replace($nonos, ' ', $string);	// remove undesirables

	return trim($stripped);
}/*
end function strip_stuff() 	*/




/*
	changes..

	I thought I might start keeping changes under the scripts themselves.
	it doesn't cost us anything. php will ignore this.


		1.9.15

		*	central config file: error-settings.php

		*	removed some left-over branding

		*	Added matching 400, 410 and 503 pages


		1.9.11

		*	In the event of the site scan turning up a single match, 404 can now
			redirect with a proper 301 header, just like the catchers. Most
			users wouldn't even realize they got a 404. This basically gives you
			automatic 301 permanent redirects for any pages you move. keep the
			users and spiders happy!

		*	You now can specify the catchers auto-jump method, '301', or
			old-school meta-refresh, in the preferences.

		*	Added scan locking. When 404 is scanning the site, it will place a
			temporary lock file, to prevent crazy bots and site abusers from
			running multiple file scans at once, and potentially stressing the
			server, chewing up resources.

			404 will still display, but with a message telling the user to wait
			a moment before trying again, rather than the usual search results.
			Most folk will never see this in action, but it's good to know it's
			there, preventing potential mishaps.

		*	You can now choose to have 404 return matches for directories,
			so if the user was looking for the non-existent /foo/hell they could
			get back results for /bar/shell scripts/

		*	Fixed the slashes in the corzoogle input (for '' quotes).


		1.8

		*	fixed the corzoogle image location, and some other stuff.

		*	Cleaned up distro prefs.

		*	Improved layout, now uses a nice container like my regular pages


		1.7

		*	incorporated partial matching and fuzzy matching; produces great
			results.

		*	cleaned up some xhtml output


		1.6.5

		*	Added some fuzzy matching for the file scan. A sorta request.

		*	this is a highly specialized tweak, but works great as per request.
			you can play around with things to get different results, but as it
			stands, g-dip will match g_dip.jpg, and in my own mirror,
			tempz_piles will match tempx_piles.jpg, etc. This can be
			enabled/disabled from a preference called $clever_404['fuzzy_match'].


		1.6.2-1.6.4

		*	just minor things.


		1.6.2:

		*	Fixed some potential bugs in initialisation.


		1.6.1:

		*	XHTML 1.0 Strict compliance. Nice.


		1.6:

		*	404 will now strip characters from the input string for entry into
			the corzoogle search box. for instance, a 404 for mama.mia.php will
			now enter "mama mia" into the search box, instead of "mama.mia"
			which would likely produce far fewer hits. corzoogle, of course,
			takes the dot into account.

			Added some information to the readme up top, including important
			notes about editing the redirections. I discovered this the hard
			way.



		:2do..

			lost songs
			redirect lost *.mp3 (or whatever) to a special page
			like the /audio/ root.

*/
?>