<?php // ۞// text { encoding:utf-8 ; bom:no ; linebreaks:unix ; tabs:4sp ; }
$clever_404['version'] = '1.9.15';
// direct access -> demo..
//if (realpath($_SERVER['SCRIPT_FILENAME']) == realpath(__FILE__)) { sleep(2); header("Location: /check"); }
/*
NOTE: Clever 404 has been deprecated (though still works great!).
For more power, and a single-file solution, check out Active Error Pages:
https://corz.org/server/tools/active-errors/
Clever 404
The corz.org intelligent php 404 error handler
404 pages are more important than we generally think. Most visitors, when
presented with a broken link, will just go elsewhere. Bye! A decent 404 page
gives you a "second shot", keeps them hanging around, and shows you care,
which many webmasters don't. Hit a 404 at corz.org for a wee demo.
what it does..
As well as the usual valid link to your site root and email address that
folks can click to give feedback, 404.php provides an intelligent
response to your site's missing pages..
First, 404.php does automatic redirection for all your moved pages.
Whenever you move a page, simply add it to 404.php's "catcher" list, and
have all visitors and search engines automatically and permanently
redirected, without any fuss or .htaccess hacking.
For genuinely missing documents, 404.php goes on to do a (very*) quick
scan of your site, looking for similar items, and returns a list of any
matching files, as links. 404 is capable of "fuzzy" matching, so will
usually catch typos in hand-typed URLs.
* usually 0.01 seconds or less.
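To give a feel for the fuzzy matching, here is a minimal sketch. It is not
the script's own code: fuzzy_candidates() and its $max_edits parameter are
made-up names for illustration, and it uses only levenshtein() where the
real scan combines more than one test.

```php
<?php
// Hypothetical helper, for illustration only; not part of 404.php itself.
// Returns the file names whose base name (extension removed) is within
// $max_edits single-character edits of the requested name.
function fuzzy_candidates($requested, array $files, $max_edits = 1) {
    $hits = array();
    foreach ($files as $file) {
        $base = pathinfo($file, PATHINFO_FILENAME);
        if (levenshtein(strtolower($base), strtolower($requested)) <= $max_edits) {
            $hits[] = $file;
        }
    }
    return $hits;
}

// "g-dip" is one substitution away from "g_dip", so g_dip.jpg is returned.
print_r(fuzzy_candidates('g-dip', array('g_dip.jpg', 'index.php', 'gallery.php')));
```

Raising $max_edits catches sloppier typing, at the cost of more false hits.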
If there is only a single matching document, 404.php can (optionally)
jump the visitor directly to that document. Nifty. This auto-jumping can
be either by meta-refresh, or proper 301/302 HTTP headers. The latter is
preferable; though 404 does look cool, its purpose is to get users to
the correct page ASAP, if at all possible..
https://corz.org/physical
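The header-based jump boils down to two header lines. A rough sketch,
assuming a hypothetical helper called jump_headers() (not a function in
404.php) that returns the lines rather than sending them, so it is easy
to try out. The example.com URL is a placeholder.

```php
<?php
// Illustrative sketch only; jump_headers() is a made-up name.
// Builds the header lines for a redirect using the given jump method.
function jump_headers($method, $url) {
    $status = array(
        '301' => '301 Moved Permanently',
        '302' => '302 Found',
        '307' => '307 Temporary Redirect',
    );
    if (!isset($status[$method])) {
        return array(); // e.g. 'meta': no HTTP redirect, the page refreshes itself
    }
    return array('HTTP/1.1 '.$status[$method], 'Location: '.$url);
}

foreach (jump_headers('301', 'http://example.com/fun/weegary.php') as $line) {
    echo $line, "\n"; // a real handler would call header($line) before any output
}
```

Either way, the headers must go out before any page output.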
Finally, if no matches are found, 404.php (optionally) presents the user
with a corzoogle search form, the important part of their query already
inserted into the search field, enabling them to perform a full content
search of your web site.
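As a sketch of how the "important part" of a request might be pulled out
for the search field (search_terms() is a hypothetical name, and the real
script strips a specific list of characters rather than everything
non-alphanumeric):

```php
<?php
// A sketch, not the script's own code: search_terms() is a made-up name.
// Takes a requested URI and returns text suitable for a search box.
function search_terms($request_uri) {
    $path = parse_url($request_uri, PHP_URL_PATH);
    $name = pathinfo(rawurldecode((string) $path), PATHINFO_FILENAME);
    // swap runs of punctuation for a single space, then trim the ends
    $words = preg_replace('/[^A-Za-z0-9]+/', ' ', $name);
    return trim($words);
}

echo search_terms('/old/mama.mia.php?x=1'), "\n"; // mama mia
```

Whatever ends up in the field should still go through htmlspecialchars()
before being written into the page.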
TADA!
To use..
i.
Edit the preferences section (inside error-settings.php) with your own
details, name, etc..
ii.
Direct your site's 404 errors to this script..
This is achieved by editing either your master httpd.conf or
main .htaccess file, inserting lines something like..
ErrorDocument 400 /err/400.php
ErrorDocument 401 /err/401.php
ErrorDocument 403 /err/403.php
ErrorDocument 404 /err/404.php
ErrorDocument 410 /err/410.php
ErrorDocument 500 /err/500.php
ErrorDocument 503 /err/503.php
..the fourth of which would direct all 404 errors to.. http://yoursite/err/404.php
which is where I happen to keep *my* copy of this script, that is,
inside a folder called "err" in the top level of the site.
You will have noticed 404 comes with matching 403 and 401 pages, and so on.
Use what you want/need.
For more information on .htaccess files, see here..
https://corz.org/serv/tricks/htaccess.php
I'll likely include an example .htaccess inside the zip distribution.
On some web hosts you can choose your error pages through the CP (control
panel) or site admin page. As a last resort, ask your website
hosts/sysadmin how to achieve this.
iii.
The optional corzoogle search form (for *very* missing documents) will
obviously need to have corzoogle installed somewhere on your site to be
of any use. Clearly 404.php is best instructed to use the corzoogle
search engine in the 'root', or 'top level' of your site. you can get
corzoogle here..
https://corz.org/corzoogle/download.php
iv.
The "catchers" part does automatic redirection of your moved pages.
The idea is, when you move a page, for whatever reason, you add it to
the catchers list. 404.php will then permanently (301) direct visitors
straight to that page, bypassing the 404 altogether; they won't even
realize they *got* a 404!
As well as catching your real moved pages, this can be useful for
catching known mis-spelt inward links (forums!), hot-linkers, and more.
See inside "moved.ini" for the actual catchers list. Basically a list
of old="new" entries, something like this..
weegary.php="/fun/weegary.php"
/pj_demo="/serv/security/demo/"
etc.
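The lookup itself is simple. A minimal sketch, assuming a hypothetical
find_catcher() helper (not a function in 404.php); the list is parsed from
a string here so the example runs without a moved.ini on disk.

```php
<?php
// Sketch only: find_catcher() is a made-up name for illustration.
$ini = <<<'INI'
weegary.php="/fun/weegary.php"
/pj_demo="/serv/security/demo/"
INI;

function find_catcher($request_uri, array $catchers) {
    foreach ($catchers as $old_page => $new_page) {
        // case-insensitive "contains" test, so /stuff/WeeGary.php still matches
        if (stripos($request_uri, $old_page) !== false) {
            return $new_page;
        }
    }
    return null; // nothing matched; carry on to the real 404
}

$catchers = parse_ini_string($ini);
var_dump(find_catcher('/stuff/weegary.php', $catchers));
```

Because the test is "contains", a short old name will catch more requests
than you might expect, so keep the old entries as specific as you can.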
v.
Lastly, if you want the spam-bot-foiling email address mashing function
to work, you will need to include the mail-mash function somewhere. if
you download the zip of this script, you'll find I have thoughtfully
done this for you. if you move it somewhere, you'll need to edit the
location in the preferences, below.
That's it!
You keep your lost visitors now!
;o)
(c) corz.org 2004 -> tomorrow!
*/
// grab settings/prefs..
@include ('Clever-404-settings.php');
if (substr($_SERVER['REQUEST_URI'], -1) == "?") {
die("improper request for non-existent page");
}
// first we do any "catchers", for pages that we have moved/redirected
// gotta do it first, we are sending http "headers"
// using output buffering on a 404 just *feels* wrong, somehow. :/
// (each() was removed in PHP 8, so we use foreach)
foreach ($clever_404['catchers'] as $old_page => $new_page) {
if (stristr($_SERVER['REQUEST_URI'], $old_page)) {
// wait for x seconds..
usleep($clever_404['time_to_jump'] * 1000000);
if ($clever_404['redirect_testing']) {
header("HTTP/1.1 302 Temporary Redirect");
} else {
header("HTTP/1.1 301 Moved Permanently");
}
header('Location: http://'.$clever_404['domain'].$new_page);
die();
}
}
// ok, we got a real 404 here.
// probably..
/*
let's search for the document.. */
// init..
$level = 0;
$count = 0;
$links_array = array();
$full_name = '';
$meta_refresh = '';
$links_page = ''; // filled in by do_result('out'), tested further down
$no_scan = false;
// transform scan_path into an array..
$clever_404['scan_path'] = array($clever_404['scan_path']);
// grab the filename parts of the URL string, to be used later..
$insert = rawurldecode(substr($_SERVER['REQUEST_URI'], (strrpos($_SERVER['REQUEST_URI'], '/')+1)));
if ($insert == '') $insert = basename($_SERVER['REQUEST_URI']);
if (strlen($insert) > 255) $insert = substr($insert, 0, 255); // for levenshtein (i.e. some joker is having a laugh!)
$insert_no_ext = substr($insert, 0, strrpos($insert, '.'));
if ($insert_no_ext == '') $insert_no_ext = $insert; // folders, etc
// attempt a scan-lock, and begin the scan..
if(scan_lock($clever_404['lock_file'])) {
scan_site();
scan_unlock();
} else {
$no_scan = true;
$clever_404['message_found_NO_matches'] = $clever_404['still_scanning'];
}
// jump on single hit right now?
if (($count == 1) and ($clever_404['jump_on_single_hit'])) {
switch ($clever_404['jump_method']) {
case '301':
sleep($clever_404['time_to_jump']);
header("HTTP/1.1 301 Moved Permanently");
header("Location: $full_name");
die();
// don't use 307 unless you know what you are doing (passes POST variables onward, and many entities don't GET it!)
case '302':
case '307':
sleep($clever_404['time_to_jump']);
header('HTTP/1.1 '.$clever_404['jump_method'].' Temporary Redirect');
header("Location: $full_name");
die();
default: // 'meta', or any unrecognised method, falls back to a meta-refresh
$meta_refresh = '<meta http-equiv="refresh" content="'.round($clever_404['time_to_jump'], 0).';URL='.$full_name.'">';
}
}
/*
Begin Page..
*/
if (!$clever_404['embedded']) {
begin_header();
echo '
<title>another beautifully caught "page not found" by the 404.php, the intelligent error handler v',$clever_404['version'],'..</title>
<meta name="description" content="',$clever_404['domain'],' 404 page.. intelligent 404 handling with seek-and-return. The non-existent file file" />
<meta name="keywords" content="404,php,404 error,error handler,auto-scan,auto-find,source code available at corz.org" />';
finish_header();
}
// you may want to put your header here
echo '
<!--beautifully caught by 404, the non-existent file file, from corz.org-->
<div class="content-wide">
<div class="two-column">
<div class="left-column">
<h1>',$clever_404['message_404'],'</h1>
If you\'re certain that a page <em>should</em> be here, please <a href="',$clever_404['email_address'],'?subject=404%20-%20',rawurlencode($_SERVER['REQUEST_URI']),'" title="your valuable feedback is appreciated. thanks">tell ',$clever_404['webmaster'],'</a> about it. Alternatively, click <a href="/"
title="up to the site root">here</a> for some real links.
</div>
<div class="right-column">
<div class="error">404</div>
</div>
</div>
<div class="clear"> </div>';
if ($meta_refresh) echo $meta_refresh;
do_result('out');
if ($links_page != '') {
echo '
<h2 id="found_matches">',$clever_404['message_found_matches'],'</h2>
',$links_page,'
<div class="tiny-space"> </div>';
if ($clever_404['corzoogle_always'] == true and !empty($clever_404['cz_location'])) corzoogle_box();
} else {
echo '
<div id="found_NO_matches">
<h2>',$clever_404['message_found_NO_matches'],'</h2>
</div>';
if (!$no_scan and $clever_404['cz_location']) corzoogle_box();
}
echo'
<div class="half-space"> </div>
</div>';
end_error_page();
// show the corzoogle search form..
function corzoogle_box() {
global $clever_404, $insert_no_ext;
$insert_no_ext = strip_stuff(urldecode($insert_no_ext));
echo '
<h4>',$clever_404['message_do_a_search'],'</h4>
<div class="centered">
<a href="https://corz.org/corzoogle/" target="_blank" rel="noopener noreferrer" title="corzoogle locates! (opens in a new window - Apple|Ctrl|whatever-click for a new tab instead)">
<img src="',$clever_404['cz_img_location'],'" alt="corzoogle locates!" /></a><br />
<br />
<form method="get" action="',$clever_404['cz_location'],'">
<div class="form">
<input type="text" name="q" size="21" maxlength="256" value="',htmlspecialchars(stripslashes($insert_no_ext), ENT_QUOTES, 'UTF-8'),'" />
<input type="submit" value="do it!" />
</div>
</form>
<div class="small-space"> </div>
</div>';
}
// attempt to achieve a scan lock.
// return true if successful..
function scan_lock($lock_file) {
clearstatcache();
//$lock_age = @filectime($lock_file);
// check existence of lock file..
if (file_exists($lock_file)) {
$lock_age = filectime($lock_file);
// if exists, check date/time
if ((time() - $lock_age) > 60) {
// if older than one minute, delete it..
// (something bad must have happened elsewhere)
unlink($lock_file);
} else {
return false;
}
}
// set lock file..
$fp = fopen($lock_file, 'wb');
if (is_writable($lock_file)) {
if ($fp) {
$GLOBALS['locked'] = flock($fp, LOCK_EX);
if ($GLOBALS['locked']) {
// clearer than fputs, same function..
fwrite($fp, '1'); // could put their IP in here. hmm. perhaps a lock "folder" one lock for each IP, or 1 file per IP
//flock ($fp, LOCK_UN); // but then system /tmp/ may not allow folder creation. hmm.
}
fclose($fp); // this releases the file lock!
}
}
// if all is well, return success..
if (file_exists($lock_file)) {
return true;
} else {
return false;
}
}
/*
function:scan_site()
for more comments, see corzoogle.php spider() */
function scan_site() {
global $clever_404, $insert, $insert_no_ext, $level;
if (!$clever_404['exact_match']) $insert = $insert_no_ext;
for ($search=0,$search_path=''; $search <= $level; $search++) {
$search_path .= $clever_404['scan_path'][$search];
$search_path = str_replace($clever_404['ignore_folders'], '', $search_path);
}
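$dirhandle = @opendir($search_path);
if (!$dirhandle) return; // unreadable or missing folder; skip it
// strict test, so an entry named "0" doesn't end the scan early..
while (false !== ($file = readdir($dirhandle))) {
// ($file{0} syntax was removed in PHP 8)
if ($file[0] != '.') {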
if (is_file($search_path.$file)) {
$fext = substr($file,strrpos($file,'.'));
$itsname = basename($file);
$short_name = substr($itsname, 0, 0 - strlen($fext));
if (($clever_404['partial_match']) and (in_array($fext, $clever_404['allowed_extensions']) and (@stristr($file, $insert)))) {
do_result($search_path.$file);
} elseif ($clever_404['fuzzy_match']) {
if (in_array($fext, $clever_404['allowed_extensions'])
// first we test if a single change gives a match
and (similar_text($short_name, $insert) == strlen($short_name)-1)
// and test that it's a single replacement..
and levenshtein($short_name, $insert) <= $clever_404['fuzziness_level']) {
// using two tests allows us to match for dodgy, non-letter
// characters and makes things more accurate.
do_result($search_path.$file);
}
} else {
// non-fuzzy or partial match..
if (in_array($fext, $clever_404['allowed_extensions']) and (@stristr($itsname, $insert))) {
do_result($search_path.$file);
}
}
} elseif (is_dir($search_path.$file)) {
if ($clever_404['match_dirs'] and (!in_array($search_path.$file, $clever_404['ignore_folders'])) and @stristr($search_path.$file, $insert)) do_result($search_path.$file);
$clever_404['scan_path'][++$level] = ($file.'/');
scan_site();
$level--;
}
}
}
closedir($dirhandle);
}/* end function:scan_site()
*/
function scan_unlock() { // Don't lock, so that we can read it later for time info!!!!! r8? :/
global $clever_404;
// nothing to unlock here: fclose() in scan_lock() already released the flock
// (and $fp isn't in scope in this function anyway).
// delete lock file
$deleted = @unlink($clever_404['lock_file']); // @ in case (and this has happened) the system cleaned up the lock file during the scan
// the irony is, it manages this because, once written, we don't actually "lock" the lock file!
} // this is by design. Actually, I'm re-thinking this, testing lock hold 7-12-08
/*
function do_result() */
function do_result($file) {
global $clever_404, $count, $full_name, $links_page, $links_array;
if ($file == 'out') {
// output the page
foreach($links_array as $link) {
$links_page .= $link;
}
} else {
$count++;
$display_name = $title = basename($file);
$full_name = str_replace($clever_404['scan_path'][0],'http://'.$clever_404['domain'].'/',$file);
if ($clever_404['links_are_full']) { $display_name = $full_name; }
array_push($links_array, '<a href="'.$full_name.'" title="'.$display_name.'">'.$display_name."</a><br />\n");
}
}/* end function do_result()
*/
/*
function strip_stuff() */
function strip_stuff($string) {
$nonos = array('.','..',' .'.'. ',',',';','[',']','*','~','#','&','?','$','%','+','=','»','«');
$stripped = str_replace($nonos, ' ', $string); // remove undesirables
return trim($stripped);
}/*
end function strip_stuff() */
/*
changes..
I thought I might start keeping changes under the scripts themselves.
it doesn't cost us anything. php will ignore this.
1.9.15
* central config file: error-settings.php
* removed some left-over branding
* Added matching 400, 410 and 503 pages
1.9.11
* In the event of the site scan turning up a single match, 404 can now
redirect with a proper 301 header, just like the catchers. Most
users wouldn't even realize they got a 404. This basically gives you
automatic 301 permanent redirects for any pages you move. keep the
users and spiders happy!
* You now can specify the catchers auto-jump method, '301', or
old-school meta-refresh, in the preferences.
* Added scan locking. When 404 is scanning the site, it will place a
temporary lock file, to prevent crazy bots and site abusers from
running multiple file scans at once, and potentially stressing the
server, chewing up resources.
404 will still display, but with a message telling the user to wait
a moment before trying again, rather than the usual search results.
Most folk will never see this in action, but it's good to know it's
there, preventing potential mishaps.
* You can now choose to have 404 return matches for directories.
so if the user was looking for the non-existent /foo/hell they could
get back results for /bar/shell scripts/
* Fixed the slashes in the corzoogle input (for '' quotes).
1.8
* fixed the corzoogle image location, and some other stuff.
* Cleaned up distro prefs.
* Improved layout, now uses a nice container like my regular pages
1.7
* incorporated partial matching and fuzzy matching; produces great
results.
* cleaned up some xhtml output
1.6.5
* Added some fuzzy matching for the file scan. A sorta request.
* this is a highly specialized tweak, but works great as per request.
you can play around with things to get different results, but as it
stands, g-dip will match g_dip.jpg, and in my own mirror,
tempz_piles will match tempx_piles.jpg, etc. This can be
enabled/disabled from a preference called $clever_404['fuzzy_match'].
1.6.2-1.6.4
* just minor things.
1.6.2:
* Fixed some potential bugs in initialisation.
1.6.1:
* XHTML 1.0 Strict compliance. Nice.
1.6:
* 404 will now strip characters from the input string for entry into
the corzoogle search box. for instance, a 404 for mama.mia.php will
now enter "mama mia" into the search box, instead of "mama.mia"
which would likely produce a lot fewer hits. corzoogle, of course,
takes the dot into account.
* Added some information to the readme up top, including important
notes about editing the redirections. I discovered this the hard
way.
:2do..
lost songs
redirect lost *.mp3 (or whatever) to a special page
like the /audio/ root.
*/
?>