<?php // ۞// text { encoding:utf-8 ; bom:no ; linebreaks:unix ; tabs:4sp ; }
/*
v0.5.1
links.php - corzoogle-powered robot food generator
This innocuous script (and my /blog/2004-oct/) is no doubt where Google
got the idea for "sitemaps". <sarcasm>Thanks for the credit!</sarcasm>
This script can also produce such a document, if you so desire. (see the
notes at the foot for implementation)
Essentially, this script produces a list of all the links on a site, in
no particular order, useful for search engine robots and other
creepy-crawlers.
Now has "text mode" where it will embed the links into a text file of
your choosing. This is a nifty way to do it, something for everyone.
Rename to "links.php" and chuck in the root of your site, let the robots
chew. Includes valid corzblog links automatically, too. Put a link
somewhere on your front page.
You can put this inside folders, too, providing links for only that area.
I ripped a lot of the code straight out of the corzoogle engine..
https://corz.org/server/tools/corzoogle/
Have fun!
;o)
(c) copyright corz.org 2004->tomorrow!
*/
/*
files to create links from
I don't create links for downloads, zips, etc.
bots don't need to access those.
*/
$lm['extentions'] = array(
'.html',
'.txt',
'.xml',
'.nfo',
'.php',
'.phps',
'.blog',
'.c',
'.comment',
'.jpg',
'.png',
'.gif',
'.src',
'.term',
'.au3',
'.m3u',
'.sh',
'.doc',
'.pdf',
'.out');
/*
IGNORE folders.
these will NOT be parsed by the link generator. one '/inc/' covers all
if you need to be more specific, do: "/foo/inc/"
*/
$lm['ignore'] = array(
'/_arc/',
'/err/',
'/data/',
'/inc/',
'/corz/tools/',
'/hag/',
'/img/',
'/cvs/',
'/tracker/',
'/includes/',
'/cgi-bin/',
'/public/images/smileys/',
'/public/icons/',
'/private/');
/*
Special Links
these are links that, for one reason or another, you would like to put
manually on the page. a generated gateway link of some kind, for instance,
like https://corz.org/engine is. things that wouldn't be picked up searching
through a filesystem, yet you would like to include.
links in single quotes, separated by commas..
$lm['special_links'] = array ('/engine','/source/');
of course, your *hot* new pages, should go here, too, the search engines
will pick them up as soon as possible. These appear first in the final
output.
*/
$lm['special_links'] = array
(
'/words/',
'/engine',
'/windows/',
'/devblog/',
'/blog/'
);
// for the title..
$lm['links_domain'] = $_SERVER['HTTP_HOST'];
// transformations..
/*
if you have some script generated pages, you can perform transformations on the included (real) documents
I'll use corzblog as an example, but this will work for any generated page that uses flat-file content.
*/
//$lm['match_ext'] = '.blog';
//$lm['trans_title'] = 'corzblog archive: ';
//$lm['trans_path'] = '/blog/'; // I use (mod_rewritten) flat-links
//$lm['trans_add'] = ''; // any string you wish to add on the end of the link, eg. '&foo=bar'
// so, for example; when the links machine finds /some/path/file.blog it converts it into..
// https://corz.org/blog/index.php?archive=file
// or.. https://corz.org/blog/2006-May or whatever
// in the real world, you are better off using .htaccess and mod_rewrite for this stuff.
/*
Standard Sitemap? true/false/'auto'
You can have links.php output a standard XML sitemap.
If you set this to true, the following preferences have no effect.
*/
$lm['standard_sitemap'] = 'auto';
/*
NOTE: You can also use the setting "auto", which will switch to regular
sitemap XML output, IF this file is accessed via an .xml URI. So, you could
have both running at once. links.php giving you a nice HTML output, and with
this in your .htaccess..
RewriteRule ^sitemap\.xml$ /links.php [NC,L]
Accessing http://site.me/sitemap.xml would get you the "sitemap" XML output.
*/
/* The following preferences are for NON-sitemap output..
*/
/*
text mode?
false: produces a plain list of all the links.
simple, neat and effective.
true: will "linkify" a text file.
This is best, mixes the links into a pre-selected text document.
This is a real page of text, except with your site links all through
it. A good place to put some text that you'd like folks to read,
too. En-joy..
https://corz.org/links.php
*/
$lm['text_mode'] = true;
/* the text file..
(location from site root)
*/
$lm['text_file'] = '/inc/txt/text.txt';
// embed this script in some other page?
$lm['embed_in_page'] = false;
// if not embedded, you can specify the header and footer to use..
//
// site header..
$lm['header_location'] = $_SERVER['DOCUMENT_ROOT'].'/inc/header.php';
// site metadata..
$lm['metadata_location'] = $_SERVER['DOCUMENT_ROOT'].'/inc/metadata.php';
// site footer?
$lm['footer_location'] = $_SERVER['DOCUMENT_ROOT'].'/inc/footer.php';
/* end of preferences
*/
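// for reference, a standard sitemap entry looks like this..
// <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
// <url>
// <loc>http://www.example.com/</loc>
// <lastmod>2005-01-01</lastmod>
// <changefreq>monthly</changefreq>
// <priority>0.8</priority>
// </url>
// </urlset>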
//
//
// init..
//
$lm['path'] = array('./');
$lm['level'] = 0;
$lm['count'] = 0;
$lm['links_array'] = array();
$lm['links_page'] = '';
// Auto sitemap output..
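// strict test: only 'auto' is decided by the URI, so an explicit true/false is left alone
if ($lm['standard_sitemap'] === 'auto') {
// switch to sitemap output only when accessed via an .xml URI
$lm['standard_sitemap'] = (substr((string) strrchr($_SERVER['REQUEST_URI'], '.'), 1) == 'xml');
}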
if ($lm['standard_sitemap']) {
header('Content-type: text/xml; charset=utf-8');
echo '<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';
} else {
echo '<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="initial-scale=1.0, width=device-width" />
<title>generated links for ',$lm['links_domain'],' (robot food)</title>
<meta name="description" content="corzoogle-powered link machine generator LINKS links robot food yum corz.org" />
<meta name="robots" content="all" />
<meta name="generator" content="corzoogle engine" />
<meta name="author" content="corz.org" />
<meta name="publisher" content="',$lm['links_domain'],'" />
<link rel="stylesheet" href="/inc/css/main.css" type="text/css" media="screen" />';
if (file_exists($lm['metadata_location'])) { include $lm['metadata_location']; }
echo '
</head>
<body>';
if (file_exists($lm['header_location'])) { include $lm['header_location']; }
echo '
<div class="clear"></div>
<div class="genlinks content">
<h1>generated links for '.$lm['links_domain'].'..</h1>';
if ($lm['text_mode']) {
echo '<h3 class="sub"><small>(fine text mode)</small></h3>';
}
}
if (is_array($lm['special_links'])) {
foreach($lm['special_links'] as $lm_link) {
if ($lm['standard_sitemap']) {
array_push($lm['links_array'], '
<url>
<loc>http://'.$lm['links_domain'].$lm_link.'</loc>
<priority>0.9</priority>
</url>');
} else {
$lm_ln = '';
if (!$lm['text_mode']) { $lm_ln = '<br />'; }
array_push($lm['links_array'], '<a href="'.$lm_link.'" title="'.
$lm_link.'" target="_blank" rel="noopener noreferrer">'.$lm_link.'</a>'.$lm_ln);
}
}
$lm['count'] += count($lm['links_array']);
}
burrow();
do_result('out');
echo $lm['links_page'];
// leave this in, ta.
if ($lm['standard_sitemap']) {
echo '
</urlset>';
} else {
echo '
<div class="source-link" style="font-size:small;position:absolute;bottom:-50px;right:100px;"><a href="https://corz.org/engine?source=menu&amp;section=php/seo%20scripts"
title="get the source for this link generator"><small>get the source for this generator</small></a></div>';
if (!$lm['embed_in_page']) {
echo '
</div>';
if (file_exists($lm['footer_location'])) {
if (!isset($igd['init_done'])) { require_once $_SERVER['DOCUMENT_ROOT'].'/inc/init.php'; }
include $lm['footer_location'];
}
echo '
</body>
</html>';
}
}
/*
function:burrow()
for more comments, see corzoogle.php
*/
function burrow() {
global $lm;
// build the current search path from the path stack..
$lm_search_path='';
for ($lm_search=0; $lm_search <= $lm['level']; $lm_search++) {
$lm_search_path .= $lm['path'][$lm_search];
}
// skip ignored folders (and everything inside them)..
foreach ($lm['ignore'] as $lm_ignored) {
if (strpos(substr($lm_search_path, 1), $lm_ignored) !== false) { return; }
}
$lm_dirhandle = opendir($lm_search_path);
if ($lm_dirhandle === false) { return; } // unreadable folder
// strict test, so an entry named "0" can't end the loop early..
while (($lm_file = readdir($lm_dirhandle)) !== false) {
if ($lm_file[0] != '.') {
if (is_file($lm_search_path.$lm_file)) {
// extension including the dot, or '' when the name has none
$lm_fext = (string) strrchr($lm_file, '.');
if (in_array($lm_fext, $lm['extentions'])) {
do_result($lm_search_path.$lm_file);
$lm['count'] += 1;
}
}
elseif (is_dir($lm_search_path.$lm_file)) {
$lm['path'][++$lm['level']] = $lm_file.'/';
burrow();
$lm['level']--;
}
}
}
closedir($lm_dirhandle);
}
/*
do_result()
filling up the $lm['links_array'], and making the $lm['links_page']
*/
function do_result($lm_file) {
global $lm;
$lm_pre_title = '';
if ($lm_file == 'out') {
foreach($lm['links_array'] as $lm_link) {
$lm['links_page'] .= $lm_link;
}
if (!$lm['standard_sitemap'] and $lm['text_mode']) {
$lm['links_page'] = embed_in_text();
}
} else {
$lm_title = basename($lm_file);
if (!$lm['text_mode']) {
$lm_pre_title .= '<br />';
}
if (isset($lm['match_ext'])) {
// transformations..
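// compare against the real length of match_ext, not a hard-wired 5..
$lm_ext_len = strlen($lm['match_ext']);
if ($lm_ext_len > 0 and substr($lm_title, -$lm_ext_len) == $lm['match_ext']) {
$lm_pre_title .= $lm['trans_title'];
$lm_title = substr($lm_title, 0, -$lm_ext_len);
$lm_file = $lm['trans_path'].$lm_title.$lm['trans_add'];
}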
}
// replace *certain* characters with entities..
$old_ent = array(' ','&');
$new_ent = array('%20', '&amp;');
$lm_file = trim(str_replace($old_ent, $new_ent, $lm_file), '.');
if ($lm['standard_sitemap']) {
array_push($lm['links_array'], '
<url>
<loc>http://'.$lm['links_domain'].$lm_file.'</loc>
</url>');
} else {
array_push($lm['links_array'],'<a href="http://'.$lm['links_domain'].$lm_file.'" title="'.
str_replace($old_ent, $new_ent, $lm_title).'" target="_blank" rel="noopener noreferrer">'.$lm_pre_title.$lm_title.'</a>');
}
}
}
/*
embed links in text
clean, simple */
function embed_in_text() {
global $lm;
$lm_chunk = @file_get_contents($_SERVER['DOCUMENT_ROOT'].$lm['text_file']);
// no text (or no links) to work with? fall back to the plain links..
if ($lm_chunk === false or $lm_chunk === '' or $lm['count'] < 1) {
return $lm['links_page'];
}
// whole-number chunk size, at least 1, or the text would never be used up
$lm_chunk_size = max(1, (int) floor(strlen($lm_chunk) / $lm['count']));
$lm_text_chunks = str_split($lm_chunk, $lm_chunk_size);
$lm_i=0;
foreach($lm['links_array'] as $lm_link) {
if (isset($lm_text_chunks[$lm_i])) {
// swap the link text for a chunk of the file. a callback keeps any "$1" or "\1" in the text literal
$lm_text = $lm_text_chunks[$lm_i];
$lm_text_chunks[$lm_i] = preg_replace_callback('/">(.*)<\/a>/i',
function ($lm_match) use ($lm_text) { return '">'.$lm_text.'</a>'; }, $lm_link);
} else {
$lm_text_chunks[$lm_i] = $lm_link; // more links than text, keep the link as-is
}
$lm_i++;
}
// we do it in two stages, to catch the extra text in the file
$lm_embedded_page = implode('', $lm_text_chunks);
$lm_embedded_page = str_replace("\n","<br />\n",$lm_embedded_page); // get back the linebreaks
$lm_embedded_page = str_replace("\t",'&nbsp;&nbsp;&nbsp;&nbsp;',$lm_embedded_page); // and tabs
return $lm_embedded_page;
}
/*
changes:
0.5.1
* Fixed the linebreaks in the embedded text output.
0.5
* Links can now output a standard "sitemap". Ironic, considering the
idea for sitemaps was nicked from this very script!
You can have both running simultaneously, using mod_rewrite to
decide which output is shown, e.g (in .htaccess)..
RewriteRule ^sitemap\.xml$ /links.php [NC,L]
Robot Links will show your standard output when accessed via "links.php"
(or whatever you use) and will display a standard (XML) sitemap when
accessed via "sitemap.xml". Nifty.
0.4
* HTML5 output, customizable header/footer
0.3.6
* Added ability to embed links.php in other pages. Simply include it.
0.3.5
* XHTML update - looks nice, now.
0.3.3
* Improved linking code and body formatting
*/
?>