corzoogle logo - a simple chubby magnifier with a lightning bolt inset. Everything is in a cool semi-transparent grey gradient.

install corzoogle..

Simple install..

Drop "corzoogle.php" into whatever folder you want to search, and take your web browser to..

  http://yoursite/corzoogle.php

That's it.

Site install..

If you are running corzoogle on a live web site, even if you're not, it would probably be a good idea to open corzoogle.php in a text editor and customize its preferences. There are copious notes and comments within.

If you are running corzoogle on a live web site, and don't want folk to find your database passwords and such, I definitely recommend you read the preferences!

Greatest hits..

If you want corzoogle to remember recent searches, you'll need to have a file named .corzoogles, and that file will need to be writable by the web server process. corzoogle will attempt to create the file itself, but depending on your security settings, probably won't succeed. See this page for more details about how to remedy that.
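A quick way to see whether the web server can write the file is to ask php itself. What follows is only a rough sketch, not code from corzoogle; the function name recent_searches_status is made up for illustration, and the only name taken from this page is the .corzoogles file..

```php
<?php
// Hypothetical helper (not part of corzoogle): reports whether the
// recent-searches file can be written by the current PHP process,
// trying to create it first if it does not exist yet.
function recent_searches_status(string $file): string
{
    if (!file_exists($file)) {
        // "@" suppresses the warning when the folder is not writable.
        if (!@touch($file)) {
            return 'cannot create';
        }
    }
    return is_writable($file) ? 'writable' : 'not writable';
}
```

Dropped into a temporary test script beside corzoogle.php, something like echo recent_searches_status(__DIR__ . '/.corzoogles'); should say which of the three cases applies, as seen by the web server process rather than by your own login.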

Preferences..

corzoogle is extremely configurable. It's worth reading through the preference section at the top of corzoogle at least once. I'm told it's a good read, and you may even get a laugh or two.

F.A.Q..

When I load corzoogle in my browser I get a 403 error. What's up?

Mac OSX user? It sounds like you unzipped corzoogle somewhere, perhaps on a network volume, and then moved it into place on your local machine. Often, when moving files across volumes like this, file permissions get altered, tightened. And the mac webserver is pretty strict about file permissions; generally a good thing.

What to do is, open the file's "get info" (right click the file) in the Finder, and set permissions to read/write for the owner and at least read for the other categories ( group / others (world) ), or do..

  chmod 755 /path/to/corzoogle.php

..in a shell/terminal session (as root).

If you unzip corzoogle on the same volume (preferably right in the folder where it will live) there is much less chance of its file permissions being reset like this; they will remain at the factory (my iMac) default.

The above probably applies to any script you download and install from corz.org
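If you are not sure what permissions a file actually ended up with, php can report them. This is a small sketch only; octal_perms is a made-up helper name, and the path in the usage note is a placeholder..

```php
<?php
// Hypothetical helper: returns a file's permission bits as a four-digit
// octal string such as "0755" or "0644", or null if the file cannot be
// examined (for example, because it does not exist).
function octal_perms(string $path): ?string
{
    // Make sure we are not reading a stale, cached result.
    clearstatcache(true, $path);
    $perms = @fileperms($path);
    if ($perms === false) {
        return null;
    }
    // Keep only the last four octal digits (the permission bits).
    return substr(sprintf('%o', $perms), -4);
}
```

Calling octal_perms('/path/to/corzoogle.php') after the chmod above would be expected to give "0755".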

Is my Language supported?
aka. "Can I use such-and-such a character?"

A real-life email response sums up the whole matter..

> Hi!
>
> (lots of stuff about how great corzoogle is ..snipped.. *grin*)
>
> Is there any way to make corzoogle include the swedish characters å, ä and ö
>
> Thanks!
>
> *non-english person*


There is a way, but currently it's not pretty. I've had similar requests from Hungary, Estonia, Russia and a few other places, and I'm currently looking into full Unicode search support, which should hopefully cover everyone's needs. Sadly, php itself has only recently acquired "full" Unicode support, and older servers (like my host) just choke on it. Unicode handling is also slow.

The alternative (if plan A fails) is to have some kind of plug-in language support, and if I go this way, I will *definitely* be in touch at the early testing stage!

Now here's the really fun answer..

corzoogle ALREADY DOES!

It all depends on your server setup, and the encoding of the documents you are searching. I do know that corzoogle is installed on some *really* foreign sites, and they get results in everything from Arabic to Swahili! It just works, and no one says a word!

Right now (as a wee test) I added the word "öändersonå" to my main title page, and then corzoogled for it. See the attached jpeg. Weird huh?

Try this..

In the main .htaccess file of your site, add the line..

  php_value default_charset utf-8

If, for some reason, you don't have access to the .htaccess file (now that's ironic) you could add a line to corzoogle itself, up near the top of the script, this..

  ini_set('default_charset','utf-8');

might just make it all happen. I'll probably put that in the next release. (done, and just wait for the screams!) Please let me know how that works out, and thanks for caring about corzoogle!

for now..

;o) corz.org

And PLEASE do let me know how it works out!

next!

You can leave feedback!

You can ask stuff there, leave comments, that kind of thing. I don't expect it to get too busy; corzoogle just works.

Feedback




Sam - 21.07.05 2:27 am

Decided to use the 'nosearch' way (yet another solution! such flexibility!) as I was doing this on a couple of other files anyway (eg. site map).

A couple more FAQ:

1) Does the inclusion of Google Adwords on a site mean it's no longer "not-for-profit"? (Thought someone should ask!)

2) I have Adwords and I think it's causing me to receive up to 4 email notifications per search - but seems to be random - no direct relation to number of units shown. 2a) Does that make sense? 2b) How can this be stopped?

Also, a nice to have - the page from which the search was performed. I have a little form embedded on every page and it's nice to know (easily) what page they were on when they searched. I've modified the notify_webmaster function (added $_SERVER['REQUEST_URI'] and $_SERVER['HTTP_REFERER'] to the mail - covering my bases) but will not be able to test for 12 hours or so (cannot upload from where I am).

About that cheque - I've misplaced my pen. But seriously - been tyre-kicking on one of my hobby sites (and I'm very happy) and will add to a client's site soon (I think I left my pen there too).


corz - 21.07.05 8:22 am

Yeah Sam, I'm BIG on preferences! the 'nosearch' option is very useful, and included in a couple of my templates, for certain kinds of files. good work!

Google Ad-Words, haven't thought about it before. erm. No.

I really meant that the site itself is "for-profit", in that it is geared towards making money. However, if Google ad-words make you zillions of pounds, a donation would be gratefully accepted! (when you find your pen!)

The email additions sound great (I've added a couple of things myself recently, now that I have proper mail for my domain) but you'll need to add a "@" before the referrer check, or it will occasionally fail.. @$_SERVER['HTTP_REFERER']
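An alternative to the "@" prefix is to test for the key before reading it. The lines below are a sketch under assumptions, not the actual notify_webmaster code; server_value, $extra, and the "(none)" fallback are invented here for illustration..

```php
<?php
// Hypothetical helper: reads one entry from a $_SERVER-style array and
// returns a fallback when the key is missing (the referrer often is).
function server_value(array $server, string $key, string $fallback = '(none)'): string
{
    return isset($server[$key]) ? (string) $server[$key] : $fallback;
}

// Extra lines that could be appended to the notification mail body.
$extra  = "Searched from page: " . server_value($_SERVER, 'REQUEST_URI') . "\n";
$extra .= "Referrer: " . server_value($_SERVER, 'HTTP_REFERER') . "\n";
```

Unlike "@", this never raises a notice in the first place, and the mail shows "(none)" instead of an empty value when the browser sent no referrer.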

As to the multiple notifications, I've noticed this myself. It's more likely the googlebot following the "recent hits" links. I'll likely add a check for this, so bot hits, while allowed (makes for good spidering), won't be mailed to the webmaster.

I'll do a preference for it! There could be an internal list in the preferences, but as this could potentially get very large as more and more spiders come along, we could also use an external "bot list". I use one for my site logger (and other places) and as I'm thinking about this now, I think this might be the way to go for corzoogle, too.

We simply need to check the "user agent" (again with "@", as it is sometimes absent) and if it matches a bot from our list, bypass the email altogether. My current bot list (with a couple of superfluous entries) looks something like this..

ai_archiver
almaden.ibm.com/cs/crawler
ia_archiver
Ask Jeeves/Teoma
BecomeBot
ConveraCrawler
Exabot@exava.com
FAST Enterprise Crawler
Feedster Crawler
FeedValidator
findlinks/
gazz/
Girafabot
globalspec.com/Ocelli
googlebot
Jetbot
larbin_
mikeelliott@hotmail.com
msnbot
NG/2.0
nhnbot@naver.com
slurp
statbot@gmail.com
Syndic8
Yahoo-MMCrawler
Yahoo! Slurp
YahooFeedSeeker
ZyBorg
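The user agent check described above might look roughly like this. It is a sketch, not the routine that ended up in corzoogle; is_known_bot, $bot_list, and $send_mail are names made up here, and the matching rule (case-insensitive substring) is an assumption..

```php
<?php
// Hypothetical helper: true if the user agent contains any entry
// from the bot list (case-insensitive substring match).
function is_known_bot(string $user_agent, array $bot_list): bool
{
    if ($user_agent === '') {
        return false; // an absent user agent is not treated as a listed bot here
    }
    foreach ($bot_list as $bot) {
        $bot = trim($bot);
        if ($bot !== '' && stripos($user_agent, $bot) !== false) {
            return true;
        }
    }
    return false;
}

// A few entries taken from the list above.
$bot_list = ['googlebot', 'msnbot', 'slurp', 'ia_archiver'];

// isset() guards against a missing header, much like the "@" prefix would.
$user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Only mail the webmaster when the visitor is not on the list.
$send_mail = !is_known_bot($user_agent, $bot_list);
```

An external list could be read in with file('botlist.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES), one bot per line; the file name is just a placeholder.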

I'm telling you all this, because it sounds like you know some php and may want to implement something in the meantime.

for now..

;o)
(or


Sam - 21.07.05 2:35 pm

Indeed - Mediapartners-Google/2.1 is the culprit. I've added the HTTP_USER_AGENT to the email as well - nice way to build on the list above..... now I think about it - don't all web browsers have "Mozilla" at the start of the UA? I'll do some more research and get back to you.


corz - 21.07.05 6:20 pm

I went ahead and added a routine for this today, it's up here at corz.org and works great, bypassing the email process if the user agent is on the internal (or external) list. I'll get this into the beta folder hopefully sometime tonight.. https://corz.org/beta

Not all browsers have "Mozilla" in them, but most of the popular ones certainly do. I've been watching user agents too, here's my current "extra" body lines..

  $email_body = "There has been a new corzoogle search!\n";
  $email_body .= "Their IP was: ".$_SERVER['REMOTE_ADDR']."\n";
  $email_body .= "Their Browser was: ".@$_SERVER['HTTP_USER_AGENT']."\n";

which should let me know about any other spiders out there that slip through the list.
If you come across any good ones, let me know!

;o)
(or


Sam - 22.07.05 1:06 am

I've posted a thread about this in a much loved forum. You may want to comment or follow. There's also a link in there to a very nice spider trap that I have not yet implemented but may get around to soon.


Sam - 25.07.05 2:44 am

Getting closer to a result.

It seems that the most reliable way to test if it's a real human (or at least a real search) involves changing the form method from a GET to a POST and then testing the $_POST array.

Will this break anything? (I can and will of course test it but my testing regime may not be as rigorous as yours :D)


corz - 26.07.05 4:53 am

Nah, it shouldn't break anything. Remember you can also use $_REQUEST, to test either array.
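For anyone wanting to try the POST idea, here is a rough sketch. Note that $_REQUEST merges GET and POST data, so it cannot tell the two apart; to separate form submissions from followed links, the request method and $_POST need checking directly. The helper name is_posted_search and the form field name 'q' are assumptions, not corzoogle's own..

```php
<?php
// Hypothetical helper: treats a request as a "real" search only when a
// non-empty query string arrived by POST. The field name 'q' is an assumption.
function is_posted_search(string $method, array $post, string $field = 'q'): bool
{
    return $method === 'POST'
        && isset($post[$field])
        && is_string($post[$field])
        && trim($post[$field]) !== '';
}

// REQUEST_METHOD is absent outside a web request, so fall back to GET.
$method = isset($_SERVER['REQUEST_METHOD']) ? $_SERVER['REQUEST_METHOD'] : 'GET';

// Only notify the webmaster for searches submitted through the form.
$notify = is_posted_search($method, $_POST);
```

Spiders following links make GET requests, so they would fail this test, though "recent hits" style links would then no longer count as searches either.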

My own spider-protection is working really well with the simple user agent test, certainly well enough for my needs. I've added a couple of new bots as they slip through the net, and haven't had a spider in my mail for quite a few days now.

;o)
(or


Aaron Cavanaugh - 01.09.05 8:06 am

Hi,

Would there be any way to force the info in a txt file to wrap?

So when you click on the topic the .txt file shows all in one line. Could I force the browser (or the results) to wrap the lines of text?
http://www.christianradio.me.uk/forum/corzoogle.php

Thanks. God Bless.

Aaron.


corz - 01.09.05 9:26 am

Well, that's nothing to do with corzoogle, that's the format of the files themselves, the files on your site that corzoogle is finding. Normally a web browser would always wrap plain text, but these look to be in a very strange format, full of weird characters and no linebreaks. I suspect your bulletin board software does this for its own purposes.

It's unusual that your bulletin board is all stored as wee text files, perhaps you want to think about mangling the results to point to the container. I had a wee look, but the long numbers of the text files don't seem to correspond to the numbers of the posts. Bummer.

To be honest, you'd be better off with some kind of spidering search engine, something that can crawl your site as the user sees it, via the web pages at ... /forum/forum.php?action=view&topic=whatever.

Perhaps your bulletin board software has been updated since you installed it, it may even now have a search facility built-in.

;o)
(or


Aaron Cavanaugh - 02.09.05 10:16 am

Hey Cor,

Thanks for taking a look.

God Bless.

Aaron.


Adrian - 08.09.05 6:02 am

Could you add a page that shows either an example of corzoogle we could try out or links to pages that have it in use?

Thanks


