corzoogle logo - a simple chubby magnifier with a lightning bolt inset. Everything is in a cool semi-transparent grey gradient.

install corzoogle..

Simple install..

Drop "corzoogle.php" into whatever folder you want to search, and take your web browser to..

  http://yoursite/corzoogle.php

That's it.

Site install..

If you are running corzoogle on a live web site, even if you're not, it would probably be a good idea to open corzoogle.php in a text editor and customize its preferences. There are copious notes and comments within.

If you are running corzoogle on a live web site, and don't want folk to find your database passwords and such, I definitely recommend you read the preferences!

Greatest hits..

If you want corzoogle to remember recent searches, you'll need to have a file named .corzoogles, and that file will need to be writable by the web server process. corzoogle will attempt to create the file itself, but depending on your security settings, probably won't succeed. See this page for more details about how to remedy that.
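
Before digging into server settings, a few lines of php can report whether the web server is able to write that file. This is only a sketch: hits_file_status() is a made-up name, not part of corzoogle, and it assumes .corzoogles lives in the same folder as the script, which may not match your preferences.

```php
<?php
// Report whether a "recent searches" file can be written by the current process.
// hits_file_status() is an illustrative helper, not part of corzoogle.
function hits_file_status(string $path): string {
    if (file_exists($path)) {
        return is_writable($path) ? 'writable' : 'exists but not writable';
    }
    // No file yet: creating one needs write access to the containing folder.
    return is_writable(dirname($path))
        ? 'missing, but can be created'
        : 'missing, and folder not writable';
}

// Usage, from a throwaway test page in the same folder (path is an assumption):
// echo hits_file_status(__DIR__ . '/.corzoogles');
```

Run it through the web server rather than from a shell, so the check is made as the same user that corzoogle runs as.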

Preferences..

corzoogle is extremely configurable. It's worth reading through the preference section at the top of corzoogle at least once. I'm told it's a good read, and you may even get a laugh or two.

F.A.Q..

When I load corzoogle in my browser I get a 403 error. What's up?

Mac OSX user? It sounds like you unzipped corzoogle somewhere, perhaps on a network volume, and then moved it into place on your local machine. Often, when moving files across volumes like this, file permissions get altered, tightened. And the mac webserver is pretty strict about file permissions; generally a good thing.

What to do is, open the file's "get info" (right click the file) in the Finder, and set permissions to read/write for all categories ( owner / group / others (world) ), or do..

  chmod 755 /path/to/corzoogle.php

..in a shell/terminal session (as root).

If you unzip corzoogle on the same volume (preferably right in the folder where it will live) there is much less chance of its file permissions being reset like this; they will remain at the factory (my iMac) default.

The above probably applies to any script you download and install from corz.org

Is my Language supported?
aka. "Can I use such-and-such a character?"

A real-life email response sums up the whole matter..

> Hi!
>
> (lots of stuff about how great corzoogle is ..snipped.. *grin*)
>
> Is there any way to make corzoogle include the swedish characters å, ä and ö
>
> Thanks!
>
> *non-english person*

>

There is a way, but currently it's not pretty. I've had similar requests from Hungary, Estonia, Russia and a few other places, and I'm currently looking into full Unicode search support, which should hopefully cover everyone's needs. Sadly, php itself has only recently acquired "full" Unicode support, and older servers (like my host) just choke on it. Unicode handling is also slow.

The alternative (if plan A fails) is to have some kind of plug-in language support, and if I go this way, I will *definitely* be in touch at the early testing stage!

Now here's the really fun answer..

corzoogle ALREADY DOES!

It all depends on your server setup, and the encoding of the documents you are searching. I do know that corzoogle is installed on some *really* foreign sites, and they get results in everything from Arabic to Swahili! It just works, and no one says a word!

Right now (as a wee test) I added the word "öändersonå" to my main title page, and then corzoogled for it. See the attached jpeg. Weird huh?

Try this..

In the main .htaccess file of your site, add the line..

php_value default_charset utf-8

If, for some reason, you don't have access to the .htaccess file (now that's ironic) you could add a line to corzoogle itself, up near the top of the script, this..

ini_set('default_charset','utf-8');

might just make it all happen. I'll probably put that in the next release. (done, and just wait for the screams!) Please let me know how that works out, and thanks for caring about corzoogle!

for now..

;o) corz.org

And PLEASE do let me know how it works out!

next!

You can leave feedback!

You can ask stuff there, leave comments, that kind of thing. I don't expect it to get too busy; corzoogle just works.

Feedback

If you have a question, feel free to leave a comment, below. I don't expect it to get too busy; corzoogle usually just works.


Welcome to the comments facility!


Steven Houtzager - 23.06.04 3:58 pm

Basic question. I am using GoLive 6 to make the site. In simple terms, how do I create the "input" items to make the search work. In other words, do I need to make a small form that has a button and a text input field. Then, how does the button link to the php?


corz - 24.06.04 5:59 am

a simple html GET form will do the trick..

probably the easiest thing to do would be to download a copy of 404. (you can download a zip from that source page, too) which has exactly what you are looking for. it's all in the function called corzoogle_box()
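
For anyone who only wants the bare form, something along these lines should do. It is a rough sketch and not the corzoogle_box() function itself: search_box() is a hypothetical helper, the field name "q" is assumed from corzoogle URLs of the form "?q=some query", and the action path is a placeholder.

```php
<?php
// Build a minimal GET search form that points at corzoogle.
// search_box() is a hypothetical helper, not corzoogle's own corzoogle_box().
// The field name "q" is an assumption based on "?q=some query" style URLs.
function search_box(string $action = '/corzoogle.php'): string {
    $action = htmlspecialchars($action, ENT_QUOTES);
    return '<form method="get" action="' . $action . '">'
         . '<input type="text" name="q" size="20" maxlength="256">'
         . '<input type="submit" value="search">'
         . '</form>';
}

// Usage, wherever the box should appear in a page:
// echo search_box('/corzoogle.php');
```

The button needs no special linking: submitting the form sends the browser to the action URL with the text field appended as the query string.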

hit a real 404 page here at corz.org for a wee demo.

if you need more help with this, feel free to drop more comments here, or mail me.

have fun!

;o)
(or



Jeremy - 28.06.04 8:52 pm

When you search for "html", not only does it bring up pages with the word HTML in the text, but it also searches within the actual html tags and picks up any html file references (ie. "index.html"). The easiest way to fix this I think is to make it automatically add "-.html" after any word or phrase entered in the search (searching for "html -.html" brings up only legitimate results with "html" in the text.) How would I go about doing this?


corz - 29.06.04 7:32 am

searching for "html -.html" brings up only legitimate results with "html" in the text … isn't that the answer to your own question? that's why the -not mechanism exists, so users can filter out stuff.

You think that corzoogle should automatically not search for terms that happen to be file extensions? It's technically feasible, of course, but do we really want this? Most folk, I imagine, would input something like "html" specifically to search for filenames inside files.

But maybe I don't understand the question, I can be a bit thick sometimes; feel free to elaborate.

;o)
(or



encoderX - 10.07.04 10:33 pm

Fantastic script! I've been searching for months for something like this.
Corzoogle is amazingly configurable yet amazingly simplistic. Can't fault it... yet!
Thank you and keep up the great work - appreciated.

encoderX


jenninnifer - 19.07.04 11:09 pm

Do you have a 'Search with the results' option?

So if someone searches for 'Whatever', then within those results, they can then search for 'Something'?

Thanks a bunch!


corz - 20.07.04 12:21 am

'Search with the results'?

Yes! simply add more terms! corzoogle always operates in "Boolean AND mode". The more terms you add, the narrower the search.

For instance, if you searched for Whatever and got fifty results, searching for Whatever Something would return results with BOTH the word Whatever AND the word Something, and probably far fewer results. You can narrow the search right down this way, being very specific. And there is no limit to the number of terms you can use, either. (though I have put a 256 character limit on the form input, corzoogle itself has no limit)
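
The idea of "Boolean AND mode" combined with -not terms can be sketched in a few lines. This is an illustration only, not corzoogle's actual matching code: matches_all_terms() is a made-up name, and case-insensitive substring matching is an assumption.

```php
<?php
// Illustrative only: corzoogle's real matching code is not shown here.
// Every plain term must be present (Boolean AND); any "-term" must be absent.
// Matching is by case-insensitive substring, which is an assumption.
function matches_all_terms(string $text, string $query): bool {
    $terms = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($terms as $term) {
        if ($term[0] === '-' && strlen($term) > 1) {
            // A -not term: reject the document if the word appears anywhere.
            if (stripos($text, substr($term, 1)) !== false) {
                return false;
            }
        } elseif (stripos($text, $term) === false) {
            // A plain term that is missing: the AND fails.
            return false;
        }
    }
    return true;
}
```

Each extra plain term can only remove documents from the result set, which is why adding terms narrows the search.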

Check out the tips page for more useful stuff like this!

;o)
(or


Jim - 12.08.04 1:18 am

I have searched the net for many sleepless nights hoping to find a search script like yours. I can now sleep (for 2-3 hours). Thank you for writing this fantastic work! I have had no problems running your script and getting great results. I now go to delete all the other search scripts on my sites that didn't work.

A fan for life;

Jim


hs - 12.08.04 1:54 pm

maybe for the next version an option so that all text between 'script-kiddie prank: ' and '' can be ignored, so the php src can't be viewed, reducing the chance of insecurity.
or am i mistaken and this option is already included?
greets hs


Jan Gottliebsen - 04.09.04 4:01 pm

Hi

corzoogle looks like a very nice sitesearch. I have downloaded the script but something seems to be wrong with the zip file.

Can you help me? I would really like to test this script.

Best regards
Jan Gottliebsen


Cameron - 06.09.04 5:53 pm

Nice script, thank you! I love all your funny comments in the script too...especially the things you would accept as tokens of gratitude (like "mungbeans" or "interesting stones") I will be packaging up the mungbeans and stones soon to send to you (ummm...will mungbeans clear with customs? - and what is a phototron unit?) Anyhow...you can see your script (and join in if you have a photoblog) at work here: photoblogring.org
an image


corz - 09.09.04 2:31 am

glad to be of help with the sleep, Jim. I gotta get some of that myself; soon, I hope. Been a helluva few weeks.

hs, I'm guessing you mean between php tags; cbparser eats those. I guess you mean the preview snippets too. Not a bad idea, would slow things down a bit, but we could have a switch. noted.

Jan, if you're still having troubles with the zip, bang something in my inbox and I'll send you a plain text version. Putting "corzoogle" in the subject is a good idea, my shit-list daemon has a ferocious appetite.

Cameron! Good timing! I'm on the very dregs of my mung supply! mail me and I'll return a pdf version of "The Magic of Mung Beans" for your amusement. A Phototron Unit, by the way, is a hexagonal growroom particularly suited to high yield type plants. I fancy playing with one, swapping out the bulbs, experimenting. I'd like a piano, too, for similar reasons.

All the best with the photoblog!

;o)
(or




raja iskandar shah - 10.09.04 12:01 pm

useful script. but i could not get it to work with pdf and sxw (oo.o writer). worked with .txt .doc and .html





corz - 10.09.04 11:42 pm

just add whatever extensions you need into the preferences.
the line goes something like..
$extentions = '..html.txt.doc.phps.blog.comment';/*


;o)
(or



Paul - 01.10.04 6:50 pm

How do I avoid html coding elements from being displayed? A search for "tom" returns

tom:2px solid #0B2C89;border-top:2px solid #0B2C89;"> To better expedite the implementation of Distance Learning and Video Conferencing throughout the New York State Education system, we have . .

that is an inline style applied to a table.

thanks for any suggestions
Paul


corz - 06.10.04 9:28 pm

As it stands; you don't. When I wrote corzoogle, I particularly wanted to search within tags for just these sorts of things. It's a feature!

It's an idea, though, for the next version, could make it optional.

In the meantime, try "tom -bottom" as your search string.

;o)
(or


Kushal Naik - 10.11.04 8:23 pm

Is there any modification to the corzoogle script I can make so that recent searches are emailed to me? Is there any way to include the IP and timestamp of the user making the search?

Thank you for your time.

Kushal Naik
kushalnaik@comcast.net


corz - 12.11.04 1:22 am

php mailing is fairly straightforward, but I'd never imagined someone might want this; the "recent hits" was always more of a fun feature than anything else, so thank you!

I've added some notify functionality to corzoogle, just for you Kushal, (by the way, other features requested on this page are also available in the latest betas, not searching inside tags, among them). grab corzoogle.v0.7.5b2.zip for all the latest goodies.


things to note: as it stands, it's a straightforward mail-on-every-search type thing. you can switch it on or off (it's off by default) in the preferences and you can also set the various mailer strings.

IMPORTANT: if your web host's mailer is incorrectly set up, or plain broke (mine), enabling this option could introduce some potentially lengthy delays, and I urge you to test this feature somewhere onsite before enabling in your main corzoogle engine.

At any rate, the mailing routine happens right at the very end, so at least the user should have their search results, even if the page isn't technically "finished".

I haven't tested the feature at all, for the above stated reasons, and I'd appreciate if someone could let me know if it works, it should do.

other data: currently, there is no other data stored with the hit. the "timestamp" will be the time of the email, which will be corzoogle run-time. it would be possible to plug a great deal more information into the "latest hits". client IP is easy enough to grab, what else would you like?

motivate me!

;o)
(or



Kushal - 12.11.04 2:08 am

404 on the beta dl?????




corz - 12.11.04 3:51 am

hahahah that's a funny one!

I was about to say "hmmm.. it works fine in all my browsers", and then I looked closely at my status bar, and I realised the URL was http://corzorg/beta and not http://corz.org/beta ... it was pointing to my local dev mirror!

then I remembered how I was testing the new syndication functions of the distro machine, which I then, erm, uploaded, like I say, the beta machine is always the latest version! :geek:

it's good to end the night with a laugh.
thanks for letting me ken.

;o)
(or

ps.. try now!


anonymous - 13.11.04 7:37 pm

I am testing your 0.7.5.b3, and I am very happy. It works very well.
I am interested in sending mail to the webmaster.
I filled in the e-mail address and so on, correctly I think, but I don't receive any e-mail.
Do you have anything to suggest?

Many thanks,
from Italy



corz - 13.11.04 11:03 pm

hello Italy!

did you switch on the $notify preference? set that to..
$notify = true;

if that's not the answer, check your php error logs. what error are you getting? I can't test this at my live webhost (i.e. here) but it's working fine on my test server at home. does your webserver have a working mail function? make a script with just..
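
A minimal stand-alone check along those lines might look like the following. It is a sketch under assumptions, not the original snippet: send_test_mail() is a hypothetical helper, the address is a placeholder, and the optional $mailer argument exists only so the function can be tried without a working mail server.

```php
<?php
// Hypothetical stand-alone mail() check; the address below is a placeholder.
// $mailer lets you swap in a fake sender for testing; PHP's mail() is the default.
function send_test_mail(string $to, ?callable $mailer = null): string {
    $mailer = $mailer ?? 'mail';
    $ok = $mailer($to, 'mail() test', 'If you can read this, mail() works.');
    return $ok
        ? 'mail() accepted the message'
        : 'mail() failed, check the php error log';
}

// Usage, in a throwaway page on the server you want to test:
// echo send_test_mail('you@example.com');
```

Note that mail() returning true only means the message was handed over for delivery; it does not prove the message arrives.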



does that work?
keep me posted!

;o)
(or



anonymous - 14.11.04 12:15 pm

YES!!!
You are right!
I didn't pay attention to the notify function.
I am sorry ;-(
The script works fine.
I still have to test all the functions.
Many thanks.



corz - 15.11.04 10:00 pm

cool! it works!

;o)
(or


Kushal - 25.11.04 3:08 am

I finally got around to updating my corzoogle to the newest beta (the one where you added email to webmaster (THANKS!).) The only issue is that with that addition the script gets HORRRIBLY slow!

So I did a little research and added a mod to the end of the script which sends the email using mail() and also sends user information (host, ip, time, browser etc). I compiled it and it runs MUCH faster (only milliseconds slower than ver. 7.4). Do you want a copy of the code to share with other "caring webmasters.."?

Let me know, I will be happy to share!


corz - 25.11.04 1:11 pm

slow? what? :aargh:

do you mean where it sends the email?
if the mailserver you set it to use isn't working, it will wait until it times-out completely (30 seconds +). the results appear before it starts that, though.

is that what you mean? or slowness somewhere else?
I MUST KNOW!

;o)
(or

ps.. there's another beta since then!


Kushal Naik - 29.11.04 5:45 am

I have 7.5b. Whatever.


Kushal Naik - 29.11.04 5:48 am

Also, do I need a licence for a non-profit org.? I was unsure when reading the licencing.


corz - 29.11.04 1:26 pm

no Kushal, you don't. so long as you leave the links/logo intact, you can use corzoogle for free.

though all donations are gratefully accepted!

;o)
(or



Kushal Naik - 02.12.04 2:42 am

Could you check, http://www.cvhs.us/corzoogle1.php and see if the logo modification will satisfy the requirements.

Thx,
Kushal Naik


corz - 02.12.04 3:50 am

heh nice one! it's large, that's for sure.

I'll maybe look out the original photoshop document and do a "corzoogle.com" (but without the "www." *eew*) for you.

happy corzoogling!

;o)
(or



hari - 08.12.04 7:32 am

does this search for text inside Acrobat documents too ?


corz - 08.12.04 8:56 am

yes. so long as the PDFs aren't encrypted, corzoogle can find text inside them.

;o)
(or



Kushal Naik - 13.12.04 8:57 am

Hey its me again,

This one's not about corzoogle, it's about the source engine. What is that script that is doing the engine? It has no extension so I'm lost, is that CGI or PHP or what?

Wondering...


corz - 14.12.04 12:06 pm

Lots of folks find it hard to get their head round, but everything is inside corzoogle.php!
even the logo!

the only external file is the (optional) "recent hits" file.
it's all php.

;o)
(or

ps.. I like clayton valley high school's new "Instant Updates" feature. very nice! :ken:


Kushal Naik - 15.12.04 3:43 am

Corzoogle search engine is coming soon to CVHS but I need some time to play with the output etc. Plus I have tons of files I don't want it to index. Very time consuming stuff!

Thanks for the compliment and hats off to you for all your hard work on the corz___ stuff!


corz - 15.12.04 10:04 am

corzoogle doesn't index! it's all happening live, and there are three really simple ways to get corzoogle to avoid particular files and folders. Something tells me you're about to do a lot more work than is necessary!

mail me with particulars, I could save you some (more) time!

;o)
(or


Joseph - 06.01.05 2:02 pm

The script works great. I run it on a phpnuke website that uploads files (image and txt) to a folder for users, and the mix of corzoogle and my setup lets the users find their own files without many of the normal requests for aid that fill my email and private message box.
Site

Here is the link if you want to see what I changed on the script's display/style.

Joseph


corz - 06.01.05 8:12 pm

Yeah Joseph, looks good!
Apart from maybe one thing. Try this..

an image

ahhh.. nizzze!

;o)
(or


Sébastien - 03.02.05 6:50 pm

Hello,

I'm testing search engines to integrate into a CMS that I'm building, and I'm looking for the licence text for corzoogle.

What are the exact terms of the licence? I need free software, and if corzoogle has some licence restriction, I won't be able to distribute it.

Could you inform me about that ?

thanks a lot.

--
Sébastien


corz - 04.02.05 11:33 am

sure.

essentially, corzoogle is free for non-profit use, so long as the (very small) links on the results page are left intact.

For more details, see the buy corzoogle page, or read the agreement at the bottom of corzoogle (aye, inside corzoogle itself).

There's also a list there of other ways in which you can give back a little; I might add that to the buy corzoogle page. hmmm..

happy corzoogling!

;o)
(or




Sébastien - 04.02.05 11:53 am

Ok, so it's not free software, I could not use it.

It's a shame, but I have to search something else ...

bye !


steve - 07.02.05 6:21 pm

Hi all!

I wonder if it's possible to highlight the searched words in the page I go to after the site search..like: i search a word, get the links in the results and after clicking the link in the results, the searched word gets highlighted in the new loaded page.

any ideas to make this possible?

groovin greetz

steve



corz - 11.02.05 3:01 pm

Hi Steve.

Highlighting in the page would need to be controlled by the page itself. What you'd need to do is insert something into your page headers that checked for referrer information, and then use that for your highlighting, ie. the "?q=some query" part.

Of course, there's always more than one way to skin a cat. You could alternatively hack a handler URL into corzoogle and have your handler serve up the page instead, adding highlighting along the way. highlighter.php?page=yaddayadda.html

or perhaps use corzoogle's new mangle preferences to supply an alternative extension, which an .htaccess file could transform into a handled URL. In corzoogle..

$mangle = array( '.htm' => '.htm.handled' );

and then catch that with .htaccess..

RewriteRule ^(.*)\.handled http://mydomain.org/highlighter.php?page=$1 [nc]

this is just off the top of my head, but you get the idea, I hope!
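
The referrer approach described above could be sketched roughly like this. It is not code from corzoogle: both function names are illustrative, the "q" parameter is taken from the "?q=some query" form mentioned above, the "hit" class name is a placeholder, and the highlighter works on plain text only (it makes no attempt to skip html tags).

```php
<?php
// Sketch of referrer-based highlighting; function names are illustrative.
// Assumes the visitor arrived from a URL of the form ...?q=some+query.
function terms_from_referrer(string $referrer): array {
    $query = parse_url($referrer, PHP_URL_QUERY);
    if (!is_string($query)) {
        return [];
    }
    parse_str($query, $params);
    if (empty($params['q']) || !is_string($params['q'])) {
        return [];
    }
    return preg_split('/\s+/', trim($params['q']), -1, PREG_SPLIT_NO_EMPTY);
}

// Wrap every occurrence of each term in a span, in one pass so that
// later terms cannot match inside markup added for earlier ones.
function highlight_terms(string $text, array $terms): string {
    $quoted = [];
    foreach ($terms as $term) {
        if ($term !== '' && $term[0] !== '-') { // ignore -not terms
            $quoted[] = preg_quote($term, '/');
        }
    }
    if (!$quoted) {
        return $text;
    }
    $pattern = '/(' . implode('|', $quoted) . ')/i';
    return preg_replace($pattern, '<span class="hit">$1</span>', $text);
}

// Usage in a page header (the referrer is sometimes absent, hence the ??):
// $terms = terms_from_referrer($_SERVER['HTTP_REFERER'] ?? '');
// echo highlight_terms($page_text, $terms);
```

The `??` operator needs PHP 7 or later; on older versions the "@" prefix does a similar job of silencing the missing-index notice.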

;o)
(or



rv - 16.03.05 4:36 pm

great search engine! could you post php code for the handler to insert in the page header?

rv


corz - 06.05.05 4:25 am

hey what happened to my comment?

I have a bad habit of going "feck off!" to my browser, when there's still unposted textareas, certainly isn't the first time I've left a bit of a gap! :lol: my cute answer was... see here! same thing.

;o)
(or


Bael - 17.05.05 12:03 am

lol just exploring :P

I had no idea how popular your search engine was lol =P
just did a search in google for corzoogle, an it found loads =P

dunno if this is like against your terms and conditions for using corzoogle but some guy changed the image etc, Click Here if it is against your terms etc sue his ass lol!?!?!?!

and i noticed that you were obviously going for a google look with the logo, but the font isnt the exact right one =P i have the commercial catull(google) font from a previous sig i made for some forums, you can have it if ya want lol =P or you could send me the .psd(hopefully you use photoshop) of the logo and i could edit it ya...oooo alittle less illegal lol :P

Bael
Bael


Bael - 17.05.05 12:04 am

hehe i mustve done the tags wrong or something the click here didnt turn into a link =S lol i need to start using the preview button =P

Click Here


Bael - 17.05.05 1:04 am

ok i got bored so made one anyway lol.....it aint great i could have got it abit more like the proper google logo...the colours aren't exactly right an the shadow looks a little off aswell...but it's ok i guess lol =P an you're a really knowledgable guy an probably looked into all this an decided you wanted to customize it yourself more...but meh it kept me occupied ofr 5 mins lol =P

(Click for full size)

an image

Bael


corz - 17.05.05 8:21 am

Bael, have you ANY idea how much trouble a logo like that could get me into! :eek: There's a fine line between "homage" and "rippin off", and you just crossed it! :lol:

About breaking the license terms, you would not believe the number of webmasters that just plain rip me off, some of them really high profile sites, too! I have most of their URLs, of course, and do plan to get around to a legal letter type thing at some point.

I guess I'm kinda hoping they might see the light first. :roll:

;o)
(or


Bael - 17.05.05 9:28 am

really lol, i can't see how that would be illegal, yeah maybe it is rippin off, but it's a different word lol, an surely they can't copyright the "style"?? i mean look at "ador" trainers lol, they have a different name, but completely rip-off "adidas" trainers lmao. and even the "style" of the "oogle" bit of the logo isn't the same as google's lol, i swapped the colours of the O's around =P

and i wouldn't think twice if someone was rippin me off lol, and you know that hardly any of them "will see the light" if any at all lol xD

Bael




TT - 20.05.05 2:50 am

Hi all

I got the search working ok, but for the life of me I can't get the image to load. I placed
<?php include("corzoogle.php"); ?>
into a different page; when the page loads there's NO image, just a box with a red X in it. I've looked at the embedded image code and I can't work it out. I've tried different paths etc. with no luck. Can anyone help me out, please?
Once I've got the image working I can edit the image background to suit the site. By the way, great corzoogle, I like it.


TT


corz - 20.05.05 11:57 am

when running in embedded mode, you'll need to use a static image. There should be something about that in the prefs. It's easy to do; simply alter the preference that reads..
$logo = 'embedded';
to read
$logo = '/path/to/image.png';

You can grab a transparent version of the corzoogle logo here..
an image
save that to your hard drive, or click it for a bigger version!

have fun!

;o)
(or


Aaron Cavanaugh - 21.05.05 6:59 am

Hi,

I think this search works good. Is there a way to show the first 20 characters of a file instead of the file name (highlighted). Or if it was html show the page title, rather than the page name.

For example my site http://www.christianradio.me.uk/forum/forum.php if you click on forum search and search for something it comes back with the filename 100201231230 or something highlighted. I am searching txt files. I would like it to show say... the first 20 characters of the file instead of this unreadable filename.

Thanks. God Bless.

Aaron.


corz - 21.05.05 8:02 am

Hi Aaron,

If there is a title in the file, corzoogle will use it.

Your site is all .txt files (and not very human-readable ones at that), so they don't have <title> tags. It wouldn't take much tweaking to get it to use the first 20 characters instead of the filename; how good are your php skills? You might want to look into corzoogle's mangling features, too.

And by the way, where's the corzoogle logo? You know, the one that "must be displayed prominently"... Bael, tell him about the "license agreement" thing! :ken:

l*rz..

;o)
(or


Aaron Cavanaugh - 21.05.05 11:14 pm

Hi Corz,

I guess I didn't see that on the license sorry. I will fix it.

My php skills are not so good, I can edit some stuff but I wouldn't know how to program php specific code.

If you could point me in the right direction.

Thanks. God Bless.

Aaron.


corz - 23.05.05 10:49 am

Well, Aaron, it shouldn't be too tricky to get what you want, if you look for (depending on the version you are running that wee routine is probably around line 1500) this..

<?php
if ($score_all_titles == true) {
    if (!$title = get_title($file_data)) {
        $name_is_title = true;
        $title = substr($file_name, 0, strrpos($file_name, '.'));
    }
}
?>


(without the <?php ?> tags, which I just added so you could get the funky syntax highlighting) and change it to..

<?php
if ($score_all_titles == true) {
    if (!$title = get_title($file_data)) {
        $name_is_title = true;
        $title = substr(strip_tags($search_data), 0, 33).'...';
    }
}
?>


Which, in the absence of a "real" title, will use the first 33 characters of the document as the title. Use whatever number suits you. Also, ensure you've enabled the $score_all_titles preference (make it true) for this to work.

If you want the title to truncate at the end of a word, rather than an absolute number use this quick hack (which may throw up the odd non-fatal php error, but you wouldn't see those on a production server)..

<?php
$title = substr(strip_tags($search_data), 0, strpos($file_data, ' ', 33)).'...';
?>


That's the easiest way to do it. :ken:
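
A slightly more defensive take on the word-boundary idea is sketched below, for documents shorter than the cut-off or with no space after it. On PHP 8 an out-of-range strpos() offset throws an error rather than a warning, so the length is checked first. title_from_text() is an illustrative name, not a corzoogle function, and 33 is simply the number used above.

```php
<?php
// Defensive version of the word-boundary hack; title_from_text() is an illustrative name.
// Cuts at the first space at or after $min characters, or keeps the whole text if it is short.
function title_from_text(string $html, int $min = 33): string {
    $text = trim(strip_tags($html));
    if (strlen($text) <= $min) {
        return $text;            // short document: nothing to truncate
    }
    $cut = strpos($text, ' ', $min);
    if ($cut === false) {
        return $text;            // no later space: keep it all
    }
    return substr($text, 0, $cut) . '...';
}

// Usage inside the title fallback (variable name as in the snippets above):
// $title = title_from_text($search_data);
```

Unlike the one-liner, this looks for the space in the same stripped text that it truncates, so the cut always lands on a word boundary.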

;o)
(or

ps.. about the logo, don't sweat it, at least you didn't remove all the links like some sites I could mention. If you like, you could keep your logo, and put a miniature corzoogle logo somewhere else on your page, even elsewhere on your site, that would be cool, something like this (transparent; old browsers won't show the transparency correctly, so it's your call) ..
an image
or use this one with a white background...
an image

[edit] or else grab the latest beta, which now has a preference for this :D [/edit]


Aaron Cavanaugh - 23.05.05 10:40 pm

Great it works.

Thanks. God Bless.

Aaron.


Kry - 27.06.05 9:50 pm

Having problems searching .pdf files. Has your search engine been tested with this type of file?


corz - 28.06.05 12:29 am

sure has. probably you have encrypted PDF files, password-protected or some such. corzoogle can only search text; it makes no attempt to decrypt and decipher a PDF while it's working away.

if you add protections to your PDFs, you convert the text to gibberish, like in a zip file, and that's not searchable with corzoogle. with regular PDFs, it works great.

;o)
(or


Kry - 28.06.05 12:37 am

Excellent, thank you.


Sam - 16.07.05 6:44 am

Firstly, FANTASTIC, the cheque's in the mail.

Just started playing - Is there a way to specify a wildcard in $private - or more specifically a way to exclude files starting with a '_' as this is how I've done all my "includes" :(

I suppose I could rename them all *.inc so they're not picked up in $extensions.


corz - 20.07.05 2:35 pm

hey! really! I look forward to THAT!

in answer to your question Sam (better late than never!) YES there is a way, but not in the prefs, it would involve a slight tweak to the code. I might add some facility to a future version.

If you want to, you can mail me with your current corzoogle, and I'll tweak it for you. FOC! giving them all the same extension is a good idea, though!
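
The kind of tweak involved might look something like this. It is a hypothetical filter and not corzoogle's own code: is_include_file() is a made-up name, and where it would be called inside corzoogle depends on the version you are running.

```php
<?php
// Hypothetical filter, not corzoogle's own code: spot "include" files by name.
// Treats files starting with "_" or ending in ".inc" as files to skip.
function is_include_file(string $path): bool {
    $name = basename($path);
    if ($name === '') {
        return false;
    }
    return $name[0] === '_' || substr($name, -4) === '.inc';
}

// Usage, wherever files are being collected for searching:
// if (is_include_file($file_name)) { continue; }
```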

;o)
(or


Sam - 21.07.05 2:27 am

Decided to use the 'nosearch' way (yet another solution! such flexibility!) as I was doing this on a couple of other files anyway (eg. site map).

A couple more FAQ:

1) Does the inclusion of Google Adwords on a site mean it's no longer "not-for-profit"? (Thought someone should ask!)

2) I have Adwords and I think it's causing me to receive up to 4 email notifications per search - but seems to be random - no direct relation to number of units shown. 2a) Does that make sense? 2b) How can this be stopped?

Also, a nice to have - the page from which the search was performed. I have a little form embedded on every page and it's nice to know (easily) what page they were on when they searched. I've modified the notify_webmaster function (added $_SERVER['REQUEST_URI'] and $_SERVER['HTTP_REFERER'] to the mail - covering my bases) but will not be able to test for 12 hours or so (cannot upload from where I am).

About that cheque - I've misplaced my pen. But seriously - been tyre-kicking on one of my hobby sites (and I'm very happy) and will add to a client's site soon (I think I left my pen there too).


corz - 21.07.05 8:22 am

Yeah Sam, I'm BIG on preferences! the 'nosearch' option is very useful, and included in a couple of my templates, for certain kinds of files. good work!

Google Ad-Words, haven't thought about it before. erm. No.

I really meant that the site itself is "for-profit", in that it is geared towards making money. However, if Google ad-words make you zillions of pounds, a donation would be gratefully accepted! (when you find your pen!)

The email additions sound great (I've added a couple of things myself recently, now that I have proper mail for my domain) but you'll need to add a "@" before the referrer check, or it will occasionally fail.. @$_SERVER['HTTP_REFERER']

As to the multiple notifications, I've noticed this myself. It's more likely the googlebot following the "recent hits" links. I'll likely add a check for this, so bot hits, while allowed (makes for good spidering), won't be mailed to the webmaster.

I'll do a preference for it! There could be an internal list in the preferences, but as this could potentially get very large as more and more spiders come along, we could also use an external "bot list". I use one for my site logger (and other places) and as I'm thinking about this now, I think this might be the way to go for corzoogle, too.

We simply need to check the "user agent" (again with "@", as it is sometimes absent) and if it matches a bot from our list, bypass the email altogether. My current bot list (with a couple of superfluous entries) looks something like this..

ai_archiver
almaden.ibm.com/cs/crawler
ia_archiver
Ask Jeeves/Teoma
BecomeBot
ConveraCrawler
Exabot@exava.com
FAST Enterprise Crawler
Feedster Crawler
FeedValidator
findlinks/
gazz/
Girafabot
globalspec.com/Ocelli
googlebot
Jetbot
larbin_
mikeelliott@hotmail.com
msnbot
NG/2.0
nhnbot@naver.com
slurp
statbot@gmail.com
Syndic8
Yahoo-MMCrawler
Yahoo! Slurp
YahooFeedSeeker
ZyBorg

I'm telling you all this, because it sounds like you know some php and may want to implement something in the meantime.
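
As a starting point, the user agent test described above might be sketched like this. The names are illustrative rather than taken from corzoogle, the short list is just a sample of the entries above, and the `??` operator assumes PHP 7 or later (the "@" prefix serves the same purpose on older versions).

```php
<?php
// Sketch of the user-agent test described above; names here are illustrative.
// $bots is a list of substrings to look for, one per entry. An external list
// could be loaded with file('bots.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES),
// where the file name is an assumption.
function is_known_bot(string $user_agent, array $bots): bool {
    if ($user_agent === '') {
        return false; // header absent: treat as unknown rather than as a bot
    }
    foreach ($bots as $bot) {
        $bot = trim($bot);
        if ($bot !== '' && stripos($user_agent, $bot) !== false) {
            return true;
        }
    }
    return false;
}

$bots  = ['googlebot', 'msnbot', 'slurp', 'ia_archiver'];
$agent = $_SERVER['HTTP_USER_AGENT'] ?? ''; // sometimes absent, hence the ??
if (!is_known_bot($agent, $bots)) {
    // ...send the notification email here...
}
```

Matching on substrings keeps the list short, since one entry such as "googlebot" covers every version string that contains it.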

for now..

;o)
(or


Sam - 21.07.05 2:35 pm

Indeed - Mediapartners-Google/2.1 is the culprit. I've added the HTTP_USER_AGENT to the email as well - nice way to build on the list above..... now I think about it - don't all web browsers have "Mozilla" at the start of the UA? I'll do some more research and get back to you.


corz - 21.07.05 6:20 pm

I went ahead and added a routine for this today, it's up here at corz.org and works great, bypassing the email process if the user agent is on the internal (or external) list. I'll get this into the beta folder hopefully sometime tonight.. https://corz.org/beta

Not all browsers have "Mozilla" in them, but most of the popular ones certainly do. I've been watching user agents too, here's my current "extra" body lines..
$email_body = "There has been a new corzoogle search!\n";
$email_body .= "Their IP was: ".$_SERVER['REMOTE_ADDR']."\n";
$email_body .= "Their Browser was: ".@$_SERVER['HTTP_USER_AGENT']."\n";
which should let me know about any other spiders out there that slip through the list.
If you come across any good ones, let me know!

;o)
(or


Sam - 22.07.05 1:06 am

I've posted a thread about this in a much loved forum. You may want to comment or follow. There's also a link in there to a very nice spider trap that I have not yet implemented but may get around to soon.


Sam - 25.07.05 2:44 am

Getting closer to a result.

It seems that the most reliable way to test if it's a real human (or at least a real search) involves changing the form method from a GET to a POST and then testing the $_POST array.

Will this break anything? (I can and will of course test it but my testing regime may not be as rigorous as yours :D)


corz - 26.07.05 4:53 am

Nah, it shouldn't break anything. Remember you can also use $_REQUEST, to test either array.
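
A rough sketch of the POST test Sam describes, for anyone wanting to try it. The function name posted_query() is invented for illustration, and it assumes the search field is named "q"; it says nothing about how corzoogle itself reads the query.

```php
<?php
// Hypothetical helper: return the search term only when it arrived
// through a form POST; return null for GET requests (e.g. followed links).
function posted_query(array $server, array $post)
{
    $method = isset($server['REQUEST_METHOD']) ? $server['REQUEST_METHOD'] : '';
    if ($method !== 'POST' || !isset($post['q']) || !is_string($post['q'])) {
        return null;
    }
    return trim($post['q']);
}

$query = posted_query($_SERVER, $_POST);

// Only mail the webmaster for real, non-empty form submissions.
$send_mail = ($query !== null && $query !== '');
```

Note that $_REQUEST cannot tell GET from POST, so a test like this needs $_POST (or the request method) rather than $_REQUEST.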

My own spider-protection is working really well with the simple user agent test, certainly well enough for my needs. I've added a couple of new bots as they slip through the net, and haven't had a spider in my mail for quite a few days now.

;o)
(or


Aaron Cavanaugh - 01.09.05 8:06 am

Hi,

Would there be any way to force the info in a txt file to wrap?

So when you click on the topic the .txt file shows all in one line. Could I force the browser (or the results) to wrap the lines of text?
http://www.christianradio.me.uk/forum/corzoogle.php

Thanks. God Bless.

Aaron.


corz - 01.09.05 9:26 am

Well, that's nothing to do with corzoogle, that's the format of the files themselves, the files on your site that corzoogle is finding. Normally a web browser would always wrap plain text, but these look to be in a very strange format, full of weird characters and no linebreaks. I suspect your bulletin board software does this for its own purposes.

It's unusual that your bulletin board is all stored as wee text files, perhaps you want to think about mangling the results to point to the container. I had a wee look, but the long numbers of the text files don't seem to correspond to the numbers of the posts. Bummer.

To be honest, you'd be better off with some kind of spidering search engine, something that can crawl your site as the user sees it, via the web pages at ... /forum/forum.php?action=view&topic=whatever.

Perhaps your bulletin board software has been updated since you installed it, it may even now have a search facility built-in.

;o)
(or


Aaron Cavanaugh - 02.09.05 10:16 am

Hey Cor,

Thanks for taking a look.

God Bless.

Aaron.


Adrian - 08.09.05 6:02 am

Could you add a page that shows either an example of corzoogle we could try out or links to pages that have it in use?

Thanks


corz - 08.09.05 9:35 am

http://corzoogle.com

Which will lead you back to the search page here. You would be quicker clicking the search box in the toolbar, though. Scroll up to the top, you'll see it.

;o)
(or


Nathan - 12.09.05 10:13 am

Hi,
Corzoogle is a really good search engine. I really enjoyed reading through the comments hehe. There are some minor changes I've made, like adding the url underneath the link and other stuff. If you'd like to see what I've done with it (so far) go to http://www.themeofthebible.com/misc/search.php.


corz - 13.09.05 9:51 am

Looks good Nathan, but unless you plan to pay me to use corzoogle, you'll need to put back at least the logo and the download link, thanks.

There's a small version of the logo kicking around somewhere (above) which might be easier to fit into your scheme of things.

;o)
(or

ps.. if you enjoy reading comments, there's loads more inside corzoogle itself, en-joy!


Nathan - 14.09.05 7:49 am

&*$% you noticed lol. I've made the changes. I was wondering if having multiple pages for larger max results would be possible or would it eat the speed too much?


corz - 14.09.05 8:27 am

:lol:

I don't have a mechanism to split the results into pages, no, if someone is getting more than a dozen results, they probably need to narrow their search query, and I have considered adding some text along these lines, something like.. "you have reached the results limit, consider adding more search terms to narrow down your results"

You can set a maximum limit for results, but corzoogle is so good at getting the best results to the top of the list, any half-intelligent search query will usually get the page you wanted in the top three or so, so it's never really been an issue.

Multiple results pages aren't so much a speed issue as one of complexity. We'd have to store the results somewhere before splitting them for the user (like these comments are paged) which would mean creating a file somewhere, and I guess I like to keep corzoogle as portable as possible, drop-in-and-search.

If lots of folk asked me for this, of course I'd consider it for when I get around to corzoogle pro, circa early 2006.

;o)
(or

ps.. THIS is a corzoogle logo..   an image
being transparent, it will look better against your orange background, too.


Ryan - 09.11.05 5:43 pm

First and foremost, brilliant software. Very easy to understand and implement into any site. Awesome.

Quick question though. I was wondering if corzoogle has the ability to show the results in a specified area. Let's say I had the search box at the top of the home page and wanted to overwrite my home page content with the search results when the user inputs a word to search (the site I'm working with is a php based site). Many sites using other software for their search engine use this layout. Just wondering if I could get some help implementing this function! Thanks! And Great Work!



- Ryan.


corz - 10.11.05 10:02 am

Sure, just set $embedded = true; and then include corzoogle in your page. It will return the results in whatever space you put it inside. Lots of folk now run corzoogle this way.

You can put the search form - which is simply an html post form - absolutely anywhere and everywhere (the target being your embedded results page). Check out the 404 page here for an example, there's another example inside corzoogle itself, but you'll have to scroll for that.
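
As a minimal sketch of such a form, assuming the field is named "q" and that '/search.php' stands in for wherever your embedded results page lives (both the path and the helper name search_form() are placeholders, not anything corzoogle provides):

```php
<?php
// Hypothetical helper: build a bare-bones search form that posts the
// query to the page where corzoogle is embedded.
function search_form($results_page)
{
    // Escape the target so it is safe inside an HTML attribute.
    $action = htmlspecialchars($results_page, ENT_QUOTES);
    return '<form method="post" action="' . $action . '">'
         . '<input type="text" name="q" />'
         . '<input type="submit" value="search" />'
         . '</form>';
}

echo search_form('/search.php');   // '/search.php' is a placeholder path

// On the results page itself, per the reply above, something like:
//   $embedded = true;          // corzoogle's embedded preference
//   include 'corzoogle.php';   // results appear wherever this line sits
```

Exactly where $embedded needs to be set (in your page or in corzoogle's own preferences) is best checked against the comments inside corzoogle itself.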

have fun!

;o)
(or


Rikel - 16.11.05 2:34 pm

Hi Cor,

Nice to see a helpful FAQ page, and what seems like a nice clean search engine.

I wonder if it's possible for corzoogle to receive the search terms from a form, and run the search unseen. What I'm trying to do is have the user choose a couple of dropdown options, and then return just the results page (with logo, of course :D) after they submit.

This is a bit of a reach for my coding skills, so any help would be appreciated.

Thanks!
- Rik


Rikel - 18.11.05 1:05 pm

...Or could I just change the search text box to dropdowns in the corzoogle code?

That'd probably be a lot easier!

Is that possible?


- R


corz - 18.11.05 1:27 pm

erm - insert apologies for effin gmx mail server - so that's why it's been so quiet these last few days - damn, I wish I could still speak German! :roll: - yup. The form itself is self-contained (see functions at foot) so you could mess with that without fear of messing up corzoogle itself (you'll have a backup too, of course).

There are loads of ways to run things. Essentially, corzoogle searches and returns the results. You can do this in lots of ways, embedded, in an iframe, whatever you like. I saw one site using a drop-down to fire corzoogle, it was javascript, there was no need to even hit submit, it just shot off, but I'll be darned if I can find that URL.

Feel free to get back with more specifics, there will be a way.

;o)
(or


Rikel - 18.11.05 3:50 pm

Putting in the dropdowns themselves is no problem, it's making sure that corzoogle picks up everything from them is what concerns me.

I'd have to keep the name="q", wouldn't I? I see that's used later. How can I then be sure this captures the options from *both* dropdowns? :erm:

Thanks for spoon-feeding me on this.

- R


corz - 19.11.05 3:38 pm

Well, corzoogle will only use the "q" variable (you could easily change that), but *all* named elements will be available as variables in your target page. "capturing" and using them is up to you, or rather, the target script (container page).

For instance, if you had a form with two dropdowns. The first is the query term (named "q", value is whatever they choose), and the second is something else, say, a list called "other". When the user submits the form, "q" AND "other" will be available in the target page, their values being whatever was set by the user. corzoogle will use the "q" part, you can use the other variables however you like.
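
To make that concrete, here is a small sketch of a target page picking up both dropdowns and joining them into one search string. The helper combine_terms() is made up for illustration; how (or whether) you then hand the combined string to corzoogle is up to your container page and is not shown here.

```php
<?php
// Hypothetical helper: join the chosen values of several form fields
// into a single space-separated search string.
function combine_terms(array $input, array $field_names)
{
    $terms = array();
    foreach ($field_names as $name) {
        // Skip fields that are missing, empty, or not plain strings.
        if (isset($input[$name]) && is_string($input[$name]) && trim($input[$name]) !== '') {
            $terms[] = trim($input[$name]);
        }
    }
    return implode(' ', $terms);
}

// "q" and "other" are the two dropdowns from the example above.
$combined = combine_terms($_POST, array('q', 'other'));
```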

Grab this and use it as your target page for testing. It will show you exactly what variables are arriving at the page, and what their values are, along with lots of other potentially useful information.

If you need more help, paste in the actual HTML code you are using.

for now..

;o)
(or


Rikel - 21.11.05 2:58 pm

Right, so corzoogle wouldn't search by the 2nd dropdown term - unless I assigned it another variable and told corzoogle to look at that, too.

OK, let me think about that...

- R


sushma - 02.12.05 10:41 am

It only searches for file names with certain extensions. It would not search for a word which is present in those files. It would be better if we could search for a term in all files and return the files corresponding to that. What exactly I'm talking about is something like the Microsoft Indexing Service functionality in ASP.NET.
-chowdary


corz - 02.12.05 12:13 pm

sushma, I'm not sure I understand what you mean.

I think what you are saying is that corzoogle should search *all* files regardless of the extension, but that sounds like security insanity!

;o)
(or

ps.. by the way, it would take about three seconds' hacking to get corzoogle to do that! But why oh why?


Nathan - 16.12.05 4:18 am

Hi again,
I ended up getting multiple pages to work without using multiple files. There is one small error that I know of but it doesn't affect it too much. If you want to see the code changes just let me know. Here's the url again http://www.themeofthebible.com/misc/search.php


corz - 16.12.05 11:28 am

Heh, yeah, that's nice. You are running the search for *every* page, yeah? I'd considered this approach, but really I guess the idea of running the search all over again just to show the next "page" of results seems wasteful to me. This saves in other areas, of course.

But corzoogle is fast, and clearly it's fast enough to make this a feasible option, at least on small to medium sized sites. The results are cached in the server's ram, so this might actually be a good way forward. Hmm. Certainly I reckon it could be an option. Nice work!
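
The "run the search again, show a different slice" approach can be sketched in a few lines. This is not Nathan's code (which isn't shown here), just an illustration; results_page() and page_count() are invented names, and $results is assumed to be an already-sorted array of hits.

```php
<?php
// Hypothetical helper: return one page of an already-sorted results array.
function results_page(array $results, $page, $per_page = 10)
{
    $page = max(1, (int) $page);            // pages are counted from 1
    $offset = ($page - 1) * $per_page;
    return array_slice($results, $offset, $per_page);
}

// Hypothetical helper: how many pages the full result set needs.
function page_count(array $results, $per_page = 10)
{
    return (int) ceil(count($results) / $per_page);
}
```

Nothing is stored between requests, so it stays drop-in-and-search; the price is one full search per page view.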

And Yes! Of course I want to see your code! If anyone does *anything* to corzoogle, I want to see the code!

;o)
(or

ps.. what's the small error? perhaps I could look into it.


Nathan - 18.12.05 2:10 am

Did you want me to post the changes here or somewhere else?
There is something wrong with the code tags. At least on my computer.


corz - 24.12.05 2:04 am

Oh, mail that! It's noisy here. I'll take a look at your code tags, too. A second pair of eyes is always useful.

;o)
(or

[ edit - oops! what was I telling folk about posting comments on the dev mirror! :lol: ]


Mardi - 10.04.06 11:42 am

Hi!

I've just got started with corzoogle. I don't know if corzoogle works with Khmer Unicode?

Best regards,
Mardi


LPetri - 02.10.07 6:24 pm

Hi,
I am using Corzoogle and loving it! Do you know if it will work with PHP5? My server is about to upgrade.
Thanks!


corz - 02.10.07 8:08 pm

Cool! As to your question..

Good question! I moved to a php5 server last week, and before corzoogle would work, I had to..
do nothing at all. :lol: Yup, it works great.

for now..

;o)
(or

ps. new update coming soon, as well as a corzoogle XHTML overhaul (but first I'll need to do that to this area of the site, which for some reason, I'm leaving until last).


Simon - 29.11.07 11:16 pm

Can't download, it complains I don't have cookies enabled - but I do! This is in Firefox and IE.


corz - 03.12.07 10:18 am

Aye, that was a bug with the recent distro machine beta (always the latest version of everything running here at the .org, bugs and all!). It's now fixed.

It was still possible to download from the main distro machine menus in /engine (as many seem to have), just not the embedded menus, like on the corzoogle download page.

By the way, the beta is the recommended version (as running here), not just for the XHTML-goodness. It will likely become the main release, as soon as I get time to do that.

;o)
(or

ps. Apologies for not replying sooner - recovering from hardware failure.


Nicholas Pratt - 04.01.09 9:21 pm

Hello:

I installed Apache with these settings:
- Installed apache_2.2.11-win32-x86-no_ssl-r2.msi
- Server settings:
- Network Domain: localhost
- Server Name: localhost

When I go to http://localhost/ it says "It Works"

I installed corzoogle.php within my Administrator directory. It was initially set to be readable with Dreamweaver so I changed the program association to Notepad.

How do I access the Corzoogle search script on my local computer? When I go to C:\Documents and Settings\Administrator\corzoogle.php it brings up the script text but no search box. What search path or method should I use?

Thanks,

Nicholas


corz - 04.01.09 11:36 pm

Hi.

corzoogle is a php script, which means it needs to be processed by the php engine, rather than simply rendered by a web browser, like HTML is. As you have discovered, this would only get you the source code in plain text.

First, you need php running on the web server, if it isn't already; download the Windows installer from the php site, and install it. Restart your Apache server. Then simply navigate (with your web browser) to..

http://localhost/corzoogle.php

That assumes you dropped corzoogle in the root of your server, and it's still named corzoogle.php. If it's in a folder called, for example, admin, you would obviously instead go to..

http://localhost/admin/corzoogle.php

And so on. On a completely different subject..

Your Explorer file associations were probably altered when you installed Dreamweaver. I'm not familiar with the tool (because I don't like wysiwyg web coding) but I'd wager it's superior to Notepad, and you obviously use it anyway … Why not leave the association as it is?

If you do prefer to edit php files in a plain text editor - something I wholeheartedly recommend, by the way - Notepad isn't going to cut it. There are some excellent editors around. I recommend a few here.

At the very least, you want something that does Syntax Highlighting. Otherwise, you are very likely to make errors when coding php, setting your corzoogle preferences, etc.

Good luck!

If you have any issues installing corzoogle, feel free to get back here with more details.

for now..

;o)
(or

p.s. The "It Works" page is standard on all new Apache installs. Drop an index.php file (or index.html, or index.htm) in there, and you'll get that, instead.


Macy - 21.07.09 9:36 am

Hi. I'm using this at my site where it's working very well. Thank you! I want to use it at home but I'm not having a lot of luck setting it up. Is this even doable? I have Apache running on my computer and it works ok. It's Ubuntu. I want to search inside my home folder. thanks!

Best. Macy


corz - 21.07.09 7:41 pm

Hi Macy.

Yes, it's doable! So long as Apache is running with php, corzoogle will run, too. I have it doing exactly what you want on my Kubuntu (Jaunty) laptop, so these instructions should be fairly accurate..

First, create a virtual host for your home folder.

As you have complete control of the server (it's your machine!) it makes sense to give your home folder its own virtual host. In (K)ubuntu, you simply add a config file into /etc/apache2/sites-enabled/. Name the file anything you like, here's mine..
<VirtualHost *:80>

	ServerName tobi
	DirectoryIndex index.php index.html
	ServerAdmin admin@tobi
	DocumentRoot /home/cor

	<Directory /home/cor/>
		Options FollowSymLinks
		AllowOverride All
		Order deny,allow
		Deny from all
		Allow from 127.0.0.0/255.0.0.0 ::1/128
	</Directory>

	ErrorLog /home/cor/.apache2/error.log
	LogLevel warn
	CustomLog /home/cor/.apache2/access.log combined

</VirtualHost>


There's not much to it; it's a basic virtual host accessible only from the local machine. Rename corzoogle.php to index.php and drop it directly into your home folder. Ensure it is world-readable (chmod 644 ~/index.php), and then restart Apache.

My laptop's host name is "tobi", and as I don't use that for any other web server, typing tobi into my web browser takes me directly to my corzoogle "desktop search", just like that!

Have fun!

;o)


Smokey - 12.02.10 2:36 pm

First I've got to say: I love this script, it does *almost* everything I want for the site I'm creating.

What I've done so far with the script:

I've modified some of the search parameters so it also looks into exif data, so the script also finds images. (since the site is going to be used for photography, that's an important bit)

Edited the look of the results page using css.

Added a "back" button on the results page (since I wanted corzoogle to be on a page of its own, but not open in a new window) to return to the main site.

Translated the pages to Dutch (since most viewers of the site will be Dutch)

Now I'm trying to expand on the script a bit but am having a lot of trouble with it since I'm not a PHP kind of guy (basically I stick to using html).
I'd like to know if it is possible to allow thumbnails to be implemented into the result snippets so you get to see the image it found..

regards;
Smokey

Anything is possible. ;o)
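
One possible starting point, as a rough sketch only: a helper that returns an image tag when a result points at an image file. The name result_thumbnail() is invented, and where you would call it inside corzoogle's result output is left to you, as this doesn't touch corzoogle's actual code.

```php
<?php
// Hypothetical helper: return an <img> tag for image results,
// or an empty string for anything else.
function result_thumbnail($url, $width = 100)
{
    $image_types = array('jpg', 'jpeg', 'png', 'gif');
    $extension = strtolower(pathinfo($url, PATHINFO_EXTENSION));
    if (!in_array($extension, $image_types, true)) {
        return '';
    }
    // The browser scales the full image down; no thumbnail file is created.
    return '<img src="' . htmlspecialchars($url, ENT_QUOTES)
         . '" width="' . (int) $width . '" alt="" />';
}
```

This still downloads each full-size photo, so on a photography site real pre-generated thumbnails would be kinder to the bandwidth.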



John - 16.12.10 8:02 am

My website uses php includes and the 'get' method to include content into one main template page.
IE: pages are linked to as index.php?p=pagehere , and that page would be off in my pages directory called pagehere.php

I use this code to do that:
<?php
$page = $_GET['p'];
if ($page)
{
	include("pages/".$page.".php");
}
else
{
	include("pages/index.php");
}
?>

I see that your script has a mangle function which appears to do kinda what I want, but after a few hours of messing around I couldn't make it work.

Is there any way to make your search work with my site? I've used it before on other sites and it's amazing. It's just that when I changed my layout I forgot to take into account the fact that it might not work.

Thanks in advance
John

The mangle preference is only for mangling file extensions. You would be best served hacking something directly into the mangle code itself. It shouldn't be too difficult. ;o)
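
As a starting point for such a hack, here is a sketch of the path-to-link conversion John's layout needs. It is not corzoogle's mangle code; template_url() is an invented name, and it assumes results arrive as relative paths like "pages/pagehere.php".

```php
<?php
// Hypothetical helper: turn "pages/pagehere.php" into "index.php?p=pagehere".
function template_url($file_path)
{
    // Only simple page names directly inside pages/ are rewritten.
    if (preg_match('#^(?:\./)?pages/([A-Za-z0-9_-]+)\.php$#', $file_path, $match)) {
        return 'index.php?p=' . $match[1];
    }
    return $file_path;   // leave anything else untouched
}
```

Restricting the page name to letters, digits, "_" and "-" is a deliberate choice here; the same restriction would be worth applying to $_GET['p'] before the include in the template page.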



