corzblog bbcode parser preview



Here it is! My [search engine fodder] bbcode to html parser, and html to bbcode parser [/search engine fodder]!

This is the actual, very onsite parser that parses the bbcode of my blogs and site comments, which, as well as its usual tasks of, well, you know, the parsing stuff, also moonlights doing a cute wee background demo of itself; you're looking at it. It knew you wanted to do that. Hit the "preview" button to see at least one half of the parser's bbcode to html/html to bbcode functionality.

The front-end (below) is built into the parser; you just call the function and it creates the form. The cool, super-portable JavaScript bbcode buttons and functions come in the package, too. Have fun. Oh, and by the way, output is 100% pure HTML5, or nice plain bbcode, and whichever way you look at it, it's free.

cbparser quick bbcode guide..
Most common bbtags are supported, and with cbparser's InfiniTags™ you can pretty much just make up tags as you go along. If cbparser can construct valid html tags out of them, it will. Experimentation is the key, and preview often.

A few bbcode examples..
[b]bold[/b], [i]italic[/i], [big]big[/big], [sm]small[/sm], [img]http://foo.com/image.png[/img], [code]code[/code], [code]teletype[/code], [url="http://foo.com" title="foo!"]foo U![/url], and more.. To post code with indentation and/or strange characters, .htaccess, etc., use [pre][/pre] tags.
download cbparser
an HTML5 compliant bbcode parser



Welcome to the comments facility!



cor - 10.01.06 10:12 am

Thanks!

I originally had
<?php
htmlentities($text, ENT_QUOTES, 'UTF-8');
?>

but sadly my development server can't handle multibyte stuff very well (though it should!), so I had to switch that off (the line has since been put back in but is commented out, with a note).
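
For anyone following along, here is a minimal sketch of what that call does to user input. The wrapper name encode_user_text is made up for illustration and is not part of cbparser; only htmlentities() and its arguments come from the snippet above.

```php
<?php
// Hypothetical helper (not cbparser code): encode user text before any bbcode handling.
function encode_user_text(string $text): string
{
    // ENT_QUOTES converts both double and single quotes;
    // 'UTF-8' tells PHP how to interpret the input bytes.
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

// Angle brackets and quotes come out as entities, so raw HTML cannot survive.
echo encode_user_text('<b onclick="x">caf\'e</b>'), "\n";
// prints: &lt;b onclick=&quot;x&quot;&gt;caf&#039;e&lt;/b&gt;
```

Multibyte characters are only handled correctly when the server's PHP build copes with the declared charset, which is the problem described above.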

I don't want to run the xssclean after parsing because I use javascript in some of the tags, so it must work at the bbcode end of things. And if you want something really nasty for IE, try this..
[table datasrc="."][/t]
I've added that to the xss clean-up, but your version will still be exploitable. try it just for fun.

I wasn't aware that you could throw javascript statements into image tags. That's fecking nuts! I presume this is IE only, is it? :roll: I guess I could add something for that.

replaced with str_replace, probably (in the xss-prevention code?). The thing with the regex engine is, once you've got it up and running, it's pretty much neck and neck with a regular str_replace. The secret is to avoid it altogether, if possible, which it isn't here.

Feel free to keep tweaking away, blah, that's what it's all about, and I'm sure new exploits will keep appearing all the time; annoying as it is, you can always drop them here, anyone. If you manage to replace any of the preg_replace statements with str_replace equivalents, mail me your changes!
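
To illustrate the kind of swap meant here, a rough sketch follows. Both function names and both patterns are assumptions made for the example, not cbparser's real rules: a tag with no variable parts can go through str_replace, while a tag that carries a value still needs a pattern.

```php
<?php
// Illustrative only: these rules are assumptions, not cbparser's actual code.

// A fixed string with no variable parts can be swapped with str_replace...
function convert_bold_fixed(string $text): string
{
    return str_replace(['[b]', '[/b]'], ['<strong>', '</strong>'], $text);
}

// ...whereas a tag carrying a value needs a pattern, so preg_replace stays.
// Note: this sketch does no validation of the URL, so it is not XSS-safe as-is.
function convert_url_pattern(string $text): string
{
    return preg_replace(
        '/\[url="([^"\]]+)"\](.*?)\[\/url\]/s',
        '<a href="$1">$2</a>',
        $text
    );
}

echo convert_bold_fixed('[b]bold[/b]'), "\n";
echo convert_url_pattern('[url="http://foo.com"]foo[/url]'), "\n";
```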

I got the entities dropdown working properly yesterday, and put up a couple of updates as I went along. I've now tied the internal version number into the download link (which is generated), the idea being, as soon as a new version goes into place here, I'll need to up the same version for the download link to keep working. Of course, I may forget :D

I also updated the bbtags page to reflect the new version. Aside from more tags, there are a few other changes. I'll note some here, making notes for a proper devblog entry when this becomes the main cbparser release..

There's no more "strictly bbcode" option, in that it's bbcode or nothing. Angle brackets are encoded to html entities, so entering raw HTML tags is no longer an option. But of course, with InfiniTags™, you can enter any html as bbcode, so really, there's no need for it.

Likewise, the html >> bbcode conversion is always enabled. cbparser will attempt to translate any tags it doesn't recognise into bbcode InfiniTags™, just like it does with known bbcode markup.

Someone may have noticed that cbparser's built-in gui is also equipped with the most effective anti-CSRF attack measure available, though in truth, I didn't put that feature (trackable hidden token) in there for that, but for my own devious uses (tracking comment entries, in fact, ie.. edit your comment, or whatever). But there you are, an added bonus!
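
A hidden-token check of this general kind could be sketched as below. Everything here is hypothetical (the function names, the form_token field name, and the idea of passing the store in as an array); it shows the technique, not cbparser's built-in implementation. In a real page the store would typically be $_SESSION.

```php
<?php
// Hypothetical sketch of a hidden-token check; all names are made up for illustration.

// Create a random token and remember it server-side (here: in a passed-in store array).
function issue_token(array &$store): string
{
    $token = bin2hex(random_bytes(16)); // 32 hex characters
    $store['form_token'] = $token;
    return $token;
}

// Render the hidden field that goes inside the comment form.
function token_field(string $token): string
{
    return '<input type="hidden" name="form_token" value="'
        . htmlspecialchars($token, ENT_QUOTES, 'UTF-8') . '">';
}

// Accept the post only if the submitted token matches the stored one.
function token_is_valid(array $store, ?string $submitted): bool
{
    return isset($store['form_token'])
        && is_string($submitted)
        && hash_equals($store['form_token'], $submitted); // timing-safe comparison
}
```

Because the token is tied to one form for one visitor, the same value can double as a handle for tracking which comment a later edit belongs to.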

I'll do more notes later.

;o)
(or

ps.. fixing the image tag is just adding a "?" after the = of the javascript catcher. now it catches all sorts.
pps. try a newer version. :ken:


cor - 10.01.06 3:46 pm

Just to keep things balanced :D I've come across a simple set of tags that will crash Firefox (1.0.7 and below)..

<sourcetext></sourcetext>

ouch!

<parsererror></parsererror>

has the same effect, apparently, though I haven't tested it.
I've added these to the most recent xss-prevention code, of course, along with a few other nuggets I came across on my travels.

Quite fun, this browser crashing stuff.

;o)
(or

ps.. DO NOT put those tags (or the earlier IE table tag) into an html document and load in your browser if you have unsaved form elements, or any other data you value, because your browser will crash!


blah - 11.01.06 2:36 am

I generally try out all the exploits listed here prior to using any code on my site. You may find me irritating, but that's the least you can do when you have over 1 lac people on your site and you can unknowingly piss off quite a few of them :P


cor - 11.01.06 2:43 am

Yes, I know that page. A nice reference.

And no, I don't find you irritating. Though I do find the crazy security holes in common browsers *very* irritating; I've got better things to do with my unpaid time than mop up after multi-zillion dollar software companies!

Keep the exploits coming! It's all good!
Have you managed to exploit rc5, yet?

;o)
(or

ps.. phpsuexec upgrade in progress, expect many onsite errors, but not with the parser, it rocks!


cor - 15.01.06 5:17 pm

I notice that the built-in demo is a bit messed-up when it's not living in my blog folder. I'll have a look at that in the coming week.

;o)
(or


möööp - 06.02.06 5:48 am

Thx for your work.
It looks interesting, but unfortunately it's just another regex and string-replacing orgy, but not a parser.


cor - 06.02.06 8:39 pm

Semantics!


möööp, you must be working with some obscure definition of the word "parser".

But you're right about one thing; it does look interesting, very interesting indeed. And in a visual medium like the web, what else matters?

The methods employed are, considering all things, the most appropriate for the job, and it gets the job done superbly on hundreds, possibly thousands of sites, so there's nothing unfortunate about it!

;o)
(or

ps.. don't I know you from an Aesop's Fable?


Markus - 14.02.06 4:29 pm

I am looking for some support, actually. Can I get a link to some FAQs or a support forum?


cor - 14.02.06 9:26 pm

Markus, you are on it! Fire away!

;o)
(or

ps.. I introduced a minor bug in 1.0.3b, where double square brackets [ ] weren't passing through the new tag balance checks. Fixed in 1.0.4b.


duck monster - 05.03.06 8:16 am

Radium, one of the techs behind SA has a really neato article on bbcode type parsers at ;-

http://www.teambarry.com/


Basically, parser does have a specific meaning in IT, which has to do with transforming one set of symbols to another (ie C++ -> assembly, or bbcode -> html), and there's a reasonable corpus of theory around it.

However on a common sense type level, sure you've written a parser of sorts.

One advantage of proper parser design is that it automatically deals with unclosed tags and orphaned exclamations, etc., things like [ etc.


cor - 06.03.06 5:49 am

Not a bad wee read, duck monster, but seriously, anyone who says "php is a joke" should provide salt with their text! :lol: Sure, php lacks some of the finesse of certain other languages, but zillions of sites really aren't wrong; it is an incredibly useful language for web development.

Anyway, cbparser does, in fact, deal with unclosed and orphaned tags. Later versions will correct certain tag imbalance errors, insert missing tags, give appropriate warnings, and it always stopped the user if the tags didn't balance. In practice, this proves to be a fine way to deal with orphaned [ characters and anything else, encouraging users to understand and get their code right in the first place! That's not such a bad thing.
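
As a rough idea of what a tag balance check can look like, here is a small sketch. The function name and the regex are assumptions, not cbparser's actual check, and it only counts opens against closes; it does not verify nesting order or insert missing tags.

```php
<?php
// Rough sketch, not cbparser's actual check: report bbcode tags whose
// opening and closing counts differ.
function unbalanced_tags(string $text): array
{
    $counts = [];
    // Match [tag], [tag=...], [tag attr="..."] and [/tag];
    // group 1 is the optional slash, group 2 the tag name.
    preg_match_all('/\[(\/?)([a-z][a-z0-9]*)\b[^\]]*\]/i', $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        $counts[$name] = ($counts[$name] ?? 0) + ($m[1] === '/' ? -1 : 1);
    }
    // Keep only the tags that do not net out to zero:
    // positive means unclosed, negative means an orphaned closing tag.
    return array_filter($counts, function ($n) {
        return $n !== 0;
    });
}
```

A form handler could refuse to save, or show a warning, whenever this returns a non-empty array. A lone [ with no tag name after it is simply ignored by the pattern.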

There are thousands of comments here onsite, so I know that even the most technically inept can comprehend and operate the bbcode facilities without trouble, and produce great looking, valid xhtml. That's good enough for me, and most other humans.

I have no aspirations to write "the perfect bbcode parser", even if I could, though I have considered ways to implement a php stream-state parser (as I see it), but none very fruitfully. I'm in no hurry; I'm having quite enough fun with the one I've got.

Truth is, there's nothing out there quite like cbparser. Do you know of another bbcode parser that will, for instance, mash your email addresses so spam-bots don't chew on them? Or provide a built-in GUI, or spam-protection? Or convert arbitrary legacy html code? It's a wee bit more than a simple bbcode->html converter.
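
One common way to mash an address is to write every character as a numeric HTML entity, so browsers render it normally while naive harvesters see no plain "@". The sketch below shows that approach only; the function name is made up, and cbparser's own method may well differ.

```php
<?php
// One common obfuscation approach (an assumption; cbparser's own method may differ):
// write every character of the address as a numeric HTML entity.
function mash_email(string $address): string
{
    if ($address === '') {
        return '';
    }
    $out = '';
    // Assumes a plain ASCII address, so byte-wise iteration is safe.
    foreach (str_split($address) as $char) {
        $out .= '&#' . ord($char) . ';';
    }
    return $out;
}

echo mash_email('a@b.c'), "\n";
// prints: &#97;&#64;&#98;&#46;&#99;
```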

At the end of the day, I wrote this for me. True, thousands of others have grabbed it, but essentially it's part of corzblog, and will always be evolving to my own particular requirements. On request, I made it available for free, and maintain it in this modular state, no small amount of work, but that doesn't mean I'm looking to collaborate on it, in any shape or form!

Thanks for the input, though.

;o)
(or

ps.. it does prove one thing, however; a bbcode parser is important. :ken:


