how to feed rss

creating rss newsfeeds easily with php

begin feeding..

This document explains how to create a simple rss feed for your website, and then presents some fairly generic ideas and methods for generating automated rss news feeds using php. I'll be shaving off a lot of corners to make this as hands-on and easy as possible. if you want deep technical information about xml namespaces and what-not, go elsewhere.

what is rss?

The easy answer is that rss stands for "Really Simple Syndication", and I'm happy enough with that. technically, it's an acronym for "RDF Site Summary", but mentioning RDF and RSS in the same paragraph is a sure-fire recipe for confusion. oops! anyway, an rss feed is no more than a simple xml document, a plain text file that a news aggregator can parse into headlines.

By giving the URL of a news feed to a news aggregator, you can grab a site's "headlines", which is a broad term describing anything from their latest forum posts to latest hot products, or Mac Tips, depending on who owns the feed. you can use your news aggregator to subscribe to any number of feeds and be kept perpetually up-to-date with all the things which matter to you. a decent news aggregator could quite literally change your online life.

Creating an rss feed can be as easy as placing a few simple tags in a text file and dropping it in your site, or very much more complex and generated dynamically from a whole range of data sources, all depending on the quantity and quality of data you want to provide.

Here is a fictional, though complete and quite valid, rss newsfeed..

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <title>corz rss</title>
    <link>https://corz.org/news/</link>
    <description>corz.org latest news</description>
    <language>en-gb</language>
    <image>
      <title>corzblog</title>
      <url>https://corz.org/blog/inc/img/corzblog.png</url>
      <link>https://corz.org/</link>
    </image>
    <item>
      <title>big story</title>
      <description>wow! would you believe it! it's incredible!...</description>
      <link>https://corz.org/news/index.php#big%20story</link>
    </item>
    <item>
      <title>yesterday's big story</title>
      <description>I have a feeling about tomorrow, almost as if... </description>
      <link>https://corz.org/news/index.php#yesterday%27s%20big%20story</link>
    </item>
  </channel>
</rss>

Nothing mysterious there. let's quickly run through it..

Just like html, the document starts with a "doctype statement" (straight after the xml declaration on the first line), which tells the client software (news aggregator, usually) just what to expect, and if necessary, what reference dtd (document type definition) to refer to when attempting to parse it. different versions of rss use different definitions, the oldest and most common being "rss-0.91", which was the first widely-accepted rss standard on the net. rss has had a few upgrades since then, but it doesn't matter which version you use, so long as you tell the rss reader what to expect.

rss is a form of xml, or rather, it uses a distinct xml vocabulary to present the data. xml is cool, it enables us to create containers for data, and give those containers names, and ways to hold other containers, and an almost unlimited scope for describing the data inside. as far as we're concerned, this clever xml structure will enable us to create a "channel", and inside that channel, "headlines".

Before we begin with the headlines, we give the aggregator some information about the feed itself. we start by opening an <rss> container. inside that is our main channel "element", and inside our channel, is everything. the example above contains an absolute minimum set of channel elements (or technically, sub-elements). It's all fairly obvious, note; the titles can be up to 100 characters; the links, 500; description must be plain text (NO HTML!); language code has to be a standard valid one; and image no bigger than 144x400.

Finally, we get to the main feature; our channel's headlines, or <item>s. for each <item> we need to supply only a title (same rules as above), description (usually a small preview snippet from the top of the article, or synopsis, up to 500 characters) and a valid link. we finish up by closing all the containers and we're done. If you copy and paste the above example into a text file and drop it into your site as "feed.rss"; tada! you have a valid news feed! okay, it has nothing to do with you or your site, is fictional and doesn't update..

not really "live" is it?

If you only update your site once every few days or weeks, then it's perfectly feasible to just edit your rss file by hand, add a new <item> (up to a maximum of 15 for a 0.91 feed) and save it again. it's tedious, but would serve the purpose, I guess. You'd probably want to re-validate it after editing, using one of the net's many rss validators. But in the end you'd probably not bother with any of it, and why should you? the whole point of rss syndication is that it can be automated, leaving you to concentrate on the content part of "content syndication".
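
If you do go the hand-edited route, the 0.91 limits mentioned above (100 characters for a title, 500 for a link or description, 15 items per feed) are easy enough to check with a few lines of php. here's a rough sketch; the function names and the array layout are made up for illustration, and it counts bytes rather than characters, so treat it as a sanity check, not a validator..

```php
<?php
// hypothetical helper: returns a list of problems found in one <item>.
// the limits are the rss 0.91 ones quoted in this article.
function check_item_limits(array $item) {
    $limits = array('title' => 100, 'description' => 500, 'link' => 500);
    $problems = array();
    foreach ($limits as $field => $max) {
        if (!isset($item[$field]) || $item[$field] === '') {
            $problems[] = "$field is missing";
        } elseif (strlen($item[$field]) > $max) {
            // strlen() counts bytes, so this errs on the strict side
            // for multi-byte text
            $problems[] = "$field is longer than $max characters";
        }
    }
    return $problems;
}

// hypothetical helper: a 0.91 feed should carry no more than 15 items.
function too_many_items(array $items, $max = 15) {
    return count($items) > $max;
}
```

An empty array back from check_item_limits() means the item is within the limits.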

generating newsfeeds

Although a lot of the channel data is fairly static; site name, main link, description, etc; the main juice of the feed really needs to be generated dynamically, that is, inserted into the page at "run-time", which is when the browser/aggregator requests the page. This is where php comes in. essentially, all we need to do is have our script gather up the headlines from whatever source; flat-files/database/etc; and then create <item>s with it.
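
To make that "create <item>s" step a bit more concrete, here's a minimal sketch of a function that turns one headline into an <item> element. the name make_item() is an assumption for illustration, not part of corzblog; all it does is escape the three values and wrap them in tags..

```php
<?php
// hypothetical helper: build one <item> element from plain-text values.
function make_item($title, $description, $link) {
    // htmlspecialchars() turns &, < and > into entities so that they
    // can't break the xml. ENT_QUOTES covers both kinds of quote, too.
    $t = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    $d = htmlspecialchars($description, ENT_QUOTES, 'UTF-8');
    $l = htmlspecialchars($link, ENT_QUOTES, 'UTF-8');
    return "<item>\n"
         . "<title>$t</title>\n"
         . "<description>$d</description>\n"
         . "<link>$l</link>\n"
         . "</item>\n";
}
```

Wherever the headlines come from, the script would then just echo make_item(...) once for each of them.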

It makes sense to use php variables for the static stuff, too. you could keep your details in one central "config.php" file, and pull in whatever you need at runtime. our fictitious config.php could be called by other onsite scripts, too, so more prefs could be added to that if need be. or you may already have a site-wide config.php of some sort, so just add a few variables to it, and include it from your rss script. here's one that would work fine..


<?php
// config.php for rss feed..

// prevent direct access..
if (realpath($_SERVER['SCRIPT_FILENAME']) == realpath(__FILE__)) {
    die('to err is human, human!');
}

// prefs..

$mysite = 'corzblog';
$mydomain = $_SERVER['HTTP_HOST'];
// this blog is busier right now..
$blogurl = "http://$mydomain/devblog/";
$description = 'the blog of the dev';
$emailaddress = "webmaster@$mydomain";
$filename = $_SERVER['DOCUMENT_ROOT'].'/devblog/blogz.blog';
?>

I wasn't kidding about uploading a static feed to your site. if you start with one valid xml feed, and go line-by-line converting it to php output, you almost can't go wrong. we're working to create a standards-compliant xml document, so it makes a lot of sense to start with exactly that! use it as a template. here's how the code might start to look..


<?php
header('content-type: application/rss+xml');
require('config.php');

echo '<?xml version="1.0" encoding="UTF-8"?>',"\n";
echo '<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"';
echo ' "http://my.netscape.com/publish/formats/rss-0.91.dtd">',"\n";
echo '<rss version="0.91">',"\n\n";
echo '<channel>',"\n";
echo '<title>',$mysite,'</title>',"\n";
echo '<link>',$blogurl,'</link>',"\n";
echo '<description>',$description,'</description>',"\n";
echo '<language>en-gb</language>',"\n\n";
echo '<copyright>copyright 2004 - ',$mydomain,'</copyright>',"\n";
echo '<managingEditor>',$emailaddress,'</managingEditor>',"\n";
echo '<webMaster>',$emailaddress,'</webMaster>',"\n\n";
echo '<image>',"\n";
echo '<title>',$mysite,'</title>',"\n";
echo '<url>',$blogurl,'inc/img/corzblog.png</url>',"\n";
echo '<link>',$blogurl,'</link>',"\n";
echo '</image>',"\n\n";
?>

getting picked up

The first line is the most important; the http header..

do it right the first time..

header("content-type: application/rss+xml");

Without this, the feed would be interpreted as a plain html document instead of an xml document, and most decent aggregators won't accept that these days. the only (dumber) alternative to sending a content-type header is to give the feed a .rss (or .xml) extension and have your web server parse those files as php. this is easy enough to do, just add one line to your .htaccess file..

have .rss files piped through the php machine..

AddType application/x-httpd-php .rss .xml

If you run under phpsuexec, you can do something like this..

have .rss and .xml files piped through the php machine under phpsuexec..

<FilesMatch "\.(rss|xml)$">
SetHandler application/x-httpd-php
</FilesMatch>

and because the extension is .rss, aggregators will generally assume it's an xml document, so you can usually get away without the header. but I do recommend taking control of this from within your script; there's nothing at all wrong with having an rss feed with a .php extension, so long as you send the correct headers.
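
One way to take control from within the script is to wrap the header in a tiny function, so that you can switch the mime type in one place (to text/xml, say) if a particular client chokes on it. this is only a sketch; feed_header() is a made-up name, and adding the charset parameter is my assumption, chosen to match the UTF-8 encoding declared in the xml..

```php
<?php
// hypothetical helper: build the content-type header line for the feed.
function feed_header($mime = 'application/rss+xml', $charset = 'UTF-8') {
    return "Content-Type: $mime; charset=$charset";
}

// only send it if nothing has been output yet; once output has
// started, php can no longer change the headers.
if (!headers_sent()) {
    header(feed_header());
}
```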

Next we grab the configuration data from the prefs file (config.php), and use it to populate the various static elements. apart from a few added extras, namely "webmaster", "managing editor" and "copyright", the above code will produce output identical to the example at the top of this article. there are other "optional" elements that can be added, too, and I'll probably go into some of those later on. so far so good.

getting in the loop

Next comes the all-important headlines, or rather; <item> elements. this is where the fun really begins.

If you haven't already, it's time to ask yourself what you want your newsfeed to do. keep your blog readers updated of the latest rants? inform customers about your hot products? share your front-page news headlines? what? whatever it is, we are going to need our php script to grab the relevant content and produce <item> elements with it.

This "content" could be from anywhere; a database, a flat-file, another site, even another newsfeed, whatever. for the purposes of this article I'm going to use corzblog as an example, and we'll be creating an rss feed of the latest blogs. corzblog has had an rss feed almost since day one, using very similar code to that given here. it also produces an automated "rss 1.0" feed. download a copy if you want more details/code examples. both feeds can very easily be adapted for other purposes.

grabbing those headlines

My blogs are plain html structures, here's an example..

<div class="blog-entry">
  <div id="AnotherBlogEntry">
    <!--*g*-->
    <h3>Another Blog Entry</h3>
  <!--*g*-->this is another blog entry, more html.<!--*g*-->
</div>
<div class="byline">
  posted by cor @ Wed 22nd February 7:37 pm
</div>
<hr class="cb-hr" /><br />
</div>
<!--*end*-->

<div class="blog-entry">
  <div id="ABlogEntry">
    <!--*g*-->
    <h3>A Blog Entry</h3>
  <!--*g*-->this is a blog entry. simple html<!--*g*-->
</div>
<div class="byline">
  posted by cor @ Wed 22nd February 7:36 pm
</div>
<hr class="cb-hr" /><br />
</div>
<!--*end*-->

the document is just straightforward html, split into "articles" by an <!--*end*--> tag (a simple html comment).

As it stands, it's not much use for our feed. to utilise this data, we'll first need to split it up into blog entries, or "articles". so we'll use php to create an array, each item of the array containing a different blog entry. here's some code to do that..


<?php
// open and split the blog into an array of blog entries..
// (an unreadable file just gives us an empty feed, rather than an error)
$file_contents = is_readable($filename) ? file_get_contents($filename) : '';
$whole_blog = explode('<!--*end*-->', $file_contents);
?>

As you can see in the example, each blog entry is further divided by more fictitious tags, in this case a <!--*g*--> tag, which is my little joke. for a valid feed, we need to supply "title", "description" and "link" elements for each <item> of our feed, so we'll have to further split these html structures to pull out those individual parts. fortunately, each blog entry has all the bits we need..

between the start and 1 will be used for the link
between 1 and 2 lives the title.
between 2 and 3 is the blog itself, etc..

so we'll simply loop through the articles one-by-one, split them up by their <!--*g*--> tags, spit out the correct parts with some xml formatting and we're done. if you don't have convenient structures like these in your data, you may have to use more devious methods to split it up, like regex, or whatever.
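
To see what that split actually gives us, here's a small sketch using a cut-down copy of the example entry. the entry_parts() function and its array keys are names I've made up for illustration; it also shows one of those "more devious" options, a regex that pulls the id out of the first chunk..

```php
<?php
// hypothetical helper: split one blog entry on its <!--*g*--> tags.
function entry_parts($entry) {
    $parts = explode('<!--*g*-->', $entry);
    if (count($parts) < 3) {
        return null; // not a complete entry
    }
    // the chunk before the first tag holds the wrapper divs and the id
    $id = '';
    if (preg_match('/<div id="([^"]*)"/', $parts[0], $m)) {
        $id = $m[1];
    }
    return array(
        'id'    => $id,
        'title' => trim(strip_tags($parts[1])),
        'body'  => trim(strip_tags($parts[2])),
    );
}

// a cut-down version of the example entry from this article
$entry = '<div class="blog-entry"><div id="ABlogEntry"><!--*g*-->'
       . '<h3>A Blog Entry</h3><!--*g*-->'
       . 'this is a blog entry. simple html<!--*g*--></div>';

$parts = entry_parts($entry);
```

After that, $parts holds the id, the title and the body as three plain strings, ready to be wrapped in xml.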

the whole thing..

The comments in the code are probably more useful than anything else I could say, so here it is. I added some tabs ("\t") to make the outputted xml more readable, though this isn't strictly necessary.


<?php

// log the hit.. (you're gonna log them, surely!)
@include $_SERVER['DOCUMENT_ROOT'].'/inc/init.php';

// send correct header..
header('content-type: application/rss+xml'); 
require('config.php');

echo '<?xml version="1.0" encoding="UTF-8"?>',"\n";
echo '<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"';
echo ' "http://my.netscape.com/publish/formats/rss-0.91.dtd">'."\n";
echo '<rss version="0.91">',"\n\n";
echo '<channel>',"\n";
echo '<title>',$mysite,'</title>',"\n";
echo '<link>',$blogurl,'</link>',"\n";
echo '<description>',$description,'</description>',"\n";
echo '<language>en-gb</language>',"\n\n";
echo '<copyright>copyright 2004 - ',$mydomain,'</copyright>',"\n";
echo '<managingEditor>',$emailaddress,'</managingEditor>',"\n";
echo '<webMaster>',$emailaddress,'</webMaster>',"\n\n";
echo '<image>',"\n";
echo '<title>',$mysite,'</title>',"\n";
echo '<url>',$blogurl,'inc/img/corzblog.png</url>',"\n";
echo '<link>',$blogurl,'</link>',"\n";
echo '</image>',"\n\n";

// read and split the blog into an array of blog entries..
$file_contents = is_readable($filename) ? file_get_contents($filename) : '';
$whole_blog = explode('<!--*end*-->', $file_contents);

// whatever follows the final <!--*end*--> tag isn't an article, so drop it..
array_pop($whole_blog);

// start the loop..
foreach ($whole_blog as $blog) {

    // split the article into its parts..
    $parts = explode('<!--*g*-->', $blog);
    if (count($parts) < 3) {
        continue; // not a complete blog entry, skip it
    }
    $raw_title = trim(strip_tags($parts[1]));
    $blog_title = htmlspecialchars($raw_title); // the title, made xml-safe

    // we need a valid "encoded" version of the link..
    $enc_title = rawurlencode($raw_title);

    // grab the first two paragraphs for the news preview.
    // it may go over 500 chrs, och well, we could check for that.
    $the_blog = array_slice(explode('<br />', $parts[2]), 0, 3);
    $snippet = htmlspecialchars(strip_tags(implode('', $the_blog)));

    // now create the <item> element..
    echo "\t",'<item>',"\n";
    echo "\t\t",'<title>',$blog_title,'</title>',"\n";
    echo "\t\t",'<description>',$snippet,' ...</description>',"\n";
    echo "\t\t",'<link>',$blogurl,'index.php#',$enc_title,'</link>',"\n";
    echo "\t",'</item>',"\n\n";
} // and loop until we reach the end of the blog

echo '</channel>',"\n";
echo '</rss>';
?>
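
The code above shrugs at the 500 character limit. if you'd rather enforce it, something like the following sketch would do; trim_snippet() is a hypothetical helper, not corzblog code. it cuts the plain text before encoding it, so an entity like &amp;amp; never gets chopped in half, and it uses the multi-byte string functions when they're available. bear in mind that the encoded result can still come out a little longer than the limit, so you may want to pass a smaller number..

```php
<?php
// hypothetical helper: cut plain text down to $max characters, on a word
// boundary where possible, then encode it for use inside <description>.
function trim_snippet($text, $max = 500) {
    $text = trim(strip_tags($text));
    // use the multi-byte functions when the mbstring extension is loaded,
    // so that a UTF-8 character is never cut in half.
    $len = function_exists('mb_strlen') ? mb_strlen($text, 'UTF-8') : strlen($text);
    if ($len > $max) {
        $text = function_exists('mb_substr')
            ? mb_substr($text, 0, $max, 'UTF-8')
            : substr($text, 0, $max);
        // back up to the last space so we don't stop mid-word
        $space = strrpos($text, ' ');
        if ($space !== false) {
            $text = substr($text, 0, $space);
        }
    }
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

In the loop, the $snippet line could then become something like $snippet = trim_snippet(implode('', $the_blog), 480); leaving room for the trailing dots.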

It's not exactly rocket science, and does everything we need. as soon as a new blog entry is created it will be immediately available to the next aggregator that loads our feed. and that, of course, will be your news aggregator while you test it! it probably goes without saying that you should be loading the resultant page in a news aggregator at every step, certainly the quickest and smartest way to test if your feed is working.

validate that feed

Presuming your rss feed is functioning, the next stage is to validate it using one of the net's many rss validators. three good ones are feedvalidator.org, redland and rss scripting. google around, there are quite a few now, and even software to allow you to validate your own at home, though unless you are creating a lot of feeds, this is clearly overkill. If your feed doesn't validate right away, the validator will usually tell you why, and with a wee bit of trial-and-error, your feed will validate just fine.
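
Before you head off to an online validator, a quick local check costs nothing and catches the silliest mistakes (an unescaped &, a missing closing tag). this sketch uses php's simplexml extension, assuming it's enabled, as it is in most builds; is_well_formed() is a name I've made up. note that it only tests whether the xml is well-formed, not whether it's valid rss, so it's no substitute for a real validator..

```php
<?php
// hypothetical helper: true if $xml parses as well-formed xml.
function is_well_formed($xml) {
    // collect parser errors quietly instead of printing warnings
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc !== false;
}
```

Feed it the output of your script as a string, and a false result means there's no point bothering the validator yet.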

getting real

This is a job you want to do once and forget about. once your rss feed has been set up, it will deliver your latest headlines to anyone who requests them, and you can get back to the content. If you want to do more with your rss feeds, that is, incorporate more data or interactive features, check out some of the online publications detailing rss 2.0 and rss 1.0/RDF specifications, and all the wonderful extra tags you can create. in fact, if you change the version info on this feed to <rss version="2.0"> (and drop the 0.91 doctype line, which no longer applies) you have a version 2.0 feed, and you can just start adding tags! just don't expect any other versions of rss to be this compatible. generally, it's best to pick a version (0.91/1.0/2.0) right at the start, and then stick to it! "0.91" has the widest support, but also the fewest features. you decide.
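
As a taste of those extra tags, rss 2.0 lets each <item> carry a <pubDate> and a <guid>. here's a hedged sketch of how they might be produced; rss2_extras() is a hypothetical helper, and using the item's link as its guid is just one common choice, not a requirement..

```php
<?php
// hypothetical helper: extra rss 2.0 elements for one <item>.
// $timestamp is a unix timestamp, $link the item's permanent url.
function rss2_extras($timestamp, $link) {
    // DATE_RSS is php's built-in format string for rss dates
    $date = gmdate(DATE_RSS, $timestamp);
    $guid = htmlspecialchars($link, ENT_QUOTES, 'UTF-8');
    return "<pubDate>$date</pubDate>\n"
         . "<guid isPermaLink=\"true\">$guid</guid>\n";
}
```

The returned string would be echoed inside the <item> element, alongside the title, description and link.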

and this'll work, will it?

Here's all the php source files for the project..

If you like, you can access that machine directly, and peruse the source files. If you check out the source for rss.php (the "whole thing", above) and then click the "try out this script" link, you will get a valid rss feed in your browser! my blog headlines. so yeah, it works...

cut the crap and feed me NOW!

;o) Cor

Welcome to the comments facility!

james - 01.04.05 1:22 am

there is a bug in firefox where rss/rdf feeds sent with header("Content-type: application/rss+xml"); will be displayed as plain text.

sending the feed with header("Content-Type: text/xml"); however, will work.

corz - 01.04.05 10:13 am

yeah, I'd noticed it downloads rss feeds into my text editor. I actually prefer that, the text highlighting is better!

I guess they'll fix it, though.

;o)
