how to feed rss

creating rss newsfeeds easily with php

begin feeding..

This document explains how to create a simple rss feed for your website, and then presents some fairly generic ideas and methods for generating automated rss news feeds using php. I'll be cutting a lot of corners to make this as hands-on and easy as possible. if you want deep technical information about xml namespaces and what-not, go elsewhere.

what is rss?

The easy answer is that rss stands for "Really Simple Syndication", and I'm happy enough with that. technically, it has also stood for "RDF Site Summary" (and "Rich Site Summary"), but mentioning RDF and RSS in the same paragraph is a sure-fire recipe for confusion. oops! anyway, an rss feed is no more than a simple xml document, a plain text file that a news aggregator can parse into headlines.

By giving the URL of a news feed to a news aggregator, you can grab a site's "headlines", which is a broad term describing anything from their latest forum posts to latest hot products, or Mac Tips, depending on who owns the feed. you can use your news aggregator to subscribe to any number of feeds and be kept perpetually up-to-date with all the things which matter to you. a decent news aggregator could quite literally change your online life.

Creating an rss feed can be as easy as placing a few simple tags in a text file and dropping it in your site, or very much more complex and generated dynamically from a whole range of data sources, all depending on the quantity and quality of data you want to provide.

Here is a fictional, though complete and quite valid, rss newsfeed..

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
"http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <title>corz rss</title>
    <link>http://corz.org/news/</link>
    <description>corz.org latest news</description>
    <language>en-gb</language>
    <image>
      <title>corzblog</title>
      <url>http://corz.org/blog/inc/img/corzblog.png</url>
      <link>http://corz.org/</link>
    </image>
    <item>
      <title>big story</title>
      <description>wow! would you believe it! it's incredible!...</description>
      <link>http://corz.org/news/index.php#big%20story</link>
    </item>
    <item>
      <title>yesterday's big story</title>
      <description>I have a feeling about tomorrow, almost as if... </description>
      <link>http://corz.org/news/index.php#yesterday%27s%20big%20story</link>
    </item>
  </channel>
</rss>
Nothing mysterious there. let's quickly run through it..

Just like html, the document starts with a "doctype statement" (right after the xml declaration on the first line), which tells the client software (news aggregator, usually) just what to expect, and if necessary, what reference dtd (doctype definition) to refer to when attempting to parse it. different versions of rss use different definitions, one of the oldest and most common being "rss-0.91", which was the first widely-accepted rss standard on the net. rss has had a few upgrades since then, but it doesn't matter which version you use, so long as you tell the rss reader what to expect.

rss is a form of xml, or rather, it uses a distinct xml vocabulary to present the data. xml is cool, it enables us to create containers for data, and give those containers names, and ways to hold other containers, and an almost unlimited scope for describing the data inside. as far as we're concerned, this clever xml structure will enable us to create a "channel", and inside that channel, "headlines".

Before we begin with the headlines, we give the aggregator some information about the feed itself. we start by opening an <rss> container. inside that is our main channel "element", and inside our channel is everything else. the example above contains an absolute minimum set of channel elements (or technically, sub-elements). It's all fairly obvious, but note: the titles can be up to 100 characters; the links, 500; the description must be plain text (NO HTML!); the language code has to be a standard valid one; and the image can be no bigger than 144x400.

Finally, we get to the main feature; our channel's headlines, or <item>s. for each <item> we need to supply only a title (same rules as above), a description (usually a small preview snippet from the top of the article, or a synopsis, up to 500 characters) and a valid link. we finish up by closing all the containers and we're done. If you copy and paste the above example into a text file and drop it into your site as "feed.rss"; tada! you have a valid news feed! okay, it has nothing to do with you or your site, is fictional and doesn't update..
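
If you'd rather have php do the counting for you, here's a minimal sketch that checks one item against the limits mentioned above. The function name item_problems() is made up for this example, the numbers are simply the ones quoted in this article, and strlen() counts bytes rather than characters, so treat it as a rough check only.

```php
<?php
// Hypothetical helper: list the ways an <item> breaks the limits described above.
// The limits (100/500/500) are the ones quoted in this article, nothing more official.
// Note: strlen() counts bytes, so multi-byte characters count more than once.
function item_problems($title, $link, $description)
{
    $problems = array();
    if (strlen($title) > 100) {
        $problems[] = 'title is over 100 characters';
    }
    if (strlen($link) > 500) {
        $problems[] = 'link is over 500 characters';
    }
    if (strlen($description) > 500) {
        $problems[] = 'description is over 500 characters';
    }
    if ($description != strip_tags($description)) {
        $problems[] = 'description contains html';
    }
    return $problems;
}

// usage: an empty array means the item is within the limits.
print_r(item_problems(
    'big story',
    'http://corz.org/news/index.php#big%20story',
    'wow! <b>incredible</b>'
));
```

An empty array back means you're fine; anything else tells you what to trim.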

not really "live" is it?

If you only update your site once every few days or weeks, then it's perfectly feasible to just edit your rss file by hand, add a new <item> (up to a maximum of 15 for a 0.91 feed) and save it again. it's tedious, but would serve the purpose, I guess. You'd probably want to re-validate it after editing, using one of the net's many rss validators. But in the end you'd probably not bother with any of it, and why should you? the whole point of rss syndication is that it can be automated, leaving you to concentrate on the content part of "content syndication".

generating newsfeeds

Although a lot of the channel data is fairly static; site name, main link, description, etc; the main juice of the feed really needs to be generated dynamically, that is, inserted into the page at "run-time", which is when the browser/aggregator requests the page. This is where php comes in. essentially, all we need to do is have our script gather up the headlines from whatever source; flat-files/database/etc; and then create <item>s with it.
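
To make that concrete, here's a minimal sketch of the "create <item>s with it" part, assuming your headlines have already been gathered into a plain php array. The function name items_to_xml() and the array keys ('title', 'description', 'link') are inventions for this example, not part of any library, and the default of 15 is just the 0.91 limit mentioned earlier.

```php
<?php
// Hypothetical helper: turn an array of headlines into <item> elements.
// Each headline is assumed to be an array with 'title', 'description' and 'link' keys.
// $max caps the number of items (15 being the rss 0.91 limit mentioned earlier).
function items_to_xml($headlines, $max = 15)
{
    $xml = '';
    foreach (array_slice($headlines, 0, $max) as $headline) {
        $xml .= "<item>\n";
        $xml .= "\t<title>" . htmlspecialchars($headline['title']) . "</title>\n";
        $xml .= "\t<description>" . htmlspecialchars($headline['description']) . "</description>\n";
        $xml .= "\t<link>" . htmlspecialchars($headline['link']) . "</link>\n";
        $xml .= "</item>\n\n";
    }
    return $xml;
}

// usage: the headlines could come from a database query, a flat-file, whatever.
$headlines = array(
    array(
        'title'       => 'big story',
        'description' => 'wow! would you believe it!',
        'link'        => 'http://corz.org/news/index.php#big%20story',
    ),
);
echo items_to_xml($headlines);
```

Where the array comes from is entirely up to you; the point is only that the xml side of things is a simple loop.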

It makes sense to use php variables for the static stuff, too. you could keep your details in one central "config.php" file, and pull in whatever you need at runtime. our fictitious config.php could be called by other onsite scripts, too, so more prefs could be added to that if need be. or you may already have a site-wide config.php of some sort, so just add a few variables to it, and include it from your rss script. here's one that would work fine..

<?php
// config.php for rss feed..
if (realpath($_SERVER['SCRIPT_FILENAME']) == realpath(__FILE__)) {
    /* prevent direct access */
    die('to err is human, human!');
}

/* prefs.. */

$mysite = 'corzblog';
$mydomain = $_SERVER['HTTP_HOST'];
// this blog is busier right now..
$blogurl = "http://$mydomain/devblog/";
$description = 'the blog of the dev';
$emailaddress = "webmaster@$mydomain";
$filename = $_SERVER['DOCUMENT_ROOT'].'/devblog/blogz.blog';
?>
I wasn't kidding about uploading a static feed to your site. if you start with one valid xml feed, and go line-by-line converting it to php output, you almost can't go wrong. we're working to create a standards-compliant xml document, so it makes a lot of sense to start with exactly that! use it as a template. here's how the code might start to look..

<?php
// send the header before anything else gets output..
header('content-type: application/rss+xml');
require('config.php');

echo '<?xml version="1.0" encoding="UTF-8"?>',"\n";
echo '<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"';
echo ' "http://my.netscape.com/publish/formats/rss-0.91.dtd">',"\n";
echo '<rss version="0.91">',"\n\n";
echo '<channel>',"\n";
echo '<title>',$mysite,'</title>',"\n";
echo '<link>',$blogurl,'</link>',"\n";
echo '<description>',$description,'</description>',"\n";
echo '<language>en-gb</language>',"\n\n";
echo '<copyright>copyright 2004 - ',$mydomain,'</copyright>',"\n";
echo '<managingEditor>',$emailaddress,'</managingEditor>',"\n";
echo '<webMaster>',$emailaddress,'</webMaster>',"\n\n";
echo '<image>',"\n";
echo '<title>',$mysite,'</title>',"\n";
echo '<url>',$blogurl,'inc/img/corzblog.png</url>',"\n";
echo '<link>',$blogurl,'</link>',"\n";
echo '</image>',"\n\n";
?>

getting picked up

The first line is the most important; the http header..
do it right the first time..
header("content-type: application/rss+xml");
Without this, the feed would be interpreted as a plain html document instead of an xml document, and most decent aggregators won't accept that these days. the only (dumber) alternative to sending a content-type header is to have your web server parse .rss (or .xml) files as php. this is easy enough to do, just add one line to your .htaccess file..
have .rss files piped through the php machine..
AddType application/x-httpd-php .rss .xml
If you run under phpsuexec, you can do something like this..
have .rss and .xml files piped through the php machine under phpsuexec..
<FilesMatch "\.(rss|xml)$">
  SetHandler application/x-httpd-php
</FilesMatch>
and because the extension is .rss, aggregators will assume it's an xml document, so no header is required. but I do recommend taking control of this from within your script; there's nothing at all wrong with having an rss feed with a .php extension, so long as you send the correct headers.

Next we grab the configuration data from the prefs file (config.php), and use it to populate the various static elements. apart from a few added extras, namely "webmaster", "managing editor" and "copyright", the above code will produce output identical to the example at the top of this article. there are other "optional" elements that can be added, too, and I'll probably go into some of those later on. so far so good.

getting in the loop

Next come the all-important headlines, or rather, <item> elements. this is where the fun really begins.

If you haven't already, it's time to ask yourself what you want your newsfeed to do. keep your blog readers updated on the latest rants? inform customers about your hot products? share your front-page news headlines? what? whatever it is, we are going to need our php script to grab the relevant content and produce <item> elements with it.

This "content" could be from anywhere; a database, a flat-file, another site, even another newsfeed, whatever. for the purposes of this article I'm going to use corzblog as an example, and we'll be creating an rss feed of the latest blogs. corzblog has had an rss feed almost since day one, using very similar code to that given here. it also produces an automated "rss 1.0" feed. download a copy if you want more details/code examples. both feeds can very easily be adapted for other purposes.

grabbing those headlines

My blogs are plain html structures, here's an example..

<div class="blog-entry">
  <div id="AnotherBlogEntry">
    <!--*g*-->
    <h3>Another Blog Entry</h3>
  <!--*g*-->this is another blog entry, more html.<!--*g*-->
</div>
<div class="byline">
  posted by cor @ Wed 22nd February 7:37 pm
</div>
<hr class="cb-hr" /><br />
</div>
<!--*end*-->

<div class="blog-entry">
  <div id="ABlogEntry">
    <!--*g*-->
    <h3>A Blog Entry</h3>
  <!--*g*-->this is a blog entry. simple html<!--*g*-->
</div>
<div class="byline">
  posted by cor @ Wed 22nd February 7:36 pm
</div>
<hr class="cb-hr" /><br />
</div>
<!--*end*-->
the document is just straightforward html, split into "articles" by an <!--*end*--> tag (a simple html comment).

As it stands, it's not much use for our feed. to utilise this data, we'll first need to split it up into blog entries, or "articles". so we'll use php to create an array, each item of the array containing a different blog entry. here's some code to do that..

<?php
// open and split the blog into an array of blog entries..
$file_contents = ''; // so the explode below still works if the file can't be read
if (is_readable($filename)) { $file_contents = implode('', file($filename)); }
$whole_blog = explode('<!--*end*-->', $file_contents);
?>
As you can see in the example, each blog entry is further divided by more fictitious tags, in this case a <!--*g*--> tag, which is my little joke. for a valid feed, we need to supply "title", "description" and "link" elements for each <item> of our feed, so we'll have to further split these html structures to pull out those individual parts. fortunately, each blog entry has all the bits we need..

between the start and the first <!--*g*--> is the bit that will be used for the link.
between the first and second <!--*g*--> lives the title.
between the second and third <!--*g*--> is the blog itself, etc..

so we'll simply loop through the articles one-by-one, split them up by their <!--*g*--> tags, spit out the correct parts with some xml formatting and we're done. if you don't have convenient structures like these in your data, you may have to use more devious methods to split it up, like regex, or whatever.
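
For what it's worth, here's a rough sketch of the regex route, assuming entries marked up like the sample above (a div with an id, followed by an h3 title) but without any <!--*g*--> markers. The function name entry_parts() is hypothetical, and the patterns are only as clever as that sample markup; real-world html may well need something sturdier.

```php
<?php
// Hypothetical helper: pull the anchor id and the title out of one blog entry
// with regex, for data that has no handy <!--*g*--> markers.
// Assumes the <div id="..."> and <h3> markup shown in the sample entries.
function entry_parts($entry)
{
    $parts = array('id' => '', 'title' => '');
    // the first div that opens with an id attribute gives us the anchor
    if (preg_match('/<div id="([^"]+)">/i', $entry, $m)) {
        $parts['id'] = $m[1];
    }
    // the first h3 gives us the title (any tags inside it are stripped)
    if (preg_match('/<h3[^>]*>(.*?)<\/h3>/is', $entry, $m)) {
        $parts['title'] = trim(strip_tags($m[1]));
    }
    return $parts;
}

// usage, with a cut-down version of the sample entry..
$entry = '<div class="blog-entry"><div id="ABlogEntry">'
       . '<h3>A Blog Entry</h3>this is a blog entry.</div></div>';
print_r(entry_parts($entry));
```

If neither pattern matches, you simply get empty strings back, which is easy to test for before creating an <item>.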

the whole thing..

The comments in the code are probably more useful than anything else I could say, so here it is. I added some tabs ("\t") to make the outputted xml more readable, though this isn't strictly necessary.

<?php

// log the hit.. (you're gonna log them, surely!)
@include $_SERVER['DOCUMENT_ROOT'].'/inc/init.php';

// send correct header..
header('content-type: application/rss+xml');
require('config.php');

echo '<?xml version="1.0" encoding="UTF-8"?>',"\n";
echo '<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"';
echo ' "http://my.netscape.com/publish/formats/rss-0.91.dtd">',"\n";
echo '<rss version="0.91">',"\n\n";
echo '<channel>',"\n";
echo '<title>',$mysite,'</title>',"\n";
echo '<link>',$blogurl,'</link>',"\n";
echo '<description>',$description,'</description>',"\n";
echo '<language>en-gb</language>',"\n\n";
echo '<copyright>copyright 2004 - ',$mydomain,'</copyright>',"\n";
echo '<managingEditor>',$emailaddress,'</managingEditor>',"\n";
echo '<webMaster>',$emailaddress,'</webMaster>',"\n\n";
echo '<image>',"\n";
echo '<title>',$mysite,'</title>',"\n";
echo '<url>',$blogurl,'inc/img/corzblog.png</url>',"\n";
echo '<link>',$blogurl,'</link>',"\n";
echo '</image>',"\n\n";

// read and split the blog into an array of blog entries..
$file_contents = implode('', file($filename));
$whole_blog = explode('<!--*end*-->', $file_contents);

// start the loop..
$total_articles = count($whole_blog);
for ($i = 0; $i < $total_articles; $i++) {

    $z = $i + 1;

    if ($z == $total_articles) {
        break; // we reached the end of the articles
               // (the chunk after the final <!--*end*--> is empty)
    } else {
        $blog = $whole_blog[$i];

        // split the article into its parts..
        $parts = explode('<!--*g*-->', $blog);
        $blog_title = htmlspecialchars(trim(strip_tags($parts[1]))); // the title

        // we need a valid "encoded" version of the link..
        $enc_title = rawurlencode(trim(strip_tags($parts[1])));

        // grab the first two paragraphs for the news preview.
        // (the first three <br /> chunks; short entries simply give fewer)
        // it may go over 500 chrs, och well, we could check for that.
        $the_blog = explode('<br />', $parts[2]);
        $snippet = htmlspecialchars(strip_tags(implode('', array_slice($the_blog, 0, 3))));

        // now create the <item> element..
        echo '<item>',"\n";
        echo "\t",'<title>',$blog_title,'</title>',"\n";
        echo "\t",'<description>',$snippet,' ...</description>',"\n";
        echo "\t",'<link>',$blogurl,'index.php#',$enc_title,'</link>',"\n";
        echo '</item>',"\n\n";
    }
} // and loop until we reach the end of the blog

echo '</channel>',"\n";
echo '</rss>';
?>
It's not exactly rocket science, and does everything we need. as soon as a new blog entry is created it will be immediately available to the next aggregator that loads our feed. and that, of course, will be your news aggregator while you test it! it probably goes without saying that you should be loading the resultant page in a news aggregator at every step, certainly the quickest and smartest way to test if your feed is working.
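
About that "it may go over 500 chrs" comment in the code; here's one way the check might look. This is only a sketch: make_snippet() is a made-up name, it assumes your text is UTF-8, and it cuts the plain text before escaping it, so the limit applies to what the reader sees rather than to the escaped xml.

```php
<?php
// Hypothetical helper: build a plain-text preview of at most $max characters.
// Tags are stripped and entities decoded first, then the text is cut, and only
// then escaped for xml, so an entity such as &amp; is never chopped in half.
function make_snippet($html, $max = 500)
{
    $text = trim(html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8'));
    if (function_exists('mb_substr')) {
        $text = mb_substr($text, 0, $max, 'UTF-8'); // multi-byte safe, if mbstring is loaded
    } else {
        $text = substr($text, 0, $max); // byte-based fallback
    }
    return htmlspecialchars($text);
}

// usage..
echo make_snippet('<p>this is a <b>blog</b> entry, tom &amp; jerry style</p>', 20);
```

In the loop above you could pass the joined paragraphs through something like this instead of escaping them directly.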

validate that feed

Presuming your rss feed is functioning, the next stage is to validate it using one of the net's many rss validators. three good ones are feedvalidator.org, redland and rss scripting. this is neat, too. google around; there are quite a few now, and even software to allow you to validate your own at home, though unless you are creating a lot of feeds, this is clearly overkill. If your feed doesn't validate right away, the validator will usually tell you why, and with a wee bit of trial-and-error, your feed will validate just fine.
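
Before you even get to a validator, a quick well-formedness check can catch the silliest mistakes (unclosed tags, stray ampersands). Here's a rough sketch, assuming php's SimpleXML and libxml extensions are available on your server. The function name feed_is_well_formed() is made up for this example, and it only checks that the xml parses; it says nothing about whether the feed is valid rss.

```php
<?php
// Hypothetical helper: true if $xml parses as well-formed xml.
// This is NOT a substitute for a real rss validator; it only catches broken markup.
// Assumes the SimpleXML and libxml extensions are loaded.
function feed_is_well_formed($xml)
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of printing warnings
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // put the old setting back
    return $doc !== false;
}

// usage..
$feed = '<rss version="0.91"><channel><title>corz rss</title></channel></rss>';
echo feed_is_well_formed($feed) ? 'parses fine' : 'broken xml';
```

You could feed it the output of your own script (captured however you like) as a first sanity check, then move on to a proper validator.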

getting real

This is a job you want to do once and forget about. once your rss feed has been set up, it will deliver your latest headlines to anyone who requests them, and you can get back to the content. If you want to do more with your rss feeds, that is, incorporate more data or interactive features, check out some of the online publications detailing rss 2.0 and rss 1.0/RDF specifications, and all the wonderful extra tags you can create. in fact, if you change the version info on this feed to <rss version="2.0"> (and drop the 0.91 doctype, which no longer applies) you have a version 2.0 feed, and you can just start adding tags! just don't expect any other versions of rss to be this compatible. generally, it's best to pick a version (0.91/1.0/2.0) right at the start, and then stick to it! "0.91" has the widest support, but also the fewest features. you decide.
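
As a taste of those extra tags, here's a small sketch of a 2.0-style item with a <pubDate> added. The function name item_20() is invented for this example, and it assumes you have a unix timestamp for each article (corzblog's byline would need parsing to get one). DATE_RSS is php's built-in constant for the date format rss expects.

```php
<?php
// Hypothetical helper: one rss 2.0 <item>, with a <pubDate>.
// $timestamp is a unix timestamp; where it comes from is up to your data source.
function item_20($title, $description, $link, $timestamp)
{
    return "<item>\n"
        . "\t<title>" . htmlspecialchars($title) . "</title>\n"
        . "\t<description>" . htmlspecialchars($description) . "</description>\n"
        . "\t<link>" . htmlspecialchars($link) . "</link>\n"
        . "\t<pubDate>" . gmdate(DATE_RSS, $timestamp) . "</pubDate>\n" // always in GMT
        . "</item>\n";
}

// usage..
echo item_20('big story', 'wow! would you believe it!', 'http://corz.org/news/', time());
```

gmdate() is used rather than date() so the output doesn't depend on the server's timezone setting.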

and this'll work, will it?

Here are all the php source files for the project..



If you like, you can access that machine directly, and peruse the source files. If you check out the source for rss.php (the "whole thing", above) and then click the "try out this script" link, you will get a valid rss feed in your browser! my blog headlines. so yeah, it works...
;o)

Welcome to the comments facility!



cor - 01.04.05 10:13 am

yeah, I'd noticed it downloads rss feeds into my text editor. I actually prefer that, the text highlighting is better!

I guess they'll fix it, though.

;o) Cor



Noe - 30.04.05 6:00 pm

Whatever it stands for, rss isn't an acronym, it's an abbreviation. Unless you say "Errss" or something when you read it. But you don't; you say "Ar-Es-Es", right?


cor - 30.04.05 8:26 pm

I usually pronounce it "Arse" :lol:

;o) Cor


ps.. technically it's an "initialism", but try telling that to 1,758,345 websites!


Arun - 09.06.05 7:57 am

Hello,

I want create RSS for site which is html and hosted on windows server. The site in question is this Web hosting site

I am not using any CMS here and I like if RSS can be generated for each page automatically for my whole site.

Your help will be highly appreciated.

Thanks.


cor - 09.06.05 4:50 pm

well, Arun, if it were my task, I'd probably create an rss "handler" script, say /rss.php, and then punt the feed links out as either /rss.php?page=/some/page.php, or else (more elegant) punt out the links as /some/page.rss and have mod_rewrite redirect that to your handler script to do the business.

When you say "Windows server", do you mean IIS? (*ouch*) If you do, mod_rewrite won't work. But if you are running Apache on windows (*ahh, much better!*), it works great!

The handler script itself would parse the html (either from the raw file, or via HTTP GET (your call)) and create the feed live for the rss client in a manner similar to my example above.

is that enough help?

;o) Cor



Arun - 19.06.05 7:19 am

Yess !!

This is sound advice. I don't run Apache on windows :( I asked my developer to do it. I can skip mod_rewrite. If it fails I will shift my site to a Linux hosting server, as there is no other option left.

Thanks and best regards


wells - 01.08.05 11:10 am

-asking-
i am sorry before, i am a very newbie about this feeding things..how can i edit (*.rss) my page, such as adding shoutbox, timer..does it just like editing in Frontpage?or i have to instal a program to edit the site's pages, thanks for sharing tutorial...i am using Win XP and Office XP, IE 5 or i should use Linux, plz recomend some tools in Windows.....thanks...


cor - 01.08.05 4:18 pm

Well, wells, I think probably everyone should use Linux, at least start using it for something, for when windows eventually dies away. :lol:

And hey! Nothing to be sorry about, asking questions is why this comment thing exists! (catch anything I missed!) Right, to your questions proper..

Editing RSS??? Ideally, RSS is *generated*, so it's never edited, ever. Only the source for the script that generates it is edited. And for that you would use a good text editor, like EditPlus or something. (The "K" editors on linux are also excellent). I don't know much about FrontPage, other than it is one of those nasty "HTML Editors", and it creates really ugly code.

I do all my development work in a plain text editor, like those.
If you are producing a "static" rss feed (NONONONONO!) then you can create/edit this in a text editor, too. It's simply XML structures in plain text, but really, you want to be generating your feeds, so editing simply isn't an option, because there's nothing to edit!

If I've completely misunderstood your question, feel free to dive back with more details!

;o) Cor



Danny - 10.02.08 12:47 pm

Hi Cor
First up, thanks for the tutorials, they've been a great help :D

Anyways, I was wondering if you would be willing to share with us the code you use to log the hits to the feed?
I've had mine up and running for a few weeks now and I'm happy with it, but now I'm curious to know how many people are actually using it and what they are using.
It would be greatly appreciated!

Thanks
Danny


cor - 26.02.08 10:29 am

Sure, Danny. I use my plogger.

My init script (called at the start of the feed), does the actual logging (log_pair()). The following code is inside another switch in my logger - different things get logged to different places, so I've adapted it slightly for the example here..

<?php
include('plog.php'); // for logging and displaying "pairs"

// note: $_SERVER keys are case-sensitive, so they need to be upper-case..
$user_agent = @$_SERVER['HTTP_USER_AGENT'];
if (empty($user_agent)) { $user_agent = '{empty}'; }

// rss/rdf feeds..
if (stristr($_SERVER['REQUEST_URI'], '/rdf.php') or stristr($_SERVER['REQUEST_URI'], '/rss.php')) {

    switch (true) {

        case (stristr($_SERVER['REQUEST_URI'], '/blog/')):
            log_pair($blog_plog_file, $user_agent);
            break;

        case (stristr($_SERVER['REQUEST_URI'], '/devblog/')):
            log_pair($devblog_plog_file, $user_agent);
            break;

        default:
            log_pair($demoblog_plog_file, $user_agent);
            break;
    }
}
?>


For a single feed, you could code it more simply, of course, but I'll leave that as an exercise. My admin page calls the plog script, using the plogger's plog_show() function..

<?php
echo '<h2 id="rss-feeds">rss feeds..</h2>';
plog_show($blog_plog_file);
?>

And that's about it. Oh! The variable $blog_plog_file is simply the location of the file the plogger uses to log the hits, and is set in my main site config, something like..

<?php
$blog_plog_file = $corz_root.$admin_path.'/log/.ht_blog_feed_hits';
?>

If you need more, let me know.

;o) Cor



Sunny - 16.01.10 1:24 pm

Tnx for article. It's pretty good, but for developers.

Correct! ;o) Cor


