Google Sitemaps & Textpattern

03 June 2005 @ mid-afternoon

Update: 03 Jun 2005 -› Now possible with Sencer’s sitemap plugin

After checking out Social Patterns’ Google Sitemaps solution for WordPress, I decided to modify Michael’s code and adapt it for Textpattern.

Download the source file (rename the file extension to .php) and place it in the document root of your domain. You may have to adjust the path to your textpattern directory in the first two lines of the file. You can also edit the lists of sections and categories that are excluded from the sitemap.

Paranoid about unwelcome eyes peering at your sitemap? Add a few lines to your .htaccess:

RewriteCond %{HTTP_USER_AGENT} !^GoogleBot [NC]
RewriteRule ^sitemap\.php$ - [F,L]

This will ban anyone without the Googlebot user agent from viewing your sitemap file. This is not a foolproof method: it is trivial to spoof a user agent, so this really only provides superficial security.
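To make the behaviour of those two lines concrete, here is a rough Python model of the check. It is only a sketch of my reading of the rule pair, not how Apache implements it, and the function name `is_forbidden` is made up for illustration.

```python
import re

# Mirrors the condition `!^GoogleBot [NC]`: the request is refused unless
# the User-Agent starts with "googlebot" (compared case-insensitively).
GOOGLEBOT_PREFIX = re.compile(r"^GoogleBot", re.IGNORECASE)


def is_forbidden(path: str, user_agent: str) -> bool:
    """Return True if the rule pair would answer 403 for this request.

    `path` is the request path as .htaccess sees it, without a leading slash.
    """
    if not re.search(r"^sitemap\.php$", path):
        return False  # the RewriteRule pattern does not match this path
    return GOOGLEBOT_PREFIX.search(user_agent) is None


# An ordinary browser is refused.
print(is_forbidden("sitemap.php", "Mozilla/5.0 (Windows NT 10.0)"))
# A client announcing itself as Googlebot gets through.
print(is_forbidden("sitemap.php", "Googlebot/2.1 (+http://www.google.com/bot.html)"))
# So does anyone who simply types a matching header.
print(is_forbidden("sitemap.php", "googlebot-but-not-really"))
```

The last call is the spoofing problem in miniature. Note also that the pattern is anchored with ^, so a user agent that only mentions Googlebot later in the string (for example one beginning "Mozilla/5.0 (compatible; Googlebot/2.1; ...") would be refused as well.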

Keep in mind that this sitemap generation method will not include files outside of Textpattern, so if your needs go beyond this you might consider creating multiple sitemaps, using a sitemap index, and/or creating a sitemap with Google’s Sitemap Generator. Additionally, this script will not include your root section pages in the sitemap (perhaps I’ll get around to adding that).
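For the multiple-sitemaps route, a sitemap index is just a small XML file listing the individual sitemaps. The Python sketch below shows the general shape. It assumes the sitemaps.org 0.9 namespace (the Google schema in use when this was written had a different one), and the example.com URLs and the `build_sitemap_index` name are placeholders, not part of my script.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the sitemaps.org 0.9 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing each sitemap URL."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")


index_xml = build_sitemap_index([
    "http://example.com/sitemap.php",         # the Textpattern-generated sitemap
    "http://example.com/sitemap-static.xml",  # hypothetical sitemap for other files
])
print(index_xml)
```

You would then submit the index rather than the individual sitemap files.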

With regard to exceeding the 10MB maximum sitemap size, I’m not really sure how I’d break this up or at what point it would become necessary. A 10MB file would hold quite a lot of URLs, and I seriously doubt any Textpattern sites currently exist that would have more than 50,000 URLs or 10MB. But if that were the case, the script could probably be modified fairly easily to fetch articles with a limit of 50,000 and an offset, producing one sitemap per batch.
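The limit-and-offset idea could look something like the following Python sketch. It is not what the script does today. The `sitemap_pages` and `page_query` helpers are hypothetical, and the table and column names in the query are assumptions about the Textpattern schema that you should check against your own install.

```python
MAX_URLS_PER_SITEMAP = 50_000  # the per-file URL limit discussed above


def sitemap_pages(total_urls, limit=MAX_URLS_PER_SITEMAP):
    """Yield one (offset, row_count) pair per sitemap file needed."""
    for offset in range(0, total_urls, limit):
        yield offset, min(limit, total_urls - offset)


def page_query(offset, limit=MAX_URLS_PER_SITEMAP):
    """Build the query for one batch (table and column names are assumed)."""
    return f"SELECT ID FROM textpattern ORDER BY ID LIMIT {limit} OFFSET {offset}"


# A site with 120,000 articles would need three sitemap files.
print(list(sitemap_pages(120_000)))
# → [(0, 50000), (50000, 50000), (100000, 20000)]
print(page_query(50_000))
```

Each batch would become its own sitemap file, and the files would then be tied together with a sitemap index.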

That’s about all I’ve got at the moment. Hopefully some of this will be of use, and not too much of it is flawed. You may also find Google’s list of third-party solutions based on Google Sitemaps useful.


2 comments

1

Just a few notes:

1) permlinkurl_id makes a query for every article. This may become a problem for sites with many articles.

2) The rewrite rule rewrites the URI /sitemap.gz to call /sitemap.php – where does the compression happen? As far as I can see, it doesn’t. You may want to use ob_start('ob_gzhandler'); which automatically checks the client’s Accept-Encoding HTTP header and compresses the output if the client supports it.

Sencer → sencer.de

2

Thanks for the bug tips, sencer. It was making twice the necessary calls to the Textpattern database. I’ve also added the option to use ob_gzhandler. My script isn’t the prettiest solution, but it seems to work ok for at least an index of published articles.

Truth be told, your plugin seems like a much better way to go, since you can keep it centralized under the Textpattern admin interface and it lets you add the callback handler that pings Google with each update (and for the simple fact that it looks like you have a better grasp of HTTP transactions than I do).

andrew → compooter.org
