Fixing WordPress 404 Problems for Google Sitemaps

I had a problem with my Google Sitemap, which was not being recognized by Google because my “404 (file not found) error page returns a status of 200 (Success) in the header.” So I dug around to fix my 404 page setup, which never really worked. Geeky notes follow, so I don’t have to look this up again.

Setting up a Custom 404 Page

I had noticed some time ago that non-existent pages on my site which should have generated 404 pages were instead delivering “post not found” pages. This was right after I upgraded to WordPress 2.0 from 1.5, so I figured it was just some change to the way it worked.

As I was researching Google’s 404 verification requirements and WordPress, I realized the cause was that my custom theme didn’t have a 404.php template. So I added one, following the directions. Still no go on Google verification. I used a web page header display tool to check that the 404 status was being sent. It was, but when I told Google to verify the site again, it failed. Weirdness.
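For anyone who would rather check the status from a script than from a web-based header tool, here is a rough PHP sketch. The function name final_status_code is made up for illustration, and the URL in the usage comment is a placeholder; get_headers() is PHP's built-in function for fetching response headers.

```php
<?php
// Hypothetical helper: pull the final HTTP status code out of a list of raw
// header lines, such as the array returned by PHP's get_headers().
function final_status_code(array $headerLines): ?int
{
    $code = null;
    foreach ($headerLines as $line) {
        // Status lines look like "HTTP/1.1 404 Not Found". When redirects are
        // followed there can be several; the last one is the one that counts.
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code; // null if no status line was found
}

// Example usage (needs network access; the URL is a placeholder):
// $headers = get_headers('http://example.com/no-such-page');
// if ($headers !== false) {
//     echo final_status_code($headers), "\n";
// }
```

A non-existent page should report 404 here. If it reports 200, the "not found" page is going out with a success header, which is exactly what Google was complaining about.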

Caching

After some digging, I tracked it down to WP-Cache 2.0.17, the plugin I use to reduce the load on my shared server. What happened: the first time a non-existent page is requested, WordPress properly delivers a 404 page with the right headers set. However, this output is CACHED by WP-Cache, so the next time the bad page is requested, the cached error page is delivered! And of course, that’s not a 404, but a successful delivery.

WP-Cache 2.0.19 fixes this by no longer caching 404 errors. Google Sitemaps verified my site, and everything seems to be working again.
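To make the failure mode concrete, here is a toy simulation in PHP. It is not WP-Cache's actual code: ToyPageCache, its skipErrors flag, and the $render callback are all invented for illustration. It assumes a cache that stores only the page body and replays it with a plain 200.

```php
<?php
// Toy page cache, NOT WP-Cache's real implementation. It only illustrates
// how caching an error page can turn a 404 into a 200 on the next request.
class ToyPageCache
{
    private $store = [];
    private $skipErrors;

    public function __construct(bool $skipErrors)
    {
        $this->skipErrors = $skipErrors;
    }

    // $render is a callable returning [statusCode, body] for a path.
    public function serve(string $path, callable $render): array
    {
        if (isset($this->store[$path])) {
            // Cached bodies are replayed with a plain 200, because this toy
            // cache never recorded the original status code.
            return [200, $this->store[$path]];
        }
        [$status, $body] = $render($path);
        // With $skipErrors on, 404 responses are never stored.
        if (!($this->skipErrors && $status === 404)) {
            $this->store[$path] = $body;
        }
        return [$status, $body];
    }
}

// Stand-in for WordPress rendering a page: only /about exists.
$render = function (string $path): array {
    return $path === '/about' ? [200, 'About page'] : [404, 'Post not found'];
};

$buggy  = new ToyPageCache(false);
$first  = $buggy->serve('/nope', $render); // status 404, fresh from the renderer
$second = $buggy->serve('/nope', $render); // status 200, stale error page from cache

$fixed = new ToyPageCache(true);
$fixed->serve('/nope', $render);
$third = $fixed->serve('/nope', $render);  // status 404 again, nothing was cached
```

The second request to the buggy cache is the one that trips up a verifier: same "not found" body, but a success status.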

Spiffing up the 404 Page

I came across the A Perfect 404 article as I was figuring out what was going on, and cleaned up my 404.php file to be friendlier. If the $_SERVER['HTTP_REFERER'] variable exists, the page prints it as part of the error message and provides a link back. If it doesn’t exist, the page prints a more generic message. I was thinking of implementing a check of the referring link to customize the message for search engine traffic, but I’ll leave that for another day. The A Perfect 404 article has some instructions if you’re interested.

SECURITY UPDATE

In the comments, reader “epc” points out that printing out the value of Referer without some escaping is not a safe practice. I added a test that checked whether the referer value begins with http://davidseah.com or http://www.davidseah.com, and further escaped the output using the htmlspecialchars() function. I’m not sure what can really be done with the 404 page that might be dangerous, but thinking about issues like this is a good habit to get into. This article on Top 7 PHP Security Blunders was helpful in understanding some of the other issues. Thanks epc!
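Here is a rough sketch of that kind of check, not my exact code. The function name referer_notice, the message wording, and the $allowedHosts parameter are assumptions made for illustration. It also compares the parsed hostname instead of testing the start of the string, which is a slightly stricter variation: a plain prefix test would accept something like http://davidseah.com.example.net.

```php
<?php
// Hypothetical helper for a 404 template. $allowedHosts is an assumption:
// the list of hostnames whose referers we are willing to echo back.
function referer_notice($referer, array $allowedHosts): string
{
    $generic = 'Sorry, that page could not be found.';
    if (!is_string($referer) || $referer === '') {
        return $generic;
    }
    // parse_url() returns null or false when a component is missing or the
    // URL is badly malformed, so anything that is not a string is rejected.
    $scheme = parse_url($referer, PHP_URL_SCHEME);
    $host   = parse_url($referer, PHP_URL_HOST);
    if (!is_string($scheme) || !is_string($host)
        || !in_array(strtolower($scheme), ['http', 'https'], true)
        || !in_array(strtolower($host), $allowedHosts, true)) {
        return $generic;
    }
    // Escape before printing, so quotes and angle brackets in the referer
    // cannot break out of the attribute or inject markup.
    $safe = htmlspecialchars($referer, ENT_QUOTES, 'UTF-8');
    return 'Sorry, the link from <a href="' . $safe . '">' . $safe . '</a> is broken.';
}

// In a template, usage might look like:
// echo referer_notice($_SERVER['HTTP_REFERER'] ?? null,
//                     ['davidseah.com', 'www.davidseah.com']);
```

Anything that fails the host check falls back to the generic message, so an off-site or forged Referer never gets printed at all.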