dave seah: better living through new media

Viewing Category: Gweeping

Importing Large WXR Files into WordPress 2.1

POSTED 03/24/2007 UNDER Blogging, Gweeping

I've been trying to import my current WordPress database into my staging blog so I can play around with formatting; alas, the WordPress-to-WordPress importer has presented a few hurdles. Here's what's supposed to happen:

  1. On the old blog, go to Manage -> Export and download a WXR file with all your posts.
  2. On the new blog, go to Manage -> Import, and upload the WXR file. Easy!!!

Except it's not, when you have more than 2 megabytes of data. You're not allowed to upload more than 2MB, because PHP poops out due to an internal limit.

I tried to work around this by patching WordPress to look for the file on the server instead of requiring you to upload it. That way, you can download your giant WXR file and use FTP to upload it somewhere to your server first. Geeky notes follow so I don't forget this stuff.

The Problem

There are several bottlenecks, many of them related to PHP's built-in limits:

  • The post_max_size and upload_max_filesize settings in php.ini are often set to something like 2MB or 8MB. That means you can't upload a file larger than that. If you have a lot of writing in your blog, as I do, you just won't be able to upload a big enough file. Fortunately, I have a dedicated virtual server and can up those limits, but if you're on a shared server you're screwed.

To work around this, I spent a couple of hours trying to modify the WordPress import filter to use a file that had already been uploaded. I eventually hacked it to actually work, but hit another bottleneck related to memory_limit. But first, here are the brutal modifications I made to the WordPress 2.1 files:

In wp-admin/admin-functions.php: wp_import_handle_upload():

Added the following lines between $overrides = array... and $file = wp_handle_upload(...):

    $overrides['test_size'] = false;
    $localFile = array('name'=>'import', 'tmp_name'=>'/full/path/to/wp-import.xml');

Don't forget to replace /full/path/to/wp-import.xml with the full path to your exported WXR file. Next, I modified the $file = wp_handle_upload(... line to read as follows:

    $file = wp_handle_upload( $localFile, $overrides );

Next, in wp-admin/admin-functions.php: wp_handle_upload():

Commented out the following, around lines 1838-1839:

    // if (! @ is_uploaded_file( $file['tmp_name'] ) )
    //  return $upload_error_handler( $file, __( 'Specified file failed upload test.' ));

Modified the move_uploaded_file() call to use copy() instead, around line 1879:

    if ( false === @ copy ( $fname, $new_file ) ) // made this rename
        wp_die( printf( __('The uploaded file %s could not be moved to %s.' ), $fname, $uploads['path'] ));

Finally, in wp-admin/import/wordpress.php:

Comment out the statements for case 0 around line 324-325, so control flows through to case 1:

    case 0 :
        // $this->greet();
        // break;

The net result of these changes is to bypass the uploading form when you click the import -> wordpress selection. It should automatically attempt to read the WXR file. The modifications above are to bypass the security mechanisms in place that prevent you from using non-uploaded files.

If you find that you're getting a dialog box that asks you to download admin.php, what has probably happened is you've run out of memory (check your PHP error log). The WordPress importer reads the entire file into memory at once, so if you've got a big file you'll need a lot of working memory to process everything. For my blog, I needed about 64MB of working PHP memory, which I could fix by changing memory_limit to 64M in my php.ini file. If you're on a shared server, you're kind of screwed if you can't change these.
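Before trying an upload at all, it can help to check whether the WXR file even fits under the upload limit. Here's a rough shell sketch of that check. The function names to_bytes and fits_under_limit are made up for illustration, and the limit is passed in by hand using php.ini shorthand (2M, 8M, 64M) rather than read from a real php.ini. It only compares the file size against a limit; it says nothing about how much memory_limit the importer will actually need.

```shell
# Convert a php.ini shorthand size (for example 2M, 8M, 64M) to bytes.
# to_bytes is a hypothetical helper, not part of PHP or WordPress.
to_bytes() {
    local value number
    value=$1
    number=${value%[KkMmGg]}          # strip a trailing K, M or G if present
    case $value in
        *[Kk]) echo $(( number * 1024 )) ;;
        *[Mm]) echo $(( number * 1024 * 1024 )) ;;
        *[Gg]) echo $(( number * 1024 * 1024 * 1024 )) ;;
        *)     echo "$number" ;;      # plain number of bytes
    esac
}

# Succeed (exit status 0) when the file is no larger than the limit.
fits_under_limit() {
    local file limit size
    file=$1
    limit=$2
    size=$(( $(wc -c < "$file") ))    # arithmetic expansion trims any padding
    [ "$size" -le "$(to_bytes "$limit")" ]
}
```

For example, fits_under_limit /full/path/to/wp-import.xml 2M succeeds only if the export is 2MB or smaller; if it fails, you're in the territory this post is about.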

You're probably better off exporting piecemeal using Aaron Brazell's WordPress to WordPress importer, which gives you the option to export selected categories. This is what I did the first time. Note that it works best between the same WordPress database versions; importing from WP 2.0 to WP 2.1, for example, will cause some funny things to happen with Pages, and vice versa. As it is, subpage importing is currently broken, so watch out for that too.

Intermittent Blank Pages with WP-Cache

POSTED 01/24/2007 UNDER Blogging, Gweeping

WP-Cache helps my site run smoothly by storing copies of the web pages that are fetched most often from this site, which means that the WordPress system just needs to generate that page once. This has worked fine, except for a mysterious disappearing page problem that happens every once in a while.

A few of you have probably seen it: a visit to a popular page shows just the top-half of the page. I spent a little time debugging this tonight because I saw this problem occur three times with a popular post, which meant that a lot of people were unable to see it when they tried to. Well, no more!

WARNING! Geeky notes follow, so I don't forget what I did.

Occasional Cache Corruption

Every once in a while, someone will send me a nice email telling me that a certain page is coming up "blank", showing just the top-half of the web page with no text. What I usually do in this case is invalidate the cache in the WP-Cache options menu, and this fixes it. Unfortunately, it means all the cached files are also gone, which means the server has to rebuild all the cache files.

I eventually noticed that the problem files could be found by listing the cache contents under the WP-Cache Options panel, seeing which file is unusually small, and then deleting just that file. This tells WP-Cache to rebuild it next time someone requests it. In my case, if I see a posting file that is less than 9K, it's probably screwed up. It appears that the file is truncated, chopped off at a certain point before the rest of WordPress has a chance to populate the page.
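If you have shell access, the same "look for the unusually small file" check can be done with find instead of the options panel. This is only a sketch: find_truncated is a made-up name, the cache directory is whatever yours happens to be, and the '*.html' default pattern is an assumption about how the cached pages are named, so adjust it to match what you see in your own cache listing.

```shell
# List cache files smaller than a byte threshold (default 9216 bytes, i.e. 9K).
# Arguments: cache directory, optional minimum size in bytes, optional name pattern.
# The '*.html' default is an assumption about cache file naming.
find_truncated() {
    local cache_dir min_bytes pattern
    cache_dir=$1
    min_bytes=${2:-9216}
    pattern=${3:-*.html}
    # -size -Nc matches files strictly smaller than N bytes
    find "$cache_dir" -type f -name "$pattern" -size "-${min_bytes}c"
}
```

Running find_truncated /path/to/cache prints the suspicious files; deleting one of them by hand has the same effect as deleting it from the options panel.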

After today's issues with my "Procrastinator's Clock" page going under without warning, I decided to poke through the WP-Cache source code to see if I could insert an automatic check, so I wouldn't have to worry about it.

Modifications to WP-Cache

The file wp-cache-phase1.php is the piece of WP-Cache that checks whether a particular URI has been cached already, serving the cached copy if it exists. Around line 35, I inserted the following:

...
foreach ($meta->headers as $header) {  
    header($header);  
}

// DS: start hack
$url = $meta->uri;  
$size = @filesize($cache_file);  
if ( $size < 9216 ) {  
    error_log ("WPCache: $size < 9216, expiring $url");
    // write problem file 
    $myFile = "/path/to/writeable/file/in/htdocs/wpcache.log";
    $fh = fopen($myFile, 'a') or die("can't open file");
    $stringData = file_get_contents($cache_file);
    fwrite($fh, "\n\n##\n## Truncated File: ".$meta->uri." ($size) bytes\n\n");
    fwrite($fh, $stringData);
    fclose($fh);
    // tell WP to recreate cached file
    $file_expired = true;
    return;
}
// DS: end hack

$log = "<!-- Cached page served by WP-Cache -->\n";
if ( !($content_size = @filesize($cache_file)) > 0 || $mtime < @filemtime($cache_file))
...

The code block marked DS: start hack is the new stuff. It grabs the url of the cached file being loaded out of the meta information stored with it and then sees how big the cached copy is. A good page on my site is always bigger than 9K, so if it's LESS than that this means that the cached copy has been screwed up. This error is logged to the PHP Error Log AND the truncated output is written to another file called wpcache.log so I can analyze it later. The hack tells WordPress to re-generate the page by setting the "file expired" flag to true; WP-Cache will then recache it through a different module.

I'm hoping that this ensures that corrupted cache copies don't stick around for hours as they've done in the past. Also, because I'm logging the errors, hopefully I'll be able to figure out what the pattern is that causes this cache corruption to occur.

The (dv) gets Dugg

POSTED 01/16/2007 UNDER Blogging, Gweeping

I got dugg for the first time yesterday, for the Water post of all things, and this was an excellent test of my new Media Temple dedicated virtual (dv) server.

I'm running the very cheapest of (dv) plans ($50/month), which has a "guaranteed" memory allocation of 256MB. It actually can use more, because the (dv) is a virtual server sharing a single machine with others. If you need more memory, and it's available, your server can grab it. Freshly minted, my (dv) was configured to make as much use as possible of this pooled memory, which I suppose encourages people to upgrade to higher-capacity (and more expensive) plans. I can't afford that, so I learned how to modify the MySQL, Apache, and SMTP configuration to run within a 256MB footprint. Then, still seeing esoteric memory allocation failures, I tracked down some significant inefficiencies in my WordPress installation and got rid of them. Just in time too, to handle the unexpected spike in traffic.

It may have been the time of day (2PM), but the peak Digg traffic lasted only a couple of hours. Those first couple of hours, though, the (dv) served 2500-2750 pageloads per hour without breaking a sweat, the server load hovering between 0.5 and 0.7 for most of the time. The site remained highly responsive, once I turned off the "KeepAlive" web server option.

This option allows a web browser connection to fetch multiple chunks of data (like all the graphic files on a web page) in one long transaction; ordinarily it's one chunk per transaction. KeepAlive is sort of like being able to monopolize a shoe salesman at a big shoe warehouse, insisting that he bring you a steady stream of shoes for your convenience exclusively. This isn't a problem until the number of pushy customers exceeds the number of salespeople. Then, anyone who's late to the party will wait a looong time to get any service. With 2750 page requests an hour, each with 30 chunks of data and only 30 processes maximum to deal with them, I had to turn off KeepAlive so everyone got served in a timely manner instead of timing out. And yes, I did have a short KeepAliveTimeout set (2 seconds).

There is probably some interesting formula to calculate the optimal way to serve the most connections with the least resources, but since I didn't know it I just watched the server and made sure it didn't boil over. When it failed to even get warm, I disabled WP-Cache (remembering to delete the existing cache) to see what kind of increase I'd see. By this time traffic was starting to die off slightly, pulling only 20-40 pageloads per minute, but I saw the load climb to about 1.5 to 2.5. Still not too bad, but I turned the cache back on.
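For a rough feel of the numbers in this story, here is a back-of-envelope sketch. It uses Little's law (average busy workers = arrival rate times how long each one is held), which is a general queueing rule of thumb and not something taken from the Apache configuration itself. The function names are made up, and the hold time you feed in is a guess you have to supply; real traffic is bursty, so treat the output as an average and not a prediction.

```shell
# Average requests per second, given pageloads per hour and chunks per page.
requests_per_second() {
    awk -v p="$1" -v c="$2" 'BEGIN { printf "%.1f\n", p * c / 3600 }'
}

# Little's law: average number of busy server processes equals
# arrivals per second multiplied by seconds each arrival holds a process.
# Both inputs are estimates supplied by the caller.
busy_workers() {
    awk -v rate="$1" -v hold="$2" 'BEGIN { printf "%.1f\n", rate * hold }'
}
```

With the figures from this post, requests_per_second 2750 30 works out to 22.9 requests per second on average. Feeding a connection rate and a guessed hold time into busy_workers then shows how close that comes to the 30-process ceiling.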

As far as Digg effects go, my experience was relatively mild compared to others. 2750 pageloads/hour is still the record for my site; previously the max I saw was 1600 pageloads/hour, which almost killed the shared host I was on. Of course, the inefficiencies in my WordPress setup (the Mint pepper DLoads, primarily) helped drag the entire server down. I'm starting to keep notes in a new area of the site; if you want a sneak peek, you can read about my experiences with WordPress and shared hosting. I'll be writing up my (dv) experience (and configuration) later.

On a side note, I've been fairly happy with (mt) customer service. They can take a couple days to get back to you via the request system (weekends are especially long), but the quality of support has not been bad. Everyone I've talked with, via email and phone, has been polite and respectful. Of course if you need something done right now or you're experiencing yet another (gs) outage, you probably have a different view of things.

That's it for now!

Moving Wordpress Part II: Media Temple Ho!

POSTED 01/08/2007 UNDER Blogging, Gweeping

So here I am on the new server, a dedicated virtual server (dv) from Media Temple. I didn't realize that this (dv) platform is brand new, having launched at the very end of December 2006. Lucky me! My server problems could not have been timed better.

It's only been a few months since I moved to FutureQuest, so the whole How to Move a Working WordPress Installation procedure was relatively fresh in my mind. It worked out a little differently this time, so I'm documenting the process again.

Geeky notes follow!

The Basic Idea

Since I have a working installation on davidseah.com, I need to do a few things in this order:

  1. Buy new hosting from Media Temple, keeping the old host active so I can move files.
  2. Move my email mailboxes over to the new host
  3. Move my WordPress files and the MySQL database that powers it
  4. Move any other non-WordPress services that might exist on the old site.
  5. Change the official name servers for the davidseah.com domain to use the new ones

What is dedicated virtual hosting?

The dedicated virtual server (dv) from Media Temple is different from the usual shared hosting I was working with. For one thing, from your point of view you don't share the server with anyone else. Technically speaking, you're actually running a simulated dedicated server (that's why it's called "virtual"), on hardware that is shared with other virtual servers. The advantage is that you can make any changes you like to the operating system environment, including having full root access. You also gain the economy that comes from sharing hardware resources, with improved isolation from your neighbors' CPU- and memory-hogging hijinx.

On the down side, you need to know something about system administration. The (dv) version 3 package uses an enterprise-level Linux (CentOS) on Virtuozzo, which is controllable to some extent by the Plesk 8.1 control panel. With Plesk, you administer the server and can create clients, domains, and mail users. You can do some limited configuration of common services, but it does assume an understanding of how these services work. If words like daemon, mysqld, cron, xinetd, smtp and ssh mean nothing to you, then Plesk might not be all that easy to understand for ya.

The main advantage of Plesk, from my perspective, is that it allows you to manage your domains and hosting clients from within a pretty GUI that works. Plesk also provides a measure of stability on your server because all the software and operating system components have been tested and made to fit together; sort of like having managed hosting without the expense or the expertise. You can purchase additional "snapshots" of your server configuration, which is really really handy if you're the kind of person to mess around with things and need a way to undo the damage. Especially useful if you have a tweaked configuration.

Where Plesk falls short is lower-level configuration of the services it manages. If you want to change the setting of some internal MySQL or PHP variable, you'll have to get root level access to the server (which you can, since it's dedicated to you). Plesk will restart a service (like mail) for you, but that's about it.

Since the server is dedicated, you can install your own software on it. At Media Temple you need to request to have the developer tools installed first, then you can do things like install Ruby on Rails and compile from sources. Note, though, that if you update the "plesk managed" parts of your server, you may no longer be eligible for the automatic updates that Media Temple will do for you through their Update Option Program.

Getting situated on the new (dv) 3

There was a problem with my order, and I didn't get my welcome emails that described what I was supposed to do. Normally with shared servers, you would receive an email that tells you the basics of how to move your files and how to set up email. I didn't get any of that, so I had to figure it out from scratch.

The good news: You use Plesk to set that all up. The bad news: you're on your own. There is a Plesk User Administration Manual I just found, which is probably worth reading. Until you set up your accounts to allow FTP and ssh into the system, you're sunk. The quick way to do this:

  • Set up your domain through Plesk. Then select the domain and click the SETUP icon, choose Account Preferences and enter the FTP Login name. This creates a user that you can use to FTP into the site. You can also optionally allow this user to SSH in, if you assign a shell using the dropdown menu.

  • FTP your files using the login name you created. Drop 'em in the httpdocs directory.

  • Create email mailboxes by clicking back on the domain you created (in my case, davidseah.com) and then clicking MAIL under the Services heading. This is where you can add your email boxes and configure aliases for each one. It's actually pretty nice. I used the exact same names and login credentials for my new mailboxes, so theoretically I won't have to change much in my email program setup.

I'm skipping a lot of strange first-time configuration here...you'll be forced to set up your domain and a default Client. Every domain (like davidseah.com) is tied to a Client (for me, I chose Dave Seah). I can create more domains and clients under my dedicated IP address (up to 30 with the basic license) and run a mini hosting business through Plesk, which is pretty cool.

Moving Your WordPress Files

I've got about 500MB of files on my current host that I've got to move, not including the WordPress MySQL database. I could download the files to my computer and re-upload them, but that tends to be slow, so I do a server-to-server FTP transfer. This requires shell access on each host.

  1. Login to both your new server and your old server via SSH. For the purposes of this description, the new server will be called newserver.com and the old one oldserver.com.

  2. On the old server, use the tar command to compress your wordpress folder. For example: tar cvzf wordpress.tar.gz wordpress/*, assuming that you're in the directory that contains your wordpress folder.

  3. On the new server, type ftp oldserver.com into the shell and login to your account. Make sure bin is set, then do a get wordpress.tar.gz. Since your hosts have a lot more bandwidth than your crappy home cable connection, this will be way faster.

  4. On the new server, uncompress your archive with tar xvzf wordpress.tar.gz. The entire directory structure will be recreated.

There's actually more than this that you have to do usually, because there are probably other directories you'll want to move, including hidden files like the .htaccess file at the root of your installation. Move 'em all! Make sure you have enough disk space on your old server to create the tarfiles...they can get big, even with the compression.
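One detail worth knowing about step 2: the shell expands wordpress/* without dotfiles, so a .htaccess sitting directly inside the wordpress folder would be left out of the archive. Archiving the folder by name picks it up. Here is a small sketch of the pack and unpack halves; pack_site and unpack_site are made-up names, and the paths you pass in are up to you.

```shell
# Archive a folder by name (not by glob) so its top-level dotfiles,
# such as .htaccess, end up in the tarball too.
# Arguments: parent directory, folder name, archive path (use a full path).
pack_site() {
    local parent folder archive
    parent=$1
    folder=$2
    archive=$3
    tar -czf "$archive" -C "$parent" "$folder"
}

# Unpack an archive into a destination directory, recreating the structure.
unpack_site() {
    local archive dest
    archive=$1
    dest=$2
    mkdir -p "$dest"
    tar -xzf "$archive" -C "$dest"
}
```

On the old server you'd run something like pack_site "$PWD" wordpress "$PWD/wordpress.tar.gz", transfer the file, then run unpack_site on the new server with the directory where the wordpress folder should live.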

If all this talk of tar and command-line FTP makes you ill, you could of course just re-upload your wordpress files from your home computer using FTP or DreamWeaver or whatever you're using. My cable modem upload speed is about 40K per second, versus the megabyte-per-second bandwidth when talking server-to-server.

Moving your WordPress Database

The database lives in MySQL, the database engine that stores all of the posts and comments in my blog. This isn't stored as a regular file, so you have to use two command-line tools called mysqldump and mysql.

  • If you're using WP-CACHE, disable it before doing the following steps.

  • Shut down your wordpress installation by renaming your wordpress directory or equivalent. You don't want people hitting your database while you're trying to dump out a copy of the data.

  • Dump the database on oldserver.com into a file called wordpressdb.sql.gz with the mysqldump command. You can get the values of db_name, db_host, db_user, and db_passwd out of your wordpress/wp-config.php file.

    mysqldump db_name -hdb_host -udb_user -pdb_passwd -Q --opt > wordpressdb.sql  
    gzip wordpressdb.sql
    
  • FTP the wordpressdb.sql.gz file to newserver.com.

  • On newserver.com, make sure that you have a database setup with a database user and password. Usually you can find some tool that will do it for you, like PHPMyAdmin or a control panel of some kind.

  • There's a good chance that the new server will have different db_name, db_user, db_host, etc values, so you'll have to update your wp-config file to use the new values. Do this now!

  • Assuming you've got the new database set up, it's time to re-import the database. Using the values you just generated when creating the new database and database user, do the following on the new server:

    gunzip wordpressdb.sql.gz  
    mysql db_name -hdb_host -udb_user -pdb_passwd < wordpressdb.sql
    
  • If you're very lucky, everything will import cleanly, and you'll get no errors. I got a couple. The first was a "syntax error" that resulted because the MySQL 4.0.x installation on my old server didn't emit quotes properly, and some tables used reserved keywords for their field names (adding the -Q option to mysqldump fixed that). The second error was due to the max_allowed_packet value being set too small by default on the (dv). It defaults to 1 megabyte, but is usually higher on shared servers. I had to modify the /etc/my.cnf file and restart MySQL, which did the trick. You will need root access to edit the configuration file, so make sure you request it immediately when you buy your (dv). It took (mt) 3 days to respond to my request!!!

  • You might have import problems too if you are downgrading from one version of MySQL to another, as I did the last time I moved. You might want to read the older post for some hints on how to use mysqldump to get around that.

  • Ok, you've just moved a copy of your wordpress database to the new server! Unfortunately, you've got to do a little surgery on it now, because your new server doesn't have a domain name yet. WordPress stores the domain name in your database, so you'll have to use a database tool like PHPMyAdmin to edit the wp_options table, specifically the siteurl entry, to point to your temporary server address. In my case, it's the numeric IP number of the new server, which is 64.13.223.31. The value of siteurl from my old site is http://davidseah.com/wordpress, so I need to change it to http://64.13.223.31/wordpress so I can test the site.

  • At this point, I'm ready to test that WordPress has made it over. Visit http://64.13.223.31 in a web browser and cross your fingers. If you get a blank page, that might mean you have to update the WP Cache symbolic link in your wp-content directory (I always forget this). Check your PHP error logs to see what the problem is. You could be missing files from your transfer, or you might need to change permissions for certain folders. My old FutureQuest environment kicked ass, so I had to re-adjust to some of the restrictions in place on the new server.

  • If everything pops up, yay! You now need to change another option in WordPress temporarily. Login as a wordpress administrator, go to OPTIONS->GENERAL and change the Blog Address to match the numeric IP.

  • Also, go back and re-enable WP-Cache. You might have to do some additional configuration based on what it complains about.

  • There's an additional checklist I follow, which is available in part 1. I have just finished running through it, so now I'm ready for the last step.
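If you'd rather not click through PHPMyAdmin for the siteurl surgery, the same edit can be expressed as a single SQL statement. The sketch below only prints the statement so you can read it before running anything. siteurl_sql is a made-up helper, it assumes the default wp_ table prefix unless you pass another table name, and it does no escaping, so only feed it plain URLs.

```shell
# Print an SQL statement that points the siteurl option at a new address.
# Arguments: new URL, optional options table name (default wp_options).
# No quoting or escaping is done; this is only meant for simple URLs.
siteurl_sql() {
    local new_url table
    new_url=$1
    table=${2:-wp_options}
    printf "UPDATE %s SET option_value = '%s' WHERE option_name = 'siteurl';\n" \
        "$table" "$new_url"
}

# Hypothetical usage, piping the statement into the mysql client
# with the same placeholder credentials used in the steps above:
#   siteurl_sql http://64.13.223.31/wordpress | mysql db_name -hdb_host -udb_user -pdb_passwd
```

The same statement with davidseah.com in place of the numeric IP would undo the change once the domain name points at the new server.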

Do the Nameserver Switch

I control my domain name via a third-party registry, so the last step is to tell the world that the new home of davidseah.com is Media Temple. Media Temple's name servers are responsible for telling the world this now, so I update my domain registration to make them the "name servers of record." Some notes:

  • Some plugins, like ones that depend on the Flickr API, may not work until the domain name change switches over. At least I think that's what's going on.

  • It takes 24-48 hours for the entire world to see the switch. In the meantime, email will probably be going to both servers, so be sure to use webmail to check both.

  • After the new domain servers are stable, I'll switch the WordPress SiteURL options from the numeric IP address back to davidseah.com.

Here goes nothing...see ya on the flip side!

Pulling the switch!!!

UPDATE: I have started writing a guide to configuring the (dv) Base for WordPress to optimize performance. The notes are quite long, but if you're having problems running out of memory they might be helpful to you.

Fixing WordPress 404 Problems for Google Sitemaps

POSTED 12/30/2006 UNDER Blogging, Gweeping

I had a problem with my Google Sitemap, which was not being recognized by Google because my "404 (file not found) error page returns a status of 200 (Success) in the header." So I dug around to fix my 404 page setup, which never really worked. Geeky notes follow, so I don't have to look this up again.

Setting up a Custom 404 Page

I had noticed some time ago that non-existent pages on my site which should have generated 404 pages were instead delivering "post not found" pages. This was right after I upgraded to WordPress 2.0 from 1.5, so I figured it was just some change to the way it worked.

As I was researching Google's 404 verification requirements and WordPress, I realized that the real problem was that my custom theme didn't have a 404.php page. So I added one, following the directions. Still no go on Google verification. I used a web page header display tool to check that the 404 was being sent. It worked, but then when I told Google to verify the site again, it failed. Weirdness.

Caching

After some digging, I tracked it down to WP-Cache 2.0.17, the plugin I use to reduce the load on my shared server. What happened: the first time someone tries to access a non-existent page, WordPress properly delivers a 404 page with the right headers set. However, this output is CACHED by WP-Cache, so the next time the bad page is requested, the cached error page is delivered! And of course, that's not a 404, but a successful delivery.

WP-Cache 2.0.19 fixes this by no longer caching 404 errors. Google Sitemaps verified my site, and everything seems to be working again.
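One way to watch this kind of problem from the command line, instead of using a web-based header display tool, is to fetch the same non-existent URL twice and compare the status codes. A rough sketch, assuming curl is installed; status_code is a made-up helper and the URL in the usage comment is a placeholder.

```shell
# Read HTTP response headers on stdin and print the status code
# from the first status line (for example "HTTP/1.1 404 Not Found").
status_code() {
    head -n 1 | awk '{ print $2 }'
}

# Hypothetical usage (placeholder URL, needs network access).
# -D - dumps the response headers to stdout, -o /dev/null discards the body.
#   curl -s -o /dev/null -D - http://example.com/no-such-page | status_code
#   curl -s -o /dev/null -D - http://example.com/no-such-page | status_code
```

If the first request reports 404 and the second reports 200, a cache is probably serving the stored error page as a normal one.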

Spiffing up the 404 Page

I came across the A Perfect 404 article as I was figuring out what was going on, and cleaned up my 404.php file to be friendlier. If the $_SERVER['HTTP_REFERER'] variable exists, the page emits it as part of the error message and provides a link back. If it doesn't exist, it prints a more generic message. I was thinking of implementing a check of the referring link to customize the message to search engine traffic, but I'll leave that for another day. The A Perfect 404 article has some instructions if you're interested.

SECURITY UPDATE

In the comments, reader "epc" points out that printing out the value of Referer without some escaping is not a safe practice. I added a test that checked whether the referer value begins with http://davidseah.com or http://www.davidseah.com, and further escaped the output using the htmlspecialchars() function. I'm not sure what can really be done with the 404 page that might be dangerous, but thinking about issues like this is a good habit to get into. This article on Top 7 PHP Security Blunders was helpful in understanding some of the other issues. Thanks epc!

Thank you for printing this article! Please note that all material on this website is copyrighted by either David Seah or individual comment contributors. To request permission for republication and distribution, please contact David Seah (http://davidseah.com/contact).