Importing Large WXR Files into WordPress 2.1

I’ve been trying to import my current WordPress database into my staging blog so I can play around with formatting; alas, the WordPress to WordPress importer has presented a few hurdles. Here’s what’s supposed to happen:

  1. On the old blog, go to Manage -> Export and download a WXR file with all your posts.
  2. On the new blog, go to Manage -> Import and upload the WXR file. Easy!!!

Except it’s not when you have more than 2 megabytes of data. You’re not allowed to upload more than 2MB, because PHP poops out as soon as the upload exceeds its configured size limit.

I tried to work around this by patching WordPress to look for the file on the server instead of requiring you to upload it. That way, you can download your giant WXR file and use FTP to put it somewhere on your server first. Geeky notes follow so I don’t forget this stuff.

The Problem

There are several bottlenecks, many of them related to PHP’s built-in limits:

The post_max_size and upload_max_filesize settings in php.ini are often set to something like 2MB or 8MB. That means you can’t upload a file larger than that. If you have a lot of writing in your blog, as I do, you just won’t be able to upload a big enough file. Fortunately, I have a dedicated virtual server and can up those limits, but if you’re on a shared server you’re screwed.
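To see which limits a given server actually enforces, a small check along these lines may help. It is only a sketch: ini_get() is standard PHP, but the helper names ini_size_to_bytes() and fits_upload_limits() are made up for illustration. The comparison is approximate because a real POST body is slightly larger than the file itself, and the special value 0 is not handled.

```php
<?php
// Convert a php.ini shorthand size such as "2M" or "512K" into bytes.
// Hypothetical helper name; not part of WordPress or PHP.
function ini_size_to_bytes($value) {
    $value  = trim($value);
    $number = (int) $value;                    // "2M" casts to 2
    $unit   = strtoupper(substr($value, -1));  // last character, e.g. "M"
    switch ($unit) {
        // The cases fall through on purpose: G multiplies three times.
        case 'G': $number *= 1024;
        case 'M': $number *= 1024;
        case 'K': $number *= 1024;
    }
    return $number;
}

// True if a file of $bytes fits under both limits (rough check only).
function fits_upload_limits($bytes, $uploadMax, $postMax) {
    return $bytes <= ini_size_to_bytes($uploadMax)
        && $bytes <= ini_size_to_bytes($postMax);
}

// Report the limits configured for the PHP that runs this script.
$uploadMax = ini_get('upload_max_filesize');
$postMax   = ini_get('post_max_size');
echo "upload_max_filesize = {$uploadMax}, post_max_size = {$postMax}\n";
```

Keep in mind that the command-line PHP may read a different php.ini than the web server does, so run the check through the web server if the two disagree.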

The Workaround

To work around this, I spent a couple of hours trying to modify the WordPress import filter to use a file that had already been uploaded. I eventually hacked it into working, but then hit another bottleneck related to memory_limit. But first, here are the brutal modifications I made to the WordPress 2.1 files:

In wp-admin/admin-functions.php: wp_import_handle_upload():

Added the following lines between $overrides = array... and $file = wp_handle_upload(...):

$overrides['test_size'] = false;
$localFile = array( 'name' => 'import', 'tmp_name' => '/full/path/to/wp-import.xml' );

Don’t forget to replace /full/path/to/wp-import.xml with the full path to your exported WXR file on the server. Next, I modified the $file = wp_handle_upload(... line to read as follows:

$file = wp_handle_upload( $localFile, $overrides );

Next, in wp-admin/admin-functions.php: wp_handle_upload():

Commented out the following around lines 1838-1839:

    // if (! @ is_uploaded_file( $file['tmp_name'] ) )
    //  return $upload_error_handler( $file, __( 'Specified file failed upload test.' ));

Modified the move_uploaded_file() call to use copy() instead, around line 1879:

if ( false === @ copy( $fname, $new_file ) )  // was move_uploaded_file()
    wp_die( sprintf( __( 'The uploaded file %s could not be moved to %s.' ), $fname, $uploads['path'] ) );

Finally, in wp-admin/import/wordpress.php:

Commented out the statements for case 0 around lines 324-325, so control falls through to case 1:

case 0 :
    // $this->greet();
    // break;
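The fall-through can be tried in isolation with a sketch like the one below. The function name run_step() and the trace strings are stand-ins invented for illustration; they are not the importer's real code.

```php
<?php
// Sketch of switch fall-through: with the body of case 0 commented out,
// step 0 runs the code under case 1.
function run_step($step) {
    $trace = array();
    switch ($step) {
        case 0:
            // $trace[] = 'greet';   // stands in for $this->greet()
            // break;
        case 1:
            $trace[] = 'import';     // stands in for the import step
            break;
    }
    return $trace;
}

var_dump(run_step(0)); // step 0 now records 'import' instead of 'greet'
```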

The net result of these changes is to bypass the upload form when you choose Import -> WordPress; the importer should automatically attempt to read the WXR file from the path you hard-coded. The modifications above bypass the security checks that normally prevent WordPress from using files that were not uploaded through the form.
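Why the checks have to go can be seen in a standalone sketch. It uses only standard PHP functions; the temporary file stands in for a WXR file that arrived by FTP, and the function name try_upload_functions() is made up for illustration.

```php
<?php
// Sketch: PHP's upload helpers only accept files received via HTTP POST,
// so a file placed on the server by other means fails both checks,
// while copy() works on any readable file.
function try_upload_functions($source, $destination) {
    return array(
        // False for any file that was not part of an HTTP POST upload.
        'is_uploaded_file'   => is_uploaded_file($source),
        // Does nothing and returns false for non-uploaded files.
        'move_uploaded_file' => @move_uploaded_file($source, $destination),
        // No upload restriction, so this succeeds.
        'copy'               => copy($source, $destination),
    );
}

// A temp file stands in for the hypothetical wp-import.xml.
$source = tempnam(sys_get_temp_dir(), 'wxr');
file_put_contents($source, '<rss></rss>');
$destination = $source . '.copy';

$result = try_upload_functions($source, $destination);
var_dump($result);

// Clean up both files.
unlink($source);
unlink($destination);
```

Those checks exist to stop a crafted request from pointing the handler at arbitrary files on the server, so a patch like this probably belongs only on a staging install.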

If you get a dialog box that asks you to download admin.php, you have probably run out of memory (check your PHP error log). The WordPress importer reads the entire file into memory at once, so if you’ve got a big file you’ll need a lot of working memory to process everything. For my blog, I needed about 64MB of working PHP memory, which I fixed by changing memory_limit to 64M in my php.ini file. If you’re on a shared server and can’t change this setting, you’re kind of screwed.
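If editing php.ini is not an option, memory_limit can sometimes be raised at runtime instead. The sketch below uses the standard ini_get() and ini_set() functions; whether a particular host honors the change is an assumption, and the 64M value is simply the figure that worked for my blog.

```php
<?php
// Sketch: inspect memory_limit and try to set it for the current request.
// Note: this sets 64M unconditionally, so skip it if the limit is already higher.
$before = ini_get('memory_limit');

// ini_set() returns the old value on success and false on failure.
$previous = ini_set('memory_limit', '64M');

$after = ini_get('memory_limit');

if ($previous === false) {
    echo "Could not change memory_limit (still {$after})\n";
} else {
    echo "memory_limit was {$before}, is now {$after}\n";
}
```

The call would have to run early in the request, before the importer loads the file.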

You are probably better off exporting piecemeal using Aaron Brazell’s WordPress to WordPress importer, which gives you the option to export selected categories. This is what I did the first time. Note that it works best between the same WordPress database versions; importing from WP 2.0 to WP 2.1 (or vice versa), for example, will cause some funny things to happen with Pages. As it is, subpage importing is currently broken, so watch out for that too.