Recently I’ve been working with a client who’s site we’re going to soon be migrating. To help with any downtime, we’ve been looking at reducing their database size, which is something around 50-60 gigabytes. After looking through the database, one easy win would be to purge as many post revisions as possible, since their wp_posts
table was about 50% revisions 😱
We couldn’t just mass delete anything with a revision
post type though, because the client had some specific needs:
- Keep all revisions for posts from the last 12 months.
- Keep all revisions that were made after the post was published.
- Keep a backup of all revisions deleted.
To do this, I crafted a custom WP-CLI command to purge these revisions. I’ve accidentally deleted the final version of the script, since it was only a one-run thing, but here’s an earlier version that could be a good starting point for anyone else that has a similar need to prune revisions:
/**
* WP-CLI command that deletes pre-publish post revisions for posts older than 12 months.
*
* @subcommand prune-revisions [--live] [--verbose]
*/
public function prune_revisions( $args, $assoc_args ) {
global $wpdb;
$live = (bool) $assoc_args[ 'live' ];
$verbose = (bool) $assoc_args[ 'verbose' ];
$offset = 0;
$limit = 500;
$count = 0;
$deletes = 0;
if ( $live ) {
$output_file = sanitize_file_name( sprintf( 'prune-revisions-backup_%d_%d.csv', get_bloginfo( 'name' ), current_time( 'timestamp', true ) ) );
$handle = fopen( $output_file, 'wb' );
if ( false === $handle ) {
WP_CLI::error( sprintf( 'Error opening %s for writing!', $output_file ) );
}
// Headers.
$csv_headers = array(
'ID',
'post_author',
'post_date',
'post_date_gmt',
'post_content',
'post_title',
'post_excerpt',
'post_status',
'comment_status',
'ping_status',
'post_password',
'post_name',
'to_ping',
'pinged',
'post_modified',
'post_modified_gmt',
'post_content_filtered',
'post_parent',
'guid',
'menu_order',
'post_type',
'post_mime_type',
'comment_count',
'filter',
);
fputcsv(
$handle,
$csv_headers
);
}
$count_sql = 'SELECT COUNT(ID) FROM ' . $wpdb->posts . ' WHERE post_type = "revision"';
$revision_count = (int) $wpdb->get_row( $count_sql, ARRAY_N )[0];
$progress = \WP_CLI\Utils\make_progress_bar( sprintf( 'Checking %s revisions', number_format( $revision_count ) ), $revision_count );
do {
$sql = $wpdb->prepare( 'SELECT ID FROM ' . $wpdb->posts . ' WHERE post_type = "revision" LIMIT %d,%d', $offset, $limit );
$revisions = $wpdb->get_results( $sql );
foreach ( $revisions as $revision ) {
$count++;
$post_parent_id = wp_get_post_parent_id( $revision->ID );
// Fail on either 0 or false.
if ( false == $post_parent_id ) {
WP_CLI::warning( sprintf( 'Revision %d does not have a parent! Skipping!', $revision->ID ) );
continue;
}
$revision_modified = get_post_modified_time( 'U', false, $revision->ID );
$parent_publish_time = get_post_time( 'U', false, $post_parent_id );
if ( $parent_publish_time < current_time( 'timestamp') - ( MONTH_IN_SECONDS * 12 ) ) {
// Post is older than 12 months, safe to delete pre-publish revisions.
if ( $revision_modified < $parent_publish_time ) {
if ( $live ) {
// We're doing it live!
WP_CLI::log( sprintf( 'Deleting revision %d for post %d. (%d%% done)', $revision->ID, $post_parent_id, ( $count / $revision_count ) * 100 ) );
// Backup data!
$output = [];
$data = get_post( $revision->ID );
// Validate the field is set, just in case. IDK how it couldn't be.
foreach ( $csv_headers as $field ) {
$output = isset( $data->$field ) ? $data->$field : '';
}
fputcsv( $handle, $output );
$did_delete = wp_delete_post_revision( $revision->ID );
// Something went wrong while deleting the revision?
if ( false === $did_delete || is_wp_error( $did_delete ) ) {
WP_CLI::warning( sprintf( 'Revision %d for post %d DID NOT DELETE! wp_delete_post_revision returned:', $revision->ID, $post_parent_id ) );
}
$deletes++;
// Pause after lots of db modifications.
if ( 0 === $deletes % 50 ) {
if ( $verbose ) {
WP_CLI::log( sprintf( 'Current Deletes: %d', $deletes ) );
}
sleep( 1 );
}
} else {
// Not live, just output info.
WP_CLI::log( sprintf( 'Will delete revision %d for post %d.', $revision->ID, $post_parent_id ) );
}
} else {
// Revision is after the post has been published.
if ( $verbose ) {
WP_CLI::log( sprintf( 'Post-Publish: Will NOT delete Revision %d for post %d.', $revision->ID, $post_parent_id ) );
}
}
} else {
// Post is too new to prune.
if ( $verbose ) {
WP_CLI::log( sprintf( 'Too-New: Will NOT delete Revision %d for post %d.', $revision->ID, $post_parent_id ) );
}
}
}
// Pause after lots of db reads.
if ( 0 === $count % 5000 ) {
if ( $verbose ) {
WP_CLI::log( sprintf( 'Current Count: %d', $count ) );
}
sleep( 1 );
}
// Free up memory.
$this->stop_the_insanity();
// Paginate.
if ( count( $revisions ) ) {
$offset += $limit;
$progress->tick( $limit );
} else {
WP_CLI::warning( 'Possible MySQL Error, retrying in 10 seconds!' );
sleep( 10 );
}
} while ( $count < $revision_count );
$progress->finish();
if ( $live ) {
fclose( $handle );
WP_CLI::success( sprintf( 'Deleted %d revisions', $deleted ) );
} else {
WP_CLI::success( sprintf( 'Processed %d revisions', $revision_count ) );
}
}
Code language: PHP (php)
Leave a Reply