Deleting Old Post Revisions in WordPress with WP-CLI

Recently I’ve been working with a client who’s site we’re going to soon be migrating. To help with any downtime, we’ve been looking at reducing their database size, which is something around 50-60 gigabytes. After looking through the database, one easy win would be to purge as many post revisions as possible, since their wp_posts table was about 50% revisions 😱

We couldn’t just mass delete anything with a revision post type though, because the client had some specific needs:

  • Keep all revisions for posts from the last 12 months.
  • Keep all revisions that were made after the post was published.
  • Keep a backup of all revisions deleted.

To do this, I crafted a custom WP-CLI command to purge these revisions. I’ve accidentally deleted the final version of the script, since it was only a one-run thing, but here’s an earlier version that could be a good starting point for anyone else that has a similar need to prune revisions:

/**
 * WP-CLI command that deletes pre-publish post revisions for posts older than 12 months.
 *
 * @subcommand prune-revisions [--live] [--verbose]
 */
public function prune_revisions( $args, $assoc_args ) {
	global $wpdb;

	$live    = (bool) $assoc_args[ 'live' ];
	$verbose = (bool) $assoc_args[ 'verbose' ];
	$offset  = 0;
	$limit   = 500;
	$count   = 0;
	$deletes = 0;

	if ( $live ) {
		$output_file = sanitize_file_name( sprintf( 'prune-revisions-backup_%d_%d.csv', get_bloginfo( 'name' ), current_time( 'timestamp', true ) ) );
		$handle      = fopen( $output_file, 'wb' );

		if ( false === $handle ) {
			WP_CLI::error( sprintf( 'Error opening %s for writing!', $output_file ) );
		}

		// Headers.
		$csv_headers = array(
			'ID',
			'post_author',
			'post_date',
			'post_date_gmt',
			'post_content',
			'post_title',
			'post_excerpt',
			'post_status',
			'comment_status',
			'ping_status',
			'post_password',
			'post_name',
			'to_ping',
			'pinged',
			'post_modified',
			'post_modified_gmt',
			'post_content_filtered',
			'post_parent',
			'guid',
			'menu_order',
			'post_type',
			'post_mime_type',
			'comment_count',
			'filter',
		);

		fputcsv(
			$handle,
			$csv_headers
		);
	}

	$count_sql      = 'SELECT COUNT(ID) FROM ' . $wpdb->posts . ' WHERE post_type = "revision"';
	$revision_count = (int) $wpdb->get_row( $count_sql, ARRAY_N )[0];
	$progress       = \WP_CLI\Utils\make_progress_bar( sprintf( 'Checking %s revisions', number_format( $revision_count ) ), $revision_count );

	do {
		$sql       = $wpdb->prepare( 'SELECT ID FROM ' . $wpdb->posts . ' WHERE post_type = "revision" LIMIT %d,%d', $offset, $limit );
		$revisions = $wpdb->get_results( $sql );

		foreach ( $revisions as $revision ) {
			$count++;
			$post_parent_id = wp_get_post_parent_id( $revision->ID );

			// Fail on either 0 or false.
			if ( false == $post_parent_id ) {
				WP_CLI::warning( sprintf( 'Revision %d does not have a parent!  Skipping!', $revision->ID ) );
				continue;
			}

			$revision_modified   = get_post_modified_time( 'U', false, $revision->ID );
			$parent_publish_time = get_post_time( 'U', false, $post_parent_id );

			if ( $parent_publish_time < current_time( 'timestamp') - ( MONTH_IN_SECONDS * 12 ) ) {
				// Post is older than 12 months, safe to delete pre-publish revisions.
				if ( $revision_modified < $parent_publish_time ) {
					if ( $live ) {
						// We're doing it live!
						WP_CLI::log( sprintf( 'Deleting revision %d for post %d. (%d%% done)', $revision->ID, $post_parent_id, ( $count / $revision_count ) * 100 ) );

						// Backup data!
						$output = [];
						$data   = get_post( $revision->ID );

						// Validate the field is set, just in case.  IDK how it couldn't be.
						foreach ( $csv_headers as $field ) {
							$output = isset( $data->$field ) ? $data->$field : '';
						}

						fputcsv( $handle, $output );

						$did_delete = wp_delete_post_revision( $revision->ID );

						// Something went wrong while deleting the revision?
						if ( false === $did_delete || is_wp_error( $did_delete ) ) {
							WP_CLI::warning( sprintf( 'Revision %d for post %d DID NOT DELETE! wp_delete_post_revision returned:', $revision->ID, $post_parent_id ) );
						}
						$deletes++;

						// Pause after lots of db modifications.
						if ( 0 === $deletes % 50 ) {
							if ( $verbose ) {
								WP_CLI::log( sprintf( 'Current Deletes: %d', $deletes ) );
							}
							sleep( 1 );
						}
					} else {
						// Not live, just output info.
						WP_CLI::log( sprintf( 'Will delete revision %d for post %d.', $revision->ID, $post_parent_id ) );
					}
				} else {
					// Revision is after the post has been published.
					if ( $verbose ) {
						WP_CLI::log( sprintf( 'Post-Publish: Will NOT delete Revision %d for post %d.', $revision->ID, $post_parent_id ) );
					}
				}
			} else {
				// Post is too new to prune.
				if ( $verbose ) {
					WP_CLI::log( sprintf( 'Too-New: Will NOT delete Revision %d for post %d.', $revision->ID, $post_parent_id ) );
				}
			}
		}

		// Pause after lots of db reads.
		if ( 0 === $count % 5000 ) {
			if ( $verbose ) {
				WP_CLI::log( sprintf( 'Current Count: %d', $count ) );
			}
			sleep( 1 );
		}

		// Free up memory.
		$this->stop_the_insanity();

		// Paginate.
		if ( count( $revisions ) ) {
			$offset += $limit;
			$progress->tick( $limit );
		} else {
			WP_CLI::warning( 'Possible MySQL Error, retrying in 10 seconds!' );
			sleep( 10 );
		}

	} while ( $count < $revision_count );

	$progress->finish();

	if ( $live ) {
		fclose( $handle );
		WP_CLI::success( sprintf( 'Deleted %d revisions', $deleted ) );
	} else {
		WP_CLI::success( sprintf( 'Processed %d revisions', $revision_count ) );
	}
}Code language: PHP (php)

Other Posts Not Worth Reading

Hey, You!

Like this kind of garbage? Subscribe for more! I post like once a month or so, unless I found something interesting to write about.


Comments

Leave a Reply