Tag: cleanup

  • Quick Tip: Get Size of Revisions in WordPress

    Quick Tip: Get Size of Revisions in WordPress

    One thing that you might not think of when watching the size of a large WordPress site grow, is unnecessary data in the database. With the introduction of the block editor years ago, there has been a large increase in the number of revisions a post makes when being edited.

    This can create a lot of revisions in the database if you’re not setting a limit.

    If you’d like to do a quick and dirty audit of your revision data, you can use a very ugly SQL query like this:

    SELECT COUNT( ID ) as revision_count, ( SUM( CHAR_LENGTH( ID ) ) + SUM( CHAR_LENGTH( post_author ) ) + SUM( CHAR_LENGTH( post_date ) ) + SUM( CHAR_LENGTH( post_date_gmt ) ) + SUM( CHAR_LENGTH( post_content ) ) + SUM( CHAR_LENGTH( post_title ) ) + SUM( CHAR_LENGTH( post_excerpt ) ) + SUM( CHAR_LENGTH( post_status ) ) + SUM( CHAR_LENGTH( comment_status ) ) + SUM( CHAR_LENGTH( ping_status ) ) + SUM( CHAR_LENGTH( post_password ) ) + SUM( CHAR_LENGTH( post_name ) ) + SUM( CHAR_LENGTH( to_ping ) ) + SUM( CHAR_LENGTH( pinged ) ) + SUM( CHAR_LENGTH( post_modified ) ) + SUM( CHAR_LENGTH( post_modified_gmt ) ) + SUM( CHAR_LENGTH( post_content_filtered ) ) + SUM( CHAR_LENGTH( post_parent ) ) + SUM( CHAR_LENGTH( guid ) ) + SUM( CHAR_LENGTH( menu_order ) ) + SUM( CHAR_LENGTH( post_type ) ) + SUM( CHAR_LENGTH( post_mime_type ) ) + SUM( CHAR_LENGTH( comment_count ) ) ) as post_size FROM wp_posts WHERE post_type='revision' GROUP BY post_type ORDER BY revision_count DESC;Code language: JavaScript (javascript)

    This will give you the number of revisions you have, and the approximate amount of data its using in the database:

    revision_count post_size
    441419 2842450412

    You can see now a very large WordPress site can amass a lot of unnecessary data in its database over a number of years. 2.8 Gigabytes of revisions is a lot of stuff if you’re never going to use them again.

  • Deleting Old Post Revisions in WordPress with WP-CLI

    Deleting Old Post Revisions in WordPress with WP-CLI

    Recently I’ve been working with a client who’s site we’re going to soon be migrating. To help with any downtime, we’ve been looking at reducing their database size, which is something around 50-60 gigabytes. After looking through the database, one easy win would be to purge as many post revisions as possible, since their wp_posts table was about 50% revisions 😱

    We couldn’t just mass delete anything with a revision post type though, because the client had some specific needs:

    • Keep all revisions for posts from the last 12 months.
    • Keep all revisions that were made after the post was published.
    • Keep a backup of all revisions deleted.

    To do this, I crafted a custom WP-CLI command to purge these revisions. I’ve accidentally deleted the final version of the script, since it was only a one-run thing, but here’s an earlier version that could be a good starting point for anyone else that has a similar need to prune revisions:

    /**
     * WP-CLI command that deletes pre-publish post revisions for posts older than 12 months.
     *
     * @subcommand prune-revisions [--live] [--verbose]
     */
    public function prune_revisions( $args, $assoc_args ) {
    	global $wpdb;
    
    	$live    = (bool) $assoc_args[ 'live' ];
    	$verbose = (bool) $assoc_args[ 'verbose' ];
    	$offset  = 0;
    	$limit   = 500;
    	$count   = 0;
    	$deletes = 0;
    
    	if ( $live ) {
    		$output_file = sanitize_file_name( sprintf( 'prune-revisions-backup_%d_%d.csv', get_bloginfo( 'name' ), current_time( 'timestamp', true ) ) );
    		$handle      = fopen( $output_file, 'wb' );
    
    		if ( false === $handle ) {
    			WP_CLI::error( sprintf( 'Error opening %s for writing!', $output_file ) );
    		}
    
    		// Headers.
    		$csv_headers = array(
    			'ID',
    			'post_author',
    			'post_date',
    			'post_date_gmt',
    			'post_content',
    			'post_title',
    			'post_excerpt',
    			'post_status',
    			'comment_status',
    			'ping_status',
    			'post_password',
    			'post_name',
    			'to_ping',
    			'pinged',
    			'post_modified',
    			'post_modified_gmt',
    			'post_content_filtered',
    			'post_parent',
    			'guid',
    			'menu_order',
    			'post_type',
    			'post_mime_type',
    			'comment_count',
    			'filter',
    		);
    
    		fputcsv(
    			$handle,
    			$csv_headers
    		);
    	}
    
    	$count_sql      = 'SELECT COUNT(ID) FROM ' . $wpdb->posts . ' WHERE post_type = "revision"';
    	$revision_count = (int) $wpdb->get_row( $count_sql, ARRAY_N )[0];
    	$progress       = \WP_CLI\Utils\make_progress_bar( sprintf( 'Checking %s revisions', number_format( $revision_count ) ), $revision_count );
    
    	do {
    		$sql       = $wpdb->prepare( 'SELECT ID FROM ' . $wpdb->posts . ' WHERE post_type = "revision" LIMIT %d,%d', $offset, $limit );
    		$revisions = $wpdb->get_results( $sql );
    
    		foreach ( $revisions as $revision ) {
    			$count++;
    			$post_parent_id = wp_get_post_parent_id( $revision->ID );
    
    			// Fail on either 0 or false.
    			if ( false == $post_parent_id ) {
    				WP_CLI::warning( sprintf( 'Revision %d does not have a parent!  Skipping!', $revision->ID ) );
    				continue;
    			}
    
    			$revision_modified   = get_post_modified_time( 'U', false, $revision->ID );
    			$parent_publish_time = get_post_time( 'U', false, $post_parent_id );
    
    			if ( $parent_publish_time < current_time( 'timestamp') - ( MONTH_IN_SECONDS * 12 ) ) {
    				// Post is older than 12 months, safe to delete pre-publish revisions.
    				if ( $revision_modified < $parent_publish_time ) {
    					if ( $live ) {
    						// We're doing it live!
    						WP_CLI::log( sprintf( 'Deleting revision %d for post %d. (%d%% done)', $revision->ID, $post_parent_id, ( $count / $revision_count ) * 100 ) );
    
    						// Backup data!
    						$output = [];
    						$data   = get_post( $revision->ID );
    
    						// Validate the field is set, just in case.  IDK how it couldn't be.
    						foreach ( $csv_headers as $field ) {
    							$output = isset( $data->$field ) ? $data->$field : '';
    						}
    
    						fputcsv( $handle, $output );
    
    						$did_delete = wp_delete_post_revision( $revision->ID );
    
    						// Something went wrong while deleting the revision?
    						if ( false === $did_delete || is_wp_error( $did_delete ) ) {
    							WP_CLI::warning( sprintf( 'Revision %d for post %d DID NOT DELETE! wp_delete_post_revision returned:', $revision->ID, $post_parent_id ) );
    						}
    						$deletes++;
    
    						// Pause after lots of db modifications.
    						if ( 0 === $deletes % 50 ) {
    							if ( $verbose ) {
    								WP_CLI::log( sprintf( 'Current Deletes: %d', $deletes ) );
    							}
    							sleep( 1 );
    						}
    					} else {
    						// Not live, just output info.
    						WP_CLI::log( sprintf( 'Will delete revision %d for post %d.', $revision->ID, $post_parent_id ) );
    					}
    				} else {
    					// Revision is after the post has been published.
    					if ( $verbose ) {
    						WP_CLI::log( sprintf( 'Post-Publish: Will NOT delete Revision %d for post %d.', $revision->ID, $post_parent_id ) );
    					}
    				}
    			} else {
    				// Post is too new to prune.
    				if ( $verbose ) {
    					WP_CLI::log( sprintf( 'Too-New: Will NOT delete Revision %d for post %d.', $revision->ID, $post_parent_id ) );
    				}
    			}
    		}
    
    		// Pause after lots of db reads.
    		if ( 0 === $count % 5000 ) {
    			if ( $verbose ) {
    				WP_CLI::log( sprintf( 'Current Count: %d', $count ) );
    			}
    			sleep( 1 );
    		}
    
    		// Free up memory.
    		$this->stop_the_insanity();
    
    		// Paginate.
    		if ( count( $revisions ) ) {
    			$offset += $limit;
    			$progress->tick( $limit );
    		} else {
    			WP_CLI::warning( 'Possible MySQL Error, retrying in 10 seconds!' );
    			sleep( 10 );
    		}
    
    	} while ( $count < $revision_count );
    
    	$progress->finish();
    
    	if ( $live ) {
    		fclose( $handle );
    		WP_CLI::success( sprintf( 'Deleted %d revisions', $deleted ) );
    	} else {
    		WP_CLI::success( sprintf( 'Processed %d revisions', $revision_count ) );
    	}
    }Code language: PHP (php)