Tag: sql

  • Syntax Highlighting SQL in Terminal

    Syntax Highlighting SQL in Terminal

    Do you ever find yourself doing some debugging with error_log() or its friends? Does that debugging ever involve SQL queries? Are you tired of staring at grey queries all the time? I have just the product for you!

    Introducing Syntax Highlighting SQL in Terminal! Brought to you by our friends at Large Language Models, Incorporated, this new PHP function will use ANSI escape sequences to colorize your SQL queries for easier viewing and debugging!

    <?php
    
    /**
     * Highlights SQL queries.
     *
     * This method takes a SQL query as input and returns the query with syntax highlighting applied.
     *
     * @param string $sql The SQL query to highlight.
     *
     * @return string The highlighted SQL query.
     */
    function highlight_sql( string $sql ): string {
    	$keywords = array(
    		'SELECT',
    		'FROM',
    		'WHERE',
    		'AND',
    		'OR',
    		'INSERT',
    		'INTO',
    		'VALUES',
    		'UPDATE',
    		'SET',
    		'DELETE',
    		'CREATE',
    		'TABLE',
    		'ALTER',
    		'DROP',
    		'JOIN',
    		'INNER',
    		'LEFT',
    		'RIGHT',
    		'ON',
    		'AS',
    		'DISTINCT',
    		'GROUP',
    		'BY',
    		'ORDER',
    		'HAVING',
    		'LIMIT',
    		'OFFSET',
    		'UNION',
    		'ALL',
    		'COUNT',
    		'SUM',
    		'AVG',
    		'MIN',
    		'MAX',
    		'LIKE',
    		'IN',
    		'BETWEEN',
    		'IS',
    		'NULL',
    		'NOT',
    		'PRIMARY',
    		'KEY',
    		'FOREIGN',
    		'REFERENCES',
    		'DEFAULT',
    		'AUTO_INCREMENT',
    	);
    
    	$functions = array(
    		'COUNT',
    		'SUM',
    		'AVG',
    		'MIN',
    		'MAX',
    		'NOW',
    		'CURDATE',
    		'CURTIME',
    		'DATE',
    		'TIME',
    		'YEAR',
    		'MONTH',
    		'DAY',
    		'HOUR',
    		'MINUTE',
    		'SECOND',
    		'TIMESTAMP',
    	);
    
    	$colors = array(
    		'keyword'  => "\033[1;34m", // Blue.
    		'function' => "\033[1;32m", // Green.
    		'string'   => "\033[1;33m", // Yellow.
    		'comment'  => "\033[1;90m", // Bright Black (Gray).
    		'reset'    => "\033[0m",
    	);
    
    	// Highlight comments.
    	$sql = preg_replace( '/\/\*.*?\*\//s', $colors['comment'] . '$0' . $colors['reset'], $sql );
    
    	// Highlight single-quoted strings.
    	$sql = preg_replace( '/\'[^\']*\'/', $colors['string'] . '$0' . $colors['reset'], $sql );
    
    	// Highlight double-quoted strings.
    	$sql = preg_replace( '/"[^"]*"/', $colors['string'] . '$0' . $colors['reset'], $sql );
    
    	// Highlight functions.
    	foreach ( $functions as $function ) {
    		$sql = preg_replace( '/\b' . preg_quote($function, '/') . '\b/i', $colors['function'] . '$0' . $colors['reset'], $sql );
    	}
    
    	// Highlight keywords.
    	foreach ( $keywords as $keyword ) {
    		$sql = preg_replace( '/\b' . preg_quote($keyword, '/') . '\b/i', $colors['keyword'] . '$0' . $colors['reset'], $sql );
    	}
    
    	return $sql;
    }
    
    echo highlight_sql( 'SELECT COUNT(*) AS total FROM users WHERE id = 1 AND user_id = "example"; /* YEAH! Comment! */' ) . PHP_EOL;Code language: PHP (php)

    Don’t blame me if this breaks anything, use at your own debugging risk.

  • Gathering database performance with WP-CLI

    Gathering database performance with WP-CLI

    Recently at work, my team was asked to help gather data about database server performance before and after an upgrade. To help with this, we collected a number of heavy database pages on some WordPress sites, dumped every query running to generate the page, and grabbed them to profile.

    I whipped up this quick and dirty WP-CLI command that will run a list of SQL queries 1,000 times and give you the minimum time, maximum time, average time, and standard deviation for each query.

    With this data, you can re-run the same queries again after a server change to determine if there has been any major SQL performance differences:

    /**
     * Profiles DB performance by running SQL queries and returning timing statistics.
     */
    public function db_profile( $args, $assoc_args ) {
    	$format = WP_CLI\Utils\get_flag_value( $assoc_args, 'format', 'table' );
    
    // Put all of your queries here!
    $site_sql_lines = <<<END
    SHOW TABLES LIKE 'wp_posts';
    END;
    
    	$lines = explode( PHP_EOL, $site_sql_lines );
    
    	global $wpdb;
    
    	$stats         = array();
    	$total_time_us = 0;
    	$runs          = 100;
    	$progress      = \WP_CLI\Utils\make_progress_bar( sprintf( 'Running %s Queries', number_format( count( $lines ) * $runs ) ), count( $lines ) * $runs );
    
    	for ( $run = 1; $run <= $runs; $run++ ) {
    		foreach ( $lines as $index => $line ) {
    			if ( ! isset( $stats[ $index ] ) || ! is_array( $stats[ $index ] ) ) {
    				$stats[ $index ] = array(
    					'query' => $line,
    					'runs'  => array(),
    				);
    			}
    
    			$start_time = hrtime( true );
    			$results    = count( $wpdb->get_results( $line, ARRAY_N ) ); // phpcs:ignore WordPress.DB.DirectDatabaseQuery, WordPress.DB.PreparedSQL.NotPrepared
    			$end_time   = hrtime( true );
    			$time_us    = ( $end_time - $start_time ) / 1000;
    
    			$total_time_us += $time_us;
    
    			$stats[ $index ]['runs'][ $run ] = $time_us;
    			$progress->tick();
    		}
    	}
    
    	$data = array();
    
    	foreach ( $stats as $index => $stat ) {
    		$stats[ $index ]['stats'] = array(
    			'min' => min( $stats[ $index ]['runs'] ),
    			'max' => max( $stats[ $index ]['runs'] ),
    			'avg' => array_sum( $stats[ $index ]['runs'] ) / count( $stats[ $index ]['runs'] ),
    		);
    
    		$data[ $index ] = array(
    			'Query'  => $stat['query'],
    			'Min'    => number_format( (float) $stats[ $index ]['stats']['min'] / 1000, 3 ),
    			'Max'    => number_format( (float) $stats[ $index ]['stats']['max'] / 1000, 3 ),
    			'Avg'    => number_format( (float) $stats[ $index ]['stats']['avg'] / 1000, 3 ),
    			'StdDev' => number_format( (float) $this->stats_standard_deviation( $stats[ $index ]['runs'] ) / 1000, 3 ),
    		);
    
    		if ( 'csv' === $format ) {
    			$data[ $index ]['Runs'] = wp_json_encode( $stats[ $index ]['runs'] );
    		}
    	}
    
    	$progress->finish();
    
    	// Output data.
    	WP_CLI\Utils\format_items( $format, $data, array_keys( $data[0] ) );
    
    	// Output totals if we're not piping somewhere.
    	if ( ! WP_CLI\Utils\isPiped() ) {
    		WP_CLI::success(
    			sprintf(
    				'Total queries ran: %s, DB Time Taken: %s',
    				WP_CLI::colorize( '%g' . number_format( count( $lines ) * $runs, 0 ) . '%n' ),
    				WP_CLI::colorize( '%g' . $this->convert_to_human_readable( $total_time_us ) . '%n' ),
    			)
    		);
    	}
    }
    
    Code language: PHP (php)

    You can then run it and gather the data as a table, or as a CSV file (--format=csv) which is much more likely to be useful.

  • Quick Tip: Get Size of Revisions in WordPress

    Quick Tip: Get Size of Revisions in WordPress

    One thing that you might not think of when watching the size of a large WordPress site grow, is unnecessary data in the database. With the introduction of the block editor years ago, there has been a large increase in the number of revisions a post makes when being edited.

    This can create a lot of revisions in the database if you’re not setting a limit.

    If you’d like to do a quick and dirty audit of your revision data, you can use a very ugly SQL query like this:

    SELECT COUNT( ID ) as revision_count, ( SUM( CHAR_LENGTH( ID ) ) + SUM( CHAR_LENGTH( post_author ) ) + SUM( CHAR_LENGTH( post_date ) ) + SUM( CHAR_LENGTH( post_date_gmt ) ) + SUM( CHAR_LENGTH( post_content ) ) + SUM( CHAR_LENGTH( post_title ) ) + SUM( CHAR_LENGTH( post_excerpt ) ) + SUM( CHAR_LENGTH( post_status ) ) + SUM( CHAR_LENGTH( comment_status ) ) + SUM( CHAR_LENGTH( ping_status ) ) + SUM( CHAR_LENGTH( post_password ) ) + SUM( CHAR_LENGTH( post_name ) ) + SUM( CHAR_LENGTH( to_ping ) ) + SUM( CHAR_LENGTH( pinged ) ) + SUM( CHAR_LENGTH( post_modified ) ) + SUM( CHAR_LENGTH( post_modified_gmt ) ) + SUM( CHAR_LENGTH( post_content_filtered ) ) + SUM( CHAR_LENGTH( post_parent ) ) + SUM( CHAR_LENGTH( guid ) ) + SUM( CHAR_LENGTH( menu_order ) ) + SUM( CHAR_LENGTH( post_type ) ) + SUM( CHAR_LENGTH( post_mime_type ) ) + SUM( CHAR_LENGTH( comment_count ) ) ) as post_size FROM wp_posts WHERE post_type='revision' GROUP BY post_type ORDER BY revision_count DESC;Code language: JavaScript (javascript)

    This will give you the number of revisions you have, and the approximate amount of data its using in the database:

    revision_count post_size
    441419 2842450412

    You can see now a very large WordPress site can amass a lot of unnecessary data in its database over a number of years. 2.8 Gigabytes of revisions is a lot of stuff if you’re never going to use them again.

  • Getting WordPress Database Size via WP-CLI

    Getting WordPress Database Size via WP-CLI

    One WP-CLI command that I’ve found handy is this db-size command. It allows you to output a site’s registered database tables along with the data and index size in any format that WP-CLI natively supports, with multiple sort options:

    /**
     * Gets size of database tables for the current site.
     *
     * ## OPTIONS
     *
     * [--raw]
     * : Outputs full size in bytes instead of human readable sizes.
     *
     * [--order_by=<Total><total>]
     * : Allows custom ordering of the data.
     * ---
     * default: Total
     * options:
     *   - Table
     *   - Data Size
     *   - Index Size
     *   - Total
     * ---
     *
     * [--order=<asc><asc>]
     * : Allows custom ordering direction of the data.
     * ---
     * default: asc
     * options:
     *   - asc
     *   - desc
     * ---
     *
     * [--format=<format><format>]
     * : Render output in a particular format.
     * ---
     * default: table
     * options:
     *   - table
     *   - csv
     *   - json
     *   - count
     *   - yaml
     * ---
     *
     * @subcommand db-size
     */
    public function db_size( $args, $assoc_args ) {
    	global $wpdb;
    
    	$output   = array();
    	$db_size  = array();
    	$order_by = WP_CLI\Utils\get_flag_value( $assoc_args, 'order_by', 'Total' );
    	$order    = WP_CLI\Utils\get_flag_value( $assoc_args, 'order', 'asc' );
    	$format   = WP_CLI\Utils\get_flag_value( $assoc_args, 'format', 'table' );
    	$raw      = (bool) WP_CLI\Utils\get_flag_value( $assoc_args, 'raw', false );
    
    	// Fetch list of tables from database.
    	$tables = array_map(
    		function( $val ) {
    			return $val[0];
    		},
    		$wpdb->get_results( 'SHOW TABLES;', ARRAY_N ) // phpcs:ignore WordPress.DB.DirectDatabaseQuery
    	);
    
    	// Fetch table information from database.
    	$report = array_map(
    		function( $table ) use ( $wpdb ) { // phpcs:ignore WordPress.DB.DirectDatabaseQuery
    			return $wpdb->get_row( "SHOW TABLE STATUS LIKE '$table'" ); // phpcs:ignore WordPress.DB.PreparedSQL.InterpolatedNotPrepared
    		},
    		$tables
    	);
    
    	foreach ( $report as $table ) {
    		// Keep a running total of sizes.
    		$db_size['data']  += $table->Data_length; // phpcs:ignore WordPress.NamingConventions.ValidVariableName.UsedPropertyNotSnakeCase
    		$db_size['index'] += $table->Index_length; // phpcs:ignore WordPress.NamingConventions.ValidVariableName.UsedPropertyNotSnakeCase
    
    		// Format output for WP-CLI's format_items function.
    		$output[] = array(
    			'Table'      => $table->Name, // phpcs:ignore WordPress.NamingConventions.ValidVariableName.UsedPropertyNotSnakeCase
    			'Data Size'  => $table->Data_length, // phpcs:ignore WordPress.NamingConventions.ValidVariableName.UsedPropertyNotSnakeCase
    			'Index Size' => $table->Index_length, // phpcs:ignore WordPress.NamingConventions.ValidVariableName.UsedPropertyNotSnakeCase
    			'Total'      => $table->Data_length + $table->Index_length, // phpcs:ignore WordPress.NamingConventions.ValidVariableName.UsedPropertyNotSnakeCase
    		);
    	}
    
    	// Sort table data.
    	self::sort_table_by( $order_by, $output, $order );
    
    	if ( ! $raw ) {
    		// Make data human readable.
    		foreach ( array_keys( $output ) as $key ) {
    			$output[ $key ]['Data Size']  = size_format( $output[ $key ]['Data Size'] );
    			$output[ $key ]['Index Size'] = size_format( $output[ $key ]['Index Size'] );
    			$output[ $key ]['Total']      = size_format( $output[ $key ]['Total'] );
    		}
    	}
    
    	// Output data.
    	WP_CLI\Utils\format_items( $format, $output, array( 'Table', 'Data Size', 'Index Size', 'Total' ) );
    
    	// Output totals if we're not piping somewhere.
    	if ( ! WP_CLI\Utils\isPiped() ) {
    		WP_CLI::success(
    			sprintf(
    				'Total size of the database for %s is %s. Data: %s; Index: %s',
    				home_url(),
    				WP_CLI::colorize( '%g' . size_format( $db_size['data'] + $db_size['index'] ) . '%n' ),
    				WP_CLI::colorize( '%g' . size_format( $db_size['data'] ) . '%n' ),
    				WP_CLI::colorize( '%g' . size_format( $db_size['index'] ) . '%n' )
    			)
    		);
    	}
    }
    
    /**
     * Sorts a table by a specific field and direction.
     *
     * @param string $field The field to order by.
     * @param array  &$array The table array to sort.
     * @param string $direction The direction to sort. Ascending ('asc') or descending ('desc').
     */
    private function sort_table_by( $field, &$array, $direction ) {
    	// Taken from https://joshtronic.com/2013/09/23/sorting-associative-array-specific-key/ Thanks!
    	usort(
    		$array,
    		function ( $a, $b ) use ( $field, $direction ) {
    			$a = $a[ $field ];
    			$b = $b[ $field ];
    
    			if ( $a === $b ) {
    				return 0;
    			}
    
    			switch ( $direction ) {
    				case 'asc':
    					return ( $a < $b ) ? -1 : 1;
    				case 'desc':
    					return ( $a > $b ) ? -1 : 1;
    				default:
    					return 0;
    			}
    		}
    	);
    	return true;
    }</format></asc></total>Code language: PHP (php)

    Here’s some example output from one of my test sites:

    $ wp test db-size
     +----------------------------+-----------+------------+-------+
     | Table                      | Data Size | Index Size | Total |
     +----------------------------+-----------+------------+-------+
     | wp_mlp_relationships       | 16KB      | 0B         | 16KB  |
     | wp_redacted_table          | 16KB      | 0B         | 16KB  |
     | wp_redacted_table          | 16KB      | 0B         | 16KB  |
     | wp_3_links                 | 16KB      | 16KB       | 32KB  |
     | wp_term_relationships      | 16KB      | 16KB       | 32KB  |
     | wp_site                    | 16KB      | 16KB       | 32KB  |
     | wp_registration_log        | 16KB      | 16KB       | 32KB  |
     | wp_mlp_site_relations      | 16KB      | 16KB       | 32KB  |
     | wp_mlp_languages           | 16KB      | 16KB       | 32KB  |
     | wp_mlp_content_relations   | 16KB      | 16KB       | 32KB  |
     | wp_links                   | 16KB      | 16KB       | 32KB  |
     | wp_redacted_table          | 16KB      | 16KB       | 32KB  |
     | wp_3_term_relationships    | 16KB      | 16KB       | 32KB  |
     | wp_blog_versions           | 16KB      | 16KB       | 32KB  |
     | wp_2_links                 | 16KB      | 16KB       | 32KB  |
     | wp_2_term_relationships    | 16KB      | 16KB       | 32KB  |
     | wp_2_term_taxonomy         | 16KB      | 32KB       | 48KB  |
     | wp_2_commentmeta           | 16KB      | 32KB       | 48KB  |
     | wp_2_postmeta              | 16KB      | 32KB       | 48KB  |
     | wp_term_taxonomy           | 16KB      | 32KB       | 48KB  |
     | wp_3_commentmeta           | 16KB      | 32KB       | 48KB  |
     | wp_2_termmeta              | 16KB      | 32KB       | 48KB  |
     | wp_2_terms                 | 16KB      | 32KB       | 48KB  |
     | wp_termmeta                | 16KB      | 32KB       | 48KB  |
     | wp_commentmeta             | 16KB      | 32KB       | 48KB  |
     | wp_blogmeta                | 16KB      | 32KB       | 48KB  |
     | wp_blogs                   | 16KB      | 32KB       | 48KB  |
     | wp_2_a8c_cron_control_jobs | 16KB      | 32KB       | 48KB  |
     | wp_redacted_table          | 16KB      | 32KB       | 48KB  |
     | wp_3_terms                 | 16KB      | 32KB       | 48KB  |
     | wp_3_termmeta              | 16KB      | 32KB       | 48KB  |
     | wp_3_term_taxonomy         | 16KB      | 32KB       | 48KB  |
     | wp_redacted_table          | 16KB      | 32KB       | 48KB  |
     | wp_terms                   | 16KB      | 32KB       | 48KB  |
     | wp_3_postmeta              | 16KB      | 32KB       | 48KB  |
     | wp_usermeta                | 16KB      | 32KB       | 48KB  |
     | wp_sitemeta                | 16KB      | 32KB       | 48KB  |
     | wp_users                   | 16KB      | 48KB       | 64KB  |
     | wp_postmeta                | 16KB      | 48KB       | 64KB  |
     | wp_posts                   | 16KB      | 64KB       | 80KB  |
     | wp_signups                 | 16KB      | 64KB       | 80KB  |
     | wp_2_posts                 | 16KB      | 64KB       | 80KB  |
     | wp_3_posts                 | 16KB      | 64KB       | 80KB  |
     | wp_2_comments              | 16KB      | 80KB       | 96KB  |
     | wp_comments                | 16KB      | 80KB       | 96KB  |
     | wp_3_comments              | 16KB      | 80KB       | 96KB  |
     | wp_2_options               | 80KB      | 32KB       | 112KB |
     | wp_3_options               | 80KB      | 32KB       | 112KB |
     | wp_options                 | 176KB     | 32KB       | 208KB |
     +----------------------------+-----------+------------+-------+
     Success: Total size of the database for https://www.example.com is 2.6MB. Data: 1MB; Index: 1.5MBCode language: JavaScript (javascript)

    Enjoy!

  • Quick Tip: Export WordPress SQL output via WP-CLI

    Quick Tip: Export WordPress SQL output via WP-CLI

    If for some reason you can’t run wp db query, but need to export SQL output to a CSV or other file, then have a look at this small WP-CLI command I whipped up that should allow this:

    /**
     * Runs a SQL query against the site database.
     *
     * ## OPTIONS
     *
     * 
     * : SQL Query to run.
     *
     * [--format=]
     * : Render output in a particular format.
     * ---
     * default: table
     * options:
     * - table
     * - csv
     * - json
     * - count
     * - yaml
     * ---
     *
     * [--dry-run=]
     * : Performa a dry run
     *
     * @subcommand sql
     */
    public function sql( $args, $assoc_args ) {
         global $wpdb;
     
         $sql     = $args[0];
         $format  = WP_CLI\Utils\get_flag_value( $assoc_args, 'format', 'table' );
         $dry_run = WP_CLI\Utils\get_flag_value( $assoc_args, 'dry-run', 'true' );
     
         // Just some precautions.
         if ( preg_match( '/[update|delete|drop|insert|create|alter|rename|truncate|replace]/i', $sql ) ) {
             WP_CLI::error( 'Please do not modify the database with this command.' );
         }
     
         if ( 'false' !== $dry_run ) {
             WP_CLI::log( WP_CLI::colorize( '%gDRY-RUN%n: <code>EXPLAIN</code> of the query is below: https://mariadb.com/kb/en/explain/' ) );
             $sql = 'EXPLAIN ' . $sql;
         }
     
         // Fetch results from database.
         $results = $wpdb->get_results( $sql, ARRAY_A ); // phpcs:ignore WordPress.DB
     
         // Output data.
         WP_CLI\Utils\format_items( $format, $results, array_keys( $results[0] ) );
    }Code language: PHP (php)

    I’d add an example here, but I don’t have any right now that I can share 😐 I’ll try to find one later (don’t hold your breath on me remembering to do that)

  • phpMyAdmin and MariaDB with Docker

    phpMyAdmin and MariaDB with Docker

    I used to run a MariaDB server on an old Linux machine for working with database dumps and other things, but after moving all of that to a new Raspberry Pi 4, I never really set it back up.

    So instead I thought I’d play with this fancy new Docker stuff all the cool kids are talking about. I ended up getting phpMyAdmin and MariaDB working with Docker on my MacBook Pro.

    Caveat:

    To start up the MariaDB container thing, I ran this command:

    docker run -d \
     -v $HOME:/home/hosthome \
     --name mariadb \
     -e MYSQL_ROOT_PASSWORD=hunter2 \
     -e MYSQL_DATABASE='default_db' \
     mariadbCode language: JavaScript (javascript)

    A few things here you might want to make note of:

    • hunter2 – My MariaDB root password.
    • default_db – The default database created, just to make things easy.
    • $HOME – This attached my local machine’s home directory to the MariaDB container so I can import/export files.

    Secondly, I ran this to start phpMyAdmin and connect it to the Docker container:

    docker run -d \
     --name phpmyadmin \
     --link mariadb:db \
     -p 8081:80 \
     -e UPLOAD_LIMIT='4096M' \
     phpmyadmin/phpmyadminCode language: JavaScript (javascript)

    A few things here you might want to make note of:

    • 8081 – This is the local machine port that I will connect to via HTTP
    • 4096M – The default upload limit, set to 4 Gigs.

    Now, once this done, you should be able to connect to phpMyAdmin via http://localhost:8081/ and do all sorts of terrible things.

    Here’s a few more commands that may come in handy:

    Start a shell in the MariaDB container: docker exec -it mariadb bash

    Import a SQL file: docker exec -i mariadb sh -c 'exec mysql -uroot -phunter2 default_db' < /some/path/on/your/host/all-databases.sql

    Start a MariaDB “mysql” shell: docker exec -it mariadb sh -c 'exec mysql -uroot -phunter2 default_db'

    References that helped me:

  • Converting CSV to SQL

    Converting CSV to SQL

    I was recently working on an issue that required me to dig through gigabytes of CSV files to look for patterns and data points. I had no intention of bringing all of this data in to Excel, because there’s not enough RAM on earth for that to work. Instead I thought it would be easier to dump the data into MariaDB (MySQL would be fine as well) and just write some simple SQL statements.

    There were a few things I tried, such as LOAD DATA INFILE and other SQL type things. Nothing really worked properly, so instead I decided to just convert the CSV files into SQL statements I could import.

    This is a bad idea! Don’t ever let this touch a live production database!

    I wrote this little PHP script to convert all of the CSV files in a directory to SQL:

    <?php
    $files    = glob( './*.{csv}', GLOB_BRACE );
    $db_table = 'csv_table';
    
    foreach ( $files as $file ) {
    	$row       = 0;
    	$handle    = fopen( $file, 'rb' );
    	$row_names = '';
    
    	if ( false !== $handle ) {
    		while ( false !== ( $data = fgetcsv( $handle ) ) ) {
    			$row++;
    			$values = '';
    
    			if ( 1 === $row ) {
    				// Do Header things.
    				$index  = 0;
    
    				echo sprintf( 'DROP TABLE IF EXISTS `%s`;' . PHP_EOL, $db_table );
    				echo sprintf( 'CREATE TABLE `%s` (' . PHP_EOL, $db_table );
    
    				foreach ( $data as $key => $value ) {
    					$index++;
    					$data[ $key ] = strtolower( preg_replace( '/[^A-Za-z0-9]/', '_', $value ) );
    
    					// Add a comma if we're not at the end of the array.
    					if ( count( $data ) !== $index ) {
    						echo sprintf( '	`%s` text NOT NULL,' . PHP_EOL, $data[ $key ] );
    					} else {
    						echo sprintf( '	`%s` text NOT NULL' . PHP_EOL, $data[ $key ] );
    					}
    				}
    
    				echo ') ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;' . PHP_EOL;
    				$row_names = '(`' . implode( '`,`', $data ) . '`)';
    				continue;
    			}
    
    			foreach ( $data as $key => $value ) {
    				$data[ $key ] = fake_sql_escape( $value );
    			}
    
    			$values = '(\'' . implode( '\',\'', $data ) . '\')';
    
    			echo sprintf( 'INSERT INTO `%s` %s VALUES %s;' . PHP_EOL, $db_table, $row_names, $values );
    		}
    		fclose( $handle );
    	}
    }
    
    function fake_sql_escape( $value ) {
    	// https://stackoverflow.com/questions/1162491/alternative-to-mysql-real-escape-string-without-connecting-to-db/1163091#1163091
    	// Totaly not safe for production!
    	$return = '';
    	$length = strlen( $value );
    
    	for ( $i = 0; $i < $length; ++$i ) {
    		$char = $value[ $i ];
    		$ord  = ord( $char );
    		if (
    			"'" !== $char &&
    			'"' !== $char &&
    			'\\' !== $char &&
    			$ord >= 32 &&
    			$ord <= 126
    		) {
    			$return .= $char;
    		} else {
    			$return .= '\\x' . dechex( $ord );
    		}
    	}
    	return $return;
    }Code language: PHP (php)

    Borrowing the example airtravel.csv from https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html this is what the output looks like:

    DROP TABLE IF EXISTS `csv_table`;
    CREATE TABLE `csv_table` (
    	`month` text NOT NULL,
    	`1958` text NOT NULL,
    	`1959` text NOT NULL,
    	`1960` text NOT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
    INSERT INTO `csv_table` (`month`,`1958`,`1959`,`1960`,`type`) VALUES ('JAN','  340','  360','  417');
    INSERT INTO `csv_table` (`month`,`1958`,`1959`,`1960`,`type`) VALUES ('FEB','  318','  342','  391');
    INSERT INTO `csv_table` (`month`,`1958`,`1959`,`1960`,`type`) VALUES ('MAR','  362','  406','  419');
    INSERT INTO `csv_table` (`month`,`1958`,`1959`,`1960`,`type`) VALUES ('APR','  348','  396','  461');
    INSERT INTO `csv_table` (`month`,`1958`,`1959`,`1960`,`type`) VALUES ('MAY','  363','  420','  472');
    INSERT INTO `csv_table` (`month`,`1958`,`1959`,`1960`,`type`) VALUES ('JUN','  435','  472','  535');
    INSERT INTO `csv_table` (`month`,`1958`,`1959`,`1960`,`type`) VALUES ('JUL','  491','  548','  622');
    INSERT INTO `csv_table` (`month`,`1958`,`1959`,`1960`,`type`) VALUES ('AUG','  505','  559','  606');
    INSERT INTO `csv_table` (`month`,`1958`,`1959`,`1960`,`type`) VALUES ('SEP','  404','  463','  508');
    INSERT INTO `csv_table` (`month`,`1958`,`1959`,`1960`,`type`) VALUES ('OCT','  359','  407','  461');
    INSERT INTO `csv_table` (`month`,`1958`,`1959`,`1960`,`type`) VALUES ('NOV','  310','  362','  390');
    INSERT INTO `csv_table` (`month`,`1958`,`1959`,`1960`,`type`) VALUES ('DEC','  337','  405','  432');Code language: JavaScript (javascript)

    And if you’re too lazy to see what the input CSV looks like (I would be) here it is, in its LGPL glory:

    "Month", "1958", "1959", "1960"
    "JAN",  340,  360,  417
    "FEB",  318,  342,  391
    "MAR",  362,  406,  419
    "APR",  348,  396,  461
    "MAY",  363,  420,  472
    "JUN",  435,  472,  535
    "JUL",  491,  548,  622
    "AUG",  505,  559,  606
    "SEP",  404,  463,  508
    "OCT",  359,  407,  461
    "NOV",  310,  362,  390
    "DEC",  337,  405,  432Code language: JavaScript (javascript)

    There’s a lot of ways that this could be made better, but it works in a quick pinch. If you need something similar, feel free to take this as a starting point.

  • Deleting Old Post Revisions in WordPress with WP-CLI

    Deleting Old Post Revisions in WordPress with WP-CLI

    Recently I’ve been working with a client who’s site we’re going to soon be migrating. To help with any downtime, we’ve been looking at reducing their database size, which is something around 50-60 gigabytes. After looking through the database, one easy win would be to purge as many post revisions as possible, since their wp_posts table was about 50% revisions 😱

    We couldn’t just mass delete anything with a revision post type though, because the client had some specific needs:

    • Keep all revisions for posts from the last 12 months.
    • Keep all revisions that were made after the post was published.
    • Keep a backup of all revisions deleted.

    To do this, I crafted a custom WP-CLI command to purge these revisions. I’ve accidentally deleted the final version of the script, since it was only a one-run thing, but here’s an earlier version that could be a good starting point for anyone else that has a similar need to prune revisions:

    /**
     * WP-CLI command that deletes pre-publish post revisions for posts older than 12 months.
     *
     * @subcommand prune-revisions [--live] [--verbose]
     */
    public function prune_revisions( $args, $assoc_args ) {
    	global $wpdb;
    
    	$live    = (bool) $assoc_args[ 'live' ];
    	$verbose = (bool) $assoc_args[ 'verbose' ];
    	$offset  = 0;
    	$limit   = 500;
    	$count   = 0;
    	$deletes = 0;
    
    	if ( $live ) {
    		$output_file = sanitize_file_name( sprintf( 'prune-revisions-backup_%d_%d.csv', get_bloginfo( 'name' ), current_time( 'timestamp', true ) ) );
    		$handle      = fopen( $output_file, 'wb' );
    
    		if ( false === $handle ) {
    			WP_CLI::error( sprintf( 'Error opening %s for writing!', $output_file ) );
    		}
    
    		// Headers.
    		$csv_headers = array(
    			'ID',
    			'post_author',
    			'post_date',
    			'post_date_gmt',
    			'post_content',
    			'post_title',
    			'post_excerpt',
    			'post_status',
    			'comment_status',
    			'ping_status',
    			'post_password',
    			'post_name',
    			'to_ping',
    			'pinged',
    			'post_modified',
    			'post_modified_gmt',
    			'post_content_filtered',
    			'post_parent',
    			'guid',
    			'menu_order',
    			'post_type',
    			'post_mime_type',
    			'comment_count',
    			'filter',
    		);
    
    		fputcsv(
    			$handle,
    			$csv_headers
    		);
    	}
    
    	$count_sql      = 'SELECT COUNT(ID) FROM ' . $wpdb->posts . ' WHERE post_type = "revision"';
    	$revision_count = (int) $wpdb->get_row( $count_sql, ARRAY_N )[0];
    	$progress       = \WP_CLI\Utils\make_progress_bar( sprintf( 'Checking %s revisions', number_format( $revision_count ) ), $revision_count );
    
    	do {
    		$sql       = $wpdb->prepare( 'SELECT ID FROM ' . $wpdb->posts . ' WHERE post_type = "revision" LIMIT %d,%d', $offset, $limit );
    		$revisions = $wpdb->get_results( $sql );
    
    		foreach ( $revisions as $revision ) {
    			$count++;
    			$post_parent_id = wp_get_post_parent_id( $revision->ID );
    
    			// Fail on either 0 or false.
    			if ( false == $post_parent_id ) {
    				WP_CLI::warning( sprintf( 'Revision %d does not have a parent!  Skipping!', $revision->ID ) );
    				continue;
    			}
    
    			$revision_modified   = get_post_modified_time( 'U', false, $revision->ID );
    			$parent_publish_time = get_post_time( 'U', false, $post_parent_id );
    
    			if ( $parent_publish_time < current_time( 'timestamp') - ( MONTH_IN_SECONDS * 12 ) ) {
    				// Post is older than 12 months, safe to delete pre-publish revisions.
    				if ( $revision_modified < $parent_publish_time ) {
    					if ( $live ) {
    						// We're doing it live!
    						WP_CLI::log( sprintf( 'Deleting revision %d for post %d. (%d%% done)', $revision->ID, $post_parent_id, ( $count / $revision_count ) * 100 ) );
    
    						// Backup data!
    						$output = [];
    						$data   = get_post( $revision->ID );
    
    						// Validate the field is set, just in case.  IDK how it couldn't be.
    						foreach ( $csv_headers as $field ) {
    							$output = isset( $data->$field ) ? $data->$field : '';
    						}
    
    						fputcsv( $handle, $output );
    
    						$did_delete = wp_delete_post_revision( $revision->ID );
    
    						// Something went wrong while deleting the revision?
    						if ( false === $did_delete || is_wp_error( $did_delete ) ) {
    							WP_CLI::warning( sprintf( 'Revision %d for post %d DID NOT DELETE! wp_delete_post_revision returned:', $revision->ID, $post_parent_id ) );
    						}
    						$deletes++;
    
    						// Pause after lots of db modifications.
    						if ( 0 === $deletes % 50 ) {
    							if ( $verbose ) {
    								WP_CLI::log( sprintf( 'Current Deletes: %d', $deletes ) );
    							}
    							sleep( 1 );
    						}
    					} else {
    						// Not live, just output info.
    						WP_CLI::log( sprintf( 'Will delete revision %d for post %d.', $revision->ID, $post_parent_id ) );
    					}
    				} else {
    					// Revision is after the post has been published.
    					if ( $verbose ) {
    						WP_CLI::log( sprintf( 'Post-Publish: Will NOT delete Revision %d for post %d.', $revision->ID, $post_parent_id ) );
    					}
    				}
    			} else {
    				// Post is too new to prune.
    				if ( $verbose ) {
    					WP_CLI::log( sprintf( 'Too-New: Will NOT delete Revision %d for post %d.', $revision->ID, $post_parent_id ) );
    				}
    			}
    		}
    
    		// Pause after lots of db reads.
    		if ( 0 === $count % 5000 ) {
    			if ( $verbose ) {
    				WP_CLI::log( sprintf( 'Current Count: %d', $count ) );
    			}
    			sleep( 1 );
    		}
    
    		// Free up memory.
    		$this->stop_the_insanity();
    
    		// Paginate.
    		if ( count( $revisions ) ) {
    			$offset += $limit;
    			$progress->tick( $limit );
    		} else {
    			WP_CLI::warning( 'Possible MySQL Error, retrying in 10 seconds!' );
    			sleep( 10 );
    		}
    
    	} while ( $count < $revision_count );
    
    	$progress->finish();
    
    	if ( $live ) {
    		fclose( $handle );
    		WP_CLI::success( sprintf( 'Deleted %d revisions', $deleted ) );
    	} else {
    		WP_CLI::success( sprintf( 'Processed %d revisions', $revision_count ) );
    	}
    }Code language: PHP (php)