Tag: quill meetings

  • Extract Transcript from Quill Meetings Files

    I use Quill Meetings for local on-device transcriptions of calls. It’s pretty great!

    The app definitely has some quirks and is missing some features that I’d prefer, like the ability to just export a text file of a call transcript. Sure, I can “copy” the transcript and paste it into a file, but the pasted text is missing things like timestamps:

    So I built a quick script to extract transcripts from .qm files for me. .qm files are basically just JSON files:

    #!/opt/homebrew/bin/php
    <?php
    declare(strict_types=1);
    
    error_reporting( E_ALL );
    ini_set( 'display_errors', '1' );
    
    // Quill export dir is first argument, or current directory if not provided.
    $export_dir = isset( $argv[1] ) ? rtrim( $argv[1], '/' ) : getcwd();
    
    // Find every file that ends in .qm in the export directory.
    $files = glob( $export_dir . '/*.qm' );
    if ( ! $files ) {
    	echo "No .qm files found in the directory: $export_dir\n";
    	exit( 1 );
    }
    
    /**
     * Each QM file is just a JSON file with a .qm extension and the first line being "QMv2"
     * We need to read each file, remove the first line, and decode the JSON.
     */
    foreach( $files as $file ) {
    	if ( ! is_readable( $file ) ) {
    		echo "Cannot read file: $file\n";
    		continue;
    	}
    
    	// Read the file and remove the first line.
    	$content = file_get_contents( $file );
    	if ( false === $content ) {
    		echo "Failed to read file: $file\n";
    		continue;
    	}
    
    	// Remove the first line (QMv2).
    	$lines = explode( "\n", $content );
    	array_shift( $lines ); // Remove the first line.
    	$json_content = implode( "\n", $lines );
    
    	// Decode the JSON content.
    	$data = json_decode( $json_content, true );
    	if ( null === $data && json_last_error() !== JSON_ERROR_NONE ) {
    		echo "Invalid JSON in file: $file\n";
    		continue;
    	}
    
    	// Pretty print the JSON data.
    	$pretty_json = json_encode( $data, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES );
    	if ( false === $pretty_json ) {
    		echo "Failed to encode JSON for file: $file\n";
    		continue;
    	}
    
    	$speakers = array();
    	$transcript = array();
    	$output_string = '';
    	$output_file = '';
    	foreach ( $data as $quill_objects => $quill_object ) {
    	    // Each Quill object is an array. We want to check if it has a 'type' of 'Meeting'.
    		if ( isset( $quill_object['type'] ) && $quill_object['type'] === 'Meeting' ) {
    			$output_file = $quill_object['data']['start'] . '-' . $quill_object['data']['end'] . ': ' . $quill_object['data']['title'] . '.txt';
    			// The "audio_transcript" is just a JSON string that we need to decode.
    			$audio_transcript = json_decode( $quill_object['data']['audio_transcript'], true );
    			$encoded_speakers = $quill_object['data']['speakers'] ?? [];
    			foreach( $encoded_speakers as $encoded_speaker ) {
    				$speakers[ $encoded_speaker['id'] ] = $encoded_speaker['name'] ?? 'Unknown Speaker ' . $encoded_speaker['id'];
    			}
    			if ( ! isset ( $audio_transcript['startTime'] ) ) {
    				echo "Invalid start time in audio transcript for file: $file\n";
    				continue;
    			}
    			$start_time = $audio_transcript['startTime'];
    			$end_time   = $audio_transcript['endTime'];
    			foreach( $audio_transcript['blocks'] as $block ) {
    				$time_block = ms_to_readable( $block['from'] - $start_time );
    				if ( isset( $block['speaker_id' ] ) ) {
    					$speaker_block = $speakers[ $block['speaker_id'] ];
    				} else {
    					echo 'Unknown Speaker found. Please manually mark all speakers in Quill before exporting.' . PHP_EOL;
    					die( 1 );
    				}
    				$output_string .= sprintf( "%s %s: %s\n", $time_block, $speaker_block, $block['text'] );
    			}
    		}
    	}
    
    	if ( ! empty( $output_string ) && ! empty( $output_file ) ) {
    		// Sanitize the filename.
    		$output_file = sanitize_filename( $output_file );
    		// Write the output string to the file.
    		if ( file_put_contents( $output_file, $output_string ) === false ) {
    			echo "Failed to write to file: $output_file\n";
    		} else {
    			echo "Exported to: $output_file\n";
    		}
    	} else {
    		echo "No valid Meeting data found in file: $file\n";
    	}
    }
    
    function ms_to_readable(int $ms): string {
    	// round to nearest second
    	$secs = (int) round($ms / 1000);
    	// gmdate formats seconds since 0; 'i:s' alone wraps after 60 minutes, so add hours for long calls
    	return '[' . gmdate($secs >= 3600 ? 'H:i:s' : 'i:s', $secs) . ']';
    }
    
    function sanitize_filename(string $filename): string {
    	// strip any path information
    	$fname = basename($filename);
    	// replace any run of characters that are NOT letters, digits, dot, hyphen or underscore with an underscore
    	$clean = preg_replace('/[^\w\.-]+/', '_', $fname);
    	// collapse multiple underscores
    	return preg_replace('/_+/', '_', $clean);
    }
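
    To make the file layout concrete, here is a minimal sketch of the structure the script relies on: a "QMv2" header line, then a JSON array of objects, where the "Meeting" object carries the transcript as a JSON string nested inside the JSON. The sample values and the decode_qm() helper are hypothetical, and the sample only mirrors the fields the script reads, so a real .qm file will likely contain more. Unlike the script, the helper also checks that the header really says QMv2, which is an extra guard I am assuming is safe.

```php
<?php
// Hypothetical sample: only the fields the script reads, with made-up values.
$sample_qm = "QMv2\n" . json_encode( [
	[
		'type' => 'Meeting',
		'data' => [
			'title'            => 'Example call',
			'speakers'         => [ [ 'id' => 's1', 'name' => 'Alice' ] ],
			// The transcript is stored as a JSON string inside the JSON.
			'audio_transcript' => json_encode( [
				'startTime' => 1000,
				'endTime'   => 9000,
				'blocks'    => [
					[ 'from' => 3000, 'speaker_id' => 's1', 'text' => 'Hello there.' ],
				],
			] ),
		],
	],
] );

/**
 * Strip the version header line and decode the rest as JSON.
 * Returns null when the header is missing or the JSON is invalid.
 */
function decode_qm( string $content ): ?array {
	$newline = strpos( $content, "\n" );
	if ( false === $newline || 'QMv2' !== trim( substr( $content, 0, $newline ) ) ) {
		return null;
	}
	$data = json_decode( substr( $content, $newline + 1 ), true );
	return is_array( $data ) ? $data : null;
}

$objects    = decode_qm( $sample_qm );
$meeting    = $objects[0]['data'];
$transcript = json_decode( $meeting['audio_transcript'], true ); // second decode for the nested string
echo $transcript['blocks'][0]['text'] . PHP_EOL; // prints "Hello there."
```

    The two json_decode() calls are the part that is easy to miss: the outer one handles the file, the inner one handles the audio_transcript string.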

    And when I say “I” wrote the script, it was probably half AI 🙃

    This gives me a nice text file with timestamps:
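
    Each line in that file follows the script's sprintf pattern: a bracketed offset from the start of the recording, the speaker name, then the text. Here is a rough sketch of how one block turns into one line. The format_block() helper and the sample blocks are made up for illustration. It differs from the script in two assumed ways: it falls back to "Unknown Speaker" instead of exiting, and it switches to H:i:s past the one-hour mark, since gmdate('i:s') alone would wrap the minutes.

```php
<?php
/**
 * Format one transcript block as "[mm:ss] Speaker: text".
 * $start_ms is the transcript's startTime; $speakers maps speaker id to name.
 */
function format_block( array $block, int $start_ms, array $speakers ): string {
	// Offset from the start of the recording, rounded to the nearest second.
	$secs    = (int) round( ( $block['from'] - $start_ms ) / 1000 );
	// Fall back to a generic label when the id is missing or unmapped.
	$speaker = $speakers[ $block['speaker_id'] ?? '' ] ?? 'Unknown Speaker';
	// Include hours only when the offset reaches one hour.
	$stamp   = gmdate( $secs >= 3600 ? 'H:i:s' : 'i:s', $secs );
	return sprintf( '[%s] %s: %s', $stamp, $speaker, $block['text'] );
}

// Made-up values for illustration only.
$speakers = [ 's1' => 'Alice', 's2' => 'Bob' ];
$blocks   = [
	[ 'from' => 1000, 'speaker_id' => 's1', 'text' => 'Can you hear me?' ],
	[ 'from' => 66400, 'speaker_id' => 's2', 'text' => 'Loud and clear.' ],
];

foreach ( $blocks as $block ) {
	echo format_block( $block, 1000, $speakers ) . PHP_EOL;
}
// [00:00] Alice: Can you hear me?
// [01:05] Bob: Loud and clear.
```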

    So, yeah. Whatever.