Extract Transcript from Quill Meetings Files

I use Quill Meetings for local on-device transcriptions of calls. It’s pretty great!

The app definitely has some quirks and is missing some features that I’d prefer, like the ability to just export a text file of a call transcript. Sure, I can “copy” the transcript and paste it into a file, but the pasted text is missing things like timestamps.

So I built a quick script to extract transcripts from .qm files for me. A .qm file is basically just a JSON file with a “QMv2” header line on top:

#!/opt/homebrew/bin/php
<?php
declare(strict_types=1);

error_reporting( E_ALL );
ini_set( 'display_errors', '1' );

// Quill export dir is first argument, or current directory if not provided.
$export_dir = isset( $argv[1] ) ? rtrim( $argv[1], '/' ) : getcwd();

// Find every file that ends in .qm in the export directory.
$files = glob( $export_dir . '/*.qm' );
if ( ! $files ) {
	echo "No .qm files found in the directory: $export_dir\n";
	exit( 1 );
}

/**
 * Each QM file is just a JSON file with a .qm extension and the first line being "QMv2"
 * We need to read each file, remove the first line, and decode the JSON.
 */
foreach( $files as $file ) {
	if ( ! is_readable( $file ) ) {
		echo "Cannot read file: $file\n";
		continue;
	}

	// Read the file and remove the first line.
	$content = file_get_contents( $file );
	if ( false === $content ) {
		echo "Failed to read file: $file\n";
		continue;
	}

	// Remove the first line (QMv2).
	$lines = explode( "\n", $content );
	array_shift( $lines ); // Remove the first line.
	$json_content = implode( "\n", $lines );

	// Decode the JSON content.
	$data = json_decode( $json_content, true );
	if ( null === $data && json_last_error() !== JSON_ERROR_NONE ) {
		echo "Invalid JSON in file: $file\n";
		continue;
	}

	// Pretty print the JSON data.
	$pretty_json = json_encode( $data, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES );
	if ( false === $pretty_json ) {
		echo "Failed to encode JSON for file: $file\n";
		continue;
	}

	$speakers = array();
	$transcript = array();
	$output_string = '';
	$output_file = '';
	foreach ( $data as $quill_object ) {
		// Each Quill object is an array. We want to check if it has a 'type' of 'Meeting'.
		if ( isset( $quill_object['type'] ) && $quill_object['type'] === 'Meeting' ) {
			$output_file = $quill_object['data']['start'] . '-' . $quill_object['data']['end'] . ': ' . $quill_object['data']['title'] . '.txt';
			// The "audio_transcript" is just a JSON string that we need to decode.
			$audio_transcript = json_decode( $quill_object['data']['audio_transcript'], true );
			$encoded_speakers = $quill_object['data']['speakers'] ?? [];
			foreach( $encoded_speakers as $encoded_speaker ) {
				$speakers[ $encoded_speaker['id'] ] = $encoded_speaker['name'] ?? 'Unknown Speaker ' . $encoded_speaker['id'];
			}
			if ( ! isset ( $audio_transcript['startTime'] ) ) {
				echo "Invalid start time in audio transcript for file: $file\n";
				continue;
			}
			$start_time = $audio_transcript['startTime'];
			$end_time   = $audio_transcript['endTime'];
			foreach( $audio_transcript['blocks'] as $block ) {
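				// Cast to int: ms_to_readable() is strictly typed and would reject a float offset.
				$time_block = ms_to_readable( (int) ( $block['from'] - $start_time ) );
				// Bail out if the block has no speaker, or a speaker ID we never saw in the speakers list.
				if ( isset( $block['speaker_id'], $speakers[ $block['speaker_id'] ] ) ) {
					$speaker_block = $speakers[ $block['speaker_id'] ];
				} else {
					echo 'Unknown Speaker found. Please manually mark all speakers in Quill before exporting.' . PHP_EOL;
					exit( 1 );
				}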
				$output_string .= sprintf( "%s %s: %s\n", $time_block, $speaker_block, $block['text'] );
			}
		}
	}

	if ( ! empty( $output_string ) && ! empty( $output_file ) ) {
		// Sanitize the filename.
		$output_file = sanitize_filename( $output_file );
		// Write the output string to the file.
		if ( file_put_contents( $output_file, $output_string ) === false ) {
			echo "Failed to write to file: $output_file\n";
		} else {
			echo "Exported to: $output_file\n";
		}
	} else {
		echo "No valid Meeting data found in file: $file\n";
	}
}

function ms_to_readable(int $ms): string {
	// round to nearest second
	$secs = (int) round($ms / 1000);
	// gmdate formats seconds since 0 into H:i:s; we just need i:s
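	// past the one-hour mark, include the hours so the minutes don't wrap back to 00
	return '[' . gmdate($secs >= 3600 ? 'H:i:s' : 'i:s', $secs) . ']';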
}

function sanitize_filename(string $filename): string {
	// strip any path information
	$fname = basename($filename);
	// replace any run of characters that are NOT letters, digits, dot, hyphen or underscore with an underscore
	$clean = preg_replace('/[^\w\.-]+/', '_', $fname);
	// collapse multiple underscores
	return preg_replace('/_+/', '_', $clean);
}

And when I say “I” wrote it, it was probably half AI 🙃
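For the curious, here is roughly the shape the script expects inside a .qm file. This is a minimal sketch inferred only from the fields the script reads; the sample values (and the exact types of things like speaker IDs and start/end times) are made up for illustration, not taken from any Quill documentation.

```php
<?php
declare(strict_types=1);

// Hypothetical sample data: the field names mirror what the script reads,
// but every value here is invented for illustration.
$transcript = json_encode( [
	'startTime' => 1000,
	'endTime'   => 9000,
	'blocks'    => [
		[ 'from' => 1000, 'speaker_id' => 'a', 'text' => 'Hello there.' ],
		[ 'from' => 6000, 'speaker_id' => 'b', 'text' => 'Hi!' ],
	],
] );

$qm = "QMv2\n" . json_encode( [
	[
		'type' => 'Meeting',
		'data' => [
			'title'            => 'Weekly sync',
			'start'            => 1000,
			'end'              => 9000,
			'speakers'         => [
				[ 'id' => 'a', 'name' => 'Alice' ],
				[ 'id' => 'b', 'name' => 'Bob' ],
			],
			// The transcript is itself a JSON string nested inside the JSON.
			'audio_transcript' => $transcript,
		],
	],
] );

// Drop the "QMv2" header line, then decode what is left.
$json    = substr( $qm, strpos( $qm, "\n" ) + 1 );
$objects = json_decode( $json, true );

// Second decode: unpack the nested transcript string.
$meeting = $objects[0]['data'];
$blocks  = json_decode( $meeting['audio_transcript'], true )['blocks'];

echo $meeting['title'], ': ', count( $blocks ), " blocks\n";
// prints "Weekly sync: 2 blocks"
```

The double `json_decode()` is the only non-obvious part: the outer file is JSON, and the transcript inside it is a JSON-encoded string that has to be decoded again.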

This gives me a nice text file with timestamps:
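Each line comes out as a timestamp, the speaker name, and the text. Here is a tiny sketch of just that formatting step for a short call. The speakers and blocks are made up, and `format_offset()` is a stand-in name for the script's `ms_to_readable()`.

```php
<?php
declare(strict_types=1);

// Stand-in for ms_to_readable(): millisecond offset -> "[mm:ss]".
function format_offset( int $ms ): string {
	$secs = (int) round( $ms / 1000 );
	return '[' . gmdate( 'i:s', $secs ) . ']';
}

// Made-up data in the same shape the script works with.
$speakers   = [ 'a' => 'Alice', 'b' => 'Bob' ];
$start_time = 1000;
$blocks     = [
	[ 'from' => 1000,  'speaker_id' => 'a', 'text' => 'Hello there.' ],
	[ 'from' => 66000, 'speaker_id' => 'b', 'text' => 'Hi!' ],
];

$lines = [];
foreach ( $blocks as $block ) {
	// Timestamps are relative to the start of the recording.
	$lines[] = sprintf(
		'%s %s: %s',
		format_offset( $block['from'] - $start_time ),
		$speakers[ $block['speaker_id'] ],
		$block['text']
	);
}

echo implode( "\n", $lines ), "\n";
// [00:00] Alice: Hello there.
// [01:05] Bob: Hi!
```

To run the real script, pass the Quill export directory as the first argument, or run it from inside that directory with no arguments. The .txt files land in whatever directory you run it from.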

So, yeah. Whatever.
