Google Gemini Beats ChatGPT in Audio Transcription with Speaker Labels
The Transcription Problem
The iPhone Notes app can record audio and generate a basic transcription, but the output lacks speaker labels, blending all dialogue into a single block of text. This makes it difficult to distinguish between the interviewer's questions and the subject's answers, forcing the user to manually relisten and annotate the recording.
Using Google Gemini 3 Pro
To solve the issue, the user exported the recording from Notes as an M4A file and transferred it to a MacBook Pro via AirDrop. In Google Gemini 3 Pro, the user attached the audio file and prompted the model to “listen to this, transcribe it and be sure to identify the different speakers.” Gemini quickly produced a complete transcript, labeling the questioner as “Interviewer” and identifying the subject by name and title. Apart from a minor naming error that the user later corrected, the transcription was accurate and clearly distinguished the speakers.
Attempt with ChatGPT 5.1
The same user then tried to replicate the process with ChatGPT 5.1, using a Plus account. After the user attached the same M4A file and issued an identical prompt, ChatGPT responded that it could not access or play the file directly. The model suggested various workarounds, such as compressing the file into a ZIP archive, but none of them enabled it to process the audio. The interaction turned into a back‑and‑forth exchange that never produced a transcript.
Implications
This side‑by‑side comparison shows that Google Gemini 3 Pro handled a raw audio file and identified the speakers out of the box, whereas ChatGPT 5.1, in this test, could not ingest the same audio file directly. For users needing reliable transcription with speaker labels, Gemini offers a ready‑to‑use solution, while ChatGPT’s limitations may require additional steps or external tools.