I'm using OpenAI's Realtime API for voice conversations and have written Node.js code based on the documentation.
https://platform.openai.com/docs/guides/realtime-conversations
I'm able to receive the generated audio and text from OpenAI through the response.audio.delta and response.audio_transcript.delta events.
However, I now want to get the transcription of my own input audio, but I'm not sure how to do that.
I tried listening for the conversation.item.input_audio_transcription.delta event, but I never receive it in my code.
Here's the key part of my code. I added the input_audio_transcription parameter, but it doesn't seem to have any effect:
const openaiSession: SessionUpdateEvent.Session = {
  voice: 'shimmer',
  modalities: ['text', 'audio'],
  instructions: this.genInstructions(),
  model: config.openai.model,
  turn_detection: {
    type: 'semantic_vad',
    eagerness: 'high',
    create_response: true,
    interrupt_response: true,
  },
  input_audio_transcription: {
    model: 'gpt-4o-transcribe',
    language: 'en',
    prompt: this.genTranscriptionPrompt(),
  },
};
this.openaiWS.send({
  type: 'session.update',
  session: openaiSession,
});
1 Answer
It looks like the transcription settings are expected in the "session.audio.input.transcription" object, so you should send a "session.update" event like this:
{
  "type": "session.update",
  "session": {
    "audio": {
      "input": {
        "transcription": {
          "model": "gpt-4o-transcribe",
          "prompt": "Transcribe the audio"
        }
      }
    }
  }
}
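If you already have a session object with the flat input_audio_transcription field, a small helper can move it into the nested location. This is only a sketch: the FlatSession and TranscriptionConfig types and the nestTranscription function are hypothetical names made up for illustration, not part of the OpenAI SDK, and the nested shape is taken from the event above.

```typescript
// Hypothetical types for illustration; they are not taken from the OpenAI SDK.
interface TranscriptionConfig {
  model: string;
  language?: string;
  prompt?: string;
}

interface FlatSession {
  input_audio_transcription?: TranscriptionConfig;
  [key: string]: unknown;
}

// Moves a flat `input_audio_transcription` field to `audio.input.transcription`
// and leaves every other session field untouched.
function nestTranscription(session: FlatSession): Record<string, unknown> {
  const { input_audio_transcription: transcription, ...rest } = session;
  if (transcription === undefined) {
    return rest; // nothing to move
  }
  // Preserve any audio/input settings that are already present.
  const audio = (rest["audio"] ?? {}) as Record<string, unknown>;
  const input = (audio["input"] ?? {}) as Record<string, unknown>;
  return {
    ...rest,
    audio: { ...audio, input: { ...input, transcription } },
  };
}
```

The helper returns a new object, so the original session stays unchanged and you can log both shapes while debugging.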
The OpenAI documentation is full of errors >.>
In your case it should be:
const openaiSession: SessionUpdateEvent.Session = {
  voice: 'shimmer',
  modalities: ['text', 'audio'],
  instructions: this.genInstructions(),
  model: config.openai.model,
  turn_detection: {
    type: 'semantic_vad',
    eagerness: 'high',
    create_response: true,
    interrupt_response: true,
  },
  audio: {
    input: {
      transcription: {
        model: 'gpt-4o-transcribe',
        language: 'en',
        prompt: this.genTranscriptionPrompt(),
      },
    },
  },
};
this.openaiWS.send({
  type: 'session.update',
  session: openaiSession,
});
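Once transcription is enabled, the transcript of your own audio should arrive in conversation.item.input_audio_transcription.delta events, followed by a .completed event. Here is a rough sketch of how you could collect them. The InputTranscripts class and ServerEvent type are hypothetical, and the payload fields (item_id, delta, transcript) are my assumption about the event shape, so check them against what you actually receive.

```typescript
// Hypothetical, minimal view of a server event; field names are assumptions.
interface ServerEvent {
  type: string;
  item_id?: string;
  delta?: string;
  transcript?: string;
}

// Accumulates user-side transcript fragments, keyed by conversation item id.
class InputTranscripts {
  private partial = new Map<string, string>();

  // Feed every raw server message here. Returns the final transcript when a
  // `completed` event arrives, otherwise undefined.
  handle(raw: string): string | undefined {
    const event = JSON.parse(raw) as ServerEvent;
    const id = event.item_id ?? "";
    switch (event.type) {
      case "conversation.item.input_audio_transcription.delta":
        // Append the new fragment to whatever has been received so far.
        this.partial.set(id, (this.partial.get(id) ?? "") + (event.delta ?? ""));
        return undefined;
      case "conversation.item.input_audio_transcription.completed":
        // The completed event carries the full text; drop the partial buffer.
        this.partial.delete(id);
        return event.transcript ?? "";
      default:
        return undefined;
    }
  }

  // Text received so far for an item that has not completed yet.
  pending(itemId: string): string {
    return this.partial.get(itemId) ?? "";
  }
}
```

You would call handle() with the string form of each incoming WebSocket message, wherever your code currently dispatches response.audio_transcript.delta, and log or store the value whenever it is not undefined.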
I hope this helps.