openAI TTS sketch · pschatzmann/arduino-audio-tools · Discussion #1839

SaKiEQ
Dec 18, 2024

Howdy,

I cobbled together an OpenAI text-to-speech retrieval sketch / function.

I haven't tested this sketch in detail, as it's just put together from a larger project I am working on.
I thought I'd share it so it might help somebody further down the line.

NOTE: You need to register with OpenAI to obtain an API key. This is not a free service from OpenAI!

//***********************************************************************************************
//*
//* openAI - blocking function to fetch openAI TTS response and play the audio
//* by SaKiE 2024 - Doha Qatar
//*
//***********************************************************************************************
#pragma once
#include <WiFi.h>        // WiFi.begin() / WiFi.status()
#include <HTTPClient.h>
#include <ArduinoJson.h>
#include "AudioTools.h"  // Main AudioTools library
#include "AudioTools/AudioCodecs/CodecMP3Helix.h"
#include "AudioTools/AudioLibs/AudioBoardStream.h"
// Custom I2S output pins
#define I2S_BCK 26
#define I2S_WS 25
#define I2S_DAT 22
//***************************************************************
//** Defaults openAI
//***************************************************************
const char* model_OpenAI = "tts-1";
const char* voice_OpenAI = "alloy";
const char* apiKey = "*************************";
const char* apiUrl = "https://api.openai.com/v1/audio/speech";
//***************************************************************
//** Defaults wifi
//***************************************************************
// WiFi credentials
const char* ssid = "****************";
const char* password = "****************";
HTTPClient http;
//***************************************************************
//** Objects i2s Pipeline
//***************************************************************
AudioInfo info(44100, 2, 16);
AudioBoardStream i2s(NoBoard);
EncodedAudioStream decoderTTS(&i2s, new MP3DecoderHelix()); // Decoding stream
void setup() {
  Serial.begin(115200);
  Serial.flush();
  delay(2000);
  // Clear the terminal
  for (int i = 0; i < 10; i++) {
    Serial.println();
  }
  Serial.println("\t\tWelcome !");
  //*************************************************************
  //****** Comms
  //*************************************************************
  // Connect to WiFi
  WiFi.begin(ssid, password);
  Serial.print("Connecting to WiFi...");
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("\nWiFi connected.");
  //*************************************************************
  //****** i2s
  //*************************************************************
  // I2S output interface config
  Serial.println("\t\tSetting up I2S Speaker Environment");
  auto cfg = i2s.defaultConfig(TX_MODE);
  cfg.sample_rate = 44100;
  cfg.bits_per_sample = 16;
  cfg.channels = 2;
  cfg.buffer_count = 20;
  cfg.buffer_size = 512;
  // Custom I2S output pins
  cfg.pin_bck = I2S_BCK;
  cfg.pin_ws = I2S_WS;
  cfg.pin_data = I2S_DAT;
  i2s.begin(cfg);
  auto config = decoderTTS.defaultConfig();
  // Requires adjustment depending on selected model
  config.sample_rate = 24000;
  config.bits_per_sample = 16;
  config.channels = 1;
  if (!decoderTTS.begin(config)) {
    Serial.println("Failed to initialize decoder.");
  } else {
    if (fetchOpenAI("I am a wonderful openAI Text To Speech voice !")) {
      Serial.println("Successful openAI TTS-1 retrieval.");
    } else {
      Serial.println("Failed to process openAI TTS request !");
    }
  }
}
bool fetchOpenAI(const char* requestStr) {
  HTTPClient http;
  if (WiFi.status() == WL_CONNECTED) {
    // Prepare the JSON payload
    DynamicJsonDocument jsonDoc(512);
    jsonDoc["model"] = model_OpenAI;
    jsonDoc["input"] = requestStr;
    jsonDoc["voice"] = voice_OpenAI;
    String requestBody;
    serializeJson(jsonDoc, requestBody);
    // Set up the HTTPS request
    http.begin(apiUrl);
    http.addHeader("Authorization", String("Bearer ") + apiKey);
    http.addHeader("Content-Type", "application/json");
    Serial.println("Sending POST request...");
    int httpResponseCode = http.POST(requestBody);
    // POST() returns a negative value on transport errors, so compare
    // against 200 explicitly instead of testing for non-zero.
    if (httpResponseCode == HTTP_CODE_OK) {
      Serial.println("Request successful. Streaming audio...");
      // Use the HTTP response stream directly
      WiFiClient* audioStream = http.getStreamPtr();
      // Stream and play audio
      StreamCopy copier(decoderTTS, *audioStream);
      copier.begin();
      unsigned long lastCopyTime = millis();  // Track the time of last copy to detect stalls
      unsigned long maxIdleTime = 5000;       // Maximum time without data copy before timing out
      Serial.println("Start playing TTS audio...");
      do {
        if (copier.copy()) {
          // Reset the last copy time whenever data is successfully copied
          lastCopyTime = millis();
        }

        // Check if the stream has been idle too long (without data copied)
        if (millis() - lastCopyTime > maxIdleTime) {
          Serial.println("Timeout while copying audio due to inactivity.");
          break;  // Exit the loop if we timeout
        }
      } while (copier.available());  // Continue while data is available to copy
      Serial.println("End playing TTS audio.");
      Serial.println("Audio playback complete.");
    } else {
      Serial.printf("HTTP Request failed: %d\n", httpResponseCode);
      http.end();  // Close connection on the error path too
      return false;
    }
    http.end();  // Close connection
    return true;
  } else {
    Serial.println("WiFi not connected.");
    return false;
  }
}

void loop() {
  yield();
}
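
A side note on two checks that the fetch loop depends on. The ESP32 HTTPClient returns a negative value from POST() when the connection itself fails, so testing the return value for "non-zero" would treat those errors as success. The stall detection relies on unsigned subtraction of millis() values, which stays correct when the counter wraps. Below is a minimal sketch of both checks as plain C++ that can be compiled on a desktop. The names HttpOutcome, classifyHttpCode and idleTooLong are hypothetical helpers for illustration and are not part of HTTPClient or AudioTools.

```cpp
#include <cstdint>

// Outcome categories for the integer returned by an HTTP client's POST call.
enum class HttpOutcome { Ok, ServerRejected, TransportError };

// Assumption: values <= 0 signal a transport-level failure (the ESP32
// HTTPClient uses negative error codes); positive values are HTTP status codes.
inline HttpOutcome classifyHttpCode(int code) {
    if (code <= 0) return HttpOutcome::TransportError;
    if (code >= 200 && code < 300) return HttpOutcome::Ok;
    return HttpOutcome::ServerRejected;  // e.g. 401 for a bad API key
}

// True when more than maxIdleMs milliseconds have passed since lastMs.
// Unsigned subtraction keeps the result correct across counter wrap-around.
inline bool idleTooLong(std::uint32_t nowMs, std::uint32_t lastMs,
                        std::uint32_t maxIdleMs) {
    return static_cast<std::uint32_t>(nowMs - lastMs) > maxIdleMs;
}
```

In a sketch, the result of http.POST() would be passed to classifyHttpCode(), and audio would only be streamed for HttpOutcome::Ok.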

Replies: 3 comments 16 replies

pschatzmann
Dec 18, 2024
Maintainer

Cool,

I suggest to replace

AudioBoardStream i2s(NoBoard);

with

I2SStream i2s;

The AudioBoardStream is mainly intended for codecs that need to be configured via I2C.

0 replies

Thank you for this -- it is just what I need. Unfortunately, it does something bad to my ESP32. The code builds, and seems to run as far as playing a brief snatch of sound, but then stops playing, claims Audio playback is complete, and crashes the ESP32 with a "corrupt heap" error. Any thoughts as to what is going wrong?

14 replies

hammerheaddown Feb 21, 2025

@astromikemerri please share it, or maybe post it on Pastebin or something.

astromikemerri Feb 21, 2025

Here's the function that calls the ElevenLabs TTS API. It takes the text to convert as an argument and saves the MP3 audio file the API returns to an SD card:

// -----------------------------------------------------------------------------
// TTSElevenLabsAPI()
// Relies on globals defined elsewhere in the full sketch:
// http (HTTPClient), client (passed to http.begin) and mp3FilePath.
// -----------------------------------------------------------------------------
bool TTSElevenLabsAPI(String text) {
  String elevenlabs_api_key = "*** REPLACE WITH YOUR ELEVENLABS API KEY ***";
  String voiceID = "*** REPLACE WITH YOUR ELEVENLABS VOICEID ***";
  String apiUrl = "https://api.elevenlabs.io/v1/text-to-speech/" + voiceID + "?output_format=mp3_44100_128";
  http.begin(client, apiUrl);
  http.addHeader("Content-Type", "application/json");
  http.addHeader("xi-api-key", elevenlabs_api_key);  // Use the correct ElevenLabs API key
  // Note: text is inserted as-is, so quotes or backslashes in it would produce invalid JSON
  String payload = "{\"text\":\"" + text + "\", \"model_id\":\"eleven_multilingual_v2\"}";
  int httpResponseCode = http.POST(payload);
  if (httpResponseCode == 200) {
    File outputFile = SD.open(mp3FilePath, FILE_WRITE);
    if (!outputFile) {
      Serial.println("Failed to open file for writing.");
      http.end();
      return false;
    }
    // Write the response body (the MP3 data) straight to the SD card file
    http.writeToStream(&outputFile);
    outputFile.close();
    http.end();
    return true;
  } else {
    Serial.printf("HTTP POST failed with code %d\n", httpResponseCode);
    String respBody = http.getString();
    Serial.println("Response body: " + respBody);
    http.end();
    return false;
  }
}
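
The function above builds its JSON payload by string concatenation, which breaks as soon as the text contains a double quote, a backslash or a line break. A minimal sketch of escaping the text first is shown below, written as plain C++ with std::string so it can be tried on a desktop. jsonEscape and buildTtsPayload are hypothetical helper names, not part of any library in this thread. On the ESP32, letting ArduinoJson serialize the document (as the first sketch in this thread does) is the alternative.

```cpp
#include <cstdio>
#include <string>

// Escape a string so it can be placed between double quotes in a JSON document.
inline std::string jsonEscape(const std::string& in) {
    std::string out;
    out.reserve(in.size() + 8);
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters use the \u00XX form.
                    char buf[7];
                    std::snprintf(buf, sizeof(buf), "\\u%04x",
                                  static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Build a request body with the same two fields the function above sends.
inline std::string buildTtsPayload(const std::string& text,
                                   const std::string& modelId) {
    return "{\"text\":\"" + jsonEscape(text) +
           "\",\"model_id\":\"" + jsonEscape(modelId) + "\"}";
}
```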

astromikemerri Feb 21, 2025

(For context, the project, with the full code into which i have slotted this function, is at https://github.com/astromikemerri/ESPGPT)

hammerheaddown Feb 24, 2025

(For context, the project, with the full code into which i have slotted this function, is at https://github.com/astromikemerri/ESPGPT)

Wow, I didn't even know you replied. I am definitely jumping into this today! Thanks.

nosliwmichael Apr 16, 2025

Ah! I appreciate your code snippet @astromikemerri. For some reason, the OpenAI TTS endpoint wasn't playing nicely with StreamCopy. I kept having choppy/robotic audio output. But then I noticed the http.writeToStream in your example and that made things clear to me.

AudioInfo info(24000, 1, 16);
I2SStream i2s;                         // final output of decoded stream
MP3DecoderHelix codec;                 // MP3 decoder
EncodedAudioStream dec(&i2s, &codec);  // Decoding stream
void setup() {
  // setup, initialize, yada yada...
  int httpResponseCode = http.POST(requestBody);
  if (httpResponseCode == HTTP_CODE_OK) {
    Serial.println("Request successful. Streaming audio...");
    http.writeToStream(&dec);  // Have the http client write directly to the EncodedAudioStream
  }
  // Error handle, clean up, yada yada
  http.end();
}

The quality is quite good now. I even made a custom T-stream if I want to write the MP3 buffer to a file before sending it off to the EncodedAudioStream.
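
The custom T-stream itself is not shown in this thread, so the following is only a guess at its shape: a sink that forwards every write to two destinations, one for the decoder and one for a file copy. ByteSink, TeeSink and VectorSink are hypothetical names, and ByteSink merely stands in for Arduino's Print so the idea can be tried in plain C++. On the ESP32 the same class would presumably derive from Print so that http.writeToStream() accepts it; that part is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal sink interface, standing in for Arduino's Print.
struct ByteSink {
    virtual ~ByteSink() = default;
    virtual std::size_t write(const std::uint8_t* data, std::size_t len) = 0;
};

// Forwards each write to a secondary sink (e.g. a file copy) and then to
// the primary sink (e.g. the decoder), reporting what the primary accepted.
struct TeeSink : ByteSink {
    TeeSink(ByteSink& primary, ByteSink& secondary)
        : primary_(primary), secondary_(secondary) {}

    std::size_t write(const std::uint8_t* data, std::size_t len) override {
        secondary_.write(data, len);        // best-effort copy
        return primary_.write(data, len);   // playback path decides the result
    }

private:
    ByteSink& primary_;
    ByteSink& secondary_;
};

// In-memory sink, used here only to demonstrate the tee.
struct VectorSink : ByteSink {
    std::vector<std::uint8_t> bytes;
    std::size_t write(const std::uint8_t* data, std::size_t len) override {
        bytes.insert(bytes.end(), data, data + len);
        return len;
    }
};
```

Writing the secondary copy first means a slow file write delays playback, so the order is a design choice worth revisiting on real hardware.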

astromikemerri
Jan 12, 2025

(I should say that one of the output options the API can return is a PCM stream, so in principle very little decoding is needed before passing it to the I2S.)

2 replies

pschatzmann Jan 12, 2025
Maintainer

Is it a WAV file or just a PCM stream? If it is a PCM stream, you can just replace the File with an I2SStream.
You can easily double-check this with your files: if the content starts with RIFF, then it is a WAV file!
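
The RIFF check can be sketched in a few lines of plain C++, assuming the first bytes of the response have been read into a buffer. looksLikeWav is a hypothetical helper name. The extra test for "WAVE" at offset 8 goes slightly beyond the advice above and follows the standard WAV container layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true when the buffer starts with a RIFF/WAVE header, i.e. the data
// is a WAV file rather than headerless PCM samples.
inline bool looksLikeWav(const std::uint8_t* data, std::size_t len) {
    if (data == nullptr || len < 12) return false;  // need 12 bytes to decide
    return std::memcmp(data, "RIFF", 4) == 0 &&
           std::memcmp(data + 8, "WAVE", 4) == 0;
}
```

If the check fails, the data can be treated as raw PCM and written to the I2S output directly, provided the I2S sample rate, bit depth and channel count match what the API sends.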

astromikemerri Jan 12, 2025

PCM is one of the API's output options (as is WAV). I am sorry, I must be being incredibly stupid and frustrating to those more competent than me, but I have not been able to do this successfully. Could you bear to show me?
