
openAI TTS sketch #1839

SaKiEQ started this conversation in Show and tell
Dec 18, 2024 · 3 comments · 16 replies

Howdy,

I cobbled together an OpenAI text-to-speech (TTS) retrieval sketch/function.

I haven't tested this sketch in detail, as it's just put together from a larger project I am working on.
I thought I'd share it in case it helps somebody further down the line.

NOTE: you need to register with OpenAI to obtain an API key. This is not a free service from OpenAI!

//***********************************************************************************************
//*
//* OpenAI - blocking function to fetch an OpenAI TTS response and play the audio
//* by SaKiE 2024 - Doha Qatar
//*
//***********************************************************************************************
#include <WiFi.h>
#include <HTTPClient.h>
#include <ArduinoJson.h>
#include "AudioTools.h" // Main AudioTools library
#include "AudioTools/AudioCodecs/CodecMP3Helix.h"
#include "AudioTools/AudioLibs/AudioBoardStream.h"

#define I2S_BCK 26
#define I2S_WS 25
#define I2S_DAT 22

//***************************************************************
//** OpenAI defaults
//***************************************************************
const char* model_OpenAI = "tts-1";
const char* voice_OpenAI = "alloy";
const char* apiKey = "*************************";
const char* apiUrl = "https://api.openai.com/v1/audio/speech";

//***************************************************************
//** WiFi defaults
//***************************************************************
// WiFi credentials
const char* ssid = "****************";
const char* password = "****************";

//***************************************************************
//** I2S pipeline objects
//***************************************************************
AudioInfo info(44100, 2, 16);
AudioBoardStream i2s(NoBoard);
EncodedAudioStream decoderTTS(&i2s, new MP3DecoderHelix()); // Decoding stream

// Forward declaration: the Arduino IDE generates this automatically, other build systems may not
bool fetchOpenAI(const char* requestStr);
void setup() {
  Serial.begin(115200);
  Serial.flush();
  delay(2000);
  // Clear the terminal
  for (int i = 0; i < 10; i++) {
    Serial.println();
  }
  Serial.println("\t\tWelcome !");
  //*************************************************************
  //****** Comms
  //*************************************************************
  // Connect to WiFi
  WiFi.begin(ssid, password);
  Serial.print("Connecting to WiFi...");
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("\nWiFi connected.");
  //*************************************************************
  //****** I2S
  //*************************************************************
  // I2S output interface config
  Serial.println("\t\tSetting up I2S Speaker Environment");
  auto cfg = i2s.defaultConfig(TX_MODE);
  cfg.sample_rate = 44100;
  cfg.bits_per_sample = 16;
  cfg.channels = 2;
  cfg.buffer_count = 20;
  cfg.buffer_size = 512;
  // Custom I2S output pins
  cfg.pin_bck = I2S_BCK;
  cfg.pin_ws = I2S_WS;
  cfg.pin_data = I2S_DAT;
  i2s.begin(cfg);
  auto config = decoderTTS.defaultConfig();
  // Requires adjustment depending on the selected model
  config.sample_rate = 24000;
  config.bits_per_sample = 16;
  config.channels = 1;
  if (!decoderTTS.begin(config)) {
    Serial.println("Failed to initialize decoder.");
  } else {
    if (fetchOpenAI("I am a wonderful openAI Text To Speech voice !")) {
      Serial.println("Successful OpenAI TTS-1 retrieval.");
    } else {
      Serial.println("Failed to process OpenAI TTS request!");
    }
  }
}
bool fetchOpenAI(const char* requestStr) {
  HTTPClient http;
  if (WiFi.status() == WL_CONNECTED) {
    // Prepare the JSON payload (ArduinoJson escapes the input text for us)
    DynamicJsonDocument jsonDoc(512);
    jsonDoc["model"] = model_OpenAI;
    jsonDoc["input"] = requestStr;
    jsonDoc["voice"] = voice_OpenAI;
    String requestBody;
    serializeJson(jsonDoc, requestBody);
    // Set up the HTTPS request; ask the server to close the connection when done
    http.setReuse(false);
    http.begin(apiUrl);
    http.addHeader("Authorization", String("Bearer ") + apiKey);
    http.addHeader("Content-Type", "application/json");
    Serial.println("Sending POST request...");
    int httpResponseCode = http.POST(requestBody);
    // Negative codes are client-side errors, so test for 200 explicitly
    if (httpResponseCode == HTTP_CODE_OK) {
      Serial.println("Request successful. Streaming audio...");
      // Use the HTTP response stream directly
      WiFiClient* audioStream = http.getStreamPtr();
      // Stream and play audio
      StreamCopy copier(decoderTTS, *audioStream);
      copier.begin();
      unsigned long lastCopyTime = millis();  // Time of the last successful copy, to detect stalls
      const unsigned long maxIdleTime = 5000; // Maximum time without data before timing out
      Serial.println("Start playing TTS audio...");
      // Keep going while the connection is open or data is still buffered:
      // available() alone can briefly be 0 between packets, which ends playback early
      while (audioStream->connected() || audioStream->available() > 0) {
        if (copier.copy()) {
          // Reset the stall timer whenever data is successfully copied
          lastCopyTime = millis();
        }
        // Check if the stream has been idle too long (no data copied)
        if (millis() - lastCopyTime > maxIdleTime) {
          Serial.println("Timeout while copying audio due to inactivity.");
          break;
        }
      }
      Serial.println("End playing TTS audio.");
      Serial.println("Audio playback complete.");
    } else {
      Serial.printf("HTTP Request failed: %d\n", httpResponseCode);
      http.end(); // Close connection on the error path too
      return false;
    }
    http.end(); // Close connection
    return true;
  } else {
    Serial.println("WiFi not connected.");
    return false;
  }
}

void loop() {
  yield();
}
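Deciding when a streaming loop is finished is the subtle part of the playback code: "no bytes buffered right now" is not the same as "the response has ended". Below is a minimal host-side sketch of that logic, with the network read and the clock passed in as callbacks so it can be tested off the device. The names `pumpUntilDone`, `ReadResult` and `PumpStatus` are hypothetical. On the ESP32, `readChunk` would presumably wrap `copier.copy()`, and `finished` could be derived from `!audioStream->connected() && audioStream->available() == 0` (an assumption; if the socket stays open after the response, the idle timeout is what ends the loop).

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Result of one read attempt from a network source (hypothetical type).
struct ReadResult {
  size_t bytes;   // bytes copied in this attempt (0 = nothing buffered right now)
  bool finished;  // true once the source has delivered its last byte
};

enum class PumpStatus { Finished, TimedOut };

// Calls readChunk() until the source reports it is finished, or until no
// bytes have arrived for more than maxIdleMs. A read that returns 0 bytes is
// treated as "no data yet", never as end of stream.
PumpStatus pumpUntilDone(const std::function<ReadResult()>& readChunk,
                         const std::function<uint32_t()>& nowMs,
                         uint32_t maxIdleMs) {
  uint32_t lastDataMs = nowMs();
  while (true) {
    ReadResult r = readChunk();
    if (r.bytes > 0) {
      lastDataMs = nowMs();  // progress: reset the stall timer
    }
    if (r.finished) {
      return PumpStatus::Finished;
    }
    if (nowMs() - lastDataMs > maxIdleMs) {
      return PumpStatus::TimedOut;  // stalled for too long
    }
  }
}
```

With a source that delivers 512 bytes, then nothing for two reads, then 256 bytes, the loop keeps waiting through the gap and only returns `Finished` once the source says so; a source that never delivers anything returns `TimedOut` after the idle window.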


Cool,

I suggest replacing

AudioBoardStream i2s(NoBoard);

with

I2SStream i2s;

The AudioBoardStream is mainly intended for codecs that need to be configured via I2C.


Thank you for this; it is just what I need. Unfortunately, it does something bad to my ESP32. The code builds and seems to run as far as playing a brief snatch of sound, but then it stops playing, reports that audio playback is complete, and crashes the ESP32 with a "corrupt heap" error. Any thoughts on what is going wrong?


@astromikemerri

Please share it, or maybe post it on Pastebin or something.


Here's the function that calls the ElevenLabs TTS API. It takes the text to convert as an argument and saves the MP3 audio file returned by the API to an SD card:

// -----------------------------------------------------------------------------
// TTSElevenLabsAPI()
// -----------------------------------------------------------------------------
// Uses globals defined elsewhere in the project (not shown here):
// `http` (HTTPClient), `client` (the network client) and `mp3FilePath` (SD card path)
bool TTSElevenLabsAPI(String text) {
  String elevenlabs_api_key = "*** REPLACE WITH YOUR ELEVENLABS API KEY ***";
  String voiceID = "*** REPLACE WITH YOUR ELEVENLABS VOICEID ***";
  String apiUrl = "https://api.elevenlabs.io/v1/text-to-speech/" + voiceID + "?output_format=mp3_44100_128";
  http.begin(client, apiUrl);
  http.addHeader("Content-Type", "application/json");
  http.addHeader("xi-api-key", elevenlabs_api_key); // Use the correct ElevenLabs API key
  // Note: `text` is inserted as-is, so quotes, backslashes or newlines in it break the JSON
  String payload = "{\"text\":\"" + text + "\", \"model_id\":\"eleven_multilingual_v2\"}";
  int httpResponseCode = http.POST(payload);
  if (httpResponseCode == 200) {
    File outputFile = SD.open(mp3FilePath, FILE_WRITE);
    if (!outputFile) {
      Serial.println("Failed to open file for writing.");
      http.end();
      return false;
    }
    // Copies the response body into the file
    http.writeToStream(&outputFile);
    outputFile.close();
    http.end();
    return true;
  } else {
    Serial.printf("HTTP POST failed with code %d\n", httpResponseCode);
    String respBody = http.getString();
    Serial.println("Response body: " + respBody);
    http.end();
    return false;
  }
}
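The payload above is built by string concatenation, so any double quote, backslash or newline in `text` (common in chatbot replies) yields invalid JSON and the request fails. One option is to build the body with ArduinoJson, as the OpenAI sketch does. Another is to escape the text first; below is a minimal plain-C++ sketch following the JSON string rules (RFC 8259). `std::string` stands in for Arduino's `String`, and `jsonEscape` / `buildElevenLabsPayload` are hypothetical helper names.

```cpp
#include <cstdio>
#include <string>

// Escapes a string so it can be placed between double quotes in JSON:
// quote, backslash and control characters are escaped; other bytes
// (including UTF-8 multibyte sequences) pass through unchanged.
std::string jsonEscape(const std::string& in) {
  std::string out;
  out.reserve(in.size() + 8);
  for (unsigned char c : in) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      default:
        if (c < 0x20) {
          char buf[7];  // "\u00XX" plus terminating NUL
          std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  return out;
}

// Same body as in TTSElevenLabsAPI(), but with the text escaped.
std::string buildElevenLabsPayload(const std::string& text) {
  return "{\"text\":\"" + jsonEscape(text) +
         "\", \"model_id\":\"eleven_multilingual_v2\"}";
}
```

On the device, the escaped result could be wrapped back into a `String` before calling `http.POST()`.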

(For context, the project, with the full code into which i have slotted this function, is at https://github.com/astromikemerri/ESPGPT)


Wow, I didn't even know you replied. I'm definitely jumping into this today, thanks!


Ah! I appreciate your code snippet, @astromikemerri. For some reason, the OpenAI TTS endpoint wasn't playing nicely with StreamCopy: I kept getting choppy/robotic audio output. But then I noticed the http.writeToStream call in your example, and that made things clear to me.

AudioInfo info(24000, 1, 16);
I2SStream i2s;                        // final output of decoded stream
MP3DecoderHelix codec;                // MP3 decoder
EncodedAudioStream dec(&i2s, &codec); // Decoding stream

void setup() {
  // setup, initialize, yada yada...
  int httpResponseCode = http.POST(requestBody);
  if (httpResponseCode == HTTP_CODE_OK) {
    Serial.println("Request successful. Streaming audio...");
    http.writeToStream(&dec); // Have the HTTP client write directly to the EncodedAudioStream
  }
  // Error handle, clean up, yada yada
  http.end();
}

The quality is quite good now. I even made a custom T-stream if I want to write the MP3 buffer to a file before sending it off to the EncodedAudioStream.
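The custom T-stream itself isn't shown in the thread. As a rough idea of the pattern, here is a minimal host-side sketch of a "tee" that forwards every write to two sinks, for example the decoder and a file. `ByteSink`, `TeeSink` and `VectorSink` are hypothetical stand-ins; on Arduino you would derive from `Stream` (which is what `writeToStream()` expects) rather than from this simplified interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for Arduino's Print/Stream write interface (hypothetical).
class ByteSink {
 public:
  virtual ~ByteSink() = default;
  virtual size_t write(const uint8_t* data, size_t len) = 0;
};

// Forwards every write to a primary sink (e.g. the decoder) and mirrors the
// bytes the primary accepted to a secondary sink (e.g. a file on SD).
class TeeSink : public ByteSink {
 public:
  TeeSink(ByteSink& primary, ByteSink& secondary) : a_(primary), b_(secondary) {}
  size_t write(const uint8_t* data, size_t len) override {
    size_t n = a_.write(data, len);
    b_.write(data, n);  // mirror only what the primary actually took
    return n;           // the caller sees the primary's progress
  }

 private:
  ByteSink& a_;
  ByteSink& b_;
};

// Collects bytes in memory (handy for testing).
class VectorSink : public ByteSink {
 public:
  std::vector<uint8_t> bytes;
  size_t write(const uint8_t* data, size_t len) override {
    bytes.insert(bytes.end(), data, data + len);
    return len;
  }
};
```

Returning the primary sink's count keeps the HTTP copy loop paced by playback, while the file simply receives the same bytes.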


(I should add that one of the output formats the API can return is a raw PCM stream, so in principle very little decoding should be needed before passing the audio to I2S.)
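Raw PCM skips the MP3 decoder, but the samples still have to match the I2S configuration. The original sketch sets up I2S for 2 channels at 44100 Hz, whereas a TTS PCM stream is typically mono at the sample rate you request (check the provider's format list). Either configure I2S to match the stream, or convert it. The sketch below assumes 16-bit signed mono samples and shows the conversion to interleaved stereo plus the byte-rate arithmetic; `monoToStereo` and `pcmBytesPerSecond` are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts 16-bit mono PCM into interleaved 16-bit stereo (L, R, L, R, ...)
// by sending each sample to both channels.
std::vector<int16_t> monoToStereo(const int16_t* mono, size_t samples) {
  std::vector<int16_t> stereo;
  stereo.reserve(samples * 2);
  for (size_t i = 0; i < samples; ++i) {
    stereo.push_back(mono[i]);  // left
    stereo.push_back(mono[i]);  // right
  }
  return stereo;
}

// Data rate of raw PCM in bytes per second: sample rate x channels x bytes per sample.
constexpr uint32_t pcmBytesPerSecond(uint32_t sampleRate, uint16_t channels,
                                     uint16_t bitsPerSample) {
  return sampleRate * channels * (bitsPerSample / 8);
}
```

For example, 24 kHz mono 16-bit PCM arrives at 48,000 bytes per second, while the 44.1 kHz stereo I2S setup in the original sketch consumes 176,400 bytes per second, so sample rate and channel count must match or the audio will play at the wrong speed.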


Is it a WAV file or just a PCM stream? If it is a PCM stream, you can simply write it to an I2SStream instead of the File.
You can easily double-check this with your files: if the content starts with "RIFF", it is a WAV file!
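The RIFF check can be done in code on the first bytes of the response. A minimal sketch: a WAV file begins with the ASCII tag `RIFF` at bytes 0 to 3 and `WAVE` at bytes 8 to 11, while raw PCM starts directly with audio samples. `looksLikeWav` is a hypothetical helper; if it returns true, the data would go through a WAV decoder (AudioTools has a `WAVDecoder`) rather than straight to I2S.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true if the buffer starts with a RIFF/WAVE header:
// bytes 0-3 are "RIFF" and bytes 8-11 are "WAVE" (bytes 4-7 hold the chunk size).
bool looksLikeWav(const uint8_t* data, size_t len) {
  if (len < 12) {
    return false;  // too short to contain the header
  }
  return std::memcmp(data, "RIFF", 4) == 0 &&
         std::memcmp(data + 8, "WAVE", 4) == 0;
}
```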


PCM is one of the API's output options (as is WAV). I'm sorry, I must be incredibly stupid and frustrating to those more competent than me, but I have not been able to get this working. Could you bear to show me?
