openAI TTS sketch #1839
-
Howdy,
I cobbled together an OpenAI text-to-speech retrieval sketch / function.
I haven't tested this sketch in detail, as it's just put together from a larger project I am working on.
I thought I'd share it so it might help somebody further down the line.
NOTE: you need to register with OpenAI to obtain an API key. This is not a free service from OpenAI!
//***********************************************************************************************
//*
//* openAI - blocking function to fetch openAI TTS response and play the audio
//* by SaKiE 2024 - Doha Qatar
//*
//***********************************************************************************************
#pragma once
#include <WiFi.h>       // WiFi.begin() / WiFi.status()
#include <HTTPClient.h>
#include <ArduinoJson.h>
#include "AudioTools.h" // Main AudioTools library
#include "AudioTools/AudioCodecs/CodecMP3Helix.h"
#include "AudioTools/AudioLibs/AudioBoardStream.h"
#define I2S_BCK 26
#define I2S_WS 25
#define I2S_DAT 22
//***************************************************************
//** Defaults openAI
//***************************************************************
const char* model_OpenAI = "tts-1";
const char* voice_OpenAI = "alloy";
const char* apiKey = "*************************";
const char* apiUrl = "https://api.openai.com/v1/audio/speech";
//***************************************************************
//** Defaults wifi
//***************************************************************
// WiFi credentials
const char* ssid = "****************";
const char* password = "****************";
HTTPClient http;
//***************************************************************
//** Objects i2s Pipeline
//***************************************************************
AudioInfo info(44100, 2, 16);
AudioBoardStream i2s(NoBoard);
EncodedAudioStream decoderTTS(&i2s, new MP3DecoderHelix()); // Decoding stream
void setup() {
  Serial.begin(115200);
  Serial.flush();
  delay(2000);
  // Clear the terminal
  for (int i = 0; i < 10; i++) {
    Serial.println();
  }
  Serial.println("\t\tWelcome !");
  //*************************************************************
  //****** Comms
  //*************************************************************
  // Connect to WiFi
  WiFi.begin(ssid, password);
  Serial.print("Connecting to WiFi...");
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("\nWiFi connected.");
  //*************************************************************
  //****** i2s
  //*************************************************************
  // I2S output interface config
  Serial.println("\t\tSetting up I2S Speaker Environment");
  auto cfg = i2s.defaultConfig(TX_MODE);
  cfg.sample_rate = 44100;
  cfg.bits_per_sample = 16;
  cfg.channels = 2;
  cfg.buffer_count = 20;
  cfg.buffer_size = 512;
  // Custom I2S output pins
  cfg.pin_bck = I2S_BCK;
  cfg.pin_ws = I2S_WS;
  cfg.pin_data = I2S_DAT;
  i2s.begin(cfg);
  auto config = decoderTTS.defaultConfig();
  // Requires adjustment depending on selected model
  config.sample_rate = 24000;
  config.bits_per_sample = 16;
  config.channels = 1;
  if (!decoderTTS.begin(config)) {
    Serial.println("Failed to initialize decoder.");
  } else {
    if (fetchOpenAI("I am a wonderful openAI Text To Speech voice !")) {
      Serial.println("Successful openAI TTS-1 retrieval.");
    } else {
      Serial.println("Failed to process openAI TTS request !");
    }
  }
}
bool fetchOpenAI(const char* requestStr) {
  HTTPClient http;
  if (WiFi.status() == WL_CONNECTED) {
    // Prepare the JSON payload
    DynamicJsonDocument jsonDoc(512);
    jsonDoc["model"] = model_OpenAI;
    jsonDoc["input"] = requestStr;
    jsonDoc["voice"] = voice_OpenAI;
    String requestBody;
    serializeJson(jsonDoc, requestBody);
    // Set up the HTTPS request
    http.begin(apiUrl);
    http.addHeader("Authorization", String("Bearer ") + apiKey);
    http.addHeader("Content-Type", "application/json");
    Serial.println("Sending POST request...");
    int httpResponseCode = http.POST(requestBody);
    // Only stream on 200: negative client errors and HTTP error codes are
    // also non-zero, and their bodies are not MP3 data.
    if (httpResponseCode == HTTP_CODE_OK) {
      Serial.println("Request successful. Streaming audio...");
      // Use the HTTP response stream directly
      WiFiClient* audioStream = http.getStreamPtr();
      // Stream and play audio
      StreamCopy copier(decoderTTS, *audioStream);
      copier.begin();
      unsigned long lastCopyTime = millis();  // Track the time of last copy to detect stalls
      unsigned long maxIdleTime = 5000;       // Maximum time without data copy before timing out
      Serial.println("Start playing TTS audio...");
      do {
        if (copier.copy()) {
          // Reset the last copy time whenever data is successfully copied
          lastCopyTime = millis();
        }
        // Check if the stream has been idle too long (without data copied)
        if (millis() - lastCopyTime > maxIdleTime) {
          Serial.println("Timeout while copying audio due to inactivity.");
          break;  // Exit the loop if we timeout
        }
      } while (copier.available());  // Continue while data is available to copy
      Serial.println("End playing TTS audio.");
      Serial.println("Audio playback complete.");
    } else {
      Serial.printf("HTTP Request failed: %d\n", httpResponseCode);
      http.end();  // Close connection on the failure path as well
      return false;
    }
    http.end();  // Close connection
    return true;
  } else {
    Serial.println("WiFi not connected.");
    return false;
  }
}
void loop() {
  yield();
}
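The stall check in the copy loop works because `millis() - lastCopyTime` is unsigned arithmetic, so it stays correct even when the millisecond counter wraps around. A minimal sketch of that logic in isolation, assuming a 32-bit counter as on the ESP32; the `IdleWatchdog` name and its members are hypothetical and not part of the sketch above or of AudioTools:

```cpp
#include <cstdint>

// Hypothetical helper: tracks the time of the last successful copy and
// reports when more than maxIdle milliseconds have passed without one.
struct IdleWatchdog {
  uint32_t last;     // timestamp of the last successful copy (ms)
  uint32_t maxIdle;  // allowed idle time (ms)

  // Call whenever data was copied; pass the current millis() value.
  void touch(uint32_t now) { last = now; }

  // Unsigned subtraction wraps modulo 2^32, so the elapsed time is
  // still correct after the counter overflows.
  bool expired(uint32_t now) const {
    return static_cast<uint32_t>(now - last) > maxIdle;
  }
};
```

In the sketch this would correspond to calling `touch(millis())` after a successful `copier.copy()` and breaking out of the loop when `expired(millis())` returns true.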
-
Cool,
I suggest replacing
AudioBoardStream i2s(NoBoard);
with
I2SStream i2s;
The AudioBoardStream is mainly intended for codecs that need to be configured via I2C.
-
Thank you for this -- it is just what I need. Unfortunately, it does something bad to my ESP32. The code builds, and seems to run as far as playing a brief snatch of sound, but then stops playing, claims Audio playback is complete, and crashes the ESP32 with a "corrupt heap" error. Any thoughts as to what is going wrong?
-
Please share it, or maybe post it on Pastebin or something.
-
Here's the function call to the ElevenLabs TTS API, which takes the text to convert as an argument and saves the MP3 audio file the API returns to an SD card:
// -----------------------------------------------------------------------------
// TTSElevenLabsAPI()
// Note: http, client and mp3FilePath are defined elsewhere in the full project.
// -----------------------------------------------------------------------------
bool TTSElevenLabsAPI(String text) {
  String elevenlabs_api_key = "*** REPLACE WITH YOUR ELEVENLABS API KEY ***";
  String voiceID = "*** REPLACE WITH YOUR ELEVENLABS VOICEID ***";
  String apiUrl = "https://api.elevenlabs.io/v1/text-to-speech/" + voiceID + "?output_format=mp3_44100_128";
  http.begin(client, apiUrl);
  http.addHeader("Content-Type", "application/json");
  http.addHeader("xi-api-key", elevenlabs_api_key);  // Use the correct ElevenLabs API key
  // text is inserted as-is, so it must not contain quotes, backslashes or newlines
  String payload = "{\"text\":\"" + text + "\", \"model_id\":\"eleven_multilingual_v2\"}";
  int httpResponseCode = http.POST(payload);
  if (httpResponseCode == 200) {
    File outputFile = SD.open(mp3FilePath, FILE_WRITE);
    if (!outputFile) {
      Serial.println("Failed to open file for writing.");
      http.end();
      return false;
    }
    http.writeToStream(&outputFile);  // Write the response body (MP3) to the file
    outputFile.close();
    http.end();
    return true;
  } else {
    Serial.printf("HTTP POST failed with code %d\n", httpResponseCode);
    String respBody = http.getString();
    Serial.println("Response body: " + respBody);
    http.end();
    return false;
  }
}
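One thing to watch in the function above: the payload is built by string concatenation, so a double quote, backslash or newline in `text` would produce invalid JSON. A minimal sketch of escaping the text first, written with `std::string` so it can be checked off-device; `jsonEscape` and `buildTtsPayload` are hypothetical names, not part of the ElevenLabs API or the original project. On the ESP32, building the payload with ArduinoJson (as the OpenAI sketch at the top does) should achieve the same thing.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: escape a string so it can be placed between
// double quotes inside a JSON document.
std::string jsonEscape(const std::string& in) {
  std::string out;
  out.reserve(in.size() + 8);
  for (unsigned char c : in) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\r': out += "\\r";  break;
      case '\t': out += "\\t";  break;
      default:
        if (c < 0x20) {
          // Remaining control characters become \u00XX
          char buf[7];
          std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out += static_cast<char>(c);  // UTF-8 bytes pass through unchanged
        }
    }
  }
  return out;
}

// Hypothetical helper: build the request body used in the function above.
std::string buildTtsPayload(const std::string& text, const std::string& modelId) {
  return "{\"text\":\"" + jsonEscape(text) +
         "\",\"model_id\":\"" + jsonEscape(modelId) + "\"}";
}
```

The result of `buildTtsPayload(...)` could then be passed to `http.POST()` via `.c_str()`.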
-
(For context, the project, with the full code into which I have slotted this function, is at https://github.com/astromikemerri/ESPGPT)
-
Wow, I didn't even know you replied. I am definitely jumping into this today! Thanks.
-
Ah! I appreciate your code snippet @astromikemerri. For some reason, the OpenAI TTS endpoint wasn't playing nicely with StreamCopy. I kept getting choppy/robotic audio output. But then I noticed the http.writeToStream in your example and that made things clear to me.
AudioInfo info(24000, 1, 16);
I2SStream i2s;                         // final output of decoded stream
MP3DecoderHelix codec;                 // MP3 decoder
EncodedAudioStream dec(&i2s, &codec);  // Decoding stream

void setup() {
  // setup, initialize, yada yada...
  int httpResponseCode = http.POST(requestBody);
  if (httpResponseCode == HTTP_CODE_OK) {
    Serial.println("Request successful. Streaming audio...");
    http.writeToStream(&dec);  // Have the http client write directly to the EncodedAudioStream
  }
  // Error handle, clean up, yada yada
  http.end();
}
The quality is quite good now. I even made a custom T-stream if I want to write the MP3 buffer to a file before sending it off to the EncodedAudioStream.
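The poster's T-stream is not shown, so the following is only a guess at the general idea: a sink that forwards every write to two destinations (for example a file and the decoder). `TeeSink` and `BufferSink` are hypothetical names, and `BufferSink` is a host-side stand-in for testing. To pass such an object to `http.writeToStream()` on the ESP32 it would presumably have to derive from the Arduino `Stream` class, which is left out here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tee: every block written is forwarded to two sinks.
// Both sink types only need a write(const uint8_t*, size_t) method.
template <typename Primary, typename Secondary>
class TeeSink {
 public:
  TeeSink(Primary& primary, Secondary& secondary)
      : primary_(primary), secondary_(secondary) {}

  size_t write(const uint8_t* data, size_t len) {
    // The primary sink (e.g. the decoder) decides how much is accepted;
    // only those bytes are mirrored to the secondary sink (e.g. a file).
    size_t accepted = primary_.write(data, len);
    secondary_.write(data, accepted);
    return accepted;
  }

 private:
  Primary& primary_;
  Secondary& secondary_;
};

// Stand-in sink that just collects the bytes in memory.
struct BufferSink {
  std::vector<uint8_t> bytes;
  size_t write(const uint8_t* data, size_t len) {
    bytes.insert(bytes.end(), data, data + len);
    return len;
  }
};
```

Mirroring only the accepted bytes keeps the saved copy identical to what was actually played, at the cost of ignoring short writes on the secondary sink.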
-
(I should say that one of the output options the API offers is a PCM stream, so in principle very little decoding is needed before passing it to the I2S.)
-
Is it a WAV file or just a PCM stream? If it is a PCM stream, you can just replace the File with an I2SStream.
You can easily double-check this with your files: if the content starts with RIFF, then it is a WAV file!
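The RIFF check can be sketched as a small function over the first bytes of the response or file. This is a minimal illustration, not AudioTools code; `looksLikeWav` is a hypothetical name. In addition to the `RIFF` tag mentioned above, it also checks for the `WAVE` tag at byte offset 8, which is where the RIFF container format puts it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns true if the buffer starts with a RIFF/WAVE
// header, i.e. the data is a WAV file rather than headerless PCM.
bool looksLikeWav(const uint8_t* data, size_t len) {
  return len >= 12 &&
         std::memcmp(data, "RIFF", 4) == 0 &&      // container tag
         std::memcmp(data + 8, "WAVE", 4) == 0;    // format tag after the 4-byte size
}
```

If this returns false for a PCM output format, the bytes are raw samples, and the I2S output has to be configured with the sample rate, bit depth and channel count that were requested from the API, since nothing in the stream announces them.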
-
PCM is one of the API's output options (as is WAV). I am sorry, I must be being incredibly stupid and frustrating to those more competent than me, but I have not been able to do this successfully. Could you bear to show me?