yeonsh/speak-easy

SpeakEasy

A desktop app for practicing foreign languages with AI. Speak, listen, and get corrections, with an offline-first design and optional cloud TTS for higher-quality voices. Includes a built-in web server for remote access via Tailscale or a local network.

Supports 16 languages — English, Spanish, French, Chinese, Japanese, German, Korean, Portuguese (BR), Italian, Russian, Arabic, Hindi, Turkish, Indonesian, Vietnamese, and Polish — with two practice modes (Free Talk and Scenario Role-Play) and an optional Corrections toggle. The entire interface is localized in all 16 languages.

Features

  • Free Talk — open conversation practice in the target language
  • Scenario Mode — 20+ real-world situations per language (cafe, hotel, dentist, etc.) with scenario picker
  • Native Language — choose any of the 16 supported languages as your native language; all UI, corrections, translations, and scenario descriptions adapt accordingly
  • Corrections Toggle — enable in either mode to get grammar/meaning feedback in your native language
  • Replay — re-listen to any message (yours or the assistant's) via TTS
  • Translate — one-tap translation of assistant messages into your native language, pre-fetched during TTS playback for instant display
  • Word Lookup — click any target-language word for instant dictionary lookup; select multiple words for contextual explanation with grammar and examples
  • Personal Dictionary — save looked-up words to your dictionary; browse by language, replay pronunciation, and delete entries
  • Sample Responses — get 2 suggested replies with native language translations
  • AI Tutor — speak or type in your native language to get translations into the target language (auto-detected)
  • CEFR Difficulty — set your proficiency level (A1–C2) per language; AI adapts vocabulary and grammar complexity accordingly
  • Speaking Courage — gamified scoring that tracks word count, turn count, complexity, and response speed across sessions
  • External LLM — use Gemini API or any OpenAI-compatible endpoint as an alternative to local LLM
  • Dual TTS Engine — Edge TTS (online, high quality) or Kokoro (offline, fully private); switchable in settings
  • Web Interface — access from any device on your network via built-in Axum web server (port 3456); ideal for remote practice over Tailscale
  • Streaming TTS — sentence-by-sentence audio with natural pauses between sentences
  • Voice Preview — hear a sample phrase when selecting a voice in settings
  • Language Reset — switching practice language resets conversation and returns to the initial screen
  • UI Localization — interface language follows your native language setting (all 16 languages)
  • Japanese support — MeCab-based kanji-to-kana conversion for accurate TTS pronunciation
  • CJK support — CJK-aware punctuation handling and word counting
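The CJK-aware word counting mentioned in the last item can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and the rule it applies (each kana or Han character counts as one unit, everything else is counted per run of alphanumeric characters) is an assumption about what "CJK-aware" means here. Korean is left to whitespace-style counting because it separates words with spaces.

```rust
/// Hypothetical helper: true for Hiragana, Katakana, and CJK Unified Ideographs.
/// The block ranges are coarse and also cover a few marks such as the katakana middle dot.
fn is_cjk_ideograph_or_kana(c: char) -> bool {
    matches!(
        c as u32,
        0x3040..=0x309F      // Hiragana
            | 0x30A0..=0x30FF // Katakana
            | 0x4E00..=0x9FFF // CJK Unified Ideographs
    )
}

/// Hypothetical helper: counts words, treating each kana/Han character as one unit
/// and each run of other alphanumeric characters as one word.
fn count_words(text: &str) -> usize {
    let mut count = 0;
    let mut in_word = false;
    for c in text.chars() {
        if is_cjk_ideograph_or_kana(c) {
            // Scripts written without spaces: every character is its own unit.
            count += 1;
            in_word = false;
        } else if c.is_alphanumeric() {
            // Start of a new space-delimited word.
            if !in_word {
                count += 1;
                in_word = true;
            }
        } else {
            // Whitespace or punctuation (ASCII or full-width) ends the current word.
            in_word = false;
        }
    }
    count
}
```

Under these rules, `count_words("I like 寿司")` gives 4 and `count_words("Hello, world!")` gives 2. A real implementation would more likely use a tokenizer such as MeCab for Japanese, since per-character counting overstates the number of words.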

Architecture

Built with Tauri 2 (Rust backend + React frontend) and three embedded AI engines (STT, LLM, and TTS), alongside a web server and a dictionary store:

Component | Purpose | Technology
STT | Speech-to-text | whisper.cpp via whisper-rs (bilingual detection)
LLM | Conversation | llama.cpp (llama-server sidecar) or Gemini API
TTS | Text-to-speech | Edge TTS (online) or Kokoro (offline)
Web | Remote access | Axum HTTP/WebSocket server with shared state
Dictionary | Word lookup cache + personal vocabulary | SQLite via rusqlite

Prerequisites

  • Rust (1.70+): https://rustup.rs
  • Node.js (18+): https://nodejs.org
  • espeak-ng — required for Kokoro TTS phonemization (the setup wizard can install it automatically):
    • macOS: brew install espeak-ng
    • Windows: downloaded automatically from the official release
    • Linux: sudo apt install espeak-ng or equivalent
  • MeCab (optional) — improves Japanese TTS pronunciation by converting kanji to kana:
    • macOS: brew install mecab mecab-ipadic
    • Linux: sudo apt install mecab libmecab-dev mecab-ipadic-utf8
  • Tauri 2 system dependencies: see the Tauri prerequisites documentation for your platform

Getting Started

# Clone the repo
git clone https://github.com/yeonsh/speak-easy.git
cd speak-easy
# Install dependencies
npm install
# Run in development mode (desktop + web server)
npm run serve

This builds the frontend and starts the Tauri app with an embedded web server on port 3456.

Remote Access via Tailscale

  1. Install Tailscale on both machines
  2. Run npm run serve on your home machine
  3. Access http://<tailscale-ip>:3456 from any device on your tailnet

The web interface shares all state with the desktop app: models load once and are available to both interfaces. The web server port is configurable via the SPEAKEASY_WEB_PORT environment variable.
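As a rough sketch of how a port override like this is typically resolved, the snippet below reads SPEAKEASY_WEB_PORT and falls back to 3456. The function names are hypothetical, and the fallback behavior for a missing, invalid, or zero value is an assumption; the app's actual handling may differ.

```rust
use std::env;

/// Default port stated in this README.
const DEFAULT_WEB_PORT: u16 = 3456;

/// Hypothetical helper: parses a raw port string, falling back to the default
/// when the value is missing, not a valid u16, or zero (assumed behavior).
fn parse_port(raw: Option<&str>) -> u16 {
    raw.and_then(|s| s.trim().parse::<u16>().ok())
        .filter(|&p| p != 0)
        .unwrap_or(DEFAULT_WEB_PORT)
}

/// Hypothetical helper: resolves the web server port from the environment.
fn web_port_from_env() -> u16 {
    parse_port(env::var("SPEAKEASY_WEB_PORT").ok().as_deref())
}
```

With this logic, launching with `SPEAKEASY_WEB_PORT=8080` set would serve on port 8080, while an unset or malformed value would keep the default of 3456.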

On first launch, the setup wizard will guide you through downloading all required components:

  1. Whisper model (~150 MB) — for speech recognition
  2. llama-server binary (~45 MB) — the LLM inference engine
  3. GGUF language model — pick one:
    • Qwen3 4B (~2.5 GB) — fast, good for casual practice
    • Qwen3 30B-A3B (~17 GB) — higher quality conversations
  4. espeak-ng — phonemizer for TTS (auto-install via Homebrew on macOS or MSI on Windows)
  5. Kokoro TTS — two files covering all languages:
    • Kokoro model (~325 MB) — the neural TTS engine
    • Voice pack (~28 MB) — 50+ voices across all supported languages

Everything downloads with one click. All files are stored in ~/.speakeasy/.

Building for Production

npm run tauri build

The output is in src-tauri/target/release/bundle/.

Project Structure

src/ # React frontend
 components/ # UI: ChatView, MicButton, SetupWizard, etc.
 hooks/ # useLlm, useStt, useTts, useAudioRecorder
 lib/ # Types, per-language prompts, i18n, backend adapter
src-tauri/src/ # Rust backend
 lib.rs # Tauri command registration
 llm.rs # llama-server lifecycle management
 chat.rs # Streaming chat + TTS pipeline, explain/suggest/lookup commands
 gemini.rs # Gemini API integration (streaming + non-streaming)
 dictionary.rs # SQLite dictionary cache + personal vocabulary store
 courage.rs # Speaking courage scoring algorithm and trend analysis
 session.rs # Session persistence and review generation
 stt.rs # Whisper transcription with bilingual detection
 tts.rs # TTS engine dispatch (Kokoro/Edge), text cleaning, sentence splitting
 edge_tts.rs # Edge TTS via msedge-tts (online)
 downloads.rs # Model download with progress events
 settings.rs # Settings persistence
 web.rs # Axum web server (REST API, WebSocket, static files)
 event_bus.rs # Broadcast channel for Tauri-to-WebSocket event bridging
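To illustrate the sentence splitting that tts.rs is described as performing for sentence-by-sentence streaming, here is a minimal sketch. It is not the project's code: the function names and the set of terminators are assumptions, and it deliberately ignores harder cases such as decimals ("3.14") and abbreviations.

```rust
/// Hypothetical helper: true for ASCII and full-width sentence-ending punctuation.
fn is_terminator(c: char) -> bool {
    matches!(c, '.' | '!' | '?' | '。' | '!' | '?')
}

/// Hypothetical helper: splits text into trimmed sentences, keeping the
/// terminating punctuation and treating runs like "..." as one terminator.
fn split_sentences(text: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();
    let mut chars = text.chars().peekable();

    while let Some(c) = chars.next() {
        current.push(c);
        // Only cut when the punctuation run has ended.
        let next_is_terminator = chars.peek().map_or(false, |&n| is_terminator(n));
        if is_terminator(c) && !next_is_terminator {
            let trimmed = current.trim();
            if !trimmed.is_empty() {
                sentences.push(trimmed.to_string());
            }
            current.clear();
        }
    }

    // Keep any trailing text that has no terminator.
    let rest = current.trim();
    if !rest.is_empty() {
        sentences.push(rest.to_string());
    }
    sentences
}
```

Each returned sentence could then be sent to the TTS engine in turn, so playback of the first sentence can start before the rest of the reply has been synthesized.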

Data Directories

Path | Contents
~/.speakeasy/models/ | Whisper models (.bin) and LLM models (.gguf)
~/.speakeasy/voices/ | Kokoro TTS model (kokoro-v1.0.onnx) and voice pack (voices-v1.0.bin)
~/.speakeasy/bin/ | Downloaded llama-server binary
~/.speakeasy/settings.json | User preferences (persisted across sessions)
~/.speakeasy/dictionary.db | SQLite cache for word lookups, personal vocabulary, sessions, and courage scores
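The layout above can be expressed as a small path helper. This is an illustrative sketch only: the struct and function names are hypothetical, and the caller is assumed to supply the user's home directory, since how the app locates it is not stated here.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical struct mirroring the data directory layout listed above.
struct DataPaths {
    models: PathBuf,
    voices: PathBuf,
    bin: PathBuf,
    settings: PathBuf,
    dictionary: PathBuf,
}

/// Hypothetical helper: builds every documented path under `<home>/.speakeasy`.
fn data_paths(home: &Path) -> DataPaths {
    let root = home.join(".speakeasy");
    DataPaths {
        models: root.join("models"),
        voices: root.join("voices"),
        bin: root.join("bin"),
        settings: root.join("settings.json"),
        dictionary: root.join("dictionary.db"),
    }
}
```

Keeping all paths in one place like this makes it easy to back up or delete the whole `~/.speakeasy/` tree, which is the only location this README says the app writes to.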

License

See LICENSE.
