Releases: THU-MAIC/OpenMAIC
v0.2.2
Features
- MAIC Editor (v0), slide editing surface: A new Pro Mode toggle turns any generated slide into an editable canvas where you can select and edit text, insert text boxes and images, navigate and reorder slides from a thumbnail rail, and use history-aware undo/redo. This is the first surface of the broader MAIC Editor framework (gated behind NEXT_PUBLIC_MAIC_EDITOR_ENABLED) #615
- Editable outline before generation: The streaming course outline now morphs into an inline editor where you can review, edit, reorder, and add or delete scenes and bullet points, then confirm to generate the full course, so you catch structure problems before spending a full generation #558
- Offline-ready classroom export — Exported teaching resource packs and classroom ZIPs now inline external assets so interactive pages open fully offline, even when copied to another machine #613
- Add Claude Opus 4.8 and MiniMax M3 to the default model registry #635
- Add Gemini 3.5 Flash #584
- Add Xiaomi MiMo Token Plan support #578 (by @xuruiray)
- Add web search providers: Brave and Baidu #42 (by @YizukiAme), Bocha #524, and MiniMax #634
- Add Azure STT (Fast Transcription) as a speech-to-text provider #175 (by @ismailariyan)
- Add HappyHorse video adapter #509 (by @xuruiray) and Lemonade as an LLM provider #508
- Add OpenAI image generation environment-variable fallback #510 (by @xuruiray)
- Add generated-video manifest references so produced videos survive export/import #540
- Add Traditional Chinese (zh-TW) #517 (by @alvinets) and Brazilian Portuguese (pt-BR) #602 (by @hemanz) interface languages
Bug Fixes
- Server-configured providers are now admin-managed — providers set via server environment can no longer be overridden by client settings, preventing base-URL/key tampering on shared deployments #624; fixes server API-key fallback when the client echoes the provider base URL #533 (by @LooThao); auto-selects the server LLM model #577 (by @xuruiray); and enforces a "usable provider ⇒ concrete model" invariant #581
- Keep interactive scenes alive across remounts with an iframe keep-alive pool, so interactive content no longer reloads when navigating #629
- Restore the orchestration director's ability to answer the user's question and stop runaway turns (removed maxTurns) #599; restore agent attribution in the director summary #554 (by @ashutoshrana)
- Skip shapes with malformed SVG paths instead of aborting the whole PPTX export #505; prevent memory leaks and silent export failures #552 (by @arnow117)
- Add defensive checks in ChartElement to prevent crashes on malformed chart data #588 (by @tongshu2023)
- Let whiteboard code elements capture internal scroll/drag instead of the canvas #544 (by @cosarah)
- Preserve discussion triggers when importing classroom ZIPs #557 (by @cosarah)
- Fix generated video thumbnails #546
- Gate media snippets in the interactive-outlines prompt template #628
- Hide the unsupported MiniMax Hailuo fast text-to-video model #632; remove weak Lemonade recommended models #567 (by @cosarah)
- Fix Haiku 4.5 thinking controls #501
- Use an ESM import for TypeScript in the pptxgenjs rollup config #616
- Align zh-TW provider names with the rest of the locale set
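The admin-managed provider fix above (#624) comes down to a precedence rule: once the server holds credentials for a provider, client-supplied values are ignored. A minimal sketch of that rule follows, assuming hypothetical names (`ProviderSettings`, `resolveProvider`, `defaultBaseUrl`) that are not taken from the OpenMAIC codebase.

```typescript
// Hypothetical shapes for illustration; the project's real types are not shown in these notes.
interface ProviderSettings {
  baseUrl?: string;
  apiKey?: string;
}

interface ResolvedProvider {
  baseUrl: string;
  apiKey: string;
  source: "server" | "client";
}

function resolveProvider(
  defaultBaseUrl: string,
  server: ProviderSettings | undefined,
  client: ProviderSettings,
): ResolvedProvider {
  // Assumption: a provider counts as server-configured once the server holds a key for it.
  if (server?.apiKey) {
    // Client baseUrl/apiKey are ignored entirely, so a request cannot
    // point the server's key at an endpoint chosen by the client.
    return {
      baseUrl: server.baseUrl ?? defaultBaseUrl,
      apiKey: server.apiKey,
      source: "server",
    };
  }
  if (!client.apiKey) {
    throw new Error("No usable provider credentials");
  }
  return {
    baseUrl: client.baseUrl ?? defaultBaseUrl,
    apiKey: client.apiKey,
    source: "client",
  };
}
```

With this shape, a client that sends its own base URL for a server-configured provider still gets the server's endpoint and key back.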
Other Changes
- Add a Fumadocs-based documentation site #622
- Add a VoxCPM2 setup guide and tighten the README section #500 #502
- Fix the commercial licensing contact email #604 (by @DHQ1204)
Full Changelog: v0.2.1...v0.2.2
Contributors
- @arnow117
- @hemanz
- @xuruiray
- @ashutoshrana
- @alvinets
- @ismailariyan
- @YizukiAme
- @cosarah
- @tongshu2023
- @DHQ1204
- @LooThao
v0.2.1
Features
- VoxCPM2 TTS provider with voice cloning — OpenMAIC adapts to user-managed VoxCPM backends (vLLM-Omni, Nano-VLLM, official Python API). Clone any voice from a reference audio clip you upload or record in the browser, or let Auto Voice generate a fitting voice from each agent's persona at synthesis time. Voice profiles are stored locally to keep the serverless setup model. The Agent Bar exposes a searchable, previewable voice picker that draws from the global VoxCPM voice pool #496
- Per-model thinking configuration: First-class metadata for each model's reasoning capability (effort levels, on/off toggle, adjustable budget, or fixed thinking) flows through chat and all generation paths and is mapped to the right provider-specific request fields (Anthropic thinking, OpenAI reasoning, etc.). The model selector becomes a unified provider/model/thinking popover with compact search and a much smaller toolbar footprint #494
- End-of-course completion page with persistent quiz state: When the outline is fully materialized, students see a course-complete view with quiz score card, scene-type stat cards, and a (motion-respecting) confetti celebration. Quiz answers persist on submit and grading results persist on completion, so navigating away and back restores the reviewing state with AI feedback intact instead of resetting #484
- Add latest released models including GPT-5.5, DeepSeek-V4 (-pro, -flash), Xiaomi MiMo (mimo-v2.5-pro, mimo-v2.5), Tencent Hy3, and OpenRouter as a multi-provider gateway #481 #487
- Add OpenAI image generation (GPT-Image-2) as a media provider #481
- Refresh built-in model registries across Anthropic, DeepSeek, Kimi, Qwen, MiniMax, Grok, OpenAI, GLM, SiliconFlow, and Ollama; persisted local settings now rehydrate in registry order so newly curated lists appear consistent without clearing state #481
- Add inline search for recent classrooms on the home page with deferred filtering by name and description, keyboard-driven open/clear/collapse #476
- Add Deep-Interactive badge on classroom thumbnails for sessions generated with Interactive Mode #478
- Replace always-included media instruction blocks in generation prompts with conditional snippet includes gated on imageEnabled/videoEnabled: disabled capabilities are removed from the prompt entirely instead of relying on negative-override directives the model often ignored #490 (by @YizukiAme)
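The conditional snippet includes from #490 can be pictured as follows. This is only a sketch: `imageEnabled` and `videoEnabled` come from the note above, while `buildPrompt`, `SNIPPETS`, and the snippet wording are placeholders invented for illustration.

```typescript
interface MediaFlags {
  imageEnabled: boolean;
  videoEnabled: boolean;
}

// Placeholder snippet text (an assumption); real templates would live in separate files.
const SNIPPETS = {
  image: "You may request images with an image element.",
  video: "You may request short video clips with a video element.",
} as const;

function buildPrompt(base: string, flags: MediaFlags): string {
  const parts: string[] = [base];
  // A disabled capability contributes no text at all, so the model never
  // sees an instruction that a later "do not use X" line has to cancel.
  if (flags.imageEnabled) parts.push(SNIPPETS.image);
  if (flags.videoEnabled) parts.push(SNIPPETS.video);
  return parts.join("\n\n");
}
```

The design point is that omission is more reliable than negation: with both flags off, the prompt contains no mention of media.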
Bug Fixes
- Fix language drift between outline and scene generation by unifying the languageDirective across the pipeline so the same target language flows from outline planning through every per-scene call #474
Other Changes
- Refactor whiteboard role prompts to file-based markdown templates and add a geometry-conflict detector (overlap, line-through-bbox, canvas clipping) that surfaces problems back to the model. Eval (flash, repeat 3, gemini-3.1-pro scorer) shows overall quality 5.4 → 6.1 and overlap 6.3 → 8.1 from prompt + detector alone #485
- Migrate orchestration prompt builders (buildStructuredPrompt, buildDirectorPrompt, buildPBLSystemPrompt) from inline TS template literals to file-based markdown templates under lib/prompts/, sharing the loader infrastructure with the generation pipeline. prompt-builder.ts shrinks from 890 to 314 lines; future content tweaks land as markdown edits #459
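For readers curious what a geometry-conflict detector like the one in #485 checks, here is a rough sketch covering two of the three named conflict kinds (overlap and canvas clipping; line-through-bbox is left out). The `Box` and `Conflict` types and the `detectConflicts` function are assumptions for illustration, not the project's implementation.

```typescript
// Hypothetical element shape: an axis-aligned bounding box on the whiteboard.
interface Box {
  id: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Conflict {
  kind: "overlap" | "clipping";
  ids: string[];
}

// Two boxes overlap when their extents intersect on both axes.
function overlaps(a: Box, b: Box): boolean {
  return (
    a.x < b.x + b.width &&
    b.x < a.x + a.width &&
    a.y < b.y + b.height &&
    b.y < a.y + a.height
  );
}

function detectConflicts(boxes: Box[], canvasWidth: number, canvasHeight: number): Conflict[] {
  const conflicts: Conflict[] = [];
  boxes.forEach((a, i) => {
    // Clipping: any part of the box falls outside the canvas.
    if (a.x < 0 || a.y < 0 || a.x + a.width > canvasWidth || a.y + a.height > canvasHeight) {
      conflicts.push({ kind: "clipping", ids: [a.id] });
    }
    // Compare each pair once by only looking at later boxes.
    for (const b of boxes.slice(i + 1)) {
      if (overlaps(a, b)) conflicts.push({ kind: "overlap", ids: [a.id, b.id] });
    }
  });
  return conflicts;
}
```

A list like this could then be serialized and fed back to the model as the "problems" the note mentions.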
Full Changelog: v0.2.0...v0.2.1
v0.2.0
[0.2.0] - 2026-04-20
Features
- Deep Interactive Mode — Generate hands-on interactive scenes (3D visualization, simulation, game, mind map/diagram, online programming) with an AI teacher who operates the UI to guide students. Fully responsive across desktop, tablet, and mobile #461
- Add code element support on the whiteboard — AI agents can write, display, and reference runnable code during lessons #385 (by @cosarah)
- Add Arabic (ar-SA) interface language #431 (by @YizukiAme)
- Add MinerU Cloud API as a PDF parsing provider, with a dedicated settings UI #438
- Add latest OpenAI models to the default config #416 (by @donghch)
- Add GLM-5.1 and GLM-5V-Turbo to GLM preset models #437
- Add international base URL shortcuts for GLM, Kimi, and MiniMax in provider settings #449
- Add anti-framing security headers (X-Frame-Options + CSP frame-ancestors) with an optional ALLOWED_FRAME_ANCESTORS override #430 (by @YizukiAme)
- Add i18n key alignment check to CI so missing or extra translation keys fail the build #447 (by @KanameMadoka520)
- Add whiteboard layout quality eval harness and unify it with the outline-language harness #425 #453
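The anti-framing headers from #430 could be assembled roughly as below. This is a sketch under assumptions: the function name `buildFrameHeaders`, the comma- or space-separated format of the ALLOWED_FRAME_ANCESTORS value, and the choice to drop X-Frame-Options when an override is present are all illustrative and may differ from what the project does.

```typescript
function buildFrameHeaders(allowedFrameAncestors?: string): Record<string, string> {
  // Assumption: the override is a comma- or whitespace-separated list of origins.
  const origins = (allowedFrameAncestors ?? "")
    .split(/[\s,]+/)
    .filter((origin) => origin.length > 0);

  if (origins.length === 0) {
    // Default: only same-origin framing is allowed, stated in both headers.
    return {
      "X-Frame-Options": "SAMEORIGIN",
      "Content-Security-Policy": "frame-ancestors 'self'",
    };
  }

  // X-Frame-Options cannot list multiple allowed origins, so this sketch
  // relies on CSP frame-ancestors alone once an override is configured.
  return {
    "Content-Security-Policy": `frame-ancestors 'self' ${origins.join(" ")}`,
  };
}
```

For example, `buildFrameHeaders(process.env.ALLOWED_FRAME_ANCESTORS)` would yield the same-origin defaults when the variable is unset.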
Bug Fixes
- Fix classroom ZIP export to use the latest classroom name from IndexedDB #435
- Fix spotlight cutout for text elements and add element-content variant for image/video #457
Other Changes
- Renew the README with Deep Interactive Mode showcase and visual assets #463 (by @Shirokumaaaa)
- Update Discord invite links across README, CONTRIBUTING, and issue templates
Contributors
donghch, KanameMadoka520, and 3 other contributors
v0.1.1
[0.1.1] - 2026-04-14
Features
- Add inline language inference for outline and PBL generation, replacing manual language selector #412 (by @cosarah)
- Add ACCESS_CODE site-level authentication for shared deployments #411
- Add classroom export and import as ZIP #418
- Add custom OpenAI-compatible TTS/ASR provider support #409
- Add Ollama as built-in provider with keyless activation #94 (by @f1rep0wr)
- Add Japanese (ja-JP) locale #365 (by @YizukiAme)
- Add Russian (ru-RU) locale #261 (by @maximvalerevich)
- Migrate i18n infrastructure to i18next framework #331 (by @cosarah)
- Add MiniMax provider support #182 (by @Hi-Jiajun)
- Add Doubao TTS 2.0 (Volcengine) provider #283
- Add configurable model selection for TTS and ASR #108 (by @ShaojieLiu)
- Add context-aware Tavily web search when PDF is uploaded #258 (by @nkmohit)
- Add course rename #58 (by @YizukiAme)
- Add end-to-end generation happy path test #405
Bug Fixes
- Fix DNS rebinding bypass in SSRF validation #386 (by @YizukiAme)
- Add ALLOW_LOCAL_NETWORKS env var for self-hosted deployments #366
- Fix custom provider baseUrl not persisting on creation #417 (by @YizukiAme)
- Hide Ollama from model selector when not configured #420 (by @cosarah)
- Fix agent configs not persisting in server-generated classrooms #336 (by @YizukiAme)
- Fix action filtering logic and add safety improvements #163 (by @zky001)
- Fix modifier-key combos triggering single-key shortcuts #359 (by @YizukiAme)
- Fix agent mode selection for conditionally set generatedAgentConfigs #373 (by @YizukiAme)
- Unify TTS model selection to per-provider and fix ElevenLabs model_id #326
- Allow model-level test connection without client-side API key #309 (by @cosarah)
- Add structured request context to all API error logs #337 (by @YizukiAme)
- Fix breathing bar background color in roundtable #307
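The modifier-key fix above (#359) addresses a common pitfall: a handler for a bare key such as C also fires on Ctrl+C. One way to guard against that is sketched here; `KeyLike`, `resolveShortcut`, and the action names are hypothetical, and whether Shift should also be excluded is left open.

```typescript
// Minimal subset of a keyboard event, declared locally so the sketch needs no DOM types.
interface KeyLike {
  key: string;
  ctrlKey: boolean;
  metaKey: boolean;
  altKey: boolean;
}

function resolveShortcut(
  event: KeyLike,
  shortcuts: ReadonlyMap<string, string>,
): string | undefined {
  // Ctrl/Cmd/Alt combos belong to the browser or OS (copy, save, and so on),
  // so they must never fall through to a single-key shortcut.
  if (event.ctrlKey || event.metaKey || event.altKey) return undefined;
  return shortcuts.get(event.key.toLowerCase());
}

// Hypothetical bindings for illustration only.
const exampleShortcuts = new Map<string, string>([
  ["m", "toggle-mute"],
  ["c", "toggle-captions"],
]);
```

With these bindings, a plain M press resolves to an action while Ctrl+C resolves to nothing and is left to the browser.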
Other Changes
- Add missing Ollama and Doubao provider names for ru-RU #389 (by @cosarah)
- Update Ollama logo to official version #400 (by @cosarah)
- Remove deprecated Gemini 3 Pro Preview model #142 (by @Orinameh)
- Update expired Discord invite link
- Create SECURITY.md #281 (by @fai1424)
New Contributors
@f1rep0wr, @maximvalerevich, @Hi-Jiajun, @cosarah, @zky001, @Orinameh, @fai1424
Contributors
- @Orinameh
- @zky001
- @ShaojieLiu
- @f1rep0wr
- @YizukiAme
- @Hi-Jiajun
- @cosarah
- @nkmohit
- @fai1424
- @maximvalerevich
v0.1.0
The first tagged release of OpenMAIC, including all improvements since the initial open-source launch.
Highlights
- Discussion TTS — Voice playback during discussion phase with per-agent voice assignment, supporting all TTS providers including browser-native #211
- Immersive Mode — Full-screen view with speech bubbles, auto-hide controls, and keyboard navigation #195 (by @YizukiAme)
- Discussion buffer-level pause — Freeze text reveal without aborting the AI stream #129 (by @YizukiAme)
- Keyboard shortcuts — Comprehensive roundtable controls: T/V/Esc/Space/M/S/C #256 (by @YizukiAme)
- Whiteboard enhancements — Pan, zoom, auto-fit #31, history and auto-save #40 (by @YizukiAme)
- New providers — ElevenLabs TTS #134 (by @nkmohit), Grok/xAI for LLM, image, and video #113 (by @KanameMadoka520)
- Server-side generation — Media and TTS generation on the server #75 (by @cosarah)
- 1.25x playback speed #131 (by @YizukiAme)
- OpenClaw integration — Generate classrooms from Feishu, Slack, Telegram, and 20+ messaging apps #4 (by @cosarah)
- Vercel one-click deploy #2 (by @cosarah)
Security
- Fix SSRF and credential forwarding via client-supplied baseUrl #30 (by @Wing900)
- Use resolved API key in chat route instead of client-sent key #221
Testing
New Contributors
@YizukiAme, @nkmohit, @KanameMadoka520, @Wing900, @Bortlesboat, @JokerQianwei, @humingfeng, @tsinglua, @mehulmpt, @ShaojieLiu, @Rowtion
Contributors
- @mehulmpt
- @ShaojieLiu
- @humingfeng
- @Rowtion
- @JokerQianwei
- @KanameMadoka520
- @YizukiAme
- @cosarah
- @nkmohit
- @Wing900
- @Bortlesboat
- @tsinglua