🎲 Reachy DM — a whimsical AI tabletop dungeon master you play out loud
Track: Thousand Token Wood (whimsical). Reachy DM is an AI Game Master for a Fallout tabletop RPG (Modiphius 2d20). You talk to it; it narrates vivid scenes, voices the NPCs in distinct designed voices, rolls the dice, reads your physical character sheet through a camera, sets the mood with your room's smart lights, and remembers your party — all embodied by a Reachy Mini robot. Every model in the stack is open-weights and under the 32B cap.
Code: github.com/olaservo/reachy_mini_conversation_app (branch cascade-integration)
The full experience is a local hardware rig (robot · smart lights · companion screen), so the demo video above is the showcase. This page explains how it works.
How it works — an all-Qwen cascade
A voice loop (Whisper STT · Silero VAD) drives a Qwen brain that orchestrates tools and speaks back in designed character voices, reading the table with vision when you show it something.
| Stage | Model | Where |
|---|---|---|
| Speech-to-text | Whisper | OpenAI API |
| Voice activity | Silero VAD | local (CPU) |
| DM brain (logic + tool-calling) | Qwen3-30B-A3B-Instruct-2507 (FP8) | Modal (1×H100) |
| Character voices (11 designed) | Qwen3-TTS-12Hz-1.7B | Modal (L4) |
| Vision ("read the table") | Qwen3-VL-8B-Instruct | Modal (L40S) |
The brain calls tools over MCP: dice / character sheet / player choice render as live
Pip-Boy widgets (an MCP-Apps custom UI), plus per-character voices (speak_as), durable
memory, the camera, robot motion/expression, and Home Assistant smart lighting.
Best Use of Modal
The three GPU models — brain (H100), character-voice TTS (L4), vision (L40S) — are all served on Modal (serverless vLLM + a custom FastAPI TTS server). Modal is the runtime compute backbone of the whole experience.
Badges
- Best Agent — a genuine multi-step agent choosing among dice, sheets, choices, voices, vision, memory, lighting, and robot motion, all under the 32B cap.
- Best Demo — a robot Game Master that voices NPCs, rolls dice on screen, dims your lights for combat, and reads your printed character sheet with a camera.
- Off Brand — tool calls render as a custom Pip-Boy UI well past the default Gradio look.
Models & constraints
All-Qwen, open-weights, each model < 32B total (30B brain · 8B vision · 1.7B TTS), plus Whisper/Silero for speech. Built on a fork of the Pollen Robotics Reachy Mini conversation app.
Disclaimer
Reachy DM is a personal, non-commercial creative project built for the Build Small hackathon. It runs Fallout: The Roleplaying Game (Modiphius 2d20 system) using rules and content from a copy I own, as a fan would at their own table. It is not affiliated with, endorsed by, or sponsored by Bethesda Softworks or Modiphius Entertainment. Fallout is a trademark of Bethesda Softworks; Fallout: The Roleplaying Game is published by Modiphius Entertainment. All rights to those properties belong to their respective owners.