A partner in China envisioned English-language learning built into toys: children talking naturally to a companion that reads with them, makes up silly tales, and turns practice into a game. They needed someone to take that vision from a whiteboard sketch to a shippable product that could pass App Store review.
We split the experience into three cooperating services so each could evolve independently: a Flutter client for the family-facing surface; a FastAPI + Firestore control plane for accounts, conversations, and prompt configuration; and a LiveKit Agents worker running streaming speech-to-text, Vertex-backed LLM reasoning, and text-to-speech. The worker shares the same profile and conversation state as the API, so behavior stays consistent across text and voice.
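The shared-state idea can be illustrated with a minimal Python sketch. It is not Mozi's actual code: `SharedStore` is a hypothetical in-memory stand-in for the Firestore-backed state, and `Profile`, `build_system_prompt`, `handle_text_message`, and `handle_voice_transcript` are names invented for illustration. The point is only that the text path and the voice path read the same profile and append to the same conversation, so both derive the same prompt.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Profile:
    child_name: str
    english_level: str  # hypothetical field, e.g. "beginner"


@dataclass
class Turn:
    channel: str  # "text" or "voice"
    role: str     # "user" or "assistant"
    text: str


class SharedStore:
    """Hypothetical in-memory stand-in for the shared Firestore state."""

    def __init__(self) -> None:
        self._profiles: dict[str, Profile] = {}
        self._turns: dict[str, list[Turn]] = {}

    def put_profile(self, user_id: str, profile: Profile) -> None:
        self._profiles[user_id] = profile

    def get_profile(self, user_id: str) -> Profile:
        return self._profiles[user_id]

    def append_turn(self, user_id: str, turn: Turn) -> None:
        self._turns.setdefault(user_id, []).append(turn)

    def history(self, user_id: str) -> list[Turn]:
        # Return a copy so callers cannot mutate the stored conversation.
        return list(self._turns.get(user_id, []))


def build_system_prompt(profile: Profile) -> str:
    """Derive the prompt from the profile; both paths call this one function."""
    return (
        f"You are a playful English companion for {profile.child_name} "
        f"({profile.english_level} level)."
    )


def handle_text_message(store: SharedStore, user_id: str, text: str) -> str:
    """Text path, roughly what the API service would do before calling the LLM."""
    prompt = build_system_prompt(store.get_profile(user_id))
    store.append_turn(user_id, Turn("text", "user", text))
    return prompt


def handle_voice_transcript(store: SharedStore, user_id: str, transcript: str) -> str:
    """Voice path, roughly what the agent worker would do after speech-to-text."""
    prompt = build_system_prompt(store.get_profile(user_id))
    store.append_turn(user_id, Turn("voice", "user", transcript))
    return prompt
```

Because both handlers go through one store and one prompt builder, a change to a child's profile or prompt configuration reaches text and voice at the same time, and either channel can see what was said on the other.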
After nine months of design, prototyping, and hardening, Mozi was submitted to Apple's App Store — a working voice companion that families can hold in their hands. The voice layer is hardware-portable: when the partner is ready to put it inside toys, the same agent attaches to a LiveKit client or a thin bridge without a rewrite.



