Offline voice command recognition on the Seeed Xiao ESP32-S3 Sense with the Round Display expansion. Say the wake word Jarvis, then speak a command — the recognized phrase appears as text on the 240×240 round LCD.
Built with ESP-IDF 5.5.x, ESP-SR (WakeNet + MultiNet6), and LVGL.
| Item | Notes |
|---|---|
| Seeed Xiao ESP32-S3 Sense | 8 MB flash, 8 MB PSRAM, onboard PDM microphone |
| Seeed Round Display for XIAO | GC9A01 240×240 SPI display |
| USB-C cable | Used for power, flash, and serial monitor |
Before powering on:
- Stack the Sense expansion board on the Xiao (provides the microphone).
- Stack the Round Display on top.
- Set the Round Display power switch to ON (backlight and power path).
- ESP-IDF v5.5.x (tested with 5.5.4)
- VS Code or Cursor with the Espressif ESP-IDF extension
- Windows: follow the ESP-IDF Windows setup guide, or use the extension’s guided installer
Pick one of these paths:
Option A — ESP-IDF Tools Installer (Windows, recommended for beginners)
- Follow the Windows ESP-IDF installation guide.
- Install ESP-IDF v5.5.x and the esp32s3 target when prompted.
- Note the install path (e.g. C:\esp\v5.5.4\esp-idf).
Option B — Configure via VS Code / Cursor extension
- Install the ESP-IDF extension (espressif.esp-idf).
- Open the Command Palette (Ctrl+Shift+P).
- Run ESP-IDF: Configure ESP-IDF extension.
- Choose Express or Advanced install.
- Select ESP-IDF v5.5.x and target esp32s3.
Extension install guide: vscode-esp-idf-extension tutorial
In VS Code or Cursor:
- Open Extensions (Ctrl+Shift+X).
- Search for Espressif IDF.
- Install ESP-IDF by Espressif Systems.
If prompted, also install the recommended extensions from .vscode/extensions.json.
File → Open Folder → esp32-multinet-demo
- Command Palette → ESP-IDF: Select ESP-IDF Path (or ESP-IDF: Configure ESP-IDF extension).
- Set idf.currentSetup to your ESP-IDF directory, for example: C:\esp\v5.5.4\esp-idf
- Confirm the chip target is esp32s3:
  - Command Palette → ESP-IDF: Set Espressif Device Target → esp32s3
Workspace settings in .vscode/settings.json already set IDF_TARGET to esp32s3. Update idf.currentSetup to match your machine.
- Plug in the Xiao via USB.
- In the ESP-IDF status bar, click the port item and choose your board (e.g. COM4), or leave it on detect.
Important: Logs use USB Serial/JTAG, not the external UART pins. GPIO 43/44 are used by the round display. This is configured in sdkconfig.defaults.
- Click Build (hammer icon) in the ESP-IDF status bar, or
- Command Palette → ESP-IDF: Build your project
Or use Terminal → Run Task → ESP-IDF: Build.
    .\scripts\build.ps1

Or manually:

    . C:\esp\v5.5.4\esp-idf\export.ps1   # adjust to your ESP-IDF path
    $env:PYTHONIOENCODING = "utf-8"      # required on Windows for esp-sr model packaging
    cd path\to\esp32-multinet-demo
    idf.py set-target esp32s3
    idf.py build

The build:
- Downloads managed components (esp-sr, LVGL, GC9A01 driver) via the ESP Component Manager.
- Packages ~5 MB speech models into build/srmodels/srmodels.bin. The first build can take several minutes.
- Board defaults come from sdkconfig.defaults. If configuration looks wrong, delete sdkconfig and rebuild.
You do not need idf.py menuconfig for a standard run.
- Select the correct COM port in the status bar.
- Click Flash (lightning icon).
- Click Monitor (plug icon), or use ESP-IDF: Monitor your device.
Or Terminal → Run Task → ESP-IDF: Build, Flash and Monitor.
    .\scripts\flash-monitor.ps1 -Port COM4   # replace COM4 with your port

Or:

    . C:\esp\v5.5.4\esp-idf\export.ps1
    $env:PYTHONIOENCODING = "utf-8"
    idf.py -p COM4 flash monitor

idf.py flash writes the bootloader, the partition table, the application, and the srmodels partition (required for speech recognition).
Press Ctrl+] to exit the serial monitor.
Boot → Display shows: Say "Jarvis"
Say "Jarvis" → Display shows: Jarvis! then Listening...
Say a command → Display shows the recognized phrase
Silence → Returns to: Say "Jarvis"
| Step | Say | Display |
|---|---|---|
| Wake word | Jarvis | Jarvis! → Listening... |
| Command | See list below | Matched phrase |
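The flow above can be pictured as a small two-state machine. The following is a host-side sketch in C, not the firmware's actual code: the names ui_state_t, sr_event_t, and ui_step are hypothetical, and it assumes the device keeps listening after a command until the silence timeout, which this README does not state explicitly.

```c
#include <stddef.h>

/* Hypothetical names for illustration; the firmware may be structured differently. */
typedef enum { UI_WAIT_WAKE, UI_LISTENING } ui_state_t;
typedef enum { EV_WAKE_WORD, EV_COMMAND, EV_TIMEOUT } sr_event_t;

/*
 * Returns the next state. When the display text should change, *text is set
 * to the new string; otherwise *text is left untouched.
 */
ui_state_t ui_step(ui_state_t state, sr_event_t ev,
                   const char *phrase, const char **text)
{
    switch (state) {
    case UI_WAIT_WAKE:
        if (ev == EV_WAKE_WORD) {
            *text = "Jarvis!";          /* the display then moves on to "Listening..." */
            return UI_LISTENING;
        }
        break;                          /* commands are ignored before the wake word */
    case UI_LISTENING:
        if (ev == EV_COMMAND && phrase != NULL) {
            *text = phrase;             /* show the matched phrase */
            return UI_LISTENING;        /* assumption: keep listening until timeout */
        }
        if (ev == EV_TIMEOUT) {
            *text = "Say \"Jarvis\"";   /* silence returns to the idle prompt */
            return UI_WAIT_WAKE;
        }
        break;
    }
    return state;                       /* any other event leaves the state unchanged */
}
```

Feeding the events wake word, command, timeout through ui_step in that order walks through the same sequence as the table: Jarvis!, the matched phrase, then back to Say "Jarvis".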
Built-in commands (defined in main/speech_recognition.c):
| # | Phrase to speak |
|---|---|
| 1 | turn on the light |
| 2 | turn off the light |
| 3 | say hello |
| 4 | what is the weather |
| 5 | what is the time |
Speak clearly, about 30 cm from the microphone. The USB serial log shows wake and command events for debugging.
To add or change commands, edit register_speech_commands() in main/speech_recognition.c, then rebuild and reflash.
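One way to keep such edits small is to hold the phrases in a single table and register them in a loop. The sketch below runs on a host machine and does not reproduce the actual contents of register_speech_commands(): voice_command_t, VOICE_COMMANDS, register_voice_commands, phrase_for_id, and log_command are hypothetical names, and entry 6 is an invented example of a new command.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical table type: one row per spoken phrase. */
typedef struct {
    int id;
    const char *phrase;
} voice_command_t;

const voice_command_t VOICE_COMMANDS[] = {
    {1, "turn on the light"},
    {2, "turn off the light"},
    {3, "say hello"},
    {4, "what is the weather"},
    {5, "what is the time"},
    {6, "play some music"},   /* hypothetical new command, for illustration only */
};

#define VOICE_COMMAND_COUNT (sizeof(VOICE_COMMANDS) / sizeof(VOICE_COMMANDS[0]))

/* Callback that registers one phrase; returns 0 on success. */
typedef int (*register_fn_t)(int id, const char *phrase);

/* Calls add() once per table row and returns how many rows were accepted. */
size_t register_voice_commands(register_fn_t add)
{
    size_t accepted = 0;
    for (size_t i = 0; i < VOICE_COMMAND_COUNT; i++) {
        if (add(VOICE_COMMANDS[i].id, VOICE_COMMANDS[i].phrase) == 0) {
            accepted++;
        }
    }
    return accepted;
}

/* Looks up the phrase for a recognized command ID; NULL if the ID is unknown. */
const char *phrase_for_id(int id)
{
    for (size_t i = 0; i < VOICE_COMMAND_COUNT; i++) {
        if (VOICE_COMMANDS[i].id == id) {
            return VOICE_COMMANDS[i].phrase;
        }
    }
    return NULL;
}

/* Host-side stand-in for the real registration call: just prints the row. */
int log_command(int id, const char *phrase)
{
    printf("register %d: %s\n", id, phrase);
    return 0;
}
```

On the device, the callback would wrap whatever ESP-SR call register_speech_commands() already uses. In recent ESP-SR releases that is typically esp_mn_commands_add() per phrase followed by one esp_mn_commands_update(), but check the version pinned in dependencies.lock before relying on that.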
| Problem | What to try |
|---|---|
| Blank display | Round Display switch ON; reflash firmware |
| Backwards / mirrored text | Reflash the latest firmware (main/round_display.c sets correct orientation) |
| Build fails with UnicodeEncodeError on Windows | Run .\scripts\build.ps1 or set $env:PYTHONIOENCODING = "utf-8" before idf.py build |
| No serial output | Use the USB port, not external UART pins |
| Wake word not detected | Say Jarvis clearly; check monitor for Wake word detected |
| Flash size errors | Board needs 8 MB flash (set in sdkconfig.defaults) |
| Wrong COM port | Command Palette → ESP-IDF: Select port to use |
| Extension can’t find ESP-IDF | Command Palette → ESP-IDF: Select ESP-IDF Path |
main/
  main.c                 Application entry
  board_audio.c          PDM microphone (GPIO 41/42)
  speech_recognition.c   WakeNet + MultiNet pipeline
  display_ui.c           Round display text UI
  round_display.c        GC9A01 + LVGL init
  idf_component.yml      Managed dependencies
sdkconfig.defaults       Board and SR model defaults
partitions.csv           App + speech model partitions
dependencies.lock        Pinned component versions (keep in repo)
scripts/
  build.ps1              Windows build helper
  flash-monitor.ps1      Windows flash + monitor helper