Isolated endpoint: POST /api/testing/speech/blend/word (multipart: audio, word).
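As a rough illustration of the request shape, the sketch below builds a multipart/form-data POST with the two fields named above (audio, word) using only the Python standard library. The base URL, the file name, and the helper names build_multipart and build_request are assumptions made for the example; only the path and the field names come from the text.

```python
import urllib.request
import uuid

ENDPOINT_PATH = "/api/testing/speech/blend/word"  # path as given in the text


def build_multipart(word: str, audio_bytes: bytes, filename: str,
                    audio_mime: str = "application/octet-stream"):
    """Encode the two form fields (word, audio) as multipart/form-data."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    parts = [
        delimiter,
        b'Content-Disposition: form-data; name="word"',
        b"",
        word.encode("utf-8"),
        delimiter,
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"'.encode(),
        f"Content-Type: {audio_mime}".encode(),
        b"",
        audio_bytes,
        delimiter + b"--",
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(base_url: str, word: str, audio_bytes: bytes, filename: str):
    """Return an unsent POST request. base_url is a placeholder (assumption)."""
    body, content_type = build_multipart(word, audio_bytes, filename)
    return urllib.request.Request(
        base_url.rstrip("/") + ENDPOINT_PATH,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the response format is not described here, so the sketch stops at building the request.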
The server converts the uploaded audio (M4A / WebM / MP3 / …) to 16 kHz mono WAV with FFmpeg, labels the file as audio/wav for the API, and then calls SuperSpeech.
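The conversion step could look roughly like the sketch below, which shells out to the ffmpeg command-line tool. The server's actual invocation is not shown in this text, so the exact flag set and the helper names ffmpeg_to_wav_args and convert_to_wav are assumptions; the flags themselves (-ac for channel count, -ar for sample rate, -f for output format) are standard FFmpeg options.

```python
import subprocess


def ffmpeg_to_wav_args(src_path: str, dst_path: str) -> list:
    """Build an ffmpeg command that writes 16 kHz mono WAV (assumed flag set)."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", src_path,  # input in any container ffmpeg can decode
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        "-f", "wav",     # force WAV output
        dst_path,
    ]


def convert_to_wav(src_path: str, dst_path: str) -> None:
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_to_wav_args(src_path, dst_path), check=True)
```

Keeping the argument list in its own function makes the command easy to inspect or log without running ffmpeg.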
For a single word, say it as one smooth blend, not as separate beats for each letter. Use the preset dropdown or type any word (2–64 letters; the API keeps letters only, e.g. hello → /HELLO/).
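A minimal sketch of the letters-only rule, under stated assumptions: the API's real normalization is not shown here, so this version assumes that "letters" means ASCII A–Z, that the 2–64 limit is checked after non-letters are dropped, and that the result is uppercased and wrapped in slashes as in the hello → /HELLO/ example. The function name normalize_word is hypothetical.

```python
def normalize_word(raw: str) -> str:
    """Keep ASCII letters only, enforce 2-64 letters, return e.g. /HELLO/."""
    # Assumption: only ASCII letters count; digits, spaces, punctuation are dropped.
    letters = "".join(ch for ch in raw if ch.isascii() and ch.isalpha())
    # Assumption: the length limit applies to the letters that remain.
    if not 2 <= len(letters) <= 64:
        raise ValueError(f"expected 2-64 letters, got {len(letters)}")
    return f"/{letters.upper()}/"


print(normalize_word("hello"))  # /HELLO/
```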