Play Dialog: A contextual turn-taking TTS model like NotebookLM Playground

49 points by dulldata 21 hours ago

PlayAI (formerly PlayHT) founder here. This is a native multi-turn voice model built for conversational use cases such as real-time agents or podcasts. Try it in our playground (https://play.ai/playground) or through the API (https://docs.play.ai/). Feel free to ask anything.

owenpalmer an hour ago

This is impressively low latency. Also, it's cool to see another option for TTS with real-time streaming.

guytv 20 hours ago

Ouch. If you know Arabic or Hebrew, try selecting those languages and typing something in—it’s hilarious.

Looks like they’re “testing in production.”

mahmoudfelfel 20 hours ago

The currently deployed model is English-only; we are rolling out a multilingual version later this week!

byearthithatius 16 hours ago

Love the idea, but this is not good yet. Mine had random changes in the pace and cadence of speech and was basically in uncanny valley territory.

Stanleyc23 17 hours ago

Overall impressive. I noticed a weird quirk: it reads $100 million as "one hundred dollar million" instead of "one hundred million dollars."
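A possible client-side workaround for the quirk above, offered only as a rough sketch: rewrite currency amounts so the unit comes last before the text is sent to any TTS engine. The helper name `spell_out_dollars` and the regex are hypothetical illustrations, not part of the PlayAI API, and it only handles US-dollar amounts with an optional scale word.

```python
import re

# Hypothetical pre-processing helper; not part of any PlayAI API.
# Matches "$100", "$1,000.50", "$100 million", and similar.
_CURRENCY = re.compile(
    r"\$(\d+(?:,\d{3})*(?:\.\d+)?)(?:\s+(thousand|million|billion|trillion))?",
    re.IGNORECASE,
)


def spell_out_dollars(text: str) -> str:
    """Move the currency unit after the amount, e.g. '$100 million' -> '100 million dollars'."""

    def _swap(match: re.Match) -> str:
        amount, scale = match.group(1), match.group(2)
        if scale:
            return f"{amount} {scale} dollars"
        # Use the singular only for exactly one dollar.
        unit = "dollar" if amount == "1" else "dollars"
        return f"{amount} {unit}"

    return _CURRENCY.sub(_swap, text)


if __name__ == "__main__":
    print(spell_out_dollars("They raised $100 million last year."))
```

Whether this actually changes how a given model reads the number is untested here; it only removes the ambiguity of the leading `$` from the input text.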

yawnxyz 17 hours ago

Did you listen to the output of your own demo?

> Speaker 1: Dang man, I’d come find you for sure.

That part sounds like a broken robot.

yavorgiv 7 hours ago

Hey, can you give an example? The model is not perfect and this is our first version, so it will get better and faster for sure. Still, I generated the full prompt you referenced and it sounds good to me. It adds some laughter, but to my mind that makes it sound less robotic.
https://drive.google.com/file/d/1JzfweTdvCWzJ6Wwv0KdgfaxZcyn...
Speaker 1: Oh yes, the deep sea, nature’s basement. Home to creatures so bizarre, even nightmares are like, “Nah, I’ll pass.”
Speaker 2: Right? It's like the ocean was running a clearance sale on leftover parts. “Hey, who wants a fish with a lightbulb head? No one? Alright, let’s just drop this bad boy in the Mariana Trench.”
Speaker 1: Oh man, let’s start with a classic: the anglerfish. It’s a fish that decided it was uh, tired of chasing its food and thought, “What if I just dangle a glow stick on my head and let dinner come to me?”
Speaker 2: Honestly, I respect that. Can you imagine if we had that? Like, I’m sitting on my couch with a glowing Dorito on my forehead, waiting for snacks to find me.
Speaker 1: Dang man, I’d come find you for sure.
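
For anyone who wants to reproduce a single turn from a script like this one, here is a minimal sketch that splits the text into (speaker, line) pairs. It assumes only the `Speaker N:` label convention visible in the prompt above; the function name `split_turns` is hypothetical, and nothing here reflects how the model or API parses its input.

```python
import re

# Matches a label such as "Speaker 1:" and captures the label without the colon.
_TURN = re.compile(r"(Speaker \d+):\s*")


def split_turns(script: str) -> list[tuple[str, str]]:
    """Split a 'Speaker N: ...' script into (speaker, text) pairs, in order."""
    parts = _TURN.split(script.strip())
    # re.split with a capture group yields:
    # [text before first label, label, text, label, text, ...]
    labels, lines = parts[1::2], parts[2::2]
    return [(label, line.strip()) for label, line in zip(labels, lines)]


if __name__ == "__main__":
    demo = "Speaker 1: Oh man. Speaker 2: Honestly, I respect that."
    for speaker, text in split_turns(demo):
        print(f"{speaker} -> {text}")
```

Isolating the last turn this way makes it easier to compare the same sentence generated with and without the preceding context.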

dulldata 21 hours ago

If you are a developer, there's an API: https://docs.play.ai/tts-api-reference/endpoints/v1/tts/stre...

treesciencebot 21 hours ago

I don't think anyone has done real-time multi-speaker dialog generation before.

Asjad 21 hours ago

[flagged]

byearthithatius 16 hours ago

Fake account / bot meant to promote the company. Look at comment history.

aspenmayer 14 hours ago

Please email hn@ycombinator.com to report these kinds of things.