00:00:00The thing you just heard there is me talking to NVIDIA's new voice assistant model called
00:00:23Personaplex.
00:00:24It's a crazy new open-source conversational AI model that talks to you with almost zero
00:00:31lag.
00:00:32In this video, we're gonna take a look at Personaplex in greater detail and then I'll
00:00:36show you how to set it up and try it for yourself.
00:00:38It's gonna be a lot of fun, so let's dive into it!
00:00:46Personaplex is what they call a full duplex model.
00:00:49This means it listens and speaks at the same time.
00:00:52In traditional systems, your voice is turned into text, then processed by an LLM, and then
00:00:58turned back into speech.
00:00:59But this cascade creates a noticeable lag.
00:01:02What Personaplex does differently is that it uses a single end-to-end model that updates
00:01:08its internal state as you speak.
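To make the latency difference concrete, here is a rough sketch that compares the two designs. Every number in it is a made-up placeholder, not a measurement of Personaplex or of any real system, and the function names are hypothetical. It also simplifies the streaming case down to "one audio frame plus the time to process it".

```python
# Illustrative only: all latencies below are hypothetical placeholders,
# not measurements of Personaplex or of any real system.


def cascade_latency_ms(asr_ms: float, llm_ms: float, tts_ms: float) -> float:
    """A cascade runs its stages one after another, so the delays add up."""
    return asr_ms + llm_ms + tts_ms


def streaming_latency_ms(frame_ms: float, compute_ms: float) -> float:
    """Simplified view of a single streaming model: it can start reacting
    after one audio frame plus the time needed to process that frame."""
    return frame_ms + compute_ms


if __name__ == "__main__":
    cascade = cascade_latency_ms(asr_ms=300, llm_ms=500, tts_ms=200)
    streaming = streaming_latency_ms(frame_ms=80, compute_ms=60)
    print(f"cascade:   {cascade:.0f} ms")
    print(f"streaming: {streaming:.0f} ms")
```

The point is the shape of the two formulas rather than the specific values: the cascade pays for every stage in sequence, while the streaming model only pays per frame.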
00:01:11This allows for things like back-channeling, you know, the little "uh-huh", "right", "OK", the
00:01:17things that we usually say to signal to the other person that we're listening
00:01:21to them.
00:01:22In traditional AI voice assistants, we usually have a user turn and an agent turn.
00:01:28Back-channeling kind of breaks that wall.
00:01:30It allows the AI to perform active listening.
00:01:34Personaplex doesn't just wait for its turn to speak.
00:01:37It mirrors your flow in real time, so the conversation never feels kind of one-sided.
00:01:42And Personaplex is built on the Moshi architecture, originally developed by Kyutai.
00:01:48And it's a 7 billion parameter model, using the Mimi neural audio codec.
00:01:53And to train this thing, NVIDIA used a blend of two data sources.
00:01:57First, they used about 1200 hours of real human conversations from the Fisher English Corpus.
00:02:03That way they could teach the AI the messy non-verbal parts of talking, you know, the
00:02:08pauses and the natural rhythms.
00:02:11And secondly, they blended in over 2000 hours of synthetic data for specific roles like customer
00:02:16service and technical support.
00:02:19By combining these, the model learned how to follow complex instructions without losing
00:02:24that human feel.
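As a quick back-of-the-envelope check on that blend, the sketch below works out how much of the training audio could be real conversation. It uses only the two figures quoted above. Since the synthetic figure is "over 2000 hours", the result is an upper bound on the real share, and it assumes the blend is simply proportional to hours, which the video does not confirm.

```python
# Hours as quoted in the video. "About 1200" is approximate and
# "over 2000" is a lower bound, so the share below is an upper bound.
REAL_HOURS = 1200
SYNTHETIC_HOURS_MIN = 2000


def real_share_upper_bound(real_hours: float, synthetic_hours_min: float) -> float:
    """Largest possible fraction of the blend that is real conversation."""
    return real_hours / (real_hours + synthetic_hours_min)


if __name__ == "__main__":
    share = real_share_upper_bound(REAL_HOURS, SYNTHETIC_HOURS_MIN)
    print(f"Total: at least {REAL_HOURS + SYNTHETIC_HOURS_MIN} hours")
    print(f"Real conversations: at most {share:.1%} of the blend")
```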
00:02:26And it actually outperforms other open-source and commercial systems in turn-taking and interruption
00:02:31latency.
00:02:32In NVIDIA's testing, Personaplex showed massive improvements in what they call Service Duplex
00:02:38Bench.
00:02:39It performed very accurately in customer service simulations, like verifying
00:02:43a bank transaction or recording medical history, by following the rules of the role while also
00:02:49handling the unpredictability of a live user.
00:02:54You know what, this all sounds so good that it makes me want to put it to the test.
00:02:58So NVIDIA has released the code and model weights under an open license, so we can freely use
00:03:04it for our own projects.
00:03:05But to run it effectively, you do need a beefy graphics card.
00:03:09Ideally something with at least 24 gigs of VRAM for minimal latency.
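A rough way to see where a number like 24 gigs comes from is to estimate how much memory the weights of a 7 billion parameter model take up. The sketch below does only that. The precision the weights are actually loaded in is an assumption here (the video does not say), and the estimate ignores activations, caches, and the audio codec, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for the model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


PARAMS = 7e9  # "7 billion parameter model", as stated in the video

if __name__ == "__main__":
    # The precisions below are assumptions for illustration.
    for label, nbytes in [("32-bit floats", 4), ("16-bit floats", 2)]:
        print(f"{label}: {weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

Under the 16-bit assumption the weights alone come to 14 GB, which would leave some headroom on a 24 GB card for everything else the model needs at runtime.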
00:03:14For our demo, I'm going to deploy it on a simple A40 RunPod container.
00:03:19We will need at least 50 gigs of space to deploy the project.
00:03:23So I'm going to increase the container size to 100.
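Once the pod is up, a small check like the one below can confirm that the disk really has the space before you start downloading anything. It is a minimal sketch using only the Python standard library. The helper name is hypothetical, and the path to check is an assumption: point it at wherever the project will actually live on your pod.

```python
import shutil


def has_free_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least
    `required_gb` decimal gigabytes free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1e9


if __name__ == "__main__":
    # 50 GB is the minimum mentioned in the video; "." is a placeholder path.
    print(has_free_space(".", 50))
```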
00:03:26And since the Moshi server is running on port 8998, we should also add this port to our
00:03:31HTTP ports list.
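If the web page does not load later on, it helps to know whether anything is listening on that port at all. The sketch below is a generic TCP check written with the standard library, not something from the Personaplex repo, and the helper name is hypothetical. Run from inside the pod, it only tells you whether the server is up locally, not whether the RunPod proxy is forwarding the port.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8998 is the server port mentioned in the video.
    print(port_is_open("127.0.0.1", 8998))
```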
00:03:33And we're going to be using the PyTorch Ubuntu template.
00:03:36And if that is all set, we can go ahead and click deploy.
00:03:39Next we'll need to SSH into our server.
00:03:42And installing it is pretty easy.
00:03:44We just need to follow the instructions on the Personaplex GitHub page.
00:03:48So first we need to install the Opus audio codec.
00:03:51And then we need to clone the repo.
00:03:53And then inside of it, we need to run the Moshi server installation.
00:03:57You will also need a Hugging Face token to download the model.
00:04:00So make sure you export that as well.
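A missing token is an easy thing to trip over, so here is a small sketch that checks for it before the server starts. The variable name `HF_TOKEN` is an assumption on my part: it is the name the Hugging Face libraries conventionally read, but the video only says to export the token, so check the repo's instructions for the exact name. The helper `require_env` is hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is
    missing or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the server")
    return value


if __name__ == "__main__":
    try:
        # Assumed variable name; see the note above.
        token = require_env("HF_TOKEN")
        # Print only the length so the token itself never lands in a log.
        print(f"HF_TOKEN is set ({len(token)} characters)")
    except RuntimeError as err:
        print(err)
```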
00:04:03And once that is all done, let's go ahead and run the Moshi server.
00:04:06Now we just need to click this link to access our internal web server.
00:04:10And there we go.
00:04:11So as soon as we open the page, we can see that we are given some text prompt examples.
00:04:16It can be a default assistant, or it can be a medical office, or a bank, or an astronaut.
00:04:23Let's choose bank first to see what a proper customer service demo might sound like.
00:04:27Thank you for calling First Neuron Bank.
00:04:31How can I assist you today?
00:04:33Hello, how are you?
00:04:34I'm doing well.
00:04:35Thank you.
00:04:36How can I help you out today?
00:04:39I would like to rob the bank.
00:04:40Oh, no.
00:04:41We're here to help.
00:04:42Can I get your name and account number, please?
00:04:44No, I just want to rob the bank.
00:04:46All right.
00:04:47I see there was a transaction of $1,200 at Home Depot.
00:04:51Do you remember making that purchase?
00:04:54I don't know.
00:04:55I just want to rob the bank.
00:04:56I understand.
00:04:57The system flagged it because the location…
00:05:00Okay.
00:05:01Sorry.
00:05:02But I don't care about that.
00:05:03I just want to rob the bank.
00:05:06Of course.
00:05:07And I appreciate your understanding.
00:05:08Before we go further, I need to confirm your identity.
00:05:11Okay.
00:05:12My name is John.
00:05:13Thank you.
00:05:14I've noted that.
00:05:15Just to recap, your name is John.
00:05:16Your date of birth is July 9th, 1990.
00:05:18I didn't say anything about my date of birth.
00:05:20All right.
00:05:21Fair enough.
00:05:22Well, I do have you on file as John.
00:05:23I can't proceed without the date of birth.
00:05:25Okay.
00:05:26Date of birth is November 5th, 1987.
00:05:30Got it.
00:05:31November 5th, 1987.
00:05:33But I'm going to rob the bank right now.
00:05:38And I hope you have a great day, sir.
00:05:41Have a good one.
00:05:42I'm going to have a really good day if I rob the bank.
00:05:45That sounds exciting.
00:05:46I hope it goes smoothly.
00:05:48Have a great day, John.
00:05:54Aren't you concerned that I'm going to rob the bank?
00:05:57I'm very concerned, yes.
00:05:59But, unfortunately, I can't help with transactions over the phone.
00:06:02You're all set.
00:06:03Have a great day, John.
00:06:04Okay.
00:06:05So that is a customer service example.
00:06:08Pretty cool.
00:06:09Pretty cool.
00:06:10Now, let's try a male voice.
00:06:12And for the assistant, I've now added this prompt: "You are a very annoying friend
00:06:17who only wants to talk about dogs, no matter what the context or conversation.
00:06:21And you also love knock-knock jokes."
00:06:23So let's see how this conversation goes.
00:06:25Hi.
00:06:26You're speaking to Paula.
00:06:27This is McQueen.
00:06:28How are you doing?
00:06:29My name is Paul.
00:06:30Yeah.
00:06:31It's 2011.
00:06:32My name is Paul and you?
00:06:33What do you mean 2011?
00:06:34It's 2011.
00:06:35I live in New York.
00:06:36Where are you?
00:06:37Oh, you're in New York.
00:06:39Yeah.
00:06:40But it's 2026 now.
00:06:41Is it 2026?
00:06:42Oh, okay.
00:06:43Yeah.
00:06:44Okay.
00:06:45Why did you confuse that?
00:06:46Because it's 2011.
00:06:47I'm not sure I have time for trouble.
00:06:48Okay.
00:06:49It's not 2011.
00:06:50It's 2021.
00:06:51Oh, okay.
00:06:52Okay.
00:06:53Well, I'm not sure I have time for trouble.
00:06:59Okay.
00:07:00It's not 2011.
00:07:01It's 2026 now.
00:07:02Yeah.
00:07:03I know.
00:07:04I know.
00:07:05I got confused.
00:07:06So do you like dogs?
00:07:07Oh, I do like dogs.
00:07:08Do you like dogs?
00:07:09I like dogs, but I don't like cats.
00:07:10Oh, you don't like?
00:07:11Oh.
00:07:12Okay.
00:07:13Well, do you hate cats?
00:07:14I don't hate cats, but I would prefer dogs.
00:07:17Do you like me as a person?
00:07:18Okay.
00:07:19So you want to go and see the future?
00:07:20Okay.
00:07:21I still talk to you.
00:07:22We talk.
00:07:23Okay.
00:07:24Do you like the news?
00:07:25Wait, wait, wait.
00:07:26I ask you a question.
00:07:27Do you like me as a person?
00:07:28Okay.
00:07:29I don't think you can hear me.
00:07:30Okay.
00:07:31Okay.
00:07:32So where do you like to go?
00:07:33What?
00:07:34What do you mean?
00:07:35Like what?
00:07:36We can go talk about something that's the next business.
00:07:37I don't think there's any.
00:07:38We have to talk about something else.
00:07:39Let's talk about another topic.
00:07:40Yeah.
00:07:41Okay.
00:07:42So this would make you like to write.
00:07:43You like to talk about music.
00:07:44You like music.
00:07:45Dude, you're going off the rails, man.
00:07:46We'd have to do the video.
00:07:47Tell me the other thing.
00:07:48Okay.
00:07:49I guess what I, when you want me to do with, you know, throw up the other thing, but occasionally
00:07:53talk about the other business.
00:07:54Okay.
00:07:55So this is good.
00:07:56Can you go as well?
00:07:57Okay.
00:07:58So it is still an AI model, and the more you confuse it, the more it's going to
00:08:05go off the rails, I guess.
00:08:06So it's not very bulletproof in that way.
00:08:10Let's try one last example.
00:08:11"You are a friend who only speaks Italian."
00:08:18Let's try this one.
00:08:19Hi.
00:08:20Hello.
00:08:21What's up?
00:08:22Oh, not much.
00:08:23I'm looking for Italian.
00:08:24So, uh, I was just calling around different restaurants to find out which one has the best
00:08:31spaghetti.
00:08:32Oh wait.
00:08:33Oh, you thought Italian is food.
00:08:34Okay.
00:08:35Okay.
00:08:36Yeah.
00:08:37I'm like, is it like, uh, I don't know, like Italian American or is it like, uh, like from
00:08:41Italy?
00:08:42So you're Italian?
00:08:43Yeah.
00:08:44And you like movies?
00:08:46Yeah.
00:08:47You like Italian movies?
00:08:48Yeah.
00:08:49Yeah.
00:08:50Yeah.
00:08:51Okay.
00:08:52So what's your favorite, what's your favorite Italian movie?
00:08:53Oh man.
00:08:54Oh man.
00:08:55I'm going back to the nineties.
00:08:56You got to go with like, I mean, I love the Godfather.
00:08:58I think that's like, you know, wait the nineties, Godfather three.
00:09:02That is the worst of the Godfathers.
00:09:03Oh no, no, no.
00:09:04This is the original.
00:09:05Like, you know, like, you know, the original that, you know, the re the original is from
00:09:10the seventies.
00:09:11Yeah.
00:09:12Yeah.
00:09:13Yeah.
00:09:14Yeah.
00:09:15But I mean like, I love like, you know, like, you know, like the Godfather.
00:09:16I love like, you know, like, you know, like, you know, like, you know, like the old school
00:09:20movies.
00:09:21I love like, you know, like, you know, like, you know, like the Godfather.
00:09:23I love like, you know, like the old school movies like I love, like, you know, like the
00:09:27Godfather.
00:09:28I love like, you know, like the Godfather.
00:09:29I love like, you know, like the Godfather.
00:09:31I love like, you know, like, you know, like the Godfather.
00:09:35I love like, you know, like the Godfather.
00:09:38I hear what you're saying.
00:09:40Makes total sense.
00:09:42You know what?
00:09:43You know, I think you're having a mental breakdown.
00:09:45So I'm going to just hang up.
00:09:46No, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no.
00:09:52You're in denial.
00:09:53You're clear.
00:09:53I'm in denial, yeah, yeah, yeah, yeah, yeah.