Deepfake video with custom text and original voice made easily (2024)

In this short article I present my approach to generating a deepfake video of any person you want, with their own voice saying whatever you want.

That was the challenge I faced when I started creating a wedding gift for my brother and his fiancée: a video in which one part showed famous people sending them wishes. As I am a Data Scientist, it would have been possible for me to do that from scratch (to some extent), but I ran into some preconditions and limitations:

  1. I do not have a professional-grade computer for running computer vision and deep learning tasks locally.
  2. My knowledge of deepfakes (theory, algorithms etc.) is currently close to zero, so even adapting an existing model could be a challenge in a short period of time.
  3. I did not want to pay extra for Google Colab Pro (or any other online environment), and the free tier with its time quota was not an option for training a model (apparently you need to move the cursor about once every 2 hours, otherwise your session gets kicked out).
  4. In general, the lowest possible cost was the preferred option (as usual ;)).
  5. Last but most important: I had around 1 day to find a working solution and get the video ready!

My goal was not a mind-blowing deepfake. “The fake” could be “visible”. What I wanted was a quick, satisfying solution at the lowest possible cost.

Done is better than perfect.

The solution I was looking for had to solve the following two tasks:

  1. Generate speech from text in the chosen person’s own voice.
  2. Synchronize the video (lip movements) with the speech.

While researching how to sync the lip movements (step 2), I found a really nice demo with some examples to run online. When I looked around the website a bit more, it turned out that there is a ready-to-use Google Colab notebook, which I had to adapt to my needs.


I used the updated version of the notebook, the one mentioned in the very first comment that appears when you open the “Updated Collab Notebook” link.

Step 2 checked off! We will see a bit later that it works. Hurray!

For generating the voice of a particular person saying what I want (step 1), I was advised to try a few tools, and my final choice was elevenlabs. It is not completely free of charge, but the first month of the subscription costs $1. It is nice because it has a Python API, and once you clone a voice you can generate speech from text many times.

Some takeaways:

  1. This is a TTS (Text To Speech) solution, which means you write some text and it is converted to voice.
  2. You (currently) have no control over the intonation of the speech. Controlling it would require voice conversion instead: you speak with your own rhythm and tone, and the tool converts the recording to, say, Elon Musk’s voice while keeping your rhythm and tone. A tool that can do this is voice.ai, but training a model there for free took ages for me.
  3. With the $5 subscription (Starter) you can save only 10 voices. Once you reach 10 and want another one, you can either upgrade your plan or remove one of the ten and clone the new voice in its place.
  4. You need some samples of the voice, ideally clean and without noise. I used 2 or 3 samples of up to 1 minute each and it worked nicely (I prepared the audio in Audacity).
  5. When you clone a voice, the model should capture its characteristics, but each time you generate new speech the voice can differ slightly (how much depends on your voice settings: stability and clarity/similarity).
  6. You can choose between two models: multilingual and English. I mostly used the multilingual one because I wanted, for instance, Elon Musk speaking Polish.
  7. The solution works better when the text is in the same language as the voice samples, but it also worked fine for me when the text and the samples were in different languages.
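
Before uploading samples, it can help to confirm that each one really is a short clip. The sketch below is only an illustration, not part of the workflow I actually used: it assumes the samples were exported from Audacity as WAV files, and the 60-second limit simply mirrors the "up to 1 minute" samples mentioned in the list. The function names are made up for this example.

```python
import wave

# Assumption: mirrors the "up to 1 minute" samples described above,
# not an official limit of any service.
MAX_SAMPLE_SECONDS = 60.0


def wav_duration_seconds(path):
    """Return the playing time of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_samples(paths, max_seconds=MAX_SAMPLE_SECONDS):
    """Map each path to (duration, fits_limit) so overly long clips are easy to spot."""
    report = {}
    for path in paths:
        duration = wav_duration_seconds(path)
        report[path] = (duration, duration <= max_seconds)
    return report
```

Calling `check_samples(["sample_1.wav", "sample_2.wav"])` (hypothetical file names) returns a dictionary you can scan for any `False` flag before cloning.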

Let’s try it!

Voice cloning

  • First you need to create an account on https://elevenlabs.io/. Once this is done, you see a page that lists your cloned voices.

At first the list is empty, because no voice has been cloned yet. You could use one from the voice library, but that is out of scope here.

  • To be able to clone a voice, you need to upgrade your plan. As said before, it costs $1 for the first month, as long as you do not forget to unsubscribe :)

You can try it yourself with the voice samples I used in this example (download them from my github).

  • Once we see Elon_Musk_voice in VoiceLab, it is ready to use.
  • Now we generate a short text with this voice replica.
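
The same generation step can also be scripted instead of clicked through. The following is a minimal sketch, not the exact code I ran. It assumes the public ElevenLabs REST endpoint `POST /v1/text-to-speech/{voice_id}` with the `xi-api-key` header, as I understand it from the documentation. The model ID `eleven_multilingual_v2` and the default settings are assumptions that may not match your account or the version available at the time, so check the current API reference. It uses only the standard library rather than the official Python client.

```python
import json
import urllib.request

API_ROOT = "https://api.elevenlabs.io/v1"


def build_tts_request(api_key, voice_id, text,
                      model_id="eleven_multilingual_v2",  # assumed model ID
                      stability=0.5, similarity_boost=0.75):  # assumed defaults
    """Prepare (but do not send) a text-to-speech request for a cloned voice."""
    payload = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        url=f"{API_ROOT}/text-to-speech/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize(request, out_path):
    """Send the request and save the returned audio (needs network and a valid key)."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out_file:
        out_file.write(response.read())
    return out_path
```

Building the request separately from sending it makes it easy to inspect the payload first; `synthesize(build_tts_request(key, voice_id, "Hello!"), "hello.mp3")` would then write the audio, where `key` and `voice_id` are placeholders for your own values.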

Ready!

Now we can listen to (and also download) the result and compare it with the original voice.

It’s a nice result, isn’t it?

We can also generate the same sentence in a different language. I chose Polish to test it.

Although I think the English version is better, I have to admit that I was pretty surprised by the quality of both versions.

Remark:

When generating, you can play with some parameters to adjust the speech according to your needs.
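
One way to "play" with the parameters systematically is to prepare a small grid of settings and generate one take per combination, then keep the one that sounds best. The sketch below only builds the grid. The key names `stability` and `similarity_boost` and the 0 to 1 range are assumptions based on how the ElevenLabs API names these sliders, and `settings_grid` is a made-up helper.

```python
from itertools import product


def settings_grid(stabilities, similarities):
    """Return one voice-settings dict per (stability, similarity) combination."""
    grid = []
    for stability, similarity in product(stabilities, similarities):
        # Assumed valid range for both sliders: 0.0 to 1.0 inclusive.
        for name, value in (("stability", stability),
                            ("similarity_boost", similarity)):
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")
        grid.append({"stability": stability, "similarity_boost": similarity})
    return grid
```

For example, `settings_grid([0.3, 0.7], [0.5, 0.9])` yields four combinations to try on the same sentence.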


Generating the video

Now that we have the audio, let’s do the lip sync on the chosen person. The challenge for me here was finding a good video that shows the face from the front, with the person looking directly into the camera, so that the wishes feel real.

After some research I finally managed to find a good video of Elon Musk speaking.

I clipped it to the length of the audio using Windows Movie Maker (any other editor will do) and ran the notebook I mentioned at the beginning. If the video is on YouTube, you can trim it directly in the code (trim it to the same length as the audio). Then load the generated voice and wait for the finished deepfake video. My result below:

I also present another video, of Pope Francis (Papa Francesco) speaking Italian and wishing the couple all the best.

And an example with a fitness coach famous in Poland, Ewa Chodakowska!
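
For the clipping step, a command-line alternative to a GUI editor is ffmpeg. This is a swapped-in tool, not what I used (I used Windows Movie Maker), and the sketch assumes ffmpeg is installed and on your PATH. It cuts a clip that starts at a given second and lasts as long as the generated audio; the function names and file names are made up for illustration.

```python
import subprocess


def build_trim_command(src, dst, start_seconds, duration_seconds):
    """Build an ffmpeg command that cuts `duration_seconds` starting at `start_seconds`."""
    if start_seconds < 0 or duration_seconds <= 0:
        raise ValueError("start must be >= 0 and duration must be > 0")
    return [
        "ffmpeg", "-y",                   # overwrite the output file if it exists
        "-ss", f"{start_seconds:.3f}",    # seek to the start position in the input
        "-i", src,
        "-t", f"{duration_seconds:.3f}",  # keep only this many seconds
        dst,
    ]


def trim_video(src, dst, start_seconds, duration_seconds):
    """Run the trim; raises CalledProcessError if ffmpeg fails."""
    command = build_trim_command(src, dst, start_seconds, duration_seconds)
    subprocess.run(command, check=True)
    return dst
```

The clip is re-encoded rather than stream-copied, which is slower but avoids cuts snapping to the nearest keyframe. A call such as `trim_video("speech.mp4", "clip.mp4", 12, 8.5)` would produce an 8.5-second clip to pair with audio of the same length.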

I hope you find this article useful. You could play a joke on your boss by generating a video of him/her announcing a raise for the whole team :)

Thank you for your time.

Sources:

[1] https://www.youtube.com/watch?v=2IVQwzFzsBo

[2] https://www.youtube.com/watch?v=oO8w6XcXJUs&t=210s

[3] https://www.youtube.com/watch?v=zTa_Oj5WNiE

Author: Maia Crooks Jr
