com.hecomi.ulipsync
uLipSync is an asset for lip-syncing in Unity. It has the following features:
Import Unity.Burst and Unity.Mathematics from Package Manager.
Add https://github.com/hecomi/uLipSync.git#upm to Package Manager.
Scoped registry URL: https://registry.npmjs.com
Scope: com.hecomi
When a sound is played by AudioSource, a buffer of the sound comes into the OnAudioFilterRead() method of a component attached to the same GameObject. We can modify this buffer to apply sound effects like reverb, but since we also know what waveform is being played, we can analyze it to calculate Mel-Frequency Cepstrum Coefficients (MFCC), which represent the characteristics of the human vocal tract. In other words, if the calculation is done well, you get parameters that characterize "a" when the waveform being played is "a", and parameters that characterize "e" when it is "e" (in addition to vowels, consonants like "s" can also be analyzed). By comparing these parameters with the pre-registered parameters for each of the "aieou" phonemes, we can calculate how close each phoneme is to the current sound and reflect this in the blendshapes of the SkinnedMeshRenderer to enable lipsync. If you feed the input from the microphone into AudioSource, you can also lipsync to your own voice in real time.
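The comparison step described above can be sketched in plain C#. This is only an illustration, not uLipSync's actual implementation: the class name PhonemeMatcher, the use of Euclidean distance, and the 1 / (1 + distance) scoring are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: NOT the library's implementation. All names are hypothetical.
public static class PhonemeMatcher
{
    // Euclidean distance between two MFCC vectors of equal length.
    static float Distance(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i < a.Length; ++i)
        {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return (float)Math.Sqrt(sum);
    }

    // Converts distances into ratios that sum to 1; closer phonemes get larger ratios.
    // Assumes at least one registered phoneme.
    public static Dictionary<string, float> Ratios(
        float[] current, Dictionary<string, float[]> registered)
    {
        var scores = registered.ToDictionary(
            kv => kv.Key,
            kv => 1f / (1f + Distance(current, kv.Value)));
        float total = scores.Values.Sum();
        return scores.ToDictionary(kv => kv.Key, kv => kv.Value / total);
    }

    // The phoneme with the largest ratio is treated as the main phoneme.
    public static string MainPhoneme(Dictionary<string, float> ratios)
    {
        return ratios.OrderByDescending(kv => kv.Value).First().Key;
    }
}
```

With registered vectors { 1, 0 } for "A" and { 0, 1 } for "I", an input of { 0.9, 0.1 } is closer to "A", so MainPhoneme returns "A". A real profile would hold one MFCC vector per calibrated phoneme.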
The component that performs this analysis is uLipSync, the data that contains the phoneme parameters is Profile, and the component that moves the blendshapes is uLipSyncBlendShape. There is also a uLipSyncMicrophone component that plays the audio from the microphone. Here's an illustration of what it looks like.
Let's set up using Unity-chan. The sample scene is Samples / 01. Play AudioClip / 01-1. Play Audio Clip. If you installed this from UPM, please import Samples / 00. Common sample (which contains Unity's assets).
After placing Unity-chan, add the AudioSource component to any game object where a sound will be played and set an AudioClip to it to play Unity-chan's voice.
First, add a uLipSync component to the same game object. For now, select uLipSync-Profile-UnityChan from the list and assign it to the Profile slot of the component (if you assign something different, such as Male, it will not lip sync properly).
Next, set up the blendshapes so that they receive the results of the analysis and move accordingly. Add uLipSyncBlendShape to the root of Unity-chan's SkinnedMeshRenderer. Select the target blendshape, MTH_DEF, and go to Blend Shapes > Phoneme - BlendShape Table and add 7 items, A, I, U, E, O, N, and -, by pushing the + button ("-" is for noise). Then select the blendshape corresponding to each phoneme, as shown in the following image.
Finally, to connect the two, in the uLipSync component, go to Parameters > On Lip Sync Update (LipSyncInfo) and press + to add an event, then drag and drop the game object (or component) with the uLipSyncBlendShape component onto the slot that says None (Object). Find uLipSyncBlendShape in the pull-down list and select OnLipSyncUpdate in it.
Now when you run the game, Unity-chan moves its mouth as it speaks.
The range of the volume to be recognized and the response speed of the mouth can be set in the Parameters of the uLipSyncBlendShape component.
As for the volume, you can see the current, maximum, and minimum volume in the Runtime Information of the uLipSync component, so set the range based on these values.
In some cases, you may want to attach the AudioSource to the mouth position and uLipSync to some other game object. In this case, it may be a bit troublesome, but you can add a component called uLipSyncAudioSource to the same game object as the AudioSource, and set it in uLipSync Parameters > Audio Source Proxy. Samples / 03. AudioSource Proxy is a sample scene.
If you want to use a microphone as an input, add uLipSyncMicrophone to the same game object as uLipSync. This component will generate an AudioSource with the microphone input as a clip. The sample scene is Samples / 02-1. Mic Input.
Select the device to be used for input from Device, and if Is Auto Start is checked, it will start automatically. To start and stop microphone input, press the Stop Mic / Start Mic button in the UI as shown below at runtime.
If you want to control it from a script, use uLipSync.MicUtil.GetDeviceList() to identify the microphone to be used, pass its MicDevice.index to the index of this component, and then call StartRecord() to start it or StopRecord() to stop it.
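A minimal sketch of the script-based control described above. It relies only on the members named in this document (MicUtil.GetDeviceList(), MicDevice.index, and the index, StartRecord(), and StopRecord() members of uLipSyncMicrophone). The class name MicSelector and the deviceNumber field are hypothetical, and it is assumed that uLipSyncMicrophone sits on the same game object.

```csharp
using UnityEngine;
using uLipSync;

// Hypothetical helper; assumes uLipSyncMicrophone is on the same game object.
[RequireComponent(typeof(uLipSyncMicrophone))]
public class MicSelector : MonoBehaviour
{
    // Position in the list returned by MicUtil.GetDeviceList() (hypothetical field).
    public int deviceNumber = 0;

    uLipSyncMicrophone _mic;

    void Start()
    {
        _mic = GetComponent<uLipSyncMicrophone>();

        int i = 0;
        foreach (var device in MicUtil.GetDeviceList())
        {
            if (i++ != deviceNumber) continue;
            _mic.index = device.index; // pass MicDevice.index to the component
            _mic.StartRecord();
            return;
        }

        Debug.LogWarning("No microphone found at the requested position.");
    }

    void OnDisable()
    {
        // Stop recording when this helper is disabled or destroyed.
        if (_mic) _mic.StopRecord();
    }
}
```

If Is Auto Start is checked on the uLipSyncMicrophone component, it may be simpler to uncheck it when starting the microphone from a script like this, so that only one place decides when recording begins.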
Note that the microphone input will be played back in Unity a little later than your own speech. If you want to use a voice captured by another software for broadcasting, set Parameters > Output Sound Gain to 0 in the uLipSync component. Do not set the volume of the AudioSource to 0 instead: the data passed to OnAudioFilterRead() would then be silent and could not be analyzed.
In the uLipSync component, go to Profile > Profile and select a profile from the list (Male for male, Female for female, etc.) and run it. However, since these default profiles are not personalized, their accuracy may not be good. Next, we will see how to create calibration data that matches your own voice.
So far we have used the sample Profile data, but in this section, let's see how to create data adjusted for other voices (voice actors' data or your own voice).
Clicking the Profile > Profile > Create button in the uLipSync component will create the data in the root of the Assets directory and set it to the component. You can also create it from the Project window by right-clicking > uLipSync > Profile.
Next, register the phonemes you want to be recognized in Profile > MFCC > MFCCs. Basically, AIUEO is fine, but it is recommended to add a phoneme for breath ("-" or another appropriate character) so that breath is not recognized as one of the vowels. You can use any alphabet, hiragana, katakana, etc. as long as the characters you register match those in uLipSyncBlendShape.
Next, we will calibrate each of the phonemes we have created.
The first way is to use a microphone. Add uLipSyncMicrophone to the object. Calibration is done at runtime, so start the game to analyze the input. Press and hold the Calib button to the right of each phoneme while speaking the sound of that phoneme into the microphone, such as "AAAAA" for A, "IIIII" for I, and so on. For the noise phoneme, say nothing or blow on the microphone.
If you set up uLipSyncBlendShape beforehand, it is interesting to see how the mouth gradually matches.
If your way of speaking varies, for example between your natural voice and falsetto, you can register multiple phonemes of the same name in the Profile and calibrate each of them accordingly.
Next is the calibration method using audio data. If you have audio that says "aaaaaaa" or "iiiiiii", play it in a loop and press the Calib button in the same way. However, in most cases there is no such audio, so we want to calibrate by trimming the "aaa"-like or "iii"-like part of existing audio and playing it back. A useful component for this is uLipSyncCalibrationAudioPlayer. It loops the part of the audio waveform you want to play while slightly cross-fading it.
Select the part that seems to say "aaaaa" by dragging the boundary, and then press the Calib button for each phoneme to register the MFCC to the Profile.
When calibrating, you should pay attention to the following points.
So far, we have looked at runtime processing. Now we will look at the production of data through pre-calculation.
If you have audio data, you can calculate in advance what analysis results you will receive each frame, so we will bake them into a ScriptableObject called BakedData. At runtime, instead of analyzing the audio with uLipSync, we will use a component named uLipSyncBakedDataPlayer to play the data. This component can notify the result of the analysis with an event just like uLipSync, so you can register uLipSyncBlendShape to realize lipsync. This flow is illustrated in the following figure.
The sample scene is Samples / 05. Bake. You can create a BakedData from the Project window by going to Create > uLipSync > BakedData.
Here, specify the calibrated Profile and an AudioClip, then click the Bake button to analyze the audio and generate the data.
If it works well, the data will look like the following.
Set this data to the uLipSyncBakedDataPlayer.
Now you are ready to play. If you want to check it again in the editor, press the Play button, or if you want to play it from another script, just call Play().
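As a minimal sketch of calling Play() from another script: the class name PlayBakedLipSyncOnKey and the key field are hypothetical. It is assumed that uLipSyncBakedDataPlayer is on the same game object with its BakedData already assigned in the Inspector, and that the project uses Unity's legacy Input class.

```csharp
using UnityEngine;
using uLipSync;

// Hypothetical trigger script; assumes uLipSyncBakedDataPlayer is on the same
// game object and already has its BakedData assigned in the Inspector.
[RequireComponent(typeof(uLipSyncBakedDataPlayer))]
public class PlayBakedLipSyncOnKey : MonoBehaviour
{
    public KeyCode key = KeyCode.Space; // hypothetical trigger key

    uLipSyncBakedDataPlayer _player;

    void Start()
    {
        _player = GetComponent<uLipSyncBakedDataPlayer>();
    }

    void Update()
    {
        // Start playback of the baked lipsync data when the key is pressed.
        if (Input.GetKeyDown(key)) _player.Play();
    }
}
```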
By adjusting the Time Offset slider, you can modify the timing of the lipsync. With runtime analysis, it is not possible to open the mouth before the voice is heard, but with pre-calculation, the mouth can be opened a little earlier, so the result can be adjusted to look more natural.
In some cases, you may want to convert all the character voice AudioClips to BakedData at once. In this case, use Window > uLipSync > Baked Data Generator.
Select the Profile you want to use for batch conversion, and then select the target AudioClips. If the Input Type is List, register the AudioClips directly (dragging and dropping multiple selections from the Project window is easy). If the Input Type is Directory, a file dialog will open where you can specify a directory, and it will automatically list the AudioClips under that directory.
Click the Generate button to start the conversion.
When you have already created data, you may want to review the calibration and change the profile. In this case, there is a Reconvert button in the Baked Data tab of each Profile, which converts all the data again using that Profile.
You can add special tracks and clips for uLipSync in Timeline. We then need to bind the objects that will be moved using the data from the Timeline. To do this, a component named uLipSyncTimelineEvent is introduced, which receives playback information and notifies uLipSyncBlendShape. The flow is illustrated below.
Right-click in the track area in the Timeline and add a dedicated track from uLipSync.Timeline > uLipSync Track. Then right-click in the clip area and add a clip from Add From Baked Data. You can also drag and drop BakedData directly onto this area.
When you select a clip, you will see the following UI in the Inspector, where you can replace the BakedData.
Next, add a uLipSyncTimelineEvent to some game object, and then add the binding so that lipsync can be played. At this time, register the uLipSyncBlendShape in On Lip Sync Update (LipSyncInfo).
Then click on the game object with the PlayableDirector and drag and drop the game object into the slot for binding on the uLipSyncTrack in the Timeline window.
Now the lipsync information will be sent to uLipSyncTimelineEvent, and the connection to uLipSyncBlendShape is established. Playback can also be done during editing, so you can adjust it with the animation and sound.
VRM is a platform-independent file format designed for use with 3D characters and avatars. Blendshapes in VRM are controlled via a component called VRMBlendShapeProxy.
With uLipSyncBlendShape, the blendshapes in the SkinnedMeshRenderer are controlled directly, but there is a modified component named uLipSyncBlendShapeVRM that controls VRMBlendShapeProxy instead.
For more details, please refer to Samples / 04 VRM. The scene can be played if you have set up VRM and imported Alicia.
uLipSyncBlendShape is for 3D models, but if you want to animate a texture for a 2D model instead, you can write your own component to support it. Prepare a component that provides a function to receive uLipSync.LipSyncInfo and register it to OnLipSyncUpdate(LipSyncInfo) of uLipSync or uLipSyncBakedDataPlayer.
For example, the following simple script outputs the result of recognition to Debug.Log().
using UnityEngine;
using uLipSync;

public class DebugPrintLipSyncInfo : MonoBehaviour
{
    // Register this method in On Lip Sync Update (LipSyncInfo).
    public void OnLipSyncUpdate(LipSyncInfo info)
    {
        if (!isActiveAndEnabled) return;
        // Skip silent frames.
        if (info.volume < Mathf.Epsilon) return;
        Debug.Log($"PHONEME: {info.phoneme}, VOL: {info.volume}");
    }
}
LipSyncInfo is a structure that has the following members.
public struct LipSyncInfo
{
    public string phoneme; // Main phoneme
    public float volume; // Normalized volume (0 ~ 1)
    public float rawVolume; // Raw volume
    public Dictionary<string, float> phonemeRatios; // Table that contains the pair of the phoneme and its ratio
}
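For the 2D case mentioned earlier, a custom receiver could look roughly like the following sketch. It swaps a sprite according to the main phoneme and uses only the phoneme and volume members of LipSyncInfo. The class name SpriteMouth, its fields, and the 0.1 volume threshold are hypothetical choices for illustration.

```csharp
using System;
using System.Collections.Generic;
using UnityEngine;
using uLipSync;

// Hypothetical 2D mouth animator: swaps a sprite according to the main phoneme.
[RequireComponent(typeof(SpriteRenderer))]
public class SpriteMouth : MonoBehaviour
{
    [Serializable]
    public struct Entry
    {
        public string phoneme; // must match a phoneme registered in the Profile
        public Sprite sprite;  // mouth shape to show for that phoneme
    }

    public List<Entry> entries = new List<Entry>();
    public Sprite closedMouth;                      // shown when the volume is too low
    [Range(0f, 1f)] public float minVolume = 0.1f;  // hypothetical threshold

    SpriteRenderer _renderer;

    void Awake()
    {
        _renderer = GetComponent<SpriteRenderer>();
    }

    // Register this method in On Lip Sync Update (LipSyncInfo).
    public void OnLipSyncUpdate(LipSyncInfo info)
    {
        if (!isActiveAndEnabled) return;

        if (info.volume < minVolume)
        {
            _renderer.sprite = closedMouth;
            return;
        }

        // Show the sprite registered for the main phoneme, if there is one.
        foreach (var entry in entries)
        {
            if (entry.phoneme != info.phoneme) continue;
            _renderer.sprite = entry.sprite;
            return;
        }
    }
}
```

Switching sprites on the main phoneme alone keeps the example short. A smoother result would probably need some blending or a minimum hold time per sprite, which is left out here.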
There is a function to save and load the profile to/from JSON. From the editor, specify the JSON you want to save or load from the Import / Export JSON tab, and click the Import or Export button.
If you want to do it in code, you can use the following code.
var lipSync = GetComponent<uLipSync>();
var profile = lipSync.profile;
// Export
profile.Export(path);
// Import
profile.Import(path);
If you want to perform calibration at runtime, you can do it by making a request to uLipSync with uLipSync.RequestCalibration(int index) as follows. The MFCC calculated from the currently playing sound will be set to the specified phoneme.
var lipSync = GetComponent<uLipSync>();

// While key 1, 2, 3, ... is held, calibrate the phoneme at the same position in the profile.
for (int i = 0; i < lipSync.profile.mfccs.Count; ++i)
{
    var key = (KeyCode)((int)(KeyCode.Alpha1) + i);
    if (Input.GetKey(key)) lipSync.RequestCalibration(i);
}
Please refer to CalibrationByKeyboardInput.cs to see how it actually works. Also, in a built app, changes made to a ScriptableObject cannot be saved, so it is better to save and restore the profile as JSON.
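One way to save and restore the profile in a built app is sketched below, using the Export and Import calls shown earlier. The class name ProfilePersistence, the file name, and the choice of Application.persistentDataPath as the save location are assumptions for the example. It is also assumed that the uLipSync component is on the same game object.

```csharp
using System.IO;
using UnityEngine;

// Hypothetical helper; assumes the uLipSync component is on the same game object.
public class ProfilePersistence : MonoBehaviour
{
    public string fileName = "ulipsync-profile.json"; // hypothetical file name

    // Full path of the JSON file inside the per-user data directory.
    string JsonPath => Path.Combine(Application.persistentDataPath, fileName);

    void Start()
    {
        // Restore a previously saved calibration, if there is one.
        var lipSync = GetComponent<uLipSync.uLipSync>();
        if (File.Exists(JsonPath)) lipSync.profile.Import(JsonPath);
    }

    // Call this after calibrating, for example from a UI button.
    public void Save()
    {
        var lipSync = GetComponent<uLipSync.uLipSync>();
        lipSync.profile.Export(JsonPath);
    }
}
```

The component type is written as uLipSync.uLipSync because this script lives in the global namespace, where the bare name uLipSync would refer to the namespace rather than the component.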
When building on a Mac, you may encounter the following error.
Building Library/Bee/artifacts/MacStandalonePlayerBuildProgram/Features/uLipSync.Runtime-FeaturesChecked.txt failed with output: Failed because this command failed to write the following output files: Library/Bee/artifacts/MacStandalonePlayerBuildProgram/Features/uLipSync.Runtime-FeaturesChecked.txt
This may be related to the microphone access code, which can be fixed by writing something in Project Settings > Player's Other Settings > Mac Configuration > Microphone Usage Description.
Profile
Examples include Unity-chan assets.
© Unity Technologies Japan/UCL
FAQs
An MFCC-based LipSync plugin for Unity using Job and Burst Compiler
We found that com.hecomi.ulipsync demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 0 open source maintainers collaborating on the project.