AI Singer: Unwilling to be just an “AI Stefanie Sun”

In the Chinese music scene, Stefanie Sun (Sun Yanzi) is everywhere. On music platforms, playlists and podcasts crediting “AI Sun Yanzi” as the singer are becoming more and more common, and even a response from Sun Yanzi herself has not stopped the trend.

“What I want to say is, what do you have to argue with someone who releases a new album every few minutes…I can still tell the difference, it has no emotion, no change in pitch and breath (or any term you can think of). I’m sorry, I suspect this is just a very short-term response.”

Are singers worried that humans can no longer compete with AI? Behind the fear of “being replaced”, more of the questions concern copyright. Does a third party have the right to replicate a singer’s voice without permission? How can the “Sun Yanzis” regain control of their own voices?

Recently, “NetEase Cloud Music X Studio”, a music creation software built around AI singers, quietly went online. It ships with 12 built-in AI singers with highly human-like, distinctive voices, including AI Ruoxi (code name F970), whose voice resembles Faye Wong’s. Netizens couldn’t help but wonder: is Ruoxi related to Faye Wong? Where does the AI singers’ training data come from?

This question was put to Xiaobing, one of the software’s producers.

Xiaobing’s research on AI singers in China dates back to 2014, when it was still the Xiaobing team at Microsoft and released a chatbot called “Girl Xiaobing”. As the technology developed, Xiaobing gradually expanded its functions, including applying AI to music. In 2017, Xiaobing launched a trial version of an AI singer that already had some ability to create and sing music.

More sober listeners may insist that AI Sun Yanzi is just another playful way for people to get to know AIGC, and that fans will not really abandon their beloved singers and turn all their attention to a lifeless substitute. The discussion about AI Sun Yanzi may eventually become: how far can AI be extended in music creation?

This discussion has actually already begun.

“M435”

“Virtual singer software is actually useful. Whether it’s paid or free, we all use it, because it comes up often in our day-to-day creative process.”

Sun Yujing’s answer was somewhat unexpected. As the founder and artistic director of Fantasia Anime Music, he and his team are responsible for all the music production of “Scissor Seven”. “Scissor Seven” is a high-quality 2D animation in China, but because of its meticulous production, the updates are always slow. Fans who follow the series even joke that they need to prepare to “brew Chinese herbal tea for health” while waiting for the updates.

The same care and meticulousness is also reflected in the animation music. In order to achieve the best creative effect, many AI elements have been incorporated into Sun Yujing’s creative process, such as “M435”.

“For example, say we made a song, a rock song for instance. Our own producer would write the melody, and we might sing it with our own voices. The producer may be very professional in music theory and production, but the singing might not be as good. Still, we needed to sing it ourselves so that the arrangement work would go more smoothly.”

“Later, we directly used M435”–a rock music AI singer–“to sing, and then we could quickly determine whether this melody or these lyrics were what we wanted.”

“M435” has another name–AI singer Cui Can.

He is certainly not Cui Jian (although the name is reminiscent of him), but the meaning of an “AI singer” clearly goes far beyond “accuracy of pronunciation and tone”.

“M435” was born from the AI Xiaobing framework and is built into X Studio, the AI singer music creation software mentioned above. The creator inputs the lyrics and music, and professional-grade AI vocals are generated within 3 seconds.

Cui Can and the 11 other AI singers in the software form WOWAIDO!, the first virtual singer label. Its first mini-album, “WOWAIDO! I Gravity”, was released on June 15th on NetEase Cloud Music. Within 24 hours, comments on the song “To You”, performed by AI He Chang (code name F11) and AI Xu Mengtian (code name F801), had reached 999+. AI Xia Yubing and AI Chen Shuoruo, who have millions of fans on Douyin, also appear on the member list.

AI singers engage in “one-on-one” interactions with netizens in the comments section. Source: Xiaobing

Singers Who Are Always Online

So far, we have viewed the explosion of AI singers from the audience’s perspective. For music creators, though, the varied singers in X Studio fill a gap in music creation, and these singers are always stable and reliable.

X Studio’s AI singer lineup offers highly human-like and diverse vocal styles that adapt to genres such as pop, folk, Chinese style, electronic, and rock. It also includes the childlike voice that Li Zhaoyang, a graduate of Sichuan Conservatory of Music, had been struggling to find for children’s music.

Besides being a teacher, Li Zhaoyang is also a children’s music producer. He has long felt that parents have extremely high demands, yet a suitable childlike voice is difficult to find, and even when he finds one, it is hard to ask a child only a few years old to sing a song steadily to the required standard.

He began to look for such a voice among AI singers. The voice of Xiaobing (code name F002) was the closest to a childlike voice he could find among similar products.

This is the key reason he likes to use X Studio. He used Xiaobing’s voice to release the work “Li Wengdui Rhyme” and was nominated for Best Children’s Music Album at the 5th Singing Work Committee Music Awards.

Sun Yujing learned about X Studio through the AI singer He Chang, the most mature AI singer in X Studio. He Chang has collaborated with musicians such as Ma Boqian, Bian Ziyan, and Xiao Ke, and has sung Burberry’s “Runaway 2.0”, the Chengdu Universiade promotional song “Waiting for You in Chengdu”, and the Beijing Winter Olympics tribute song “Only You, No Other”.

“For players, practitioners, and producers, if an AI singer has done well on mature, finished works, they will figure that they can use this AI singer too and get the same result as long as they tune it carefully, so everyone will definitely be willing to use it.”

In Sun Yujing’s eyes, AI singers are like human singers: each has its own characteristics. They complement each other’s strengths and can help creators at different stages of creation. “AI singers are online 24/7 and their status is constant. In the early stages of creation, such as lyrics, composition, and arrangement, they can help creators adjust and improve at any time, and see at least 80% of the finished product. In the final stage of creation, the recording stage, because human communication is more direct, human singers can quickly adjust the interpretation of certain details according to my demonstration.”

X Studio does give users a great deal of creative freedom.

It features built-in Xiaobing singing models, consistent super-natural language, flow rendering voice synthesis, visual neural network rendering, and other technologies. Creators can fine-tune an AI singer’s performance by adjusting parameters such as glissandos, turns, vibrato, diction, rhythm, tone, and volume, giving their work a delicate interpretation. This means that, starting from a standard tone and singing style, users can polish their AI singers into a more personalized style or one that better fits their own vision.
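To make the idea of tuning parameters concrete, here is a minimal Python sketch of how two of the controls named above, glissando and vibrato, could shape a note’s pitch over time. It is not X Studio’s implementation or API: the NoteParams class, its field names, and the default values are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class NoteParams:
    """Hypothetical per-note controls; these names are not X Studio's."""
    start_hz: float                    # pitch at the start of the note
    end_hz: float                      # pitch the glissando slides to
    duration_s: float                  # note length in seconds
    vibrato_rate_hz: float = 5.5       # vibrato oscillations per second (assumed default)
    vibrato_depth_cents: float = 30.0  # peak pitch deviation in cents (assumed default)


def pitch_curve(note: NoteParams, frames_per_second: int = 100) -> List[float]:
    """Return the note's pitch in Hz, sampled frames_per_second times a second."""
    n_frames = max(1, round(note.duration_s * frames_per_second))
    curve = []
    for i in range(n_frames):
        t = i / frames_per_second
        progress = i / (n_frames - 1) if n_frames > 1 else 0.0
        # Glissando: slide geometrically so the glide is even in musical pitch.
        base_hz = note.start_hz * (note.end_hz / note.start_hz) ** progress
        # Vibrato: sinusoidal deviation measured in cents (1200 cents per octave).
        cents = note.vibrato_depth_cents * math.sin(
            2 * math.pi * note.vibrato_rate_hz * t
        )
        curve.append(base_hz * 2 ** (cents / 1200))
    return curve
```

For example, pitch_curve(NoteParams(220.0, 440.0, 1.0)) returns 100 pitch values that slide up one octave while wavering by at most 30 cents around the glide.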

In addition, X Studio’s other powerful feature is support for merging up to 30 AI tracks, which means that every musician can have a “30-person” choir.
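As a rough illustration of what merging tracks involves, the Python sketch below sums several mono tracks sample by sample and scales the result down only if it would clip. The 30-track limit is the figure cited above; the function name, the float sample format in [-1.0, 1.0], and the peak-normalization step are assumptions, not a description of how X Studio mixes audio.

```python
from typing import List, Sequence

MAX_TRACKS = 30  # the track limit cited for X Studio


def mix_tracks(tracks: Sequence[Sequence[float]]) -> List[float]:
    """Sum mono tracks sample by sample, scaling down only if the mix clips."""
    if not tracks:
        return []
    if len(tracks) > MAX_TRACKS:
        raise ValueError(f"at most {MAX_TRACKS} tracks can be merged")
    length = max(len(track) for track in tracks)
    mixed = [0.0] * length
    for track in tracks:
        for i, sample in enumerate(track):
            # Shorter tracks simply contribute silence past their end.
            mixed[i] += sample
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > 1.0:
        # Keep the result within [-1.0, 1.0] to avoid clipping.
        mixed = [s / peak for s in mixed]
    return mixed
```

Scaling by the peak is only one possible choice; it keeps the relative balance between voices intact, at the cost of making the whole mix quieter when many voices overlap.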

Song Chengrong’s WeChat Moments

Eliminating copyright concerns

The biggest headache for music creators is undoubtedly copyright, but the AI singers in X Studio are free of copyright risks.

Some netizens have posted AI music works online that use the voice of Xiaobing’s AI singer Ruoxi as a prototype. After more detailed tuning, the AI singer’s voice sounds like Faye Wong’s, and it sang the song “Red Bean”.

After AI Sun Yanzi, perhaps more singers will be turned into AI. In theory, all you need is an audio dataset of the voice you want to train on, such as two hours of Faye Wong’s recordings or interview audio. You train a model on this dataset, then use the trained model to infer and replace the vocal line in another song, such as Dao Lang’s “Lover”, and you can finally hear Faye Wong’s version of “Lover”.

This is a more complicated process, and it carries huge copyright risks.

Xiao Sa, senior partner of Beijing Dacheng Law Firm, said in an interview that a lawful cover requires the permission of the rights holders. Xiao Sa pointed out that the creation of a song often involves many rights holders, and its copyright system is fairly complicated. Specifically, it may include the copyright of the authors of the lyrics and music, the rights of the singer as a performer, and the rights of the relevant company as a producer of sound and video recordings.

“Ideally, permission should be obtained from all of the aforementioned rights holders; otherwise, the cover is likely to infringe their corresponding rights, and whoever made it will need to bear liability for infringement. Even AI covers are subject to this rule.”

If a creator trains a model on a well-known singer’s voice to sing a new song, is the song’s success due to the creator, the AI, or the well-known singer? The creator faces the complex rights system of the music industry. The well-known singer faces uses of their voice that they cannot control, as well as potential opportunities to earn income. This is the common problem behind the AI-ization of well-known singers.

Solving this problem urgently requires music platforms to serve as intermediaries, since they already cover the complete music industry process from creator to singer. This is also the focus of the latest X Studio update.

This may be the first step in legalizing “AI Sun Yanzi” and similar AI voices. Xiaobing revealed that real singers have voluntarily authorized the company to “clone” their voices in order to preserve them at their peak. Singers can decide how their AI voices are used and earn the related income. After personal training and tuning, AI voices will have the opportunity to perform better. When more and more “genuine” voices appear in the market, the good money will drive out the bad, which may solve the problem of “AI Sun Yanzi” and others like it and give music creators more choices.

The company also stated that Xiaobing has always advocated the safe development of AI technology. The software includes strong security and privacy protection policies, all AI singers have undergone strict data training, and all creations are traceable. The company hopes this can serve as a reference for the future development of AI creation.

Epilogue

Perhaps we can find some confidence in AI singers from the history of the music synthesizer.

A synthesizer is an electronic instrument that generates sound as an electrical signal, which is amplified and then played through a speaker. This allows it to simulate the sound of real instruments, such as pianos, drums, and even strings. Synthesizers can also generate sounds from tuned environmental samples, and even sounds that do not exist in reality.

Synthesizers began to attract attention in the 1960s when they were used in popular music. The digital synthesizer DX7 introduced by Yamaha in the 1980s, as well as the software synthesizers that rose further in the 1990s with the widespread use of computers, gradually solidified the role of synthesizers in music production, and even formed music genres such as Synthwave around them.

Instead of treating AI singers as the opposite of real singers, we can see them as a kind of evolving synthesizer.

In music production, the abilities of synthesizers have evolved from reproducing real instruments to reproducing all kinds of environmental sounds, such as tides and chirping insects. The ever-changing human voice, by contrast, is the ultimate challenge for synthesizers. Both synthesizers and AI singers have to go through a period of being questioned before being widely accepted.

And the “AI Stefanie Suns” that have been legalized will eventually help future music creation.
