Despite the recent proliferation of artificial intelligence (AI), researchers have struggled to build models that generate human movement that doesn't look stiff and unnatural, far removed from how people actually move, but a recent breakthrough could address this problem.
Researchers at Peking University’s Institute for Artificial Intelligence and the State Key Laboratory of General AI have developed new AI models that could simplify the generation of realistic motion for human characters and avatars, according to an April 13 report by Tech Xplore.
The latest breakthrough aims to equip AI systems with support for editing and inspiration-driven creative workflows, paving the way for more realistic video games and animations, particularly in content viewed through virtual reality (VR) headsets and in professional training videos.
As Yixin Zhu, a senior author of the paper, explained, researchers have long been “fascinated by recent advances in text-to-motion generation.” However, there was a glaring issue when it came to editing existing motions:
“We noticed a critical gap in the technological landscape. While generating motions from scratch had seen tremendous progress, the ability to edit existing motions remained severely limited.”
According to Nan Jiang, a co-author of the study, previous systems that attempted motion editing “required extensive pre-collected triplets of original motions, edited motions, and corresponding instructions – data that’s extremely scarce and expensive to create,” which made them inflexible and limited to the specific editing scenarios they were explicitly trained on.
How the new motion-editing AI model works
To address this problem, Zhu and his colleagues developed a new system that can edit any human motion from user-provided text instructions, without task-specific inputs or body-part specifications, and that supports both spatial editing (changes to specific body parts) and temporal editing (adaptation of motions over time).
Specifically, the proposed approach to generating human-like movement draws on two components: a newly introduced data augmentation training technique called MotionCutMix, which, as the study’s co-author Hongjie Li put it, “helps AI systems learn to edit 3D human motions based on text instructions,” and a diffusion model called MotionReFit.
With MotionCutMix, researchers can select particular body parts from one motion sequence and combine them with parts taken from a different sequence, gradually and more seamlessly blending the boundaries between the movements of one body part and another.
Unlike previous methods trained on fixed datasets of annotated videos of people moving in different ways, MotionCutMix generates new training samples on the fly, allowing the model to learn from large libraries of motion data that require no manual annotation.
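To make the data-augmentation idea more concrete, here is a minimal sketch in Python of how compositing body parts from two motion clips might look. It is an illustration only: the function name, the body-part grouping, and the soft-blending scheme (applied over time at the start and end of the swap) are assumptions for the example, not the authors' actual implementation.

```python
import numpy as np

# Hypothetical body-part grouping: index ranges into a per-frame pose vector.
# The real partitioning used by MotionCutMix is not specified here.
BODY_PARTS = {
    "torso": slice(0, 12),
    "left_arm": slice(12, 24),
    "right_arm": slice(24, 36),
    "legs": slice(36, 60),
}

def mix_motions(motion_a, motion_b, part, blend_frames=8):
    """Compose a new training sample by swapping one body part from
    motion_b into motion_a, with a soft blend so the swapped part
    eases in and out instead of popping between sequences.

    motion_a, motion_b: arrays of shape (num_frames, pose_dim)
    part: key into BODY_PARTS naming the part to replace
    blend_frames: frames over which the blend ramps in and out
    """
    mixed = motion_a.copy()
    idx = BODY_PARTS[part]

    # Per-frame blend weights: ramp up, hold at 1, ramp back down.
    n = motion_a.shape[0]
    w = np.ones(n)
    ramp = np.linspace(0.0, 1.0, blend_frames)
    w[:blend_frames] = ramp
    w[-blend_frames:] = ramp[::-1]

    # Linear blend between the two sources on the chosen body part.
    mixed[:, idx] = (1 - w[:, None]) * motion_a[:, idx] + w[:, None] * motion_b[:, idx]
    return mixed

# Example: graft the right-arm channels of clip_b onto clip_a.
clip_a = np.random.randn(120, 60)   # stand-in for a real motion clip
clip_b = np.random.randn(120, 60)
augmented = mix_motions(clip_a, clip_b, "right_arm")
```

Because such recombinations can be produced on the fly from any two clips, a comparatively small annotated set can be stretched into a much larger pool of training samples.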
Meanwhile, the auto-regressive diffusion model MotionReFit processes those training samples and learns to generate and modify human motion, letting users accurately edit a motion sequence simply by describing the change they want, a step forward from other human motion generation models.
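As a rough sketch of what "auto-regressive, text-conditioned editing" means in practice, the toy loop below edits a motion segment by segment: each segment starts from noise and is iteratively denoised, conditioned on the original segment, the text instruction, and the previously edited segment. The function names, segment length, and the dummy denoiser are invented for illustration and stand in for the authors' trained network, not reproduce it.

```python
import numpy as np

def edit_motion_autoregressive(original, text_embedding, denoise_fn,
                               segment_len=16, num_diffusion_steps=20):
    """Toy auto-regressive, text-conditioned motion editing loop.

    original: array of shape (num_frames, pose_dim)
    text_embedding: vector encoding the user's instruction
    denoise_fn: stand-in for a trained denoising network
    """
    num_frames, pose_dim = original.shape
    edited = np.zeros_like(original)
    prev_segment = np.zeros((segment_len, pose_dim))

    for start in range(0, num_frames, segment_len):
        src = original[start:start + segment_len]
        x = np.random.randn(*src.shape)          # start each segment from noise
        for step in reversed(range(num_diffusion_steps)):
            # Each denoising step sees the source motion, the previously
            # edited segment, and the text instruction as conditioning.
            x = denoise_fn(x, step, src, prev_segment, text_embedding)
        edited[start:start + src.shape[0]] = x
        prev_segment = x
    return edited

def dummy_denoise(x, step, src, prev, text_emb):
    # Placeholder denoiser: nudges the noisy segment toward the source.
    # A real model would be a trained neural network.
    return 0.9 * x + 0.1 * src

clip = np.random.randn(64, 60)                   # stand-in motion clip
instruction = np.zeros(256)                      # stand-in text embedding
result = edit_motion_autoregressive(clip, instruction, dummy_denoise)
```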
And the best part: the system relies on a text-based interface, making it accessible to experts and non-experts alike, with no background in game or animation creation required.
What’s more, it may also hold value in robotics research, where it could serve as a tool for refining the movements of humanoid robots, such as those with potential use in households.