Model overview

Deep neural networks (DNNs) have recently improved the performance of character animation, as evidenced by the growing number of publications on deep motion synthesis. In animation, deep-learning-based approaches attempt to model complex human motion and promise cheaper and faster animation production. GenMotion currently includes several methods for skeletal animation, and we will continue to add methods for deep motion synthesis in future releases.

Sequence to Sequence (Seq2Seq)

Seq2Seq models are widely used in motion prediction. Seq2Seq-based approaches generally consist of training one recurrent neural network (RNN) on human motion as an encoder that maps the input to a hidden vector, and training another RNN as a decoder that generates motion from the hidden vector [refer to]`julieta2017motion`. The encoder and decoder are trained jointly.
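The encoder/decoder split described above can be illustrated with a minimal NumPy sketch. It is not GenMotion's implementation: the weights are random and untrained, a plain tanh RNN stands in for whatever recurrent cell a real model would use, and the names `rnn_step`, `init_rnn`, `seq2seq_predict` and `W_out` are hypothetical. Only the forward pass is shown; in joint training, a single loss on the decoded poses would update both sets of weights.

```python
import numpy as np

def rnn_step(x, h, W_x, W_h, b):
    """One vanilla (tanh) RNN step: h' = tanh(x W_x + h W_h + b)."""
    return np.tanh(x @ W_x + h @ W_h + b)

def init_rnn(rng, in_dim, hidden_dim):
    """Small random weights for one RNN cell (illustrative, untrained)."""
    return (rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
            rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)),
            np.zeros(hidden_dim))

def seq2seq_predict(past, n_future, enc, dec, W_out):
    """Encode `past` (T, D) into a hidden vector, then decode n_future poses."""
    h = np.zeros(enc[1].shape[0])
    for pose in past:                  # encoder: consume the observed frames
        h = rnn_step(pose, h, *enc)
    pose, outputs = past[-1], []
    for _ in range(n_future):          # decoder: feed back its own prediction
        h = rnn_step(pose, h, *dec)
        pose = h @ W_out               # project hidden state to a pose vector
        outputs.append(pose)
    return np.stack(outputs)

rng = np.random.default_rng(0)
pose_dim, hidden_dim = 6, 16           # arbitrary sizes for the demo
enc = init_rnn(rng, pose_dim, hidden_dim)
dec = init_rnn(rng, pose_dim, hidden_dim)
W_out = rng.normal(0.0, 0.1, (hidden_dim, pose_dim))
past = rng.normal(size=(10, pose_dim))  # 10 observed frames of fake motion
future = seq2seq_predict(past, 5, enc, dec, W_out)
print(future.shape)
```

The decoder starts from the encoder's final hidden state, so that vector is the only channel through which the observed motion influences the generated frames.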

Recurrent Network Models for Human Dynamics

Encoder-Recurrent-Decoder (ERD) [refer to]`fragkiadaki2015recurrent` is a model for predicting human body poses from motion capture data. The ERD model is a recurrent neural network that incorporates nonlinear encoder and decoder networks before and after the recurrent layers.
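The layering described above (encoder, then recurrence, then decoder) can be sketched as follows. This is an illustrative forward pass under simplifying assumptions: a single tanh recurrent layer stands in for the recurrent layers of the published model, all weights are random, and `erd_forward` and the weight names are hypothetical.

```python
import numpy as np

def erd_forward(poses, params):
    """Predict one output pose per input frame of `poses`, shape (T, D)."""
    W_enc, W_x, W_h, W_d1, W_d2 = params
    h = np.zeros(W_h.shape[0])
    preds = []
    for pose in poses:
        z = np.tanh(pose @ W_enc)               # nonlinear encoder (before recurrence)
        h = np.tanh(z @ W_x + h @ W_h)          # recurrent layer carries temporal state
        preds.append(np.tanh(h @ W_d1) @ W_d2)  # nonlinear decoder (after recurrence)
    return np.stack(preds)

rng = np.random.default_rng(0)
D, E, H = 6, 12, 16                             # pose, encoding and hidden sizes (arbitrary)
shapes = [(D, E), (E, H), (H, H), (H, E), (E, D)]
params = [rng.normal(0.0, 0.1, s) for s in shapes]
poses = rng.normal(size=(20, D))                # 20 frames of fake motion capture
next_poses = erd_forward(poses, params)
print(next_poses.shape)
```

Because the recurrence operates on the encoded representation `z` rather than on raw poses, the encoder and decoder are free to learn a feature space in which the temporal dynamics are easier to model.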

Variational Autoencoder

Variational autoencoders (VAEs) are a deep learning technique for learning latent representations. They have been used to generate images, to achieve state-of-the-art results in semi-supervised learning, and to interpolate between sentences.

VAEs can also be applied to human motion generation by combining samples from a variational approximation to the intractable posterior [refer to]`habibie2017recurrent`, or be integrated into sequence generation tasks through a Variational Recurrent Neural Network (VRNN) [refer to]`chung2015recurrent`.
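Sampling from the variational approximation is usually done with the reparameterization trick, and the approximation is kept close to the prior by a KL term. The sketch below assumes a diagonal Gaussian posterior and a standard normal prior, which is the common choice but is not stated in the text above; the function names are hypothetical.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) as mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

rng = np.random.default_rng(0)
mu = np.array([1.0, 1.0])      # posterior mean, as an encoder might output
log_var = np.zeros(2)          # log-variance 0, i.e. unit variance
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
print(z.shape, kl)
```

Writing the sample as a deterministic function of `mu`, `log_var` and external noise is what allows gradients to flow back into the encoder during training.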

Conditional Variational Autoencoder

A conditional variational autoencoder (CVAE) is a directed graphical generative model that has obtained excellent results and is among the state-of-the-art approaches to generative modeling. It assumes that the data is generated by a random process involving an unobserved continuous random variable: a latent value is drawn from a prior distribution, and the data is then generated from a conditional distribution given that latent value and the condition. Given a prescribed action type, the CVAE aims to generate plausible human motion sequences in 3D. Importantly, the set of generated motions is expected to remain diverse enough to explore the entire action-conditioned motion space, while each sampled sequence should faithfully resemble natural human body articulation dynamics [refer to]`guo2020action2motion`.
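The generative assumption above corresponds to the standard conditional evidence lower bound, written here as a reference formula. The notation is an assumption introduced for illustration: $x$ is a motion sequence, $c$ the action type, $z$ the latent variable, $q_\phi$ the encoder and $p_\theta$ the decoder and prior.

```latex
\log p_\theta(x \mid c) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x, c)}\!\left[ \log p_\theta(x \mid z, c) \right]
\;-\; D_{\mathrm{KL}}\!\left( q_\phi(z \mid x, c) \,\|\, p_\theta(z \mid c) \right)
```

The first term rewards faithful reconstruction of each motion given its action, and the KL term keeps the latent space close to the prior so that sampling from it yields diverse motions.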

Transformer

A transformer designed for motion synthesis treats the task as a sequence-to-sequence prediction problem conditioned on input keyframes. The transformer-based pipeline uses spatial attention and temporal attention to extract spatial-temporal correlations between the bones of the human skeleton [refer to]`liu2021motion`.
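The difference between the two attention types is only the axis attended over: joints within a frame (spatial) or frames for a given joint (temporal). The sketch below makes that concrete with NumPy. It is a simplification: the input is used directly as queries, keys and values with no learned projections, and applying spatial attention first and temporal attention second is an assumption, not necessarily the ordering used in the cited work.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Scaled dot-product self-attention over the second-to-last axis.

    x: (..., N, C) -> (..., N, C); x serves as queries, keys and values.
    """
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    return softmax(scores, axis=-1) @ x

def spatial_temporal_attention(motion):
    """motion: (T frames, J joints, C channels) -> same shape."""
    spatial = self_attention(motion)                       # joints attend to joints, per frame
    temporal = self_attention(np.swapaxes(spatial, 0, 1))  # frames attend to frames, per joint
    return np.swapaxes(temporal, 0, 1)

rng = np.random.default_rng(0)
motion = rng.normal(size=(8, 5, 3))    # 8 frames, 5 joints, 3 channels (fake data)
out = spatial_temporal_attention(motion)
print(out.shape)
```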

Transformer VAE

Transformer-VAEs learn smooth latent spaces of discrete sequences without any explicit rules in their decoders. This can be used for program synthesis, drug discovery, music generation, and motion synthesis. Compared with regular VAEs, the Transformer-VAE uses the transformer structure, i.e., the self-attention mechanism, for both the encoder and the decoder. The transformer has had great success in natural language processing (NLP), for example in machine translation, as well as in time series prediction. We re-implemented the Transformer-VAE for motion synthesis in GenMotion.

Transformer CVAE

The Transformer conditional variational autoencoder (Transformer-CVAE) tackles the problem of action-conditioned generation of realistic and diverse human motion sequences using a transformer architecture. It learns an action-aware latent representation for human motions by training a generative variational autoencoder (VAE). By sampling from this latent space and querying a certain duration through a series of positional encodings, it synthesizes variable-length motion sequences conditioned on a categorical action [refer to]`petrovich2021action`.
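Querying a duration through positional encodings means that the decoder receives one encoding per requested frame, so the same latent sample can be decoded to any length. The sketch below shows standard sinusoidal encodings used this way. The decoder itself is a deliberately toy stand-in (an additive mix followed by a linear map) rather than a transformer decoder, and `decode`, `W_out` and the sizes are hypothetical.

```python
import numpy as np

def positional_encoding(n_frames, d_model):
    """Sinusoidal encodings of shape (n_frames, d_model); d_model must be even."""
    pos = np.arange(n_frames)[:, None]
    two_i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((n_frames, d_model))
    pe[:, 0::2] = np.sin(angles)       # even channels
    pe[:, 1::2] = np.cos(angles)       # odd channels
    return pe

def decode(z, n_frames, W_out):
    """Toy stand-in decoder: one query per frame, combined with the latent z."""
    queries = positional_encoding(n_frames, z.shape[0])
    return np.tanh(queries + z) @ W_out    # (n_frames, pose_dim)

rng = np.random.default_rng(0)
d_model, pose_dim = 8, 6
z = rng.standard_normal(d_model)       # latent sample (action conditioning omitted)
W_out = rng.normal(0.0, 0.1, (d_model, pose_dim))
short = decode(z, 60, W_out)           # same latent, two requested durations
long = decode(z, 120, W_out)
print(short.shape, long.shape)
```

Since the encoding for frame `t` does not depend on the total length, the first 60 frames of the longer sequence match the shorter one in this toy decoder; a real transformer decoder attends across all queries, so that property would not generally hold there.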