Tutorial: Conditioned Generation of 3D Human Motions (Action2Motion)

Action2Motion can be seen as the inverse of action recognition: given a prescribed action type, it aims to generate plausible 3D human motion sequences. The set of generated motions is expected to remain diverse, so that it explores the entire action-conditioned motion space. At the same time, each sampled sequence should faithfully reproduce natural human body articulation dynamics. Motivated by these objectives, Action2Motion follows the physical laws of human kinematics by adopting Lie algebra theory to represent natural human motions. It also introduces a temporal Variational Auto-Encoder (VAE) that encourages diverse sampling of the motion space.
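To make the Lie algebra representation more concrete: a joint rotation can be stored as a 3-vector in so(3) (the rotation axis scaled by the rotation angle), and the exponential map turns that vector into a rotation matrix in SO(3). The sketch below implements this map and its inverse with Rodrigues' formula in NumPy. It is a generic illustration rather than genmotion's code; the helper names `hat`, `exp_so3` and `log_so3` are our own.

```python
import numpy as np

def hat(omega):
    """Map a 3-vector in so(3) to its 3x3 skew-symmetric matrix."""
    x, y, z = omega
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def exp_so3(omega, eps=1e-8):
    """Exponential map so(3) -> SO(3) using Rodrigues' formula."""
    omega = np.asarray(omega, dtype=float)
    theta = np.linalg.norm(omega)
    if theta < eps:
        # Small-angle case: exp(K) is approximately I + K.
        return np.eye(3) + hat(omega)
    K = hat(omega / theta)  # unit rotation axis as a skew-symmetric matrix
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def log_so3(R, eps=1e-8):
    """Logarithm map SO(3) -> so(3) (axis-angle vector), valid for angles in [0, pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < eps:
        return np.zeros(3)
    # R - R^T = 2 sin(theta) K, so its "vee" gives the axis scaled by 2 sin(theta).
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta / (2.0 * np.sin(theta)) * w

# A 90-degree rotation about the z-axis maps the x-axis onto the y-axis.
R = exp_so3([0.0, 0.0, np.pi / 2])
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))
```

Because the so(3) parameters live in an unconstrained vector space, a network can predict them directly while every decoded matrix remains a valid rotation.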

See the original implementation and the paper for more details.

Dataset

To obtain and pre-process the dataset, please refer to this GitHub repository and agree to its license. The following code shows examples from the HumanAct12 dataset.

[1]:
# Set data path (replace with the location of your local HumanAct12 copy)
data_path = "E://researches/action-to-motion/dataset/humanact12"

Training

[2]:
import torch
[3]:
from genmotion.algorithm.action2motion.configs import params
from genmotion.algorithm.action2motion.utils import paramUtil
from genmotion.algorithm.action2motion.dataset import MotionFolderDatasetHumanAct12, MotionDataset
[5]:
opt = params.TrainingConfig()
print(vars(opt))
{'arbitrary_len': False, 'batch_size': 8, 'checkpoints_dir': './checkpoints/vae', 'clip_set': './dataset/pose_clip_full.csv', 'coarse_grained': True, 'dataset_type': 'humanact12', 'decoder_hidden_layers': 2, 'dim_z': 30, 'eval_every': 2000, 'gpu_id': 0, 'hidden_size': 128, 'isTrain': True, 'is_continue': False, 'iters': 50000, 'lambda_align': 0.5, 'lambda_kld': 0.001, 'lambda_trajec': 0.8, 'lie_enforce': False, 'motion_length': 60, 'name': 'act2motion', 'no_trajectory': False, 'plot_every': 50, 'posterior_hidden_layers': 1, 'print_every': 20, 'prior_hidden_layers': 1, 'save_every': 2000, 'save_latest': 50, 'skip_prob': 0, 'tf_ratio': 0.6, 'time_counter': True, 'use_geo_loss': False, 'use_lie': True}
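Among these options, `dim_z` sets the size of the latent variable and `lambda_kld` weights the KL-divergence term of the temporal VAE, which compares the posterior over z with a learned prior (hence the separate `posterior_hidden_layers` and `prior_hidden_layers`). The sketch below assumes both distributions are diagonal Gaussians parameterized by a mean and a log-variance. `reparameterize` and `kl_diag_gaussians` are hypothetical helpers, and the way genmotion reduces this term over time steps and the batch may differ.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * logvar) * eps

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over the last (latent) axis."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0,
        axis=-1,
    )

dim_z, lambda_kld = 30, 0.001                      # values printed in opt above
mu_q, logvar_q = np.ones(dim_z), np.zeros(dim_z)   # hypothetical posterior N(1, I)
mu_p, logvar_p = np.zeros(dim_z), np.zeros(dim_z)  # hypothetical prior N(0, I)

z = reparameterize(mu_q, logvar_q, np.random.default_rng(0))
kl = kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p)
weighted_kl = lambda_kld * kl  # term that would be added to the reconstruction loss
print(kl, weighted_kl)
```

With a unit shift in each of the 30 latent dimensions, the KL is 0.5 × 30 = 15, and the small `lambda_kld` scales it down to 0.015 so that it does not dominate the reconstruction terms.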
[6]:
import torch
print("torch version:", torch.__version__)
torch version: 1.7.1
[7]:
device = torch.device("cuda:" + str(opt.gpu_id) if torch.cuda.is_available() else "cpu")
[8]:
# Defaults; overwritten below for the selected dataset
joints_num = 0
input_size = 72
data = None
[9]:
if opt.dataset_type == "humanact12":
    input_size = 72
    joints_num = 24
    raw_offsets = paramUtil.humanact12_raw_offsets
    kinematic_chain = paramUtil.humanact12_kinematic_chain
    data = MotionFolderDatasetHumanAct12(data_path, opt, lie_enforce=opt.lie_enforce)
Total number of frames 90099, videos 1191, action types 12
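The cell above also loads `raw_offsets` and `kinematic_chain`. With a Lie algebra representation, joint positions are recovered from per-joint rotations by walking down the skeleton's kinematic chains. The following forward-kinematics sketch uses a hypothetical 3-joint chain with rotations about the z-axis only. It assumes that each chain is a list of joint indices starting at an already-placed joint and that offsets are rest-pose bone vectors; the actual layout of `paramUtil.humanact12_raw_offsets` (for example, unit directions later scaled by bone lengths) may differ.

```python
import numpy as np

def rot_z(angle):
    """Rotation matrix for a rotation of `angle` radians about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def forward_kinematics(root_pos, local_rots, offsets, chains):
    """Global joint positions from local rotations along kinematic chains.

    root_pos:   (3,) position of the root joint, assumed to be chains[0][0].
    local_rots: (J, 3, 3) rotation of each joint relative to its parent.
    offsets:    (J, 3) rest-pose bone vector from each joint's parent to the joint.
    chains:     lists of joint indices; each chain starts at an already-placed joint.
    """
    num_joints = len(offsets)
    positions = np.zeros((num_joints, 3))
    global_rots = np.tile(np.eye(3), (num_joints, 1, 1))
    root = chains[0][0]
    positions[root] = root_pos
    global_rots[root] = local_rots[root]
    for chain in chains:
        for parent, child in zip(chain[:-1], chain[1:]):
            # The bone to the child is rotated by everything above it in the chain.
            positions[child] = positions[parent] + global_rots[parent] @ offsets[child]
            global_rots[child] = global_rots[parent] @ local_rots[child]
    return positions

# Hypothetical chain: root -> joint 1 -> joint 2, unit bones along x.
chains = [[0, 1, 2]]
offsets = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
local_rots = np.stack([rot_z(np.pi / 2), np.eye(3), np.eye(3)])  # turn the whole chain
positions = forward_kinematics(np.zeros(3), local_rots, offsets, chains)
print(np.round(positions, 6))
```

Rotating the root swings the entire chain, while rotating joint 1 only moves the joints below it, which is why the kinematic chain order matters.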
[11]:
data[0][0].shape
[11]:
(64, 72)
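Each sample is a 64-frame clip with 72 features per frame, matching `input_size = 72` and `joints_num = 24` above, i.e. 24 joints × 3 coordinates. The minimal sketch below views a clip per joint. It assumes the coordinates of each joint are stored consecutively (x, y, z) and that joint 0 is the root; random data stands in for `data[0][0]`, and `to_joints` is a hypothetical helper.

```python
import numpy as np

def to_joints(motion, joints_num=24):
    """Reshape a (frames, joints_num * 3) array into (frames, joints_num, 3)."""
    frames, dim = motion.shape
    if dim != joints_num * 3:
        raise ValueError(f"expected {joints_num * 3} features per frame, got {dim}")
    return motion.reshape(frames, joints_num, 3)

# Stand-in for data[0][0]: random values with the same shape as the output above.
motion = np.random.default_rng(0).standard_normal((64, 72))
joints = to_joints(motion)
root_trajectory = joints[:, 0, :]  # assumes joint 0 is the root (pelvis)
print(joints.shape, root_trajectory.shape)
```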
[15]:
opt.dim_category = len(data.labels)
# arbitrary_len does not limit the motion length, but the batch size then has to be 1,
# because the default collate function can only stack clips of equal length
if opt.arbitrary_len:
    opt.batch_size = 1
    motion_loader = torch.utils.data.DataLoader(data, batch_size=opt.batch_size, drop_last=True, num_workers=1, shuffle=True)
else:
    motion_dataset = MotionDataset(data, opt)
    motion_loader = torch.utils.data.DataLoader(motion_dataset, batch_size=opt.batch_size, drop_last=True, num_workers=2, shuffle=True)
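When `arbitrary_len` is off, every sample in a batch must have the same length so that it can be stacked; note that `opt.motion_length` is 60 while the clip inspected above has 64 frames. The sketch below shows one way to obtain fixed-length windows: randomly crop long clips and repeat the last frame of short ones. `sample_window` and this padding policy are our assumptions, not necessarily what `MotionDataset` does.

```python
import numpy as np

def sample_window(motion, motion_length, rng):
    """Return exactly `motion_length` consecutive frames from a (frames, features) clip.

    Longer clips are cropped at a random start frame; shorter clips are padded
    by repeating their last frame (an assumed policy).
    """
    frames = motion.shape[0]
    if frames < motion_length:
        pad = np.repeat(motion[-1:], motion_length - frames, axis=0)
        return np.concatenate([motion, pad], axis=0)
    start = int(rng.integers(0, frames - motion_length + 1))
    return motion[start:start + motion_length]

rng = np.random.default_rng(0)
clips = [np.zeros((64, 72)), np.zeros((90, 72)), np.zeros((45, 72))]  # clips of varying length
batch = np.stack([sample_window(clip, 60, rng) for clip in clips])    # 60 = opt.motion_length
print(batch.shape)
```

Random crops also act as mild data augmentation, since each epoch can see a different window of the same clip.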
[17]:
len(motion_loader)
[17]:
148
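The value 148 is consistent with one sample per clip: with `drop_last=True`, the 1191 clips reported above yield 1191 // 8 = 148 full batches of size 8, and 7 leftover clips are skipped in each epoch (a different 7 each time, since `shuffle=True`). A quick check of that arithmetic, assuming `MotionDataset` has one item per clip and that each of the `opt.iters` = 50000 training iterations consumes one batch:

```python
import math

num_clips = 1191  # "videos 1191" in the dataset summary
batch_size = 8    # opt.batch_size
iters = 50000     # opt.iters

full_batches = num_clips // batch_size            # what drop_last=True keeps
all_batches = math.ceil(num_clips / batch_size)   # what drop_last=False would give
leftover = num_clips - full_batches * batch_size  # clips skipped per epoch
approx_epochs = iters / full_batches              # assumes one batch per iteration

print(full_batches, all_batches, leftover, round(approx_epochs, 1))
# prints: 148 149 7 337.8
```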
[18]:
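opt.pose_dim = input_size

# Per-step network input: pose features + one-hot action category (+ 1 time-counter value)
if opt.time_counter:
    opt.input_size = input_size + opt.dim_category + 1
else:
    opt.input_size = input_size + opt.dim_category

# The network predicts one pose vector per step
opt.output_size = input_size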
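With `time_counter` enabled, each step's network input therefore has 72 + 12 + 1 = 85 dimensions: the pose, a one-hot action label over the 12 HumanAct12 action types, and one counter value. The sketch below assembles such a vector, assuming the counter is the normalized progress t / T. `build_step_input` is a hypothetical helper and not part of genmotion.

```python
import numpy as np

def build_step_input(pose, category, t, total_steps, dim_category, time_counter=True):
    """Concatenate pose, one-hot action label and (optionally) a progress counter."""
    one_hot = np.zeros(dim_category)
    one_hot[category] = 1.0
    parts = [pose, one_hot]
    if time_counter:
        parts.append(np.array([t / total_steps]))  # assumed normalized progress t / T
    return np.concatenate(parts)

pose = np.zeros(72)  # opt.pose_dim
x = build_step_input(pose, category=3, t=10, total_steps=60, dim_category=12)
print(x.shape)  # (85,), matching opt.input_size when time_counter is True
```

The counter gives the recurrent model an explicit sense of where it is within the sequence, which a one-hot label alone cannot provide.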