Adventures in Motion Capture: Using Kinect Data (Part 2)

July 8, 2016

Last week I wrote about the basics of getting joint orientation information out of the Kinect v2 sensor to use as motion capture data for 3D characters. This week, I’ll resume the story and detail how to actually use that data.

So, last time I described how I’d modified one of the sample applications from the Kinect 2.0 SDK to write the important data for animating a 3D skeleton out to a file. That data consists of 25 rotation quaternions, one per joint, for each body the Kinect tracks, and the Kinect records thirty frames of this data per second.
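For reference, here’s a minimal Python sketch of how one frame of that data might be held in memory. The dataclass and all of the names are my own; only the 25-joint count and the thirty-frames-per-second rate come from the sensor.

from dataclasses import dataclass
from typing import List, Tuple

Quaternion = Tuple[float, float, float, float]  # (x, y, z, w), matching the SDK's Vector4

JOINT_COUNT = 25        # joints tracked per body by the Kinect v2
FRAMES_PER_SECOND = 30  # the sensor's capture rate

@dataclass
class BodyFrame:
    timestamp: float                # seconds since the start of the capture
    orientations: List[Quaternion]  # one rotation quaternion per joint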

To successfully use the Kinect data to rotate the limbs of a 3D skeleton, there are several key properties of that data you should be aware of.

1: All rotations occur in absolute space.

Consider a normal human arm that’s swung straight forward at the shoulder while the elbow is kept straight. In such a case, we can consider the angle of the forearm from two different perspectives.

[Absolute versus local movement.]
Absolute versus local movement.

As can be seen, the upper arm has been rotated by 90 degrees. With respect to the upper arm, however, the forearm has not rotated at all; it has inherited its angle from its parent bone. This is what’s considered to be local space – local to the parent transformations.

However, the Kinect gives us data in absolute space. From this point of view, both the upper arm and the forearm have been rotated 90 degrees. The Kinect data tells us the final accumulated rotation for each individual joint without us having to do any intermediate calculations to account for the rotations of each parent joint.

When we transform a stationary 3D body using a specific frame’s rotation data, we ultimately need the absolute rotation for each joint: we need to rotate each vertex in our model by the absolute rotation of the joint that vertex is attached to.
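To make that concrete, here’s a minimal Python sketch of skinning a single vertex with an absolute joint rotation. The function names, the (x, y, z, w) tuple layout, and the choice to rotate about the joint’s position are all my own scaffolding; the point is simply that no walk up the parent chain is needed.

def qmul(a, b):
    # Hamilton product of two quaternions stored as (x, y, z, w).
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw,
            aw * bw - ax * bx - ay * by - az * bz)

def rotate(v, q):
    # Rotate vector v by unit quaternion q: v' = q * v * q^-1.
    qx, qy, qz, qw = q
    conj = (-qx, -qy, -qz, qw)
    x, y, z, _ = qmul(qmul(q, (v[0], v[1], v[2], 0.0)), conj)
    return (x, y, z)

def skin_vertex(vertex, joint_position, joint_quaternion):
    # Rotate a vertex about its joint using the joint's absolute rotation.
    offset = tuple(v - j for v, j in zip(vertex, joint_position))
    rx, ry, rz = rotate(offset, joint_quaternion)
    return (rx + joint_position[0], ry + joint_position[1], rz + joint_position[2])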

2: Skeleton bones are oriented along the Y axis with an “up” vector pointing along the Z axis.

It’s nice and all that the Kinect orientations are given in absolute space. However, in order to use those rotations it’s necessary to know what orientation a given bone is being rotated from. For example, if a person’s arm is to be rotated “up” by ninety degrees, the final direction the arm points is different if the arm started off pointing forward as opposed to pointing straight down.

[Torso oriented on the Y axis.]
Torso oriented on the Y axis.

If you consider that a bone in the skeleton is defined as the line between two joints, the Kinect assumes that all bones are initially pointed along the vertical Y axis. In other words, the Kinect initially assumes that all bones point straight up and down.

[Torso oriented on the Z axis.]
Torso oriented on the Z axis.

Depending on your 3D modelling software, this may or may not be a problem. I’ve been using MilkShape 3D to make my test models because it’s an application I’m already familiar with and its format closely matches how vertex and polygon data are provided to a graphics card. The catch with MilkShape models is that when they’re saved with skeleton data, MilkShape stores the initial orientation of each joint as if the bones are lying along the Z axis, not the Y axis that the Kinect uses.

In order to use the Kinect data, then, it’s first necessary to “remap” the Kinect rotations so that they too treat bones as being aligned along the Z axis instead of the Y axis. To do this, we start by creating a quaternion that rotates -90 degrees about the X axis. We rotate about the X axis because that axis is perpendicular to both Y and Z. This rotation swings the Y axis of the Kinect data onto the Z axis of the MilkShape skeleton data.

So:

finalQuaternion = jointQuaternion X ninetyDegreeXQuaternion
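In code, that might look something like the following sketch. The axis_angle_quaternion helper and the variable names are my own, qmul is the Hamilton product from the earlier sketch, and the sign of the angle follows the convention described above.

import math

def axis_angle_quaternion(axis, degrees):
    # Build an (x, y, z, w) quaternion from an axis and an angle in degrees.
    half = math.radians(degrees) / 2.0
    s = math.sin(half)
    return (axis[0] * s, axis[1] * s, axis[2] * s, math.cos(half))

# A stand-in for a real per-joint quaternion read from the Kinect data.
joint_quaternion = (0.0, 0.0, 0.0, 1.0)  # identity

# -90 degrees about X swings the Kinect's Y axis onto MilkShape's Z axis.
ninety_degree_x_quaternion = axis_angle_quaternion((1.0, 0.0, 0.0), -90.0)

# qmul as defined in the earlier sketch; the order of the operands matters.
final_quaternion = qmul(joint_quaternion, ninety_degree_x_quaternion)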

This gets the Y axis aligned along the MilkShape Z axis; however, there is one more thing to deal with here, and that’s how the limb is rolled about the Z axis.

[Same angle from body but different arm roll.]
Same angle from body but different arm roll.

Consider what happens when you stick your arm straight out in front of you and point your thumb towards the ceiling. Without swinging your shoulder or bending your elbow, you can rotate your arm such that your thumb points straight down. Whether your thumb is pointing up or down, the bones in your arm are at ninety degrees from your body, but thumb up and thumb down are very different final positions. The difference is the roll of the arm.

Now, the Kinect considers that bones initially point along its Y axis. It also considers the “up” side of a bone to point along the Z axis, which increases with distance from the front of the sensor.

[The Kinect v2 joint hierarchy.]
The Kinect v2 joint hierarchy.

If you consider the SpineBase joint, which is the ultimate parent joint in a Kinect skeleton, the spine is considered to initially point up and down with the body’s back facing away from the Kinect. This direction away from the Kinect is considered the “up” vector for the bone.

In order to apply the Kinect data to a MilkShape skeleton, it’s important that each bone be oriented so that its “up” vector points in the same direction as the Kinect data expects. This was where I discovered that MilkShape was inconsistent with its initial orientations. In my test model the SpineBase was oriented properly with respect to the Kinect data, but SpineMid was not. This meant the torso of my test model was initially rolled 180 degrees about its bone, like so:

[Torso twist from reversed up vector.]
Torso twist from reversed up vector.

To correct for this, it became necessary to create another quaternion that rotates 180 degrees about the Z axis, thus:

finalQuaternion = finalQuaternion X oneEightyDegreeZQuaternion

Due to the way the MilkShape skeleton seems to work, some bones require 180-degree corrective rotations and some require 90 degrees. At the time of writing, I haven’t found a consistent way to programmatically determine which corrective rotations, if any, are needed.
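As a sketch of how that could be handled, the corrections might be driven by a hand-built table. The joint names follow the Kinect v2 JointType enumeration, but the entries below are purely illustrative – which joints need which roll depends entirely on how your MilkShape skeleton was authored.

# Hypothetical per-joint roll corrections, in degrees about the Z axis.
# Inspect your own model to fill this in; these values are illustrative.
Z_ROLL_CORRECTIONS = {
    "SpineMid": 180.0,     # the torso twist described above
    "ShoulderLeft": 90.0,  # purely illustrative
}

def correct_roll(final_quaternion, joint_name):
    # Apply the per-joint Z roll fix, if one is listed. axis_angle_quaternion
    # and qmul are the helpers from the earlier sketches.
    degrees = Z_ROLL_CORRECTIONS.get(joint_name)
    if degrees is None:
        return final_quaternion
    correction = axis_angle_quaternion((0.0, 0.0, 1.0), degrees)
    return qmul(final_quaternion, correction)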

The point is, if you’re using Kinect data and you find some bones to be improperly spun but otherwise in the correct place, it’s probably due to a mismatch in the up vectors.

3: Leaf joints inherit their parent’s orientation.

Although the Kinect tracks the position of leaf joints – the head, hand tips, thumbs, and feet – it does not track orientation information for them. This makes sense, as an orientation is only really meaningful for a joint that has a bone coming off of it to define a direction.

For leaf joints, the orientation quaternions returned have all components set to 0. In these cases, if the 3D model’s skeleton has vertices attached to these joints, then the orientation of the joint’s parent should be used instead. For example, to orient the left hand tip you’d use the left wrist orientation, since the hand tip’s quaternion will be all zeroes.
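Here’s a small sketch of that fallback. The lookup table and function are my own scaffolding; the hand tip and foot entries follow from the above, but the thumb entries are a best guess – verify them against the hierarchy diagram.

# Maps each leaf joint to the joint whose orientation stands in for it.
# Joint names follow the Kinect v2 JointType enumeration; the thumb
# entries should be checked against the hierarchy diagram above.
LEAF_PARENTS = {
    "Head": "Neck",
    "HandTipLeft": "WristLeft",
    "HandTipRight": "WristRight",
    "ThumbLeft": "WristLeft",
    "ThumbRight": "WristRight",
    "FootLeft": "AnkleLeft",
    "FootRight": "AnkleRight",
}

def usable_orientation(joint_name, orientations):
    # Substitute the stand-in joint's quaternion when the Kinect hands
    # back all zeroes, as it does for leaf joints.
    q = orientations[joint_name]
    if q == (0.0, 0.0, 0.0, 0.0):
        return orientations[LEAF_PARENTS[joint_name]]
    return q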

4: Rotations are provided as if looking in a mirror.

The Kinect sensor was intended to translate player motions into actions in Xbox games. If you view a person who is facing you through a normal camera, when that person raises their right arm, that arm appears on the left side of your view. In an Xbox game this can be disorienting: players raise their right arm but see the feedback on the left side of the screen.

To account for this, the Kinect data comes back as if you’re looking in a mirror. The data is still associated with the correct joints (e.g. the right shoulder data is indeed for the model’s right shoulder), but depending on how you’ve built your model you may have to account for this.

If you’ve created your model as a mirror image – which is what I unintentionally did, as it had been some time since I’d last used MilkShape and I got front and back backwards – then you don’t have to do anything further to the data. If you’ve created your model the front way round, however, then you’ll find the absolute rotations from the Kinect bend your character’s limbs backwards. In such a case, you’ll need to transform the final positions of the vertices to flip their Z values (positive becomes negative and vice versa).
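If you do need the flip, it’s a one-liner on the final, transformed vertex positions (unmirror is my own name for it):

def unmirror(vertex):
    # Undo the Kinect's mirror-image convention by negating the final Z
    # of an already-transformed vertex position.
    x, y, z = vertex
    return (x, y, -z)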

Conclusion (for now)

At this point you should have enough information to apply the basic body frame data retrieved from the Kinect to your own 3D model. That said, there are still a few more issues to deal with to get a really useful representation of the data. However, this is probably enough for today. Come back next week for part 3, where I’ll go beyond the basic orientation data and into how to use position and floor plane data from the Kinect to get a more accurate animation. See you then.