Description
Key Learnings
- Learn about the challenges that animators face, and how AI can alleviate some of those challenges in an artist-centered way.
- Learn how we can build powerful, production-ready workflows when we combine novel AI techniques with existing Autodesk capabilities.
- Discover future research directions and opportunities for AI in the character-animation domain.
Speaker
EVAN ATHERTON: Hi, my name is Evan Atherton, and I'm a senior principal research scientist with Autodesk Research. I'm excited to share some of my work and thoughts on how we can use AI in an artist-centric way to help improve the character animation pipeline. Before I start, I'm an Autodesk employee, so here's our obligatory safe harbor statement, which basically says that what I'm sharing is in the context of research, and I'm not making any promises that this research will become a product, service, or feature.
All right. With that, here's a quick overview of what I'm going to cover during this talk. I'm going to start with a look at why character animation is still so challenging. Then I'm going to do a bit of a deep dive on a research prototype to showcase one specific technique, which we call neural motion control, that addresses some of these challenges. I'm going to talk about how we can create a production-ready, artist-centric workflow by connecting this science to existing Autodesk animation capabilities.
Then I'll talk about the data involved, both how much it takes and how it's acquired. And then I'll finish with a few thoughts. All right.
So to motivate us here, I want to start with something that's really important to us at Autodesk, which is that AI is a how. AI should be how we do something, not why we do something. Especially in the last year or so, with all the hype around generative AI, LLMs, and image generators, there's been a bit of a push to just throw AI at stuff. But at the end of the day, we want to make sure that we're supporting artists by solving problems that they care about.
And the specific problem we're looking at solving here is that animation is hard. I suspect this is pretty obvious and not super controversial, but my thesis is that the reason animation is still so hard is that the fundamental technique animators use to create performances hasn't changed in over a century. They start with key poses and then continue to add more and more poses in between until they get the movement and performance they want.
Here's a time lapse of an animator using Maya to create a simple walk cycle using that exact same process. They're grabbing these rig controllers, they're posing their character, they're keying those poses, and they're going back and forth between their reference and the Graph Editor, adding more poses, playblasting, refining their keyframes. I won't play this whole time lapse, but this was a three-hour exercise for this artist, all to get what's essentially two steps that are copied and pasted into a walk cycle. So even though this process has been entirely digitized, it's still super tedious, and it takes a lot of work to get that natural, fluid motion.
In contrast, here's an art-directed animation that was done in around three minutes in Maya, on a production-ready rig, with only a handful of keyframes. And this was done using the workflow I'm going to share with you today, which uses neural motion control to help improve this character animation workflow. But before I talk about neural motion control for character animation, I want to give a bit of context around the history and science of neural motion control as a technique.
These are two of the most influential early papers, in my opinion, that lay the scientific foundation for neural motion control and that are still being referenced and built on today: "Phase-Functioned Neural Networks for Character Control" by Holden et al., followed by "Mode-Adaptive Neural Networks for Quadruped Motion Control" by Zhang, Starke, et al. In general, the problem these papers were looking to solve was learning realistic motion from data and then using that learning to drive the motion of characters in gaming and interactive applications from real-time user input.
I highly recommend reading these papers, as well as the subsequent works by the same authors, if you want to learn more about the science. For now, I'll just give a high-level overview of the technique that's relevant for this talk. At a super high level, there's a neural network, or in some cases a combination of neural networks, trained on animation data. I'll talk more specifically about data later, but in practice this data is typically motion capture data, though it can also be keyframe animation data.
The models themselves are relatively small and take on the order of hours to train, in contrast to large models which can take weeks or months to train. Each input sample for the training process consists of the pose of the character at the current frame, along with some trajectory information, which is essentially where the character has been and where we want it to go.
And the output sample during training, which is what we want the network to learn, is essentially the same information, but at the next frame. So given the current state of the character, what should the state of the character be at the next frame?
Once it's trained, we can give the network some new high-level information about how we want the character to move. The network takes that information along with the current pose of the character and predicts the next pose of the character. And it does this every frame over and over such that we get natural fluid motion.
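To make that loop concrete, here's a minimal Python sketch of the autoregressive idea described above. Everything here is an assumption for illustration: the `predict_next` function, the feature layout, and the trajectory sampling offsets are stand-ins, and the real controller runs inside Bifrost rather than Python.

```python
import numpy as np

def trajectory_window(trajectory, f, offsets=(-30, -20, -10, 0, 10, 20, 30)):
    """Sample past and future trajectory points around frame f (clamped at the ends)."""
    idx = [min(max(f + o, 0), len(trajectory) - 1) for o in offsets]
    return trajectory[idx]

def predict_next(model, pose, traj_samples):
    """One network step: current pose + local trajectory in, predicted next pose out."""
    x = np.concatenate([pose, traj_samples.ravel()])
    return model(x)

def generate_motion(model, initial_pose, trajectory, frames=240):
    """Roll the network forward frame by frame to produce a full clip."""
    pose = initial_pose
    clip = [pose]
    for f in range(frames):
        pose = predict_next(model, pose, trajectory_window(trajectory, f))
        clip.append(pose)
    return np.stack(clip)

# Example with a stand-in "model" that just returns a fixed-size pose:
# clip = generate_motion(lambda x: x[:63], np.zeros(63), np.zeros((240, 4)))
```

The key property is that the network is only ever asked for one frame at a time; natural, fluid motion comes from running that prediction every frame.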
Now, in deploying a neural motion controller in a practical setting, one of the most important tasks becomes defining this information here, which is more or less the trajectory we want our character to follow. And it turns out that the way we're able to handle this trajectory in an offline character animation workflow actually lets us do some new things that are much harder, if not impossible, in the real-time context in which the state-of-the-art research has largely been done.
What all this means for an artist is that they're able to work at a higher level, more as if they're directing the behavior of the character. So we can set just a handful of keyframes on a single controller to specify where we want this dog to run and when, essentially defining that trajectory, and the motion controller handles producing believable motion. So keep an eye out for the complex gait changes as the dog needs to run faster or slow down, and for how it turns.
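As a rough illustration of what a handful of keyframes on a single controller turns into under the hood, here's a hedged NumPy sketch that expands a few keyed target positions into a dense per-frame path and implied speed, the kind of signal a controller like this would consume. The keyframe values and linear interpolation are made up; in the real workflow Maya's animation curves define the path.

```python
import numpy as np

keys = {          # frame -> desired (x, z) position of the single trajectory target
    0:   (0.0,  0.0),
    60:  (5.0,  0.0),
    120: (5.0,  8.0),
    240: (0.0, 12.0),
}

key_frames = sorted(keys)
key_xz = np.array([keys[f] for f in key_frames])
frames = np.arange(key_frames[0], key_frames[-1] + 1)

# Expand the sparse keys into a per-frame path (linear here; in Maya the
# animation curves would provide the interpolation).
path = np.stack([np.interp(frames, key_frames, key_xz[:, i]) for i in range(2)], axis=1)

# Per-frame speed falls straight out of the keyframe timing, which is why
# sliding a single key later changes the gait the controller produces.
speed = np.linalg.norm(np.diff(path, axis=0), axis=1)
```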
This is an animation that would take a professional animator around two weeks, and that's if they already had animated walk and run cycles to work with. And even then, they'd still have to manually match and blend them together. But to me, it's about more than pure efficiency gains. The system is super resilient to change, and we all know how inevitably and how often change happens in the creative process.
Let's say we have what feels like a simple change. We just want the dog to reach its final mark a little faster. All we have to do is change the timing on that single keyframe at the end by sliding it back in the timeline and the performance will change to accommodate that. Whereas with the traditional approach you'd be looking at days or weeks of rework again.
Here's just a quick overlay to really highlight that change in performance. And if we want to do more than just walk and run, well, we can actually encode a lot of things into the neural network during training, like behaviors. So let's say the animator wants the dog to jump and sit. They can just tell the dog to jump and sit and the controller will transition between those behaviors naturally.
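Here's one hedged guess at what "encoding behaviors" could look like on the input side: a per-frame behavior label appended to the control signal, so a keyed switch from locomotion to a jump gives the network something to transition toward. The label set and layout are assumptions, not the actual training format.

```python
import numpy as np

BEHAVIORS = ["locomote", "jump", "sit"]   # hypothetical label set

def behavior_vector(name):
    vec = np.zeros(len(BEHAVIORS))
    vec[BEHAVIORS.index(name)] = 1.0
    return vec

def control_input(traj_samples, behavior):
    # The network sees the desired behavior alongside the trajectory each frame,
    # so keying "jump" at a given frame lets it blend into the jump naturally.
    return np.concatenate([np.ravel(traj_samples), behavior_vector(behavior)])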
Here's another quick example where, instead of keyframing our target directly, I just attached it to a motion path, and the controller takes care of the rest. And again, if we want to add a jump, we just add a few keyframes telling the dog when to jump, and the dog will jump.
So it's pretty easy to get a diverse set of natural-looking performances with just a few keyframes, which is great if you need to animate not just one character, but many characters. But it also makes it much easier to iterate and explore to really get that perfect performance you're looking for, even if you're just working on a single character. And of course, this doesn't just work on quadrupeds. It works just as well for bipeds, and we believe it will work on all sorts of characters and creatures, provided training data exists.
So that's a general overview of the system. Next I want to talk about how, even though the science isn't particularly new, by combining it with core capabilities we already have at Autodesk, we're able to create a novel, artist-centric, production-ready workflow for character animators. The system was built, obviously, using Maya, but the core computation, the neural network, and everything else related to the motion prediction is all happening in Autodesk Bifrost. That sends the motion data to Maya, where it's then passed to a rig using HumanIK. What are the implications of that?
I'll start with HumanIK and then touch briefly on Bifrost. HumanIK is an Autodesk capability that's been refined across a few of our tools, including MotionBuilder and Maya. It gives us a bunch of capabilities out of the box, but the two I'll focus on here are generalizability and artist control. Starting with generalizability: HIK was more or less created to give a standard representation of a character that's somewhat independent of how that character was built.
So one thing it's really good at is passing motion from one rig to another. So here we have two characters with completely different skeletons and proportions. In this case, the motion from the character on the right is being sent over to the character on the left using HIK. And the motion still looks pretty natural on our little Viking. HIK is accounting for differences in rig proportions, but it's essentially just copying and pasting the exact same motion. Same number of steps, same speed as our character on the right.
So you probably noticed that because the Viking's legs are so much shorter, he actually follows a different path. But what if we want the Viking to follow the same path as our character on the right? Well, then we can use our neural motion controller to not just transfer the motion, but to actually transfer the behavior. So now they're running on an identical path in the same amount of time, but our Viking really has to hoof it to get around the corner because his legs are so short.
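A toy back-of-the-envelope version of that difference, with made-up numbers: copying the motion keeps the source's step count and timing, so shorter legs cover less ground, whereas transferring the behavior keeps the path and timing fixed and lets the cadence change instead.

```python
path_length = 12.0          # meters the source character covers (made up)
duration = 8.0              # seconds (made up)
source_stride, viking_stride = 1.5, 0.9   # meters per step, roughly proportional to leg length

# Copied motion: same number of steps as the source, so the Viking falls short of the path.
steps_copied = path_length / source_stride
distance_copied = steps_copied * viking_stride

# Behavior transfer: same path and duration, so the Viking has to step faster.
steps_behavior = path_length / viking_stride
cadence_behavior = steps_behavior / duration

print(f"copied motion covers {distance_copied:.1f} m of the {path_length} m path")
print(f"behavior transfer needs {steps_behavior:.0f} steps at {cadence_behavior:.2f} steps/s")
```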
What all this means is that any biped rig that's been characterized with HIK, like this handsome fellow, can use the biped neural motion controller, even if the skeleton of the rig doesn't match the skeleton of the training data set. And the character's behavior will change to accommodate the differences in rig proportions.
One thing that's been super important to me personally in wading through all of this generative AI hype is to give artists final control over their art. We don't have to force them into just accepting what a neural network gives them. With capabilities like HIK, we can pass the control back to them so that they have final say over every frame.
Let's take my teaser video from the beginning. This is the raw motion that's being generated by the neural motion controller to follow the path of the circle that's been keyframed. But let's say I want to tweak a few things. A standard feature of HIK is the ability to bake motion that's driving a character from an external source down to the Control Rig. So this is more or less the standard set of rig controllers that animators are used to working with, just like the one in that opening time lapse I showed you.
So with the motion baked to these controllers, we can then use our standard suite of Maya animation capabilities to tweak anything on any frame. Here are just a few animation layers I added to tweak some foot placement. And then toward the end, I wanted to change the performance a bit so the character is facing more toward camera, and I added a bit of subtle facial animation, all to get this final performance.
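For a sense of what that bake-then-tweak step looks like in script form, here's a hedged Maya Python sketch: bake the externally driven motion down to the rig controllers over the playback range, then add an animation layer for the hand-authored adjustments. The talk uses HIK's built-in bake to the Control Rig; this generic `bakeResults` version is only an approximation of the same idea, and the `*_ctrl` naming convention is a placeholder for whatever the rig actually uses.

```python
from maya import cmds

controls = cmds.ls("*_ctrl", type="transform")  # hypothetical controller naming convention
start = cmds.playbackOptions(query=True, minTime=True)
end = cmds.playbackOptions(query=True, maxTime=True)

# Bake the generated motion onto the rig controllers so every frame has an editable key.
cmds.bakeResults(controls, simulation=True, t=(start, end), sampleBy=1)

# Layer hand-authored tweaks (foot placement, facing, facial animation) on top
# without destroying the baked performance underneath.
cmds.select(controls)
cmds.animLayer("tweaks", addSelectedObjects=True)
```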
All right, on to Bifrost. Where HIK gives us generalizability and artist control, Bifrost gives us modularity and portability. For those unfamiliar with Bifrost, it's a graphical programming environment that's accessible from Maya, and it was designed for creating procedural effects for film, animation, and games: things like smoke, fire, water simulations, destruction. More recently, it's been used for things like procedural geometry and rigging, which to me made it the ideal place to implement the motion generation system.
So as I mentioned earlier, 100% of the motion prediction and all the related logic happens in Bifrost using standard Bifrost nodes. From the standard nodes, I built up a set of deep learning compounds, and from those I constructed the actual neural network architecture. So one implication here is that because it doesn't rely on any specific deep learning framework like TensorFlow or PyTorch, anywhere Bifrost can go, this capability can go with it.
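To make the framework-independence point concrete: at inference time a network like this is just matrix multiplies and element-wise activations, which is exactly the kind of math standard nodes can express. Here's a minimal NumPy equivalent; the layer sizes and activation are illustrative, and the weights are random stand-ins for trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three fully connected layers: (weights, biases). Sizes are illustrative only.
layers = [(rng.standard_normal((512, 256)) * 0.01, np.zeros(256)),
          (rng.standard_normal((256, 256)) * 0.01, np.zeros(256)),
          (rng.standard_normal((256, 512)) * 0.01, np.zeros(512))]

def forward(x):
    """Plain feed-forward inference: no deep learning framework required."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)   # ReLU-style activation on hidden layers
    return x

next_frame_features = forward(rng.standard_normal(512))  # pose + trajectory in, next-frame features out
```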
And again, Bifrost was built for modularity and proceduralism, which we can leverage here. So if we want to drive the motion not from Maya but from another Bifrost component, we can easily do that. In this example, there's a basic particle simulation done in Bifrost, and then a bunch of instances of our neural-motion-controlled dogs were assigned to specific particles, and each dog essentially chases its assigned particle, giving us a pretty good crowd simulation that we can direct and scale, with each character having unique, natural motion.
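A rough sketch of that particle-driven crowd idea, under the assumption that each character instance simply derives its per-frame target from the position of its assigned particle. The random-walk "simulation" here is a stand-in; in the demo both the particles and the motion controllers live in Bifrost.

```python
import numpy as np

rng = np.random.default_rng(1)
num_dogs, frames = 20, 240

# Stand-in particle simulation: 2D positions drifting with random velocities.
positions = np.cumsum(rng.normal(scale=0.1, size=(frames, num_dogs, 2)), axis=0)

# Each dog is bound to one particle, so every agent gets its own trajectory,
# and the motion controller turns that into unique, natural locomotion.
trajectories = [positions[:, i, :] for i in range(num_dogs)]
```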
All right. So that's more or less the system. And as with any AI or machine learning project, one of the most important aspects to talk about is data, both how much data is required and where it comes from. I mentioned at the beginning, these are relatively small models. Each one is specialized for a particular character. So one of the biggest benefits of that is that they can work with a relatively small amount of data.
The quadruped in the demo I showed earlier, for instance, was trained on just a half hour of data, and the biped was trained on even less. Obviously, the more things you want your character to do, the more data you'll need, but we can generally get a pretty capable model with an hour of data or less. Perhaps more important than how much data you need, though, is where it comes from. That's been a really hot topic in the past year or so, specifically around large generative models. And again, more good news here: the data required is relatively easy and ethical to acquire.
When we started to think about bringing a system like this to our customers, it became obvious that we needed to give them some out of the box controllers. It was important to us that the data we use for that was acquired in a way we were comfortable with. So we scheduled a two day motion capture shoot at Animatrix Film Design in Vancouver, where we had these two wonderful motion actors run around for hours.
Not only did we have these two fantastic humans, who were great, we also had two of the sweetest, most well-trained dogs on the planet. I seriously don't think I could have gotten my own children to do half the things they did on command. They were just incredible, incredible dogs. It was a good day.
Anyways, in total, we captured about an hour of quadruped data and close to four hours of biped data. Now our research group and product group have been working closely together to figure out the best way we can bring a system like this to our customers, and from this data set we captured, we hope to have three motion controllers that people will be able to use out of the box for some general animations.
But our media and entertainment customers rarely settle for out of the box, and they often have very specific, specialized and stylized data when it comes to animation. So longer term, one of our aspirations is to enable them to bring their own data, be it mocap or keyframe animation, by generalizing the training and deployment process.
And to wrap things up here, I'll end back where we started: animation is hard. This workflow is just one approach to improving one particular part of character animation. We still have a lot of work to do, but hopefully I've at least convinced you that we can use AI in an artist-centric way to make animators' jobs a bit easier. And that's it. Thanks for listening.