AU Class

Getting Started with Deep Learning Using Bifrost for Maya


Description

Bifrost is a powerful graphical programming environment for creating procedural visual effects simulations, geometry, rigging, and animation. In this session, we'll showcase new, modular, deep-learning workflows in Maya software using Bifrost. We will demonstrate how to use Bifrost for procedurally generating training data for AI systems, as well as how to deploy trained AI models in Bifrost using new Bifrost nodes designed for deep learning. The session will include practical examples of industry problems that are suited to a media and entertainment audience, such as using AI for complex rigging kinematic solvers.

Key learnings

  • Learn about fundamental concepts of machine learning and how they apply to challenges in media and entertainment workflows.
  • Learn how to use Bifrost for Maya to procedurally generate training data to train a deep neural network to solve a concrete task.
  • Learn about deploying a trained neural network in Maya using Bifrost.

Speaker

      Transcript

      EVAN ATHERTON: Hi, everyone. My name is Evan Atherton. I'm a senior principal research scientist with Autodesk Research. And this presentation is going to be an introduction to deep learning using Bifrost and Autodesk Maya. If you're new to deep learning or to Bifrost, or both, that's totally OK.

      My main goal here is to demystify deep learning a bit and to give you a sense of the types of problems and content creation workflows that are well suited for it, and to give you an idea of how to approach building an end-to-end pipeline in Maya using a tangible hands-on example. Before I start, as I'm an Autodesk employee, here's our obligatory safe harbor statement, which basically says I might make some forward-looking statements during my talk. But those statements aren't intended as promises of a future product, service, or feature.

      All right, so a quick overview of what I'm going to cover today. I'll start with some motivation for why we even want or need deep learning in content creation workflows. I'll introduce just a few deep learning concepts that are going to be relevant for the rest of the talk. Then I'll do a deep dive to show a really practical example of an end-to-end deep learning pipeline using Maya and Bifrost. Then I'll finish with some closing thoughts.

      So to begin, I want to talk briefly about why deep learning is useful in media and entertainment workflows. You might have heard someone refer to neural networks as universal function approximators. Functions generally take some input or group of inputs, x. And then they map that input to an output or a group of outputs, y. And they do that with some equation that someone's figured out, represented by this arrow here.

      Well, neural networks do the exact same thing, except instead of a bunch of math or rules that someone had to figure out, they're able to take a bunch of examples of inputs and outputs and learn the mapping between them. So that if you give a trained neural network a new input, it's able to approximate an output based on the patterns it's learned from the data. It turns out that a lot of things in content creation workflows are functions.

      So even if you as a user only have to hit a button in Maya, at the end of the day, there's some math or algorithm in the background taking its input from the scene and returning its output back to the scene. This list is obviously not exhaustive, but it gives you an idea of some of the functions you might be using in your workflows and what parts of the workflows they might pop up in.

      But if we already have a bunch of functions that do the stuff we need, then why do we need deep learning? In my opinion, deep learning really excels in two places in content creation workflows. One is if you do, in fact, have a function that does something you want, but maybe it's really computationally expensive. So in content creation workflows, it's really important for the artist to be able to see the results of their work at interactive or near interactive rates when possible.

      So if you have a process that's computationally expensive and slows down their playback frame rate, it introduces a lot of friction into their workflow. But in some of those cases, you can actually learn an approximation of that function with a neural network that gets you pretty close to the real function, but can be orders of magnitude quicker to compute. A really good example of this is the ML deformer that was just released in Maya 2025, which takes an entire stack of deformers on a character mesh that can be really slow to compute and can really bog down the scene.

      And it replaces that computationally expensive deformation calculation with a neural network that's able to approximate those solvers. But the neural network is able to compute in a fraction of the time. So instead of the deformation solvers causing the scene to run at 17 frames per second, in this example, the ML deformer computes fast enough for the scene to run closer to 60 frames per second.

      And the second place I think deep learning excels in content creation workflows is when we don't actually have the function. So maybe it's time consuming to derive or maybe it takes special domain knowledge you might not have. But if we have a way to generate paired data of inputs and outputs that doesn't actually rely on knowing the real function, we can use deep learning to learn an entirely new function.

      So I won't elaborate too much further on this one just yet, since the hands-on example I'll be using for the rest of the presentation falls pretty squarely into that category. But before I jump into that example, I want to give just a brief intro to deep learning. Now, there's a lot of nomenclature in deep learning. A lot of it's used interchangeably, sometimes correctly, sometimes incorrectly. There's AI, machine learning, deep learning, neural networks, deep neural networks, transformers, et cetera.

      For the rest of this presentation, we're going to be talking about one single type of network. And that's this. You've probably seen a diagram very similar to this at some point. This is a multi-layer perceptron. It's a fully connected feed-forward network with an input layer, some hidden layers, and an output layer. These are really the bedrock of modern machine learning. And they're super capable of all sorts of tasks.

      When I was putting this presentation together, I debated how much of the math I should show to explain what this type of neural network does. And completely separately, I saw this quote from the creator of Keras, which is one of the most popular Python libraries for machine learning, so he definitely knows what he's talking about: "Math in Deep Learning papers is usually worthless and was placed there purely as a sign of seriousness."

      I tend to agree with this. So this inspired me to try and do this intro with as few equations as possible. So I'm going to try to do this with a single equation, which might look intimidating. But I'll break it down. So this is an equation that represents the math that's happening in each layer of a neural network. The goal is to compute this equation at each layer until we reach the final layer.

      Then this equation represents the actual output of the neural network, which is what we're really after. So to find the output of any given layer, we first do a weighted sum of the previous layer's output. We add what's called a bias term. And then we apply an activation function to all that.

      There are a lot of different activation functions. But they're all pretty simple math. And their purpose is just to add non-linearity into the network so they can learn more complex patterns. So at the end of the day, for each layer, we have one matrix multiplication, one element-wise addition, and one element-wise transformation. Perhaps the most important part here is that what the network actually learns during this process are the weight matrices and bias vectors for each layer.
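      For reference, the per-layer computation being described can be written compactly. The notation below is a conventional way of writing it, not copied verbatim from the slides:

```latex
% Output of layer l: a weighted sum of the previous layer's output,
% plus a bias term, passed through an activation function \sigma.
a^{(l)} = \sigma\!\left( W^{(l)} a^{(l-1)} + b^{(l)} \right)

% The network's final output is the last layer's result, typically
% computed without an activation on the output layer:
\hat{y} = W^{(L)} a^{(L-1)} + b^{(L)}
```

      Here the weight matrices W and bias vectors b for each layer are exactly the values the network learns during training.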

      And then once we have those, we can pass in some new input, use those learned values to compute the output of each layer using that same equation we just walked through until we get the final output we're looking for. That's all I'm going to cover today on the theory. So next, I want to do a bit of a deep dive with a practical example to make this more tangible.

      And the example we're going to look at is creating a neural inverse kinematic solver in Maya using Bifrost for a mechanical rig like this one here. Even though this example is specific to rigging, my main goal is to build an intuition for identifying problems suitable for deep learning, then walk through how to actually build a deep learning pipeline to solve that problem. Our pipeline is going to consist of three parts, procedural data generation in Maya using Bifrost, model training in PyTorch, and then model deployment and iteration back in Maya with Bifrost.

      And again, I don't want to be too focused on rigging in particular. But I chose to use inverse kinematics as our example because it's actually a relevant problem for artists and technical directors. And it happens to be relatively straightforward from a deep learning perspective. So it had this nice balance of being approachable while still being useful.

      If you're not super familiar with rigging, in a nutshell, it's the process of taking a character's mesh or group of meshes, adding an internal skeleton, which as you'd imagine, is a series of virtual joints and bones, and then adding constraints to that skeleton that describe the behavior of the rig given certain artist input. So these constraints can be things like kinematic solvers that define how the skeleton moves or things like deformation solvers, which define how the mesh deforms given the movement of the skeleton.

      So when we talk about kinematics, we're referring to either forward or inverse kinematics. Forward kinematics is when the artist sets the rotation of a joint explicitly. And all joints in the skeletal hierarchy below that are similarly transformed. And inverse kinematics is when you have a target you want the end of your kinematic chain to reach. And you mathematically compute all the necessary joint rotations you need to reach that target.
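      To make the forward-kinematics side concrete, here's a minimal sketch in Python for a planar two-joint chain. The fk_2d name, the bone lengths, and the planar setup are illustrative, not the rig from this talk:

```python
import numpy as np

def fk_2d(theta1, theta2, l1=1.0, l2=1.0):
    """Forward kinematics for a planar two-joint chain.

    Given two joint angles (in radians) and two bone lengths,
    return the (x, y) position of the end of the chain.
    """
    x = l1 * np.cos(theta1) + l2 * np.cos(theta1 + theta2)
    y = l1 * np.sin(theta1) + l2 * np.sin(theta1 + theta2)
    return np.array([x, y])
```

      Inverse kinematics is the reverse problem: given a target (x, y), solve for theta1 and theta2, and that direction is what usually requires a purpose-built solver.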

      Forward kinematics is super easy to set up. But it's the inverse kinematics that gives the artist the type of control they need to naturally pose and animate their characters. But while animation packages like Maya have out-of-the-box IK solvers that you can apply to skeletons, it usually takes a skilled artist to string the right set of solvers together in the right way to get the rig to behave properly.

      And depending on the type of rig, that can sometimes take days to do. For the rest of this example, we're going to be looking at this three-axis mechanical rig here. Our goal is to have an IK solver that will take a target position in 3D space and give us the three joint angles we need for the end of the rig to reach that target. And if you recall the two categories of problems that I mentioned deep learning really excels at, this is going to fall into that second category.

      We don't actually have the function that will do this for us. We could, again, string together some solvers we do have or write a new solver analytically specifically for this rig. But that can be time consuming and could require a lot of specialized domain knowledge maybe we don't have. So what we're going to do instead is exploit the fact that it's orders of magnitude easier to set this up as a forward kinematic system. Then we can use the forward kinematic system to generate as many paired training samples as we want by setting each joint angle to a random value, and then measuring where the end of the rig ends up.

      This then becomes our paired sample we can use for training our network. And because forward kinematics is computationally trivial, we can generate thousands or even millions of training samples procedurally, effectively instantly. And then once our model is trained, we can take a new target position from the scene set by an artist, run that through our network, and then set the predicted joint values back on our rig.
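      In NumPy terms, the sampling scheme looks something like the sketch below, reusing the fk_2d chain from the earlier sketch; the joint limits and the sample count are made-up values:

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # the seed controls the randomization
num_samples = 100_000

# Per-joint min/max rotation limits in radians (illustrative values).
theta_min = np.array([-np.pi, -np.pi / 2])
theta_max = np.array([ np.pi,  np.pi / 2])

# One row of random joint angles per training sample.
angles = rng.uniform(theta_min, theta_max, size=(num_samples, 2))

# Run every sample through forward kinematics (fk_2d from the earlier
# sketch) and record where the end of the chain lands.
positions = np.stack([fk_2d(a[0], a[1]) for a in angles])

# Paired training data: the positions become the network's inputs (the
# IK targets), and the angles become the outputs it learns to predict.
```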

      Now, we're going to get started by setting up our FK data generator in Maya with Bifrost. I plan on releasing a full step-by-step tutorial on this whole thing along with the final scene files. So I'm not going to cover every little detail. But I hope this will at least give you a good sense of the workflow. And then you can visit the tutorial if you'd like when it's out.

      We finally make our way over to Maya. What I have set up is the geometry of our rig. This is just one mesh per joint. And we also have five locators that we're going to use to specify the pivot point of each joint. I mentioned this was a three-axis rig earlier. But the reason we have five pivots is we're going to use an additional locator for the base of the rig and then one for the end of the last joint. And that's what we'll use to measure our target XYZ location for each sample.

      So I have three joints that are going to rotate. And they're each going to rotate around a single axis. The first joint is going to rotate around the x-axis. The second joint is going to rotate around the z-axis. And then the third joint is also going to rotate around the z-axis.

      So the first thing we need to do is build our data generator, and the first step there is to set up our FK solver in Bifrost using these joint pivot locators. Real quick, if you're not familiar with Bifrost, it's a graphical programming environment that's accessible from Maya. And it was designed for creating procedural effects for film, animation, and games, things like smoke, fire, water simulations, destruction. And more recently, it's also being used for things like procedural geometry and rigging.

      I'm not going to cover the basics of Bifrost. But as it uses a visual programming paradigm, I think even if you haven't used it before, you'll be able to follow along quite well. One of my main claims here is that setting up an FK solver is orders of magnitude easier than building an IK solver the old-fashioned way. So I wanted to walk through the Bifrost compound that does the FK solve. And this is going to be our main tool for both the data generation and the model deployment after training.

      What I have here is a compound I created that takes in our joint meshes from Maya, as well as each joint's pivot matrix, which were represented by the locators I just showed in the Maya scene. And then we have our three user inputs, theta_1, theta_2, theta_3, which are going to be the rotation values we want to set on each joint. And what's at the heart of the FK solve is just a single node that runs down the joint chain and applies the rotation we input for each joint to the rest of the chain.

      So if we hook up our geometry and pivots, our rig is now 100% in Bifrost. And we can manipulate it with the controls in our FK solver compound. This is more or less identical to if we grouped each joint under its parent in the Maya outliner, then used the Maya rotate tool to rotate each joint. But now that our rig is entirely in Bifrost, we can do all sorts of procedural stuff on it, including data generation.

      Our data generator is also quite simple. It takes the same pivot matrices as our FK solver. It doesn't use the meshes because those are purely for visualizing the rig in Maya. It has an input where we can set the number of samples we want. And it has a seed for randomization on our input angles. And if we take a look inside, we have one compound that generates three arrays of random angles between a min and max rotation value that we can set.

      And then another node takes the randomly generated angles for each sample, runs them through the same FK solver we just walked through, and then measures the XYZ location of the last joint. The other thing this compound does is actually format the training samples as arrays of paired data that we're going to export from Bifrost and load into PyTorch during training.

      In machine learning terms, those input and output arrays are called feature vectors. If we take a look at those, the column on the left is our input features. And the column on the right is our matching output features. So for any given row, we can see three joint angles and then the resulting XYZ location of the end of our rig.

      And this is where Bifrost really starts to shine. So I built a scope that lets us visualize each sample with a dot. That dot represents the end of our rig at that sample. But what's most important here is we're actually generating these samples effectively in real time. So if we change the number of samples we want, our features update. We can even change information about the rig, like the rotation limits for each joint. And our features update.

      And we can take this even further. If the underlying geometry of our rig changes, we can just move our joint pivots to match. And our features update. Hopefully, you can start to see how all this work we're doing isn't going to be just for this specific rig. We can throw a brand new rig with completely different topology in here. And everything we've done so far would be identical because it's all procedural.

      So now that we have some samples, the next thing we need to do is to get them out of Maya to train our neural network in PyTorch. I'm not going to spend too much time on how to get the data out of Maya because I had to use a workaround. But the Bifrost team has been working on a way to do this directly from the graph. So hopefully we'll have some good news to share there.

      In any case, what I ended up doing in the meantime was write a Python script to query the input and output data from the graph. Then I converted that to NumPy arrays and saved them out as .npy files that can be loaded during training. If you're not familiar with NumPy, it's a super popular Python library for working with arrays. It's used really heavily in machine learning.
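      A hypothetical sketch of that workaround is below. The graph node and port names are placeholders, since they depend on how your Bifrost graph exposes its outputs:

```python
import numpy as np
from maya import cmds

# Placeholder node/port names -- adjust to match your Bifrost graph.
node = "bifrostGraphShape1"
inputs = np.array(cmds.getAttr(node + ".input_features"))
outputs = np.array(cmds.getAttr(node + ".output_features"))

# Reshape the flat arrays into (num_samples, 3) feature vectors and
# save them as .npy files for the training script to load.
np.save("inputs.npy", inputs.reshape(-1, 3).astype(np.float32))
np.save("outputs.npy", outputs.reshape(-1, 3).astype(np.float32))
```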

      In any case, now that we have our data saved out into NumPy files, one for input, one for output, it's time to train. There's no shortage of PyTorch tutorials online, many of which would do a much better job than I can if you're just starting from scratch. So I'm just going to quickly walk through some of the blocks in the training code to highlight a few things. But I won't spend too much time here on the training.

      And again, I'm going to release the code as well at some point. So you don't have to worry about catching it all. In any case, this is the main block of the training code. And it's mostly boilerplate PyTorch stuff. The first part I'll call out is this block. These are the hyperparameters of the training process, which are the main levers we have to increase performance during training. There are entire libraries and schools of thought dedicated to optimizing these hyperparameters.

      But in general, you pick some defaults and don't end up changing them too much. The second block I'll highlight here is this one. This loads our data set from our saved NumPy files and instantiates our model, the loss function, and the optimization technique. The loss function is essentially just an equation we're going to use during training for the network to know how well it's doing.

      Generally, most people start with MSE loss, which stands for Mean Squared Error. All it does is take the difference between the predicted output of the network and the expected output, square it, and then average it across the samples. And then the optimizer uses that information to adjust the network weights to minimize that loss.
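      Written out, MSE loss over a batch of N samples is just:

```latex
% Mean squared error between predicted outputs \hat{y}_i and
% expected outputs y_i, averaged across the N samples.
\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \left\lVert \hat{y}_i - y_i \right\rVert^2
```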

      The last thing I want to point out here is the neural network definition itself. This is a four-layer neural network that uses what are called ReLU activation functions. That stands for Rectified Linear Unit. Again, pretty boilerplate PyTorch stuff. The number of layers is also something where you can just pick a place to start, see how it does, then try different variations and see if you can get the model to perform better. A lot of this stuff is just trial and iteration.
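      A rough sketch of what such a definition looks like in PyTorch is below. The IKNet name and the hidden width of 128 are assumptions for illustration, not the talk's exact values:

```python
import torch
from torch import nn

class IKNet(nn.Module):
    """Four-layer MLP: a 3D target position in, three joint angles out."""

    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),   # no activation on the output layer
        )

    def forward(self, x):
        return self.net(x)
```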

      But in any case, the top block here is more or less defining the structure of the network. And then the bottom block is basically the equation we walked through at the beginning. When we run the training script, the network learns the weights and biases for each layer. And then we store those arrays as NumPy files again that we can import back into Bifrost to run our model.
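      A minimal version of that flow, under the same assumptions as the sketches above (the IKNet class and the .npy file names), might look like this:

```python
import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters -- reasonable defaults, not the talk's exact values.
lr, batch_size, epochs = 1e-3, 256, 100

# Load the paired samples exported from Bifrost.
x = torch.from_numpy(np.load("inputs.npy"))    # target positions
y = torch.from_numpy(np.load("outputs.npy"))   # joint angles
loader = DataLoader(TensorDataset(x, y), batch_size=batch_size, shuffle=True)

model = IKNet()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

for epoch in range(epochs):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)   # how far off the predictions are
        loss.backward()                 # gradients w.r.t. weights and biases
        optimizer.step()                # adjust the weights to reduce the loss

# Save each layer's learned weights and biases as .npy files so they
# can be brought back into Bifrost.
linears = [m for m in model.net if isinstance(m, nn.Linear)]
for i, layer in enumerate(linears):
    np.save(f"weights_{i}.npy", layer.weight.detach().numpy())
    np.save(f"biases_{i}.npy", layer.bias.detach().numpy())
```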

      This is workaround number two, where I had to use another Python script to read the NumPy files and drop the data into the Bifrost graph. Like with the data export, the Bifrost team has been working hard on getting us a way to read NumPy files directly into the graph. So that will really smooth out this part of the workflow.
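      A hypothetical sketch of that second workaround is below. As with the export, the node and port names are placeholders, and how the graph exposes array inputs will depend on your setup:

```python
import numpy as np
from maya import cmds

node = "bifrostGraphShape1"   # placeholder graph node name
for i in range(4):            # one weight matrix and bias vector per layer
    w = np.load(f"weights_{i}.npy").flatten().tolist()
    b = np.load(f"biases_{i}.npy").tolist()
    cmds.setAttr(f"{node}.weights_{i}", w, type="doubleArray")
    cmds.setAttr(f"{node}.biases_{i}", b, type="doubleArray")
```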

      With the weights and biases in the graph, the last thing we have to do is reconstruct our network. And we can actually do this by recreating the neural network layer math out of native Bifrost nodes. One of the biggest benefits of doing it this way is that it doesn't rely on any third-party library, like PyTorch, to actually run the model in Maya. So if you want to send this rig out to a bunch of artists, they can just open it like any other Maya scene without having to install a bunch of extra dependencies, which is super nice.

      So here we have our four layers with our activation functions. And if we go into that linear layer compound, you'll hopefully recognize this as part of our equation from the beginning that just does the weighted sum. Then we add the bias vector, and then run it through the activation function. Then we wrap that whole network in a compound, take the IK target position from the scene, run it through our neural IK solver, which predicts the joint angles, then pass the joint angles through the same FK solver compound we made for the data generation to get the rig geometry visualized in Maya.
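      In NumPy terms, what the reconstructed network computes is roughly the sketch below; the function and argument names are illustrative:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: zero out the negative values."""
    return np.maximum(x, 0.0)

def predict_angles(target_xyz, weights, biases):
    """Run a trained MLP on a 3D target position.

    For each layer: one matrix multiply (the weighted sum), one bias
    add, and one activation -- skipped on the final output layer.
    """
    a = np.asarray(target_xyz, dtype=np.float32)
    for i, (W, b) in enumerate(zip(weights, biases)):
        a = W @ a + b
        if i < len(weights) - 1:
            a = relu(a)
    return a   # the three predicted joint angles
```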

      To wrap up here, I want to highlight that even though that might have seemed like a lot of work for a single rig, like I mentioned before, now that we have this procedural pipeline set up, all we have to do is feed it a new rig. And we can run it through this whole pipeline in minutes, even if we have a rig with a prismatic joint, which translates instead of rotating, or a rig with more than three joints that maybe has redundant solutions.

      And lastly, this whole thing was done with a set of general deep learning compounds that you can use for all sorts of deep learning projects, not just the IK solver I showed here. So in addition to getting a full tutorial out for this example, we're working on releasing these compounds so you can use them in your own pipelines. So keep an eye out for that. And that's it. Thanks for listening.