Description
Key Learnings
- Learn how to use Azure AI and OpenAI models for natural language data searches.
- Learn about applying AI to analyze and categorize CAD data for better data insight.
- Learn how far AI can go and at what cost.
Speaker
- Marco Mirandola: Marco Mirandola is the CEO and founder of coolOrange, a company specializing in integrating Autodesk products with enterprise systems. With a strong background in software development and a passion for empowering users, Marco has led coolOrange to develop innovative tools that enhance data management and automate workflows across the manufacturing industry. Under his leadership, coolOrange's solutions have become essential for companies looking to bridge the gap between on-premises and cloud environments. Marco is a hands-on leader who continues to explore new technologies, ensuring that coolOrange remains at the forefront of industry advancements. His commitment to customer success and practical solutions has earned him recognition as a top-rated speaker at Autodesk University multiple times. Attendees of his sessions can expect valuable insights and actionable strategies for optimizing their use of Autodesk tools.
MARCO MIRANDOLA: Welcome to this class. Today, we're going to talk about redefining CAD data interaction by using AI-driven search and visualization.
So here's the situation. It's hard to interact with your domain-specific data. I mean, you have hundreds or thousands of CAD files with properties, maybe because you're using Vault or PLM or BIM 360. And the only way to interact with this data is through a search, and it's a term-based search. It's hypersensitive, so you can't have typos, and the result is basically just a dumb list. So the question is, can this situation be improved? Well, let's see whether, with an AI approach, we can interact with and transform the domain-specific data in a better way, and so on.
Now, before we move on, I'd like to mention a couple of things about this class. Do not expect that what you're going to see is a finished product that you can download and try out yourself. These are examples, prototypes, concepts that we're working on to see how AI can be used in our domain and solve some problems. However, over the course of this session you will get an overview of tools and technology, and I'll also try to highlight the benefits and point out the limitations that you may encounter with today's technology and so on.
OK. My name is Marco. I'm here at coolOrange. coolOrange is an international software company focused on helping Autodesk customers connect their design data and processes with the rest of the company. That could be connecting Vault with the ERP system or with PLM, or maybe doing some automation that goes down to the production floor, including cloud services, this kind of thing. So if you have issues in that direction, we probably have a couple of ideas for you.
Anyway, what I've prepared for you today are basically these five examples. One is about clustering. One is about editing data through AI. One is about performing actions through natural language with some software. One is specifically about AI search, and another one is about inspecting data and identifying errors. So it's all about AI.
And the big question is, what the heck is AI? In order to answer that question, we first have to understand what non-AI is, so the standard rule-based algorithm: we have an input, we have an algorithm that processes that input, and then we have a result. Now, the typical algorithm in this case is an If This Then That. You have rules. You have checks. It's the classic way of coding software, as you probably know, or have at least heard about.
The AI approach is slightly different. You still have an input and you still have a result, but the way to go from the input to the result is not through If This, Then That, but it's through a neural network-based algorithm. So the question is, now, What is a neural network?
OK, let's assume we want to solve the problem of identifying handwritten numbers. Piece of paper, you write down a number, and then you want a piece of software that recognizes what you have written down. In this case, we have written down a three. So what we can do is put a matrix or a raster on it, for instance 5 by 5 pixels. And if we pixelize this image, we see that each pixel becomes more or less dark. The whole thing becomes blurry, but now we have a matrix which tells us exactly which pixel has which intensity, and let's say the values go from zero to one.
So the question is, from this blurry image, how can software then understand that we actually mean the number three? Well, let's break it down a little bit, take our pixels, our blurry image, and put these pixels one on top of the other, and this becomes our input. Every pixel will have a value between zero and one, depending on how dark that pixel is. All right? This is our input.
The output is going to be 10 numbers, between zero and nine. So, in this specific case, we expect that whatever happens here in between, the light bulb or the LED for the output number three is going to light up quite strongly. Now, what about the others? Well, ideally, the others should not light up at all, but because this image is so blurry, it could also be a nine, a six, an eight, or a zero. It is OK if the other outputs also light up a little bit, as long as our number three has the strongest value, so that we know, OK, it's the number three, even though it could be another number.
So how do we solve this problem? Well, we could write an algorithm, right, If This Then That. Good luck with that. This algorithm would have to take every single input here and then apply some rules to drive this output here on the right-hand side. The problem is that the moment this number three shifts a little bit, to the left or to the right, the pixels change, the input values change. Or even if the number is written in a slightly different way, again, the input changes, and we still expect the number three.
And even if you had been able to write this algorithm for the number three, what about the other numbers? We would need other algorithms that again interpret this input and try to find the right output. So this is not going to work. It's too complicated, too time-intensive, and so on.
There is another approach. This is where AI comes into the game. So, instead of having an If This Then That algorithm, what we're going to have are neurons-- so all these boxes here are basically neurons-- and we're going to have several layers of these neurons. In this case, we have three layers. And because they're not visible to the user, they're called hidden layers. Visible is the input, visible is the output, but the hidden layers are the black box where the magic is happening.
So what is a neuron? Well, a neuron, in this case, is a tiny little box. If you're a developer, it could be an instance of an object. If you're not a developer, think of it as a little thing, a piece of code that is capable of computing numbers or input or whatever. It takes something, and it generates something out of it. And we have quite a few of those, right? How many layers we have, how many neurons we have, is up to us. It's our decision.
Now, the next thing we have to do is wire all the inputs, all the little pixels here on the left-hand side: each pixel has to be wired to every single little neuron in the first layer. Then every neuron of the first layer has to be linked to every neuron of the second layer. And again, every neuron of the second layer with the third layer, and the third layer with the output.
Now, in this case, I haven't drawn all the lines, first because, in PowerPoint, it would take quite a while but, second, because the image would just become black. It would be full of lines. But just to make sure you've got it: every little box here has to be connected with every box on the other side, layer by layer by layer.
All right, the next thing we do is that we're going to add a bias, a number, basically, to each neuron. So each neuron here, each box here will have a specific number different from the other one. Could be a 3, could be a 7, could be a 40, 42, a 72, whatever you want.
All right, next thing is that we're going also to apply a number to each connection. And in this case, this number is called weight, yeah? So the number on the neuron is called bias, and the number on the connection is called weight. Now, the weight, usually, it's a number between zero and one, and it's a multiplier.
So let's take an example. Let's say that we want to take this pixel here, and it has the number 1, and we want to propagate this number down through our network to our output. Again, this 1, here, is going to be combined with every neuron here. And by doing so, it's going to be multiplied by a given weight.
Let's say 0.2, so 1 times 0.2 becomes 0.2. And when it reaches, for instance, this neuron here, the bias gets added, I don't know, 7 or whatever it is, all right? Then it propagates again to the next layer, to each neuron, and it's multiplied again by the weight, and then again to the next layer, up to the end, to the final result.
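To make that propagation concrete, here is a minimal sketch of it in Python; the numbers, layer sizes, and variable names are made up for illustration and are not from the session.

import numpy as np

# three input pixels, each a value between 0 and 1
x = np.array([1.0, 0.4, 0.0])

# weights on the connections (input -> first hidden layer) and one bias per neuron
W1 = np.array([[0.2, 0.7],
               [0.5, 0.1],
               [0.9, 0.3]])
b1 = np.array([7.0, 42.0])

# each input is multiplied by a weight, the results are summed, and the bias is added
layer1 = x @ W1 + b1

# the same pattern repeats layer by layer until the output layer is reached
W2 = np.array([[0.6], [0.05]])
b2 = np.array([0.1])
output = layer1 @ W2 + b2
print(output)

A real network would also squash each neuron's sum with an activation function, which is one common way to keep those values in a usable range.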
Now, if we are lucky, the result will be that the output layer will have, here, numbers between zero and one. But chances are that, at the beginning, we even have numbers that are above one. And that is OK for the beginning, but it's not OK for having this neural network running, right?
So this is where machine learning comes into play. We want this neural network to have numbers as biases and weights such that, when we throw in this blurry three, the three here on the right lights up. And, again, it's OK if the other ones also light up a little bit, as long as the number three has the strongest value.
So that means we throw in the three and see what comes out. The first outcome will be crap, not useful. We have to compute the delta between what came out and what we expected, and then we have to propagate this delta back through our network, changing the weights and the biases. That's called backpropagation.
Now, this has to happen a lot of times, for every single possible variation. And we see, here, an example of handwritten numbers. You see they're written in different ways, in different styles, and we still expect this neural network to be capable of recognizing each number correctly.
Now, the good news is that, for each of these images, we already have the correct answer. And, therefore, we can tell the machine learning algorithm to do this work on its own: try out every number, see the failure, go back, propagate the error-- or, actually, the correction of the error-- try again, and see the outcome. And this basically goes on and on and on until the network achieves a decent output. What that means is up to us to decide-- OK, this is good enough-- and then we basically have a stable neural network.
All right, from that moment onwards, the network is not going to learn anything else. So regardless of what we throw in, it will basically just do its work and provide an output, meaning if we give it an apple as an image and translate that into pixels, chances are it's going to tell us, well, this is probably the number zero, or something like this. You know?
OK, so this is not something that I have invented. This MNIST data set-- if you Google it-- is actually an almost 30-year-old data set. It was used back then to train the first models and play with these things. So this neural network thing is 50 or 60 years old. It's not new technology; what's new is that CPUs and algorithms are better today at handling it.
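As a rough illustration of that training loop on the MNIST digits, a few lines of Python with Keras look roughly like this; the layer sizes and epoch count are arbitrary choices for the sketch, not values from the session.

import tensorflow as tf

# the handwritten digits: 28x28 grayscale images with the correct answer attached
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0  # pixel intensities between 0 and 1

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # pixels stacked into one input column
    tf.keras.layers.Dense(64, activation="relu"),     # hidden layer of neurons
    tf.keras.layers.Dense(64, activation="relu"),     # another hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),  # ten outputs, one per digit
])

# backpropagation: compare the output with the expected digit and push the delta
# back through the network, adjusting weights and biases on every pass
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5)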
All right, so AI can be categorized into different areas. The way that I did it here is by Audio, which could be speech-- so text-to-speech, speech-to-text-- or even music recognition, music generation, whatever. It could be Vision, so images and by now even videos: extracting information from an image, recognizing faces, objects, and so on, or even generating images and videos, which is the most recent advancement in this area.
And then we have Language. Well, the classic example is ChatGPT. It's a language model, which gives me the ability to interact with the computer in a natural way.
Now, all these AIs have advanced a lot over the last years. By now they understand the sentiment, the intent, and the meaning of what we're trying to accomplish with them.
OK. Let's start with the first example: image clustering, so bringing order to the chaos.
Here is the situation. We have Vault, in this case, which contains my data. And, again, I have the ability to turn on the thumbnails here, to make it a little more visual. I can even switch to a purely visual, thumbnail-only way of displaying the data.
But I'm still not able to really understand, from a holistic standpoint, what's in there. Are there patterns and trends and so forth? So, ideally, I would like to have a user interface more like this, where my thumbnails or my objects are clustered, grouped, and organized on a big canvas.
So let's have a look. For instance, in this case, these are basically the thumbnails from my little Vault. And we see that these thumbnails here are grouped in this left corner. But then there is another group of thumbnails here that are also quite similar to each other, but they're different from the first group, and this is why they are distant from each other. So the AI and the algorithm that we have used are capable of extracting information out of the images and then translating that information into how close or how far away they are from each other.
Let's have a look. Here we have a little video. Again, we see the Vault user interface. And regardless of how far I scroll, there's not much more I can get out of it.
So let's jump into our little application. And, for instance, here, we see that in real time, the thumbnails are getting extracted, and they are now placed in this bigger canvas where I can zoom in, see that basically these particular pieces are quite close together, but they are far away from other pieces. I can maybe navigate in another area, see here another group of elements that are quite similar, even though, if you look right, they're not identical.
And to some extent you would say, well, this is wrong. Yes, it is, and we will talk about that, because AI is not precise. It's an approximation, but we will talk about that later.
Anyway, so you get the idea. You can navigate. You can zoom in, and so on. And just because it was fun, we thought that maybe a 3D version of this graph could be fun, so we basically just gave a third dimension, and you can now rotate it. It looks cool, but I think it's probably pointless and useless. You cannot really understand what's going on here, but it was just fun doing it.
Anyway, one thing that we did a couple of years ago, together with a customer, was to take their database, which is about 500,000 files, and generate this map. This looks like a broccoli or, if you have enough imagination, maybe like the head of Albert Einstein. Anyway, if we roll the video, you will see that this is, again, effectively a heat map or a cluster of thumbnails. Right? But, in this case, we're talking about half a million or so, just to make the point that this does not only work with a few thousand files. It can also work with a large data set.
All right, how does it work? Well, the first thing is that we use the API for extracting the data from, in this case, Vault. But, again, it could be anything else. Then we had to present this data to our AI. In this case, we used an image feature extraction model from Google, from TensorFlow effectively-- it's called EfficientNet. And this AI required the images to be in a certain format, with certain dimensions, and so on and so forth.
What this AI does is generate a multi-dimensional vector. So it's not trying to understand what the part is. It doesn't understand that this is a bolt or this is a nut. It's actually extracting things like edges, textures, shapes, colors, gradients, orientation, patterns-- so it's trying to describe the image rather than understand the image.
What we get back is a vector with about 1,000 values, which is way too much because we just need two axes, x and y, to position our images. So we have to reduce the dimensions via an algorithm. This is not AI; this is just a regular algorithm. And then, finally, we get a 2D map, if you want, that we can throw into an application like this. This is a .NET Core web application that we developed to present the data.
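A minimal sketch of that pipeline in Python might look like the following; the exact TensorFlow Hub model URL, the sample file names, and the choice of PCA for the dimension reduction are assumptions for illustration, not necessarily what we used.

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from sklearn.decomposition import PCA

# pretrained EfficientNet feature extractor (describes the image, doesn't classify it)
extractor = hub.KerasLayer(
    "https://tfhub.dev/google/efficientnet/b0/feature-vector/1")

def embed(path):
    # the model expects a fixed image size and pixel values between 0 and 1
    img = tf.keras.utils.load_img(path, target_size=(224, 224))
    x = tf.keras.utils.img_to_array(img) / 255.0
    return extractor(np.expand_dims(x, 0)).numpy()[0]   # a vector of roughly a thousand values

# hypothetical thumbnails exported from Vault via its API
thumbnail_paths = ["bolt.png", "bracket.png", "housing.png"]
features = np.stack([embed(p) for p in thumbnail_paths])

# plain (non-AI) dimensionality reduction: collapse the feature vectors down to x/y
coords = PCA(n_components=2).fit_transform(features)    # one (x, y) point per thumbnail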
So a couple of comments about this. We used this pretrained EfficientNet model, and it actually did a decent job. It was not meant for scanning or extracting CAD data in this sense. You know? It actually works better with cats and bananas and whatever, but it did a decent job without any additional training. It's just out of the box.
The other thing is that, obviously, by filtering and removing unnecessary data, we get rid of a little bit of noise, and then the result is better. The other thing is that by adding properties such as description and weight and part number and whatever, we could actually influence the result of the model and get better results. Anyway, for this exercise, it was good enough.
Let's go on with the next one, which is BOM interaction, so trying to understand and transform the bill of material. Again, just for convenience, we use Vault here. The Vault, in this case, has an item BOM. So we're going to look into this bill of material, and I would like to ask questions like: give me the top 10 components that have the biggest quantity, just to see whether maybe someone made a mistake and is ordering 1,000 pieces where they should actually just order 100, or whatever.
Another question could be to transform the BOM-- retain the structure or maybe even flatten it out-- and create a BOM with a different meaning, where some positions are filtered out or whatever. So we did this by creating this little proof of concept, if you want. It basically generates different outputs.
And if we take a quick look, it has the original BOM here on the top left side. It has the modified BOM, the one generated by the AI, here on the right side. It has a little reply from the language model, and then basically our question.
Let's get into the video and see how it works. So we see, here, we have our bill of material, which has a decent size, not too big, but not too small either. Let's jump into our tool. Here, again, we see the bill of material. And for those who do not believe that this is real: again, it's the car seat assemblies, so it's the same thing.
Anyway, now we have a couple of prepared questions. I don't want you to watch me typing, so we just copy-paste here.
The first question is, are there any BOM rows where the position is not set? So we click. We wait a few seconds, and the language model now comes back, and it tells me, yes, there are some BOM rows where the position is not set, and here is the list. And it presented the list here as JSON. Well, JSON is a way to structure data. It's not really pretty.
Maybe let's try something else and say, OK, can we make a markdown list? Markdown is a way to format text, bold and italic and whatever. So let's see what happens. And, in this case, we see the list coming back with some information. You know? It already looks decent.
Let's try one more thing and say, OK, let's do it as a markdown list but also with columns. So I would like to have something more like a table, maybe something I can then copy-paste into my Excel spreadsheet, as an example. Right, let's see what happens. And there you go. Here we now have a table showing me the results in a better way.
Now, we have a couple of other questions. So, for instance, are there any parts that are not released? And, if so, how many? Let's try that.
In this case, released parts are fine, but the language model understands that the work-in-progress state is probably one of the non-released examples, and it tells me we have 38 non-released parts. OK, that's interesting.
And give me, as I mentioned before, a list of the top 10 components that have the biggest quantity. And, again, here is the list. Well, again, it's JSON. That's the way it comes back, and that's OK.
Let's fast forward a little bit; the next one is about formatting, again as a list. But the one after that is interesting because we're going to ask it to also sort by the part number. So the point is that it's not just about retrieving data and showing it in a certain format, but interacting with the data: hey, sort it by this, or add this property, or give me more information, and so on.
Another one could be to say, OK, let's create a flat list, so a flat BOM, by removing all the assemblies. Same thing: it goes in, reads the bill of material, does the work. And, again, we get a JSON result, but we see it's basically a flat list of just components.
And the last example here is to create a BOM that retains the structure but removes, for instance, positions that are not set. And here, again, we then have a result. There we go. We see that it still retains the structure, but now all positions that have no value have been removed.
All right, so how does it work? Again, similar as before, we use the API. We retrieve the data from Vault. We send the data to our AI.
In this case, we're using an Azure OpenAI language model. I actually used the mini version, so it's not a large language model. It's probably a small or a mid-sized language model, or whatever. We basically get the response, have to interpret it, and then generate an output. And the output is then visualized in our .NET Core application, and so on.
A couple of comments here. One is that the whole BOM must be transferred to the language model every time. So with the request from the user, we also send the data. This is something that could maybe be improved with a chat history and so on, but that's the way these models work. They don't have a memory in that sense. You always have to transfer the whole set of data yourself. That's OK.
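As a hedged sketch of what one of those requests could look like in Python, with the whole BOM serialized into the prompt on every call; the deployment name, endpoint, and the helper that pulls the BOM from Vault are placeholders, not the ones from the demo.

import json
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder endpoint
    api_key="<api-key>",
    api_version="2024-02-01",
)

bom_rows = get_bom_from_vault()   # hypothetical helper: item BOM pulled via the Vault API

response = client.chat.completions.create(
    model="gpt-4o-mini",  # a "mini" deployment, as in the session
    messages=[
        {"role": "system",
         "content": "You answer questions about this bill of material:\n"
                    + json.dumps(bom_rows)},          # the whole BOM travels with every request
        {"role": "user",
         "content": "Are there any BOM rows where the position is not set? "
                    "Return the result as a markdown table."},
    ],
)
print(response.choices[0].message.content)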
The other thing is that-- and that's a little issue, if you want-- the model provides a different result every time. If I ask the same question five times, I will probably get five different answers, and the answer is not structured. So, ideally, a structured and persistent result would be better from a coding standpoint, as a developer, but that's the way the language model works. It's like asking a person. The person will probably also give you different answers to the same question.
The other thing is that sometimes the model does not provide a full, complete response. So if I ask it, for instance, to give me back 30 positions, it could be that I only get back 17 positions, and it thinks that's good enough. So you have to deal with that and figure out how to overcome those problems.
Let's move ahead. In the example before, we took the whole bill of material, handed it over to the AI, and asked for results. In this case, I would like the AI to help me deal with and search Vault, but I cannot transfer the whole Vault database to my model every time. So the problem I'm trying to solve here is to have a more language-based search type of user interface-- not something driven by specific criteria, where typos are a problem, and where I just get a plain list. And for this I'm going to use Cognitive Services.
So here is how the whole system works. We have the data. We extract the data and push it into Azure AI Search. So we have to push it into an AI database, if you want, or a search engine from Azure. This alone already gives a great deal of search capabilities-- we will see that in a minute. But on top of it, I also wanted to add OpenAI to go even beyond that, and then basically put a user interface on top. OK, let's see.
So this is Vault, and within Vault we have our Search dialog. But, again, we don't want to use the Search dialog. So let's skip this, and let's go into the Azure portal under AI Services, look for AI Search. I just already created one. Within this search, there are indexes. And here there are already some, but I wanted to create a new one, AU 2024.
And I could do it here through the user interface, but there are some limitations with it, one of which is that I would like to have synonyms for the state. For the state, as you see here below, I'd like to use Released and Approved and Completed and Done interchangeably, so I don't want to be forced to stick to the word Released. And this is something I was not able to accomplish through the UI, so I basically had to use Postman to send the API call.
Now, here, you see the creation of this index. It's called AU 2024. It has a couple of properties, so, for instance, the name and the create date and the title, and we also have the state. And the state, here, has these synonym maps, which we just spoke before. And then we have other properties like the category and, here, for instance, a collection of components, so the children for my assembly as an example with the name and the state and the category.
So let's roll it and create the index. It takes just a few seconds. We go back into the Azure portal and refresh, and now we see our new index. We go in, and, in this case, we will see that the index is empty right now. Even if we search for something, there are no values.
But we see the properties. We see the complex type, which is this collection of children. And we also see that we have a semantic configuration. This was done through the call we made before. This helps me because I'd like to talk in natural language, and I have to tell the AI which properties, which fields, are actually a good fit for natural language. In this case, it's the title, the state, the category, and so on.
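Stripped down, the index definition we sent through Postman boils down to two REST calls; here is a sketch of them in Python, where the service name, key, and field list are simplified placeholders rather than the full definition from the demo.

import requests

service = "https://my-search-service.search.windows.net"   # placeholder service name
headers = {"api-key": "<admin-key>", "Content-Type": "application/json"}
api = "api-version=2023-11-01"

# synonym map so that Released, Approved, Completed and Done are treated the same
requests.put(f"{service}/synonymmaps/state-synonyms?{api}", headers=headers, json={
    "name": "state-synonyms",
    "format": "solr",
    "synonyms": "Released, Approved, Completed, Done",
})

# the index itself: simple fields, the state field with the synonym map attached,
# a collection of child components, and a semantic configuration for natural language
requests.put(f"{service}/indexes/au2024?{api}", headers=headers, json={
    "name": "au2024",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "title", "type": "Edm.String", "searchable": True},
        {"name": "state", "type": "Edm.String", "searchable": True,
         "synonymMaps": ["state-synonyms"]},
        {"name": "category", "type": "Edm.String", "searchable": True},
        {"name": "components", "type": "Collection(Edm.ComplexType)", "fields": [
            {"name": "name", "type": "Edm.String", "searchable": True},
            {"name": "state", "type": "Edm.String", "searchable": True},
        ]},
    ],
    "semantic": {"configurations": [{
        "name": "au2024-semantic",
        "prioritizedFields": {
            "titleField": {"fieldName": "title"},
            "prioritizedContentFields": [{"fieldName": "state"},
                                         {"fieldName": "category"}],
        },
    }]},
})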
All right, so now we have to feed this index, and for that we create a little PowerShell script. In this case, we use powerVault, one of our tools for simplifying the code, the communication with Vault. So we go here. For instance, we get the list of all the Inventor files. Then we cycle through the list of Inventor files and basically generate a little object that will contain the data you see here-- the title, the name, the create date. So these are the same fields that we have in our index.
And then we also basically look for non-IPT files and get the references. In this case, we create a component list. At the end, we're going, then, to translate this into a JSON. And then, finally, we basically make the call and send the data to Azure.
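The actual script uses powerVault and PowerShell, but the push itself is just a REST call against the index; a rough Python equivalent of that final step, with made-up sample values, would be:

import requests

service = "https://my-search-service.search.windows.net"   # placeholder service name
headers = {"api-key": "<admin-key>", "Content-Type": "application/json"}

# one JSON document per Vault file, with the children collected as a component list
docs = [{
    "@search.action": "mergeOrUpload",
    "id": "12345",
    "title": "Car Seat Assembly",
    "state": "Released",
    "category": "Engineering",
    "components": [
        {"name": "Bolt M8", "state": "Released"},
        {"name": "Seat Frame", "state": "Work in Progress"},
    ],
}]

requests.post(
    f"{service}/indexes/au2024/docs/index?api-version=2023-11-01",
    headers=headers,
    json={"value": docs},
)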
So let's do that. It takes a couple of seconds. Well, depending on the size, it can take even longer. But now we go back, and we see that we have 1,200 entries in our data set. And if we go back, we can now start querying our data set.
So, again, I have prepared a couple of questions. Let's take them over. And so, for instance, show me all records, all records that are released, all released documents. There you go. We see, here, the released documents. And here's another one.
The next thing is that let's try the thing with the synonyms. So instead of release, let's say approved. And here you go. So here you see, for instance, that first of all, it doesn't have to be correct in terms of upper and lower case, but the second thing is that now I can start playing with other words.
Now, let's try out, for instance, category Engineering. Again, it's actually written with an uppercase E, but here we're using a lowercase e, and it's still working. Now let's try engineering with a typo-- an e is missing, as you can see. And, well, in this case, unfortunately, it doesn't give any results. So the typo doesn't really work here, but we will fix that later with OpenAI.
Another thing that I would like to do now is, for instance, to search for assemblies that have released components, so look into the relationship. This is something I cannot do with the classic Search window. So, unfortunately, with this little text field, it's not possible to do it here.
So I now have to use the more advanced way of doing it, by applying a filter on a subcomponent whose state equals Released, and so on, but it's technically working. It's giving me a result now. And this is something that I could not achieve by using a regular search dialog, unless the search dialog is super complicated and capable of doing these kinds of things.
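For reference, both of those queries come down to one REST call against the index; here is a sketch in Python, using the simplified field names from the index sketch above and placeholders for everything else.

import requests

service = "https://my-search-service.search.windows.net"   # placeholder service name
headers = {"api-key": "<query-key>", "Content-Type": "application/json"}
url = f"{service}/indexes/au2024/docs/search?api-version=2023-11-01"

# plain natural-language query; the synonym map makes "approved" behave like "released"
r = requests.post(url, headers=headers, json={
    "search": "all approved documents",
    "queryType": "semantic",
    "semanticConfiguration": "au2024-semantic",
})

# the relationship question: assemblies that contain at least one released component,
# expressed as an OData filter over the components collection
r = requests.post(url, headers=headers, json={
    "search": "*",
    "filter": "components/any(c: c/state eq 'Released')",
})
print(r.json()["value"][:3])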
So now let's go to the next level, back into the Azure portal, and use OpenAI to improve the search results even more. We already have an OpenAI service created-- I'm not going through that process; it's actually straightforward-- but I'd like to show you here the OpenAI Studio, which is a user interface from Microsoft that helps me deal with my AI model.
So I'm going here to the chat. I'm going to add my data set. And, here, I can select from different sources. We want to have AI Search, and we look for our AU 2024 index that we just created before. And we want to have a semantic type of search, you know, not by keyword but by natural language. We also have our semantic configuration behind it, the one I told you about before, with the properties and so on. And now we can create our connection.
We are ready to ask questions to our ChatGPT type of application, if you want. So let's ask a standard question: released documents. Well, we get the answer. That's fine, but let's now try the thing with the typo, so engineering but wrongly spelled. And this is where language models start to become interesting. As you can see, it still recognized what we mean and was capable of fixing my problem and still giving me a decent answer. You know?
Even the question about giving me documents or assemblies that have released components-- even this one here works using natural language. So you see that the search was necessary to actually build a database of content for the AI to work with, and the search itself was already capable of delivering a certain degree of language features. But then, in connection with OpenAI, this is where the thing really starts to rock.
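The chat in OpenAI Studio corresponds to an "on your data" call where the search index is attached to the request as a data source; here is a hedged sketch in Python, with the resource names, deployment name, and API version as placeholders for whatever your own setup uses.

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder
    api_key="<openai-key>",
    api_version="2024-02-15-preview",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder deployment name
    messages=[{"role": "user",
               "content": "Show me assemblies that have released components"}],
    extra_body={
        "data_sources": [{                      # "on your data": ground answers in AI Search
            "type": "azure_search",
            "parameters": {
                "endpoint": "https://my-search-service.search.windows.net",
                "index_name": "au2024",
                "semantic_configuration": "au2024-semantic",
                "query_type": "semantic",
                "authentication": {"type": "api_key", "key": "<search-key>"},
            },
        }]
    },
)
print(response.choices[0].message.content)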
All right, a couple of comments. The big issue that we have with Vault, and probably the same with PLM and BIM 360, is that, by all means, the data is poor. You know? It doesn't contain a lot of text, and that is what language models are actually good at. If you give a book to a language model, the language model is capable of making sense of it. But if you give it just a list of terms like the size and the weight and the unit of measure and so forth, then the language model is not really capable of understanding what it should do with it.
The other thing is that dealing with relations like parent-child-- well, that's something that at least AI Search is not really meant for, but Microsoft provides solutions to overcome this problem. Here is an example; you just Google for this problem, and you will find documentation for it.
And the other thing is that the user questions may require a specific model. This is related to the first point: because of the poor data and because of the very specific, domain-specific questions that you're going to have, a trained model would probably do better at this task.
Anyway, let's move forward and see how we can have a conversation with a 3D model. So, for that, I'm using, here, the Large Model Viewer, the LMV from Autodesk, and I would like to interact with this by saying, hey, go there and show me this and hide that and maybe modify something. And we did this by building this little application with a chat, here, on the left-hand side, and I'd like to show it to you.
So let's go in and see. Again, we have a list of questions already prepared. And, in this case, we go and say, hide me all components, which happened. Great.
And then the next one is, show me just the part called motor, and there it is. We can say, OK, go to that part. Maybe it's a bigger assembly and I want to go there. And there it is. And then maybe we go and say, OK, well, select that component because I want to know more details. And now the component is selected, and you see, here, now, the browser to the left and the detail pane on the right showing detailed information.
And now we want to see some more components, like all the components called ball. And there you go. It's activated. And you see them, right? And then, at the end, we go back and say, OK, show me everything, and we are good.
So you see that we can basically interact with language now with the model. In this case, we just use it for visualizing stuff or not visualizing, but we can also use it for actually manipulating things, changing quantities or changing dimensions. Think about maybe a configurator type of product that can be leveraged by language.
How does it work? In this case, we didn't use the classic language model that replies back with language, with text. It's not ChatGPT or that sort of thing, because we have to translate the answer into API calls for the viewer, and understanding from free text which API call to use would require another AI.
Instead, we used the Conversational Language Understanding API from Azure, the CLU. I will give you more detail in a minute. We get a very programmatic response every time, structured in the same way. So the user question gets sent to the CLU, the CLU replies with a simplified, structured response, and we take that response-- we still had to develop a piece of code that maps the response to the given API call of the viewer.
Creating this CLU is straightforward. You go into the AI Services, create the service, but then you have to explain, or train, the service by defining the intents, such as GoTo, visibility, activate, deactivate, and so on, and then the entities: an action, for instance, which could be show or hide; the object name, which is the name of the part; the object type, which is, for instance, part or assembly; the quantity; and so on.
Now, once we go into this CLU and have defined the entities and the intents, we must create utterances. So we enter questions that we think users are probably going to ask, define which intent we think each one is, and then, with drag and drop, associate the right entity with the given word. Then we can train our model with this information, and the model will be capable of providing back a nice JSON response, which is brilliant for developers, where it tells us exactly what it thinks the top intent is, along with the other intents and a confidence score-- you see, here, 0.3, 0.6, and so on-- and also the entities it was able to recognize from the text, each with a confidence score and a value. With this, we were then able to map the response to API calls.
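On the application side, that round trip looks roughly like this in Python; the project name, deployment name, and the viewer wrappers are placeholders, standing in for the actual LMV API calls made from the web client.

from azure.core.credentials import AzureKeyCredential
from azure.ai.language.conversations import ConversationAnalysisClient

client = ConversationAnalysisClient(
    "https://my-language-resource.cognitiveservices.azure.com",  # placeholder endpoint
    AzureKeyCredential("<language-key>"))

result = client.analyze_conversation(task={
    "kind": "Conversation",
    "analysisInput": {"conversationItem": {
        "id": "1", "participantId": "user",
        "text": "show me just the part called motor"}},
    "parameters": {"projectName": "lmv-commands",      # placeholder CLU project
                   "deploymentName": "production"},
})

prediction = result["result"]["prediction"]
intent = prediction["topIntent"]                       # e.g. "Visibility"
entities = {e["category"]: e["text"] for e in prediction["entities"]}

# classic rule-based code: map the structured response onto viewer API calls
if intent == "Visibility" and entities.get("action") == "show":
    viewer_show(entities.get("objectName"))            # hypothetical wrapper around the LMV API
elif intent == "GoTo":
    viewer_fit_to_view(entities.get("objectName"))     # hypothetical wrapper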
A couple of comments. The results from the AI must be turned into actions, you know, text to code. This requires some degree of coding, and it requires some time. The nice thing is that, of course, it now returns a very structured data set, which we can then operate on through If This Then That operations in the classic way. The other thing is that, because it returns this structured answer, it doesn't return a nice conversational answer like ChatGPT. So if we wanted to have a sort of chat conversation on top, we would have to combine two AIs: one that generates this structured response and another one that then generates a reply.
Last example: drawing check. Pretty simple. Here, we have a drawing that has some dimensions, some measurements. Here, we have the same drawing without the measurements. Can we use AI to spot the problem and spot the difference?
So, a little example here, and we are in Vault again. We see, here, the drawing without the dimensions; we try to approve the document, and then we get this error message saying, hey, sorry. No, you can't approve this. You can't release this because the drawing is not fully detailed. Right?
OK. So let's solve the problem. Let's go back into Inventor. And, in Inventor, we're going, now, to apply our annotations, and then we're going to save it back into Vault. And once it's back in Vault, we are going to approve it again.
So let's try again, and now we see in the preview that we have the dimensions. And we release it, and now it's working. All right.
So how does this work? Well, we used, in this case, again, the pretrained EfficientNet model that we used before for the clustering. Actually, it was the wrong choice, but this was just convenient because we were already working on it, so we had the code running, and so on and so forth. I mean, it did work, but it's not really the right one to use in this case. However, it's a Python script. Here, you see a couple of extracts from the script.
What we had to do is that, once we had initialized the model, we had to add an additional custom layer to our AI model and define that this layer has only one output, because, at the end, we just want to know: does it have the dimensions or not? Is it good or not good, one or zero? All right.
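In Keras terms, that extra layer is essentially a single-neuron classification head on top of the pretrained EfficientNet backbone; here is a minimal sketch, where the layer choices and training settings are illustrative rather than the exact script.

import tensorflow as tf

# pretrained EfficientNet as a frozen feature extractor
base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", pooling="avg")
base.trainable = False

# one custom output: 1 = drawing has dimensions, 0 = drawing is missing them
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(...) is then run on the self-made drawings with and without dimensions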
And then we needed data, which we didn't have. So we created, of course, a bunch of self-made drawings with and without dimensions. We then trained the model and tested it, and it did work. But, of course, because of the quality and the limited amount of our data, we would never make this public and sell it to anyone. This is, by far, not ready to go into production. But, you know, we made the point, and again, even with this standard model.
So, again, preparing the model, then connecting the API to Vault so that it can talk with the model and provide back the answer-- yes, it's good or, no, it's not good-- and then basically interact with the user. And that's it.
A couple of comments. Again, a pretrained model like EfficientNet was an easy choice, and it was OK for the proof of concept, but by no means does it really work for production. And I'm not aware of any model out there right now that is made specifically for CAD drawings, so you probably have to create your own model and train it yourself, and for that you need data, data, data. The more data you have, the better it is, but it's a time-intensive process. So it's doable, but of course it comes with some cost.
Conclusions. Well, the whole thing, to me, is super cool. However, it comes with a degree of complexity. The first question, starting from the bottom, is: how rich and how structured is the data? Probably not that much, at least not in our domain. So that's the first thing, right?
The second thing is choosing the AI layer. Well, first of all, thankfully, there is a bunch of pretrained models out there, but finding the right one, testing it, and so on still takes time. And you still have to understand how the model works, because every model has the ability to be tuned if you want. Right? Worst case, you have to create your own and then train it. And that's, of course, more intensive and expensive.
Regardless of the AI layer, we will still have a point where we have to use classic rule-based coding and do something with it, even if it's just to bridge the gap between the presentation layer and the AI layer. So there is something we have to code anyhow. We cannot skip it, you know? There are some low-code and no-code options-- last year, I held a session about this; if you are interested, you can go and watch it-- but, again, you only reach a certain point, and beyond that, you still have to carry on with coding.
The last thing is the presentation layer, the user interface. And for everyone who is familiar with coding, a user interface is a pain in the neck because it has a lot of buttons and things you have to take care of. So there is a certain degree of complexity in building a complete application, as in this case.
I see two areas for AI in this domain. Maybe there are more, but this is what I'm taking away from preparing this class. One is the ability to interact with data, and the other one is to transform the data. Data interaction is about searching in a more language-driven style-- a search which is more error-tolerant, understands the context, the dependencies between objects, and so on. But the other very interesting thing is the ability to really interrogate the data, understand what's in it, ask the AI whether there are any suggestions or improvements, anything that seems not correct, recognize errors, and so on. This is something that you usually cannot do with the classic user interfaces that just move you from one step to the next.
The other thing is about data transformation. We saw it with the BOM example, right? But even with the Large Model Viewer, with the car seats, I can think of situations where you can either create data to some extent, like in the case of the BOM, or rearrange and reorder the data, or maybe even drive, to some degree, the creation-- an approximation, maybe-- of new models using the Large Model Viewer and maybe Inventor behind it. I think there could be some scenarios where you can actually go quite far.
So the question is when to use AI. Well, if you need a 100% exact answer every time, then you have to go with the deterministic approach and use a rule-based model, you know, If This Then That. That's the only thing that gives you 100% certainty, no matter what.
But if the problem is too complicated to solve with rules, or if an approximation is OK-- you know, 70%, 80%, like the forecast for tomorrow-- then AI is absolutely brilliant. Then it's absolutely the right choice. You just have to think about what you get and what you lose, but it's OK.
So what's next? Well, when you're back in the office, think about error detection: which situations do you have in your company where AI could help solve a problem? Data creation, as I mentioned before, and data transformation could be areas of interest. And, for instance, trends-- clustering and grouping, or seeing where the data is going-- to get a sense of what's going on.
So these are a couple of ideas that I'd like to give to you, brain food, as I mentioned at the beginning, to think about, if and where and how AI may help you improve your business. And by all means, if you want to continue the conversation with us, you're welcome to get in touch with us. I'd like to thank you for your attention and see you soon at the next class. Thank you.