Description
Key Learnings
- Compare traditional and Autodesk Construction Cloud-enabled knowledge management.
- Learn about Sunway's use of the keyword search in Autodesk Construction Cloud Files.
- Explore the integration process of Autodesk Construction Cloud and ChatGPT via OpenAI API embedding.
- Learn how to implement a Q&A system using Autodesk Construction Cloud and OpenAPI ChatGPT integration.
Speakers
- ZLZiqing LiewZiqing graduated with MEng (Distinction) Engineering from the University of Cambridge in 2017, and has worked as a site engineer and subsequently transitioned into VDC and digital transformation role in the past 5 years. In Sunway Construction, Ziqing and his team delivered the BIM-FM (Facility Management) deliverables for Parcel F Putrajaya, one of the largest of its kind in Malaysia. He then transitioned into leading the Digital Transformation Team of Sunway Construction, helping the company to develop digital processes, workflows and tools that are suitable for site application. His work involves building multiple in-house web-applications for construction use: tracking attendance, movement & vaccination of workers during COVID times, digitalisation of delivery order, 3D visualisation and dashboarding tools using Autodesk Forge, etc. With his experience at site and programming, he currently works as the Head of Digitalisation, Development & Delivery (3D) in Sunway Integrated Property, where technologies and new ideas are explored extensively to propel the property and construction industry forward. With Sunway's recent subscription in ACC, his team is working to drive adoption, implementation and customisation of ACC to suit into Sunway's and Malaysia construction context.
- YWYifan WongSenior Engineer of Sunway Integrated Properties, focusing on integrating ACC and Forge viewer with many internally developed applications.
ZIQING LIEW: Good day, everyone. Today, I'm speaking. I'm actually from Sunway. Together with me is William.
We wanted to present to you Autodesk Construction Cloud Meets GPT, in Autodesk University, year 2023. Now when it comes to Autodesk Construction Cloud Meets GPT, we are really talking about a knowledge revolution that will bound to happen in Sunway, and hopefully, in the rest of the organizations within where you are working.
Now just a little bit introduction about Sunway Group Malaysia, before that, you would learn about Sunway today. You will be typically-- we will show you around the typical knowledge management on Autodesk Construction Cloud. We will energize Autodesk Construction Cloud file search with OpenAI GPT API, and we will also show you our initial results, so you will be able to understand where we are today and how we want to move forward with the technology that we are building up now.
So a bit about Sunway, Sunway is basically one of the largest conglomerate in Malaysia. We do township development, such as hotels, malls, hospitals, residences, offices, theme parks. And in fact, in the background, what you see is Sunway Velocity. It's actually one of the township that we have built within a span of 10 years, from a village into a bustling township consisting of shopping malls, hospitals, residences, and other fun elements as well.
Now because of the way we build townships, we build integrated solutions. We have evolved from just a property developer into a contractor. We also provide hospitality and healthcare services, education, leisure, pharmacy, and so on and so forth.
In fact, let me show you one of the most beautiful townships that we have built in Malaysia. It's within Sunway city, Kuala Lumpur. On the left hand side, it's where you have Sunway Pyramid, one of the largest shopping malls in Malaysia.
Sunway Lagoon, which is the fun place where we attract a lot of international tourists to come to have fun in one of the biggest theme park in Malaysia, with as many rides as you can enjoy. We also have Sunway Resort Hotel smack in the middle, where the Sunway Resort Hotel is an important place for us to host important events. And we also host Gordon Ramsay restaurants in Malaysia and so on and so forth.
Sunway Pinnacle, the Grade A office buildings that is also green certified. Sunway Medical Center is actually the largest medical center in Malaysia. And of course, we goes into Sunway Geolake Residences, where we built our bread and butter, which is the property development sector.
Now when it comes VDC side of things, because we wanted to build such an integrated township, it's important for us to adopt virtual design through construction. And in fact because digital excellence is a key element to our project management, we will need it to be done quickly and greatly as well. And that's why in Sunway Construction, we started implementing BIM since 2000. We are one of the company that pilots BIM in Malaysia. And in fact, we completed the largest BIM-FM project in Malaysia in 2019.
We internalize BIM to technical and operation. And in fact, this is something that has been quite recent, and it's not commonly found elsewhere in the world. That means that we actually get our technical and operation team to actually have all the BIM knowledges, so that we no longer would need to maintain a large BIM team to produce BIM elements. The technical team, the operation engineers will be the one that uses BIM to produce drawings and also models as well.
Now beyond Sunway Construction, it's where we have Sunway property who mandated BIM for all the property projects. We also have Sunway Construction who operationalize BIM, as I mentioned earlier. And finally, we have digitalization, development, and delivery team, which is myself and William, who would provide technical solution, support, R&D that is required to reap the best of the digital tools.
Now with 3D, I would like to introduce myself. I'm Ziqing. It's nice to meet all of you. I'm from civil, structural, environmental background from the University of Cambridge. I was a site operation engineer for a year before I moved on into the digital route.
And I started the digital transformation team in Sunway Sunway, the biggest pure-play construction company in Malaysia. And I moved on to deliver the largest team project as mentioned earlier in 2019. And it's William here. William come.
WILLIAM WONG: Hi. Hi, everyone. I'm William. OK. Let me try controlling the screen a bit, yeah.
Just a moment, yes. Hi. I'm a computer engineering from Monash University. And previously, I was joining the digital transformation team in Sunway Sunway. And then, now I move on to Sunway Property, together with Ziqing, under the 3D team.
And then, I was one of the pioneers in Malaysia for using Autodesk Forge Platform, built a lot of custom Forge viewers to load up all the 3D models. And then now what I do is, I'm a system integrator, and I build a lot of custom apps for Sunway Property and Sunway Construction, basically integrating a lot of microservices with Autodesk Construction Cloud. Pass back to you, Ziqing.
ZIQING LIEW: Thank you, William. Now let's just talk the main topic of the day, which is knowledge. We all have knowledge. In fact, as a company, the whole base of the company is built on top of knowledge gained throughout years of experience. And since ACC becomes part and parcel of our knowledge management, it becomes a vital tool for us to use and maximize the usage of ACC to be able to manage our knowledge effectively.
Now comes the word CDE. I'm sure everyone is aware of CDE, which is a common data environment as the single source of truth. Sunway has actually adopted Autodesk Construction Cloud since 2023, very recent for us, in fact, just in January. So we started this journey less than a year ago.
But we found that it is a great tool to support our project needs. Why? Because there are a lot of different cool tools around ACC that is centered around project management, such as versioning of the files. We are able to do comparison of PDFs, CAD files, Revit files. In fact, we are able to show differences between different versions.
We are able to review workflow with a status indicator, such as the one shown on the screen at the bottom, where you see that each row of the files, there is a review status that says whether it's approved or it's rejected. And it becomes very clear we can build such workflows into our file management system, and the people that organizes and uses this tool love it because they are able to understand the status of the file and use the correct file for their actual work.
And finally, linkage to other functions are also a key part of what we are doing, and hence ACC files are able to link to different modules such as RFIs, such as issues, for all this information to be able to sit in one place. Now that being said, while ACC, it's a great tool, it offers a lot of important features. It does have a certain limitations as well at the moment.
So our technical challenges, when it comes to Autodesk Construction Cloud, is that we have layers of folders that we created to ensure that the project filing index falls into the similar structure. As you can see on the left hand side, this is one of the snapshots from our projects, where we see that there are layers of layers of folders being nested into each of the folders just to reach the file.
Now it is a common practice, even before ACC where we have our own digital tools and digital servers, the files are nested to start with. So it's not a native problem in ACC itself. But ACC does provide a very cool tools to use for us to enable the exact search.
So for instance, I put in waterproofing as a keyword. And I will be able to do the content search. And list of files that is related to waterproofing or has the keyword waterproofing will actually be able to be pulled out from the system.
But soon, we realized that actually there are problems related to cool tools like search. Because, number one, we have to fit in the exact search. And let's say we spelled waterproofing slightly or we change it to a slightly different term or slightly different words, then it will be not searchable at all. And of course, on the right hand side, you realize that with such a keyword called waterproofing, we are bombarded with results.
Now we are just showing 100 results. Who knows in other documents, there are so many more results that has the keyword waterproofing. Now to sift through all these documents to find the actual information that is required could take a long while.
Well, let's not forget, that's only on the technical side. On the human side, we face a small challenge, and we have to manage more stakeholders. The human factor definitely doesn't help when it comes to managing such a mass in our file management system.
Now on the left hand side, you see that there are different preferences when it comes to the way each person store their own documents. And an example from two different projects, one project's named the folder Technical 03. One project's named it 05 Technical, but that's not about it. As you can slowly see that the way the files are structured are different from projects to projects.
You have a document controller that loves to put latest version in a single folder, such as 01 Drainage. And below it, it's all the drawings about drainage. But in other hand, you also have another document controller who prefers to load the folders by submitting. That means, let's say that the consultant submits this on the 28th of September, I will be putting it in a folder. This is the submission from this contractor or this particular consultant.
So as you can see, different people have slightly different preferences. And sometimes, it's also the project direction and the project director who just doesn't get used to the new way of storing things. So preferences differ from people to people.
Now next thing that makes it worse is that people come and go. As in Autodesk Construction Cloud, the environment allows us to control the members admission and resign members to be kicked out from the system very easily. But the challenge here is that when a new staff comes in into the system, it's difficult for them sometimes to understand or comprehend where the documents are stored. In fact, sometimes because of the naming convention are not strictly followed, it's very difficult for them to even trace back certain files, as well, and some of the files are lost in the sea of documents inside Autodesk Construction Cloud.
And so people come and go becomes a very, very common trend, not sure about the rest of the world, but definitely in Malaysia, definitely after the pandemic, where job mobility is a lot higher at the moment. So we really desperately need a solution, something, for us to be able to make sure that the new staff that comes into the platform are still able to search intelligently through the systems of the relevant files or, in fact, relevant knowledge as well.
Now finally, let's talk about technology. Although human is a factor, but because we are so relied on technology nowadays, we love to Google things. We love to go to ChatGPT and ask for the latest information. And the convenience of Google and ChatGPT offers a variety of expectation, where everything has to be on a finger toe. Everything has to be very fast.
But the challenge is with hundreds of thousands of results returned in one go, it's actually very difficult for any single member to be able to sift through all the information and produce a response very quickly. So that's why, human expectation is still that things have to be really fast. So that's where human factors really wanted to drive the digital team to think of a better solution of how we can proceed and make sure the systems are designed to cater for the human differences, but at the same time, be able to feed all the information that we need in one go.
So we were thinking how to do it. And one day with the "light bulb" moments, we came up from here. Now I pass on to William to explain how we actually do it and show some initial results.
WILLIAM WONG: Yep. Thank you, Ziqing. OK. Now I'll be talking more on the technical side on how we solve the problem that Ziqing mentioned just now. So basically the problem we faced was, if we are searching through ACC by the usual, was searching through PDFs, the term of searching is called a lexical search, meaning we search by keywords.
So the "light bulb" moment for us is, how do we energize this ACC file search with OpenAI GPT API? And then it comes to us the idea of semantic search, meaning we are actually searching based on the context of the words given, the meaning of the words given, instead of matching characters by characters.
So what do we actually need? First, for our system to behave like ChatGPT, we need the intuitive and real-time chat in all the intuitive and real-time response. And also, we needed the sources of our answers to be only from the ACC projects, not gathering all other sources from anywhere else in the world. And also, we needed references for our answers, so that users knows that it's a valid answers.
So to achieve this result, to achieve the idea of semantic searching in ACC, the idea that comes to mind is natural language processing. So a little bit of explanation on natural language processing. So basically, imagine you have a long text in a PDF, so in NLP, all these texts will be serialized into vector spaces.
Imagine you have the word "coffee", so then this word "coffee" is being serialized into a list of numbers that we call it vectors. So what happens is, if you are comparing two words, the closer the two vectors are the closer the meaning is. I'll give an example. If you are comparing coffee and espresso, the vectors for these two words will be closer than if you compare coffee and basketball, because coffee and espresso are related to each other.
So that's how NLP works. So we use this kind of concept to help us identify what type of content that we are looking for based on the query given. All right. So our application aims to find the documents that are most relevant to the search or the query, meaning finding documents with a closest vector space to the query.
So what we need to do is there are actually two parts of this application. OK. First, I will talk about the first part, which is the preparing all our files, preparing all our contents to be serialized into vectors. Many are preparing everything to become numbers-- preparing all the text to become numbers, so that it's searchable.
So the first part for us is, what we do is we grab all the files in a specific folder in ACC, which is meant for sharing. We convert all the contents of those files into text, and then we serialize it, and then we save it into a vector database. And then, imagine in this database instead of having words and characters, we're having a series of numbers representing those words. So this is the first part of the application.
So the second part of the application is, we are getting incoming query, incoming questions into our applications. What we are trying to do is we convert all the questions into the same numbers, which is vectors, and then we will compare, and we will match the query vectors to the contents, vectors which we stored earlier in our vector database. And then, we extract out the closest field and form a valid response from the matching vectors.
OK. I'll just go through the process flows on first part of the application, which is to prepare ACC files for Vector DB. So how we did it was, as I mentioned earlier, we have a specific folder just for sharing. So ACC users in our company will upload files into that particular folder. And then we as a developer, I'll be setting up webhooks via the Autodesk Platform Services API.
So this webhook will be triggered by some of the events. Notably, the new files being uploaded. Our new revision of the same file that's being uploaded are the deletion of the files. So any of these events will trigger Autodesk to send us notification regarding the changes in that folder.
So what our app do is, after we receive that notification, we will go into the file and we extract text. This is an example of new files being uploaded. New files be uploaded into ACC, one of our folders, and then we get a notification from the ACC's webhook. And then what we do is, based on that notification, we know what event is being triggered.
If it's the upload event, what we do is we extract the text. We go into that file, extract the text. We vectorized all the contents, all the text into embeddings with OpenAI embeddings, API.
So up until this point, we have converted all the text into numbers, and then we need a place to store it. So what we do next is, we're going to store it inside a vector database called Pinecone. So we are not just storing it blindly because we needed a way to re-identify back the files that are being converted to vectors.
So we label all the files, all the vectors by files ID, and ACC, and also some of the custom attributes that will help us re-identify back the files that we stored. So up until this point, imagine user upload files to ACC, files converted to numbers, numbers stored in a vector database. This is the first part of the application.
So now comes to the second part of the application. Now assuming we have users in our company that type in a question in our application, like for example, what are the safety regulations at a construction site? Simple questions.
So in the normal way of searching, we have to match each of the characters, each of the words, have to find out the words of safety, the words of regulation. But then for this, we don't have to because semantic search now takes care of it. What we do is, we find out what's the meaning behind this string of query.
So we have that line of question, which is, what are the safety regulations at a construction site? We do the same thing as the content. We convert this string of questions into the vectors also by using GPT embedding API.
So now we have two vectors. One is our content vectors, which we uploaded earlier in part 1. Number two is we have the query vector, so now we will query the vector based on the contents that we have in Pinecone DB, and then we will see which one of the content vectors match the query vector, which we have right now.
And then what our Pinecone DB that eventually will return to us is, we have our query vector with a list of content vectors that is most related to this query. However, for now, it's still not user friendly, because it will just become a list of all the contents jumping here and there. It's not arranged properly. It's not human readable.
So that's where from engineering comes in place. So we design the prompt in a way that we also call GPT Chat API to answer that question based on the query and the context given. And then what ChatGPT API returns to us is a human-like response that is user friendly, that is like a human answering you, a random questions. But then, all the sources will be from our ACC folder.
So we also design a problem in such a way that if it doesn't find any relevant information in our folder, it will tell you that, sorry, I don't know. So you won't be getting a lot of random answers from anywhere else. So that is how part 2 works.
So I'm just going to show the results of our application for now. As you can see here now, this is our own application. So the users are actually typing in the questions, just a human-like questions.
What are the things to note when doing interlocking pavement? I queried it, and these are the answers that have been given back from the Pinecone DB after we match the closest vector, and then we converted all those contents into human, readable form. It's like people explaining something to you. And together with that, if you see the reference, we have the ACC link together with the page's information.
So users can have confidence in using this system because they know the source of this answer. That's the whole technical process on how we design this app. Now I'll pass back to Ziqing.
ZIQING LIEW: Yep. Thank you, William. I'll just let it play again, so that you guys see the references and what is being mentioned by William earlier. By just a click of the link produced, you are able to redirect it back to Autodesk Construction Cloud and the page is 113 to 115.
So our journey for this particular project has been conceptualized in the year 2023, in May. And we started the development since July 2023 with the help of two staffs, and we actually are able to roll out for a small group testing in September 2023 in our staff portal. And now we are actually optimizing this query, as well as the prompt engineering side of things, and also trying to fix certain issues when it comes to the consistency of results.
What I mean by optimization is where we are looking at the references being able to be synthesized all the time. One particular thing that we face a lot of issues was when even with such a good prompt engineering, sometimes the results that is being returned from the ChatGPT API might not be including the references link, which gives us quite a big headache because we have guaranteed that all documents that is synthesized has to be referred so that it's easier for the team to be able to search back where this came from rather than something from ChatGPT itself.
So that's why the references is very important. But sometimes the ChatGPT API might miss it out, so that's why we are doing some optimization and fixing when it comes to the way we architecturing the whole process as well.
Now aside from fixing some of the small bugs, we actually got a lot more requests that says that, look, this is a great tool. The staff no longer needs to take in so much time to respond to general questions like, where do I find the files? Where can I get this information? Because you can get it a lot more accurately from this engine itself.
In fact, we also had staff that feedback to it and say that they didn't realize that those content were there until this. They did the search, and it was accurate. Now, because of that particular feedback that we took, we also received a lot of feedback that says that, look, why don't we extend this to not just [INAUDIBLE] to other files as well?
Number one, we wanted to focus on PDFs that are scanned because in construction sites and in project management, there are a lot of signed documents, and because those are scanned documents, it's very difficult for us to extract the information out, and hence, we wanted to work with certain [INAUDIBLE] technologies to be able to pull the information from the scanned documents into certain system, as well, to be synthesized via similar method.
Number two, we wanted to look into non-PDF files, such as Words, Excel, PowerPoint, that could potentially contains quite a number of big contents as well. And so we wanted to let the user to be able to search through this information.
We want to even look into non-text files, such as images. Because a lot of times, pictures tells a thousand words. And if you are able to point them to the image, or in fact, provide the image in the context itself, it will even be better because that will be able to tell those who are searching for information where are the information and what should be done. Let's say for the interlocking pavement, if certain images of how to lay the interlocking pavement could be able to be shown up, it will even be a nicer approach to things.
Now one of the things that we always had a hole here in our heart is the search for drawings. And in fact, those who know that the reason why construction industry has always moved forward with naming conventions, and we are stressing a lot about the naming convention, is that drawings can be searchable in the system easily. But when information becomes more robust and the technique of searching drawings become more robust, what we wanted to look into is a way where drawings can also be searched by generic terms, as well, aside from using the standard naming conventions.
And by that time, I think, we are ready to move on into more human intuitive technology that doesn't require so much remembering and understanding. So number two of course, as I said earlier, we wanted to improve the accuracy and consistency of the results as mentioned.
And we also wanted to extend this to other areas of ACC, such as searching for relevant issues, searching for RFIs, or in fact, searching for dashboards or creating figures that are reporting like, how many issues have been reported for the day, or how many issues related to certain categories, and what are the key root causes and such?
Those are the things that we could make use of this technology to enhance ACC experience further. But eventually, what we wanted to just express is that we really hope to convince Autodesk one day that we will be able to adopt this feature internally as a feature natively inside Autodesk Construction Cloud. And that will be our dream for everyone here that's attending this conference that we no longer need to do it out of the box, our app to achieve what we are achieving today.
OK. That wraps up with our presentation. I hope you guys had a good day. Thank you.
WILLIAM WONG: Thank you.
Downloads
Tags
Product | |
Industries | |
Topics |