Description
Key Learnings
- Learn about executing data pipeline construction to databases using Autodesk Platform Services APIs and Autodesk Construction Cloud Connect, with comprehensive recipe guidance.
- Learn how to use SQL queries for data engineering on normalized tables, integrating automated scheduled recipes.
- Experience a live demonstration highlighting data visualization of Autodesk Construction Cloud modules through Power BI dashboards.
Speakers
- ARAdam RobertsI work as a Data Analyst for Rosendin Electric. My background is nested in information technology and I am involved directly in solution development for our company.
- Liang GongHe is a structural engineer by training (PE) with a background in preconstruction/estimating, construction management, BIM/VDC and data science. He helps customers leverage the data they produce through the design and build process to generate actionable insights including forecasting and scalability. He also automates customized workflows with ACC Connect and Autodesk Platform Services. After graduating from Duke University, Liang is currently working on his second master's degree in Applied Data Science at University of Chicago, focusing on AI/ML as a part-time student.
LIANG GONG: Hi, everyone. Please read the safe harbor statement. Welcome to our presentation, Rosendin's Big Data Foundations Yield Big Construction Solutions. This is Liang with Autodesk. And this is my partner, Adam Roberts with Rosendin Electric. Speaker introduction, I have a background in civil engineering. So I'm a PE in Civil. And then I'd be working in BIM, in VDC, in BIM estimating in the design build world. And then pivoted to tech. And now I work on analytics and automation consulting at Autodesk.
Meanwhile, I'm a part-time student in MS in applied data science, focused on AI and machine learning at UChicago. Next, I'll have my teammate, Adam, introduce himself.
ADAM ROBERTS: Thanks, Liang. Welcome, everybody. Thanks for tuning in. My name's Adam Roberts. I work for Rosendin Electric. I don't have any cool letters after my title like Liang does. I'm kind of normal in that regard, I guess. He's an overachiever. I am definitely not. My background is strictly IT. I've been with Rosendin for nearly 11 years. Before that, I had some various IT jobs before I came to Rosendin.
I moved over to our quality department in 2020 to help with some initiatives regarding dashboarding and data analysis. So I just kind of wanted to point that out there. I don't have a data science master's, or anything like that. I really think that this stuff's approachable. If you have the will and you want to implement these things in your environment, you can do so without being a turbo genius. Obviously, if you have a turbo genius like Liang to help you, that helps a lot. But I just wanted to make that point that if you want to implement these things in your environment, you can do so. It's really not that daunting.
We're just going to cover our presentation agenda very briefly. First, I'll take you through our business background and motivations. Talk to you a little bit about our philosophy, why we're doing this. Liang will take you through a bit more of the technical stuff, starting with executing the data pipeline construction to our database using the Autodesk Platform Services APIs, the ACC Data Connector, and Autodesk Construction Cloud Connect, which includes Workato. We use ACC Connect and Workato interchangeably, that's why they're the same thing.
Next, he'll show you the SQL queries they've built using the data engineering for normalizing the tables. So the stuff comes from the data connector. It's a bit complicated. We really wanted to normalize that stuff so it was easier to manipulate, and analyze, and then tie into our other databases. He's going to show you how he's done that using the recipes in Workato. The beautiful thing here is it's all automated. So we don't have to do any sort of manual interference. It's all done scheduled. And that's all we have to do to worry about syncing that data.
Next, I'll show you some of our high-level KPIs, and some of the stuff that we've done in Power BI for dashboarding some of these things. And then lastly, Liang will take you through some predictive analytics stuff that we may be implementing in our environment. And maybe some parallels, some other things that-- maybe the stuff I'm talking about won't resonate, but the things that he will talk about using predictive analytics with some other types of data could resonate for your environment.
So I'm going to start off with a quote, which is super cliche. So sorry about that. So the quote is, "the most dangerous kind of waste is the waste we do not recognize." Quoted by Shigeo Shingo. He is an engineer consultant that was sort of made famous in the West by his documentation and study of the Toyota manufacturing process. I know Toyota has sort of become buzzwordy. But a lot of really good stuff here. If you don't know about him, I highly recommend researching the guy. He's very brilliant.
So let me talk about our philosophy a little bit. So I think it's important to talk about the philosophy. It's one thing to have a solution, but the philosophy sort of propels the solution. So our philosophy was sort of three-tiered. We want to learn from our mistakes. Learning from your mistakes is very important. But what we wanted to do was scale up a solution where we could learn from other people's mistakes, which sort of enhances that process and speeds things up.
So instead of people making the same mistake all over the country, if one person makes a mistake, we could share that information at scale. So for that to work, there has to be an environment that fosters learning and a willingness to admit mistakes. And that definitely needs to be reinforced by leadership. Luckily, we have some people in our company that feel very strongly about this stuff. Next, changing the culture is often difficult. And I just want to recognize that for a second.
It's one thing to say, oh, yeah, just do this. Changing the culture in your company can be very difficult and take a long time. So just keep that in mind. Stuff doesn't change overnight. It never does. Don't get frustrated. Next, understand your strengths and weaknesses. I think this is very important. Knowing your strengths is good. You can use them as a foundation to build off of. But understanding your weaknesses is equally as important. And then you need to, if you can, address the underlying contributing factors for your weaknesses, whatever they may be.
Lastly, adjust accordingly and continuously improve. So we have done this a little. This first bullet is important. It kind of ties directly in with this project. So introduce ways to collect information that can be analyzed, reported on, and distributed. So if you want to collect information about the mistakes that are happening in your organization, this presentation is very quality focused. But this can be-- the same project can be implemented for safety, or any number of different instances.
But it's important to collect that data so you can get it back into your hands, analyze it. And then get it back into the hands of the people that actually need that information so they can do their jobs more efficiently. And then of course, the cycle continues. So it's continuous improvement. As you find these things and implement them in your organization, you just will basically be doing the same three steps forever, hopefully. Maybe not. Maybe you perfect them pretty quick. But most people find problems and then find more problems. You turn over a rock, and there's another rock underneath. And it's rocks all the way down.
So our business background and motivations for this. I talked about our philosophy. I can't overstate the philosophy enough. Money is a bottom line for a lot of businesses. But there are other things at play here. So for starters, we wanted to become more efficient as a company. So we want to find problems before they happen, which means doing things right the first time. We're an electrical sub. Rework plagues our industry, and it's a huge time sink. It's a place where a lot of money gets spent.
And also in our specific trade, electrical rework is the most dangerous type of work. So we want to make sure we are doing the things we can to mitigate, to make sure, our people are safe and they can go home. But also we want to make sure that we're doing right by our customer. So we want to become more efficient, finding problems before they happen. And we want to do more with less.
Anyone that has had any, I guess-- projects are popping up all over the place. There's more projects than there are people to actually complete those projects. The workforce is aging. There's not as many people coming into the construction workforce as there are people sort of retiring. So we have to become more efficient with what we have. There's just kind of no other choice.
Secondly, we want to provide a better product and service to our customers. So we want to strengthen existing relationships and new relationships. So we have projects with newcomers. We want to make sure that they come back to us. We want to make sure that they were satisfied, the interactions were good. That helps us get repeat business, which is big. Next, have better cohesion and communication with our project teams, the GCs, and the customer.
I'd say from experience, most rework happens when there's some sort of breakdown in that communication. Whether it's among our own project teams, or it's from maybe our customer, the GC, the information doesn't trickle down to who it needed to. That's usually when stuff goes wrong. Lastly, make more money. We're a business. It's part of this. But I like to think of it as an after effect. If we're doing these other things, the money piece comes naturally. We're not necessarily out to make more money. It's just an added benefit of doing this process.
So make more money. If we're more efficient, we do more with less, then we make more money. So we're doing more with less. Great. Lastly, revised programs and processes that don't add value. So there are-- especially in electrical, there's all kinds of regulatory stuff, which is great. But we've put in self-imposed things that maybe don't necessarily add any value. And we're doing some analysis internally to identify things so we can modify those processes, modify the programs. Make sure they are adding value across the board so they make sense for everyone included in these projects.
So I just wanted to talk a little bit about the solution that we're looking at today broadly, just so we can all follow along a little bit easier. I come from an IT background, so a lot of this stuff made sense to me right out of the gate. But for those of you without that background or lack of experience with these solutions, I kind of want to just show you, very broadly, what this process looks like. So what we're doing is we're taking the project data in the platform that's entered by our project teams-- so this is just people filling out issues, forms, filling out assets, and changing the statuses.
All this stuff gets collected into ACC. And what we are doing is we're pulling that data using the data connector. And we're going to move it into our internal database. So we have this data in our SQL database. And that's where the normalized table construction happens with the queries that Liang and his team have built to take the raw data, normalize it so it's easier to analyze. From there, we can do our analysis and dashboarding. So there's a couple components here at the very end here that may or may not pertain to what you want to do in your environment.
We have chosen to not only do the analysis in Power BI-- you can use whatever program you want to. We happen to use Power BI-- but we've also-- going from the analysis and dashboarding into the consumable information, that piece of it, we have the Power BI dashboards and reports that we've built. And then we make those available to the project teams and personnel-- the people that can actually learn from this stuff. So they can use the dashboards for oversight, easier management, reporting.
If they want to create reports to send up like weekly status reports, things of that nature, they can do that to the GC or the customer without anything from my end at all. They can do that all on their own. And then the added benefit of making this information available to your project teams is they can come to these conclusions on their own. So I'm not just going, hey, I found a thing. I think you should do things differently. They can come to those conclusions on their own.
After all, I have a IT background. I'm not a construction person, per se. I'm not a project manager. I've never worked in the field. Those professionals know way more about those processes than I do. I've just made the data available so they can consume it. So without further ado, I'm going to turn you over to Liang, who is going to explain all of this stuff in much better detail than I can. Thank, Liang.
LIANG GONG: Thanks, Adam. Adam is just being very humble here. So Adam mentioned the project data. And reality, as shown in the picture, the project data could have different modules. You use ACC, SS, Docs, checklists, forms. They all come from different silos. And the end goal is to diagnosing and analyzing and visualizing the data. How are we going to achieve that goal from those data and different silos? We need to consolidate them. We need to process them. Put them in a central repository, like a SQL database. And perform data engineering before we could use the consolidate table to perform analytics.
So this gives a overall picture of how we are processing the data, how the data pipeline looks like. The next page that-- intuitively, I want to give you some concepts that you probably use the issues, module, and ACC a lot every day. This data shown on the UI are not ready to be analyzed, because ACC is not a database. On the backhand is, but on the front end UI, it is not. So we want to grasp all this data. Like, what's the issues name, issue credit date? And what's the attribute value? And what's the attribute here?
So essentially, we want to grab all these important information about the issue. How we are grabbing those information by performing data engineering to consolidate the normalized tables for the issues table. For example here, we want to grab the Builders FirstSource. We want to grab Punch List for the issue subtype. I want to grab the Punch List for issues type. And you can see all the values are distributed under different normalize tables. And we have to perform data engineering.
So just want to give you some intuitive concepts right now for better understanding of the latter slides. I want to know how many people are actually using the insights module on our ACC? A lot of the times, people just click the Run Extraction, and the Zip file can be downloaded, which includes all the-- more than 200 normalized tables here. And Rosendin doesn't want to go through this manual process to download the data every day, or every week. It's a waste of time.
So we're helping Rosendin and Adam to automate the process-- the data pipeline. And what we're automating here, we're grabbing all those automated normalized tables from the ACC cloud into their SQL database. And the reason here I'm circling the customer attributes three tables is because I'm going to give an example of how to consolidate these three tables in the latter slide. So this is just a heads up.
Speaking back like, how we're actually automatically piping the data from ACC cloud into Rosendin SQL database, this is how we are doing it. We are doing this through ACC Connect, a.k.a. Workato. For the recipes, there are two different levels. This is on the parent level, we're hosting all the parent recipes. And they say the Utilities folder, which are hosting all the child recipes, which include all the 200-- more than 200 tables that how we are piping those tables from CSV into SQL database. Let's take a deeper into how we're actually do this.
On the left side, these two screenshots are from the parent table-- the parent recipe. And on the right side this is the child recipe. So the general logic is that we hit the APIs, telling the APIs, grab a data extraction on the UI. So the UI began to hit that button, Run Extraction. So after the extraction, the run is finished with [AUDIO OUT] the file. Unzip that file, and begin to loop through all those more than 200 tables in the zip file. Will process them one by one. For each CSV, we send them to the child recipe, which is on the right side.
For the child recipe, what it's doing that create a SQL table if there's not one yet in the SQL database on our Rosendin SQL. And analyzing the CSV, and gradually upsert each line of the CSV table-- upsert into the SQL table-- corresponding SQL table under Rosendin. So this is what we're doing here. Essentially, grabbing the data from ACC cloud and download it, analyze it table by table. And for each table, we're upserting those rows in the table into Rosendin's SQL database. So this is the overall idea.
And all this are on a time schedule, which means it's totally hands free. You do not have to do this manually in any of the single steps. After we put all those SQL tables-- we filled all those SQL tables with the CSV files, the next step, we need to perform data engineering, which I was talking about earlier. Because all these tables are just normalized tables. You have to consolidate them. If you remember the issues example, by sticking the tables together to extract the actual valuable information that is ready to be analyzed and visualized.
I was talking about earlier for the issue of customized attributes tables for this three tables. So from the first table you choose Custom Attributes table. Left outer join the Attribute Values table. And then left outer join again with the Issues Type table. In this way, we could expose the attribute value, the attribute name, and the issue type that the attribute belongs to. That was an example from the attribute perspective. How about on the issues level? Because we're caring about the issues creation date, the issues type, which category the issues falls into. These are all the normalized table.
The most difficult part of the work is how we find the common columns between different tables, and how it correctly mapping them. There's already an a schema in the zip file. However, it's not providing a ERD. Essentially, we're working on something like this. This is for the modules-- the ERD. Basically, we're mapping the different columns between different tables to map them correctly. So after stitching them together to make sure it is accurate, there is no duplicated rows. It's ready for analyze for visualization. And this is a hard core of this data engineering part.
And that is equivalent to the Power BI part. What I did first is that we're putting the Power BI, looking into one single project first as a proof of concept to make sure the logic-- the way the tables are merged together are correct before writing the massive SQL queries in SQL. Just using Power BI as a proof of concept. As you can see here, after we prove the concept in Power BI, we're using the SQL to write the SQL queries. Because all this queries are essentially the same as the apply steps in Power BI.
It just move the logic from Power BI into SQL. That's all it is doing here. Next, after we're generating this Issues Consolidated tables, we hand it over to Rosendin. And Rosendin could begin to perform analytics to drive insights from this data. And I'll head over back to Adam to talk about the visualization and analytics.
ADAM ROBERTS: Thanks, Liang. So what we have here on screen is an example of some scoring KPIs that we've put together internally. Again, I just want to reiterate. Our focus, specifically at Rosendin right now, has been on rework type data and quality metrics. But you can use this for any-- safety incidents. Really, kind of the sky's the limit. It's just a matter of what you can collect in the platform, and then what you can creatively apply in terms of analysis to figure out what kind of KPIs you would want to use on your end.
So I'm going to run through these very quickly. So at the top, we have a summarized score. And then underneath that, we have the contributing KPIs. So the left and right side you're looking at here is the left is overall-- it's all issues. The right side is a relative date running score with the same KPIs, but it's just for the last 30 days. So when you're looking at these two numbers, the right side is sort of a visual indicator of how are you doing in relation to your overall score. Or have you improved over the last 30 days?
Maybe you've not improved. In this case, what you're looking at here, there's an improvement that can be observed. But the reason we have that there is sort of a trending indicator that you can look at very quickly and say, hey, how are we doing lately? You can get it real quick. So what we've done is these KPIs are actually weighted differently depending on the criteria. There's little goal indicators under each one of those figures. And that's the self-imposed goal that we've set at our company to try to achieve.
And then the scoring is based off of whether or not they've made those goals. So there's some other stuff under the hood. I won't bore everyone, but some measures that we've put in place to build these KPIs and the weighting. So really quick, I'm just going to run through these. We have average days past due, average days to close the issues, percent of issues closed total, percent of L4/L5 issues. Were L4/L5, in terms of electrical speak, is sort of issues that have been discovered maybe at time of energization.
External punch list items are just like they sound. Things that weren't up to whatever the customer specifications were maybe, or something-- some kind of rework that was indicated after turnover. Percent of rework items overall. How many actual items do you have logged in your issues that are considered rework? Or at least we are considering rework internally? And then lastly, average percent of blanks. So this is something I just want to talk on really briefly. I think Liang mentioned this once already. Maybe not.
But your analysis is only going to be as good as the data you collect, correct? So what we've done here is we've built a KPI for our people to be able to manage how data is being put into the platform in their project. So if they look here, there's actually a breakdown that goes along with these. These blanks can be detected in various fields. So if someone fills out an issue in the platform, we can say, hey, they didn't fill out a root cause. Or maybe they didn't put in a due date, or something to that nature.
And we want to make sure they're filling those issues out completely. So that way, we can collect good information. So I just wanted to talk on that very briefly. Analysis is great. But if you don't have great data to start with, it makes the analysis much harder. Or you have to leap to conclusions, which is never something you want to do. But this is just an example of some scoring metrics that we've put together on our end using the data that we're collecting from the platform.
Some filtering options that we have. We call it slicing in Power BI, if you're not familiar with that term. Slicing is filtering. It's the same thing. So we're able to slice by company. Internally, we have regions and divisions. So we can go down to basically, if we want to look at Rosendin's rework numbers, or maybe a supplier-- if we're logging issues on a supplier, we can go in here, filter by the supplier. And then see how many rework issues we have with that supplier.
So maybe we have issues regarding shipping. Maybe items are damaged in shipping, and we can log all that stuff, and very easily pull up what's going on with that supplier, for example. We can also filter by project. So that's sort of to circle back to the entire point of getting this information in people's hands. So if you're looking at company-wide data, that paints a picture. But it's not very specific. So that's why we have these other filters to get really dig down to region, division, and project so people can-- at any level of the organization, people can get the information that they want to see.
Maybe a division manager wants to know how their division is doing overall. And they can get that information. The visual that you're seeing here is just, again, an example visual of something that we've put together, where we can see what is contributing to this rework value by either region or division. So you can really dig down quickly to figure out what you're contributing factors are. Some other things that we don't have shown here are some breakdowns by root cause, or by issue type. So that way, we can get some more information.
Again, digging further, deeper into what's going on. So we can do some sort of corrective action to bring projects back up to where they're successful again. Or come up with a better plan. Like I said, the sky's the limit here. You can really do whatever you want. It's just a matter of being creative enough to come up with what you want to do. Which sometimes is easier said than done. I'm going to turn this back over to Liang here so we can talk a little bit about predictive analytics. Without further ado, go ahead, Liang.
LIANG GONG: Thanks, Adam. That's really comprehensive. Coming back here, I'd like to show you guys the evolution of the analytics descriptive, prescriptive, and cognitive. So far, what we all have talked so far about, the descriptive analytics. However, this really lay a very solid foundation for predictive analytics. Predictive analytics answer the questions like, which asset is going to fail? When it's going to fail. And the prescriptive analytics going to answer the question like, what can we do to prevent it?
So with a solid foundation of the data that already lives on our Rosendin SQL database, what can we do? For example, here is a quick structured data using the structured data, performing a supervised learning by using secular package to predict the issues priority level. Like, which issue can be solved on the construction side objectively, rather than people on the side just pick up the issue to solve subjectively? By leveraging these eight columns, we're predicting the issues priority level. Like high, low, medium.
This is a quick example of supervised machine learning by-- falls under the predictive analytics. But there are tons of other use cases under predictive analytics. The first one, just talking about the previous slide. And the second one is actually Revit. This is a great example for unsupervised machine learning. As you can see here, this dots representing the models of with different disciplines. If there is a new model causing the blue without actually opening the model-- which could take a long time to open it if it's a big model, I don't want to spend the time opening it.
Just by using its metadata from Revit, I could tell it falls under the cluster of structure. So I know this is structure Revit model even before opening it. So this is an example of unsupervised machine learning. And another big area is about the text. We put a lot of the text under the issues description, under submittals, transmittals. Have we ever really used those texts? Those texts are literally unstructured data. We could perform sentimental analysis.
We want to know which issues, descriptions really inactive, which could be flagged as red so that we could solve that issue first or prioritize that issue. Things like that for those unstructured data, those are very important. Valuable resources, if we don't use them, they just-- it's just a waste. It's a valuable asset for us. And the next one is about ACC photos. For example, this guy is wearing a safety hat, safety vest, pants, shoes, gloves. If you want to identify anyone on the site who is not wearing a safety glasses, you probably have to go over all the pictures one by one, or the videos.
However, with the deep learning artificial neural networks, it's going to automatically flag the person who's not wearing the safety hat automatically. So you do not have to go through one by one manually. So that we could abide by the OSHA guidelines. Another impact using the computer vision is about the construction progress. If we are taking a screenshot of the same angle on the construction site at the same time every week, we could compare the differences between the photos.
So say week two and week three had the biggest difference between the photos. That means that's actually have the biggest progress. Again, without manually looking through all the photos like 52 weeks, we could automatically performing the algorithm under cosine similarities to tell how quick of a progress construction project is. Last but not least is about the trend analytics. For example, here, the client walk to us, they want to know their token's forecast based on the seven years purchase of tokens with us. They want to know how many more tokens they should purchase for the next year.
So by performing trend analysis-- and here, I'm using a SARIMA model-- you could see here the orange line is predicted. This really provides a baseline for conversations between the account team and the clients so that they know how many more tokens to buy, how many more to deduct. This is just a baseline. All this cannot be achieved with a solid database, the solid data repository, which has already acquired all the ACC's data into a Rosendin SQL database automatically. And that's the foundation, again.
That really wraps up our presentation for this year's AU. Any comments or questions are welcome here. And thanks Adam, for co-hosting the session with me. Thanks to everyone.
ADAM ROBERTS: Thank you.
LIANG GONG: Thanks.