Key Learnings
- Understand how files are stored with Forge
- Learn steps to download, upload, and interact with files via the Data Management API
- Review storage providers’ common practices (Box, Google Drive, Dropbox)
- Learn how to transfer files according to provider rules
Speaker
- Augusto Goncalves: Augusto Goncalves has been an API evangelist at Autodesk, Inc., since 2008. He works with all sorts of technologies, from classic desktop to modern mobile and web platforms, including .NET for AutoCAD software and Revit software, and the JavaScript application programming interface for Node.js.
PRESENTER: OK. So that's the third class of the day. Third round of classes. Hope you still have energy. We still have one more, right? So this class will be about transferring data between Autodesk and other storage providers. The goal of this class is to show you some samples, and also give you some tips and tricks on how to do it.
And why is this important? This is important because data is spread all over the web. Autodesk is not the single source of data. You may have data on Dropbox. You have data on Box. You have data on your local storage. You have data on your intranet.
So how can you move data around? How can you get that data from here to there, right? I know it doesn't make sense to keep moving data all the time, moving big files. But sometimes, to get started with a new API, you need to move the data to that API.
So I just forgot the space. I keep forgetting it. My name is Augusto Goncalves. I've been working at Autodesk for 10 years now, doing support for ADN on the desktop, and now doing support for partners with Forge. And again, this is integrating the Forge Data Management API with other storage providers. Basically, how to move data around.
And as I said, data is spread around. We have to move that data. Let's say you have a lot of data sitting there on Dropbox, on Box, on Google Drive, and you want to get started with BIM 360 Docs. You want to get started with A360. Or you have data on A360 and you want to make a backup copy, or take a snapshot of that data back to Dropbox or back to Google Drive.
So moving data around is very important, and it's relevant. I would say it doesn't make sense to keep data in sync between Autodesk and other storage. Why would you need to keep data in sync? But sometimes you need to take some snapshots, right? So in this class, we'll go over some basics on how to create and transfer data between us, Autodesk, and Box, Google Drive, and Dropbox.
I will also show how the idea works for Egnyte and OneDrive. So that's the basic idea. OK? OK. So before I go ahead: does your company have data hosted somewhere outside, other than Autodesk? Do you have data on Box, Google Drive?
AUDIENCE: [INAUDIBLE]
PRESENTER: OK, I think there is a lot. Google Drive? Interesting. Box? Also on Dropbox? Egnyte? Interesting. And OneDrive? OK.
Autodesk is also moving to OneDrive because they had a very good offer. If you have Office 365, you have OneDrive, so that's a good offer. Anyway. So let's get started. So the idea is to understand how data is stored on Autodesk and the steps to download, upload, and interact with the files using the Data Management API.
We'll also review some storage providers and some best practices on Box, Google Drive, Dropbox, et cetera. And we'll learn how to transfer those files according to the provider rules. Actually, most of those providers work with the same basic standards. They have OAuth authentication. They have REST APIs. So it's very easy to implement the same idea for all of those providers. We are going to dig into that. OK.
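To make that concrete, here is a minimal sketch of how similar the authorization step looks across two providers. The client IDs, callback URLs, and scopes below are placeholders, not values from this sample:

```javascript
// Hedged sketch: the authorization-code URLs for Forge and Google Drive
// follow the same OAuth 2.0 pattern, just with different hosts and scopes.
const forgeAuthorizeUrl =
  'https://developer.api.autodesk.com/authentication/v1/authorize' +
  '?response_type=code' +
  '&client_id=YOUR_FORGE_CLIENT_ID' +
  '&redirect_uri=' + encodeURIComponent('https://yourapp.example/callback/forge') +
  '&scope=' + encodeURIComponent('data:read data:write');

const googleAuthorizeUrl =
  'https://accounts.google.com/o/oauth2/v2/auth' +
  '?response_type=code' +
  '&client_id=YOUR_GOOGLE_CLIENT_ID' +
  '&redirect_uri=' + encodeURIComponent('https://yourapp.example/callback/google') +
  '&scope=' + encodeURIComponent('https://www.googleapis.com/auth/drive');
```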
So if you are moving data between Forge, between Autodesk and some other storage, first you need to understand how the Forge data is organized. Then you also need to understand how the storage data is organized. So let's get started with Forge.
So on Forge, all the data that belongs to the user is accessed using a three-legged authentication token, right? The data is kept on Autodesk using Forge, using three-legged authentication, meaning the data belongs to the user. We do have our own organization for that data. So we have hubs, projects, folders, and items. And in each item, we have several versions.
The files are actually stored on our OSS, which means Object Storage Service. But we have this abstraction layer on top, just to organize the data. So let's say you have A360, you have BIM 360 Docs, you have BIM 360 Team. All of those are hubs. And inside those hubs you have projects.
So hubs and projects are just organizational layers, just abstractions to keep the data organized in a way that makes sense. And as hubs, we have BIM 360 Team and Docs, Collaboration, Fusion, and the OSS at the very end. So that's the organization we have.
It's important to understand that, because you cannot go straight to the file. You have to start with the hub, then go to the project, then to the folder, the item, and then the version. OK? Too complicated? Too simple?
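For reference, a minimal sketch of that traversal in Node.js (the language of the class sample), assuming a three-legged token and Node 18+ for the global fetch; error handling and pagination are omitted, and the helper names are mine:

```javascript
// Walk hubs -> projects -> root folder -> items -> versions with the
// Data Management API.
const BASE = 'https://developer.api.autodesk.com';

async function getData(token, url) {
  const res = await fetch(url, { headers: { Authorization: `Bearer ${token}` } });
  return (await res.json()).data; // JSON:API responses carry results in "data"
}

async function walk(token) {
  for (const hub of await getData(token, `${BASE}/project/v1/hubs`)) {
    for (const project of await getData(token, `${BASE}/project/v1/hubs/${hub.id}/projects`)) {
      const rootId = project.relationships.rootFolder.data.id;
      const contents = await getData(token,
        `${BASE}/data/v1/projects/${project.id}/folders/${encodeURIComponent(rootId)}/contents`);
      for (const item of contents.filter((e) => e.type === 'items')) {
        const versions = await getData(token,
          `${BASE}/data/v1/projects/${project.id}/items/${encodeURIComponent(item.id)}/versions`);
        console.log(item.attributes.displayName, `${versions.length} version(s)`);
      }
    }
  }
}
```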
Let me show you a sample here very quickly. So that's the sample I'm going to be using during the class. This sample is actually available on the BIM 360 App Store. And the sample is bim360google.autodesk.io. If you replace Google with other providers, we have five of those samples. So it will be bim360box.autodesk.io, bim360dropbox.autodesk.io, and the same for Google, OneDrive, and Egnyte. OK? Just replace the word, and the samples will be the same.
So if I replace here Google with OneDrive, you see the icon is now OneDrive. It's exactly the same thing, works exactly the same way. Let me go back to Google just because I have an account here.
So before I sign in to Google, I will sign in on Autodesk. And I have to type in my password. So my account here. And, as I said, the first layer of organization here are the hubs. So I have three hubs on my account. So let me zoom in.
So I have three hubs. The first hub here is my personal hub. It's all my A360 personal information. The second one is my BIM 360 Docs account. So I'm part of this organization on BIM 360 Docs. And the third hub is my BIM 360 Team Enterprise Account. OK?
When I use the APIs, I can recognize those hubs. So the hub is the first layer. You may just filter for BIM 360 Docs hubs, for instance. And if I expand, I see all my projects. If I expand, I see all the projects on BIM 360 Docs. Come on, come on, come on, come on.
And let me collapse this, come back here. And on this project, I see my root folder. And then I see all the files that I have in this folder. And for this file, I have six versions. So the version is actually the file. The item here, analyze.dwf, is not the actual file; the versions 1 to 6 are the actual files, right?
So on Forge, on Autodesk, we have versions for all the files. So if you want to actually access a file and move that file from here to Google, you need to select a version. If you select an item, you can assume it means the last version, but that's just an assumption. It's actually the version that contains the storage location of the file.
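In code, that looks roughly like this; the field path follows the Data Management JSON:API responses, and the helper name is mine:

```javascript
// Fetch one version and read its storage location, which is where the
// actual bytes live (an OSS object behind the abstraction layer).
const BASE = 'https://developer.api.autodesk.com';

async function getStorageUrl(token, projectId, versionId) {
  const res = await fetch(
    `${BASE}/data/v1/projects/${projectId}/versions/${encodeURIComponent(versionId)}`,
    { headers: { Authorization: `Bearer ${token}` } });
  const version = (await res.json()).data;
  // e.g. .../oss/v2/buckets/wip.dm.prod/objects/<file> for BIM 360 files
  return version.relationships.storage.meta.link.href;
}
```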
So again, coming back to the goal of this class, we want to move a file from here to here, and vice versa, right? So that explains the tree structure. That's how we have the data organized on Forge. So let me come back here. As I said, on Autodesk we have hubs and projects.
And under the data domain we have folders, items, and versions. So these are the endpoints we need. With GET hubs, we get all the hubs. With a specific hub ID, we can select the projects on that hub. With a specific project ID and a specific folder ID, I can get the files.
With the project ID and an item ID, I can get an item. And the same thing for a project ID and a version, right? This is just a tiny piece of all the endpoints we have on Data Management. We have a lot of endpoints to navigate all the data. I'm just highlighting the interesting pieces. And there are the buckets, where you can upload files if you want to, but that's not the case here.
Well, there is a difference, as I said, about hubs. On Autodesk, we have Enterprise, Personal, and BIM 360 Docs. And when you list all the hubs, you can see the type of the hub. It can be Core, A360, or BIM 360. Make sure that you are listing the right hub you want.
Then you have the projects, and items, files, and versions. Even the items in a folder have different types. So an item can be a BIM 360 Document or a BIM 360 Review Document. So make sure you are selecting the correct file you want to move. OK? Too much information? Clear? Everyone is looking, ah, come on. OK.
So the important piece here is that, if you want to move data between Autodesk and somewhere else, you have to make sure you are selecting the information that you want to move, right? So usually the files are a Core file or a BIM 360 file. You don't want to move a BIM 360 Review Document with your document, because that's just a reviewing document; it is not actually the file itself. So usually these are the files you move. That's the thing we're trying to move. OK?
It's interesting because it's not showing. Yeah, this should be red. It's almost red. Anyway, it's just the coloring. OK.
Before we keep moving here, let me show you this working. So come back here to the project. Let me sign in to Google, and I'm going to type in my password. So on Google I have folders and files. I don't have hubs and projects, anything like that. That's just the way Google works, right? So if you go to a different provider, it would be a bit different.
So on Google I can have a folder, and I have files in that folder. So I have a few files here. If I select the file on Autodesk, a file or a specific version, I can then select the folder in Google and ask it to move that file. So it's going to say, Analyze version 5, by this name, to this destination folder, and do the transfer. So it's setting up.
I'm going to come back to the sample and explain more about it; I just want to show you how it works. So it's OK. And the file is right here, right? So what's happening here? When I move the file, it's going to navigate through all the projects and items, et cetera, and do this exchange. So that's basically what this slide here is showing.
This is actually showing the other way around, when I upload the file to BIM 360. So to upload the file to BIM 360, I have to call those endpoints as described there. And here I have the items endpoint to create a new item, with the core file attributes, so I can move the file from somewhere else to BIM 360. The way you upload the file to BIM 360 is by following this tutorial right here.
So far I was uploading from BIM 360 to Google. But I can move the file from here back the other way. So let's say I select this folder, select the file on Google, and hit the way back. It's going to send the same file to my project, and it's going to run all those endpoints. So it's going to select the project and start the transfer, do the transfer right here. Just wait for it. OK. And it should be done.
You see, now the file is here. So when I hit the left arrow, it's getting the file from Google Drive, creating a storage object in my project, uploading the file there, and creating the item. How do I do that? I'm following this tutorial right here. That is the first one. And here are the few endpoints we need to upload that file. OK?
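As a rough sketch of those tutorial steps in Node.js (body shapes abbreviated, helper names mine, and step 3 only outlined):

```javascript
// 1) create a storage location, 2) PUT the bytes into it, 3) create the
// item or version that points at it.
const BASE = 'https://developer.api.autodesk.com';

async function uploadToDocs(token, projectId, folderId, fileName, bytes) {
  // 1. Create a storage object in the target project
  const res = await fetch(`${BASE}/data/v1/projects/${projectId}/storage`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/vnd.api+json' },
    body: JSON.stringify({
      jsonapi: { version: '1.0' },
      data: {
        type: 'objects',
        attributes: { name: fileName },
        relationships: { target: { data: { type: 'folders', id: folderId } } },
      },
    }),
  });
  const storage = (await res.json()).data;

  // 2. Upload the bytes to the OSS object named by the storage id
  //    (urn:adsk.objects:os.object:<bucket>/<object>)
  const [bucketKey, objectName] = storage.id.split(':').pop().split('/');
  await fetch(`${BASE}/oss/v2/buckets/${bucketKey}/objects/${objectName}`, {
    method: 'PUT',
    headers: { Authorization: `Bearer ${token}` },
    body: bytes,
  });

  // 3. POST data/v1/projects/:project_id/items to create the item (first
  //    version), or .../versions for a new version of an existing item.
  return storage.id;
}
```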
So this slide is just pointing out the endpoints you need to upload a file to BIM 360. On BIM 360 or A360, we also need to create folders. So sometimes we have to create a new folder to place those files. So let me go back to the sample.
So I have here a New Folder button. So if I click here and say DevCon 2017, the folder will appear right here. Just wait for it. Refreshing. OK. So here's the new folder. How do I do that? I'm calling the endpoint POST projects/:project_id/folders, or projects/:project_id/commands, to create a new folder. OK? Interesting. So that's how we can create the new folder. OK.
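A minimal sketch of the folders call (the commands variant is omitted); the extension type shown is the BIM 360 one:

```javascript
// Create a subfolder under a parent folder with the Data Management API.
const BASE = 'https://developer.api.autodesk.com';

async function createFolder(token, projectId, parentFolderId, name) {
  const res = await fetch(`${BASE}/data/v1/projects/${projectId}/folders`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/vnd.api+json' },
    body: JSON.stringify({
      jsonapi: { version: '1.0' },
      data: {
        type: 'folders',
        attributes: {
          name,
          extension: { type: 'folders:autodesk.bim360:Folder', version: '1.0' },
        },
        relationships: { parent: { data: { type: 'folders', id: parentFolderId } } },
      },
    }),
  });
  return (await res.json()).data; // the new folder, with its generated id
}
```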
So now that we've looked at how the files are stored and how we create folders, we can look at how to actually transfer the files. I did a couple of transfers, but I didn't explain how it works. I was focused on how to dig into the tree, right? So now let's understand how the transfer actually works.
So basically, we have a source of the file, and we have a destination of the file, and some other cloud engine that will perform the action. This is very generic, right? I want to transfer a file from Autodesk to Google Drive or from Google Drive to Autodesk, and someone performs the action. Right?
Or it can be a file from SharePoint to Autodesk or from Autodesk to SharePoint, whatever. So it's a source and a destination, and someone to perform the action. What is the action that I need to perform in this case? What is the action for getting a file from here to there?
It's actually a get to read and a post or a put to write. Right? So it's basically just reading bytes from here and writing bytes over there. OK? I can do that in chunks or I can do it all at once, but I'm basically just doing that to transfer files. There are many ways of doing that.
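Conceptually, the whole transfer is just this; the URLs and headers are placeholders:

```javascript
// Stream the GET response body straight into a PUT request, so the file
// moves chunk by chunk and never sits fully in memory.
const https = require('https');

function copy(sourceUrl, sourceHeaders, destUrl, destHeaders, done) {
  https.get(sourceUrl, { headers: sourceHeaders }, (download) => {
    const upload = https.request(destUrl, {
      method: 'PUT',
      headers: { ...destHeaders, 'content-length': download.headers['content-length'] },
    }, (res) => done(null, res.statusCode));
    upload.on('error', done);
    download.pipe(upload);
  }).on('error', done);
}
```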
In this sample, a good way to do it is using a serverless approach. So it can be Forge, it can be Box, anything else. Here's my application that is requesting the action. And for my sample, I'm using AWS Lambda to perform the action. So Lambda is actually doing the work of transferring the files. Right?
Why am I using that? Why is this application here not doing the thing itself, while I have Lambda doing the task? Before I go there, do you know what Lambda is? OK. Lambda is the Amazon AWS solution for serverless applications. Serverless means that you don't have an application running; you just have the recipe somewhere. So you have your code stored somewhere.
When you need it, on demand, AWS will read that code and execute it. And then when the code is finished executing, it's going to dismiss the server and get rid of everything. OK? So you just pay for the amount of time that you use. After that, the server is completely deallocated and it's gone. OK?
What's the benefit of Lambda? The benefit is that it's really, really cheap, because it's just using the server for like 3, 5, 10 seconds and then getting rid of everything. Make sense? It's just like, OK, I'm here not doing anything, and on demand I'm going to run some code. You may say, OK, is that only on Amazon? No.
So this is the Amazon AWS website with information about AWS Lambda. And I have here on the screen the Azure equivalent. On Azure, they call it serverless architecture. So it's the same idea, but the Microsoft implementation of it. They have the same thing as Amazon, but they call it serverless architecture.
And the Google Cloud Platform also has the same solution. They call it serverless as well. It's the same idea, right? You have your code somewhere, and on demand that code will run. Why is this interesting? OK.
So imagine that I have an application like this that will list all the files for you to select, right? And I have like a 100-gig file in here that I want to transfer. When I hit transfer, this application would need to be a very big application with a lot of memory to perform this operation, because it's a big file. It would have to allocate a lot of memory, a lot of bandwidth, to transfer the file.
When I do the transfer, I need all that power. But after that, I don't need the power; I just need a very basic application, right? So in this case, my application is a very tiny application. I'm paying for the cheapest tier of Amazon instance to run this application. But when I need to transfer a big file, I allocate a large AWS Lambda machine to transfer the file in a few seconds and then deallocate everything. Make sense?
The other benefit of serverless is that I can run as many instances as I need. I can run like 1,000 instances at a time and then deallocate all the instances. I don't have to pay for 1,000 instances if I'm not using them. So I have a very tiny machine, and I allocate big machines on demand. And I can allocate several machines at the same time. OK? So that's the benefit.
In this case, the idea of transferring files from A to B is an excellent use case for Lambda, or for Azure or Google serverless, because the application is tiny and the file transfer gets a big machine.
The question was how you allocate the machine. Yes. Let me go to the pricing. So when I specify my machine, I can say that I want machines from 128 megs up to 1.5 gigs of memory, this one here. And AWS will allocate the machine following that specification. So I can specify the amount of memory, and I can specify the timeout.
So the machine will run for as much as five minutes, or as little as 30 seconds, and then it will deallocate. But I can change that if I want. I can specify, OK, this time create a tiny machine, this time create a big machine, because the price is a bit different in each case.
But in my sample, I'm always using the same machine, because I don't need a lot of memory. I need a lot of bandwidth, and there is no configuration for bandwidth, just for memory. So you have configuration for memory and timeout, and that's it. OK? Other questions? OK. So moving forward then.
As I said, the benefit of AWS Lambda, or any other serverless implementation, is that it's scalable. You don't have to worry about the size of your machine, because it can just run different instances of the same machine. I can do multiple transfers. If I have to transfer 100 files, I allocate one instance of Lambda for each file. I don't have to worry about how big my machine is; I just launch one instance for each file. Right?
And it's on demand. I don't have to create a server and wait for it. I just start the server on demand and then close all the servers after that. And it's very cheap compared to the alternatives, because I'm not paying for the servers when I'm not doing the transfers. I'm just using them when I need them, so it's very cheap. OK.
So how does it work when I want to transfer from Autodesk to a different storage? So I have my [INAUDIBLE] application. I will allow the user to select the file on my hubs, on my BIM 360 Team or Docs hubs. I will allow the user to select the folder at the destination. And then I will prepare the source and destination URLs and send the job to Lambda.
Remember, from Autodesk, and from other storage, it is always a get and a put operation, right? So a get from source and a put on destination. So I create a URL from Autodesk to get and a URL from Google to put. And I can do the transfer. I delegate that to Lambda.
When the job is completed, the Lambda will send back a job-completed message, and my application will just say, OK, it's done. Right? Regardless of how the information is stored on Autodesk, it's still a put and get operation, so that's why it's easier to do it here.
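The job the web app sends to Lambda might look like this; the endpoint URL, shared secret, and payload shape here are assumptions, not the sample's exact contract:

```javascript
// Hand the copy job to the serverless endpoint: a source URL plus headers,
// a destination URL plus headers, and a callback to report completion.
async function delegateTransfer(job) {
  await fetch('https://YOUR_LAMBDA_ENDPOINT.example/transfer', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: 'Bearer YOUR_SHARED_SECRET', // the endpoint is password protected
    },
    body: JSON.stringify({
      source: { url: job.sourceUrl, headers: { Authorization: `Bearer ${job.sourceToken}` } },
      destination: { url: job.destUrl, headers: { Authorization: `Bearer ${job.destToken}` } },
      callback: job.callbackUrl, // Lambda POSTs here when the copy finishes
    }),
  });
}
```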
The way back is quite similar, right? So on Box, Dropbox, et cetera, I select the folder. Sorry, I select the file, and on Autodesk I select the destination folder, then delegate the job to Lambda. When the job is complete, the Lambda will send me a message that it is complete.
And there is one additional thing that I have to do on the way back. After the file is back on Autodesk, I have to say that this file is a version of the item, so version 1, version 2, because we control versions. The other storage providers may not control versions of the files. OK?
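That extra step is roughly this call; the body is abbreviated to the essential fields:

```javascript
// Register already-uploaded bytes (a storage object) as a new version of
// an existing item.
const BASE = 'https://developer.api.autodesk.com';

async function createVersion(token, projectId, itemId, fileName, storageId) {
  await fetch(`${BASE}/data/v1/projects/${projectId}/versions`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/vnd.api+json' },
    body: JSON.stringify({
      jsonapi: { version: '1.0' },
      data: {
        type: 'versions',
        attributes: {
          name: fileName,
          extension: { type: 'versions:autodesk.bim360:File', version: '1.0' },
        },
        relationships: {
          item: { data: { type: 'items', id: itemId } },
          storage: { data: { type: 'objects', id: storageId } },
        },
      },
    }),
  });
}
```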
So let me come back here and show you some other things. So this is the sample running. So let's say I want to move a few files from here. Let me create the new folder here. So DevCon. Let me select a few files and hit Move.
So in this case, I have four files, right? So when I click Transfer, I'm going to launch four instances of AWS Lambda to transfer them. Each instance will do one of the transfers. And that's why it can run in parallel, right? So all the instances will start and will do the transfers independently.
I don't care how big those files are because the Lambda will be really fast to do it. And as the transfer is happening in parallel, I'm not waiting in line to do it. I just do it, just wait for it. OK?
So about the code-- Not this here. About-- There is the source code here. I want to show you just some pieces of how it works so you understand parts of it. OK. This sample is available on GitHub, the sample I'm showing. That's the source code. I wrote this sample in Node.js, and it's the same code for all these storage providers. It can be Dropbox, Box, OneDrive, Google Drive, or Egnyte. There is a set of instructions here on how to get this running.
And let me show you this interesting piece. So, the Lambda endpoint. This is the Lambda implementation of that. As I said, the Lambda doesn't know anything about Autodesk. It doesn't know where the file is coming from or where the file is going to. It just receives the source, the destination, and the callback to call me back, right?
So the Lambda is just receiving "get the file from source," "send the file to destination," "when ready, call back this URL passing this data." OK? Yes. And the implementation of the Lambda is much simpler, actually. The implementation is right here.
I mean, there is a lot of error checking. But the main idea is-- So "request from source." "On response, pipe to request destination." The rest of the code is just error checking. So it's basically "request from source." "Pipe to request destination." OK?
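Fleshed out, that handler is roughly the following; the event field names are assumptions, and the request package is what the quoted code hints at:

```javascript
// AWS Lambda handler: stream from source to destination, then notify the
// web app via the callback URL. Most real error checking is omitted.
const request = require('request');

exports.handler = (event, context, callback) => {
  const job = typeof event.body === 'string' ? JSON.parse(event.body) : event;
  request
    .get({ url: job.source.url, headers: job.source.headers })   // request from source...
    .pipe(request.put({ url: job.destination.url, headers: job.destination.headers })) // ...pipe to destination
    .on('response', () => {
      // tell the web app this transfer is done
      request.post({ url: job.callback, json: { status: 'done' } });
      callback(null, { statusCode: 200 });
    })
    .on('error', callback);
};
```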
AUDIENCE: You don't have to chunk the file [INAUDIBLE]?
PRESENTER: Yes, you'll need to, but there are a few interesting points. Why do you use resumable upload?
AUDIENCE: Because the file is large.
PRESENTER: Because the file is large and the connection may break, right? You have instabilities, right? In this case, though, it's always well connected, because it's in the same cloud as all the other storage, so the connection will mostly not break. I should do it, I agree. I should use resumable.
But in this case, I'm using pipe, which is not downloading the entire file at once but pieces of the file, and it's streaming pieces at a time. So I should use resumable. I'm not using it because I'm relying on the connection being more stable here; it's faster than doing it from the desktop.
If you're uploading from the desktop to the Cloud, you must use it because the connection will break from the desktop. But on this kind of implementation, it will not break that much. So it's like a risk, a calculated risk, in this case.
But the pipe will not actually download the whole file. It's just downloading pieces and streaming pieces. So that's why I don't need a lot of memory. I just need like 1 gig of memory, because it's downloading in chunks. I did some testing here, and it's downloading chunks of like 500 megs, so that amount of memory should work in this case.
So the rest of the code is just doing some error checking and the callback when it's done, et cetera, et cetera. But that's the overall idea, right? With serverless, you don't have to implement a lot of things. And the way this sample works is that the sample will run and delegate the task of moving the files to this Lambda instance. Questions?
AUDIENCE: So what triggers the actual Lambda instance to come up [INAUDIBLE]?
PRESENTER: So can you repeat from the beginning, please?
AUDIENCE: What triggers the Lambda server [INAUDIBLE]? Is it something in your code or something external to delegate [INAUDIBLE]?
PRESENTER: Well, the Lambda on AWS will activate as soon as you call the endpoint. So if you call the endpoint, it will activate. I don't know that much about Azure or Google, but on AWS, they have the cold start and the hot start.
So let's say you're not running the machine for like a few hours, and you hit it the first time: it will be a cold start. It will take a few seconds to get running. The second time will be a hot start, and it will start immediately. Right?
So as soon as I call the endpoint, the Lambda will start running. And I call the endpoint with a post, so my Lambda instance only accepts a post with the authorization header. But you see, this Lambda-- just to open a parenthesis here-- this Lambda can transfer from A to B regardless of where A and B are, right? So it's very powerful. So I had to make it password protected. So I just call the endpoint with the source and destination, including the headers and the authorization header, and it will run.
AUDIENCE: Is it possible to organize the transfer? So let's say I upload [INAUDIBLE] automatically transfer that to [INAUDIBLE].
PRESENTER: It should. Yeah, OK. So this sample here was designed to be one-off. There is no WebHook implementation in it. So if you implement WebHooks on Google or on our side, it should be possible. I did a quick prototype of that.
When you upload a new file to Docs, it automatically sends it back to the other side to keep files in sync. But keeping files in sync is a big, big task, right? So unless you have a very strong reason, it's a big project to do. Doing a one-off transfer is very easy; you just transfer.
But keeping files in sync is a big task. It should be possible, though. There is no restriction on that. You just need WebHooks on both sides, or on the side that is going to react. So you need WebHooks on Google or Box or Dropbox if you want to transfer from there to Docs. Or you need WebHooks on Autodesk, reacting to that hook, to transfer from Autodesk to the other storage.
AUDIENCE: [INAUDIBLE]
PRESENTER: Say again.
AUDIENCE: [INAUDIBLE]
PRESENTER: Yes. This is using three-legged authentication on both sides. So this sample is using three-legged authentication on both the Autodesk and Google sides, right? See, there is my account on Google. I'm using three-legged on Google.
So if I open the other sample, which will be bim360dropbox.autodesk.io. Well, as I'm already signed in on Autodesk, if I click here, it's just going to say, OK, I know you, and it's going to authorize me already. If I click on Dropbox, it's going to do the same thing that it does for Autodesk. It's going to open the Dropbox Sign In page. I have to type my Dropbox account. I hope I remember my password.
And the Dropbox account will do a three-legged flow, sending the code. The same deal, right? So Autodesk and all the other storage providers use the exact same authorization mechanism, OAuth. So everyone is using the same.
So now I see all my files on Dropbox, because I'm using OAuth and three-legged authentication on Dropbox, the same way I'm using it on Autodesk. OK? So, yeah, this sample can do that in the same way.
AUDIENCE: [INAUDIBLE]
PRESENTER: No, I'm not.
AUDIENCE: [INAUDIBLE]
PRESENTER: It's one hour. 3:45 to 4:45.
AUDIENCE: [INAUDIBLE]
PRESENTER: No.
AUDIENCE: [INAUDIBLE]
PRESENTER: OK. I thought it was 4:4-- OK, sorry about that. I thought it was like-- anyway. My bad then. OK. Sorry. So let me finish then.
AUDIENCE: [INAUDIBLE]
PRESENTER: We started at 3:45, right?
AUDIENCE: Yeah.
AUDIENCE: Yeah.
PRESENTER: Isn't it one hour?
AUDIENCE: [INAUDIBLE]
PRESENTER: Oh, that's it then.
[LAUGHTER]
That explains a lot. OK. So next, as I said, we have all those samples. It's bim360 plus Box, Dropbox, Egnyte, Google, or OneDrive, .autodesk.io. This sample is available on GitHub at the link you see at the bottom. Well, it's the same source code for all the samples. You just say, OK, I want to use this with Google or with another storage, and the same sample will work. And that's the basic idea.
You can get more information on developer.autodesk.com. That's our blog. And we have a lot of samples on GitHub. And I'll be more than happy to explain more about the sample tomorrow at the DevLab, which will be on Level 1, Galileo 1002. And that will go from 1:00 PM to 5:30 PM.
I'll be there most of that time. And you can come, bring your code, and we can review your code and find problems. Whatever you need, OK? Thank you, and see you tomorrow.
[APPLAUSE]