Description
Key Learnings
- Learn how to implement Vault multisite replication
- Learn how to implement Vault connected workgroups replication
- Learn how to set up the system environment for the Vault replication
- Learn how to configure and optimize the environment
Speaker
- Francesco Tonioni: Francesco has been an Autodesk Support Specialist for Mechanical and Data Management products since 1998, when he moved from Rome, Italy, to the European Autodesk office in Neuchâtel, Switzerland. His 20 years of experience range from CAD/CAM support, training, and consulting to Data Management and Manufacturing Enterprise Support, where he pursues his passion: finding solutions.
FRANCESCO TONIONI: Good morning, everybody. Morning. We are going to speak about replication in Vault. The intention is to explain what it means, what it is, and why you would want to use it. I assume all of you know Vault. That's my assumption. OK. So I will not spend too long explaining Vault.
So who am I? I work in Enterprise Priority support for Europe, in Switzerland. I have worked at Autodesk since '98. And these are my contact details.
What we want to learn. Replication is for Vault Professional only. What is it? Which options does this replication feature give me? How do they work? I will show you different options, what you can do with replication, and different configurations. And I hope to help you find the most appropriate one for your company.
So first the agenda. We will speak about what replication is, in a few words. Then how Vault works, because I have to remind you what Vault really does behind the scenes in order to understand replication correctly. Then we introduce the concept of sites and multi-site. We will see file store replication in multi-site. Then I will introduce the concept of a workgroup. Then we'll see connected workgroups, and then metadata replication in connected workgroups. And at the end, if we have time, a few little tips to optimize the replicated Vault environment.
So what is replication? I think everybody knows, but I will repeat it. It's a technology that allows users across multiple sites, very distant sites, to work from remote locations as if they were in the same location, with the same data, the same files, and the same database. There are different levels of this, depending on the distance between you.
What is the intent of replication? The problem is that when a site or user is very far from the Vault server, of course, the latency is enormous. You have to wait minutes to get a response from the server, to download a file, et cetera. The idea of replication is just to minimize this time, that's all. How? By copying the data locally. It depends on whether I replicate the files or the database. We will see what you need and what is best for you.
How Vault works. Very quickly. You probably know, but I have to repeat it. This yellow square is the Vault server, let's say. It's ADMS, but not only ADMS: it is a web server, and it also includes a file store and a SQL database. All of this together is the Vault server. The file store and SQL can be on separate servers. And this is important for understanding replication: they are two separate things, and all together they are the whole server. The clients are something else, completely outside, and they connect to the ADMS server only via the HTTP protocol. They do not connect to SQL directly. They do not connect to the file store directly. They only connect to the ADMS IIS server over HTTP.
So what happens? A client here wants to download a file, so it sends an HTTP request to ADMS. ADMS says: you want the file, so I have to ask SQL whether you can have it, because you need the permission. And where is the file? So it asks SQL where the file is. The file is in the file store here, and it can be given back to the user. Once that is confirmed, the ADMS server grabs the file from the file store via a normal network connection, then splits it into HTTP calls and transmits it back to the client. This is for security reasons. There is no direct connection between SQL and the client.
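A minimal sketch of the download flow just described, written in Python. All class and method names here are hypothetical and are not part of any Vault API. The sketch only illustrates the order of the steps: permission check against the database, location lookup, fetch from the file store, and chunked delivery to the client.

```python
from dataclasses import dataclass, field


@dataclass
class SqlDatabase:
    """Metadata only: who may read a file and where it is stored."""
    permissions: set = field(default_factory=set)   # {(user, file_name)}
    locations: dict = field(default_factory=dict)   # file_name -> store path


@dataclass
class FileStore:
    """Holds the file contents; never reached by clients directly."""
    blobs: dict = field(default_factory=dict)       # store path -> bytes


class VaultServer:
    """Stand-in for ADMS, the only component a client talks to."""

    def __init__(self, sql, store, chunk_size=4):
        self.sql = sql
        self.store = store
        self.chunk_size = chunk_size

    def download(self, user, file_name):
        # 1. Ask the database whether this user may have the file.
        if (user, file_name) not in self.sql.permissions:
            raise PermissionError(f"{user} may not read {file_name}")
        # 2. Ask the database where the file lives in the file store.
        path = self.sql.locations[file_name]
        # 3. Fetch the bytes from the file store.
        data = self.store.blobs[path]
        # 4. Split the bytes into chunks, one per simulated HTTP response.
        return [data[i:i + self.chunk_size]
                for i in range(0, len(data), self.chunk_size)]


sql = SqlDatabase(permissions={("anna", "gear.ipt")},
                  locations={"gear.ipt": "store/0001"})
store = FileStore(blobs={"store/0001": b"0123456789"})
server = VaultServer(sql, store)
chunks = server.download("anna", "gear.ipt")
```

The client only ever calls `server.download`; it has no reference to `sql` or `store`, which mirrors the point that clients never reach SQL or the file store directly.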
So what does replication consist of? We want to replicate all of this, or part of it, depending on the level of replication you want to use. One is called file store replication. The other one is called full replication, because it means that we replicate everything, database and file store.
So, site and multi-site. Here we have the IIS service, or ADMS, and maybe SQL on the same machine. That's not compulsory, but you can have SQL on the same machine, and then the clients connect to this ADMS Vault server. This is called a site. Let's say you have an office in New York and another office very near, a few kilometers away. You don't want to split the SQL. You want to use the same database. So in this case, the second site connects directly to the SQL in the first one. Both sites use the same database but different ADMS servers. The ADMS servers are installed locally, but they are connected to the same SQL. That SQL can be at one site or the other, or somewhere else at a third site, for example.
So the SQL instance is shared across multiple sites. In this case, we have a multi-site. Let's say two offices in the same country or the same region. So what does the file store mean in a multi-site environment? A multi-site environment means the sites are near enough. The latency is low enough. The connection is good. So they can use the same database. Database calls are just a few lines of data and are very quick to transmit. The files are not. They can be very big. So I want to replicate only the files. The files are the hard work, the long work. So I replicate the files at each site, but the database stays the same. And you can replicate across the other sites just the files you select.
You have a site in New York and another site in Los Angeles, with the same database. And you say, OK, in New York, the site server 2008 test has to grab the files only from this folder, for example. I will show you how in a moment. File store replication is a pull replication. That means that the site asks for the files. It's not that there is a main site that gives the files to every site. No. Each site, depending on the folders selected for replication, asks the other sites for the files contained in those folders. In this case, this server 2008 test site will ask all the others for this folder. If there is something, it will grab it.
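The pull model can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Vault's implementation: each site is modeled as a plain dictionary of path to content, and the site names, folder names, and the `pull_replicate` function are all hypothetical.

```python
def pull_replicate(site, sites, selected_folders):
    """Let `site` pull, from every other site, the files in its selected folders.

    sites: {site name: {file path: content}}
    Returns the sorted list of paths copied to `site`.
    """
    local = sites[site]
    pulled = []
    for other_name, other_files in sites.items():
        if other_name == site:
            continue  # a site never pulls from itself
        for path, content in other_files.items():
            folder = path.rsplit("/", 1)[0]
            wanted = any(folder == f or folder.startswith(f + "/")
                         for f in selected_folders)
            if wanted and path not in local:
                local[path] = content
                pulled.append(path)
    return sorted(pulled)


sites = {
    "NewYork": {"$/Designs/a.ipt": b"a", "$/Docs/readme.txt": b"r"},
    "LosAngeles": {},
}
copied = pull_replicate("LosAngeles", sites, ["$/Designs"])
```

The requesting site drives the transfer and only files under the selected folders are copied, so `$/Docs/readme.txt` stays in New York only.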
You may say, OK, what do you gain here? Because you get the files, yes, but you still have to wait for the files to be copied. The point is that you copy the files during the night. You can select the time at which you want to copy the files, depending on the location of the sites, for example. Normally, if you work together with somebody else, you don't need the file at your site at exactly the same moment.
You can simply work on the same project, and maybe tomorrow you open the Inventor assembly and the files have already been copied to your local site during the night. So it's very quick to open. This is the idea behind it. There are also replicated and unreplicated files. OK, I will explain this later. Yeah, better later, because I want to introduce the concept here first. OK.
When do I choose a multi-site environment? Let's suppose you are on one continent. You are in Europe. You have a Vault server in Germany. The clients that are near enough to have a good connection just connect directly, from France, from northern Germany, from Italy. But the UK is far. If these clients connect directly to Germany, the performance is not good enough. You have to wait seconds or minutes to get a response from Vault.
So what do we do? We implement another server. It's called AVFS. It is another ADMS, but specific to file store replication. So assume that it is the same. It's another Vault server here. The database will be the same. We connect to the database at the main site. But the files will be grabbed locally. So this client in Scotland will connect to this Vault server here, locally. So it's very quick. This local server connects to SQL remotely, but SQL is very little data. No problem. The files will be local. So the client grabs the file locally from the Vault server, which copied the file from the main site during the night.
For example, in Spain today they check in a file. The file is here in Spain. During the night, the file is transmitted here, then here, then here, everywhere. In the morning, you have the file available. When do you want to do this? For example, when the latency is between 15 to 100 milliseconds, it's OK. More than 200 milliseconds, as we will see later, is still too much. You cannot use this technology. But in between, yes. With less than 50 milliseconds, it's OK to connect directly to the Vault server. In general there is no need to replicate anything, but it's up to you.
Another example: you want to give external providers a server that contains only the files they need, and you don't want them to connect directly to your SQL. This is another reason why you might want file store replication. With the provider, you only share what is needed. So you replicate the files that they need, which is a small part of your file store, not the entire file store.
And the SQL is at home, I mean, at the main site. So they cannot access your database directly. That is another reason. And also when you have low bandwidth and large files. So it's not only the latency but also the bandwidth. If you have very big files, images of hundreds of megabytes, but very little bandwidth, you would prefer to exchange them during the night, without loading your network. Let me see.
So file store replication can be scheduled, it can be prioritized, and it can be on demand. The replication schedule. What does it mean? When do I want to replicate the files? Every day. Every day of the week except Saturday and Sunday, for example, because I don't need it then. At what time? One o'clock at night. With whom do you want to start replication? Let's go to replication priority. Not this. Next. Sorry.
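To make the schedule concrete, here is a small Python sketch that computes the next run for the example just given: 1:00 at night, every day except Saturday and Sunday. The function name and parameters are hypothetical; in Vault the schedule is set in the ADMS console, not in code.

```python
from datetime import datetime, time, timedelta


def next_replication(now, run_at=time(1, 0), skip_weekdays=(5, 6)):
    """Return the next scheduled run strictly after `now`.

    skip_weekdays uses datetime.weekday() numbering: 5 = Saturday, 6 = Sunday.
    """
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        candidate += timedelta(days=1)   # today's slot has already passed
    while candidate.weekday() in skip_weekdays:
        candidate += timedelta(days=1)   # jump over excluded days
    return candidate


friday_morning = datetime(2024, 1, 5, 10, 0)   # 5 January 2024 is a Friday
next_run = next_replication(friday_morning)
```

Called on a Friday morning, the function skips the weekend and returns Monday at 01:00.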
So when you want to get a file, you don't go, for example, from Spain to Sweden. You want to search for the file first in Germany. It's the nearest. The prioritization is simply for that. So in Spain, you prioritize Germany first, the UK second, Sweden third. So if the file is here, [INAUDIBLE]. If it's not here, we search the next one. OK, it's not here. So we'll go to Sweden, only if it wasn't found before. This is the reason for the priority.
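The priority list amounts to an ordered search. A minimal Python sketch, assuming each site's contents are known as a set of file names; the function name and the data are made up for illustration:

```python
def find_source(file_name, priority, site_files):
    """Return the first site in `priority` that holds `file_name`, else None."""
    for site in priority:
        if file_name in site_files.get(site, set()):
            return site
    return None


# Priority as configured for the Spanish site in the example.
spain_priority = ["Germany", "UK", "Sweden"]
site_files = {
    "Germany": {"frame.iam"},
    "UK": {"frame.iam", "bolt.ipt"},
    "Sweden": {"bolt.ipt", "plate.ipt"},
}
```

With this data, `frame.iam` comes from Germany, `bolt.ipt` from the UK, and `plate.ipt` from Sweden only because neither nearer site has it.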
And the last one is on demand. You can say, OK, file replication happens at night, but I need the file immediately. What do I do? If you need it immediately, the file will be copied immediately. But of course, in this case, you are subject to the latency and speed of the network. When you ask for the file, it will come from the nearest place, but it will need time to be copied locally. So when you open a file that is not replicated, it will tell you, OK, this file is not on your local server. Do you want to copy it? Yes. And it will be copied to your local server. So any questions on this? Is the concept of file store replication clear, yeah? OK.
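On-demand replication can be pictured as a lookup with a confirmation step. The Python sketch below is an assumption-laden model, not Vault code: file stores are dictionaries, `other_stores` is assumed to be ordered nearest first, and `confirm` stands in for the dialog that asks whether to copy the file now.

```python
def open_file(name, local_store, other_stores, confirm):
    """Return the file content, copying it to the local store if needed.

    other_stores: remote file stores, nearest first.
    confirm: callable taking the file name, returning True to copy now.
    """
    if name in local_store:
        return local_store[name]          # already replicated: fast path
    if not confirm(name):
        return None                       # user declined the copy
    for store in other_stores:
        if name in store:
            local_store[name] = store[name]   # keep a copy on the local server
            return local_store[name]
    raise FileNotFoundError(name)


local = {}
germany = {"bracket.ipt": b"bracket"}
data = open_file("bracket.ipt", local, [germany], confirm=lambda name: True)
```

After the call, the file is in `local`, so the next open at this site no longer depends on the remote connection.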
So now we have more distance. We have multiple sites, a multi-site environment, as we saw before. We can call it a workgroup. A workgroup is a series of sites connected to the same database. So if I have a workgroup in Europe like before and another one in the US, we connect them with database replication. Not directly, because it's too far. We cannot connect from the US to Europe directly to the SQL database. It's too slow. So we replicate the database across the two workgroups. In this case, we speak about connected workgroups.
Connected workgroups use SQL replication technology from Microsoft. The difference from file store replication is that SQL replication is immediate, practically real time: every minute. Every minute is an example. Every minute, the SQL asks the other SQL if there is something to replicate. So it's continuously replicating the data.
So within one minute at most, you have the updated data on the other side, from the US to Europe. I check in a file in the US. Within one minute, you see the file checked in in Europe. You don't have the file. You see in your Vault interface that the file is checked in, but the file itself is still in the US. It will be replicated during the night. Is this concept clear?
The SQL replication used is called merge replication, and it works like this. Every minute, it asks to copy the data that has been created in the last minute anywhere, in any workgroup. It can be this simple, two workgroups. The maximum is, I think, 12, or there's a limit in Microsoft, something like 15 or 12.
And we have to define a publisher and subscribers. The publisher is the main one. In this case, the publisher is this workgroup here. And the subscriber is the secondary workgroup, you can say. The data is the same, but we need to designate a publisher, because the SQL data always goes to the publisher and is then distributed to the others. File store replication, meanwhile, goes directly from one server to the other without passing through any main server. It's direct.
So this is an example. This is a multi-site environment with one SQL server. These are the sites that replicate the file store between them. Now I want to connect to the US from Europe. So this is the connection that will bring the database data immediately to the US, to this publisher, for example.
You see that all the connections go to the publisher. So if I want to pass something from Europe to China, because the publisher is in the US, the data will go first to the US. The publisher will load it, recognize it, and then distribute it to all the others. This is the way it works. It is immediate. It's very little data, a few seconds depending on your latency, of course.
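The hub-and-spoke path of the metadata can be sketched as follows. This is a toy model in Python, not SQL Server merge replication itself: each workgroup keeps a dictionary of rows plus a list of pending local changes, and one `sync_cycle` call stands in for one replication interval. All names are hypothetical, and conflict handling is left out because ownership is what prevents conflicting edits.

```python
class Workgroup:
    def __init__(self, name):
        self.name = name
        self.rows = {}      # replicated metadata: key -> value
        self.pending = {}   # local changes not yet sent to the publisher

    def write(self, key, value):
        self.rows[key] = value
        self.pending[key] = value


def sync_cycle(publisher, subscribers):
    """One replication interval: upload to the publisher, then redistribute."""
    # 1. Every subscriber uploads its pending changes to the publisher.
    for sub in subscribers:
        publisher.rows.update(sub.pending)
        sub.pending.clear()
    publisher.pending.clear()   # the publisher's own changes are already in rows
    # 2. The publisher distributes its current state to every subscriber.
    for sub in subscribers:
        sub.rows.update(publisher.rows)


us = Workgroup("US")            # publisher
europe = Workgroup("Europe")    # subscriber
china = Workgroup("China")      # subscriber

europe.write("gear.ipt", "checked in, version 3")
sync_cycle(us, [europe, china])
```

A change written in Europe reaches China only through the US publisher, which is why the placement of the publisher matters for how far the data has to travel.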
So what happens if somebody modifies a file in Europe, and at the same time somebody in the US modifies the same file? You want to check out in Europe. At the same time, in the US they want to check out. It is not possible, because that could create a conflict, of course. There is this ownership technology that blocks it. Once you get ownership of a file or a folder, you have ownership. Nobody else can modify it. So this avoids the conflict of somebody else modifying it.
This is a screenshot, for example. It's automatic. You don't need to do anything. But it will tell you, for example, if you want to check out the file. There is an icon with a lock. We will see it later. It will tell you, no, you cannot, because you don't have ownership. Somebody else has the ownership. You can see who, like in this case.
The other server has the ownership. You can also set a lease until a date. So you want to be the owner of that for one week, for example, because you are working often on that project at the moment. You can change it at any moment, no problem. Normally you don't need to do anything. It's automatic. If nobody else has ownership of an element, of a file, you automatically get it when you want to modify it.
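A rough Python model of ownership with an optional lease. The rules encoded here are an assumption drawn from the description above: ownership is granted automatically when nobody holds it, it passes to another workgroup on request, and a request is refused while another workgroup's lease is still running. The class and its methods are hypothetical and do not reflect Vault's internals.

```python
from datetime import datetime, timedelta


class OwnershipTable:
    def __init__(self):
        self._entries = {}  # entity -> (owning workgroup, lease end or None)

    def owner_of(self, entity):
        entry = self._entries.get(entity)
        return entry[0] if entry else None

    def request(self, entity, workgroup, now, lease_days=None):
        """Try to take ownership; return True if `workgroup` now owns `entity`."""
        entry = self._entries.get(entity)
        if entry is not None:
            holder, lease_end = entry
            if holder != workgroup and lease_end is not None and now < lease_end:
                return False              # someone else's lease is still active
            if holder == workgroup and lease_days is None:
                return True               # already the owner, keep current lease
        lease_end = None
        if lease_days is not None:
            lease_end = now + timedelta(days=lease_days)
        self._entries[entity] = (workgroup, lease_end)
        return True


table = OwnershipTable()
t0 = datetime(2024, 1, 1)
granted_europe = table.request("Gear.iam", "Europe", t0, lease_days=7)
granted_us_early = table.request("Gear.iam", "US", t0 + timedelta(days=3))
granted_us_late = table.request("Gear.iam", "US", t0 + timedelta(days=8))
```

Europe takes a one-week lease, so the US request on day 3 is refused and the one on day 8 succeeds.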
So let me try to show you something. Let me-- where is it? Here. So I have to-- I start Vault. Where is my phone? OK, I have two replicated servers in a virtual environment here in the US, I think. I will connect to them. One is the publisher. This IP address, for example. At the base there is the Vault. OK. I connect here. Then I open another Vault Explorer and connect to another server, just to show you the difference in what happens.
OK, with this explorer, I connect to another server. 243, for example. This is a subscriber. OK. I have some file here. Just a little assembly. And also, yeah, I open the same folder. Because it's replicated, I have exactly the same data. OK. I can reduce these. I can move these. I can reduce this.
The first difference. You can see this icon here on the subscriber. It tells me that this file is not on this site. It's not replicated. It's still somewhere else. Meanwhile, on the other side, you see it's all clear, because all the files are there. All clear means all the files are here on this site, which is a different one. This is the 44 IP address. This is 243. It's a different server. We are connected to a different server.
OK, let's try to change something. For example, I check out this.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: It's called replicated. How it's called? File replicate at SQL. So the file check out. Here nothing happen. We have to wait until one second. But if I try to check out 62, for example. [INAUDIBLE]. Because it's [INAUDIBLE] just a moment. One minute. Must be there. Yes. So here is check out. Here is blocked. I was not able to check out as well. It was not possible. And now it tell me that it's blocked.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: Yes. If I try again to check out. To grab-- no, that would tell me something. Is there reserve to another site. OK. So now it is check it out. That is passed there. The file is still not there. Let's try now to open a file that is not here. This icon means it is not replicated. But I need it. I need it. So I want to grab it. I want to get it. Yes. So it tell me there is no file. I cannot find the file you want. Would you like to copy now? Yes.
OK, the icon is gone. Now the file is not only copied to my local work folder, it is also copied to my server. In fact, if I delete it from my local work folder. 37A. I can delete it. Refresh. The icon of the local file is gone. But this is still local. It means that it is on this site, on this server. I get it quickly, because it's in the file store I own. It's in my file store. OK. So any questions about this? There are two kind.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: Yes.
AUDIENCE: And will it ever [INAUDIBLE]?
FRANCESCO TONIONI: No.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: Yeah. You don't need to know if it's the publisher or not. It's just an internal concept behind the scenes. For the user it is transparent. It is the same.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: [INAUDIBLE]. If you have two sites, no way. If you have five sites. The moment the publisher is down and you request the server, it will go to search to the other servers if the file is somewhere else.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: We grab it.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: Not the file. The database.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: The file you can still grab. But for the database it is true. If the publisher is down and you want to do something, you stay local. If the publisher is down here, you can still work. But nobody will see what you are doing. If you have ownership, you can still work. And the others cannot do anything, because the ownership stays with you until the publisher is reconnected. If you don't have ownership here, you cannot take it.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: If you don't have ownership and you don't have access to the publisher because it's down, you cannot take ownership.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: Exactly. Pay attention. The publisher and the subscriber are connected through SQL. If the connection is down for, by default, 14 days, SQL assumes finito, finished, it's gone. And all the data is lost. Pay attention. You cannot connect anymore. So that data stays local only.
So if you implement full replication with SQL, pay attention to these settings. You can set a more relaxed time for the subscription to expire, maybe 28 days instead of 14. Set up a monitoring tool that tells you if some subscriber is down, and things like this. Because you don't realize it. You work on your subscriber. Everything works. It's OK. And maybe it's not connected to the publisher. You don't know.
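The kind of monitoring check suggested here can be sketched in Python. This is only the arithmetic, under the assumption that you can obtain each subscriber's last successful synchronization time from somewhere (for example from Replication Monitor); the function names, the `warn_days` threshold, and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def days_until_expiry(last_sync, now, retention_days=14):
    """Days left before a subscriber that last synced at `last_sync` expires."""
    deadline = last_sync + timedelta(days=retention_days)
    return (deadline - now) / timedelta(days=1)


def subscribers_at_risk(last_syncs, now, retention_days=14, warn_days=7):
    """Return {subscriber: days left} for subscribers close to expiry."""
    at_risk = {}
    for name, last_sync in last_syncs.items():
        remaining = days_until_expiry(last_sync, now, retention_days)
        if remaining <= warn_days:
            at_risk[name] = remaining
    return at_risk


now = datetime(2024, 3, 15, 12, 0)
last_syncs = {
    "Europe": datetime(2024, 3, 15, 11, 59),  # synced a minute ago
    "China": datetime(2024, 3, 5, 12, 0),     # silent for 10 days
}
report = subscribers_at_risk(last_syncs, now)
```

With the 14-day default, the subscriber that has been silent for 10 days has 4 days left and is reported. With a 28-day retention the same subscriber would still have 18 days and would not be flagged yet.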
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: No. The publisher is the publisher forever. If you want to change the publisher, you have to remove all the replication and reactivate it. That is not a big deal. In ADMS, you remove all replication. Pay attention that everything is replicated first. Once all the files are replicated, you remove the replication and reactivate it from another publisher, and you can change the publisher in this way. Let me show you.
OK, so we already spoke about this. It's important to choose the correct priority. The files in the file store are replicated directly between sites. So this site, the site in North Africa, asks the other sites for the files it is entitled to ask for, because there is the list of folders. It starts randomly, depending on the order of the sites in the list in ADMS, but of course you don't want to start searching for files in California or in Australia if you have nearer places. So in the priority, you set up these first, then the other one, et cetera. This is why the priority is important.
But there is another thing to pay attention to when you create your environment. Let's assume your company is in Greece, but 80% of the work is done between China, Thailand, and Australia. There is no sense in setting up the publisher in Greece only because your main office is in Greece when the work is done here. So it's a lot better, and you see it in these two examples, to set up the publisher here, because the data will go very fast between the places where they work most, and it is replicated to Greece as well, of course. It's a lot shorter than this.
Otherwise, if the Chinese need to share data with Thailand, it must go across Europe. That makes no sense. So pay attention. Otherwise it doesn't matter which is publisher and which is subscriber. You can create a backup from a subscriber. There is no obligation to use the publisher. You don't have more rights by being a publisher. And there are also workload considerations. I don't know, for example, China and Thailand are very near, but maybe the line is very, very slow. So maybe this is better. It depends on your environment.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: Exactly. Not the file store replication. You have to choose what you need. Normally, on average, customers around the world who work inside the same continent or the same state use only file store replication. No need for more, because it is a lot more complex to introduce SQL replication than file store replication alone. If you can avoid it, it is better. So normally, yeah: between sites on the same continent, file store replication. And across continents, full replication with SQL.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: If the work is done mainly in US and then in Germany they just need to grab some files sometimes to look at them, if they accept the performance, yes. That depends on if they're--
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: You have to see. If there are two users, they have to wait. The problem is that when they use the Vault interface and want to check out a file, they have to wait, because the SQL data needs to travel between the US and Germany directly. If it's a second, it's OK. But another important key point is that it is completely scalable. You can start without anything. No file store replication, no SQL replication. You can add file store replication or SQL replication when you want. There is no problem.
OK, so are there settings for-- let me see. OK. Some other question? Why I don't see? Wait. OK. Sorry.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: SQL just manage that-- not one single file. [INAUDIBLE] permission, property of the file, and that's all. Yeah, the version number. Not the version of the file. The file will be still in the file store. The properties. Yeah.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: No. The full replication is only for metadata. You will always have this data up to date: check out, check in, permissions, et cetera. But the files always go through file store replication. It follows the same rules. Whether or not you use SQL replication, the way the files replicate is still the same. The database is the other way. Let's say that we have file store replication in both cases, and database replication is an extra that you can add if you need it.
So let's see quickly the option in ADMS. So in this server, for example, you open the publisher. OK. Here my ADMS. You can see? Yes.
So, workgroups and file stores. The file stores are the sites that you have in this environment. Actually, we have one, two, three sites. Workgroups are the workgroups. And I have only two workgroups, one subscriber and one publisher. Why do I have three sites? Because one site is owned by one. I think it belongs to the publisher. I remember now. So two sites in one workgroup and one site in the other workgroup.
What can I do here from ADMS? You can create more workgroups. You can manage your replication. For example, I have this Vault. What do you want to do with this Vault? Now I have only one subscriber. It's already replicated. You have the list of your vaults here on the left, and you can move them to the right, depending on what you want to replicate. And they will be replicated there.
You can replicate the site now. That means that if you need to, you replicate all the files now, without waiting for the night. The replication priority we saw before. If you right-click on the Vault, you have the folders that you want to replicate. In this case, it's all of them. So this site, ECS, blah, blah, blah, will ask all the other sites for all the files contained in all the folders. Is this concept clear? It's the site that asks for the files. It does not receive the files automatically. It needs to ask for them. If I set it up like this, what does it mean? That I will always have all the files. As you can see there, with that setting, I replicate the files 00. So I have everything, because I selected all the folders in the list.
What else you can do here? The priority you saw already. No priority. It will follow the order. Or you can add on the right. And you are the priority for that site to be replicated for the files still. Let's see. Yeah, let's see quickly. What's happen on the SQL site? In the SQL, you have here subscription replication. This is yes, yes. Wait. Refresh. OK. The remaining amount is low. Yeah. OK. Local subscription. This is not the server. OK.
These are my publications. In SQL you can also manage the replication. But the important part here is what is called Replication Monitor. It tells you the state of your database replication across all your workgroups. So the Vault is OK. Let me see what's happening here. It's synchronizing. The delivery rate is this. The last synchronization is that. How long did it synchronize? Now everything is OK. If I select one, I can go to details, for example. I can see that this synchronization is waiting 60 seconds. That means it's OK. I have everything. I'm just waiting for the next minute to grab more data, if there is more data.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: The files once a night or when you want. Or never.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: Yeah, yeah, yeah. You can continue working. You don't realize that they are not connected. The database is continuously connected. But if it's not, you stay with your local database. You continue working. After two days, say, the connection is OK again. All the data you accumulated over those two days will be transferred to the publisher and redistributed to the others, and you will grab all the data that the others created during those days.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: With file store replication, you don't need that, because it's one database. There isn't this problem. If it goes down, you don't work.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: With workgroups, you have the file store and SQL in different places. With file store replication only, the database, the SQL, must be one. It's the same for everyone. Everybody connects there, so it's immediate. If it's down, it's down. Nobody can work.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: Yes. KnowledgeVaultMaster is automatic. When you create a replication workgroup, when you activate workgroup replication, KnowledgeVaultMaster is replicated automatically. You don't need to do anything. It's done for you behind the scenes as soon as you activate it, even if you don't have any vault replicated. KnowledgeVaultMaster must be replicated. Yes.
I think that's all. Let me see if we have some other questions. Yeah. Last notes, a bit more advanced. If you have problems, the issue here is the connection. When you really connect from the US to China, the latency is so high that sometimes the connection goes down. There are more problems. So maybe you can customize the replication profile. I will not explain how, et cetera. But you have to know that there is a way to customize the replication profile in SQL.
So you can use different settings to get a more reliable connection. And maybe you can increase the network performance by implementing something like an accelerator, like Riverbed, things like this. This is what we saw at our customers. Because if you have thousands of users in China, it's a lot of data that is continually replicated across the world. And yeah. Any other questions?
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: Exactly. No. So if the publisher goes down, the ownership stays with you, and you can use it. But during normal work, if somebody else asks for ownership, it will be given, of course. Yes?
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: Publisher. In theory, also on the subscriber; then the changes to the database will be replicated to the publisher. Yeah. Normally you back up the publisher. When the publisher does a backup, it checks. There is an option. It checks whether all files are replicated. So if the option is on, when you back up from ADMS, it first replicates all the files. And only when all the files are replicated does it start the backup, for example. OK. No questions from you. All is clear. [LAUGHS]
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: The icon. Yeah. Yeah, this one.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: Yes, because you can.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: No, because I'm the same user. I am the same user. So it only puts the lock. If you are a different user, as it is in reality, there will be the dash that says it is checked out by someone. So nothing else from me. So that's all. Thanks for your attention. Remember to fill in the survey, because the survey is important for you, not for us. We have to understand what you prefer when you [INAUDIBLE], so we can propose the classes for you.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: No. The computer name doesn't matter. But when you restore an ADMS backup, all the replication is gone. You don't restore replication. You restore a normal, standard Vault server, wherever you want, with a different name. But then, if you need the replication, you have to recreate it. In theory, the data is all there. All the file store and all the database.
AUDIENCE: [INAUDIBLE]
FRANCESCO TONIONI: No. But there is no way to back up the entire environment with a subscriber. No. OK. Last chance. OK, thank you.