Description
Key Learnings
- Learn how to analyze a vault for data issues
- Learn how to set up corrective actions to clean the vault using PowerShell
- Learn how to create some weekly tasks for reporting
- Learn how to utilize the job processor to help maintain a clean, healthy vault
Speakers
- Kimberley Hendrix: Based in Tulsa, Oklahoma, Kimberley Hendrix provides custom solutions for lean engineering using Autodesk, Inc., products and industry knowledge to streamline design and engineering departments. Hendrix has worked in the manufacturing industry for over 30 years, and she specializes in automated solutions for the heat exchanger industry. She has worked with Autodesk products since 1984. Hendrix is associated with D3 Technologies as the Manager of Data Management, focusing on data management, plant, automation, and mechanical issues.
- Lauren Drotar: Lauren got her start back in 2009 when she began attending her local technical high school's drafting program. Since then, she has gone on to pursue her mechanical engineering degree and worked in numerous sectors of the industry, including firearms, diesel engines, and fluidics. In 2020, Lauren transitioned from being an Autodesk customer to a member of the Autodesk partner channel when she joined D3 Technologies' Data Management team, focusing on enterprise integrations using Data Standard and coolOrange tools. She was lucky enough to attend AU in 2017 and spoke at the 2019 Autodesk Accelerate conference as well as AU 2019. When she isn't working, Lauren can be found reading, hiking, mountain biking, or spending time with friends.
KIMBERLEY HENDRIX: Hi, thanks for joining my class. This is Vault Administration, Advanced Administration. I'm Kimberley Hendrix with D3 Technologies out of Oklahoma. And I'm going to spend the next 30, 40 minutes or so talking about how to administrate your vault in an advanced way using some PowerShell and some techniques that I've learned over the years.
So with that, we'll get started. Our objectives today: we're going to learn how to analyze the data in our vault for these kinds of issues, with some useful reports and some data pulls. We'll learn how to set up some corrective actions to clean the vault using PowerShell and the Job Processor.
We'll learn how to create some weekly tasks-- I refer to them as Cron jobs-- that create reports and tasks to keep the vault and the Job Processor clean. And then we'll spend some time working on the Job Processor to help maintain a clean and healthy vault: a self-cleaning vault and a self-cleaning processor. So let's kick it off.
We're going to start off with analyzing the vault, because if you don't know what's wrong with your vault, it's a little bit hard to clean it. So we're going to start with some of that. There are three ways to get the data that you need about your vault.
The easiest are the out-of-the-box reports. We'll look at the duplicate files report and a way to correct those-- that's an out-of-the-box report, and we'll use it to do some corrective action.
There are some custom queries that we find helpful when we do maintenance checks or wellness checks or cleanups of a vault, and I'll talk through some of those that we do. And then there are some PowerShell scripts we create that help clean your vault, maintain it, improve it, and keep it an overall healthy environment.
So we're going to start off today with the out-of-the-box reports. I'm sure most of you have seen this. It's pretty handy to look at the duplicate files report that's right there in your vault. Duplicate files can create unknown errors in your vault. There's a setting you can turn on that says Only Allow Unique File Names.
The problem is that some people have had this vault for 10 years or whatever, and they started off without that box checked. And then they're like, I can't ever go back because I have so many duplicate files. I have part 1 everywhere.
I have this half-inch screw named the same all over my vault, and if I check that box now, I get a lot of errors. So I go run this report-- I do this by going to Tools, Administration, Vault Settings, which you see here, and then I check Find Duplicate Names-- and it creates a report like this.
It looks pretty benign until you hit that little Details button. And it comes up, and there are hundreds or thousands of duplicate files. Well, you can go through and clean them. Like this one part I've got right here-- there are three of them in there, and I could go clean them. Some people have hundreds of the same files, and it would take weeks and months to clean that up enough to successfully check that little box.
What I'll have you do when you run this: after you run it and get to the details, go to File and export that as a CSV file. We'll save that for a little later, and I'll show you a PowerShell script that will help rename those files and clean that up so that we can check that box.
So going forward, everything is unique. And then I'll show you how to make that a self-cleaning issue as we go. We'll get to that in a minute. Just save that CSV file and we'll come back to it.
The other thing we've got is some custom queries. And I'm going to show you some of those live, as well as the ones I've got here. What I look for when I'm doing a maintenance appointment or a health check on a vault with a customer is a few things.
One is files that are checked out for x amount of time. If you have files that have been checked out to user George for two years, that's a problem. Obviously, those aren't in use, and that needs to be cleaned up. I typically look for files that have been checked out for more than 30 days when I do a maintenance check, and I give my customer that report so it can be cleaned up.
I also look for change orders-- if you're using that feature in Vault Professional-- that haven't been modified for a period of time and aren't closed. And I'll show you how to do that search as well. And then the biggest thing that I find in vaults is missing visualization attachments.
So if you use your vault to its full extent-- you're using the thin client, you're using the preview, and you're doing checks and balances with it-- then those visualization attachments are important, and not having them creates errors: the thin client can't see the preview. So files without visualization attachments are also an important search that we'll do. And I'll show you how to clean that up and set it up to run every week so it stays clean.
And then orphaned files-- part files that are without a parent. It's rare that a part file should not have a parent; there are some occasions. But looking for those orphaned files will also help clean up your vault.
Here's a sample I have on the screen, and I'll do a couple of live ones here in just a second as well. You do a Find-- I use the Advanced Find-- and I set my file extension to part.
And this is a relatively new property that's available. It's called Has Parent. And so I set that Has Parent property relationship to false. And it gives me a list of all of the files that are orphaned.
And I can run a report on that. Thinking back to what I did with the duplicate files, I can run a report as a table and export it as a CSV. And once I have that CSV file, I can make PowerShell execute on it. I'll show you some of that.
Let me pull up my vault and show you a couple of other things that we look at as well. So this is my standard demo vault-- it's one of the original ones from Autodesk. If I start at the Project Explorer, I can look for all files-- and let's just do files-- whose file extension contains IDW, we'll just do IDWs for right now, and whose visualization attachment is none.
If I run that Find, I'm going to find a handful of files. And my vault's pretty clean-- I have four IDW files that do not have visualization files. Now, that's all fine, and for those four files, I could just go queue them manually.
But if you're like most people out there, you're going to have hundreds of them, especially the first few times that you run it. You know, visualization files get detached on a lifecycle change or a property update or a check-in. Or they get checked in and the job is queued, but somebody changes the file before the Job Processor runs-- you get a non-tip version, and you don't get a visualization file.
But for whatever reason, those things need to be cleaned up. And I have a script-- I'm just going to show you real quickly. Let me pull up my PowerShell. So this script here, I call it D3 Files Create DWF.
And it is a pretty simple script. It does a search. And the reason I pulled it up right now, before talking about PowerShell in depth, is because I want to show you the search function in here and compare it to the manual search I just did inside the vault client.
So, first I open my vault. And I do use a utility from coolOrange called powerVault. It is a gateway into the API, which makes things much easier. Otherwise, opening the vault connection through the API in PowerShell would take about 25 lines of code.
With it, it takes four variables and one line. So I use that. I get all my property definitions for files, and I look for the property named Visualization Attachment. And then I set up search conditions.
And you'll see I have three search conditions set up here. It would be as if I put three conditions in the Find dialog. The first one says my Visualization Attachment property-- that's this Prop.Id-- is none. The second is my file extension, and for this first pass it contains DWG.
And my third is Checked Out By, because I don't want to try to execute a PowerShell script on a file that's checked out, like this Box 1 2 3 here-- it would just fail my script. So I look for those three and I run a search.
This is a standard API call to find files by search conditions, and it returns an array of files. And I've actually run it down to that point already-- down to this search right here, through line 117.
And if I do a file count, I get five files. So if I change this-- let's just do that real quick-- if I change the client search to DWG and hit Find Now, I get seven files, one of which is checked out already. So I wouldn't get that one in the search using my PowerShell.
So then my PowerShell goes through. And it says, OK, if I got five files, I do a second search for IDWs. We won't go through that just now.
But then I say, OK, I'm going to set this to run every week. And I don't want to do more than 1,000 a night or 1,000 a weekend, because I want my Job Processor to be able to run its regular scheduled jobs throughout the day. I'm just making use of my Job Processor during off hours.
So that's the reason I have this limit of 1,000. I only have seven, so it would run really quick. And I say, for each file in this list of files that I have, I'm going to get the file ID and queue a property sync. If you queue a property sync, it runs the property sync job and then queues a DWF.
And then I add 1 to my counter, write out that I've done that one, and check it against the limit. So if I ran this right now, it would queue those six or seven jobs for me right away, and they would run overnight.
And we'll get into more detail on this PowerShell in a little bit, but I wanted to show you how I can do the same search that I do inside the vault using PowerShell and then act on it. So I don't always have to run a report and use a CSV file. With duplicate files, I will have to use the CSV file, because building that duplicate-file search in PowerShell is a little more difficult-- it was quicker to do it with the CSV file.
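To make that pattern concrete, here is a minimal sketch of the search-and-queue approach just described, assuming the coolOrange powerVault module. The connection details, the search operator code, and the job parameter name are assumptions to verify against your Vault and powerVault versions, and the second and third conditions are stubbed as comments:

    Import-Module powerVault
    # powerVault turns ~25 lines of API login into one line (credentials are placeholders)
    Open-VaultConnection -Server "localhost" -Vault "Vault" -User "Administrator" -Password ""

    # Find the property definition whose display name is "Visualization Attachment"
    $propDefs   = $vault.PropertyService.GetPropertyDefinitionsByEntityClassId("FILE")
    $attachProp = $propDefs | Where-Object { $_.DispName -eq "Visualization Attachment" }

    # First search condition: Visualization Attachment is None
    $cond1 = New-Object Autodesk.Connectivity.WebServices.SrchCond
    $cond1.PropDefId = $attachProp.Id
    $cond1.PropTyp   = "SingleProperty"
    $cond1.SrchOper  = 3          # "is exactly" -- verify the operator code for your release
    $cond1.SrchTxt   = "None"
    $cond1.SrchRule  = "Must"
    # ...$cond2 ("File Extension" contains "dwg") and $cond3 ("Checked Out By" is empty)
    # would be built the same way...
    $conditions = @($cond1)       # plus $cond2, $cond3

    # Standard API call: find files by search conditions, latest versions only.
    # This returns the first page of results; loop on $bookmark to page through.
    $bookmark = $null; $status = $null
    $files = $vault.DocumentService.FindFilesBySearchConditions(
        $conditions, $null, $null, $true, $true, [ref]$bookmark, [ref]$status)

    # Queue a property sync per file (which re-queues the DWF), capped so the
    # Job Processor keeps its daytime capacity
    $limit = 1000; $count = 0
    foreach ($file in $files) {
        if ($count -ge $limit) { break }
        Add-VaultJob -Name "autodesk.vault.syncproperties" `
            -Parameters @{ "FileMasterIds" = $file.MasterId } `
            -Description "Sync properties / rebuild DWF for $($file.Name)" -Priority 100
        $count++
        Write-Host "$count / $limit : $($file.Name)"
    }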
OK. If we were live, I'd ask for questions, but we'll save that for our normal time. So let's move on. Now we have all that data gathered up. We have PowerShell scripts that run searches, and we have CSV files that we saved from searching. What do we do with that data?
Well, that's when we can make educated decisions on that data and run things like renaming files and generating DWF files and moving files and adding lifecycles, or whatever it is that we need to do to make your vault work at its optimal level. So let's look at some options.
Duplicate file names are the number one cleanup request I get from customers. They cause so many problems with copy designs-- I could go on and on. That's the number one I get.
And so if I have that CSV file that we created several slides back, then I can execute on it. I'm going to show you how to do that in a second.
The other thing that I get is: we've had Vault Basic for three or four years. It's a great product, but now we're ready for the new features in Workgroup or Professional. That's fantastic-- I can upgrade from Vault Basic to Workgroup or Pro.
Now I have all these new features, but my data is not in a position to take advantage of them. So how can I go through and get all my files into the right category, with the right lifecycle and the right lifecycle state, in less than the two years it would take me to hire an intern to do it, right? It takes weeks and weeks and weeks to do that by hand.
The other question I get a lot is: I have a stack of data. Maybe I acquired another company, or a vendor gave me 5,000 files and an Excel sheet with all the information around them. I loaded all those files into my vault, and I have all this information that I would like to have in there to make it searchable, but I don't want to type it, right? I'll show you how to take that data and update the vault using a CSV file.
So let's look at the first one. This one is my duplicate file rename, and I run this. It's not recommended that you just run this on the fly-- we typically do a lot of testing and run it in a test environment before we do it for real.
This code snippet here on the screen will log into the vault-- that's what we've got up here: the username, the password, open the vault-- and then it imports the CSV file. That's the CSV file that we created earlier in the class from our duplicate files report. I took everything out of there but the list of file names, because that's all I need.
And then-- oops, go back. There we go. For each row in my CSV file, I execute on it. I go find the files that have that name, and then I take those files and sort them by date.
This is how I've done it; you can do it however the customer or your environment needs it. But I sort them by date, I take the newest one, the most recent, and I skip it. I leave it with whatever that original file name is.
And then, starting with the next oldest one, I rename it to the same file name with _DUP1 on the end. The next one would be the same file name with _DUP2. So I'm just renaming the files in there.
I'm not doing a search and replace. I could-- that would be a little more complicated; we'd have to open Inventor and do some stuff. For right now, I'm just renaming them with a duplicate suffix. That gives me another search opportunity-- I can have an intern go look for all files that end in _DUP and start doing renames-- because they may not be actual duplicate files; they just have the same file name.
So we could rename those to something logical, and that cleans it up. Eventually, you get rid of all of the _DUP file names, and you have this very clean vault with all unique file names. And then you don't get those errors anymore.
You can check that little box in your settings, and it's clean from that point on. If somebody tries to check in a file and that file name already exists, it'll flag them with that error and have them rename it.
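As a sketch of how that rename pass could look with powerVault-- assuming the exported CSV has a FileName column, and with Rename-DuplicateFile as a hypothetical helper standing in for the checkout/rename/check-in sequence the real script performs through the API:

    Import-Module powerVault
    Open-VaultConnection -Server "localhost" -Vault "Vault" -User "Administrator" -Password ""

    $rows = Import-Csv "C:\Temp\DuplicateFiles.csv"
    foreach ($row in $rows) {
        # All files sharing this name, sorted newest first (the 'Date Modified'
        # property display name can vary by vault configuration)
        $dupes = @(Get-VaultFiles -Properties @{ Name = $row.FileName } |
                   Sort-Object { [datetime]$_.'Date Modified' } -Descending)
        if ($dupes.Count -lt 2) { continue }

        $counter = 1
        foreach ($file in ($dupes | Select-Object -Skip 1)) {  # keep the newest as-is
            $base    = [System.IO.Path]::GetFileNameWithoutExtension($file.Name)
            $ext     = [System.IO.Path]::GetExtension($file.Name)
            $newName = "{0}_DUP{1}{2}" -f $base, $counter, $ext
            Rename-DuplicateFile -File $file -NewName $newName  # hypothetical helper
            $counter++
        }
    }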
Lifecycle updates, that's where you've made the move from Vault Basic. Or maybe you've been in Vault Professional for a while, and you have one rogue department who's like, I don't need workgroups or I don't need workflows. But now it's time to get everybody on the same page.
So I can create that search-- think about the search that I showed you in PowerShell-- that looks for whatever it is I'm looking for: all the files in this folder, or maybe all the files older than five years. And you're going to move them to a category that's called Obsolete, change their lifecycle to the obsolete workflow definition, and set them to an obsolete, read-only state.
We can do that with PowerShell. So we create that same search that we talked about, and then we iterate through each file and perform an Update-VaultFile. Update-VaultFile is a cmdlet that comes with powerVault, and I'll show it to you.
So in this instance, I run my search-- the search is above this on the slide. And all of the PowerShell scripts that I'm showing you in today's class will be available in the class documentation on AU, so you'll have access to these files.
So I'm going to iterate through each file and execute an update. I get the file, so I have the file ID. Then I update the lifecycle using Update-VaultFile with the full path, changing my lifecycle definition to Engineering and my status to Released.
I'm going to update my counter, and then I iterate to the next file. If you think about doing all those steps manually-- select a set of files, right-click on them, change the scheme, right-click on them again, change the status, or move them to a different folder or update a property-- it takes a really long time.
This script will run through 1,000 files in about 20 to 30 minutes max, maybe even quicker depending on your systems. It runs client-side, by the way-- it uses the Vault client API-- so you don't even have to be on the server to do it.
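A minimal sketch of that loop, assuming $files is a collection returned by powerVault's Get-VaultFiles (so each entry carries a 'Full Path'), and using the definition and state names from this example-- verify the parameter names against your powerVault version:

    $limit = 1000; $counter = 0
    foreach ($file in $files) {
        if ($counter -ge $limit) { break }
        # $file.'Full Path' is the vault path, e.g. $/Designs/Pad.ipt
        Update-VaultFile -File $file.'Full Path' `
            -LifecycleDefinition "Engineering" -Status "Released"
        $counter++
        Write-Host "$counter / $limit : $($file.Name)"
    }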
The next one that we do is a data dump from another source. This is real common when you're coming from an ERP system, or maybe you're changing ERP systems, or a vendor has given you a list of stuff. It's pretty common then to get this list of file names and associated properties.
Or the last one I did: a customer of mine acquired another division, and we brought all their files into the vault. We did a dump out of their old ERP system with all the properties and lifecycle states-- whether they were released or obsolete or whatever it was. And I could do that update all at once using the CSV file.
So instead of having to do a search for files like I did in the previous ones, I imported a CSV file. In this instance, I called it properties.csv. That CSV file has a file name and five properties-- division, division manufacturing, business unit, business unit manufacturing, and region manufacturing. Those are the five properties that I need to update.
And so this CSV file is just the name and those properties across there. For each entry in the file, I go find the file-- I do a Get-VaultFile by the file name. That returns a file object, which tells me all things about the file: the path, the ID, all the properties. It gives me all of that.
And I'm creating a hash table for my property array. So I'm saying my Division-- which is the name of my property in my vault-- equals entry dot division, which for each line in here would be that column from that row.
And I do that for each of the five and build the hash table. You could type it out individually, but I use the hash table. Then I say: update the vault file at that full path with the properties in this hash table. And just that quick, it updates all five properties for each file in that CSV file. Again, it can do hundreds or thousands of them at a time.
So that's how we update a whole bunch of properties using a CSV file or an Excel file. I typically use CSV rather than Excel because it's quicker-- with Excel, I have the whole object model and heavy stuff behind it.
I can do the same thing with Excel-- I can read from Excel. But typically, I go into my Excel file and export it to CSV, because it's just text, and it runs much faster without the overhead of actually having to run Excel.
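Here's a compact sketch of that CSV-driven update, assuming a properties.csv whose column headers match the code below and whose hash-table keys match the vault's property display names-- all of these names are illustrative:

    Import-Module powerVault
    Open-VaultConnection -Server "localhost" -Vault "Vault" -User "Administrator" -Password ""

    $entries = Import-Csv "C:\Temp\properties.csv"
    foreach ($entry in $entries) {
        $file = Get-VaultFile -Properties @{ Name = $entry.FileName }
        if (-not $file) { Write-Warning "Not found: $($entry.FileName)"; continue }

        # Hash table keys are the vault property display names
        $props = @{
            'Division'                    = $entry.Division
            'Division Manufacturing'      = $entry.DivisionMfg
            'Business Unit'               = $entry.BusinessUnit
            'Business Unit Manufacturing' = $entry.BusinessUnitMfg
            'Region Manufacturing'        = $entry.RegionMfg
        }
        Update-VaultFile -File $file.'Full Path' -Properties $props
    }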
OK, let's take a breath, and let me show you a few things in the vault before we go into the server and Job Processor. I'm going to show you a few of the scripts that we've talked about and kind of run through them. So the one that we talked about-- let's go through the update lifecycle one.
So in this one, I'm looking for all files in the base category that have been modified since 1995. If you think about your search-- if you did this search inside the vault-- each of the search conditions that you see here would be one of your conditions in your standard Find dialog.
And so I'm looking for a category name that equals-- in this instance, I have Part Machine down here. The classification is Design Visualization, and it's in this Part Machine lifecycle. And I get that list of files. Your search conditions can be whatever's important to you.
And I'm going to do 5,000 or so a night. I get each file, and in this instance, I change all of them to the Engineering lifecycle at the status Released-- all of them. That's my criteria for this one.
I can have many different searches-- I could say for this group, change it to obsolete, and for this group, change it to released-- but it's a pretty simple script. The other one to talk about is creating DWFs. It's just so important to do that.
So in this one-- I'll talk about Cron jobs in a minute, which run it on a schedule-- I have this running on most of the vaults that I manage that have powerJobs, which I'll also talk about in a minute.
I have this running every weekend for most of the vaults that we manage. It goes through and finds-- it could find all files: IPTs, IAMs, IPNs, IDWs. In this example, I'm just finding the drawing files, because those were the most important to get through first.
Once I get all the IDWs and all the DWGs with visualization files-- because those were the most important-- then I go back and add in all my part files and assembly files and IPNs so that I get them all cleaned. Eventually, once my vault stays very clean every week, I take the file extension out completely: I want any file that doesn't have a visualization file, and I want to generate it. And then I put some catches in there for PDFs and document files so that I ignore them.
And I run so many a night, or so many every weekend. I keep this job running continuously on the vaults that we manage, because that gives you what I refer to as a self-cleaning vault-- it stays very clean, with the properties synced and all the DWFs current.
These last two we'll talk about in our next section. So the next section is server and Job Processor cleaning. The Job Processor is a tool that is invaluable in vault administration. You think about the Job Processor and you're like, yeah, it's over there in the corner, and it generates DWFs when I check in a file or when I do a lifecycle transition, or it generates PDFs-- that's new as of a few years ago.
The Job Processor can do so much more. Some of the stuff that we do in here-- and this is just some cleanup stuff that I do-- is, if we have a managed services agreement with a customer, we want to check things on a regular basis, like the size of your SQL database.
If you're running SQL Express-- most people start that way-- then there's a limit per database. I think it's 10 gig per database on recent versions.
But if you hit that max, it just shuts down your vault. And then you're down until you buy a seat of SQL Standard and get upgraded. The same goes for drive space where your file store is.
If you max out your file store drive space, the vault just shuts down until you can free some up. So we want to be proactive on that. Same thing with backups-- are they running successfully? Is your Job Processor clean?
I don't know if you've managed a Job Processor before. It's running a bunch of jobs, and it downloads these files to the temp directory. And yeah, it's supposed to clean up after itself. It's supposed to clean those files, but it doesn't always.
If a file errors, that builds and builds, and before you know it, your temp folder is three or four gigabytes, your Job Processor machine is running really slowly, and it can't keep up with your day-to-day processes.
Same thing with orphaned processes. If you're not rebooting your Job Processor every so often, you'll get orphaned Design Review or Inventor Server processes or others. So we do some self-cleaning stuff, and I'm going to talk about how we do that. And then we'll talk about a Cron job.
So, first off, to be able to do a Cron job or a timed event, we partner with the coolOrange folks. They do a couple of classes at AU as well-- I know Christian's doing a class this year, so you can check those out.
They have a product called powerJobs. It utilizes the out-of-the-box Vault Job Processor-- I always say it's the Job Processor on steroids. It adds a layer on top so that we can run any job that's written as a PowerShell script.
So you think, oh, that's cool-- then I can do fancy PDFs, or I can write out a STEP file or other file formats. But I can literally make it run a job that does anything PowerShell can do. And if you're an IT guy or an administrator, that means I can do server status stuff, too. I can do file system stuff as well.
So to do this on a timed basis, we use something called a Cron trigger. Back in my old HP-UX days, we used Cron jobs a lot. You can create these Cron expressions using cronmaker.com, and this one comes as a sample.
The base part is this line right here; it's time-based. And this means that at 8 o'clock-- that's the 0, 0, 8-- once a month on the third Sunday, I'm going to run a job.
And I'm going to run it on this vault, set it at priority 10, and throw this description in there. So if you look at your Job Processor when it gets queued, this is what you'd see: this job is triggered monthly on the third Sunday at 8:00 AM.
And this is called a settings file. A settings file needs to be named the same as the job, and then powerJobs will run that job based on these Cron settings. The one I'm doing here gets the size of the SQL database and the size and free space of the drive where the file store is located, and then emails those results to an administrator.
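For reference, a powerJobs settings file along these lines looks roughly like the following-- the schema follows coolOrange's sample settings files, so double-check it against your powerJobs version. The Quartz Cron expression "0 0 8 ? * SUN#3" means 8:00 AM on the third Sunday of every month:

    {
      "Trigger":
      {
        "TimeBased"   : "0 0 8 ? * SUN#3",
        "Vault"       : "Vault",
        "Priority"    : 10,
        "Description" : "Job is triggered monthly on the third Sunday at 8:00 AM"
      }
    }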
Alternatively, we can send that information to our database and manage it through our managed services. But this is internal, so we'll go with the internal one. Let me show you what that script looks like. It has absolutely nothing to do with the vault.
It runs on the Job Processor via the Cron trigger, but it doesn't even log into the vault. All it does is set an empty string for the email text, just so I can build a body. It gets today's date in a string format. This CR variable holds a carriage return so I can make the body pretty. And I'm going to run some of this manually for you.
For the database size, it uses Invoke-Sqlcmd-- that's a Microsoft module we can get for PowerShell-- and it queries the databases on my Autodesk Vault SQL instance.
And then I only care about my vault database, so I pipe it through Where-Object. And then I do some fun stuff-- I divide it by a gig so it's easier to read.
And then I set up my email string. The first line of my body says "server status for" today's date, and then two carriage returns. My second line is going to be that plus this, and it says the database size for this vault is the database size.
And you'll notice I did some ifs. If my database size is less than 1 after I divide by a gig, then instead of putting a 0 in there, I do an if statement and say it's less than a gig-- so it's nothing to worry about.
Then I do the same thing for my disk size. I use a Microsoft WMI object-- you can call it on remote computers, too-- and I filter it for my C drive, because that happens to be where my file store is on this system. I get just the size and free space.
Now, I do the same thing there-- I divide it by a gigabyte so it's easier to read-- and I write to my email text: the disk size and free space on this server is this, plus a couple of carriage returns. Then I set up some email parameters and do a Send-MailMessage.
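Pulling those pieces together, a sketch of the whole status job might look like this-- the SQL instance name, database name, and SMTP details are placeholders, and Invoke-Sqlcmd comes from Microsoft's SqlServer module:

    $CR = "`r`n"
    $emailText = "Server status for $(Get-Date -Format D)$CR$CR"

    # Database sizes from the Vault SQL instance (instance and DB names are placeholders)
    $query = "SELECT DB_NAME(database_id) AS DbName, SUM(size) * 8 / 1024 AS SizeMB FROM sys.master_files GROUP BY database_id"
    $dbs     = Invoke-Sqlcmd -ServerInstance ".\AUTODESKVAULT" -Query $query
    $vaultDb = $dbs | Where-Object { $_.DbName -eq "Vault" }
    $sizeGB  = [math]::Round($vaultDb.SizeMB / 1024, 2)
    if ($sizeGB -lt 1) { $emailText += "Database size for the vault is less than 1 gigabyte$CR" }
    else               { $emailText += "Database size for the vault is $sizeGB gigabytes$CR" }

    # Disk size and free space where the file store lives (C: on this machine)
    $disk    = Get-CimInstance Win32_LogicalDisk -Filter "DeviceID='C:'"
    $sizeStr = "{0:N0} GB total / {1:N0} GB free" -f ($disk.Size / 1GB), ($disk.FreeSpace / 1GB)
    $emailText += "Disk size on this server: $sizeStr$CR"

    # Email the results (SMTP settings are placeholders)
    Send-MailMessage -From "vault@example.com" -To "admin@example.com" `
        -Subject "Vault server status $(Get-Date -Format d)" -Body $emailText `
        -SmtpServer "smtp.example.com"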
And let's do some fun stuff and run that. So this is inside PowerShell, and I'll go full screen so we can see it. I'm going to run these a few lines at a time so that you can see what happens as I do these first lines. And then down in the blue, I'll show you what the email text is.
Right now it says server status for date Friday, 9/3/2021. If I run this next section right here, which gets my database size and appends it to the email text, it goes and gets my SQL database size down here at the bottom. And I can look and see what my email text says now.
And now it says server status for date Friday 9/3. Database size for vault BAC MONO is less than 1 gigabyte. Mine is pretty small.
And then in my next step, I get my disk size and run that. And if I look at my email now, the body will be-- you can see right here, it's this right here-- server status for the date, and the database size is less than a gigabyte.
And my disk size for the server is 931 gigabytes, with 450 gigabytes free. That gets emailed using the rest of this down here. I don't know if my SMTP server is set up right, so I'm not going to run it. If my SMTP server were correct, I would get an email.
And I can schedule that to run once a month, once a week, every day. I don't recommend doing emails every day, but I can run it as often as I want. I can also add other things to this.
The other thing that we do as a Cron job is the self-cleaning Job Processor: cleaning out the temp directory and killing orphaned processes. Again, this has nothing to do with the vault-- it's not checking anything into my vault. It's just keeping my environment very clean.
So all it does is say, hey, do I have any of these Design Review processes? If I'm running this job, Design Review and Express Review shouldn't be showing, right, because this is the only job running. So I'm going to stop any of those processes. If there aren't any, it just says there wasn't anything to kill. That's good.
And then I go to the temp folder, whatever that is on that machine, and I get all of the folders and delete them-- I remove each of those items recursively to keep it clean. I've been on some customers' job processors whose temp folders are 3, 4, 5, 6 gigabytes full, and that really slows down the speed of their system. So we want to be sure that we keep that as clean as possible.
And let me show you that script. Here we go. It's very short. It's like 30 lines. Stop the processes, clean them up, clean up the folders. And I can run this right here in PowerShell, or I can have my system run it.
So it says you don't have any processes running-- mine's clean. And it cleaned up my temp folder, so my temp folder is now all empty. I didn't have any processes to kill, so it said there aren't any processes like that; that's what the red is down here.
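A sketch of such a cleanup script, in the same spirit as the roughly 30-line one shown-- the process names are examples, so verify which executables your Job Processor actually spawns before force-stopping anything:

    # Stop orphaned viewer/translator processes left behind by failed jobs
    $orphans = Get-Process -Name "DesignReview", "InventorServer" -ErrorAction SilentlyContinue
    if ($orphans) {
        $orphans | Stop-Process -Force
        Write-Host "Stopped $(@($orphans).Count) orphaned process(es)"
    }
    else {
        Write-Host "There weren't any processes to kill"
    }

    # Recursively empty the Job Processor account's temp folder
    Get-ChildItem $env:TEMP -ErrorAction SilentlyContinue | ForEach-Object {
        Remove-Item $_.FullName -Recurse -Force -ErrorAction SilentlyContinue
    }
    Write-Host "Temp folder cleaned: $env:TEMP"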
With that, I'm going to show you a few references and some different places to get some fun stuff. PowerShell with the Vault extensions and Vault Data Standard is very powerful. You can use it to enhance your data, to clean your data, to maintain your vault, to monitor your server status, and to let people know what's going on with your vault and what you need to do.
There are a lot of good references. The Autodesk Knowledge Network is a valuable resource for examples and tutorials. The coolOrange product powerVault is invaluable, and it's a great entry into the Vault API.
As of now, they allow you to download that for free. powerJobs-- the product that makes your Job Processor run on steroids-- is a product that they sell, so that would be a purchased product. I put a link to their website. They also have a great blog with tips and tricks out there.
Marcus with Autodesk-- his GitHub is a wealth of information on Vault Data Standard and all things PowerShell. I also listed D3 Technologies, where I work, with our website and our blog.
And then I want to let you know that all of these PowerShell scripts that I've shown you will be available on the AU website, I believe once AU is over. So with that, that's my contact information, and it will be available. I appreciate your time, and I look forward to our question and answer session. Thank you.