Cut the Clutter
Analytics can be a messy business.
A novice might think that all you've got is input data and results: nice and tidy. Reality is typically far different. Not only are there many dead ends that didn't turn into useful results, there are also seemingly innumerable "helper" tables, fields and scripts that did contribute to the final result.
These artifacts are absolutely necessary to make the results repeatable. The problem is that while they may be necessary, their mere presence clutters the project, even if we adopt standards to minimize their impact. This can obscure both the critical components (fields, scripts and tables) that we need to interact with, as well as the results themselves.
This may not matter right now, but for reviewers, or those who take over the project in the future, it can make the project much more difficult to understand or repeat, even with appropriate documentation.
The rule we try to live by is to rely on documentation as little as possible, so clarity in the project is paramount.
The other problem is tidying up after oneself. Most of the tables created along the way are no longer needed once we have results. They are transient files that were required at the time and will be created again automatically when the analysis is re-run, but they serve no purpose the rest of the time. This leaves you with three options:
- Leave the clutter in the project
- Tediously go through and delete all the intermediate tables that are no longer required
- Just as tediously, create a script that cleans up for you
Arbutus offers unique technologies that help keep projects clear and tidy.
Arbutus includes the ability to "hide" any field, table, script or index.
You, of course, choose what is hidden. This lets you present the project in the clearest possible way, with only the important components on display, without losing any of the underlying helper tables, scripts, workspaces and fields that are a necessary part of producing the result. Everything hidden is still there and can be shown at any time, but when you open the project you see just the important results, with the "plumbing" hidden away, as it should be. This makes your analyses much easier to understand in the future.
Arbutus will automatically delete intermediate files that are no longer required.
Just give them a standard name (like temp…, or whatever you choose) and Arbutus will automatically clean up after you when the project is closed. Tables, indexes and any other intermediate files are swept away with no action on your part. Tidying up becomes completely automatic.
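Arbutus does this for you, so no code is required. Purely to illustrate the idea behind prefix-based cleanup, here is a minimal Python sketch. It is not how Arbutus implements the feature. The `project_session` function, the `temp` prefix and the file names are hypothetical, chosen only for the example. Any file whose name starts with the chosen prefix is deleted when the session closes, even if the analysis fails partway through.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def project_session(project_dir, temp_prefix="temp"):
    """Hypothetical illustration: yield the project folder, then delete every
    file whose name starts with temp_prefix when the session is closed."""
    root = Path(project_dir)
    try:
        yield root
    finally:
        # Runs on normal exit and on errors, so intermediates never linger.
        # list() snapshots the directory before we start deleting from it.
        for path in list(root.iterdir()):
            if path.is_file() and path.name.startswith(temp_prefix):
                path.unlink()


def demo():
    """Create one intermediate and one result file; return what survives."""
    with tempfile.TemporaryDirectory() as folder:
        with project_session(folder) as project:
            (project / "temp_join.dat").write_text("intermediate")  # hypothetical
            (project / "results.dat").write_text("final")           # hypothetical
        # The session is closed here, so the temp_ file has been swept away.
        return sorted(p.name for p in Path(folder).iterdir())


if __name__ == "__main__":
    print(demo())
```

The naming convention does all the work. Nothing has to be listed or tracked individually, which is why a single standard prefix is enough to keep the project clean.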
Sub-procedures allow you to bundle an entire set of scripts into a single package.
Your analysis might involve three scripts or 300; it doesn't matter. The entire set can easily be bundled into a single script file. This not only uncluttered the interface but, just as importantly, keeps all the related scripts together in a single package, which makes the analysis easy to maintain, document and run.