DMS Plan

The INPC is your partner for Data Management and Sharing (DMS). Every research project should have a DMS Plan that evolves over time. The DMS Plan should be 3 to 5 pages and follow the NIH guidelines.

When working with the INPC, there are things to consider and details to include in your DMS Plan.

Neuroimaging

We integrate with the Magnetic Resonance Research Facility here at Iowa, and we use the Argon cluster for processing. Neuroimaging data is stored in BIDS format on a dedicated LSS drive called the inc_database. Access to the inc_database is allowed only through Globus. Other types of sharing are possible but require more time to set up and execute. Sharing with NDA is complicated and should be discussed early in the process.

Your data

For data you collect in your research, we can advise on and help create workflows that produce quality data using REDCap, Excel with macros, CSV files with data dictionaries, and Bash and Python scripts.

Source Data

Tree

Source data is the original, unprocessed data collected from various sources. It encompasses all potential variables and represents the starting point in the data management process.

There are several types of source data: neuroimaging data from the scanner, single-file sources such as the REDCap export or the ASEBA download, and multiple-file sources such as the NIH Toolbox. These unprocessed sources are difficult to use directly.

Raw Data

Lumber

Raw data is the processed version of source data. It has been cleaned, organized, and made ready for analysis, serving as the foundation for data exploration and interpretation.

For neuroimaging raw data, we use NIfTI files organized in the BIDS standard. For phenotype data, we recommend CSV files with data dictionaries. For REDCap, this requires a detailed data dictionary and the creation of multiple reports. Other sources need a processing pipeline to create the data and the data dictionary.

Derivatives

Cabin

Derivatives are the outputs generated from raw data analysis. These include processed reports, visualizations, and models that provide actionable insights and drive decision-making processes.

Neuroimaging derivatives are organized by pipeline, such as our custom INC processing, Brains Auto Workup, and FreeSurfer.

Sharing with NDA is a complex task that requires several new derivatives that follow the NDA data dictionary.  This applies to both neuroimaging and phenotype data.

INPC

Backend

INPC will handle the collection of all neuroimaging data, ensuring it is accurately recorded and maintained.

Once collected, INPC will be responsible for processing this data, organizing it in a way that is usable for the Project Research Team (PRT). This includes turning sourcedata into rawdata and then processing the rawdata into derivatives.

INPC will coordinate with the PRT and any relevant data repositories to ensure that processed neuroimaging data is shared correctly and responsibly, in line with relevant data sharing policies and guidelines.
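The sourcedata, rawdata, and derivatives stages described above can be pictured as three sibling folders, with one derivatives subfolder per pipeline. The sketch below is only an illustration: the function name `make_layout`, the project folder, and the pipeline folder name are assumptions and do not describe the actual inc_database layout.

```python
from pathlib import Path
import tempfile

# The three stages named in this plan, in processing order.
STAGES = ("sourcedata", "rawdata", "derivatives")

def make_layout(project_root, pipelines=()):
    """Create the three stage folders, plus one derivatives subfolder per pipeline."""
    root = Path(project_root)
    created = []
    for stage in STAGES:
        path = root / stage
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    for name in pipelines:
        path = root / "derivatives" / name
        path.mkdir(exist_ok=True)
        created.append(path)
    return created

if __name__ == "__main__":
    # Work in a temporary folder so the example leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in make_layout(Path(tmp) / "my_project", pipelines=["freesurfer"]):
            print(folder.relative_to(tmp))
```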

Joint

Collaboration

Both the INPC and the PRT will need to maintain open lines of communication, regularly updating each other on progress, challenges, and changes related to data management. This collaboration is key to ensuring the data management and sharing plan is executed effectively.

Both teams must ensure all data handling adheres to required data protection and privacy standards. This includes the de-identification of data and any necessary permissions or consents for data sharing.

Research Team

Frontend

The PRT will be in charge of collecting additional data, using tools like REDCap, NIH Toolbox, and ASEBA.

The PRT, under the direction of the Principal Investigator, will take the lead in executing the DMS plan. This includes ensuring all data collection and management is conducted according to the plan and coordinating with the INPC on data processing and sharing tasks.

REDCap

Database and Reporting

REDCap is a versatile platform that can store a variety of information for your project. We recommend using at least two REDCap databases per project to enhance data security. The first database should contain personally identifiable information for tracking purposes. The second database should contain only de-identified data.
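As a minimal sketch of the two-database idea, the snippet below splits one record into a tracking part and a de-identified part that share only a study ID. The field names in `PII_FIELDS` and the `study_id` key are hypothetical, and dropping a fixed list of fields is not by itself a complete de-identification procedure.

```python
# Hypothetical identifying fields; a real project would list its own.
PII_FIELDS = {"name", "date_of_birth", "email"}

def split_record(record, study_id_field="study_id"):
    """Split one record into (tracking, deidentified) dictionaries.

    Both parts keep the study ID so authorized staff can link them later.
    """
    if study_id_field not in record:
        raise KeyError(f"record has no '{study_id_field}' field")
    tracking = {
        key: value
        for key, value in record.items()
        if key in PII_FIELDS or key == study_id_field
    }
    deidentified = {
        key: value for key, value in record.items() if key not in PII_FIELDS
    }
    return tracking, deidentified

record = {
    "study_id": "P001",
    "name": "Jane Doe",
    "date_of_birth": "1990-04-12",
    "email": "jane@example.org",
    "handedness": "right",
    "score": 42,
}
tracking, deidentified = split_record(record)
print(tracking)
print(deidentified)
```

The tracking part would be loaded into the first database and the de-identified part into the second.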

A crucial component of using REDCap is the data dictionary. It provides detailed information about the data, clarifying what each data entry represents.

When data analysis will occur outside of REDCap, you may find it beneficial to establish additional REDCap databases for the de-identified data. This is particularly useful when some data is entered manually into a form while other data can be imported directly.

While REDCap excels at data storage, it is not well suited to data analysis. Therefore, multiple reports should be generated from REDCap and saved as CSV files. For each report, create a corresponding data dictionary derived from the original REDCap data dictionary. This new dictionary documents the contents of its CSV file.
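One way to derive a report's data dictionary is to keep only the dictionary rows whose field names appear as columns in the report. The sketch below assumes the data dictionary is a CSV whose first column is headed "Variable / Field Name"; that header, the function name `derive_report_dictionary`, and the sample rows are assumptions to adapt to your own export.

```python
import csv
import io

# Assumed header of the field-name column in the data dictionary export.
FIELD_COLUMN = "Variable / Field Name"

def derive_report_dictionary(dictionary_csv, report_csv):
    """Return (matched_rows, missing_columns) for one report.

    matched_rows: dictionary rows for the report's columns, in report order.
    missing_columns: report columns with no entry in the data dictionary.
    """
    report_columns = next(csv.reader(io.StringIO(report_csv)))
    rows = csv.DictReader(io.StringIO(dictionary_csv))
    by_name = {row[FIELD_COLUMN]: row for row in rows}
    matched = [by_name[col] for col in report_columns if col in by_name]
    missing = [col for col in report_columns if col not in by_name]
    return matched, missing

# Small made-up inputs standing in for real exports.
DICTIONARY = (
    "Variable / Field Name,Form Name,Field Type,Field Label\n"
    "record_id,demographics,text,Record ID\n"
    "age,demographics,text,Age in years\n"
    "handedness,demographics,radio,Handedness\n"
)
REPORT = "record_id,age,redcap_event_name\n1,34,baseline_arm_1\n"

matched, missing = derive_report_dictionary(DICTIONARY, REPORT)
print([row[FIELD_COLUMN] for row in matched])
print(missing)
```

Columns that appear in the report but not in the data dictionary are returned separately so they can be documented by hand.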

Excel

Spreadsheet and Macros

Excel is a powerful tool for managing large volumes of data, especially when it comes to quickly scanning information and making simple updates. More complex modifications can be made manually, but doing so is slow and error-prone.

Excel's macro feature is particularly useful for these complex tasks. Macros are sequences of commands that can automate repetitive tasks. By programming these sequences, you can perform intricate changes to your data with a single command, which makes your work more efficient and minimizes the chance of errors.

Our team has expertise in using Excel macros. If you find yourself needing to automate complex tasks in Excel, we're here to assist. This collaboration can help ensure your data is managed effectively and your project runs smoothly.

Scripts

Bash and Python: Command-Line

For certain complex data tasks, traditional tools like REDCap or Excel may fall short. In such scenarios, using command-line scripts through languages like Bash and Python can be more effective. Bash is commonly used for tasks like managing files and automating repetitive processes. Python, with its powerful data processing capabilities, can handle more complex data manipulations.

Our team has experience with both Bash and Python. When necessary, we can assist in creating and running scripts to better manage and share your data. If you encounter any data tasks that require command-line operations, know that assistance is available.

BIDS

The Brain Imaging Data Structure (BIDS) is a standard that organizes and formats neuroimaging data in a manner that makes it easier to understand, share, and use. By adhering to BIDS, we ensure that our neuroimaging data is uniformly organized and labeled, making it more comprehensible for users. BIDS supports a wide range of neuroimaging data types, including MRI and fMRI, and its consistent naming convention facilitates the sharing and usage of this data.
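To illustrate the naming convention, the sketch below builds the relative path of a BIDS file from its subject, optional session, optional task, data type, and suffix. It covers only these few entities and is not a BIDS validator; the function name `bids_path` and the example labels are assumptions made for illustration.

```python
import re

# BIDS labels are alphanumeric; anything else is rejected here.
LABEL = re.compile(r"[A-Za-z0-9]+")

def bids_path(subject, datatype, suffix, session=None, task=None, extension=".nii.gz"):
    """Return the relative path of one BIDS file as a string."""
    entities = [("sub", subject)]
    if session is not None:
        entities.append(("ses", session))
    if task is not None:
        entities.append(("task", task))
    for key, label in entities:
        if not LABEL.fullmatch(label):
            raise ValueError(f"invalid label for {key}: {label!r}")
    # File name: key-value pairs joined by underscores, then the suffix.
    filename = "_".join(f"{key}-{label}" for key, label in entities)
    filename = f"{filename}_{suffix}{extension}"
    # Folders: only the subject and session levels, then the data type.
    folders = [f"{key}-{label}" for key, label in entities if key in ("sub", "ses")]
    return "/".join(folders + [datatype, filename])

print(bids_path("01", "anat", "T1w", session="01"))
# sub-01/ses-01/anat/sub-01_ses-01_T1w.nii.gz
print(bids_path("01", "func", "bold", session="01", task="rest"))
# sub-01/ses-01/func/sub-01_ses-01_task-rest_bold.nii.gz
```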

CSV

CSV files are a simple, widely used format for storing tabular data. Accompanied by a well-structured data dictionary, a CSV file can serve as a versatile and user-friendly data source. The data dictionary provides detailed information about the data elements contained in the CSV file, such as their names, meanings, and allowable values. This pairing enables users to easily understand, analyze, and manipulate the data, regardless of their specific field or level of technical expertise.
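A data dictionary can also be used to check a CSV file automatically. The sketch below assumes a simple dictionary layout with the columns `name`, `description`, and `allowed_values` (values separated by `|`, empty when any value is accepted). That layout and the sample data are hypothetical; your dictionary may be structured differently.

```python
import csv
import io

def load_dictionary(text):
    """Map each column name to its set of allowed values (None = unrestricted)."""
    allowed = {}
    for row in csv.DictReader(io.StringIO(text)):
        values = row["allowed_values"].strip()
        allowed[row["name"]] = set(values.split("|")) if values else None
    return allowed

def validate(data_text, dictionary):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(data_text))
    for column in reader.fieldnames or []:
        if column not in dictionary:
            problems.append(f"column '{column}' is not in the data dictionary")
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for column, value in row.items():
            allowed = dictionary.get(column)
            if allowed is not None and value not in allowed:
                problems.append(f"line {line_no}: '{value}' is not allowed for '{column}'")
    return problems

DICTIONARY = (
    "name,description,allowed_values\n"
    "participant_id,Study identifier,\n"
    "sex,Sex recorded at enrollment,M|F\n"
)
DATA = "participant_id,sex,age\nsub-01,M,34\nsub-02,X,29\n"

for problem in validate(DATA, load_dictionary(DICTIONARY)):
    print(problem)
```

Here the check reports the undocumented `age` column and the unexpected value `X`, which are the kinds of issues worth catching before a dataset is shared.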

NDA

The NDA Data Dictionary offers a standardized set of variables for structuring and detailing your data. This is pivotal for ensuring uniformity and transparency, allowing other researchers to more easily comprehend and interpret your data. By aligning your data with the NDA Data Dictionary, you make it interoperable with a broad array of other datasets and easier to integrate into future research efforts.
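Aligning with the NDA Data Dictionary usually starts with renaming local columns to NDA element names and checking that the required elements are filled in. The sketch below is an illustration under assumptions: the local column names are hypothetical, and the element names listed in `REQUIRED` should be confirmed against the NDA data structure you are submitting to.

```python
# Element names to confirm against the relevant NDA data structure.
REQUIRED = ("subjectkey", "src_subject_id", "interview_date", "interview_age", "sex")

# Hypothetical local column names mapped to NDA element names.
LOCAL_TO_NDA = {
    "guid": "subjectkey",
    "participant_id": "src_subject_id",
    "visit_date": "interview_date",
    "age_months": "interview_age",
    "sex": "sex",
}

def to_nda_row(row, mapping=LOCAL_TO_NDA):
    """Rename mapped columns and fail loudly if a required element is empty."""
    renamed = {mapping[key]: value for key, value in row.items() if key in mapping}
    missing = [name for name in REQUIRED if not renamed.get(name)]
    if missing:
        raise ValueError("missing required elements: " + ", ".join(missing))
    return renamed

local_row = {
    "guid": "EXAMPLE_GUID",  # placeholder, not a real identifier
    "participant_id": "sub-01",
    "visit_date": "01/15/2024",
    "age_months": "408",
    "sex": "F",
    "notes": "not mapped, so not submitted",
}
print(to_nda_row(local_row))
```

Columns without a mapping are dropped on purpose, so nothing is shared unless it has been matched to a documented element.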

NIH Example

This is a sample plan that uses both neuroimaging and phenotype data. In the document, Element 3: Standards shows the integration with the NDA data dictionary.

DMSP INPC
DMSP Exemplar

PLAN EVOLUTION

PHASE ONE

PHASE TWO

Biannual

Some repositories, like NDA, mandate data sharing every six months. This frequency provides an excellent opportunity to refresh your DMS Plan and get your dataset in order. Regular updates ensure your data remain current, relevant, and readily available for research needs.

Annual

Some repositories require annual updates. This yearly review is not just a routine task, but a valuable checkpoint to ensure your project is on the right track. It's an opportunity to realign your dataset with your project goals and make any necessary adjustments.

Conditional

Sometimes your dataset is needed for analysis or sharing before it is ready. If you have been using your DMS Plan diligently, making the additional updates will be relatively straightforward. If the DMS Plan has been neglected, the required work can become challenging. When properly used and maintained, your DMS Plan serves as a reliable guide through unexpected events.