Data Management and Sharing Plan

The INPC is your partner for Data Management and Sharing. Every research project should develop a Data Management and Sharing (DMS) Plan that evolves over time and aligns with NIH policy. A typical DMS Plan is 3–5 pages and outlines how data will be collected, processed, organized, and shared throughout the lifecycle of the project.

For neuroimaging studies, we integrate with the Magnetic Resonance Research Facility at the University of Iowa and use the Argon cluster for data processing. Neuroimaging data are organized in BIDS format and stored on a dedicated LSS resource (the inc_database). Access is managed through Globus to ensure secure and controlled data sharing. Additional sharing mechanisms are available but may require additional planning and implementation time. Data sharing with the NIMH Data Archive (NDA) is complex and should be considered early in the project.

For other research data, we support the development of workflows that promote data quality and consistency. This includes systems built with REDCap, Excel (including macros), CSV files with structured data dictionaries, and automated scripts using Bash and Python. These approaches help ensure that data are well-documented, reproducible, and ready for analysis and sharing.

When working with the INPC, there are key considerations and details to include in your DMS Plan. The sections below outline recommended practices for managing, processing, and sharing research data effectively.

Source Data

Source data is the original, unprocessed data collected from various sources. It encompasses all potential variables and represents the starting point in the data management process.

There are several types of source data: neuroimaging data from the scanner, single-file exports such as a REDCap export or an ASEBA download, and multi-file exports such as those from the NIH Toolbox. These unprocessed sources are difficult to use directly.

Raw Data

Raw data is the processed version of source data. It has been cleaned, organized, and made ready for analysis, serving as the foundation for data exploration and interpretation.

For neuroimaging raw data, we use NIfTI files organized according to the BIDS standard. For phenotype data, we recommend CSV files with data dictionaries. For REDCap, this requires a detailed data dictionary and the creation of multiple reports. Other sources need a processing pipeline that creates both the data and the data dictionary.

Derivatives

Derivatives are the outputs generated from raw data analysis. These include processed reports, visualizations, and models that provide actionable insights and drive decision-making processes.

Neuroimaging derivatives are organized by pipeline, such as our custom INC processing, Brains Auto Workup, and FreeSurfer.
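As a rough illustration of the source data, raw data, and derivatives stages, the sketch below creates one folder per stage and one derivatives folder per pipeline. The folder names (`sourcedata`, `rawdata`, `derivatives`) and the pipeline names are assumptions made for this example, not a description of the actual layout on the INPC storage.

```python
from pathlib import Path

# Hypothetical pipeline folder names; use whatever your project actually runs.
PIPELINES = ["inc", "baw", "freesurfer"]


def make_layout(root: Path, pipelines=PIPELINES) -> list[Path]:
    """Create sourcedata/, rawdata/, and one derivatives/ folder per pipeline."""
    folders = [root / "sourcedata", root / "rawdata"]
    folders += [root / "derivatives" / name for name in pipelines]
    for folder in folders:
        # exist_ok=True makes the function safe to run more than once.
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Calling `make_layout(Path("my_project"))` would create the five folders under `my_project`. Keeping each pipeline in its own derivatives folder makes it easier to rerun or replace one pipeline without touching the others.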

Sharing with NDA is a complex task that requires several new derivatives that follow the NDA data dictionary. This applies to both neuroimaging and phenotype data.

INPC

Responsibilities

Gather neuroimaging data: INPC will handle the collection of all neuroimaging data, ensuring it is accurately recorded and maintained.
Process neuroimaging data: Once collected, INPC will be responsible for processing this data, organizing it in a way that is usable for the Project Research Team (PRT). This includes turning sourcedata into rawdata and then processing the rawdata into derivatives.
Share neuroimaging data: INPC will coordinate with the PRT and any relevant data repositories to ensure that processed neuroimaging data is shared correctly and responsibly, in line with relevant data sharing policies and guidelines.

Project Research Team

Responsibilities

Generate additional research data: The PRT will be in charge of collecting additional data, using tools like REDCap, NIH Toolbox, and ASEBA.
Execute the DMS Plan: The PRT, under the direction of the Principal Investigator, will take the lead in executing the DMS plan. This includes ensuring all data collection and management is conducted according to the plan and coordinating with the INPC on data processing and sharing tasks.

Joint

Responsibilities

Cooperation and communication: Both the INPC and the PRT will need to maintain open lines of communication, regularly updating each other on progress, challenges, and changes related to data management. This collaboration is key to ensuring the data management and sharing plan is executed effectively.
Adherence to data protection and privacy guidelines: Both teams must ensure all data handling adheres to required data protection and privacy standards. This includes the de-identification of data and any necessary permissions or consents for data sharing.

sharing.nih.gov

Data Repository

NIH encourages researchers to select the repository that is most appropriate for their data type and discipline.

Selecting a Data Repository

Repositories for Sharing Scientific Data

NIMH Data Archive

NDA Repository

Sharing data with the NDA repository requires harmonizing your data to the NDA Data Dictionary, validating the data, and uploading the data every six months.
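A minimal sketch of the harmonization step might look like the following. The local column names and the mapping are hypothetical, and the list of required elements is an assumption that should be confirmed against the NDA Data Dictionary for the structure you are submitting. This kind of pre-check does not replace NDA's own validation step.

```python
# Hypothetical mapping from local column names to NDA element names.
# Look up the real element names in the NDA Data Dictionary.
COLUMN_MAP = {
    "participant_id": "src_subject_id",
    "guid": "subjectkey",
    "visit_date": "interview_date",
    "age_months": "interview_age",
    "sex": "sex",
}

# Elements assumed to be required; confirm against the structure definition.
REQUIRED = ["subjectkey", "src_subject_id", "interview_date", "interview_age", "sex"]


def harmonize(rows):
    """Rename local columns to NDA element names and drop unmapped columns."""
    return [
        {COLUMN_MAP[name]: value for name, value in row.items() if name in COLUMN_MAP}
        for row in rows
    ]


def missing_required(rows):
    """Return (row number, element) pairs where a required element is empty."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for element in REQUIRED:
            if not str(row.get(element, "")).strip():
                problems.append((number, element))
    return problems
```

Running `missing_required(harmonize(rows))` before each upload gives a short list of gaps to resolve with the Project Research Team.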

Website, Data Dictionary, Webinars and Tutorials

REDCap

Database and Reporting

REDCap is a versatile platform that can be used to store a variety of information for your project. We recommend using at least two REDCap databases per project to enhance data security. The first database should contain personally identifiable information for tracking purposes. The second database should contain only de-identified data.
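The two-database idea can be sketched as a function that splits one record into a tracking part and a de-identified part that share only a study ID. The field names here are hypothetical, and removing these fields alone is not a guarantee that the remaining data are de-identified; the list of identifying fields must come from your own protocol.

```python
# Hypothetical field names; list the identifying fields used in your own project.
IDENTIFYING_FIELDS = {"first_name", "last_name", "date_of_birth", "phone", "email"}
LINK_FIELD = "study_id"


def split_record(record: dict) -> tuple[dict, dict]:
    """Split one record into (tracking, deidentified) parts linked by the study ID."""
    tracking = {LINK_FIELD: record[LINK_FIELD]}
    deidentified = {LINK_FIELD: record[LINK_FIELD]}
    for field, value in record.items():
        if field == LINK_FIELD:
            continue
        # Identifying fields go to the tracking part, everything else to the other.
        target = tracking if field in IDENTIFYING_FIELDS else deidentified
        target[field] = value
    return tracking, deidentified
```

The tracking part would be stored in the first database and the de-identified part in the second, so that analysis exports never need to touch the identifying fields.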

A crucial component of using REDCap is the data dictionary. It provides detailed information about the data, clarifying what each data entry represents.

When data analysis will occur outside of REDCap, you might find it beneficial to establish additional REDCap databases for the de-identified data. This is particularly useful when some data are entered manually into a form while other data can be imported directly.

While REDCap excels in data storage, it is not ideally suited for data analysis. Therefore, multiple reports should be generated from the REDCap platform and saved as CSV files. With each report, it's important to create a corresponding data dictionary, derived from the original REDCap data dictionary. This new dictionary will serve as a comprehensive guide, outlining the contents of the respective CSV file.
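One way to derive a per-report dictionary is to keep only the rows of the project data dictionary that describe columns present in the report. The sketch below assumes both files are CSV exports and that the dictionary's variable column is named `Variable / Field Name`; adjust `FIELD_COLUMN` if your export differs. The file paths are placeholders supplied by the caller.

```python
import csv

# Assumed name of the variable column in the data dictionary export.
FIELD_COLUMN = "Variable / Field Name"


def report_dictionary(dictionary_csv, report_csv, output_csv):
    """Write the dictionary rows that describe the columns of one report.

    Returns the report columns that have no entry in the dictionary.
    """
    # utf-8-sig reads files with or without a byte-order mark.
    with open(report_csv, newline="", encoding="utf-8-sig") as handle:
        report_columns = next(csv.reader(handle))

    with open(dictionary_csv, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames
        by_field = {row[FIELD_COLUMN]: row for row in reader}

    with open(output_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=header)
        writer.writeheader()
        # Keep the dictionary in the same order as the report columns.
        for column in report_columns:
            if column in by_field:
                writer.writerow(by_field[column])

    return [column for column in report_columns if column not in by_field]
```

Columns that the export adds or renames (for example, expanded checkbox columns) will appear in the returned list and need dictionary entries written by hand.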

Excel

Spreadsheet and Macros

Excel is a powerful tool for managing large volumes of data, especially when it comes to quickly scanning information and making simple updates. More complex modifications can be made manually, but that is not the most efficient method.

Excel's macro feature is particularly useful for these complex tasks. Macros are sequences of commands that can automate repetitive tasks. By programming these sequences, you can perform intricate changes to your data with a single command, which makes your work more efficient and minimizes the chance of errors.

Our team has expertise in using Excel macros. If you find yourself needing to automate complex tasks in Excel, we're here to assist. This collaboration can help ensure your data is managed effectively and your project runs smoothly.

Scripts

Bash and Python

For certain complex data tasks, traditional tools like REDCap or Excel may fall short. In such scenarios, using command-line scripts through languages like Bash and Python can be more effective. Bash is commonly used for tasks like managing files and automating repetitive processes. Python, with its powerful data processing capabilities, can handle more complex data manipulations.

Our team has experience with both Bash and Python. When necessary, we can assist in creating and running scripts to better manage and share your data. If you encounter any data tasks that require command-line operations, know that assistance is available.
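As one example of the kind of file-management task a script can automate, the sketch below builds a manifest of every file under a folder with its size and SHA-256 checksum. It uses only the Python standard library; the function names are illustrative and not part of any INPC tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, size in bytes, SHA-256) for every file under root."""
    manifest = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            manifest.append((relative, path.stat().st_size, sha256_of(path)))
    return manifest
```

Comparing two manifests taken before and after a transfer is a simple way to confirm that no file was lost or altered.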

BIDS

The Brain Imaging Data Structure is a standard that organizes and formats neuroimaging data in a manner that makes it easier to understand, share, and use. By adhering to BIDS, we ensure that our neuroimaging data is uniformly organized and labeled, making it more comprehensible for users. BIDS supports a wide range of neuroimaging data types, including MRI and fMRI, and its consistent naming convention facilitates the sharing and usage of this data.
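To illustrate the naming convention, the sketch below builds the relative path for an imaging file from a subject label, an optional session label, a data type folder, and a suffix. It covers only the `sub` and `ses` entities; other entities such as task or run are left out, and the helper name `bids_path` is made up for this example.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

LABEL = re.compile(r"[A-Za-z0-9]+")  # BIDS labels are alphanumeric


def bids_path(subject: str, datatype: str, suffix: str,
              session: Optional[str] = None,
              extension: str = ".nii.gz") -> PurePosixPath:
    """Build sub-<label>[/ses-<label>]/<datatype>/<entities>_<suffix><extension>."""
    for label in (subject, session):
        if label is not None and not LABEL.fullmatch(label):
            raise ValueError(f"label must be alphanumeric: {label!r}")
    entities = [f"sub-{subject}"]
    folder = PurePosixPath(f"sub-{subject}")
    if session is not None:
        entities.append(f"ses-{session}")
        folder = folder / f"ses-{session}"
    filename = "_".join(entities + [suffix]) + extension
    return folder / datatype / filename
```

For example, `bids_path("01", "anat", "T1w", session="02")` gives `sub-01/ses-02/anat/sub-01_ses-02_T1w.nii.gz`.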

CSV with Data Dictionary

CSV files are a simple, widely-used format for storing tabular data. Accompanied by a well-structured data dictionary, a CSV file can serve as a versatile and user-friendly data source. The data dictionary provides detailed information about the data elements contained in the CSV file, such as their names, meanings, and allowable values. This pairing enables users to easily understand, analyze, and manipulate the data, regardless of their specific field or level of technical expertise.
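A data dictionary also makes it possible to check a CSV file automatically. The sketch below assumes a very small dictionary held in memory, with a description and an optional set of allowable values per variable; the variable names and codes are hypothetical.

```python
# Hypothetical dictionary: variable name -> description and allowable values.
# "allowed" is None when any value is acceptable.
DATA_DICTIONARY = {
    "participant_id": {"description": "Study ID", "allowed": None},
    "handedness": {"description": "Self-reported handedness", "allowed": {"L", "R", "A"}},
}


def check_rows(rows, dictionary=DATA_DICTIONARY):
    """Return (row number, column, message) for every value that breaks the dictionary."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for column, value in row.items():
            if column not in dictionary:
                problems.append((number, column, "not in data dictionary"))
                continue
            allowed = dictionary[column]["allowed"]
            if allowed is not None and value not in allowed:
                problems.append((number, column, f"unexpected value {value!r}"))
    return problems
```

The rows can come from `csv.DictReader`, so the same check works on any CSV file whose columns are described in the dictionary.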

NDA Data Dictionary

The NDA Data Dictionary offers a standardized set of variables for structuring and detailing your data. This is pivotal for ensuring uniformity and transparency, allowing other researchers to more easily comprehend and interpret your data. By aligning your data with the NDA Data Dictionary, you're making it interoperable with a broad array of other datasets, and easily integrated into future research efforts.

Putting the DMS Plan into action!

Seasonal Work

Phase One includes gathering, assessing, and ensuring the accuracy and completeness of the source data.

Phase Two involves processing the source data into raw data and derivatives for sharing with the PI, PRT, or relevant repositories.
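The completeness check in Phase One can be sketched as a comparison between the participants you expect and the participants actually found in each source. The source names and IDs are placeholders; how the IDs are gathered from each source is left to the project.

```python
def completeness_report(expected_ids, found_by_source):
    """Map each source name to the sorted list of expected IDs it is missing."""
    expected = set(expected_ids)
    return {
        source: sorted(expected - set(found))
        for source, found in found_by_source.items()
    }
```

For example, with expected IDs `["001", "002", "003"]` and `{"mri": ["001", "003"], "redcap": ["001", "002", "003"]}`, the report is `{"mri": ["002"], "redcap": []}`, which shows that one scan still needs to be gathered before Phase Two begins.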

Using your Data Management and Sharing Plan is crucial for effectively managing your data. Just like a map guides you on your journey, the DMS Plan provides clear instructions on how to handle your data at each stage. Data management and processing activities occur throughout the research project. However, there are instances where focused efforts and additional processing are needed to meet specific objectives such as data sharing or optimizing the dataset for analysis.

To keep the DMS Plan useful, it's important to update it as your dataset evolves. This ensures that your Plan remains practical and relevant, allowing you to readily analyze and share your data. The ultimate goal is to maintain a flexible dataset that can meet the changing demands of your research.

Biannual

Some repositories, like NDA, mandate data sharing every six months. This frequency provides an excellent opportunity to refresh your DMS Plan and get your dataset in order. Regular updates ensure your data remain current, relevant, and readily available for research needs.

Annual

Some repositories require annual updates. This yearly review is not just a routine task, but a valuable checkpoint to ensure your project is on the right track. It's an opportunity to realign your dataset with your project goals and make any necessary adjustments.

Conditional

Sometimes, you may find yourself in a situation where your dataset is needed for analysis or sharing, but it isn't ready. If you have been using your DMS Plan diligently, the process of making additional updates will be relatively straightforward. However, if the DMS Plan has been neglected or ignored, the required work can become challenging and troublesome. When properly used and maintained, your DMS Plan serves as a reliable guide, even in the face of unexpected events. It provides the necessary direction and support to navigate through such circumstances.

NIH Example

This sample plan uses both neuroimaging and phenotype data. In the document, Element 3: Standards shows the integration with the NDA Data Dictionary.

INPC Example

coming soon...

Exemplary

coming soon...
