Source data is the original, unprocessed data collected from various sources. It encompasses all potential variables and represents the starting point in the data management process.
Source data comes in several types: neuroimaging data from the scanner, single-file exports such as the REDCap export or the ASEBA download, and multi-file exports such as the NIH Toolbox output. In their unprocessed form, these sources are difficult to use.
Raw data is the processed version of source data. It has been cleaned, organized, and made ready for analysis, serving as the foundation for data exploration and interpretation.
For neuroimaging raw data, we use NIfTI files organized according to the BIDS standard. For phenotype data, we recommend CSV files with accompanying data dictionaries. For REDCap, this requires a detailed data dictionary and the creation of multiple reports. Other sources need a processing pipeline to create both the data and the data dictionary.
Derivatives are the outputs generated from raw data analysis. These include processed reports, visualizations, and models that provide actionable insights and drive decision-making processes.
Neuroimaging derivatives are organized by pipeline, such as our custom INC processing, Brains Auto Workup, and FreeSurfer.
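As a rough illustration of the three stages and the per-pipeline organization of derivatives, the sketch below builds folder paths for a project. The top-level folder names, the pipeline folder name, and the helper functions are assumptions made for this example, not a description of the actual layout used by the team.

```python
from pathlib import PurePosixPath

# Hypothetical top-level folders, one per stage described above.
STAGES = ("sourcedata", "rawdata", "derivatives")


def stage_dir(project_root: str, stage: str) -> PurePosixPath:
    """Return the folder for one stage, rejecting unknown stage names."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return PurePosixPath(project_root) / stage


def derivative_dir(project_root: str, pipeline: str, subject: str) -> PurePosixPath:
    """Return the folder for one subject's output from one pipeline."""
    return stage_dir(project_root, "derivatives") / pipeline / f"sub-{subject}"


# Example with a made-up project root and pipeline folder name.
print(derivative_dir("/data/study", "freesurfer", "001"))
```

Keeping each pipeline in its own folder means one pipeline can be rerun or replaced without touching the outputs of the others.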
Sharing with NDA is a complex task that requires several new derivatives that follow the NDA data dictionary. This applies to both neuroimaging and phenotype data.
Gather neuroimaging data: INPC will handle the collection of all neuroimaging data, ensuring it is accurately recorded and maintained.
Process neuroimaging data: Once collected, INPC will be responsible for processing this data, organizing it in a way that is usable for the Project Research Team (PRT). This includes turning sourcedata into rawdata and then processing the rawdata into derivatives.
Share neuroimaging data: INPC will coordinate with the PRT and any relevant data repositories to ensure that processed neuroimaging data is shared correctly and responsibly, in line with relevant data sharing policies and guidelines.
Generate additional research data: The PRT will be in charge of collecting additional data, using tools like REDCap, NIH Toolbox, and ASEBA.
Execute the DMS Plan: The PRT, under the direction of the Principal Investigator, will take the lead in executing the Data Management and Sharing (DMS) Plan. This includes ensuring all data collection and management is conducted according to the Plan and coordinating with the INPC on data processing and sharing tasks.
Cooperation and communication: Both the INPC and the PRT will need to maintain open lines of communication, regularly updating each other on progress, challenges, and changes related to data management. This collaboration is key to ensuring the data management and sharing plan is executed effectively.
Adherence to data protection and privacy guidelines: Both teams must ensure all data handling adheres to required data protection and privacy standards. This includes the de-identification of data and any necessary permissions or consents for data sharing.
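One small part of de-identification, removing direct identifier columns before a table is shared, can be sketched as follows. The field names are hypothetical, and dropping columns alone does not guarantee that a dataset is de-identified; dates, free text, and rare combinations of values may also need review.

```python
def drop_identifiers(rows, identifier_fields):
    """Return copies of the rows without the listed identifier fields."""
    blocked = set(identifier_fields)
    return [
        {name: value for name, value in row.items() if name not in blocked}
        for row in rows
    ]


# Hypothetical example record and identifier list.
records = [{"study_id": "S001", "name": "Jane Doe", "phone": "555-0100", "age": "12"}]
shareable = drop_identifiers(records, ["name", "phone"])
print(shareable)
```

The original rows are left unchanged, so the identified copy can stay in its restricted location while only the reduced copy moves on.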
REDCap is a versatile platform that can be used to store a variety of information for your project. We recommend using at least two REDCap databases per project to enhance data security. The first database should contain personally identifiable information for tracking purposes. The second database should contain only de-identified data.
A crucial component of using REDCap is the data dictionary. It provides detailed information about the data, clarifying what each data entry represents.
When data analysis will occur outside of REDCap, you might find it beneficial to establish additional REDCap databases for the de-identified data. This is particularly useful when some data is entered manually into a form while other data can be imported directly.
While REDCap excels in data storage, it is not ideally suited for data analysis. Therefore, multiple reports should be generated from the REDCap platform and saved as CSV files. With each report, it's important to create a corresponding data dictionary, derived from the original REDCap data dictionary. This new dictionary will serve as a comprehensive guide, outlining the contents of the respective CSV file.
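Deriving a report-specific data dictionary can be sketched as selecting, from the full dictionary, only the entries for the columns that appear in the report. This is a minimal sketch: the `field_name` key and the function name are assumptions for illustration, and a real REDCap dictionary export uses its own column headers.

```python
import csv
import io


def report_dictionary(report_csv_text, dictionary_rows, name_key="field_name"):
    """Select dictionary entries for the columns of one report.

    Returns (entries in report column order, columns with no entry).
    """
    header = next(csv.reader(io.StringIO(report_csv_text)))
    by_name = {row[name_key]: row for row in dictionary_rows}
    entries = [by_name[column] for column in header if column in by_name]
    missing = [column for column in header if column not in by_name]
    return entries, missing


# Hypothetical full dictionary and a report that uses two of its fields.
full_dictionary = [
    {"field_name": "record_id", "label": "Record ID"},
    {"field_name": "age", "label": "Age in years"},
    {"field_name": "sex", "label": "Sex at birth"},
]
report = "record_id,age\n1,10\n2,11\n"
entries, missing = report_dictionary(report, full_dictionary)
print([entry["field_name"] for entry in entries], missing)
```

Any column reported as missing is a sign that the report and the original dictionary have drifted apart and one of them needs updating.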
Excel is a powerful tool for managing large volumes of data, especially when it comes to quickly scanning information and making simple updates. More complex modifications can also be made by hand, but that is not the most efficient method.
Excel's macro feature is particularly useful for these complex tasks. Macros are sequences of commands that can automate repetitive tasks. By programming these sequences, you can perform intricate changes to your data with a single command, which makes your work more efficient and minimizes the chance of errors.
Our team has expertise in using Excel macros. If you find yourself needing to automate complex tasks in Excel, we're here to assist. This collaboration can help ensure your data is managed effectively and your project runs smoothly.
For certain complex data tasks, tools like REDCap or Excel may fall short. In such scenarios, command-line scripts written in languages such as Bash and Python can be more effective. Bash is commonly used for tasks like managing files and automating repetitive processes. Python, with its powerful data processing capabilities, can handle more complex data manipulations.
Our team has experience with both Bash and Python. When necessary, we can assist in creating and running scripts to better manage and share your data. If you encounter any data tasks that require command-line operations, know that assistance is available.
The Brain Imaging Data Structure (BIDS) is a standard that organizes and formats neuroimaging data in a manner that makes it easier to understand, share, and use. By adhering to BIDS, we ensure that our neuroimaging data is uniformly organized and labeled, making it more comprehensible for users. BIDS supports a wide range of neuroimaging data types, including structural and functional MRI, and its consistent naming convention facilitates the sharing and usage of this data.
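The naming convention can be illustrated with a short sketch that assembles a BIDS-style relative path from its parts. It covers only the subject, session, task, and run entities, and the function name and example labels are assumptions for illustration; the BIDS specification defines the full set of entities and their order, and a validator should be used to check real datasets.

```python
from pathlib import PurePosixPath


def bids_path(subject, datatype, suffix, session=None, task=None, run=None,
              extension=".nii.gz"):
    """Build a BIDS-style relative path such as sub-01/anat/sub-01_T1w.nii.gz."""
    entities = [f"sub-{subject}"]
    if session is not None:
        entities.append(f"ses-{session}")
    if task is not None:
        entities.append(f"task-{task}")
    if run is not None:
        entities.append(f"run-{run}")
    filename = "_".join(entities + [suffix]) + extension

    folder = PurePosixPath(f"sub-{subject}")
    if session is not None:
        folder = folder / f"ses-{session}"
    return folder / datatype / filename


# A structural scan and a resting-state functional run for a made-up subject.
print(bids_path("01", "anat", "T1w"))
print(bids_path("01", "func", "bold", session="02", task="rest", run=1))
```

Because every file name repeats the subject and session labels, a file remains identifiable even when it is copied out of its folder.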
CSV files are a simple, widely used format for storing tabular data. Accompanied by a well-structured data dictionary, a CSV file can serve as a versatile and user-friendly data source. The data dictionary provides detailed information about the data elements contained in the CSV file, such as their names, meanings, and allowable values. This pairing enables users to easily understand, analyze, and manipulate the data, regardless of their specific field or level of technical expertise.
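One practical benefit of the pairing is that a CSV file can be checked against its dictionary automatically. The sketch below assumes a dictionary held as a Python mapping from column name to a description and an optional list of allowed values; that structure, the function name, and the sample data are all hypothetical.

```python
import csv
import io


def validate_csv(csv_text, dictionary):
    """Return a list of problems found when checking a CSV against a dictionary."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))

    # Every column in the file should be described in the dictionary.
    for column in reader.fieldnames or []:
        if column not in dictionary:
            problems.append(f"column '{column}' is not in the data dictionary")

    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            allowed = dictionary.get(column, {}).get("allowed")
            if allowed is not None and value not in allowed:
                problems.append(
                    f"line {line_number}: '{value}' is not an allowed value for '{column}'"
                )
    return problems


# Hypothetical dictionary: 'sex' has a fixed set of values, 'age' is free-form.
example_dictionary = {
    "age": {"description": "Age in years", "allowed": None},
    "sex": {"description": "Sex at birth", "allowed": ["M", "F"]},
}
for problem in validate_csv("age,sex\n10,M\n11,X\n", example_dictionary):
    print(problem)
```

Running a check like this before each analysis or data release catches entry errors while they are still easy to trace back to the source.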
The NDA Data Dictionary offers a standardized set of variables for structuring and describing your data. This is pivotal for ensuring uniformity and transparency, allowing other researchers to more easily comprehend and interpret your data. By aligning your data with the NDA Data Dictionary, you make it interoperable with a broad array of other datasets and easier to integrate into future research efforts.
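Aligning a local table with the NDA Data Dictionary often starts with renaming local columns to the element names of the target NDA structure. The sketch below shows only that renaming step. The local column names and the mapping are hypothetical, and the element names on the right should be checked against the published definition of the structure you are submitting to.

```python
def to_nda_columns(rows, column_map):
    """Rename local column names to target element names.

    Raises KeyError if a row contains a column with no mapping, so that
    unmapped data is noticed instead of silently dropped.
    """
    converted = []
    for row in rows:
        unmapped = [name for name in row if name not in column_map]
        if unmapped:
            raise KeyError(f"no mapping for columns: {unmapped}")
        converted.append({column_map[name]: value for name, value in row.items()})
    return converted


# Hypothetical mapping from local names to NDA-style element names.
column_map = {
    "guid": "subjectkey",
    "study_id": "src_subject_id",
    "visit_date": "interview_date",
    "age_months": "interview_age",
}
local_rows = [
    {"guid": "NDAR_EXAMPLE", "study_id": "S001", "visit_date": "01/15/2024", "age_months": "144"}
]
print(to_nda_columns(local_rows, column_map))
```

Keeping the mapping in one place, next to the data dictionary, makes each scheduled submission a repeatable step instead of a manual rework.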
Using your Data Management and Sharing Plan is crucial for effectively managing your data. Just like a map guides you on your journey, the DMS Plan provides clear instructions on how to handle your data at each stage. Data management and processing activities occur throughout the research project. However, there are instances where focused efforts and additional processing are needed to meet specific objectives such as data sharing or optimizing the dataset for analysis.
To keep the DMS Plan useful, it's important to update it as your dataset evolves. This ensures that your Plan remains practical and relevant, allowing you to readily analyze and share your data. The ultimate goal is to maintain a flexible dataset that can meet the changing demands of your research.
Some repositories, like NDA, mandate data sharing every six months. This frequency provides an excellent opportunity to refresh your DMS Plan and get your dataset in order. Regular updates ensure your data remain current, relevant, and readily available for research needs.
Some repositories require annual updates. This yearly review is not just a routine task, but a valuable checkpoint to ensure your project is on the right track. It's an opportunity to realign your dataset with your project goals and make any necessary adjustments.
Sometimes, you may find yourself in a situation where your dataset is needed for analysis or sharing, but it isn't ready. If you have been using your DMS Plan diligently, the process of making additional updates will be relatively straightforward. However, if the DMS Plan has been neglected or ignored, the required work can become challenging and troublesome. When properly used and maintained, your DMS Plan serves as a reliable guide, even in the face of unexpected events. It provides the necessary direction and support to navigate through such circumstances.
The sample plan uses both neuroimaging and phenotype data. In that document, Element 3: Standards shows the integration with the NDA data dictionary.