Research Data Management

Research Data Management describes all of the ways researchers collect, analyze, store, and use data during a project. It includes writing and following lab procedures, keeping backups of hard copies and digital items, and thinking ahead about how these data may be re-used or re-purposed.

Effective data management helps increase research impact and visibility, saves time and money, and enables new discoveries. The University Library can provide guidance on writing and implementing data management plans for any research project. 

Data Management Plans

A data management plan (DMP) should describe how your data will be created, collected, used, stored, and preserved in the long term. It should also address any specific requirements from your funding agency, and explain how you plan to comply with relevant federal and state laws that govern data sharing and privacy protection.

Making a plan

A data management plan should be no longer than two pages. Use the outline below to organize the content of your DMP, and use tables, charts, and bulleted lists to save space.

  1. Data & data collection: Describe the data you will collect in terms of source, format, stability, and volume. Indicate which data/file formats are proprietary.
  2. Documentation, organization, and storage: Indicate how you will describe the content, formats, and internal relationships of your data so that others may understand them. Include project protocols, code, classification schema, data collection methods, and equipment and software used.
  3. Access, sharing, and re-use: State where, when, and how the data will be shared. Indicate any privacy, ethical, or confidentiality concerns that limit which data can be shared. Provide details on whether you intend to permit re-use, re-distribution, or re-purposing of the data and under what conditions.
  4. Archiving: Describe how you will preserve both the data and access to the data for the long term, and how you will prepare it for preservation/sharing (converting file formats, anonymizing data, etc.).

A data management plan can save researchers considerable time and money by addressing some of the most common mistakes and issues.

Common Issues and Best Practices

  • Data & Data Collection

    Common Issues | Best Practices
    Lab instruments and software are proprietary, file types are proprietary, data can only be read/analyzed on original instrument | Use most common machines/software for discipline
    Lab instruments and software are designed to run on older operating systems | Keep copies of the OS and software, be aware of compatibility issues caused by software updates
    Machines and software are not backwards compatible | Avoid upgrading software systems
    Exported/converted data cannot be manipulated (data loss) | Save both original, uncompressed data files and exported data
    Manufacturer mergers and acquisitions can lead to discontinued/unsupported products |
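One way to stay aware of compatibility issues is to record which operating system and software versions produced each data file. The following is a minimal sketch, not a prescribed method: the function names, the `.env.json` sidecar suffix, and the `software_versions` mapping (for example `{"ExampleScope": "2.1"}`, a made-up instrument package) are all assumptions chosen for illustration.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def describe_environment(software_versions):
    """Return a dictionary describing the system a data file was produced on.

    software_versions is a hypothetical mapping of instrument or analysis
    software names to version strings, supplied by the researcher.
    """
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python_version": sys.version.split()[0],
        "software": dict(software_versions),
    }


def write_sidecar(data_path, software_versions):
    """Write the environment record next to the data file as <name>.env.json."""
    sidecar_path = f"{data_path}.env.json"
    with open(sidecar_path, "w", encoding="utf-8") as handle:
        json.dump(describe_environment(software_versions), handle, indent=2)
    return sidecar_path
```

Calling `write_sidecar("run01.dat", {"ExampleScope": "2.1"})` would create `run01.dat.env.json` beside the data file, so that anyone opening the data later can see which OS and software versions they may need to reproduce.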
  • Documentation, Organization & Storage

    Common Issues | Best Practices
    Inconsistently labeled files and folders | Use file naming conventions
    Poor version control | Use version control software, or dates in file/folder names
    Files saved in multiple locations | Use permanent data identifiers to prevent duplication
    Metadata is missing crucial information | Use disciplinary metadata standards
    No team protocols or procedures | Document workflow, build a data dictionary, write data documentation procedures
    Digital data are stored on local hard drives and not backed up | Back up 3 copies of everything (original + external/local + external/remote)
    Data are stored in a cloud service owned by the private sector | Use both hard drive and cloud storage
    Machines are set to overwrite data to clear space | Change settings or increase storage capacity
    Files are kept on common drives | Establish access levels for files and folders
    Specimens, samples, etc. are not secure; paper lab notebooks can be damaged or lost | Have policy on keeping hard copies/specimens physically secure
    Data have to be shared with remote collaborators/rotating lab personnel | Train personnel on their roles in creating, storing, and taking responsibility for data security
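A file naming convention with dates in the names can be enforced with a small script. The sketch below assumes one possible pattern, `project_description_YYYY-MM-DD_vNN.ext`; the pattern itself and the names `build_filename` and `follows_convention` are illustrative choices, not a standard, and a lab should substitute whatever convention it has agreed on.

```python
import re
from datetime import date

# Assumed convention: project_description_YYYY-MM-DD_vNN.ext, all lowercase
NAME_PATTERN = re.compile(
    r"[a-z0-9-]+_[a-z0-9-]+_\d{4}-\d{2}-\d{2}_v\d{2,}\.[a-z0-9]+"
)


def _clean(text):
    """Lowercase the text and replace runs of other characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project, description, when, version, extension):
    """Build a file name that follows the assumed convention."""
    suffix = extension.lstrip(".").lower()
    return (
        f"{_clean(project)}_{_clean(description)}_"
        f"{when.isoformat()}_v{version:02d}.{suffix}"
    )


def follows_convention(filename):
    """Return True if the file name matches the assumed convention."""
    return NAME_PATTERN.fullmatch(filename) is not None
```

For example, `build_filename("SoilStudy", "pH readings", date(2024, 3, 5), 2, ".CSV")` returns `soilstudy_ph-readings_2024-03-05_v02.csv`. Writing the date in ISO 8601 order (year, month, day) means an alphabetical sort of the file names is also a chronological sort.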
  • Access, Sharing & Reuse

    Common Issues | Best Practices
    Sensitive data are not encrypted or anonymized | Anonymize data using a random ID generator (not subject or experiment characteristics)
    Patient consent forms do not cover re-use, re-purposing, or sharing | Obtain permission from participants to make data publicly available
    Copyrighted material is used without permission to distribute derivatives | Review all relevant federal and state laws; seek copyright permissions
    Certain data cannot be deposited without violating patient privacy | Encrypt and store data for verification purposes only
    Certain data cannot be shared for national security reasons/trade secrets/patents | Encrypt and store data for verification purposes only
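Replacing names with random IDs, rather than IDs derived from subject or experiment characteristics, might look like the sketch below. It uses Python's standard `secrets` module; the function names and the `subject_name` and `subject_id` field names are assumptions made for this example. Note that removing names alone may not be enough to anonymize a dataset if other fields can still identify a person.

```python
import secrets


def assign_random_ids(subject_names, id_bytes=8):
    """Map each distinct subject name to a random ID unrelated to the subject."""
    mapping = {}
    used = set()
    for name in subject_names:
        if name in mapping:
            continue  # the same subject always keeps the same ID
        new_id = secrets.token_hex(id_bytes)
        while new_id in used:  # collisions are very unlikely, but IDs must be unique
            new_id = secrets.token_hex(id_bytes)
        used.add(new_id)
        mapping[name] = new_id
    return mapping


def anonymize_records(records, mapping, name_field="subject_name"):
    """Return copies of the records with the name field replaced by 'subject_id'."""
    anonymized = []
    for record in records:
        cleaned = {key: value for key, value in record.items() if key != name_field}
        cleaned["subject_id"] = mapping[record[name_field]]
        anonymized.append(cleaned)
    return anonymized
```

The mapping from names to IDs is itself sensitive: it should be stored separately from the shared data (encrypted, with restricted access), or destroyed if re-identification will never be needed.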
  • Archiving

    Common Issues | Best Practices
    Researcher keeps the only copy of the data | Make copies of the data
    Others can access the data only by personal request (researcher gets to vet uses) | Place in an institutional or disciplinary repository so it can be discovered and accessed
    Data are shared, but without metadata or instructions that make it possible for others to re-use or understand them | Include workflows and metadata
    Process for accessing data is complicated or confusing | Set permissions while depositing data

    Placing your work in a public repository comes with added benefits, like fixity checks, metadata assistance, format migration, permissioning, backups, and increased discoverability.
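A fixity check confirms that a file has not changed since it was deposited, usually by comparing checksums. The sketch below illustrates the idea with SHA-256 from Python's standard `hashlib` module; it is not any particular repository's implementation, and the function names are assumptions for this example. The same approach can be used to verify your own backup copies.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its SHA-256 checksum."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(folder, manifest):
    """Return files that are missing or whose checksum no longer matches."""
    current = build_manifest(folder)
    return sorted(
        name for name, checksum in manifest.items() if current.get(name) != checksum
    )
```

A typical workflow would be to call `build_manifest` once when the data are finalized, save the result alongside the data, and later call `changed_files` on each copy. An empty list means every recorded file is still present and unchanged.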


Funding Agency Requirements

In order to promote public access to research data, many funding agencies have public access policies that require both articles and supporting data be published. Most of these agencies have established repositories where researchers can place their research products in order to comply. Some also work with publishers to automatically deposit or link to your data.

Increasingly, scientific publishers are also requiring authors to deposit their data and/or provide links to them. This requirement is intended to make results easier to reproduce and to reduce the number of retracted articles.

Find a data repository

Open data repositories are where researchers place and discover datasets from original research. There are repositories designed around specific disciplines and around specific data types, repositories for individual institutions, and multidisciplinary repositories.

Check with the funder for data deposit requirements. Some agencies (such as NIH and NASA) require that data be placed in a specific repository. Others may provide a list of approved repositories, or merely require a citation.

Data and Copyright

U.S. copyright law is fairly straightforward as it pertains to data. Copyright law (17 U.S.C. § 102) applies to original works in a fixed medium, but does not cover ideas, concepts, discoveries, or factual information. The Supreme Court case of Feist Publications, Inc. v. Rural Telephone Service Co. (1991) asserted that copyright law is intended to protect “original creative expression, not just hard work.”

In short, data cannot be copyrighted, although an original selection or arrangement of data may be. You can own copyright of expressions of data, such as graphs or charts included in a publication, but not the underlying factual information presented in the graph or chart.

Where copyright law does not cover data, researchers may pursue other avenues for protecting their intellectual property, such as filing patents or protecting trade secrets. Non-disclosure agreements, grants, contracts, institutional policies, and federal and state laws (such as HIPAA or FERPA) may all affect your options and the protections available to you in this area.
