Research Data Management

What is Data Documentation?

Papers, by Jerzy Gorecki (Pixabay License, cropped)

Documenting your data means providing the information that other users need to understand and use it. Documentation is a requirement for open data, as reflected in the FAIR principles (Findable, Accessible, Interoperable, Reusable). It can take different forms, from a simple text document (often called a Readme file or data appendix) to information embedded within the files themselves, or even structured descriptive lists such as a catalogue. Metadata (data about data) is a form of documentation that follows an established standard.

This section of the guide will help you find the best solution to document your dataset. Please keep in mind that you may need to mix and match different documentation strategies depending on the type of data you collected: quantitative and qualitative datasets require different treatments, and so do files created using different software.

Study-Level Documentation

Study-level documentation should provide all the information necessary to understand how the dataset was created and how it is structured. This includes the context of the research, the sources of the data, modifications made over time, and any other aspects that matter for the dataset to be usable and understandable.

Some of this information is often already provided in publications and reports to funders. It is a good idea to collect this information at the root of your dataset, both while you are working on the project and when you decide to share the data. This will often take the form of a README file, but it can also be a PDF/A collecting information on all aspects of the project. A lab notebook could also be part of this documentation.

The minimal information you will need to provide when you upload data to a FAIR data repository will generally be as follows:

  • Title
  • Creator (principal investigator)
  • Date created (also versions)
  • Format (and software required)
  • Subject
  • Unique Identifier (DOI, generally provided by the repository itself)
  • Description of the specific data resource
  • Coverage of the data (spatial or temporal)
  • Publishing organisation
  • Type of resource
  • Rights
  • Funding or grant
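
One way to keep track of these elements while you work is to store them in a small structured record and generate a plain-text README from it. The sketch below is only an illustration: the field names (such as `date_created` or `resource_type`) and the functions `missing_fields` and `render_readme` are hypothetical, chosen to mirror the list above, and do not correspond to any particular repository's metadata schema.

```python
# Hypothetical field names mirroring the list above.
# They are an assumption for illustration, not a formal metadata standard.
REQUIRED_FIELDS = [
    "title",
    "creator",
    "date_created",
    "format",
    "subject",
    "identifier",
    "description",
    "coverage",
    "publisher",
    "resource_type",
    "rights",
    "funding",
]


def missing_fields(record):
    """Return the required fields that are absent, None, or blank."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def render_readme(record):
    """Render the record as 'Label: value' lines suitable for a README file."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("Missing metadata fields: " + ", ".join(missing))
    lines = []
    for name in REQUIRED_FIELDS:
        # Turn 'date_created' into the label 'Date created'.
        label = name.replace("_", " ").capitalize()
        lines.append(f"{label}: {record[name]}")
    return "\n".join(lines) + "\n"
```

Calling `missing_fields(record)` before you upload gives a quick checklist of what still needs to be filled in, and the text returned by `render_readme(record)` can be saved as a README file at the root of the dataset. The repository you deposit with may require additional or differently named fields, so treat this list as a starting point.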