
Research Data Management

What is a Readme File?

A Readme file is usually a text file titled README.txt that should be located at the root of your dataset. Its title indicates that any potential user of your data should consult it before checking any other part of your dataset. 

It is a form of documentation that can be created manually, without following a specific metadata standard, which often makes it easier to read. Your readme can also take the form of a PDF/A document or use a different name, as long as it is clearly labeled as something the user should read first. It is best to start recording this information in a central document as soon as you begin collecting your data, or even during the planning phase.

The main readme file explains the contents and structure of your dataset, and gives enough information for a potential user to determine whether the data is of interest to them. If your dataset requires a codebook, it can be included in the readme file. You can of course also create secondary readme files in subfolders to document specific parts of your data.
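To check a dataset against these conventions, a short script can walk the dataset folder and report which folders have no readme. The sketch below is a minimal illustration in Python, not part of any standard tool: the function names `has_readme` and `find_missing_readmes` are hypothetical, and it assumes that any file whose name begins with "readme" (in any letter case, e.g. README.txt or ReadMe.pdf) counts as a readme. Since secondary readme files in subfolders are optional, treat its output as a checklist rather than a list of errors.

```python
from pathlib import Path
import sys


def has_readme(folder: Path) -> bool:
    """Return True if `folder` directly contains a readme-like file.

    Assumption (hypothetical convention): any file whose name starts with
    "readme", ignoring case, counts as a readme.
    """
    return any(
        entry.is_file() and entry.name.lower().startswith("readme")
        for entry in folder.iterdir()
    )


def find_missing_readmes(dataset_root: str) -> list[str]:
    """List the dataset root and every subfolder that lacks a readme file.

    Paths are returned relative to the root, with "." meaning the root itself.
    """
    root = Path(dataset_root)
    # The root first, then all subfolders in a stable, sorted order.
    folders = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    return [str(f.relative_to(root)) for f in folders if not has_readme(f)]


if __name__ == "__main__":
    # Usage: python check_readmes.py path/to/dataset
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_missing_readmes(target):
        print("No readme in:", path)
```

A missing readme at "." (the dataset root) is the case that matters most, since the main readme is the file users are expected to consult first.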

Contents of a Readme File

The following is a suggested list of elements you could include in a readme file located at the root of your dataset. You may of course place some of them in secondary documentation or in separate files, or even embed them in your data. Many of the suggestions only make sense with certain types of data, and the goal is to keep your readme file from becoming too lengthy. You should only include elements that are useful and/or necessary to correctly interpret, evaluate and reuse your dataset. 

  • General information
    • Dataset title
    • Investigators' names, roles and contact information (include ORCID if available)
    • Grant information
    • DOI for the reference version of the dataset
  • Your data and the world
    • Licences and restrictions placed on (parts of) the dataset
    • Links to publications based on the dataset
    • Relationship with other datasets
    • Other resources used as source for data collection (books, articles, etc.)
  • Organisation
    • File naming system (with examples)
    • Folder structure
    • Relationships and dependencies between files
    • Other documentation files of interest within dataset (notes, companion files)
    • Short description of the contents of each major file
    • Date of creation of each major file
  • Data collection
    • Methods used for data collection (including references, documentation, links)
    • Collection date (or range)
    • Geographic location of collection (if appropriate)
    • Experimental & environmental conditions of collection (if appropriate)
    • Standards and calibration for data collection (if applicable)
    • Uncertainty, precision and accuracy of measurements (if appropriate)
    • Known problems & caveats (sampling, blanks, etc.)
  • Codebook
    • Definition of codes, symbols and abbreviations used in files
    • List of variables with full name and definition
    • Definition of column headings and row labels for tabular data
    • Measurement units and data formats (e.g. YYYYMMDD)
    • Treatment of missing data (code, etc.)
    • Example records for each file type
  • Processing, versioning & QA
    • Methods used for data processing
    • Software used in data collection and processing, including version numbers
    • File formats used in the dataset & recommended software
    • Quality control procedure applied
    • Description of file versioning system if appropriate
    • Dataset changelog
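If you want a starting point, the list above can be turned into a plain-text skeleton that you then fill in and trim. The Python sketch below is one possible approach, assuming a simple layout of uppercase section headings underlined with "=" and one "- element:" line per item; the names `README_SECTIONS` and `build_readme_skeleton` are hypothetical, and this layout is an illustration, not a required standard.

```python
# Elements copied from the suggested list above; delete any that do not
# apply to your data before filling in the rest.
README_SECTIONS: dict[str, list[str]] = {
    "General information": [
        "Dataset title",
        "Investigators' names, roles and contact information (ORCID if available)",
        "Grant information",
        "DOI for the reference version of the dataset",
    ],
    "Your data and the world": [
        "Licences and restrictions placed on (parts of) the dataset",
        "Links to publications based on the dataset",
        "Relationship with other datasets",
        "Other resources used as source for data collection",
    ],
    "Organisation": [
        "File naming system (with examples)",
        "Folder structure",
        "Relationships and dependencies between files",
        "Other documentation files of interest within dataset",
        "Short description of the contents of each major file",
        "Date of creation of each major file",
    ],
    "Data collection": [
        "Methods used for data collection",
        "Collection date (or range)",
        "Geographic location of collection",
        "Experimental & environmental conditions of collection",
        "Standards and calibration for data collection",
        "Uncertainty, precision and accuracy of measurements",
        "Known problems & caveats",
    ],
    "Codebook": [
        "Definition of codes, symbols and abbreviations used in files",
        "List of variables with full name and definition",
        "Definition of column headings and row labels for tabular data",
        "Measurement units and data formats (e.g. YYYYMMDD)",
        "Treatment of missing data",
        "Example records for each file type",
    ],
    "Processing, versioning & QA": [
        "Methods used for data processing",
        "Software used in data collection and processing, with version numbers",
        "File formats used in the dataset & recommended software",
        "Quality control procedure applied",
        "Description of file versioning system",
        "Dataset changelog",
    ],
}


def build_readme_skeleton(sections: dict[str, list[str]]) -> str:
    """Render sections as plain text: an uppercase heading underlined
    with '=', one '- element:' line per item, and a blank line after each."""
    lines: list[str] = []
    for heading, items in sections.items():
        lines.append(heading.upper())
        lines.append("=" * len(heading))
        for item in items:
            lines.append(f"- {item}:")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_readme_skeleton(README_SECTIONS))
```

Running the script prints the skeleton, which you can redirect into a file (for example `python make_readme.py > README.txt`, where `make_readme.py` is whatever name you saved the script under) and then edit by hand.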

Further Information

Cornell University offers an excellent guide to writing readme files, along with a readme file template.