Research Data Management

What is a Readme File?

A Readme file is usually a text file titled README.txt that should be located at the root of your dataset. Its title indicates that any potential user of your data should consult it before checking any other part of your dataset. 

A readme is free-form documentation: you can write it by hand without following a specific metadata standard, which often makes it easier for people to read. Your readme can also take the form of a PDF/A document or use a different title, as long as it is clearly labeled as something the user should read. It is best to start recording this documentation in a single central document as soon as you begin collecting your data, or even during the planning phase.

The main readme file explains the contents and structure of your dataset, and gives a potential user enough information to determine whether the data is of interest to them. If your dataset requires a codebook, the codebook can be included in the readme. You can also create secondary readme files in subfolders to document specific parts of your data.
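As a rough illustration of this layout, the following Python sketch lists the readme files found in a dataset folder and separates the main readme at the root from secondary readmes in subfolders. The function name `find_readmes` and the rule "any file whose name starts with readme" are assumptions made for this example, not a convention prescribed by this guide.

```python
from pathlib import Path


def find_readmes(dataset_root):
    """Return (main, secondary) lists of readme files under dataset_root.

    Assumption for this sketch: a readme is any file whose name starts
    with "readme", ignoring case (README.txt, readme.md, Readme.pdf, ...).
    """
    root = Path(dataset_root)
    found = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.name.lower().startswith("readme")
    )
    # Files sitting directly in the root folder count as the main readme.
    main = [p for p in found if p.parent == root]
    # Anything deeper in the tree documents a specific part of the data.
    secondary = [p for p in found if p.parent != root]
    return main, secondary
```

Calling `find_readmes("my_dataset")` on a hypothetical folder named `my_dataset` returns two lists of paths. An empty first list suggests that the dataset is missing a readme at its root.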

Contents of a Readme File

The following is a suggested list of elements you could include in a readme file located at the root of your dataset. You may place some of them in secondary documentation or in separate files, or even embed them in your data. Many of these suggestions apply only to certain types of data, and the readme should not become overly long. Include only the elements that are useful or necessary to correctly interpret, evaluate and reuse your dataset.

  • General information
    • Dataset title
    • DOI for the reference version of the dataset
    • Investigators
      • Names, roles, institutions and contact information (include ORCID iD if available)
    • Project title (if any)
    • Grant information
  • Your data and the world
    • Licences and restrictions placed on (parts of) the dataset
    • Relationship with other datasets
    • Other resources used as sources for data collection (books, articles, etc.)
    • Links to publications based on the dataset
  • Data collection
    • Collection date (or range)
    • Geographic location of collection (if appropriate)
    • Methods used for data collection (including references, documentation, links)
    • Experimental & environmental conditions of collection (if appropriate)
    • Standards and calibration for data collection (if applicable)
    • Uncertainty, precision and accuracy of measurements (if appropriate)
    • Known problems & caveats (sampling, blanks, etc.)
  • Organisation
    • Folder structure
    • File naming system (with examples)
    • Relationships and dependencies between files
    • Other documentation files of interest within dataset (notes, companion files)
    • For each major file, a short description of its contents and date of creation
    • Description of file versioning system if appropriate
  • Codebook
    • Definition of codes, symbols and abbreviations used in files
    • List of variables with full name and definition
    • Definition of column headings and row labels for tabular data
    • Measurement units and data formats (e.g. YYYY-MM-DD)
    • Treatment of missing data (code, etc.)
    • Example of records for each file type
  • Processing & QA
    • Methods used for data processing
    • Software used in data collection and processing, including version numbers
    • File formats used in the dataset & recommended software
    • Quality control procedure(s) applied
    • Dataset changelog
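
One way to get started is to generate an empty skeleton from the list above and fill it in as the project progresses. The Python sketch below writes a README.txt with one heading per top-level section and one placeholder line per element. The function names, the underline style and the "[to be completed]" placeholder are illustrative assumptions, not part of any standard, and the item lists are abbreviated for brevity.

```python
from pathlib import Path

# Top-level sections follow the list above; the items are abbreviated.
# Extend or trim them to suit your own dataset.
SECTIONS = {
    "GENERAL INFORMATION": [
        "Dataset title",
        "DOI for the reference version of the dataset",
        "Investigators",
        "Project title",
        "Grant information",
    ],
    "YOUR DATA AND THE WORLD": [
        "Licences and restrictions",
        "Relationship with other datasets",
        "Links to publications based on the dataset",
    ],
    "DATA COLLECTION": [
        "Collection date (or range)",
        "Methods used for data collection",
        "Known problems and caveats",
    ],
    "ORGANISATION": [
        "Folder structure",
        "File naming system (with examples)",
    ],
    "CODEBOOK": [
        "List of variables with full name and definition",
        "Measurement units and data formats",
        "Treatment of missing data",
    ],
    "PROCESSING & QA": [
        "Methods used for data processing",
        "Software used, including version numbers",
        "Dataset changelog",
    ],
}


def build_readme(sections=SECTIONS):
    """Return the skeleton as a single string."""
    lines = []
    for heading, items in sections.items():
        lines.append(heading)
        lines.append("-" * len(heading))  # underline the heading
        for item in items:
            lines.append(f"{item}: [to be completed]")
        lines.append("")  # blank line between sections
    return "\n".join(lines)


def write_readme(dataset_root, overwrite=False):
    """Write README.txt at the root of the dataset and return its path."""
    path = Path(dataset_root) / "README.txt"
    if path.exists() and not overwrite:
        # Refuse to clobber documentation that has already been written.
        raise FileExistsError(f"{path} already exists")
    path.write_text(build_readme(), encoding="utf-8")
    return path
```

Running `write_readme("my_dataset")` on a hypothetical folder named `my_dataset` creates the skeleton there. Elements that do not apply to your data should be deleted rather than left as empty placeholders, in keeping with the advice not to make the readme longer than necessary.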

Further Information

Cornell University offers an excellent readme guide and a readme file template.