Research Data Management

Simple Version Management

When working on data or documents, it is generally a good idea to keep several versions of them to mitigate the risk of corruption, human error or data loss. You then need to know which version is which, and there are several ways to do this.

The simplest is to use a file name suffix. This goes a long way: just include a version number, a date, and/or the reviewer's initials so you can tell versions apart:

  • filename_v02.pdf is the second major version of a file. Always use two digits (v02, not v2) so that files sort in version order.
  • filename_v02-01.pdf is the first minor revision of version 2.
  • filename_20181128.pdf is the version dated November 28, 2018 (year, month, day).
  • filename_gp.pdf was revised or commented on by Guillaume Pasquier.

You can of course mix and match: filename_v02-01_gp.pdf contains my comments on version 2.01 of the file.
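
The naming scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name versioned_name and its parameters are hypothetical, and the order of the suffixes (version, then date, then initials) is one possible choice rather than a fixed rule. The last two lines show why two-digit version numbers matter when files are sorted alphabetically.

```python
from datetime import date
from typing import Optional


def versioned_name(stem: str, ext: str, major: Optional[int] = None,
                   minor: Optional[int] = None, on: Optional[date] = None,
                   initials: Optional[str] = None) -> str:
    """Build a file name such as 'filename_v02-01_gp.pdf' (hypothetical helper)."""
    parts = [stem]
    if major is not None:
        version = f"v{major:02d}"            # zero-padded: v02, not v2
        if minor is not None:
            version += f"-{minor:02d}"       # minor revision: v02-01
        parts.append(version)
    if on is not None:
        parts.append(on.strftime("%Y%m%d"))  # date as year, month, day
    if initials:
        parts.append(initials.lower())       # reviewer initials
    return "_".join(parts) + "." + ext


print(versioned_name("filename", "pdf", major=2))
print(versioned_name("filename", "pdf", major=2, minor=1, initials="GP"))
print(versioned_name("filename", "pdf", on=date(2018, 11, 28)))

# Without zero-padding, v10 sorts before v2; with it, order matches versions.
print(sorted(["f_v10.pdf", "f_v2.pdf", "f_v1.pdf"]))
print(sorted(["f_v10.pdf", "f_v02.pdf", "f_v01.pdf"]))
```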

Minor revisions can generally be ignored once the next major version is created. Archive or delete them as appropriate to avoid cluttering your folders. In contrast, the original or raw version of your data should always be kept as a reference.

Advanced Version Control

In research projects, and especially in collaborative work, it is often useful to record what changes were made in each version of a file. A version control table records who did what and when. It can be embedded in the file itself (in headings, notes or metadata), or it can take the form of an attached spreadsheet or readme file.
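
As a rough sketch of such a table kept as a separate spreadsheet, the Python snippet below appends one row per version to a CSV file. The function name log_version, the column names and the file name in the usage note are assumptions made for illustration; adapt them to whatever your project already records.

```python
import csv
from datetime import date
from pathlib import Path
from typing import Optional

# Assumed columns for the table: which version, when, by whom, what changed.
FIELDS = ["version", "date", "author", "changes"]


def log_version(table: Path, version: str, author: str, changes: str,
                on: Optional[date] = None) -> None:
    """Append one row to a CSV version control table, creating it if needed."""
    is_new = not table.exists()
    with table.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()  # write the header only once
        writer.writerow({
            "version": version,
            "date": (on or date.today()).isoformat(),
            "author": author,
            "changes": changes,
        })
```

For example, log_version(Path("filename_versions.csv"), "v02-01", "gp", "Comments on the methods section") adds a dated row for the first minor revision of version 2. A plain CSV file is used here because it opens in any spreadsheet program.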

Version control is sometimes built into the software you are using. Word, for example, can keep a history of modifications in each file (Track Changes). If you store your data on cloud services such as Dropbox, Google Drive or Amazon S3, these can also keep version histories and backups, typically for a few weeks or months depending on the service and its settings.

Dedicated version control software such as Git and Apache Subversion (SVN) is designed to manage and record different versions of files. While these tools are too cumbersome for most researchers, they can be useful in a data-heavy context such as research in economics.