
Research Data Management

Choosing an Open Data Repository


Original illustration (cropped) CC By 4.0 Ainsley Seago

What is an open data repository?

First things first: putting your data as a .zip file on your website is not enough. Of course, it is better than nothing, but you will miss out on many of the excellent possibilities offered by the specialised infrastructure generally called open data repositories. Your data will be less durable, findable and usable than it would be on an appropriate platform.

As the Registry of Research Data Repositories (RE3data.org) shows, there is a very large number of open repositories you could choose from to share your data. Some are dedicated to a specific field, while others are general-purpose. Some are located in Switzerland, some are commercial, and the licences available differ from one repository to the next.

Like other Swiss universities, the Graduate Institute has not created its own repository at this time. This does not mean that we cannot help you choose one for your project. Below is a short selection of repositories we can heartily recommend.

Major FAIR Repositories

Zenodo

Website: https://zenodo.org/

Owner: CERN in Geneva, which created the initiative and hosts the data in its own data centre. If you want to share your data through a Swiss-European non-commercial platform with a large user base, this might be it.

Price: Free (donations welcome).

Volume: 50 GB per dataset, multiple datasets allowed. Larger datasets are possible upon request (limited to ~100 GB).

Access parameters: Open, Embargoed, Restricted, Closed.

Licence parameters: Creative Commons 4.0 (CC By, CC By-SA, CC By-ND, CC By-NC, CC By-NC-ND), or custom licence if you choose "restricted access".

Services: DOI, Versioning, ORCID, GitHub synchronization, OpenAIRE indexing, Grant referencing (SNSF not included yet).

Durability: CERN guarantees 20 years of data conservation. In the event of closure, they will migrate the data to another suitable repository.

Yareta

Website: https://yareta.unige.ch/

Owner: University of Geneva, DLCM project on a mandate from SwissUniversities.

Price: Free up to 50 GB for Geneva institutions (including IHEID). The price for larger volumes depends on size, preservation duration, number of copies, etc.; the current rate is CHF 100.- per terabyte per year.
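To get a rough idea of what a larger dataset might cost, the figures above can be turned into a small estimate. This is only a sketch under stated assumptions, not the official price calculator: the function name `estimate_yareta_cost` is hypothetical, a terabyte is taken as 1000 GB, the whole volume is assumed to be billed once the free limit is exceeded, and the number of copies and other factors mentioned above are ignored. Please check the actual price with the service before budgeting.

```python
FREE_LIMIT_GB = 50        # free volume for Geneva institutions (from the text above)
CHF_PER_TB_YEAR = 100     # indicative rate quoted in the text above
GB_PER_TB = 1000          # assumption: decimal terabytes


def estimate_yareta_cost(size_gb: float, years: int) -> float:
    """Rough cost in CHF for preserving size_gb gigabytes for a number of years."""
    if size_gb < 0 or years < 0:
        raise ValueError("size_gb and years must be non-negative")
    if size_gb <= FREE_LIMIT_GB:
        return 0.0
    # Assumption: the full volume is billed at a flat rate per terabyte per year;
    # number of copies and other pricing factors are not modelled here.
    return size_gb / GB_PER_TB * CHF_PER_TB_YEAR * years


if __name__ == "__main__":
    # Example: 2 TB preserved for 10 years
    print(estimate_yareta_cost(2000, 10))
```

Under these assumptions, a dataset of 50 GB or less costs nothing, and the cost grows linearly with both the volume and the chosen duration.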

Volume: Not limited.

Access parameters: Open, Restricted, Closed, Embargo.

Licence parameters: Creative Commons 4.0 (CC0, CC By, CC By-SA, CC By-ND, CC By-NC, CC By-NC-ND).

Services: DOI, ORCID.

Durability: Unlike some of the other repositories listed here, this is a new service built specifically for data preservation. You can choose a preservation duration based on your needs (5, 10 or 20 years). Metadata is preserved indefinitely, even after the data is deleted.

Harvard Dataverse

Website: https://dataverse.harvard.edu/

Owner: Harvard University. This repository runs on the Dataverse software created by Harvard's Institute for Quantitative Social Science (IQSS). This software is also used by other institutions around the world.

Price: Free.

Volume: 2.5 GB per dataset.

Access parameters: Open, Restricted (access request).

Licence parameters: CC0 "public domain dedication", or custom licence parameters.

Services: DOI, API, Versioning, ORCID, Dataverse (i.e. a set of datasets).

Durability: Unknown but probably long-term.

Dryad

Website: https://datadryad.org/

Owner: Non-profit repository built by Oxford University and other stakeholders.

Price: USD 120 publishing charge per dataset. Please note that this is not considered a commercial repository by the Swiss National Science Foundation (SNSF), meaning that an SNSF project could use project funds to cover this platform's cost.

Volume: 20 GB per dataset. Additional cost of USD 50 per additional 10 GB.
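The price and volume rules above combine into a simple calculation, sketched below to help with budgeting. The function name `estimate_dryad_cost` is hypothetical, and the sketch assumes that any started block of 10 GB beyond the first 20 GB is billed as a full block, which the figures above do not specify. Treat the result as an estimate only.

```python
import math

BASE_FEE_USD = 120     # publishing charge per dataset (from the text above)
INCLUDED_GB = 20       # volume covered by the publishing charge
EXTRA_FEE_USD = 50     # charge per additional block
EXTRA_BLOCK_GB = 10    # size of one additional block


def estimate_dryad_cost(size_gb: float) -> int:
    """Estimated one-off cost in USD for a dataset of size_gb gigabytes."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    extra_gb = max(0.0, size_gb - INCLUDED_GB)
    # Assumption: a started 10 GB block is billed as a full block.
    extra_blocks = math.ceil(extra_gb / EXTRA_BLOCK_GB)
    return BASE_FEE_USD + extra_blocks * EXTRA_FEE_USD


if __name__ == "__main__":
    # Example: a 35 GB dataset needs two additional blocks
    print(estimate_dryad_cost(35))
```

Under this rounding assumption, a 35 GB dataset is 15 GB over the included volume, so it is charged the publishing fee plus two additional blocks.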

Access parameters: Open, Embargoed (1 year after publication).

Licence parameters: CC0 public domain only.

Services: DOI, APIs, Versioning, ORCID.

Durability: Unknown.

Figshare

Website: https://figshare.com/

Owner: Private company supported by Digital Science (Holtzbrinck, also majority shareholder of Springer Nature).

Price: Free. Please note that this is considered a commercial repository by the Swiss National Science Foundation (SNSF), meaning that an SNSF project will not be allowed to use funds to cover open data preparation or storage costs for this platform.

Volume: "Unlimited" public data, 20 GB private data. Maximum file size 5 GB.

Access parameters: Open, Private.

Licence parameters: CC By 4.0, CC0 public domain, GPL, MIT, Apache 2.0.

Services: DOI, API, Versioning, ORCID.

Durability: Unknown. Please note that this repository specialises in data sharing rather than data preservation.
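If the size of your dataset is the deciding factor, the volume figures listed above can be compared in a few lines of code. The sketch below is only an illustration: the names `Limits`, `DEFAULT_LIMITS` and `candidates` are hypothetical, the repositories are keyed by the hostnames of the websites listed above, and only the default limits are encoded. Fees, the larger Zenodo quota available on request, and the Figshare limit on private data are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    max_dataset_gb: Optional[float] = None  # None: no per-dataset cap stated above
    max_file_gb: Optional[float] = None     # None: no per-file cap stated above


# Default limits transcribed from the descriptions above.
DEFAULT_LIMITS = {
    "zenodo.org": Limits(max_dataset_gb=50),
    "yareta.unige.ch": Limits(),                  # volume not limited (fees may apply)
    "dataverse.harvard.edu": Limits(max_dataset_gb=2.5),
    "datadryad.org": Limits(),                    # 20 GB included, more at extra cost
    "figshare.com": Limits(max_file_gb=5),        # "unlimited" public data
}


def candidates(dataset_gb: float, largest_file_gb: float) -> list:
    """Return the repositories whose default limits accommodate the dataset."""
    result = []
    for site, limits in DEFAULT_LIMITS.items():
        if limits.max_dataset_gb is not None and dataset_gb > limits.max_dataset_gb:
            continue
        if limits.max_file_gb is not None and largest_file_gb > limits.max_file_gb:
            continue
        result.append(site)
    return result


if __name__ == "__main__":
    # Example: a 30 GB dataset whose largest file is 8 GB
    print(candidates(dataset_gb=30, largest_file_gb=8))
```

Size is of course only one criterion. Licences, access parameters, durability and SNSF eligibility, as described above, matter just as much when choosing.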

Other Options

There are many other repositories you can use, especially discipline-specific repositories. You can find most of them on RE3data.org. As long as they offer good metadata management and a persistent identifier (DOI/ARK), they are usually acceptable options.

Existing Swiss repositories include FORSbase, the repository of the Swiss national centre of expertise in the social sciences, located at the University of Lausanne and funded by the SNSF. It can host any quantitative dataset by Switzerland-based researchers and specialises in survey datasets.

The Swiss Data and Service Center for humanities (DaSCH) offers services for complex qualitative datasets exploiting linked data. Its costs are non-negligible, but they can be covered by SNSF grants.

The SwissUniversities DLCM project also intends to propose a national repository sometime in 2020, offering 20 GB of free storage, a DOI and 10+ years of guaranteed conservation. Other parameters are unknown at this time.

If you have any favourite disciplinary repository we should know about, please let us know! Send an e-mail to researchdata@graduateinstitute.ch. Feel free to also contact us if you need any additional information on repositories in general.