Open Data
Open Research Data
Open research data benefits both original researchers and the broader research community.
The benefits of open data
- Increases transparency and trust in your findings
- Reduces barriers to research, allowing others to verify, reuse, or build on your work
- Forms part of the scholarly record along with publications, reports, and other products of research that are increasingly recognized in research assessment
- Can raise the profile of the research, leading to more citations and greater impact and recognition
- Helps meet funder and publisher requirements
- Is easier to do than you might think, especially with trusted repositories
“I have to share all of my data”
Most data policies only require you to share or deposit data that directly supports research findings. Nearly all open data requirements make exceptions for confidential or proprietary data.
“Open data isn’t useful unless it’s big”
Even small or partial datasets can be useful for verification or replication, or may be combined with other data for some types of secondary research.
“Open means that anyone can do anything with my data”
Many repositories allow you to choose licenses and access levels with which you are comfortable.
“It’s too complicated to publish data”
Repositories provide support to guide you through the process. Collecting data with the intent to make it open at the end can greatly reduce the amount of work required to make your data publication-ready.
“There’s no benefit to sharing qualitative data”
While some qualitative methodologies depend on direct relationships with participants, qualitative data may help readers better understand the research findings and lead to novel secondary uses.
“Data sharing provisions will make it harder to recruit participants”
Many participants want their contributions to have the widest possible impact on research. Others can opt out of data sharing when providing consent.
Getting Started with Open Research Data
Quick guide to FAIR data
You don’t need to be a data scientist to meet FAIR principles. Here are the basics to get you started.
FAIR Principle | Key Step |
---|---|
Findable | Deposit data in a searchable repository that assigns a DOI |
Accessible | Choose open access or, if restricted, clearly described access conditions |
Interoperable | Use open or widely available formats like CSV, TXT, or R |
Reusable | Add a README file, license, and information about how the data was collected |
What to deposit
Identify the data that supports your research findings and include any information that another researcher would need to validate your findings or reuse your data. Most datasets include at least some of the following:
- Clean data files in open formats (like CSV, TXT, R, JSON) or widely used proprietary formats (like XLSX, SAV)
- Code or scripts
- Documentation like data dictionaries and codebooks
- Survey instruments or protocols
- Consent forms
Where to deposit
Choose a repository that fits your data type, size, and discipline. There are a number of good options to choose from:
Repository | Why deposit here? |
---|---|
MRU Data Repository | Locally managed and suitable for most research outputs |
FRDR | Accepts large data files that are highly discoverable |
Dryad or OpenICPSR | Used for discipline-specific, well-curated datasets |
A Deeper Dive Into Open Data
Understanding FAIR Data
To ensure your data has the greatest impact, adhere as closely as possible to the FAIR guidelines, which outline characteristics that data and metadata should have in order to be optimally Findable, Accessible, Interoperable, and Reusable. FAIRness is a joint effort of researchers, data curators, and repository managers, with each helping to ensure that data and metadata are clearly described and readable by both humans and machines. The FAIR principles are inclusive of all types of research data, even data that is restricted due to sensitivity or intellectual property concerns.
Findable
Make your data easy to locate by assigning an identifier, writing good metadata, and using searchable repositories.
- Ensure your dataset has a unique identifier. Choose a repository that assigns a persistent identifier (like a DOI) so your data can be reliably cited and found over time.
- Describe the dataset with rich metadata. Include details like title, author(s), keywords, abstract, and subject area so your data can be easily indexed and discovered.
- Deposit in a searchable repository. Choose a repository (like MRU Data Repository or FRDR) that is indexed by search engines and research platforms.
- Use clear, standardized titles. Avoid acronyms or unclear shorthand—use full, descriptive titles that others can understand.
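As a rough illustration of the "rich metadata" step, the sketch below checks a draft metadata record for the descriptive fields listed above before deposit. The field names and the sample record are hypothetical; each repository defines its own schema and required fields.

```python
# Hypothetical field names for illustration; real repositories define their own schema.
REQUIRED_FIELDS = ("title", "authors", "keywords", "abstract", "subject")


def missing_metadata(record: dict) -> list:
    """Return the required fields that are absent or empty in a metadata record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Treat None, blank strings, and empty lists as missing.
        if value is None or value == [] or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


# Made-up example record with two gaps (no keywords, blank abstract).
record = {
    "title": "Survey of first-year student study habits, 2023",
    "authors": ["A. Researcher"],
    "keywords": [],
    "abstract": "  ",
    "subject": "Education",
}
print(missing_metadata(record))  # prints ['keywords', 'abstract']
```

A check like this is only a convenience for the depositor; the repository's own deposit form remains the authority on which fields are required.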
Accessible
Let others know how to access your data—whether it's open or has conditions for use.
- Enable open access when possible. Whenever appropriate, allow users to download the data directly, without needing special tools, software or permissions.
- Explain any access restrictions. Clearly describe who can access restricted data, why it's restricted, and how someone can request access.
- Keep metadata visible. Ensure the dataset’s descriptive information remains accessible even if the data files are not openly shared.
Interoperable
Use widely accepted formats and metadata standards so others (and machines) can use your data.
- Use open or widely supported file formats. Prefer formats like CSV, TXT, JSON, or XML. Avoid proprietary formats unless they are standard in your field.
- Apply metadata standards. Choose a formal metadata schema (e.g. Dublin Core, DDI) suitable for your discipline and data type.
- Define variables clearly. Document units of measure, column names, codes, and values to make your data interpretable by others.
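One way to approach the "define variables clearly" step is to start a data dictionary from the column names of a CSV file. The following is a minimal sketch using only the Python standard library; the function name, the sample data, and the `UNDOCUMENTED` placeholder are assumptions made for illustration, not part of any repository's tooling.

```python
import csv
import io


def build_data_dictionary(csv_text: str, descriptions: dict) -> list:
    """Build one entry per column: name, description, units, and count of empty values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        info = descriptions.get(name, {})
        entries.append({
            "variable": name,
            # Flag columns that still need a human-written description.
            "description": info.get("description", "UNDOCUMENTED"),
            "units": info.get("units", ""),
            "missing_values": sum(1 for row in rows if not (row[name] or "").strip()),
        })
    return entries


# Made-up sample data and descriptions.
sample = "participant_id,height_cm,smoker\nP01,172,1\nP02,,0\n"
descriptions = {
    "participant_id": {"description": "Anonymized participant code"},
    "height_cm": {"description": "Standing height", "units": "centimetres"},
}
dictionary = build_data_dictionary(sample, descriptions)
for entry in dictionary:
    print(entry)
```

Here the `smoker` column would be flagged as `UNDOCUMENTED`, a reminder to describe its codes (for example, what `0` and `1` mean) before deposit.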
Reusable
Make your dataset easy to understand and legally usable for others who were not involved in your project.
- Provide detailed documentation. Include a README file, data dictionary, codebook, and any contextual information needed for reuse.
- Add a clear reuse license. Apply a license (such as CC BY) to let others know how they can use your dataset legally.
- Follow disciplinary norms. Format and structure your data according to what is typical in your field to support trust and adoption.
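A README can be as simple as a plain-text file that states what the dataset is, who created it, how it may be reused, and what each file contains. The sketch below assembles such a file from a few values; the layout, function name, and sample values are assumptions for illustration, and many repositories and disciplines publish their own README templates.

```python
def render_readme(title, creators, license_name, collected, files):
    """Return plain-text README content describing a dataset and its files."""
    lines = [
        f"Dataset: {title}",
        f"Creators: {', '.join(creators)}",
        f"License: {license_name}",
        f"Data collection: {collected}",
        "",
        "Files:",
    ]
    for name, description in files.items():
        lines.append(f"  {name}: {description}")
    return "\n".join(lines) + "\n"


# Made-up values for illustration.
readme_text = render_readme(
    title="Survey of first-year student study habits, 2023",
    creators=["A. Researcher", "B. Colleague"],
    license_name="CC BY 4.0",
    collected="Online questionnaire, September to December 2023",
    files={
        "responses.csv": "Cleaned survey responses, one row per participant",
        "codebook.csv": "Variable names, descriptions, and value codes",
    },
)
print(readme_text)
```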
Publishing Your Data
Publishing FAIR data requires consideration of both the structure of the data and the repository. The MRU Data Repository incorporates several elements that will help you make your data FAIR, including persistent identifiers, licensing, metadata fields that conform to standards across disciplines, and integration with discovery tools. Other elements of FAIRness may be incorporated by depositors prior to and at the time of deposit.
Documentation
For data to be reusable, it must be well-documented. Documentation should include the contexts of data collection, including dates and locations, methods, sample or population, participant consent, and any other information that a secondary user would need to accurately use the data. Documentation may also include codebooks, data dictionaries, and instructions for using specialized software or code. If possible, documentation should be shared even if data files are restricted.
File Formats and Structures
Using open, non-proprietary formats helps ensure preservation of data files. Common open formats include TXT, CSV, JSON, XML, and JPEG. If it’s not possible to use open formats, try to use formats that are widely used in your discipline. The MRU Data Repository will automatically convert tabular data from Excel, SPSS, Stata, and R to an open TAB format, when possible, so that users may download data in either the original or open format.
The format or structure of the data depends on its type and disciplinary norms, but it should be as clean as possible and easily decipherable by a disciplinary expert. That may mean defining headings and units of measure, standardizing data formatting, removing or masking identifiers, and addressing null values.
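The cleaning steps mentioned above (standardizing headings, masking identifiers, and addressing null values) can be sketched in a few lines of standard-library Python. Everything specific here is an assumption for illustration: the null markers, the column names, the salt, and the choice of a salted hash. A salted hash is pseudonymization rather than anonymization, so check your ethics approval and consent terms before relying on it.

```python
import csv
import hashlib
import io

# Assumed null conventions; adjust to whatever your data actually uses.
NULL_MARKERS = {"", "na", "n/a", "null", "-999"}


def clean_header(name: str) -> str:
    """Lower-case a column heading and join its words with underscores."""
    return "_".join(name.strip().lower().split())


def mask_identifier(value: str, salt: str) -> str:
    """Replace an identifier with a short salted hash (pseudonymization only)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:10]


def clean_csv(csv_text: str, id_column: str, salt: str) -> str:
    """Standardize headings, blank out null markers, and mask the identifier column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [clean_header(h) for h in next(reader)]
    id_index = header.index(id_column)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        cleaned = ["" if cell.strip().lower() in NULL_MARKERS else cell.strip() for cell in row]
        if cleaned[id_index]:
            cleaned[id_index] = mask_identifier(cleaned[id_index], salt)
        writer.writerow(cleaned)
    return out.getvalue()


# Made-up raw export with inconsistent headings and two different null markers.
raw = "Student ID, Height (cm) ,Smoker\nS001,172,N/A\nS002,-999,0\n"
cleaned_text = clean_csv(raw, id_column="student_id", salt="project-secret")
print(cleaned_text)
```

Keeping the cleaning in a script, rather than editing files by hand, also gives you a record of each transformation that can be deposited alongside the data.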
Metadata
When depositing data, fill in all relevant metadata fields that describe the dataset and its component files. The MRU Data Repository allows depositors to add variable-level metadata for some types of data. Variable-level metadata greatly improves discoverability of datasets and allows users to explore the data prior to download.
Data Repositories
Repository choice may be based on data type, discipline, or personal preference. Some repositories accept only specific types of data (e.g. GenBank, PANGAEA), while others are more general in scope. The following repositories support FAIR data deposit.
Name | Type | Good For | Key Features | Limitations | Access Model |
---|---|---|---|---|---|
MRU Data Repository | Institutional | Most MRU researchers | | 5 GB file size limit, multiple files permitted | |
Federated Research Data Repository (FRDR) | General | Large datasets | | Open datasets only | |
Dryad | General but focus on biological sciences | Biological sciences researchers | | | |
OpenICPSR | Subject - Social, behavioural, and health sciences | Social science data | | No curation resulting in variable quality of data and documentation | |
Open Science Framework | General | Collaborative projects | | 5 GB limit for private projects | |
Open Research Data Policies
Many research funding agencies and academic publishers now require or encourage researchers to make their data openly available. This section outlines the most relevant policies and expectations for researchers preparing grants or manuscripts.
Funder Policies
Tri-Agency Research Data Management Policy
The Tri-Agency Research Data Management Policy requires researchers to deposit data underlying publications and pre-prints resulting from agency-funded research.
- Deposits should be made at the time of publication.
- Data is not required to be open, but openness is strongly encouraged when possible.
- As of now, the requirement applies to a limited number of grants, but it is expected to expand by 2026.
Other Funder Policies
- Genome Canada: Data Release and Sharing Policy
- Heart & Stroke Foundation: Open Access to Research Outputs Policy
- National Institutes of Health: Data Management and Sharing Policy
Publisher Policies
Publisher data policies usually include terms for both open data and data availability statements.
Open Data
A few things to know about journal data policies:
- Open data requirements of publishers vary widely from journal to journal, ranging from encouragement to requirement that authors make the data underlying articles openly accessible.
- Some publishers, like Taylor & Francis and Wiley, use tiered open data policies, with journals under the banners of those publishers typically adopting one of those tiered policies. For example, Taylor & Francis’ Basic Data Sharing Policy only encourages data sharing, while its Open & FAIR Policy comes with strict data sharing and licensing requirements.
- Open data policies may only apply to certain types of articles and most journals will make exceptions for reasons related to confidentiality or intellectual property.
Please Note
Researchers should review journal data policies early in the research process, ideally before applying for ethics approval or submitting manuscripts. This ensures that your data sharing plans align with the journal’s requirements and helps avoid unexpected barriers later.
Data Availability Statements
Data availability statements are a common requirement when submitting an article to a journal. Data availability statements require authors to indicate if the data underlying the article’s findings will be shared and, if so, how. When writing data availability statements, consider:
- If you are using a repository, include the name of the repository and a persistent link to the dataset.
- If data will be available only by request, specify access procedures (e.g. prior ethics approval) and conditions of use (e.g. particular types of research).
- If you plan to share data only by request, consider archiving the data in a repository that permits access restrictions (like the MRU Data Repository). This approach ensures your data is securely preserved and discoverable, while still allowing you to control who can access the data and under what conditions.
- If data will not be shared, provide reasons (e.g. absence of participant consent).
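The considerations above map onto a small number of statement patterns. The sketch below drafts a statement for each case; the wording, the function name, and the placeholder link are all hypothetical, and your journal's own template takes precedence.

```python
def availability_statement(repository=None, link=None,
                           access_conditions=None, reason_not_shared=None):
    """Draft a data availability statement for one of the common sharing scenarios."""
    if repository and link:
        text = f"The data supporting this article are available in {repository} at {link}."
        if access_conditions:
            text += f" Access is restricted: {access_conditions}."
        return text
    if access_conditions:
        return f"The data are available from the authors on request, subject to {access_conditions}."
    if reason_not_shared:
        return f"The data are not shared because {reason_not_shared}."
    raise ValueError("Provide a repository and link, access conditions, or a reason for not sharing.")


# Placeholder link for illustration only; use the persistent link assigned to your dataset.
open_statement = availability_statement(
    repository="the MRU Data Repository",
    link="https://doi.org/10.xxxx/example",
)
closed_statement = availability_statement(
    reason_not_shared="participants did not consent to data sharing",
)
print(open_statement)
print(closed_statement)
```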
Cambridge University Press Example Data Availability Statements
Finding Research and Government Open Data
See the Statistics and Data subject guide for lists of resources or contact your subject librarian.