Open Data
Research Data Management
Research data management is a general term describing how to organize, safeguard, and preserve data gathered in the course of research. Research data may become inaccessible due to destruction or loss, technical obsolescence, or poor organization and documentation. Good data management practices help ensure that research data is preserved for follow-up, replication, or further study.
Elements of RDM Include:
Organizing Your Data
Naming Your Research Data Files
Research projects can generate hundreds of data files. Short descriptive file names and a simple file hierarchy make these files easier to navigate and locate. Set up conventions for your project, document them for all team members, and be consistent.
Recommended conventions:
Include dates in file names, using YYYYMMDD format.
This format will allow you to sort your files chronologically.
Include an abbreviated identifier when possible
Abbreviations help reduce the size of file names. Meanings of abbreviations should be shared with the research team.
Very briefly describe the contents of the file
Use brief, clear language (e.g. 'questionnaire')
Avoid spaces or special characters in file names
Use underscores or capitals to separate words. Spaces and special characters do not always translate well between software types.
Use version numbers and/or dates within file names
These make it easier to keep track of the sequence and development of research documents and files. Use one or two leading zeros in version numbering (e.g. v001) so that versions sort in order.
Keep folder structures simple and folder names clear
Simpler structures speed up backups and make files easier to find.
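The conventions above can be sketched in a few lines of Python. This is only an illustration, not a required tool: the function name build_file_name, the identifier "srv", and the order of the name parts (date, identifier, description, version) are assumptions made for the example. Adapt them to whatever your team agrees on and documents.

```python
import re
from datetime import date


def build_file_name(identifier: str, description: str, version: int,
                    extension: str, when: date) -> str:
    """Assemble a name such as 20240315_SRV_Questionnaire_v002.csv.

    The part order (date, identifier, description, version) is an
    assumption for this sketch, not a fixed standard.
    """
    # YYYYMMDD dates sort chronologically when sorted as plain text.
    date_part = when.strftime("%Y%m%d")
    # Keep only letters and digits in the project abbreviation.
    id_part = re.sub(r"[^A-Za-z0-9]", "", identifier).upper()
    # Capitalize each word; spaces and special characters are dropped.
    words = re.findall(r"[A-Za-z0-9]+", description)
    desc_part = "".join(word[:1].upper() + word[1:] for word in words)
    # Leading zeros keep v002 ahead of v010 in a sorted listing.
    version_part = f"v{version:03d}"
    ext = extension.lstrip(".").lower()
    return f"{date_part}_{id_part}_{desc_part}_{version_part}.{ext}"


names = [
    build_file_name("srv", "questionnaire", 10, "csv", date(2024, 3, 15)),
    build_file_name("srv", "questionnaire", 2, "csv", date(2024, 3, 15)),
    build_file_name("srv", "interview notes!", 1, "txt", date(2023, 11, 2)),
]
for name in sorted(names):
    print(name)
```

Sorting the generated names as plain text lists the 2023 file first and places version 2 ahead of version 10, which is the reason for the YYYYMMDD date and the leading zeros.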
Documentation
Clearly document the steps you take throughout the research process. Documentation supports shared understanding among research teams, helps researchers recall the details of the methods and procedures of the research, and provides context if research data will be put to reuse or further analysis.
Documentation can be stored in numerous ways, but one of the best methods is to include a text file in folders containing research data and other research documents.
Consider documenting the following:
- The background and context of the research project, including research team members
- Data collection methods at a very granular level
- Structure of files
- Procedures for data checking and validation
- Any modifications made to data
- Confidentiality and permissions
- Names of labels and variables
- Explanations of codes and classifications
Documentation should be as clear as possible. Will you or anybody else be able to decipher the research ten years down the road?
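One way to make the text-file approach routine is to generate a README skeleton whenever a new data folder is created. The sketch below assumes a file named README.txt and uses the checklist above as section headings. The function name write_readme_template and the example folder and project title are hypothetical.

```python
import tempfile
from pathlib import Path

# Section headings taken from the documentation checklist above.
README_SECTIONS = [
    "Background and context (including research team members)",
    "Data collection methods",
    "Structure of files",
    "Procedures for data checking and validation",
    "Modifications made to data",
    "Confidentiality and permissions",
    "Names of labels and variables",
    "Explanations of codes and classifications",
]


def write_readme_template(folder: Path, project_title: str) -> Path:
    """Create README.txt in folder, leaving any existing README untouched."""
    folder.mkdir(parents=True, exist_ok=True)
    readme = folder / "README.txt"
    if readme.exists():
        # Never overwrite documentation someone has already written.
        return readme
    lines = [project_title, "=" * len(project_title), ""]
    for section in README_SECTIONS:
        lines += [section, "-" * len(section), "TODO", ""]
    readme.write_text("\n".join(lines), encoding="utf-8")
    return readme


if __name__ == "__main__":
    # Demonstrate in a temporary folder so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_readme_template(Path(tmp) / "survey_data",
                                     "Example Survey Project")
        print(path.read_text(encoding="utf-8"))
```

Plain text is used here because it can be opened without special software, which fits the goal of documentation that is still readable ten years on.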
Metadata
For very large research projects, you might consider using an established metadata standard to describe the entire project, subsets of the project, or individual files. A metadata standard is simply a structured way of describing certain elements of the project or dataset.
Metadata standards vary, but many data repositories, disciplines, and organizations have developed specific metadata standards. For example:
- Darwin Core describes biological diversity by providing reference definitions, examples, and commentaries
- DDI (Data Documentation Initiative) describes data in the social and behavioural sciences
- CASRAI (Consortia Advancing Standards in Research Administration Information) describes research administration information
The UK Digital Curation Centre (DCC) maintains a comprehensive list of metadata standards to help you find the most appropriate standard for your research data: http://www.dcc.ac.uk/resources/metadata-standards.
If you are deciding which metadata standard to use, consult your Subject Librarian.
Preserve and Store Data
Preserve
By taking steps to safeguard your data for long-term storage, you ensure that the data will be available for further study, reuse, or replication. The best way to ensure that your data are preserved is to deposit your data files in an established repository. Established repositories usually include automatic backup and preservation procedures.
Before you are ready to deposit your data, there are a few things you might want to consider to ensure your data are safe and sound. These include:
- Choosing open or widely used file formats
- Storage and backup of your working files
- Security of your files
File Formats
Not all file formats are created equal, and it does matter which file formats you use. Try accessing a Microsoft Works file saved on a floppy disk. Although the hardware and software still exist to open the file, it isn't something that most of us could do on a whim.
While no file format is immune from obsolescence, there are characteristics to look for in a file format that will improve the odds that your data will be readable in the future.
Choose Open Formats
Open file formats are formats for which the specifications have been made publicly available. That means that any software developer, proprietary or not, can use the format. Many open file formats are so widely used that it's hard to imagine they will ever become completely obsolete. Some examples include:
- plain text
- csv
- html
- zip
- gif
Choose Widely Used Proprietary Formats
If an open file format is not appropriate or available, save your files in widely used proprietary formats. Some proprietary formats, like those used in the Microsoft Office suite of products for example, will be readable for a very long time, if only because of their ubiquity.
Choose Less Lossy Formats
Many file formats compress data when a file is saved in order to reduce its size. This is not a concern for text-based documents, but for video, images, and sound, lossy compression can reduce fidelity. If the resolution of your data is important to your research, explore file formats that compress files the least or that use lossless compression. The trade-off is that your files will be much larger and could be more challenging to store and back up.
Recommended File Formats
The following formats are recommended for long-term preservation of your data. The format you choose to work in might also depend on your discipline and particular research needs.
- Databases: XML, CSV
- E-Books: EPUB
- Images: JPG, PNG, PDF, TIFF, BMP
- Sound: MP3, FLAC
- Text: TXT, CSV, PDF/A, ASCII, UTF-8
- Video: MPG, MOV, AVI
- Spreadsheets: CSV
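As an illustration of working with an open format, the sketch below saves tabular data as UTF-8 encoded CSV using Python's standard csv module and reads it back. The helper names save_as_csv and load_csv, the column names, and the sample rows are all made up for the example.

```python
import csv
import tempfile
from pathlib import Path


def save_as_csv(rows, fieldnames, path):
    """Write a list of dictionaries to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def load_csv(path):
    """Read the file back as a list of dictionaries (values are strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Hypothetical sample data, including a non-ASCII character.
    sample = [
        {"participant": "P001", "site": "Montréal", "score": 42},
        {"participant": "P002", "site": "Calgary", "score": 37},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "scores.csv"
        save_as_csv(sample, ["participant", "site", "score"], target)
        print(load_csv(target))
```

CSV stores every value as text, so numbers come back as strings. Recording the intended type of each column in your documentation avoids ambiguity later.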
Storage and Security
It's a good idea to keep a few copies of your data files in separate locations and to back them up throughout the life of your research project. No storage solution is perfect, but by putting your data in multiple places, you can reduce the likelihood of complete loss of your data, whether accidental or intentional.
Consider the following storage options:
- Portable storage (flash drives, external hard drives, etc.)
- Pros: Portable storage is inexpensive and secure from unauthorized remote access
- Cons: Storage devices are easily lost, stolen, or damaged
- Bonus tip: Data on portable storage devices should be encrypted to help prevent unauthorized access
- Cloud Storage (Google Drive, Dropbox, etc.)
- Pros: Major cloud storage solutions are usually very secure. They simplify collaboration among research teams.
- Cons: Although they are quite secure, they are connected to networks and could be subject to unauthorized remote access. There is no guarantee that the cloud storage option you choose will exist tomorrow.
- Bonus tip: With cloud options, your data could reside in a data centre outside of Canada and be subject to the laws of the host country.
- Internal Network Storage (e.g. MRU H: Drive)
- Pros: Likely stored on local servers and protected under Canadian law.
- Cons: Even with robust security, internal storage may be subject to unauthorized access
If your research involves human subjects, check with your REB about preferred storage solutions.
Of course, depositing a copy of your data in an institutional or subject repository is the best way of ensuring that a copy will be available.
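For working files, the advice above to keep copies in separate locations can be paired with a simple integrity check. The sketch below copies a file to several folders and compares SHA-256 checksums to confirm each copy matches the original. The function names and folder names are hypothetical, and it assumes Python 3.9 or later. It does not encrypt anything, so it is not a substitute for the security measures described above.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy source into each destination folder and verify every copy."""
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / source.name
        shutil.copy2(source, copy)  # copy2 also preserves timestamps
        results[copy] = sha256_of(copy) == expected
    return results


if __name__ == "__main__":
    # Temporary folders stand in for, e.g., an external drive and a
    # network share; substitute your real locations.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "20240315_SRV_Questionnaire_v001.csv"
        data.write_text("participant,score\nP001,42\n", encoding="utf-8")
        report = back_up(data, [root / "external_drive", root / "network"])
        for copy, ok in report.items():
            print(copy.name, "in", copy.parent.name, "verified:", ok)
```

Storing the checksum alongside the data also lets you confirm later that a file has not been altered or corrupted since it was backed up.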
Find Research Data
Data Searching
The following search tools and repositories can help you find research data.
Multi-repository search tools:
- Google Dataset Search
Searches a wide range of data and statistics repositories, including major research data repositories like Dryad and Figshare.
- Mendeley Data
Searches numerous research data repositories for data in a wide range of formats.
- DataCite
Data registry that allows searching across registered datasets.
- Borealis, The Canadian Dataverse Repository
Searches across Canadian institutional data repositories.
Individual Repositories
Other Search Methods
Consider the following when searching for research data:
Did you find out about the data through an associated output (e.g. journal article, conference paper)?
- Go to the website of the publication (not necessarily through a library database - Google the journal title)
- Browse through back issues until you find the associated article or paper. Is the dataset listed as supplementary material?
- Check for a link somewhere on the publication's website about research data. Does the journal have a data publishing policy? Is there a repository of datasets associated with that publication?
- Search ICPSR's Bibliography of Data-related Literature, a list of publications based on data contained in ICPSR
- If you know of major repositories in that discipline, quickly search within them for the title of the article. Don't go too far down that rabbit hole, though. Only search the largest two or three repositories that you can think of
- Google (not Google Scholar) the title of the article within quotation marks and add the word data (outside of the quotation marks)
Are the data part of a large, ongoing project?
- See if the project has an associated website
- Search re3data for the name of the project. There might be a repository associated with the project
Are you looking for research data on a particular topic?
- Go to re3data and either browse through the list of repositories by discipline or, if your topic is broad, search for your topic. Keep in mind that re3data is a registry of repositories. When you are searching, you are searching for repositories, not datasets. Once you've identified potentially useful repositories, search within them for data on your topic
- Talk to your librarian
Planning and Data Archiving
Data Management Plans
It's a good idea to make a plan for the collection, documentation, storage, security, preservation, and access of your research data at the beginning of your project. Some research funders require you to submit such a plan as part of grant applications.
The Data Management Plan (DMP) Assistant is a tool created by the Portage Network to help you organize and document your plans for your research data. Simply create an account, start a new project, and provide responses to questions around your intentions for your data. Once you've completed the outline, a final DMP can be exported and appended to a grant application and/or shared with your research team.
- Data Management Plan Catalogue
From the Association of European Research Libraries, a sample of data management plans across disciplines.
- Portage Training Resources
Includes exemplar data management plans used for digital humanities and mixed methods research.
Data Archiving Requirements
It is becoming increasingly common for funding agencies and publishers to require that researchers make their data openly available. Take a close look at the policies of intended publications or granting agencies at the beginning of your research project for open data requirements.
Some notable funding agencies and publishers with open data policies:
- Tri-Agency Statement of Principles on Digital Data Management
- The Tri-Agency states that publicly funded research data should be open
- Some CIHR and SSHRC grants require that research data be published in an open repository. Expect these policies to be broadened in the near future
- Public Library of Science (PLOS)
- PLOS requires all authors publishing in its journals to make their data open
- Genome Canada
- All funding recipients are expected to make research data open
- Springer Nature
- Uses four policies on data sharing ranging from open data encouragement to requirement
- Science
- Requires authors to deposit data in a repository before publication
- National Institutes of Health
- Requires grant recipients to make data openly available when possible. Applications for grants larger than $500,000 require submission of a plan for data sharing
Making your data open
If you are required to make your research data open, check with the publisher or granting agency to determine if they have preferred repositories or restrictions on methods of publication. Most organizations are flexible, allowing deposit in any repository that is publicly accessible.
Three common places to archive your data are:
- Subject repositories
- These are usually managed by an association within a particular discipline and tend to contain data within very narrow disciplinary categories
- Multidisciplinary repositories
- There are numerous large multidisciplinary repositories that accept data within broader disciplinary categories (e.g. Dryad, Figshare)
- Institutional repositories
- Many research universities now have research data repositories that accept data from all local disciplines. Data can often be published as supplementary material to other outputs (e.g. peer-reviewed articles) in other institutional repositories
Contact your librarian for additional support for archiving your research data.
Help
Links to Introductory Information
MANTRA – University of Edinburgh, series of training modules
Research Data Management DataGuide – University of British Columbia