Research data management: Preserve and store data

Preserve

By taking steps to safeguard your data for long-term storage, you ensure that the data will be available for further study, reuse, or replication. The best way to ensure that your data are preserved is to deposit your data files in an established repository. Established repositories usually include automatic backup and preservation procedures.

Before you are ready to deposit your data, there are a few things to consider to keep your data safe. These include:

  • Choosing open or widely used file formats
  • Storage and backup of your working files
  • Security of your files

File formats

Not all file formats are created equal, and it does matter which file formats you use. Try accessing a Microsoft Works file saved on a floppy disk. Although the hardware and software still exist to open the file, it isn't something that most of us could do on a whim.

While no file format is immune from obsolescence, there are characteristics to look for in a file format that will improve the odds that your data will be readable in the future.

Choose Open Formats

Open file formats are formats for which the specifications have been made publicly available. That means that any software developer, whether building proprietary or open-source software, can support the format. Many open file formats are so widely used that it's hard to imagine they will ever become completely obsolete. Some examples include:

  • plain text
  • csv
  • html
  • zip
  • gif
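As a rough illustration of working in an open format, the sketch below writes tabular data as UTF-8 encoded CSV using only Python's standard library. The function names, the column names, and the sample rows are hypothetical and chosen only for this example; adapt them to your own data.

```python
import csv
import io


def rows_to_csv_text(header, rows):
    """Serialize a header and rows as CSV text (hypothetical helper)."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)  # default dialect: comma-separated, minimal quoting
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def save_csv(path, header, rows):
    """Write the CSV text to disk as UTF-8 so other tools can read it."""
    # newline="" stops Python from altering the line endings the csv module wrote.
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(rows_to_csv_text(header, rows))


# Example data (invented for illustration only).
text = rows_to_csv_text(["site", "temp_c"], [["A", 21.5], ["B, north", 19.0]])
print(text)
```

Because CSV is plain text, the resulting file can be opened in a text editor, a spreadsheet program, or most statistical packages without any special software.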

Choose Widely Used Proprietary Formats

If an open file format is not appropriate or available, save your files in widely used proprietary formats. Some proprietary formats, such as those used by the Microsoft Office suite, are likely to remain readable for a long time, if only because of their ubiquity.

Choose Less Lossy Formats

Many file formats compress data when a file is saved in order to reduce storage space. Some compression is lossless, meaning the original data can be restored exactly. Other compression is lossy, meaning some information is permanently discarded. This is rarely a concern for text-based documents, but for video, images, and sound, lossy compression can reduce fidelity. If the resolution of your data is important to your research, explore file formats that use lossless compression or compress files the least. The trade-off is that your files will be much larger and could be more challenging to store and back up.
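The difference between compression that preserves data and compression that discards it can be shown with a small sketch. The example below uses Python's standard zlib module for lossless compression, and a simple rounding function as a stand-in for lossy compression. The rounding function and the sample values are assumptions made for illustration; real image, sound, and video formats use far more sophisticated methods.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and then decompress; the result is identical to the input."""
    return zlib.decompress(zlib.compress(data))


def lossy_quantize(samples, step):
    """Round each sample to the nearest multiple of `step`.

    A hypothetical stand-in for lossy compression: the rounded values
    need less detail to store, but the originals cannot be recovered.
    """
    return [round(s / step) * step for s in samples]


samples = [0, 3, 7, 12, 18, 26]  # invented measurements
raw = bytes(samples)

print(lossless_roundtrip(raw) == raw)   # True: nothing was lost
print(lossy_quantize(samples, 10))      # [0, 0, 10, 10, 20, 30]
```

Once the values have been rounded, there is no way to tell that the second sample was originally 3 rather than 0, which is why lossy formats are a poor choice when fine detail matters.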

Recommended File Formats

The following formats are recommended for long-term preservation of your data. The format you choose to work in might also depend on your discipline and particular research needs. 

Databases: XML, CSV

E-Books: EPUB

Images: JPG, PNG, PDF, TIFF, BMP

Sound: MP3, FLAC

Text: TXT (ASCII or UTF-8 encoded), CSV, PDF/A

Video: MPG, MOV, AVI

Spreadsheets: CSV

Storage and security

It's a good idea to keep a few copies of your data files in separate locations and to back them up throughout the life of your research project. No storage solution is perfect, but by putting your data in multiple places, you can reduce the likelihood of losing your data completely, whether through accident or deliberate action.
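One possible way to keep several copies and confirm that each copy is intact is sketched below, using only Python's standard library. The function names and the folder names in the usage note are hypothetical. The SHA-256 checksum is simply one common way to confirm that two files have identical contents.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy `source` into each destination folder and verify every copy.

    Hypothetical helper: raises OSError if a copy does not match the original.
    """
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copies contents and file metadata
        if sha256_of(target) != expected:
            raise OSError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies
```

For example, `back_up("survey.csv", ["E:/backup", "H:/project"])` would copy a hypothetical survey.csv file to an external drive and a network drive, assuming those locations exist on your computer. Running the same check periodically can also reveal whether a stored copy has become corrupted over time.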

Consider the following storage options:

  • Portable storage (flash drives, external hard drives, etc.)
    • Pros: Portable storage is inexpensive and secure from unauthorized remote access
    • Cons: Storage devices are easily lost, stolen, or damaged
    • Bonus tip: Data on portable storage devices should be encrypted to help prevent unauthorized access
  • Cloud Storage (Google Drive, Dropbox, etc.)
    • Pros: Major cloud storage solutions are usually very secure. They simplify collaboration among research teams.
    • Cons: Although they are quite secure, they are connected to networks and could be subject to unauthorized remote access. There is no guarantee that the cloud storage option you choose will exist tomorrow.
    • Bonus tip: With cloud options, your data could reside in a data centre outside of Canada and be subject to the laws of the host country.
  • Internal Network Storage (e.g. MRU H: Drive)
    • Pros: Likely stored on local servers and protected under Canadian law.
    • Cons: Even with robust security, internal storage may be subject to unauthorized access.

If your research involves human subjects, check with your Research Ethics Board (REB) about preferred storage solutions.

Of course, depositing a copy of your data in an institutional or subject repository is the best way of ensuring that a copy will be available.