Choose the best file formats
Guidance on selecting file formats for long-term accessibility and interoperability
This page lists the file formats which are recommended for depositing in Edinburgh DataShare and Edinburgh DataVault. If you have a suggested update or a question about any of this advice, the Research Data Service team will be delighted to hear from you.
Introduction: why your choice of file format is important
Longevity of your research data
To keep your data accessible and usable by the broadest possible audience over the long term, the Research Data Support team encourages you to deposit your research data in standard preservation file formats.
The digital preservation community recommends standard preservation formats because either:
- they encode the information in a way that is software-independent or allows interoperability between systems and applications. Often, these formats are a recognised standard, or published in an open format. Some of these file formats might be proprietary but can be opened in different operating systems and with different programs or applications.
or
- the information is encoded with a lossless algorithm and for that reason there is no data loss when files are ‘saved as’ and stored in these formats.
We recommend that any files you deposit in Edinburgh DataShare, DataVault or an external repository should, where possible, be in open, platform-independent or non-proprietary file formats.
Formats we recommend
File formats that the Research Data Support team supports and recommends are listed below.
Textual documents
- Adobe PDF /A (filename extension: ".pdf")
- Text (filename extensions: ".txt", ".asc", ".sts")
- OpenDocument - text (".fodt", ".odt")
- Microsoft Word XML (".docx")
Tabular data
- Comma separated values (CSV) (".csv")
- Tab separated values (".tsv", ".tab")
- OpenDocument - spreadsheet (".fods", ".ods")
- Microsoft Excel XML (".xlsx")
Images
- JPEG 2000 (".jpxml", ".jp3d", ".jpf", ".jpm", ".jpx", ".jp2")
- TIFF (".tiff", ".tif")
- PNG (".png")
- JPEG * (".jpg", ".jpeg")
- Scalable Vector Graphics (SVG) (".svg")
Audio / Video
- AIFF (".aiff", ".aif", ".aifc")
- WAV (".wav")
- Free Lossless Audio Codec (FLAC) (".flac")
- MPEG-4 (".m4v", ".m4r", ".m4b", ".m4p", ".m4a", ".mp4")
- Motion JPEG2000 (".mjp2", ".mj2")
Geo-spatial data
- Shapefile (".shp", ".shx", ".dbf", ".prf")
- GeoTIFF (".tif")
Other
- PostScript (".ps")
- Structured Query Language (SQL) (".sql")
- OpenDocument - presentation (".fodp", ".odp")
- Microsoft PowerPoint XML (".pptx")
- SAS syntax (".sas")
- SPSS syntax (".sps")
- Stata syntax (".do", ".dct")
- Minitab syntax and output (".lis", ".tj")
- R (ASCII, as opposed to the .rdata saved workspace file) (".r")
- XML (Extensible Markup Language) (".xml", ".sgml")
- HTML (Hypertext Markup Language) (".htm", ".html")
- CSS (Cascading Style Sheets) (".css")
- NetCDF (Network Common Data Form) (".nc")
* N.B. While the JPEG format is supported, depositors should be aware that we consider JPEG 2000 and TIFF (both being standard preservation formats) to be more interoperable over the long term than JPEG. Depositors who value long-term sustainability may wish to add to their deposit copies of their images converted to JPEG 2000.
Other acceptable file formats
File formats such as those listed below have been deposited in the repository but are not considered standard preservation formats, because they are proprietary, depend on a particular system, software or version, are lossy (i.e. data are lost when compression is applied), or are less commonly encountered than the formats listed above.
Most of these formats are widely used and it is likely we will be able to preserve them, but we cannot guarantee it. If you have files in these formats, you may deposit them in Edinburgh DataShare.
- BED .bed
- bedGraph .bg
- DBase, DBF .dbf
- EAF File .eaf
- Encapsulated PostScript (EPS) .epsi, .epsf, .eps
- FLT .flt
- HDF (Hierarchical Data Format) .he4, .h5, .hdf4, .h4, .hdf, .he5
- LAB .lab
- Mathematica .nbp, .nb
- MatLab code .m
- ML source code file .ml
- MTRANS file .mtr
- Photo CD .pcd
- PSC .psc
- PFSX File .pfsx
- PITCH File .pitch, .PITCH
- PitchTier File .pitchtier, .PITCHTIER
- RESULTSMFC File .RESULTMFC
- TextGrid .textgrid, .TextGrid
- VTK (Visualisation ToolKit) .vtu
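As a rough illustration of how the two lists above might be applied to a deposit folder, the sketch below sorts filenames by extension. The function names `classify` and `summarise` are hypothetical, and the two sets are only illustrative subsets of the lists, not the complete guidance. An extension check is also only a first pass: it cannot tell, for example, whether a ".pdf" file is really PDF/A.

```python
from pathlib import PurePath

# Illustrative subsets of the two lists above (an assumption of this
# sketch); extend them to match the full guidance before relying on them.
RECOMMENDED = {".pdf", ".txt", ".odt", ".docx", ".csv", ".tsv", ".ods",
               ".xlsx", ".jp2", ".tif", ".tiff", ".png", ".svg", ".wav",
               ".flac", ".mp4", ".xml", ".nc"}
ACCEPTABLE = {".bed", ".bg", ".eaf", ".h5", ".hdf", ".nb", ".m",
              ".textgrid", ".vtu"}


def classify(filename: str) -> str:
    """Return 'recommended', 'acceptable' or 'unlisted' for a filename."""
    # Compare in lower case so ".TextGrid" and ".textgrid" match alike.
    ext = PurePath(filename).suffix.lower()
    if ext in RECOMMENDED:
        return "recommended"
    if ext in ACCEPTABLE:
        return "acceptable"
    return "unlisted"


def summarise(filenames):
    """Group filenames by their classification."""
    groups = {"recommended": [], "acceptable": [], "unlisted": []}
    for name in filenames:
        groups[classify(name)].append(name)
    return groups
```

Files that end up in the "unlisted" group are the ones worth checking against the conversion advice on this page, or raising with the team.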
Formats which should be converted
Converting research data files from proprietary or software-dependent formats to a standard preservation format will help to avoid difficulties opening these files in the future. By using standard preservation formats, you are maximising the likelihood that most future potential users will be able to open the files.
If your research data include any of the following file formats, we recommend you convert them to the suggested standard preservation format, where it is possible to do this without compromising (i.e. losing or altering) the data. The converted files should then be deposited along with the original files.
Textual documents
Format name (original file) | Convert to (recommended preservation format(s)) |
---|---|
Encapsulated PostScript (EPS) (".epsi", ".epsf", ".eps") | TIFF |
RTF (".rtf") | OpenDocument format, Microsoft Word XML, PDF or plain text |
LaTeX (".ltx", ".latex") | Deposit .pdf files alongside these. |
TeX (".tex") | Deposit .bib and .pdf files alongside this. |
TeX dvi (".dvi") | |
WordPerfect (".w51", ".wp5", ".wp", ".wpd") | OpenDocument Format, Microsoft Word, plain text or PDF |
Tabular data
Format name (original file) | Convert to (recommended preservation format(s)) |
---|---|
MatLab binary data files (.mat) | CSV or plain text |
Microsoft Access (.mdb) | If practicable, export to multiple tables e.g. CSV, Excel and/or tab-delimited format. |
SPSS: we recommend that SPSS users deposit both syntax files and data files. Syntax files should be deposited in the .sps format, as generated automatically by SPSS. SPSS data, system and output files should be handled as follows: | |
SPSS portable file (contains data) (filename extension: .por) | Deposit .sps (syntax) and .csv (data) files alongside these. |
SPSS binary data file (.sav, .gsav, .zsav) (aka system file) | Deposit .sps (syntax) and .csv (data) files alongside these. |
SPSS output file (.spv, .spo) | Convert to text, HTML or PDF, and deposit alongside these. |
Images
Format name (original file) | Convert to (recommended preservation format(s)) |
---|---|
BMP (".ddb", ".dib", ".bmp") | TIFF / JPEG-2000 |
NIfTI (.img, .hdr, .nii) | It may be worth exporting a selection of still 2-D images as TIFF files for accessibility. |
Photoshop (.psd, .pdd) | TIFF / JPEG-2000 |
GIF (".gif") | TIFF / JPEG-2000 |
Audio / Video
Format name (original file) | Convert to (recommended preservation format(s)) |
---|---|
Audio (.au, .snd) | FLAC |
MPEG (.mpeg, .mpg, .mpe) | MPEG-4 |
Video Quicktime (.qtm, .mov, .qt) | MPEG-4 |
MPEG Audio (.m4a, .mpa, .abs, .mpega) | FLAC |
Flash Video (.f4b, .f4a, .f4p, .f4v, .flv) | MPEG-4 |
AVI Audio/Video Interleaved Format (".avi") | MPEG-4 |
Ogg Vorbis Codec Compressed Multimedia File (".ogg") | FLAC |
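One possible way to carry out the audio and video conversions in the table above is the command-line tool ffmpeg, which this guidance does not itself mention and is assumed here purely for illustration. The sketch below only builds the command and does not run it. The function name `ffmpeg_command` is hypothetical, and the extension sets are a subset of the table. ffmpeg's default settings re-encode the stream, so converted files should be checked against the originals before deposit.

```python
from pathlib import PurePath

# Subsets of the source extensions in the table above, grouped by the
# recommended target format (an assumption of this sketch).
AUDIO_TO_FLAC = {".au", ".snd", ".mpa", ".ogg"}
VIDEO_TO_MP4 = {".mpeg", ".mpg", ".mpe", ".mov", ".qt", ".flv", ".f4v", ".avi"}


def ffmpeg_command(source: str):
    """Build (but do not run) an ffmpeg command for a source file.

    Returns None when the extension is not covered by this sketch.
    """
    path = PurePath(source)
    ext = path.suffix.lower()
    if ext in AUDIO_TO_FLAC:
        target = path.with_suffix(".flac")
    elif ext in VIDEO_TO_MP4:
        target = path.with_suffix(".mp4")
    else:
        return None
    # "-n" tells ffmpeg never to overwrite an existing output file;
    # ffmpeg infers the output format from the target extension.
    return ["ffmpeg", "-n", "-i", str(path), str(target)]
```

If ffmpeg is installed, the returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command separately makes it easy to review what will be run before any files are written.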
Compression archives
WARNING for DataVault depositors: files deposited into the DataVault are encrypted. We strongly discourage compression of data destined for deposit in the DataVault since, in combination with the encryption, this adds a considerable risk of the data becoming irretrievable over the long term.
Format name (original file) | Convert to (recommended preservation format(s)) |
---|---|
Compressed Archive File (".zip") | If practicable, expand the archive and submit individual files for ease of navigation, as long as the number of files is fewer than 200. N.B. Mac users: we have found that .zip files larger than 4 GB created using the Mac's built-in zip functionality cannot be opened on other platforms. We therefore ask Mac users to use an alternative such as GNU zip (gzip) for archives of that size. |
BZIP2 (.bz2, .bz) | If practicable, expand the archive and submit individual files for ease of navigation, as long as the number of files is fewer than 200. |
GZIP compressed archive file (".gz") | If practicable, expand the archive and submit individual files for ease of navigation, as long as the number of files is fewer than 200. |
Tarball (.tar, .tgz) | If practicable, expand the archive and submit individual files for ease of navigation, as long as the number of files is fewer than 200. |
RAR compression archive (".rar") | Zip or tarball instead |
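To decide whether a .zip archive falls under the 200-file threshold mentioned in the table above, the archive's contents can be counted without extracting them. The following is a minimal sketch using Python's standard `zipfile` module. The names `count_archive_files`, `should_expand` and `MAX_FILES_TO_EXPAND` are hypothetical, and the in-memory archive is demonstration data only.

```python
import io
import zipfile

MAX_FILES_TO_EXPAND = 200  # threshold taken from the guidance above


def count_archive_files(archive) -> int:
    """Count the files (not directory entries) in a zip archive.

    `archive` may be a path or a seekable file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return sum(1 for info in zf.infolist() if not info.is_dir())


def should_expand(archive) -> bool:
    """True when the archive holds fewer files than the threshold."""
    return count_archive_files(archive) < MAX_FILES_TO_EXPAND


# Small demonstration using an in-memory archive with three files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for i in range(3):
        zf.writestr(f"data/file_{i}.csv", "a,b\n1,2\n")
buffer.seek(0)
print(should_expand(buffer))  # True
```

The same functions accept a path such as a local .zip file in place of the in-memory buffer. Other archive types in the table would need their own handling, for example the standard `tarfile` module for tarballs.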
Getting help
If you have research data in file formats that you are unsure about, need help converting your files to standard preservation formats, or simply want to discuss your needs with us, please contact us via the Contact box above.