
Requesting data out of the TRE

What is allowed out (TRE data export policy)?

Individual-level data are not allowed out of the TRE. All data-out requests are reviewed by the Genes & Health core team to make sure they do not contain individual-level data.

Please keep files simple: plain text (e.g. .txt, .csv, .tsv) or figures (e.g. .pdf, .png, .jpg). PowerPoint, Excel and Word formats are also acceptable.

Binary files

We cannot review binary files, such as R data files, Parquet, Feather or Arrow files; these will be rejected.
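Before requesting a download, you may want to check your export folder for files a reviewer would see as binary. The function below is a hypothetical helper, not a TRE-provided tool. It assumes GNU grep, whose `-I` option treats a file containing NUL bytes as binary. It skips the figure and Office formats the policy allows, which are binary by nature.

```shell
# Hypothetical helper: list files under a folder that look binary and are
# not one of the figure/Office formats the export policy allows.
list_unreviewable_files() {
  find "$1" -type f | while IFS= read -r f; do
    case "$f" in
      # Allowed formats (figures, Office documents) are binary by nature
      *.pdf|*.png|*.jpg|*.jpeg|*.pptx|*.xlsx|*.docx) continue ;;
    esac
    # Empty files are not binary, but grep would report "no match" for them
    [ -s "$f" ] || continue
    # Heuristic: with LC_ALL=C, GNU grep -I treats a file as binary when it
    # detects NUL bytes, and then reports no match (non-zero exit status)
    if ! LC_ALL=C grep -qI '' "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, `list_unreviewable_files /genesandhealth/red/my_export` prints nothing if every file is plain text or an allowed format.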

Warning

Code files can sometimes contain non-text data, e.g. encoded images embedded in the outputs of .ipynb notebooks. Such files will cause your download request to be rejected, so please check that your code files contain text only.
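A quick way to find affected notebooks is to search for embedded image outputs. In the .ipynb JSON these appear as "image/png", "image/jpeg", etc. keys. The function below is a hypothetical helper and a heuristic, not an official check.

```shell
# Hypothetical helper: list .ipynb notebooks under a folder that contain
# embedded image outputs (JSON keys such as "image/png" or "image/jpeg").
# Heuristic: any occurrence of the string "image/ in the file is flagged.
list_notebooks_with_images() {
  find "$1" -type f -name '*.ipynb' -exec grep -l '"image/' {} +
}
```

If Jupyter's nbconvert is available in your environment, a flagged notebook's outputs can be cleared with `jupyter nbconvert --clear-output --inplace my_notebook.ipynb`. Run it on a copy if you want to keep the outputs.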

There is no problem with text files being very large.

Facilitating download approvals

To speed up review of your request, please make it easier for the review system by providing:

  • a single file type (e.g. just PDF, not both PDF and PNG)
  • one large data file rather than many data files
  • all files in one flat folder rather than many subfolders
  • a .zip or .tar.gz archive of your files, created before you request data out
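To get a single file type in one flat folder, you could copy the files you want out of a nested results tree. The function and its argument names below are hypothetical. Files whose names would collide in the flat folder are reported rather than overwritten.

```shell
# Hypothetical helper: copy every file with one extension from a nested
# folder tree into a single flat folder, ready to archive for export.
# Keep DEST outside SRC, otherwise find may pick up the copied files.
flatten_for_export() {
  src=$1    # top of the nested results folder
  dest=$2   # flat folder to create/fill
  ext=$3    # file extension to collect, e.g. pdf
  mkdir -p "$dest"
  find "$src" -type f -name "*.$ext" | while IFS= read -r f; do
    target="$dest/$(basename "$f")"
    if [ -e "$target" ]; then
      # Same filename in two subfolders: report it instead of overwriting
      echo "skipping $f: $target already exists" >&2
    else
      cp "$f" "$target"
    fi
  done
}
```

For example, `flatten_for_export ~/results ~/export_pdfs pdf` gathers all PDFs from `~/results` and its subfolders into `~/export_pdfs`.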

Creating a .zip or .tar.gz file

If you are trying to download multiple files, please do not make a separate download request for each file. Instead, create a tar archive containing the requested files. If the files are large (more than 10 MB in total), please compress the tar file.

For example, to compress a folder into a .tar.gz file:

tar -czvf backup.tar.gz /home/ivm/directory-of-files-to-export

This says: “Create (c option) a gzip-compressed (z option) archive of my directory-of-files-to-export folder, show me what’s happening (v option), and name it (f option) backup.tar.gz.”

See the How to Tar a File in Linux: Commands, Examples & Best Practices guide for more details (external unverified link)
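The steps above can be combined into one function. It gzip-compresses only when the folder exceeds roughly 10 MB and lists the archive contents so you can check them before requesting the download. It is a hypothetical helper. It uses `-C` so that the archive stores paths relative to the folder's parent rather than absolute paths. It also assumes GNU tar (or bsdtar), which detects gzip compression automatically when listing.

```shell
# Hypothetical helper: archive a folder for a data-out request, compressing
# only if it is larger than ~10 MB, then list what went into the archive.
make_export_archive() {
  dir=${1%/}   # folder to archive (any trailing slash removed)
  out=$2       # archive name without extension, e.g. my_results
  # du reports disk usage in KB, which is close to (not exactly) file size
  size_kb=$(du -sk "$dir" | cut -f1)
  if [ "$size_kb" -gt 10240 ]; then
    archive="$out.tar.gz"
    # -C: change to the parent folder so member names are relative
    tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")"
  else
    archive="$out.tar"
    tar -cf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")"
  fi
  echo "Created $archive"
  # List the contents; compression is detected automatically when reading
  tar -tvf "$archive"
}
```

For example, `make_export_archive /home/ivm/directory-of-files-to-export backup` creates `backup.tar` or `backup.tar.gz` in the current folder.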

Summary statistics

Summary statistics (e.g. by gene, variant or disease), graphs, etc. are all usually fine.

For small numbers of individuals, we will apply inference control (as advised by the Information Commissioner's Office). Specifically, any count from 1 to 5 has the number replaced by the text “1to5”.
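If you want your summary tables to already follow this rule before you request them, the sketch below applies it to one count column of a tab-separated file with a header row. It is an illustrative helper for checking your own output, not the Genes & Health team's implementation. The function name and the single-column assumption are hypothetical.

```shell
# Hypothetical helper: replace counts of 1-5 in one column of a
# tab-separated file with "1to5". The header row is kept as-is; zero and
# counts above 5 are left unchanged.
mask_small_counts() {
  file=$1   # tab-separated input with a header row
  col=$2    # 1-based index of the count column
  awk -F'\t' -v OFS='\t' -v c="$col" '
    NR == 1 { print; next }
    $c ~ /^[0-9]+$/ && $c + 0 >= 1 && $c + 0 <= 5 { $c = "1to5" }
    { print }
  ' "$file"
}
```

For example, `mask_small_counts counts.tsv 2 > counts_masked.tsv` masks the second column.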

TRE data export policy for small numbers/counts of individuals

For more information, please read the TRE data export policy document.

Requesting data

You can request a download of any file in the following folders by right-clicking the file and selecting "request file download":

/genesandhealth/red

or

/genesandhealth/pipeline

This sends an automated email to the Genes & Health team. If you have not received a response within 72 hours, please feel free to chase us up. The team will copy the data to green_downloads (accessible only to users of your sandbox). Small files may instead be emailed directly to the email address used to make the request.

Info

Please note that you can make one data out request per week.

The 'Trying to request more than 1 file to download.' error


If you get the 'Trying to request more than 1 file to download.' error, there is probably a space somewhere in your file path or filename. Spaces confuse the system; for example:

  • /genesandhealth/red/Joe Blogs/my_requested_file.tar (space in the Joe Blogs element of the path) or
  • /genesandhealth/red/Joe_Blogs/my requested file.tar (spaces in the filename my requested file.tar)

will cause the error, but /genesandhealth/red/Joe_Blogs/my_requested_file.tar will not.

Note

If you get this error, rename your file and/or copy it into a path with no spaces. Alternatively, tar the files or folders whose names contain spaces into a single archive with a space-free name.
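The two functions below find names containing spaces and rename them with underscores. They are hypothetical helpers that rename in place, so consider running them on a copy of your export folder first. `find -depth` lists a folder's contents before the folder itself, so files are renamed before their parent folder's path changes.

```shell
# Hypothetical helper: list files/folders under a folder whose name has a space.
find_paths_with_spaces() {
  find "$1" -name '* *'
}

# Hypothetical helper: rename those files/folders, replacing spaces with "_".
# -depth processes a folder's contents before the folder itself.
replace_spaces_with_underscores() {
  find "$1" -depth -name '* *' -exec sh -c '
    for path in "$@"; do
      dir=$(dirname "$path")
      new=$(basename "$path" | tr " " "_")
      if [ -e "$dir/$new" ]; then
        # Do not overwrite an existing file or folder with the new name
        echo "not renaming $path: $dir/$new already exists" >&2
      else
        mv "$path" "$dir/$new"
      fi
    done
  ' sh {} +
}
```

For example, `replace_spaces_with_underscores ~/export_copy` turns `~/export_copy/my results/my requested file.tar` into `~/export_copy/my_results/my_requested_file.tar`.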

Tip

Adopt a Linux/Unix file-system mindset and, where possible, avoid spaces in paths and filenames: /this_will/make/things-a-lot/simpler_v0.1.txt.

Existing data

A number of files in library-green are available for download without making a request.

Accessing TRE data from external systems/internet

Users can download data from greendownloads or library-green using the gcloud storage command-line tool on Linux.

Alternatively, you can use the web interface for your sandbox-specific green-downloads bucket. The link for each sandbox is given in the table below:

| Sandbox | Link to green-downloads bucket |
|---|---|
| Sandbox 1 - QMUL + WSI Core Team Desktop | https://console.cloud.google.com/storage/browser/qmul-production-sandbox-1_greendownloads |
| Sandbox 2 - External Academic Desktop | https://console.cloud.google.com/storage/browser/qmul-production-sandbox-2_greendownloads |
| Sandbox 3 - GSK Desktop | https://console.cloud.google.com/storage/browser/qmul-production-sandbox-3_greendownloads |
| Sandbox 4 - BMS Desktop | https://console.cloud.google.com/storage/browser/qmul-production-sandbox-4_greendownloads |
| Sandbox 5 - MSD Desktop | https://console.cloud.google.com/storage/browser/qmul-production-sandbox-5_greendownloads |
| Sandbox 6 - Takeda Desktop | https://console.cloud.google.com/storage/browser/qmul-production-sandbox-6_greendownloads |
| Sandbox 7 - Pfizer Desktop | https://console.cloud.google.com/storage/browser/qmul-production-sandbox-7_greendownloads |
| Sandbox 8 - S00050_FFAIR-PRS Desktop | https://console.cloud.google.com/storage/browser/qmul-production-sandbox-8_greendownloads |
| Sandbox 9 - Maze Therapeutics Desktop | https://console.cloud.google.com/storage/browser/qmul-production-sandbox-9_greendownloads |
| Sandbox 10 - Novo Nordisk Desktop | https://console.cloud.google.com/storage/browser/qmul-production-sandbox-10_greendownloads |
| Sandbox 11 - University of Exeter | https://console.cloud.google.com/storage/browser/qmul-production-sandbox-11_greendownloads |
| Sandbox 13 - AstraZeneca | https://console.cloud.google.com/storage/browser/qmul-production-sandbox-13_greendownloads |
| Sandbox 14 - External Academic, Consortium access | https://console.cloud.google.com/storage/browser/qmul-production-sandbox-14_greendownloads |
| Sandbox 15 - 5 Prime Sciences | https://console.cloud.google.com/storage/browser/qmul-production-sandbox-15_greendownloads |
| Sandbox 16 - Sandbox 16 | https://console.cloud.google.com/storage/browser/qmul-production-sandbox-16_greendownloads |
| Sandbox 17 - Academic, NHS Digital access | https://console.cloud.google.com/storage/browser/qmul-production-sandbox-17_greendownloads |

Work from your external system: ideally a Linux server rather than a laptop if you are downloading lots of data (e.g. our GWAS results).

Log in to gcloud with:

gcloud auth login

Log in with the username@genesandhealth.qmul.ac.uk account that you use to access the TRE from your browser. You will probably be asked for two-factor authentication, either via your phone or via a website link.

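Ideally run the following from a multicore Linux server, especially if you are transferring lots of data or files. To list the contents of your green-downloads bucket (Sandbox 2 is shown; substitute your own bucket from the table above):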

gcloud storage ls gs://qmul-production-sandbox-2_greendownloads/  

To download a file from the bucket to your external system, use:

gcloud storage cp gs://<bucket-name>/<file-path> <local-destination-path>
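The bucket names in the table above all follow one pattern, so a small wrapper can build the bucket URL from your sandbox number and download a whole folder with `gcloud storage cp --recursive`. This is a hypothetical convenience sketch. It assumes you have already run `gcloud auth login` and that the pattern holds for your sandbox.

```shell
# Hypothetical helper: build the gs:// URL of a sandbox's green-downloads
# bucket from its number, following the naming pattern in the table.
greendownloads_bucket() {
  printf 'gs://qmul-production-sandbox-%s_greendownloads\n' "$1"
}

# Hypothetical helper: download a whole folder from that bucket.
# --recursive copies the folder and everything inside it into DEST.
download_folder() {
  sandbox=$1   # sandbox number, e.g. 2
  folder=$2    # folder name inside the bucket
  dest=$3      # local destination directory
  mkdir -p "$dest"
  gcloud storage cp --recursive "$(greendownloads_bucket "$sandbox")/$folder" "$dest"
}
```

For example, `download_folder 2 my_results ./tre_downloads` copies `gs://qmul-production-sandbox-2_greendownloads/my_results` into `./tre_downloads`.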