OVPrecision Data Upload
The upload procedure consists of the following five steps, each explained in more detail below.
1. Create a pre-defined folder structure on your local computer
On your local computer, create the following nested folder structure, e.g., in Windows Explorer or Mac Finder.
<Sample-ID>/<Experiment-Date>/<Data-Stage>/
In this structure:
- <Sample-ID> is a placeholder for the unique identifier for each sample.
- <Experiment-Date> is a placeholder for the date of the experiment in YYYYMMDD format, e.g., 20240131, as provided in the Experiment Date field in the metadata. For ctDNA and bkRNA-seq, it should correspond to the Sequencing End Date as provided in the metadata. This date is important for data tracking so please double-check carefully!
- <Data-Stage> represents the processing stage. It can take the following values:
  - raw (corresponding to experiment)
  - derived (corresponding to analysis)
Examples:
myworkingdirectory/O-MABADAB/20250228/raw/
myworkingdirectory/O-MABADAB/20250228/derived/
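If you prefer the command line to a file browser, the same folder structure can be created with mkdir. The following is a minimal sketch, not an official OVPrecision tool: make_sample_dirs is a hypothetical helper name, and the date check only tests for eight digits, not for a valid calendar date.

```shell
# Sketch: create <Sample-ID>/<Experiment-Date>/raw and .../derived.
# make_sample_dirs is a hypothetical helper, not part of any OVPrecision tooling.
make_sample_dirs() {
    sample_id=$1    # e.g. O-MABADAB
    exp_date=$2     # YYYYMMDD, e.g. 20240131

    # Reject anything that is not exactly eight digits.
    case $exp_date in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "Experiment date must be YYYYMMDD: $exp_date" >&2; return 1 ;;
    esac

    # -p creates all missing parent folders in one go.
    mkdir -p "$sample_id/$exp_date/raw" "$sample_id/$exp_date/derived"
}

# Usage, from inside your working directory:
# make_sample_dirs O-MABADAB 20240131
```

If you only have data for one stage, the unused folder can simply be left out or deleted afterwards.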
2. Create metadata files
Up to 3 files describing the metadata must be provided per sample (described below). For each data type and stage, we provide an Excel template, which serves as a reference to illustrate the required structure and column headers. You are not required to use the templates directly, but the file you submit must exactly match their format.
Each template includes a header row followed by two rows showing the expected data types and an example. These two rows should be removed in your submission, which should include only the header and one row with your actual data.
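Before submitting, it can help to confirm that a metadata file really contains only the header and one data row, and that both have the same number of tab-separated columns. The check below is a sketch under those assumptions only; check_metadata_tsv is a hypothetical helper name, and it does not compare the column headers against the templates.

```shell
# Sketch: structural check of a tab-delimited metadata file.
# check_metadata_tsv is a hypothetical helper; it exits non-zero on problems.
check_metadata_tsv() {
    awk -F '\t' '
        # Remember the number of columns in the header row.
        NR == 1 { ncols = NF }
        # Every line must have as many tab-separated fields as the header.
        NF != ncols {
            printf "line %d has %d columns, header has %d\n", NR, NF, ncols
            bad = 1
        }
        END {
            # Exactly two lines are expected: header plus one data row.
            if (NR != 2) {
                printf "expected 2 lines (header + 1 data row), found %d\n", NR
                bad = 1
            }
            exit bad + 0
        }
    ' "$1"
}

# Usage:
# check_metadata_tsv O-MABADAB__QC_metadata.txt && echo "structure looks fine"
```

Note that a trailing empty line is reported as an extra row, so the file should end right after the data row.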
2.1. Sample Summary
For each sample that exists in LabKey, you must provide a file summarizing the steps run for the sample. This requirement applies even to poor-quality samples that are not processed further. In that case, leave all non-applicable columns empty, i.e., you do not need to fill in the experiment and analysis columns; we will automatically assume that these steps also failed. Please refer to Table 1 for how the file should be named and where it should be saved.
Table 1: QC metadata file specification
| Field | Description |
|---|---|
| Template | QC_template.xlsx |
| File format | Tab-delimited text file (.txt or .tsv) |
| File name | <Sample-ID>__QC_metadata.txt (please note the double underscore after the sample ID!) |
| File location | The file must be saved in the <Sample-ID>/<Experiment-Date>/ directory created in step 1, e.g., myworkingdirectory/O-MABADAB/20250228/O-MABADAB__QC_metadata.txt |
2.2. Experiment and Analysis
For all samples for which an experiment and/or analysis was performed, additional metadata files must be provided separately for each step. This data will then be automatically imported into LabKey.
Table 2: Experiment metadata file specification
| Field | Description |
|---|---|
| Template | Please refer to Table 4 |
| File format | Tab-delimited text file (.txt or .tsv) |
| File name | <Sample-ID>__experiment_metadata.txt (please note the double underscore after the sample ID!) |
| File location | The file must be saved in the corresponding experiment folder, <Sample-ID>/<Experiment-Date>/raw, e.g., myworkingdirectory/O-MABADAB/20250228/raw/O-MABADAB__experiment_metadata.txt |
Table 3: Analysis metadata file specification
| Field | Description |
|---|---|
| Template | Please refer to Table 4 |
| File format | Tab-delimited text file (.txt or .tsv) |
| File name | <Sample-ID>__analysis_metadata.txt (please note the double underscore after the sample ID!) |
| File location | The file must be saved in the corresponding analysis folder, <Sample-ID>/<Experiment-Date>/derived, e.g., myworkingdirectory/O-MABADAB/20250228/derived/O-MABADAB__analysis_metadata.txt |
Table 4: Templates for the metadata file for different data types
| Technology | Data Stage | Metadata Template |
|---|---|---|
| Apricot | Raw | apx_raw_template |
| Apricot | Derived | apx_derived_template |
| Bulk RNA-seq | Raw | bkrna_raw_template |
| Bulk RNA-seq | Derived | bkrna_derived_template |
| ctDNA | Raw | ctdna_raw_template |
| ctDNA | Derived | ctdna_derived_template |
| Flow Cytometry | Raw | fc_raw_template |
| Flow Cytometry | Derived | fc_derived_template |
| IMC | Raw | imc_raw_template |
| IMC | Derived | imc_derived_template |
| Pharmacoscopy | Raw | pc_raw_template |
| Pharmacoscopy | Derived | pc_derived_template |
| Targeted NGS (OCCA+) | Raw | occa_raw_template |
| Targeted NGS (OCCA+) | Derived | occa_derived_template |
3. Name data files using pre-defined format
Filenames must follow this structure:
<Sample-ID>-<optional info>__<fixed name>.<ext>
The section before the double underscore starts with the OVPrecision sample ID and may include any additional information separated from the sample ID by a dash. The section after the double underscore is a fixed, standard name and extension that defines the file type and must be consistent across all samples.
For example:
O-MABADAB-my_sample_id1.my_run_id1__01.tiff
O-MABADAB__multiqc_report.html
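The naming rule can be checked mechanically before upload. The sketch below assumes only what is stated above: the name starts with the sample ID, optionally followed by a dash and extra information, then a double underscore, a fixed name and an extension. check_data_filename is a hypothetical helper name, and the check does not know which fixed names are valid for a given data type.

```shell
# Sketch: test a file name against <Sample-ID>[-<optional info>]__<fixed name>.<ext>.
# check_data_filename is a hypothetical helper; returns 0 if the name fits the pattern.
check_data_filename() {
    sample_id=$1
    file_name=$2
    case $file_name in
        # Sample ID directly followed by the double underscore ...
        "$sample_id"__?*.?*) return 0 ;;
        # ... or sample ID, a dash, optional info, then the double underscore.
        "$sample_id"-?*__?*.?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
# check_data_filename O-MABADAB O-MABADAB__multiqc_report.html || echo "bad file name"
```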
4. Calculate md5sums
md5sums allow us to ensure that the files were not corrupted during the transfer from your local machine to LeoMed. Therefore, you need to provide md5sums for all files that you upload.
For detailed instructions on calculating md5sums using a convenient script, refer to this guide. Using the script is optional, and you are free to choose an alternative method if you prefer. You can submit either a single md5sum file that contains checksums for all files in the folder, or provide a separate md5sum file for each individual file.
After step 4, your data directories (raw and/or derived) will contain your data files, a metadata file and one or multiple files with md5sums. For example:
raw/
    <Sample-ID>-<any name>__01.tiff
    <Sample-ID>-<any name>__02.tiff
    <Sample-ID>__experiment_metadata.txt
    checksums.md5
5. Upload files to LeoMed
You should now have a local folder structure that looks, for example, like this:
O-MABADAB/
    20240131/
        O-MABADAB__QC_metadata.txt
        raw/
            O-MABADAB__L001_R1.fastq.gz
            O-MABADAB__L001_R2.fastq.gz
            checksums.md5
            O-MABADAB__experiment_metadata.txt
        derived/
            O-MABADAB__multiqc_report.html
            checksums.md5
            O-MABADAB__analysis_metadata.txt
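A quick pre-upload check of this layout might look like the sketch below. It assumes the file names given in Tables 1 to 3 with the .txt extension, treats raw/ and derived/ as optional, and does not inspect data files or checksums; check_sample_tree is a hypothetical helper name.

```shell
# Sketch: confirm that the expected metadata files are in place.
# check_sample_tree is a hypothetical helper; returns non-zero if a file is missing.
check_sample_tree() {
    sample_id=$1
    exp_date=$2
    base=$sample_id/$exp_date
    status=0

    # The sample summary is expected for every sample.
    if [ ! -f "$base/${sample_id}__QC_metadata.txt" ]; then
        echo "missing: $base/${sample_id}__QC_metadata.txt" >&2
        status=1
    fi

    # raw/ needs the experiment metadata, derived/ the analysis metadata.
    for pair in raw:experiment derived:analysis; do
        stage=${pair%%:*}
        kind=${pair##*:}
        [ -d "$base/$stage" ] || continue
        if [ ! -f "$base/$stage/${sample_id}__${kind}_metadata.txt" ]; then
            echo "missing: $base/$stage/${sample_id}__${kind}_metadata.txt" >&2
            status=1
        fi
    done

    return "$status"
}

# Usage, from inside your working directory:
# check_sample_tree O-MABADAB 20240131 && echo "layout looks complete"
```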
ETH members need to be connected to the ETH network via VPN before starting the upload. External users must make sure they are using an IP address that was allow-listed during the onboarding process.
Instructions are provided for Linux and Mac users only. For Windows users, we highly recommend installing the Windows Subsystem for Linux (WSL), which then enables you to follow the same steps. Please find more details here. Contact us in case this is not a viable option for you.
Initial Configuration
Before you upload data for the first time via the command line (Mac/Linux), you need to go through the following configuration step. This needs to be done only once.
Add the following lines to the file ~/.ssh/config on your local computer. If this folder and/or file does not exist, you need to create it.
# biomed tenant
Host jump-biomed <jumphost-name>
    HostName <jumphost-name>
    IdentityFile <path-to-your-private-key-file>
    User <username>
    Port 22

Host biomed <login-node-name>
    HostName <login-node-name>
    User <username>
    IdentityFile <path-to-your-private-key-file>
    IdentitiesOnly yes
    ProxyJump jump-biomed
The placeholders indicated by <> need to be replaced by actual values specific to your system.
- <username>, <jumphost-name> and <login-node-name> are in the email you received from LeoMed support during the onboarding process. This email includes a command structured as shown below, which shows the actual values to be used for each placeholder:
  ssh -J <username>@<jumphost-name> -l <username> <login-node-name>
- <path-to-your-private-key-file> is the path to the file containing your private ssh key. When you requested access to LeoMed, you had to generate a public-private key pair, and you shared the public key with LeoMed support. Here, you need to provide the path to the matching private key file.
Once this initial setup is complete, you can use the rsync commands in the next section, or log in to LeoMed by running ssh biomed.
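If the ~/.ssh folder or the config file does not exist yet, they can be created from the command line. The sketch below is one way to do this; init_ssh_config is a hypothetical helper name, and the restrictive permissions (700 for the folder, 600 for the file) are the commonly recommended values rather than a LeoMed requirement stated here.

```shell
# Sketch: create ~/.ssh and ~/.ssh/config if they are missing.
# init_ssh_config is a hypothetical helper; the optional argument replaces $HOME.
init_ssh_config() {
    home_dir=${1:-$HOME}
    mkdir -p "$home_dir/.ssh" &&
    chmod 700 "$home_dir/.ssh" &&
    # touch leaves the content of an existing config file untouched.
    touch "$home_dir/.ssh/config" &&
    chmod 600 "$home_dir/.ssh/config"
}

# Usage:
# init_ssh_config
# After adding the Host entries, the settings that ssh resolves for the alias
# can be printed without opening a connection:
# ssh -G biomed
```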
Data Upload
From the command-line interface, navigate to the working directory where you created your main folder in step 1.
cd <working directory>
# for example:
cd /home/myprojects/ovprecision/imc
From the command-line interface, run the following command to copy the content of your local folder to your remote folder. Note that there should be NO forward slash ("/") after the local folder name. All required nested folders will be automatically created on LeoMed.
rsync -avzPR <local-data-folder> biomed:/cluster/work/tumorp/ovprecision/data-drop/<technology>/
The <technology> placeholder should be replaced by one of:
- apx
- bkrna
- ctdna
- fc
- imc
- pcy
- occa+
For example:
rsync -avzPR O-MABADAB/20250228 biomed:/cluster/work/tumorp/ovprecision/data-drop/imc/
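Before the real transfer, it can be useful to preview what rsync would copy. The sketch below adds the rsync option -n (--dry-run) to the command above and guards against two easy mistakes: an unknown technology value and a trailing slash on the local folder. The helper names is_valid_technology and upload_dry_run are hypothetical, and the accepted values are read from the list above.

```shell
# Sketch: preview an upload without transferring any data.
# is_valid_technology and upload_dry_run are hypothetical helpers.
is_valid_technology() {
    case $1 in
        apx|bkrna|ctdna|fc|imc|pcy|occa+) return 0 ;;
        *) return 1 ;;
    esac
}

upload_dry_run() {
    local_folder=$1    # e.g. O-MABADAB/20240131, without trailing slash
    technology=$2      # e.g. imc

    if ! is_valid_technology "$technology"; then
        echo "unknown technology: $technology" >&2
        return 1
    fi
    case $local_folder in
        */) echo "remove the trailing slash from: $local_folder" >&2; return 1 ;;
    esac

    # -n lists the files that would be sent but copies nothing.
    rsync -avzPRn "$local_folder" \
        "biomed:/cluster/work/tumorp/ovprecision/data-drop/$technology/"
}

# Usage:
# upload_dry_run O-MABADAB/20240131 imc
```

If the listed files and folders look as expected, run the original command without -n to perform the actual upload.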