Bulk Upload of VCF Files to Opal

Charlene Son-Rigby

 

The bulk_upload.py utility performs high-performance bulk uploads of VCF files directly to Amazon S3. Each VCF file is assigned an Opal genome ID for later reporting.

Important caveats:

  • Bulk VCF upload must be enabled for your Opal workspace. Contact Fabric Genomics customer support to enable this feature.
  • This utility does not run reports automatically.
  • The genome's accession ID is automatically set to the name of the VCF file.
  • The VCF files must be compressed with gzip before uploading.
  • A job ID cannot return more than one genome ID, so only single-sample VCF files are fully supported for tracking the relationship between job ID and genome ID. Multisample VCF files can still be uploaded, but their genome IDs will need to be added manually.
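
The gzip requirement above can be handled in a small preparation step before running the utility. The following is a minimal sketch, not part of bulk_upload.py: the helper name gzip_vcf is hypothetical, and it assumes that standard gzip compression (as produced by Python's gzip module) is what the uploader expects.

```python
import gzip
import shutil
from pathlib import Path


def gzip_vcf(vcf_path):
    """Compress vcf_path to '<name>.gz' in the same directory.

    Hypothetical helper for illustration; the original file is left in place.
    Returns the path of the compressed file.
    """
    src = Path(vcf_path)
    dst = src.with_name(src.name + ".gz")
    # Copy in chunks so large VCF files are never fully loaded into memory.
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

Calling gzip_vcf("one.vcf") would write one.vcf.gz next to the original, ready to be passed to bulk_upload.py.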

Any number of VCF files can be specified on the command line. For example, the following command uploads two VCF files:

$ bulk_upload.py one.vcf.gz two.vcf.gz
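
When a directory holds many files, the argument list can be assembled programmatically instead of typed by hand. This is a rough sketch under stated assumptions: the function build_upload_command is hypothetical, it assumes every file to upload ends in .vcf.gz, and it relies only on the fact stated above that the utility accepts any number of file arguments.

```python
from pathlib import Path


def build_upload_command(directory, script="bulk_upload.py"):
    """Return an argument list that uploads every *.vcf.gz file in directory.

    Hypothetical helper for illustration; 'script' is assumed to be the
    name or path under which bulk_upload.py can be invoked.
    """
    files = sorted(str(p) for p in Path(directory).glob("*.vcf.gz"))
    if not files:
        raise ValueError(f"no .vcf.gz files found in {directory}")
    # One command with all files, mirroring the two-file example above.
    return [script, *files]
```

The resulting list could be handed to subprocess.run(cmd, check=True), assuming bulk_upload.py is executable and on the PATH.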

To save the genome IDs to a TSV file for subsequent processing by other scripts, such as running reports or submitting QC data, redirect the output to a file:

$ bulk_upload.py one.vcf.gz two.vcf.gz > genome_ids.tsv
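
A downstream script might load the saved file along the following lines. This is a sketch only: the function name read_genome_id_rows is hypothetical, and because the column layout of the output is not described here, the code returns raw rows rather than assuming which column holds the genome ID. Inspect genome_ids.tsv to confirm the layout before relying on a specific column.

```python
import csv


def read_genome_id_rows(tsv_path):
    """Read a tab-separated file and return its non-empty rows as lists.

    Hypothetical helper for illustration. No assumption is made about
    which column contains the genome ID or whether a header row exists.
    """
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        # Blank lines come back as empty lists, so filter them out.
        return [row for row in reader if row]
```

Each returned row is a list of strings, one per tab-separated field, which a reporting or QC script can then index once the layout is known.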
