Running Queries: Overview
A core feature of DivBase is that users can check out user-defined subsets of the VCF data and sample metadata contained in their project. In DivBase, checking out data in this way is called a query.
There are three query workflows:
- Query sidecar metadata only
- Query VCF data only
- Combine sidecar metadata filtering with a VCF query
All queries require that the prerequisites described in the next section are fulfilled.
Prerequisites
First, ensure that the VCF files fulfill the requirements described in Working with VCF Files in DivBase and have been uploaded to the DivBase project. Then ensure that the project's VCF dimensions cache is up-to-date by running:
# Submit a job to ensure that the project's dimensions cache is up-to-date
divbase-cli dimensions update --project <PROJECT_NAME>
# To check the progress of the dimensions update job, use
divbase-cli task-history user
To be able to query on sample metadata, you also need to upload a TSV file to the project, as described in the guide on Sidecar Metadata TSV files.
Once the VCF dimensions update job is finished and the dimensions cache of the project is up-to-date, continue with one of the query paths below.
Query paths
divbase-cli query vcf requires exactly one sample-selection mode: --samples, --samples-file, --all-samples, or --tsv-filter. The table below lists each query command and the guide that covers it.
| What you want to do | Command | Full guide |
|---|---|---|
| Find samples/files from metadata TSV | divbase-cli query tsv "<FILTER>" | Sidecar Metadata TSV files |
| Submit a VCF query on a named list of samples | divbase-cli query vcf --samples "S1,S2" --command "view ..." | DivBase VCF query syntax |
| Submit a VCF query on samples listed in a file | divbase-cli query vcf --samples-file samples.txt --command "view ..." | DivBase VCF query syntax |
| Submit a VCF query on all samples in the project | divbase-cli query vcf --all-samples --command "view ..." | DivBase VCF query syntax |
| Use a metadata filter to select samples for a VCF query | divbase-cli query vcf --tsv-filter "<FILTER>" --command "view ..." | DivBase VCF query syntax |
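The rule that a VCF query takes exactly one sample-selection mode can be checked before anything is submitted. The sketch below is only an illustration: build_vcf_query is a hypothetical helper, not part of divbase-cli. It uses only the flags listed in the table, prints the resulting command line instead of running it, and also assumes that --command is always given, as in every example on this page.

```shell
# Hypothetical pre-flight helper (not part of divbase-cli).
# Prints the divbase-cli command line if exactly one sample-selection
# mode was given; otherwise prints an error and returns non-zero.
build_vcf_query() {
  mode_count=0
  mode_args=""
  vcf_command=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --samples|--samples-file|--tsv-filter)
        # These three modes take a value.
        [ "$#" -ge 2 ] || { echo "error: $1 needs a value" >&2; return 2; }
        mode_count=$((mode_count + 1))
        mode_args="$1 \"$2\""
        shift 2 ;;
      --all-samples)
        # This mode is a bare switch.
        mode_count=$((mode_count + 1))
        mode_args="$1"
        shift ;;
      --command)
        [ "$#" -ge 2 ] || { echo "error: $1 needs a value" >&2; return 2; }
        vcf_command="$2"
        shift 2 ;;
      *)
        echo "error: unknown option: $1" >&2
        return 2 ;;
    esac
  done
  if [ "$mode_count" -ne 1 ]; then
    echo "error: expected exactly one sample-selection mode, got $mode_count" >&2
    return 1
  fi
  if [ -z "$vcf_command" ]; then
    echo "error: --command is required by this sketch" >&2
    return 1
  fi
  printf 'divbase-cli query vcf %s --command "%s"\n' "$mode_args" "$vcf_command"
}

# Valid: one mode, so the command line is printed.
build_vcf_query --all-samples --command "view -r 21:15000000-25000000"
```

Calling the helper with two modes at once, for example --all-samples together with --samples "S1,S2", or with no mode at all, prints an error to stderr and returns a non-zero status.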
Minimal examples
# Metadata query only
divbase-cli query tsv "Area:North"
# VCF query — named sample list
divbase-cli query vcf --samples "S1,S2,S10" --command "view -r 21:15000000-25000000"
# VCF query — all samples in the project
divbase-cli query vcf --all-samples --command "view -r 21:15000000-25000000"
# Combined metadata + VCF query
divbase-cli query vcf \
--tsv-filter "Area:North,West" \
--command "view -r 21:15000000-25000000"
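When sample IDs come from a script rather than being typed by hand, the comma-separated value that --samples expects in the examples above can be assembled in the shell. This is a minimal sketch: join_samples is a hypothetical helper, not a DivBase command, and the final line only prints the command it would run.

```shell
# Hypothetical helper: join all arguments into one comma-separated string.
# The body runs in a subshell so the IFS change does not leak out.
join_samples() (
  IFS=,
  printf '%s\n' "$*"
)

sample_list=$(join_samples S1 S2 S10)

# Print the command instead of running it.
echo "divbase-cli query vcf --samples \"$sample_list\" --command \"view -r 21:15000000-25000000\""
```

Printing the command first makes it easy to inspect the sample list before submitting the job.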
What happens after query submission?
- query tsv directly returns the sample IDs and VCF filenames in the project that match the query.
- query vcf submits an asynchronous job and returns a DivBase Task ID.
- On successful VCF jobs, DivBase uploads a results VCF file with the subset data to the project's data storage.
Check VCF job status with:
# By the DivBase Task ID
divbase-cli task-history id <TASK_ID>
# Or list all jobs submitted by the user
divbase-cli task-history user