Quick Start Guide

This guide will walk you through everything you need to start managing and querying your VCF files.

Prerequisites

  • Python 3.12 or higher
  • VCF files and sample metadata in TSV format

Step 1: Create an account

Create an account on DivBase and make sure to verify your email address.

Step 2: Join or create a project

Option 1: Join an existing project

If you have an account, a project member with the manager role can add you to an existing project. Ask them to add you and give them the email address you signed up with.

Managers can see this guide on how to add members to a project (TODO).

Option 2: Create a new project

To create a new project, you'll need to contact us at TODO@scilifelab.se with the following information:

  • Desired project name
  • A brief description of the project
  • Your registered email address

Step 3: Install divbase-cli

Install divbase-cli using pipx (recommended):

pipx install divbase-cli

If you do not have pipx installed, you can install it by following the official instructions from pipx. Refer to the Installation Guide for more detailed instructions or other ways to install divbase-cli.

Step 4: Add your project(s) to your divbase-cli config

Add your project(s) to the configuration file:

divbase-cli config add PROJECT_NAME --default

By setting this as the default project, you won't need to specify the project name in future commands. To select a different project, add the --project flag to any command. The rest of the example commands in this quick start guide omit the --project flag for brevity.

Note

On the DivBase website you can see your project's name(s) under the "Projects" tab after logging in.

Step 5: Log in

Log in to DivBase:

divbase-cli auth login EMAIL_ADDRESS

Step 6: Upload files

Upload your VCF files to your project:

# Upload a single VCF file
divbase-cli files upload path/to/your/file.vcf

# Upload multiple files
divbase-cli files upload path/to/file1.vcf path/to/file2.vcf

# Upload all VCF files in a directory
divbase-cli files upload --upload-dir /path/to/directory/

Check your uploaded files:

divbase-cli files ls

Step 7: Dimensions update

For DivBase to handle the VCF files in the project efficiently, some key information about each VCF file is extracted from the file and cached on the server. In DivBase, this is referred to as "VCF dimensions". These include, for instance, which samples and scaffolds a VCF file contains.

Update the project dimensions after uploading your files:

divbase-cli dimensions update

This submits a task to the DivBase task management system. The task will wait in a queue until the system is ready to work on it. Depending on the size of the VCF files, this may take a couple of minutes.

Notes

  1. Please note that it is not possible to run VCF queries in DivBase until the dimensions update task has finished. The reason for this is that the VCF queries use the dimensions data to ensure that the queries are feasible and to know which VCF files from the project to process.

  2. Please also note that the divbase-cli dimensions update command needs to be run every time a new VCF or a new version of a VCF file is uploaded.

Step 8: Confirm dimensions update job completion

Check the task history to confirm the dimensions update job has completed:

divbase-cli task-history user

Once complete, you can run any queries on the uploaded data.

It is possible to inspect the cached VCF dimensions data for the project at any time with the command:

divbase-cli dimensions show

Step 9: Upload sample metadata

DivBase can check out data based on the VCF files themselves, but can also take an optional sidecar sample metadata file into account. The metadata file must be a TSV (tab-separated values) file. The metadata contents of the file are defined by the user. If the dimensions update command has been run for the project, the cached dimensions data can be used to create a template where the samples of the project have been pre-filled:

divbase-cli dimensions create-metadata-template

Details on how to write this file are given in Sidecar Metadata TSV files: creating and querying sample metadata files. In short, the first row starts with # and contains the headers for the different metadata columns. The first column (Sample_ID) is mandatory and can be created by the system as just described; if you create it manually, make sure that each sample name is spelled exactly as in the VCF files. The rest of the columns are free for the user to define.

Example of a sidecar metadata TSV file with the mandatory Sample_ID column and two user-defined columns:

#Sample_ID Population Area
129P2 1 North
129S1 2 East
129S5 3 South

Note

Please use a text editor that preserves the tabs when the file is saved. Incorrect tabs can lead to issues with running metadata queries in DivBase.
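If editor settings are a concern, one option is to generate the file programmatically so that the separators are guaranteed to be real tab characters. The following is a minimal sketch using only the Python standard library; the function name render_metadata_tsv is made up for this illustration, and the rows are the ones from the example above.

```python
import csv
import io

# Header and rows copied from the example above. Only Sample_ID is mandatory;
# the other column names are user-defined.
HEADER = ["#Sample_ID", "Population", "Area"]
ROWS = [
    ["129P2", "1", "North"],
    ["129S1", "2", "East"],
    ["129S5", "3", "South"],
]


def render_metadata_tsv(header, rows):
    """Return the metadata as text with real tab characters between fields."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


tsv_text = render_metadata_tsv(HEADER, ROWS)
print(tsv_text, end="")

# To save the result to disk (file name is only an example):
# from pathlib import Path
# Path("sample_metadata.tsv").write_text(tsv_text)
```

The resulting file can then be checked and uploaded with the commands in the rest of this step.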

There is a command to help check that the sidecar metadata TSV is correctly formatted for use with DivBase. Running it is optional:

divbase-cli dimensions validate-metadata-file path/to/your/sample_metadata.tsv
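For a rough local pre-check before running the command above, the two formatting rules stated in this guide (a header row whose first column is #Sample_ID, and tab-separated fields) can be tested in a few lines of Python. This is only a sketch: check_metadata_tsv is a hypothetical helper, it does not reproduce what validate-metadata-file checks, and it does not compare sample names against your VCF files.

```python
def check_metadata_tsv(text):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    lines = text.splitlines()
    if not lines or not lines[0].strip():
        return ["first line must be the header row"]

    header = lines[0].split("\t")
    if header[0] != "#Sample_ID":
        problems.append("first header column should be #Sample_ID")

    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # ignore blank lines in this sketch
        fields = line.split("\t")
        if len(fields) != len(header):
            # Typically caused by spaces being saved instead of tabs.
            problems.append(
                f"line {number}: expected {len(header)} tab-separated fields, "
                f"found {len(fields)}"
            )
    return problems


# A row saved with spaces instead of tabs is reported as a problem.
broken = "#Sample_ID\tPopulation\tArea\n129P2 1 North\n"
for problem in check_metadata_tsv(broken):
    print(problem)
```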

When you are happy with the sample metadata file, it should be uploaded to the DivBase project with the following:

divbase-cli files upload path/to/your/sample_metadata.tsv

Step 10: Run your queries

There are three types of queries in DivBase:

  • Sample metadata query
  • VCF data query
  • Combined sample metadata and VCF data query

Note

Queries are one of the more complex aspects of DivBase, so we encourage you to read the section on Running Queries after reading this quick start.

Running sample metadata queries

As an example, let's assume that the user-defined sidecar sample metadata TSV file contains a custom column named Area and that Northern Portugal is one of the values in that column. To select all samples with that value in that column:

divbase-cli query tsv "Area:Northern Portugal"

Note

Please see Sidecar Metadata TSV files: creating and querying sample metadata files for more details on the syntax for writing sample metadata queries.
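To make the idea of a column:value filter concrete, the sketch below selects sample IDs from TSV text in plain Python. It is a conceptual illustration of the single "Area:Northern Portugal" form shown above, not DivBase's implementation, and it does not cover any other query syntax. The function name select_samples, the sample IDs, and the "Southern Spain" value are made up for the example.

```python
import csv
import io


def select_samples(tsv_text, query):
    """Return the Sample_ID values of rows matching a "column:value" query."""
    column, _, wanted = query.partition(":")
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # The header row starts with '#', so strip it before matching column names.
    header = [name.lstrip("#") for name in next(reader)]
    index = header.index(column)  # raises ValueError if the column is missing
    return [row[0] for row in reader if len(row) > index and row[index] == wanted]


# Made-up metadata for illustration only.
EXAMPLE = (
    "#Sample_ID\tPopulation\tArea\n"
    "S1\t1\tNorthern Portugal\n"
    "S2\t2\tSouthern Spain\n"
    "S3\t1\tNorthern Portugal\n"
)

print(select_samples(EXAMPLE, "Area:Northern Portugal"))
```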

Running VCF data queries

DivBase uses bcftools to subset VCF data and therefore the query syntax is based on bcftools view syntax (as described in the bcftools manual).

For instance, to subset all VCF files in the project on a chromosomal region in a scaffold named 21:

divbase-cli query bcftools-pipe --command "view -r 21:15000000-25000000"

The VCF queries can be combined with sidecar sample metadata queries with --tsv-filter and the fixed expression view -s SAMPLES (where SAMPLES tells DivBase to use the results from the sidecar filtering as input for bcftools view -s). In this way, only the VCF files that fulfil the sample metadata query will be used in the bcftools subset commands. An example:

divbase-cli query bcftools-pipe --tsv-filter "Area:Northern Portugal" --command "view -s SAMPLES; view -r 21:15000000-25000000"

Note

DivBase only allows bcftools view in its query syntax and no other bcftools commands. The merge, concat, and annotate commands are used when processing a query, but should not be defined by the user.

Step 11: Download any results files

You can check the status of all of your submitted jobs using:

divbase-cli task-history user

Once a bcftools-pipe job is complete, you can download the resulting merged VCF file:

divbase-cli files download merged_[JOB_ID].vcf.gz # --download-dir path/to/save/results/

Replace [JOB_ID] with the actual job ID from the task history.
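After downloading, one quick way to confirm which samples ended up in the result is to read the #CHROM header line of the VCF file, where the sample names follow the nine fixed columns (CHROM through FORMAT). The sketch below assumes the downloaded file is gzip-compatible, as .vcf.gz files normally are; the function names are hypothetical and the path you pass in is whatever file you downloaded.

```python
import gzip


def sample_names_from_header(lines):
    """Return the sample names listed on the #CHROM line of a VCF header."""
    for line in lines:
        if line.startswith("#CHROM"):
            # Columns 1-9 are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT.
            return line.rstrip("\r\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached the data records without finding a #CHROM line
    return []


def vcf_sample_names(path):
    """Open a .vcf.gz file and return its sample names."""
    with gzip.open(path, "rt") as handle:
        return sample_names_from_header(handle)


# Usage (replace the path with the file you downloaded):
# print(vcf_sample_names("path/to/downloaded_result.vcf.gz"))
```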

Next steps

TODO - a selection of links to more detailed user guides

Getting help

If you run into issues:

  1. Check the output: Read the error message
  2. Consult and search the docs: Full documentation
  3. Get help on the command you're running: divbase-cli COMMAND --help

To get assistance from us, you can either send us an email (TODO - link) or open an issue on our GitHub Issues page.

Common Issues

Authentication Issues
  • Make sure you've verified your email address
  • Check if you can log in to your account on the DivBase website. If it fails on the website, it will also fail on the CLI.
  • Try logging out and back in: divbase-cli auth logout then divbase-cli auth login