Query VCF header using tiledbvcf

Hi,

I am new to TileDB and have just started using TileDB-VCF for a project. How can I use TileDB-VCF to query the metadata / VCF header of a VCF file, e.g. information about the reference genome, INFO attributes, etc.? I searched for this but did not find anything. Can anyone please help me with this?

Hello,

The VCF header for every sample is stored in the TileDB-VCF dataset. Here’s some Python code that reads the VCF header for a sample using the TileDB Python API:

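import tiledb

def get_vcf_header(dataset_uri, sample):
    # The dataset is a TileDB group; "vcf_headers" is the member array holding the headers
    uri = tiledb.Group(dataset_uri)["vcf_headers"].uri
    with tiledb.open(uri) as A:
        # Select the rows for this sample and return the first header value
        return A.df[sample].header[0]

# Replace the placeholders with your dataset location and sample name
dataset_uri = "__PATH_TO_VCF_DATASET__"
sample = "__SAMPLE_NAME__"

vcf_header = get_vcf_header(dataset_uri, sample)
print(vcf_header)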

You may need to install the TileDB Python module in your environment:

pip install tiledb
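
Since you asked about the reference genome and INFO attributes specifically: once you have the header text, you can pull those out with plain Python. The following is only a rough sketch, not part of TileDB-VCF. It assumes the header is a str made of standard VCF "##key=value" meta-information lines; parse_vcf_header and the small example header are made up for illustration, so swap in the vcf_header you read from your dataset.

```python
import re

def parse_vcf_header(header_text):
    """Split VCF header text into simple and structured meta-information lines.

    Hypothetical helper for illustration; assumes header_text is a str.
    """
    simple = {}       # lines like ##reference=... -> {"reference": "..."}
    structured = {}   # lines like ##INFO=<ID=DP,...> -> {"INFO": {"DP": {...}}}
    # key=value pairs inside <...>; quoted values may contain commas
    field_re = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')
    for line in header_text.splitlines():
        if not line.startswith("##"):
            continue  # skip the #CHROM column line and blank lines
        key, _, value = line[2:].partition("=")
        if value.startswith("<") and value.endswith(">"):
            fields = {k: v.strip('"') for k, v in field_re.findall(value[1:-1])}
            structured.setdefault(key, {})[fields.get("ID", "")] = fields
        else:
            simple[key] = value
    return simple, structured

# Made-up header for demonstration only; use your own vcf_header instead
example_header = "\n".join([
    "##fileformat=VCFv4.2",
    "##reference=file:///ref/GRCh38.fa",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth, all reads">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
])

simple, structured = parse_vcf_header(example_header)
print(simple.get("reference"))          # reference genome line, if present
print(list(structured.get("INFO", {})))  # IDs of the INFO attributes
```

With a real dataset you would call parse_vcf_header(vcf_header) instead. Not every VCF has a ##reference line, which is why the lookup uses .get().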

Hope this helps.

Yes, this works.

Thank you very much for the help.