I am new to TileDB and have just started using TileDB-VCF for a project. How can I use TileDB-VCF to query the metadata or VCF header data of a VCF file, e.g. information about the reference genome, the INFO attributes, etc.? I searched for this but could not find anything. Can anyone please help me with this?
The VCF header for every sample is stored in the TileDB-VCF dataset. Here’s some Python code that reads the VCF header for a single sample using the TileDB Python API:
import tiledb

def get_vcf_header(dataset_uri, sample):
    # The dataset is a TileDB group; look up the URI of its "vcf_headers" member.
    uri = tiledb.Group(dataset_uri)["vcf_headers"].uri
    with tiledb.open(uri) as A:
        # Select the entry for this sample and return its header text.
        return A.df[sample].header[0]

dataset_uri = "__PATH_TO_VCF_DATASET__"
sample = "__SAMPLE_NAME__"
vcf_header = get_vcf_header(dataset_uri, sample)
print(vcf_header)
You may need to install the TileDB Python module in your environment first (for example, with pip install tiledb).
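Since the question asks specifically about the reference genome and the INFO attributes, here is a minimal sketch of how those could be pulled out of the header text once you have it. It assumes the value returned above is a single string in the standard VCF header layout (lines starting with ##). The function name summarize_vcf_header and the sample header are made up for illustration and are not part of TileDB-VCF.

```python
import re

def summarize_vcf_header(header_text):
    """Collect a few common fields from raw VCF header text (hypothetical helper)."""
    summary = {"fileformat": None, "reference": None, "contigs": [], "info": {}}
    for line in header_text.splitlines():
        line = line.strip()
        if not line.startswith("##"):
            continue  # skip the #CHROM column line and blank lines
        # Meta-information lines look like ##key=value
        key, _, value = line[2:].partition("=")
        if key in ("fileformat", "reference"):
            summary[key] = value
        elif key == "contig":
            match = re.search(r"ID=([^,>]+)", value)
            if match:
                summary["contigs"].append(match.group(1))
        elif key == "INFO":
            id_match = re.search(r"ID=([^,>]+)", value)
            desc_match = re.search(r'Description="([^"]*)"', value)
            if id_match:
                # Map each INFO ID to its description (empty if none is given)
                summary["info"][id_match.group(1)] = desc_match.group(1) if desc_match else ""
    return summary

# Made-up header text, only to show the call; in practice pass in vcf_header.
example_header = (
    "##fileformat=VCFv4.2\n"
    "##reference=file:///path/to/reference.fa\n"
    "##contig=<ID=chr1,length=1000>\n"
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">\n'
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)
print(summarize_vcf_header(example_header))
```

This only does simple pattern matching on the text. If you need full fidelity (FORMAT and FILTER lines, quoted commas, etc.), a dedicated VCF parsing library is probably a better fit.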