Runtime error when accessing an S3 VCF data file
Code:
import tiledbvcf
import boto3
import os
import tempfile
import glob
import pandas as pd
small_ds = tiledbvcf.Dataset('small_dataset2', mode="w")
with open("s3-vcf-samples.txt") as f:
    sample_uris = [l.rstrip("\n") for l in f.readlines()]
small_ds.ingest_samples(
    sample_uris,
    scratch_space_path=tempfile.gettempdir(),
    scratch_space_size=10
)
Error message:
RuntimeError Traceback (most recent call last)
in ()
4 sample_uris,
5 scratch_space_path = tempfile.gettempdir(),
----> 6 scratch_space_size=10
7 )
/home/ec2-user/SageMaker/tileDB-new/tiledbvcf/lib/python3.7/site-packages/tiledbvcf/dataset.py in ingest_samples(self, sample_uris, extra_attrs, checksum_type, allow_duplicates, scratch_space_path, scratch_space_size)
212 # Create is a no-op if the dataset already exists.
213 self.writer.create_dataset()
--> 214 self.writer.register_samples()
215 self.writer.ingest_samples()
216
RuntimeError: TileDB-VCF exception: Error processing sample; URI 's3://bucket/data_path/chr1-prefix.vcf.gz' does not exist.
I tried exporting the access key and secret key as mentioned in the docs, but I am still getting the same error.
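
A minimal sketch of a check that could be run in the same notebook kernel before ingesting, to see whether the exported keys are actually visible to the Python process and whether the object is reachable. The helper names (missing_aws_env_vars, split_s3_uri, s3_object_exists) are hypothetical and not part of tiledbvcf. It assumes the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables and uses boto3's head_object call; a successful boto3 check does not prove that TileDB-VCF resolves credentials or the bucket region in the same way.

```python
import os
from urllib.parse import urlparse

# Standard AWS credential environment variables.
REQUIRED_VARS = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"]


def missing_aws_env_vars(env=None):
    """Return the credential variables that are unset or empty in this process."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def s3_object_exists(uri):
    """Return True if boto3 can HEAD the object, False otherwise.

    boto3 is imported lazily so the helpers above work without it installed.
    False covers a missing object, denied access, and missing credentials.
    """
    import boto3
    from botocore.exceptions import ClientError, NoCredentialsError

    bucket, key = split_s3_uri(uri)
    try:
        boto3.client("s3").head_object(Bucket=bucket, Key=key)
    except (ClientError, NoCredentialsError):
        return False
    return True
```

Calling missing_aws_env_vars() inside the notebook shows whether the kernel sees the keys at all; variables exported in a shell after the Jupyter server was started are not inherited by a kernel that is already running. Running s3_object_exists on each entry of sample_uris would then show which URIs, if any, are unreachable with those credentials.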
Thanks.