How to access a TileDB-VCF dataset stored in an AWS S3 bucket

– try to access tiledbvcf dataset already set up in s3 bucket (done using cli)

import tiledbvcf

uri = 's3://some-bucket'
ds = tiledbvcf.Dataset(uri, mode="r")
ds.samples()  # to list sample names

→ error:
RuntimeError: TileDB-VCF exception: Cannot open TileDB-VCF dataset;…
Unable to parse ExceptionName: PermanentRedirect Message: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.

– then try to access the same tiledbvcf dataset in the s3 bucket again, this time passing a config object with AWS credentials and region

import tiledb

cfg = tiledb.Config()
cfg["vfs.s3.aws_access_key_id"] = "<key_id>"
cfg["vfs.s3.aws_secret_access_key"] = "<access_key>"
cfg["vfs.s3.region"] = "<region_>"

– able to see that config attributes correctly assigned, by using:

for p in cfg.items():
    print('"%s" : "%s"' % (p[0], p[1]))

– try again with:

uri = 's3://some-bucket'
ds = tiledbvcf.Dataset(uri, mode="r", cfg=cfg)
ds.samples()

→ error:
AttributeError

File ~/…/python3.10/site-packages/tiledbvcf/dataset.py:122, in Dataset.__init__(self, uri, mode, cfg, stats, verbose, tiledb_config)
120 self.reader = libtiledbvcf.Reader()
121 self.reader.set_verbose(verbose)
→ 122 self._set_read_cfg(cfg)

File ~/…/python3.10/site-packages/tiledbvcf/dataset.py:137, in Dataset._set_read_cfg(self, cfg)
135 if cfg is None:
136 return
→ 137 if cfg.limit is not None:
138 self.reader.set_max_num_records(cfg.limit)
139 if cfg.region_partition is not None:

AttributeError: 'Config' object has no attribute 'limit'

(
I’m using:
tiledb v0.23.0
numpy v1.23.5
tiledb-vcf v0.26.0
)

Could you tell me if I’m using the config file wrong please?

Hello @Carmen_Chan, thanks for posting the question. You are correct about the original "PermanentRedirect" error: it is AWS telling you that the region must be set in order to access the bucket, and you set the region correctly on the configuration. The only change needed is that TileDB-VCF has its own Python object for read configuration, and the tiledb.Config has to be wrapped in it rather than passed to Dataset directly.

Please try the following, with the important line being the line that introduces tiledbvcf.ReadConfig:

import tiledb
import tiledbvcf

cfg = tiledb.Config()
cfg["vfs.s3.aws_access_key_id"] = "<key_id>"
cfg["vfs.s3.aws_secret_access_key"] = "<access_key>"
cfg["vfs.s3.region"] = "<region_>"
# Wrap the TileDB config in the TileDB-VCF read configuration
read_cfg = tiledbvcf.ReadConfig(tiledb_config=cfg)

uri = 's3://some-bucket'
ds = tiledbvcf.Dataset(uri, mode="r", cfg=read_cfg)
ds.samples()
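
If you open several datasets, the same steps could be wrapped in two small helpers. This is only a sketch: the function names `s3_settings` and `open_vcf_dataset` are hypothetical (they are not part of tiledb or tiledbvcf), and the region and key values you pass in are placeholders. It assumes the same three config keys and the same `tiledbvcf.ReadConfig(tiledb_config=...)` call as the snippet above.

```python
def s3_settings(region, key_id=None, secret_key=None):
    """Collect the S3 options used above into a plain dict.

    Hypothetical helper: only the three config keys come from this thread.
    """
    settings = {"vfs.s3.region": region}
    # Add credentials only when both parts are provided
    if key_id is not None and secret_key is not None:
        settings["vfs.s3.aws_access_key_id"] = key_id
        settings["vfs.s3.aws_secret_access_key"] = secret_key
    return settings


def open_vcf_dataset(uri, settings):
    """Open a TileDB-VCF dataset for reading with the given S3 settings.

    Hypothetical helper; imports are done here so that building the
    settings dict does not require tiledb to be installed.
    """
    import tiledb
    import tiledbvcf

    cfg = tiledb.Config()
    for key, value in settings.items():
        cfg[key] = value
    # Dataset expects a tiledbvcf.ReadConfig, not a bare tiledb.Config
    read_cfg = tiledbvcf.ReadConfig(tiledb_config=cfg)
    return tiledbvcf.Dataset(uri, mode="r", cfg=read_cfg)
```

Usage would then look like `ds = open_vcf_dataset('s3://some-bucket', s3_settings("<region_>", "<key_id>", "<access_key>"))` followed by `ds.samples()`.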

Thank you very much! It’s working now :slight_smile: