Hello,
I'm testing the TileDB Python library to measure performance differences between AWS S3 and GCS. AWS reads/writes work fine, but GCS does not. Note that I've been careful with keys and secrets, and something like:
aws s3 --endpoint-url https://storage.googleapis.com ls s3://my-bucket-in-gcs
works fine, as do mb and cp. Similarly, I can use boto3 in Python to manipulate the GCS bucket through the S3 client seamlessly.
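For reference, this is roughly what that boto3 usage looks like (a minimal sketch; my-bucket-in-gcs is a placeholder, and google_access_key / google_secret are the same HMAC credentials used in the config below):

import boto3

# GCS exposes an S3-compatible endpoint at storage.googleapis.com,
# so the standard S3 client works with a GCS HMAC key/secret
s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.googleapis.com",
    aws_access_key_id=google_access_key,
    aws_secret_access_key=google_secret,
)

# listing and writing objects both work through the S3 interface
print(s3.list_objects_v2(Bucket="my-bucket-in-gcs").get("Contents", []))
s3.put_object(Bucket="my-bucket-in-gcs", Key="test-object.txt", Body=b"hello")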
However, when using tiledb I’m always told:
Traceback (most recent call last):
  File "dense_array_gcs.py", line 36, in <module>
    tiledb.DenseArray.create(array_name, schema, ctx=ctx)
  File "tiledb/libtiledb.pyx", line 3387, in tiledb.libtiledb.Array.create
  File "tiledb/libtiledb.pyx", line 254, in tiledb.libtiledb._raise_ctx_err
  File "tiledb/libtiledb.pyx", line 239, in tiledb.libtiledb._raise_tiledb_error
tiledb.libtiledb.TileDBError: [TileDB::S3] Error: Cannot write object 's3://my-google-bucket/tiledb/__array_schema.tdb
Exception:
Error message: Unable to connect to endpoint with address : 172.217.12.80
Any ideas? My config looks as follows:
config["vfs.s3.region"] = "(region must be configured!)"
config["vfs.s3.use_multipart_upload"] = "false"
config["vfs.s3.multipart_part_size"] = 5000000000000
config["vfs.s3.max_parallel_ops"] = 1
config["vfs.s3.endpoint_override"] = "https://storage.googleapis.com"
config["vfs.s3.aws_access_key_id"] = google_access_key
config["vfs.s3.aws_secret_access_key"] = google_secret
I’ve removed all my company proxies and tried various combinations of these configurations.
Thanks
Hi @cleader,
If you are using the PyPI binary package, I believe you are hitting an issue we recently found where HTTPS endpoints only work on CentOS-like systems. This will be fixed in the next release by https://github.com/TileDB-Inc/TileDB/pull/1393.
In the meantime, the quickest solution is to use the conda packages (conda ships its own certificates in a stable location, so the issue noted above does not apply). If you don't have conda, a few quick steps: install Miniconda, then install the TileDB Python package from conda-forge with conda install -c conda-forge tiledb-py.
Using the above, I've just successfully tested the Linux conda build against GCS.
One additional note: the region must be set either to a specific, real region or to "auto".
Hope that helps,
Isaiah
Here is a clean version of my test script, which should hopefully be usable after replacing the URI and credentials; the test will create a 5-element array at the given URI and read it back:
import tiledb
import numpy as np
import sys

# update this
uri = "s3://your-bucket/array-path"

# read credentials from 'creds.nogit' file in current directory, newline separated:
# "key\nsecret"
key, secret = [x.strip() for x in open("creds.nogit").readlines()]

# gcs config
config = tiledb.Config()
config["vfs.s3.endpoint_override"] = "storage.googleapis.com"
config["vfs.s3.aws_access_key_id"] = key
config["vfs.s3.aws_secret_access_key"] = secret
config["vfs.s3.region"] = "auto"  # "us-central1"
config["vfs.s3.use_multipart_upload"] = "false"

# context
ctx = tiledb.Ctx(config=config)

# create sample array if it does not exist
vfs = tiledb.VFS(ctx=ctx)
if not vfs.is_dir(uri):
    print("trying to write: ", uri)
    a = np.arange(5)
    schema = tiledb.schema_like(a, ctx=ctx)
    tiledb.DenseArray.create(uri, schema)
    with tiledb.DenseArray(uri, 'w', ctx=ctx) as T:
        T[:] = a

print("reading back from: ", uri)
with tiledb.DenseArray(uri, ctx=ctx) as t:
    print(t[:])
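With working credentials, the read-back step should print the contents of the sample array, i.e. [0 1 2 3 4].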
Hi Isaiah,
That helps immensely, thank you. I tried running on a CentOS machine, and my GCS access is now functional.
Thanks again!
Chris