GCS configuration

Hello,

I’m testing the TileDB Python library to measure performance differences between AWS S3 and GCS. AWS reads/writes work fine, but GCS does not. Note that I’ve been careful with keys and secrets; something like:

aws s3 --endpoint-url https://storage.googleapis.com ls s3://my-bucket-in-gcs

works fine, as do mb and cp. Similarly, I can use boto3 in Python to manipulate the GCS bucket with the S3 client seamlessly.
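
For reference, this is roughly the kind of boto3 access that works for me (the bucket name and credentials below are placeholders):

import boto3

# placeholders: substitute your GCS interoperability (HMAC) access key/secret and bucket name
s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.googleapis.com",
    aws_access_key_id="GOOG_HMAC_ACCESS_KEY",
    aws_secret_access_key="GOOG_HMAC_SECRET",
)

# list the bucket contents through the S3-compatible endpoint
print(s3.list_objects_v2(Bucket="my-bucket-in-gcs").get("Contents", []))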

However, when using TileDB I always get:

Traceback (most recent call last):
  File "dense_array_gcs.py", line 36, in <module>
    tiledb.DenseArray.create(array_name, schema, ctx=ctx)
  File "tiledb/libtiledb.pyx", line 3387, in tiledb.libtiledb.Array.create
  File "tiledb/libtiledb.pyx", line 254, in tiledb.libtiledb._raise_ctx_err
  File "tiledb/libtiledb.pyx", line 239, in tiledb.libtiledb._raise_tiledb_error
tiledb.libtiledb.TileDBError: [TileDB::S3] Error: Cannot write object 's3://my-google-bucket/tiledb/__array_schema.tdb'
Exception:
Error message: Unable to connect to endpoint with address : 172.217.12.80

Any ideas? My config looks as follows:

config["vfs.s3.region"] = "(region must be configured!)"
config["vfs.s3.use_multipart_upload"] = "false"
config["vfs.s3.multipart_part_size"] = 5000000000000
config["vfs.s3.max_parallel_ops"] = 1
config["vfs.s3.endpoint_override"] = "https://storage.googleapis.com"
config["vfs.s3.aws_access_key_id"] = google_access_key
config["vfs.s3.aws_secret_access_key"] = google_secret

I’ve removed all my company proxies and tried various combinations of these configurations.

thanks

Hi @cleader,

If you are using the PyPI binary package, I believe you are hitting an issue we recently found in which HTTPS addresses are only usable on CentOS-like systems. This will be fixed in the next release by https://github.com/TileDB-Inc/TileDB/pull/1393

In the meantime, the quickest solution is to use the conda packages (conda ships its own certificates in a stable location, so the issue noted above does not apply). If you don’t have conda, a few quick steps:
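
Something along these lines should work; the installer URL and install prefix below are just illustrative defaults, and any recent Miniconda will do:

# download and install Miniconda, then activate the base environment
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
source $HOME/miniconda3/bin/activate

# install TileDB-Py from conda-forge
conda install -c conda-forge tiledb-py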

Using the above, I’ve just successfully tested the Linux build from conda against GCS.

One additional note: the region must be set either to a specific, real region, or to "auto".
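
For example, either of the following is fine (the region name here is only an example; the full script below uses "auto"):

config["vfs.s3.region"] = "us-central1"
config["vfs.s3.region"] = "auto"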

Hope that helps,
Isaiah


Here is a clean version of my test script; it should be usable after replacing the URI and credentials. The test creates a 5-element array at the given URI and reads it back:

import tiledb
import numpy as np

# update this
uri = "s3://your-bucket/array-path"

# read credentials from 'creds.nogit' file in current directory, newline separated:
#   "key\nsecret" 
key,secret = [x.strip() for x in open("creds.nogit").readlines()]

# gcs config
config = tiledb.Config()
config["vfs.s3.endpoint_override"] = "storage.googleapis.com"
config["vfs.s3.aws_access_key_id"] = key
config["vfs.s3.aws_secret_access_key"] = secret
config["vfs.s3.region"] = "auto" #"us-central1"
config["vfs.s3.use_multipart_upload"] = "false"

# context
ctx = tiledb.Ctx(config=config)

# create sample array if it does not exist
vfs = tiledb.VFS(ctx=ctx)
if not vfs.is_dir(uri):
  print("trying to write: ", uri)
  a = np.arange(5)
  schema = tiledb.schema_like(a, ctx=ctx)
  tiledb.DenseArray.create(uri, schema)
  with tiledb.DenseArray(uri, 'w', ctx=ctx) as T:
    T[:] = a 

print("reading back from: ", uri)
with tiledb.DenseArray(uri, ctx=ctx) as t:
  print(t[:])

Hi Isaiah,

That helps immensely, thank you. I tried running on a CentOS machine and my GCS access is now working.

Thanks again!
Chris