Network-related errors when writing to GCS

Hi, we’ve recently replaced our network storage format with TileDB. We use GKE (Kubernetes on Google Cloud) for our processing, GCS for storage, and Python as our language. Often, many simultaneous jobs try to write TileDB data to GCS, and sometimes we get network-related errors, like:

{ "code": 504, "message": "", "errors": [ { "message": "", "domain": "global", "reason": "gatewayTimeout" } ] } } [INTERNAL])

or:

Upload part failed on: uninteresting_file_name/.tdb__tiledb_14 (Error in non-idempotent operation InsertObjectMedia: Service Unavailable [UNAVAILABLE])

Is there anything we could do (tune?) to mitigate these errors? They seem like transient network errors, so maybe there is a setting to make TileDB retry more? Unfortunately, I saw lots of S3-related settings in the docs, but only a few GCS-related ones…

Hi Vincent –

Thanks for reaching out! As you suggest, we might be able to mitigate this with retry/timeout tuning. I will introduce GCS-specific configuration parameters that mirror the configuration options that we currently have for S3. I expect to complete and release these changes this week.

Joe

Hi Joe, thanks for that. I saw the PR “Improve GCS retries” go by on GitHub (I’m following the repo, just to keep myself updated). Once a new 2.2 version is released (hopefully soon!), we’ll be happy to test it!

Vincent.

I’ve just released v2.2.7 of the core that contains the patch you referenced. It has not yet been released in a Python package, but we will do that soon.

While adding the new configuration option, vfs.gcs.request_timeout_ms, I realized that we also had a bug in the retry path. Upgrading to the new core version may be enough to substantially improve your situation. If transient network issues persist, I would recommend increasing that configuration parameter from its default of 3000 to something higher.
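As a rough illustration of raising that parameter from Python, the sketch below builds a TileDB context with a longer GCS request timeout. The value 10000 and the helper names gcs_config_params and make_context are assumptions made for illustration, not recommendations from this thread; choose a timeout that suits your workload.

```python
# Hypothetical timeout value: the thread only suggests "something higher"
# than the 3000 default, so 10000 is an illustrative assumption.
GCS_TIMEOUT_MS = 10000


def gcs_config_params(timeout_ms=GCS_TIMEOUT_MS):
    """Return TileDB config key/value pairs (values are passed as strings)."""
    return {"vfs.gcs.request_timeout_ms": str(timeout_ms)}


def make_context(timeout_ms=GCS_TIMEOUT_MS):
    """Build a tiledb.Ctx carrying the GCS timeout setting."""
    # Imported lazily so gcs_config_params() can be used without TileDB installed.
    import tiledb

    return tiledb.Ctx(tiledb.Config(gcs_config_params(timeout_ms)))


if __name__ == "__main__":
    ctx = make_context()
    # Pass ctx to array operations, e.g. (hypothetical URI):
    # with tiledb.open("gcs://my-bucket/my-array", mode="w", ctx=ctx) as arr: ...
    print(ctx.config()["vfs.gcs.request_timeout_ms"])
```

The setting only takes effect for operations that use this context, so it needs to be passed to every array open or write that targets GCS.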

We’re always interested in learning about new use-cases. If you need any further assistance, please email me directly at joe@tiledb.com.

Thanks!

Thanks for the 2.2.7 release of the core lib. If I build that, would the existing Python package automatically pick up the newer core library version? Or is the only way to use it to wait for a newer Python package?
Btw, is there a way to check which core lib version is being used by the Python package, something like tiledb.core.__version__?

Unfortunately, you would need to build the Python package from source to link against the new version of the core library. However, you do not need to build the core from source. To get this immediately, I would recommend downloading the binary distribution of the core library from the release page (Releases · TileDB-Inc/TileDB · GitHub).

To build+install:

git clone https://github.com/TileDB-Inc/TileDB-Py.git
cd TileDB-Py
pip install -r requirements_dev.txt
python setup.py install --tiledb=/path/to/the/downloaded/tiledb/

Additionally, you can dynamically link to a new version of the core if you are using the conda package: Python - TileDB Docs

Otherwise, we will have an official release soon. Thanks!
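On the question of checking which core version the Python package is linked against, here is a minimal sketch. It assumes that tiledb.libtiledb.version() returns the core version as a (major, minor, patch) tuple and that tiledb.__version__ holds the Python package version; the helper names has_gcs_retry_fix and report_versions are hypothetical.

```python
# Minimum core version that carries the GCS retry patch, per this thread.
MIN_CORE_VERSION = (2, 2, 7)


def has_gcs_retry_fix(core_version):
    """Return True if a (major, minor, patch) core version is at least 2.2.7."""
    return tuple(core_version)[:3] >= MIN_CORE_VERSION


def report_versions():
    """Print the Python package version and the linked core library version."""
    # Imported lazily so has_gcs_retry_fix() can be used without TileDB installed.
    import tiledb

    # Assumption: libtiledb.version() returns a tuple such as (2, 2, 7).
    core = tiledb.libtiledb.version()
    print("TileDB-Py:", tiledb.__version__)
    print("TileDB core:", ".".join(str(part) for part in core))
    print("GCS retry fix present:", has_gcs_retry_fix(core))


if __name__ == "__main__":
    report_versions()
```

Running this after a source build or a package upgrade is a quick way to confirm that the newer core library was actually picked up.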

Release binaries for TileDB-Py 0.8.6 with TileDB Embedded 2.2.7 are now available on PyPI.

Best,
Isaiah