Hi, we recently upgraded tiledb-py in our application from 0.9.1 to 0.13.3 (and thus TileDB Embedded from 2.3 to 2.7.2). Something in between apparently changed how TileDB opens (dense) arrays on GCS. After the upgrade we saw a sudden rise in cost, caused by a steep increase in what Google calls “class A operations”: ListObjects, GetObjectMetadata and ReadObject calls.
A little testing shows that:
Previously (libtiledb 2.3), opening an array (also created with that version) issued 2 GetObjectMetadata and 2 ListObjects requests.
With libtiledb 2.7.2, simply opening an array (created with 2.7.2) issues 10 GetObjectMetadata and 12 ListObjects requests.
That is a serious increase: from 4 to 22 requests for every single open.
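To put the per-open numbers in perspective, here is a small Python sketch that scales the request counts measured above by the number of opens. It only multiplies the observed counts; it does not model GCS pricing, and the function name is made up for illustration.

```python
# Requests observed when opening one dense array on GCS (numbers from the measurements above).
OBSERVED = {
    "2.3": {"GetObjectMetadata": 2, "ListObjects": 2},
    "2.7.2": {"GetObjectMetadata": 10, "ListObjects": 12},
}


def list_and_metadata_requests(version: str, opens: int) -> int:
    """Total GetObjectMetadata + ListObjects requests for `opens` array opens."""
    per_open = sum(OBSERVED[version].values())
    return per_open * opens


if __name__ == "__main__":
    for n in (100, 1000):
        old = list_and_metadata_requests("2.3", n)
        new = list_and_metadata_requests("2.7.2", n)
        print(f"{n} opens: {old} -> {new} requests ({new / old:.1f}x)")
```

Whatever the number of opens, the ratio stays at 5.5x, so the extra cost grows linearly with how many arrays our jobs open.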
I understand that for most users the usage pattern is to open an array once and then read from it a lot. Unfortunately, in our setup many separate jobs (in k8s) each need to open many different arrays (hundreds to thousands). In our case this increase in requests therefore means a serious increase in monthly cost, because these class A requests are billed. We have already mitigated this as much as possible by caching open arrays within a single job, but the extra requests still feel wasteful: they cost money directly, and they also add network traffic, and thus latency and run time, which costs money again.
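The per-job caching mentioned above could look roughly like the following sketch. It is an assumption of how such a cache might be written, not our actual code: the opener is injected so the sketch does not depend on a specific library (with tiledb-py it might be `lambda uri: tiledb.open(uri, mode="r")`), and the `gs://bucket/...` URIs in the usage note are hypothetical. Note that this only avoids repeated opens of the same array within one job; each first open still pays the full set of List/Metadata requests.

```python
from typing import Any, Callable, Dict


class OpenArrayCache:
    """Keep arrays open for the lifetime of a job so each URI is opened only once.

    `opener` is any callable taking a URI and returning an open array handle.
    Not thread-safe: intended for a single-threaded job.
    """

    def __init__(self, opener: Callable[[str], Any]) -> None:
        self._opener = opener
        self._arrays: Dict[str, Any] = {}

    def get(self, uri: str) -> Any:
        # Open on first use only; later calls reuse the already-open handle.
        if uri not in self._arrays:
            self._arrays[uri] = self._opener(uri)
        return self._arrays[uri]

    def close_all(self) -> None:
        # Close every cached handle that exposes close(), then forget them all.
        for arr in self._arrays.values():
            close = getattr(arr, "close", None)
            if callable(close):
                close()
        self._arrays.clear()

    def __enter__(self) -> "OpenArrayCache":
        return self

    def __exit__(self, *exc: Any) -> None:
        self.close_all()
```

Usage: `with OpenArrayCache(opener) as cache: arr = cache.get("gs://bucket/array_a")`. Every later `cache.get` with the same URI in that job returns the same handle without touching GCS again.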
Is there a way to open an array without it triggering this many List/Metadata requests?