TileDB cache management

Hello!
I've been wondering how to optimize tile fetching and caching for arrays.

From what I've gathered, the only way to cache a cloud array for now is through the sm.tile_cache_size parameter, which controls an in-memory LRU cache of uncompressed fetched tiles, scoped to the current session.
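For reference, a minimal sketch of tuning that parameter via the TileDB Python API (this assumes the `tiledb` package; the 512 MiB figure and the array URI are arbitrary placeholders):

```python
import tiledb

# Config fragment: enlarge the session-scoped, in-memory LRU tile cache.
# The value is in bytes; 512 MiB here is an arbitrary example.
cfg = tiledb.Config({"sm.tile_cache_size": str(512 * 1024**2)})
ctx = tiledb.Ctx(cfg)

# Any array opened with this context shares the cache for the lifetime
# of the session; the cache is discarded when the process exits.
# with tiledb.open("s3://my-bucket/my-array", ctx=ctx) as A:
#     ...
```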

The issue is that sometimes you might want to persist a "generated" cache from one session to the next (i.e., commit the cache to the client filesystem). For example: on compute instances with long-lived sessions that you might have to restart at some point without losing the cache, or when multiple TileDB clients should share the same cache, etc.

I found an old issue relating to that problem.

But I’m not sure whether it has been addressed at some point and I’m missing something.

My current workaround is using the MinIO S3 gateway:

docker run \
--name minio-s3-gateway \
-d \
-e MINIO_CACHE="on" \
-p 9000:9000 \
-v minio-cache:/data \
minio/minio gateway s3

(You can tune additional parameters for the caching policy.)
This uses MinIO as a local cache for the fetched objects.
It has the added benefit of faster metadata fetches when opening an array.

I guess the question is: is there a way to create a longer-lived cache of fetched tiles that isn’t cleared when the client closes, and that can be reused between TileDB Embedded clients?

Thanks !

Hi @tiphaineruy, this would indeed be a nice feature (and your workaround is valid). It’s not on our roadmap currently, but we could work on it in the future.

If you don’t mind, could you please add it to our feature request site? Our team will add a ticket to our backlog.

Done @stravos: Support Array caching | Voters | TileDB

That said, it’s not a really high-priority issue, as the MinIO gateway workaround does the job. But I’m sure that, long term, some edge cases may require tighter integration with TileDB Embedded.