TileDB cache management

tiphaineruy · March 23, 2021, 4:52pm

Hello!
I was wondering how to optimize tile fetching and caching on arrays.

From what I’ve gathered, the only way to cache a cloud array for now is by tuning the parameter sm.tile_cache_size, which sets the size of an uncompressed, in-memory LRU cache of the tiles fetched by the current session.
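For readers who have not set this parameter before, here is a minimal sketch of how it could be passed through the Python API. The helper names `tile_cache_settings` and `make_ctx` are hypothetical, the 256 MiB figure is only an example, and the parameter should be checked against the configuration reference of the installed TileDB version, since this thread dates from 2021.

```python
def tile_cache_settings(cache_mib: int) -> dict:
    """Build the config entry for the in-memory tile cache (hypothetical helper)."""
    if cache_mib < 0:
        raise ValueError("cache size must be non-negative")
    # TileDB config values are strings; the cache size is expressed in bytes.
    return {"sm.tile_cache_size": str(cache_mib * 1024 * 1024)}


def make_ctx(settings: dict):
    """Create a TileDB context from the settings (requires the tiledb package)."""
    import tiledb  # imported lazily so the helper above works without it

    return tiledb.Ctx(tiledb.Config(settings))


settings = tile_cache_settings(256)  # example value, not a recommendation
print(settings)
```

The resulting context would then be passed to `tiledb.open(uri, ctx=ctx)`. The cache still lives in process memory only, so it disappears when the session ends, which is the limitation discussed here.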

The issue is that sometimes you might want to keep a “generated” cache from one session to another, that is, commit the cache to the client filesystem. Examples include compute instances with long-lived sessions that you may have to restart at some point without losing the cache, or multiple TileDB clients opening the same cache.

I found an old issue relating to that problem.

But I’m not sure whether you have addressed it at some point and I’m missing something.

My current workaround is to use the MinIO gateway:

docker run  \
--name minio-s3-gateway \
-d \
-e MINIO_ROOT_USER=$PUB_KEY \
-e MINIO_ROOT_PASSWORD=$PRIV_KEY \
-e MINIO_CACHE="on" \
-e MINIO_CACHE_DRIVES="/data" \
-p 9000:9000 \
-v minio-cache:/data \
minio/minio gateway s3

(You can tune some more parameters for the caching policy.)
It uses MinIO as a local cache for the fetched objects.
This also has the benefit of faster metadata fetching when opening an array.
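To make use of the gateway, the TileDB client has to send its S3 requests to it instead of to the cloud endpoint. The sketch below shows one way this could be configured from Python. It assumes the gateway is reachable at `localhost:9000` over plain HTTP (matching the `-p 9000:9000` mapping above) and that the same key pair is used; `minio_gateway_settings` is a hypothetical helper name.

```python
def minio_gateway_settings(access_key: str, secret_key: str,
                           host: str = "localhost", port: int = 9000) -> dict:
    """Build TileDB S3 settings that route requests through a local gateway."""
    return {
        # Send S3 traffic to the gateway rather than the default AWS endpoint.
        "vfs.s3.endpoint_override": f"{host}:{port}",
        # Assumption: the local gateway is served without TLS.
        "vfs.s3.scheme": "http",
        # Use path-style addressing, which custom endpoints usually require.
        "vfs.s3.use_virtual_addressing": "false",
        "vfs.s3.aws_access_key_id": access_key,
        "vfs.s3.aws_secret_access_key": secret_key,
    }


# Placeholder credentials for illustration only.
settings = minio_gateway_settings("my-access-key", "my-secret-key")
print(settings["vfs.s3.endpoint_override"])
```

These settings can be passed to `tiledb.Config(settings)` and then to `tiledb.Ctx`, after which arrays are opened with their usual `s3://` URIs. Because the cache lives in the gateway's volume rather than in the client process, it survives client restarts and can be shared by several clients.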

I guess the question is: is there a way to create a longer-lived cache of fetched tiles that isn’t cleared when you close the client and that can be reused between TileDB Embedded clients?

Thanks!

stavros · March 25, 2021, 5:49pm

Hi @tiphaineruy, this indeed would be a nice feature (and your workaround is valid). It’s not in our roadmap currently but we could work on it in the future.

If you don’t mind, could you please add it on our feature request site? Our team will add a ticket to our backlog.

tiphaineruy · March 25, 2021, 7:12pm

Done @stavros: Support Array caching | Voters | TileDB

Admittedly it’s not a high-priority issue, since the MinIO gateway workaround works. But I’m sure that in the long term some edge cases will require a tighter integration with TileDB Embedded.

Cheers.

ghpu · March 6, 2023, 4:05pm

The MinIO gateway is now deprecated, so it might be a good time to raise the caching issue again.
