Equivalent of zarr's get_coordinate_selection?

Hi,
I have a geospatial dataset stored as a dense 2D array of 21,600 x 43,200 float values. From this array I would like to read the values at ~100,000 coordinate pairs (corresponding to locations) and get back an array of ~100,000 floats. Performance is important in this use case.

I have been using zarr's get_coordinate_selection function for the same task and it works pretty well. Specifically, it seems to treat the request as a ‘batch’, so that each chunk needed is decompressed only once. However, I would like to compare the performance to TileDB, and I am also interested in database (especially Trino) integration. I looked at multi_index, but this seems to take a cross product of ranges, which is a bit different.
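To make the ‘batch’ idea concrete, here is a minimal pure-Python sketch of grouping requested coordinates by the chunk they fall in, so that each chunk would only need to be read once. It is not zarr's actual implementation; the function name `group_by_chunk` and the 1,000 x 1,000 chunk shape are assumptions made up for illustration.

```python
from collections import defaultdict


def group_by_chunk(rows, cols, chunk_shape):
    """Map each chunk index to the positions (in request order) of the points inside it.

    Hypothetical helper for illustration only; not part of zarr or TileDB.
    """
    chunk_rows, chunk_cols = chunk_shape
    groups = defaultdict(list)
    for position, (r, c) in enumerate(zip(rows, cols)):
        # Integer division gives the index of the chunk containing (r, c).
        groups[(r // chunk_rows, c // chunk_cols)].append(position)
    return dict(groups)


# Five requested points against an assumed chunk shape of 1,000 x 1,000.
rows = [5, 10, 1500, 7, 1501]
cols = [3, 999, 2000, 1000, 2001]
groups = group_by_chunk(rows, cols, chunk_shape=(1000, 1000))

# Each key is a chunk that has to be read once; each value lists which
# of the requested points can then be served from that chunk.
for chunk, positions in groups.items():
    print(chunk, positions)
```

Here the five points touch only three chunks, so three chunk reads would be enough to serve the whole request.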

How best to achieve this? I am using the Python API, but C#/C++ would also be fine. I have an uneasy feeling that I am just missing something in the documentation, in which case apologies in advance! :slight_smile:

Thanks!

Hi @joe-m,

Apologies for the delayed response. Point queries are not currently supported; only the cross product of ranges is (as you observed). However, note that ranges do not need to be slices: a single coordinate can be given as a range whose start and end are equal. For example:

a.multi_index[[1],[3,4]]

in Python corresponds to:

query.add_range(0,1,1)
query.add_range(1,3,3)
query.add_range(1,4,4)

in the C++ or C# API.
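As a rough illustration of how point selection differs from a cross product of ranges, here is a pure-Python sketch that emulates a point query by issuing one cross-product read per distinct row. The functions `read_cross_product` and `read_points` are hypothetical stand-ins written for this example, not TileDB calls, and the nested list `data` stands in for the array. No claim is made about how this would perform against a real array; with 100,000 scattered points it may well mean a large number of separate reads.

```python
from collections import defaultdict


def read_cross_product(data, rows, cols):
    """Hypothetical stand-in for a cross-product read: every row paired with every column."""
    return [[data[r][c] for c in cols] for r in rows]


def read_points(data, coords):
    """Emulate a point query using one cross-product read per distinct row."""
    cols_by_row = defaultdict(list)
    for position, (r, c) in enumerate(coords):
        cols_by_row[r].append((position, c))

    result = [None] * len(coords)
    for r, entries in cols_by_row.items():
        # A single row crossed with its own columns returns only wanted cells.
        cols = sorted({c for _, c in entries})
        (values,) = read_cross_product(data, [r], cols)
        lookup = dict(zip(cols, values))
        for position, c in entries:
            result[position] = lookup[c]
    return result


# Small toy array where the cell at (r, c) holds 10 * r + c.
data = [[10 * r + c for c in range(6)] for r in range(4)]
coords = [(1, 3), (2, 4), (1, 5)]

points = read_points(data, coords)                    # values at the 3 requested cells
block = read_cross_product(data, [1, 2], [3, 4, 5])   # full 2 x 3 cross product
print(points)
print(block)
```

The plain cross product of rows [1, 2] and columns [3, 4, 5] returns six cells, of which only three were requested; grouping by row avoids reading the unwanted cells at the cost of issuing more reads.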

@stavros indicated to me that we will bump this up in the backlog for implementation (the plan is to support arbitrary subarray slices, not only point queries).

Best,
Isaiah

Hi Isaiah,

Thanks for the reply. Support for arbitrary subarray slices sounds even better, and I think it will be important for our use case. Good to hear that it is being bumped up.

Thanks,
Joe