Equivalent of zarr's get_coordinate_selection?

joe-m · December 28, 2021, 3:25pm

Hi,
I have a geospatial dataset stored as a dense 2D array of floats, 21,600 x 43,200. From it I would like to read the values at ~100,000 coordinate pairs (corresponding to locations) and get back an array of 100,000 floats. Performance is important in this use case.

I have been using zarr's get_coordinate_selection function for this task and it works pretty well. In particular, it seems to treat the request as a ‘batch’, so that each chunk needed is decompressed only once. However, I would like to compare the performance with TileDB, and I am also interested in database integration (especially Trino). I looked at multi_index, but it seems to take a cross product of ranges, which is a bit different.
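To make the difference concrete, here is a minimal sketch in plain Python (nested lists stand in for the array, and both helper names are made up for illustration, not part of zarr or TileDB). A coordinate selection returns one value per (row, col) pair, while a cross product of the same indices returns a full block whose diagonal holds the wanted values.

```python
def coordinate_selection(grid, rows, cols):
    """Return one value per (row, col) pair (pointwise selection)."""
    if len(rows) != len(cols):
        raise ValueError("rows and cols must have the same length")
    return [grid[r][c] for r, c in zip(rows, cols)]


def cross_product_selection(grid, rows, cols):
    """Return every selected row combined with every selected column."""
    return [[grid[r][c] for c in cols] for r in rows]


# Toy 4 x 5 grid where the value encodes its own position: 10 * row + col.
grid = [[10 * r + c for c in range(5)] for r in range(4)]
rows, cols = [0, 2, 3], [1, 4, 2]

points = coordinate_selection(grid, rows, cols)    # 3 values
block = cross_product_selection(grid, rows, cols)  # 3 x 3 values

# The wanted points sit on the diagonal of the cross-product block.
diagonal = [block[i][i] for i in range(len(rows))]
```

For N points the cross product materialises N x N cells to recover N values, which is why it is not a practical substitute at 100,000 points.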

What is the best way to achieve this? I am using the Python API, but C# or C++ would also be fine. I have an uneasy feeling that I am just missing something in the documentation, in which case apologies in advance!

Thanks!

ihnorton · January 7, 2022, 2:50pm

Hi @joe-m,

Apologies for the delayed response. Point queries are not currently supported – only cross-product of ranges (as you observed). However, note that ranges do not need to be slices. For example:

a.multi_index[[1],[3,4]]

in Python corresponds to:

query.add_range(0,1,1)
query.add_range(1,3,3)
query.add_range(1,4,4)

in the C++ or C# API.
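The correspondence above can be sketched as a small helper that expands per-dimension point lists into (dimension, start, end) triples, one per add_range call. The function name point_ranges is hypothetical and the code does not call TileDB; it only illustrates the mapping.

```python
def point_ranges(selection):
    """Expand per-dimension point lists into (dim, start, end) triples.

    Each point p on dimension d becomes the degenerate range [p, p],
    which is what query.add_range(d, p, p) expresses.
    """
    return [
        (dim, point, point)
        for dim, points in enumerate(selection)
        for point in points
    ]


# Same selection as a.multi_index[[1], [3, 4]]
ranges = point_ranges([[1], [3, 4]])
```

Note that the ranges on different dimensions are still combined as a cross product, so this selects cells (1, 3) and (1, 4), not an arbitrary list of coordinate pairs.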

@stavros indicated to me that we will bump this up in the backlog for implementation (the plan is to support arbitrary subarray slices, not only point queries).

Best,
Isaiah
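Until point queries are supported, one possible workaround (an assumption on my part, not something proposed in this thread) is to group the requested points by tile and issue one rectangular read per tile, so that each tile is fetched once. The sketch below is plain Python; read_box is a hypothetical callable standing in for whatever range read the array library provides, and tile_shape is assumed to match the array's tiling.

```python
from collections import defaultdict


def read_points_by_tile(read_box, rows, cols, tile_shape):
    """Fetch one value per (row, col) pair using one box read per tile.

    read_box(r0, r1, c0, c1) is a hypothetical callable that returns the
    block for the inclusive ranges [r0, r1] x [c0, c1] as a list of rows.
    """
    tile_rows, tile_cols = tile_shape

    # Group point positions by the tile that contains them.
    groups = defaultdict(list)
    for i, (r, c) in enumerate(zip(rows, cols)):
        groups[(r // tile_rows, c // tile_cols)].append(i)

    out = [None] * len(rows)
    for idxs in groups.values():
        # Bounding box of the points in this tile (never crosses the tile).
        r0 = min(rows[i] for i in idxs)
        r1 = max(rows[i] for i in idxs)
        c0 = min(cols[i] for i in idxs)
        c1 = max(cols[i] for i in idxs)
        block = read_box(r0, r1, c0, c1)
        for i in idxs:
            out[i] = block[rows[i] - r0][cols[i] - c0]
    return out


# Demo against an in-memory grid, counting how many reads are issued.
grid = [[100 * r + c for c in range(8)] for r in range(6)]
calls = []


def read_box(r0, r1, c0, c1):
    calls.append((r0, r1, c0, c1))
    return [row[c0:c1 + 1] for row in grid[r0:r1 + 1]]


values = read_points_by_tile(read_box, [0, 1, 5, 4], [0, 3, 7, 6], (2, 4))
```

In this demo the four points fall into two tiles, so only two reads are issued. Whether this beats a single large read in practice depends on tile size and point density, so it would need benchmarking on the real data.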

joe-m · January 21, 2022, 3:05pm

Hi Isaiah,

Thanks for the reply. Support for arbitrary subarray slices sounds even better and I think will be important for our use case: good to hear that is being bumped up.

Thanks,
Joe

joe-m · April 12, 2022, 2:32pm

Hi,

Please can I check: is there a GitHub issue for this support of subarray slices that I can track? We are very interested in this feature for the OS-Climate project.

Many thanks,
Joe

stavros · April 13, 2022, 11:58pm

Opened new issue here: https://github.com/TileDB-Inc/TileDB/issues/3076. We will try to scope it soon. Thanks for your patience.
