How are "Axes Labels" currently implemented?

Hi @cokelid, thanks for reaching out.

I understand the confusion and I am happy to share our thoughts and timeline. I will also update the Data Model section in the docs soon to avoid further confusion.

Let’s treat the dense and sparse array cases separately.

Sparse arrays

Since TileDB 2.0, sparse arrays support dimensions of any type (e.g., floats, strings, datetimes), and the dimensions of a single array can have different types. That makes “axes labels” native for sparse arrays. That is, you don’t need to maintain another level of indirection by mapping labels to an array’s integer indices. The array stores and searches natively on the axes labels, which can be of any type.
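
To make the idea concrete, here is a minimal plain-Python sketch of a 1D sparse array whose dimension values are strings, queried directly by label range. It does not use the TileDB API and says nothing about TileDB internals; the names coords, values and read_range are hypothetical and exist only for this illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical stand-in for a 1D sparse array with a string dimension.
# The coordinates are the labels themselves; no integer mapping is involved.
coords = ["apple", "banana", "cherry", "mango"]  # string dimension values, kept sorted
values = [10, 20, 30, 40]                        # one attribute value per non-empty cell


def read_range(lo, hi):
    """Return the (label, value) cells whose label lies in the closed range [lo, hi]."""
    start = bisect_left(coords, lo)   # first position with label >= lo
    stop = bisect_right(coords, hi)   # first position with label > hi
    return list(zip(coords[start:stop], values[start:stop]))


cells = read_range("banana", "cherry")
```

The point of the sketch is that the search runs on the labels directly, so there is no second array to maintain.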

Dense arrays

Internally, for performance purposes, TileDB supports only integer dimensions for dense arrays, and they are all homogeneous (so that we can template on a single datatype which makes the code faster). Therefore, if you want to support axes labels, currently you need to manually create, say, a sparse 1D array that maps strings to integer indices for each dimension. That will give you very fast lookups for the indices, and then you can apply the indices in a second query to the dense array.
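
The two-step lookup described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the dicts row_index and col_index stand in for the per-dimension sparse 1D arrays that map strings to integer indices, the nested list stands in for the dense array, and none of the names come from the TileDB API.

```python
# Stand-in for a dense 2D array with integer dimensions (rows x columns).
dense = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# Hypothetical label-to-index mappings, one per dimension. In TileDB each of
# these would be its own sparse 1D array with a string dimension.
row_index = {"gene_a": 0, "gene_b": 1, "gene_c": 2}
col_index = {"s1": 0, "s2": 1, "s3": 2}


def read_by_labels(row_labels, col_labels):
    """Resolve labels to integer indices, then read those cells from the dense array."""
    # Query 1: look up the integer index for every requested label.
    rows = [row_index[r] for r in row_labels]
    cols = [col_index[c] for c in col_labels]
    # Query 2: apply the integer indices to the dense array.
    return [[dense[r][c] for c in cols] for r in rows]


result = read_by_labels(["gene_a", "gene_c"], ["s2", "s3"])
```

With real arrays, the first step is one query per labelled dimension and the second step is a single query on the dense array.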

We understand that this is cumbersome and we’d like much better behavior for dense arrays. Here is what we are thinking about implementing. Although internally dense arrays need to have homogeneous integer dimensions, at the array schema level (upon creation) we will allow the user to set dimensions of any type (similar to sparse arrays), effectively defining axes labels for dense arrays. We will offer various APIs for the user to provide the axes label vectors upon ingestion, and TileDB will create this two-layered indexing internally (i.e., it will maintain the extra mapping from labels to integers), without forcing the user to do so in separate arrays with separate URIs. Then, the user will be able to query based on either the labels or the indices. So, it is the same implementation idea, but a much better experience for the user, as they will be interfacing natively with a single array.

I hope the above helps. We will be starting the implementation of this feature soon and we will try to get it done in Q1 2022.
