Re-Indexing String Dimensions For Sparse Arrays

Dave_L · March 16, 2023, 2:03pm

Hi, I was wondering whether re-indexing string dimensions (via a move-like operation) on sparse arrays is possible. For example, if I collect data sequentially in time and I don’t yet have the value of a certain property (let’s say it’s a string name), I can assign a dummy index of, say, an empty string "". But when I eventually get the value of that property, I would need to re-index it, or else I would have duplicate copies of the same data, just at different indexes. Would it be better to store this property as an attribute in that case?

Shaun_Reed · March 16, 2023, 9:11pm

Hi @Dave_L, thanks for the question! TileDB indexing comes from tiling, so if the property is a dimension, re-indexing is not possible without a rewrite. To re-index (“re-tile”) data in a TileDB array, all affected data would need to be written again using the updated string coordinates. Dimensions provide data locality for fast slicing, which is what makes reading from TileDB arrays efficient. See our documentation on Tiling & Cell Layout for a more detailed explanation. If you do not need to slice on this property, it would be a good candidate for an attribute.
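To make the trade-off concrete, here is a minimal conceptual sketch in plain Python. It does not use the TileDB API: a dict stands in for a sparse array, and the names `reindex_name`, `as_dimension`, `as_attribute`, and `probe_a` are hypothetical, chosen only for illustration. The point is that when the name is part of the coordinate, changing it means writing every affected cell again under a new key, whereas when it is an attribute the coordinate stays the same and only the stored value changes.

```python
# Conceptual model only, NOT the TileDB API: a dict stands in for a sparse array.
# Keys play the role of dimension coordinates; values hold the attributes.


def reindex_name(cells, old_name, new_name):
    """Rewrite every cell whose name coordinate equals old_name under new_name."""
    rewritten = {}
    for (t, name), attrs in cells.items():
        key = (t, new_name if name == old_name else name)
        if key in rewritten:
            # Moving a cell onto an occupied coordinate would silently drop data.
            raise ValueError(f"coordinate collision at {key}")
        rewritten[key] = attrs
    return rewritten


# Option 1: the name is a dimension, so the placeholder "" is part of the coordinate.
as_dimension = {
    (0, ""): {"value": 1.5},
    (1, ""): {"value": 2.5},
    (2, "probe_b"): {"value": 9.0},
}
# Learning the real name later forces a rewrite of the affected cells.
moved = reindex_name(as_dimension, "", "probe_a")

# Option 2: the name is an attribute, so the coordinate is just the time index.
as_attribute = {
    0: {"name": "", "value": 1.5},
    1: {"name": "", "value": 2.5},
}
# Learning the real name later only changes a value at an existing coordinate.
for attrs in as_attribute.values():
    if attrs["name"] == "":
        attrs["name"] = "probe_a"
```

The cost of option 2 is that you can no longer slice efficiently by name, which is the locality benefit a dimension would have given you.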

Dave_L · March 17, 2023, 1:51pm

Thanks for the reply! I had to look in the glossary to find out what a “cell” and a “tile” were. I am still trying to understand the differences between the various kinds of “tiles”, but I will look at your reference.
