Hi, I was wondering if re-indexing string dimensions (via a move-like operation) on sparse arrays is possible. For example, if I collect data sequentially in time and I don’t yet have the value of a certain property (let’s say it’s a string name), I can assign a dummy index of, say, an empty string "". But when I eventually get the value of that property, I would need to re-index it, or else I would have duplicate copies of the same data, just at different indexes. Would it be better to store this property as an attribute in that case?
Hi @Dave_L, thanks for the question! TileDB indexing comes from tiling, so if the property is a dimension, re-indexing it would not be possible without a rewrite. To re-index (“re-tile”) data in a TileDB array, all data would need to be written again based on the updated string coordinates. Dimensions are used to provide data locality for fast slicing, which is what makes reading from TileDB arrays very efficient. See our documentation on Tiling & Cell Layout for a more detailed explanation. If there is no need to slice on this property, it would be a good candidate to store as an attribute.
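To make the trade-off above a bit more concrete, here is a toy sketch in plain Python. It is not the TileDB API: the dicts standing in for a sparse array, the helper names `reindex_dimension` and `update_attribute`, and the sample values are all made up for illustration. It only models the idea that a dimension is part of a cell's coordinates, while an attribute is a value stored at those coordinates.

```python
# Toy model only: plain dicts stand in for a sparse array.
# Nothing here is TileDB API; all names and values are hypothetical.


def reindex_dimension(cells, old_name, new_name):
    """Name is part of the coordinates, so the whole array is written again
    and every cell with the old name ends up under a new key."""
    rewritten = {}
    moved = 0
    for (t, name), value in cells.items():
        if name == old_name:
            rewritten[(t, new_name)] = value
            moved += 1
        else:
            rewritten[(t, name)] = value
    return rewritten, moved


def update_attribute(cells, t, new_name):
    """Name is an attribute: the coordinate (time) is unchanged,
    only the value stored at that coordinate is replaced."""
    updated = dict(cells)
    updated[t] = {**cells[t], "name": new_name}
    return updated


# Layout A: (time, name) are both dimensions; "" is the placeholder name.
by_dimension = {(0, ""): 1.5, (1, ""): 2.5, (2, "known"): 3.5}
by_dimension, moved = reindex_dimension(by_dimension, "", "sensor_a")

# Layout B: time is the only dimension; name is stored next to the value.
by_attribute = {
    0: {"value": 1.5, "name": ""},
    1: {"value": 2.5, "name": ""},
}
by_attribute = update_attribute(by_attribute, 0, "sensor_a")
```

In layout A the placeholder cells have to be rewritten under new coordinates, which is the "re-tile" cost described above. In layout B the coordinates never change, so filling in the name later does not create a second copy of the data at a different index. The catch is the one already mentioned: with the name as an attribute you lose the fast slicing that a dimension provides.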
Thanks for the reply! I had to look in the glossary to find out what a “cell” and a “tile” are. I am still trying to understand the differences between the various kinds of “tiles”, but I will look at your reference.