I’ve been using TileDB for over a year now, and one issue I keep running into is figuring out the best schema for a given array.
Would it be possible to add a feature for recommending schemas?
I imagine a function to which you pass a sample of your data and specify the dimension and attribute names and dtypes. It would then automatically try various combinations of dimension tiling (tile extents), filters, capacities, and cell orders, and report how much storage each one occupied and how long it took to read the full data and partial slices of it. It would also be useful if it pointed out flaws, such as irregular data density (when dealing with higher-dimensional sparse arrays) or a sub-optimal order of the dimensions.
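To make the idea more concrete, here is a rough sketch of what I have in mind, in plain Python. None of this is an existing TileDB API: `candidate_schemas`, `benchmark`, `tile_density`, and the three callbacks are hypothetical names, and the callbacks are assumed to wrap whatever TileDB calls actually create, write, and read the array. The filter values are just labels that the callbacks would interpret.

```python
import itertools
import time
from collections import Counter
from statistics import mean, pstdev


def candidate_schemas(tile_extents, filters, capacities, cell_orders):
    """Yield one config dict per combination of the schema options to try."""
    for tile, flt, cap, order in itertools.product(
        tile_extents, filters, capacities, cell_orders
    ):
        yield {"tile": tile, "filter": flt, "capacity": cap, "cell_order": order}


def _timed(fn, cfg):
    """Return the wall-clock seconds taken by fn(cfg)."""
    start = time.perf_counter()
    fn(cfg)
    return time.perf_counter() - start


def benchmark(candidates, write_fn, read_all_fn, read_slice_fn):
    """Measure storage and read times for each candidate config.

    Assumed (hypothetical) callback contracts:
      write_fn(cfg)      -> creates and fills the array, returns bytes on disk
      read_all_fn(cfg)   -> reads the whole array
      read_slice_fn(cfg) -> reads a representative partial slice
    """
    results = []
    for cfg in candidates:
        size = write_fn(cfg)
        results.append(
            {
                **cfg,
                "bytes": size,
                "read_all_s": _timed(read_all_fn, cfg),
                "read_slice_s": _timed(read_slice_fn, cfg),
            }
        )
    # Smallest on-disk footprint first; sort by another key as needed.
    return sorted(results, key=lambda r: r["bytes"])


def tile_density(coords, tile):
    """Count cells per space tile for sparse integer coordinates.

    A large stdev or max relative to the mean hints at irregular density.
    """
    counts = Counter(
        tuple(c // t for c, t in zip(coord, tile)) for coord in coords
    )
    values = list(counts.values())
    return {
        "tiles": len(values),
        "mean": mean(values),
        "stdev": pstdev(values),
        "max": max(values),
    }
```

The callbacks keep the sketch independent of any particular storage backend; a real version would presumably build the schema from each config and write the sample data itself.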
I suppose you already have something similar for internal testing.
Also posted here: