Obviously, the index is going to change. But would it be possible to treat each tile as a separate object and then just place the next [x,10] matrix next to the old one?
Let’s first model the schema of your array for your use case, without thinking in terms of tiles. Tiling is used internally to tune your performance, and I will touch upon it later below.
You need a 2D sparse array, where the first dimension is time (if slicing on time is the most selective for your queries) and the second is price. Currently, TileDB supports only “homogeneous” dimensions, that is, the dimensions must have the same type (e.g., both FLOAT32/64 or UINT64). The next TileDB version (to be release in April) relaxes this constraint and you will be able to define different types for your dimensions (e.g., DATETIME_MS for time and FLOAT32/64 for price).
The dimension domains in your case can be arbitrarily large, so that when you “append” data, essentially they can get written anywhere in the 2D domain without issues. Defining huge domains in TileDB does not affect performance, as TileDB does not pre-populate anything until you write to the array.
You then need to define your attributes, which determine the values you store in the cells of your matrix above, so probably a FLOAT32/64 attribute.
“Appending” to the above array is not special. You just need to write triplets of cells in the form (time, price, value), where (time, price) are the cell coordinates and value is the attribute value you are writing in the cell. Writing to sparse TileDB arrays is explained here.
Now once you get the above working, you can tune performance by defining the space tile extents and tile capacity. We include a lot of performance tips here.
Please let us know should you need any further information.
Does TileDB automatically re-index the data (backfill NaN values into cells at a price level that didn’t exist before) or do I need to read everything out and re-index it myself before saving it back to the DB?
In TileDB sparse arrays are truly sparse. The NaNs in your diagram won’t exist, only cells which contain values will exist. In this way the changing indices with each write do not matter. When you perform a sparse write you specify the specific coordinates that are being written, so TileDB only saves those cells. Empty cells in a sparse array are not materialized on disk and are not returned when you issue a query.
In the python sparse write example there is a diagram which highlights that sparse writes avoid materializing anything for the empty cells. The white cells below are “empty”/“do not exist” on disk so we avoid your NaN issue of the changing indices.