TileDB for sparse time series pivot tables?

Mtrl_Scientist · March 5, 2020, 8:56pm

Is it possible to store financial time series pivot tables as tiles and append to them over time?

Obviously, the index is going to change. But would it be possible to treat each tile as a separate object and then just place the next [x,10] matrix next to the old one?

stavros · March 6, 2020, 3:45pm

Hello,

Thanks for reaching out!

Let’s first model the schema of your array for your use case, without thinking in terms of tiles. Tiling is used internally to tune your performance, and I will touch upon it later below.

You need a 2D sparse array, where the first dimension is time (if slicing on time is the most selective for your queries) and the second is price. Currently, TileDB supports only “homogeneous” dimensions, that is, the dimensions must have the same type (e.g., both FLOAT32/64 or UINT64). The next TileDB version (to be release in April) relaxes this constraint and you will be able to define different types for your dimensions (e.g., DATETIME_MS for time and FLOAT32/64 for price).

The dimension domains in your case can be arbitrarily large, so that when you “append” data, essentially they can get written anywhere in the 2D domain without issues. Defining huge domains in TileDB does not affect performance, as TileDB does not pre-populate anything until you write to the array.

You then need to define your attributes, which determine the values you store in the cells of your matrix above, so probably a FLOAT32/64 attribute.

“Appending” to the above array is not special. You just need to write triplets of cells in the form (time, price, value), where (time, price) are the cell coordinates and value is the attribute value you are writing in the cell. Writing to sparse TileDB arrays is explained here.

Now once you get the above working, you can tune performance by defining the space tile extents and tile capacity. We include a lot of performance tips here.

Please let us know should you need any further information.

Stavros

Mtrl_Scientist · March 6, 2020, 4:55pm

Thanks a lot for your extensive reply, Stavros!
I appreciate the time you put in it!

The lack of datetime support is no problem. I can just save it as int timestamp and convert it back when reading it out.

However, I still don’t quite understand how to append data when the indices change, as shown in the figure below:

Does TileDB automatically re-index the data (backfill NaN values into cells at a price level that didn’t exist before) or do I need to read everything out and re-index it myself before saving it back to the DB?

Kind regards,
Frederic

seth · March 6, 2020, 5:07pm

Frederic,

In TileDB sparse arrays are truly sparse. The NaNs in your diagram won’t exist, only cells which contain values will exist. In this way the changing indices with each write do not matter. When you perform a sparse write you specify the specific coordinates that are being written, so TileDB only saves those cells. Empty cells in a sparse array are not materialized on disk and are not returned when you issue a query.

In the python sparse write example there is a diagram which highlights that sparse writes avoid materializing anything for the empty cells. The white cells below are “empty”/“do not exist” on disk so we avoid your NaN issue of the changing indices.

Seth

Mtrl_Scientist · March 6, 2020, 9:42pm

Oh, my mistake!

I was confused about how sparse matrices work!

I got it working eventually, thanks for the help!

Topic		Replies	Views
How to model financial data using tiledb	8	1275	March 2, 2021
Handling Non-Standard Index Timesteps	3	996	May 17, 2020
Question on how to layout data	0	561	May 5, 2022
Is TileDB a good fit for my use-case? Help wanted	5	1546	March 31, 2022
Optimizing the reads for sparse arrays	9	748	June 27, 2023

TileDB for sparse time series pivot tables?

Related topics