Hi all!
I have only just begun playing around with TileDB and was hoping that someone more experienced may be able to tell me whether it is a good fit for my problem.
I develop software for radio interferometry. We have long used a custom format called a Measurement Set, whose storage backend is a casacore table. This is a columnar database which supports storing large arrays as columns. It also supports a subset of SQL. Unfortunately, it lacks the thread-safety and parallelism we need as we move to distributed/cloud computing.
Our specific use-case seems to be somewhat rare. We need a relational database (i.e. one supporting SQL-like queries) which can store chunked array data. Our large data products typically consist of arrays of shape (number_of_rows, number_of_channels, number_of_correlations). Typical sizes are 1M+ rows, between 1k and 32k channels, and at most 4 correlations. In addition to these large arrays, we also store a variety of metadata columns, e.g. the time at which each measurement occurred, the coordinates at which the measurement was taken, etc. It is often these metadata columns (which may themselves be multi-dimensional) that are used to query the large data products, e.g. to select a range of times or to reorder on a different condition. We also store a number of subtables containing ancillary information such as telescope state, which would need to be preserved if we moved to a newer format. To top it all off, our largest data products are complex-valued, which is rarely supported natively.
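For a sense of scale, here is a rough back-of-the-envelope calculation for one large array at the upper end of those ranges. The 8 bytes per value is an assumption (single-precision complex, i.e. two 32-bit floats), and I am taking "32k" to mean 2**15; the real numbers vary per observation.

```python
# Illustrative upper-end shape taken from the ranges quoted above.
n_rows = 1_000_000
n_chan = 32_768          # "32k" channels, assumed to mean 2**15
n_corr = 4
bytes_per_value = 8      # assumption: complex64 (two 32-bit floats)

n_values = n_rows * n_chan * n_corr
n_bytes = n_values * bytes_per_value

# Report the size in decimal terabytes (1 TB = 1e12 bytes).
print(f"{n_values:,} values, {n_bytes / 1e12:.2f} TB")
```

So a single data column can be on the order of a terabyte, which is why chunked storage and parallel access matter to us.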
I have done a little playing around with TileDB and it could definitely store the individual arrays. The point at which I got stuck (likely due to ignorance, as I have almost no experience with databases) is composing multiple arrays into a database structure. As an example, consider two TileDB arrays called TIME and DATA. TIME is a 1-D array of time values with dimension t, and DATA is a 2-D array of data values with dimensions (t, f). I need a way to compose a database such that their t axes are aligned and it is possible to do things like “SELECT DATA WHERE x < TIME < y”. I could not figure out whether this is possible with TileDB.
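To make the query concrete, here is a minimal sketch in plain Python of the behaviour I am after. This is not TileDB code; the function name `select_data` and the toy `time` and `data` lists are made up for illustration, and I am assuming TIME is sorted in ascending order.

```python
from bisect import bisect_left, bisect_right

def select_data(time, data, lo, hi):
    """Return the rows of data whose time satisfies lo < time < hi.

    Assumes time is sorted ascending and that time[i] labels data[i],
    i.e. the two arrays share the t axis.
    """
    if len(time) != len(data):
        raise ValueError("TIME and DATA must have the same length along t")
    start = bisect_right(time, lo)   # first index with time > lo
    stop = bisect_left(time, hi)     # first index with time >= hi
    return data[start:stop]

# Toy example: 5 time stamps, 2 frequency channels, complex values.
time = [0.0, 1.0, 2.0, 3.0, 4.0]
data = [[complex(t, f) for f in range(2)] for t in range(5)]

# Equivalent of "SELECT DATA WHERE 0.5 < TIME < 3.0":
# keeps the rows for t = 1.0 and t = 2.0.
selected = select_data(time, data, 0.5, 3.0)
```

The point is that the condition is evaluated on TIME but the result is a slice of DATA along the shared t axis. What I cannot work out is how to express that relationship between two separate arrays.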
Thanks in advance if you had the patience to read the above!