I’m working with a large array where I want to write data, read back parts of the data, and then write back the results. From my reading of the (Python) Array base class (https://docs.tiledb.io/projects/tiledb-py/en/stable/python-api.html#tiledb.libtiledb.Array), the mode is either read or write, not simultaneous read and write.
Is that correct? And what strategies can I use for mixed reading and writing with reasonable performance? (I’m assuming opening and closing the array has some cost.)
That is correct: currently you can open an array either for reads or for writes, not both at once. If you write to an array and then wish to read the newly written cells, follow this sequence:
- Open the array in write mode
- Write the cells
- Close the array
- Open the array in read mode
- Read the cells
- Close the array
You can write to the same array in parallel without any issues, and you can likewise read from the same array in parallel. But if you both read from and write to the same array in parallel, you need to be aware of consistency issues. Individual reads and writes are atomic and will complete without problems, but what each read query returns depends on which of the parallel writes have already completed and been flushed (i.e., committed) to disk, and in what order. In other words, if you truly require transactional consistency, that is a layer we would need to build on top of our current atomic reads and writes.
@dakoner For the interactive “in the REPL” use case, where one does not really care about absolute performance or consistency, I agree it would be nice to support this. It would do the defensive thing and open and close the TileDB resources on every read and write, which is somewhat expensive but acceptable for interactive use. This issue is being tracked in https://github.com/TileDB-Inc/TileDB-Py/issues/58, so hopefully we can address it in an upcoming release.