S3 list after create consistency

Reading up on S3’s consistency model (https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel), I see that list-after-create is only eventually consistent. Is this going to cause problems with TileDB’s current implementation? For instance, it looks like TileDB writes fragments with timestamps in their names. Thus, I’d imagine that when reading a column attribute you’ll need to do a “list” operation in order to see which fragment is the latest one to read.

Say I have job A that updates an array and job B that depends on job A’s data. Is there a case where job B would not be able to read the data produced by A because A’s fragments do not yet show up in the S3 listing?

Similar to S3, TileDB offers eventual consistency as well. Please see these docs for details on the consistency model of TileDB. Therefore, S3’s consistency model does not create problems in TileDB, as long as one understands that TileDB is eventually consistent as well.

Having said that, upon finalization of the write process, TileDB waits a “reasonable” amount of time for all the created objects of the new fragment to become visible. Specifically, knowing that S3 may arbitrarily delay the propagation of the objects, TileDB makes up to 1000 attempts to list the new objects, with a 100-millisecond wait between attempts (these are internal, non-exposed config parameters). That amounts to roughly 100 seconds of waiting at most.
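To make the retry behavior concrete, here is a minimal sketch of such a polling loop in Python. It is not TileDB's actual code: the function name `wait_until_listed`, its parameters, and the `list_keys` callback (which stands in for an S3 list call) are all assumptions made for illustration. Only the defaults of 1000 attempts and a 100-millisecond wait come from the description above.

```python
import time
from typing import Callable, Iterable, Set


def wait_until_listed(
    list_keys: Callable[[], Iterable[str]],
    expected: Set[str],
    max_attempts: int = 1000,      # attempt count mentioned in the post
    wait_seconds: float = 0.1,     # 100 ms between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a listing until every expected key is visible.

    `list_keys` is a hypothetical callback that returns the object keys
    currently visible under the fragment's prefix.
    Returns True once all expected keys are listed, False if the
    attempts run out first.
    """
    for attempt in range(max_attempts):
        visible = set(list_keys())
        if expected <= visible:
            return True
        # Do not sleep after the final attempt.
        if attempt + 1 < max_attempts:
            sleep(wait_seconds)
    return False
```

With the defaults, the loop gives up after about 100 seconds, so a caller that needs a stronger guarantee would have to treat a `False` result as "not yet committed" rather than as success.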

If job B depends on job A’s data and you want to guarantee that job B reads job A’s data, we will need to expose the above parameters to you (via the tiledb_config_t, Config objects), so that you can set them to arbitrarily large values in order to make sure that all objects of job A are indeed propagated before job B starts (in other words, job A will block until the write “truly commits”).

If exposing those parameters would be useful to you, please open an issue on the TileDB GitHub repo and we will get it done. Please let us know if you have any more questions.

As an additional remark, please note that we have never experienced problems with writes - S3 seems to propagate new objects very quickly. However, we have experienced problems with deletions, i.e., S3 seems to take a long time to render a deleted object invisible to the list operation.

In the distributed case you additionally have to make sure that the listing after the write is visible to all readers. I don’t believe there is any guarantee that if one reader can list the “committed” update, then all readers necessarily can.

AWS services get around this with an architecture in which transactional listing is consolidated, e.g. updates are polled from a DynamoDB instance and the object paths are then transactionally updated to give a consistent prefix listing.
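The idea of a consolidated listing can be sketched as follows. This is only an illustration under stated assumptions: the `CommitIndex` class is hypothetical, and an in-memory dictionary guarded by a lock stands in for a strongly consistent store such as DynamoDB. It does not describe how any AWS service or TileDB is actually implemented.

```python
import threading
from typing import Dict, List


class CommitIndex:
    """Hypothetical consolidated index of committed fragments.

    Writers register a fragment only after its objects are uploaded;
    readers ask the index which fragments exist instead of listing the
    object store prefix, so every reader sees the same committed set.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._fragments: Dict[str, List[str]] = {}

    def commit(self, array_uri: str, fragment_name: str) -> None:
        # Record the fragment as committed for this array.
        with self._lock:
            self._fragments.setdefault(array_uri, []).append(fragment_name)

    def fragments(self, array_uri: str) -> List[str]:
        # Return a copy, in commit order, so callers cannot mutate the index.
        with self._lock:
            return list(self._fragments.get(array_uri, []))
```

In this arrangement job A would call `commit` as its last step, and job B would read the fragment names from `fragments` and fetch those objects directly by key, without depending on the list operation.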
