How would you expire data in a dataset?

tinodb · September 26, 2023, 6:34pm

We are downloading tiles of DEM data, and we’re allowed to cache those for a month. We do need to do some stitching, so it seems that TileDB would come in handy there! I do wonder how we could deal with the requirement to expire such a downloaded tile in time?

Note that we might download a single tile on day x, and then expand the coverage area a couple of days later by downloading more tiles that could overlap it.

Nick_Kules · September 26, 2023, 8:57pm

Hi Tinodb,

The best solution for this use case is TileDB's timestamp feature. TileDB natively supports time travel because a timestamp is associated with each write. You can also write with a custom timestamp value, which you can later read explicitly or, in your case, delete from the array.

In your specific case of a downloaded DEM raster tile, you can specify a unique integer timestamp when writing that data to a TileDB array. Do you have information on how you are writing these DEM tiles to TileDB arrays? Also, what language are you working in? Below I list Python API examples, but we do have multiple APIs.

This example shows how you can assign a custom timestamp value when writing: Writing at a Timestamp - TileDB Embedded Docs

Then, to expire that data, you can delete the fragment associated with that timestamp (or timestamp range): TileDB Python API Reference — TileDB-Py 0.6.5 documentation
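As a rough sketch of the timestamp bookkeeping this implies: TileDB timestamps are integer milliseconds since the Unix epoch, so each download time can be converted to that form for the write, and a nightly job can compute the range of timestamps that are past their cache lifetime. The helper names and the 30-day reading of "a month" are assumptions for illustration, not part of any TileDB API. The resulting integers are the kind of value you would pass as the write timestamp and as the fragment-deletion range; check the linked docs for the exact calls in your TileDB-Py version.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
CACHE_LIFETIME = timedelta(days=30)  # assumption: "a month" taken as 30 days


def to_tiledb_timestamp(moment: datetime) -> int:
    """Convert a timezone-aware datetime to integer ms since the Unix epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def expiry_window(now: datetime) -> tuple[int, int]:
    """Return the (start, end) timestamp range of writes that have expired.

    Anything written at or before `now - CACHE_LIFETIME` is considered expired.
    """
    return 0, to_tiledb_timestamp(now - CACHE_LIFETIME)


# Hypothetical usage: timestamp for a tile downloaded right now, plus the
# range a cleanup job would hand to fragment deletion.
download_time = datetime.now(timezone.utc)
write_timestamp = to_tiledb_timestamp(download_time)
expired_start, expired_end = expiry_window(download_time)
```

Using the download time as the write timestamp keeps the expiry rule simple: a fragment's timestamp alone tells you whether it is still within the caching period.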

As for stitching and adding more tiles or expanded data: TileDB does not overwrite previous data; any new DEM data you write lands in new fragments. On read, TileDB returns the value with the latest timestamp for each pixel by default. You can also read at a specific timestamp or timestamp range. So for stitching, you can keep writing new or expanded data to the same array (as long as it's still within the array domain) and use the default read to always get the latest data. You can then either keep the underlying data indefinitely or “expire” it using fragment deletion by timestamp, as discussed above.
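To make the overlap case concrete, here is a toy model in plain Python. It makes no TileDB calls; the fragment representation and function names are invented for illustration. It only mimics the two behaviours described above: the latest write wins per pixel on read, and expiring removes whole fragments by timestamp.

```python
# Toy model only: plain Python, no TileDB calls.
# A "fragment" is (write_timestamp, {pixel: elevation}).


def read_latest(fragments):
    """Merge fragments so that, per pixel, the latest write wins."""
    merged = {}
    for _, cells in sorted(fragments, key=lambda frag: frag[0]):
        merged.update(cells)  # later timestamps overwrite earlier ones
    return merged


def expire(fragments, cutoff):
    """Drop every fragment written at or before the cutoff timestamp."""
    return [frag for frag in fragments if frag[0] > cutoff]


# Tile downloaded on day 1, then an overlapping tile on day 3.
day_1 = (1, {(0, 0): 10.0, (0, 1): 11.0})
day_3 = (3, {(0, 1): 11.5, (0, 2): 12.0})
fragments = [day_1, day_3]

mosaic = read_latest(fragments)
# Pixel (0, 1) comes from the day-3 tile; (0, 0) still comes from day 1.

after_expiry = read_latest(expire(fragments, cutoff=1))
# The day-1 fragment is gone: (0, 0) is no longer covered, while the
# overlapping pixel (0, 1) keeps its day-3 value.
```

One consequence worth planning for: once the older fragment expires, any pixels that only it covered have no data left, so that area would need to be downloaded again.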

Let me know if this answers your need. If you have any other questions we are happy to help.
