I’m evaluating TileDB for storing dense 3D model output arrays. Only 16-bit precision is required, and by converting our float64 input data to int16 we get much better compression ratios; the on-disk size of the TileDB array ends up being ~15% of the original.
The built-in scaling of xarray/netCDF handles this precision reduction well, but I don’t see anything similar in TileDB.
Is there a way to have TileDB handle this, or is there a good pattern for storing the scale factor in metadata? Right now I’m doing the transformation at the read/write boundary around TileDB.
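For reference, the transformation I’m doing at the edge looks roughly like this (a minimal numpy sketch; `quantize`/`dequantize` and the example scale factor are names I made up for illustration, and the scale factor is the value I’d like to stash in array metadata):

```python
import numpy as np

def quantize(data, scale_factor):
    """Map float64 data to int16 by dividing by scale_factor and rounding.

    Values outside the int16 range are clipped, so scale_factor should be
    chosen so the data range fits."""
    q = np.round(data / scale_factor)
    info = np.iinfo(np.int16)
    return np.clip(q, info.min, info.max).astype(np.int16)

def dequantize(data, scale_factor):
    """Invert quantize, up to a rounding error of scale_factor / 2."""
    return data.astype(np.float64) * scale_factor

vals = np.array([0.0, 1.2345, -3.1, 250.0])
scale = 0.01  # one int16 step corresponds to 0.01 in original units
restored = dequantize(quantize(vals, scale), scale)
assert np.allclose(restored, vals, atol=scale / 2)
```

The round trip is lossy by at most half a quantization step, which for our data is well within the precision we need.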
We don’t have a rescaling transformation like this built in right now, but this is an interesting point that also comes up in areas like medical imaging.
We can push this down to the storage level as a tile filter, similar to the existing bit-width filter (which does perform a rescale, but only supports integer types and is automatic apart from the window size). This would be specified as part of the array schema, and would look something like:
rescale_filter = tiledb.RescaleFilter(target=np.int16, scale_factor=...)
Would that work for your data? I mentioned this to @stavros, and I think he wants to follow up and make sure we can implement this in a way that is flexible enough for your needs and also supports similar use cases. Would you mind shooting us an email at
I was wondering if a Filter might be the solution. The RescaleFilter you propose looks like it would fit our needs well. I’ve followed up over email as you suggested.
Lossy compression filters in general are interesting here. The precision reduction really seems to help the downstream compression filters do their job. I did try the bit-width filter, but in my test it didn’t yield the drastic improvement in file size that manually reducing the data type to int16 did; I’ll get some harder numbers.
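To illustrate why the precision reduction helps downstream compression so much, here is a rough sketch (not my actual benchmark: synthetic data and zlib standing in for TileDB’s tile compressors, and the scale factor is arbitrary):

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for smooth model output: a scaled random walk.
data = np.cumsum(rng.normal(size=100_000)) / 100.0

scale = 0.01
# Assumes data / scale fits comfortably in the int16 range.
data16 = np.round(data / scale).astype(np.int16)

c64 = zlib.compress(data.tobytes(), 6)
c16 = zlib.compress(data16.tobytes(), 6)
print(len(c64), len(c16))  # the int16 stream compresses much smaller
```

The float64 mantissa bytes are effectively noise to a generic compressor, so the 8-byte representation barely shrinks, while the int16 stream starts at a quarter of the size and compresses further on top of that.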