I’m evaluating TileDB for storing dense 3D model output arrays. Only 16-bit precision is required, and by converting our float64 input data to int16 we get much better compression ratios; the on-disk size of the TileDB array ends up being ~15% of the original.
The built-in scaling of xarray/netCDF handles this precision reduction well, but I don’t see anything similar in TileDB.
Is there a way to have TileDB handle this, or is there a good pattern for storing the scale factor in metadata? Right now I’m doing the transformation at the read/write boundary around TileDB.
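For reference, the transformation I’m doing at the edge looks roughly like this (a minimal numpy sketch; `quantize`/`dequantize` and the example scale factor are names I made up for illustration, and the scale factor is the value I’d like to stash in array metadata):

```python
import numpy as np

def quantize(data, scale_factor):
    """Map float64 data to int16 by dividing by scale_factor and rounding.

    Values outside the int16 range are clipped, so scale_factor should be
    chosen so the data range fits."""
    q = np.round(data / scale_factor)
    info = np.iinfo(np.int16)
    return np.clip(q, info.min, info.max).astype(np.int16)

def dequantize(data, scale_factor):
    """Invert quantize, up to a rounding error of scale_factor / 2."""
    return data.astype(np.float64) * scale_factor

vals = np.array([0.0, 1.2345, -3.1, 250.0])
scale = 0.01  # one int16 step corresponds to 0.01 in original units
restored = dequantize(quantize(vals, scale), scale)
assert np.allclose(restored, vals, atol=scale / 2)
```

The round trip is lossy by at most half a quantization step, which for our data is well within the precision we need.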
We don’t have a rescaling transformation like this built in right now, but this is an interesting point that also comes up in areas like medical imaging.
We can push this down to the storage level as a tile filter, similar to the existing bit-width filter (which does perform a rescale, but only supports integer types and is automatic apart from the window size). This would be specified as part of the array schema, and would look something like:
rescale_filter = tiledb.RescaleFilter(target=np.int16, scale_factor=...)
Would that work for your data? I mentioned this to @stavros, and I think he wants to follow up and make sure we can implement this in a way that is flexible enough for your needs and also supports similar use cases. Would you mind shooting us an email at
I was wondering if a Filter might be the solution. The RescaleFilter you propose looks like it would fit our needs well. I’ve followed up over email as you suggested.
Lossy compression filters in general are interesting here. The precision reduction really seems to help the downstream compression filters do their job. I did try the bit-width filter, but in my test it didn’t yield the drastic improvement in file size that manually reducing the data type to int16 did; I’ll get some harder numbers.
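To illustrate why the precision reduction helps downstream compression so much, here is a rough sketch (not my actual benchmark: synthetic data and zlib standing in for TileDB’s tile compressors, and the scale factor is arbitrary):

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for smooth model output: a scaled random walk.
data = np.cumsum(rng.normal(size=100_000)) / 100.0

scale = 0.01
# Assumes data / scale fits comfortably in the int16 range.
data16 = np.round(data / scale).astype(np.int16)

c64 = zlib.compress(data.tobytes(), 6)
c16 = zlib.compress(data16.tobytes(), 6)
print(len(c64), len(c16))  # the int16 stream compresses much smaller
```

The float64 mantissa bytes are effectively noise to a generic compressor, so the 8-byte representation barely shrinks, while the int16 stream starts at a quarter of the size and compresses further on top of that.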