Trouble with Rasterio ingestion

Hi everyone. I was interested in using TileDB for geospatial purposes, so I went through the relevant section of the docs. I tried ingesting a Sentinel 2 image using Python as described in . However, I was unable to complete the operation: the files are created and the data inserted, but whenever I try to open the array using Rasterio or GDAL, I get the following error:

rasterio.errors.RasterioIOError: [TileDB::Query] Error: Subarray out of bounds. subarray: [1, 1, 0, 1023, 0, 1023] domain: [0, 0, 0, 1023, 0, 1023]
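For reference, GDAL numbers raster bands starting from 1. That would explain why reading the first band shows up as the subarray [1, 1] on the band dimension while the array's band domain is [0, 0]. The Python sketch below is a hypothetical helper (not TileDB's implementation) that mimics the per-dimension bounds check the message describes. It assumes the first [lo, hi] pair in the message is the band dimension:

```python
# Hypothetical helper mimicking the check behind "Subarray out of bounds":
# every [lo, hi] pair of the subarray must fall inside the matching
# [lo, hi] pair of the domain. This is not TileDB's actual code.


def subarray_in_bounds(subarray, domain):
    """Both arguments are flat lists [lo0, hi0, lo1, hi1, ...],
    the same layout TileDB prints in its error message."""
    if len(subarray) != len(domain) or len(domain) % 2:
        raise ValueError("expected flat [lo, hi] pairs of equal length")
    for i in range(0, len(domain), 2):
        s_lo, s_hi = subarray[i], subarray[i + 1]
        d_lo, d_hi = domain[i], domain[i + 1]
        if s_lo > s_hi or s_lo < d_lo or s_hi > d_hi:
            return False
    return True


# Values copied from the error message; the first pair is assumed
# to be the band dimension.
requested = [1, 1, 0, 1023, 0, 1023]        # band 1, GDAL-style (1-based)
zero_based = [0, 0, 0, 1023, 0, 1023]       # domain reported in the error
one_based = [1, 1, 0, 1023, 0, 1023]        # hypothetical 1-based band domain

print(subarray_in_bounds(requested, zero_based))
print(subarray_in_bounds(requested, one_based))
```

With a band domain that starts at 1, the same request passes the check.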

I’ve also noticed that ingestion through the CLI works fine, but the resulting schema differs from the one produced by the Python code. More precisely, in the schema created by the CLI, the dimension for the image bands is:


  • Name: BANDS
  • Domain: [1,4]
  • Tile extent: 1

while the schema produced by the Python code in the docs has:


  • Name: BANDS
  • Domain: [0,3]
  • Tile extent: 1

Note that the domain is shifted by 1. I tried to reproduce this by changing the domain range in the code, but then I get a different error:
IndexError: index out of bounds

in this line of code:

which is exactly the one reported in the docs.
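The line in question isn't shown here, so this is only a guess. An IndexError after shifting the domain to [1, 4] is typical when the band coordinate (1 to 4) is used directly as an index into a 0-based Python or NumPy sequence of bands, whose valid indices are 0 to 3. Here is a minimal sketch, with made-up placeholder band labels, of converting a coordinate into an index by subtracting the domain's lower bound:

```python
# Hypothetical example: BANDS is declared with domain [1, 4] (like the
# CLI-created schema), while the band data sits in a 0-based sequence.
BAND_DOMAIN = (1, 4)
# Placeholder labels standing in for the four band rasters.
SOURCE_BANDS = ["band_a", "band_b", "band_c", "band_d"]


def band_at(coord, domain, bands):
    """Return the source band stored at a BANDS coordinate.

    Indexing bands[coord] directly would raise IndexError at coord == 4,
    because a 4-element sequence only has indices 0..3. Subtracting the
    domain's lower bound maps 1..4 onto 0..3.
    """
    lo, hi = domain
    if not lo <= coord <= hi:
        raise IndexError(f"coordinate {coord} outside domain {domain}")
    return bands[coord - lo]


for coord in range(BAND_DOMAIN[0], BAND_DOMAIN[1] + 1):
    print(coord, band_at(coord, BAND_DOMAIN, SOURCE_BANDS))
```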

Has anyone succeeded with this kind of ingestion, or am I missing something? I’ve even tried simpler images, like the GeoTIFF used in the GDAL tutorial.

Thanks in advance.

Hi @loplace,

Thank you for trying TileDB with Rasterio and for your detailed report. This is a bug that was fixed some time ago; are you running an older version of GDAL?

We have created a Docker image that you can use; please pull the latest image.

I just ran the following commands in this container to test Rasterio/GDAL compatibility:

gdal_translate -of TileDB UTM2GTIF.TIF gdal_array
rio info gdal_array
rio convert -f TileDB UTM2GTIF.TIF rio_array
gdalinfo rio_array

Please let us know if you have any further issues or want to discuss other applications with Sentinel2 data.

Hi Norman, thanks a lot for your kind answer, and sorry for my late reply. My GDAL version should be the latest one (3.0.4 at the moment, if I’m not mistaken), and I still have some problems. In fact, even creating the array with Dask fails: it gets stuck in a loop, trying to create the array over and over, with the following error:

tiledb.libtiledb.TileDBError: [TileDB::StorageManager] Error: Cannot create array; Array ‘mytiledb’ already exists.

I’ll try to investigate whether there are conflicts or version mismatches between the libraries.

In the meantime, I would gladly accept your offer to discuss some applications with Sentinel2 data. Put simply, I’m interested both in how to handle images of different areas and in overlapping images (for example, the same area captured in two different acquisitions, or portions of tiles from two different orbits that overlap). I was looking at TileDB to understand what an efficient schema configuration would be for these different cases, and whether the cloud mask could be ingested as well. I know it’s a lot, but I’m still pretty new to geospatial services. Do you have any ready-made examples or suggestions that could help me?

@loplace Could you try your application using the Docker image? This issue with Rasterio and GDAL was fixed, but we can reopen the bug and look into it if the container does not work for you.

I have used Dask and Rasterio to create arrays in the way you have outlined, and we can discuss this further. The Rasterio driver does not overwrite an existing array, so you will need to delete the array before creating it again.
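As a minimal sketch, assume the array lives on the local filesystem, where a TileDB array is stored as a directory. A hypothetical helper like the one below can then delete a leftover array before re-creation, or skip creation when the array is already there so that repeated runs do not fail with "already exists". For S3 or other remote URIs, you would need TileDB's own object-management functions instead of plain filesystem calls.

```python
import os
import shutil
import tempfile


def prepare_array_uri(uri, overwrite=False):
    """Hypothetical helper for a local TileDB array path.

    Returns True if the caller should (re)create the array at `uri`,
    False if an array directory already exists and should be kept.
    """
    if os.path.isdir(uri):
        if not overwrite:
            # Keep the existing array and skip creation, instead of
            # failing with "Array ... already exists".
            return False
        # A local TileDB array is a directory, so removing the
        # directory deletes the array.
        shutil.rmtree(uri)
    return True


# Usage with a throwaway directory standing in for a leftover array.
with tempfile.TemporaryDirectory() as tmp:
    uri = os.path.join(tmp, "mytiledb")
    os.makedirs(uri)
    print(prepare_array_uri(uri))                  # existing array kept
    print(prepare_array_uri(uri, overwrite=True))  # existing array deleted
    print(os.path.exists(uri))
```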

Please contact me directly at norman<@> and we can discuss your Sentinel2 application further.