Geospatial Workflow Question

diehl · January 25, 2023, 6:34am

Hello all,

I have a set of vector geometries representing land management conditions over the Western U.S. and I need to transform these vector layers into a multidimensional raster for further analysis. I would love to have a workflow in general where I can build and persist 3-dimensional raster stacks as dense arrays in TileDB and load them later for analysis and visualization. From the documentation, I’m not immediately seeing how rasters might be saved with their associated geospatial context, allowing for the raster to be georegistered properly on a map. Could someone clarify if there’s a logical workflow that goes from vector layers → raster layers as dense multidimensional array → TileDB → in-memory analysis → geovisualization. I’ve seen references online to tighter TileDB <–> xarray integration so I assume that is an important piece in this hypothetical chain. Any pointers would be greatly appreciated! I would love to embrace TileDB if it’s the right fit for this type of geospatial data science!

Chris

Nick_Kules · January 27, 2023, 6:19pm

Hi Chris,
Currently rasters (dense arrays) are saved with integer indexes, while other data types (such as point clouds or point vectors) can be saved as floating point dimensions and retain native coordinate systems. We do have an upcoming feature called Dimension Labels that will allow inline floating point conversions on dense arrays. Currently the best method for maintaining geospatial context is to save out the geotransform information.

If you are creating the rasters with an integrated tool such as GDAL, it will automatically save the geotransform into the _gdal metadata key. But if you’re creating the arrays yourself, you can use Rasterio or GDAL to read the geotransform from the input, and you can write that to a TileDB metadata tag.

You can then use this geotransform to convert your query from geospatial coordinates to the integer indices, as well as to convert the retrieved data for in memory analysis. So if you were doing the array creation yourself, it would look something like: Read raster data → create array schema → compute geotransform and write to metadata (can be done later) → write to array. Then your analytics later would be: Read geotransform metadata tag → query from computed indices → apply geotransform to x/y coordinates (if necessary for analysis or visualization) → in memory analysis or visualization. We do have integrations with several visualization tools such as folium, rasterio, and 3D visualizations in BablyonJS.

If you need some code examples I am happy to help.

Some info on geotransforms:
https://gdal.org/tutorials/geotransforms_tut.html
https://rasterio.readthedocs.io/en/latest/api/rasterio.transform.html

diehl · January 28, 2023, 4:04am

Thanks so much @Nick_Kules! I am using GDAL to create the rasters at present, and I’m still working out what options I have in terms of realizing a pipeline that takes me from a stack of vector geometries to a multidimensional raster representation of the vector geometries at a prescribed resolution. To start, I’ve got Python code that uses gdal.rasterize to create a GeoTIFF representation of each vector geometry. I just installed xarray and discovered how to load a GeoTIFF into an xarray.DataArray and recreate the GeoTIFF from the DataArray, confirming the geo transform is encoded in the DataArray. Is there a way to go directly from GDAL straight into TileDB, without needing to leverage xarray? In other words, can I build the multidimensional raster directly in TileDB through a series of gdal.rasterize calls and then load the complete dataset from TileDB? That would be amazing.

Nick_Kules · February 6, 2023, 7:47pm

Chris there are a few options that you can take. In short yes you should be able to build it directly without having to use xarray. If you are creating a bunch of individual tiffs with the gdal rasterize calls, you can then create a GDAL VRT (essentially a virtual mosaic of them), and use that VRT as an input for GDAL to create a TileDB array.

Depending on how you want the data structured, you might have to play around with the VRT settings. If you are creating one raster for every geometry, and want to keep those with unique identifiers, you can set them as different raster bands in the VRT, which will then keep that unique identifier in TileDB. If you are fine with it all being merged into one flat raster then you wouldn’t need to do anything. Alternatively you can use Rasterio instead of GDAL.

Information on building GDAL VRT: gdalbuildvrt — GDAL documentation
TileDB GDAL info: GDAL - TileDB Geospatial

Let me know if you have any additional questions, if I misunderstood anything, or come across any blockers while trying to implement.

diehl · February 7, 2023, 2:11am

Brilliant Nick - thank you! Let me dig into this further and circle back with any questions. You got it exactly right. I want to maintain each layer independently and was thinking about storing them as different raster bands.

Topic		Replies	Views
Managing Large Geospatial Arrays with TileDB	3	702	September 29, 2023
Is there OGR support?	3	839	April 28, 2020
Xarray <--> TileDB	4	2356	November 22, 2021
GDAL/Rasterio examples from docs not working	1	942	July 3, 2020
Options for geospatial visualizations	5	1227	September 28, 2020

Geospatial Workflow Question

Related topics