Optimizing TileDB Query Performance for GEDI Data

Simon_Besnard · February 27, 2025, 1:48pm

Hi everyone,

I’m working with a TileDB array to store and query GEDI data. The array has spatial (latitude, longitude) and temporal dimensions, with the GEDI variables stored as attributes. I’d like feedback on two aspects:

TileDB Schema: Does the structure of my TileDB array make sense for efficient querying?
Query Optimization: Am I reading the data efficiently, or are there improvements I could make (e.g., indexing strategies, query execution optimizations, parallel reading)?
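On the parallel-reading point: TileDB already parallelizes I/O and decoding within a single query. This is controlled by configuration parameters such as `sm.io_concurrency_level`, `sm.compute_concurrency_level` and, for S3, `vfs.s3.max_parallel_ops`. Splitting a query on the client side is therefore not guaranteed to be faster and is worth benchmarking. If you want to try it, a minimal sketch is to cut the time range into contiguous, non-overlapping chunks and read them on a thread pool. It assumes the time dimension has an integer type. `split_int_range`, `parallel_read` and `read_fn` are hypothetical helpers, not part of TileDB-Py.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def split_int_range(lo: int, hi: int, n_chunks: int) -> List[Tuple[int, int]]:
    """Split the inclusive integer range [lo, hi] into up to n_chunks
    contiguous, non-overlapping inclusive sub-ranges.

    multi_index slices are inclusive on both ends, so consecutive chunks
    must not share an endpoint or boundary cells would be read twice.
    """
    if hi < lo:
        raise ValueError("hi must be >= lo")
    total = hi - lo + 1
    n_chunks = max(1, min(n_chunks, total))  # never more chunks than values
    base, extra = divmod(total, n_chunks)
    chunks = []
    start = lo
    for i in range(n_chunks):
        # The first `extra` chunks get one additional value
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size - 1))
        start += size
    return chunks


def parallel_read(read_fn: Callable[[int, int], T],
                  chunks: List[Tuple[int, int]],
                  max_workers: int = 4) -> List[T]:
    """Call read_fn(t0, t1) for every chunk on a thread pool.

    executor.map preserves input order, so results come back in chunk order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: read_fn(*c), chunks))
```

Here `read_fn(t0, t1)` would wrap the `multi_index` query from the post, slicing `lat_min:lat_max, lon_min:lon_max, t0:t1` on its own array handle; the per-chunk result dictionaries then need to be concatenated key by key (for example with `numpy.concatenate`).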

Here’s how to inspect my TileDB schema:

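```python
import tiledb

# S3 TileDB context (anonymous, unsigned access to the GFZ S3 endpoint)
tiledb_config = tiledb.Config(
    {
        "vfs.s3.endpoint_override": "https://s3.gfz-potsdam.de",
        "vfs.s3.region": "eu-central-1",
        "vfs.s3.no_sign_request": "true",
    }
)

ctx = tiledb.Ctx(tiledb_config)

# Build the S3 URI with an f-string: os.path.join is meant for local
# filesystem paths and would insert backslashes on Windows
bucket = "dog.gedidb.gedi-l2-l4-v002"
array_uri = f"s3://{bucket}/array_uri"

# Print the schema (dimensions, domains, tile extents, attributes, filters)
with tiledb.open(array_uri, mode="r", ctx=ctx) as array:
    print(array.schema)
```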
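One way to judge the schema against this query pattern is to estimate how many space tiles a typical query box touches, using the domain lower bounds and tile extents that `print(array.schema)` reports. This is a minimal sketch assuming row- or column-major tile order. For a sparse array the unit actually fetched is the data tile (sized by `capacity`), so treat the count as a rough indicator of how spread out the matching cells are in TileDB's global order. The domain and extent values in the test are hypothetical placeholders, not the real schema.

```python
import math
from typing import Sequence, Tuple

Range = Tuple[float, float]


def tiles_touched(domain_lo: float, extent: float, q_lo: float, q_hi: float) -> int:
    """Number of space tiles along one dimension intersected by [q_lo, q_hi].

    Space tiles are laid out from the dimension's domain lower bound in
    steps of `extent`; the tile index of x is floor((x - domain_lo) / extent).
    """
    if extent <= 0:
        raise ValueError("extent must be positive")
    first = math.floor((q_lo - domain_lo) / extent)
    last = math.floor((q_hi - domain_lo) / extent)
    return last - first + 1


def box_tiles_touched(domain_los: Sequence[float], extents: Sequence[float],
                      query: Sequence[Range]) -> int:
    """Product of the per-dimension tile counts for an axis-aligned query box."""
    total = 1
    for d_lo, ext, (lo, hi) in zip(domain_los, extents, query):
        total *= tiles_touched(d_lo, ext, lo, hi)
    return total
```

With a hypothetical 1° × 1° × 365-day tiling (domains starting at -90, -180 and 0), the query box from the post falls into 7 space tiles: one spatial tile and seven along time. Very large extents keep the count low but can force TileDB to scan many irrelevant cells, so both extremes are worth checking.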

Below is an example of how I would query the data.

```python
import tiledb

# S3 TileDB context (anonymous, unsigned access to the GFZ S3 endpoint)
tiledb_config = tiledb.Config(
    {
        "vfs.s3.endpoint_override": "https://s3.gfz-potsdam.de",
        "vfs.s3.region": "eu-central-1",
        "vfs.s3.no_sign_request": "true",
    }
)

ctx = tiledb.Ctx(tiledb_config)

# Path to the TileDB array
bucket = "dog.gedidb.gedi-l2-l4-v002"
array_uri = f"s3://{bucket}/array_uri"

# Define query parameters
attr_list = ["agbd"]
lat_min = -17.140088
lat_max = -17.094909
lon_min = 145.606605
lon_max = 145.653595
start_time = 17532
end_time = 19929

# Read the data. multi_index slices are inclusive on both ends, and the
# slice order must match the dimension order declared in the schema
# (assumed here to be latitude, longitude, time).
with tiledb.open(array_uri, mode="r", ctx=ctx) as array:
    query = array.query(attrs=attr_list)
    # `data` is a dict mapping attribute/dimension names to NumPy arrays
    data = query.multi_index[
        lat_min:lat_max, lon_min:lon_max, start_time:end_time
    ]
```
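The time bounds in the query are raw integers. If the time dimension stores days since 1970-01-01, then 17532 and 19929 correspond to 2018-01-01 and 2024-07-25. That epoch is my assumption, not something stated in the schema. Under that assumption, a small pair of helpers makes the bounds easier to write and check. `EPOCH`, `to_day_number` and `from_day_number` are hypothetical names.

```python
from datetime import date, timedelta

# Assumption: the time dimension counts whole days since this date
EPOCH = date(1970, 1, 1)


def to_day_number(d: date) -> int:
    """Convert a calendar date to an integer day count since EPOCH."""
    return (d - EPOCH).days


def from_day_number(n: int) -> date:
    """Convert an integer day count since EPOCH back to a calendar date."""
    return EPOCH + timedelta(days=n)
```

For example, `start_time = to_day_number(date(2018, 1, 1))` reproduces the value 17532 used above.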

For reference, here is a visualisation of the fragment structure of my TileDB array.
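TileDB can skip fragments whose non-empty domain does not intersect the query. Each intersecting fragment adds work, and on S3 that means extra requests, so it helps to count how many fragments a typical query box overlaps. Below is a minimal plain-Python sketch. The non-empty domains could come from `tiledb.array_fragments(array_uri, ctx=ctx)`, whose entries expose a `nonempty_domain`, but treat that wiring as an assumption to check against the TileDB-Py version in use. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Range = Tuple[float, float]
Box = Sequence[Range]


def overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes (inclusive ranges per dimension) intersect."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def fragments_hit(nonempty_domains: Sequence[Box], query: Box) -> List[int]:
    """Indices of fragments whose non-empty domain intersects the query box."""
    return [i for i, dom in enumerate(nonempty_domains) if overlaps(dom, query)]
```

If typical queries hit many small fragments, consolidation (`tiledb.consolidate` followed by `tiledb.vacuum`) is the usual remedy.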

I would appreciate any insights on whether my approach is well-optimized or if there are ways to improve it!

Thanks in advance!

Simon.
