Reading tiledb array from S3 resulting in `Non-empty domain: None`

I have two arrays stored in S3. For some reason I’m able to read from one, while the other reports a non-empty domain of None.

import tiledb

with tiledb.open("s3://some_weird.tiledb") as arr:
    print("Non-empty domain:", arr.nonempty_domain())  # => Non-empty domain: None

with tiledb.open("s3://some_working.tiledb") as arr:
    print("Non-empty domain:", arr.nonempty_domain())  # => Non-empty domain: ((1, 2), (1, 2))

Looking at the respective files in S3, I can see that they have a different internal shape/structure. Other than that, I’m at a loss.
I’m using tiledb-py v0.12.2 and tiledb v2.6.2.

Thanks,
Rowan

I should also mention that the file I can’t read is in a protected bucket, while the one I can read is in a public bucket. I’m passing through a ctx option with the relevant credentials set, and I’m pretty sure that’s working, because if I remove it I get an error message: Access Denied.
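For reference, here is a minimal sketch of how credentials are typically passed through a context in TileDB-Py. The config keys `vfs.s3.aws_access_key_id`, `vfs.s3.aws_secret_access_key`, and `vfs.s3.region` are TileDB's S3 settings; the helper names `s3_config` and `print_nonempty_domain` and any values passed to them are assumptions for illustration, not the exact code I'm running.

```python
def s3_config(access_key_id, secret_access_key, region):
    """Build the TileDB config entries for S3 credentials (hypothetical helper)."""
    return {
        "vfs.s3.aws_access_key_id": access_key_id,
        "vfs.s3.aws_secret_access_key": secret_access_key,
        "vfs.s3.region": region,
    }


def print_nonempty_domain(uri, config):
    """Open an array with an explicit context and print its non-empty domain."""
    import tiledb  # imported here so the config helper works without TileDB installed

    ctx = tiledb.Ctx(config)  # tiledb.Ctx accepts a dict of config parameters
    with tiledb.open(uri, ctx=ctx) as arr:
        print("Non-empty domain:", arr.nonempty_domain())
```

Calling `print_nonempty_domain(uri, s3_config(...))` with real credentials and the array URI would then reproduce the check above against the protected bucket.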

Hi @rowanwins,

We’ve fixed a bug in newer versions of TileDB-Py to return an error for nonempty_domain() with all schemas – previously there were some cases where the call would silently fail and return None. If you are able to try a newer version, then you may get a better indication of the source of the error.

One potential source of such an error would be permission limitations on the protected bucket. For example: if you have read permission but not list permission then the initial open would succeed (tiledb reads the array schema directly, first), but subsequent calls would fail due to lack of permission to list sub-prefixes containing fragment files (and this may silently fail with None for the nonempty_domain call, as noted above).
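One way to test the list-permission hypothesis is to list the array prefix through TileDB's VFS using the same context. This is only a sketch: `can_list` is a hypothetical helper, it accepts any object with an `ls(uri)` method (such as `tiledb.VFS`), and how a denied listing surfaces (an exception versus an empty result) is an assumption that may vary by version.

```python
def can_list(vfs, array_uri):
    """Return (ok, detail): whether listing array_uri succeeded and returned entries.

    `vfs` is any object exposing ls(uri), for example tiledb.VFS(ctx=ctx).
    """
    try:
        entries = list(vfs.ls(array_uri))
    except Exception as exc:  # e.g. a TileDB error wrapping an S3 Access Denied
        return False, f"listing failed: {exc}"
    if not entries:
        # An existing array holds at least its schema, so an empty listing is suspicious.
        return False, "listing returned no entries"
    return True, f"{len(entries)} entries visible"
```

With a real context this could be called as `can_list(tiledb.VFS(ctx=ctx), uri)`, where `ctx` carries the credentials for the protected bucket. A `False` result for an array that opens successfully would point toward missing list permission.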

Would you mind trying the following with both the protected and unprotected buckets? It will give us a baseline for debugging:

import tiledb, numpy as np

uri1 = "s3:// protected_bucket / ... "
uri2 = "s3:// unprotected_bucket / ..."

data = np.random.rand(4)

tiledb.from_numpy(uri1, data)
tiledb.from_numpy(uri2, data)

with tiledb.open(uri1) as A:
  print("uri1 schema: ", A.schema)
  print("uri1 ned: ", A.nonempty_domain())
  print("uri1 data: ", A[:])

with tiledb.open(uri2) as B:
  print("uri2 schema: ", B.schema)
  print("uri2 ned: ", B.nonempty_domain())
  print("uri2 data: ", B[:])

Best,
Isaiah

Hi @ihnorton,

Thanks for the assistance. I’ve checked the AWS permissions and they seem ok.

Running your suggested code on the protected array returns the following, so it’s clearly finding something…

uri1 schema:  ArraySchema(
  domain=Domain(*[
    Dim(name='X', domain=(-1.7976931348623157e+308, 1.7976931348623157e+308), tile='None', dtype='float64', filters=FilterList([ZstdFilter(level=16), ])),
    Dim(name='Y', domain=(-1.7976931348623157e+308, 1.7976931348623157e+308), tile='None', dtype='float64', filters=FilterList([ZstdFilter(level=16), ])),
  ]),
  attrs=[
    Attr(name='timestamp', dtype='datetime64[ns]', var=False, nullable=False, filters=FilterList([ZstdFilter(level=16), ])),
    ...
    Attr(name='region_code', dtype='<U0', var=True, nullable=False, filters=FilterList([ZstdFilter(level=16), ])),
  ],
  cell_order='hilbert',
  tile_order=None,
  capacity=100000,
  sparse=True,
  allows_duplicates=True,
  coords_filters=FilterList([ZstdFilter(level=-1)]),
)

uri1 ned:  None

uri1 data:  OrderedDict([('timestamp', array([], dtype='datetime64[ns]')), ..., ('region_code', array([], dtype='<U1')), ('X', array([], dtype=float64)), ('Y', array([], dtype=float64))])

I’m about to set up a fresh conda environment with the upgraded tiledb dependencies, so I’ll see whether that yields any further info in the error messages and will report back.

Thanks
Rowan

Ok, so I’ve set up a fresh conda env with tiledb v2.9.1 and tiledb-py v0.15.1, and the read is now working for the protected bucket :tada:

So I’ll have a go at upgrading those dependencies in my bigger stack and :crossed_fingers: I don’t get any clashes. My bigger stack uses pdal and gdal so it’s a more complicated env.

Hi @rowanwins

This should not get more complicated within a GDAL / PDAL stack. Reach out to us with any questions; we run this stack and others in our TileDB Cloud geo image.

Norman