Weird behavior with variable-length attributes

Hi,

I am experimenting with variable-length attributes using the Python bindings and I am seeing some unexpected behavior. For instance, if all the subarrays have the same length, I get a buffer conversion error.

# Array creation
import numpy as np
import tiledb

ctx = tiledb.default_ctx()
array_path = "var_attr_example"  # example URI; substitute your own

dim = tiledb.Dim(name="dimension", domain=(0, 10000), tile=100, dtype=np.int64)
domain = tiledb.Domain(dim)

attr = tiledb.Attr(name="attr", dtype=np.int64, var=True)

schema = tiledb.ArraySchema(domain=domain, sparse=False, attrs=[attr])
tiledb.Array.create(array_path, schema, overwrite=True)
# fails if all subarrays have the same length here
x = [
    np.array([0, 1, 2, 3], dtype=np.int64),
    np.array([0, 1, 2, 3], dtype=np.int64),
]

with tiledb.open(array_path, 'w') as A:
    A[0 : len(x)] = np.array(x, dtype='O')

I then get TileDBError: Failed to convert buffer for attribute: 'attr'.
However, if even one subarray has a different length, the write succeeds.

I also get the same buffer conversion error if only one subarray is provided, regardless of its contents.

# fails
x = [
    np.array([0,1,2,3], dtype=np.int64)
]

with tiledb.open(array_path, 'w') as A:
    A[0 : len(x)] = np.array(x, dtype='O')

Am I missing something here or maybe it is a bug?

Thanks.

Hi @yortuc,

This is a known limitation: when all the subarrays have the same length, NumPy coalesces them into a single 2-D array instead of keeping a 1-D object array of 1-D arrays, and the buffer conversion for the variable-length attribute then fails.
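You can see the coalescing with NumPy alone, no TileDB required (a minimal illustration):

```python
import numpy as np

# Equal-length subarrays: NumPy coalesces them into a 2-D array,
# even when dtype='O' is requested.
equal = np.array([np.array([0, 1, 2, 3]), np.array([0, 1, 2, 3])], dtype='O')

# Unequal lengths: NumPy keeps a 1-D object array of 1-D arrays.
ragged = np.array([np.array([0, 1, 2, 3]), np.array([0, 1, 2])], dtype='O')

print(equal.shape)   # (2, 4) -- coalesced, not what the var-length writer expects
print(ragged.shape)  # (2,)   -- object array, converts cleanly
```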

My recommended workaround is to append None to the end of x, which forces NumPy to build an object array, and then slice it away when writing to the array.

x = [
    np.array([0, 1, 2, 3], dtype=np.int64),
    np.array([0, 1, 2, 3], dtype=np.int64),
    None
]

with tiledb.open(array_path, 'w') as A:
    A[0 : len(x)-1] = np.array(x, dtype='O')[:-1]
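Another workaround that avoids the None sentinel (a sketch of an equivalent approach, not an official TileDB recipe) is to preallocate an object-dtype array and fill it element by element, so NumPy never has a chance to coalesce:

```python
import numpy as np

x = [np.array([0, 1, 2, 3], dtype=np.int64),
     np.array([0, 1, 2, 3], dtype=np.int64)]

# Preallocate a 1-D object array and fill it one element at a time;
# each slot keeps its own 1-D array regardless of subarray lengths.
buf = np.empty(len(x), dtype='O')
for i, v in enumerate(x):
    buf[i] = v

print(buf.shape)  # (2,) -- object array of 1-D int64 arrays
# Then write as before:
#   with tiledb.open(array_path, 'w') as A:
#       A[0 : len(x)] = buf
```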

Thanks.


Hi,

It works, thanks a lot.