I’ve been trying something and may have found a bug (probably in the Python wrapper), but I’m very new to TileDB so I may be doing something wrong.
I’m trying to do the following:
- Create and write an array
- Add an attribute to the array schema
- Write new data to the array
- Read the ‘old’ version of the array
However, the last step crashes the library without any error message or stack trace. So my question is: should this be supported, or is it an edge case that the Python wrapper should check for and disallow?
Here is my code:
import pandas as pd
import numpy as np
import tiledb as td

# Flag to add extra attribute or not
add_attr = True

index = list('xyz')
columns = list('abc')
df1 = pd.DataFrame(
    np.random.rand(len(index), len(columns)),
    index=index,
    columns=columns,
)

try:
    print('Creating initial array')
    td.from_pandas(
        'temp-array', df1,
        sparse=True, allows_duplicates=False,
        full_domain=True,
    )

    print('Check fragment info')
    fragments_info = td.array_fragments('temp-array')
    print(fragments_info)

    print('Reading array')
    with td.open('temp-array') as array:
        array_df = array.df[:]
        print(array_df)
        pd.testing.assert_frame_equal(array_df, df1)

    if add_attr:
        print('Add a column/attribute')
        columns += ['d']
        se = td.ArraySchemaEvolution(td.default_ctx())
        se.add_attribute(td.Attr('d', dtype=np.float64))
        se.array_evolve('temp-array')

    print('Rewrite array')
    df2 = pd.DataFrame(
        np.random.rand(len(index), len(columns)),
        index=index,
        columns=columns,
    )
    td.from_pandas(
        'temp-array', df2,
        mode='append',
    )

    print('Check fragment info')
    fragments_info = td.array_fragments('temp-array')
    print(fragments_info)

    # Get the array timestamps
    (t1, _), (t2, _) = fragments_info.timestamp_range

    print('Reading newly written array')
    with td.open('temp-array', timestamp=t2) as array:
        array_df = array.df[:]
        print(array_df)
        pd.testing.assert_frame_equal(
            array_df, df2,
            check_names=False,
        )

    print('Reading previously written array')
    with td.open('temp-array', timestamp=t1) as array:
        array_df = array.df[:]
        print(array_df)
        pd.testing.assert_frame_equal(
            array_df, df1,
            check_names=False,
        )
finally:
    print('Removing temp-array')
    vfs = td.VFS(ctx=td.default_ctx())
    vfs.remove_dir('temp-array')
If I run it with add_attr = False, everything works. With add_attr = True (as shown above), I get the following output:
> python tiledb-test.py
Creating initial array
[2022-07-10 18:00:01.851] [Process: 3232] [error] [Global] [TileDB::Array] Error: Cannot open array; Array does not exist
Check fragment info
{'array_schema_name': ('__1657468801855_1657468801855_2dd8fc6d327c43edbd7bce189e19f68c',),
'array_uri': 'temp-array',
'cell_num': (3,),
'has_consolidated_metadata': (False,),
'nonempty_domain': ((('x', 'z'),),),
'sparse': (True,),
'timestamp_range': ((1657468801894, 1657468801894),),
'to_vacuum': (),
'unconsolidated_metadata_num': 1,
'uri': ('[...]/temp-array/__fragments/__1657468801894_1657468801894_cb3b1a70af694bf0bc6ed35dcd56822c_14',),
'version': (14,)}
Reading array
a b c
x 0.211619 0.604100 0.395342
y 0.031285 0.098245 0.432116
z 0.591643 0.436595 0.026439
Add a column/attribute
Rewrite array
Check fragment info
{'array_schema_name': ('__1657468801855_1657468801855_2dd8fc6d327c43edbd7bce189e19f68c',
'__1657468801989_1657468801989_6671225cdce9450bbe34fc78d38f4c06'),
'array_uri': 'temp-array',
'cell_num': (3, 3),
'has_consolidated_metadata': (False, False),
'nonempty_domain': ((('x', 'z'),), (('x', 'z'),)),
'sparse': (True, True),
'timestamp_range': ((1657468801894, 1657468801894),
(1657468802006, 1657468802006)),
'to_vacuum': (),
'unconsolidated_metadata_num': 2,
'uri': ('[...]/temp-array/__fragments/__1657468801894_1657468801894_cb3b1a70af694bf0bc6ed35dcd56822c_14',
'[...]/temp-array/__fragments/__1657468802006_1657468802006_0908ddb858484032b7310a4dbcad101b_14'),
'version': (14, 14)}
Reading newly written array
a b c d
x 0.694410 0.344363 0.308965 0.002191
y 0.290289 0.367782 0.621800 0.914875
z 0.065450 0.602449 0.579679 0.882256
Reading previously written array
As you can see, it crashes without any message when trying to read at the old timestamp.
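Since the process dies silently, one thing that might help narrow this down is Python's standard-library faulthandler module, which dumps the Python-level traceback when the interpreter receives a fatal signal such as a segmentation fault. This is only a sketch: the helper name enable_crash_traceback and the log file name are made up for illustration, and I am assuming the crash is a fatal signal raised inside the native library. It would show the Python line that triggered the crash, not the C++ frames.

```python
import faulthandler


def enable_crash_traceback(log_path="crash-traceback.log"):
    """Write Python tracebacks of all threads to log_path on a fatal signal.

    The file object is returned because it has to stay open for as long as
    faulthandler is enabled.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    # Call this once at the top of the reproduction script, before any
    # TileDB calls, then inspect the log file after the crash.
    crash_log = enable_crash_traceback()
```

Running the unmodified script as python -X faulthandler tiledb-test.py should have a similar effect, with the traceback going to stderr instead of a file.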
(I’m also not sure why it logs a "Cannot open array; Array does not exist" error when creating the array, even though everything works out in the end.)
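To make the fragment info above easier to read, here is a small sketch that pairs each fragment's timestamp range with the schema it was written under. The helper name fragments_by_schema is hypothetical, and it works on plain tuples so it runs without TileDB; the values are copied from the second "Check fragment info" output. In the script I would pass fragments_info.array_schema_name and fragments_info.timestamp_range, assuming the former is exposed as an attribute in the same way as the latter.

```python
def fragments_by_schema(array_schema_names, timestamp_ranges):
    """Return (timestamp_range, schema_name) pairs, oldest fragment first."""
    if len(array_schema_names) != len(timestamp_ranges):
        raise ValueError("expected one schema name per fragment")
    return sorted(zip(timestamp_ranges, array_schema_names))


# Values copied from the fragment info printed after the second write.
schema_names = (
    '__1657468801855_1657468801855_2dd8fc6d327c43edbd7bce189e19f68c',
    '__1657468801989_1657468801989_6671225cdce9450bbe34fc78d38f4c06',
)
timestamp_ranges = (
    (1657468801894, 1657468801894),
    (1657468802006, 1657468802006),
)

for (start, end), schema in fragments_by_schema(schema_names, timestamp_ranges):
    print(start, end, schema)
```

This shows that the fragment at t1 still refers to the original schema, while the fragment at t2 refers to the evolved one, so the read at t1 is the only one that has to deal with a fragment lacking attribute d. It also avoids the two-fragment assumption in the tuple unpacking I used to get t1 and t2.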