Debugging Segmentation Fault While Loading an Array

af-gh · March 14, 2023, 4:22am

The following code crashes with a segmentation fault:

In [1]: import tiledb

In [2]: loaded = tiledb.open('/path/to/my/data')

In [3]: loaded.df[:]
Segmentation fault (core dumped)

Are there any suggested approaches to debugging this without building the C library, for example by using the metadata available from Python?

Alternatively, I would welcome suggested approaches for incrementally building time-series data with schema evolution.

Steps to reproduce the array:

import numpy as np
import pandas as pd
import tiledb

array_uri = '/path/to/my/data'

# Initial write: a dense array with five float columns, COL_0 .. COL_4.
days = pd.date_range('2021-01-01T10:00:00', '2021-01-10')
cols = [f'COL_{i}' for i in range(5)]
data = pd.DataFrame(np.random.randn(len(days), len(cols)), index=days, columns=cols)
data.index.name = 'date'
tiledb.from_pandas(array_uri, data.reset_index(), index_col=[0], sparse=False, debug=True)

# Second batch: later dates plus one new column, COL_5.
loaded = tiledb.open(array_uri)
days = pd.date_range('2021-01-10T10:00:00', '2021-01-15')
cols = cols + ['COL_5']
data = pd.DataFrame(np.random.randn(len(days), len(cols)), index=days, columns=cols)
data.index.name = 'date'

# Evolve the schema to add the new column as an attribute before appending.
ctx = tiledb.default_ctx()
se = tiledb.ArraySchemaEvolution(ctx)
se.add_attribute(tiledb.Attr(data.columns[-1], dtype=data[data.columns[-1]].dtype))
se.array_evolve(array_uri)
# Append the second batch starting at the row after the last one already written.
tiledb.from_pandas(array_uri, data.reset_index(), index_col=[0], mode='append', row_start_idx=loaded.nonempty_domain()[0][1] + 1)

# Reading the evolved array back is the step that crashes.
loaded = tiledb.open(array_uri)
loaded.df[:]
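
On the debugging question above, one possible approach that does not require rebuilding the C library is to run the failing read in a child interpreter with Python's standard `faulthandler` enabled. This is only a sketch: `run_isolated` is a hypothetical helper name, and the snippet passed to it reuses the placeholder path from the post. The parent process survives the crash and can show the Python-level traceback along with whatever the child printed (schema, non-empty domain) before it died.

```python
import subprocess
import sys
import textwrap


def run_isolated(snippet: str, timeout: float = 120.0) -> subprocess.CompletedProcess:
    """Run `snippet` in a fresh interpreter with faulthandler enabled.

    Hypothetical helper: a crash in the child does not take down the caller,
    and faulthandler writes the Python traceback to the child's stderr.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", textwrap.dedent(snippet)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    # Placeholder path taken from the post; replace with the real array URI.
    result = run_isolated("""
        import tiledb
        with tiledb.open('/path/to/my/data') as arr:
            print(arr.schema)             # attributes after schema evolution
            print(arr.nonempty_domain())  # extent of the written data
            arr.df[:]                     # the read that crashes
    """)
    print("return code:", result.returncode)
    print(result.stdout)
    print(result.stderr)
```

On POSIX systems a negative return code means the child was killed by a signal, so a segmentation fault would typically show up as the negated value of SIGSEGV, with the faulthandler traceback in `result.stderr`.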

ihnorton · March 14, 2023, 1:43pm

Hi @af-gh,

This is a defect. We’re looking into it; thanks for the excellent reproduction.

Best,
Isaiah

ihnorton · March 14, 2023, 6:36pm

Hi @af-gh,

This issue is fixed by PR 3970, which was just merged thanks to @KiterLuc. We’ll do a TileDB 2.15.1 release that includes this fix by Monday next week.

Best,
Isaiah
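
For anyone checking whether an installed build already carries the fix, a minimal sketch follows. It assumes the fix ships in core TileDB 2.15.1 as planned above, and that `tiledb.libtiledb.version()` returns the core library version as a `(major, minor, patch)` tuple. `includes_fix` is a hypothetical helper name.

```python
def includes_fix(version, minimum=(2, 15, 1)):
    """Return True if a (major, minor, patch) version is at least `minimum`.

    The default threshold assumes the fix lands in core TileDB 2.15.1,
    as announced in this thread.
    """
    return tuple(version) >= tuple(minimum)


if __name__ == "__main__":
    try:
        import tiledb
    except ImportError:
        print("tiledb is not installed")
    else:
        # Assumed to report the embedded core library version, not the
        # version of the Python package.
        core_version = tiledb.libtiledb.version()
        print("core TileDB version:", core_version)
        print("includes the fix:", includes_fix(core_version))
```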

af-gh · March 14, 2023, 8:10pm

Thank you for the quick turnaround. Looking forward to it.
