Pandas dataframe examples?

Do you have any examples/sample code that demonstrates best practices for moving data between tiledb arrays and pandas DataFrame objects? In particular, how to best accommodate categorical Series and numpy types that are not directly supported by tiledb (eg, np.bool).

A few more questions related to this topic.

My goal is to move DataFrames in/out of tiledb and preserve the typing information that was in the original (source) DataFrame. Most unsupported types (eg, np.bool) are trivially converted to/from a tiledb type (np.bool to np.uint8) for writing, but reading loses this type information.

One possibility is to store this type information somewhere and use it when constructing the DataFrame at read time (eg, using DataFrame.astype()). The recently landed array metadata looks ideal for this scenario, but is still WIP for the Python API (PR #213). Is there a suggested alternative in the meantime?
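To make the idea concrete, here is a minimal sketch of the "store the types on the side" approach, written with plain pandas/NumPy so it does not depend on any particular tiledb API. The helper names (encode_for_storage, decode_from_storage) and the JSON layout are hypothetical, and where the JSON string gets saved (a sidecar file, or array metadata once it is available in Python) is left open.

```python
import json

import numpy as np
import pandas as pd


def encode_for_storage(df):
    """Return (storable_df, dtype_json). Hypothetical helper, not a tiledb API.

    bool columns become uint8, categorical columns become their integer codes,
    and the original dtypes are recorded in a JSON string.
    """
    dtypes = {col: str(dt) for col, dt in df.dtypes.items()}
    categories = {}
    out = df.copy()
    for col in df.columns:
        if isinstance(df[col].dtype, pd.CategoricalDtype):
            categories[col] = {
                "categories": df[col].cat.categories.tolist(),
                "ordered": bool(df[col].cat.ordered),
            }
            out[col] = df[col].cat.codes  # missing values are stored as -1
        elif df[col].dtype == np.bool_:
            out[col] = df[col].astype(np.uint8)
    return out, json.dumps({"dtypes": dtypes, "categories": categories})


def decode_from_storage(stored, dtype_json):
    """Rebuild the original dtypes from the JSON written by encode_for_storage."""
    meta = json.loads(dtype_json)
    out = stored.copy()
    for col, dtype_name in meta["dtypes"].items():
        if col in meta["categories"]:
            info = meta["categories"][col]
            out[col] = pd.Categorical.from_codes(
                out[col].to_numpy(),
                categories=info["categories"],
                ordered=info["ordered"],
            )
        else:
            out[col] = out[col].astype(dtype_name)
    return out


df = pd.DataFrame({
    "flag": [True, False, True],
    "grade": pd.Categorical(["a", "b", "a"]),
    "x": [1.0, 2.0, 3.0],
})
stored, dtype_json = encode_for_storage(df)
restored = decode_from_storage(stored, dtype_json)
```

Here stored holds only plain numeric columns, which is what would be written to the array, and restored should have the same dtypes as df. This sketch assumes the category values are JSON-serializable (strings or numbers).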

+1. Some example code would be very helpful for pandas users.

For loading and parsing 1D multiattribute TileDB arrays into dataframes I'm currently using:

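import pandas as pd
import tiledb

def tiledb_to_df(tiledb_arr, idx_col='datetime'):
    # tiledb_arr is expected to be the dict-like result of slicing the array
    # (attribute name -> NumPy array). pop() removes idx_col from it in place.
    idx = tiledb_arr.pop(idx_col)
    df = pd.DataFrame(tiledb_arr, index=idx)

    return df

# array_dir is the path to the array; replace() turns it into a Windows-style path
with tiledb.open(array_dir.replace('/', '\\'), 'r') as TileDB_array:
    tiledb_arr = TileDB_array[:]

df = tiledb_to_df(tiledb_arr)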
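The conversion step above can be tried without a TileDB array on disk. The sketch below assumes that slicing returns a dict-like mapping of attribute names to 1D NumPy arrays, and fakes that result with an OrderedDict. The name mapping_to_df and the sample columns are made up for illustration. Unlike pop(), this variant leaves the input mapping untouched.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd


def mapping_to_df(columns, idx_col="datetime"):
    """Build a DataFrame from a mapping of name -> 1D array.

    The idx_col entry becomes the index. The input mapping is not modified.
    """
    data = {name: values for name, values in columns.items() if name != idx_col}
    return pd.DataFrame(data, index=pd.Index(columns[idx_col], name=idx_col))


# Stand-in for the result of slicing a 1D multiattribute array (assumption)
fake_result = OrderedDict([
    ("datetime", np.array(["2020-01-01", "2020-01-02"], dtype="datetime64[ns]")),
    ("price", np.array([1.5, 2.5])),
    ("volume", np.array([10, 20], dtype=np.int64)),
])

df = mapping_to_df(fake_result)
```

Because the index column is read rather than popped, fake_result can be reused afterwards, for example to build a second frame with a different index column.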

Just found some demo code from the developers:
