Dataframe with multidimensional values

Is there any way I can make the code below work, e.g. by changing the dtype or specifying one explicitly?

    import pandas as pd
    import tiledb

    COLUMNS = ["name", "ages"]
    data = [["Raj", [31, 32]], ["Jay", [21, 22]]]
    df = pd.DataFrame(data=data, columns=COLUMNS)

    uri = "tile-data.tiledb"
    tiledb.from_pandas(uri, df)

The error I get is:

    Traceback (most recent call last):
      File "trial.py", line 24, in <module>
        tiledb.from_pandas(uri, df)
      File "/opt/conda/lib/python3.8/site-packages/tiledb/dataframe_.py", line 452, in from_pandas
        column_infos = _get_column_infos(
      File "/opt/conda/lib/python3.8/site-packages/tiledb/dataframe_.py", line 176, in _get_column_infos
        column_infos[name] = ColumnInfo.from_values(column, varlen_types)
      File "/opt/conda/lib/python3.8/site-packages/tiledb/dataframe_.py", line 100, in from_values
        raise NotImplementedError(
    NotImplementedError: mixed inferred dtype not supported
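
The message refers to an inferred dtype. As a quick check of what "mixed" means here, pandas' own inference can be asked directly. This is a sketch that assumes TileDB's check behaves like pandas.api.types.infer_dtype, which the error wording suggests but which is not confirmed by the traceback: pandas labels the list-valued column "mixed" while the name column is plain "string".

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame(
    data=[["Raj", [31, 32]], ["Jay", [21, 22]]],
    columns=["name", "ages"],
)

# Ask pandas which kind of values it infers for each column.
name_kind = pd.api.types.infer_dtype(df["name"])
ages_kind = pd.api.types.infer_dtype(df["ages"])
print(name_kind, ages_kind)
```

Python lists do not match any of the scalar categories pandas knows about, so they fall into the catch-all "mixed" label.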

I want to use this in an ML application where I load data by 'row' or by batch. Something like:

    # Ingest
    with tiledb.open(uri) as tiledb_array:
        print(tiledb_array[0]["name"])  # should give me 'Raj'
        print(tiledb_array[0]["ages"])  # should give me [31, 32]

Hi Rajiv,

Thanks for bringing this to our attention. The from_pandas and df functions do not currently support mixed inferred dtypes, such as a column whose values are lists, but we have added the feature request to our tracker. In the meantime, you can populate and read your data in a TileDB array directly, using the example below as a guide:

    import numpy as np
    import tiledb

    uri = "tile-data.tiledb"

    # Create the schema: one uint64 dimension with room for 10,001 rows.
    domain = tiledb.Domain(tiledb.Dim(domain=(0, 10000), tile=100, dtype="uint64"))
    attrs = [
        # String attributes are variable-length in TileDB, so var=True.
        tiledb.Attr(name="name", dtype=str, var=True, nullable=False),
        # Each "ages" cell holds a variable-length list of int64 values.
        tiledb.Attr(name="ages", dtype=np.int64, var=True),
    ]
    schema = tiledb.ArraySchema(domain=domain, attrs=attrs)

    # Create the array.
    tiledb.DenseArray.create(uri, schema)

    # Write rows 0 and 1; var-length cells are passed as a 1-D object array of arrays.
    with tiledb.open(uri, "w") as tiledb_array:
        tiledb_array[0:2] = {
            "name": np.array(["Raj", "Jay"]),
            "ages": np.array([np.array([31, 32]), np.array([21, 22, 24])], dtype="O"),
        }

    # Read from the array.
    with tiledb.open(uri) as tiledb_array:
        print(tiledb_array[0]["name"])  # 'Raj'
        print(tiledb_array[0]["ages"])  # [31, 32]
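
To write the DataFrame from the question, the ages column has to become a 1-D object array of integer arrays. One pitfall: the question's lists have equal lengths ([31, 32] and [21, 22]), and for those np.array(..., dtype="O") builds a 2-D (2, 2) object array instead of a 1-D array of arrays. A minimal sketch that avoids this, assuming the column names from the question; to_tiledb_columns is a hypothetical helper, not part of TileDB-Py:

```python
import numpy as np
import pandas as pd


def to_tiledb_columns(df):
    """Hypothetical helper: turn a DataFrame with a list-valued "ages"
    column into a dict of 1-D NumPy arrays, one entry per attribute."""
    # Preallocate one object slot per row and fill the slots one at a time,
    # so the result stays 1-D even when every list has the same length.
    ages = np.empty(len(df), dtype=object)
    for i, values in enumerate(df["ages"]):
        ages[i] = np.asarray(values, dtype=np.int64)
    names = df["name"].to_numpy(dtype=str)
    return {"name": names, "ages": ages}


df = pd.DataFrame(
    data=[["Raj", [31, 32]], ["Jay", [21, 22]]],
    columns=["name", "ages"],
)
columns = to_tiledb_columns(df)
print(columns["ages"].shape)
```

With the array created as above, the result could then be written with tiledb_array[0:len(df)] = to_tiledb_columns(df) inside the tiledb.open(uri, "w") block.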

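For the batch-wise loading mentioned in the question, the same indexing can take a slice instead of a single row. The sketch below only generates half-open (start, stop) row ranges; batch_ranges and n_rows are hypothetical names. Because the dense domain spans 0 to 10000 while only two rows are written, this sketch assumes you track the number of populated rows yourself.

```python
def batch_ranges(n_rows, batch_size):
    """Yield half-open (start, stop) row ranges of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)


# Hypothetical usage with the array from the answer (requires tiledb):
# with tiledb.open(uri) as tiledb_array:
#     for start, stop in batch_ranges(n_rows=2, batch_size=32):
#         batch = tiledb_array[start:stop]  # dict-like: attribute name -> NumPy array
#         names, ages = batch["name"], batch["ages"]

print(list(batch_ranges(5, 2)))  # [(0, 2), (2, 4), (4, 5)]
```

Slices in tiledb_array[start:stop] follow NumPy's half-open convention, which is why the ranges are half-open too.
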
Please let us know if you have any more questions.

Vivian


Hi @nguyenv,
I can definitely try that. Thanks!