Hey,
I’m getting an error when trying to use a query condition on a U-type attribute.
Here is a code snippet to replicate the issue:
import pandas as pd
import tiledb
from tiledb import QueryCondition
data = [
["str","str"]
]
df = pd.DataFrame(data, columns = ["Stype","Utype"])
df.Stype = df.Stype.astype("S0")
uri = "/tmp/test"
tiledb.from_pandas(uri, dataframe=df)
with tiledb.open(uri) as A:
    print(A.attr(0))
    print(A.attr(1))
with tiledb.open(uri) as A:
    qc = QueryCondition("Stype=='str'")
    A.query(attr_cond=qc).df[:]
"""
No errors
"""
with tiledb.open(uri) as A:
    qc = QueryCondition("Utype=='str'")
    A.query(attr_cond=qc).df[:]
"""
Traceback (most recent call last):
File "/homefolder/roya/anaconda3/envs/tiledbclient/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3397, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-16-27f929b3f9ff>", line 3, in <cell line: 1>
A.query(attr_cond=qc).df[:]
File "/homefolder/roya/anaconda3/envs/tiledbclient/lib/python3.9/site-packages/tiledb/multirange_indexing.py", line 210, in __getitem__
return self if self.return_incomplete else self._run_query()
File "/homefolder/roya/anaconda3/envs/tiledbclient/lib/python3.9/site-packages/tiledb/multirange_indexing.py", line 341, in _run_query
self.pyquery.submit()
tiledb.cc.TileDBError: [TileDB::QueryCondition] Error: Value node non-empty attribute may only be var-sized for ASCII strings: Utype
"""
The "U0" NumPy dtype maps internally to TILEDB_STRING_UTF8, which is not supported in query conditions. To set the attribute dtype to TILEDB_STRING_ASCII in from_pandas, use the column_types argument and map the attribute to "ascii". I’ve also modified your code to set Stype to "S0" (TILEDB_CHAR) through column_types instead of astype.
import pandas as pd
import tiledb
from tiledb import QueryCondition
data = [["str", "str"]]
df = pd.DataFrame(data, columns=["Stype", "Utype"])
uri = "/tmp/test"
tiledb.from_pandas(
uri,
dataframe=df,
column_types={"Stype": "S0", "Utype": "ascii"},
)
with tiledb.open(uri) as A:
    print(A.attr(0))
    print(A.attr(1))
# no longer errors out
with tiledb.open(uri) as A:
    qc = QueryCondition("Stype=='str'")
    A.query(attr_cond=qc).df[:]
    qc = QueryCondition("Utype=='str'")
    A.query(attr_cond=qc).df[:]
Hey @nguyenv,
Thank you for your answer. That is what I did. I ran into problems with a few attributes because they contained Unicode characters, but that wasn’t a big issue.
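For anyone who runs into the same Unicode problem: one possible way to handle it is to check which string columns are pure ASCII before mapping them to "ascii" in column_types. The sketch below is only an illustration, not part of TileDB. The helper name ascii_safe_columns is hypothetical, and it assumes the input behaves like a mapping of column name to values (a plain dict works, and a pandas DataFrame should too, since it exposes .items()).

```python
def ascii_safe_columns(columns):
    """Return a {column name: "ascii"} dict for columns holding only ASCII strings.

    Hypothetical helper (not a TileDB API). `columns` is anything with an
    .items() method yielding (name, iterable of values) pairs.
    """
    safe = {}
    for name, values in columns.items():
        values = list(values)
        # Skip empty columns, non-string values, and strings with non-ASCII characters.
        if values and all(isinstance(v, str) and v.isascii() for v in values):
            safe[name] = "ascii"
    return safe


# Example with a plain dict standing in for the DataFrame columns.
example = {
    "Stype": ["str"],
    "Utype": ["str"],
    "Name": ["café"],  # contains a non-ASCII character, so it is left out
}
column_types = ascii_safe_columns(example)
```

The resulting dict could then be passed as the column_types argument to from_pandas. Columns that are left out would keep their default UTF-8 mapping, so they would still not be usable in query conditions, as described in the answer above.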