Hi again,
I’m having issues reading a particular value range.
My array covers price values from $0 to $70’000, but I cannot read the range between $5’500 and $7’000 in one go. The only way to read this range is iteratively in smaller chunks (e.g. $250 increments), concatenating the results afterwards, but this is not ideal because it takes much longer.
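For reference, the chunked workaround looks roughly like the sketch below. The helper names `chunk_bounds` and `read_in_chunks` are made up for illustration, and `read_fn` stands in for whatever function performs the actual query (for example the `from_tileDB2` function further down). Adjacent chunks share their boundary value, so if the slice is inclusive on both ends, duplicate rows at the chunk edges may need to be dropped after concatenating.

```python
def chunk_bounds(p1, p2, step=250):
    """Yield (lo, hi) pairs that cover the range p1..p2 in pieces of at most `step`."""
    lo = p1
    while lo < p2:
        hi = min(lo + step, p2)
        yield lo, hi
        lo = hi


def read_in_chunks(read_fn, p1, p2, step=250):
    """Call read_fn(lo, hi) once per chunk and return the results as a list."""
    return [read_fn(lo, hi) for lo, hi in chunk_bounds(p1, p2, step)]


# Hypothetical usage with the query function from this post:
# parts = read_in_chunks(lambda lo, hi: from_tileDB2(lo, hi, sdir, pair), 5500, 7000)
# df = pd.concat(parts)
```

Each call only materialises one $250 slice at a time, which is presumably why it avoids the allocation failure, at the cost of several separate queries.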
Things I’ve tried:
- Read out iteratively and save to a new array
- Optimize new array in terms of tile layout and tile capacity
Unfortunately to no avail… The error persists, but really only in this narrow range.
You can find the data here (~1.5 GB, 300 M data points).
Code that produces the error:
import os
from pathlib import Path

import numpy as np
import pandas as pd
import tiledb


def from_tileDB2(p1, p2, sdir, pair):
    # Open the sparse array and read every date for prices between p1 and p2
    with tiledb.open(os.path.join(sdir, f"{pair}")) as A:
        data = A[p1:p2, :]
    # Build a DataFrame indexed by price from the returned coordinates and attribute
    df = pd.DataFrame({"price": np.array(data["coords"]["price"], dtype=np.float64),
                       "date": np.array(data["coords"]["date"], dtype='datetime64[ns]'),
                       "data": np.array(data["data"], dtype=np.float64)}).set_index("price")
    return df
# Source Dir
sdir = Path(r"Your_Path")
# Array Name
pair = "btcusdt2"
# Price Range
p1 = 5500
p2 = 7000
# Array Query
df = from_tileDB2(p1,p2,sdir,pair)
Traceback:
TileDBError Traceback (most recent call last)
<ipython-input-14-5cd2827a3f2c> in <module>
14 p2 = 7000
15
---> 16 df = from_tileDB2(p1,p2,sdir,pair)
<ipython-input-14-5cd2827a3f2c> in from_tileDB2(p1, p2, sdir, pair)
1 def from_tileDB2(p1,p2,sdir,pair):
2 with tiledb.open(os.path.join(sdir,f"{pair}")) as A:
----> 3 data = A[p1:p2,:]
4
5 df = pd.DataFrame({"price":np.array(data["coords"]["price"],dtype=np.float64),
tiledb/libtiledb.pyx in tiledb.libtiledb.SparseArrayImpl.__getitem__()
tiledb/libtiledb.pyx in tiledb.libtiledb.SparseArrayImpl.subarray()
tiledb/libtiledb.pyx in tiledb.libtiledb.SparseArrayImpl._read_sparse_subarray()
tiledb/libtiledb.pyx in tiledb.libtiledb.ReadQuery.__init__()
tiledb/libtiledb.pyx in tiledb.libtiledb.ReadQuery.__init__()
tiledb/libtiledb.pyx in tiledb.libtiledb._raise_ctx_err()
tiledb/libtiledb.pyx in tiledb.libtiledb._raise_tiledb_error()
TileDBError: Error: Internal TileDB uncaught exception; std::bad_alloc
Array configuration:
config = tiledb.Config()
config["sm.num_reader_threads"] = "8"
config["sm.num_writer_threads"] = "8"
config["sm.tile_cache_size"] = "10000000"
ctx = tiledb.Ctx(config)
dom = tiledb.Domain(
    # tiles = 1 cent increment
    tiledb.Dim(ctx=ctx, name="price", domain=(0, 9e12), tile=0.01, dtype=np.float64),
    # tiles = 1 day increment
    tiledb.Dim(ctx=ctx, name="date", domain=(0, 9e21), tile=86.4e12, dtype=np.float64))
schema = tiledb.ArraySchema(domain=dom, sparse=True,
                            attrs=[tiledb.Attr(name="data", dtype=np.float64, ctx=ctx)],
                            cell_order="row-major", tile_order="row-major",
                            capacity=int(1e9), ctx=ctx)
Leaving the ctx at its default with:
ctx = tiledb.Ctx()
produces the same error, by the way.
Would appreciate any help with this!