TileDBError: Error: Internal TileDB uncaught exception; std::bad_alloc

Hi again,

I’m having trouble reading a particular price range.
My array covers prices from $0 to $70’000, but I cannot read the range from $5’500 to $7’000 in a single query. The only way I can read it is in smaller chunks ($250 increments) and then concatenate the results, which is not ideal because it is much slower.
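For reference, the chunked workaround looks roughly like this. It is a minimal, library-agnostic sketch: `read_fn` is a hypothetical callable standing in for one TileDB read (for example a wrapper around `from_tileDB2`), and it is assumed to return the rows for the half-open range [lo, hi).

```python
from typing import Callable, Iterator, List, Tuple


def chunk_bounds(p1: float, p2: float, step: float) -> Iterator[Tuple[float, float]]:
    """Yield consecutive (lo, hi) sub-ranges covering [p1, p2) in `step` increments."""
    lo = p1
    while lo < p2:
        hi = min(lo + step, p2)  # the last chunk may be shorter than `step`
        yield lo, hi
        lo = hi


def read_in_chunks(read_fn: Callable[[float, float], List],
                   p1: float, p2: float, step: float = 250) -> List:
    """Call the hypothetical read_fn once per sub-range and concatenate the results."""
    rows: List = []
    for lo, hi in chunk_bounds(p1, p2, step):
        rows.extend(read_fn(lo, hi))
    return rows
```

If `read_fn` returns pandas DataFrames instead of lists, collect them in a list and combine them with `pd.concat(frames)`. If each read is inclusive on both ends (as far as I know, TileDB-Py's `multi_index` ranges are), a point lying exactly on a chunk boundary would be returned twice and needs de-duplicating.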

Things I’ve tried:

  • Reading the range iteratively and saving it to a new array
  • Optimizing the new array's tile layout and tile capacity

Unfortunately, neither helped. The error persists, but only in this narrow range.

You can find the data here (~1.5 GB, 300 M data points).
Code that produces the error:

import os
import tiledb
import pandas as pd
from pathlib import Path
import numpy as np

def from_tileDB2(p1,p2,sdir,pair):
    with tiledb.open(os.path.join(sdir,f"{pair}")) as A:
        data = A[p1:p2,:]

    df = pd.DataFrame({"price":np.array(data["coords"]["price"],dtype=np.float64),
              "date":np.array(data["coords"]["date"],dtype='datetime64[ns]'),
              "data":np.array(data["data"],dtype=np.float64)}).set_index("price")
    return df

# Source Dir
sdir = Path(r"Your_Path")
# Array Name
pair = "btcusdt2"

# Price Range
p1 = 5500
p2 = 7000

# Array Query
df = from_tileDB2(p1,p2,sdir,pair)

Traceback:

TileDBError                               Traceback (most recent call last)
<ipython-input-14-5cd2827a3f2c> in <module>
     14 p2 = 7000
     15 
---> 16 df = from_tileDB2(p1,p2,sdir,pair)

<ipython-input-14-5cd2827a3f2c> in from_tileDB2(p1, p2, sdir, pair)
      1 def from_tileDB2(p1,p2,sdir,pair):
      2     with tiledb.open(os.path.join(sdir,f"{pair}")) as A:
----> 3         data = A[p1:p2,:]
      4 
      5     df = pd.DataFrame({"price":np.array(data["coords"]["price"],dtype=np.float64),

tiledb/libtiledb.pyx in tiledb.libtiledb.SparseArrayImpl.__getitem__()

tiledb/libtiledb.pyx in tiledb.libtiledb.SparseArrayImpl.subarray()

tiledb/libtiledb.pyx in tiledb.libtiledb.SparseArrayImpl._read_sparse_subarray()

tiledb/libtiledb.pyx in tiledb.libtiledb.ReadQuery.__init__()

tiledb/libtiledb.pyx in tiledb.libtiledb.ReadQuery.__init__()

tiledb/libtiledb.pyx in tiledb.libtiledb._raise_ctx_err()

tiledb/libtiledb.pyx in tiledb.libtiledb._raise_tiledb_error()

TileDBError: Error: Internal TileDB uncaught exception; std::bad_alloc

Array configuration:

config = tiledb.Config()
config["sm.num_reader_threads"] = "8"
config["sm.num_writer_threads"] = "8"
config["sm.tile_cache_size"] = "10000000"

ctx = tiledb.Ctx(config)

dom = tiledb.Domain(
    # tiles = 1 cent increment
    tiledb.Dim(ctx=ctx,name="price", domain=(0, 9e12), tile=0.01, dtype=np.float64),
    # tiles = 1 day increment
    tiledb.Dim(ctx=ctx,name="date", domain=(0, 9e21), tile=86.4e12, dtype=np.float64))
schema = tiledb.ArraySchema(domain=dom, sparse=True,
                            attrs=[tiledb.Attr(name="data", dtype=np.float64,ctx=ctx)],
                            cell_order="row-major",tile_order="row-major",
                            capacity=int(1e9),ctx=ctx)

Leaving the context at its defaults with:

ctx = tiledb.Ctx()

produces the same error, by the way.
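The extents in the schema comments can be double-checked with plain arithmetic. This sketch assumes `date` holds nanoseconds since the epoch (matching the `datetime64[ns]` cast in `from_tileDB2`); the variable names are illustrative, not TileDB settings. In a sparse array, `capacity` is the maximum number of cells per data tile.

```python
# Sanity-check the tile extents and capacity from the schema (pure arithmetic, no TileDB needed).
# Assumption: "date" stores nanoseconds since the epoch, matching the datetime64[ns] cast.
SECONDS_PER_DAY = 24 * 60 * 60   # 86_400 s
NS_PER_SECOND = 10**9

day_in_ns = SECONDS_PER_DAY * NS_PER_SECOND  # 86_400_000_000_000 ns, i.e. the 86.4e12 extent
price_tile = 0.01                            # "1 cent" extent on the price dimension
capacity = int(1e9)                          # max cells per sparse data tile

# Number of 1-cent price tiles spanned by the failing query (5500 to 7000).
# round() absorbs the floating-point error of dividing by 0.01.
price_tiles_in_query = round((7000 - 5500) / price_tile)

# Upper bound on one float64 attribute tile filled to capacity (8 bytes per cell).
max_attr_tile_bytes = capacity * 8

print(day_in_ns, price_tiles_in_query, max_attr_tile_bytes)
```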

I would appreciate any help with this! 🙂