How can we tune TileDB S3 bandwidth utilization for 3D dense array read operations?

Hello!

I’m trying to use TileDB to store 3D image data in S3-compatible storage and access it from Python with the tiledb module. The data is a 3D uint8 dense single-attribute array of approximately 4000^3 cells. I’ve tried both gzip compression and no compression, storing the data in tiles of 250^3. In every configuration I’ve checked so far, read operations show a strange pattern in network bandwidth utilization: roughly the maximum available bandwidth is used to download data for a few seconds, then the transfer halts for approximately the same amount of time (see the picture below). So far I’ve tried changing all of the relevant configuration parameters with no performance gain.

I’ve tried scaling the VM up to 80 cores and manually overriding the concurrency levels for both io_ and compute_ tasks, with values both above and below the actual core count - no gain. There was also no gain from changing memory limits and tile cache sizes (by tens of times relative to the default values).
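
For reference, the concurrency overrides were along these lines (a rough sketch - I varied the values between runs, and I assume sm.io_concurrency_level and sm.compute_concurrency_level are the relevant parameters):

    import tiledb

    config = tiledb.Config({
        "sm.io_concurrency_level": "80",         # tried values above and below the core count
        "sm.compute_concurrency_level": "80",
    })
    ctx = tiledb.Ctx(config)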

This behavior does not seem to be entirely tied to my S3 storage provider, since similar behavior is observed with a locally deployed MinIO server (with better bandwidth limits, but still no continuous network I/O). Also, judging from both the CPU utilization and network bandwidth plots, the gaps are only partially filled with intense CPU activity (~2 seconds at ~50% of the available cores, then only ~1 CPU core is used), as if some malloc operations were taking place.
I’ve tried the same operation with the zarr module and it showed ~1.5x better performance without this uneven bandwidth utilization, so my guess is that I’m simply not configuring TileDB properly (if TileDB used the full available network bandwidth continuously, it would presumably be even faster).
It seems that I’m missing some really obvious setting or feature. Could you please advise me on the proper settings for this case to gain maximum performance?

Bandwidth pattern (single read operation performed):
[image: Tiledb_bandwidth_test]

Reading code:

    with tiledb.DenseArray(filename, mode="r", ctx=ctx) as A:
        res = []
        for dataset in datasets:               # a single dataset is present in the ongoing tests
            res.append(A[sx, sy, sz][dataset])  # sx, sy, sz are Python slice objects
    

Thanks for your support in advance!

Best regards,
Lisitsin Dmitriy

@DmitryLisitsin Thanks for posting. The network and CPU usage should be more consistent than this, so some tweaks to the setup are likely needed.

A few questions to start:

  1. What version of TileDB-Py are you using?
  2. How large is the array? You said “4000^3 size”, but is this the cell count or the number of bytes of the whole array? If bytes, is that with or without compression?
  3. How large is the data you are slicing in terms of bytes?
  4. What do the slices sx, sy and sz look like? What is the shape of the query being run?
  5. Can you share the schema of the array so we can see the tile extents for each dimension?
  6. Are you setting py.init_buffer_bytes on the config/context?

For the last question: if you are not overriding the Python initial buffer size, it defaults to 10MB buffers per attribute/dimension. This might be the malloc activity you are seeing - if the data accessed is significantly larger, you have to wait for Python to repeatedly resize the buffers to fit more data. It also affects the number of incomplete TileDB queries, since the user buffers are one of the factors limiting how much data TileDB attempts to read from S3 in one pass. Your attempts at changing the memory config parameters were good, but if the Python buffers are at the default 10MB, that is likely your limitation. In this case please try setting py.init_buffer_bytes to something large enough to fit your expected data. If you are unsure of the size of the slice, you can also use the new result estimation support in Python.
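
As a rough sketch (the buffer size is only an example; it should be at least as large as the expected result bytes per attribute):

    config = tiledb.Config({"py.init_buffer_bytes": 12 * 1024**3})  # example: ~12 GiB initial buffers
    ctx = tiledb.Ctx(config)
    with tiledb.DenseArray(filename, mode="r", ctx=ctx) as A:
        data = A[sx, sy, sz]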

Additionally can you please dump the TileDB Statistics and upload them here for us to check?

    tiledb.stats_enable()
    with tiledb.DenseArray(filename, mode="r", ctx=ctx) as A:
        res = []
        for dataset in datasets:               # single dataset present in the ongoing tests
            res.append(A[sx, sy, sz][dataset])
    tiledb.stats_dump(verbose=True)

Hi, Seth!

Thank you for your reply!

Here are the answers to your questions:

  1. I’m using Python API with:
>>> import tiledb
>>> tiledb.__version__
'0.8.9'
  2. Sorry for the trouble with pasting the text. The data size is ~4000 * 4000 * 4000 cells (I was trying to paste 4000**3). To be exact, it is [4712, 4000, 4000] data in C layout. The current test data is a uint8 grayscale 3D image, but we are also interested in the scenario of using double data (with multiple attributes, at least 3).

  3. The slice on the plot is 1/2 of the dataset along each dimension (so 1/8 of the dataset). With compression level 4 it takes ~37 GB; the uncompressed raw data is ~70.2 GB.

  4. The slices are [slice(2356, 4712, None), slice(2000, 4000, None), slice(2000, 4000, None)] - native Python slice objects.

  5. Sure. Here is the schema code (cleaned up a little by removing some input parameters and logic):

    shape = [4712, 4000, 4000]
    tx, ty, tz = [250, 250, 250]   # tile extents per dimension
    dtype = np.uint8

    dom = tiledb.Domain(
        tiledb.Dim(name="z", domain=(0, shape[0]-1), tile=tx, dtype=np.int32),
        tiledb.Dim(name="y", domain=(0, shape[1]-1), tile=ty, dtype=np.int32),
        tiledb.Dim(name="x", domain=(0, shape[2]-1), tile=tz, dtype=np.int32)
    )

    # single uint8 attribute, optionally gzip-compressed
    if compression is None:
        attrs = [tiledb.Attr(name="C0", dtype=dtype)]
    else:
        filter_list = tiledb.FilterList([tiledb.GzipFilter(level=compression)])
        attrs = [tiledb.Attr(name="C0", dtype=dtype, filters=filter_list)]

    schema = tiledb.ArraySchema(
        domain=dom, sparse=False,
        attrs=attrs
    )
    schema.check()

    tiledb.DenseArray.create(filename, schema, ctx=ctx)


I’ve tried multiple combinations of input parameters. The results are the same as with the default config plus the S3 endpoint / access key / secret key and “vfs.s3.use_virtual_addressing” = “false” defined.

  6. No, I haven’t tried changing this parameter yet. I only found out that this option exists from the forum suggestions while writing this post. I’m trying to vary this parameter right now. So I’m guessing ~40 GB of init_buffer_bytes should be enough? Dumping the statistics right now. I also tried enabling vfs.s3.logging_level and got ~1.5 TB of happy reading logs (I couldn’t find out how to stop binary data from being logged).

Thank you. I’ll post an update in ~15 minutes or so!

Again: Thanks for your assistance!

Hi, @seth!

Here are the dumps you asked about (I updated tiledb via pip prior to launching these tests):

For the 8-core VM:

INFO:root:strat reading dataset fraction
INFO:root:stopped reading dataset fraction
TileDB Embedded Version: (2, 3, 0)
TileDB-Py Version: 0.9.0

[
  {
    "timers": {
      "Context.StorageManager.read_load_frag_meta.sum": 0.216756,
      "Context.StorageManager.read_load_frag_meta.avg": 0.216756,
      "Context.StorageManager.read_load_consolidated_frag_meta.sum": 9.32e-07,
      "Context.StorageManager.read_load_consolidated_frag_meta.avg": 9.32e-07,
      "Context.StorageManager.read_load_array_schema.sum": 0.0485699,
      "Context.StorageManager.read_load_array_schema.avg": 0.0485699,
      "Context.StorageManager.read_get_fragment_uris.sum": 0.0337318,
      "Context.StorageManager.read_get_fragment_uris.avg": 0.0337318,
      "Context.StorageManager.read_array_open_without_fragments.sum": 0.0485959,
      "Context.StorageManager.read_array_open_without_fragments.avg": 0.0485959,
      "Context.StorageManager.read_array_open.sum": 0.265397,
      "Context.StorageManager.read_array_open.avg": 0.265397,
      "Context.StorageManager.Query.Reader.unfilter_attr_tiles.sum": 10.1745,
      "Context.StorageManager.Query.Reader.unfilter_attr_tiles.avg": 3.39149,
      "Context.StorageManager.Query.Reader.read.sum": 63.3175,
      "Context.StorageManager.Query.Reader.read.avg": 21.1058,
      "Context.StorageManager.Query.Reader.init_state.sum": 2.4262e-05,
      "Context.StorageManager.Query.Reader.init_state.avg": 2.4262e-05,
      "Context.StorageManager.Query.Reader.copy_fixed_attr_values.sum": 1.87442,
      "Context.StorageManager.Query.Reader.copy_fixed_attr_values.avg": 0.624807,
      "Context.StorageManager.Query.Reader.copy_attr_values.sum": 46.6636,
      "Context.StorageManager.Query.Reader.copy_attr_values.avg": 15.5545,
      "Context.StorageManager.Query.Reader.compute_sparse_result_tiles.sum": 7.219e-06,
      "Context.StorageManager.Query.Reader.compute_sparse_result_tiles.avg": 2.40633e-06,
      "Context.StorageManager.Query.Reader.compute_sparse_result_cell_slabs_dense.sum": 16.5337,
      "Context.StorageManager.Query.Reader.compute_sparse_result_cell_slabs_dense.avg": 5.51122,
      "Context.StorageManager.Query.Reader.compute_result_coords.sum": 2.0073e-05,
      "Context.StorageManager.Query.Reader.compute_result_coords.avg": 6.691e-06,
      "Context.StorageManager.Query.Reader.attr_tiles.sum": 32.9405,
      "Context.StorageManager.Query.Reader.attr_tiles.avg": 10.9802,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.read_next_partition.sum": 0.00643203,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.read_next_partition.avg": 0.00214401,
      "Context.StorageManager.Query.Reader.Subarray.read_load_relevant_rtrees.sum": 0.118545,
      "Context.StorageManager.Query.Reader.Subarray.read_load_relevant_rtrees.avg": 0.023709,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_tile_overlap.sum": 0.12307,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_tile_overlap.avg": 0.0205117,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_tile_coords.sum": 0.000301797,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_tile_coords.avg": 0.000100599,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_relevant_tile_overlap.sum": 0.002389,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_relevant_tile_overlap.avg": 0.0004778,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_relevant_frags.sum": 0.00189056,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_relevant_frags.avg": 0.000378112,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_est_result_size.sum": 0.126518,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_est_result_size.avg": 0.0140576
    },
    "counters": {
      "Context.StorageManager.read_unfiltered_byte_num": 20875,
      "Context.StorageManager.read_tile_offsets_size": 20560,
      "Context.StorageManager.read_rtree_size": 80,
      "Context.StorageManager.read_frag_meta_size": 6346,
      "Context.StorageManager.read_array_schema_size": 235,
      "Context.StorageManager.VFS.read_ops_num": 582,
      "Context.StorageManager.VFS.read_byte_num": 5850939393,
      "Context.StorageManager.Query.Reader.result_num": 9424000000,
      "Context.StorageManager.Query.Reader.read_unfiltered_byte_num": 12000000000,
      "Context.StorageManager.Query.Reader.overlap_tile_num": 768,
      "Context.StorageManager.Query.Reader.loop_num": 3,
      "Context.StorageManager.Query.Reader.cell_num": 12000000000,
      "Context.StorageManager.Query.Reader.attr_fixed_num": 3,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.compute_current_start_end.not_found": 1,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.compute_current_start_end.fixed_result_size_overflow": 1,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.tile_overlap_cache_hit": 1,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.tile_overlap_byte_size": 17104,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.relevant_fragment_num": 28,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.ranges_requested": 5,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.ranges_computed": 5,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.fragment_num": 95
    }
  }
]

==== Python Stats ====

* Total TileDB query time: 70.3346
  > TileDB Core initial query submit time: 17.3238
  > TileDB Core incomplete retry time: 52.8898
  > TileDB-Py buffer update time: 6.89599
  > TileDB-Py retry count: 2

And for the 80-core VM (the cloud provider scales network bandwidth with the number of vCPUs):

INFO:root:strat reading dataset fraction
INFO:root:stopped reading dataset fraction
TileDB Embedded Version: (2, 3, 0)
TileDB-Py Version: 0.9.0

[
  {
    "timers": {
      "Context.StorageManager.read_load_frag_meta.sum": 0.115803,
      "Context.StorageManager.read_load_frag_meta.avg": 0.115803,
      "Context.StorageManager.read_load_consolidated_frag_meta.sum": 1.101e-06,
      "Context.StorageManager.read_load_consolidated_frag_meta.avg": 1.101e-06,
      "Context.StorageManager.read_load_array_schema.sum": 0.113168,
      "Context.StorageManager.read_load_array_schema.avg": 0.113168,
      "Context.StorageManager.read_get_fragment_uris.sum": 0.0629837,
      "Context.StorageManager.read_get_fragment_uris.avg": 0.0629837,
      "Context.StorageManager.read_array_open_without_fragments.sum": 0.113282,
      "Context.StorageManager.read_array_open_without_fragments.avg": 0.113282,
      "Context.StorageManager.read_array_open.sum": 0.229115,
      "Context.StorageManager.read_array_open.avg": 0.229115,
      "Context.StorageManager.Query.Reader.unfilter_attr_tiles.sum": 4.63834,
      "Context.StorageManager.Query.Reader.unfilter_attr_tiles.avg": 1.54611,
      "Context.StorageManager.Query.Reader.read.sum": 46.7612,
      "Context.StorageManager.Query.Reader.read.avg": 15.5871,
      "Context.StorageManager.Query.Reader.init_state.sum": 2.218e-05,
      "Context.StorageManager.Query.Reader.init_state.avg": 2.218e-05,
      "Context.StorageManager.Query.Reader.copy_fixed_attr_values.sum": 3.68975,
      "Context.StorageManager.Query.Reader.copy_fixed_attr_values.avg": 1.22992,
      "Context.StorageManager.Query.Reader.copy_attr_values.sum": 30.6175,
      "Context.StorageManager.Query.Reader.copy_attr_values.avg": 10.2058,
      "Context.StorageManager.Query.Reader.compute_sparse_result_tiles.sum": 1.0685e-05,
      "Context.StorageManager.Query.Reader.compute_sparse_result_tiles.avg": 3.56167e-06,
      "Context.StorageManager.Query.Reader.compute_sparse_result_cell_slabs_dense.sum": 15.9386,
      "Context.StorageManager.Query.Reader.compute_sparse_result_cell_slabs_dense.avg": 5.31287,
      "Context.StorageManager.Query.Reader.compute_result_coords.sum": 3.9092e-05,
      "Context.StorageManager.Query.Reader.compute_result_coords.avg": 1.30307e-05,
      "Context.StorageManager.Query.Reader.attr_tiles.sum": 15.8594,
      "Context.StorageManager.Query.Reader.attr_tiles.avg": 5.28648,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.read_next_partition.sum": 0.0408849,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.read_next_partition.avg": 0.0136283,
      "Context.StorageManager.Query.Reader.Subarray.read_load_relevant_rtrees.sum": 0.0820199,
      "Context.StorageManager.Query.Reader.Subarray.read_load_relevant_rtrees.avg": 0.016404,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_tile_overlap.sum": 0.126213,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_tile_overlap.avg": 0.0210356,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_tile_coords.sum": 0.000363162,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_tile_coords.avg": 0.000121054,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_relevant_tile_overlap.sum": 0.0366353,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_relevant_tile_overlap.avg": 0.00732705,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_relevant_frags.sum": 0.00718433,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_relevant_frags.avg": 0.00143687,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_est_result_size.sum": 0.13387,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_est_result_size.avg": 0.0148744
    },
    "counters": {
      "Context.StorageManager.read_unfiltered_byte_num": 20875,
      "Context.StorageManager.read_tile_offsets_size": 20560,
      "Context.StorageManager.read_rtree_size": 80,
      "Context.StorageManager.read_frag_meta_size": 6346,
      "Context.StorageManager.read_array_schema_size": 235,
      "Context.StorageManager.VFS.read_ops_num": 582,
      "Context.StorageManager.VFS.read_byte_num": 5850939393,
      "Context.StorageManager.Query.Reader.result_num": 9424000000,
      "Context.StorageManager.Query.Reader.read_unfiltered_byte_num": 12000000000,
      "Context.StorageManager.Query.Reader.overlap_tile_num": 768,
      "Context.StorageManager.Query.Reader.loop_num": 3,
      "Context.StorageManager.Query.Reader.cell_num": 12000000000,
      "Context.StorageManager.Query.Reader.attr_fixed_num": 3,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.compute_current_start_end.not_found": 1,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.compute_current_start_end.fixed_result_size_overflow": 1,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.tile_overlap_cache_hit": 1,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.tile_overlap_byte_size": 17104,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.relevant_fragment_num": 28,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.ranges_requested": 5,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.ranges_computed": 5,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.fragment_num": 95
    }
  }
]

==== Python Stats ====

* Total TileDB query time: 57.9716
  > TileDB Core initial query submit time: 12.3788
  > TileDB Core incomplete retry time: 45.4965
  > TileDB-Py buffer update time: 11.1139
  > TileDB-Py retry count: 2

I’m afraid I didn’t manage to set py.init_buffer_bytes properly (I tried setting it to 2000**3, yet the memory footprint over time, the bandwidth utilization, and hence the running time remain the same). I’m passing a context explicitly to the tiledb routines, so I tried to build a context with the proper “py.init_buffer_bytes” value (based on https://tiledb-inc-tiledb.readthedocs-hosted.com/projects/tiledb-py/en/stable/python-api.html#tiledb.Config) like this:

def init_s3_ctx():
    config = tiledb.Config({'py.init_buffer_bytes': 2000**3 })

    creds = get_credentials(os.path.expanduser("~/.aws/credentials"))

    config["vfs.s3.scheme"] = "https"
    config["vfs.s3.aws_access_key_id"] = creds['aws_access_key_id']
    config["vfs.s3.aws_secret_access_key"] = creds['aws_secret_access_key']
    config["vfs.s3.region"] = creds["region_name"]
    config["vfs.s3.endpoint_override"] = creds["endpoint_url"]
    config["vfs.s3.use_virtual_addressing"] = "false"

    #config["py.init_buffer_bytes"] = 2000**3           # tried as well, no effect :(

    ctx = tiledb.Ctx(config)
    return ctx

Could you please tell me what I’m doing wrong? Should I try using the default context instead?

Still no effect (same execution time and bandwidth utilization). For the dataset with gzip compression level 4:
time to read subset [slice(2356, 4712, None), slice(2000, 4000, None), slice(2000, 4000, None)] = 58.26068878173828

Hi @DmitryLisitsin, thanks a lot for posting all this detailed information. Please note that we are working on various optimizations on that code path (dense reads), but here are a couple of hopefully useful comments.

First, let me explain what’s happening here, just by reading the stats:

  • The query does not fit in your allocated buffers (2000 ** 3 = 8 GB), because your result count is 9424000000 cells (~9.4 GB for uint8).
  • This query completes in 3 separate submissions (the incomplete-query feature), since the retry count is 2. This is not because of your buffer size, but because of the parameter sm.memory_budget (default 5GB), which dictates approximately how much internal memory TileDB should use. Your query decompresses about 12GB in total (12000000000 cells).
  • The query reallocates the buffer once and copies over ~7GB in the last submission, as your initial buffers do not fit the entire result.
  • You see 3 spikes in bandwidth because there are 3 query submissions with high IO, interrupted by some high CPU for copying the results to your buffers, then some pauses in between the submissions for reallocating your initial buffers.
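
To make the 3 submissions concrete, here is a rough back-of-the-envelope check using the numbers from your stats dump (a sketch only - the budget value is the approximate 5GB default):

    cells = 12_000_000_000              # Query.Reader.cell_num: ~12 GB of uncompressed uint8 cells
    budget = 5 * 1024**3                # approximate default sm.memory_budget (~5 GB)
    submissions = -(-cells // budget)   # ceiling division
    print(submissions)                  # 3 -> one initial submit plus 2 retries (retry count: 2)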

How to fix this:

  • Allocate more than 2000**3 bytes for py.init_buffer_bytes, perhaps something like 12GB (@ihnorton will follow up if anything else is needed here for the buffer allocation).
  • Set sm.memory_budget to something like 15GB (and experiment with it to see if anything changes there); a config sketch follows below.
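
Concretely, the settings could look something like this (a sketch - the sizes are illustrative and should be tuned to your data):

    config = tiledb.Config({
        "py.init_buffer_bytes": 12 * 1024**3,   # large enough to hold the full result (~9.4 GB here)
        "sm.memory_budget": 15 * 1024**3,       # internal TileDB memory budget
    })
    ctx = tiledb.Ctx(config)
    with tiledb.DenseArray(filename, mode="r", ctx=ctx) as A:
        res = A[sx, sy, sz]["C0"]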

Once you do the above, you should see “> TileDB-Py retry count:” become zero, and hopefully “> TileDB-Py buffer update time:” drop significantly. We will need to investigate if the former goes to 0 but the latter is still high.

I hope the above helps. You’ll see some more optimizations on this path very soon.

Thanks,
Stavros

Hi @stavros, thanks for your reply!

Sorry for my late response - somehow I managed to miss your reply, and the forum notification got stuck in my spam folder :frowning:

Also, I can’t explain why I thought the slice 2356:4712 was smaller than 2000 (I set the size to 2000 ** 3 thinking it was slightly larger than the slice size - facepalm). But later I tried setting it up to 20GB and still didn’t see any significant improvement (within the spread of the benchmark durations).

Indeed, I hadn’t tried setting both py.init_buffer_bytes and sm.memory_budget high enough. I also hadn’t noticed before that setting sm.memory_budget to a sufficiently large value (80GB) does decrease the fragmentation of the bandwidth utilization. Yet there are still 2 fragments in the bandwidth plot (“> TileDB-Py retry count: 1” in the logs presented below). And as far as I can tell so far, there is no noticeable time improvement from changing py.init_buffer_bytes to 12/20/50 GB. I’ve checked both the compressed (gzip level 4) and the uncompressed dataset in S3-compatible storage. The CPU consumption in between the bandwidth utilization fragments is still the same: half of the gap with high CPU consumption (~50% of the available CPU resources) and the other half with ~100% CPU in top (1 of 80 cores). As far as I understand, the first part is decompression and the latter is some memory allocation. The sm.memory_budget increase gives at most a ~10% performance gain (~5s out of ~50s).

I’m not sure how the py.init_buffer_bytes value should be set. Could it be that the updated value is only used if I set it when constructing the config with:

config = tiledb.Config({'py.init_buffer_bytes': '50000000000' })

Is it possible to set this parameter like any other configuration parameter (by item assignment on an existing Config)? I tried both approaches, yet saw no observable effect.
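
For reference, the two forms I tried look roughly like this (a sketch; the value is shown as a string here, and I also tried plain integers):

    # 1) passing the parameter when constructing the Config
    config = tiledb.Config({'py.init_buffer_bytes': '50000000000'})

    # 2) item assignment on an existing Config, like the vfs.s3.* parameters
    config = tiledb.Config()
    config['py.init_buffer_bytes'] = '50000000000'

    ctx = tiledb.Ctx(config)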

So I haven’t managed to get “> TileDB-Py retry count: 0” yet. I’ll run a few more experiments tomorrow.

Could it be worth changing other configuration parameters from their default values (for example sm.tile_cache_size or sm.sub_partitioner_memory_budget)? I tried, with no gain so far…

Is it theoretically possible to completely overlap the CPU usage for decompression and reordering with the network transfer in current versions? Is it worth reproducing this test scenario with another API (say, the C API)?

By the way, I’m not sure whether it is a typo in the docs or just not implemented in the Python API: in the Configuration Parameters documentation there are a few default values given with a “GB” literal suffix, yet that form didn’t work for me (for example for sm.memory_budget). In my tests I’ve already stumbled a few times over strange results caused by a typo in a configuration parameter keyword (since unknown parameters are silently ignored). I hope all this ruckus around this topic is not down to some silly bug of mine :frowning:.
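
One way to guard against such typos is to read the value back right after setting it (a minimal sketch - this only confirms what the Config object holds, not how the core interprets it):

    config = tiledb.Config()
    config["sm.memory_budget"] = str(90 * 1024**3)   # the "90GB" literal form did not work for me
    # read the value back to make sure the key was spelled correctly and the value was stored
    print(config["sm.memory_budget"])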

Here are the statistics dumps:
with sm.memory_budget = 90 GB:

INFO:root:strat reading dataset fraction
INFO:root:stopped reading dataset fraction
TileDB Embedded Version: (2, 3, 0)
TileDB-Py Version: 0.9.0

[
  {
    "timers": {
      "Context.StorageManager.read_load_frag_meta.sum": 0.0901656,
      "Context.StorageManager.read_load_frag_meta.avg": 0.0901656,
      "Context.StorageManager.read_load_consolidated_frag_meta.sum": 8.67e-07,
      "Context.StorageManager.read_load_consolidated_frag_meta.avg": 8.67e-07,
      "Context.StorageManager.read_load_array_schema.sum": 0.0434217,
      "Context.StorageManager.read_load_array_schema.avg": 0.0434217,
      "Context.StorageManager.read_get_fragment_uris.sum": 0.0364878,
      "Context.StorageManager.read_get_fragment_uris.avg": 0.0364878,
      "Context.StorageManager.read_array_open_without_fragments.sum": 0.0434408,
      "Context.StorageManager.read_array_open_without_fragments.avg": 0.0434408,
      "Context.StorageManager.read_array_open.sum": 0.133638,
      "Context.StorageManager.read_array_open.avg": 0.133638,
      "Context.StorageManager.Query.Reader.unfilter_attr_tiles.sum": 4.30799,
      "Context.StorageManager.Query.Reader.unfilter_attr_tiles.avg": 2.154,
      "Context.StorageManager.Query.Reader.read.sum": 42.2883,
      "Context.StorageManager.Query.Reader.read.avg": 21.1441,
      "Context.StorageManager.Query.Reader.init_state.sum": 2.3553e-05,
      "Context.StorageManager.Query.Reader.init_state.avg": 2.3553e-05,
      "Context.StorageManager.Query.Reader.copy_fixed_attr_values.sum": 2.54162,
      "Context.StorageManager.Query.Reader.copy_fixed_attr_values.avg": 1.27081,
      "Context.StorageManager.Query.Reader.copy_attr_values.sum": 26.8776,
      "Context.StorageManager.Query.Reader.copy_attr_values.avg": 13.4388,
      "Context.StorageManager.Query.Reader.compute_sparse_result_tiles.sum": 8.724e-06,
      "Context.StorageManager.Query.Reader.compute_sparse_result_tiles.avg": 4.362e-06,
      "Context.StorageManager.Query.Reader.compute_sparse_result_cell_slabs_dense.sum": 15.2822,
      "Context.StorageManager.Query.Reader.compute_sparse_result_cell_slabs_dense.avg": 7.64112,
      "Context.StorageManager.Query.Reader.compute_result_coords.sum": 4.0085e-05,
      "Context.StorageManager.Query.Reader.compute_result_coords.avg": 2.00425e-05,
      "Context.StorageManager.Query.Reader.attr_tiles.sum": 13.5855,
      "Context.StorageManager.Query.Reader.attr_tiles.avg": 6.79274,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.read_next_partition.sum": 0.0230381,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.read_next_partition.avg": 0.0115191,
      "Context.StorageManager.Query.Reader.Subarray.read_load_relevant_rtrees.sum": 0.0462501,
      "Context.StorageManager.Query.Reader.Subarray.read_load_relevant_rtrees.avg": 0.0154167,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_tile_overlap.sum": 0.0755613,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_tile_overlap.avg": 0.0188903,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_tile_coords.sum": 0.000398737,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_tile_coords.avg": 0.000199368,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_relevant_tile_overlap.sum": 0.024966,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_relevant_tile_overlap.avg": 0.00832199,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_relevant_frags.sum": 0.00408449,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_relevant_frags.avg": 0.0013615,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_est_result_size.sum": 0.0804848,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_est_result_size.avg": 0.016097
    },
    "counters": {
      "Context.StorageManager.read_unfiltered_byte_num": 20875,
      "Context.StorageManager.read_tile_offsets_size": 20560,
      "Context.StorageManager.read_rtree_size": 80,
      "Context.StorageManager.read_frag_meta_size": 6346,
      "Context.StorageManager.read_array_schema_size": 235,
      "Context.StorageManager.VFS.read_ops_num": 537,
      "Context.StorageManager.VFS.read_byte_num": 5327029130,
      "Context.StorageManager.Query.Reader.result_num": 9424000000,
      "Context.StorageManager.Query.Reader.read_unfiltered_byte_num": 11000000000,
      "Context.StorageManager.Query.Reader.overlap_tile_num": 704,
      "Context.StorageManager.Query.Reader.loop_num": 2,
      "Context.StorageManager.Query.Reader.cell_num": 11000000000,
      "Context.StorageManager.Query.Reader.attr_fixed_num": 2,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.compute_current_start_end.not_found": 1,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.compute_current_start_end.fixed_result_size_overflow": 1,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.tile_overlap_cache_hit": 1,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.tile_overlap_byte_size": 10800,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.relevant_fragment_num": 21,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.ranges_requested": 3,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.ranges_computed": 3,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.fragment_num": 57
    }
  }
]

==== Python Stats ====

* Total TileDB query time: 48.496
  > TileDB Core initial query submit time: 20.9982
  > TileDB Core incomplete retry time: 27.4385
  > TileDB-Py buffer update time: 6.14815
  > TileDB-Py retry count: 1

with sm.memory_budget = 90 GB, py.init_buffer_bytes = 40GB

INFO:root:strat reading dataset fraction
INFO:root:stopped reading dataset fraction
TileDB Embedded Version: (2, 3, 0)
TileDB-Py Version: 0.9.0

[
  {
    "timers": {
      "Context.StorageManager.read_load_frag_meta.sum": 0.0904538,
      "Context.StorageManager.read_load_frag_meta.avg": 0.0904538,
      "Context.StorageManager.read_load_consolidated_frag_meta.sum": 9.8e-07,
      "Context.StorageManager.read_load_consolidated_frag_meta.avg": 9.8e-07,
      "Context.StorageManager.read_load_array_schema.sum": 0.0324673,
      "Context.StorageManager.read_load_array_schema.avg": 0.0324673,
      "Context.StorageManager.read_get_fragment_uris.sum": 0.0323795,
      "Context.StorageManager.read_get_fragment_uris.avg": 0.0323795,
      "Context.StorageManager.read_array_open_without_fragments.sum": 0.0324861,
      "Context.StorageManager.read_array_open_without_fragments.avg": 0.0324861,
      "Context.StorageManager.read_array_open.sum": 0.123026,
      "Context.StorageManager.read_array_open.avg": 0.123026,
      "Context.StorageManager.Query.Reader.unfilter_attr_tiles.sum": 3.23723,
      "Context.StorageManager.Query.Reader.unfilter_attr_tiles.avg": 1.61861,
      "Context.StorageManager.Query.Reader.read.sum": 40.9202,
      "Context.StorageManager.Query.Reader.read.avg": 20.4601,
      "Context.StorageManager.Query.Reader.init_state.sum": 3.5258e-05,
      "Context.StorageManager.Query.Reader.init_state.avg": 3.5258e-05,
      "Context.StorageManager.Query.Reader.copy_fixed_attr_values.sum": 1.56068,
      "Context.StorageManager.Query.Reader.copy_fixed_attr_values.avg": 0.780338,
      "Context.StorageManager.Query.Reader.copy_attr_values.sum": 25.1186,
      "Context.StorageManager.Query.Reader.copy_attr_values.avg": 12.5593,
      "Context.StorageManager.Query.Reader.compute_sparse_result_tiles.sum": 6.746e-06,
      "Context.StorageManager.Query.Reader.compute_sparse_result_tiles.avg": 3.373e-06,
      "Context.StorageManager.Query.Reader.compute_sparse_result_cell_slabs_dense.sum": 15.6888,
      "Context.StorageManager.Query.Reader.compute_sparse_result_cell_slabs_dense.avg": 7.84439,
      "Context.StorageManager.Query.Reader.compute_result_coords.sum": 3.0513e-05,
      "Context.StorageManager.Query.Reader.compute_result_coords.avg": 1.52565e-05,
      "Context.StorageManager.Query.Reader.attr_tiles.sum": 14.1831,
      "Context.StorageManager.Query.Reader.attr_tiles.avg": 7.09156,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.read_next_partition.sum": 0.0211147,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.read_next_partition.avg": 0.0105574,
      "Context.StorageManager.Query.Reader.Subarray.read_load_relevant_rtrees.sum": 0.0657873,
      "Context.StorageManager.Query.Reader.Subarray.read_load_relevant_rtrees.avg": 0.0219291,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_tile_overlap.sum": 0.0942775,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_tile_overlap.avg": 0.0235694,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_tile_coords.sum": 0.000285052,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_tile_coords.avg": 0.000142526,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_relevant_tile_overlap.sum": 0.0248198,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_relevant_tile_overlap.avg": 0.00827326,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_relevant_frags.sum": 0.00347886,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_relevant_frags.avg": 0.00115962,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_est_result_size.sum": 0.0986528,
      "Context.StorageManager.Query.Reader.Subarray.read_compute_est_result_size.avg": 0.0197306
    },
    "counters": {
      "Context.StorageManager.read_unfiltered_byte_num": 20875,
      "Context.StorageManager.read_tile_offsets_size": 20560,
      "Context.StorageManager.read_rtree_size": 80,
      "Context.StorageManager.read_frag_meta_size": 6346,
      "Context.StorageManager.read_array_schema_size": 235,
      "Context.StorageManager.VFS.read_ops_num": 537,
      "Context.StorageManager.VFS.read_byte_num": 5327029130,
      "Context.StorageManager.Query.Reader.result_num": 9424000000,
      "Context.StorageManager.Query.Reader.read_unfiltered_byte_num": 11000000000,
      "Context.StorageManager.Query.Reader.overlap_tile_num": 704,
      "Context.StorageManager.Query.Reader.loop_num": 2,
      "Context.StorageManager.Query.Reader.cell_num": 11000000000,
      "Context.StorageManager.Query.Reader.attr_fixed_num": 2,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.compute_current_start_end.not_found": 1,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.compute_current_start_end.fixed_result_size_overflow": 1,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.tile_overlap_cache_hit": 1,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.tile_overlap_byte_size": 10800,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.relevant_fragment_num": 21,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.ranges_requested": 3,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.ranges_computed": 3,
      "Context.StorageManager.Query.Reader.Subarray.precompute_tile_overlap.fragment_num": 57
    }
  }
]

==== Python Stats ====

* Total TileDB query time: 47.1339
  > TileDB Core initial query submit time: 20.818
  > TileDB Core incomplete retry time: 26.2365
  > TileDB-Py buffer update time: 6.13405
  > TileDB-Py retry count: 1

Thanks for your assistance in advance!
Dmitriy

Hi @DmitryLisitsin,

We’ve released an update to TileDB-Py with a bug fix to ensure that the py.init_buffer_bytes parameter is used correctly for dense reads. As a point of reference, I’ve included a small test script below, for which I get the following output from the stats_dump (running both locally and against S3):

"Context.StorageManager.Query.Reader.loop_num": 1,

This confirms TileDB is running only a single query execution as desired.

To run the code, e.g. saved as ex355.py (writing and reading respectively):

python ex355.py create <uri>
python ex355.py read <uri>

Code:

import tiledb
import numpy as np
import sys

# 10 GB of int64:
default_nelem = 10 * 1024**3 // 8

def create(uri, nelem=default_nelem):
    arr = np.arange(nelem).reshape(1024,1024,-1)
    tiledb.from_numpy(uri, arr, tile=(256,256,256))

def read(uri):
    cfg = tiledb.Config(
        {
            'py.init_buffer_bytes': 15 * 1024**3,
            'sm.memory_budget': 15*1024**3
        }
    )

    tiledb.stats_enable()

    with tiledb.scope_ctx(cfg):
        with tiledb.open(uri) as a:
            a[:]

    print("---\nread complete\n---")
    tiledb.stats_dump()

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Expected two arguments: <command: create|read> <uri>")
        sys.exit(1)

    cmd = sys.argv[1]
    uri = sys.argv[2]

    if cmd == "create":
        create(uri)
    elif cmd == "read":
        read(uri)
    else:
        raise Exception("Unexpected command: ", cmd)

Best,
Isaiah