TileDBError: [TileDB::S3] Error: Direct write failed!

Hi,

I have a simple 3D dense array, stored on S3 (Ceph), to which I have performed a few hundred writes. When I try consolidating I consistently get the following error:

In [79]:  tiledb.consolidate(temp_array_path, ctx=ctx)

---------------------------------------------------------------------------
TileDBError                               Traceback (most recent call last)
<ipython-input-77-4b2f57dbfd46> in <module>
----> 1 tiledb.consolidate(temp_array_path, ctx=ctx)

tiledb/libtiledb.pyx in tiledb.libtiledb.consolidate()
tiledb/libtiledb.pyx in tiledb.libtiledb._raise_ctx_err()
tiledb/libtiledb.pyx in tiledb.libtiledb._raise_tiledb_error()

TileDBError: [TileDB::S3] Error: Direct write failed! 20971520 bytes written to buffer, 49345273 bytes requested.
In [80]: tiledb.__version__
Out[80]: '0.6.6'

Any hints on how to figure out what’s going wrong?

Cheers,

Luca

Tried consolidating the same array with tiledb 0.8.4:

TileDBError: [TileDB::ConstBuffer] Error: Read buffer overflow

I have to revise that last statement. I see that tiledb 0.8.4 actually generates the same type of error.

I tried recreating the array and periodically consolidating as I insert data. Consolidation initially works, but at some point, after enough data has been inserted, the procedure breaks.
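To make the loop concrete, here is a rough sketch of what I mean; the array URI, slab shape, and dtype are placeholders rather than my actual schema:

import numpy as np
import tiledb

uri = "s3://bucket/my_array"  # placeholder URI for the 3D dense array on S3

for i in range(30):
    # Write one slab per iteration (shape and dtype are illustrative only).
    with tiledb.open(uri, "w") as A:
        A[i * 10:(i + 1) * 10, :, :] = np.random.rand(10, 100, 100).astype(np.float32)

    # Consolidate every few writes; this succeeds at first and eventually fails.
    if (i + 1) % 5 == 0:
        tiledb.consolidate(uri)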

Solved by enabling vfs.s3.use_multipart_upload!
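For reference, re-enabling multipart uploads from tiledb-py looks roughly like this (the array URI is a placeholder):

import tiledb

# Re-enable S3 multipart uploads before consolidating.
cfg = tiledb.Config()
cfg["vfs.s3.use_multipart_upload"] = "true"

ctx = tiledb.Ctx(cfg)
tiledb.consolidate("s3://bucket/my_array", ctx=ctx, config=cfg)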

Following some of the recent responses on array consolidation here, I tried consolidating my array too, and I get the same error as you.


TileDB version: (2, 2, 4)
consolidate

Called consolidate with the following context (parameter / value):
config.env_var_prefix ‘TILEDB_’
rest.http_compressor ‘any’
rest.retry_count ‘3’
rest.retry_delay_factor ‘1.25’
rest.retry_http_codes ‘503’
rest.retry_initial_delay_ms ‘500’
rest.server_address https://api.tiledb.com
rest.server_serialization_format ‘CAPNP’
sm.check_coord_dups ‘true’
sm.check_coord_oob ‘true’
sm.check_global_order ‘true’
sm.compute_concurrency_level ‘12’
sm.consolidation.amplification ‘1.0’
sm.consolidation.buffer_size ‘5000000’
sm.consolidation.mode ‘fragment_meta’
sm.consolidation.step_max_frags ‘4294967295’
sm.consolidation.step_min_frags ‘4294967295’
sm.consolidation.step_size_ratio ‘0.0’
sm.consolidation.steps ‘4294967295’
sm.dedup_coords ‘True’
sm.enable_signal_handlers ‘true’
sm.io_concurrency_level ‘12’
sm.memory_budget ‘5368709120’
sm.memory_budget_var ‘10737418240’
sm.num_tbb_threads ‘-1’
sm.skip_checksum_validation ‘false’
sm.sub_partitioner_memory_budget ‘0’
sm.tile_cache_size ‘10000000’
sm.vacuum.mode ‘fragments’
sm.var_offsets.bitsize ‘64’
sm.var_offsets.extra_element ‘false’
sm.var_offsets.mode ‘bytes’
vfs.azure.blob_endpoint ‘’
vfs.azure.block_list_block_size ‘5242880’
vfs.azure.max_parallel_ops ‘12’
vfs.azure.storage_account_key ‘’
vfs.azure.storage_account_name ‘’
vfs.azure.use_block_list_upload ‘true’
vfs.azure.use_https ‘true’
vfs.file.enable_filelocks ‘true’
vfs.file.max_parallel_ops ‘12’
vfs.file.posix_directory_permissions ‘755’
vfs.file.posix_file_permissions ‘644’
vfs.gcs.max_parallel_ops ‘12’
vfs.gcs.multi_part_size ‘5242880’
vfs.gcs.project_id ‘’
vfs.gcs.use_multi_part_upload ‘true’
vfs.hdfs.kerb_ticket_cache_path ‘’
vfs.hdfs.name_node_uri ‘’
vfs.hdfs.username ‘’
vfs.min_batch_gap ‘512000’
vfs.min_batch_size ‘20971520’
vfs.min_parallel_size ‘10485760’
vfs.read_ahead_cache_size ‘10485760’
vfs.read_ahead_size ‘102400’
vfs.s3.aws_access_key_id ‘xxxxx’
vfs.s3.aws_external_id ‘’
vfs.s3.aws_load_frequency ‘’
vfs.s3.aws_role_arn ‘’
vfs.s3.aws_secret_access_key ‘xxxxxx’
vfs.s3.aws_session_name ‘’
vfs.s3.aws_session_token ‘’
vfs.s3.ca_file ‘’
vfs.s3.ca_path ‘’
vfs.s3.connect_max_tries ‘5’
vfs.s3.connect_scale_factor ‘25’
vfs.s3.connect_timeout_ms ‘3000’
vfs.s3.endpoint_override ‘’
vfs.s3.logging_level ‘Off’
vfs.s3.max_parallel_ops ‘12’
vfs.s3.multipart_part_size ‘5242880’
vfs.s3.proxy_host ‘’
vfs.s3.proxy_password ‘’
vfs.s3.proxy_port ‘0’
vfs.s3.proxy_scheme ‘http’
vfs.s3.proxy_username ‘’
vfs.s3.region ‘xxxxx’
vfs.s3.request_timeout_ms ‘3000’
vfs.s3.requester_pays ‘false’
vfs.s3.scheme ‘https’
vfs.s3.use_multipart_upload ‘true’
vfs.s3.use_virtual_addressing ‘true’
vfs.s3.verify_ssl ‘true’

Traceback (most recent call last):
File “/xxxxxx.py”, line 27, in
db.consolidate()
File “xxxxx”, line 90, in consolidate
tiledb.consolidate(self.array_str, ctx=self.get_ctx(), config=self.get_ctx().config())
File “tiledb/libtiledb.pyx”, line 5581, in tiledb.libtiledb.consolidate
File “tiledb/libtiledb.pyx”, line 506, in tiledb.libtiledb._raise_ctx_err
File “tiledb/libtiledb.pyx”, line 491, in tiledb.libtiledb._raise_tiledb_error
tiledb.libtiledb.TileDBError: [TileDB::ConstBuffer] Error: Read buffer overflow
python scripts/read_tiledb.py 0.75s user 0.11s system 33% cpu 2.573 total


=> tiledb.libtiledb.TileDBError: [TileDB::ConstBuffer] Error: Read buffer overflow

But vfs.s3.use_multipart_upload seems to be true by default, and manually forcing it to true did not solve the overflow; neither did upping the value of sm.consolidation.buffer_size.
(To be noted: I'm only trying to consolidate fragment metadata:
sm.consolidation.mode | ‘fragment_meta’)
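Roughly what I tried, as a sketch only (the URI and the exact buffer size are placeholders):

import tiledb

cfg = tiledb.Config({
    "sm.consolidation.mode": "fragment_meta",    # only consolidating fragment metadata
    "sm.consolidation.buffer_size": "50000000",  # raised from the 5000000 default; no effect
    "vfs.s3.use_multipart_upload": "true",       # already the default; forcing it changed nothing
})
ctx = tiledb.Ctx(cfg)

# Still fails with: TileDBError: [TileDB::ConstBuffer] Error: Read buffer overflow
tiledb.consolidate("s3://bucket/my_array", ctx=ctx, config=cfg)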

Sorry for the confusion. The problem I solved by enabling multi-part uploads is:

TileDBError: [TileDB::S3] Error: Direct write failed! 20971520 bytes written to buffer, 49345273 bytes requested.

I had disabled multi-part uploads some time ago while searching for a solution to another problem and never re-enabled them.

So what you mean is that you still have the error

TileDBError: [TileDB::ConstBuffer] Error: Read buffer overflow

when trying to consolidate a large array (mine is a bit under 30 GB), with tiledb-py 0.8.4?

I downgraded to tiledb-py 0.7.7 without changing anything else and consolidation completes without any problem.

There seems to be a regression to investigate, @stavros!


Edit: I downgraded all the way from 0.8.4 to 0.7.7, and 0.7.7 is the latest version at which consolidation still works on my array.
tiledb-py 0.7.7 is the last release to ship with TileDB v2.1.6.

Thanks for reporting this @ilveroluca and @tiphaineruy!

Could you please give us some more details, e.g., how many fragments, their sizes, the subarrays you wrote into for the dense case, etc?

Some comments before we deep dive:

  1. It is important to understand all sm.consolidation* config parameters, since, if you are using the defaults on huge arrays, you will probably run out of memory.
  2. Please check the consolidation docs, and especially the topic on dense array amplification in case you write to the array in non-contiguous subarray “patches”.
  3. We have a known bug which means you should explicitly pass the config object to tiledb.consolidate (see the discussion here); a short sketch follows this list.
  4. For the dense case and especially if you write reasonably-sized data (e.g., 500MB-1GB) into disjoint subarrays, you can improve performance by just consolidating the fragment metadata (instead of the actual fragments).
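To illustrate points 3 and 4, here is a minimal sketch (the array URI is a placeholder) of passing the config explicitly and consolidating only the fragment metadata:

import tiledb

cfg = tiledb.Config({
    # Consolidate only the fragment metadata, not the fragments themselves.
    "sm.consolidation.mode": "fragment_meta",
})
ctx = tiledb.Ctx(cfg)

# Pass the config object explicitly to work around the known bug mentioned in point 3.
tiledb.consolidate("s3://bucket/my_array", ctx=ctx, config=cfg)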

@tiphaineruy, @ihnorton could help us see why there is a regression. Could you please send us any information you can so that we can reproduce the issue on our side?

Thanks!

Hey Stavros!

For my use-case:

Sparse array created with TileDB 2.1.6.
Around 120 fragments at roughly 200 MB per fragment, 3 dimensions and 7 attributes.
Total array size is 24 GB.

  • As for your points 1, 2, and 4: I got the buffer overflow

tiledb.libtiledb.TileDBError: [TileDB::ConstBuffer] Error: Read buffer overflow

while consolidating fragment metadata (cf. my previous post):
sm.consolidation.mode | ‘fragment_meta’

  • As for your point 3: that's what prompted my test. I realized I never really consolidated the array metadata, as my config file wasn't used (cf. my previous post):
tiledb.consolidate(self.array_str, ctx=self.get_ctx(), config=self.get_ctx().config())

Fragment metadata consolidation fails on my array with the buffer overflow error on all tiledb-py versions after 0.7.7.

Fragment metadata consolidation succeeds on my array if I downgrade to 0.7.7.

I didn't try any fragment consolidation apart from metadata, so I'm not sure whether the other sm.consolidation* parameters have any impact.

I'd be happy to provide additional information on the issue if you need it. (I think I loaded that array into TileDB Cloud; I could give you access so you can replicate.)

@tiphaineruy absolutely, if we get access to your array on TileDB Cloud we can definitely investigate (please contact me at stavros@tiledb.com with any further details).

Also what’s the TileDB version you created the array with?

Sorry, you already mentioned that the array was created with 2.1.6.

(Not sure if you’ve been in touch directly w/ Stavros already, but my username on TileDB Cloud is ihnorton if you can add me as well – happy to take a look)

No. I’m no longer able to reproduce that problem.

Hi @tiphaineruy,

I just emailed directly, but wanted to circle back on this thread too – fragment meta consolidation works in the test case with the just-released TileDB-Py 0.8.5 + TileDB 2.2.6.

Best,
Isaiah

Seems to be working! Thanks for the fix.
