Seeking Advice to Optimize Writing Speed for Large Int16_t Arrays

juhwan · April 21, 2024, 4:23pm

Hello TileDB Community,

I’m working on a project where I need to write two dense arrays, each 19995 x 9216 of int16_t, in under 0.07 seconds. Unfortunately, my current performance falls well short of this target. Below are the details of my setup and attempts:

Arrays: I have two 1D arrays (data1 and data2), each with a length of 184273920 elements (equivalent to 19995 x 9216).
Frames: I process a total of 8 frames; after processing each frame, I receive new data1 and data2 data.
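Dimensions: I’ve set up the dimensions as follows:
```cpp
const auto dimFrame = tiledb::Dimension::create<int32_t>(ctx, "frame", {1, totalFrames}, 1);
const auto dimHeight = tiledb::Dimension::create<int32_t>(ctx, "height", {1, height}, height);
const auto dimWidth = tiledb::Dimension::create<int32_t>(ctx, "width", {1, width}, width);
// Domain construction (not shown in the original post, assumed): frame x height x width
tiledb::Domain domain(ctx);
domain.add_dimensions(dimFrame, dimHeight, dimWidth);
```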
Attributes: I’ve set up the array schema as follows:
```cpp
tiledb::ArraySchema schema(ctx, TILEDB_DENSE);
schema.set_domain(domain)
    .set_cell_order(TILEDB_ROW_MAJOR)
    .set_tile_order(TILEDB_ROW_MAJOR)
    .add_attribute(tiledb::Attribute::create<int16_t>(ctx, "data1"))
    .add_attribute(tiledb::Attribute::create<int16_t>(ctx, "data2"));
// Array creation on disk (assumed; not shown in the original post)
tiledb::Array::create(arrayName, schema);
```
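Write Frame: I’ve set up the per-frame write as follows:
```cpp
// Called once per frame; frameNumber is 1-based to match the "frame" domain.
const tiledb::Context ctx;  // a new Context is constructed on every call
tiledb::Array array(ctx, arrayName, TILEDB_WRITE);
tiledb::Query query(ctx, array);
query.set_layout(TILEDB_ROW_MAJOR);
// One full frame: [frameNumber, frameNumber] x [1, height] x [1, width]
query.set_subarray({frameNumber, frameNumber, 1, height, 1, width});
// Note: set_buffer is deprecated in recent TileDB releases in favor of set_data_buffer
query.set_buffer("data1", const_cast<int16_t*>(data1), width * height);
query.set_buffer("data2", const_cast<int16_t*>(data2), width * height);
query.submit();
array.close();
```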
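For scale, a quick back-of-the-envelope check of the numbers above (a sketch; the constant and function names are illustrative and not from the original code) shows that each frame is about 737 MB across both int16_t attributes, so writing it within 0.07 s requires roughly 10.5 GB/s sustained, or about 5.3 GB/s per attribute.

```cpp
#include <cstdint>

// Frame geometry and timing target taken from the post above.
constexpr std::uint64_t kHeight = 19995;
constexpr std::uint64_t kWidth = 9216;
constexpr std::uint64_t kNumAttributes = 2;  // data1 and data2
constexpr std::uint64_t kBytesPerCell = sizeof(std::int16_t);
constexpr double kTargetSeconds = 0.07;

// Cells in one frame of one attribute: 19995 * 9216.
constexpr std::uint64_t cellsPerFrame() { return kHeight * kWidth; }

// Uncompressed bytes written per frame across both int16_t attributes.
constexpr std::uint64_t bytesPerFrame() {
    return cellsPerFrame() * kBytesPerCell * kNumAttributes;
}

// Throughput (bytes per second) needed to write one frame within the target time.
constexpr double requiredBytesPerSecond() {
    return static_cast<double>(bytesPerFrame()) / kTargetSeconds;
}
```

Over all 8 frames this comes to 8 x 737,095,680 bytes, roughly 5.9 GB of uncompressed data.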

Here are the methods I’ve tried and their results:

Per-Frame Writing: Writing after processing each frame took 0.9 – 1.1 seconds per frame.
Batch Writing: Writing all 8 frames at once took about 13 – 15 seconds.
Parallel Batch Writing: Writing the batch in parallel (8 frames, 8 threads) took about 3 – 5 seconds.
Tile Sizes: Experimenting with different tile sizes showed that 465 x 512 offers the best performance, but it’s still not sufficient.
Configuration Tweaks: Adjusting the config parameters (sm.num_threads, sm.io_concurrency_level) had no significant impact on write speed.
Compression Filter: Applying a compression filter before writing slowed the process down even more.
Async Writing: Asynchronous writing (query.submit_async()) took an excessively long time to complete, even for a single frame.
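To compare the two tilings, here is a small sketch (the function names are hypothetical) that counts tiles per frame. The original schema uses a single full-frame tile per attribute (368,547,840 bytes uncompressed), while the 465 x 512 layout splits each frame into 43 x 18 = 774 tiles of 476,160 bytes each; because 465 divides 19995 exactly, there are no partial tiles along the height.

```cpp
#include <cstdint>

// Number of tiles needed to cover `extent` cells with tiles of `tile` cells
// (the last tile may be partial).
constexpr std::uint64_t tilesAlong(std::uint64_t extent, std::uint64_t tile) {
    return (extent + tile - 1) / tile;
}

// Tiles per frame for a height x width frame split into tileH x tileW tiles.
constexpr std::uint64_t tilesPerFrame(std::uint64_t height, std::uint64_t width,
                                      std::uint64_t tileH, std::uint64_t tileW) {
    return tilesAlong(height, tileH) * tilesAlong(width, tileW);
}

// Uncompressed size of one int16_t tile for a single attribute.
constexpr std::uint64_t tileBytes(std::uint64_t tileH, std::uint64_t tileW) {
    return tileH * tileW * sizeof(std::int16_t);
}
```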

It appears that the write speed is far too slow for my application. Any suggestions on how to improve it would be most welcome.

Thank you in advance for your help!

ihnorton · April 21, 2024, 4:36pm

Hi @juhwan,

I believe the target throughput here is ~5GB/s? We can look at some optimization possibilities, but I have a few questions to clarify:

- Is this target a sustained rate? For how long?
- Can you buffer writes? If so, how much?
- Is the target hardware capable of this rate (i.e., have you tested raw parallelized writes or copies)?
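One way to approach the hardware question is to time raw parallel file writes with no TileDB involved. The sketch below uses a hypothetical helper built only on the standard library: several threads each write a dummy buffer to their own file, and the function reports aggregate bytes per second. Because std::ofstream does not call fsync, the result mostly reflects how fast the OS page cache absorbs the data, so it is an optimistic upper bound rather than sustained disk throughput.

```cpp
#include <chrono>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: each of `numThreads` threads writes `bytesPerThread`
// bytes to its own file under `dir`; returns aggregate throughput in bytes/s.
// std::ofstream does not fsync, so this mostly measures how fast the OS page
// cache absorbs the data (an optimistic upper bound, not sustained disk speed).
double measureParallelWriteThroughput(const std::filesystem::path& dir,
                                      unsigned numThreads,
                                      std::size_t bytesPerThread) {
    const std::vector<char> payload(bytesPerThread, 1);  // shared, read-only
    std::vector<std::filesystem::path> files;
    for (unsigned i = 0; i < numThreads; ++i)
        files.push_back(dir / ("raw_write_test_" + std::to_string(i) + ".bin"));

    std::vector<std::thread> workers;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < numThreads; ++i) {
        workers.emplace_back([&files, &payload, i] {
            std::ofstream out(files[i], std::ios::binary | std::ios::trunc);
            out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
            // `out` is closed (handed to the OS, not synced to disk) at scope exit
        });
    }
    for (auto& w : workers) w.join();
    const double seconds = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start).count();

    for (const auto& f : files) std::filesystem::remove(f);  // clean up test files
    return static_cast<double>(numThreads) * static_cast<double>(bytesPerThread) / seconds;
}
```

To approximate one frame, one could call it with 2 threads and 368,547,840 bytes per thread (one thread per attribute) on the volume the arrays will live on, then compare the result against the required rate.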

(If you want to discuss this directly, please email help@tiledb.com and reference this thread.)

Best,
Isaiah

juhwan · April 22, 2024, 11:37pm

Thank you for looking into the question. I will take a look and come up with more data and answers to your questions.
