DenseReader: Cannot process a single tile, increase memory budget

I have C# code that reads from a TileDB array. I am trying to read a total of 4,480,000 cells, each containing 2 floats, which by my rough math comes to around 32MB of data. But when I try to read this slice I get the following error:

“DenseReader: Cannot process a single tile, increase memory budget”

Following is my current code:

public List<float> ReadArray(string arrayName, List<int> slice, string attribute)
{
    Console.WriteLine($"Reading from the slice has started. Started at {DateTime.Now}. Slice: {slice[0]}, {slice[1]}, {slice[2]}, {slice[3]}");

    var config = new Config();
    config.Set("sm.memory_budget", (32L * 1024 * 1024 * 1024).ToString());      // Increase total memory budget
    config.Set("sm.tile_cache_size", (32L * 1024 * 1024 * 1024).ToString());    // Increase tile cache size
    config.Set("sm.memory_budget_var", (32L * 1024 * 1024 * 1024).ToString());  // Increase budget for var-sized data

    using var ctx = new Context(config);
    using var array = new Array(ctx, arrayName);
    array.Open(QueryType.Read);

    using var query = new Query(ctx, array);
    using var subArray = new Subarray(array);

    // Allocate a buffer large enough to hold the requested slice.
    int dataSize = CalculateSizeOfData(slice);
    Console.WriteLine("dataSize is " + dataSize);
    var readData = new float[dataSize];

    subArray.SetSubarray(slice[0], slice[1], slice[2], slice[3]);

    query.SetLayout(LayoutType.RowMajor);
    query.SetDataBuffer(attribute, readData);
    query.SetSubarray(subArray);

    query.Submit();
    array.Close();

    Console.WriteLine($"Reading from the slice has completed. Completed at {DateTime.Now}. Slice: {slice[0]}, {slice[1]}, {slice[2]}, {slice[3]}");
    return new List<float>(readData);
}

Please note that I had no custom config code before; following the error, I added the lines above to increase the budget, but it does not seem to help.

I also want to add that when this was C++ code, it was able to handle far bigger datasets than this, and I had never seen this error before with the C++ code. This is something we started seeing only after we moved to C#.

Hello @kunaal_desai, you should be using the sm.mem.total_budget config option to set the memory budget.
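
For example, a minimal sketch in C# (the 15GB value here is just an illustration):

    var config = new Config();
    // sm.mem.total_budget is the total memory budget (in bytes) used by queries.
    config.Set("sm.mem.total_budget", (15L * 1024 * 1024 * 1024).ToString());
    using var ctx = new Context(config);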

Could you tell us more details about your array and its schema? The default memory budget is 10GB; it is unusual that reading 32MB of data would exhaust the budget.

Best,
Theodore

The array creation code is below:

public void CreateArray(string uri, ArrayDimension rowDimension, ArrayDimension columnDimension)
{
    try
    {
        using var ctx = new Context();
        using var domain = new Domain(ctx);
        domain.AddDimensions(
            Dimension.Create(ctx, "rows", rowDimension.Start, rowDimension.End, rowDimension.TileExtent),
            Dimension.Create(ctx, "cols", columnDimension.Start, columnDimension.End, columnDimension.TileExtent)
        );

        // The array will be dense.
        using var schema = new ArraySchema(ctx, ArrayType.Dense);
        schema.SetDomain(domain);

        // Two fixed-size float attributes.
        var xNorm = new TileDB.CSharp.Attribute(ctx, "x_norm", DataType.Float32);
        var yNorm = new TileDB.CSharp.Attribute(ctx, "y_norm", DataType.Float32);

        schema.AddAttributes(xNorm, yNorm);
        Array.Create(ctx, uri, schema);
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
        throw;
    }
}
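
(For reference, ArrayDimension is just a simple holder, roughly along these lines:)

    // Sketch of the ArrayDimension helper used above: domain bounds plus tile extent.
    public record ArrayDimension(int Start, int End, int TileExtent);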

Two float attributes, and the chunk that I am reading has 440k cells.

Also, thanks for pointing me to the correct config value; updating it to 15GB worked. But I do notice some slowdown, and I'm not sure if it is related to the memory usage problem.

What are the tile extent values? There is a chance that they are too low.

Row tile extent = 66
Column tile extent = 600k

The other thing is that this code is mostly the same as its previous C++ version, but I have noticed that the C++ code was able to handle a much, much larger load without any issues.

Hello, in order to further investigate the problem, we need some additional information:

  • What version of TileDB are you using? You can get this by calling the CoreUtil.GetCoreLibVersion() method.
    • And what version were you using while using C++?
  • What slices of the array were you querying for?
  • Can you share the query’s stats? You can get them by calling query.Stats() after the call to query.Submit() (see the C# sketch at the end of this list).
  • We would also need a dump of your array’s schema and fragment info. You can obtain them by running the following C++ code:
    std::string array_name = "my_array";
    Context ctx;
    
    FragmentInfo fragment_info(ctx, array_name);
    fragment_info.load();
    std::cout << "Fragment info:" << std::endl;
    fragment_info.dump();
    
    std::cout << "Array schema:" << std::endl;
    ArraySchema schema(ctx, array_name);
    schema.dump();
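
A minimal C# sketch for gathering the version and the query stats mentioned above:

    using TileDB.CSharp;

    // Print the version of the native TileDB core library.
    Console.WriteLine($"TileDB core version: {CoreUtil.GetCoreLibVersion()}");

    // ...and after query.Submit(), print the query's stats:
    Console.WriteLine(query.Stats());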
    

Best,
Theodore


My current code is C#. Is there any equivalent way to get this information in C#?

And it uses the following libraries:

<PackageReference Include="TileDB.CSharp" Version="5.10.0" />
<PackageReference Include="TileDB.Native" Version="2.20.1" />

Unfortunately the dump functions are not currently available in the C# API due to limitations of the underlying native C API. You can create a separate C++ program that dumps the array you created from C#.

You are also using an old version of TileDB. The latest versions are TileDB.CSharp 5.14.0 and TileDB.Native 2.24.1. Can you update your program to these versions and try to reproduce the issue again?
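
For example:

    <PackageReference Include="TileDB.CSharp" Version="5.14.0" />
    <PackageReference Include="TileDB.Native" Version="2.24.1" />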

Thanks.

Hello @kunaal_desai. Alternatively, the following Python code can give us the information we need to create a reproduction:

import tiledb
import argparse

def list_fragments(uri):
    fragments_info = tiledb.array_fragments(uri)
    
    print("====== FRAGMENTS  INFO ======")
    print("array uri: {}".format(fragments_info.array_uri))
    print("number of fragments: {}".format(len(fragments_info)))
    
    to_vac = fragments_info.to_vacuum
    print("number of consolidated fragments to vacuum: {}".format(len(to_vac)))
    print("uris of consolidated fragments to vacuum: {}".format(to_vac))
    
    print(fragments_info.nonempty_domain)
    print(fragments_info.sparse)
    
    for fragment in fragments_info:
        print()
        print("===== FRAGMENT NUMBER {} =====".format(fragment.num))
        print("fragment uri: {}".format(fragment.uri))
        print("is sparse: {}".format(fragment.sparse))
        print("cell num: {}".format(fragment.cell_num))
        print("has consolidated metadata: {}".format(fragment.has_consolidated_metadata))
        print("nonempty domain: {}".format(fragment.nonempty_domain))
        print("timestamp range: {}".format(fragment.timestamp_range))
        print(
            "number of unconsolidated metadata: {}".format(
                fragment.unconsolidated_metadata_num
            )
        )
        print("version: {}".format(fragment.version))

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--uri", type=str, help="Array URI to list fragments"
    )

    args = parser.parse_args()
    with tiledb.open(args.uri, 'r') as A:
        A.schema.dump()

    list_fragments(args.uri)

if __name__ == "__main__":
    main()
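
Assuming you save the script as, say, dump_info.py (a name chosen just for illustration), you can run it like this:

    python dump_info.py --uri <your_array_uri>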

If you are not able to run this script either, would it be possible to get access to the array that is having this issue?

Thanks,
Luc

@KiterLuc : Dropbox

Any news on this? I can try to get you access to the TileDB file set if this is not enough information.

The dumps you already provided are most likely sufficient. We will investigate the issue and come back to you when we have results.

Best,
Theodore

Thanks for providing the dumps @kunaal_desai. As @teo-tsirpanis said, they should be sufficient for the team to build an array that reproduces your issue. That said, if you can give us access to the data, it would make the investigation much easier and help expedite it; let us know. Also note that we are currently working on a new release, coming out in about a week, that might address your issue. We’ll let you know when it’s ready so you can try it with your array.

Best,
Luc

@kunaal_desai Actually we won’t need access to your data. I can see the issue just from looking at your schema and fragments, and there are probably a few easy tweaks we can make to fix everything for you. First, a little more information about what the problem is. From your schema, I see that you have two dimensions and that the tile extent covers the whole domain, so you really have one large tile for the whole dataset. This is not ideal, as TileDB always stores full tiles for each write in a dense array. (This might be improved in the future, but it is done this way for now to simplify the read algorithms and make them perform better.) So, in your case, for each write, each attribute will have an in-memory tile size of ~280MB. This will likely compress well on disk, but in memory it will take a lot of space, as you have 114 tiles to bring in: roughly 114 × 280MB ≈ 32GB per attribute, far beyond even a 15GB budget. Since the read algorithm tries to load everything it needs to process at least one tile, it is probably trying to load all your fragments into memory at once and does not have enough memory to decompress everything.

Now, let’s chat about a solution! Looking at your fragments, I notice that you store one full column per write. For now, changing the tile extent of your rows dimension to 1 will greatly improve everything, as you will then only store a little over 2MB per tile for each write and bring roughly 280MB of uncompressed data into memory for your read. Are you always going to write the data for this array one full column at a time? Also, may I ask how you plan to read this data? Depending on the slices you plan to access, I can recommend a better tile extent for the column dimension. There are also some things we can do with consolidation to improve read performance, but I need more information about your write and read patterns.
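
As a sketch, the tweak only changes the rows tile extent in your CreateArray helper (the names here mirror your code above):

    domain.AddDimensions(
        // A tile extent of 1 on the rows dimension: each write of one full
        // column then produces ~2MB tiles instead of one huge tile.
        Dimension.Create(ctx, "rows", rowDimension.Start, rowDimension.End, 1),
        Dimension.Create(ctx, "cols", columnDimension.Start, columnDimension.End, columnDimension.TileExtent)
    );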

@KiterLuc, thank you so much for your response. I have tried a couple of combinations; here are my observations:

  1. Tile extents of 1×1 were, for some reason, very slow when writing.
  2. Tile extents of NumSamples/100 × NumOfLoci/100: pretty good read and write performance, and the memory issue is also resolved.

Since the original issue is resolved, I am curious about one thing. This C# code was converted from an original C++/PInvoke layer, yet after the conversion the same dimensions started having issues. I was expecting the same read/write capacity across the languages, since they are simply access mechanisms and the data layout should drive the performance.

Nevertheless, thank you so much for your help. Your suggestions were very helpful.

Great news @kunaal_desai! Could you tell me more about your issues with the C# API? What do you mean by the read/write capacity not being the same?

@kunaal_desai There should be no difference between C++ and C#. Possibly the array you had when you were using C++ had fewer fragments written?

@KiterLuc, @teo-tsirpanis there is some difference. I was inspecting the C++-generated file, and its footprint is in alignment with what I would expect it to take to store that amount of floats plus some metadata.

However, this particular file stores approximately 136,800,000 floats. I would guess it should not take more than 500MB plus some metadata, but this file is taking almost 68GB. I think the original memory problem we ran into is related to that, because the reader now has to load much more data into memory. Even the fragment files are pretty large. Is it possible to get some info from the dump file, or do you need the array itself to inspect it?

Just for your reference, I have uploaded the full TileDB array that has the footprint issues (114 fragments):

I am also attaching the dump file with the smaller footprint. The attached dump file is from a TileDB array that was generated using C++ and has 5k fragments, as opposed to the 114 fragments that the current one has.