Tile cache usage verification with Go API

I am currently working on a proof-of-concept API for raster weather data using the Go API for TileDB (v0.30.0) with TileDB core v2.24.0. I am benchmarking dense tile reads with and without sm.tile_cache_size set, and I am seeing virtually no difference in the stats.

I dug through the available stats and do not see any timers or counters related to cache hits/misses, or anything obviously similar. How can I confirm that the tile cache is indeed being used?

Fwiw, I have benchmarked with both local file storage and s3 storage, but have not seen improvements with sm.tile_cache_size set or unset in either scenario.

Here is a sample benchmark stats output for a local read test running on my Apple M3 MacBook Pro:

[
  {
    "timers": {
      "Context.StorageManager.subSubarray.read_load_relevant_rtrees.sum": 0.0122307,
      "Context.StorageManager.subSubarray.read_load_relevant_rtrees.avg": 3.24422e-06,
      "Context.StorageManager.subSubarray.read_compute_tile_overlap.sum": 0.364395,
      "Context.StorageManager.subSubarray.read_compute_tile_overlap.avg": 9.66564e-05,
      "Context.StorageManager.subSubarray.read_compute_relevant_tile_overlap.sum": 0.297617,
      "Context.StorageManager.subSubarray.read_compute_relevant_tile_overlap.avg": 7.89435e-05,
      "Context.StorageManager.subSubarray.compute_relevant_frags.sum": 0.0391469,
      "Context.StorageManager.subSubarray.compute_relevant_frags.avg": 1.03838e-05,
      "Context.StorageManager.sm_load_fragment_metadata.sum": 0.00187292,
      "Context.StorageManager.sm_load_fragment_metadata.avg": 9.85746e-05,
      "Context.StorageManager.sm_load_filtered_fragment_uris.sum": 0.000204122,
      "Context.StorageManager.sm_load_filtered_fragment_uris.avg": 1.07433e-05,
      "Context.StorageManager.sm_load_array_schemas_and_fragment_metadata.sum": 0.00441167,
      "Context.StorageManager.sm_load_array_schemas_and_fragment_metadata.avg": 0.000232193,
      "Context.StorageManager.sm_load_array_schema_from_uri.sum": 0.00221279,
      "Context.StorageManager.sm_load_array_schema_from_uri.avg": 0.000116463,
      "Context.StorageManager.sm_load_all_array_schemas.sum": 0.00227133,
      "Context.StorageManager.sm_load_all_array_schemas.avg": 0.000119544,
      "Context.StorageManager.array_reopen_directory.sum": 0.00207271,
      "Context.StorageManager.array_reopen_directory.avg": 0.00011515,
      "Context.StorageManager.array_reopen.sum": 0.00650867,
      "Context.StorageManager.array_reopen.avg": 0.000361593,
      "Context.StorageManager.array_open_read_load_schemas_and_fragment_meta.sum": 0.00442888,
      "Context.StorageManager.array_open_read_load_schemas_and_fragment_meta.avg": 0.000233099,
      "Context.StorageManager.array_open_read_load_directory.sum": 0.000104,
      "Context.StorageManager.array_open_read_load_directory.avg": 0.000104,
      "Context.StorageManager.array_open_READ.sum": 0.000322417,
      "Context.StorageManager.array_open_READ.avg": 0.000322417,
      "Context.StorageManager.VFS.ArrayDirectory.load_consolidated_commit_uris.sum": 1.7414e-05,
      "Context.StorageManager.VFS.ArrayDirectory.load_consolidated_commit_uris.avg": 9.16526e-07,
      "Context.StorageManager.VFS.ArrayDirectory.list_root_uris.sum": 0.00119288,
      "Context.StorageManager.VFS.ArrayDirectory.list_root_uris.avg": 6.2783e-05,
      "Context.StorageManager.VFS.ArrayDirectory.list_fragment_meta_uris.sum": 0.000146706,
      "Context.StorageManager.VFS.ArrayDirectory.list_fragment_meta_uris.avg": 7.72137e-06,
      "Context.StorageManager.VFS.ArrayDirectory.list_commit_uris.sum": 0.00150967,
      "Context.StorageManager.VFS.ArrayDirectory.list_commit_uris.avg": 7.94562e-05,
      "Context.StorageManager.VFS.ArrayDirectory.list_array_schema_uris.sum": 0.00164917,
      "Context.StorageManager.VFS.ArrayDirectory.list_array_schema_uris.avg": 8.67983e-05,
      "Context.StorageManager.VFS.ArrayDirectory.list_array_meta_uris.sum": 0.000158751,
      "Context.StorageManager.VFS.ArrayDirectory.list_array_meta_uris.avg": 8.35532e-06,
      "Context.StorageManager.Subarray.read_compute_tile_coords.sum": 0.00204402,
      "Context.StorageManager.Subarray.read_compute_tile_coords.avg": 5.42182e-07,
      "Context.StorageManager.Query.Reader.unfilter_attr_tiles.sum": 0.230854,
      "Context.StorageManager.Query.Reader.unfilter_attr_tiles.avg": 6.12345e-05,
      "Context.StorageManager.Query.Reader.read_tiles.sum": 0.239003,
      "Context.StorageManager.Query.Reader.read_tiles.avg": 6.33961e-05,
      "Context.StorageManager.Query.Reader.read_attribute_tiles.sum": 0.241062,
      "Context.StorageManager.Query.Reader.read_attribute_tiles.avg": 6.39422e-05,
      "Context.StorageManager.Query.Reader.load_tile_var_sizes.sum": 0.00663585,
      "Context.StorageManager.Query.Reader.load_tile_var_sizes.avg": 1.76017e-06,
      "Context.StorageManager.Query.Reader.load_tile_offsets.sum": 0.00780588,
      "Context.StorageManager.Query.Reader.load_tile_offsets.avg": 2.07053e-06,
      "Context.StorageManager.Query.Reader.init_state.sum": 0.202428,
      "Context.StorageManager.Query.Reader.init_state.avg": 5.36944e-05,
      "Context.StorageManager.Query.Reader.dowork.sum": 1.62782,
      "Context.StorageManager.Query.Reader.dowork.avg": 0.000431781,
      "Context.StorageManager.Query.Reader.copy_fixed_tiles.sum": 0.340567,
      "Context.StorageManager.Query.Reader.copy_fixed_tiles.avg": 9.03361e-05,
      "Context.StorageManager.Query.Reader.copy_attribute.sum": 0.343155,
      "Context.StorageManager.Query.Reader.copy_attribute.avg": 9.10226e-05,
      "Context.StorageManager.Query.Reader.apply_query_condition.sum": 7.6298e-05,
      "Context.StorageManager.Query.Reader.apply_query_condition.avg": 2.02382e-08,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.read_next_partition.sum": 0.647044,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.read_next_partition.avg": 0.00017163
    },
    "counters": {
      "Context.StorageManager.subSubarray.precompute_tile_overlap.tile_overlap_byte_size": 241280,
      "Context.StorageManager.subSubarray.precompute_tile_overlap.relevant_fragment_num": 3770,
      "Context.StorageManager.subSubarray.precompute_tile_overlap.ranges_requested": 3770,
      "Context.StorageManager.subSubarray.precompute_tile_overlap.ranges_computed": 3770,
      "Context.StorageManager.subSubarray.precompute_tile_overlap.fragment_num": 3770,
      "Context.StorageManager.subSubarray.add_range_dim_1": 3770,
      "Context.StorageManager.subSubarray.add_range_dim_0": 3770,
      "Context.StorageManager.read_unfiltered_byte_num": 5130,
      "Context.StorageManager.read_tile_offsets_size": 304,
      "Context.StorageManager.read_rtree_size": 152,
      "Context.StorageManager.read_frag_meta_size": 9386,
      "Context.StorageManager.read_array_schema_size": 4674,
      "Context.StorageManager.VFS.read_ops_num": 3979,
      "Context.StorageManager.VFS.read_byte_num": 988511005,
      "Context.StorageManager.VFS.file_size_num": 19,
      "Context.StorageManager.Query.Reader.tiles_unfiltered": 15080,
      "Context.StorageManager.Query.Reader.tiles_allocated": 3770,
      "Context.StorageManager.Query.Reader.num_tiles_read": 3770,
      "Context.StorageManager.Query.Reader.num_tiles": 3770,
      "Context.StorageManager.Query.Reader.loop_num": 3770,
      "Context.StorageManager.Query.Reader.internal_loop_num": 3770,
      "Context.StorageManager.Query.Reader.attr_num": 3770,
      "Context.StorageManager.Query.Reader.attr_fixed_num": 3770,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.compute_current_start_end.ranges": 3770,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.compute_current_start_end.found": 3770,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.compute_current_start_end.adjusted_ranges": 3770
    }
  }
]

Hello @weathernutt, the tile cache is only used by one of our older read algorithms, which is not used in your case. Both this setting and the old read algorithm will be deprecated soon. May I ask why you are trying to use a cache? Are you trying to address a performance issue with this particular query?

I’m not yet familiar with our Go API, so please bear with me as I figure out what’s going on here. Are you setting up the buffer to receive the data yourself? If so, what size are you specifying? Also, what is the size of the data you are getting back?

Best,
Luc

Hi @KiterLuc, thanks much for the info!

Here is some additional context:

  • We are testing storing dense array data for raster data at various levels of detail (LOD)
  • Each LOD currently has a different, fixed number of 256x256 tiles (LOD 0 = 1 tile, LOD 1 = 4 tiles, LOD 2 = 16 tiles, LOD x = 2^(2x) tiles)
  • We are storing 1 LOD per array (since the x/y dims of the array change per LOD) - the total number of cells is number of tiles * 256 * 256
  • We are using just 1 float32 attribute for now
  • The preliminary benchmark is a Go test benchmark run for 15-30s, reading as many LOD 0 tiles as possible (running for up to 300s had an insignificant impact on the results).
  • We are gzipping the resulting tile data - initial pprof results suggest that may double the overall time it takes per tile
  • Preliminary test results were below expectations (but my expectations are ungrounded - maybe these numbers are good from TileDB’s perspective?):
Array Location                                         Single Tile Read Rate (tiles/second)
Local array from MacBook Pro (hold open array handle)  208
Local array from MacBook Pro (open array each read)    150
S3 from AWS container (hold open array handle)         28.5
S3 from AWS container (open array each read)           5
S3 from MacBook Pro (hold open array handle)           8
S3 from MacBook Pro (open array each read)             0.6
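The LOD sizing in the list above can be sketched with a little Go arithmetic; tilesForLOD and cellsForLOD are illustrative names of my own, not part of any TileDB API:

```go
package main

import "fmt"

// tilesForLOD returns the fixed number of 256x256 tiles at a given
// level of detail, per the scheme above: LOD x has 2^(2x) tiles.
func tilesForLOD(lod uint) uint64 {
	return 1 << (2 * lod)
}

// cellsForLOD returns the total number of cells in that LOD's array.
func cellsForLOD(lod uint) uint64 {
	return tilesForLOD(lod) * 256 * 256
}

func main() {
	for _, lod := range []uint{0, 1, 2, 3} {
		fmt.Printf("LOD %d: %d tiles, %d cells\n",
			lod, tilesForLOD(lod), cellsForLOD(lod))
	}
	// One float32 tile is 256*256*4 bytes = 262144 B (256 KiB),
	// consistent with "a couple hundred KB" per tile below.
	fmt.Println("bytes per tile:", 256*256*4)
}
```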

As the results suggest, there are clearly more performant ways to use TileDB (e.g. holding the array handle open, as well as all of the suggestions in the Performance Factors — TileDB 1.6.3 documentation).

If caching were in play, I would expect the preliminary results to be massively higher, since the benchmark is pulling the same tile over and over again - this, together with the TileDB documentation regarding tile caching, is what confused me.

I haven’t had time to review and apply the performance factors yet, so I expect there are potentially significant improvements to be made.

Also, since this use case is historical data, we generally expect a write-once, read-a-lot model. We will make significant use of caching between our service and customers, but wanted to confirm how caching was playing into the initial performance results.

Since we are always pulling a single tile (for now), the buffer is just `data := make([]float32, array.TileWidth*array.TileHeight)`

The data is on the order of a couple hundred KB per tile.

I am new to TileDB, so any advice is very welcome - thank you much for your time!

Thanks for all the details @weathernutt! What triggered some of my questions in my previous reply was the following line:

      "Context.StorageManager.Query.Reader.loop_num": 3770,

If I understand correctly, in the test you ran, you did multiple reads (3770 to be exact) of the same tile to measure performance?

How did you build your LOD 0 array? IIUC you made only one write to this array, with one 256x256 tile?

What do you expect your final product to do to read the data - also read only one tile at a time?

Also, just an update to one of my previous replies: the documentation you linked to is out of date. The latest documentation doesn’t mention the tile cache anymore, as it has been removed. What version of TileDB are you using?

Looking at your initial statistics, I think there are potentially a few places where we can optimize but understanding exactly what you are going for might enable us to find a solution that doesn’t require code changes initially.

Hi @KiterLuc,

Let me start with this question - I don’t know! I appreciate your help, but we can probably pause the optimization discussion until we get a little farther in prioritizing use cases. We may read single tiles, or single tiles across a large range of attribute dimensions, or entire LODs, or ??? We’re just starting out. I feel like you’ve answered my initial question, so I’ll plan to open a new topic when we get farther. For completeness’ sake, I’ve answered your other questions below. Thank you!

For the test results I shared, that is correct. I ran another k6 test, with roughly similar results (not exactly apples to apples), that queried tiles across a range of LODs, but always read a single 256x256 tile that should have lined up with cell boundaries. It’s possible, since I’m new to this, that my tests aren’t doing what I think they are, but I think I’m close.

That’s right. Each LOD is a separate array and each is created with a single write. Model remains write once read a lot (historical database).

TileDB-Go v0.30.0
TileDB Core 2.24.0

(fwiw) I manually updated the out-of-date Homebrew formula to get Core installed - I think it’s been working well. But it’s a real bummer that the Homebrew formula is not maintained anymore :frowning:

Good to hear - and I’m not surprised. I’m hoping to wrap up an initial POC this/next week, then focus on prioritizing use cases to help drive the optimization discussion. Let me know if you have any more questions, glad to keep discussing as desired, but know I’ll probably be coming back to talk more about optimization later.

Thank you for all the information! I can easily build a reproduction of your initial benchmark locally and start a quick investigation of what can be optimized, so that we can schedule the work soon. I’ll let you know if I see anything that doesn’t match your numbers.

Please let me know when you have a good definition of your use cases and I’ll be happy to help with some optimizations!

Best,
Luc


@weathernutt FYI I spent some time working on this today and opened the following PR:

You can see in one of the PR comments that a lot of the statistics improved drastically!

Hello, :smiling_face_with_three_hearts:

To my knowledge, to confirm whether the tile cache is being used, you might want to enable and review the TileDB stats. You can set sm.stats to true in the TileDB configuration to get detailed statistics, including cache hits and misses. This should help you verify whether the sm.tile_cache_size setting is having an effect on your read performance.

I hope this helps.

Regards