Tile cache usage verification with Go API

I am currently working on a proof-of-concept API for raster weather data using the Go API for TileDB (v0.30.0) with TileDB core v2.24.0. I am benchmarking dense tile reads with and without sm.tile_cache_size set, and I am seeing virtually no difference in the stats.

I dug through the available stats and do not see any timers or counters related to cache hits/misses, or anything obviously similar. How can I confirm that the tile cache is indeed being used?

Fwiw, I have benchmarked with both local file storage and s3 storage, but have not seen improvements with sm.tile_cache_size set or unset in either scenario.

Here is a sample benchmark stats output for a local read test running on my Apple M3 MacBook Pro:

[
  {
    "timers": {
      "Context.StorageManager.subSubarray.read_load_relevant_rtrees.sum": 0.0122307,
      "Context.StorageManager.subSubarray.read_load_relevant_rtrees.avg": 3.24422e-06,
      "Context.StorageManager.subSubarray.read_compute_tile_overlap.sum": 0.364395,
      "Context.StorageManager.subSubarray.read_compute_tile_overlap.avg": 9.66564e-05,
      "Context.StorageManager.subSubarray.read_compute_relevant_tile_overlap.sum": 0.297617,
      "Context.StorageManager.subSubarray.read_compute_relevant_tile_overlap.avg": 7.89435e-05,
      "Context.StorageManager.subSubarray.compute_relevant_frags.sum": 0.0391469,
      "Context.StorageManager.subSubarray.compute_relevant_frags.avg": 1.03838e-05,
      "Context.StorageManager.sm_load_fragment_metadata.sum": 0.00187292,
      "Context.StorageManager.sm_load_fragment_metadata.avg": 9.85746e-05,
      "Context.StorageManager.sm_load_filtered_fragment_uris.sum": 0.000204122,
      "Context.StorageManager.sm_load_filtered_fragment_uris.avg": 1.07433e-05,
      "Context.StorageManager.sm_load_array_schemas_and_fragment_metadata.sum": 0.00441167,
      "Context.StorageManager.sm_load_array_schemas_and_fragment_metadata.avg": 0.000232193,
      "Context.StorageManager.sm_load_array_schema_from_uri.sum": 0.00221279,
      "Context.StorageManager.sm_load_array_schema_from_uri.avg": 0.000116463,
      "Context.StorageManager.sm_load_all_array_schemas.sum": 0.00227133,
      "Context.StorageManager.sm_load_all_array_schemas.avg": 0.000119544,
      "Context.StorageManager.array_reopen_directory.sum": 0.00207271,
      "Context.StorageManager.array_reopen_directory.avg": 0.00011515,
      "Context.StorageManager.array_reopen.sum": 0.00650867,
      "Context.StorageManager.array_reopen.avg": 0.000361593,
      "Context.StorageManager.array_open_read_load_schemas_and_fragment_meta.sum": 0.00442888,
      "Context.StorageManager.array_open_read_load_schemas_and_fragment_meta.avg": 0.000233099,
      "Context.StorageManager.array_open_read_load_directory.sum": 0.000104,
      "Context.StorageManager.array_open_read_load_directory.avg": 0.000104,
      "Context.StorageManager.array_open_READ.sum": 0.000322417,
      "Context.StorageManager.array_open_READ.avg": 0.000322417,
      "Context.StorageManager.VFS.ArrayDirectory.load_consolidated_commit_uris.sum": 1.7414e-05,
      "Context.StorageManager.VFS.ArrayDirectory.load_consolidated_commit_uris.avg": 9.16526e-07,
      "Context.StorageManager.VFS.ArrayDirectory.list_root_uris.sum": 0.00119288,
      "Context.StorageManager.VFS.ArrayDirectory.list_root_uris.avg": 6.2783e-05,
      "Context.StorageManager.VFS.ArrayDirectory.list_fragment_meta_uris.sum": 0.000146706,
      "Context.StorageManager.VFS.ArrayDirectory.list_fragment_meta_uris.avg": 7.72137e-06,
      "Context.StorageManager.VFS.ArrayDirectory.list_commit_uris.sum": 0.00150967,
      "Context.StorageManager.VFS.ArrayDirectory.list_commit_uris.avg": 7.94562e-05,
      "Context.StorageManager.VFS.ArrayDirectory.list_array_schema_uris.sum": 0.00164917,
      "Context.StorageManager.VFS.ArrayDirectory.list_array_schema_uris.avg": 8.67983e-05,
      "Context.StorageManager.VFS.ArrayDirectory.list_array_meta_uris.sum": 0.000158751,
      "Context.StorageManager.VFS.ArrayDirectory.list_array_meta_uris.avg": 8.35532e-06,
      "Context.StorageManager.Subarray.read_compute_tile_coords.sum": 0.00204402,
      "Context.StorageManager.Subarray.read_compute_tile_coords.avg": 5.42182e-07,
      "Context.StorageManager.Query.Reader.unfilter_attr_tiles.sum": 0.230854,
      "Context.StorageManager.Query.Reader.unfilter_attr_tiles.avg": 6.12345e-05,
      "Context.StorageManager.Query.Reader.read_tiles.sum": 0.239003,
      "Context.StorageManager.Query.Reader.read_tiles.avg": 6.33961e-05,
      "Context.StorageManager.Query.Reader.read_attribute_tiles.sum": 0.241062,
      "Context.StorageManager.Query.Reader.read_attribute_tiles.avg": 6.39422e-05,
      "Context.StorageManager.Query.Reader.load_tile_var_sizes.sum": 0.00663585,
      "Context.StorageManager.Query.Reader.load_tile_var_sizes.avg": 1.76017e-06,
      "Context.StorageManager.Query.Reader.load_tile_offsets.sum": 0.00780588,
      "Context.StorageManager.Query.Reader.load_tile_offsets.avg": 2.07053e-06,
      "Context.StorageManager.Query.Reader.init_state.sum": 0.202428,
      "Context.StorageManager.Query.Reader.init_state.avg": 5.36944e-05,
      "Context.StorageManager.Query.Reader.dowork.sum": 1.62782,
      "Context.StorageManager.Query.Reader.dowork.avg": 0.000431781,
      "Context.StorageManager.Query.Reader.copy_fixed_tiles.sum": 0.340567,
      "Context.StorageManager.Query.Reader.copy_fixed_tiles.avg": 9.03361e-05,
      "Context.StorageManager.Query.Reader.copy_attribute.sum": 0.343155,
      "Context.StorageManager.Query.Reader.copy_attribute.avg": 9.10226e-05,
      "Context.StorageManager.Query.Reader.apply_query_condition.sum": 7.6298e-05,
      "Context.StorageManager.Query.Reader.apply_query_condition.avg": 2.02382e-08,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.read_next_partition.sum": 0.647044,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.read_next_partition.avg": 0.00017163
    },
    "counters": {
      "Context.StorageManager.subSubarray.precompute_tile_overlap.tile_overlap_byte_size": 241280,
      "Context.StorageManager.subSubarray.precompute_tile_overlap.relevant_fragment_num": 3770,
      "Context.StorageManager.subSubarray.precompute_tile_overlap.ranges_requested": 3770,
      "Context.StorageManager.subSubarray.precompute_tile_overlap.ranges_computed": 3770,
      "Context.StorageManager.subSubarray.precompute_tile_overlap.fragment_num": 3770,
      "Context.StorageManager.subSubarray.add_range_dim_1": 3770,
      "Context.StorageManager.subSubarray.add_range_dim_0": 3770,
      "Context.StorageManager.read_unfiltered_byte_num": 5130,
      "Context.StorageManager.read_tile_offsets_size": 304,
      "Context.StorageManager.read_rtree_size": 152,
      "Context.StorageManager.read_frag_meta_size": 9386,
      "Context.StorageManager.read_array_schema_size": 4674,
      "Context.StorageManager.VFS.read_ops_num": 3979,
      "Context.StorageManager.VFS.read_byte_num": 988511005,
      "Context.StorageManager.VFS.file_size_num": 19,
      "Context.StorageManager.Query.Reader.tiles_unfiltered": 15080,
      "Context.StorageManager.Query.Reader.tiles_allocated": 3770,
      "Context.StorageManager.Query.Reader.num_tiles_read": 3770,
      "Context.StorageManager.Query.Reader.num_tiles": 3770,
      "Context.StorageManager.Query.Reader.loop_num": 3770,
      "Context.StorageManager.Query.Reader.internal_loop_num": 3770,
      "Context.StorageManager.Query.Reader.attr_num": 3770,
      "Context.StorageManager.Query.Reader.attr_fixed_num": 3770,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.compute_current_start_end.ranges": 3770,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.compute_current_start_end.found": 3770,
      "Context.StorageManager.Query.Reader.SubarrayPartitioner.compute_current_start_end.adjusted_ranges": 3770
    }
  }
]

Hello @weathernutt, the tile cache is only used by one of our older read algorithms, which is not used in your case. Both this setting and the old read algorithm will be deprecated soon. May I ask why you are trying to use a cache? Are you trying to address a performance issue with this particular query?

I’m not yet familiar with our Go API, so please bear with me as I figure out what’s going on here. Are you setting up the buffer to receive the data yourself? If so, what size are you specifying? Also, what is the size of the data you are getting back?

Best,
Luc

Hi @KiterLuc, thanks much for the info!

Here is some additional context:

  • We are testing storing dense array data for raster data at various levels of detail (LOD)
  • Each LOD currently has a different, fixed number of 256x256 tiles (LOD 0 = 1 tile, LOD 1 = 4 tiles, LOD 2 = 16 tiles, LOD x = 2^(2x) tiles)
  • We are storing 1 LOD per array (since the x/y dims of the array change per LOD) - the total number of cells is number of tiles * 256 * 256
  • We are using just 1 float32 attribute for now
  • The preliminary benchmark is a Go test benchmark run for 15-30s, reading as many LOD 0 tiles as possible (running for up to 300s had an insignificant impact on the results).
  • We are gzipping the resulting tile data - initial pprof results suggest that may double the overall time it takes per tile
  • Preliminary test results were below expectations (but my expectations are ungrounded - maybe these numbers are good from TileDB’s perspective?):
Array Location                                         Single Tile Read Rate (tiles/second)
Local array from MacBook Pro (hold open array handle)  208
Local array from MacBook Pro (open array each read)    150
S3 from AWS container (hold open array handle)         28.5
S3 from AWS container (open array each read)           5
S3 from MacBook Pro (hold open array handle)           8
S3 from MacBook Pro (open array each read)             0.6
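The LOD sizing in the list above can be sketched with a little Go arithmetic; tilesForLOD and cellsForLOD are illustrative names of my own, not part of any TileDB API:

```go
package main

import "fmt"

// tilesForLOD returns the fixed number of 256x256 tiles at a given
// level of detail, per the scheme above: LOD x has 2^(2x) tiles.
func tilesForLOD(lod uint) uint64 {
	return 1 << (2 * lod)
}

// cellsForLOD returns the total number of cells in that LOD's array.
func cellsForLOD(lod uint) uint64 {
	return tilesForLOD(lod) * 256 * 256
}

func main() {
	for _, lod := range []uint{0, 1, 2, 3} {
		fmt.Printf("LOD %d: %d tiles, %d cells\n",
			lod, tilesForLOD(lod), cellsForLOD(lod))
	}
	// One float32 tile is 256*256*4 bytes = 262144 B (256 KiB),
	// consistent with "a couple hundred KB" per tile below.
	fmt.Println("bytes per tile:", 256*256*4)
}
```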

As the results suggest, there are clearly more performant ways to use TileDB (e.g. holding the array handle open, as well as all of the suggestions in the Performance Factors — TileDB 1.6.3 documentation).

If caching were in play, I would expect the preliminary results to be massively higher, since the benchmark is pulling the same tile over and over again - this, together with the TileDB documentation regarding tile caching, is what confused me.

I haven’t had time to review and apply the performance factors yet, so I expect there are potentially significant improvements to be made.

Also, since this use case is historical data, we generally expect a write-once, read-a-lot model. We will make significant use of caching between our service and customers, but wanted to confirm how caching was playing into the initial performance results.

Since we are always pulling a single tile (for now), the buffer is just `data := make([]float32, array.TileWidth*array.TileHeight)`

The data is on the order of a couple hundred KB per tile.

I am new to TileDB, so any advice is very welcome - thank you much for your time!

Thanks for all the details @weathernutt! What triggered some of my questions in my previous reply was the following line:

      "Context.StorageManager.Query.Reader.loop_num": 3770,

If I understand correctly, in the test you ran, you did multiple reads (3770 to be exact) of the same tile to measure performance?

How did you build your LOD 0 array? IIUC you made only one write to this array, with one 256x256 tile?

What do you expect your final product to do to read the data - also read only one tile at a time?

Also, just an update to one of my previous replies: the documentation you linked to is out of date. The latest documentation doesn’t mention the tile cache anymore, as it has been removed. What version of TileDB are you using?

Looking at your initial statistics, I think there are potentially a few places where we can optimize but understanding exactly what you are going for might enable us to find a solution that doesn’t require code changes initially.

Hi @KiterLuc,

Let me start with this question - I don’t know! I appreciate your help, but we can probably pause the optimization discussion until we get a little farther in prioritizing use cases. We may read single tiles, or single tiles across a large range of attribute dimensions, or entire LODs, or ??? We’re just starting out. I feel like you’ve answered my initial question, so I’ll plan to open a new topic when we get farther. For completeness’ sake, I’ve answered your other questions below. Thank you!

For the test results I shared, that is correct. I ran another k6 test, with roughly similar results (not exactly apples to apples), that queried tiles across a range of LODs, but always read a single 256x256 tile that should have lined up with cell boundaries. It’s possible, since I’m new to this, that my tests aren’t doing what I think they are, but I think I’m close.

That’s right. Each LOD is a separate array and each is created with a single write. Model remains write once read a lot (historical database).

TileDB-Go v0.30.0
TileDB Core 2.24.0

(fwiw) I manually updated the out-of-date Homebrew formula to get Core installed - I think it’s been working well. But it’s a real bummer that the Homebrew formula is not maintained anymore :frowning:

Good to hear - and I’m not surprised. I’m hoping to wrap up an initial POC this/next week, then focus on prioritizing use cases to help drive the optimization discussion. Let me know if you have any more questions, glad to keep discussing as desired, but know I’ll probably be coming back to talk more about optimization later.

Thank you for all the information! I can easily build a reproduction of your initial benchmark locally and start a quick investigation of what can be optimized, so that we can schedule the work soon. I’ll let you know if I see anything that doesn’t match your numbers.

Please let me know when you have a good definition of your use cases and I’ll be happy to help with some optimizations!

Best,
Luc


@weathernutt FYI I spent some time working on this today and opened the following PR:

You can see in one of the PR comments that a lot of the statistics improved drastically!

Hello, :smiling_face_with_three_hearts:

To my knowledge, to confirm whether the tile cache is being used, you might want to enable and review the TileDB stats. You can set sm.stats to true in the TileDB configuration to get detailed statistics, including cache hits and misses. This should help you verify whether the sm.tile_cache_size setting is having an effect on your read performance.

I hope this helps.

Regards