How to speed up the reading from tiledb

Hello, everyone!
I am using tiledb to store vectors of fixed length (there are several bases, actually, and the smallest vector length there is around 20.000 and the largest is about 200.000). My db is a 2-dimensional array. The data comes regularly and portioned (the number of vectors coming in each portion can vary from 1 to several hundred). When a certain portion comes, the indices for the vectors are generated automatically and orderly, and the process of uploading the data is actually rather fast. The point is that we would rather have fast reading properties of the database, even if it means that the uploading will have to slow down a little bit.
A typical query for the base may contain indices that come from very different chronological points, which can significantly slow down the reading. This is because the data cannot be read with one query. For example, if I want to get vectors with indices [1, 2, 3, 45, 46, 100], I have to make 3 separate queries: first, for [1, 2, 3], then for [45, 46], and finally, for [100], and then to unite them. I use python lib to make queries and the query looks like this:

with td.DenseArray(database_name, mode=ā€˜rā€™) as arr:
datas = [[
arr.query(attrs=attrs_to_get, coords=True)[
sp[0]:sp[-1] + 1, s[0]:s[-1] + 1
] for s in slices]
for sp in space_slices]

Here, slices is a list of list of ordered groups of vector indices (for the given list of indices slices would look like this: [[1, 2, 3], [45, 46], [100]], and space_slices is the same for the other dimension.
As far as I am concerned, the data in tiledb is held in fragments, which are generated with each upload query. I use consolidation from time to time with rather extreme parameters:

{
ā€œsm.consolidation.amplificationā€: 1000,
ā€œsm.consolidation.step_size_ratioā€: 0.000001,
ā€œsm.consolidation.step_min_fragsā€: 2,
ā€œsm.consolidation.buffer_sizeā€: 1000000000
}

It used to help in the past, but now it seems like the consolidation hardly works anymore. And the more data comes, the slower read queries work.

Is there a way to fix this problem? The main purpose of our database is to store this data with quick access to it. Thanks in advance!

1 Like

Hello @Vladimir_Shatsky! I apologize for the delay in responding to you. In TileDB we do support ā€œmulti-rangeā€ queries like you are trying to do. TileDB internally has logic to performance these multi-range queries in a performance manner. In python this can be done with the multi_index function. For example:

A.query(attrs=attrs_to_get, coords=True).multi_index[ [ [1, 2, 3, 45, 46, 100 ] ]

The multi_index supports lists, slices and x:y syntax , more details can be found in our docs and in the python reference .

For the consolidation and performance issue, I have several questions so I can get a better understanding of your setup and we can nail down what the performance issue is.

  1. What version TileDB / TileDB-Py are you using?
  2. How many fragments do you have in the array before consolidation? What is your write pattern like?
  3. To confirm, when you consolidate you are looking to merge into a single fragment and you are not interested in keeping older fragments for time traveling?
  4. Have you run the vacuum stage of consolidation to delete old fragments? (note: vacuuming is not always needed, its used to purge out consolidated fragments to reduce disk space and remove history)
  5. Where is the array stored? S3? Local filesystem?
  6. Can you share the array schema?
  7. Can you also share performance statistics of a read you consider slow.
  8. Can you share an example query that you are trying to run that you find slow? Ideally, if possible, the one you run statistics with.
1 Like

Hello, @Seth!
Thank you for your answer, I will consider your advice! Here are the answers on the questions you asked.

  1. My version of tiledb is 0.6.0
  2. In the directory of an array there are 4276 directories called like this: __1600892167166_1600892167166_6932d4c4f53e4f0e87659124549e145c_5
    I believe that these are fragments. And consolidation doesnā€™t make this number any smaller. The write pattern is the following:

for space_numbers in tiles_space:
for unit_numbers in tiles_units:
arr[space_numbers[0]:space_numbers[-1] + 1, unit_numbers[0]:unit_numbers[-1] + 1] = {ā€˜est_countsā€™: expressions[ā€˜est_countsā€™][unit_numbers].loc[space_numbers].values,
ā€˜TPMā€™: expressions[ā€˜TPMā€™][unit_numbers].loc[space_numbers].values}

ā€˜est_countsā€™ and ā€˜TPMā€™ are attributes that I store. In this way the writing pattern is similar to the read pattern: the data is cut to fragments that have consecutive indices and these fragments are being uploaded separately.
3. Thatā€™s right, I hope to find a single fragment after consolidation and I donā€™t want to keep the old fragments.
4. No, I donā€™t run the vacuum stage.
5. The data is stored in local filesystem (in a mounted directory from a separate machine, to be exact).
6. The array schema is implemented in the following function:

TILE_SIZE = db_dict[ā€˜measuresā€™][ā€˜horizontal_tile_sizeā€™]

def create_tiledb(name, unit_name, max_units, column_tile_size):

if not td.object_type(name) == 'array':
    max_runs_samples = np.iinfo(np.uint64).max - TILE_SIZE
    dom = td.Domain(td.Dim(name=unit_name, domain=(1, max_units),
                           tile=column_tile_size, dtype=np.uint64),
                    td.Dim(name='Object', domain=(1, max_runs_samples),
                           tile=TILE_SIZE, dtype=np.uint64))
    schema = td.ArraySchema(domain=dom, cell_order='col-major',
                            tile_order='col-major', sparse=False,
                            attrs=[td.Attr(name='est_counts', dtype=np.float64),
                                   td.Attr(name='TPM', dtype=np.float64)])
    td.DenseArray.create(name, schema)

class ExpressionBase():

def __init__(self):
    """
    Creates database if it is not created yet.
    """
    db_names = db_dict['database_names']
    measures = db_dict['measures']
    self.space_name = lambda s : 'Gene' if s == 'genes' else 'ENST'

    for units, space, subset in itertools.product(('runs', 'samples'), ('transcripts', 'genes'),
            ('coding', 'all')):
        try:
            db_name = db_names[units][space][subset]
        except KeyError:
            pass
        else:
            create_tiledb(db_name, self.space_name(space), measures['max_data'][space],
                    measures['vertical_tile_sizes'][space])

The corresponding db_dict is here:
{
ā€œdatabase_namesā€: {
ā€œrunsā€: {
ā€œtranscriptsā€: {
ā€œallā€: ā€œ{expression_storage_path}/runs_transcriptsā€
},
ā€œgenesā€: {
ā€œcodingā€: ā€œ{expression_storage_path}/runs_genes_codingā€,
ā€œallā€: ā€œ{expression_storage_path}/runs_genes_allā€
}
},
ā€œsamplesā€: {
ā€œtranscriptsā€: {
ā€œcodingā€: ā€œ{expression_storage_path}/samples_transcripts_codingā€
},
ā€œgenesā€: {
ā€œcodingā€: ā€œ{expression_storage_path}/samples_genes_codingā€
}
}
},
ā€œmeasuresā€: {
ā€œmax_dataā€: {
ā€œgenesā€: 1000000,
ā€œtranscriptsā€: 10000000
},
ā€œhorizontal_tile_sizeā€: 5,
ā€œvertical_tile_sizesā€: {
ā€œgenesā€: 1500,
ā€œtranscriptsā€: 7000
}
},
ā€œconsolidationā€: {
ā€œsm.consolidation.amplificationā€: 1000,
ā€œsm.consolidation.step_size_ratioā€: 0.000001,
ā€œsm.consolidation.step_min_fragsā€: 2,
ā€œsm.consolidation.buffer_sizeā€: 1000000000
}
}
7. Here are the stats for the query in 8:
==== READ ====

  • Number of read queries: 1

  • Number of attempts until results are found: 1

  • Number of attributes read: 2

    • Number of fixed-sized attributes read: 2
  • Number of dimensions read: 2

    • Number of fixed-sized dimensions read: 2
  • Number of logical tiles overlapping the query: 174

  • Number of physical tiles read: 696

    • Number of physical fixed-sized tiles read: 696
  • Number of cells read: 6090000

  • Number of result cells: 4729056

  • Precentage of usefull cells read: 77.6528%

  • Number of bytes read: 97841399 bytes (0.0911219 GB)

  • Number of read operations: 1670

  • Number of bytes unfiltered: 97442996 bytes (0.0907509 GB)

  • Unfiltering inflation factor: 0.995928x

  • Time to compute estimated result size: 0.0312368 secs

    • Time to compute tile overlap: 0.0304857 secs

      Time to compute relevant fragments: 0.00147552 secs
      Time to load relevant fragment R-trees: 0.0286785 secs
      Time to compute relevant fragment tile overlap: 0.000135883 secs

  • Time to open array: 19.5175 secs

    • Time to load array schema: 0.0428444 secs
    • Time to load consolidated fragment metadata: 1.458e-06 secs
    • Time to load fragment metadata: 18.3245 secs
  • Total metadata read: 378766 bytes (0.000352753 GB)

    • Array schema: 175 bytes (1.62981e-07 GB)
    • Fragment metadata: 375770 bytes (0.000349963 GB)
    • R-tree: 21 bytes (1.95578e-08 GB)
    • Fixed-sized tile offsets: 2800 bytes (2.6077e-06 GB)
  • Time to initialize the read state: 0.000223201 secs

  • Read time: 3.14742 secs

    • Time to compute next partition: 0.000801199 secs
    • Time to compute tile coordinates: 9.7209e-05 secs
    • Time to compute result coordinates: 6.3288e-05 secs

      Time to compute sparse result tiles: 6.1234e-05 secs

    • Time to compute dense result cell slabs: 1.38742 secs
    • Time to copy result attribute values: 0.801412 secs

      Time to read attribute tiles: 0.164169 secs
      Time to unfilter attribute tiles: 0.355877 secs
      Time to copy fixed-sized attribute values: 0.280759 secs

    • Time to fill dense coordinates: 0.951867 secs
  • Total read query time (array open + init state + read): 3.14764 secs

  1. The read query is the following:

with td.DenseArray(database_name, mode=ā€˜rā€™) as arr:
datas = [[ arr.query(attrs=attrs_to_get, coords=True)[ sp[0]:sp[-1] + 1, s[0]:s[-1] + 1 ] for s in slices] for sp in space_slices]

slices: [[66775,
66776,
66777,
66778,
66779,
66780,
66781,
66782,
66783,
66784,
66785,
66786,
66787,
66788,
66789,
66790,
66791,
66792,
66793,
66794,
66795,
66796,
66797,
66798]]
space_slices: [[1, 2, 3, ā€¦, 197044]]
This is not the case where we read many different fragments consequences of indices, but the performance is rather slow even here. In most cases the time can be much bigger.

@Vladimir_Shatsky thank you for all the information. Based on the performance statistics it seems like opening the array is definitely a slow, with the time to load the fragment metadata the bulk of the time.

There are two ways to improve this, first is with consolidation we have a mode to consolidate the fragment metadata into a single file. When you have a large number of fragments like in your array, this can have a large impact. To consolidate the fragment metadata you can run consolidate with the fragment_meta mode set:

tiledb.consolidate(array_uri, config=tiledb.Config({'sm.consolidation.mode': 'fragment_meta'}))

The second thing, is if you do not care about time traveling (as you indicated) is to run the vacuum step. The vacuum step will physically delete any fragments which have been marked as consolidated (by the presence of the .vac file). This will reduce the fragment count as you expected when you ran consolidation. In TileDB 2.0 the consolidation method was split into two parts, the consolidation and the vacuum. It was split because the vacuum stage forces you to loose history, which is not always desirable.

When vacuuming you also have to take care with read queries to avoid trying to time travel during the time vacuuming is running. Since the vacuum deletes fragments, you have to make sure no users is accessing them during the deletion, or the user might receive error messages. There will never be corrupt or invalid results, just potentially errors for the fragments being delayed when a user is reading from them. Please note that this only effects if a user is attempting to time travel to read the past fragments. Normal opening and reading of the array ignores the fragments that have been consolidated so it will not run into any issues with concurrent reads and vacuuming.

To vacuum:

# Vacuum fragments
tiledb.vacuum(array_uri, config=tiledb.Config({'sm.vacuum.mode': 'fragments'}))

# Vacuum fragment metadata
tiledb.vacuum(array_uri, config=tiledb.Config({'sm.vacuum.mode': 'fragment_meta'}))

I recommend you first try to run the fragment metadata consolidation. You should see improvements right away from this, and it should be a quick operation. Afterwards you can then run the vacuum to remove all older fragment like you were expecting. It is not required to do the fragment_meta consolidation first, I just recommend you try it to see the effects since it is potentially a much faster operation than full consolidation so you might run it more often as you are writing data.

One last note, is if possible you should upgrade your TileDB python version to 0.6.6. We released several improvements since 0.6.0 including performance improvements over the initial TileDB 2.0 release.

@Seth thank you for such a detailed answer!

Hello, @Seth!
I have tried to run the consolidation and vacuuming and seems like it helped a little, but the number of fragments is still rather high. Is there a way to configure consolidation to be sure that the number of fragments would be close to 1? What parameters would you recommend?