How to speed up the reading from tiledb

Vladimir_Shatsky · September 29, 2020, 1:26pm

Hello, everyone!
I am using tiledb to store vectors of fixed length (there are several bases, actually, and the smallest vector length there is around 20.000 and the largest is about 200.000). My db is a 2-dimensional array. The data comes regularly and portioned (the number of vectors coming in each portion can vary from 1 to several hundred). When a certain portion comes, the indices for the vectors are generated automatically and orderly, and the process of uploading the data is actually rather fast. The point is that we would rather have fast reading properties of the database, even if it means that the uploading will have to slow down a little bit.
A typical query for the base may contain indices that come from very different chronological points, which can significantly slow down the reading. This is because the data cannot be read with one query. For example, if I want to get vectors with indices [1, 2, 3, 45, 46, 100], I have to make 3 separate queries: first, for [1, 2, 3], then for [45, 46], and finally, for [100], and then to unite them. I use python lib to make queries and the query looks like this:

with td.DenseArray(database_name, mode=‘r’) as arr:
datas = [[
arr.query(attrs=attrs_to_get, coords=True)[
sp[0]:sp[-1] + 1, s[0]:s[-1] + 1
] for s in slices]
for sp in space_slices]

Here, slices is a list of list of ordered groups of vector indices (for the given list of indices slices would look like this: [[1, 2, 3], [45, 46], [100]], and space_slices is the same for the other dimension.
As far as I am concerned, the data in tiledb is held in fragments, which are generated with each upload query. I use consolidation from time to time with rather extreme parameters:

{
“sm.consolidation.amplification”: 1000,
“sm.consolidation.step_size_ratio”: 0.000001,
“sm.consolidation.step_min_frags”: 2,
“sm.consolidation.buffer_size”: 1000000000
}

It used to help in the past, but now it seems like the consolidation hardly works anymore. And the more data comes, the slower read queries work.

Is there a way to fix this problem? The main purpose of our database is to store this data with quick access to it. Thanks in advance!

seth · October 1, 2020, 11:27am

Hello @Vladimir_Shatsky! I apologize for the delay in responding to you. In TileDB we do support “multi-range” queries like you are trying to do. TileDB internally has logic to performance these multi-range queries in a performance manner. In python this can be done with the multi_index function. For example:

A.query(attrs=attrs_to_get, coords=True).multi_index[ [ [1, 2, 3, 45, 46, 100 ] ]

The multi_index supports lists, slices and x:y syntax , more details can be found in our docs and in the python reference .

For the consolidation and performance issue, I have several questions so I can get a better understanding of your setup and we can nail down what the performance issue is.

What version TileDB / TileDB-Py are you using?
How many fragments do you have in the array before consolidation? What is your write pattern like?
To confirm, when you consolidate you are looking to merge into a single fragment and you are not interested in keeping older fragments for time traveling?
Have you run the vacuum stage of consolidation to delete old fragments? (note: vacuuming is not always needed, its used to purge out consolidated fragments to reduce disk space and remove history)
Where is the array stored? S3? Local filesystem?
Can you share the array schema?
Can you also share performance statistics of a read you consider slow.
Can you share an example query that you are trying to run that you find slow? Ideally, if possible, the one you run statistics with.

Vladimir_Shatsky · October 2, 2020, 4:11pm

Hello, @Seth!
Thank you for your answer, I will consider your advice! Here are the answers on the questions you asked.

My version of tiledb is 0.6.0
In the directory of an array there are 4276 directories called like this: __1600892167166_1600892167166_6932d4c4f53e4f0e87659124549e145c_5
I believe that these are fragments. And consolidation doesn’t make this number any smaller. The write pattern is the following:

for space_numbers in tiles_space:
for unit_numbers in tiles_units:
arr[space_numbers[0]:space_numbers[-1] + 1, unit_numbers[0]:unit_numbers[-1] + 1] = {‘est_counts’: expressions[‘est_counts’][unit_numbers].loc[space_numbers].values,
‘TPM’: expressions[‘TPM’][unit_numbers].loc[space_numbers].values}

‘est_counts’ and ‘TPM’ are attributes that I store. In this way the writing pattern is similar to the read pattern: the data is cut to fragments that have consecutive indices and these fragments are being uploaded separately.
3. That’s right, I hope to find a single fragment after consolidation and I don’t want to keep the old fragments.
4. No, I don’t run the vacuum stage.
5. The data is stored in local filesystem (in a mounted directory from a separate machine, to be exact).
6. The array schema is implemented in the following function:

TILE_SIZE = db_dict[‘measures’][‘horizontal_tile_size’]

def create_tiledb(name, unit_name, max_units, column_tile_size):

if not td.object_type(name) == 'array':
    max_runs_samples = np.iinfo(np.uint64).max - TILE_SIZE
    dom = td.Domain(td.Dim(name=unit_name, domain=(1, max_units),
                           tile=column_tile_size, dtype=np.uint64),
                    td.Dim(name='Object', domain=(1, max_runs_samples),
                           tile=TILE_SIZE, dtype=np.uint64))
    schema = td.ArraySchema(domain=dom, cell_order='col-major',
                            tile_order='col-major', sparse=False,
                            attrs=[td.Attr(name='est_counts', dtype=np.float64),
                                   td.Attr(name='TPM', dtype=np.float64)])
    td.DenseArray.create(name, schema)

class ExpressionBase():

def __init__(self):
    """
    Creates database if it is not created yet.
    """
    db_names = db_dict['database_names']
    measures = db_dict['measures']
    self.space_name = lambda s : 'Gene' if s == 'genes' else 'ENST'

    for units, space, subset in itertools.product(('runs', 'samples'), ('transcripts', 'genes'),
            ('coding', 'all')):
        try:
            db_name = db_names[units][space][subset]
        except KeyError:
            pass
        else:
            create_tiledb(db_name, self.space_name(space), measures['max_data'][space],
                    measures['vertical_tile_sizes'][space])

The corresponding db_dict is here:
{
“database_names”: {
“runs”: {
“transcripts”: {
“all”: “{expression_storage_path}/runs_transcripts”
},
“genes”: {
“coding”: “{expression_storage_path}/runs_genes_coding”,
“all”: “{expression_storage_path}/runs_genes_all”
}
},
“samples”: {
“transcripts”: {
“coding”: “{expression_storage_path}/samples_transcripts_coding”
},
“genes”: {
“coding”: “{expression_storage_path}/samples_genes_coding”
}
}
},
“measures”: {
“max_data”: {
“genes”: 1000000,
“transcripts”: 10000000
},
“horizontal_tile_size”: 5,
“vertical_tile_sizes”: {
“genes”: 1500,
“transcripts”: 7000
}
},
“consolidation”: {
“sm.consolidation.amplification”: 1000,
“sm.consolidation.step_size_ratio”: 0.000001,
“sm.consolidation.step_min_frags”: 2,
“sm.consolidation.buffer_size”: 1000000000
}
}
7. Here are the stats for the query in 8:
==== READ ====

Number of read queries: 1
Number of attempts until results are found: 1
Number of attributes read: 2
- Number of fixed-sized attributes read: 2
Number of dimensions read: 2
- Number of fixed-sized dimensions read: 2
Number of logical tiles overlapping the query: 174
Number of physical tiles read: 696
- Number of physical fixed-sized tiles read: 696
Number of cells read: 6090000
Number of result cells: 4729056
Precentage of usefull cells read: 77.6528%
Number of bytes read: 97841399 bytes (0.0911219 GB)
Number of read operations: 1670
Number of bytes unfiltered: 97442996 bytes (0.0907509 GB)
Unfiltering inflation factor: 0.995928x
Time to compute estimated result size: 0.0312368 secs
- Time to compute tile overlap: 0.0304857 secs
  
  Time to compute relevant fragments: 0.00147552 secs
  Time to load relevant fragment R-trees: 0.0286785 secs
  Time to compute relevant fragment tile overlap: 0.000135883 secs
Time to open array: 19.5175 secs
- Time to load array schema: 0.0428444 secs
- Time to load consolidated fragment metadata: 1.458e-06 secs
- Time to load fragment metadata: 18.3245 secs
Total metadata read: 378766 bytes (0.000352753 GB)
- Array schema: 175 bytes (1.62981e-07 GB)
- Fragment metadata: 375770 bytes (0.000349963 GB)
- R-tree: 21 bytes (1.95578e-08 GB)
- Fixed-sized tile offsets: 2800 bytes (2.6077e-06 GB)
Time to initialize the read state: 0.000223201 secs
Read time: 3.14742 secs
- Time to compute next partition: 0.000801199 secs
- Time to compute tile coordinates: 9.7209e-05 secs
- Time to compute result coordinates: 6.3288e-05 secs
  
  Time to compute sparse result tiles: 6.1234e-05 secs
- Time to compute dense result cell slabs: 1.38742 secs
- Time to copy result attribute values: 0.801412 secs
  
  Time to read attribute tiles: 0.164169 secs
  Time to unfilter attribute tiles: 0.355877 secs
  Time to copy fixed-sized attribute values: 0.280759 secs
- Time to fill dense coordinates: 0.951867 secs
Total read query time (array open + init state + read): 3.14764 secs

The read query is the following:

with td.DenseArray(database_name, mode=‘r’) as arr:
datas = [[ arr.query(attrs=attrs_to_get, coords=True)[ sp[0]:sp[-1] + 1, s[0]:s[-1] + 1 ] for s in slices] for sp in space_slices]

slices: [[66775,
66776,
66777,
66778,
66779,
66780,
66781,
66782,
66783,
66784,
66785,
66786,
66787,
66788,
66789,
66790,
66791,
66792,
66793,
66794,
66795,
66796,
66797,
66798]]
space_slices: [[1, 2, 3, …, 197044]]
This is not the case where we read many different fragments consequences of indices, but the performance is rather slow even here. In most cases the time can be much bigger.

seth · October 3, 2020, 11:53pm

@Vladimir_Shatsky thank you for all the information. Based on the performance statistics it seems like opening the array is definitely a slow, with the time to load the fragment metadata the bulk of the time.

There are two ways to improve this, first is with consolidation we have a mode to consolidate the fragment metadata into a single file. When you have a large number of fragments like in your array, this can have a large impact. To consolidate the fragment metadata you can run consolidate with the fragment_meta mode set:

tiledb.consolidate(array_uri, config=tiledb.Config({'sm.consolidation.mode': 'fragment_meta'}))

The second thing, is if you do not care about time traveling (as you indicated) is to run the vacuum step. The vacuum step will physically delete any fragments which have been marked as consolidated (by the presence of the .vac file). This will reduce the fragment count as you expected when you ran consolidation. In TileDB 2.0 the consolidation method was split into two parts, the consolidation and the vacuum. It was split because the vacuum stage forces you to loose history, which is not always desirable.

When vacuuming you also have to take care with read queries to avoid trying to time travel during the time vacuuming is running. Since the vacuum deletes fragments, you have to make sure no users is accessing them during the deletion, or the user might receive error messages. There will never be corrupt or invalid results, just potentially errors for the fragments being delayed when a user is reading from them. Please note that this only effects if a user is attempting to time travel to read the past fragments. Normal opening and reading of the array ignores the fragments that have been consolidated so it will not run into any issues with concurrent reads and vacuuming.

To vacuum:

# Vacuum fragments
tiledb.vacuum(array_uri, config=tiledb.Config({'sm.vacuum.mode': 'fragments'}))

# Vacuum fragment metadata
tiledb.vacuum(array_uri, config=tiledb.Config({'sm.vacuum.mode': 'fragment_meta'}))

I recommend you first try to run the fragment metadata consolidation. You should see improvements right away from this, and it should be a quick operation. Afterwards you can then run the vacuum to remove all older fragment like you were expecting. It is not required to do the fragment_meta consolidation first, I just recommend you try it to see the effects since it is potentially a much faster operation than full consolidation so you might run it more often as you are writing data.

One last note, is if possible you should upgrade your TileDB python version to 0.6.6. We released several improvements since 0.6.0 including performance improvements over the initial TileDB 2.0 release.

Vladimir_Shatsky · October 5, 2020, 11:55am

@Seth thank you for such a detailed answer!

Vladimir_Shatsky · October 8, 2020, 9:02am

Hello, @Seth!
I have tried to run the consolidation and vacuuming and seems like it helped a little, but the number of fragments is still rather high. Is there a way to configure consolidation to be sure that the number of fragments would be close to 1? What parameters would you recommend?

Topic		Replies	Views
Tiledb performance with sparse point cloud data	8	947	March 23, 2023
Write confirmation (Question)	16	1994	March 9, 2021
Optimizing the reads for sparse arrays	9	765	June 27, 2023
How can we boost write/read performance in our scenario?	1	1036	June 10, 2021
Slow AWS Data Slicing	5	1039	June 12, 2020

How to speed up the reading from tiledb

Related topics