I am just going off the original TileDB C++ API example, where the code prepares the array for reading before the query is submitted:
Array array(ctx, array_name, TILEDB_WRITE);
Now, I have a fairly large dense array with two float attributes (5000 x 5M), and the slice I am reading is 5k x 100k. I am reading one attribute at a time.
But the line above, which is not even reading from the array, takes a considerable amount of time (15-20 seconds) every time I call the reader function on the array. Is this expected, or am I doing something wrong?
Just to clarify, you meant to prepare the array for reading (with TILEDB_READ, and not TILEDB_WRITE), right?
You are observing this because that statement “opens” the array, loading all the fragment metadata. If your array is large, then the fragment metadata are expected to be fairly large as well, so it takes time to load and decompress them. Please note that this is a one-off cost: you should create the array object once, and then create multiple query objects that all use that single array instance, so that the fragment metadata are not reloaded.
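To illustrate the "open once, query many times" pattern, here is a minimal self-contained sketch. It deliberately does not use the TileDB headers: `FakeArray`, `FakeQuery`, and `read_slices` are hypothetical stand-ins invented for this example, playing the roles that `tiledb::Array` and `tiledb::Query` play in real code. The only point is that the expensive constructor runs once, no matter how many queries follow.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an array handle; NOT the TileDB API.
// Constructing it models the expensive "open" step (loading fragment metadata).
struct FakeArray {
  explicit FakeArray(std::string array_name) : name(std::move(array_name)) {
    ++open_count;
  }
  std::string name;
  static int open_count;  // how many times the expensive open has run
};
int FakeArray::open_count = 0;

// Hypothetical stand-in for a query; it borrows an already-open array.
struct FakeQuery {
  FakeQuery(const FakeArray& a, std::size_t begin, std::size_t end)
      : array(a), col_begin(begin), col_end(end) {}
  // Pretend to read: return the number of columns in the requested slice.
  std::size_t submit() const { return col_end - col_begin; }
  const FakeArray& array;
  std::size_t col_begin;
  std::size_t col_end;
};

// Open the array once, then issue one query per column slice against it.
std::size_t read_slices(
    const std::string& array_name,
    const std::vector<std::pair<std::size_t, std::size_t>>& slices) {
  FakeArray array(array_name);  // expensive step happens exactly once
  std::size_t total_columns = 0;
  for (const auto& slice : slices) {
    FakeQuery query(array, slice.first, slice.second);  // cheap: reuses handle
    total_columns += query.submit();
  }
  return total_columns;
}
```

In your reader function, the equivalent change would be to take the already-open array as a parameter instead of constructing it inside the function on every call.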
Also, please note that we have just added an optimization for loading the fragment metadata. This is already in the dev branch and scheduled for the 1.6 release over the next couple of days. Specifically, we now load the fragment metadata lazily during the reads (i.e., during the query submission).
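As a rough picture of what "lazy" means here, the sketch below defers the load until the first read and then caches it. This is only an illustration of the general technique, not TileDB's actual implementation: `LazyArray` and all of its members are hypothetical names made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in; NOT TileDB's implementation.
class LazyArray {
 public:
  // "Opening" is cheap: no metadata is loaded in the constructor.
  LazyArray() = default;

  // The first read triggers the metadata load; later reads reuse it.
  std::size_t read() {
    ensure_metadata_loaded();
    return metadata_.size();
  }

  bool loaded() const { return loaded_; }
  int load_count() const { return load_count_; }

 private:
  void ensure_metadata_loaded() {
    if (loaded_) return;     // already cached, nothing to do
    metadata_.assign(3, 0);  // placeholder for per-fragment metadata
    loaded_ = true;
    ++load_count_;
  }

  std::vector<int> metadata_;
  bool loaded_ = false;
  int load_count_ = 0;
};
```

With this kind of scheme the cost moves from opening the array to the first query, so reusing one array object across queries still matters.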
Finally, we are planning to move to a full “out-of-core” fragment metadata implementation soon, where only a small fraction of relevant metadata is loaded upon a query, instead of all metadata as we do currently. So please expect this to be substantially improved over the next couple of months.
Yes, it is being read in “TILEDB_READ” mode. The rest of your explanation makes sense. Thank you for your help on this. I am looking forward to the newer version.