Hi, I am building a backend API to serve my TileDB database, which was built by a custom script. I want the API to abstract the TileDB syntax away from consumers.
We want a highly performant (speed-wise) database. In addition, we are interested in managing user access to specific cells/attributes (e.g., users can’t access certain rows in our array if those rows fall in indices x, y, z). I’m currently using the TileDB SOMA format and have a single, large SOMA for all my datasets, so we need to keep certain users from accessing specific datasets.
I have a few questions regarding our requirements:
Is there some TileDB Cloud functionality that can help me in this regard? Or is my only option here to have a standalone beefy server serving the database?
If I do have a single server, how do I configure the size of the server/TileDB configuration parameters such that the server doesn’t time out for a certain number of requests/threads i.e. how should I select parameters like py.init_buffer_bytes, sm.mem_total_budget, etc. I’ve noticed that when running queries my server can crash.
With regards to security, I’m thinking that an attribute called role can be stored within a cell, or I can design some logic in the API code using some traditional RDBMS solutions. Was just curious about this
Realize this is a long-winded/open-ended question but any guidance here would be amazing.
Thanks for the post and questions. I definitely think there are multiple options to achieve your goals. Please see my comments below. In addition, I sent you an email to set up a call to reconnect and discuss more.
Is there some TileDB Cloud functionality that can help me in this regard? Or is my only option here to have a standalone beefy server serving the database?
TileDB Cloud is designed with this use case in mind. It handles queries in a serverless and elastic manner, scaling out automatically to absorb the query load. This means you can have lightweight clients that only have to deal with the results of your query. TileDB Cloud manages memory allocations and handles incomplete queries server-side, based on the available memory resources.
If I do have a single server, how do I configure the size of the server/TileDB configuration parameters such that the server doesn’t time out for a certain number of requests/threads i.e. how should I select parameters like py.init_buffer_bytes, sm.mem_total_budget, etc. I’ve noticed that when running queries my server can crash.
If you are running a single server, you are right that you need to manage the Python memory budgets and the TileDB Embedded memory budget parameters. You also have to consider a total memory budget for the server and restrict the number of concurrently running queries based on that. TileDB Cloud offers this built in, and also has the infrastructure scaling and retry capabilities baked in.
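As a rough illustration of the single-server approach, the sketch below splits the server's usable RAM across a fixed number of concurrent queries and gates incoming requests with a semaphore. The helper names (`plan_budgets`, `QueryGate`), the reserved-memory figure, and the one-tenth ratio for `py.init_buffer_bytes` are illustrative assumptions, not TileDB recommendations. It also assumes each running query may use up to `sm.mem_total_budget`, which is worth verifying against your TileDB version.

```python
import threading
from contextlib import contextmanager


def plan_budgets(server_ram_bytes, reserved_bytes, max_concurrent_queries,
                 init_buffer_divisor=10):
    """Derive per-query TileDB settings from a total server memory budget.

    Hypothetical helper: the split policy here is an assumption, not an
    official sizing rule.
    """
    if max_concurrent_queries < 1:
        raise ValueError("max_concurrent_queries must be >= 1")
    usable = server_ram_bytes - reserved_bytes  # leave room for OS and API process
    if usable <= 0:
        raise ValueError("reserved_bytes leaves no memory for queries")
    per_query = usable // max_concurrent_queries
    return {
        # Config values are given as strings.
        "sm.mem_total_budget": str(per_query),
        "py.init_buffer_bytes": str(max(1, per_query // init_buffer_divisor)),
    }


class QueryGate:
    """Allow at most `max_concurrent` queries to run at the same time."""

    def __init__(self, max_concurrent, timeout_s):
        self._sem = threading.BoundedSemaphore(max_concurrent)
        self._timeout_s = timeout_s

    @contextmanager
    def slot(self):
        # Reject the request instead of letting it pile up and exhaust memory.
        if not self._sem.acquire(timeout=self._timeout_s):
            raise TimeoutError("server busy: too many concurrent queries")
        try:
            yield
        finally:
            self._sem.release()


if __name__ == "__main__":
    GIB = 2**30
    settings = plan_budgets(64 * GIB, 8 * GIB, max_concurrent_queries=4)
    print(settings)
```

The resulting dictionary can be passed to `tiledb.Config(settings)` and then to `tiledb.Ctx(...)` when opening the array, and each API request handler would run its query inside `with gate.slot():`. Rejecting a request quickly when no slot is free is usually preferable to letting queries queue up until the process runs out of memory.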
With regards to security, I’m thinking that an attribute called role can be stored within a cell, or I can design some logic in the API code using some traditional RDBMS solutions. Was just curious about this
An attribute called ‘role’ would be one option, and you can apply a query condition on it. We are also working on a feature called “fine grained access control”, which we will enable via a new concept called “Array Views”. It will allow you to restrict a query to a specific subset of the data and limit its ranges and attributes. We expect to release this feature over the next few months. I believe it will greatly simplify the restrictions you propose on what data a user can access.
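To make the ‘role’ option concrete, here is a minimal sketch of how the API layer might build the query condition from the roles of the authenticated user. The helper name `role_condition`, the attribute name `role`, and the role values are hypothetical. The condition string uses the expression syntax accepted by TileDB query conditions, but check the exact form against the library version you run.

```python
import re

# Only allow simple identifiers as role names so that user-supplied values
# cannot alter the structure of the condition string.
_ROLE_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def role_condition(allowed_roles, attr="role"):
    """Build a condition string matching cells tagged with any allowed role.

    Hypothetical helper: `attr` assumes the array stores a string attribute
    named "role" on every cell.
    """
    roles = sorted(set(allowed_roles))
    if not roles:
        raise PermissionError("user has no roles; refusing to query")
    for role in roles:
        if not _ROLE_PATTERN.fullmatch(role):
            raise ValueError(f"invalid role name: {role!r}")
    return " or ".join(f"{attr} == '{role}'" for role in roles)


if __name__ == "__main__":
    cond = role_condition(["analyst", "admin"])
    print(cond)
    # With tiledb-py this string would be passed along the lines of
    #   array.query(cond=cond)[:]
    # and with TileDB-SOMA as the value_filter argument of a read,
    # depending on the installed versions.
```

Because the condition is assembled inside the API and never taken from the client, consumers cannot remove or widen it. This only holds if users have no direct access to the underlying array storage.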