TileDB S3 documentation

I am looking for documentation on how to store a TileDB array in an AWS S3 bucket and read it back. Unfortunately, I could not find any. I stumbled upon [this page](https://docs.tiledb.com/main/) when I Googled for "TileDB S3", but it does not document or even mention how to store or access a TileDB array in an S3 bucket.

Can someone point me to an authoritative guide on how to store and retrieve a TileDB array from an AWS S3 bucket?

You can start here: https://docs.tiledb.com/developer/backends/s3

The page details how to set up AWS credentials (if you need to). After that, an S3 location is just another URI, so s3://my_bucket/my_array should be accessible to you just like a local path would be, provided your TileDB build has S3 support enabled.

For reference, here is example code accessing a bucket in us-west-1:

#include <tiledb/tiledb>
#include <cstdlib>   // std::getenv, exit
#include <iostream>
#include <string>
#include <vector>

using namespace tiledb;
// dimension type
using DIM_T = uint64_t;

int main(int argc, char** argv) {
    if (std::getenv("AWS_ACCESS_KEY_ID") == nullptr || std::getenv("AWS_SECRET_ACCESS_KEY") == nullptr) {
        std::cout << "missing AWS_* access environment variables!" << std::endl;
        exit(1);
    }

    tiledb::Config cfg;
    cfg["vfs.s3.aws_access_key_id"] = std::string(std::getenv("AWS_ACCESS_KEY_ID"));
    cfg["vfs.s3.aws_secret_access_key"] = std::string(std::getenv("AWS_SECRET_ACCESS_KEY"));
    cfg["vfs.s3.region"] = "us-west-1";

    tiledb::Context ctx(cfg);

    // replace with URI of bucket you have access to using credentials above
    std::string uri("s3://bucket-us-west-1/test-array-4x4");

    auto array = tiledb::Array(ctx, uri, TILEDB_READ);
    auto schema = array.schema();
    auto domain = schema.domain();
    std::cout << "ndim: " << domain.ndim() << std::endl;

    std::vector<double> data(16);

    Query query(ctx, array, TILEDB_READ);
    query.set_layout(TILEDB_ROW_MAJOR)
         .set_buffer("", data);

    std::vector<DIM_T> subarray({0,3,0,3});
    query.set_subarray(subarray);

    query.submit();

    if (query.query_status() != Query::Status::COMPLETE) {
        std::cout << "query returned but not 'complete'" << std::endl;
        exit(1);
    }

    std::cout << "data[0,0]: " << data[0] << " data[3,3]: " << data[15] << std::endl;
}

OK, I have been facing this error:

Exception: InvalidAccessKeyId
Error message: The AWS Access Key Id you provided does not exist in our records. with address : 52.219.24.144

My Access Key is : ASIA****

I copied the access key and secret from my AWS config file. I also see a session token in that file, and I am not sure whether this failure has anything to do with it.

Also, my AWS credentials are granted to me via https://github.com/Nike-Inc/gimme-aws-creds with MFA, and they are issued to a profile named 'global' rather than 'default'. I am not sure whether that makes any difference.

If you feel I am diverging, or that these details are out of scope for the original question, I can post a separate question. Please let me know.

Ok, I believe this kind of access key will require us to pass the session token along with the key/secret pair. I’ve opened an issue about this in our tracker.
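As a quick way to tell the two kinds of key apart: AWS documents the prefix ASIA for temporary access key IDs issued by STS and AKIA for long-term ones. The check below is only a sketch of that rule of thumb; the function name looks_like_temporary_key is hypothetical and is not part of TileDB or the AWS SDK.

```cpp
#include <string>

// Hypothetical helper (not a TileDB or AWS SDK function).
// Returns true when the access key ID starts with "ASIA", the prefix AWS
// documents for temporary (STS-issued) keys. Such keys are only valid when
// sent together with their session token.
bool looks_like_temporary_key(const std::string& access_key_id) {
    // rfind(prefix, 0) == 0 is a "starts with" test that works before C++20.
    return access_key_id.rfind("ASIA", 0) == 0;
}
```

If this returns true for your key, an InvalidAccessKeyId error is the expected result when the key/secret pair is sent without the token.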

In the meantime, if you do not set any credentials in the environment or via a config object, then TileDB (via the AWS C++ SDK, which we use internally) should load the credentials from the AWS CLI config files (if you have stored credentials there, for example with aws configure). In that case, I think the token should be correctly included and the request may work.

Also, if you have built TileDB from source and are willing to try a test, you could temporarily hard-code the session token here:

add the token after secret_access_key here:

Aws::Auth::AWSCredentials(access_key_id, secret_access_key, "session token")

(we will need to set up/learn Amazon STS in order to test this ourselves, so if you are able to confirm the change above works, that will be very helpful to get this feature added more quickly)

@TileDbUser we’ve added support for STS tokens in the following pull request:

If you have built TileDB from source, please do test the change above and let us know. This code will be included in the next release.

In the meantime, adding to the note above, if you do not set tokens in the TileDB config, then the AWS SDK will attempt to find credentials from:

  • config directory, as set by aws login
  • environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN if necessary (see this document)
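For anyone who prefers to pass everything explicitly once a release with the STS change is available, here is a minimal sketch of collecting the three values into TileDB config settings. It assumes the session token config key is named vfs.s3.aws_session_token (check the config reference for your installed version), and the helper names make_s3_settings and s3_settings_from_env are hypothetical. The helpers use only the standard library so they can be tried without TileDB installed.

```cpp
#include <cstdlib>
#include <map>
#include <stdexcept>
#include <string>

using S3Settings = std::map<std::string, std::string>;

// Hypothetical helper: builds TileDB config key/value pairs for S3 access.
// The session token is optional and is added only when present and non-empty.
S3Settings make_s3_settings(const char* key_id, const char* secret,
                            const char* session_token,
                            const std::string& region) {
    if (key_id == nullptr || secret == nullptr) {
        throw std::runtime_error("missing AWS access key ID or secret");
    }
    S3Settings settings;
    settings["vfs.s3.aws_access_key_id"] = key_id;
    settings["vfs.s3.aws_secret_access_key"] = secret;
    settings["vfs.s3.region"] = region;
    if (session_token != nullptr && *session_token != '\0') {
        // Assumed key name; requires a TileDB version with STS token support.
        settings["vfs.s3.aws_session_token"] = session_token;
    }
    return settings;
}

// Hypothetical helper: reads the standard AWS environment variables.
S3Settings s3_settings_from_env(const std::string& region) {
    return make_s3_settings(std::getenv("AWS_ACCESS_KEY_ID"),
                            std::getenv("AWS_SECRET_ACCESS_KEY"),
                            std::getenv("AWS_SESSION_TOKEN"), region);
}

// Applying the settings to a TileDB config would look like:
//   tiledb::Config cfg;
//   for (const auto& kv : s3_settings_from_env("us-west-1"))
//       cfg.set(kv.first, kv.second);
//   tiledb::Context ctx(cfg);
```

Keeping the settings in a plain map means the token is simply left out for long-term keys, so the same code path works for both kinds of credentials.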

@ihnorton my apologies for the late reply. While debugging this issue, I switched to a regular account that does not rely on an STS token, so that I could at least do my TileDB S3 proof of concept.

I haven't had a chance to build your PR locally and try it with my profile, which uses a session token. However, I can definitely give it a shot once the code is merged and I can set the session token directly through the TileDB API.

Thank you so much for your time on this issue. I really appreciate it.