Azure Blob storage backend

Hi,

As mentioned in other threads, documentation is somewhat lacking when it comes to using Azure and GCS as storage backends.

While waiting for such documentation to be created, I’d like to share the problem I’m facing, in the hope that the fix is swift.

Problem: Reading from / writing to an array on Azure Blob storage is not working.

Using the following simple script (account secret and sas-token redacted) in an effort to create a simple array:

import tiledb 
import numpy as np

def create_array(x_dim, y_dim, array_name, tiling = 1):

   print("Creating array")
   time_dim = tiledb.Dim(name="time", domain=(0, x_dim), tile=x_dim, dtype=np.int32)
   attribute_dim = tiledb.Dim(name="attribute", domain=(0, y_dim), tile=tiling, dtype=np.int32)
   dom = tiledb.Domain(time_dim, attribute_dim)
   attr = tiledb.Attr(name="value", dtype=np.int32)
   schema = tiledb.ArraySchema(domain=dom, sparse=False, attrs=[attr], tile_order='col-major')
   tiledb.Array.create(array_name, schema)

# Set up config
config = tiledb.Config()
config["vfs.azure.use_https"] = "true"
config["vfs.azure.storage_account_name"] = "tiledbtest"
config["vfs.azure.storage_account_key"] = "<my-account-secret>"
config["vfs.azure.storage_sas_token"] = "https://tiledbtest.blob.core.windows.net/25504f45-1975-4294-86d6-f90f7cde738f?sp=racwdli&st=2022-09-27T08:53:11Z&se=2022-09-28T16:53:11Z&spr=https&sv=2021-06-08&sr=c&sig=<sas-token-signature>"

# Define a TileDB context
ctx = tiledb.Ctx(config=config)

storage_acc = "tiledbtest"
container = "25504f45-1975-4294-86d6-f90f7cde738f"
azure_uri_manual = f"azure://{storage_acc}.blob.core.windows.net/{container}/"

# Create an array
x, y = 2, 1
create_array(x, y, azure_uri_manual)

The script seems to time out (during blob reading). This is the error message:

File "/workspace/tiledb/tiledb_azure.py", line 12, in create_array
    tiledb.Array.create(array_name, schema)
  File "tiledb/libtiledb.pyx", line 3552, in tiledb.libtiledb.Array.create
  File "tiledb/libtiledb.pyx", line 575, in tiledb.libtiledb._raise_ctx_err
  File "tiledb/libtiledb.pyx", line 560, in tiledb.libtiledb._raise_tiledb_error
tiledb.cc.TileDBError: [TileDB::Azure] Error: List blobs failed on: azure://tiledbtest.blob.core.windows.net/25504f45-1975-4294-86d6-f90f7cde738f/__schema/

I have verified that both the secret and the SAS token can be used to list and create blobs using Azure's Python API, independently of TileDB.

I have three questions:

  1. Do you have a working example of using Azure Blob Storage as backend with SAS tokens?
  2. Is there additional verbosity I can toggle to help with troubleshooting?
  3. Is there a flag to control if access to Azure is done using the secret or SAS-token? Or is it simply that it tries using SAS-token if present, otherwise account-key?

Things that might be clues:

  • TileDB does not seem to be using the credentials properly, or is attempting to access an endpoint that does not exist: supplying an invalid combination of storage account and credentials does not cause an immediate 4XX response, but simply times out in the same manner.

Let me know if you require additional information / debug traces.

Best Regards,
David

Hi @gruffaren,

I’ve added some documentation about using SAS tokens here.

  1. Do you have a working example of using Azure Blob Storage as backend with SAS tokens?

Here’s an example below that I’ve just used successfully – I think the big difference is that I’m only using the SAS token, whereas it looks like you might be using one of the connection strings.

(I’m using the SAS token value from the “create shared access credential” UI.)

import tiledb, numpy as np

azure_id = '<storage account name>'
sas_token = "?sv=2021-06-08..."
container = '<container name>'
array_uri = f"azure://{container}/sas_array1"

cfg = tiledb.Config()
cfg['vfs.azure.storage_account_name'] = azure_id
cfg['vfs.azure.storage_sas_token'] = sas_token

ctx = tiledb.Ctx(config=cfg)

array = np.random.rand(5,5)

print("local array is: ")
print(array)

tiledb.from_numpy(array_uri, array, ctx=ctx)
print("remote array path is: ", array_uri)
print("remote array content: ")
with tiledb.open(array_uri, ctx=ctx) as A:
    print(A[:])
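
To illustrate the difference pointed out above, here is a minimal sketch that reduces a full SAS URL (like the one set in `vfs.azure.storage_sas_token` in the original post) to just the token part, and builds a URI in the `azure://<container>/<array>` form used in the working example. Both `sas_token_from_url` and `azure_array_uri` are hypothetical helper names, not part of TileDB, and all values are placeholders. The leading `?` is kept only because the working example's token starts with one; whether it is required is not confirmed in this thread.

```python
from urllib.parse import urlsplit


def sas_token_from_url(sas_url: str) -> str:
    """Return only the query-string part of a SAS URL, with a leading '?'.

    Hypothetical helper for illustration; not a TileDB API.
    """
    query = urlsplit(sas_url).query
    if not query:
        raise ValueError("no query string found; is this really a SAS URL?")
    return "?" + query


def azure_array_uri(container: str, array_name: str) -> str:
    """Build a URI in the azure://<container>/<array> form (hypothetical helper)."""
    return f"azure://{container.strip('/')}/{array_name.strip('/')}"


# Placeholder values; the signature below is deliberately fake.
sas_url = (
    "https://myaccount.blob.core.windows.net/mycontainer"
    "?sp=racwdli&sv=2021-06-08&sr=c&sig=FAKE"
)
print(sas_token_from_url(sas_url))
print(azure_array_uri("mycontainer", "sas_array1"))
```

The resulting token string is what would go into `vfs.azure.storage_sas_token`, with the storage account name set separately in `vfs.azure.storage_account_name`.
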

  2. Is there additional verbosity I can toggle to help with troubleshooting?

Unfortunately not, at the moment. In the course of debugging this myself, I added some printouts of the status message from Azure – we’ll add that to the log messages in the next release.

  3. Is there a flag to control if access to Azure is done using the secret or SAS-token? Or is it simply that it tries using SAS-token if present, otherwise account-key?

Right now we unconditionally pass the SAS token in to the SDK if the config parameter is set. I’m not sure about the precedence of the account key if it is also set (I’ve used it with both token and key set in the past, but I haven’t traced through the SDK to see which one it chooses).
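
Since the precedence between the two credentials is not established here, one way to sidestep the question is to set exactly one of them. The sketch below only builds a plain dict of the `vfs.azure.*` parameters already used in this thread; `azure_vfs_params` is a hypothetical helper name, and the values are placeholders.

```python
from typing import Dict, Optional


def azure_vfs_params(
    account_name: str,
    *,
    sas_token: Optional[str] = None,
    account_key: Optional[str] = None,
) -> Dict[str, str]:
    """Return Azure VFS settings with exactly one credential set.

    Hypothetical helper for illustration; not a TileDB API.
    """
    # Reject both-set and neither-set so the credential in use is unambiguous.
    if (sas_token is None) == (account_key is None):
        raise ValueError("set exactly one of sas_token or account_key")
    params = {"vfs.azure.storage_account_name": account_name}
    if sas_token is not None:
        params["vfs.azure.storage_sas_token"] = sas_token
    else:
        params["vfs.azure.storage_account_key"] = account_key
    return params


params = azure_vfs_params("<storage account name>", sas_token="?sv=2021-06-08...")
print(params)

# With TileDB-Py installed, the dict can be used to build the context:
#   import tiledb
#   ctx = tiledb.Ctx(config=tiledb.Config(params))
```

Building the context this way makes it clear which credential a failing request was made with, which helps when comparing a SAS-only run against a key-only run.
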

Best,
Isaiah

I think this is quite important. It’s very hard for the user to tell whether they have not been authorised or whether they are not reaching the right endpoint, etc.