Using a multi-dimensional sparse array in Python

Hello!

I have data consisting of 190 lines, each with up to 180 points with x, y, z coordinates, and I am trying to store it as a sparse array with TileDB.

dom = tiledb.Domain(tiledb.Dim(name="fibers", domain=(0, number_of_streamlines-1), tile=4, dtype=np.float32),
                    tiledb.Dim(name="points", domain=(0, max_line_length-1), tile=4, dtype=np.float32),
                    # tiledb.Dim(name="coords", domain=(0, 2), tile=4, dtype=np.float32),
                    )

schema = tiledb.ArraySchema(ctx, domain=dom, sparse=True,
                                attrs=[tiledb.Attr(ctx, name="x", dtype=np.float32), 
                                       tiledb.Attr(ctx, name="y", dtype=np.float32),
                                       tiledb.Attr(ctx, name="z", dtype=np.float32)])

TypeError: __init__() got multiple values for keyword argument 'name'

schema = tiledb.ArraySchema(ctx, domain=dom, sparse=True,
                                attrs=[tiledb.Attr(ctx, name="xyz", dtype=np.dtype([("", np.float32),
                                                                                    ("", np.float32),
                                                                                    ("", np.float32)]))])

TypeError: __init__() got multiple values for keyword argument 'name'

This is a noob question, so I really don't know what is going on.

Please help!

Thank you so much!

Hi Daniel,

Nice to see you on here!

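The issue here is that the ctx argument should be passed as a keyword (ctx=ctx), if at all, rather than positionally. A positional ctx is taken as the value of another parameter (judging by the error, name), which then collides with the explicit name= keyword. In Python you can simply omit ctx for routine use.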

The following works for me with TileDB-Py 0.4.3 from Conda:

schema = tiledb.ArraySchema(domain=dom, sparse=True,
                                attrs=[tiledb.Attr(name="x", dtype=np.float32),
                                       tiledb.Attr(name="y", dtype=np.float32),
                                       tiledb.Attr(name="z", dtype=np.float32)])
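
To see why a positional ctx triggers that particular message, here is a minimal pure-Python sketch. FakeAttr is a hypothetical stand-in whose first parameter is name; it is not TileDB's actual signature, only an assumption consistent with the error text.

```python
class FakeAttr:
    """Hypothetical stand-in for an attribute class; NOT TileDB's real signature."""

    def __init__(self, name="", dtype=float, ctx=None):
        self.name = name
        self.dtype = dtype
        self.ctx = ctx


ctx = object()  # placeholder for a context object

# A positional ctx is bound to the first parameter (name), and the explicit
# name="x" keyword then supplies a second value for the same parameter.
try:
    FakeAttr(ctx, name="x", dtype=float)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing ctx by keyword (or omitting it) avoids the collision.
attr_kw = FakeAttr(name="x", dtype=float, ctx=ctx)
attr_default = FakeAttr(name="x", dtype=float)
```

The same reasoning would apply to any constructor where ctx is not the first positional parameter.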

The current versions of the sparse tutorial and quickstart code look up to date, but please let me know which doc URLs you were working from, and I will make sure to update them.

Cheers,
Isaiah

Thank you so much! This part works now! :)

Now, I tried the following schemas:

schema = tiledb.ArraySchema(domain=dom, sparse=True,
                                attrs=[tiledb.Attr(name="x", dtype=np.float32),
                                       tiledb.Attr(name="y", dtype=np.float32),
                                       tiledb.Attr(name="z", dtype=np.float32)])

and

schema = tiledb.ArraySchema(domain=dom, sparse=True,
                                attrs=[tiledb.Attr(name="coordinates", dtype=np.dtype([("", np.float32),
                                                                                       ("", np.float32), 
                                                                                       ("", np.float32)]))])

In both cases, I get this error whenever I want to write the data:
ValueError: all the input array dimensions except for the concatenation axis must match exactly

with

with tiledb.SparseArray(array_name, mode='w') as A:
    A[I, J] = {"x": current_points[:,0], "y": current_points[:,1], "z": current_points[:,2]}

and

with tiledb.SparseArray(array_name, mode='w') as A:
    print(A)
    A[I, J] = current_points

Here is the code: https://github.com/haehn/ABCDTRKV/blob/master/IPY/TileDB_SparseArrayTest2.ipynb (see bottom two sections)

I tried many things but had no luck. What is wrong, and how can I debug the dimensions of A[I, J] versus current_points? I would expect them to match!

Best,
Daniel

Hi Daniel,

Right now the API doesn't automatically broadcast along indices like that, so each point ID also needs to have the corresponding fiber ID set (even though in this case it is the same for every point).

The quick fix is in the following snippet (cells copied from your notebook in order to recreate the data, and slightly modified):

The important change is this line:

I, J = np.repeat(line_index-1, line_index), list(range(current_points.shape[0]))
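
As a rough sketch of what that line has to achieve: every value written needs its own (fiber, point) coordinate pair, so I and J must each have one entry per point. The example below uses only NumPy; fiber_id and the contents of current_points are made-up placeholders. np.column_stack appears only to show that NumPy raises a similar shape-mismatch error for coordinate arrays of unequal length; it is not necessarily what TileDB-Py does internally.

```python
import numpy as np

# Hypothetical data: one fiber with 5 points; columns are x, y, z.
fiber_id = 3
current_points = np.arange(15, dtype=np.float32).reshape(5, 3)
n_points = current_points.shape[0]

# One fiber index per point (same result as np.repeat(fiber_id, n_points)).
I = np.full(n_points, fiber_id)
# Point indices 0..n_points-1 along the fiber.
J = np.arange(n_points)

# Quick shape check before writing: coordinates and values must line up.
assert I.shape == J.shape == current_points[:, 0].shape

# For contrast, a single fiber index is not broadcast against J when stacking:
try:
    np.column_stack((np.array([fiber_id]), J))
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

# Values arranged as in the multi-attribute write from the thread.
values = {
    "x": current_points[:, 0],
    "y": current_points[:, 1],
    "z": current_points[:, 2],
}
```

With I, J and values prepared this way, the write from earlier in the thread becomes A[I, J] = values.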

Got it! You make it look easy. Thank you!!