Groups and TileDB-R API

Hi!

I was looking to use grouped arrays within the TileDB R API, but I seem to have hit a wall. Here is a reproducible example:

library(tiledb)

# Data to store

data_1 <- list(a = c(1:3), b = c("A", "B", "C"))
data_2 <- list(a = c(4:6), b = c("D", "E", "F"))

# URIs

uri_1 <- paste0(getwd(), "/", "array_1")
if(tiledb_vfs_is_dir(uri_1)) {tiledb_vfs_remove_dir(uri_1)}
uri_2 <- paste0(getwd(), "/", "array_2")
if(tiledb_vfs_is_dir(uri_2)) {tiledb_vfs_remove_dir(uri_2)}

# Define the TileDB array

dim <- tiledb_dim(name = 'dimension',
                  domain = c(1L, 3L),
                  tile = 1L,
                  type = "INT32")
attr_a <- tiledb_attr(name = "a", type = "INT32", nullable = TRUE)
attr_b <- tiledb_attr(name = "b", type = "CHAR", nullable = TRUE, ncells = NA)
dom <- tiledb_domain(dims = c(dim))
schema <- tiledb_array_schema(domain = dom,
                              attrs = c(attr_a, attr_b),
                              sparse = FALSE)
tiledb_array_create(uri_1, schema)
tiledb_array_create(uri_2, schema)

# Write data

write_array <- tiledb_array(uri = uri_1)
write_array[] <- data.frame(data_1, row.names = c(1L:3L))   # [] assignment writes into the array
write_array <- tiledb_array(uri = uri_2)
write_array[] <- data.frame(data_2, row.names = c(1L:3L))

# Group arrays

uri_grp <- paste0(getwd(), "/", "group")
if(tiledb_vfs_is_dir(uri_grp)) {tiledb_vfs_remove_dir(uri_grp)}

tiledb_group_create(uri = uri_grp)
group <- tiledb_group(uri = uri_grp,
                      type = c("WRITE"))
tiledb_group_add_member(grp = group,
                        uri = uri_1,
                        relative = FALSE,
                        name = "array_1")
tiledb_group_add_member(grp = group,
                        uri = uri_2,
                        relative = FALSE,
                        name = "array_1")

Check if the group has been populated:

group
tiledb_group_member_count(tiledb_group(uri = uri_grp, type = "READ"))

The group I have created does not contain anything except a name, and it seems the tiledb_group_add_member command does not do anything.

> group
group GROUP
> tiledb_object_type(uri = uri_grp)
[1] "GROUP"
> tiledb_group_member_count(tiledb_group(uri = uri_grp, type = "READ"))
[1] 0

Has anyone else encountered the same issue?

J.

Hi Jérôme,

Thanks for getting in touch, and for posting a detailed reproducible example. After swapping out all the non-printable characters (which may have come in via the HTML rendering; see below for a hint on how to avoid that), I can reproduce almost all of it.

For example, the second tiledb_group_add_member call fails because you supply a new URI but reuse the name, which leads to an error:

> tiledb_group_add_member(grp = group, uri = uri_1, relative = FALSE, name = "array_1")
> tiledb_group_add_member(grp = group, uri = uri_2, relative = FALSE, name = "array_1")
Error: Group Details: Cannot add group member array_1, a member with the same name or URI has already been added.
> 
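
The error message says that member names and URIs must each be unique within a group. As a minimal sketch in base R, one could check a set of intended members before adding them. The helper `check_members` and the example paths are hypothetical and not part of the tiledb package:

```r
# Hypothetical helper: `members` is a named character vector mapping
# member names to array URIs. Stops if a name or a URI occurs twice.
check_members <- function(members) {
  dup_names <- unique(names(members)[duplicated(names(members))])
  dup_uris  <- unique(unname(members)[duplicated(members)])
  if (length(dup_names) > 0) {
    stop("duplicate member name(s): ", paste(dup_names, collapse = ", "))
  }
  if (length(dup_uris) > 0) {
    stop("duplicate member URI(s): ", paste(dup_uris, collapse = ", "))
  }
  invisible(TRUE)
}

# Distinct names and URIs pass silently (paths are placeholders)
check_members(c(array_1 = "/tmp/tiledb/array_1",
                array_2 = "/tmp/tiledb/array_2"))

# Reusing the name "array_1" for a second URI is caught before any add call
msg <- tryCatch(
  check_members(c(array_1 = "/tmp/tiledb/array_1",
                  array_1 = "/tmp/tiledb/array_2")),
  error = function(e) conditionMessage(e)
)
print(msg)
```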

Of course that is easy to fix. I can then replicate the finding that the count returns as zero. However, it is also worth noting that you (accidentally?) wrote the arrays next to, rather than inside, the group directory. From a listing of the temporary directory I used:

edd@rob:/tmp/tiledb$ ls -ld array_? group/
drwxr-xr-x 8 edd edd 4096 Mar 20 11:14 array_1
drwxr-xr-x 8 edd edd 4096 Mar 20 11:14 array_2
drwxr-xr-x 4 edd edd 4096 Mar 20 11:15 group/
edd@rob:/tmp/tiledb$ 
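
To make the "next to" versus "inside" distinction concrete, here is a small base R sketch that tests whether one path sits under another. It is a purely textual comparison (no symlink resolution, no TileDB calls), and `is_inside` is a hypothetical helper name:

```r
# Hypothetical helper: TRUE if `child` is textually located under `parent`.
# Only compares path strings; it does not touch the file system.
is_inside <- function(child, parent) {
  prefix <- paste0(sub("/+$", "", parent), "/")   # exactly one trailing slash
  startsWith(child, prefix)
}

# Layout from the original post: the array is a sibling of the group
is_inside("/tmp/tiledb/array_1", "/tmp/tiledb/group")              # FALSE

# Nested layout: the array lives inside the group directory
is_inside("/tmp/tiledb/demo_group/tic", "/tmp/tiledb/demo_group")  # TRUE
```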

Below is a quick demo of writing arrays and marking them as members of a group, along with count checks. This should get you onto the right track. If not, please do not hesitate to follow up here, via a GitHub issue, or by email (where I can be reached at my first name at tiledb dot com).

With best regards, Dirk

library(tiledb)
setwd("/tmp/tiledb")   # adjust locally

uri <- file.path(getwd(), "demo_group")       # prefer absolute path here, likely not required
if (tiledb_vfs_is_dir(uri)) tiledb_vfs_remove_dir(uri)

tiledb_group_create(uri)
grp <- tiledb_group(uri)
grp <- tiledb_group_close(grp)

## create some temp arrays to add as group members
uri1 <- file.path(uri, "tic")
uri2 <- file.path(uri, "tac")
uri3 <- file.path(uri, "toe")
df1 <- data.frame(val=seq(100, 200, by=10))
df2 <- data.frame(letters=letters)
df3 <- data.frame(nine=rep(9L, 9))
tiledb::fromDataFrame(df1, uri1)
tiledb::fromDataFrame(df2, uri2)
tiledb::fromDataFrame(df3, uri3)

## add member
grp <- tiledb_group_open(grp, "WRITE")
grp <- tiledb_group_add_member(grp, uri1, FALSE)    # use absolute URI
grp <- tiledb_group_close(grp)
grp <- tiledb_group_open(grp, "READ")
cat("Count should now be one:", tiledb_group_member_count(grp), "\n")
grp <- tiledb_group_close(grp)
grp <- tiledb_group_open(grp, "WRITE")
grp <- tiledb_group_add_member(grp, uri2, FALSE)    # use absolute URI
grp <- tiledb_group_add_member(grp, uri3, FALSE)    # use absolute URI
grp <- tiledb_group_close(grp)
grp <- tiledb_group_open(grp, "READ")
cat("Count should now be three:", tiledb_group_member_count(grp), "\n")

The reference output and resulting directories (relative to my chosen target directory) are:

$ Rscript post_20240320_reply.R
Count should now be one: 1 
Count should now be three: 3 
$ ls -ld /tmp/tiledb/demo_group/*
drwxr-xr-x 2 edd edd 4096 Mar 20 11:46 /tmp/tiledb/demo_group/__group
drwxr-xr-x 2 edd edd 4096 Mar 20 11:46 /tmp/tiledb/demo_group/__meta
drwxr-xr-x 8 edd edd 4096 Mar 20 11:46 /tmp/tiledb/demo_group/tac
drwxr-xr-x 8 edd edd 4096 Mar 20 11:46 /tmp/tiledb/demo_group/tic
-rw-r--r-- 1 edd edd    0 Mar 20 11:46 /tmp/tiledb/demo_group/__tiledb_group.tdb
drwxr-xr-x 8 edd edd 4096 Mar 20 11:46 /tmp/tiledb/demo_group/toe
$ 
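
The demo repeats the same cycle each time: open for writing, add members, close, then reopen for reading before asking for the count. As a sketch only, that cycle could be wrapped in a small function. It uses just the tiledb calls that appear in the demo. The name `add_and_count` is hypothetical, and the function assumes it is handed a closed group object:

```r
library(tiledb)

# Hypothetical helper (not part of the tiledb package): add each URI as a
# member, close the group, then reopen it for reading and return the count.
# Assumes `grp` is a closed tiledb_group object, as in the demo.
add_and_count <- function(grp, uris) {
  grp <- tiledb_group_open(grp, "WRITE")
  for (u in uris) {
    grp <- tiledb_group_add_member(grp, u, FALSE)   # FALSE: absolute URI
  }
  grp <- tiledb_group_close(grp)

  grp <- tiledb_group_open(grp, "READ")
  n <- tiledb_group_member_count(grp)
  tiledb_group_close(grp)                           # leave the group closed
  n
}
```

With the demo's objects, calling `add_and_count(grp, c(uri1, uri2, uri3))` on a freshly created and closed group would be expected to return the same count of three reported in the output.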

PS Markdown formatting works here. A line with three backticks followed by `r` starts a code block, and a line with three backticks closes it:

```r
# R code here
```


Hi Dirk,

Thanks a lot for your help. The problem was indeed the relative position of the group’s and the arrays’ URIs. It was not clear to me that they have to be nested, but it makes total sense now!
