How to store a vector of strings

I have some metadata:
vector<params> p;

and params is a struct

struct params
{
string channel;
string marker;
float min, max;
};

Since structs are not supported yet, I have to split it into 4 vectors and store them as separate attributes, i.e.
vector<string> channels;
vector<string> marker;
vector<float> min;
vector<float> max;

The docs show a var-length attribute example that uses a single string as the input buffer and defines an offset array when setting the buffer.
Is that the recommended way to pass a vector of strings as the input buffer?
Also, the marker vector may have some empty string elements, which I don’t think will work with a var-length attribute, since it requires the offset array to be in strictly ascending order.

Currently I am storing the channel-to-marker mapping as TileDB metadata, which duplicates the channel info, but I don’t know of any alternatives, since I do need to preserve the order of the original channels by storing them as an array attribute.

@mikejiang, apologies for the delay in responding. To make sure I understand the question, let’s use an example.

Let’s assume you have 2 structs:

{
channel = "abc"
marker = "zyx"
min = 0.5
max = 10.5
},
{
channel = "def"
marker = "wvu"
min = 100.0
max = 150.0
}

With your 4 vectors you will end up with:

channels = ["abc", "def"]
marker = ["zyx", "wvu"]
min = [0.5, 100.0]
max = [10.5, 150.0]
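As a rough sketch of that split, assuming the params struct from the question: the param_columns type and the split_params function below are hypothetical names used only for illustration and are not part of TileDB.

```cpp
#include <string>
#include <vector>

struct params {
  std::string channel;
  std::string marker;
  float min, max;
};

// Hypothetical container holding one vector per struct field.
struct param_columns {
  std::vector<std::string> channels;
  std::vector<std::string> marker;
  std::vector<float> min;
  std::vector<float> max;
};

// Splits a vector of structs into four parallel vectors.
// Element i of every output vector comes from p[i], so the
// original channel order is preserved.
param_columns split_params(const std::vector<params>& p) {
  param_columns cols;
  for (const params& e : p) {
    cols.channels.push_back(e.channel);
    cols.marker.push_back(e.marker);
    cols.min.push_back(e.min);
    cols.max.push_back(e.max);
  }
  return cols;
}
```

Each of the four vectors could then be written as its own attribute, with index i in every vector referring to the same original struct.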

If my above assumption is correct, the solution you are looking for is to collapse the vector of strings into a single string (or a vector of chars) + offsets.

channels = ["abcdef"]
channel_offsets = [0, 3]
marker = ["zyxwvu"]
marker_offsets = [0, 3]
min = [0.5, 100.0]
max = [10.5, 150.0]

This will result in TileDB storing “abc” in one cell and “def” in a second cell of the channels attribute, which lines up with the two original structs.
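The collapse into a single char buffer plus offsets can be sketched as follows. flatten_strings is a hypothetical helper, not a TileDB function, and the offsets here are byte positions into the flattened buffer, matching the example above.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of TileDB): concatenates all strings into
// one contiguous char buffer and records the start position of each one.
std::pair<std::vector<char>, std::vector<uint64_t>> flatten_strings(
    const std::vector<std::string>& values) {
  std::vector<char> data;
  std::vector<uint64_t> offsets;
  offsets.reserve(values.size());
  for (const std::string& v : values) {
    // The next string starts where the buffer currently ends.
    offsets.push_back(data.size());
    data.insert(data.end(), v.begin(), v.end());
  }
  return {std::move(data), std::move(offsets)};
}
```

For the input {"abc", "def"} this yields the buffer "abcdef" and the offsets [0, 3]. An empty string adds no bytes, so it simply repeats the offset of the string that follows it.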

The offsets in TileDB for variable-length attributes represent the start position of the string you want to set for each cell. The start positions must be in ascending order, because this assumes you are writing a single vector of chars, not a vector of C++ string types.

But in my use case, marker can often be an empty string (i.e. “”), which will cause trouble for the offsets, right?

@mikejiang we’ve just merged a change to dev to support empty variable-length values. You can now include one or more empty cells in your write by specifying the same offset position for consecutive cells. The check for strictly ascending offsets has been relaxed. This change will be included in a 2.0.2 release, which we will aim for in the next week or so.

Note: currently a single-cell write cannot consist of an empty value; there must be at least 1 cell with valid data in the write. Support for single-cell empty writes will come in a future update.

Example usage:
Extending the above examples, if we include a 3rd cell the buffers would look like:

channels = ["abcdef"]
channel_offsets = [0, 3, 6]
marker = ["zyxwvu"]
marker_offsets = [0, 0, 3]
min = [0.5, 100.0, 20.0]
max = [10.5, 150.0, 22.5]

Here the last value of channel_offsets (cell #3) is empty: its offset position equals the size of the channels buffer, so TileDB now computes the length of the last cell as 0. For the second attribute, the first value of marker_offsets is left empty: since offset positions 0 and 1 hold the same value, TileDB now computes the length of the 0th value as 0.
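To illustrate how cell lengths follow from the offsets, here is a minimal sketch that rebuilds the strings from a flattened buffer. unflatten_strings is a hypothetical helper written for this example; it only mirrors the length rule described above and is not TileDB's implementation. It assumes the offsets are non-decreasing and do not exceed the buffer size.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of TileDB): rebuilds one string per cell.
// Assumes offsets are non-decreasing and each offset <= data.size().
std::vector<std::string> unflatten_strings(
    const std::vector<char>& data, const std::vector<uint64_t>& offsets) {
  std::vector<std::string> out;
  out.reserve(offsets.size());
  for (std::size_t i = 0; i < offsets.size(); ++i) {
    const uint64_t begin = offsets[i];
    // A cell ends where the next one starts; the last cell ends at the
    // end of the buffer. Equal begin and end means an empty cell.
    const uint64_t end =
        (i + 1 < offsets.size()) ? offsets[i + 1] : data.size();
    out.emplace_back(data.begin() + static_cast<std::ptrdiff_t>(begin),
                     data.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return out;
}
```

With the buffers above, "abcdef" with offsets [0, 3, 6] gives "abc", "def" and an empty third cell, while "zyxwvu" with offsets [0, 0, 3] gives an empty first cell followed by "zyx" and "wvu".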