Subarray of 2D sparse array with double dimensions

Hello!

I’m using version 2.4.11 of the TileDB C# library and I need a 2D sparse array with 50Kx50K dimensions of double type and a string attribute. With integer dimensions it is simple to read only part of an array by using a subarray, but I haven’t found a way to do the same with double dimensions.

Is there an option like this in the C# library?

Thanks in advance!

Yes, you can do that with add_range_from_str_vector in the C# library. Note that you need to convert each double value to a string when using add_range_from_str_vector. Please have a look at the example we created:

In the above example, the array has two double dimensions and one string attribute. Hopefully that array is similar to yours.
Please let us know if you have further questions about that.

Thanks for your reply!

Yes, this approach works, but in a strange way. My array has the following bounds:
MinX: -8.074450492858887
MaxX: 5.952016830444336
MinY: -5.3138957023620605
MaxY: 5.27669620513916

When I use add_range_from_str_vector with these same min and max values as the range, fewer points are returned than exist in my array. I thought the range might exclude the bounds themselves, so I tried -8.08 to 5.96 for the X dimension and -5.32 to 5.28 for the Y dimension, but the result was the same. All the points were returned only when I used -9 to 6 for the X dimension and -6 to 6 for the Y dimension, so it looks like this method is not parsing the decimal part of the provided ranges.

I have also tried add_range() instead, but it always throws an error:

Static type (CHAR) does not match expected type (FLOAT64)

Is it possible to read the data within the provided range by adjusting the decimal part of the coordinates rather than the integer part?

Hi,
I tested with an example for double ranges: TileDB-CSharp/Program.cs at bd/ch13569-add-double-range-example · TileDB-Inc/TileDB-CSharp · GitHub
It seems to me that the decimal parts are not ignored. When I use 5.28 as maxY, it returns all of the data, but when I use 5.27 as maxY, one data point is filtered out. Can you have a look at the example and let me know whether it is similar to your case? If you would like, you can also send me sample code with data so I can find out the reason. Thanks

Hi,

Thanks for your reply!

This example seems similar to my case, but mine still doesn’t work and I don’t know why.
Here are the methods I use:

Create:

public static void CreateDataArray(string arrayUri, double minX, double maxX, double minY, double maxY, double xTileSize, double yTileSize)
{
	// Create context
	using TileDB.Context ctx = new TileDB.Context();

	// Create array if doesn't exist
	using TileDB.VFS vfs = new TileDB.VFS(ctx);
	if (!vfs.is_dir(arrayUri))
	{
		// Create domain and add a dimension
		using TileDB.Domain dom = new TileDB.Domain(ctx);
		dom.add_double_dimension("x", minX, maxX, xTileSize);
		dom.add_double_dimension("y", minY, maxY, yTileSize);

		// Create array schema for a sparse array, add domain and set tile and cell order
		using TileDB.ArraySchema schema = new TileDB.ArraySchema(ctx, TileDB.ArrayType.TILEDB_SPARSE);
		schema.set_domain(dom);
		schema.set_order(TileDB.LayoutType.TILEDB_ROW_MAJOR, TileDB.LayoutType.TILEDB_ROW_MAJOR);
		schema.set_allows_dups(true);

		// Create and add data attribute and compression filter to schema
		using TileDB.Attribute attr1 = TileDB.Attribute.create_attribute(ctx, "data", TileDB.DataType.TILEDB_STRING_ASCII);

		using TileDB.Filter compression = new TileDB.Filter(ctx, TileDB.FilterType.TILEDB_FILTER_GZIP);
		using TileDB.FilterList filterList = new TileDB.FilterList(ctx);
		filterList.add_filter(compression);
		attr1.set_filter_list(filterList);
		schema.add_attribute(attr1);

		// Create the array
		TileDB.Array.create(arrayUri, schema);
	}
}

Write:

// Coordinate and data arrays contain the same amount of elements
// String elements may contain different number of characters but not more than 12 characters
public static void WriteToArray(string arrayUri, double[] xCoords, double[] yCoords, string[] data)
{
	// Create context
	using TileDB.Context ctx = new TileDB.Context();

	// Create char vector to split string array to char array and long vector to store offsets
	using TileDB.VectorDouble xVector = new TileDB.VectorDouble(xCoords);
	using TileDB.VectorDouble yVector = new TileDB.VectorDouble(yCoords);
	using TileDB.VectorChar dataVector = new TileDB.VectorChar();
	using TileDB.VectorUInt64 dataOffsets = new TileDB.VectorUInt64();

	// Fill vectors with data array to write
	uint offset = 0;
	foreach (string value in data)
	{
		dataVector.AddRange(new TileDB.VectorChar(value.ToCharArray()));
		dataOffsets.Add(offset);
		offset += (uint)value.Length;
	}

	// Open array for writing
	using TileDB.Array array = new TileDB.Array(ctx, arrayUri, TileDB.QueryType.TILEDB_WRITE);

	// Create the query
	using TileDB.Query query = new TileDB.Query(ctx, array, TileDB.QueryType.TILEDB_WRITE);
	query.set_layout(TileDB.LayoutType.TILEDB_UNORDERED);
	query.set_char_vector_buffer_with_offsets("data", dataVector, dataOffsets);
	query.set_double_vector_buffer("x", xVector);
	query.set_double_vector_buffer("y", yVector);

	// Submit query
	query.submit();

	// Close the array
	array.close();
}
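
The offset bookkeeping in WriteToArray can be looked at in isolation. Below is a minimal sketch without any TileDB dependency: the class name VarLengthPacking and the method Pack are hypothetical names used only for illustration, and the sketch assumes ASCII strings, so that one character corresponds to one byte of offset.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the TileDB C# library.
public static class VarLengthPacking
{
    // Concatenates all strings into one flat char buffer and records, for each
    // string, the position at which it starts in that buffer.
    public static (char[] Chars, ulong[] Offsets) Pack(IReadOnlyList<string> values)
    {
        var flat = new StringBuilder();
        var offsets = new ulong[values.Count];
        ulong offset = 0;

        for (int i = 0; i < values.Count; i++)
        {
            offsets[i] = offset;               // start of the i-th value
            flat.Append(values[i]);
            offset += (ulong)values[i].Length; // assumes 1 char == 1 byte (ASCII)
        }

        return (flat.ToString().ToCharArray(), offsets);
    }
}
```

For the input { "ab", "", "cde" } this yields the flat buffer "abcde" and the offsets 0, 2, 2, which is the same layout the loop in WriteToArray produces for dataVector and dataOffsets.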

Read:

// I have tried different min and max values, but it works only when the integer part of each bound is increased/decreased
private static (double[], double[], string[]) ReadDataArray(string arrayUri, int elementCount, double minX, double maxX, double minY, double maxY)
{
	// Create context
	using TileDB.Context ctx = new TileDB.Context();

	// Vectors to store coordinates, data and character offsets. The multiplier (12) may be changed depending on the largest value size
	using TileDB.VectorDouble xCoords = TileDB.VectorDouble.Repeat(0, elementCount);
	using TileDB.VectorDouble yCoords = TileDB.VectorDouble.Repeat(0, elementCount);
	using TileDB.VectorUInt64 dataOffset = TileDB.VectorUInt64.Repeat(0, elementCount);
	using TileDB.VectorChar data = TileDB.VectorChar.Repeat(' ', elementCount * 12);

	// Open array for read
	using TileDB.Array array = new TileDB.Array(ctx, arrayUri, TileDB.QueryType.TILEDB_READ);
	TileDB.ArraySchema schema = new TileDB.ArraySchema(ctx, arrayUri);

	// Construct the query
	using TileDB.Query query = new TileDB.Query(ctx, array, TileDB.QueryType.TILEDB_READ);
	query.set_layout(TileDB.LayoutType.TILEDB_GLOBAL_ORDER);

	query.set_double_vector_buffer("x", xCoords);
	query.set_double_vector_buffer("y", yCoords);
	query.set_char_vector_buffer_with_offsets("data", data, dataOffset);

	TileDB.VectorString range1 = new TileDB.VectorString() { minX.ToString(), maxX.ToString() };
	query.add_range_from_str_vector(0, range1);

	TileDB.VectorString range2 = new TileDB.VectorString() { minY.ToString(), maxY.ToString() };
	query.add_range_from_str_vector(1, range2);

	query.submit();
	query.finalize();
	
	using TileDB.MapStringVectorUInt64 bufferElements = query.result_buffer_elements();
	array.close();

	// This output already reports fewer elements than expected
	Console.WriteLine($"bufferElements[0]: {bufferElements["data"][0]}");

	// Parse the result to an array
	ulong resultElementOffset = bufferElements["data"][0];
	ulong resultElementSize = bufferElements["data"][1];
	using TileDB.VectorUInt64 dataSizes = new TileDB.VectorUInt64();

	for (int i = 0; i < ((int)resultElementOffset - 1); ++i)
	{
		dataSizes.Add(dataOffset[i + 1] - dataOffset[i]);
	}
	dataSizes.Add(resultElementSize * TileDB.EnumUtil.datatype_size(TileDB.DataType.TILEDB_CHAR) - dataOffset[(int)resultElementOffset - 1]);

	string[] dataArray = new string[(int)resultElementOffset];
	for (int i = 0; i < (int)resultElementOffset; ++i)
	{
		dataArray[i] = new string(data.GetRange((int)dataOffset[i], (int)dataSizes[i]).ToArray());
	}

	// This output matches previous one
	Console.WriteLine(dataArray.Length);

	// Clean up unused elements in vectors
	xCoords.RemoveRange(dataArray.Length, elementCount - dataArray.Length);
	yCoords.RemoveRange(dataArray.Length, elementCount - dataArray.Length);

	return (xCoords.ToArray(), yCoords.ToArray(), dataArray);
}
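
The result parsing at the end of ReadDataArray (turning a flat character buffer plus offsets back into strings) can also be sketched on its own. This is a simplified illustration, not the library's method: the names VarLengthUnpacking and Unpack are hypothetical, cellCount stands for the number of offsets reported for the attribute, and charCount for the number of characters reported. Unlike the loop above, it also handles the case where the query returns zero cells.

```csharp
// Hypothetical helper, not part of the TileDB C# library.
public static class VarLengthUnpacking
{
    // Splits the first charCount characters of a flat buffer into cellCount
    // strings, where offsets[i] is the start position of the i-th string.
    public static string[] Unpack(char[] chars, ulong[] offsets, int cellCount, int charCount)
    {
        var result = new string[cellCount];

        for (int i = 0; i < cellCount; i++)
        {
            int start = (int)offsets[i];
            // The last value ends where the valid data ends; every other value
            // ends where the next one starts.
            int end = (i + 1 < cellCount) ? (int)offsets[i + 1] : charCount;
            result[i] = new string(chars, start, end - start);
        }

        return result;
    }
}
```

With a buffer holding "abcde" followed by unused padding, offsets 0, 2, 2, cellCount 3 and charCount 5, this returns "ab", "" and "cde"; with cellCount 0 it returns an empty array instead of indexing before the start of the offsets.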

Thanks for your sample code. I will test it with some generated data points.

Hi
I tried to mimic your code in this example: TileDB-CSharp/Program.cs at bd/ch13569-add-double-range-example · TileDB-Inc/TileDB-CSharp · GitHub
50000 data points are simulated in the above example. I get all 50000 data points when I use [minX_sim,maxX_sim] as the x range and [minY_sim,maxY_sim] as the y range. Could you have a look at the example? If you still have problems, could you send me a sample of your data so I can investigate it further? Thanks

Hi,
I ran the same code without any changes on my machine, and for me it returned 38499 points.

After adding or subtracting 0.01 to the min and max sim variables, the result didn’t change.

But when I added +/- 1 to the min and max sim variables, the program returned 50000 points.

What OS are you using to run this program? I’m using Windows 10, so maybe the result depends on the platform used to run the code?

I tested on both Windows 10 and macOS. On both platforms, I got 50000 points for the ranges [minX_sim,maxX_sim] and [minY_sim,maxY_sim]. I also got 49509 points when I used the ranges [minX_sim+0.01,maxX_sim-0.01] and [minY_sim+0.01,maxY_sim-0.01]. That is an interesting problem. I will try to find a Windows 10 computer that reproduces your results.

OK, please let me know if you need any additional information. I have no idea what else could cause the same program to behave differently.
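
One possible cause worth ruling out (an assumption, not something confirmed in this thread): in .NET, double.ToString() without arguments formats using the current culture, so on a machine whose regional settings use a comma as the decimal separator, 5.28 becomes "5,28". If the string-based range method expects a period, only the integer part might be read, which would match the symptom of identical code behaving differently on two machines. A minimal sketch of culture-independent formatting follows; RangeFormatting, ToInvariantString and BuildRange are hypothetical names, not part of the TileDB C# library.

```csharp
using System.Globalization;

// Hypothetical helper, not part of the TileDB C# library.
public static class RangeFormatting
{
    // Formats a bound with '.' as the decimal separator regardless of the
    // machine's regional settings. "R" requests a round-trippable string.
    public static string ToInvariantString(double value) =>
        value.ToString("R", CultureInfo.InvariantCulture);

    // Builds the { min, max } pair of strings for one dimension.
    public static string[] BuildRange(double min, double max) =>
        new[] { ToInvariantString(min), ToInvariantString(max) };
}
```

In ReadDataArray this would mean passing RangeFormatting.ToInvariantString(minX) instead of minX.ToString() when filling the VectorString. Printing minX.ToString() on the affected machine is a quick way to check whether a comma appears at all.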