Subarray of 2D sparse array with double dimensions

vodohleb · January 14, 2022, 10:57am

Hello!

I’m using version 2.4.11 of the TileDB C# library, and I need a 2D sparse array with 50K × 50K double dimensions and a string attribute. With integer dimensions it is easy to read only part of an array by setting a subarray, but I haven’t found a way to do the same with double dimensions.

Is there such an option in the C# library?

Thanks in advance!

Bin_Deng · January 17, 2022, 8:29pm

Yes, you can do that with add_range_from_str_vector in the C# library. Note that you need to convert the double values to strings when using add_range_from_str_vector. Please have a look at the example we created:

github.com/TileDB-Inc/TileDB-CSharp/blob/bd/ch13569-add-double-range-example/examples_nuget/double_range/Program.cs

using System;
using System.Linq;
using TileDB;  
var ctx = new TileDB.Context();
var dom = new TileDB.Domain(ctx);
//add dimensions
dom.add_double_dimension("rows",1.0,10.0,10.0);
dom.add_double_dimension("cols",1.0,10.0,10.0);
var schema = new TileDB.ArraySchema(ctx,TileDB.ArrayType.TILEDB_SPARSE);
schema.set_domain(dom);
//add attribute
var attr1 = TileDB.Attribute.create_attribute(ctx, "a1", TileDB.DataType.TILEDB_STRING_ASCII);
//var attr1 =  TileDB.Attribute.create_attribute(ctx, "a1", TileDB.DataType.TILEDB_INT32);
 
schema.add_attribute(attr1);
string array_uri = "test_doublerange_array";
var vfs = new TileDB.VFS(ctx);
if(vfs.is_dir(array_uri))
{
    vfs.remove_dir(array_uri);
}

(This file has been truncated.)

In the above example, the array has two double dimensions and one string attribute. Hopefully that array is similar to yours.
Please let us know if you have further questions about that.
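Since add_range_from_str_vector receives the bounds as text, the exact string produced for each double matters. A minimal sketch, assuming the native side expects `.` as the decimal separator: format each bound with `CultureInfo.InvariantCulture` and the round-trip `"R"` format, so the string neither depends on the machine's regional settings nor loses digits (older .NET Framework versions print only 15 significant digits by default). `FormatRangeBounds` is a hypothetical helper, not part of the TileDB API; it is written as a .NET 6+ console program with top-level statements.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of the TileDB API): turns a [min, max] range
// into strings that use '.' as the decimal separator on every machine and
// parse back to exactly the same double values.
static string[] FormatRangeBounds(double min, double max)
{
    if (min > max)
        throw new ArgumentException("min must not be greater than max");
    return new[]
    {
        min.ToString("R", CultureInfo.InvariantCulture),
        max.ToString("R", CultureInfo.InvariantCulture),
    };
}

// Example bounds.
string[] xRange = FormatRangeBounds(-8.074450492858887, 5.952016830444336);
Console.WriteLine($"[{xRange[0]}, {xRange[1]}]");
```

The two strings could then be placed in a `TileDB.VectorString` (for example `new TileDB.VectorString() { xRange[0], xRange[1] }`) and passed to add_range_from_str_vector for the corresponding dimension.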

vodohleb · January 18, 2022, 7:57am

Thanks for your reply!

Yes, this works, but in a strange way. My array has the following bounds:
MinX: -8.074450492858887
MaxX: 5.952016830444336
MinY: -5.3138957023620605
MaxY: 5.27669620513916

When I call add_range_from_str_vector with these same min and max values, fewer points are returned than the array contains. I thought the range might exclude the values provided, so I tried [-8.08, 5.96] for the X dimension and [-5.32, 5.28] for the Y dimension, but the result was the same. All points were returned only when I used [-9, 6] for X and [-6, 6] for Y, so it looks like this method ignores the decimal part of the range bounds.

I have also tried add_range() instead, but it always throws this error:

Static type (CHAR) does not match expected type (FLOAT64)

Is it possible to read the data within exactly the range I provide, so that I can widen it by changing the decimal part of a coordinate instead of the integer part?

Bin_Deng · January 19, 2022, 3:49am

Hi,
I tested double ranges with this example: TileDB-CSharp/Program.cs at bd/ch13569-add-double-range-example · TileDB-Inc/TileDB-CSharp · GitHub
It seems to me that the decimal parts are not ignored. When I use 5.28 as maxY, all of the data is returned, but when I use 5.27 as maxY, one data point is filtered out. Can you have a look at the example and let me know whether it is similar to your case? If you like, you can also send me sample code with data so I can find the reason. Thanks

vodohleb · January 19, 2022, 8:33am

Hi,

Thanks for your reply!

This example is similar, but in my case it still doesn’t work, and I don’t know why.
Here are the methods I use:

Create:

public static void CreateDataArray(string arrayUri, double minX, double maxX, double minY, double maxY, double xTileSize, double yTileSize)
{
	// Create context
	using TileDB.Context ctx = new TileDB.Context();

	// Create the array if it doesn't exist
	using TileDB.VFS vfs = new TileDB.VFS(ctx);
	if (!vfs.is_dir(arrayUri))
	{
		// Create the domain and add the x and y dimensions
		using TileDB.Domain dom = new TileDB.Domain(ctx);
		dom.add_double_dimension("x", minX, maxX, xTileSize);
		dom.add_double_dimension("y", minY, maxY, yTileSize);

		// Create the array schema for a sparse array, add the domain and set the tile and cell order
		using TileDB.ArraySchema schema = new TileDB.ArraySchema(ctx, TileDB.ArrayType.TILEDB_SPARSE);
		schema.set_domain(dom);
		schema.set_order(TileDB.LayoutType.TILEDB_ROW_MAJOR, TileDB.LayoutType.TILEDB_ROW_MAJOR);
		schema.set_allows_dups(true);

		// Create and add data attribute and compression filter to schema
		using TileDB.Attribute attr1 = TileDB.Attribute.create_attribute(ctx, "data", TileDB.DataType.TILEDB_STRING_ASCII);

		using TileDB.Filter compression = new TileDB.Filter(ctx, TileDB.FilterType.TILEDB_FILTER_GZIP);
		using TileDB.FilterList filterList = new TileDB.FilterList(ctx);
		filterList.add_filter(compression);
		attr1.set_filter_list(filterList);
		schema.add_attribute(attr1);

		// Create the array
		TileDB.Array.create(arrayUri, schema);
	}
}

Write:

// The coordinate and data arrays contain the same number of elements
// Strings vary in length, but none is longer than 12 characters
public static void WriteToArray(string arrayUri, double[] xCoords, double[] yCoords, string[] data)
{
	// Create context
	using TileDB.Context ctx = new TileDB.Context();

	// Vectors for the coordinates, a char vector holding all strings back to back, and a UInt64 vector holding each string's starting offset
	using TileDB.VectorDouble xVector = new TileDB.VectorDouble(xCoords);
	using TileDB.VectorDouble yVector = new TileDB.VectorDouble(yCoords);
	using TileDB.VectorChar dataVector = new TileDB.VectorChar();
	using TileDB.VectorUInt64 dataOffsets = new TileDB.VectorUInt64();

	// Fill vectors with data array to write
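	// Offsets are byte offsets; each ASCII character is one byte. Use ulong to match VectorUInt64.
	ulong offset = 0;
	foreach (string value in data)
	{
		// Dispose the temporary vector at the end of each iteration
		using TileDB.VectorChar chars = new TileDB.VectorChar(value.ToCharArray());
		dataVector.AddRange(chars);
		dataOffsets.Add(offset);
		offset += (ulong)value.Length;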
	}

	// Open array for writing
	using TileDB.Array array = new TileDB.Array(ctx, arrayUri, TileDB.QueryType.TILEDB_WRITE);

	// Create the query
	using TileDB.Query query = new TileDB.Query(ctx, array, TileDB.QueryType.TILEDB_WRITE);
	query.set_layout(TileDB.LayoutType.TILEDB_UNORDERED);
	query.set_char_vector_buffer_with_offsets("data", dataVector, dataOffsets);
	query.set_double_vector_buffer("x", xVector);
	query.set_double_vector_buffer("y", yVector);

	// Submit query
	query.submit();

	// Close the array
	array.close();
}
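The write method packs every string into one character buffer and records where each string starts. The same layout can be sketched in plain C# without TileDB types. `PackStrings` is a hypothetical helper; it assumes ASCII-only values so that character offsets equal byte offsets, which matches the TILEDB_STRING_ASCII attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: packs variable-length ASCII strings into one contiguous
// char buffer and records the starting offset of each string, which is the
// buffer/offsets layout used for var-sized attributes. Offsets are ulong,
// like the values stored in TileDB.VectorUInt64.
static (char[] Buffer, ulong[] Offsets) PackStrings(IReadOnlyList<string> values)
{
    var buffer = new List<char>();
    var offsets = new ulong[values.Count];
    ulong offset = 0;
    for (int i = 0; i < values.Count; i++)
    {
        // Non-ASCII characters would make char offsets differ from byte offsets.
        if (values[i].Any(c => c > 127))
            throw new ArgumentException($"Value at index {i} is not ASCII.");
        offsets[i] = offset;
        buffer.AddRange(values[i]);
        offset += (ulong)values[i].Length;
    }
    return (buffer.ToArray(), offsets);
}

var (packed, starts) = PackStrings(new[] { "a", "bb", "ccc" });
Console.WriteLine($"{new string(packed)} [{string.Join(", ", starts)}]"); // prints "abbccc [0, 1, 3]"
```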

Read:

// I have tried different min and max values, but it only works when the integer part of each bound is increased/decreased
private static (double[], double[], string[]) ReadDataArray(string arrayUri, int elementCount, double minX, double maxX, double minY, double maxY)
{
	// Create context
	using TileDB.Context ctx = new TileDB.Context();

	// Vectors for the coordinates, the data and the character offsets. The multiplier (12) is the maximum string length and may need to change for longer values
	using TileDB.VectorDouble xCoords = TileDB.VectorDouble.Repeat(0, elementCount);
	using TileDB.VectorDouble yCoords = TileDB.VectorDouble.Repeat(0, elementCount);
	using TileDB.VectorUInt64 dataOffset = TileDB.VectorUInt64.Repeat(0, elementCount);
	using TileDB.VectorChar data = TileDB.VectorChar.Repeat(' ', elementCount * 12);

	// Open array for read
	using TileDB.Array array = new TileDB.Array(ctx, arrayUri, TileDB.QueryType.TILEDB_READ);
	TileDB.ArraySchema schema = new TileDB.ArraySchema(ctx, arrayUri);

	// Construct the query
	using TileDB.Query query = new TileDB.Query(ctx, array, TileDB.QueryType.TILEDB_READ);
	query.set_layout(TileDB.LayoutType.TILEDB_GLOBAL_ORDER);

	query.set_double_vector_buffer("x", xCoords);
	query.set_double_vector_buffer("y", yCoords);
	query.set_char_vector_buffer_with_offsets("data", data, dataOffset);

	TileDB.VectorString range1 = new TileDB.VectorString() { minX.ToString(), maxX.ToString() };
	query.add_range_from_str_vector(0, range1);

	TileDB.VectorString range2 = new TileDB.VectorString() { minY.ToString(), maxY.ToString() };
	query.add_range_from_str_vector(1, range2);

	query.submit();
	query.finalize();
	
	using TileDB.MapStringVectorUInt64 bufferElements = query.result_buffer_elements();
	array.close();

	// This already reports fewer elements than expected
	Console.WriteLine($"bufferElements[0]: {bufferElements["data"][0]}");

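	// Parse the result into an array. For the var-sized "data" attribute, [0] is the number of offsets (one per result cell) and [1] is the number of characters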
	ulong resultElementOffset = bufferElements["data"][0];
	ulong resultElementSize = bufferElements["data"][1];
	using TileDB.VectorUInt64 dataSizes = new TileDB.VectorUInt64();

	for (int i = 0; i < ((int)resultElementOffset - 1); ++i)
	{
		dataSizes.Add(dataOffset[i + 1] - dataOffset[i]);
	}
	dataSizes.Add(resultElementSize * TileDB.EnumUtil.datatype_size(TileDB.DataType.TILEDB_CHAR) - dataOffset[(int)resultElementOffset - 1]);

	string[] dataArray = new string[(int)resultElementOffset];
	for (int i = 0; i < (int)resultElementOffset; ++i)
	{
		dataArray[i] = new string(data.GetRange((int)dataOffset[i], (int)dataSizes[i]).ToArray());
	}

	// This output matches the previous one
	Console.WriteLine(dataArray.Length);

	// Remove the unused, preallocated elements from the coordinate vectors
	xCoords.RemoveRange(dataArray.Length, elementCount - dataArray.Length);
	yCoords.RemoveRange(dataArray.Length, elementCount - dataArray.Length);

	return (xCoords.ToArray(), yCoords.ToArray(), dataArray);
}
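The parsing at the end of ReadDataArray turns the offsets and the character buffer back into strings: each string runs from its offset to the next one, and the last ends at the number of characters actually returned. A plain C# sketch of that step, without TileDB types: `UnpackStrings` is a hypothetical helper whose two counts correspond (assuming the usual meaning of result_buffer_elements() for a var-sized attribute) to the number of offsets and the number of characters reported for "data". It also assumes one byte per character, as with TILEDB_STRING_ASCII.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: rebuilds strings from a var-sized read result.
// resultCount = number of offsets returned, charCount = number of chars returned.
// The buffers may be longer than the result because they were preallocated.
static string[] UnpackStrings(IReadOnlyList<char> buffer, IReadOnlyList<ulong> offsets,
                              int resultCount, int charCount)
{
    var result = new string[resultCount];
    for (int i = 0; i < resultCount; i++)
    {
        int start = (int)offsets[i];
        // Each string ends where the next one starts; the last one ends at
        // charCount, not at the end of the oversized buffer.
        int end = i + 1 < resultCount ? (int)offsets[i + 1] : charCount;
        var chars = new char[end - start];
        for (int j = 0; j < chars.Length; j++)
            chars[j] = buffer[start + j];
        result[i] = new string(chars);
    }
    return result;
}

// Preallocated buffers with unused space at the end, as in ReadDataArray.
char[] sampleData = "abbccc      ".ToCharArray();
ulong[] sampleOffsets = { 0, 1, 3, 0, 0 };
Console.WriteLine(string.Join("|", UnpackStrings(sampleData, sampleOffsets, 3, 6))); // prints "a|bb|ccc"
```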

Bin_Deng · January 19, 2022, 9:29pm

Thanks for your sample code. I will test it with some generated data points.

Bin_Deng · January 24, 2022, 5:28am

Hi
I tried to mimic your code in this example: TileDB-CSharp/Program.cs at bd/ch13569-add-double-range-example · TileDB-Inc/TileDB-CSharp · GitHub
The example simulates 50000 data points. I get all 50000 points when I use [minX_sim, maxX_sim] as the x range and [minY_sim, maxY_sim] as the y range. Could you have a look at the example? If you still have problems, could you send me a sample of your data so I can investigate further? Thanks

vodohleb · January 24, 2022, 7:42am

Hi,
I ran the same code without any changes on my machine, and it returned 38499 points.

Adding or subtracting 0.01 to the min and max sim variables didn’t change the result.

But when I added +/- 1 to the min and max sim variables, the program returned 50000 points.

What OS are you using to run this program? I’m using Windows 10, so maybe the result depends on the platform the code runs on?
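One possible explanation, not confirmed anywhere in this thread, is the regional setting used by `double.ToString()`. ReadDataArray builds the range strings with `minX.ToString()`, which formats with the current culture; on a Windows 10 machine set to a locale with `,` as the decimal separator, `-8.074450492858887` becomes `-8,074450492858887`. If the native code parses each bound with a C-style parser that only understands `.` and stops at the first character it does not recognize (an assumption about TileDB internals), only `-8` survives. The sketch below simulates this with a hand-built `NumberFormatInfo` and a hypothetical `ParseLeadingNumber` function standing in for that parser; it is a .NET 6+ console program with top-level statements.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for a C-style parser (an assumption, not TileDB code):
// it reads an optional sign, ASCII digits and an optional '.' fraction, then
// stops at the first other character, ignoring the rest of the string.
static double ParseLeadingNumber(string s)
{
    int i = 0;
    if (i < s.Length && (s[i] == '-' || s[i] == '+')) i++;
    while (i < s.Length && s[i] >= '0' && s[i] <= '9') i++;
    if (i < s.Length && s[i] == '.')
    {
        i++;
        while (i < s.Length && s[i] >= '0' && s[i] <= '9') i++;
    }
    return double.Parse(s.Substring(0, i), CultureInfo.InvariantCulture);
}

// Number format with ',' as the decimal separator, as in many European locales.
var commaFormat = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
commaFormat.NumberDecimalSeparator = ",";

// Bounds reported earlier in this thread.
double[] bounds = { -8.074450492858887, 5.952016830444336, -5.3138957023620605, 5.27669620513916 };
foreach (double bound in bounds)
{
    string localText = bound.ToString(commaFormat);                           // uses ','
    string invariantText = bound.ToString("R", CultureInfo.InvariantCulture); // always uses '.'
    Console.WriteLine($"{localText} -> {ParseLeadingNumber(localText)}; " +
                      $"{invariantText} -> {ParseLeadingNumber(invariantText)}");
}
```

In this simulation the comma-formatted bounds collapse to -8, 5, -5 and 5, which would match the reports above: shifting the bounds by 0.01 changes nothing, while shifting them by 1 returns every point. If this turns out to be the cause, formatting the bounds with `CultureInfo.InvariantCulture` (for example `minX.ToString("R", CultureInfo.InvariantCulture)`) should make the query independent of the machine's regional settings.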

Bin_Deng · January 24, 2022, 3:43pm

I tested on both Windows 10 and macOS. On both platforms, I got 50000 points for the ranges [minX_sim, maxX_sim] and [minY_sim, maxY_sim], and 49509 points for the ranges [minX_sim+0.01, maxX_sim-0.01] and [minY_sim+0.01, maxY_sim-0.01]. That is an interesting problem; I will try to find a Windows 10 computer that reproduces your results.

vodohleb · January 25, 2022, 7:47am

OK, please let me know if you need any additional information. I have no idea what else could make the same program behave differently.

vodohleb · April 7, 2022, 7:27am

Hi! Is there any update on the cause of this issue?
