Empty folders when using 'dedup_coords'

Hi,

I have an array that throws an error about duplicate coordinates when I try to consolidate and vacuum it (reading and writing work without errors). So I enabled the `sm.dedup_coords` option in the config:

```python
import tiledb

config = tiledb.Config({
    "sm.tile_cache_size": str(5_000_000),
    "sm.consolidation.step_min_frags": "2",
    "sm.consolidation.step_max_frags": "20",
    "sm.consolidation.steps": "20",
    "sm.consolidation.buffer_size": str(5_000_000),
    "sm.consolidation.step_size_ratio": "0.002",
    "sm.dedup_coords": "true",  # remove duplicate coordinates instead of raising an error
})

ctx = tiledb.Ctx(config)

tiledb.consolidate("arr", ctx=ctx)
tiledb.vacuum("arr", config=config)
```
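For readers unfamiliar with the option, here is a plain-Python illustration (not TileDB code) of the difference between rejecting duplicate coordinates, which is what produced the original error, and deduplicating them, which is what `sm.dedup_coords` enables. The helper names `check_no_duplicates` and `dedup` are hypothetical, and keeping the first occurrence is an arbitrary choice made for the illustration; it says nothing about which duplicate TileDB actually keeps.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, float]  # (coordinate, attribute value)


def check_no_duplicates(cells: List[Cell]) -> None:
    """Raise if any coordinate appears more than once (mimics the error)."""
    seen = set()
    for coord, _ in cells:
        if coord in seen:
            raise ValueError(f"duplicate coordinate: {coord}")
        seen.add(coord)


def dedup(cells: List[Cell]) -> List[Cell]:
    """Keep one cell per coordinate (here: the first one seen), sorted by coordinate."""
    kept: Dict[int, float] = {}
    for coord, value in cells:
        kept.setdefault(coord, value)
    return sorted(kept.items())


cells = [(3, 1.0), (1, 2.0), (3, 5.0)]
print(dedup(cells))  # [(1, 2.0), (3, 1.0)]
try:
    check_no_duplicates(cells)
except ValueError as err:
    print(err)  # duplicate coordinate: 3
```

A third behaviour, simply passing all cells through unchanged, corresponds to an array whose schema allows duplicates.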

With that setting, consolidation succeeds. However, a lot of empty folders are now created in the array folder. Is this a bug? Is it safe to delete the empty folders without corrupting the array?

Were the empty folders there before you ran the very last consolidation? I suspect those folders were created and not cleaned up while your consolidation was erroring out (which we should investigate). In general, yes, it is safe to delete a folder if it is empty, or if it is missing both the `__fragment_metadata.tdb` file inside it and the corresponding `.ok` file in the array folder, because the reader simply ignores such folders.
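A minimal dry-run sketch of that rule, which only lists candidate folders and deletes nothing. It assumes the on-disk layout described in the reply: fragment folders sit directly in the array folder with names starting with `__`, and a committed fragment has a sibling `<fragment_name>.ok` file plus `__fragment_metadata.tdb` inside it. Newer TileDB format versions store fragments and commit markers in different subfolders, so treat the function name and the list of skipped folders as hypothetical and adjust them for your version.

```python
import os
from typing import List

# Non-fragment folders to leave alone (assumption: extend for your TileDB version).
SPECIAL_DIRS = {"__meta", "__schema", "__fragments", "__commits", "__fragment_meta"}


def find_removable_fragment_dirs(array_uri: str) -> List[str]:
    """Return fragment folders that look safe to delete (dry run only)."""
    removable = []
    for name in sorted(os.listdir(array_uri)):
        path = os.path.join(array_uri, name)
        # Skip plain files (e.g. __array_schema.tdb, *.ok) and special folders.
        if not os.path.isdir(path) or not name.startswith("__") or name in SPECIAL_DIRS:
            continue
        contents = os.listdir(path)
        has_metadata = "__fragment_metadata.tdb" in contents
        has_ok = os.path.exists(path + ".ok")
        # Safe per the reply: empty, or missing both the metadata file and the .ok file.
        if not contents or (not has_metadata and not has_ok):
            removable.append(name)
    return removable
```

Printing the returned list and reviewing it before removing anything by hand (for example with `shutil.rmtree`) is the cautious route, and it is best done while no writes, consolidation, or vacuuming are running on the array.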

Another alternative is to use `tiledb_array_schema_set_allows_dups` (in TileDB-Py, the `allows_duplicates` argument of `tiledb.ArraySchema`) to specify that a sparse array allows duplicates. In that case TileDB performs neither duplicate checking nor deduplication.

I hope this helps.

Thanks!
I’m not sure if they were there before the last consolidation.