GitHub Copilot with TileDB Embedded

Hi everyone, I am not sure if this is the correct forum for this, because it relates more to TileDB Embedded and GitHub Copilot than to TileDB Cloud proper. Has anyone else noticed that with:

  • VS Code Jupyter notebooks
  • GitHub Copilot enabled

there is an I/O error when trying to write a large dense array (10000 x 10000)?

TileDBError: TileDB internal: [OrderedWriter::dowork]  ([TileDB::IO] Error: Cannot write to file '....\__fragments\__1734663652875_1734663652875_717d568ee2dcf5de4f362ef743681cd2_22\a0.tdb'; File opening error CreateFile GetLastError 32 (0x00000020): The process cannot access the file because it is being used by another process.

The issue goes away once I disable GitHub Copilot. I suspect that Copilot is somehow snooping into the fragment files and not closing them.
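
One way to test the "another process is holding the file" suspicion is to try opening every file under the array directory for read/write right after the failure. The sketch below uses only the standard library; `find_locked_files` is a hypothetical helper name, not part of TileDB. It assumes that on Windows a file held open by another process without sharing raises an `OSError` (typically `PermissionError`) on open. It only gives a snapshot, so a lock that is held briefly during the write may no longer be visible.

```python
import os
from pathlib import Path


def find_locked_files(array_dir):
    """Return (path, error) pairs for files that cannot be opened read/write.

    Hypothetical diagnostic helper: it walks the directory tree and tries
    to open each file without truncating or modifying it.
    """
    locked = []
    for root, _dirs, files in os.walk(array_dir):
        for name in files:
            path = Path(root) / name
            try:
                # "r+b" requests read/write access and leaves contents intact.
                with open(path, "r+b"):
                    pass
            except OSError as exc:
                # On Windows a sharing violation is expected to land here.
                locked.append((str(path), exc))
    return locked
```

Calling `find_locked_files("test")` after the error, with Copilot enabled and then disabled, would show whether any file under `__fragments` is still held. An empty list does not rule out a short-lived lock.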

Hi @vichu_ors, could you share a reduced example?

Yeah, sure. This seems to break for me when Copilot is on. I am using:

  • libtiledb 2.27.0
  • VS Code 1.96.2

import os
import shutil
from pathlib import Path

import numpy as np

import tiledb

min_idx = np.iinfo(np.int32).min
max_idx = np.iinfo(np.int32).max - 1

uri = "test"

dom = tiledb.Domain(
    tiledb.Dim(name="rows", domain=(min_idx, max_idx), tile=256, dtype="int32"),
    tiledb.Dim(name="cols", domain=(min_idx, max_idx), tile=256, dtype="int32"),
)

schema = tiledb.ArraySchema(
    domain=dom,
    sparse=False,
    attrs=[tiledb.Attr(name="a", dtype="int32")],
)

if os.path.exists(Path(uri)):
    shutil.rmtree(Path(uri))

tiledb.Array.create(uri, schema)

data = np.full((10000, 10000), 1, dtype=np.dtype("int32"))
with tiledb.open(uri, "w", attr="a") as A:
    A[0:data.shape[0], 0:data.shape[1]] = data

with tiledb.open(uri, "r") as A:
    out = A[0:data.shape[0], 0:data.shape[1]]

out["a"]

How are you running the code? I tried some variations in VSCode with Copilot enabled and was not able to reproduce (run file in terminal, and in interactive prompt; run selection in terminal, and interactive prompt).

Thanks Isaiah for looking into this. I am running the code:

  1. within a Jupyter notebook
  2. in VSCode
  3. with Copilot turned on in the background with an enterprise license
  4. using Windows
  5. on a company laptop

I also just double-checked with a colleague, and Copilot is doing something weird for both of us. It works fine on my personal Linux desktop with my own free Copilot license.

So I think it might be some enterprise security software that our IT has installed and that runs in the background. Not sure, but I think this might just be an edge case for enterprise users not on TileDB Cloud.
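
If the lock really comes from background software that releases the file after a moment, retrying the write with a short pause may be enough as a workaround. The sketch below is a generic retry wrapper under that assumption; `retry_on_error` and its parameters are hypothetical names, and whether a failed TileDB write leaves a partial fragment behind is not verified here.

```python
import time


def retry_on_error(operation, exceptions=(OSError,), attempts=5, delay=0.5):
    """Call operation() and retry on the listed exceptions.

    The wait grows linearly (delay, 2*delay, ...). The last failure is
    re-raised so a persistent lock still surfaces as an error.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)
```

For the reproduction above, `operation` would be a small function that opens the array in "w" mode and assigns `data`, with `exceptions=(tiledb.TileDBError,)`.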

Shout Out: We’ve been using TileDB with Azure blob and it’s been absolutely fantastic so far. You guys really have some magic going on under the hood.

What I am trying to understand is how you are actually executing the Python code within VS Code – what button(s) are you pushing to make the code run? (x-ref Get Started Tutorial for Python in Visual Studio Code)

"Shout Out: We’ve been using TileDB with Azure blob and it’s been absolutely fantastic so far. You guys really have some magic going on under the hood."

Great to hear, thanks!

Oh, within a Jupyter notebook I am just:

  1. Chunking the pasted code above into cells
  2. Executing each cell in sequence with Shift+Enter

The VS Code workspace is running with the Python, Jupyter, and Copilot extensions enabled, in a virtual environment (venv + Python 3.11) that has tiledb (0.33.0) installed from pip.
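
Since the thread mentions both libtiledb 2.27.0 and the pip package tiledb 0.33.0, it can help to capture the environment from inside the notebook kernel itself, so that a comparison between the Windows laptop and the Linux desktop uses the same fields. This is a minimal standard-library sketch; `environment_report` is a hypothetical helper name.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("tiledb", "numpy")):
    """Collect OS, Python, and installed package versions for a bug report."""
    report = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            # Reads the version recorded by pip for the active environment.
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report
```

Running `environment_report()` in a notebook cell reports the pip package version. In a session where tiledb imports successfully, `tiledb.libtiledb.version()` should give the core library version as well.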