How to connect to TileDB Cloud programmatically

Hey tiledb team,

I am trying out the 101 Population Genomics tutorial.

After installing the package through conda and pip I tried to run the 1st snippet containing:

vcf_bucket = "s3://tiledb-inc-demo-data/examples/notebooks/vcfs/1kg-dragen"
samples_to_ingest = ["HG00096_chr21.gvcf.gz",
                     "HG00097_chr21.gvcf.gz", 
                     "HG00099_chr21.gvcf.gz", 
                     "HG00100_chr21.gvcf.gz", 
                     "HG00101_chr21.gvcf.gz"]
sample_uris = [f"{vcf_bucket}/{s}" for s in samples_to_ingest]

# The URIs of the samples to be ingested should look like this:
sample_uris

and I get:

miniconda3/envs/tiledb-vcf-tutorial/lib/python3.8/site-packages/tiledb/cloud/config.py:96: UserWarning: You must first login before you can run commands. Please run tiledb.cloud.login.
  warnings.warn(
Got ERROR: "Could not open mysql.plugin table: "Table 'mysql.plugin' doesn't exist". Some plugins may be not loaded" errno: 2000
Got ERROR: "Can't open and lock privilege tables: Table 'mysql.servers' doesn't exist" errno: 2000
Got ERROR: "Can't open the mysql.func table. Please run mysql_upgrade to create it." errno: 2000

What should I run before the for-loop so that I can connect to a tiledb-cloud instance?

I have an Ubuntu machine and the PyCharm (Community) IDE.

Thanks,
Damianos

Hi @damianos.melidis,

Thank you for reaching out!

The snippet you shared from that tutorial does not make any calls to tiledb-cloud, which is what’s showing up in your traceback, so I’m confused about why that’s happening there.

That snippet just runs pure Python built-in operations as preparation for working with tiledbvcf. To make sure tiledb-cloud isn’t interfering, comment out any import tiledb.cloud lines at the top of your script/module/notebook and run it again.

The latter portion of the tutorial discusses how to use TileDB Cloud. You can sign up for a free account and request some introductory credits if you’d like! For that you will need to log in, which you can do by calling tiledb.cloud.login (ref) and passing your credentials (either your username and password or, preferably, your API token).
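As a minimal sketch of that login step: the helper below picks the credentials to pass to tiledb.cloud.login, preferring an API token over a username and password. The environment variable names and both helper functions are assumptions made for illustration; only tiledb.cloud.login and its token / username / password keyword arguments come from the client itself. The import is done lazily so a script that stays on the open-source path never touches tiledb.cloud.

```python
import os


def login_kwargs(env=os.environ):
    """Return keyword arguments for tiledb.cloud.login, or None if nothing is set.

    The variable names below are illustrative assumptions, not an official
    convention: use whatever secret storage fits your setup.
    """
    token = env.get("TILEDB_REST_TOKEN")
    if token:
        # An API token is preferred over a username/password pair.
        return {"token": token}
    username = env.get("TILEDB_USERNAME")
    password = env.get("TILEDB_PASSWORD")
    if username and password:
        return {"username": username, "password": password}
    return None


def login_if_configured(env=os.environ):
    """Log in to TileDB Cloud only when credentials are available."""
    kwargs = login_kwargs(env)
    if kwargs is None:
        return False  # stay on the purely open-source path
    import tiledb.cloud  # imported lazily, only when it is actually needed

    tiledb.cloud.login(**kwargs)
    return True
```

Calling login_if_configured() once at the top of the notebook, before any TileDB Cloud call, should be enough; with no credentials set it returns False and nothing is imported.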

So in summary, since the portion you’re referencing is completely open source, you do not need to log in to TileDB Cloud programmatically. To take full advantage of TileDB and explore our Cloud platform, you can sign up for a free account and log in as described above.

Let me know if you have any more questions,

Spencer
TileDB Team

@damianos.melidis if you don’t have a TileDB Cloud account, comment out or remove the lines:

import tiledb.cloud
import tiledb.cloud.groups

Hey Spencer,

I have removed the tiledb cloud imports and I could indeed run some more code.

However when I come to the point of:

# Ignore any warnings
db = tiledb.sql.connect()
pd.read_sql(sql=f"select * from `{variant_stats_uri}` where pos >= 5030025 and pos <= 5030087", con=db)

running the sql connect brings up:

Process finished with exit code 139 (interrupted by signal 11:SIGSEGV)

About the conda env that I use in this PyCharm project,
I have created an empty conda env and then installed the dependencies (as explained at the start of the tutorial post).

Thanks,
Damianos

Hi @damianos.melidis, upon a quick test I’m not able to reproduce the segfault you are seeing. I’d like to confirm the package versions so I can test with a setup closer to yours. Would you mind running conda list -e and posting the output? I’ll then be able to try to reproduce and diagnose what you are seeing.
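If the full listing is too long to paste, a small filter like the sketch below can narrow it to the TileDB-related packages. It assumes the name=version=build line format that conda list -e prints; the function name tiledb_packages is made up for this example.

```python
def tiledb_packages(export_text):
    """Map package name -> version for every TileDB-related package.

    Expects the text printed by `conda list -e`, one `name=version=build`
    entry per line, with `#` comment lines at the top.
    """
    packages = {}
    for line in export_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments
        parts = line.split("=")
        if len(parts) < 2:
            continue  # not a name=version=build entry
        name, version = parts[0], parts[1]
        if "tiledb" in name:
            packages[name] = version
    return packages


sample = """# platform: linux-64
libtiledb-sql=0.33.0=h453a68b_0
numpy=1.24.4=py38h59b608b_0
tiledb-py=0.31.1=py38hf7b374a_0
"""
print(tiledb_packages(sample))
# prints {'libtiledb-sql': '0.33.0', 'tiledb-py': '0.31.1'}
```

To use it on a real environment, save the output with conda list -e > env.txt and pass the contents of that file to the function.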

Thank you in advance!

Hi @seth, I would have liked to attach a txt file containing the output of conda list -e, but the "upload" function does not accept text files :stuck_out_tongue: so here you go:

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
_libgcc_mutex=0.1=conda_forge
_openmp_mutex=4.5=2_gnu
aws-c-auth=0.7.25=h15d0e8c_6
aws-c-cal=0.7.3=h8dac057_2
aws-c-common=0.9.27=h4bc722e_0
aws-c-compression=0.2.18=h038f3f9_10
aws-c-event-stream=0.4.2=h570d160_21
aws-c-http=0.8.7=ha1f794c_4
aws-c-io=0.14.18=h0040ed1_5
aws-c-mqtt=0.10.4=hc14a930_17
aws-c-s3=0.6.4=h558cea2_8
aws-c-sdkutils=0.1.19=h038f3f9_2
aws-checksums=0.1.18=h038f3f9_10
aws-crt-cpp=0.27.5=hd0b8a3b_7
aws-sdk-cpp=1.11.379=h7dc8893_3
azure-core-cpp=1.13.0=h935415a_0
azure-identity-cpp=1.8.0=hd126650_2
azure-storage-blobs-cpp=12.12.0=hd2e3451_0
azure-storage-common-cpp=12.7.0=h10ac4d7_1
azure-storage-files-datalake-cpp=12.11.0=h325d260_1
bzip2=1.0.8=h4bc722e_7
c-ares=1.33.0=ha66036c_0
ca-certificates=2024.7.4=hbcca054_0
fmt=11.0.2=h434a139_0
gflags=2.2.2=he1b5a44_1004
glog=0.7.1=hbabe93e_0
htslib=1.20=h5efdd21_2
icu=75.1=he02047a_0
keyutils=1.6.1=h166bdaf_0
krb5=1.21.3=h659f571_0
ld_impl_linux-64=2.38=h1181459_1
libabseil=20240116.2=cxx17_he02047a_1
libarrow=17.0.0=h8756180_8_cpu
libarrow-acero=17.0.0=he02047a_8_cpu
libarrow-dataset=17.0.0=he02047a_8_cpu
libarrow-substrait=17.0.0=hc9a23c6_8_cpu
libblas=3.9.0=23_linux64_openblas
libbrotlicommon=1.1.0=hd590300_1
libbrotlidec=1.1.0=hd590300_1
libbrotlienc=1.1.0=hd590300_1
libcblas=3.9.0=23_linux64_openblas
libcrc32c=1.1.2=h9c3ff4c_0
libcurl=8.9.1=hdb1bdb2_0
libdeflate=1.21=h4bc722e_0
libedit=3.1.20191231=he28a2e2_2
libev=4.33=hd590300_2
libevent=2.1.12=hf998b51_1
libffi=3.4.4=h6a678d5_1
libgcc-ng=14.1.0=h77fa898_0
libgfortran-ng=14.1.0=h69a702a_0
libgfortran5=14.1.0=hc5f4f2c_0
libgomp=14.1.0=h77fa898_0
libgoogle-cloud=2.28.0=h26d7fe4_0
libgoogle-cloud-storage=2.28.0=ha262f82_0
libgrpc=1.62.2=h15f2491_0
libiconv=1.17=hd590300_2
liblapack=3.9.0=23_linux64_openblas
libnghttp2=1.58.0=h47da74e_1
libnsl=2.0.1=hd590300_0
libopenblas=0.3.27=pthreads_hac2b453_1
libparquet=17.0.0=haa1307c_8_cpu
libprotobuf=4.25.3=h08a7969_0
libre2-11=2023.09.01=h5a48ba9_2
libsqlite=3.45.2=h2797004_0
libssh2=1.11.0=h0841786_0
libstdcxx-ng=14.1.0=hc0a3c3a_0
libthrift=0.20.0=hb90f79a_0
libtiledb-sql=0.33.0=h453a68b_0
libtiledb-sql-py=2.1.5=py38h7f3f72f_0
libtiledbvcf=0.34.1=h53fe7cb_2
libutf8proc=2.8.0=h166bdaf_0
libuuid=2.38.1=h0b41bf4_0
libwebp-base=1.4.0=hd590300_0
libxcrypt=4.4.36=hd590300_1
libxml2=2.12.7=he7c6b58_4
libzlib=1.3.1=h4ab18f5_1
lz4-c=1.9.4=hcb278e6_0
ncurses=6.5=h59595ed_0
numpy=1.24.4=py38h59b608b_0
openssl=3.3.1=h4bc722e_2
orc=2.0.2=h669347b_0
packaging=24.1=pyhd8ed1ab_0
pandas=2.0.3=py38h01efb38_1
pcre2=10.44=hba22ea6_2
pip=24.2=py38h06a4308_0
pyarrow=17.0.0=py38hb563948_1
pyarrow-core=17.0.0=py38h7debecc_1_cpu
pyarrow-hotfix=0.6=pyhd8ed1ab_0
python=3.8.18=hd12c33a_1_cpython
python-dateutil=2.9.0=pyhd8ed1ab_0
python-tzdata=2024.1=pyhd8ed1ab_0
python_abi=3.8=5_cp38
pytz=2024.1=pyhd8ed1ab_0
re2=2023.09.01=h7f4b329_2
readline=8.2=h5eee18b_0
s2n=1.5.0=h3400bea_0
setuptools=72.1.0=py38h06a4308_0
six=1.16.0=pyh6c4a22f_0
snappy=1.2.1=ha2e4443_0
spdlog=1.14.1=hed91bc2_1
sqlite=3.45.2=h2c6b66d_0
tabulate=0.9.0=py38h06a4308_0
tiledb=2.25.0=h213c483_7
tiledb-py=0.31.1=py38hf7b374a_0
tiledbvcf-py=0.34.1=py38h0d54072_2
tk=8.6.13=noxft_h4845f30_101
tzdata=2024a=h0c530f3_0
wheel=0.43.0=py38h06a4308_0
xz=5.4.6=h5eee18b_1
zlib=1.3.1=h4ab18f5_1
zstd=1.5.6=ha6fb4c9_0

Thanks,
Damianos