ModuleNotFoundError import tiledbvcf in basics tutorial

New to TileDB. I have a TileDB Cloud account with $10 in credits and am running the TileDB-VCF tutorial notebook. The import of tiledbvcf is erroring out. I shouldn’t even need this locally on my Mac because I was trying to run it within TileDB Cloud. Anyway, I went ahead and installed conda and tiledbvcf successfully on my Mac using the commands below:

https://docs.conda.io/projects/conda/en/latest/user-guide/install/macos.html

conda create -n tiledbvcf
conda activate tiledbvcf
conda install -c conda-forge -c bioconda -c tiledb tiledbvcf-py
(tiledbvcf) awssal@sal-macbook ~ % python -c "import tiledbvcf; print(tiledbvcf.version)"
0.26.7

I’m still getting the same ModuleNotFoundError on import tiledbvcf. Does anybody have a solution?

import os
import warnings
warnings.filterwarnings("ignore")
import tiledb
import tiledb.cloud
import tiledbvcf
import numpy as np

print(
    f"tiledb v{tiledb.version.version}\n"
    f"numpy v{np.__version__}\n"
    f"tiledb-vcf v{tiledbvcf.version}\n"
    f"tiledb-cloud v{tiledb.cloud.version.version}\n"
)

ModuleNotFoundError                       Traceback (most recent call last)
Input In [1], in <cell line: 6>()
      4 import tiledb
      5 import tiledb.cloud
----> 6 import tiledbvcf
      7 import numpy as np
      9 print(
     10     f"tiledb v{tiledb.version.version}\n"
     11     f"numpy v{np.__version__}\n"
     12     f"tiledb-vcf v{tiledbvcf.version}\n"
     13     f"tiledb-cloud v{tiledb.cloud.version.version}\n"
     14 )

ModuleNotFoundError: No module named 'tiledbvcf'

Hello @Sal ! When running in TileDB Cloud notebooks we have three image options currently when launching a notebook, Basic Data Science, Genomics and Geospatial. Do you know which notebook image you were in when you were trying to run the tutorial that errored? You need to be in the Genomics image for tiledbvcf to be available.

TileDB Cloud also supports automatically selecting the notebook image for you for most tutorials. This works if you do not already have a notebook server running. Would you mind linking me to the tutorial you were trying to run? I’ll double-check that it defaults to Genomics in the “Launch” button on the notebook details page.

For the module error on your MacBook: can you let me know whether the script you were running was in the same conda environment as the one-liner python -c "import tiledbvcf; print(tiledbvcf.version)" you ran? Can you also let me know how you installed tiledb-cloud? Was that with pip? I’m trying to see what might be different between when you ran the python command and when you ran the script.
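One way to compare the two environments is to print which interpreter is running and where (if anywhere) it would import the module from. The snippet below is a minimal sketch using only the standard library; `describe_module` is a hypothetical helper name, not part of tiledbvcf.

```python
import importlib.util
import sys


def describe_module(name):
    """Return the interpreter path and where `name` would be imported from.

    `location` is None when the module cannot be found by this interpreter.
    """
    spec = importlib.util.find_spec(name)
    return {
        "python": sys.executable,
        "module": name,
        "location": spec.origin if spec is not None else None,
    }


# Run this both in the terminal and at the top of the script or notebook.
info = describe_module("tiledbvcf")
print(info)
```

If the "python" path differs between the terminal and the script or notebook, they are using different environments, which would explain why the import works in one and fails in the other.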

[quote=“seth, post:2, topic:661, full:true”]
Hello @Sal ! When running in TileDB Cloud notebooks we have three image options currently when launching a notebook, Basic Data Science, Genomics and Geospatial. Do you know which notebook image you were in when you were trying to run the tutorial that errored? You need to be in the Genomics image for tiledbvcf to be available.

I logged into the cloud console. It automatically selected a ‘Basic Data Science’ image and I opened up the tutorial_tiledbvcf_basics.ipynb. How do I switch to the ‘Genomics’ image?

TileDB Cloud also supports automatically selecting the notebook image for you for most tutorials. This works if you do not already have a notebook server running. Would you mind linking me to the tutorial you were trying to run and I’ll double check that it defaults to Genomics in the “Launch” button on the notebook details page.

https://cloud.tiledb.com/server?id=b3e58b19-58b0-4e45-b4c6-6fcac6add3e0&name=tiledb_101_arrays&namespace=TileDB-Inc

For the module error on your macbook. Can you let me know, is the script you were running in the same conda environment as the one line python -c "import tiledbvcf; print(tiledbvcf.version)" you ran? Can you also let me know how you install tiledb-cloud was that with pip? I’m looking to try to see what might be different between when you ran the python command and then the script.
[/quote]

I did not install tiledb-cloud. I ran the python command from my terminal window on my Mac. I ran the TileDB-VCF tutorial notebook from the TileDB Cloud console.

here is a screenshot of my browser

Thanks for the additional details @Sal . In TileDB Cloud the easiest way to switch notebook images is to click the “Shut down” button in the top right corner. After your notebook is shut down, you can select “Compute” in the left-hand side menu. From the compute screen you can then select launch notebook and select the image type.

Compute screen:

Select your region:

Select the notebook image type:


Thanks Seth. I no longer get the ModuleNotFoundError: No module named ‘tiledbvcf’ now

I am running this notebook: cloud/public/TileDB-Inc/tutorial_tiledbvcf_basics.ipynb. Now it’s erroring out at step 8.

ds.create_dataset(vcf_attrs=batch1_uris[0], enable_allele_count=True, enable_variant_stats=True)

# verify the array exists
os.listdir(array_uri)
[E::hts_open_format] Failed to open file "vfs://s3://tiledb-inc-demo-data/examples/notebooks/vcfs/1kgp3-chr1/HG00096.bcf" : Resource temporarily unavailable
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 ds.create_dataset(vcf_attrs=batch1_uris[0], enable_allele_count=True, enable_variant_stats=True)
      3 # verify the array exists
      4 os.listdir(array_uri)

File /opt/conda/lib/python3.9/site-packages/tiledbvcf/dataset.py:674, in Dataset.create_dataset(self, extra_attrs, vcf_attrs, tile_capacity, anchor_gap, checksum_type, allow_duplicates, enable_allele_count, enable_variant_stats, compress_sample_dim, compression_level)
    671     self.writer.set_compression_level(compression_level)
    673 # This call throws an exception if the dataset already exists.
--> 674 self.writer.create_dataset()

RuntimeError: TileDB-VCF exception: Cannot convert header to string; bad VCF header.

Was I supposed to provide my own AWS storage?

@Sal thanks for bringing this additional error up. The root cause is that the bucket where the example VCF files are stored is located in us-east-1, while the notebook you launched was in us-west-2. The tutorial needed a small adjustment to set the region explicitly. I’ve updated the notebook; the change was in cell #7, adding a config with the region:

# We set the region to us-east-1 so we can load the example vcf files
ds = tiledbvcf.Dataset(uri=array_uri, mode="w", cfg=tiledbvcf.ReadConfig(tiledb_config={"vfs.s3.region": "us-east-1"}))
ds

@seth Now it cannot find the second batch of VCF files in cell #14

%%time 
ds = tiledbvcf.Dataset(uri=array_uri, mode="w") #Incremental update to the array, previous data is not touched 
ds.ingest_samples(sample_uris = batch2_uris)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File <timed exec>:2, in <module>

File /opt/conda/lib/python3.9/site-packages/tiledbvcf/dataset.py:841, in Dataset.ingest_samples(self, sample_uris, threads, total_memory_budget_mb, total_memory_percentage, ratio_tiledb_memory, max_tiledb_memory_mb, input_record_buffer_mb, avg_vcf_record_size, ratio_task_size, ratio_output_flush, scratch_space_path, scratch_space_size, sample_batch_size, resume, contig_fragment_merging, contigs_to_keep_separate, contigs_to_allow_merging, contig_mode, thread_task_size, memory_budget_mb, record_limit)
    839 if self.schema_version() < 4:
    840     self.writer.register_samples()
--> 841 self.writer.ingest_samples()

RuntimeError: TileDB-VCF exception: Error processing sample; URI 's3://tiledb-inc-demo-data/examples/notebooks/vcfs/1kgp3-chr1/HG00102.bcf' does not exist.
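
One plausible cause, by analogy with the cell #7 fix, is that cell #14 reopens the dataset without the region setting, so the lookup for the second batch goes to the notebook's own region again. The sketch below reuses the config from cell #7 when reopening in write mode. It assumes the second batch lives in the same us-east-1 bucket; `S3_REGION_CONFIG` and `open_dataset_for_write` are hypothetical names introduced here, and the `tiledbvcf.Dataset` and `tiledbvcf.ReadConfig` calls simply mirror the ones shown for cell #7.

```python
# Assumption: the second batch sits in the same us-east-1 bucket as the first,
# so the region setting from cell #7 is reused here.
S3_REGION_CONFIG = {"vfs.s3.region": "us-east-1"}


def open_dataset_for_write(array_uri, tiledb_config=None):
    """Hypothetical helper: reopen the dataset in write mode with the S3 region set."""
    # Imported inside the function because tiledbvcf is only installed
    # in the Genomics notebook image.
    import tiledbvcf

    config = dict(S3_REGION_CONFIG if tiledb_config is None else tiledb_config)
    return tiledbvcf.Dataset(
        uri=array_uri,
        mode="w",
        cfg=tiledbvcf.ReadConfig(tiledb_config=config),
    )


# In the notebook, cell #14 would then read:
# ds = open_dataset_for_write(array_uri)
# ds.ingest_samples(sample_uris=batch2_uris)
```

If the error persists with the region set, the file may genuinely be missing from the bucket, which is something only the tutorial maintainers can confirm.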