TileDB-Py trying to connect to HDFS - but I don't want it to

ilveroluca · November 16, 2021, 9:56am

Hello,

I’m using TileDB-Py in a Docker container. When I try to create the TileDB context with a configuration that points to S3 storage, the call fails because TileDB tries to connect to a non-existent HDFS:

hdfsBuilderConnect(forceNewInstance=1, nn=default, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)

The Docker image contains the Hadoop client libraries, but I don’t need to access HDFS and I don’t want to use them. Is there something I can do to “shut off” or avoid turning on HDFS-related functionality in TileDB?

The configuration I’m using is quite simple:

{
"vfs.s3.endpoint_override": "minio:9000",
"vfs.s3.scheme": "http",
"vfs.s3.region": "",
"vfs.s3.verify_ssl": "false",
"vfs.s3.use_virtual_addressing": "false",
"vfs.s3.use_multipart_upload": "false",
"vfs.s3.aws_access_key_id": "abc",
"vfs.s3.aws_secret_access_key": "def"
}
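For reference, a minimal sketch of how a configuration like this is typically turned into a context in TileDB-Py. The names `S3_SETTINGS` and `make_context` are hypothetical and not from the original post; `tiledb` is imported lazily so the settings can be inspected without the package installed.

```python
# Settings copied from the post; TileDB config values are passed as strings.
S3_SETTINGS = {
    "vfs.s3.endpoint_override": "minio:9000",
    "vfs.s3.scheme": "http",
    "vfs.s3.region": "",
    "vfs.s3.verify_ssl": "false",
    "vfs.s3.use_virtual_addressing": "false",
    "vfs.s3.use_multipart_upload": "false",
    "vfs.s3.aws_access_key_id": "abc",
    "vfs.s3.aws_secret_access_key": "def",
}


def make_context(settings):
    """Build a TileDB context from a dict of string settings (hypothetical helper)."""
    import tiledb  # third-party package, imported only when a context is needed

    return tiledb.Ctx(tiledb.Config(settings))
```

Calling `make_context(S3_SETTINGS)` is the step that fails with the HDFS error shown above.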

Cheers,

Luca

ihnorton · November 16, 2021, 12:30pm

Hi @ilveroluca,

My suspicion is that you may have a very old version of TileDB/TileDB-Py? A few questions so we can try to understand what is happening here:

What version of TileDB-Py are you using, and how did you install it?
Is it your own Dockerfile?
Can you please share the protocol of the URI you are connecting to (e.g. s3://, azure://)?

Thanks,
Isaiah

ihnorton · November 16, 2021, 1:38pm

Update: after some discussion, we see the issue: libtiledb tries to initialize the HDFS client unconditionally whenever the HDFS library is present (which is usually not the case in our TileDB-Py test setup).

We will make the HDFS setup completely lazy (on-demand) in the next release to eliminate the startup error. Thank you for pointing this out @ilveroluca.

ilveroluca · November 16, 2021, 3:11pm

You’re quite welcome! FWIW, I managed to work around the issue by eliminating all HADOOP* environment variables.
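The post does not say how the variables were removed (Dockerfile, shell, or in Python). As one possible way to do it, here is a minimal sketch that drops them from the process environment before `tiledb` is imported. The helper name `drop_hadoop_env` is hypothetical, and whether an in-process removal is enough for a given image is an assumption, not something confirmed in this thread.

```python
import os


def drop_hadoop_env(environ=os.environ, prefix="HADOOP"):
    """Remove every variable whose name starts with `prefix`.

    Returns the sorted list of removed names. Deleting from os.environ
    also unsets the variable for native libraries loaded afterwards.
    """
    # Collect the names first so the mapping is not modified while iterating.
    removed = sorted(name for name in environ if name.startswith(prefix))
    for name in removed:
        del environ[name]
    return removed
```

Usage would be to call `drop_hadoop_env()` at the top of the script, before `import tiledb`. Other Hadoop-related settings that do not start with `HADOOP` are left untouched by this sketch.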
