8,254 questions
0 votes, 0 answers, 25 views
JMX Prometheus agent - alternating and inconsistent values issue
Setup explanation:
I am using the JMX Prometheus agent, which starts with the JVM as a VM argument in an HDFS setup.
Version: 0.20.0
There is a JMX HTTP port displaying the MBeans as JSON.
Problem:
When I view ...
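(For reference, the agent is normally attached with a flag of the form java -javaagent:/path/to/jmx_prometheus_javaagent-0.20.0.jar=<port>:<config.yaml>; the port and config path here are placeholders, not taken from the question. When scrapes alternate between two sets of values, it is worth checking whether more than one JVM is being scraped under the same target.)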
0 votes, 0 answers, 98 views
Apache Flink FileSink compaction extremely slow with many hot buckets/paths
I have a Flink ETL job that reads from ~13 Kafka topics and writes data into HDFS using a FileSink with compaction enabled.
Right now, we have around 40 different output paths (buckets), and roughly ...
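(For context: in Flink's Java DataStream API, compaction is switched on via FileSink.enableCompact(FileCompactStrategy, FileCompactor), and the strategy's setSizeThreshold, setNumCompactThreads, and enableCompactionOnCheckpoint settings are the usual knobs when many buckets are active at once. Which of these applies to this job is an assumption, since the sink definition is truncated above.)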
2 votes, 0 answers, 40 views
Count(*) query returns empty when using Tez, but works with MapReduce
I have a Hadoop + Hive setup using Docker; however, when I try to run count(*) on my table it returns an empty result when using Tez and the correct one when using MapReduce. The table is an external ...
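(A common first check, offered as an assumption rather than a diagnosis: Hive can answer count(*) from table statistics instead of scanning data, so running SET hive.compute.query.using.stats=false; before the query, or refreshing statistics with ANALYZE TABLE <table> COMPUTE STATISTICS;, helps separate a stale-statistics problem from a genuine Tez execution problem.)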
0 votes, 0 answers, 20 views
Hadoop upload data using balancer to evenly distribute data across all nodes
I have a 3-node Hadoop cluster (version 3.4.1) with JAVA_HOME pointing to version 8 on each node.
I want to evenly distribute the uploaded data across all nodes when I type the following:
hdfs ...
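(Worth noting as general HDFS behavior, not something stated in the question: hdfs balancer, e.g. hdfs balancer -threshold 10, redistributes existing blocks after the fact and does not change where new writes land. When the uploading client runs on a datanode, HDFS places the first replica on that local node, so uploads issued from a single cluster node will naturally skew toward it.)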
0 votes, 1 answer, 53 views
How are ResourceManager and NodeManager deployed in relation to NameNode and DataNode in Hadoop?
I'm currently learning Hadoop and am a bit confused about how the Hadoop Distributed File System (HDFS) and YARN components interact, especially in terms of deployment across machines.
Here’s what I ...
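(For orientation, the conventional deployment: the ResourceManager is a master daemon usually placed on a master node alongside or near the NameNode, while every worker machine runs both a DataNode and a NodeManager, so YARN can schedule containers next to the HDFS blocks they read.)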
0 votes, 0 answers, 174 views
MLflow does not upload images stored using mlflow.log_image()
I am working with mlflow==2.19.0 on Red Hat Enterprise Linux Server release 7.9 (Maipo). Everything works fine except the log_image method, which for some reason is converting parts of the string ...
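For comparison, a minimal sketch of the documented log_image usage; the array and artifact path below are made up, not taken from the question:

import mlflow
import numpy as np

# A dummy RGB image; mlflow.log_image accepts numpy arrays and PIL images.
image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)

with mlflow.start_run():
    # Stores the image under the run's artifacts at the given relative path.
    mlflow.log_image(image, "plots/example.png")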
1 vote, 0 answers, 95 views
Configure Apache Hop to use HDFS, accessible via Kerberos authentication, when creating transformation pipelines on files stored on HDFS
I would like to create file transformation pipelines by downloading input files from a remote HDFS and storing output files on the same remote HDFS. Kerberos is used to authenticate to my hadoop ...
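(As general context for Hadoop Kerberos clients, not Hop-specific: the client side typically needs hadoop.security.authentication=kerberos from the cluster's core-site.xml, a ticket obtained with kinit -kt user.keytab user@REALM or a keytab configured in the tool, and the cluster's core-site.xml/hdfs-site.xml visible to the application so the HDFS URIs resolve.)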
0 votes, 0 answers, 27 views
Java Hadoop client library that supports Basic Authentication
I have a Cloudera cluster behind a Knox gateway with basic authentication (username/password). I want to access HDFS (SWebHDFS) through SSL (https) using the Java Apache Hadoop client library (Apache Hadoop ...
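To my knowledge the Hadoop FileSystem client authenticates via Kerberos/SPNEGO rather than Basic auth, so as a point of comparison, here is what a raw WebHDFS REST call through Knox with Basic auth looks like; this is a Python sketch with placeholder host, credentials, and path, not the Java client the question asks about:

import requests
from requests.auth import HTTPBasicAuth

# Knox exposes WebHDFS under /gateway/<topology>/webhdfs/v1 by default.
url = "https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp"
resp = requests.get(
    url,
    params={"op": "LISTSTATUS"},
    auth=HTTPBasicAuth("username", "password"),
    verify="/path/to/ca-bundle.pem",  # the gateway's TLS certificate chain
)
resp.raise_for_status()
print(resp.json())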
0 votes, 2 answers, 260 views
Setting up a DBeaver 25.0.1 connection to a Delta Lake v2.4 Parquet table on Hadoop 3.3.4 filesystem
I am trying to create a new connection from DBeaver to a Delta Lake Parquet file located on the HDFS filesystem, which I successfully created with a Spark/Hadoop/Scala/io.delta application.
(...
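(One point of orientation, not from the question: DBeaver is a JDBC client, so it usually reaches Delta tables through a SQL endpoint such as a Spark Thrift Server or Hive JDBC URL rather than by opening Parquet files on HDFS directly.)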
0 votes, 1 answer, 82 views
Unable to stream data to Azure Blob using Flink job
I'm running a Flink job, and on my local machine I don't see any issue streaming the data to Azure Blob, but when I deploy to the dev environment I'm seeing an error in the console like Caused by: org....
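(For context, and as an assumption about this particular job: Flink writes to Azure Blob through the flink-azure-fs-hadoop filesystem plugin, which must sit under the plugins/ directory on every node of the target environment, with the account credentials, e.g. fs.azure.account.key.<account>.blob.core.windows.net, in the Flink configuration. A job that streams fine locally but fails on deploy is often missing one of these on the cluster.)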
0 votes, 1 answer, 69 views
How to copy files with spaces in filenames from Unix to HDFS without renaming or loops?
I have a large number of files (tens of thousands) in a Unix directory that I need to copy to Hadoop using the command:
hdfs dfs -put * /hdfs_folder/
However, some of these files have spaces in their ...
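One single-invocation approach, sketched in Python with a placeholder source directory: each filename is passed as its own argument with no shell re-splitting, and hdfs dfs -put accepts multiple sources before the destination, so names with spaces survive intact.

import subprocess
from pathlib import Path

# Placeholder source directory; collect regular files, spaces and all.
files = [str(p) for p in Path("/data/incoming").iterdir() if p.is_file()]

# No shell is involved, so embedded spaces are preserved. For tens of
# thousands of files the list may need chunking to stay under the OS
# argument-length limit.
subprocess.run(["hdfs", "dfs", "-put", *files, "/hdfs_folder/"], check=True)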
1 vote, 1 answer, 107 views
How to read/write parquet on remote HDFS with python/pyspark in VSCode?
In Jupyter notebooks I succeed in reading parquet files on HDFS thanks to sparkmagic.
The sparkmagic conf starts with:
{
"kernel_python_credentials" : {
"username": "admin"...
0 votes, 1 answer, 26 views
Strange hostnames of the HDFS nodes
Why does the Hadoop node have a nodename like iZib208xfvbhmyx1rha3gqZ on an Alicloud ECS?
[root@worker1 hadoop-3.4.1]# hdfs namenode -format
2025-01-23 10:13:46,887 INFO namenode.NameNode: STARTUP_MSG:
/*...
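(A likely explanation, offered as an assumption: names of the form iZ...Z are the default hostnames Alibaba Cloud assigns to ECS instances, derived from the instance ID. They can be replaced with hostnamectl set-hostname plus a matching /etc/hosts entry.)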
0 votes, 1 answer, 148 views
Spark overwrite table, getting data loss when terminated at insertion stage
Objective:
We need to read the table in a Spark application, transform the data, and rewrite the same table.
Scenario:
I am trying to overwrite an external non-partitioned table with Spark.
Since same data ...
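A common way to avoid losing the source when a job dies mid-insert is to materialize the transformed data elsewhere first and swap afterwards; a sketch with made-up table and path names:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("safe-overwrite").getOrCreate()

# Made-up table name; read and transform as usual.
df = spark.read.table("db.events")
transformed = df.filter("event_date >= '2024-01-01'")

# Write to a staging path first so the original table's files are never
# truncated while the job can still fail; the path is a placeholder.
transformed.write.mode("overwrite").parquet("/warehouse/staging/events")

# Only after the staging write succeeds, repoint the external table
# (e.g. ALTER TABLE ... SET LOCATION) or move the directory into place.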
0 votes, 0 answers, 28 views
HBase consistency model (within the same cluster)
Background:
HBase reads seem to fall under the 'strong consistency' model, as: All reads are served from the master where the data has already been committed. As a result, the clients seem to always ...
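(For context, standard HBase behavior rather than anything stated above: within one cluster each region is served by exactly one RegionServer, which is what makes default reads strongly consistent. Eventually-consistent reads only appear when read replicas are enabled and the client opts in, e.g. via Get.setConsistency(Consistency.TIMELINE) in the Java API.)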