8,254 questions
0 votes, 0 answers, 25 views
JMX Prometheus agent - alternating and inconsistent values issue
Setup explanation:
I am using the JMX Prometheus agent, which starts with the JVM as a VM argument in an HDFS setup.
Version: 0.20.0
There is a JMX HTTP port displaying the MBeans as JSON.
Problem:
When I view ...
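(For reference, the agent is normally attached with a flag of the form java -javaagent:/path/to/jmx_prometheus_javaagent-0.20.0.jar=<port>:<config.yaml>; the port and config path here are placeholders, not taken from the question. When scrapes alternate between two sets of values, it is worth checking whether more than one JVM is being scraped under the same target.)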
0 votes, 0 answers, 98 views
Apache Flink FileSink compaction extremely slow with many hot buckets/paths
I have a Flink ETL job that reads from ~13 Kafka topics and writes data into HDFS using a FileSink with compaction enabled.
Right now, we have around 40 different output paths (buckets), and roughly ...
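(For context: in Flink's Java DataStream API, compaction is switched on via FileSink.enableCompact(FileCompactStrategy, FileCompactor), and the strategy's setSizeThreshold, setNumCompactThreads, and enableCompactionOnCheckpoint settings are the usual knobs when many buckets are active at once. Which of these applies to this job is an assumption, since the sink definition is truncated above.)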
2 votes, 0 answers, 40 views
Count(*) query returns empty when using Tez, but works with MapReduce
I have a Hadoop + Hive setup using Docker; however, when I try to run count(*) on my table it returns an empty result when using Tez and the correct one when using MapReduce. The table is an external ...
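(A common first check, offered as an assumption rather than a diagnosis: Hive can answer count(*) from table statistics instead of scanning data, so running SET hive.compute.query.using.stats=false; before the query, or refreshing statistics with ANALYZE TABLE <table> COMPUTE STATISTICS;, helps separate a stale-statistics problem from a genuine Tez execution problem.)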
0 votes, 0 answers, 20 views
Hadoop upload data using balancer to evenly distribute data across all nodes
I have a 3-node Hadoop cluster (version 3.4.1) with JAVA_HOME pointing to version 8 on each node.
I want to evenly distribute the uploaded data across all nodes when I type the following:
hdfs ...
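(Worth noting as general HDFS behavior, not something stated in the question: hdfs balancer, e.g. hdfs balancer -threshold 10, redistributes existing blocks after the fact and does not change where new writes land. When the uploading client runs on a datanode, HDFS places the first replica on that local node, so uploads issued from a single cluster node will naturally skew toward it.)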
0 votes, 1 answer, 53 views
How are ResourceManager and NodeManager deployed in relation to NameNode and DataNode in Hadoop?
I'm currently learning Hadoop and am a bit confused about how the Hadoop Distributed File System (HDFS) and YARN components interact, especially in terms of deployment across machines.
Here’s what I ...
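(For orientation, the conventional deployment: the ResourceManager is a master daemon usually placed on a master node alongside or near the NameNode, while every worker machine runs both a DataNode and a NodeManager, so YARN can schedule containers next to the HDFS blocks they read.)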
0 votes, 0 answers, 174 views
MLflow does not upload images stored using mlflow.log_image()
I am working with mlflow==2.19.0 on Red Hat Enterprise Linux Server release 7.9 (Maipo). Everything works fine except the log_image method, which for some reason is converting parts of the string ...
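For comparison, a minimal sketch of the documented log_image usage; the array and artifact path below are made up, not taken from the question:

import mlflow
import numpy as np

# A dummy RGB image; mlflow.log_image accepts numpy arrays and PIL images.
image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)

with mlflow.start_run():
    # Stores the image under the run's artifacts at the given relative path.
    mlflow.log_image(image, "plots/example.png")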
1 vote, 0 answers, 95 views
Configure Apache Hop to use HDFS, accessible via Kerberos authentication, when creating transformation pipelines on files stored on HDFS
I would like to create file transformation pipelines by downloading input files from a remote HDFS and storing output files on the same remote HDFS. Kerberos is used to authenticate to my hadoop ...
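(As general context for Hadoop Kerberos clients, not Hop-specific: the client side typically needs hadoop.security.authentication=kerberos from the cluster's core-site.xml, a ticket obtained with kinit -kt user.keytab user@REALM or a keytab configured in the tool, and the cluster's core-site.xml/hdfs-site.xml visible to the application so the HDFS URIs resolve.)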
0 votes, 0 answers, 27 views
Java Hadoop client library that supports Basic Authentication
I have a Cloudera cluster behind a Knox gateway with basic authentication (username/password). I want to access HDFS (SWebHDFS) through SSL (https) using the Java Apache Hadoop client library (Apache Hadoop ...
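To my knowledge the Hadoop FileSystem client authenticates via Kerberos/SPNEGO rather than Basic auth, so as a point of comparison, here is what a raw WebHDFS REST call through Knox with Basic auth looks like; this is a Python sketch with placeholder host, credentials, and path, not the Java client the question asks about:

import requests
from requests.auth import HTTPBasicAuth

# Knox exposes WebHDFS under /gateway/<topology>/webhdfs/v1 by default.
url = "https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp"
resp = requests.get(
    url,
    params={"op": "LISTSTATUS"},
    auth=HTTPBasicAuth("username", "password"),
    verify="/path/to/ca-bundle.pem",  # the gateway's TLS certificate chain
)
resp.raise_for_status()
print(resp.json())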
0 votes, 2 answers, 260 views
Setting up a DBeaver 25.0.1 connection to a Delta Lake v2.4 Parquet table on Hadoop 3.3.4 filesystem
I am trying to create a new connection from DBeaver to a Delta Lake Parquet file located on the HDFS filesystem, which I successfully created with a Spark/Hadoop/Scala/io.delta application.
(...
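(One point of orientation, not from the question: DBeaver is a JDBC client, so it usually reaches Delta tables through a SQL endpoint such as a Spark Thrift Server or Hive JDBC URL rather than by opening Parquet files on HDFS directly.)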
0 votes, 1 answer, 82 views
Unable to stream data to Azure Blob using Flink job
I'm running a Flink job, and on my local machine I don't see any issue streaming the data to Azure Blob, but when I deploy to the dev environment I'm seeing an error in the console like Caused by: org....
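(For context, and as an assumption about this particular job: Flink writes to Azure Blob through the flink-azure-fs-hadoop filesystem plugin, which must sit under the plugins/ directory on every node of the target environment, with the account credentials, e.g. fs.azure.account.key.<account>.blob.core.windows.net, in the Flink configuration. A job that streams fine locally but fails on deploy is often missing one of these on the cluster.)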
0 votes, 1 answer, 69 views
How to copy files with spaces in filenames from Unix to HDFS without renaming or loops?
I have a large number of files (tens of thousands) in a Unix directory that I need to copy to Hadoop using the command:
hdfs dfs -put * /hdfs_folder/
However, some of these files have spaces in their ...
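One single-invocation approach, sketched in Python with a placeholder source directory: each filename is passed as its own argument with no shell re-splitting, and hdfs dfs -put accepts multiple sources before the destination, so names with spaces survive intact.

import subprocess
from pathlib import Path

# Placeholder source directory; collect regular files, spaces and all.
files = [str(p) for p in Path("/data/incoming").iterdir() if p.is_file()]

# No shell is involved, so embedded spaces are preserved. For tens of
# thousands of files the list may need chunking to stay under the OS
# argument-length limit.
subprocess.run(["hdfs", "dfs", "-put", *files, "/hdfs_folder/"], check=True)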
1 vote, 1 answer, 107 views
How to read/write parquet on remote HDFS with python/pyspark in VSCode?
In Jupyter notebooks I succeed in reading parquet files on HDFS thanks to sparkmagic.
The sparkmagic conf starts with:
{
"kernel_python_credentials" : {
"username": "admin"...
0 votes, 1 answer, 26 views
Strange hostnames of the HDFS nodes
Why does the Hadoop node have a nodename like iZib208xfvbhmyx1rha3gqZ on an Alicloud ECS?
[root@worker1 hadoop-3.4.1]# hdfs namenode -format
2025-01-23 10:13:46,887 INFO namenode.NameNode: STARTUP_MSG:
/*...
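(A likely explanation, offered as an assumption: names of the form iZ...Z are the default hostnames Alibaba Cloud assigns to ECS instances, derived from the instance ID. They can be replaced with hostnamectl set-hostname plus a matching /etc/hosts entry.)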
0 votes, 1 answer, 148 views
Spark overwrite table, getting data loss when terminated at insertion stage
Objective:
We need to read the table in a Spark application, transform the data, and rewrite the same table.
Scenario:
I am trying to overwrite an external non-partitioned table with Spark.
Since same data ...
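A common way to avoid losing the source when a job dies mid-insert is to materialize the transformed data elsewhere first and swap afterwards; a sketch with made-up table and path names:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("safe-overwrite").getOrCreate()

# Made-up table name; read and transform as usual.
df = spark.read.table("db.events")
transformed = df.filter("event_date >= '2024-01-01'")

# Write to a staging path first so the original table's files are never
# truncated while the job can still fail; the path is a placeholder.
transformed.write.mode("overwrite").parquet("/warehouse/staging/events")

# Only after the staging write succeeds, repoint the external table
# (e.g. ALTER TABLE ... SET LOCATION) or move the directory into place.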
0 votes, 0 answers, 28 views
HBase consistency model (within the same cluster)
Background:
HBase reads seem to fall under the 'strong consistency' model, as: All reads are served from the master where the data has already been committed. As a result, the clients seem to always ...
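(For context, standard HBase behavior rather than anything stated above: within one cluster each region is served by exactly one RegionServer, which is what makes default reads strongly consistent. Eventually-consistent reads only appear when read replicas are enabled and the client opts in, e.g. via Get.setConsistency(Consistency.TIMELINE) in the Java API.)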