Stack Overflow
-6 votes
0 answers
37 views

I’m evaluating Apache Spark Structured Streaming for a large-scale dynamic contact segmentation use case and would like guidance on feasibility and recommended design patterns. Scenario We have: ~35 ...
Advice
0 votes
0 replies
36 views

I’ve been thinking about modular production for "transparent factories" and data center outer frames/shells. Do you think this approach actually improves data center construction, and how much ...
Best practices
0 votes
5 replies
97 views

I have been working as a Data Engineer and ran into this issue. I came across a use case where I have a view (let's call it inputView) that is created by reading data from some source. Now somewhere ...
6 votes
0 answers
129 views

I have two issues (note that this code was generated in Google Colab): Issue 1 I want to stream the droid dataset, which is almost 2 TB in size. I want to use only the data that matches my filter conditions. ...
1 vote
3 answers
205 views

I have a pandas DataFrame (df) with two columns (namely Tuple and Set) and approximately 100,000,000 entries. The Tuple column data is a string of exactly 9 characters. The Set column data is an ...
1 vote
1 answer
51 views

I am using the code below to create a Dataproc Spark session to run a job: from google.cloud.dataproc_spark_connect import DataprocSparkSession from google.cloud.dataproc_v1 import Session session = Session(...
0 votes
0 answers
85 views

I use Spark + Hudi to write data into S3. I was writing data in bulk_insert mode, which caused many small Parquet files in the Hudi table. Then I tried to schedule clustering on the Hudi table: ...
0 votes
1 answer
54 views

I am trying to connect Power BI Desktop to our Apache Doris database (which is the VeloDB-Doris distribution). I am using the standard MySQL data source connector in Power BI, as Doris is compatible ...
1 vote
1 answer
83 views

I want to break down a column that contains several different features, so that a new column is built for each feature, using the feature name as the column name. I already tried with: data = {'...
0 votes
1 answer
79 views

GeoWave, GeoMesa and S2 Geometry offer a Hilbert index that seems suitable for a quadrilateral grid, with a unique 64-bit cell_ID per cell, for all grid levels... However, I don't see how to use ...
0 votes
1 answer
55 views

I’m running an Apache Doris 2.1.7 cluster (3 FEs + 6 BEs) on CentOS 7. After issuing DROP TABLE big_fact, the table disappears from the information_schema, but the underlying tablets remain on every ...
0 votes
0 answers
18 views

I encountered an error while setting up and using Doris during unit testing: Error starting FE or unit test locally Cannot find external parser table action_table.dat I searched the community and ...
1 vote
1 answer
144 views

I encountered an issue while running an Apache Doris FE cluster, where the fe.log file shows the following error: 2024-01-09 14:46:23,840 WARN (UNKNOWN fe_f78cf069_b094_4d9d_ac9c_ddc521dd494d(-1)|1) [...
0 votes
0 answers
57 views

We are intermittently encountering a query failure on our Apache Doris cluster. The query fails completely with the following error message: Query error: [E-230]missed_versions is empty This error ...
0 votes
0 answers
73 views

During the process of setting up and using Doris, I encountered a query error: Failed to get scan range, no queryable replica found in tablet: xxxx This error seems to be a scanning error for the ...
