338 questions
0 votes, 1 answer, 121 views
Partitioned Table - Query filtering on partition field
I have a large table which I want to move to a partitioned model. I created the partitioned table with the same fields as the original, partitioned by range on a particular timestamp field. I then ...
0 votes, 0 answers, 65 views
Does the new S3 bucket quota change AWS data partitioning best practices for multi-tenant systems?
I am trying to find updated information on AWS best practices for multi-tenant data partitioning in S3.
From what I know and what I studied for when I did my AWS Solutions ...
1 vote, 1 answer, 140 views
Partition pruning in BigQuery with incremental model
I have a BigQuery table where a PubSub subscription inserts new web events every second.
This table is partitioned by:
column: derived_tstamp
type: timestamp
granularity: daily
To create a specific ...
2 votes, 1 answer, 374 views
Postgres partitioning with a primary key
I have a big database that represents a graph with a ton of data in it that is constantly growing. The database looks something like:
CREATE TABLE node (
id BIGSERIAL PRIMARY KEY,
created_at ...
0 votes, 1 answer, 445 views
Is changing date partitioning granularity a breaking change?
In BigQuery, suppose I create a table and partition it by a date column "mydate" with "DAY" granularity.
Using dbt, this can be done with:
partition_by = {
"...
-1 votes, 2 answers, 219 views
What should be the partition key and sort key of my DynamoDB table?
I am about to create a DynamoDB table which has the columns below, and each row will have unique data:
user id   profile Id   attribute1
1001      9001         x
1002      9002         x
The table will have 1M records, which means unique ...
0 votes, 2 answers, 116 views
Perform determinations within a data partition
I have a dataset as below from which I would like to draw some inferences.
Id
Nbr
Dt
Status
Cont1Sta1
DateLagInDays
Recurrence
1
2
2023-10-01
1
1
2
2023-11-02
0
1
2
2023-12-13
0
1
3
2023-10-01
0
1
3
2023-...
0 votes, 1 answer, 260 views
Postgres Data Partition in Rails 7.0.8
We have a situation in the database where we have to make the schema of an entire table data-partitioned by tenant id.
Using:
create_table "billing_schedule_lines_old", id: :...
1 vote, 1 answer, 116 views
Is it possible in PostgreSQL to prevent changes to files whose data has not actually changed?
Problem: We have a table "test" that consists of sections (partitions) "test_202309", "test_202310" and "test_202311". The sections store data for September 2023, October 2023 and November 2023.
I am using the command "...
1 vote, 2 answers, 2k views
What is hybrid-columnar storage?
Snowflake stores data using a hybrid-columnar storage method. I understand what columnar storage is and its benefits, but what does the hybrid mean? Is this simply referring to Snowflake accessing ...
0 votes, 1 answer, 391 views
How does partition switching in an Azure dedicated SQL pool work efficiently when data is sharded across 60 distributions?
A table contains xyz columns, with 3 years of data.
Index = clustered columnstore index
Hash distribution column = product.
Partition column = date.
As the new year's data arrives ...
0 votes, 0 answers, 23 views
Top or Sample N of subgroups in Teradata in a large data set ("No Spool Space" error)
I've tried several routes to get 10 records from each subset of a large dataset, and the best I can do is to query each subgroup explicitly.
My first attempt from the (Teradata ...
0 votes, 1 answer, 386 views
Splitting data into training, test and validation sets based on a dependent variable for machine learning
I am trying to split my data into training, test and validation groups. I have 2 groups, control and TP, and within these groups I have a secondary variable called Bio with numbers in ...
0 votes, 1 answer, 183 views
Creating a partitioned version of a BigQuery table scheduled for daily updates
I am faced with the following situation: among the BigQuery datasets which I am handling there is a rather large table - let us call it lt - that undergoes daily updates (more specifically, this table ...
0 votes, 1 answer, 2k views
PySpark: querying Hudi partitioned table
I'm following the Apache Hudi documentation to write and read a Hudi table. Here's the code I'm using to create and save a PySpark DataFrame to Azure Data Lake Storage Gen2:
tableName = "my_hudi_table"...