165 questions
- Bountied 0
- Unanswered
- Frequent
- Score
- Trending
- Week
- Month
- Unanswered (my tags)
6
votes
2
answers
202
views
How to drop duplicate values when merging dataframes
I have a DataFrame that I want to merge and drop only duplicates values based on column name and row. For example, key_x and key_y has the
same values in the same row in row 0,3,10,12,15.
My DataFrame
...
-1
votes
1
answer
743
views
How to get rid of rowl level duplicates?
Using MERGE INTO, but getting rid of all duplicate rows whereas my expectation was it should behave like df.dropDuplicates()
Using Below MERGE INTO it's deleting all rows of duplicate which is leading ...
3
votes
4
answers
190
views
Change pointer adress inside function in C
I'm trying to make a program that deletes extra white spaces from a string. It should keep one space but delete any extra.
I wrote this code, which works up until the point I have to change the passed ...
0
votes
0
answers
83
views
pyspark drop_duplicates() unexpectedly increases count
I am getting somewhat unexpected results with df.drop_duplicates().
df2 = df.dropDuplicates()
print(df2.count())
# prints 424527
print(df.count())
# prints 424510
I do not understand why count is ...
2
votes
1
answer
143
views
Dropping duplicates by column in PySpark
I have a PySpark dataframe like this but with a lot more data:
user_id
event_date
123
'2024-01-01 14:45:12.00'
123
'2024-01-02 14:45:12.00'
456
'2024-01-01 14:45:12.00'
456
'2024-03-01 14:45:12.00'
I ...
-1
votes
1
answer
55
views
Why does drop_duplicates in_place = True not work in this case? [duplicate]
Sample data:
data = [[1, '[email protected]'], [2, '[email protected]'], [3, '[email protected]']]
person = pd.DataFrame(data, columns=['id', 'email']).astype({'id':'int64', 'email':'object'})
...
0
votes
1
answer
39
views
How to use GROUP_CONCAT where only the sequential duplicates get dropped
My trafficlog table looks like this:
intLogID
strSessionID
strPage
1
e3a8240b39
./
2
e3a8240b39
./about
3
e3a8240b39
./contact
4
5accab7da9
./
5
e3a8240b39
./contact
6
e3a8240b39
./about
7
71ee2ea4fe
....
-1
votes
4
answers
84
views
removing values in pandas according to condition
In pandas dataframe I want to remove the row with duplicate employee id but accoridng to the condition where the salary is null
my original data is
EEID Name gender salary
0 EMP01 ayushi ...
2
votes
1
answer
39
views
Remove duplicates from a list of tuples that involve matrices
I am trying to remove the duplicates from a list of tuples that contains matrices. The tuples are composed of two items: a number and a corresponding matrix. To be considered a ‘duplicate’, both the ...
0
votes
0
answers
19
views
Removing rows from a dataframe based on a condition in Pandas but keep duplicates [duplicate]
This is a very unique requirement and I need some help. This is not removing duplicates from a csv.
I have rows of data and I want to remove rows based on a condition like below example.
Input file:
...
0
votes
1
answer
101
views
Design MS Access SQL Query to mimic 'Remove Duplicate' Excel function
In MS Access 2016 I want to run a query that is synonymous with the Excel 'Remove Duplicates' function.
The excel query will remove duplicates and always leave one record.
For example, if we have 4 ...
-1
votes
1
answer
40
views
Pyspark drop duplicates keep the non null row
Hi I have a dataset like this :
id
name
1
A
1
null
2
A
3
B
4
null
4
B
5
A
6
null
And I want to remove duplicates row and keep the row where the name is not null
This is the expected output
id
name
1
A
...
0
votes
1
answer
71
views
Pyspark drop duplicates when a column is null
I have a dataset like this :
id
name
1
A
2
null
2
B
3
C
3
null
4
null
In case of duplicate IDs, I want to keep the only value with a not null name
In this example, I want to get this table :
id
name
1
...
1
vote
1
answer
168
views
PostgreSQL: Delete duplicate rows based on matching md5 hashes
I have a table which is populated by data scraped from the web. After adding the new data to the table I want to delete duplicates if no changes have been made to specific columns.
I have tried the ...
0
votes
2
answers
128
views
How to remove duplicates using time difference with linq
I have an IEnumerable of an item class defined like this:
class Checkup
{
public Guid SubjectGuid { get; set; }
public Guid DoctorGuid { get; set; }
public DateTime Date {get; set;}
}
For ...