I created a text file using two queries, one LDAP and one SQL; the results of both queries are stored in the same file. The file looks like this:
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
Because a user can be in both databases, I need to delete the duplicate entries using bash. How can I do that?
1 Answer
If you don't mind your file ending up sorted, sort it and filter it, either with
sort -u file
if your sort supports it, or with
sort file | uniq
if not; either way you'll get the sorted list of unique email addresses on standard output.
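For example, here is a minimal sketch assuming a file named file that contains a duplicated address (the addresses are purely illustrative):
$ cat file
bob@example.com
alice@example.com
bob@example.com
$ sort -u file
alice@example.com
bob@example.com
Note that the output is sorted, so the original order (bob first) is lost.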
If you want to keep the addresses in their original order, use awk:
awk '!(count[0ドル]++)' file
- sort -u doesn't report unique lines but the first of the lines that sort the same in the current locale. – cuonglm, Jun 11, 2015 at 8:43
- @cuonglm Indeed, but is there a case where two different email addresses would have the same collation? – Stephen Kitt, Jun 11, 2015 at 8:51
- @StephenKitt: 1@example.com and 2@example.com in the en_US.utf8 locale. – cuonglm, Jun 11, 2015 at 9:18
- @cuonglm: LC_ALL=en_US.UTF-8; (echo 1@example.com; echo 2@example.com) | sort | uniq also merges both lines, so only the awk solution is viable in that case. – Stephen Kitt, Jun 11, 2015 at 18:23
- @StephenKitt: It seems that you are using GNU uniq; it's not POSIX compliant in this case, you must use uniq -i. – cuonglm, Jun 12, 2015 at 1:09