LINUX CLASSES - DATA MANIPULATION

How Can I Eliminate Duplicates in a Linux File?

The uniq command reads the input file and compares adjacent lines. Any line that is identical to the one before it is discarded; in other words, adjacent duplicates are removed, leaving only unique lines in the output. (uniq writes its results to standard output and does not change the input file.) Let's say you're a publisher with an inventory of all your books in the my.books file shown here:

Atopic Dermatitis for Dummies
Atopic Dermatitis for Dummies
Chronic Rhinitis Unleashed
Chronic Rhinitis Unleashed
Chronic Rhinitis Unleashed
Learn Nasal Endoscopy in 21 Days

To remove all the duplicates from the list of books, use this command:

uniq my.books
Atopic Dermatitis for Dummies
Chronic Rhinitis Unleashed
Learn Nasal Endoscopy in 21 Days

If you want to print only the book titles that are not duplicated (to find out which books you have one copy of), add the -u flag, like this:

uniq -u my.books
Learn Nasal Endoscopy in 21 Days

Conversely, you might want to exclude the titles that appear only once. If so, add the -d flag, like this:

uniq -d my.books
Atopic Dermatitis for Dummies
Chronic Rhinitis Unleashed

Now let's take inventory. To summarize the list of books and add a count of the number of times each one appears in the list, add the -c flag, like this:

uniq -c my.books
2 Atopic Dermatitis for Dummies
3 Chronic Rhinitis Unleashed
1 Learn Nasal Endoscopy in 21 Days
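
A handy extension of this, built from the same commands (not shown in the listing above): sort the file first so uniq -c counts all occurrences, then sort the counts numerically in reverse to rank the titles by how often they appear:

sort my.books | uniq -c | sort -rn
3 Chronic Rhinitis Unleashed
2 Atopic Dermatitis for Dummies
1 Learn Nasal Endoscopy in 21 Days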

Note that the uniq command does not sort the input file, and it detects only adjacent duplicates. If the duplicates in your file are scattered, use the sort command to prepare the data for uniq in advance. (See the end of this section for an example.)

Here's a recap of the flags you can use with the uniq command:

-u Print only lines that appear once in the input file.

-d Print only lines that appear more than once in the input file.

-c Precede each output line with a count of the number of times it was found.
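
As promised, here's an example of sort and uniq working together. If the duplicate titles in my.books were scattered instead of adjacent, you would sort the file first and pipe the result to uniq:

sort my.books | uniq
Atopic Dermatitis for Dummies
Chronic Rhinitis Unleashed
Learn Nasal Endoscopy in 21 Days

The sort command's own -u flag combines both steps, so sort -u my.books produces the same output with a single command.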

Previous Lesson: Sorting Data
Next Lesson: Selecting Columns




Comments - most recent first
(Please feel free to answer questions posted by others!)

Samit (07 Nov 2014, 09:39)
Hi Bob,

I need to eliminate duplicate records in a file based on their date. For example, in the file below I need to keep only one record per unique value in column 1: whichever record occurred most recently (by latest date).

What command would be best suited?


112 absdsds .&checktime(2014,03,01,':') 03:49
112 absdsds .&checktime(2014,03,01,':') 03:50
113 absdsds .&checktime(2014,03,01,':') 03:32
113 absdsds .&checktime(2014,03,01,':') 03:32
112 absdsds .&checktime(2014,03,01,':') 03:49
112 absdsds .&checktime(2014,06,01,':') 03:49
112 absdsds .&checktime(2014,04,01,':') 03:49
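
One approach, sketched with sort alone (assuming GNU sort; file.txt here is a stand-in for the actual filename): reverse-sort the whole file so that the newest entry for each ID comes first (the yyyy,mm,dd date format sorts correctly as plain text), then do a stable, unique sort on column 1, which keeps the first line it sees for each key:

$ sort -r file.txt | sort -s -u -k1,1
112 absdsds .&checktime(2014,06,01,':') 03:49
113 absdsds .&checktime(2014,03,01,':') 03:32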
AdelaideSEO (02 Apr 2013, 08:27)
Hey thanks for the Tut.
Is it possible, and how, to deduplicate a list of items in a .txt file without sorting them? They are already in the correct order, but I know there are duplicates. Cheers, Steve
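
Since uniq only compares adjacent lines, it can't remove scattered duplicates on its own. A common order-preserving one-liner uses awk instead (a sketch; list.txt is a stand-in filename): it prints each line only the first time it appears, leaving the original order intact.

$ awk '!seen[$0]++' list.txt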

Andrew (02 Jan 2013, 01:18)
Hi, I admire and love the content of this tutorial. But did you change your site a little bit? I cannot reach or read your articles in the new style with the water in the background; only the links show up, and the main article stays invisible.
MikeQ (14 Nov 2012, 19:32)
This is nice and all, but what if you don't want to sort the data because it messes things up? Why can't I use uniq with just cat or something?
Tevera (15 Feb 2012, 06:57)
simple and gud
Saif (06 Jan 2012, 18:51)
Short & Sweet tutorial.. Thanks Author..
beterlatethannotatall (12 Dec 2011, 14:38)
Nisa, your guess/assumption is correct. It eliminates dupes that are consecutive. For example, 111223334444111555 would become 123415; sorting first does fix that.

Biko, all of these commands work on stdin and stdout unless directed to do otherwise. So when you use a file as input, stdout is still where the modified data will end up unless you redirect stdout somewhere to save the output. One warning, though: don't redirect output back into the same file you're reading (cat nums | sort | uniq > nums), because the shell truncates nums before cat ever reads it, and you'd wind up with an empty file. Redirect to a new file instead. For example, with my sample digits above stored one per line in a file nums:

$ printf '%s\n' 1 1 1 2 2 3 3 3 4 4 4 4 1 1 1 5 5 5 > nums
$ sort nums | uniq > nums.sorted
$ mv nums.sorted nums
$

No output between prompts is what you should get; the > nums.sorted is the redirect for stdout, putting the output into a file for saving.

$ cat nums
1
2
3
4
5
$

This is what should be in the nums file now. Check.
Rockin (31 May 2011, 00:27)
awesome !
fiyas (16 Mar 2011, 01:11)
Superb Tutorial.... Thanks
Nisa (22 Sep 2010, 07:11)
@Biko Only consecutive lines are tested for duplicates. You will have to sort the file first so that the duplicate lines are adjacent. Don't forget to save the sorted output (sort -o nums nums can safely write back to the same file; a plain > redirect to the input file would empty it).
biko (29 Jul 2010, 09:23)
it just prints the list out but doesn't merge or eliminate the duplicates, and when I use the -c flag it just counts them all as 1...
Bob Rankin (28 Jul 2010, 08:55)
@biko - What response do you get? I sure hope it's not "uniq is not recognized as an internal or external command, operable program or batch file." :-)
biko (28 Jul 2010, 07:46)
Hello, I tried the uniq command and it did not work, either alone or with any of the flags. Why?
umar ayaz (27 Jun 2010, 15:58)
Excellent simple tutorial about cut
Dennis (16 Mar 2010, 09:21)
eureka! it works great with piping :))
Dennis (16 Mar 2010, 08:49)
does uniq work for consecutive lines only?

(that means we need to sort first, doesn't it?)

Bob Rankin (05 Mar 2010, 09:36)
Did you sort first?
ramya (05 Mar 2010, 06:15)
I cannot remove the duplicate entries using the uniq command. Why?

I welcome your comments. However... I am puzzled by many people who say "Please send me the Linux tutorial." This website *is* your Linux Tutorial! Read everything here, learn all you can, ask questions if you like. But don't ask me to send what you already have. :-)

