Finding the first non-repeated character in a string

Question 1

Given a string, find the first non-repeated character in it.

E.g., "yellow" should return "y"

There are several solutions for this in other languages, but I haven't seen one written in Perl.

Can this be done in a more Perl-ish way or generally in an even shorter way?

use strict;
use warnings "all";
while (<DATA>)
{
    while (m,(.),g)
    {
        my $c = $1;
        if (s,$1,,g < 2)
        {
            print "$c\n";
            last;
        }
    }
}
__DATA__
yellow
tooth

Question 2

What should be output for string aaabbc?

Question 3

@Tushar the output would be "c".

Question 4

One way is to replace all repeating characters using regex (.)\1* and get the first character of the resulting string.

Question 5

@yuri: \1 must be used in the regex part, but in the replacement part you have to use $1.

Question 6

perl -E '$_ = "aabbbcd"; s/(.)\1+//g; /(.)/ && say $1'

Question 7

Don't reinvent the wheel. There is a function, singleton, in the List::MoreUtils package that does the job:

#!/usr/bin/perl
use Modern::Perl;
use List::MoreUtils qw(singleton);
while (<DATA>) {
    chomp; # don't forget this; it removes the line break
    # split breaks the string into characters
    # singleton keeps the characters that appear only once
    # $first receives the first of those characters
    my ($first) = singleton split //, $_;
    say $first;
}
__DATA__
yellow
tooth

Output:

y
h

Question 8

This is a nice solution but what if you work in an environment where you can't install extra dependencies?

Question 9

@yuri: Inspect the module and copy the code of the function.
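
If copying code out of the module is not appealing, the same idea fits in a few lines of core Perl. The following is only a minimal sketch, not the List::MoreUtils implementation, and the sub name first_unique_char is made up for illustration. It tallies every character in a hash, then walks the characters in their original order and returns the first one whose count is 1.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): returns the first character
# that occurs exactly once in the string, or undef if there is none.
sub first_unique_char {
    my ($string) = @_;
    my @chars = split //, $string;

    my %count;
    $count{$_}++ for @chars;        # first pass: tally every character

    for my $char (@chars) {         # second pass: keep the original order
        return $char if $count{$char} == 1;
    }
    return;                         # every character repeats
}

for my $word (qw(yellow tooth aaabbc)) {
    my $first = first_unique_char($word);
    print "$word: ", defined $first ? $first : '(none)', "\n";
}
```

Because it never builds a regex from the input, this version is not affected by metacharacters in the string.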

Question 10

The previous answer is great because it poses an alternate solution which leverages existing code. The advantages of using CPAN modules are that the code tends to be:

well-documented
well-tested
of known coverage

But, there is also value in reviewing the code you posted.

Overview

The code is already quite "Perl-ish". I like how you break out of the loop early, as soon as you find a character that is not repeated; that is efficient.

Fatal

It is great that you used strict and warnings.

My preference is to use a very strict version of warnings:

use warnings FATAL => 'all';

In my experience, the warnings have always pointed to a bug in my code. The issue is that, in some common usage scenarios, it is too easy to miss the warning messages unless you are looking for them. They can be hard to spot even if your code generates a small amount of output, not to mention anything that scrolls off the screen. This option will kill your program dead so that there is no way to miss the warnings.
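
As a rough illustration of what FATAL changes, the sketch below triggers an "uninitialized value" warning on purpose. The helper name label is hypothetical. Under FATAL => 'all' the warning is raised as an exception, so eval can catch it; under plain use warnings the same code would only write a message to STDERR and carry on.

```perl
use strict;
use warnings FATAL => 'all';

# Hypothetical helper that concatenates its argument; passing undef
# triggers an "uninitialized value" warning.
sub label {
    my ($name) = @_;
    return "name: " . $name;
}

# With FATAL warnings the warning is thrown as an exception, so eval
# catches it here instead of the message scrolling past on STDERR.
my $result = eval { label(undef) };
print defined $result ? "ok: $result\n" : "died: $@";
```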

chomp

It would be good to use chomp in the outer while loop to remove the newline character. While this will not alter the behavior of the code, it more explicitly conveys the intent because you don't really want the newline character to be part of your analysis.

Regex

It is much more common to use the // regular expression delimiters than the ,, delimiters that you used. I think most people would find the code easier to understand with //. There is nothing wrong with your code; this is merely a style issue.

Naming

It is great that you immediately gave a name to the special regex match variable $1:

while (m,(.),g)
{
    my $c = $1;

This is widely considered a good coding practice. However, the variable name $c is not very descriptive in this context. $char would be better:

    my $char = $1;

Once you set this variable, you should no longer use $1. Change:

    if (s,$1,,g < 2)

to:

 if (s,$char,,g < 2)

quotemeta

If your string can contain any character, not just letters, then the code does not work if the character is a regular expression metacharacter, such as the period (.). For example, try the code with this input string:

.point

The code prints nothing for that, but it should print .

quotemeta can be used to support metacharacters:

    my $char = quotemeta $1;
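
One caveat with storing the result of quotemeta: the variable then holds the escaped form, so printing it shows \. rather than . for the input above. A possible variation, sketched here on the assumption that the raw character is what should be printed, keeps $1 unescaped and quotes it only inside the pattern with \Q...\E. The sub name first_non_repeated is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop from the question.
sub first_non_repeated {
    my ($string) = @_;
    local $_ = $string;             # work on a copy; the loop edits $_
    while (/(.)/g) {
        my $char = $1;              # raw character, safe to print
        # \Q...\E escapes metacharacters for the pattern only.
        return $char if s/\Q$char\E//g < 2;
    }
    return;                         # no character occurs exactly once
}

my $result = first_non_repeated('.point');
print "$result\n";                  # prints "."
```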

Documentation

You should either add a comment near the top of your code to describe its purpose, or use plain old documentation (POD) and get manpage-like help with perldoc.
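
A small POD block might look like the sketch below. The script name first_char.pl and the wording are placeholders, not anything from the original post. With this in the file, running perldoc on the script would show the text as a manpage-style help page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

=head1 NAME

first_char.pl - print the first non-repeated character of each string

=head1 SYNOPSIS

    perl first_char.pl

=head1 DESCRIPTION

For each string listed after the __DATA__ marker, prints the first
character that occurs exactly once in that string.

=cut

# The loop from the question would follow here; Perl skips everything
# between the first POD directive and =cut when it runs the script.
```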

Function

The DATA block is great for creating self-contained code like this. However, you really want to create a sub, which makes your code easier to reuse and test. And adding tests gives you greater confidence that the code works as intended.
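
As one possible shape for this, the sketch below moves the loop into a sub and checks it with the core Test::More module. The sub name first_single_char and the test cases are illustrative choices, not part of the original code. It works on a lexical copy of the argument rather than on $_, so the caller's string is left untouched.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical name; returns the first character that occurs once, or undef.
sub first_single_char {
    my ($text) = @_;
    while ($text =~ /(.)/g) {
        my $char = $1;
        # Remove every copy of $char; a count of 1 means it was unique.
        return $char if ($text =~ s/\Q$char\E//g) < 2;
    }
    return;
}

is(first_single_char('yellow'), 'y', 'yellow');
is(first_single_char('tooth'),  'h', 'tooth');
is(first_single_char('llama'),  'm', 'llama');
# scalar() keeps the empty return from collapsing the argument list.
is(scalar first_single_char('aabb'), undef, 'no unique character');
done_testing;
```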

Layout

I prefer "cuddled" braces, where the opening brace is on the same line as the code, instead of on its own line. This saves on valuable vertical space. For example:

while (<DATA>) {

Here is new code with many of the suggestions above:

use strict;
use warnings FATAL => 'all';
while (<DATA>) {
    chomp;
    while (/(.)/g) {
        my $char = quotemeta $1;
        if (s/$char//g < 2) {
            print "$char\n";
            last;
        }
    }
}
__DATA__
yellow
tooth
.point
llama

Outputs:

y
h
\.
m

Accepted answer (the List::MoreUtils solution above): Toto · 2017-05-24 10:00:17Z

