Find characters common among all strings
You are given multiple Strings as an input. Your task is to find the characters that are present in every string, and return them in any order. There may be duplicates.
If the same character is present in all strings more than once, you have to return multiple copies of it.
Example:
Input:
"cattle", "throttle", "rotten", "torrent"Output:
't', 'e', 't'Input:
"cattle", "throttle", "rotten", "tree"Output:
't', 'e'Input:
"hauberk", "ill"Output:
Input:
"haunt", "haunting-haunt"Output:
'h', 'u', 'a', 't', 'n'
Constraints
- Input is only ASCII
- 1 <=
inputlength <= 100 - 1 <=
input[i]length <= 100
I/O format
Flexible w/ default I/O policies.
$$Let's\ see\ your\ shortest\ solutions!$$
27 Answers 27
Jelly, 3 bytes
œ&/
Takes a single input argument with a list of strings.
Explanation
Reduce (/) using multiset intersection (œ&).
Python, (削除) 65 (削除ここまで) 60 bytes
-5 thanks to Jitse (unnecessary generator-to-list inside join(): -2; replace set(X) with {*X}: -3)
lambda L:"".join(a*min(w.count(a)for w in L)for a in{*L[0]})
lambda L:"".join( # Return a string by concatenating a sequence of
a* # - letters
min(w.count(a)for w in L) # - multiplied by the minimum common count,
for a in # - taken from
{*L[0]} # unique letters from the first string.
)
A previous, 7 bytes more expensive, solution had ...join(L and[...]) to account for the case of empty input, that was later clarified to be out of scope.
Discarded approaches: reduce ((削除) 112 (削除ここまで) 108 bytes) and recursive (100 bytes)
The most popular genre of the other answers, i.e. "reduce/fold multiset intersection", doesn't have a direct port to Python (no multisets). The closest would be using Counters but the imports are very verbose, and so is the .elements() method to make the results a primitive.
Explicit (112 bytes, Attempt This Online!):
lambda L:[*reduce(lambda a,b:(k:=Counter)(a)&k(b),L).elements()]
from functools import*
from collections import*
Further golfed by Stef (108 bytes, Attempt This Online!):
lambda L:[*reduce((k:=Counter).__and__,map(k,L)).elements()]
from functools import*
from collections import*
We save only a few bytes by replacing reduce with recursion (100 bytes, Attempt This Online!)
from collections import*
k=Counter
f=lambda L:[*(k(L[0])&k(f(L[1:]))).elements()]if L[1:]else[*L[0]]
-
1\$\begingroup\$ -4bytes on the
reduceversion:lambda L:[*reduce((k:=Counter).__and__,map(k,L)).elements()]\$\endgroup\$Stef– Stef2024年06月10日 19:01:18 +00:00Commented Jun 10, 2024 at 19:01
R, 49 bytes
\(S)Reduce(\(x,y)y[pmatch(x,y,0)],strsplit(S,""))
Or alternatively, this 66 bytes version:
R, 66 bytes
\(S,`+`=make.unique)Reduce(\(x,y)y[match(+x,+y,0)],strsplit(S,""))
In the both cases the function pmatch/match compares two words (vectors of the characters) and outputs the positions of the matching characters, while subsetting produces the characters themselves. Reduce runs this process quasi-recursively and finally outputs only those characters which matched all words.
J, 21 bytes
a.#~[:<./+/@(=/&a.)&>
Takes input as list of boxed words.
- For each word in the list:
- For each letter in the word:
- Create a length 256 bit mask corresponding to the ascii alphabet, which is 1 if the letter equals that ascii character and 0 otherwise.
- Sum those bit masks. Now we have length 256 array, with each position giving us the letter count within the word.
- For each letter in the word:
- Take the element-wise min across all those length 256 letter counts.
- Use that new count mask to duplicate the letters of the ascii alphabet.
NARS2000 (APL), 3 characters; 6 bytes
∩⍦/
/ reduce using
⍦ multi-set
∩ intersection
Haskell, 75 bytes
import Data.List
f x=[y|y<-mapM(group.sort)x,[z]<-[nub$concat y]]>>=minimum
Explanation:
group . sortsorts the characters of a given string, and then splits the result into a list of substrings, each composed of multiple occurrences of the same character (e.g.(group . sort) "cattle"is equal to["a", "c", "e", "l", "tt"]); thesortis necessary to ensure that all occurrences of a given character are contiguous before thegroup.mapM g xis the cartesian product of the lists inmap g x*; for example, letys = mapM g ["cattle", "throttle", "rotten", "torrent"]whereg = (group . sort):- the first element of
ysis["a", "e", "e", "e"], where"a"is the first element ofg "cattle", the first"e"is the first element ofg "throttle", and so on; - the last element of
ysis["tt", "ttt", "tt", "tt"], where the first"tt"is the last element ofg "cattle","ttt"is the last element ofg "throttle", and so on.
- the first element of
- Then,
nub $ concat yconcatenates all substrings ofy, andnubkeeps only the first occurrence of each character; for example:nub $ concat ["a", "e", "e", "e"]is"ae",nub $ concat ["tt", "ttt", "tt", "tt"]is"t".
- With the
[z] <- ...syntax we select only theys such thatnub $ concat ycomprises a single element, since the pattern match fails ifnub $ concat yhas zero or more than one element; for example, the first element ofysis filtered out, while the last element ofysis kept; as a consequence, we select only thoseys that contain only substrings that all contain the same character. - Finally,
(>>= minimum)is equivalent toconcatMap minimum;minimumapplied to a list like["tt", "ttt", "tt", "tt"]selects the shortest string, so it select the substring containingnoccurrences of the same character so that every string of the initial input contains at leastnoccurrences of the considered character; theconcatis applied so the function returns"ett"and not["e", "tt"].
*: the cartesian product is a very inefficient and unnecessary step for solving the problem, but it allows for a short(er) solution
-
1\$\begingroup\$ Can you add an explanation? There's some cool stuff I don't understand going on in this answer, and I want to be able to fully appreciate it. \$\endgroup\$DLosc– DLosc2024年06月11日 22:04:33 +00:00Commented Jun 11, 2024 at 22:04
-
1\$\begingroup\$ @DLosc done, hope it's clear enough. Thank you for your interest :) \$\endgroup\$matteo_c– matteo_c2024年06月15日 12:36:57 +00:00Commented Jun 15, 2024 at 12:36
Retina 0.8.2, 40 bytes
%O`.
!`(.)(?<=^.*(1円+))(?=.*(¶.*2円.*)*$)
Try it online! Takes each string on its own line and outputs each common character in ASCII order on its own line. Explanation:
%O`.
Sort each string into ASCII order.
!`(.)
Output all characters...
(?<=^.*(1円+))
... from the first string, counting how many times this character has appeared so far...
(?=.*(¶.*2円.*)*$)
... that appear that many times in all of the other strings as well.
Vyxal 3 l, 3 bytes
/Þ)
Try it Online! (link is to literate version)
uses literate mode, ends up being a port of jelly anyways due to weird behavior in the set intersection
-
1\$\begingroup\$ 1. you have an
hflag. 2. This is not right, try adding attocattleand it will result inttet. \$\endgroup\$Jonathan Allan– Jonathan Allan2024年06月09日 14:16:53 +00:00Commented Jun 9, 2024 at 14:16
Charcoal, 18 bytes
WS⊞υιΦθ⬤υ›NoλιNo...θκι
Try it online! Link is to verbose version of code. Takes input as a list of newline-terminated strings. Explanation:
WS⊞υι
Read the input into an array.
θ First string
Φ Filtered where
υ List of strings
⬤ All satisfy
No Count of
ι Current character
λ In current string
› Is greater than
No Count of
ι Current character
θ In first string
... Truncated to length
κ Outer index
Taking the input as a JSON array would save 3 bytes, as υ would be replaced with θ which itself would have to be replaced with ⌊θ (which isn't necessarily the first string but it will be consistent which is all that's required here).
JavaScript (ES10), 62 bytes
Expects words as arrays of characters and returns in the same format.
a=>a.reduce((s,w)=>s.flatMap(c=>w.splice(w.indexOf(c)>>>0,1)))
Commented
a => // a[] = input array
a.reduce((s, w) => // for each word w[] in a[], with accumulator s[]:
s.flatMap(c => // for each character c in s[]:
w.splice( // remove from w[]:
w.indexOf(c) // the first occurrence of c in w[]
>>> 0, // if not found, we turn the index -1 into
1 // the maximum unsigned 32-bit value so that
// nothing is removed
) // end of splice() -> returns [c] or []
) // end of flatMap() -> removes all empty entries
) // end of reduce()
vemf, 5 bytes
Takes a list of strings.
ϖ֨\
Though vemf has no built-in multiset intersection, it doesn't do too badly, because û is made to implement multiset operations anyways (stolen from BQN, which doesn't happen to have an entry). The obvious solution would be ╧ Reduce │û\╓¿ Multiset Intersection, but it turns out leaving the Nones there until the end saves 1 byte. online
Zsh & coreutils, (削除) 110 (削除ここまで) (削除) 93 (削除ここまで) 83 bytes
d()(<<<1ドル|fold -1|sort)
m=1ドル;while ((#>1))m=`comm -12 <(d $m) <(d 2ドル)`&&shift
<<<$m
d()is a function to turn a string liketreeinto a sorted column of data;e, e, r, t.$mis a string of common letters built up by iterating.while ((#>1))iterates over the input arguments (enumerated by$#, andshiftedeach time)commtakes 2 input files of sorted columnar data. The-12option returns only data common to both files. The<( )things redirect the commandsd $mandd 2ドルto temp files forcommto read.
-
\$\begingroup\$ TODO: try using the
${name:*arrayname}array intersection. \$\endgroup\$roblogic– roblogic2024年06月10日 12:19:20 +00:00Commented Jun 10, 2024 at 12:19
Nibbles, 5 nibbles (2.5 bytes)
/$`&$
/$`&$ # full program
/$`&$@ # with implicit arg shown
/ # fold over
$ # input:
`& # list intersection of
$@ # L & R elements
Pip, (削除) 16 (削除ここまで) 12 bytes
@SS_Mg@X*UQa
Takes the strings as command-line arguments. Attempt This Online!
Explanation
Pip doesn't have multiset builtins, so we have to do it by hand:
@SS_Mg@X*UQa
a Take the first string (by convenience; could be any of them)
UQ Uniquify
X* Convert each character to a regex matching that character
@ For each of those, find all matches in
g Each command-line argument
M Map this function to each list of lists of matches:
SS_ Sort using string comparison (in this case, since the characters
are all the same, this means shortest first)
@ Get the first (shortest) one
Concatenate and autoprint (implicit)
For example, with inputs "cattle" "throttle" "rotten" "torrent":
UQa "catle"
X* [`c`;`a`;`t`;`l`;`e`]
g@ [[["c"];[];[];[]];[["a"];[];[];[]];[["t";"t"];["t";"t";"t"];["t";"t"];["t";"t"]];[["l"];["l"];[];[]];[["e"];["e"];["e"];["e"]]]
SS_M [[[];[];[];["c"]];[[];[];[];["a"]];[["t";"t"];["t";"t"];["t";"t"];["t";"t";"t"]];[[];[];["l"];["l"]];[["e"];["e"];["e"];["e"]]]
@ [[];[];["t";"t"];[];["e"]]
which outputs tte.
Google Sheets, 279 bytes
=let(f,lambda(t,split(regexreplace(t,"\B|\b","→"),"→")),g,lambda(t,join(,sort(tocol(f(t))))),v,map(tocol(A:A,1),g),w,lambda(m,rows(v)=counta(ifna(filter(v,regexmatch(v,m))))),f(join(,unique(reduce(,f(A1),lambda(a,c,let(s,regexextract(g(A1),c&"+"),{a;ifs(w(s),s,w(c),c,1,)})))))))
Put the strings in cells A1:A and the formula in cell B1.
screenshot
Ungolfed:
=let(
f, lambda(t, split(regexreplace(t, "\B|\b", "→"), "→")),
g, lambda(t, join(, sort(tocol(f(t))))),
v, map(tocol(A:A, 1), g),
w, lambda(m, counta(v) = counta(ifna(filter(v, regexmatch(v, m))))),
r, reduce(, f(A1), lambda(a, c, let(
s, regexextract(g(A1), c & "+"),
vstack(
a,
ifs(w(s), s, w(c), c, 1, )
)
))),
f(join(, unique(r)))
)
The formula sorts characters within strings and finds repeated characters in A1 in the other strings, falling back to finding non-repeated characters in A1. When the number of matches is the same as the total number of strings, the repeated characters are included in the result (or ditto for a non-repeated character.) Finally, get uniques, possibly including repeated characters.
(Note: While this will give the correct result for all test cases presented in the question, it won't handle a triple like "aardvark" in A1 correctly when there are doubles like "lama" in other strings. Triples and above are fine in the rest of strings so "lama", "aardvark" does get the correct result. Note also that regex special characters are not escaped.)
-
3\$\begingroup\$ I read "handle duplicates" here as "handle arbitrary amount of repetition". That's the usual intended meaning. \$\endgroup\$Jonah– Jonah2024年06月09日 02:31:51 +00:00Commented Jun 9, 2024 at 2:31
-
\$\begingroup\$ @Jonah yes... I agree that the usual intended meaning in computing obviously includes triples and above. The dictionary definition of "duplicate" mentions "twin, double, clone, match, mate, fellow, counterpart". Spreadsheet formula languages have no variables but constants only, and functions cannot have side-effects. Could not think of a concise way of handling the case where there are different triples or above in all strings. I will look at adding a slight improvement if and when there is a test case where the answer fails. \$\endgroup\$doubleunary– doubleunary2024年06月09日 07:40:39 +00:00Commented Jun 9, 2024 at 7:40
05AB1E, 9 bytes
ε{æ}.»Ãéθ
(Output is sorted.)
Try it online or verify all test cases.
Explanation:
ε # Map over each string of the (implicit) input-list:
{ # Sort the characters in the string
æ # Pop and push the powerset
}.» # After the map: reduce the list of lists of strings by:
à # Keep powerset-strings which are present in both lists
é # Then sort the remaining powerset-strings by length
θ # And pop and push the last/longest one
# (which is output implicitly as result)
Arturo, 68 bytes
$->L->join map unique L0円'a->repeat a min map L=>[enumerate&=>[&=a]]
A port of Nicola Sap's Python answer. Takes input as a list of lists of characters as singleton strings.
AWK,-vRS="," -F "" 122 bytes
1~NR{for(;i++<NF;)b[$i]++}NR>1{for(i in b)b[i]=(c=gsub(i,X))?(b[i]<c?b[i]:c):0}END{for(i in b)for(j=1;j++<=b[i];)printf i}
It's not working in TIO for whatever reason, but you can try this instead:
BEGIN{RS=",";FS=X}1~NR{for(;i++<NF;)b[$i]++}NR>1{for(i in b)b[i]=(c=gsub(i,X))?(b[i]<c?b[i]:c):0}END{for(i in b)for(j=1;j++<=b[i];)s=s i;print s}
BEGIN{RS=",";FS=X} # split up data (-vRS="," -F "")
1~NR{for(;i++<NF;)b[$i]++} # first record becomes dict
NR>1{for(i in b) # compare new strings to dict
b[i]=(c=gsub(i,X))?(b[i]<c?b[i]:c):0} # return matches
END{for(i in b)for(j=1;j++<=b[i];) # goes through remaining chars
print i}
Dyalog APL, (削除) 25 (削除ここまで) 23 bytes
Well, not having a built-in multiset (that I know of (using TryAPL...)) here's a much longer solution. I couldn't quite mold ∩ to give me the multiset functionality I needed, but I'm not convinced it is not possible.
{x/⍨⌊⌿↑+/ ̈⍵∘.∊⍨ ̈⊂x←⎕AV}
x←⎕AV # Using AV because it conveniently has upper- and lower-case characters, store that in x
∘.∊⍨ # Find the membership outer product between
⍵ ̈ # each input word
⊂ # and the whole (extraneous) alphabet
+/ ̈ # Sum over to get counts by letter
⌊⌿↑ # Mix so I can find the minimum count by column. This is the per-letter multiplicity. 0 would mean its not present in every word
x/⍨ # Replicate over that alphabet to pull as many letters as the multiset multiplicity
💎
Created with the help of Luminespire.
Here, then, is my multi-purpose multiset intersection:
{x/⍨⌊⌿↑+/ ̈⍵∘.∊⍨ ̈⊂x←∪∊⍵}
"cattle", "throttle", "rotten", "tree"(which should give't', 'e'if I understand correctly). \$\endgroup\$