Searching for repeated characters in a sequence of characters

Question 1

The following code solves this problem:

The 3072 characters below contain a sequence of 5 characters which is repeated. However, there is a twist: one of the sequences has a typo. To be specific, you're looking for two groups of 5 characters which share 4 of those characters. Enter the 4 shared characters below to proceed.

Any suggestions on how it can be improved using idiomatic Lisp/Clojure?

(ns slurp.slurp)
(defn get-five [in]
 (if (> (.length in) 4)
 (subs in 0 5)
 nil))
(defn how-good [a b position goodness bad-position]
 (if (= position 5)
 (list goodness bad-position)
 (if (= (subs a position (+ 1 position)) (subs b position (+ 1 position)))
 (recur a b (+ 1 position) (+ 1 goodness) bad-position)
 (recur a b (+ 1 position) goodness position))))
(defn matches [a b]
 (let [x (how-good a b 0 0 -1)]
 (list (> (first x) 3) (last x))))
(defn remove-bad-char [in-string bad-index]
 (str (subs in-string 0 bad-index) (subs in-string (+ 1 bad-index))))
(defn sub-parse [left in]
 (let [right (get-five in)]
 (if (not (= right nil))
 (let [match-result (matches left right)]
 (if (first match-result)
 (let [bad-index (last match-result)]
 (println left "is a match. bad-position:" (last match-result) "ans: "(remove-bad-char right bad-index)))
 (recur left (subs in 1)))))))
(defn parse [in]
 (let [left (get-five in)] ;; as soon as we hit the ass end of the input string, get-five returns null and we stop trying to match.
 (if (not (= left nil))
 (do
 (sub-parse left (subs in 1))
 (recur (subs in 1))))))
(parse (slurp "/Users/clojure/challange.txt"))
;; https://www.readyforzero.com/challenge/491f97bc25574346aed237d43e7d0404
;; (answer is i2+r)

Question 2

Actual URL of input data appears to be: readyforzero.com/challenge-data/…

Question 3

this isn't common-lisp. tag is incorrect.

Question 4

Yeah, I've done the ready for zero programming challenge too... :)

Question 5

I'd start bottom-up on this one if you want to do it in a slightly more idiomatic / functional style. Also it's good to get the data in Clojure sequences rather than strings as it makes it a bit easier to manipulate.

You're trying to work on groups of five adjacent characters so I'd set up a sequence of such adjacent groups:

; define some source data, right answer is "1235"
(def char-seq (seq "dfyfgvbufi12345idvdfhopshbsopgeobvufpdhbugphp123a5ghauoyhewqojbvrbon"))
; now partition into adjacent overlapping groups of five (with a step of 1)
(def char-groups (partition 5 1 char-seq))
; prove it works
(take 3 char-groups)
=> ((\d \f \y \f \g) (\f \y \f \g \v) (\y \f \g \v \b))

Now we want to define what a valid match means: in this case 4 characters should be the same in two groups being compared.

; helper function to count the number of true values in a collection / sequence
(defn count-true [coll]
 (reduce 
 #(+ %1 (if %2 1 0))
 0 
 coll))
; function to determine if exactly four characters are equal in two groups
(defn matches? [group1 group2]
 (= 4
 (count-true (map = group1 group2))))

Finally you just need to run the matches? function over all possible groups:

(def results
 (distinct
 (keep identity
 (for [group1 char-groups 
 group2 char-groups]
 (if (matches? group1 group2)
 (filter identity (map #(if (= %1 %2) %1 nil) group1 group2))
 nil)))))
results
=> ((1円 2円 3円 5円))

Question 6

thanks mikera. I got to learn some new stuff: partition reduce distinct keep indentity filter map for. btw, why did you need (keep identity ....) ?

Question 7

I see that the keep identity is used to filter the nil's returned by: (for [group1 char-groups group2 char-groups] (println group1 group2)) But I can't understand where the nils come from when char-groups Does not return any nils. Why is the for putting them in?

Question 8

the (for ....) construct returns nils when there is no match via the if statement. This is an artifact of the way the for construct works (it creates a lazy sequence over all possible combinations of char groups). Incidentally this is also why you need the "distinct" because it will match the char-groups twice (once in normal order, once in reverse order)

Question 9

Here is my version. Works on any seq (not just strings) and is fully lazy.

(def data (slurp "data.txt"))
(defn same-pos 
 "Return all the elements of xs and ys that match, position-wise. e.g.
 (same-pos [\\a \\x \\c] [\\a \\r \\c]) returns [\\a \\c]. Returns nil if xs
 and ys have different lengths."
 [xs ys]
 (when (= (count xs) (count ys))
 (filter (complement nil?) (map #(if (= %1 %2) %1) xs ys))))
(def answer
 (let [ps (partition 5 1 data)] ; build seq of sliding 5-tuples
 (->> (for [x ps y ps] (same-pos x y)) ; n^2 FTW
 (filter (comp (partial = 4) count)) ; 4 of 5 positions must match
 (first))))
(println (apply str "Answer: " answer)) ; will print "Answer: i2+r"

Update: I didn't read other answers beforehand (wanted to figure it out myself), but it looks like I've got basically the same thing as mikera. So maybe not a whole lot to learn here :)

mikera mikera 3612 silver badges5 bronze badges · Answer 1 · 2011-07-16 11:25:28Z

I'd start bottom-up on this one if you want to do it in a slightly more idiomatic / functional style. Also it's good to get the data in Clojure sequences rather than strings as it makes it a bit easier to manipulate.

You're trying to work on groups of five adjacent characters so I'd set up a sequence of such adjacent groups:

; define some source data, right answer is "1235"
(def char-seq (seq "dfyfgvbufi12345idvdfhopshbsopgeobvufpdhbugphp123a5ghauoyhewqojbvrbon"))
; now partition into adjacent overlapping groups of five (with a step of 1)
(def char-groups (partition 5 1 char-seq))
; prove it works
(take 3 char-groups)
=> ((\d \f \y \f \g) (\f \y \f \g \v) (\y \f \g \v \b))

Now we want to define what a valid match means: in this case 4 characters should be the same in two groups being compared.

; helper function to count the number of true values in a collection / sequence
(defn count-true [coll]
 (reduce 
 #(+ %1 (if %2 1 0))
 0 
 coll))
; function to determine if exactly four characters are equal in two groups
(defn matches? [group1 group2]
 (= 4
 (count-true (map = group1 group2))))

Finally you just need to run the matches? function over all possible groups:

(def results
 (distinct
 (keep identity
 (for [group1 char-groups 
 group2 char-groups]
 (if (matches? group1 group2)
 (filter identity (map #(if (= %1 %2) %1 nil) group1 group2))
 nil)))))
results
=> ((1円 2円 3円 5円))

thanks mikera. I got to learn some new stuff: partition reduce distinct keep indentity filter map for. btw, why did you need (keep identity ....) ?
I see that the keep identity is used to filter the nil's returned by: (for [group1 char-groups group2 char-groups] (println group1 group2)) But I can't understand where the nils come from when char-groups Does not return any nils. Why is the for putting them in?
the (for ....) construct returns nils when there is no match via the if statement. This is an artifact of the way the for construct works (it creates a lazy sequence over all possible combinations of char groups). Incidentally this is also why you need the "distinct" because it will match the char-groups twice (once in normal order, once in reverse order)

overthink overthink 1614 bronze badges · Answer 2 · 2011-07-19 03:25:53Z

Here is my version. Works on any seq (not just strings) and is fully lazy.

(def data (slurp "data.txt"))
(defn same-pos 
 "Return all the elements of xs and ys that match, position-wise. e.g.
 (same-pos [\\a \\x \\c] [\\a \\r \\c]) returns [\\a \\c]. Returns nil if xs
 and ys have different lengths."
 [xs ys]
 (when (= (count xs) (count ys))
 (filter (complement nil?) (map #(if (= %1 %2) %1) xs ys))))
(def answer
 (let [ps (partition 5 1 data)] ; build seq of sliding 5-tuples
 (->> (for [x ps y ps] (same-pos x y)) ; n^2 FTW
 (filter (comp (partial = 4) count)) ; 4 of 5 positions must match
 (first))))
(println (apply str "Answer: " answer)) ; will print "Answer: i2+r"

Update: I didn't read other answers beforehand (wanted to figure it out myself), but it looks like I've got basically the same thing as mikera. So maybe not a whole lot to learn here :)

Stack Exchange Network

Searching for repeated characters in a sequence of characters

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Searching for repeated characters in a sequence of characters

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions