I was given following task in the interview:
Given a string, find shortest substring in it that contains all of the different characters contained in the original string.
Here is my solution in emacs lisp:
(defun add-to-hash (c ht)
"Add an instance of character C to hash table HT"
(puthash c (1+ (gethash c ht 0)) ht))
(defun remove-from-hash (c ht &optional try)
"Remove an instance of character C from hash table HT
Return nil if no instances of C remain in HT"
(let ((old (gethash c ht 0)))
(if (and try (< 2 old))
nil
(if (> 1 (puthash c (1- (gethash c ht 0)) ht))
(remhash c ht)
t))))
(defun find-min-substring (sstr)
"Find minimal substring that contains all the characters in a given string"
;; get all characters
(let* ((all-chars (make-hash-table))
(slen (length sstr))
(fcnt (progn
(mapc (lambda (c) (add-to-hash c all-chars)) sstr)
(hash-table-count all-chars)))
(beg 0) (end fcnt)
(res sstr))
(if (= end slen)
res
(let* ((cand (substring sstr beg end))
(cand-chars (make-hash-table))
(ccnt (progn
(mapc (lambda (c) (add-to-hash c cand-chars)) cand)
(hash-table-count cand-chars))))
;; find first candidate, that is a substring with all the characters
(while (< ccnt fcnt)
(add-to-hash (aref sstr end) cand-chars)
(setq end (1+ end))
(setq ccnt (hash-table-count cand-chars)))
(setq cand (substring sstr beg end))
;; shorten it as much as possible
(while (remove-from-hash (aref sstr beg) cand-chars t)
(setq beg (1+ beg)))
(setq cand (substring sstr beg end))
(setq res cand)
;; check other variants
(while (< end slen)
;; advance both ends
(add-to-hash (aref sstr end) cand-chars)
(setq end (1+ end))
(remove-from-hash (aref sstr beg) cand-chars)
(setq beg (1+ beg))
(setq ccnt (hash-table-count cand-chars))
;; another candidate is found
(when (= ccnt fcnt)
;; only change candidate if it was shortened
(while (remove-from-hash (aref sstr beg) cand-chars t)
(setq beg (1+ beg))
(setq cand (substring sstr beg end)))
(setq res cand)))
res))))
1 Answer 1
I know this is old, but I will write some remarks anyways, hope you will find it useful.
- This code lacks knowledge of usual language shortcuts, eg.
(if x nil y)
is equivalent to(unless x y)
. Similarly,(setq x (1+ x))
is the same as(incf x)
,(setq x y) (setq z q)
is the same as(setq x y z q)
inside (implicit)progn
.(puthash c (1- (gethash c ht 0)) ht)
is the same as(decf (gethash c ht 0))
. - Language offers specialized data-structure for working with characters--
char-table
. - It is better to express algorithmic code in terms it is easy for humans to interpret (i.e. leave the low-level details of implementation to helper functions, while giving helpers meaningful names. For example
add-to-hash
isn't particularly better thanputhash
. Perhaps better partitioning of your code would involve functions likeunique-characters-of
,generate-candidate
etc.
Finally, and most importantly, your code doesn't really solve the problem, try this for instance:
(find-min-substring "aaaaabacbaaaaac") ; => "acbaaaa"
Below is an alternative solution. It is very straight-forward (probably inefficient), but (hopefully) correct:
(defun string-cost (s)
(cl-loop
with mask = (make-vector 128 0)
for c across s do
(setf (aref mask c) 1)
finally (cl-return (cl-reduce '+ mask))))
(defun generate-candidates (s)
(list (substring s 0 (1- (length s)))
(substring s 1)))
(defun choose-candidate (a b min-cost)
(let ((a-candidate (min-common-substring a min-cost))
(b-candidate (min-common-substring b min-cost)))
(if (< (length a-candidate) (length b-candidate))
a-candidate
b-candidate)))
(defun min-common-substring (s &optional min-cost)
(unless min-cost (setf min-cost (string-cost s)))
(cl-destructuring-bind (a b)
(generate-candidates s)
(let ((a-cost (string-cost a))
(b-cost (string-cost b)))
(cond
((and (>= a-cost min-cost)
(>= b-cost min-cost))
(choose-candidate a b min-cost))
((>= a-cost min-cost)
(min-common-substring a min-cost))
((>= b-cost min-cost)
(min-common-substring b min-cost))
(t s)))))
Where more involved, but slightly more efficient version of string-cost
could look like this:
(defun string-cost (s)
(cl-loop
with mask = (make-char-table t 0)
for c across s do
(set-char-table-range mask c 1)
finally
(cl-return
(let ((cost 0))
(map-char-table
#'(lambda (k v)
(when (and (/= v 0) (consp k))
(setf v (- (cdr k) (1- (car k)))))
(incf cost v))
mask)
cost))))
-
\$\begingroup\$ Does the 128 in your code mean that your function only works for ASCII? \$\endgroup\$Roland Illig– Roland Illig2017年10月15日 18:27:06 +00:00Commented Oct 15, 2017 at 18:27
-
\$\begingroup\$ @RolandIllig yup, that was the intention, but also see the second
string-cost
where it useschar-table
which can adjust its size based on character range used. \$\endgroup\$wvxvw– wvxvw2017年10月16日 06:04:13 +00:00Commented Oct 16, 2017 at 6:04 -
\$\begingroup\$ Thanks for answering! It seems error in my code was where i used "<" instead of ">", and shortening of strings failed to occur. Did not test it through... Will educate myself on things you have mentioned, starting with
cl-loop
. \$\endgroup\$Srv19– Srv192017年10月16日 14:45:12 +00:00Commented Oct 16, 2017 at 14:45
Explore related questions
See similar questions with these tags.