4
\$\begingroup\$

I was given following task in the interview:

Given a string, find shortest substring in it that contains all of the different characters contained in the original string.

Here is my solution in emacs lisp:

(defun add-to-hash (c ht)
 "Add an instance of character C to hash table HT"
 (puthash c (1+ (gethash c ht 0)) ht))
(defun remove-from-hash (c ht &optional try)
 "Remove an instance of character C from hash table HT
Return nil if no instances of C remain in HT"
 (let ((old (gethash c ht 0)))
 (if (and try (< 2 old))
 nil
 (if (> 1 (puthash c (1- (gethash c ht 0)) ht))
 (remhash c ht)
 t))))
(defun find-min-substring (sstr)
 "Find minimal substring that contains all the characters in a given string"
 ;; get all characters
 (let* ((all-chars (make-hash-table))
 (slen (length sstr))
 (fcnt (progn
 (mapc (lambda (c) (add-to-hash c all-chars)) sstr)
 (hash-table-count all-chars)))
 (beg 0) (end fcnt)
 (res sstr))
 (if (= end slen)
 res
 (let* ((cand (substring sstr beg end))
 (cand-chars (make-hash-table))
 (ccnt (progn
 (mapc (lambda (c) (add-to-hash c cand-chars)) cand)
 (hash-table-count cand-chars))))
 ;; find first candidate, that is a substring with all the characters
 (while (< ccnt fcnt)
 (add-to-hash (aref sstr end) cand-chars)
 (setq end (1+ end))
 (setq ccnt (hash-table-count cand-chars)))
 (setq cand (substring sstr beg end))
 ;; shorten it as much as possible
 (while (remove-from-hash (aref sstr beg) cand-chars t)
 (setq beg (1+ beg)))
 (setq cand (substring sstr beg end))
 (setq res cand)
 ;; check other variants
 (while (< end slen)
 ;; advance both ends
 (add-to-hash (aref sstr end) cand-chars)
 (setq end (1+ end))
 (remove-from-hash (aref sstr beg) cand-chars)
 (setq beg (1+ beg))
 (setq ccnt (hash-table-count cand-chars))
 ;; another candidate is found
 (when (= ccnt fcnt)
 ;; only change candidate if it was shortened
 (while (remove-from-hash (aref sstr beg) cand-chars t)
 (setq beg (1+ beg))
 (setq cand (substring sstr beg end)))
 (setq res cand)))
 res))))
asked May 10, 2017 at 10:42
\$\endgroup\$

1 Answer 1

2
\$\begingroup\$

I know this is old, but I will write some remarks anyways, hope you will find it useful.

  1. This code lacks knowledge of usual language shortcuts, eg. (if x nil y) is equivalent to (unless x y). Similarly, (setq x (1+ x)) is the same as (incf x), (setq x y) (setq z q) is the same as (setq x y z q) inside (implicit) progn. (puthash c (1- (gethash c ht 0)) ht) is the same as (decf (gethash c ht 0)).
  2. Language offers specialized data-structure for working with characters--char-table.
  3. It is better to express algorithmic code in terms it is easy for humans to interpret (i.e. leave the low-level details of implementation to helper functions, while giving helpers meaningful names. For example add-to-hash isn't particularly better than puthash. Perhaps better partitioning of your code would involve functions like unique-characters-of, generate-candidate etc.

Finally, and most importantly, your code doesn't really solve the problem, try this for instance:

(find-min-substring "aaaaabacbaaaaac") ; => "acbaaaa"

Below is an alternative solution. It is very straight-forward (probably inefficient), but (hopefully) correct:

(defun string-cost (s)
 (cl-loop
 with mask = (make-vector 128 0)
 for c across s do
 (setf (aref mask c) 1)
 finally (cl-return (cl-reduce '+ mask))))
(defun generate-candidates (s)
 (list (substring s 0 (1- (length s)))
 (substring s 1)))
(defun choose-candidate (a b min-cost)
 (let ((a-candidate (min-common-substring a min-cost))
 (b-candidate (min-common-substring b min-cost)))
 (if (< (length a-candidate) (length b-candidate))
 a-candidate
 b-candidate)))
(defun min-common-substring (s &optional min-cost)
 (unless min-cost (setf min-cost (string-cost s)))
 (cl-destructuring-bind (a b)
 (generate-candidates s)
 (let ((a-cost (string-cost a))
 (b-cost (string-cost b)))
 (cond
 ((and (>= a-cost min-cost)
 (>= b-cost min-cost))
 (choose-candidate a b min-cost))
 ((>= a-cost min-cost)
 (min-common-substring a min-cost))
 ((>= b-cost min-cost)
 (min-common-substring b min-cost))
 (t s)))))

Where more involved, but slightly more efficient version of string-cost could look like this:

(defun string-cost (s)
 (cl-loop
 with mask = (make-char-table t 0)
 for c across s do
 (set-char-table-range mask c 1)
 finally
 (cl-return
 (let ((cost 0))
 (map-char-table
 #'(lambda (k v)
 (when (and (/= v 0) (consp k))
 (setf v (- (cdr k) (1- (car k)))))
 (incf cost v))
 mask)
 cost))))
answered Oct 15, 2017 at 12:32
\$\endgroup\$
3
  • \$\begingroup\$ Does the 128 in your code mean that your function only works for ASCII? \$\endgroup\$ Commented Oct 15, 2017 at 18:27
  • \$\begingroup\$ @RolandIllig yup, that was the intention, but also see the second string-cost where it uses char-table which can adjust its size based on character range used. \$\endgroup\$ Commented Oct 16, 2017 at 6:04
  • \$\begingroup\$ Thanks for answering! It seems error in my code was where i used "<" instead of ">", and shortening of strings failed to occur. Did not test it through... Will educate myself on things you have mentioned, starting with cl-loop. \$\endgroup\$ Commented Oct 16, 2017 at 14:45

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.