2
\$\begingroup\$

Let X = "1234567891011..." the infinite string contains all positive integers. str is a sequence of digits. We are asked to find the first location in X that str appears.

I have tried the KMP algorithm. It runs fine on small input, but poorly on input such as str = "667788999". It takes forever to finish.
I also tried to use regexpr with a sliding window; the above test case will still cost 20s. Our programs are tested on hidden test cases. And even using the second approach, my program still timed out for 1 test case.

Are there any ways to improve the efficiency, or any other algorithms?

We cannot use Rcpp - R base packages only.

KMP_version = function(s) {
 pattern = as.integer(strsplit(s, "")[[1]])
 m = length(pattern)
 
 compute_failure = function(pattern) {
 n = length(pattern)
 lps = integer(n)
 len = 0
 lps[1] = 0
 
 for (i in 2:n) {
 while (len > 0 && pattern[len + 1] != pattern[i]) {
 len = lps[len]
 }
 if (pattern[len + 1] == pattern[i]) {
 len = len + 1
 }
 lps[i] = len
 }
 return(lps)
 }
 
 get_digits = function(num) {
 if (num == 0) return(0)
 digits = integer(floor(log10(num)) + 1)
 idx = length(digits)
 while (num > 0) {
 digits[idx] = num %% 10
 num = num %/% 10
 idx = idx - 1
 }
 return(digits)
 }
 
 lps = compute_failure(pattern)
 
 idx = 0 # current matching idx in s
 pos = 0 # current pos in S, the infinite string
 n = 1 # current number
 
 repeat {
 digits = get_digits(n)
 
 for (digit in digits) {
 pos = pos + 1
 
 while (idx > 0 && digit != pattern[idx + 1]) {
 idx = lps[idx]
 }
 
 if (digit == pattern[idx + 1]) idx = idx + 1
 
 if (idx == m) return(pos - m + 1)
 }
 n = n + 1
 }
}
regexpr_version = function(s) {
 m = nchar(s)
 buffer = "" # Buffer to hold the sliding window
 pos = 0 # Position in the infinite string S
 n = 1 # Current number to append
 batch_size = 10000
 
 repeat {
 numbers = paste0(n:(n + batch_size - 1), collapse = "")
 search_str = paste0(buffer, numbers)
 match = regexpr(s, search_str, fixed = TRUE)[1]
 
 # match found
 if (match != -1) {
 start_pos = pos - nchar(buffer) + match
 return(start_pos)
 }
 
 pos = pos + nchar(numbers)
 buffer = substr(search_str, nchar(search_str) - m + 2, nchar(search_str))
 n = n + batch_size
 }
}
toolic
14.5k5 gold badges29 silver badges203 bronze badges
asked Oct 17, 2024 at 6:21
\$\endgroup\$
0

1 Answer 1

4
\$\begingroup\$

Instead of jumping straight into string searching, perhaps a complete change of approach is warranted?

I think we should start by examining the search string to see whether it's a sequence of consecutive 1-, 2-, ... n-digit numbers (remember to handle offsetting so that the first and last numbers may be incomplete - and if there's no repeat, we'll need to choose the lexicographically-lowest rotation of the digits). Once we have done that, it's a simple matter of arithmetic to calculate its position in the infinite list.

I recommend writing a set of unit tests to exercise the code. They won't be exactly the same as the hidden test battery, but with a sufficiently devious mind you can work outward from the simplest case to most complex, building up the function as you go.

answered Oct 17, 2024 at 6:55
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.