"Digit Queries" CSES Solution giving TLE

Question 1

I really need a more accurate way to size inf_string. This code works but gets exponentially more inefficient the bigger max(queries) is, and therefore just can't hold up to the gargantuan numbers that CSES throws into it.

Problem: Consider an infinite string that consists of all positive integers in increasing order: 12345678910111213141516171819202122232425... Your task is to process q queries of the form: what is the digit at position k in the string? The first input line has an integer q: the number of queries. After this, there are q lines that describe the queries. Each line has an integer k: a 1-indexed position in the string. For each query, print the corresponding digit.

queries = []
for i in range(int(input())):
 queries.append(int(input()))
# TODO FIND A BETTER WAY TO LIMIT inf_string LENGTH
inf_string = [str(x) for x in range(1, max(queries)+1)]
inf_string = "".join(inf_string)
for k in queries:
 print(inf_string[k-1])

Question 2

What are the bounds for \$q\$ and \$k\$?

Question 3

1≤q≤1000 1≤k≤10^18 @Peilonrayz

Question 4

Why don't you give us the link?

Question 5

this is oeis.org/A033307

Question 6

Building or iterating the whole string takes far too long. Instead group the numbers by length:

length 1 digit: 1 to 9 -> 9 numbers => 9 digits total
length 2 digits: 10 to 99 -> 90 numbers => 180 digits total
length 3 digits: 100 to 999 -> 900 numbers => 2700 digits total
length 4 digits: 1000 to 9999 -> 9000 numbers => 36000 digits total
etc

The number of digits per group (9, 180, 2700, 36000, etc) are easy to compute. Skip these groups of equal-length numbers until you reach the right group (the remaining k falls into it). Then just compute the right number inside the group and get the right digit from it.

for _ in range(int(input())):
 k = int(input())
 length = 1
 while k > 9 * 10**(length-1) * length:
 k -= 9 * 10**(length-1) * length
 length += 1
 q, r = divmod(k-1, length)
 print(str(10**(length-1) + q)[r])

Takes about 0.03 seconds of the allowed 1 second.

Or keeping 10**(length-1) in a variable for less duplication:

for _ in range(int(input())):
 k = int(input())
 length = first = 1
 while k > 9 * first * length:
 k -= 9 * first * length
 length += 1
 first *= 10
 q, r = divmod(k-1, length)
 print(str(first + q)[r])

Question 7

this is a good answer. in fact the string does not need to be generated at all, although the final string generation is very small.

Question 8

Could you elaborate more on why we take divmod of k-1 ?

Question 9

As Peilonrayz pointed out, there is a mathematical approach to the problem.

Let's assume there wasn't.

# TODO FIND A BETTER WAY TO LIMIT inf_string LENGTH
inf_string = [str(x) for x in range(1, max(queries)+1)]
inf_string = "".join(inf_string)

With \$k \le 10^{18}\$, your string can have up to \10ドル^{18}\$ digits. That's a lot of memory for a program to use for one string.

There is no reason to make the string that large. One can simply generate the string, one value at a time, and return the digits of those strings, yielding the infinite digit string while taking almost no memory.

from typing import Iterable
def infinite_digits() -> Iterable[str]:
 """
 An infinite series of digits from the infinite sequence of positive
 integers. Ie:
 "1" "2" "3" "4" "5" "6" "7" "8" "9" "1" "0" "1" "1" "1" "2" "1" "3" ...
 """
 x = 1
 while True:
 yield from str(x)
 x += 1

To find the \$n^{th}\$ digit, simply generate the sequence, discarding all of the digits until you reach the desired one. There is even a recipe for that in the itertools docs:

from more_itertools import nth
def digit(k: int) -> str:
 return nth(infinite_digits(), k - 1)

With the above two functions, your code can be simplified to:

for _ in range(int(input())):
 k = int(input())
 print(digit(k))

The code will still take a very, very long time, but at least it won't consume much memory.

You can make things more efficient by sorting your queries, pass over the infinite_digit() stream once, and remember the positions of interest. Finally, pass over the original query set again, and output the memorized digit for that query value.

It could be up to 1000 times faster, since you have up to 1000 queries, but it can still take a very long time.

Question 10

You just made me realize for a performant and simple solution we could use something like def fn(i, n): return str(range(10**(n - 1), 10**n)[i // n])[i % n] (very similar to your code but with how my fn works). Ugh, why'd I spend like an hour looking at math when I could have just used something so simple! 🤦‍♀️

Question 11

@Peilonrayz Huh. I would have used the math approach myself, but you had already gone that route, so I went the brute-force-and-ignorance method, but without the memory impact, to show that the problem could be solved in smaller pieces than just building the complete huge string. Nice performant upgrade.

Question 12

I completely disagree with the checkmark being awarded to this answer. At 1 nanosecond per digit, a \10ドル^{18}\$ query would take over 31 years to solve. The math solution deserves the checkmark.

Question 13

@qwr Code Review is about teaching skills the asker needs not just giving solutions to problems. Effective memory utilization is a skill needed in many other parts of programming, even in programming challenges.

Question 14

@qwr I know it is not the optimal solution. I said as much in the comment immediately above yours. The 31 year run time to solve the last test case is excessive, but won't drag the computer to its knees allocating and swapping memory while trying to build a quintillion digit string. It does however solve the OP's issue of "FIND A BETTER WAY TO LIMIT inf_string LENGTH": the length issue is avoided.

Question 15

You need to identify a pattern. We should be able to build a fairly simple pattern by grouping numbers in specific ways.

123456789 # One dimensional
 10111213141516171819 # Two dimensional
 20212223242526272829
 30313233343536373839
 ...
 90919293949596979899
 100101102103104105106107108109 200201202203204205206207208209 # Three dimensional
 110111112113114115116117118119
 120121122123124125126127128129
 ...
 190191192193194195196197198199
 1000100110021003100410051006100710081009 # Four dimensional

We know if \$k\$ is less than 10 the value is \$k\$. To see the pattern look at the one dimensional group and see the index where each number belongs. We can see each digit has only one possible position.
- \$k \in \{1\} \therefore 1\$
  (if \$k = 1\$ then the digit is 1)
- \$k \in \{2\} \therefore 2\$
  (if \$k = 2\$ then the digit is 2)
$$ f_1(k) = k $$
To make the pattern easier to see \$k_2 = k - 10\$.
Now \$f_2(0) = 1\$, \$f_2(1) = 0\$, \$f_2(2) = 1\$, \$f_2(3) = 1\$.

If we focus on how to build each section of the number we can see similar patterns. For example the end digit is 0 in the 2nd column in the entire two dimensional block. Since we know the width of the two dimensional block is 20 we know every \20ドルn + 1\$ number results in 0.
- End digits (Odd numbers, \$k_2\ \%\ 2 = 1\$)
  
  Just like we did for 0 we can look at every second column and see the value repeats all the way down.
  - \$k_2 \in \{1, 21, 41, ..., 161\} \therefore 0\$
    (if \$k_2 = 1\$ or \$k_2 = 21\$ or etc, the digit is 0)
  - \$k_2 \in \{3, 23, 43, ..., 163\} \therefore 1\$
  - \$k_2 \in \{5, 25, 45, ..., 165\} \therefore 2\$
  We can then calculate the digit from the input number. First we can modulo the number, \$k_2 % 20\$, to truncate each of the sets to the same number. For example \21ドル \% 20 = 1\$, \41ドル \% 20 = 1\$, etc. Now the sets are much simpler.
  - \$k_2\ \%\ 20 \in \{1\} \therefore 0\$
    (if \$k_2\ \%\ 20 = 1\$ the digit is 0)
  - \$k_2\ \%\ 20 \in \{3\} \therefore 1\$
  - \$k_2\ \%\ 20 \in \{5\} \therefore 2\$
  Now getting the same digit for each input is really easy. We can just floor divide by 2. Since \$\lfloor\frac{1}{2}\rfloor = 0\$, \$\lfloor\frac{3}{2}\rfloor = 1\$, etc.
  
  So we know if the number is odd we can get the digit by using the following formula:
  
  $$\lfloor\frac{k_2\ \%\ 20}{2}\rfloor$$
- Head digits (Even numbers, \$k_2\ \%\ 2 = 0\$)
  
  Just like we did for the columns we can see a pattern in the rows. Every two columns the first number is the same. We can build the sets for where each number is.
  - \$k_2 \in \{ 0, 2, 4, ..., 18\} \therefore 1\$
  - \$k_2 \in \{20, 22, 24, ..., 38\} \therefore 2\$
  - \$k_2 \in \{40, 42, 44, ..., 58\} \therefore 3\$
  We can see any less than 20 are 1, then any less than 40 are 2. We can floor divide by 20 and add 1. And so the formula is:
  
  $$\lfloor\frac{k_2}{20}\rfloor + 1$$
We can merge both the previous functions into one function.

$$ f_2(k_2) = \begin{cases} \lfloor\frac{k_2}{20}\rfloor + 1, & \text{if $k_2 \% 2 = 0$ (head)} \\ \lfloor\frac{k_2\ \%\ 20}{2}\rfloor, & \text{if $k_2 \% 2 = 1$ (end)} \end{cases} $$
To make the pattern easier to see \$k_3 = k_2 - 180\$.

Again we'll focus on building each digit.
- End digits \$k_3\ \%\ 3 = 2\$
  
  Ok the end digits are following the same pattern again. Every \30ドルn + 2 = 0\$, \30ドルn + 5 = 1\$.
  - \$k_3 \in \{2, 32, 62, ..., 2672\} \therefore 0\$
  - \$k_3 \in \{5, 35, 65, ..., 2675\} \therefore 1\$
  - \$k_3 \in \{8, 38, 68, ..., 2678\} \therefore 2\$
  Again we can use a similar algorithm as before to truncate the sets to just one number each. Afterwards we can figure out how to convert each of the numbers to the digit.
  
  We can modulo \$k_3\$ by 30 to truncate each set to one number each, leaving 2, 5, 8, .... The final numbers can be converted to the correct output by just floor dividing by 3.
  
  $$\lfloor\frac{k_3\ \%\ 30}{3}\rfloor$$
- Mid digits \$k_3\ \%\ 3 = 1\$
  
  The mid digits have an interesting pattern. Because the numbers are repeating in the 3rd dimension each time starting from around 100, 200, 300, etc. As such I have used nested sets to show the split patterns. The nested sets are just to show the start, end and step of the repeating patterns.
  - \$k_3 \in \{\{1, 4, 7, ..., 28\}, \{301, 304, ... 328\}, ..., \{2401, 2404, ..., 2428\}\} \therefore 0\$
  - \$k_3 \in \{\{31, 34, 37, ..., 58\}, \{331, 334, ... 358\}, ..., \{2431, 2434, ..., 2458\}\} \therefore 1\$
  - \$k_3 \in \{\{61, 64, 67, ..., 88\}, \{361, 364, ... 388\}, ..., \{2461, 2464, ..., 2488\}\} \therefore 2\$
  Just like in both the previous end digits we can use modulo to reduce the size of the set. We can see the outer sets repeat every 300 digits so using modulo 300 we can reduce the sets to something more manageable.
  - \$k_3 \% 300 \in \{1, 4, 7, ..., 28\} \therefore 0\$
  - \$k_3 \% 300 \in \{31, 34, 37, ..., 58\} \therefore 1\$
  - \$k_3 \% 300 \in \{61, 64, 67, ..., 88\} \therefore 2\$
  Now the sets look like sets we got in the head digit. Now we can just floor divide by 30 to get the digit.
  
  $$\lfloor\frac{k_3\ \%\ 300}{30}\rfloor$$
- Head digits \$k_3\ \%\ 3 = 0\$
  - \$k_3 \in \{ 0, 3, 6, ..., 297\} \therefore 1\$
  - \$k_3 \in \{300, 303, 306, ..., 597\} \therefore 2\$
  - \$k_3 \in \{600, 603, 606, ..., 797\} \therefore 3\$
  Again the head digit pattern is the same, just with a different number. We can just floor divide by 300 and add one to get the digit.
  
  $$\lfloor\frac{k_3}{300}\rfloor + 1$$
Again we can merge the patterns into a single function:

$$ f_3(k_3) = \begin{cases} \lfloor\frac{k_3}{300}\rfloor + 1, & \text{if $k_3\ \%\ 3 = 0$ (head)} \\ \lfloor\frac{k_3\ \%\ 300}{30}\rfloor, & \text{if $k_3\ \%\ 3 = 1$ (mid)} \\ \lfloor\frac{k_3\ \%\ 30}{3}\rfloor, & \text{if $k_3\ \%\ 3 = 2$ (end)} \end{cases} $$

We can see a pattern emerging. Lets have \$k_1 = k - 1\$.

We can see similar values 2, 20, 3, 30 and 300. So we can move the dimension, \$n\$, out of the numbers and get \1ドルn\$, \10ドルn\$, \1ドルn\$, \10ドルn\$ and \100ドルn\$ respectively.
Looking at the previous numbers we can see a pattern again 1, 10 and 100. The pattern between the numbers is \10ドル^?\$ where we currently don't know what \$?\$ is.
We can see the numerator's modulo is always \$? + 1\$ the denominator's \$?\$.

As such we can change the above functions to a very similar pattern.

$$ f_3(k_3) = \begin{cases} \lfloor\frac{k_3\ \%\ (3 * 10^{3 - 0})}{3 * 10^{3 - 1}}\rfloor + 1, & \text{if $k_3\ \%\ 3 = 0$} \\ \lfloor\frac{k_3\ \%\ (3 * 10^{3 - 1})}{3 * 10^{3 - 2}}\rfloor, & \text{if $k_3\ \%\ 3 = 1$} \\ \lfloor\frac{k_3\ \%\ (3 * 10^{3 - 2})}{3 * 10^{3 - 3}}\rfloor, & \text{if $k_3\ \%\ 3 = 2$} \end{cases}\\ f_2(k_2) = \begin{cases} \lfloor\frac{k_2\ \%\ (2 * 10^{2 - 0})}{2 * 10^{2 - 1}}\rfloor + 1, & \text{if $k_2 \% 2 = 0$} \\ \lfloor\frac{k_2\ \%\ (2 * 10^{2 - 1})}{2 * 10^{2 - 2}}\rfloor, & \text{if $k_2 \% 2 = 1$} \end{cases}\\ f_1(k_1) = \begin{cases} \lfloor\frac{k_1\ \%\ (1 * 10^{1 - 0})}{1 * 10^{1 - 1}}\rfloor + 1, & \text{if $k_1 \% 1 = 0$} \end{cases} $$

Here we have found what \$?\$ is. Looking at the "if" we can see \$?\$ is based on the modulo of \$k\$. As such we can improve the function further to:

$$ f_n(k_n) = \lfloor\frac{k_n\ \%\ (n10^{n - (k_n\ \%\ n)})}{n10^{n - (k_n\ \%\ n) - 1}}\rfloor + \begin{cases} 1, & \text{if $k_n \% n = 0$}\\ 0, & \text{otherwise} \end{cases} $$

I'll leave getting \$n\$ and \$k_n\$ from \$k\$ as a challenge for the OP.

We can check the equation works with the following code.

def fn(k, n):
 mod = k % n
 return (
 (
 (k % (n * 10 ** (n - mod)))
 // (n * 10 ** (n - mod - 1))
 )
 + (0 if mod else 1)
 )
for row in range(0, 180, 20):
 digits = [
 fn(index, 3)
 for index in range(row, row + 20)
 ]
 print("".join(str(d) for d in digits))

10111213141516171819
20212223242526272829
30313233343536373839
40414243444546474849
50515253545556575859
60616263646566676869
70717273747576777879
80818283848586878889
90919293949596979899

Note: The function doesn't work correctly on the 'last chunk' of numbers, fn(180, 2) == 10 but when the function loops back around we get the correct numbers fn(200, 2) == 1. Because we don't need to generate \00ドル\ \to\ 09\$ I defiled the function. To get the 'pure' function remove + (0 if mod else 1) and adjust the input accordingly. I.e start at 20 not 0.

Question 16

I think your solution is correct but your explanation makes the problem seem way more complicated than it really is by even thinking about end digits or head digits or whatever. Kelly Bundy's solution is much simpler to understand of grouping numbers by length.

Question 17

@qwr I think you've misunderstood the point of the answer. It was to teach how to solve other pattern based problems in the future. By showing how to break big problems down into smaller and more manageable parts. Then solving all the small problems to solve the big one. A lot of programming challenges rely on pattern recognition skills, where using strings are not an option.

Question 18

I know about pattern recognition. I still think you are overcomplicating the problem. Maybe I will write my own answer of how I think about the problem.

Question 19

It would be nice if you provided a working solution for f(k). As it stands, the function still needs n as input!

Question 20

@qwr "It would be nice if you provided a working solution for f(k)." I set figuring out an exponential function as homework to the asker. So the user can build pattern recognition skills. Maybe you know the saying; give a man a fish feed him for a day, teach a man to fish feed him for a lifetime.

Question 21

This is a competition programming problem so it's worth thinking about the algorithmic complexity of our approaches. Let n=10^18 be the original length of the string. Generating the entire string one digit at a time will take O(n) time, even if we don't store the whole string, so that approach won't work. Instead, the goal is to compute precisely which number of all the concatenated numbers our index lands in and how far into the number it is. (By the way, this sequence is http://oeis.org/A033307)

A simple approach is to group all numbers by how many digits is in each number, like in Kelly Bundy's answer.

For numbers 1 to 9, there are 9 numbers and each number is 1-digit, so there are 9*1=9 digits in the group.
For numbers 10 to 90, there are 90 numbers and each number is 2-digit, so there are 90*2 = 180 total digits.
etc.
In general, for d-digit numbers, there are g = 9*(10^(d-1)) * d total digits. This is http://oeis.org/A212704.

So we can find out what d is for our given k by incrementing d and subtracting from k until we cannot subtract any more. Each time we subtract on the order of 10^d, so we only need O(log10(n)) steps to reach our group (check index of http://oeis.org/A033713 to see just how many steps we need), so for q=1000 queries, this is very fast.

Then we can compute the digit inside the number we are currently in by using m = 10^(d-1) + q, where q is floor division of k-1 by d (convince yourself why this is true). Within each number, we need to extract the exact digit, which can be done with mod and floor division powers of 10 operations in constant time instead of using strings. There is little performance difference but I like to avoid using strings for numeric problems if I can as a matter of elegance: although this problem is presented as one of an infinite string, it is computed mathematically without any strings at all, showing that this is not really a string manipulation problem but one of careful counting and nothing more.

Here is the modified version of Kelly Bundy's code, using the new python walrus operator to make the code a little cleaner and without using strings, thus very fast and suitable for implementation in C/assembly/any language without nice string functions.

def f(k):
 d = 1
 while k > (g := 9 * 10**(d-1) * d):
 k -= g
 d += 1
 q, r = divmod(k-1, d) # convert 1-indexing
 m = 10**(d-1) + q
 return (m // (10**(d-1-r))) % 10

Question 22

Hmm, I didn't downvote, but... 1) You make a big deal of getting the code accepted being important, and then you modify my code so that it doesn't get accepted anymore. It gets SyntaxError: invalid syntax for the :=. I had already tried that and that's why I didn't use it. 2) If you consider your final digit extraction constant time, you must also consider the string-way constant time. 3) You say "as a matter of elegance", but (m // (10**(d-1-r))) % 10 really doesn't look more elegant than str(m)[r] to me.

Question 23

4) It looks almost twice as slow, and the conversion to str that print will do makes it even slower. Not saying it matters, just saying I wouldn't ...

Question 24

... advertise that as a speed improvement when it appears to be slower. Though if my benchmarks are somehow bad and I'm mistaken about that, feel free to demonstrate that :-)

Question 25

1. What version of python are you using? I tested it on python 3.8.5 on ubuntu 20.04. 2. I know this and I said there is little performance benefit, although maybe my wording made it sound like the string time took not effectively constant time. 3. personal preference.

Question 26

4. I do not advertise any speed improvement, rather I say the code is very fast, which given the time limit it is. It is useless to micro-optimize python for competition programming anyway because python is not a good language for competition programming. Anyhow the python code is just psuedocode for a C or assembly implementation where I believe string functions are slow and cumbersome. Idk if this applies to C++.

Kelly Bundy Kelly Bundy 3,2377 silver badges21 bronze badges · Accepted Answer · 2021-03-06 04:25:18Z

Building or iterating the whole string takes far too long. Instead group the numbers by length:

length 1 digit: 1 to 9 -> 9 numbers => 9 digits total
length 2 digits: 10 to 99 -> 90 numbers => 180 digits total
length 3 digits: 100 to 999 -> 900 numbers => 2700 digits total
length 4 digits: 1000 to 9999 -> 9000 numbers => 36000 digits total
etc

The number of digits per group (9, 180, 2700, 36000, etc) are easy to compute. Skip these groups of equal-length numbers until you reach the right group (the remaining k falls into it). Then just compute the right number inside the group and get the right digit from it.

for _ in range(int(input())):
 k = int(input())
 length = 1
 while k > 9 * 10**(length-1) * length:
 k -= 9 * 10**(length-1) * length
 length += 1
 q, r = divmod(k-1, length)
 print(str(10**(length-1) + q)[r])

Takes about 0.03 seconds of the allowed 1 second.

Or keeping 10**(length-1) in a variable for less duplication:

for _ in range(int(input())):
 k = int(input())
 length = first = 1
 while k > 9 * first * length:
 k -= 9 * first * length
 length += 1
 first *= 10
 q, r = divmod(k-1, length)
 print(str(first + q)[r])

this is a good answer. in fact the string does not need to be generated at all, although the final string generation is very small.

Stack Exchange Network

"Digit Queries" CSES Solution giving TLE

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

"Digit Queries" CSES Solution giving TLE

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions