I really need a more accurate way to size inf_string. This code works but gets exponentially more inefficient the bigger max(queries) is, and therefore just can't hold up to the gargantuan numbers that CSES throws into it.
Problem: Consider an infinite string that consists of all positive integers in increasing order: 12345678910111213141516171819202122232425... Your task is to process q queries of the form: what is the digit at position k in the string? The first input line has an integer q: the number of queries. After this, there are q lines that describe the queries. Each line has an integer k: a 1-indexed position in the string. For each query, print the corresponding digit.
queries = []
for i in range(int(input())):
queries.append(int(input()))
# TODO FIND A BETTER WAY TO LIMIT inf_string LENGTH
inf_string = [str(x) for x in range(1, max(queries)+1)]
inf_string = "".join(inf_string)
for k in queries:
print(inf_string[k-1])
-
2\$\begingroup\$ What are the bounds for \$q\$ and \$k\$? \$\endgroup\$Peilonrayz– Peilonrayz ♦2021年03月05日 19:35:29 +00:00Commented Mar 5, 2021 at 19:35
-
\$\begingroup\$ 1≤q≤1000 1≤k≤10^18 @Peilonrayz \$\endgroup\$t.y.ler– t.y.ler2021年03月05日 20:19:48 +00:00Commented Mar 5, 2021 at 20:19
-
1\$\begingroup\$ Why don't you give us the link? \$\endgroup\$superb rain– superb rain2021年03月06日 03:57:02 +00:00Commented Mar 6, 2021 at 3:57
-
1\$\begingroup\$ this is oeis.org/A033307 \$\endgroup\$qwr– qwr2021年03月07日 01:59:35 +00:00Commented Mar 7, 2021 at 1:59
4 Answers 4
Building or iterating the whole string takes far too long. Instead group the numbers by length:
length 1 digit: 1 to 9 -> 9 numbers => 9 digits total
length 2 digits: 10 to 99 -> 90 numbers => 180 digits total
length 3 digits: 100 to 999 -> 900 numbers => 2700 digits total
length 4 digits: 1000 to 9999 -> 9000 numbers => 36000 digits total
etc
The number of digits per group (9, 180, 2700, 36000, etc) are easy to compute. Skip these groups of equal-length numbers until you reach the right group (the remaining k
falls into it). Then just compute the right number inside the group and get the right digit from it.
for _ in range(int(input())):
k = int(input())
length = 1
while k > 9 * 10**(length-1) * length:
k -= 9 * 10**(length-1) * length
length += 1
q, r = divmod(k-1, length)
print(str(10**(length-1) + q)[r])
Takes about 0.03 seconds of the allowed 1 second.
Or keeping 10**(length-1)
in a variable for less duplication:
for _ in range(int(input())):
k = int(input())
length = first = 1
while k > 9 * first * length:
k -= 9 * first * length
length += 1
first *= 10
q, r = divmod(k-1, length)
print(str(first + q)[r])
-
\$\begingroup\$ this is a good answer. in fact the string does not need to be generated at all, although the final string generation is very small. \$\endgroup\$qwr– qwr2021年03月07日 02:55:34 +00:00Commented Mar 7, 2021 at 2:55
-
\$\begingroup\$ Could you elaborate more on why we take
divmod
ofk-1
? \$\endgroup\$sabius– sabius2021年03月18日 17:03:38 +00:00Commented Mar 18, 2021 at 17:03
As Peilonrayz pointed out, there is a mathematical approach to the problem.
Let's assume there wasn't.
# TODO FIND A BETTER WAY TO LIMIT inf_string LENGTH
inf_string = [str(x) for x in range(1, max(queries)+1)]
inf_string = "".join(inf_string)
With \$k \le 10^{18}\$, your string can have up to \10ドル^{18}\$ digits. That's a lot of memory for a program to use for one string.
There is no reason to make the string that large. One can simply generate the string, one value at a time, and return the digits of those strings, yielding the infinite digit string while taking almost no memory.
from typing import Iterable
def infinite_digits() -> Iterable[str]:
"""
An infinite series of digits from the infinite sequence of positive
integers. Ie:
"1" "2" "3" "4" "5" "6" "7" "8" "9" "1" "0" "1" "1" "1" "2" "1" "3" ...
"""
x = 1
while True:
yield from str(x)
x += 1
To find the \$n^{th}\$ digit, simply generate the sequence, discarding all of the digits until you reach the desired one. There is even a recipe for that in the itertools
docs:
from more_itertools import nth
def digit(k: int) -> str:
return nth(infinite_digits(), k - 1)
With the above two functions, your code can be simplified to:
for _ in range(int(input())):
k = int(input())
print(digit(k))
The code will still take a very, very long time, but at least it won't consume much memory.
You can make things more efficient by sorting your queries, pass over the infinite_digit()
stream once, and remember the positions of interest. Finally, pass over the original query set again, and output the memorized digit for that query value.
It could be up to 1000 times faster, since you have up to 1000 queries, but it can still take a very long time.
-
3\$\begingroup\$ You just made me realize for a performant and simple solution we could use something like
def fn(i, n): return str(range(10**(n - 1), 10**n)[i // n])[i % n]
(very similar to your code but with how myfn
works). Ugh, why'd I spend like an hour looking at math when I could have just used something so simple! 🤦♀️ \$\endgroup\$2021年03月06日 00:16:18 +00:00Commented Mar 6, 2021 at 0:16 -
\$\begingroup\$ @Peilonrayz Huh. I would have used the math approach myself, but you had already gone that route, so I went the brute-force-and-ignorance method, but without the memory impact, to show that the problem could be solved in smaller pieces than just building the complete huge string. Nice performant upgrade. \$\endgroup\$AJNeufeld– AJNeufeld2021年03月06日 06:15:39 +00:00Commented Mar 6, 2021 at 6:15
-
\$\begingroup\$ I completely disagree with the checkmark being awarded to this answer. At 1 nanosecond per digit, a \10ドル^{18}\$ query would take over 31 years to solve. The math solution deserves the checkmark. \$\endgroup\$AJNeufeld– AJNeufeld2021年03月06日 06:26:01 +00:00Commented Mar 6, 2021 at 6:26
-
2\$\begingroup\$ @qwr Code Review is about teaching skills the asker needs not just giving solutions to problems. Effective memory utilization is a skill needed in many other parts of programming, even in programming challenges. \$\endgroup\$2021年03月07日 02:38:22 +00:00Commented Mar 7, 2021 at 2:38
-
1\$\begingroup\$ @qwr I know it is not the optimal solution. I said as much in the comment immediately above yours. The 31 year run time to solve the last test case is excessive, but won't drag the computer to its knees allocating and swapping memory while trying to build a quintillion digit string. It does however solve the OP's issue of "FIND A BETTER WAY TO LIMIT inf_string LENGTH": the length issue is avoided. \$\endgroup\$AJNeufeld– AJNeufeld2021年03月07日 02:47:51 +00:00Commented Mar 7, 2021 at 2:47
You need to identify a pattern. We should be able to build a fairly simple pattern by grouping numbers in specific ways.
123456789 # One dimensional
10111213141516171819 # Two dimensional
20212223242526272829
30313233343536373839
...
90919293949596979899
100101102103104105106107108109 200201202203204205206207208209 # Three dimensional
110111112113114115116117118119
120121122123124125126127128129
...
190191192193194195196197198199
1000100110021003100410051006100710081009 # Four dimensional
We know if \$k\$ is less than 10 the value is \$k\$. To see the pattern look at the one dimensional group and see the index where each number belongs. We can see each digit has only one possible position.
- \$k \in \{1\} \therefore 1\$
(if \$k = 1\$ then the digit is 1) - \$k \in \{2\} \therefore 2\$
(if \$k = 2\$ then the digit is 2)
$$ f_1(k) = k $$
- \$k \in \{1\} \therefore 1\$
To make the pattern easier to see \$k_2 = k - 10\$.
Now \$f_2(0) = 1\$, \$f_2(1) = 0\$, \$f_2(2) = 1\$, \$f_2(3) = 1\$.If we focus on how to build each section of the number we can see similar patterns. For example the end digit is 0 in the 2nd column in the entire two dimensional block. Since we know the width of the two dimensional block is 20 we know every \20ドルn + 1\$ number results in 0.
End digits (Odd numbers, \$k_2\ \%\ 2 = 1\$)
Just like we did for 0 we can look at every second column and see the value repeats all the way down.
- \$k_2 \in \{1, 21, 41, ..., 161\} \therefore 0\$
(if \$k_2 = 1\$ or \$k_2 = 21\$ or etc, the digit is 0) - \$k_2 \in \{3, 23, 43, ..., 163\} \therefore 1\$
- \$k_2 \in \{5, 25, 45, ..., 165\} \therefore 2\$
We can then calculate the digit from the input number. First we can modulo the number, \$k_2 % 20\$, to truncate each of the sets to the same number. For example \21ドル \% 20 = 1\$, \41ドル \% 20 = 1\$, etc. Now the sets are much simpler.
- \$k_2\ \%\ 20 \in \{1\} \therefore 0\$
(if \$k_2\ \%\ 20 = 1\$ the digit is 0) - \$k_2\ \%\ 20 \in \{3\} \therefore 1\$
- \$k_2\ \%\ 20 \in \{5\} \therefore 2\$
Now getting the same digit for each input is really easy. We can just floor divide by 2. Since \$\lfloor\frac{1}{2}\rfloor = 0\$, \$\lfloor\frac{3}{2}\rfloor = 1\$, etc.
So we know if the number is odd we can get the digit by using the following formula:
$$\lfloor\frac{k_2\ \%\ 20}{2}\rfloor$$
- \$k_2 \in \{1, 21, 41, ..., 161\} \therefore 0\$
Head digits (Even numbers, \$k_2\ \%\ 2 = 0\$)
Just like we did for the columns we can see a pattern in the rows. Every two columns the first number is the same. We can build the sets for where each number is.
- \$k_2 \in \{ 0, 2, 4, ..., 18\} \therefore 1\$
- \$k_2 \in \{20, 22, 24, ..., 38\} \therefore 2\$
- \$k_2 \in \{40, 42, 44, ..., 58\} \therefore 3\$
We can see any less than 20 are 1, then any less than 40 are 2. We can floor divide by 20 and add 1. And so the formula is:
$$\lfloor\frac{k_2}{20}\rfloor + 1$$
We can merge both the previous functions into one function.
$$ f_2(k_2) = \begin{cases} \lfloor\frac{k_2}{20}\rfloor + 1, & \text{if $k_2 \% 2 = 0$ (head)} \\ \lfloor\frac{k_2\ \%\ 20}{2}\rfloor, & \text{if $k_2 \% 2 = 1$ (end)} \end{cases} $$
To make the pattern easier to see \$k_3 = k_2 - 180\$.
Again we'll focus on building each digit.
End digits \$k_3\ \%\ 3 = 2\$
Ok the end digits are following the same pattern again. Every \30ドルn + 2 = 0\$, \30ドルn + 5 = 1\$.
- \$k_3 \in \{2, 32, 62, ..., 2672\} \therefore 0\$
- \$k_3 \in \{5, 35, 65, ..., 2675\} \therefore 1\$
- \$k_3 \in \{8, 38, 68, ..., 2678\} \therefore 2\$
Again we can use a similar algorithm as before to truncate the sets to just one number each. Afterwards we can figure out how to convert each of the numbers to the digit.
We can modulo \$k_3\$ by 30 to truncate each set to one number each, leaving 2, 5, 8, .... The final numbers can be converted to the correct output by just floor dividing by 3.
$$\lfloor\frac{k_3\ \%\ 30}{3}\rfloor$$
Mid digits \$k_3\ \%\ 3 = 1\$
The mid digits have an interesting pattern. Because the numbers are repeating in the 3rd dimension each time starting from around 100, 200, 300, etc. As such I have used nested sets to show the split patterns. The nested sets are just to show the start, end and step of the repeating patterns.
- \$k_3 \in \{\{1, 4, 7, ..., 28\}, \{301, 304, ... 328\}, ..., \{2401, 2404, ..., 2428\}\} \therefore 0\$
- \$k_3 \in \{\{31, 34, 37, ..., 58\}, \{331, 334, ... 358\}, ..., \{2431, 2434, ..., 2458\}\} \therefore 1\$
- \$k_3 \in \{\{61, 64, 67, ..., 88\}, \{361, 364, ... 388\}, ..., \{2461, 2464, ..., 2488\}\} \therefore 2\$
Just like in both the previous end digits we can use modulo to reduce the size of the set. We can see the outer sets repeat every 300 digits so using modulo 300 we can reduce the sets to something more manageable.
- \$k_3 \% 300 \in \{1, 4, 7, ..., 28\} \therefore 0\$
- \$k_3 \% 300 \in \{31, 34, 37, ..., 58\} \therefore 1\$
- \$k_3 \% 300 \in \{61, 64, 67, ..., 88\} \therefore 2\$
Now the sets look like sets we got in the head digit. Now we can just floor divide by 30 to get the digit.
$$\lfloor\frac{k_3\ \%\ 300}{30}\rfloor$$
Head digits \$k_3\ \%\ 3 = 0\$
- \$k_3 \in \{ 0, 3, 6, ..., 297\} \therefore 1\$
- \$k_3 \in \{300, 303, 306, ..., 597\} \therefore 2\$
- \$k_3 \in \{600, 603, 606, ..., 797\} \therefore 3\$
Again the head digit pattern is the same, just with a different number. We can just floor divide by 300 and add one to get the digit.
$$\lfloor\frac{k_3}{300}\rfloor + 1$$
Again we can merge the patterns into a single function:
$$ f_3(k_3) = \begin{cases} \lfloor\frac{k_3}{300}\rfloor + 1, & \text{if $k_3\ \%\ 3 = 0$ (head)} \\ \lfloor\frac{k_3\ \%\ 300}{30}\rfloor, & \text{if $k_3\ \%\ 3 = 1$ (mid)} \\ \lfloor\frac{k_3\ \%\ 30}{3}\rfloor, & \text{if $k_3\ \%\ 3 = 2$ (end)} \end{cases} $$
We can see a pattern emerging. Lets have \$k_1 = k - 1\$.
We can see similar values 2, 20, 3, 30 and 300. So we can move the dimension, \$n\$, out of the numbers and get \1ドルn\$, \10ドルn\$, \1ドルn\$, \10ドルn\$ and \100ドルn\$ respectively.
Looking at the previous numbers we can see a pattern again 1, 10 and 100. The pattern between the numbers is \10ドル^?\$ where we currently don't know what \$?\$ is.
We can see the numerator's modulo is always \$? + 1\$ the denominator's \$?\$.
As such we can change the above functions to a very similar pattern.
$$ f_3(k_3) = \begin{cases} \lfloor\frac{k_3\ \%\ (3 * 10^{3 - 0})}{3 * 10^{3 - 1}}\rfloor + 1, & \text{if $k_3\ \%\ 3 = 0$} \\ \lfloor\frac{k_3\ \%\ (3 * 10^{3 - 1})}{3 * 10^{3 - 2}}\rfloor, & \text{if $k_3\ \%\ 3 = 1$} \\ \lfloor\frac{k_3\ \%\ (3 * 10^{3 - 2})}{3 * 10^{3 - 3}}\rfloor, & \text{if $k_3\ \%\ 3 = 2$} \end{cases}\\ f_2(k_2) = \begin{cases} \lfloor\frac{k_2\ \%\ (2 * 10^{2 - 0})}{2 * 10^{2 - 1}}\rfloor + 1, & \text{if $k_2 \% 2 = 0$} \\ \lfloor\frac{k_2\ \%\ (2 * 10^{2 - 1})}{2 * 10^{2 - 2}}\rfloor, & \text{if $k_2 \% 2 = 1$} \end{cases}\\ f_1(k_1) = \begin{cases} \lfloor\frac{k_1\ \%\ (1 * 10^{1 - 0})}{1 * 10^{1 - 1}}\rfloor + 1, & \text{if $k_1 \% 1 = 0$} \end{cases} $$
Here we have found what \$?\$ is. Looking at the "if" we can see \$?\$ is based on the modulo of \$k\$. As such we can improve the function further to:
$$ f_n(k_n) = \lfloor\frac{k_n\ \%\ (n10^{n - (k_n\ \%\ n)})}{n10^{n - (k_n\ \%\ n) - 1}}\rfloor + \begin{cases} 1, & \text{if $k_n \% n = 0$}\\ 0, & \text{otherwise} \end{cases} $$
I'll leave getting \$n\$ and \$k_n\$ from \$k\$ as a challenge for the OP.
We can check the equation works with the following code.
def fn(k, n):
mod = k % n
return (
(
(k % (n * 10 ** (n - mod)))
// (n * 10 ** (n - mod - 1))
)
+ (0 if mod else 1)
)
for row in range(0, 180, 20):
digits = [
fn(index, 3)
for index in range(row, row + 20)
]
print("".join(str(d) for d in digits))
10111213141516171819
20212223242526272829
30313233343536373839
40414243444546474849
50515253545556575859
60616263646566676869
70717273747576777879
80818283848586878889
90919293949596979899
Note: The function doesn't work correctly on the 'last chunk' of numbers, fn(180, 2) == 10
but when the function loops back around we get the correct numbers fn(200, 2) == 1
.
Because we don't need to generate \00ドル\ \to\ 09\$ I defiled the function.
To get the 'pure' function remove + (0 if mod else 1)
and adjust the input accordingly.
I.e start at 20 not 0.
-
\$\begingroup\$ I think your solution is correct but your explanation makes the problem seem way more complicated than it really is by even thinking about end digits or head digits or whatever. Kelly Bundy's solution is much simpler to understand of grouping numbers by length. \$\endgroup\$qwr– qwr2021年03月07日 02:05:20 +00:00Commented Mar 7, 2021 at 2:05
-
\$\begingroup\$ @qwr I think you've misunderstood the point of the answer. It was to teach how to solve other pattern based problems in the future. By showing how to break big problems down into smaller and more manageable parts. Then solving all the small problems to solve the big one. A lot of programming challenges rely on pattern recognition skills, where using strings are not an option. \$\endgroup\$2021年03月07日 02:32:57 +00:00Commented Mar 7, 2021 at 2:32
-
\$\begingroup\$ I know about pattern recognition. I still think you are overcomplicating the problem. Maybe I will write my own answer of how I think about the problem. \$\endgroup\$qwr– qwr2021年03月07日 02:51:54 +00:00Commented Mar 7, 2021 at 2:51
-
\$\begingroup\$ It would be nice if you provided a working solution for f(k). As it stands, the function still needs n as input! \$\endgroup\$qwr– qwr2021年03月07日 07:50:55 +00:00Commented Mar 7, 2021 at 7:50
-
\$\begingroup\$ @qwr "It would be nice if you provided a working solution for f(k)." I set figuring out an exponential function as homework to the asker. So the user can build pattern recognition skills. Maybe you know the saying; give a man a fish feed him for a day, teach a man to fish feed him for a lifetime. \$\endgroup\$2021年03月07日 09:28:23 +00:00Commented Mar 7, 2021 at 9:28
This is a competition programming problem so it's worth thinking about the algorithmic complexity of our approaches. Let n=10^18 be the original length of the string. Generating the entire string one digit at a time will take O(n) time, even if we don't store the whole string, so that approach won't work. Instead, the goal is to compute precisely which number of all the concatenated numbers our index lands in and how far into the number it is. (By the way, this sequence is http://oeis.org/A033307)
A simple approach is to group all numbers by how many digits is in each number, like in Kelly Bundy's answer.
- For numbers 1 to 9, there are 9 numbers and each number is 1-digit, so there are 9*1=9 digits in the group.
- For numbers 10 to 90, there are 90 numbers and each number is 2-digit, so there are 90*2 = 180 total digits.
- etc.
- In general, for d-digit numbers, there are g = 9*(10^(d-1)) * d total digits. This is http://oeis.org/A212704.
So we can find out what d is for our given k by incrementing d and subtracting from k until we cannot subtract any more. Each time we subtract on the order of 10^d, so we only need O(log10(n)) steps to reach our group (check index of http://oeis.org/A033713 to see just how many steps we need), so for q=1000 queries, this is very fast.
Then we can compute the digit inside the number we are currently in by using m = 10^(d-1) + q, where q is floor division of k-1 by d (convince yourself why this is true). Within each number, we need to extract the exact digit, which can be done with mod and floor division powers of 10 operations in constant time instead of using strings. There is little performance difference but I like to avoid using strings for numeric problems if I can as a matter of elegance: although this problem is presented as one of an infinite string, it is computed mathematically without any strings at all, showing that this is not really a string manipulation problem but one of careful counting and nothing more.
Here is the modified version of Kelly Bundy's code, using the new python walrus operator to make the code a little cleaner and without using strings, thus very fast and suitable for implementation in C/assembly/any language without nice string functions.
def f(k):
d = 1
while k > (g := 9 * 10**(d-1) * d):
k -= g
d += 1
q, r = divmod(k-1, d) # convert 1-indexing
m = 10**(d-1) + q
return (m // (10**(d-1-r))) % 10
-
\$\begingroup\$ Hmm, I didn't downvote, but... 1) You make a big deal of getting the code accepted being important, and then you modify my code so that it doesn't get accepted anymore. It gets
SyntaxError: invalid syntax
for the:=
. I had already tried that and that's why I didn't use it. 2) If you consider your final digit extraction constant time, you must also consider the string-way constant time. 3) You say "as a matter of elegance", but(m // (10**(d-1-r))) % 10
really doesn't look more elegant thanstr(m)[r]
to me. \$\endgroup\$Kelly Bundy– Kelly Bundy2021年03月07日 16:30:10 +00:00Commented Mar 7, 2021 at 16:30 -
\$\begingroup\$ 4) It looks almost twice as slow, and the conversion to
str
thatprint
will do makes it even slower. Not saying it matters, just saying I wouldn't ... \$\endgroup\$Kelly Bundy– Kelly Bundy2021年03月07日 16:38:08 +00:00Commented Mar 7, 2021 at 16:38 -
\$\begingroup\$ ... advertise that as a speed improvement when it appears to be slower. Though if my benchmarks are somehow bad and I'm mistaken about that, feel free to demonstrate that :-) \$\endgroup\$Kelly Bundy– Kelly Bundy2021年03月07日 16:40:24 +00:00Commented Mar 7, 2021 at 16:40
-
\$\begingroup\$ 1. What version of python are you using? I tested it on python 3.8.5 on ubuntu 20.04. 2. I know this and I said there is little performance benefit, although maybe my wording made it sound like the string time took not effectively constant time. 3. personal preference. \$\endgroup\$qwr– qwr2021年03月07日 21:11:54 +00:00Commented Mar 7, 2021 at 21:11
-
\$\begingroup\$ 4. I do not advertise any speed improvement, rather I say the code is very fast, which given the time limit it is. It is useless to micro-optimize python for competition programming anyway because python is not a good language for competition programming. Anyhow the python code is just psuedocode for a C or assembly implementation where I believe string functions are slow and cumbersome. Idk if this applies to C++. \$\endgroup\$qwr– qwr2021年03月07日 21:20:41 +00:00Commented Mar 7, 2021 at 21:20
Explore related questions
See similar questions with these tags.