I have a school project question (for Python) that goes like this:
Given a string_input such as "abcd&1-4efg", the function must remove the "&1-4" and insert the string slice from index 1 to index 4 (inclusive) where the "&1-4" was.
E.g. if string_input = "abcd&1-4efg",
"&1-4" is removed.
The remaining characters are indexed as follows: a=0, b=1, c=2, d=3, e=4, f=5, g=6
The new string becomes: "abcdbcdeefg"
I've managed to write a long chunk of code to do this, but I'm wondering if anyone has any more efficient solutions?
Things to note:
- The instructions can include double digits (e.g. &10-15)
- If an index isn't found, the returned string should contain "?" for every missing index (e.g. "abcd&5-10efgh" would return "abcdfgh???efgh")
- Instructions can be back-to-back (e.g. "&10-15abcdef&1-5&4-5pqrs")
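To make the "?" rule concrete, here is the arithmetic for the "abcd&5-10efgh" example written out by hand. This is only an illustration of the expected result, not a general solution, and the variable names are made up for the sketch.

```python
# Illustration only: the text with "&5-10" already removed by hand.
cleaned = "abcdefgh"
start, end = 5, 10

# Characters that actually exist in the requested range (inclusive end).
found = cleaned[start:end + 1]

# One "?" for each requested index that is past the end of the text.
missing = (end - start + 1) - len(found)

result = "abcd" + found + "?" * missing + "efgh"
print(result)  # abcdfgh???efgh
```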
The code I've written is:
def expand(text):
    text += "|"
    import string
    digits_dash = string.digits + "-"
    idx_ref_str = ""
    replace_list = []
    record_val = False
    output_to_list = []
    instruct = ""
    and_idx_mark = 0
    #builds replace_list & idx_ref_list
    for idx in range(len(text)):
        if text[idx] == "&" and record_val==True:
            output_to_list.append(instruct)
            output_to_list.append(and_idx_mark)
            replace_list.append(output_to_list)
            output_to_list, instruct, inst_idx, and_idx_mark = [],"",0,0
            and_idx_mark = idx
            continue
        elif text[idx] == "&":
            record_val = True
            and_idx_mark = idx
            continue
        #executes if currently in instruction part
        if record_val == True:
            #adds to instruct
            if text[idx] in digits_dash:
                instruct += text[idx]
            #take info, add to replace list
            else:
                output_to_list.append(instruct)
                output_to_list.append(and_idx_mark)
                replace_list.append(output_to_list)
                output_to_list, instruct, inst_idx, and_idx_mark, record_val = [],"",0,0,False
        #executes otherwise
        if record_val == False:
            idx_ref_str += text[idx]
    idx_ref_str = idx_ref_str[:-1]
    text = text[:-1]
    #converts str to int indexes in replace list[x][2]
    for item in replace_list:
        start_idx = ""
        end_idx = ""
        #find start idx
        for char in item[0]:
            if char in string.digits:
                start_idx += char
            elif char == "-":
                start_idx = int(start_idx)
                break
        #find end idx
        for char in item[0][::-1]:
            if char in string.digits:
                end_idx = char + end_idx
            elif char == "-":
                end_idx = int(end_idx)
                break
        start_end_list = [start_idx,end_idx]
        item+=start_end_list
    #split text into parts in list
    count = 0
    text_block = ""
    text_block_list = []
    idx_replace = 0
    for char in text:
        if char == "&":
            text_block_list.append(text_block)
            text_block = ""
            count += len(replace_list[idx_replace][0])
            idx_replace +=1
        elif count > 0:
            count -= 1
        else:
            text_block += char
    text_block_list.append(text_block)
    #creates output str
    output_str = ""
    for idx in range(len(text_block_list)-1):
        output_str += text_block_list[idx]
        #creates to_add var to add to output_str
        start_repl = replace_list[idx][1]
        end_repl = replace_list[idx][1] + len(replace_list[idx][0])
        find_start = replace_list[idx][2]
        find_end = replace_list[idx][3]
        if end_idx >= len(idx_ref_str):
            gap = end_idx + 1 - len(idx_ref_str)
            to_add = idx_ref_str[find_start:] + "?" * gap
        else:
            to_add = idx_ref_str[find_start:find_end+1]
        output_str += to_add
    output_str += text_block_list[-1]
    return output_str
1 Answer
Indeed there is. For tasks like this, you should use regular expressions.
import re

WILDCARD_REGEX = "&[0-9]+-[0-9]+"


def get_text_without_wildcards(text):
    return re.sub(WILDCARD_REGEX, "", text)


def find_wildcard(text):
    # Return the leftmost wildcard in text, or None if there is none
    wildcard = re.search(WILDCARD_REGEX, text)
    if wildcard is not None:
        return wildcard.group(0)
    return None


def parse_wildcard(wildcard):
    # "&10-15" -> (10, 15)
    first_index, last_index = re.split("-", wildcard[1:])
    return int(first_index), int(last_index)


def get_replacement(begin, end, text):
    # The slice includes end; add one "?" per requested index past the end of text
    found = text[begin:end+1]
    return found + "?" * (end - begin + 1 - len(found))


def replace(text):
    clean_text = get_text_without_wildcards(text)
    wildcard = find_wildcard(text)
    while wildcard is not None:
        first_index, last_index = parse_wildcard(wildcard)
        replacement = get_replacement(first_index, last_index, clean_text)
        # Replace only the leftmost occurrence, so "&1-5" cannot eat into "&1-50"
        text = text.replace(wildcard, replacement, 1)
        wildcard = find_wildcard(text)
    return text


print(replace("&10-15abcdef&1-5&4-5pqrs"))
The main part is: replace wildcards until there are none left.
The regular expression is simple: &[0-9]+-[0-9]+ matches every substring of the form "&<numbers>-<numbers>".
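As a quick check of what the pattern picks up, here is a small sketch using the back-to-back example from the question. The pattern is the same as WILDCARD_REGEX, only written as a raw string and precompiled; the names `pattern` and `sample` are just for this illustration.

```python
import re

# Same pattern as WILDCARD_REGEX in the answer, written as a raw string.
pattern = re.compile(r"&[0-9]+-[0-9]+")

sample = "&10-15abcdef&1-5&4-5pqrs"

# findall returns every non-overlapping match, left to right.
matches = pattern.findall(sample)
print(matches)  # ['&10-15', '&1-5', '&4-5']

# An "&" that is not followed by <digits>-<digits> is left alone.
no_matches = pattern.findall("a&b & 1-2")
print(no_matches)  # []
```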
Parsing the wildcard is also simple: skip the & and split the remainder on -. The result is a list of length 2. All you need to do is convert the elements to int.
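The parsing step on its own could look like the sketch below. It uses plain str.split rather than re.split, which should be enough for a fixed "-" separator; `parse_instruction` is a hypothetical name used only here.

```python
def parse_instruction(wildcard):
    # Hypothetical helper: "&10-15" -> (10, 15).
    # wildcard[1:] drops the leading "&"; split("-") gives the two numbers.
    first, last = wildcard[1:].split("-")
    return int(first), int(last)


print(parse_instruction("&10-15"))  # (10, 15)
```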
Once you have the beginning and end of the replacement string, you just need to slice the string without wildcards (clean_text).
All that is left is replacing the wildcard with the replacement string.
To add onto this, I'd make find_wildcard a generator that yields all match objects (not just the found str matching the pattern). That way, the number of times you iterate through the string isn't proportional to the number of wildcards. The same goes for replacing the text: instead of using the whole regexp engine, use the indexes from the yielded match object. Lastly, mutating a str in a loop is an expensive operation, since strings are immutable (see this answer). Instead, use a list. – Miguel Alorda, Sep 19, 2021 at 21:46
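The suggestions in this comment could be sketched roughly as follows. This is one possible reading, not the commenter's code: `iter_wildcards` and `expand_single_pass` are hypothetical names, the pattern adds capture groups for the two numbers, and the "?" padding follows the rule stated in the question.

```python
import re

# Assumed variant of the answer's pattern, with groups for the two numbers.
WILDCARD = re.compile(r"&([0-9]+)-([0-9]+)")


def iter_wildcards(text):
    """Yield a match object for every wildcard, in a single scan."""
    yield from WILDCARD.finditer(text)


def expand_single_pass(text):
    # Reference text: the input with every wildcard removed.
    clean_text = WILDCARD.sub("", text)

    parts = []      # collected pieces, joined once at the end
    position = 0    # end of the previous wildcard in `text`
    for match in iter_wildcards(text):
        begin, end = int(match.group(1)), int(match.group(2))
        segment = clean_text[begin:end + 1]
        # One "?" for every requested index that does not exist.
        padding = "?" * max(0, end - begin + 1 - len(segment))
        parts.append(text[position:match.start()])
        parts.append(segment + padding)
        position = match.end()
    parts.append(text[position:])
    return "".join(parts)


print(expand_single_pass("abcd&1-4efg"))    # abcdbcdeefg
print(expand_single_pass("abcd&5-10efgh"))  # abcdfgh???efgh
```

Collecting the pieces in a list and joining once avoids rebuilding the string on every replacement, and the match positions mean the text is not searched again for each wildcard.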
import string in the middle of a function?