Function to replace all "&int-int" with the respective string slices in an input string

Question 1

I have a school project question (for Python) that goes like this:

Given a string_input such as "abcd&1-4efg", the function must remove the "&1-4" and insert the string slice from 1 to 4 where the "&1-4" was.

eg. if string_input = "abcd&1-4efg",

"&1-4" is removed.
The remaining characters are indexed as follows: a=0, b=1, c=2, d=3, e=4, f=5, g=6
The new string becomes: "abcdbcdeefg"
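The indexing in this example can be checked by hand. The following is only a small sketch, assuming the "&1-4" instruction has already been stripped manually; the variable names are illustrative and not part of the assignment.

```python
# "abcd&1-4efg" with the "&1-4" instruction removed by hand.
remaining = "abcd" + "efg"

# Index every remaining character: a=0, b=1, ..., g=6.
indexed = {position: char for position, char in enumerate(remaining)}

# The range 1-4 is inclusive on both ends, so the Python slice is [1:4 + 1].
inserted = remaining[1:4 + 1]

# Put the slice back where "&1-4" used to be.
result = "abcd" + inserted + "efg"
print(result)  # abcdbcdeefg
```

Note that the end index is inclusive, which is why the slice uses `4 + 1`.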

I've managed to write a long chunk of code to do this, but I'm wondering if anyone has any more efficient solutions?

Things to note:

The instructions can include double digits (e.g. &10-15)
If an index isn't found, the returned string should contain "?" for every missing index (e.g. "abcd&5-10efgh" would return "abcdfgh???efgh")
Instructions can be back-to-back (e.g. "&10-15abcdef&1-5&4-5pqrs")
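The "?" rule for missing indexes can be isolated in a small helper. This is a sketch under my reading of the rules above; `padded_slice` is a hypothetical name, not something given in the assignment.

```python
def padded_slice(text, start, end):
    """Return text[start..end] (inclusive), using "?" for each missing index."""
    found = text[start:end + 1]
    # Number of requested positions that fall past the end of text.
    missing = (end - start + 1) - len(found)
    return found + "?" * max(missing, 0)


# "abcd&5-10efgh" without the instruction is "abcdefgh" (indexes 0-7).
print(padded_slice("abcdefgh", 5, 10))  # fgh???
```

Python slicing never raises on an out-of-range end, so the helper only has to count how many characters are short.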

The code I've written is:

def expand(text):
    text += "|"
    import string
    digits_dash = string.digits + "-"
    idx_ref_str = ""
    replace_list = []
    record_val = False
    output_to_list = []
    instruct = ""
    and_idx_mark = 0
    #builds replace_list & idx_ref_list
    for idx in range(len(text)):
        if text[idx] == "&" and record_val==True:
            output_to_list.append(instruct)
            output_to_list.append(and_idx_mark)
            replace_list.append(output_to_list)
            output_to_list, instruct, inst_idx, and_idx_mark = [],"",0,0
            and_idx_mark = idx
            continue
        elif text[idx] == "&":
            record_val = True
            and_idx_mark = idx
            continue
        #executes if currently in instruction part
        if record_val == True:
            #adds to instruct
            if text[idx] in digits_dash:
                instruct += text[idx]
            #take info, add to replace list
            else:
                output_to_list.append(instruct)
                output_to_list.append(and_idx_mark)
                replace_list.append(output_to_list)
                output_to_list, instruct, inst_idx, and_idx_mark, record_val = [],"",0,0,False
        #executes otherwise
        if record_val == False:
            idx_ref_str += text[idx]
    idx_ref_str = idx_ref_str[:-1]
    text = text[:-1]
    #converts str to int indexes in replace list[x][2]
    for item in replace_list:
        start_idx = ""
        end_idx = ""
        #find start idx
        for char in item[0]:
            if char in string.digits:
                start_idx += char
            elif char == "-":
                start_idx = int(start_idx)
                break
        #find end idx
        for char in item[0][::-1]:
            if char in string.digits:
                end_idx = char + end_idx
            elif char == "-":
                end_idx = int(end_idx)
                break
        start_end_list = [start_idx,end_idx]
        item+=start_end_list
    #split text into parts in list
    count = 0
    text_block = ""
    text_block_list = []
    idx_replace = 0
    for char in text:
        if char == "&":
            text_block_list.append(text_block)
            text_block = ""
            count += len(replace_list[idx_replace][0])
            idx_replace +=1
        elif count > 0:
            count -= 1
        else:
            text_block += char
    text_block_list.append(text_block)
    #creates output str
    output_str = ""
    for idx in range(len(text_block_list)-1):
        output_str += text_block_list[idx]
        #creates to_add var to add to output_str
        start_repl = replace_list[idx][1]
        end_repl = replace_list[idx][1] + len(replace_list[idx][0])
        find_start = replace_list[idx][2]
        find_end = replace_list[idx][3]
        if end_idx >= len(idx_ref_str):
            gap = end_idx + 1 - len(idx_ref_str)
            to_add = idx_ref_str[find_start:] + "?" * gap
        else:
            to_add = idx_ref_str[find_start:find_end+1]

        output_str += to_add
    output_str += text_block_list[-1]
    return output_str

Question 2

The top-level function is not properly indented. – Is it intentional that you import string in the middle of a function?

Question 3

Indeed there is. For such tasks you should use regular expressions.

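import re

WILDCARD_REGEX = "&[0-9]+-[0-9]+"


def get_text_without_wildcards(text):
    # Indexes in the wildcards refer to the text with all wildcards removed.
    return re.sub(WILDCARD_REGEX, "", text)


def find_wildcard(text):
    # Return the first wildcard in text as a string, or None if there is none.
    wildcard = re.search(WILDCARD_REGEX, text)
    if wildcard is not None:
        return wildcard.group(0)
    return None


def parse_wildcard(wildcard):
    # "&10-15" -> (10, 15): skip the "&" and split the rest on "-".
    first_index, last_index = re.split("-", wildcard[1:])
    return int(first_index), int(last_index)


def get_replacement(begin, end, text):
    # The range is inclusive; pad with one "?" per index past the end of text.
    found = text[begin:end+1]
    return found + "?" * max(end - begin + 1 - len(found), 0)


def replace(text):
    clean_text = get_text_without_wildcards(text)
    wildcard = find_wildcard(text)
    while wildcard is not None:
        first_index, last_index = parse_wildcard(wildcard)
        replacement = get_replacement(first_index, last_index, clean_text)
        # count=1 replaces only the wildcard just found, so "&1-5" cannot
        # clobber the start of a later "&1-55".
        text = re.sub(wildcard, replacement, text, count=1)
        wildcard = find_wildcard(text)
    return text


print(replace("&10-15abcdef&1-5&4-5pqrs"))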

The main part is: replace wildcards until there are none left.

The regular expression is simple: &[0-9]+-[0-9]+ matches every substring of the form "&<digits>-<digits>".
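To see what the pattern picks up, it can be run against the back-to-back example from the question. This is just an illustrative check; `sample`, `found` and `stripped` are names I made up for it.

```python
import re

WILDCARD_REGEX = "&[0-9]+-[0-9]+"
sample = "&10-15abcdef&1-5&4-5pqrs"

# Every wildcard, left to right, including the back-to-back ones.
found = re.findall(WILDCARD_REGEX, sample)
print(found)  # ['&10-15', '&1-5', '&4-5']

# The same pattern strips the wildcards to get the text the indexes refer to.
stripped = re.sub(WILDCARD_REGEX, "", sample)
print(stripped)  # abcdefpqrs

# An "&" that is not followed by <digits>-<digits> is left alone.
print(re.findall(WILDCARD_REGEX, "a&b-1 & 3-4"))  # []
```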

Parsing the wildcard is also simple: skip the & and split the remainder on -. The result is a list of length 2. All you need to do is parse the elements to int.
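As an alternative to splitting on "-", the two numbers can be captured by the pattern itself. This is a sketch of that variant, not the code from the answer; `GROUPED_WILDCARD` and `parse_with_groups` are hypothetical names.

```python
import re

# Same pattern as WILDCARD_REGEX, with each number wrapped in a capture group.
GROUPED_WILDCARD = re.compile(r"&([0-9]+)-([0-9]+)")


def parse_with_groups(wildcard):
    """Turn a string such as "&10-15" into the tuple (10, 15)."""
    match = GROUPED_WILDCARD.fullmatch(wildcard)
    if match is None:
        raise ValueError(f"not a wildcard: {wildcard!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_with_groups("&10-15"))  # (10, 15)
```

The upside is that matching and parsing share one pattern, so they cannot drift apart.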

Once you have the first and last index, take that slice from the string without wildcards (clean_text).

All that is left is replacing the wildcard with the replacement string.

Question 4

To add onto this, I'd make find_wildcard a generator that yields all match objects (not just the matched str). That way, the number of passes over the string is no longer proportional to the number of wildcards. The same goes for replacing the text: instead of running the whole regexp engine again, use the indexes from the yielded match object. Lastly, rebuilding a str in a loop is expensive, since strings are immutable (see this answer). Instead, collect the pieces in a list.
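One way these suggestions could look in code is sketched below. It is my own reading of the comment, not code from the answer or the commenter; `iter_wildcards` and `replace_single_pass` are hypothetical names, and the "?" padding follows the rules stated in the question.

```python
import re

WILDCARD = re.compile(r"&([0-9]+)-([0-9]+)")


def iter_wildcards(text):
    """Yield a match object for every wildcard in text, left to right."""
    yield from WILDCARD.finditer(text)


def replace_single_pass(text):
    # Indexes refer to the text with all wildcards removed.
    clean_text = WILDCARD.sub("", text)
    pieces = []  # collected in a list and joined once at the end
    cursor = 0   # position in text just after the previous wildcard
    for match in iter_wildcards(text):
        pieces.append(text[cursor:match.start()])
        begin, end = int(match.group(1)), int(match.group(2))
        found = clean_text[begin:end + 1]
        # One "?" for every requested index past the end of clean_text.
        pieces.append(found + "?" * max(end - begin + 1 - len(found), 0))
        cursor = match.end()
    pieces.append(text[cursor:])
    return "".join(pieces)


print(replace_single_pass("&10-15abcdef&1-5&4-5pqrs"))  # ??????abcdefbcdefefpqrs
```

Because the positions come from the match objects, the input is scanned once for wildcards and the result is assembled with a single join.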

Blaž Mrak · Answer 1 · 2021-09-19 17:39:47Z

