
I have a school project question (for Python) that goes like this:

Given a string_input such as "abcd&1-4efg", the function must remove the "&1-4" and, in its place, insert the characters at indices 1 to 4 (inclusive) of the remaining string.

For example, if string_input = "abcd&1-4efg":

  1. "&1-4" is removed.

  2. The remaining characters are indexed as follows: a=0, b=1, c=2, d=3, e=4, f=5, g=6

  3. The new string becomes: "abcdbcdeefg"
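For this single example, the three steps can be sketched with plain string methods. This is only an illustration with the instruction hard-coded (the variable names are made up for the example); it does not handle several instructions yet.

```python
# Hard-coded illustration of the three steps for "abcd&1-4efg".
string_input = "abcd&1-4efg"
instruction = "&1-4"

# Step 1: remember where the instruction is, then remove it.
position = string_input.index(instruction)            # 4
remaining = string_input.replace(instruction, "", 1)  # "abcdefg"

# Step 2: indices 1 and 4 refer to `remaining`; the end index is inclusive,
# so the slice end is 4 + 1.
start, end = 1, 4
inserted = remaining[start:end + 1]                   # "bcde"

# Step 3: put the slice where the instruction used to be.
result = remaining[:position] + inserted + remaining[position:]
print(result)  # abcdbcdeefg
```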

I've managed to write a long chunk of code to do this, but is there a more efficient solution?

Things to note:

  1. The instructions can include double-digit indices (e.g. &10-15).
  2. If an index isn't found, the returned string should contain "?" for every missing index (e.g. "abcd&5-10efgh" should return "abcdfgh???efgh").
  3. Instructions can be back-to-back (e.g. "&10-15abcdef&1-5&4-5pqrs").
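Rule 2 follows from how Python slicing behaves: an out-of-range slice never raises an error, it just returns fewer characters, so the number of "?" needed is the requested length minus what the slice returned. A minimal sketch, using a hypothetical helper named slice_with_padding:

```python
def slice_with_padding(text, start, end):
    """Return text[start..end] (end inclusive), with one "?" for each
    requested index that lies past the end of text."""
    found = text[start:end + 1]               # slicing stops early instead of raising
    missing = (end - start + 1) - len(found)  # how many indices were out of range
    return found + "?" * missing

print(slice_with_padding("abcdefgh", 5, 10))     # fgh???
print(slice_with_padding("abcdefpqrs", 10, 15))  # ??????
```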

The code I've written is:

```python
def expand(text):
    text += "|"
    import string
    digits_dash = string.digits + "-"
    idx_ref_str = ""
    replace_list = []
    record_val = False
    output_to_list = []
    instruct = ""
    and_idx_mark = 0
    #builds replace_list & idx_ref_str
    for idx in range(len(text)):
        if text[idx] == "&" and record_val==True:
            output_to_list.append(instruct)
            output_to_list.append(and_idx_mark)
            replace_list.append(output_to_list)
            output_to_list, instruct, inst_idx, and_idx_mark = [],"",0,0
            and_idx_mark = idx
            continue
        elif text[idx] == "&":
            record_val = True
            and_idx_mark = idx
            continue
        #executes if currently in instruction part
        if record_val == True:
            #adds to instruct
            if text[idx] in digits_dash:
                instruct += text[idx]
            #take info, add to replace list
            else:
                output_to_list.append(instruct)
                output_to_list.append(and_idx_mark)
                replace_list.append(output_to_list)
                output_to_list, instruct, inst_idx, and_idx_mark, record_val = [],"",0,0,False
        #executes otherwise
        if record_val == False:
            idx_ref_str += text[idx]
    idx_ref_str = idx_ref_str[:-1]
    text = text[:-1]
    #converts str to int indexes in replace list[x][2]
    for item in replace_list:
        start_idx = ""
        end_idx = ""
        #find start idx
        for char in item[0]:
            if char in string.digits:
                start_idx += char
            elif char == "-":
                start_idx = int(start_idx)
                break
        #find end idx
        for char in item[0][::-1]:
            if char in string.digits:
                end_idx = char + end_idx
            elif char == "-":
                end_idx = int(end_idx)
                break
        start_end_list = [start_idx,end_idx]
        item+=start_end_list
    #split text into parts in list
    count = 0
    text_block = ""
    text_block_list = []
    idx_replace = 0
    for char in text:
        if char == "&":
            text_block_list.append(text_block)
            text_block = ""
            count += len(replace_list[idx_replace][0])
            idx_replace +=1
        elif count > 0:
            count -= 1
        else:
            text_block += char
    text_block_list.append(text_block)
    #creates output str
    output_str = ""
    for idx in range(len(text_block_list)-1):
        output_str += text_block_list[idx]
        #creates to_add var to add to output_str
        start_repl = replace_list[idx][1]
        end_repl = replace_list[idx][1] + len(replace_list[idx][0])
        find_start = replace_list[idx][2]
        find_end = replace_list[idx][3]
        #compare against this instruction's own end index
        if find_end >= len(idx_ref_str):
            #one "?" for each requested index past the end of idx_ref_str
            gap = find_end + 1 - max(find_start, len(idx_ref_str))
            to_add = idx_ref_str[find_start:] + "?" * gap
        else:
            to_add = idx_ref_str[find_start:find_end+1]

        output_str += to_add
    output_str += text_block_list[-1]
    return output_str
```
Alex Waygood
asked Sep 19, 2021 at 10:31
  • The top-level function is not properly indented. Is it intentional that you import string in the middle of a function? Commented Sep 19, 2021 at 13:32

1 Answer


Indeed there is. For tasks like this you should use regular expressions.

```python
import re

WILDCARD_REGEX = "&[0-9]+-[0-9]+"

def get_text_without_wildcards(text):
    return re.sub(WILDCARD_REGEX, "", text)

def find_wildcard(text):
    wildcard = re.search(WILDCARD_REGEX, text)
    if wildcard is not None:
        return wildcard.group(0)
    return None

def parse_wildcard(wildcard):
    first_index, last_index = re.split("-", wildcard[1:])
    return int(first_index), int(last_index)

def get_replacement(begin, end, text):
    # end is inclusive; pad with one "?" per index past the end of text
    found = text[begin:end + 1]
    return found + "?" * ((end - begin + 1) - len(found))

def replace(text):
    clean_text = get_text_without_wildcards(text)
    wildcard = find_wildcard(text)
    while wildcard is not None:
        first_index, last_index = parse_wildcard(wildcard)
        replacement = get_replacement(first_index, last_index, clean_text)
        # replace only this (first) occurrence, as a literal string
        text = text.replace(wildcard, replacement, 1)
        wildcard = find_wildcard(text)
    return text

print(replace("&10-15abcdef&1-5&4-5pqrs"))  # ??????abcdefbcdefefpqrs
```

The core of the solution is a loop: replace wildcards one at a time until none are left.

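The regular expression is simple: &[0-9]+-[0-9]+ matches an "&", one or more digits, a "-", and one or more digits, i.e. any substring of the form "&<number>-<number>". Because + is greedy, multi-digit indices such as &10-15 are matched in full.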
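To see what this pattern matches on the back-to-back example from the question, re.findall (which returns all non-overlapping matches, left to right) is a quick check:

```python
import re

WILDCARD_REGEX = "&[0-9]+-[0-9]+"

# Every wildcard in the string, in order of appearance.
matches = re.findall(WILDCARD_REGEX, "&10-15abcdef&1-5&4-5pqrs")
print(matches)  # ['&10-15', '&1-5', '&4-5']
```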

Parsing the wildcard is also simple: skip the "&" and split the remainder on "-". The result is a list of length 2; all you need to do is convert both elements to int.
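As an alternative to splitting by hand, capture groups let the regular expression hand back the two numbers directly. This is a sketch rather than the answer's code; the names WILDCARD_GROUPS and parse_wildcard_groups are hypothetical:

```python
import re

# Same shape as "&[0-9]+-[0-9]+", with a capture group around each number.
WILDCARD_GROUPS = re.compile(r"&([0-9]+)-([0-9]+)")

def parse_wildcard_groups(wildcard):
    """Turn a wildcard such as "&10-15" into the integer pair (10, 15)."""
    match = WILDCARD_GROUPS.fullmatch(wildcard)
    if match is None:
        raise ValueError(f"not a wildcard: {wildcard!r}")
    first_index, last_index = match.groups()
    return int(first_index), int(last_index)

print(parse_wildcard_groups("&10-15"))  # (10, 15)
```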

Once you have the beginning and end of the replacement, take that slice of the string without wildcards (clean_text). The end index is inclusive, so the slice is clean_text[begin:end + 1].
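The indices always refer to clean_text, not to the partially rewritten text. For the back-to-back example from the question:

```python
import re

WILDCARD_REGEX = "&[0-9]+-[0-9]+"

text = "&10-15abcdef&1-5&4-5pqrs"
clean_text = re.sub(WILDCARD_REGEX, "", text)
print(clean_text)           # abcdefpqrs  (valid indices 0..9)

# Inclusive end index, so "&1-5" becomes the slice [1:6] and "&4-5" becomes [4:6].
print(clean_text[1:5 + 1])  # bcdef
print(clean_text[4:5 + 1])  # ef
```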

All that is left is to replace the wildcard with the replacement string.
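Because re.sub also accepts a function as the replacement argument, the replace-until-none-left loop can be collapsed into a single pass over the text. This is an alternative technique rather than the answer's approach, sketched under the question's rules (inclusive end index, one "?" per missing index); the function name expand simply mirrors the question:

```python
import re

# Capture groups around the two indices (same shape as "&[0-9]+-[0-9]+").
WILDCARD = re.compile(r"&([0-9]+)-([0-9]+)")

def expand(text):
    """Replace each "&start-end" with clean_text[start..end] (end inclusive),
    using one "?" for every index past the end of clean_text."""
    clean_text = WILDCARD.sub("", text)

    def replacement(match):
        start, end = int(match.group(1)), int(match.group(2))
        found = clean_text[start:end + 1]
        return found + "?" * ((end - start + 1) - len(found))

    # re.sub calls replacement() once per match, left to right, on the original text.
    return WILDCARD.sub(replacement, text)

print(expand("abcd&1-4efg"))               # abcdbcdeefg
print(expand("abcd&5-10efgh"))             # abcdfgh???efgh
print(expand("&10-15abcdef&1-5&4-5pqrs"))  # ??????abcdefbcdefefpqrs
```

Since re.sub only ever scans the original text, an inserted replacement can never be picked up as, or merged into, another wildcard.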

answered Sep 19, 2021 at 17:39
  • To add onto this, I'd make find_wildcard a generator that yields all match objects (not just the found str matching the pattern). That way, you don't make the number of times you iterate through the string proportional to the number of wildcards. The same goes for replacing the text: instead of using the whole regexp engine, use the indexes from the yielded match object. Lastly, mutating a str in a loop is an expensive operation, since strings are immutable (see this answer). Instead, use a list. Commented Sep 19, 2021 at 21:46
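One way to read this suggestion, sketched under that interpretation: a generator over re.finditer yields match objects, their start() and end() positions are used to cut the text instead of re-running the regex for each replacement, and the output pieces are collected in a list and joined once. The names find_wildcards and expand_one_scan are hypothetical.

```python
import re

WILDCARD = re.compile(r"&([0-9]+)-([0-9]+)")

def find_wildcards(text):
    """Yield the match object for every wildcard in text, left to right."""
    yield from WILDCARD.finditer(text)

def expand_one_scan(text):
    matches = list(find_wildcards(text))  # a single scan of the string

    # The text between consecutive wildcards; joined, it is clean_text.
    pieces = []
    previous_end = 0
    for match in matches:
        pieces.append(text[previous_end:match.start()])
        previous_end = match.end()
    pieces.append(text[previous_end:])
    clean_text = "".join(pieces)

    # Collect output in a list and join once, instead of growing a str.
    parts = []
    for piece, match in zip(pieces, matches):  # each piece precedes one match
        start, end = int(match.group(1)), int(match.group(2))
        found = clean_text[start:end + 1]
        parts.append(piece)
        parts.append(found + "?" * ((end - start + 1) - len(found)))
    parts.append(pieces[-1])
    return "".join(parts)

print(expand_one_scan("&10-15abcdef&1-5&4-5pqrs"))  # ??????abcdefbcdefefpqrs
```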
