Separating data from string representation of objects, with added extras

Question 1

Given a string representation of data, I want to extract the information into its corresponding object.

However,

If the string contains "|" separators, the parts are alternatives and one of them should be picked at random.

If the string contains a numeric range such as "1-10", a random value within that range (inclusive) should be chosen. The numeric type, int or float, should be preserved.

For example:

"(1-3,1,1)" returns either (1, 1, 1), (2, 1, 1) or (3, 1, 1)

"(0.2-0.4,1,1)" returns either (0.2, 1, 1), (0.3, 1, 1) or (0.4, 1, 1)

"foo|bar|foobar" returns either "foo", "bar" or "foobar"

"[1-2,1,2]|foo|bar|[1,8-10,99]" could return any of:

"foo", "bar", [1, 1, 2], [2, 1, 2], [1, 8, 99], [1, 9, 99] or [1, 10, 99]

This is what I have, and it works well, but I can't help thinking it could be achieved more concisely. Let me know what I could have done better.

    import re
    import random
    import ast


    def randomize_by_pipe(st_value):
        """
        Used to split strings with the pipe character and randomly choose an option.
        :param: st_value - (str)
        """
        if st_value is not None:
            st_arr = st_value.split("|")
            random.shuffle(st_arr)
            return st_arr[0]
        else:
            return st_value


    def randomise_range(text):
        if text is None:
            return text
        else:
            # raw string, so the backslashes reach the regex engine untouched
            matches = re.findall(r"\d*\.*\d*-{1}\d*\.*\d*", text)

        for match in matches:
            startingPos = 0
            position = text.find(match, startingPos)
            while True:
                position = text.find(match, startingPos)
                if position > -1:
                    txt = text[position:position+len(match)]
                    txt = rand_no_from_string(txt)
                    new_text = text[0:position+len(match)].replace(match, str(txt))
                    text = new_text + text[position+len(match):]
                else:
                    break
        try:
            return ast.literal_eval(text)
        except (ValueError, SyntaxError):
            # not a Python literal (e.g. "Foo"), so hand back the plain string
            return text


    def rand_no_from_string(txt):
        is_int = False
        txt_arr = txt.split("-")
        num_arr = [float(x) for x in txt_arr]
        if int(num_arr[0]) == num_arr[0]:
            mul = 1
            is_int = True
        else:
            # new section to deal with the decimals
            mul = 10 ** len(str(num_arr[0]).split(".")[1])
        # randint needs integer arguments; round() also absorbs float error
        num_arr = [round(x * mul) for x in num_arr]

        if num_arr[0] > num_arr[1]:
            num_arr[1], num_arr[0] = num_arr[0], num_arr[1]

        val = random.randint(num_arr[0], num_arr[1]) / mul
        return int(val) if is_int else val

Run with:

    text = "(108-100,0.25-0.75,100)|Foo|Bar|[123,234,234-250]"
    randomise_range(randomize_by_pipe(text))

Question 2

So which function handles those strings like "(0.2-0.4,1,1)" and "[1-2,1,2]|foo|bar|[1,8-10,99]"? None of the three you showed seems to be able to.

Question 3

@superbrain It works fine for me. Have you seen the "Run with" section?

Question 4

Oops, I actually did manage to miss that. So is this how to always run it? Then I think there should be a function to do that. Also, I just tried text = "(0.2-0.4,1,1)", which you say returns either (0.2, 1, 1), (0.3, 1, 1) or (0.4, 1, 1), and it didn't work. I got (0.324, 1, 1) instead.

Question 5

@superbrain You are correct. I will have to make an adjustment that takes the number of decimal places in the float into account. It should work as follows: 0.2-0.4 would only produce 0.2, 0.3 or 0.4, and 0.20-0.22 would only produce 0.20, 0.21 or 0.22, and so on.

Question 6

@superbrain I've tweaked it now.

Question 7

Type hinting

Instead of having docstrings declare the types of function parameters, why not go with type hinting?
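
As a rough sketch of what that could look like for randomize_by_pipe (the Optional annotations and the shortened docstring are my assumptions, and the body is kept as in the question):

```python
import random
from typing import Optional


def randomize_by_pipe(st_value: Optional[str]) -> Optional[str]:
    """Split the string on the pipe character and return one option at random."""
    if st_value is None:
        return None
    st_arr = st_value.split("|")
    random.shuffle(st_arr)
    return st_arr[0]
```

The signature now carries the type information, so the ":param:" line in the docstring is no longer needed.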

Complexity

Your code currently has too many moving parts. You define 2 different functions to parse the data, and they both need to be called in a chain. This should be done by a single parsing function.

Let one parser take the text, resolve the pipe-separated options first, and then resolve the numerical ranges.

Selection from a list

Your randomize_by_pipe shuffles the list, and selects the 0th value. You can instead let random.choice do the job.
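
A small illustration of the difference, using one of the strings from the question (the variable names are only for this example):

```python
import random

options = "foo|bar|foobar".split("|")

# random.choice returns one element and leaves the list untouched,
# whereas random.shuffle reorders the whole list in place.
picked = random.choice(options)
print(picked)
```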

`range` parsing

I think range parsing can be improved a little. How about the following flow:

1. Remove [ and ] from the given text.
2. Split on ,.
3. For each section of the split, try parsing it as a float (or int, depending on your dataset).
4. If the conversion fails, let rand_no_from_string produce a value.
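
The flow above could be sketched roughly as follows. Note that pick_from_range is a hypothetical, integer-only stand-in for rand_no_from_string, parse_sequence is a name made up for this example, and stripping parentheses as well as square brackets is my assumption:

```python
import random
from typing import List, Union

Number = Union[int, float]


def pick_from_range(section: str) -> Number:
    """Hypothetical stand-in for rand_no_from_string; integer ranges only."""
    low, high = sorted(int(part) for part in section.split("-"))
    return random.randint(low, high)


def parse_sequence(text: str) -> List[Number]:
    values: List[Number] = []
    # Drop the surrounding brackets, then split on commas.
    for section in text.strip("[]()").split(","):
        try:
            # A plain number converts directly, keeping int vs float.
            values.append(float(section) if "." in section else int(section))
        except ValueError:
            # Anything else is treated as a "low-high" range.
            values.append(pick_from_range(section))
    return values


print(parse_sequence("[1,8-10,99]"))
```

This sketch always returns a list, so restoring the tuple for "(...)" input would need an extra step.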

regex

You have a regex, but you're not making full/elegant use of it. Instead of matches, you can group the results, and operate on those groups. The pattern itself can also be a little optimised:

\d+(?:\.\d+)?-\d+(?:\.\d+)?
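
To show what operating on groups means, here is a small sketch with the same pattern, where each number is wrapped in a capturing group (the names RANGE, found and swapped exist only in this example):

```python
import re

# The pattern above, with each number wrapped in a capturing group.
RANGE = re.compile(r"(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)")

found = [match.groups() for match in RANGE.finditer("(108-100,0.25-0.75,100)")]
print(found)  # [('108', '100'), ('0.25', '0.75')]

# The same groups are available inside a re.sub callback.
swapped = RANGE.sub(lambda m: f"{m.group(2)}-{m.group(1)}", "[1,8-10,99]")
print(swapped)  # [1,10-8,99]
```

Passing a function to re.sub means each range can be replaced where it is found, with no manual position tracking.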

A rewrite, for example:

    from random import choice, randint
    from re import Match, sub
    from typing import Optional


    def randomise_range(match: Match) -> str:
        given_range = match.group(0).split("-")
        low, high = map(float, given_range)
        if low > high:
            low, high = high, low
        # Decide int vs float from how the numbers were written, not their value.
        decimals = max(len(part.partition(".")[2]) for part in given_range)
        if decimals == 0:
            return str(randint(int(low), int(high)))
        multiplier = 10 ** decimals
        # round() rather than int(), so 0.29 * 100 is not truncated to 28
        low = round(low * multiplier)
        high = round(high * multiplier)
        return str(randint(low, high) / multiplier)


    def extract_range(text: Optional[str] = None) -> Optional[str]:
        if not text:
            return text
        return sub(r"\d+(?:\.\d+)?-\d+(?:\.\d+)?", randomise_range, text)


    def parse(text: Optional[str] = None) -> Optional[str]:
        if not text:
            return text
        selection = choice(text.split("|"))
        # startswith() is safe even when an option is an empty string
        if selection.startswith(("[", "(")):
            return extract_range(selection)
        return selection


    if __name__ == "__main__":
        examples = (
            "(1-3,1,1)",
            "(0.2-0.4,1,1)",
            "foo|bar|foobar",
            "(108-100,0.25-0.75,100)|Foo|Bar|[123,234,234-250]",
            "[1-2,1,2]|foo|bar|[1,8-10,99]",
        )
        for text in examples:
            print(parse(text))
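
Note that parse returns text such as "(2,1,1)", not a tuple. The question's version converted the result with ast.literal_eval, and the same step could be bolted on here. A sketch, where to_object is a hypothetical helper name:

```python
import ast
from typing import Any


def to_object(text: str) -> Any:
    """Hypothetical helper: evaluate a Python literal, else return the text."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Plain words such as "foo" are not literals, so keep them as strings.
        return text


print(to_object("(2,1,1)"))
print(to_object("foo"))
```

Calling to_object(parse(text)) would then give a tuple, a list or a plain string, depending on which option was picked.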

Question 8

I hate regex, and I can't get what you've given me to work. Can you give me an example of how I can group them? For some reason regex just goes over my head. It's just so alien to my brain. Maybe you have a good resource for it that I can read up on? I know it's powerful and I should learn it.

Question 9

Click the link; regex101 provides a detailed explanation of the expression.

Question 10

You are amazing. Jeez, that website's good.

Question 11

I can't believe it even generates the Python code for you 😮

Question 12

@LewisMorris there is also debuggex.com :)

Question 13

Here's an implementation whose main aim, compared with your implementation and that of the accepted answer, is to separate parsing from execution. It's unclear whether this is important for you, but it's generally good design, and it is likely faster to re-execute once parsed:

    import re
    from numbers import Real
    from random import randint, choice
    from typing import Callable, Iterator, Tuple, Union


    class Pattern:
        chunk_pat = re.compile(
            r'([^|]+)'  # group: within a chunk, at least one non-pipe character
            r'(?:'      # non-capturing group for termination character
            r'\||$'     # pipe, or end of string
            r')'        # end of termination group
        )

        option_pat = re.compile(
            r'([^,]+)'  # at least one non-comma character in an option
            r'(?:'      # non-capturing group for termination character
            r',|$'      # comma, or end of string
            r')'        # end of termination group
        )

        range_pat = re.compile(
            r'^'         # start
            r'('
            r'[0-9.]+'   # first number group
            r')-('
            r'[0-9.]+'   # second number group
            r')'
            r'$'         # end
        )

        def __init__(self, pattern: str):
            chunk_strs = Pattern.chunk_pat.finditer(pattern)
            self.tree = tuple(
                self.parse_chunk(chunk[1])
                for chunk in chunk_strs
            )

        @staticmethod
        def choose_in_group(group: tuple) -> Iterator:
            for option in group:
                # ranges are stored as closures; everything else is a raw string
                if callable(option):
                    yield option()
                else:
                    yield option

        def choose(self) -> Union[str, tuple]:
            group = choice(self.tree)
            if isinstance(group, tuple):
                return tuple(self.choose_in_group(group))
            return group

        @staticmethod
        def precis_parse(as_str: str) -> Tuple[Real, int]:
            # returns the number and how many decimal places it was written with
            if '.' in as_str:
                return float(as_str), len(as_str.rsplit('.', 1)[-1])
            return int(as_str), 0

        @classmethod
        def make_choose(cls, start: Real, end: Real, precis: int) -> Callable[[], Real]:
            # accept reversed ranges such as 108-100
            if start > end:
                start, end = end, start
            if precis:
                factor = 10**precis
                # round() rather than int(), so 0.29 * 100 is not truncated to 28
                start = round(start * factor)
                end = round(end * factor)

                def choose():
                    return randint(start, end) / factor
            else:
                def choose():
                    return randint(start, end)

            return choose

        @classmethod
        def parse_options(cls, options: str):
            for option in cls.option_pat.finditer(options):
                range_match = cls.range_pat.match(option[1])
                if range_match:
                    start_str, end_str = range_match.groups()
                    start, start_n = cls.precis_parse(start_str)
                    end, end_n = cls.precis_parse(end_str)
                    yield cls.make_choose(start, end, max(start_n, end_n))
                else:
                    # Fall back to one raw string
                    yield option[1]

        @classmethod
        def parse_chunk(cls, chunk: str):
            if (
                chunk[0] == '(' and chunk[-1] == ')' or
                chunk[0] == '[' and chunk[-1] == ']'
            ):
                return tuple(cls.parse_options(chunk[1:-1]))

            # Fall back to returning the raw string
            return chunk


    def test():
        p = Pattern('foo|(bar,3-4,50,6.3-7,92-99)')
        for _ in range(20):
            print(p.choose())


    if __name__ == '__main__':
        test()

Accepted answer (the one above beginning with "Type hinting"): hjpotter92, 2020-10-15 12:52:37Z

