Given a string representation of data, I want to extract the information into its corresponding object. However:

- If the string has "|" separators, these should be treated as options, and one of them should be picked at random.
- If the string contains numbers written as a range, such as "1-10", a random value within that range should be chosen. The numerical datatype (int or float) should also be preserved.
For example:

"(1-3,1,1)" returns either (1, 1, 1), (2, 1, 1) or (3, 1, 1)
"(0.2-0.4,1,1)" returns either (0.2, 1, 1), (0.3, 1, 1) or (0.4, 1, 1)
"foo|bar|foobar" returns either "foo", "bar" or "foobar"
"[1-2,1,2]|foo|bar|[1,8-10,99]" could return "foo", "bar", [1, 1, 2], [2, 1, 2], [1, 8, 99], [1, 9, 99] or [1, 10, 99]
This is what I have, and it works well, but I can't help thinking it could be achieved more concisely. Let me know what I could have done better.
```python
import re
import random
import ast


def randomize_by_pipe(st_value):
    """
    Used to split strings with the pipe character and randomly choose an option.
    :param: st_value - (str)
    """
    if not st_value is None:
        st_arr = st_value.split("|")
        random.shuffle(st_arr)
        return st_arr[0]
    else:
        return st_value


def randomise_range(text):
    if text is None:
        return text
    else:
        matches = re.findall(r"\d*\.*\d*-{1}\d*\.*\d*", text)
        for match in matches:
            startingPos = 0
            position = text.find(match, startingPos)
            while True:
                position = text.find(match, startingPos)
                if position > -1:
                    txt = text[position:position+len(match)]
                    txt = rand_no_from_string(txt)
                    new_text = text[0:position+len(match)].replace(match, str(txt))
                    text = new_text + text[position+len(match):]
                else:
                    break
        try:
            return ast.literal_eval(text)
        except (ValueError, SyntaxError):
            # Plain words such as "Foo" are not Python literals
            return text


def rand_no_from_string(txt):
    is_int = False
    txt_arr = txt.split("-")
    num_arr = [float(x) for x in txt_arr]
    if int(num_arr[0]) == num_arr[0]:
        mul = 1
        is_int = True
    else:
        # new section to deal with the decimals
        mul = 10 ** len(str(num_arr[0]).split(".")[1])
    # round() yields ints; random.randint rejects floats on Python 3.12+
    num_arr = [round(x * mul) for x in num_arr]
    if num_arr[0] > num_arr[1]:
        num_arr[1], num_arr[0] = num_arr[0], num_arr[1]
    val = random.randint(num_arr[0], num_arr[1]) / mul
    return int(val) if is_int else val
```
Run with:
```python
text = "(108-100,0.25-0.75,100)|Foo|Bar|[123,234,234-250]"
randomise_range(randomize_by_pipe(text))
```
Type hinting
Instead of declaring the types of function parameters in the docstring, why not use type hints?
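As an illustration, here is the question's randomize_by_pipe with the types moved out of the docstring and into annotations. The logic is deliberately left unchanged; Optional[str] is an assumption based on the function accepting and returning None.

```python
import random
from typing import Optional


def randomize_by_pipe(st_value: Optional[str]) -> Optional[str]:
    """Split a string on the pipe character and return one option at random."""
    if st_value is None:
        return None
    st_arr = st_value.split("|")
    random.shuffle(st_arr)
    return st_arr[0]
```

A static checker such as mypy can then verify callers against these annotations, which it cannot do with a docstring.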
Complexity
Your code currently has too many moving parts. You define two different functions to parse the data, and both need to be called in a chain. This should be done by a single parsing function: it receives the text, first handles the pipe-separated options, and then handles the numerical ranges.
Selection from a list
Your randomize_by_pipe shuffles the list and selects the 0th value. You can let random.choice do the job instead.
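A minimal sketch of that change, keeping the question's function name: random.choice returns one element of a non-empty sequence directly, so the list never needs to be shuffled.

```python
import random
from typing import Optional


def randomize_by_pipe(st_value: Optional[str]) -> Optional[str]:
    if st_value is None:
        return None
    # str.split always returns at least one element, so choice() never sees an empty list
    return random.choice(st_value.split("|"))
```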
Range parsing
I think range parsing can be improved a little. How about the following flow:
- Remove "[" and "]" from the given text.
- Split on ",".
- For each section of the split, try parsing it as float (or int, depending on your dataset).
- In case of a conversion error, let rand_no_from_string get a value.
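The flow above could look roughly like the following sketch. It assumes ranges are non-negative and written as exactly two numbers, and it picks int or float by the presence of a decimal point. rand_from_range and parse_sequence are hypothetical names; rand_from_range stands in for the question's rand_no_from_string.

```python
import random
from typing import List, Union

Number = Union[int, float]


def rand_from_range(txt: str) -> Number:
    # Hypothetical stand-in for rand_no_from_string; assumes "low-high" with no signs
    low_str, high_str = txt.split("-")
    # Keep as many decimal places as the more precise bound
    decimals = max(len(s.partition(".")[2]) for s in (low_str, high_str))
    factor = 10 ** decimals
    # round() gives ints for randint; sorted() tolerates reversed ranges like "108-100"
    low, high = sorted(round(float(s) * factor) for s in (low_str, high_str))
    value = random.randint(low, high)
    return value if decimals == 0 else value / factor


def parse_sequence(text: str) -> List[Number]:
    values: List[Number] = []
    for part in text.strip("[]").split(","):
        try:
            # Preserve the datatype: a decimal point means float, otherwise int
            values.append(float(part) if "." in part else int(part))
        except ValueError:
            # Not a plain number, so treat it as a range
            values.append(rand_from_range(part))
    return values
```

Note that a non-numeric entry such as "foo" would also fall through to rand_from_range and raise there, so this only suits selections already known to be bracketed number lists.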
regex
You have a regex, but you're not making full or elegant use of it. Instead of collecting matches and then searching the text for each one again, you can group the results and operate on those groups. The pattern itself can also be optimised a little:
\d+(?:\.\d+)?-\d+(?:\.\d+)?
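To show what grouping means here, the sketch below wraps each side of the pattern above in a capturing group. It handles integer ranges only, for brevity; the rewrite that follows adds the multiplier logic for decimals. RANGE_RE, range_bounds and replace_int_ranges are illustrative names, not from the question.

```python
import re
from random import randint
from typing import List, Tuple

# The pattern above, with each side of the range wrapped in a capturing group
RANGE_RE = re.compile(r"(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)")


def range_bounds(text: str) -> List[Tuple[str, str]]:
    # Each Match exposes the captured numbers as group(1) and group(2)
    return [(m.group(1), m.group(2)) for m in RANGE_RE.finditer(text)]


def replace_int_ranges(text: str) -> str:
    def pick(m: re.Match) -> str:
        # Sort so reversed ranges such as "108-100" still work
        low, high = sorted((int(m.group(1)), int(m.group(2))))
        return str(randint(low, high))

    # re.sub calls pick() once per match and splices the returned string back in
    return RANGE_RE.sub(pick, text)
```

Because the pattern has two groups, RANGE_RE.findall(text) returns the same list of (low, high) tuples as range_bounds.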
A rewrite, for example:
```python
from ast import literal_eval
from re import sub, Match
from random import choice, randint


def randomise_range(match: Match) -> str:
    given_range = match.group(0).split("-")
    low, high = map(float, given_range)
    if low > high:
        low, high = high, low
    if low.is_integer():
        return str(randint(int(low), int(high)))
    multiplier = 10 ** len(given_range[0].split(".")[-1])
    # round() rather than int(): e.g. 0.29 * 100 == 28.999999999999996
    low = round(low * multiplier)
    high = round(high * multiplier)
    return str(randint(low, high) / multiplier)


def extract_range(text: str = None):
    if not text:
        return text
    return sub(r"\d+(?:\.\d+)?-\d+(?:\.\d+)?", randomise_range, text)


def parse(text: str = None):
    if not text:
        return text
    selection = choice(text.split("|"))
    if selection.startswith(('[', '(')):
        # Turn the substituted string back into an actual list/tuple
        return literal_eval(extract_range(selection))
    return selection


if __name__ == "__main__":
    examples = (
        "(1-3,1,1)",
        "(0.2-0.4,1,1)",
        "foo|bar|foobar",
        "(108-100,0.25-0.75,100)|Foo|Bar|[123,234,234-250]",
        "[1-2,1,2]|foo|bar|[1,8-10,99]",
    )
    for text in examples:
        print(parse(text))
```
- I hate regex, and I can't get what you've given me to work. Can you give me an example of how I can group them? For some reason regex just goes over my head; it's so alien to my brain. Maybe you have a good resource I can read up on? I know it's powerful and I should learn it. – Lewis Morris, Oct 15, 2020 at 17:57
- Click the link; regex101 provides a detailed explanation of the expression. – hjpotter92, Oct 15, 2020 at 18:19
- You are amazing. Jeez, that website's good. – Lewis Morris, Oct 15, 2020 at 18:29
- I can't believe it even generates the Python code for you 😮 – Lewis Morris, Oct 15, 2020 at 18:31
- @LewisMorris there is also debuggex.com :) – hjpotter92, Oct 15, 2020 at 18:40
Here's an implementation whose main difference from both your implementation and the accepted answer is the separation of parsing and execution. It's unclear whether this matters to you, but it's generally good design, and re-executing a pattern that has already been parsed is likely faster:
```python
import re
from numbers import Real
from random import randint, choice
from typing import Callable, Iterator, Tuple, Union


class Pattern:
    chunk_pat = re.compile(
        r'([^|]+)'  # group: within a chunk, at least one non-pipe character
        r'(?:'      # non-capturing group for termination character
        r'\||$'     # pipe, or end of string
        r')'        # end of termination group
    )

    option_pat = re.compile(
        r'([^,]+)'  # at least one non-comma character in an option
        r'(?:'      # non-capturing group for termination character
        r',|$'      # comma, or end of string
        r')'        # end of termination group
    )

    range_pat = re.compile(
        r'^'        # start
        r'('
        r'[0-9.]+'  # first number group
        r')-('
        r'[0-9.]+'  # second number group
        r')'
        r'$'        # end
    )

    def __init__(self, pattern: str):
        chunk_strs = Pattern.chunk_pat.finditer(pattern)
        self.tree = tuple(
            self.parse_chunk(chunk[1])
            for chunk in chunk_strs
        )

    @staticmethod
    def choose_in_group(group: tuple) -> Iterator:
        for option in group:
            if callable(option):
                yield option()
            else:
                yield option

    def choose(self) -> Union[str, tuple]:
        group = choice(self.tree)
        if isinstance(group, tuple):
            return tuple(self.choose_in_group(group))
        return group

    @staticmethod
    def precis_parse(as_str: str) -> Tuple[Real, int]:
        if '.' in as_str:
            return float(as_str), len(as_str.rsplit('.', 1)[-1])
        return int(as_str), 0

    @classmethod
    def make_choose(cls, start: Real, end: Real, precis: int) -> Callable[[], Real]:
        if start > end:
            # Accept reversed ranges such as "108-100"
            start, end = end, start
        if precis:
            factor = 10**precis
            # round() rather than int(): e.g. 0.29 * 100 == 28.999999999999996
            start = round(start * factor)
            end = round(end * factor)

            def choose():
                return randint(start, end) / factor
        else:
            def choose():
                return randint(start, end)
        return choose

    @classmethod
    def parse_options(cls, options: str) -> Iterator:
        for option in cls.option_pat.finditer(options):
            range_match = cls.range_pat.match(option[1])
            if range_match:
                start_str, end_str = range_match.groups()
                start, start_n = cls.precis_parse(start_str)
                end, end_n = cls.precis_parse(end_str)
                yield cls.make_choose(start, end, max(start_n, end_n))
            else:
                # Fall back to a plain number if possible, else one raw string
                try:
                    value = cls.precis_parse(option[1])[0]
                except ValueError:
                    value = option[1]
                yield value

    @classmethod
    def parse_chunk(cls, chunk: str):
        if (
            chunk[0] == '(' and chunk[-1] == ')' or
            chunk[0] == '[' and chunk[-1] == ']'
        ):
            return tuple(cls.parse_options(chunk[1:-1]))
        # Fall back to returning the raw string
        return chunk


def test():
    p = Pattern('foo|(bar,3-4,50,6.3-7,92-99)')
    for _ in range(20):
        print(p.choose())


if __name__ == '__main__':
    test()
```
text = "(0.2-0.4,1,1)", which you say returns either (0.2, 1, 1), (0.3, 1, 1) or (0.4, 1, 1), and it didn't work. I got (0.324, 1, 1) instead.