I've written a simple script that takes a string filled with brackets and strings and builds them to a dictionary.
Example Input:
string_1 = '(()())'
string_2 = '((a)(b))'
string_3 = '(a(a)b(b)c)'
string_4 = 'a(a(a)b(b)c)b'
string_4 = 'a[a{a}b{b}c]b'
string_5 = 'beginning(content[word]{another word})outside'
Output for string5
and the other examples:
[
('0-8', 'beginning'),
('9', '('),
('10-16', 'content'),
('17', '['),
('18-21', 'word'),
('22', ']'),
('23', '{'),
('24-35', 'another word'),
('36', '}'),
('37', ')'),
('38-44', 'outside')
]
def sort(dictionary):
return sorted(dictionary.items(), key=lambda v: int(v[0].split("-")[0]))
def join(a, b):
return ''.join([repr(a), '-', repr(b - 1)]) if a != b else repr(b)
class Positions:
def __init__(self, name):
self.name = name
self.content = {} # creates a new empty list for each bracket
def push(self, pos, content):
self.content[pos] = content
def convert_string(string):
string_content = string
string = Positions(string)
pos = 0
start_of_str_pos = 0
internal_string = ''
for char in string_content:
if char in ['(', '{', '[', ')', '<', '}', ']', '>']:
string.push(repr(pos), repr(char))
if internal_string != '':
string.push(join(start_of_str_pos, pos), internal_string)
internal_string = ''
start_of_str_pos = pos + 1
else:
internal_string += char
pos += 1
if internal_string != '':
string.push(''.join([repr(start_of_str_pos),
'-', repr(pos - 1)]), internal_string)
print sort(string.content)
3 Answers 3
Obvious but easy
There are some issues that are immediately obvious and easy to correct.
In this statement:
return ''.join([repr(a), '-', repr(b - 1)]) if a != b else repr(b)
The multiple repr
are awkward,
and so is the join
.
It would be less awkward, simpler and more readable by using string formatting:
return '{}-{}'.format(a, b - 1) if a != b else repr(a)
This can be written a lot simpler:
if char in ['(', '{', '[', ')', '<', '}', ']', '>']:
Like this:
if char in '({[)<}]>':
But actually, it's a bit disturbing that the opening and closing brackets are ordered inconsistently. For example, looking at this string, it's not immediately obvious that it includes all matching closing brackets for all opening brackets. This way would be immediately obvious and less error prone:
if char in '(){}[]<>':
This is both confusing and disturbing:
def convert_string(string): string_content = string string = Positions(string)
Why shuffle around the string
parameter?
It's a bad practice to reassign the value of a parameter,
as it makes it confusing what it really stands for throughout the function.
Another problem here is calling an instance of Positions
a "string".
Later in the code when I read "string",
I think it's an actually string, but in fact it's not,
it's an instance of "Positions".
Very misleading.
Since empty strings are falsy, instead of this:
if internal_string != '': # ...
You can write simply:
if internal_string:
# ...
Obvious but harder
There are some issues that are sort of obviously wrong, but not so easy to correct.
This sorting function smells:
def sort(dictionary):
return sorted(dictionary.items(), key=lambda v: int(v[0].split("-")[0]))
Instead of splitting a string and parsing to int, my instinct says there should be a way to use the original int values directly.
In general, the code formats these strings from int values, and it seems silly to later un-format the strings so you can sort. There should be a better way.
The Positions
class stinks.
It's called "Positions",
but its fields are "name" and "content".
Sounds more like a key-value pair.
Then there's this code with the comment:
self.content = {} # creates a new empty list for each bracket
The comment talks about "list", but the code uses a dictionary.
And there's the push
method:
def push(self, pos, content): self.content[pos] = content
push
is a term commonly used with stacks, or sometimes with lists.
But here the method puts a value into a dictionary,
an operation normally called "put".
Alternative implementation
Based on the above suggestions, and reorganizing to more meaningful elements, you could do something like this:
from functools import total_ordering
@total_ordering
class Item:
def __init__(self, start, end, content):
self.start = start
self.end = end
self.content = content
@property
def position(self):
a = self.start
b = self.end
return '{}-{}'.format(a, b - 1) if a != b else repr(a)
def __lt__(self, other):
return self.start < other.start
def convert_string(string):
start = 0
internal_string = ''
items = []
for (pos, char) in enumerate(string):
if char in '(){}[]<>':
items.append(Item(pos, pos, char))
if internal_string:
items.append(Item(start, pos, internal_string))
internal_string = ''
start = pos + 1
else:
internal_string += char
if internal_string:
items.append(Item(start, len(string), internal_string))
return items
Although with the implementation of __lt__
,
I made it possible to sort simply by sorted(items)
,
actually there is no need to sort,
as the items are naturally sorted.
Note: the purpose of @total_ordering
is to implement the other rich comparison operators like __le__
, as recommended by PEP8.
Just looking at the first few lines of convert_string()
...
def convert_string(string): string_content = string string = Positions(string) pos = 0 start_of_str_pos = 0 ...
... and the last line...
... print sort(string.content)
... I'm pretty confused! Why are you renaming the string
parameter to string_content
? Then why is string
not a string, but a Positions
object? And that object's name is actually the string, whereas its content is a dictionary? On top of that, string_content
looks confusingly like string.content
, but it's actually the same as string.name
???
Honestly, I'd rather rewrite it from scratch than try to figure out what you did. I'd use regular expressions, because they are a handy way to do string matching and the right tool for the job. Instead of printing the results in convert_string()
, it would be better to return a data structure. And the data structure I would choose is collections.OrderedDict()
, because it preserves the order in which the keys were added, so you don't have to sort the result.
from collections import OrderedDict
import re
def convert_string(s):
result = OrderedDict()
for match in re.finditer(r'[^<({\[\]})>]+|(?P<delim>.)', s):
if match.group('delim'):
result[str(match.start())] = match.group()
else:
result['%d-%d' % (match.start(), match.end() - 1)] = match.group()
return result
You have an odd mix of class and function. Rather than a standalone function that creates a Positions
object from a string, for example, I would make that a class method:
class Positions(object): # note new-style class
CHARS = set('()[]{}<>') # class attribute, uses set to speed up 'in'
...
@classmethod
def from_string(cls, string):
...
Now rather than calling convert_string(foo)
, you can get a Positions
instance with Positions.from_string(foo)
. Similarly, you could move the join
and sort
functionality into the class, as they're all connected to each other. You should move the "pretty print" formatting into a __str__
method.
Your sort function has a problem; it only works for the first "level". Rather than encapsulate the whole sort in a function, I would write a custom sort key
function, then replace e.g.
print sort(string.content)
with:
print sorted(string.content, key=sort_key)
For example, you could use map
to apply int
to all of the parts of the key:
def sort_key(item):
"""Convert string for multi-level numerical sort.
>>> sort_key(('1-2-3', None))
[1, 2, 3]
"""
key, _ = item
return map(int, key.split('-'))
Or, alternatively, use a tuple of integers ((1, 2, 3)
, in this case) as the key rather than a string.
Note the docstring to explain what the function does and provide built-in demonstration/testing with doctest
. A docstring would be particularly useful in join
, which does some odd things I couldn't quite figure out and would really benefit from explanation. As it stands, your script only has a single explanatory comment:
self.content = {} # creates a new empty list for each bracket
which isn't actually true!
Explore related questions
See similar questions with these tags.
lowercase_with_underscore
function/variable names, so you're not yet fully compliant. \$\endgroup\$