Here is a class named Text
, which takes a string as its argument of its constructor. It has property-methods to get the words in the string and also the frequency of alphabets occuring in it.
The words
property result in all words in the string. It works as follows, after using the .split
method, there is a possibility that a "word" is of the form ,word
or word.
or ,word.
etc. So to encounter this we check each end using .isalpha
.
We also have find_words
method, which has words as its inputs. This method will summarize the frequencies of a set of words.
How can I write this better to:
- The level of a library/package (I don't really understand the techniques for this)
- More efficient and readable (using
re
module is last option) - More functional, but compact
#Author: Arief. A
example = "This is an example, an Anonymous text.\
This text has 59 letters or numbers.";
class Text:
def __init__(self, input_text):
self.text = input_text;
def show(self):
print(self.text);
def as_list(self):
return list(self.text);
def get_length(self):
return len(self.text);
def count_char(self, char):
return self.text.count(char);
@property
def letter_freqs(self):
letters="abcdefghijklmnopqrstuvwxyz";
res={i: self.count_char(i) \
for i in letters};
caps={i.capitalize(): self.count_char(i.capitalize()) \
for i in letters};
res.update(caps);
return res
@property
def words(self):
raw=self.text.split();
for i in range(len(raw)):
if not raw[i][0].isalpha():
raw[i]=raw[i][1:];
if not raw[i][-1].isalpha():
raw[i]=raw[i][0:-1];
[raw.remove('') for i in range(raw.count(''))];
return raw
def swap_first(word):
swaped=word[0].swapcase();
res=swaped+word[1:];
return res
def find_words(self, *args):
return {i: self.words.count(i)\
+self.words.count(Text.swap_first(i))\
for i in args};
Example
text_obj=Text(example);
text_obj.show();
print(text_obj.as_list());
print(text_obj.get_length());
print(text_obj.count_char("a"));
print(text_obj.words)
print(text_obj.find_words("Anonymous", "Text","This"));
Output
This is an example, an Anonymous text. This text has 59 letters or numbers.
['T', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', 'n', ' ', 'e', 'x', 'a', 'm', 'p', 'l', 'e', ',', ' ', 'a', 'n', ' ', 'A', 'n', 'o', 'n', 'y', 'm', 'o', 'u', 's', ' ', 't', 'e', 'x', 't', '.', ' ', 'T', 'h', 'i', 's', ' ', 't', 'e', 'x', 't', ' ', 'h', 'a', 's', ' ', '5', '9', ' ', 'l', 'e', 't', 't', 'e', 'r', 's', ' ', 'o', 'r', ' ', 'n', 'u', 'm', 'b', 'e', 'r', 's', '.']
75
4
['This', 'is', 'an', 'example', 'an', 'Anonymous', 'text', 'This', 'text', 'has', 'letters', 'or', 'numbers']
{'This': 2, 'Text': 2, 'Anonymous': 1}
3 Answers 3
Instead of
letters="abcdefghijklmnopqrstuvwxyz";
use a built-in constant:letters = string.letters;
The code only inspects first and last character of the word. It means that
aaa.bbb
will be treated as a single word. Is this an intent?split(sep)
allows a string of separators. Calling it withraw = self.text.split(string.punctuation + string.whitespace)
eliminates the need to inspect individual words. If you still want to inspect them, a loop
for i in range(len(raw)):
should be written more idiomatically asfor word in raw:
(you don't really need
i
anywhere in the loop).
-
1\$\begingroup\$ instead of string.letters, use string.ascii_lowercase :) \$\endgroup\$A. Romeu– A. Romeu2018年02月22日 21:52:08 +00:00Commented Feb 22, 2018 at 21:52
-
\$\begingroup\$ @vnp thanks for the input, just know there is
string
module. \$\endgroup\$Redsbefall– Redsbefall2018年02月24日 22:35:45 +00:00Commented Feb 24, 2018 at 22:35 -
\$\begingroup\$ The
aaa.bbb
was not expected, i was assuming there is no typo. The constants instring
is useful. \$\endgroup\$Redsbefall– Redsbefall2018年02月24日 22:40:23 +00:00Commented Feb 24, 2018 at 22:40
This I consider un-pythonic:
class Text:
...
def as_list(self):
return list(self.text);
def get_length(self):
return len(self.text);
...
You write methods on a class which use trivial builtins, but the instance itself will behave completely differently when you invoke len
, resp. list
on it.
Furthermore the class basically implements text, which can be considered a sepcial case of a string. So why not subclass it from there:
class Text(str):
def show(self):
print(self);
def count_char(self, char):
return self.count(char)
You can get the list and lengt then by calling list(text_instance)
, resp. len(text_instance)
.
You can simplify the letter_freqs
method using string.ascii_letters
:
@property
def letter_freqs(self):
return {i: self.count_char(i) for i in string.ascii_letters}
def find_words(self, *args):
return {i: self.words.count(i) + self.words.count(
Text.swap_first(i)) for i in args}
Some other things:
- You don't need semicolons at the end of statements in python.
- Backslashes to break lines are not required within braces, brackets or parentheses and thus not required in dict comprehensions.
- The
swap_first
function is neither an instance method, nor a class method and thus should be defined on module-level. - Document your class and methods using docstrings.
- Plase also make yourself familiar with PEP8.
Also consider rewriting your words
property. It does not behave reliably e.g. on a text like 'He shot him with a .45.'
I like the general content in the other answers, so I'll just add my 2 cents.
Cache your properties
No need to pointlessly redo your string manipulations every time; just cache them and calculate on-demand.
Use collections.Counter
You're making a histogram; that has built-in support
from collections import Counter
_counter = None
@property
def letter_freqs(self):
if self._counter is None:
self._counter = Counter(letter for letter in self.text if self.meets_filter(letter))
return self._counter
You can implement self.meets_filter
pretty trivially (letter.lower() in string.ascii_lowercase
, for example), and even in-line if you don't have more complex logic.
You can also apply Counter
to your words list too.
def find_words(self, *args):
return Counter(word for word in self.words
if word in args or Text.swap_first(word) in args))
You aren't returning something "raw", so don't call it that
Also, you can simplify this a lot, and fix the bug mentioned before, by just removing the bad text.
import re
_words = None
@property
def words(self):
if self._words is None:
self._words = [re.sub(r"[_\d\W]", '', word) for word in self.text.split()]
return self._words
Stop using semicolons
Please.