Text class in Python

Question 1

Here is a class named Text, which takes a string as its argument of its constructor. It has property-methods to get the words in the string and also the frequency of alphabets occuring in it.

The words property result in all words in the string. It works as follows, after using the .split method, there is a possibility that a "word" is of the form ,word or word. or ,word. etc. So to encounter this we check each end using .isalpha.

We also have find_words method, which has words as its inputs. This method will summarize the frequencies of a set of words.

How can I write this better to:

The level of a library/package (I don't really understand the techniques for this)
More efficient and readable (using re module is last option)
More functional, but compact

#Author: Arief. A
example = "This is an example, an Anonymous text.\
 This text has 59 letters or numbers.";
class Text:
 def __init__(self, input_text):
 self.text = input_text;
 def show(self):
 print(self.text);
 def as_list(self):
 return list(self.text);
 def get_length(self):
 return len(self.text);
 def count_char(self, char):
 return self.text.count(char);
 @property 
 def letter_freqs(self):
 letters="abcdefghijklmnopqrstuvwxyz";
 res={i: self.count_char(i) \
 for i in letters};
 caps={i.capitalize(): self.count_char(i.capitalize()) \
 for i in letters};
 res.update(caps);
 return res
 @property 
 def words(self):
 raw=self.text.split();
 for i in range(len(raw)):
 if not raw[i][0].isalpha():
 raw[i]=raw[i][1:];
 if not raw[i][-1].isalpha():
 raw[i]=raw[i][0:-1];
 [raw.remove('') for i in range(raw.count(''))];
 return raw 
 def swap_first(word):
 swaped=word[0].swapcase();
 res=swaped+word[1:];
 return res
 def find_words(self, *args):
 return {i: self.words.count(i)\
 +self.words.count(Text.swap_first(i))\
 for i in args};

Example

text_obj=Text(example);
text_obj.show();
print(text_obj.as_list());
print(text_obj.get_length());
print(text_obj.count_char("a"));
print(text_obj.words)
print(text_obj.find_words("Anonymous", "Text","This"));

Output

This is an example, an Anonymous text. This text has 59 letters or numbers.
['T', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', 'n', ' ', 'e', 'x', 'a', 'm', 'p', 'l', 'e', ',', ' ', 'a', 'n', ' ', 'A', 'n', 'o', 'n', 'y', 'm', 'o', 'u', 's', ' ', 't', 'e', 'x', 't', '.', ' ', 'T', 'h', 'i', 's', ' ', 't', 'e', 'x', 't', ' ', 'h', 'a', 's', ' ', '5', '9', ' ', 'l', 'e', 't', 't', 'e', 'r', 's', ' ', 'o', 'r', ' ', 'n', 'u', 'm', 'b', 'e', 'r', 's', '.']
75
4
['This', 'is', 'an', 'example', 'an', 'Anonymous', 'text', 'This', 'text', 'has', 'letters', 'or', 'numbers']
{'This': 2, 'Text': 2, 'Anonymous': 1}

Question 2

Instead of letters="abcdefghijklmnopqrstuvwxyz"; use a built-in constant:
```
letters = string.letters;
```
The code only inspects first and last character of the word. It means that aaa.bbb will be treated as a single word. Is this an intent?
split(sep) allows a string of separators. Calling it with
```
raw = self.text.split(string.punctuation + string.whitespace)
```
eliminates the need to inspect individual words. If you still want to inspect them, a loop for i in range(len(raw)): should be written more idiomatically as
```
for word in raw:
```
(you don't really need i anywhere in the loop).

Question 3

instead of string.letters, use string.ascii_lowercase :)

Question 4

@vnp thanks for the input, just know there is string module.

Question 5

The aaa.bbb was not expected, i was assuming there is no typo. The constants in string is useful.

Question 6

This I consider un-pythonic:

class Text:
 ...
 def as_list(self):
 return list(self.text);
 def get_length(self):
 return len(self.text);
 ...

You write methods on a class which use trivial builtins, but the instance itself will behave completely differently when you invoke len, resp. list on it.

Furthermore the class basically implements text, which can be considered a sepcial case of a string. So why not subclass it from there:

class Text(str):
 def show(self):
 print(self);
 def count_char(self, char):
 return self.count(char)

You can get the list and lengt then by calling list(text_instance), resp. len(text_instance).

You can simplify the letter_freqs method using string.ascii_letters:

 @property 
 def letter_freqs(self):
 return {i: self.count_char(i) for i in string.ascii_letters}
 def find_words(self, *args):
 return {i: self.words.count(i) + self.words.count(
 Text.swap_first(i)) for i in args}

Some other things:

You don't need semicolons at the end of statements in python.
Backslashes to break lines are not required within braces, brackets or parentheses and thus not required in dict comprehensions.
The swap_first function is neither an instance method, nor a class method and thus should be defined on module-level.
Document your class and methods using docstrings.
Plase also make yourself familiar with PEP8.

Also consider rewriting your words property. It does not behave reliably e.g. on a text like 'He shot him with a .45.'

Question 7

I like the general content in the other answers, so I'll just add my 2 cents.

Cache your properties

No need to pointlessly redo your string manipulations every time; just cache them and calculate on-demand.

Use `collections.Counter`

You're making a histogram; that has built-in support

from collections import Counter
_counter = None
@property 
def letter_freqs(self):
 if self._counter is None:
 self._counter = Counter(letter for letter in self.text if self.meets_filter(letter))
 return self._counter

You can implement self.meets_filter pretty trivially (letter.lower() in string.ascii_lowercase, for example), and even in-line if you don't have more complex logic.

You can also apply Counter to your words list too.

def find_words(self, *args):
 return Counter(word for word in self.words 
 if word in args or Text.swap_first(word) in args))

You aren't returning something "raw", so don't call it that

Also, you can simplify this a lot, and fix the bug mentioned before, by just removing the bad text.

import re
_words = None
@property 
def words(self):
 if self._words is None:
 self._words = [re.sub(r"[_\d\W]", '', word) for word in self.text.split()]
 return self._words

Stop using semicolons

Please.

vnp vnp 58.7k4 gold badges55 silver badges144 bronze badges · Answer 1 · 2018-02-22 17:52:28Z

Instead of letters="abcdefghijklmnopqrstuvwxyz"; use a built-in constant:
```
letters = string.letters;
```
The code only inspects first and last character of the word. It means that aaa.bbb will be treated as a single word. Is this an intent?
split(sep) allows a string of separators. Calling it with
```
raw = self.text.split(string.punctuation + string.whitespace)
```
eliminates the need to inspect individual words. If you still want to inspect them, a loop for i in range(len(raw)): should be written more idiomatically as
```
for word in raw:
```
(you don't really need i anywhere in the loop).

@vnp thanks for the input, just know there is string module.
The aaa.bbb was not expected, i was assuming there is no typo. The constants in string is useful.

Richard Neumann Richard Neumann 6,42122 silver badges41 bronze badges · Answer 2 · 2018-02-26 12:11:50Z

This I consider un-pythonic:

class Text:
 ...
 def as_list(self):
 return list(self.text);
 def get_length(self):
 return len(self.text);
 ...

You write methods on a class which use trivial builtins, but the instance itself will behave completely differently when you invoke len, resp. list on it.

Furthermore the class basically implements text, which can be considered a sepcial case of a string. So why not subclass it from there:

class Text(str):
 def show(self):
 print(self);
 def count_char(self, char):
 return self.count(char)

You can get the list and lengt then by calling list(text_instance), resp. len(text_instance).

You can simplify the letter_freqs method using string.ascii_letters:

 @property 
 def letter_freqs(self):
 return {i: self.count_char(i) for i in string.ascii_letters}
 def find_words(self, *args):
 return {i: self.words.count(i) + self.words.count(
 Text.swap_first(i)) for i in args}

Some other things:

You don't need semicolons at the end of statements in python.
Backslashes to break lines are not required within braces, brackets or parentheses and thus not required in dict comprehensions.
The swap_first function is neither an instance method, nor a class method and thus should be defined on module-level.
Document your class and methods using docstrings.
Plase also make yourself familiar with PEP8.

Also consider rewriting your words property. It does not behave reliably e.g. on a text like 'He shot him with a .45.'

Dan Oberlam Dan Oberlam 8,0492 gold badges33 silver badges74 bronze badges · Answer 3 · 2018-02-26 12:52:56Z

I like the general content in the other answers, so I'll just add my 2 cents.

Cache your properties

No need to pointlessly redo your string manipulations every time; just cache them and calculate on-demand.

Use `collections.Counter`

You're making a histogram; that has built-in support

from collections import Counter
_counter = None
@property 
def letter_freqs(self):
 if self._counter is None:
 self._counter = Counter(letter for letter in self.text if self.meets_filter(letter))
 return self._counter

You can implement self.meets_filter pretty trivially (letter.lower() in string.ascii_lowercase, for example), and even in-line if you don't have more complex logic.

You can also apply Counter to your words list too.

def find_words(self, *args):
 return Counter(word for word in self.words 
 if word in args or Text.swap_first(word) in args))

You aren't returning something "raw", so don't call it that

Also, you can simplify this a lot, and fix the bug mentioned before, by just removing the bad text.

import re
_words = None
@property 
def words(self):
 if self._words is None:
 self._words = [re.sub(r"[_\d\W]", '', word) for word in self.text.split()]
 return self._words

Stop using semicolons

Please.

Stack Exchange Network

Text class in Python

3 Answers 3

Cache your properties

Use `collections.Counter`

You aren't returning something "raw", so don't call it that

Stop using semicolons

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Text class in Python

3 Answers 3

Cache your properties

Use collections.Counter

You aren't returning something "raw", so don't call it that

Stop using semicolons

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Use `collections.Counter`