This is a command-line application which displays the text of an EPUB one sentence at a time.
I am going to make it more robust, including:
- make the segmentation more accurate, because it sometimes groups unrelated text together
- make it faster, so that segmentation runs only once and the segments are then saved to the filesystem
- add in more reading capabilities, like a progress meter and the ability to take notes on each sentence
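The caching plan in the second item could look roughly like the sketch below. It is only one possible approach: `cached_sentences`, the `segment` callable and the one-JSON-file-per-book layout are hypothetical choices, not part of the question's code. Keying the cache on a hash of the EPUB's bytes means an edited book is re-segmented automatically.

```python
import hashlib
import json
from pathlib import Path


def cached_sentences(book_path, segment, cache_dir):
    """Return the book's sentences, running `segment` only on a cache miss.

    Hypothetical helper: `segment` is any callable that maps an EPUB path to a
    list of sentence strings, e.g. a wrapper around the epub2txt + spaCy code.
    """
    # Hash the file contents so a changed EPUB gets a fresh cache entry.
    digest = hashlib.sha256(Path(book_path).read_bytes()).hexdigest()
    cache_file = Path(cache_dir) / f'{digest}.json'
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding='utf-8'))
    # Cache miss: do the slow segmentation once and store the result.
    sentences = segment(book_path)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(sentences), encoding='utf-8')
    return sentences
```

On the second launch with the same EPUB, spaCy would not need to run at all, which addresses the slow start mentioned in the code comments.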
However, for now, I'm really just interested in feedback about optimizing the code I have. Is there any more elegant design pattern?
Thanks very much.
```python
# Note: this code works, but it's slow to start because Spacy's nlp runs for a while before the curses display launches.
# This is a Python program which takes the name of an EPUB from the command line, extracts the plaintext content from the EPUB, then segments it with Spacy, then displays each sentence one at a time on-screen.
# The controls are "n" for next sentence, "b" for last sentence, and "q" to quit the application.
import sys
import spacy
import epub2txt
import curses
def main(stdscr):
    # Get the name of the EPUB
    textname = sys.argv[1]
    # Get the plaintext out of the EPUB
    text = epub2txt.epub2txt(textname)
    # Segment the text with Spacy.
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    lines = list(doc.sents)
    # loop through the sentences with index:
    i = 0
    while i < len(lines):
        stdscr.clear()
        stdscr.addstr(str(lines[i]))
        stdscr.refresh()
        c = stdscr.getch()
        if c == ord('q'):
            break
        elif c == ord('b'):
            if i > 0:
                i -= 1
        elif c == ord('n'):
            if i < len(lines) - 1:
                i += 1
curses.wrapper(main)
```
Comment: This was a very clear explanation of what your program does. (Zachary Vance, Dec 12, 2021 at 21:06)
1 Answer
Right now this looks okay as-is. In general, the shorter a program is, the less its style matters, because it will be readable either way. As you add features, style and breaking the code into pieces will become more important. As such, I've put some suggestions to do now, and some suggestions to do later.
Style now
- Read and follow PEP 8, a fairly universal style guide. It will suggest removing many of your blank lines.
- Use an `if __name__ == '__main__':` guard.
- Don't use `ord('q')`. Instead, change `c` from a number into a character.
- Rename variables to be more descriptive. `c` should be `user_command`. `i` should be `visible_sentence_index` (yes, that's long).
- The comment `# loop through the sentences with index:` is inaccurate. You are not looping over the sentences; you are letting the user browse them.
- Your loop condition `while i < len(lines):` only matters for an empty document. The check `if i < len(lines) - 1:` stops `i` from ever reaching `len(lines)`, so in practice the loop only ends when the user presses `q`.
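As a rough sketch of the "style now" points, the browsing loop could look like the following. It assumes `sentences` is a list of strings (for example `[sentence.text for sentence in doc.sents]`); the parsing code is unchanged and omitted, and the function name `browse_sentences` is illustrative. `stdscr.getkey()` returns the key as a string, so the comparisons no longer need `ord()`.

```python
def browse_sentences(stdscr, sentences):
    """Let the user page back and forth through sentences; return the last index shown.

    Illustrative name; `sentences` is assumed to be a list of str.
    """
    if not sentences:
        return None  # nothing to show for an empty document
    visible_sentence_index = 0
    # The bounds checks below keep the index valid, so only 'q' ends the loop.
    while True:
        stdscr.clear()
        stdscr.addstr(sentences[visible_sentence_index])
        stdscr.refresh()
        user_command = stdscr.getkey()  # a str such as 'n', so no ord() is needed
        if user_command == 'q':
            return visible_sentence_index
        if user_command == 'b' and visible_sentence_index > 0:
            visible_sentence_index -= 1
        elif user_command == 'n' and visible_sentence_index < len(sentences) - 1:
            visible_sentence_index += 1
```

In the real program this would be started with `curses.wrapper(browse_sentences, sentences)` from inside the `if __name__ == '__main__':` block, after the sentences have been loaded.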
Style later
- Split out the parsing logic, the display logic, the logic to read a command, and the logic to execute the command into four sections. Each section should probably be a function with an appropriate name and comment.
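One possible version of that four-way split is sketched below. All function names are hypothetical. `parse_sentences` takes an already loaded spaCy pipeline (`nlp`) rather than importing spaCy itself, and loading the EPUB plus calling `curses.wrapper(run, sentences)` is left to the entry point. Keeping `execute_command` free of curses makes the navigation rules easy to test on their own.

```python
def parse_sentences(nlp, text):
    """Parsing: split plain text into sentence strings with a loaded spaCy pipeline."""
    # Doc.sents yields Span objects; .text gives their string, stripped of stray newlines.
    return [sentence.text.strip() for sentence in nlp(text).sents]


def display_sentence(stdscr, sentence):
    """Display: show one sentence on an otherwise empty screen."""
    stdscr.erase()
    stdscr.addstr(0, 0, sentence)
    stdscr.refresh()


def read_command(stdscr):
    """Input: wait for a key press and return it as a string such as 'n'."""
    return stdscr.getkey()


def execute_command(command, index, sentence_count):
    """Execution: return the new sentence index, or None if the user wants to quit."""
    if command == 'q':
        return None
    if command == 'n':
        return min(index + 1, sentence_count - 1)
    if command == 'b':
        return max(index - 1, 0)
    return index  # unknown keys leave the position unchanged


def run(stdscr, sentences):
    """Tie the pieces together: show, read, execute, repeat until quit."""
    if not sentences:
        return
    index = 0
    while index is not None:
        display_sentence(stdscr, sentences[index])
        index = execute_command(read_command(stdscr), index, len(sentences))
```

All four can live in one file at this size; moving them into separate modules only starts paying off once each part grows.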
Suggested features
- Add on-screen documentation for the keys you can press to navigate, etc. I personally like `nano`'s method of on-screen documentation (the hotkeys are listed at the bottom of the screen).
- Add some feedback when you press an invalid key.
- (Difficult!) If you want it to feel more responsive, you could add threading and load the UI before you finish parsing the document. Alternatively, if you can parse the EPUB incrementally, you could display the first sentence before the whole document is extracted.
- I suspect the screen currently flickers every time you press forward/back; that would be nice to clean up.
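A small sketch covering the help line, the invalid-key feedback and the flicker is below; `HELP_TEXT`, `status_text` and `draw_screen` are hypothetical names. On the flicker, one likely cause is `stdscr.clear()`: per the Python curses docs it behaves like `erase()` but also forces the whole window to be repainted on the next `refresh()`, so `erase()` is used here instead.

```python
HELP_TEXT = 'n: next  b: back  q: quit'  # hypothetical nano-style help line
VALID_COMMANDS = {'n', 'b', 'q'}


def status_text(last_command, width):
    """Return the bottom-line text: feedback for an unknown key, otherwise the help."""
    if last_command is not None and last_command not in VALID_COMMANDS:
        text = f'Unknown key {last_command!r}. {HELP_TEXT}'
    else:
        text = HELP_TEXT
    # Leave the last column free: writing to the bottom-right cell raises curses.error.
    return text[:max(width - 1, 0)]


def draw_screen(stdscr, sentence, last_command=None):
    """Draw the sentence at the top and the status/help line at the bottom."""
    stdscr.erase()  # unlike clear(), does not force a full repaint, reducing flicker
    height, width = stdscr.getmaxyx()
    stdscr.addstr(0, 0, sentence)  # assumes the sentence fits on the screen
    stdscr.addstr(height - 1, 0, status_text(last_command, width))
    stdscr.refresh()
```

The loop would pass the previous key to `draw_screen`, so an unrecognized key shows a message until the next valid key press.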
Comment: Awesome, thanks so much. Had almost given up hope that anybody would pay attention to my project. "Split out the parsing logic, the display logic, the logic to read a command, and the logic to execute the command into four sections. Each section should probably be a function with an appropriate name and comment." Should it be four functions in one file? Could you perhaps provide an example? I am really interested in alternatives to the "loop". Is there a simple way to trigger actions on key presses so the program is written in more self-contained methods and not one big loop? Thank you! (Julius Hamilton, Dec 14, 2021 at 9:09)
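On triggering actions per key: as far as I know, curses only lets you poll for keys (`getch()`/`getkey()`), so some loop has to remain. What a dispatch table can do is shrink that loop to a few lines and move each action into its own small function. The sketch below is one way to do it; `Reader`, `KEY_BINDINGS` and the handler names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reader:
    """Hypothetical reading state shared by the key handlers."""
    sentences: list
    index: int = 0
    running: bool = True


def next_sentence(reader):
    if reader.index < len(reader.sentences) - 1:
        reader.index += 1


def previous_sentence(reader):
    if reader.index > 0:
        reader.index -= 1


def quit_reader(reader):
    reader.running = False


# Adding a feature (e.g. notes) means writing a function and adding one entry here.
KEY_BINDINGS = {'n': next_sentence, 'b': previous_sentence, 'q': quit_reader}


def handle_key(reader, key):
    """Run the action bound to `key`; return False if the key is unknown."""
    action = KEY_BINDINGS.get(key)
    if action is None:
        return False
    action(reader)
    return True


def run(stdscr, reader):
    """The remaining loop only draws, reads a key and dispatches it."""
    while reader.running and reader.sentences:
        stdscr.erase()
        stdscr.addstr(0, 0, reader.sentences[reader.index])
        stdscr.refresh()
        handle_key(reader, stdscr.getkey())
```

The `False` returned by `handle_key` for an unknown key is a natural hook for the invalid-key feedback suggested in the answer.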