I've decided to write a very simple, output-only programming language. All the user does is write ASCII values inside ASCII fish, and the interpreter pieces the values together and outputs them.
I'm mainly looking for feedback on the interpreter, as the language itself is very easy to understand.
Here's what a Hello, World! program looks like in Fishy:
><72> ><101> ><108> ><108> ><111> ><44> ><32> ><87> ><111> ><114> ><108> ><100> ><33>
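A minimal sketch of how that line decodes, assuming only what the example shows: each number between >< and > is an ASCII code, and Python's chr() turns it into a character. It uses a regular expression rather than the interpreter's split-based parsing and does no syntax checking.

```python
import re

line = "><72> ><101> ><108> ><108> ><111> ><44> ><32> ><87> ><111> ><114> ><108> ><100> ><33>"

# Pull out each number sitting between "><" and ">" and map it to its character
codes = [int(n) for n in re.findall(r"><(\d+)>", line)]
text = "".join(chr(c) for c in codes)
print(text)  # Hello, World!
```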
All the rules of the language are listed in the module docstring of the program.
"""
Fishy (.fishy extension)
><> Frontfish
Implementation is simple:
You enter ASCII values between the facing signs <>
Commands on separate lines will have output separated by a new line
Example:
><98> ><112> ><113> ><107>
bpqk
><97>
><108>
><101>
a
l
e
NO TRAILING WHITESPACE!
Trailing whitespace after the last fish on the line will result in a syntax error
"""
import argparse
import os
import sys
from typing import List
def run_code(code: List[str]):
    """
    Runs the passed Fishy Code
    """
    for line in code:
        # Clean up code and separate commands#
        line = line.strip("\n")
        commands = line.split(" ")
        # Check if line has multiple statements in it
        if len(commands) > 1:
            if correct_syntax(commands):
                output = "".join(chr(get_number(fish)) for fish in commands)
                print(output)
        else:
            if correct_syntax(commands):
                print(chr(get_number(commands[0])))
def correct_syntax(pond: List[str]) -> bool:
    """
    Checks the syntax of the passed list of commands on the following criteria:
    Is a fish ><..>
    Correct Example:
    ><98> ><108> ><56>
    Incorrect Example:
    ><98> >><<76>> ><[108>
    """
    for fish in pond:
        if not is_fish(fish):
            sys.exit(f"Incorrect Syntax: {fish}")
    return True
def is_fish(fish: str) -> bool:
    """
    Returns if the passed fish is the fish or not
    Fish: Starts with >< ends with >
    A fish like so ><98g> will be caught by "get_number()" function
    """
    return fish.startswith("><") and fish.endswith(">")
def get_number(fish: str) -> int:
    """
    Returns the number in the fish
    """
    # Check font fish first #
    try:
        number = int(fish[2:-1])
    except ValueError:
        sys.exit(f"Incorrect Syntax: {fish}")
    return number
def get_content(file: str) -> List[str]:
    """
    Returns all the content in the passed file path
    :param file -> str: File to read content
    :return List[str]: Content in file
    """
    with open(file, "r") as file:
        return [line for line in file]
def main() -> None:
    """
    Sets up argparser and runs main program
    """
    parser = argparse.ArgumentParser(description="Enter path to .fishy program file")
    parser.add_argument("Path", metavar="path", type=str, help="path to .fishy program file")
    args = parser.parse_args()
    file_path = args.Path
    if not os.path.isfile(file_path):
        sys.exit("The file does not exist")
    content = get_content(file_path)
    run_code(content)
if __name__ == "__main__":
    main()
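The NO TRAILING WHITESPACE rule follows from how run_code splits each line on a single space: a trailing space leaves an empty string as the last command, and an empty string is not a fish. A small sketch that mirrors that split and the is_fish shape check (pieces is a hypothetical helper, not part of the interpreter):

```python
from typing import List


def pieces(line: str) -> List[str]:
    # Same cleanup and split as run_code
    return line.strip("\n").split(" ")


def is_fish(fish: str) -> bool:
    # Same shape check as the question's is_fish
    return fish.startswith("><") and fish.endswith(">")


clean = pieces("><97> ><108>\n")
trailing = pieces("><97> ><108> \n")
print(clean)     # ['><97>', '><108>']
print(trailing)  # ['><97>', '><108>', '']
print(all(is_fish(f) for f in trailing))  # False, because '' is not a fish
```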
Restructuring and optimization
The initial approach processes the file inefficiently: the get_content function reads all lines from the input file into a list at once and holds that list in memory for the entire run. The run_code function then traverses those lines a second time.
A more efficient way is to turn get_content into a generator function that consumes one line from the file at a time, on demand.
The optimized get_content function:
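from typing import Iterator
def get_content(file: str) -> Iterator[str]:
    """
    Yields lines from the passed file path
    :param file -> str: File to read content
    :return Iterator[str]: Lines in the file, without the trailing newline
    """
    with open(file, "r") as file:
        for line in file:
            # Strip only the newline so trailing spaces still fail the syntax check
            yield line.rstrip("\n")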
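A quick way to see the on-demand behavior is to read back a small temporary file. This sketch strips only the newline rather than all trailing whitespace, so a trailing space would still reach the syntax check; the file contents are made up for the example.

```python
import os
import tempfile
from typing import Iterator


def get_content(path: str) -> Iterator[str]:
    # Generator: the file is opened on the first next() call and lines are
    # yielded one at a time, without their trailing newline
    with open(path, "r") as handle:
        for line in handle:
            yield line.rstrip("\n")


# Write a small throwaway .fishy file to read back
with tempfile.NamedTemporaryFile("w", suffix=".fishy", delete=False) as tmp:
    tmp.write("><97>\n><108>\n><101>\n")
    path = tmp.name

lines = get_content(path)  # no file access has happened yet
first = next(lines)        # opens the file and yields the first line
rest = list(lines)         # yields the remaining lines, then closes the file
os.remove(path)
print(first, rest)  # ><97> ['><108>', '><101>']
```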
The run_code function is renamed to parse_code.
Inefficiency of validating and traversing commands
In the parse_code (formerly run_code) function, the commands sequence is potentially traversed twice: once in the correct_syntax(commands) call and then again when getting the numbers via chr(get_number(fish)) for fish in commands.
Moreover, validating everything first and converting afterwards can waste work. Consider the following situation: commands contains 10 items, all of which pass the correct_syntax check, but then the 9th item fails in get_number. By that point all 10 syntax checks and the 8 successful conversions have been performed for nothing.
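The cost difference can be made concrete by counting steps. In this sketch, two_pass and one_pass are hypothetical stand-ins for the original correct_syntax + get_number structure and for a consolidated check; they return None instead of calling sys.exit and count each shape check or conversion attempt as one step.

```python
from typing import List, Optional, Tuple


def looks_like_fish(fish: str) -> bool:
    return fish.startswith("><") and fish.endswith(">")


def two_pass(pond: List[str]) -> Tuple[Optional[List[int]], int]:
    """Original structure: check every fish's shape, then convert each one."""
    steps = 0
    for fish in pond:  # pass 1, like correct_syntax
        steps += 1
        if not looks_like_fish(fish):
            return None, steps
    numbers = []
    for fish in pond:  # pass 2, like get_number
        steps += 1
        try:
            numbers.append(int(fish[2:-1]))
        except ValueError:
            return None, steps
    return numbers, steps


def one_pass(pond: List[str]) -> Tuple[Optional[List[int]], int]:
    """Consolidated structure: validate and convert each fish in one step."""
    steps = 0
    numbers = []
    for fish in pond:
        steps += 1
        body = fish[2:-1]
        if not (looks_like_fish(fish) and body.isdecimal()):
            return None, steps
        numbers.append(int(body))
    return numbers, steps


# Ten fish; the 9th has the right shape but a non-numeric body
pond = ["><97>"] * 8 + ["><98g>", "><97>"]
print(two_pass(pond))  # (None, 19): 10 shape checks + 9 conversion attempts
print(one_pass(pond))  # (None, 9): stops at the 9th fish
```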
To optimize validation, notice that is_fish and get_number depend on the same context, a "fish", and are meant to validate the same "fish" object. The two validations can therefore be consolidated into a single validation function, is_fish:
def is_fish(fish: str) -> bool:
    """
    Validates "fish" item
    Fish: Starts with >< ends with > and has number inside
    A fish like so ><98g> will fail the check
    """
    # isdecimal() (unlike isdigit()) only accepts characters that int() can parse
    return fish.startswith("><") and fish.endswith(">") and fish[2:-1].isdecimal()
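One thing to watch when validating with a string predicate: str.isdigit() is also True for characters such as the superscript digit ², which int() cannot parse, whereas str.isdecimal() only accepts characters that int() can convert. A short check:

```python
# "\u00b2" is the superscript two character
fish = "><\u00b2>"
body = fish[2:-1]
print(body.isdigit())    # True
print(body.isdecimal())  # False

try:
    int(body)
    int_accepts = True
except ValueError:
    int_accepts = False
print("int() accepts it:", int_accepts)  # int() accepts it: False
```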
The get_number function is now removed.
The correct_syntax function is renamed to get_fish_numbers, and its responsibility is now "collect fish numbers from valid fishes":
def get_fish_numbers(pond: List[str]) -> List[int]:
    """
    Collects fish numbers with checking the syntax of the passed list of commands on the following criteria:
    Is a fish ><..>
    Correct Example:
    ><98> ><108> ><56>
    Incorrect Example:
    ><98> >><<76>> ><[108>
    """
    fish_numbers = []
    for fish in pond:
        if not is_fish(fish):
            sys.exit(f"Incorrect Syntax: {fish}")
        fish_numbers.append(int(fish[2:-1]))
    return fish_numbers
And finally, the optimized parse_code function:
from typing import Iterable
def parse_code(code: Iterable[str]):
    """
    Parse and output the passed Fishy Code
    """
    for line in code:
        # Separate commands
        commands = line.split(" ")
        fish_numbers = get_fish_numbers(commands)
        # Check if line has multiple statements in it
        if len(fish_numbers) > 1:
            output = "".join(chr(num) for num in fish_numbers)
            print(output)
        else:
            print(chr(fish_numbers[0]))
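Putting the pieces together, here is a sketch of the optimized pipeline that runs without a file. Two assumptions differ from the answer: FishySyntaxError is a hypothetical exception used instead of sys.exit, and parse_code returns the output lines instead of printing them so the result can be inspected. The single-fish branch is folded into the join, since joining a one-element list gives the same character.

```python
import io
from typing import Iterable, List


class FishySyntaxError(ValueError):
    """Hypothetical exception raised instead of calling sys.exit."""


def is_fish(fish: str) -> bool:
    # Starts with "><", ends with ">", and has only decimal digits inside
    return fish.startswith("><") and fish.endswith(">") and fish[2:-1].isdecimal()


def get_fish_numbers(pond: List[str]) -> List[int]:
    # Validate and convert each fish in a single pass
    fish_numbers = []
    for fish in pond:
        if not is_fish(fish):
            raise FishySyntaxError(f"Incorrect Syntax: {fish}")
        fish_numbers.append(int(fish[2:-1]))
    return fish_numbers


def parse_code(code: Iterable[str]) -> List[str]:
    # One output string per source line; "".join also covers single-fish lines
    return ["".join(map(chr, get_fish_numbers(line.split(" ")))) for line in code]


source = io.StringIO("><72> ><105>\n><33>\n")
lines = (line.rstrip("\n") for line in source)
print(parse_code(lines))  # ['Hi', '!']
```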
Here is a potential solution that was minimized from a finite automaton. To make this solution more maintainable, a parse tree (or an explicit finite automaton) could have been built so that the syntax can be modified in the future.
Note: this answer is somewhat academic, since its practical use is limited; however, it provides a starting point for converting this program into one based on a parse tree.
It doesn't have the file-reading or argparse functionality, but it has the core of the solution: it checks whether the program is valid and, if so, runs it.
import re
input_program = "><72> ><101> ><108> ><108> ><111> ><44> ><32> ><87> ><111> ><114> ><108> ><100> ><33>"
# Each fish holds a value from 1 to 127. The "^" anchor sits outside the repeated
# group: inside it, "^" can only match at position 0, so the group could repeat at
# most once and lines with more than two fish would be rejected.
regex = r"^(?:\>\<(1|2|3|4|5|6|7|8|9|10|1{2}|12|13|14|15|16|17|18|19|20|21|2{2}|23|24|25|26|27|28|29|30|31|32|3{2}|34|35|36|37|38|39|40|41|42|43|4{2}|45|46|47|48|49|50|51|52|53|54|5{2}|56|57|58|59|60|61|62|63|64|65|6{2}|67|68|69|70|71|72|73|74|75|76|7{2}|78|79|80|81|82|83|84|85|86|87|8{2}|89|90|91|92|93|94|95|96|97|98|9{2}|10{2}|101|102|103|104|105|106|107|108|109|1{2}0|1{3}|1{2}2|1{2}3|1{2}4|1{2}5|1{2}6|1{2}7|1{2}8|1{2}9|120|121|12{2}|123|124|125|126|127)\> )*(?:\>\<(1|2|3|4|5|6|7|8|9|10|1{2}|12|13|14|15|16|17|18|19|20|21|2{2}|23|24|25|26|27|28|29|30|31|32|3{2}|34|35|36|37|38|39|40|41|42|43|4{2}|45|46|47|48|49|50|51|52|53|54|5{2}|56|57|58|59|60|61|62|63|64|65|6{2}|67|68|69|70|71|72|73|74|75|76|7{2}|78|79|80|81|82|83|84|85|86|87|8{2}|89|90|91|92|93|94|95|96|97|98|9{2}|10{2}|101|102|103|104|105|106|107|108|109|1{2}0|1{3}|1{2}2|1{2}3|1{2}4|1{2}5|1{2}6|1{2}7|1{2}8|1{2}9|120|121|12{2}|123|124|125|126|127)\>)$"
pattern = re.compile(regex)
def extract_ascii_codes(input_text):
    """
    Yields the ASCII codes found in the text, as integers
    """
    for match in re.finditer(r"\d+", input_text):
        yield int(match.group())
def parse_line(input_program):
    """
    Checks if the line in the program is syntactically valid; returns the decoded text if it is, otherwise None
    """
    if pattern.match(input_program) is not None:
        return "".join(map(chr, extract_ascii_codes(input_program)))
    return None
parsed_program = list(map(parse_line, input_program.split("\n")))
if all(parsed_program):
    for a_line in parsed_program:
        print(a_line)
else:
    print("Syntax error")
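If maintainability is the concern, the generated alternation for 1 to 127 can also be written by hand as three ranges. This is an alternative pattern, not the generator's output; it uses re.fullmatch instead of ^ and $ anchors, and the list comprehension checks that it accepts exactly 1 to 127 (rejecting 0 and leading zeros).

```python
import re

# Matches an integer from 1 to 127 with no leading zeros:
#   12[0-7]     -> 120-127
#   1[01][0-9]  -> 100-119
#   [1-9][0-9]? -> 1-99
CODE = r"(?:12[0-7]|1[01][0-9]|[1-9][0-9]?)"
# Zero or more "fish + space" pairs, then one final fish
LINE = re.compile(rf"(?:><{CODE}> )*><{CODE}>")

accepted = [n for n in range(0, 1000) if re.fullmatch(CODE, str(n))]
print(accepted == list(range(1, 128)))  # True

print(bool(LINE.fullmatch("><72> ><105>")))   # True
print(bool(LINE.fullmatch("><72> ><105> ")))  # False (trailing space)
print(bool(LINE.fullmatch("><128>")))         # False (out of range)
```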
[Figure: finite automaton (condensed)]
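As for the explicit finite automaton the answer mentions, one way to sketch it is as a transition table over character classes. The state names, the table, and run_line are hypothetical, and the 1 to 127 range is checked when a fish closes rather than encoded in states, so strictly this is an automaton for the fish shape plus a separate range check.

```python
from typing import Dict, List, Optional

# Transition table: state -> {character class -> next state}
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "start": {"gt": "head"},                 # expecting the ">" of "><"
    "head": {"lt": "open"},                  # expecting the "<" of "><"
    "open": {"digit": "number"},             # at least one digit required
    "number": {"digit": "number", "gt": "closed"},
    "closed": {"space": "start"},            # a space must be followed by a fish
}
ACCEPTING = {"closed"}


def char_class(ch: str) -> Optional[str]:
    if ch == ">":
        return "gt"
    if ch == "<":
        return "lt"
    if ch == " ":
        return "space"
    if ch in "0123456789":
        return "digit"
    return None


def run_line(line: str) -> Optional[str]:
    """Returns the decoded text, or None if the line is not a valid row of fish."""
    state = "start"
    digits = ""
    out: List[str] = []
    for ch in line:
        next_state = TRANSITIONS[state].get(char_class(ch))
        if next_state is None:
            return None
        if next_state == "number":
            digits += ch
        elif next_state == "closed":
            value = int(digits)
            if not 1 <= value <= 127:  # range check kept outside the table
                return None
            out.append(chr(value))
            digits = ""
        state = next_state
    return "".join(out) if state in ACCEPTING else None


print(run_line("><72> ><105>"))   # Hi
print(run_line("><72> ><105> "))  # None
```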
Why 1{2} instead of 11 in the regex? - L. F., Dec 2, 2019 at 13:20
I guess if it's automatically generated, the regex is simplified anywhere the generator finds repetition. - andowt, Dec 2, 2019 at 14:06