Overall, the code is quite clean and understandable. Though, I have never done anything like you are doing, but here are some notes about the code:
the
Token
andTokenTypes
classes may use "__slots__
" may use "__slots__
" for faster attribute access and memory savingsthe
char in builtin_map.keys()
can and should be replaced with justchar in builtin_map
to avoid creating the list of keys and looking up in the list instead of in the hashtable -O(n)
vsO(1)
the
char in ('?', '!', '.')
should be replaced with something likechar in ADDITIONAL_BUILTIN_CHARS
whereADDITIONAL_BUILTIN_CHARS = {'?', '!', '.'}
(note - Python 2.7+ syntax) - it is a set that you should define on the module level - or, may be even move to_builtins
module to keep all the related constants togetherall the other places where you use
in
to check if character is one of multiple characters can benefit from a similar improvement - defining as a set, giving a meaningful name and moving to the module levelthe Lexer should probably support the "iterator" protocol "iterator" protocol (advancing to the next token continuously), like for example this or this one does
the PEP8 import guidelines suggest to put a new line between the different types of imports, replace:
from collections import namedtuple from _builtins import builtin_map
with:
from collections import namedtuple
from _builtins import builtin_map
- one-line doc strings can stay on a single line (PEP257)
As a side note, I think you should not be putting the "modified date" into the module itself - let the github and git handle it.
Overall, the code is quite clean and understandable. Though, I have never done anything like you are doing, but here are some notes about the code:
the
Token
andTokenTypes
classes may use "__slots__
" for faster attribute access and memory savingsthe
char in builtin_map.keys()
can and should be replaced with justchar in builtin_map
to avoid creating the list of keys and looking up in the list instead of in the hashtable -O(n)
vsO(1)
the
char in ('?', '!', '.')
should be replaced with something likechar in ADDITIONAL_BUILTIN_CHARS
whereADDITIONAL_BUILTIN_CHARS = {'?', '!', '.'}
(note - Python 2.7+ syntax) - it is a set that you should define on the module level - or, may be even move to_builtins
module to keep all the related constants togetherall the other places where you use
in
to check if character is one of multiple characters can benefit from a similar improvement - defining as a set, giving a meaningful name and moving to the module levelthe Lexer should probably support the "iterator" protocol (advancing to the next token continuously), like for example this or this one does
the PEP8 import guidelines suggest to put a new line between the different types of imports, replace:
from collections import namedtuple from _builtins import builtin_map
with:
from collections import namedtuple
from _builtins import builtin_map
- one-line doc strings can stay on a single line (PEP257)
As a side note, I think you should not be putting the "modified date" into the module itself - let the github and git handle it.
Overall, the code is quite clean and understandable. Though, I have never done anything like you are doing, but here are some notes about the code:
the
Token
andTokenTypes
classes may use "__slots__
" for faster attribute access and memory savingsthe
char in builtin_map.keys()
can and should be replaced with justchar in builtin_map
to avoid creating the list of keys and looking up in the list instead of in the hashtable -O(n)
vsO(1)
the
char in ('?', '!', '.')
should be replaced with something likechar in ADDITIONAL_BUILTIN_CHARS
whereADDITIONAL_BUILTIN_CHARS = {'?', '!', '.'}
(note - Python 2.7+ syntax) - it is a set that you should define on the module level - or, may be even move to_builtins
module to keep all the related constants togetherall the other places where you use
in
to check if character is one of multiple characters can benefit from a similar improvement - defining as a set, giving a meaningful name and moving to the module levelthe Lexer should probably support the "iterator" protocol (advancing to the next token continuously), like for example this or this one does
the PEP8 import guidelines suggest to put a new line between the different types of imports, replace:
from collections import namedtuple from _builtins import builtin_map
with:
from collections import namedtuple
from _builtins import builtin_map
- one-line doc strings can stay on a single line (PEP257)
As a side note, I think you should not be putting the "modified date" into the module itself - let the github and git handle it.
Overall, the code is quite clean and understandable. Though, I have never done anything like you are doing, but here are some notes about the code:
the
Token
andTokenTypes
classes may use "__slots__
" for faster attribute access and memory savingsthe
char in builtin_map.keys()
can and should be replaced with justchar in builtin_map
to avoid creating the list of keys and looking up in the list instead of in the hashtable -O(n)
vsO(1)
the
char in ('?', '!', '.')
should be replaced with something likechar in ADDITIONAL_BUILTIN_CHARS
whereADDITIONAL_BUILTIN_CHARS = {'?', '!', '.'}
(note - Python 2.7+ syntax) - it is a set that you should define on the module level - or, may be even move to_builtins
module to keep all the related constants togetherall the other places where you use
in
to check if character is one of multiple characters can benefit from a similar improvement - defining as a set, giving a meaningful name and moving to the module levelthe Lexer should probably support the "iterator" protocol (advancing to the next token continuously), like for example this or this one does
the PEP8 import guidelines suggest to put a new line between the different types of imports, replace:
from collections import namedtuple from _builtins import builtin_map
with:
from collections import namedtuple
from _builtins import builtin_map
- one-line doc strings can stay on a single line (PEP257 )
As a side note, I think you should not be putting the "modified date" into the module itself - let the github and git handle it.
Overall, the code is quite clean and understandable. Though, I have never done anything like you are doing, but here are some notes about the code:
the
Token
andTokenTypes
classes may use "__slots__
" for faster attribute access and memory savingsthe
char in builtin_map.keys()
can and should be replaced with justchar in builtin_map
to avoid creating the list of keys and looking up in the list instead of in the hashtable -O(n)
vsO(1)
the
char in ('?', '!', '.')
should be replaced with something likechar in ADDITIONAL_BUILTIN_CHARS
whereADDITIONAL_BUILTIN_CHARS = {'?', '!', '.'}
(note - Python 2.7+ syntax) - it is a set that you should define on the module level - or, may be even move to_builtins
module to keep all the related constants togetherall the other places where you use
in
to check if character is one of multiple characters can benefit from a similar improvement - defining as a set, giving a meaningful name and moving to the module levelthe Lexer should probably support the "iterator" protocol (advancing to the next token continuously), like for example this or this one does
the PEP8 import guidelines suggest to put a new line between the types of imports, replace:
from collections import namedtuple from _builtins import builtin_map
with:
from collections import namedtuple
from _builtins import builtin_map
As a side note, I think you should not be putting the "modified date" into the module itself - let the github and git handle it.
Overall, the code is quite clean and understandable. Though, I have never done anything like you are doing, but here are some notes about the code:
the
Token
andTokenTypes
classes may use "__slots__
" for faster attribute access and memory savingsthe
char in builtin_map.keys()
can and should be replaced with justchar in builtin_map
to avoid creating the list of keys and looking up in the list instead of in the hashtable -O(n)
vsO(1)
the
char in ('?', '!', '.')
should be replaced with something likechar in ADDITIONAL_BUILTIN_CHARS
whereADDITIONAL_BUILTIN_CHARS = {'?', '!', '.'}
(note - Python 2.7+ syntax) - it is a set that you should define on the module level - or, may be even move to_builtins
module to keep all the related constants togetherall the other places where you use
in
to check if character is one of multiple characters can benefit from a similar improvement - defining as a set, giving a meaningful name and moving to the module levelthe Lexer should probably support the "iterator" protocol (advancing to the next token continuously), like for example this or this one does
the PEP8 import guidelines suggest to put a new line between the different types of imports, replace:
from collections import namedtuple from _builtins import builtin_map
with:
from collections import namedtuple
from _builtins import builtin_map
- one-line doc strings can stay on a single line (PEP257 )
As a side note, I think you should not be putting the "modified date" into the module itself - let the github and git handle it.
Overall, the code is quite clean and understandable. Though, I have never done anything like you are doing, but here are some notes about the code:
- the
Token
andTokenTypes
classes may use "__slots__
" for faster attribute access and memory savingsthe
Token
andTokenTypes
classes may use "__slots__
" for faster attribute access and memory savings - the
char in builtin_map.keys()
can and should be replaced with justchar in builtin_map
to avoid creating the list of keys and looking up in the list instead of in the hashtable -O(n)
vsO(1)
the
char in builtin_map.keys()
can and should be replaced with justchar in builtin_map
to avoid creating the list of keys and looking up in the list instead of in the hashtable -O(n)
vsO(1)
- the
char in ('?', '!', '.')
should be replaced with something likechar in ADDITIONAL_BUILTIN_CHARS
whereADDITIONAL_BUILTIN_CHARS = {'?', '!', '.'}
(note - Python 2.7+ syntax) - it is a set that you should define on the module level - or, may be even move to_builtins
module to keep all the related constants togetherthe
char in ('?', '!', '.')
should be replaced with something likechar in ADDITIONAL_BUILTIN_CHARS
whereADDITIONAL_BUILTIN_CHARS = {'?', '!', '.'}
(note - Python 2.7+ syntax) - it is a set that you should define on the module level - or, may be even move to_builtins
module to keep all the related constants together - all the other places where you use
in
to check if character is one of multiple characters can benefit from a similar improvement - defining as a set, giving a meaningful name and moving to the module levelall the other places where you use
in
to check if character is one of multiple characters can benefit from a similar improvement - defining as a set, giving a meaningful name and moving to the module level the Lexer should probably support the "iterator" protocol (advancing to the next token continuously), like for example this or this one does
the PEP8 import guidelines suggest to put a new line between the types of imports, replace:
from collections import namedtuple from _builtins import builtin_map
with:
from collections import namedtuple
from _builtins import builtin_map
As a side note, I think you should not be putting the "modified date" into the module itself - let the github and git handle it.
Overall, the code is quite clean and understandable. Though, I have never done anything like you are doing, but here are some notes about the code:
- the
Token
andTokenTypes
classes may use "__slots__
" for faster attribute access and memory savings - the
char in builtin_map.keys()
can and should be replaced with justchar in builtin_map
to avoid creating the list of keys and looking up in the list instead of in the hashtable -O(n)
vsO(1)
- the
char in ('?', '!', '.')
should be replaced with something likechar in ADDITIONAL_BUILTIN_CHARS
whereADDITIONAL_BUILTIN_CHARS = {'?', '!', '.'}
(note - Python 2.7+ syntax) - it is a set that you should define on the module level - or, may be even move to_builtins
module to keep all the related constants together - all the other places where you use
in
to check if character is one of multiple characters can benefit from a similar improvement - defining as a set, giving a meaningful name and moving to the module level
Overall, the code is quite clean and understandable. Though, I have never done anything like you are doing, but here are some notes about the code:
the
Token
andTokenTypes
classes may use "__slots__
" for faster attribute access and memory savingsthe
char in builtin_map.keys()
can and should be replaced with justchar in builtin_map
to avoid creating the list of keys and looking up in the list instead of in the hashtable -O(n)
vsO(1)
the
char in ('?', '!', '.')
should be replaced with something likechar in ADDITIONAL_BUILTIN_CHARS
whereADDITIONAL_BUILTIN_CHARS = {'?', '!', '.'}
(note - Python 2.7+ syntax) - it is a set that you should define on the module level - or, may be even move to_builtins
module to keep all the related constants togetherall the other places where you use
in
to check if character is one of multiple characters can benefit from a similar improvement - defining as a set, giving a meaningful name and moving to the module levelthe Lexer should probably support the "iterator" protocol (advancing to the next token continuously), like for example this or this one does
the PEP8 import guidelines suggest to put a new line between the types of imports, replace:
from collections import namedtuple from _builtins import builtin_map
with:
from collections import namedtuple
from _builtins import builtin_map
As a side note, I think you should not be putting the "modified date" into the module itself - let the github and git handle it.