I have a file in gran/config.py AND I cannot import this file (not an option).
Inside this config.py, there is the following code
...<more code>
animal = dict(
bear = r'^bear4x',
tiger = r'^.*\tiger\b.*$'
)
...<more code>
I want to be able parse r'^bear4x' or r'^.*\tiger\b.*$' based on bear or tiger.
I started out with
try:
text = open('gran/config.py','r')
tline = filter('not sure', text.readlines())
text.close()
except IOError, str:
pass
I was hoping to grab the whole animal dict by
grab = re.compile("^animal\s*=\s*('.*')") or something like that
and maybe change tline to tline = filter(grab.search,text.readlines())
but it only grabs animal = dict( and not the following lines of dict.
how can i grab multiple lines?
look for animal then confirm the first '(' then continue to look until ')' ??
Note: the size of animal dict may change so anything static approach (like grab 4 extra lines after animal is found) wouldnt work
-
What kind of error appears when you try to import the file?badc0re– badc0re2013年08月27日 18:43:35 +00:00Commented Aug 27, 2013 at 18:43
-
@badc0re hmm not related because not an option. importing is not an option because config.py is trying to import something that's not available so I have to treat it as a text file. by importing it, it will try to run the code, import something thats not available.ealeon– ealeon2013年08月27日 18:47:27 +00:00Commented Aug 27, 2013 at 18:47
3 Answers 3
Maybe you should try some AST hacks? With python it is easy, just:
import ast
config= ast.parse( file('config.py').read() )
So know you have your parsed module. You need to extract assign to animals and evaluate it. There are safe ast.literal_eval function but since we make a call to dict it wont work here. The idea is to traverse whole module tree leaving only assigns and run it localy:
class OnlyAssings(ast.NodeTransformer):
def generic_visit( self, node ):
return None #throw other things away
def visit_Module( self, node ):
#We need to visit Module and pass it
return ast.NodeTransformer.generic_visit( self, node )
def visit_Assign(self, node):
if node.targets[0].id == 'animals': # this you may want to change
return node #pass it
return None # throw away
config= OnlyAssings().visit(config)
Compile it and run:
exec( compile(config,'config.py','exec') )
print animals
If animals should be in some dictionary, pass it as a local to exec:
data={}
exec( compile(config,'config.py','exec'), globals(), data )
print data['animals']
There is much more you can do with ast hacking, like visit all If and For statement or much more. You need to check documentation.
Comments
If the only reason you can't import that file as-is is because of imports that will fail otherwise, you can potentially hack your way around it than trying to process a perfectly good Python file as just text.
For example, if I have a file named busted_import.py with:
import doesnotexist
foo = 'imported!'
And I try to import it, I will get an ImportError. But if I define what the doesnotexist module refers to using sys.modules before trying to import it, the import will succeed:
>>> import sys
>>> sys.modules['doesnotexist'] = ""
>>> import busted_import
>>> busted_import.foo
'imported!'
So if you can just isolate the imports that will fail in your Python file and redefine those prior to attempting an import, you can work around the ImportErrors
4 Comments
busted_import.py is your config.py. Try importing, and when you get the ImportError, simply redirect that module within sys.modules using the example above and then try the import again. If you get another ImportError, repeat the process until you get no more ImportErrors.I am not getting what exactly are you trying to do.
If you want to process each line with regular expression - you have ^ in regular expression re.compile("^animal\s*=\s*('.*')"). It matches only when animal is at the start of line, not after some spaces. Also of course it does not match bear or tiger - use something like re.compile("^\s*([a-z]+)\s*=\s*('.*')").
If you want to process multiple lines with single regular expression, read about re.DOTALL and re.MULTILINE and how they affect matching newline characters:
http://docs.python.org/2/library/re.html#re.MULTILINE
Also note that text.readlines() reads lines, so the filter function in filter('not sure', text.readlines()) is run on each line, not on whole file. You cannot pass regular expression in this filter(<re here>, text.readlines()) and hope it will match multiple lines.
BTW processing Python files (and HTML, XML, JSON... files) using regular expressions is not wise. For every regular expression you write there are cases where it will not work. Use parser designed for given format - for Python source code it's ast. But for your use case ast is too complex.
Maybe it would be better to use classic config files and configparser. More structured data like lists and dicts can be easily stored in JSON or YAML files.
4 Comments
text.readlines(). Read all contents of the file into single string: text.read()