I'd like to be able to run a python file (file1) that simply loads several large files into memory as python objects, then, with a different python file (file2), access those same objects without having to reload the files into memory a second time. The motivation is that I want to be able to iteratively modify/develop file2 without having to waste time reloading the same large files on every iteration.
In a Jupyter notebook this is easily accomplished by running a cell that loads the files once; these objects are then accessible to all other cells in the notebook. I'd like to be able to establish this same cross talk between separate python files.
Is there a way to establish the within-notebook Jupyter style cell-to-cell sharing of python objects between separate .py files?
(Edited to include an example)
Below is an example scenario; let's say there are two files:
file1.py:
from sklearn.externals import joblib
q = joblib.load('large_dict') #load a large dictionary that has been saved to disk; let's say it takes 2 minutes to load
p = joblib.load('large_dict2') #load another large file; another 2 minutes load time
file2.py:
#notice that the q, p objects are never loaded, but they are accessible
#(this is possible if the contents of these py files are in separate cells
#in a Jupyter notebook)
for experiment, chromosome in q.iteritems():
    #stuff to work with dict object
    pass
for experiment, chromosome in p.iteritems():
    #stuff to work with dict object
    pass
I want to do
python file1.py
once, and then do
python file2.py
an arbitrary number of times (i.e. iteratively modify the code in file2). Notice that in this scenario the objects created in file1.py are accessible to file2.py. My question is: Is this scenario possible?
- What do you mean by "Sharing objects between files"? Do you mean "between threads"? (Vsevolod Timchenko, Jun 24, 2016 at 9:49)
- I think the question is pretty clear as it is. (Ryan, Jun 24, 2016 at 9:51)
- No, it isn't. I have the same question. A "file" is not something that does anything in Python. It may be the main module, it may be imported, and so on, but then its activity depends on that, not on it being a "file". Please clarify. If multiple people say your question is unclear, it is unclear. (Rory Daulton, Jun 24, 2016 at 9:53)
- @Ryan. Oh, okay. Despite the fact that files have no direct connection to objects and "Sharing objects between files" doesn't even make sense as a question. (Vsevolod Timchenko, Jun 24, 2016 at 9:54)
- It appears as though I'm lacking (at least) some knowledge of how Python objects are stored in memory. Despite this, I can't see how I was anything but completely clear on what my goal is, so it would be nice to get some constructive feedback or help in clarifying exactly what I'm missing. (Ryan, Jun 24, 2016 at 9:58)
Answer
Objects do not belong to a specific file. The class they belong to, or the function that generates them, may come from a module that "physically" resides in a different file, but this doesn't matter. As long as you are in a single Python interpreter session, objects will not need to be copied.
There is one problem: if you have a module, modify it, and want to load the newest version into a running Python interpreter that has already imported it, the interpreter will "refuse" to do so. (Modules are cached after their first import; this is a performance optimization that makes it safe and cheap to import a module more than once.)
You can "force" the Python interpreter to reload a module with importlib.reload in Python 3 or the reload builtin in Python 2. See this question for more information.
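To see the caching and the reload side by side, here is a small self-contained sketch. It writes a throwaway module named demo_data (a hypothetical stand-in for file1.py, with object() standing in for an expensive joblib.load result) into a temporary directory, imports it twice, and then reloads it. Object identity shows whether the module body ran again.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_import_caching():
    """Return (same_after_import, same_after_reload) for a throwaway module."""
    tmp = Path(tempfile.mkdtemp())
    # Hypothetical module standing in for file1.py; object() stands in
    # for the result of an expensive load.
    (tmp / "demo_data.py").write_text("DATA = object()\n")
    sys.path.insert(0, str(tmp))
    importlib.invalidate_caches()  # let the import system see the new file
    try:
        import demo_data
        first = demo_data.DATA
        import demo_data  # served from sys.modules; the module body is NOT re-run
        same_after_import = demo_data.DATA is first
        importlib.reload(demo_data)  # re-executes the module body
        same_after_reload = demo_data.DATA is first
        return same_after_import, same_after_reload
    finally:
        sys.path.remove(str(tmp))
        sys.modules.pop("demo_data", None)
```

Calling demo_import_caching() should return (True, False): the second import hands back the cached module, while importlib.reload builds a new DATA object.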
In your example the data will not be shared, because you are running two different Python processes. Data is generally not shared between processes; two programs written in C, for example, don't share data either. Processes can send data to each other, but that requires copying, which is exactly what you want to avoid.
But you can import the data and functions into a "shared" python interpreter.
file1.py:
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; use the joblib package directly
q = joblib.load('large_dict') #load a large dictionary that has been saved to disk; let's say it takes 2 minutes to load
p = joblib.load('large_dict2') #load another large file; another 2 minutes load time
file2.py:
from file1 import q, p
# q and p are not loaded again here: file1 runs only on its first import in
# this interpreter, and later imports (including reloads of file2) reuse it
for experiment, chromosome in q.items():
    # stuff to work with dict object
    pass
for experiment, chromosome in p.items():
    # stuff to work with dict object
    pass
file3.py:
import importlib
import file2
# file2.py will be executed once when it is first imported;
# its attributes are accessible as file2.attribute_name
while True:
    inp = input("Type anything to reload or 'quit' to quit: ")
    if inp == "quit":
        break
    # 'import file2' would **not** execute file2 again because modules
    # are only imported once. Use importlib.reload (or reload in Python 2)
    # to "force" re-executing the module
    importlib.reload(file2)
Then you can start with python file3.py, and it will wait for your input before each reload of file2. Of course, you could make the reload trigger arbitrarily complex, e.g. reload whenever file2.py changes (watchdog might be helpful for that).
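If you'd rather not add a dependency, plain polling of the file's modification time is often enough for a development loop. The sketch below is a swapped-in technique (simple mtime polling, not watchdog's API), and the function names reload_if_changed and watch are hypothetical. It calls importlib.reload only when the module's source file has a new mtime.

```python
import importlib
import os
import time
import traceback
from types import ModuleType


def reload_if_changed(module: ModuleType, last_mtime: float) -> float:
    """Reload `module` if its source file's mtime differs from `last_mtime`.

    Returns the mtime to remember for the next check.
    """
    mtime = os.path.getmtime(module.__file__)
    if mtime != last_mtime:
        importlib.reload(module)  # re-executes the module's code
    return mtime


def watch(module: ModuleType, interval: float = 1.0) -> None:
    """Poll forever (stop with Ctrl+C), reloading `module` whenever its file changes."""
    last = os.path.getmtime(module.__file__)
    while True:
        time.sleep(interval)
        try:
            last = reload_if_changed(module, last)
        except Exception:
            # A broken edit (e.g. a SyntaxError) should not end the session:
            # print the error and wait for the next save.
            traceback.print_exc()
            last = os.path.getmtime(module.__file__)
```

In file3.py you would call watch(file2) after import file2 instead of the input() loop. One caveat: Python's bytecode cache validates against the source's mtime in whole seconds plus its size, so a save within the same second that leaves the file size unchanged could, rarely, reload stale code.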
Another way would be to use something like
file4.py:
import importlib
import file2
def reload():
    # re-execute file2.py in this interpreter; file1 stays cached
    importlib.reload(file2)
and then use python -i file4.py. You are then in a normal Python interpreter, but calling reload() will reload (i.e. re-execute) file2.
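Both file3.py and file4.py rely on the same property: reloading file2 re-runs only file2, and its from file1 import q, p line picks up the already-loaded objects from the cached file1 module. The sketch below checks this with two throwaway modules, demo_loader and demo_worker (hypothetical stand-ins for file1.py and file2.py), written to a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_reload_keeps_data():
    """Return (q_is_same_object, total) after reloading the worker module."""
    tmp = Path(tempfile.mkdtemp())
    # Stand-in for file1.py: the "expensive" load happens at import time.
    (tmp / "demo_loader.py").write_text("q = {'chr1': list(range(3))}\n")
    # Stand-in for file2.py: it only imports the already-loaded object.
    (tmp / "demo_worker.py").write_text(
        "from demo_loader import q\n"
        "total = sum(len(v) for v in q.values())\n"
    )
    sys.path.insert(0, str(tmp))
    importlib.invalidate_caches()
    try:
        import demo_worker
        before = demo_worker.q
        importlib.reload(demo_worker)  # re-runs demo_worker.py only
        return demo_worker.q is before, demo_worker.total
    finally:
        sys.path.remove(str(tmp))
        for name in ("demo_loader", "demo_worker"):
            sys.modules.pop(name, None)
```

It should return (True, 3): after the reload, demo_worker.q is still the very same dict that demo_loader created on the first import, so nothing was loaded twice.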
Note that you can do the same in a Jupyter/IPython notebook. There are even some magic commands to help with that (e.g. the autoreload extension). See the documentation for more information.
Comments
reload. So you can already do what you want to do. If you have a concrete problem (e.g. "I tried to reload module xyz but when I execute the function it still uses the old code"), add some details. But without any concrete features that you need, the answer to your question is "Python already does that".