
I'd like to be able to run a python file (file1) that simply loads several large files into memory as python objects, then, with a different python file (file2), access those same objects without having to reload the files into memory a second time. The motivation is that I want to be able to iteratively modify/develop file2 without having to waste time reloading the same large files on every iteration.

In a Jupyter notebook this is easily accomplished by running a cell that loads the files once; these objects are then accessible to all other cells in the notebook. I'd like to be able to establish this same cross talk between separate python files.

Is there a way to establish the within-notebook Jupyter style cell-to-cell sharing of python objects between separate .py files?


(Edited to include an example)

Below is an example scenario; let's say there are two files:

file1.py:

from sklearn.externals import joblib
q = joblib.load('large_dict') #load a large dictionary that has been saved to disk; let's say it takes 2 minutes to load
p = joblib.load('large_dict2') #load another large file; another 2 minutes load time

file2.py:

# notice that the q, p objects are never loaded here, but they are
# accessible (this is possible if the contents of these py files are
# in separate cells of a Jupyter notebook)
for experiment, chromosome in q.iteritems():
    pass  # stuff to work with dict object
for experiment, chromosome in p.iteritems():
    pass  # stuff to work with dict object

I want to do

python file1.py

once, and then do

python file2.py

an arbitrary number of times (i.e. iteratively modify the code in file2). Notice that in this scenario the objects created in file1.py are accessible to file2.py. My question is: Is this scenario possible?

asked Jun 24, 2016 at 9:47
  • What do you mean by "Sharing objects between files"? Do you mean "between threads"? Commented Jun 24, 2016 at 9:49
  • I think the question is pretty clear as it is Commented Jun 24, 2016 at 9:51
  • No, it isn't. I have the same question. A "file" is not something that does anything in Python. It may be the main module, it may be imported, and so on, but then its activity depends on that, not on it being a "file". Please clarify. If multiple people say your question is unclear, it is unclear. Commented Jun 24, 2016 at 9:53
  • @Ryan. Oh, okay. Despite the fact that files have no direct connection to objects and "Sharing objects between files" doesn't even make sense as a question. Commented Jun 24, 2016 at 9:54
  • It appears as though I'm lacking (at least) some knowledge of how python objects are stored in memory. Despite this, I can't see how I was anything but completely clear on what my goal is, so it would be nice to get some constructive feedback or help in clarifying exactly what I'm missing Commented Jun 24, 2016 at 9:58

1 Answer


Objects do not belong to a specific file. The class they belong to, or the function that created them, may live in a module that "physically" resides in a different file, but that doesn't matter: as long as you stay within a single Python interpreter session, objects never need to be copied.

There is one wrinkle: if you modify a module and want to load the newest version into a running Python interpreter that has already imported that module, the interpreter will "refuse" to do so. (This is actually a performance optimization, so that you can safely import a module more than once.)

You can force the interpreter to reload a module with importlib.reload in Python 3 (or the reload builtin in Python 2). See this question for more information.
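To see the caching behaviour concretely, here is a minimal, self-contained sketch: it writes a throwaway module (the name demo_mod and its contents are made up for illustration) to a temp directory, imports it, changes it on disk, and shows that only importlib.reload picks up the change:

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale .pyc files confusing the demo
tmpdir = tempfile.mkdtemp()
sys.path.insert(0, tmpdir)

# Create the throwaway module on disk and import it.
with open(os.path.join(tmpdir, "demo_mod.py"), "w") as f:
    f.write("VALUE = 1\n")
import demo_mod
print(demo_mod.VALUE)        # 1

# Change the file on disk, then import again: nothing happens,
# because the module is already cached in sys.modules.
with open(os.path.join(tmpdir, "demo_mod.py"), "w") as f:
    f.write("VALUE = 2\n")
import demo_mod              # no-op: does NOT re-execute the module
print(demo_mod.VALUE)        # still 1

importlib.reload(demo_mod)   # forces re-execution of the module's code
print(demo_mod.VALUE)        # 2
```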

In your example the data will not be shared, because you are running two different Python processes. Processes do not share data. (This is true generally: two "C processes", i.e. programs written in C, don't share data either. They can send data to each other, but that requires copying, which is exactly what you want to avoid.)
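A small sketch of that process isolation (the names data and child are illustrative, not from your code): the child process gets its own copy of the module state, so its mutation never reaches the parent.

```python
import multiprocessing as mp

data = {"loaded": True}

def child():
    # Runs in a separate process: this mutates the child's own copy
    # (or fresh re-import) of the module state, not the parent's dict.
    data["loaded"] = False

if __name__ == "__main__":
    p = mp.Process(target=child)
    p.start()
    p.join()
    # The parent's dict is untouched by the child's mutation.
    print(data["loaded"])  # True
```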

But you can import the data and functions into a "shared" python interpreter.

file1.py:

from sklearn.externals import joblib
q = joblib.load('large_dict') #load a large dictionary that has been saved to disk; let's say it takes 2 minutes to load
p = joblib.load('large_dict2') #load another large file; another 2 minutes load time

file2.py:

from file1 import q, p
# notice that the q, p objects are never loaded here, but they are
# accessible (just as if the contents of these files were in separate
# cells of a Jupyter notebook)
for experiment, chromosome in q.iteritems():
    pass  # stuff to work with dict object
for experiment, chromosome in p.iteritems():
    pass  # stuff to work with dict object

file3.py:

import importlib
import file2
# file2.py is executed once, when it is imported; its attributes are
# then accessible as file2.attribute_name
inp = ""
while inp != "quit":
    inp = input("Type anything to reload or 'quit' to quit")
    # A second 'import file2' would **not** execute file2, because
    # imports only run once. Use importlib.reload (or reload in
    # Python 2) to force re-execution of the module.
    importlib.reload(file2)

Then you start everything with python file3.py, and it waits for your input before reloading file2. Of course you could make the reload trigger arbitrarily complex, e.g. reload whenever file2.py changes on disk (the watchdog package might be helpful for that).

Another way would be to use something like

file4.py:

import importlib
import file2

def reload():
    importlib.reload(file2)

and then run python -i file4.py. You are then dropped into a normal interactive interpreter, where calling reload() will reload (i.e. re-execute) file2.

Note that you can do the same in a Jupyter/IPython notebook. There are even magic commands to help you with that. See the documentation for more information.
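For completeness, a sketch of the notebook-side mechanism: IPython ships an autoreload extension, so the reload-on-every-run behaviour can be enabled with magics like the following (exact behaviour depends on your IPython version):

```
%load_ext autoreload
%autoreload 2      # re-execute changed modules before running each cell
import file2       # subsequent edits to file2.py are picked up automatically
```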

answered Jun 24, 2016 at 9:50

7 Comments

I don't see any proposals for how I would accomplish my stated goal or a clear answer to the question
Your question doesn't really make sense, because objects have no connection to .py files. So you cannot share objects between .py files, because that doesn't mean anything. If you have two modules (files) and import the first one into the second, all of its objects will be available in the second module (without copying).
Then, if I'm not mistaken, the answer is simply, "No."
No, the behaviour you want (having objects available in different modules without needing to load them from file again) is the default behaviour. There are problems when modifying a file at runtime, but these can be circumvented by using reload. So you can already do what you want to do. If you have a concrete problem (e.g. "I tried to reload module xyz but when I execute the function it still uses the old code"), add some details. But without any concrete failures, the answer to your question is "Python already does that".
I've added an example scenario
