[PEAK] Virtual Pythons
Phillip J. Eby
pje at telecommunity.com
Wed Jul 28 00:32:31 EDT 2004
One of the mildly annoying things about trying to develop any sort of
really sophisticated extensible system in Python is that modules are
global. That is, you can only have one instance of a given module or package.
This leads to interesting issues when you want to build a large system
where different plugins may need to rely on different versions of common
modules. Consider, for example, an application server that's running
multiple applications in the same process.
In Java, this issue is resolved via the notion of classloaders. A
classloader effectively partitions the runtime namespace into virtual
interpreters, of a sort. Classes loaded by a given classloader can "see"
its peer classes, plus any that are loaded by parent classloaders. But
each classloader can use a different classpath (similar to sys.path in
concept) to define where classes are loaded from.
It seems to me that one could create a nearly identical system in
Python. The keys are the __import__ function and the sys module.
The Python import statement is implemented by calling the __import__
function, which consults sys.path, sys.modules, and, in later Python
versions, various other 'sys' module attributes.
So in effect, if you could switch out those variables on the real sys
module whenever __import__ were called, you could basically partition the
Python interpreter into separate "module spaces", roughly equivalent to the
Java classloader hierarchy.
Actually, in order to complete the illusion, the __import__ function would
need to load a fake 'sys' module into each module space, so that code that
manipulates or looks at sys.modules, sys.path, etc. won't see anything funny.
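As a rough illustration of the swap described above, here is a minimal sketch in present-day Python 3 (not the Python of this post). Everything in it is an assumption made for illustration: the names module_space, private_modules and vp_demo_mod are invented, the swap is done as an in-place overlay on sys.modules and sys.path rather than by rebinding them, it is not thread-safe, and it does not provide the fake 'sys' module mentioned above.

```python
import contextlib
import os
import sys
import tempfile


@contextlib.contextmanager
def module_space(private_modules, path):
    """Temporarily overlay a private module table and search path.

    `private_modules` (hypothetical name) plays the role of the space's
    own sys.modules; `path` is searched ahead of the normal sys.path.
    """
    saved_modules = dict(sys.modules)
    saved_path = list(sys.path)
    # Mutate in place: the interpreter keeps its own reference to the
    # original sys.modules dict, so rebinding the attribute is unreliable.
    sys.modules.update(private_modules)
    sys.path[:] = list(path) + saved_path
    try:
        yield private_modules
    finally:
        # Anything new or replaced during the block belongs to the space.
        for name, module in list(sys.modules.items()):
            if saved_modules.get(name) is not module:
                private_modules[name] = module
        sys.modules.clear()
        sys.modules.update(saved_modules)
        sys.path[:] = saved_path


# Demo: two spaces load different versions of the same made-up module.
spaces = []
for version in (1, 2):
    directory = tempfile.mkdtemp()
    with open(os.path.join(directory, "vp_demo_mod.py"), "w") as f:
        f.write("VERSION = %d\n" % version)
    private = {}
    with module_space(private, [directory]):
        import vp_demo_mod
    spaces.append(private)

versions = [space["vp_demo_mod"].VERSION for space in spaces]
```

Each space ends up holding its own copy of vp_demo_mod, and the name never stays in the real sys.modules. Imports that happen later, outside the with block (for example, inside a function called afterwards), would escape the space; closing that hole is what the per-space __import__ and fake 'sys' module are for.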
Some of the uses of such a system:
* Tests and debugging: boot up an application in a module space, run
tests on it, debug it, whatever, then throw the whole thing away and start
again... without any import pollution. That is, *proper reloading* of the
Python environment, similar to the "rollback importer" of unittestgui, but
allowing multiple instances.
* Documentation or IDE tools that would like to work on a "live" object
set, but without conflicting with modules used by the tool itself
* Application server dependency management
* Implementing Eclipse plug-ins using Python, while retaining Eclipse's
plugin boundaries. That is, allowing different plugins to use different
versions of a given Python module, if need be.
* Working around globally-configured stdlib modules like logging, urllib,
etc., that cause weirdness if two modules separately configure things
I do see a few complications:
* Each module space needs its own '__import__' function (although it'd
likely be an object that also holds that module space's 'sys' emulation)
* C extensions might get funky if they're shared, specifically if they're
loaded from the same file and they have module-level variables managed as C
static variables. So, extensions may have to be forced into the top-level
import space, and there would need to be a way to configure whether this
happens. (Of course, I may be wrong and this might be completely safe, but
I doubt such safety would be 100% portable...)
* Modules loaded in a parent space may occasionally need to import
something from a child space, which will require the ability to set a
per-thread context loader, similar to what Java allows for the same issue
* C code that does imports is probably always going to see the "root"
module space.
* I'm not sure what modules besides built-ins and 'sys' should appear in
a newly created module space
* There will probably need to be some sort of re-entrant locking to
prevent multiple threads from trying to perform an import in the same
module space at the same time, and it will need to be in addition to the
pre-existing __import__ lock.
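Two of the complications above, the per-space re-entrant lock and the per-thread context loader, might be sketched together as follows. This is Python 3 and purely illustrative: SpaceContext, importing and get_context_space are hypothetical names, and nothing here is wired into the real import machinery.

```python
import contextlib
import threading


class SpaceContext:
    """Hypothetical stand-in for a module space: a name plus its own
    re-entrant lock."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.RLock()


_current = threading.local()  # holds each thread's "context loader"


def get_context_space(default=None):
    """Return the space the calling thread is currently importing in."""
    space = getattr(_current, "space", None)
    return default if space is None else space


@contextlib.contextmanager
def importing(space):
    """Hold the space's lock and mark it as this thread's context."""
    previous = get_context_space()
    with space.lock:
        _current.space = space
        try:
            yield space
        finally:
            _current.space = previous


# Demo: nesting in one thread is fine; other threads see no context.
child = SpaceContext("child")
seen = []
with importing(child):
    with importing(child):  # re-entrant: same thread, same lock
        seen.append(get_context_space().name)
    worker = threading.Thread(target=lambda: seen.append(get_context_space()))
    worker.start()
    worker.join()
seen.append(get_context_space())
```

A module loaded in a parent space could then call get_context_space() to find the child space it should import from, much as Java code consults the thread's context class loader.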
These issues aside, it actually seems like a pretty simple
proposition. The API might look something like:
    loader = ModuleSpace(previous_loader)
    loader.sys.path = whatever
    # Execute code in the module space
    exec "import foo" in loader.ns
    # Or import something from it
    foo = loader.import_('foo')
    # or run a file as '__main__' in it
    loader.run_main('file.py')
'previous_loader' might default to 'None', meaning that the parent is the
normal Python module system.
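To make the shape of that API a little more concrete, here is one way it might be approximated in present-day Python 3. It is a sketch under several assumptions, not an implementation of the proposal: there is no parent chaining ('previous_loader' is omitted), the search path is a plain 'path' attribute instead of 'loader.sys.path', no fake 'sys' module is provided, nothing is thread-safe, and the file and module names in the demo are invented.

```python
import builtins
import importlib
import os
import runpy
import sys
import tempfile


class ModuleSpace:
    """Sketch only: overlays private modules on sys.modules during imports."""

    def __init__(self, path=()):
        self.modules = {}       # this space's private module table
        self.path = list(path)  # searched ahead of the normal sys.path
        space_builtins = dict(vars(builtins))
        space_builtins["__import__"] = self._import_hook
        # Code exec'd in `ns` resolves `import` statements through our hook.
        self.ns = {"__builtins__": space_builtins, "__name__": "__space__"}

    def _call(self, func, *args, **kwargs):
        """Run func with this space's modules and path overlaid."""
        saved_modules, saved_path = dict(sys.modules), list(sys.path)
        sys.modules.update(self.modules)
        sys.path[:] = self.path + saved_path
        try:
            return func(*args, **kwargs)
        finally:
            # Keep whatever was newly imported, then restore the real state.
            for name, module in list(sys.modules.items()):
                if saved_modules.get(name) is not module:
                    self.modules[name] = module
            sys.modules.clear()
            sys.modules.update(saved_modules)
            sys.path[:] = saved_path

    def _import_hook(self, name, globals=None, locals=None, fromlist=(), level=0):
        return self._call(builtins.__import__, name, globals, locals, fromlist, level)

    def import_(self, name):
        return self._call(importlib.import_module, name)

    def run_main(self, filename):
        return self._call(runpy.run_path, filename, run_name="__main__")


# Demo with made-up files in a temporary directory.
directory = tempfile.mkdtemp()
with open(os.path.join(directory, "vp_space_demo.py"), "w") as f:
    f.write("VALUE = 42\n")
script = os.path.join(directory, "vp_main_demo.py")
with open(script, "w") as f:
    f.write("import vp_space_demo\nRESULT = vp_space_demo.VALUE + 1\n")

loader = ModuleSpace([directory])
exec("import vp_space_demo", loader.ns)  # Python 3 spelling of the exec above
demo = loader.import_("vp_space_demo")
result = loader.run_main(script)
```

The exec'd code, import_() and run_main() all see the same private copy of vp_space_demo, while the real sys.modules never keeps it. Modules imported into the space still get the ordinary builtins, so an import they perform lazily, after the call returns, would go to the normal module system.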
So, a unit test runner could do something like:
    loader = ModuleSpace()
    test_suite = loader.import_(test_suite_name)()
in order to load the test suite in a new module space and create the test
suite. The tests can then be run entirely in that module space. After the
tests are run, the module space is thrown away, cleaning things up
completely. If this is being run interactively (e.g. in a GUI), then each
run is guaranteed to see an up-to-date version of the tests and the code
being tested.
What's more, imagine this:
    loader = ModuleSpace()
    loader.sys.modules['socket'] = dummy_socketmodule
    test_suite = loader.import_(test_suite_name)()
That is, imagine testing code with dummy versions of any global Python
facility, while simultaneously *not* disturbing the *real* version.
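For comparison, present-day Python 3 can approximate this one trick without module spaces, using unittest.mock.patch.dict on sys.modules. The sketch below is only an approximation of the idea: the substitution is process-wide while the block is active, not confined to a space, and dummy_socket and current_host are hypothetical stand-ins.

```python
import sys
import types
from unittest import mock

# A hypothetical dummy replacement for the socket module.
dummy_socket = types.ModuleType("socket")
dummy_socket.gethostname = lambda: "dummy-host"


def current_host():
    # Hypothetical code under test; it imports socket at call time.
    import socket
    return socket.gethostname()


# While the block is active, importing 'socket' yields the dummy;
# patch.dict puts sys.modules back the way it was on exit.
with mock.patch.dict(sys.modules, {"socket": dummy_socket}):
    inside = current_host()

import socket as real_socket  # the real module is undisturbed afterwards
```

Code that bound the real socket module at its own import time would not see the dummy, which is part of what a true module space would address.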
Ah well, this is mostly to document the idea for future reference, should I
decide I really actually need this for something. In practice, I don't see
needing it any time soon unless I start actively developing a mechanism to
use Python to create Eclipse plugins, or if I start developing some sort of
application server or IDE that needs this sort of partitioning.
Hm. I wonder if the Zope or Chandler folks have thought of this? ISTM
that both environments would want to allow independently-developed products
or plugins to safely depend on different versions of the same dependency.
Guess I should cc: this at least to Zope3-dev and see if anybody's
interested in fleshing this out with me. It might make for an amusing
afternoon diversion if nothing else. :)