[PEAK] Virtual Pythons

Wed Jul 28 00:32:31 EDT 2004

One of the mildly annoying things about trying to develop any sort of 
really sophisticated extensible system in Python is that modules are 
global.  That is, you can only have one instance of a given module or package.

This leads to interesting problems when you want to build a large system 
in which different plugins may rely on different versions of common 
modules: for example, an application server that runs multiple 
applications in the same process.

In Java, this issue is resolved via classloaders.  A classloader 
effectively partitions the runtime namespace into virtual interpreters, of 
a sort.  Classes loaded by a given classloader can "see" their peers (other 
classes loaded by the same classloader), plus any classes loaded by its 
parent classloaders.  But each classloader can use a different classpath 
(similar in concept to sys.path) to define where its classes are loaded from.

It seems to me that one could create an almost identical system to this in 
Python.  The keys are the __import__ function and the sys module.

The Python import statement is implemented by calling the __import__ 
function, which consults sys.path, sys.modules, and, as of Python 2.3 
(PEP 302), the sys.meta_path, sys.path_hooks, and sys.path_importer_cache 
attributes.

So in effect, if you could switch out those variables on the real sys 
module whenever __import__ were called, you could basically partition the 
Python interpreter into separate "module spaces", roughly equivalent to the 
Java classloader hierarchy.

Actually, in order to complete the illusion, the __import__ function would 
need to load a fake 'sys' module into each module space, so that code that 
manipulates or looks at sys.modules, sys.path, etc. won't see anything funny.
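A minimal sketch of the fake 'sys' trick, assuming a modern CPython 3 interpreter, which looks up __import__ in the executing code's '__builtins__' mapping.  Code executed with a private builtins dict therefore gets a private __import__, which here hands back a stand-in 'sys' module.  The names 'fake_sys', 'space_import', and the path string are hypothetical.

```python
import builtins
import types

_real_import = builtins.__import__

# A stand-in 'sys' whose path and modules belong to this space only.
fake_sys = types.ModuleType("sys")
fake_sys.path = ["/hypothetical/space/lib"]
fake_sys.modules = {"sys": fake_sys}


def space_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Serve modules registered in this space; defer everything else."""
    if level == 0 and name in fake_sys.modules:
        return fake_sys.modules[name]
    return _real_import(name, globals, locals, fromlist, level)


# A private copy of the builtins that differs only in __import__.
space_builtins = dict(vars(builtins), __import__=space_import)
ns = {"__builtins__": space_builtins, "__name__": "__space__"}

# Code run in 'ns' sees the stand-in sys; the real sys is untouched.
exec("import sys\nseen_path = list(sys.path)", ns)
```

A fuller version would also copy the real module's other attributes (argv, stdout, and so on) onto the stand-in; this sketch sets only 'path' and 'modules'.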

Some of the uses of such a system:

  * Tests and debugging: boot up an application in a module space, run 
tests on it, debug it, whatever, then throw the whole thing away and start 
again...  without any import pollution.  That is, *proper reloading* of the 
Python environment, similar to the "rollback importer" of unittestgui, but 
allowing multiple instances.

  * Documentation or IDE tools that would like to work on a "live" object 
set, but without conflicting with modules used by the tool itself.

  * Application server dependency management.

  * Implementing Eclipse plug-ins using Python, while retaining Eclipse's 
plugin boundaries.  That is, allowing different plugins to use different 
versions of a given Python module, if need be.

  * Working around globally-configured stdlib modules like logging, urllib, 
etc., which misbehave when two independent modules each configure them 
differently.
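For comparison, the single-instance "rollback" approach can be sketched in a few lines of modern Python: snapshot the keys of sys.modules, then forget every module added since.  This is not the module-space idea (there is still only one global namespace, so only one "instance" at a time), and 'rollback_imports' is a hypothetical helper, not unittestgui's actual code.

```python
import sys
from contextlib import contextmanager


@contextmanager
def rollback_imports():
    """Forget every module first imported inside the with-block.

    Only sys.modules entries are removed; attributes that a new submodule
    set on an already-imported parent package are left in place.
    """
    before = set(sys.modules)
    try:
        yield
    finally:
        for name in list(sys.modules):
            if name not in before:
                del sys.modules[name]
```

The next import of a rolled-back module re-executes its source, which gives the "fresh start" effect, but nothing can run two such fresh copies side by side.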

I do see a few complications:

  * Each module space needs its own '__import__' function (although it'd 
likely be an object that also holds that module space's 'sys' emulation).

  * C extensions might get funky if they're shared, specifically if they're 
loaded from the same file and they have module-level variables managed as C 
static variables.  So, extensions may have to be forced into the top-level 
import space, and there would need to be a way to configure whether this 
happens.  (Of course, I may be wrong and this might be completely safe, but 
I doubt such safety would be 100% portable...)

  * Modules loaded in a parent space may occasionally need to import 
something from a child space, which will require the ability to set a 
per-thread "context" loader, just as Java provides 
Thread.setContextClassLoader() for the same problem.

  * C code that does imports is probably always going to see the "root" 
module space.

  * I'm not sure what modules besides built-ins and 'sys' should appear in 
a newly created module space.

  * There will probably need to be some sort of re-entrant locking to 
prevent multiple threads from trying to perform an import in the same 
module space at the same time, and it will need to be in addition to the 
pre-existing __import__ lock.
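Two of these complications, the per-thread context loader and the re-entrant per-space lock, can be sketched with the standard 'threading' module.  This is a minimal, hypothetical sketch: 'ModuleSpace' here is a stub holding only a lock and a module table, and 'loader' is an assumed callable that builds a module and may itself import further modules from the same space.  Unlike real Python, the stub caches a module only after its loader returns, so circular imports are not handled.

```python
import threading
from contextlib import contextmanager

# Per-thread "context" module space, analogous to Java's context classloader.
_context = threading.local()


class ModuleSpace:
    """Hypothetical stub: only the locking and caching parts of a space."""

    def __init__(self, name):
        self.name = name
        # Re-entrant, so an import running in this space can trigger further
        # imports in the same space on the same thread without deadlocking.
        self.lock = threading.RLock()
        self.modules = {}

    def import_(self, name, loader):
        with self.lock:
            if name not in self.modules:
                self.modules[name] = loader(self)  # may call import_ again
            return self.modules[name]


def current_space(default=None):
    """Return this thread's context space, or 'default' if none is set."""
    return getattr(_context, "space", default)


@contextmanager
def context_space(space):
    """Make 'space' the current thread's context space for a with-block."""
    previous = current_space()
    _context.space = space
    try:
        yield space
    finally:
        _context.space = previous
```

Parent-space code that needs something from a child space would call current_space() and import through whatever it returns, rather than through the global import machinery.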

These issues aside, it actually seems like a pretty simple 
proposition.  The API might look something like:

   loader = ModuleSpace(previous_loader)
   loader.sys.path = whatever

   # Execute code in the module space
   exec("import foo", loader.ns)

   # Or import something from it
   foo = loader.import_('foo')

   # or run a file as '__main__' in it
   loader.run_main('file.py')

'previous_loader' might default to 'None', meaning that the parent is the 
normal Python module system.
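A minimal sketch of that API in modern Python 3 follows.  It relies on the CPython behavior that the import statement looks up __import__ in the executing code's '__builtins__', so giving every module in a space a private builtins dict is enough to route its imports.  Attribute names mirror the API above ('sys', 'ns', 'import_', 'run_main'); 'parent' stands in for 'previous_loader'.  Everything else ('_find_source', '_exec_file', searching only for top-level .py files) is a simplifying assumption, not a worked-out design.

```python
import builtins
import os
import sys
import types


class ModuleSpace:
    """Minimal sketch of a module space (hypothetical API).

    Each space owns a stand-in 'sys' (a copy of the real one's attributes,
    with its own path and modules) and a private builtins dict whose
    __import__ consults them. Only top-level pure-Python modules found on
    the space's own path are loaded privately; everything else is delegated
    to the parent space, or to normal Python when parent is None.
    """

    def __init__(self, parent=None):
        self.parent = parent
        self.sys = types.ModuleType("sys")
        self.sys.__dict__.update(vars(sys))  # argv, stdout, version, ...
        self.sys.path = []
        self.sys.modules = {"sys": self.sys}
        self.builtins = dict(vars(builtins), __import__=self._import)
        # Namespace for exec()-ing arbitrary code inside the space.
        self.ns = {"__builtins__": self.builtins, "__name__": "__space__"}

    def _find_source(self, name):
        for entry in self.sys.path:
            candidate = os.path.join(entry, name + ".py")
            if os.path.isfile(candidate):
                return candidate
        return None

    def _exec_file(self, module, filename):
        module.__file__ = filename
        module.__builtins__ = self.builtins  # imports inside use this space
        with open(filename) as f:
            code = compile(f.read(), filename, "exec")
        exec(code, module.__dict__)

    def _import(self, name, globals=None, locals=None, fromlist=(), level=0):
        if level != 0:
            raise ImportError("relative imports are not handled by this sketch")
        if name in self.sys.modules:
            return self.sys.modules[name]
        source = None if "." in name else self._find_source(name)
        if source is None:
            if self.parent is not None:
                return self.parent._import(name, globals, locals, fromlist, level)
            return builtins.__import__(name, globals, locals, fromlist, level)
        module = types.ModuleType(name)
        self.sys.modules[name] = module  # register first, as Python does
        try:
            self._exec_file(module, source)
        except BaseException:
            del self.sys.modules[name]
            raise
        return module

    def import_(self, name):
        return self._import(name)

    def run_main(self, filename):
        main = types.ModuleType("__main__")
        self._exec_file(main, filename)
        return main
```

Usage follows the API above: set loader.sys.path, then exec("import foo", loader.ns), loader.import_('foo'), or loader.run_main('file.py').  Two spaces with the same path get independent copies of the same module.  Packages, C extensions, locking, and the other complications listed earlier are deliberately left out.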

So, a unit test runner could do something like:

    loader = ModuleSpace()
    test_suite = loader.import_(test_suite_name)()

in order to load the test suite into a new module space and create it 
there.  The tests can then be run entirely in that module space.  After the 
tests are run, the module space is thrown away, cleaning things up 
completely.  If this is being run interactively (e.g. in a GUI), then each 
run is guaranteed to see an up-to-date version of the tests and the code 
being tested.

What's more, imagine this:

    loader = ModuleSpace()
    loader.sys.modules['socket'] = dummy_socketmodule
    test_suite = loader.import_(test_suite_name)()

That is, imagine testing code with dummy versions of any global Python 
facility, while simultaneously *not* disturbing the *real* version.
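Without module spaces, the nearest thing in today's standard library is unittest.mock.patch.dict, which temporarily swaps entries in the one global sys.modules.  The sketch below uses a hypothetical 'dummy_socket' that fakes only gethostname().  Unlike the module-space idea, the swap is visible to every thread in the process while it lasts, which is precisely the disturbance of the real version that a module space would avoid.

```python
import sys
import types
from unittest import mock

# A hypothetical stand-in that fakes just one function of the socket module.
dummy_socket = types.ModuleType("socket")
dummy_socket.gethostname = lambda: "test-host"

with mock.patch.dict(sys.modules, {"socket": dummy_socket}):
    import socket  # finds the dummy, because sys.modules is consulted first
    hostname_inside = socket.gethostname()

# Once the block exits, patch.dict restores the original sys.modules entries.
import socket as real_socket
```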

Ah well, this is mostly to document the idea for future reference, should I 
decide I actually need it for something.  In practice, I don't expect to 
need it any time soon unless I start actively developing a mechanism for 
using Python to create Eclipse plugins, or start developing some sort of 
application server or IDE that needs this sort of partitioning.

Hm.  I wonder if the Zope or Chandler folks have thought of this?  ISTM 
that both environments would want to allow independently-developed products 
or plugins to safely depend on different versions of the same dependency.

Guess I should cc: this at least to Zope3-dev and see if anybody's 
interested in fleshing this out with me.  It might make for an amusing 
afternoon diversion if nothing else.  :)