[PEAK] A ridiculously simple mapping layer
Phillip J. Eby
pje at telecommunity.com
Mon Jul 26 15:21:13 EDT 2004
"The obscure, we can do right now. The obvious takes a little longer."
It seems I've been overlooking an obvious, dirt-simple, slap-your-forehead,
why-didn't-I-think-of-that way to specify an object mapping layer, the
bridge from abstract model to concrete implementation, because I was too
busy trying to figure out how to create registries of classes to tables or
some such. And all along, it could just be something as simple as this:
# === mymodel.py =============

class Invoice(model.Element):
    # etc...

class Customer(model.Element):
    # blah blah


# === mystorage.py =============

class PhysicalSchema1(storage.Schema):
    Invoice = mapTable('invoices', someField=someConverter, ...)
    Customer = mapTable('customers', otherField=thus_and_such, ...)
    # etc.

class PhysicalSchema2(PhysicalSchema1):
    # Customer is mapped differently in this version of the schema,
    # but everything else is the same
    Customer = mapTable('customers', otherField=sturm_und_drang, ...)

# === myapp.ini =============

[Storage Mappings]
mymodel = mystorage.PhysicalSchema2   # or 1, or whatever...


# === make_invoice.py =============

from peak.api import *
import mymodel

root = config.makeRoot(iniFiles=[("peak","peak.ini"), "myapp.ini"])

# create a workspace for the model
ws = storage.newWorkspace(root, mymodel)

storage.beginTransaction(ws)

# look up a customer
cust = ws.Customer.get(custno='1234')

# create a new invoice
inv = ws.Invoice(customer=cust)

# etc...

storage.commitTransaction(ws)
Duh. D'oh! So simple, yet 100% modular and reusable/extensible. We
define our domain model as a module, or set of modules. We define a
physical schema as a class. Descriptors on the schema map from the "pure"
domain class to an implementation-specific subclass. That is, in the
example above, 'ws.Customer' and 'ws.Invoice' do not return the original
classes from 'mymodel', but rather *subclasses* that know they are part
of that workspace. The subclass (sketched below) is just the original class plus:
1) a mixin (e.g. 'RelationalMixin', 'LDAPMixin', etc.) that defines the
mapping process
2) some metadata to drive the mapping (table name, field names, converters)
3) classmethods such as 'find', 'get', and 'delete'
4) overrides of domain-specific instance or class methods where
necessary for performance or to handle unusual mapping requirements
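To make that concrete, here's a rough sketch in plain Python of how a
workspace might assemble such a subclass. None of these names
('RelationalMixin', 'make_mapped_class', the double-underscore attributes)
are real PEAK APIs; they're purely illustrative:

class RelationalMixin:
    """Hypothetical mixin supplying the relational mapping machinery."""

    @classmethod
    def get(cls, **criteria):
        # A real mapper would issue a SELECT against cls.__table__,
        # applying cls.__converters__ to the fields it reads back.
        raise NotImplementedError

    @classmethod
    def find(cls, **criteria):
        raise NotImplementedError

    @classmethod
    def delete(cls, **criteria):
        raise NotImplementedError

def make_mapped_class(domain_class, table, converters, workspace):
    # (1) the mixin plus the original domain class, (2) metadata to drive
    # the mapping, (3) classmethods supplied by the mixin; (4) per-class
    # overrides could be added to the dict below where needed.
    return type(domain_class.__name__, (RelationalMixin, domain_class), {
        '__table__': table,
        '__converters__': converters,
        '__workspace__': workspace,   # the parent component (see below)
    })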
Further, the new subclass has the workspace object as its parent
component. So, the subclass *itself* is a component, and thus has access
to the workspace, which lets it refer to other implementation-specific
classes. For instance, ws.Invoice can refer to ws.Customer, as long as it
uses a classAttr binding, e.g.
'binding.classAttr(binding.Obtain("Customer"))' to get at it.
Equally important, the implementation-specific classes can bind to the
physical database component(s) they need, via the workspace component's
context.
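Here's an equally hypothetical sketch of that arrangement in plain Python.
A real PEAK workspace would of course be a proper component, and the
lookups would be spelled with 'binding.classAttr(binding.Obtain(...))' as
above rather than direct attribute access; 'Workspace' here is just an
illustration:

class Workspace:
    """Hypothetical container acting as parent component for mapped classes."""

    def __init__(self, db_connection):
        self.db = db_connection   # the physical database component
        self._classes = {}

    def add(self, name, mapped_class):
        mapped_class.__workspace__ = self
        self._classes[name] = mapped_class

    def __getattr__(self, name):
        try:
            return self._classes[name]
        except KeyError:
            raise AttributeError(name)

# From inside a mapped Invoice, the sibling class and the database are
# then both reachable through the parent component:
#     cls.__workspace__.Customer
#     cls.__workspace__.db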
We're not limited to a single module and its enclosed classes, either. If
we allow schemas to contain schemas, then we can map entire packages within
our workspace, using e.g. 'ws.Billing.Invoice' and 'ws.CRM.Customer' to
access them. If this is too awkward, then in principle we can just make
another module that imports the bits we need from the different packages,
possibly renaming them in the process. Due to the way peak.model already
works, classes know their "real" names and modules, so the mapping system
shouldn't be confused by such "repackaging" for programmer convenience.
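In the same speculative notation as mystorage.py above (remember,
'storage.Schema' and 'mapTable' are names coined in this post, not existing
PEAK APIs), nested schemas might look something like:

# === mysite_storage.py (hypothetical) =============

class BillingSchema(storage.Schema):
    Invoice = mapTable('invoices', someField=someConverter, ...)

class CRMSchema(storage.Schema):
    Customer = mapTable('customers', otherField=thus_and_such, ...)

class SiteSchema(storage.Schema):
    Billing = BillingSchema    # exposed as ws.Billing.Invoice
    CRM = CRMSchema            # exposed as ws.CRM.Customer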
It seems it would also be possible within this scheme to "upgrade" a model
to include extensions, without changing the base model. For example, the
mechanism that creates the schema-specific subclass can check a property
namespace to see if the class should be replaced by another, thus allowing
an application to be locally extended. (Thus replacing the "class
replacement" AOP use case.)
Also note that this entire approach works just fine with *multiple
instances* of a given schema. That is, you could have more than one
workspace open with the same schema, but using different databases. Or
different schemas, for that matter, which might be quite useful for
migration. We'll probably also have a few simple schema types that don't
require any metadata to do the mapping. For example, an "in-memory"
mapper, or a "document" mapper that reads/writes pickles or XML. For these
you'd just specify 'mymodel = someGenericMapperClass' in the configuration
and not need to create an explicit mapping definition.
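For instance, using the speculative API from make_invoice.py above, a
migration script might open two workspaces over the same domain model; the
two roots and the 'stateOf' helper are hand-waved here:

old_ws = storage.newWorkspace(old_root, mymodel)   # e.g. PhysicalSchema1, old DB
new_ws = storage.newWorkspace(new_root, mymodel)   # e.g. PhysicalSchema2, new DB

storage.beginTransaction(old_ws)
storage.beginTransaction(new_ws)

for cust in old_ws.Customer.find():
    new_ws.Customer(**stateOf(cust))   # copy each customer's state across

storage.commitTransaction(old_ws)
storage.commitTransaction(new_ws)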
While not all of these mechanisms are interchangeable at the top level of
the code (some require specifying which file to load, when to save it, and
so on, while others need none of that), any code that receives an already-opened
workspace object is not affected by these mapping issues. Only the code
that manages the workspace object's lifecycle has to know anything about
the overall persistence model. (Assuming that we have a query language
that can be applied to the classes exposed by a workspace, and it operates
on storage interfaces supplied by the implementation-specific subclasses.)
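For example, a function like the following (same speculative API as
before) neither knows nor cares whether 'ws' came from a relational
schema, an in-memory mapper, or a pickle/XML document mapper:

def make_invoice_for(ws, custno):
    # All storage details are hidden behind the workspace and its classes.
    cust = ws.Customer.get(custno=custno)
    return ws.Invoice(customer=cust)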
There are many complications lurking in this approach, of course, since for
example I don't know what those "storage interfaces" will consist of, and
won't be entirely sure until I'm trying to develop the query language. I'm
also worried that the complexity of the namespace mapping will start to
blow up once the scope includes module-level functions, constants, and
variables, plus the features and methods of the actual model classes.
On the bright side, though, these are mostly technical difficulties. The
part that's really been bothering me the most in trying to design the "next
generation" model and storage systems has been getting a clean API -- a
simple way to "spell" what we want to do. And now that we have it, my joy
at having a solution is rivalled only by my chagrin at not having thought
of it much, much sooner. Maybe it's just taken me this long to regrow
enough of the brain cells that were destroyed by an overexposure to
corporate groupthink. :)