[PEAK] Persistence styles, MDA, AOP, PyProtocols, and PEAK
Phillip J. Eby
pje at telecommunity.com
Fri Jul 9 11:08:56 EDT 2004
At 04:50 PM 7/8/04 -0700, John Landahl wrote:
> > The idea of the domain model is that you put an application's core
> > behaviors into objects that reflect the application domain.
> > Unfortunately, this idea scales rather poorly when dealing with large
> > numbers of objects. Object-oriented languages are oriented towards
> > dealing with individual objects more so than collections of them.
> > Loops over large collections are inefficient when compared to bulk
> > operations on an RDBMS, for example. This means that building practical
> > applications requires these bulk operations to be factored out, somehow.
>
>Indeed, you've really hit the nail on the head here. The domain model
>emphasis on converting data to objects before performing any operations on
>them is *terrible* for working on large data sets, and completely negates the
>performance advantages of RDBMSs. Without any standard way to get around
>this limitation, individual developers will come up with ad hoc solutions of
>varying efficiency and effectiveness. Usually this means going directly to
>SQL, but then the solution can only work with a SQL-based storage medium.
>I'll be quite interested to see how your "fact base" ideas play out.
The really sad thing is to realize that this "hard problem" I've been
struggling with is in fact easy once you move it into generic function
terms. I mean, as soon as you think of it as, "here, under these
circumstances do it like this, and under these circumstances do it like
that", it's like... where's the problem? There is no problem.
Indeed, I almost don't care about the labor-saving features of the O-R
mapping anymore, or having a super-duper "any backend" framework, because
for any given application, who cares? Chances are good you'll have only
one mapping plus whatever you use for testing. The main point is that
anything that's specific to the mapping should be properly
compartmentalized or parameterized, and with generic functions you can
easily do that.
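Just to make that concrete, here's a toy, hand-rolled sketch of the "under
these circumstances, do it like this" shape. None of these names are the
actual dispatch API; it's only meant to show where the mapping-specific code
ends up:

_save_rules = []

def when_saving(predicate, implementation):
    """Register an implementation for the circumstances 'predicate' matches."""
    _save_rules.append((predicate, implementation))

def save(ob, backend):
    """The only entry point the rest of the application ever calls."""
    for predicate, implementation in _save_rules:
        if predicate(ob, backend):
            return implementation(ob, backend)
    raise NotImplementedError("no rule for saving %r to %r" % (ob, backend))

# one mapping for production, one for testing; each one compartmentalized
when_saving(lambda ob, backend: backend == "sql",
            lambda ob, backend: "INSERT statement for %r" % ob)
when_saving(lambda ob, backend: backend == "memory",
            lambda ob, backend: "stored %r in a test dict" % ob)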
> > Going Beyond AOP (with 40-Year Old Technology!)
>
>Lisp is >40 years old, but isn't CLOS itself only about 20 years old?
Okay, so I exaggerated for dramatic effect. :)
> > Doesn't this look a *lot* easier to you than writing DM classes? It sure
> > does to me.
>
>Yes, *much* nicer. I'm wondering where the metadata mapping hooks would go,
>though. Perhaps more generic functions that would be called here?
More or less. My rough idea is that one mapping layer is from object-ness
to fact-ness (and vice versa). That layer is pretty boilerplate, and in
fact could probably work almost entirely off of existing peak.model
metadata right now. The second mapping layer is from the logical fact
schema to a physical one, and basically consists of relational algebra
operations. Finally, there's the relational algebra to SQL conversion.
If you look at what's in peak.query today, you'll see a relational algebra
implementation, where you can define a "relvar" as sort of a view over
other relvars (e.g. tables), and you can then translate that into physical
SQL. So, the basics of what's needed are there in prototype form, minus
what I considered the "hard bits". The hard bits primarily consisted of
issues like mapping functions to a specific SQL dialect, dealing with
cross-database joins, and various other "easy with multiple dispatch, hard
to do without it" problems.
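To illustrate the dialect issue, here's a hand-rolled lookup standing in for
real multiple dispatch (none of these names are meant to be peak.query's):
each dialect just contributes its own rendering case for a given function.

_renderers = {}   # (dialect class, logical function name) -> renderer

def renders(dialect_class, funcname, renderer):
    _renderers[(dialect_class, funcname)] = renderer

def render_call(dialect, funcname, args):
    """Pick the most specific renderer along the dialect's class hierarchy."""
    for klass in type(dialect).__mro__:
        if (klass, funcname) in _renderers:
            return _renderers[(klass, funcname)](args)
    raise NotImplementedError("%s has no rendering for %s" % (dialect, funcname))

class ANSI(object): pass
class Oracle(ANSI): pass

renders(ANSI, "concat", lambda args: " || ".join(args))
renders(Oracle, "concat", lambda args: "CONCAT(%s)" % ", ".join(args))

assert render_call(Oracle(), "concat", ["a", "b"]) == "CONCAT(a, b)"
assert render_call(ANSI(), "concat", ["a", "b"]) == "a || b"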
So at some point I'll need to go in and rip up what I prototyped there, and
replace it with a more scalable arrangement based on generic
functions. (The prototype has *no* support for different SQL dialects yet,
although it was starting to grow a framework to support it.)
Anyway, the "algebra to SQL" layer will be pretty much "write once" per SQL
dialect, and of course if you have custom functions to map, then you'll
write little implementations for those. And, the objects-to-facts layer
will likely be metadata driven, and thus another "write once"
situation. That leaves the logical-to-physical schema mapping, which will
essentially consist of view definitions.
IOW, the net result will be that everything should be largely metadata
driven, without writing any "code" per se, even though some of your
definitions might be in Python, or strings of Python-like
specification. However, you'll be able to *override* anything to deal with
special cases, using generic functions. (Including adding special "after
doing that, do this" sorts of rules, most likely.)
> > The parser will reduce
> > normal Python expressions to fast-executing objects, so there's no need to
> > eval() expression strings at runtime.
>
>At what point does this reduction occur? At import time?
When the function that 'when()' decorates is defined. So the sequence
will be, "when()" gets called, function is defined, and finally the string
gets reduced to a list of "Signature" objects that define the various
circumstances under which that function will be used. Then, the next
statement following the function definition will execute.
So, if a function is defined at the top level of a module, or a class
that's in the top-level, then yes, this will happen at import time. If the
function is defined inside another function (e.g. as a closure), then it
will happen when the enclosing function is called.
In case you're wondering why you'd want to use closures to create methods
for generic functions, it's because that's an easy way to create lots of
similar methods, but with certain parts already "partially evaluated" for
speed, or to create an API that doesn't look like generic functions. For
example, I might do something like this:
def classification(condition, name):
    from stereotypes import classify

    [when(condition)]
    def classify(age):
        return name

classification("age>=30", "thirtysomething")
classification("age<20", "teenager")
...
As a shortcut way of doing something rule-like without having to repeat all
the 'when's and 'def classify's and 'return's.
Anyway, in this example, each expression gets parsed just before the
'classification' routine returns.
>[snip praise of the algorithms used by the prototype]
The specific features you praised are due to the Chambers and Chen paper,
so I can't take credit for them.
>This is very impressive. Perhaps this belongs in Python core, but then there
>would no doubt be a battle over how multiple dispatch just isn't
>"Pythonic"...
I'm not sure. I've heard PyPy uses generic functions of some sort, but I
don't know what kind of rules they use. But in any case, I think I'll let
somebody else fight that battle. :)
>I've been looking into Hibernate lately, as well, and also Modeling. Will
>the
>PEAK "editing context" concept be anything like that in Modeling? Will there
>be support for nested editing contexts as in Modeling?
That's a good question. But it's hard to say for sure. Nested editing
contexts are relatively easy in "persistent root" and "document" models,
but are hard when you're doing a "fact base" over an RDBMS, unless you can
use nested transactions (or savepoints) in the DB. But, I suppose if the
needed mechanisms are generic functions, it might well be possible to
parameterize that so it can work.
What are your use cases for nested contexts?
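For what it's worth, a rough sketch of the savepoint route might look like
this, assuming a DB-API cursor on a backend that understands SAVEPOINT
('NestedContext' is just a placeholder name, not a planned peak.storage
class):

class NestedContext(object):
    _seq = 0

    def __init__(self, cursor):
        self.cursor = cursor
        NestedContext._seq += 1
        self.name = "nested_%d" % NestedContext._seq

    def begin(self):
        self.cursor.execute("SAVEPOINT " + self.name)

    def commit(self):
        # fold this context's changes into the enclosing context
        self.cursor.execute("RELEASE SAVEPOINT " + self.name)

    def abort(self):
        # discard only this context's changes
        self.cursor.execute("ROLLBACK TO SAVEPOINT " + self.name)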
>Will the
>"storage.beginTransaction(X)" approach be a thing of the past?
I don't know. Currently, the component hierarchy is an easy way to pass
the transaction around. But I suppose if there's an editing context, it
might naturally carry a transaction; after all, you'll need to have it there.
I do not, however, expect any significant changes to how the transaction
service itself works.
> > * Making a separate top-level package for generics (e.g. 'from generics
> > import when') instead of a subpackage of 'protocols'
>
>This seems like a good approach, since multiple dispatch functionality is
>generally useful on its own, regardless of whether other PyProtocols features
>are being used. You might also get a lot more adoption (and
>testers/feedback) if it was available separately to the Python community.
Well, I'll take that as one vote for the approach. PyProtocols will still
be required, though, as I'm using adaptation in the dispatch system's
current API, and expect to have uses for it in the expression builder.
>peak.model: You didn't get into specifics here, but mention the
>possibility of
>destabilizing changes. Since changes to the model layer of an application
>can have far-reaching effects throughout the application's code, I'd like to
>hear a lot more about what you have in mind here. Do you have specific
>changes in mind? How safe is it to continue using peak.model as it is today?
Primarily, I'm thinking about changes to the persistence mechanism,
internal attribute representation machinery, event/validation machinery,
and maybe the "method exporter" subsystem of peak.model. I'm not really
looking at changing the surface APIs, which I'd like to disturb as little
as possible.
So, if you're just defining metadata or methods you might not even notice
that anything's changing. But if you're doing certain kinds of validation
hooks or digging deep into the feature framework, there might be some changes.
Here are the specifics I see:
* The use of ZODB4 persistence will go away, probably replaced by making
objects' dictionaries persistent instead, or by using generic function
hooks (e.g. "before" methods) to load an object. Some of the semi-internal
interfaces that exist now (like binding.IBindableAttrs and
model.IFeatureSPI) may have to change a bit because of this. Also,
anything that begins with '_p_' (e.g. _p_oid, _p_jar) *will* be going away.
* Class-to-class linkages may happen internally as a generic function that
keys off the object's owning context. (This is to allow "class
replacement", one of the two remaining AOP use cases.) Admittedly, this
change may be tricky to do without changing some current code that uses
features directly from their class, without reference to the instance. It
may simply happen that in order to be "future compatible", one will use a
set of peak.model generic functions to access metadata, rather than using
the metadata off of feature objects directly. However, existing
applications would still work as long as they didn't use the new features.
* Call generic functions for get/set/del/add/remove events. This will
allow the definition of "before", "after", and "around" methods to do
validation, lazy loading, triggers, GUI view updates, etc. It may also be
that associations/bidirectional links get reimplemented to use generic
functions. But, I don't expect any of this to change the API for code that
*uses* model objects, and maybe not even the code that defines them.
So, while I can't promise I haven't left anything out, or that something
else might not come up while I'm working on it, I do want to try to limit
the scope of things. In particular, note that PEAK itself has quite a lot
of modules that use the existing model API, so I'm not going to change
anything gratuitously.
>peak.storage: Because of the separation of model and storage layers, I'm less
>concerned about the impact of peak.storage changes. The gains in flexibility
>will more than make up for any short-term inconvenience. How far along are
>your thoughts on the new peak.storage API?
Not very far, other than the things I've been saying for some time now:
1) There shall be an editing-context-like thing (I'm thinking of
calling it a "workspace")
2) It shall support execution of various kinds of queries, both simple
find-by-key operations, and queries of (almost) arbitrary complexity
3) It shall support metadata-driven mapping to tabular data sources
such as SQL
4) There will be no DM's. Not as a visible part of the API as they are
today, and not as part of the back end.
and then there are the main things that I've just added:
5) The services provided by the editing context-like thing will be
implemented as generic functions
6) To the extent that it's both meaningful and Pythonic to do so, the
API will resemble that of Hibernate, except that the query language will
likely be Python rather than anything resembling SQL. However, our API
will likely be smaller because e.g. no parameter binding API is needed.
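To make point 6 a bit more concrete, here's a purely speculative sketch of how
that surface might feel. Every name in it is a placeholder rather than a
committed API, and the in-memory backing is only there to make the example
self-contained:

class Workspace(object):
    """Toy stand-in for the editing-context-like thing."""

    def __init__(self):
        self._objects = {}              # (class, key) -> instance

    def add(self, klass, key, ob):
        self._objects[(klass, key)] = ob

    def get(self, klass, key):
        return self._objects[(klass, key)]      # simple find-by-key

    def find(self, klass, expr):
        # query spelled as a Python expression; the real thing would parse
        # this up front into dispatch criteria rather than eval() it
        return [ob for (k, oid), ob in self._objects.items()
                if k is klass and eval(expr, {}, {"ob": ob})]

class Customer(object):
    def __init__(self, name, age):
        self.name, self.age = name, age

ws = Workspace()
ws.add(Customer, 1, Customer("Alice", 34))
ws.add(Customer, 2, Customer("Bob", 15))

assert ws.get(Customer, 1).name == "Alice"
assert [c.name for c in ws.find(Customer, "ob.age >= 18")] == ["Alice"]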
>A customer is quite interested in using PEAK for general application
>development, with a specific interest in flexible object persistence. On the
>RDBMS side, they would like some of the higher-level ORM features of Modeling
>and Hibernate. The ideas you have in mind for queries look quite good. Do
>you have any specifics in mind yet for how the physical-to-logical schema
>mapping features will work?
I think I've laid out some of this already, earlier in this post. But to
try to answer what I think your question might be, let's say that you
wanted to use a Hibernate XML file to define the mapping. Well, that would
basically just entail writing something to read the mapping and define
generic function cases for the particular operations needed for that sort
of mapping. IOW, you'd do something like:
def define_hibernate_feature_X(klass, attrName, ...):
    from peak.storage.api import save_thingy, do_something, load_whatsis

    [when("isinstance(ob,klass) and ...")]
    def load_whatsis(ob, ...):
        # code to load a whatsis for the given class/feature

    [when("isinstance(ob,klass) and ...")]
    def save_thingy(ob, ...):
        # code to save a thingy for the given class/feature
for each kind of Hibernate mapping (collection, reference, inheritance,
etc.). Then, you'd call the outer function for each occurrence of that kind
of tag in the Hibernate XML file. Does that make sense?
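The driver loop for that would be something like the following sketch, where
the element and attribute names follow Hibernate's hbm.xml conventions and
'define_property' stands in for functions like define_hibernate_feature_X
above:

from xml.dom.minidom import parse

def load_mapping(filename, define_property):
    doc = parse(filename)
    for cls in doc.getElementsByTagName("class"):
        classname = cls.getAttribute("name")
        for prop in cls.getElementsByTagName("property"):
            # one set of generic-function cases per mapped attribute
            define_property(classname,
                            prop.getAttribute("name"),
                            prop.getAttribute("column"))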
It's possible we'll also make use of "method combining". That "Practical
Lisp" chapter discusses it briefly, but I haven't addressed it in my posts
much. The idea is that for some generic functions, you don't want to just
pick the "most specific" method, but rather you want some combination of
the return values from all the cases that apply. For example, the total of
the return values, a list of the return values, a boolean "and" of the
return values, etc.
In the case of an SQL mapping, we might effectively define methods that
represent sets of fields applicable to a given class, and define the method
combining rule such that it collects the fields for a class and all its
base classes.
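As a toy illustration of that kind of "collect and combine" rule (hand-rolled,
not the dispatch package's API), imagine field sets registered per class, with
the combining rule concatenating the results for a class and all its bases:

_fields = {}   # class -> fields declared directly for that class

def declare_fields(klass, names):
    _fields.setdefault(klass, []).extend(names)

def fields_for(klass):
    """Combine all applicable field sets, most-derived class first."""
    result = []
    for base in klass.__mro__:
        result.extend(_fields.get(base, []))
    return result

class Animal(object):
    pass

class Dog(Animal):
    pass

declare_fields(Animal, ["id", "name"])
declare_fields(Dog, ["breed"])

assert fields_for(Dog) == ["breed", "id", "name"]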
The dispatch system I've written allows this, in a manner as flexible as
CLOS, although I don't have any of the convenience features actually
written. You just pass a "method combining" function in when you initially
create the generic function, and I have two sample method combiners
currently written: "most-specific-only", and "most-specific-to-least with
call-next-method support". The production version of the dispatch system
will probably offer a superset of the CLOS options, or at any rate a
"subset plus extensions", since a few of the CLOS combining rules don't
have a direct translation in Python.
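For reference, the "most-specific-to-least with call-next-method support"
combiner behaves roughly like this tiny hand-rolled version, where each method
receives a 'next_method' callable that invokes the next applicable method:

def combine(methods):
    """Build one callable from 'methods', ordered most specific first."""
    def caller_for(i):
        def caller(*args):
            if i >= len(methods):
                raise RuntimeError("no next method")
            return methods[i](caller_for(i + 1), *args)
        return caller
    return caller_for(0)

def specific(next_method, x):
    return "specific(%s), then %s" % (x, next_method(x))

def general(next_method, x):
    return "general(%s)" % x

f = combine([specific, general])
assert f(3) == "specific(3), then general(3)"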
>peak.metamodels: With peak.metamodels I'm primarily interested in XMI-based
>code generation capabilities. Will this feature be retained? It's one that
>might help distinguish PEAK (and Python generally) as a platform for serious
>enterprise application development.
Sadly, I don't have the bandwidth right now to pursue any improvements to
it without funding. I'd like to put this in a separate distribution of
some kind, because I don't want to keep lots of unfinished experimental
things in mainline PEAK. So, IMO, it's got to "grow or go". My thought is
to move it into its own distribution, where it can sit until I have
time/interest/funding to move it forward again.
On the bright side, generic functions make code generation and almost any
other type of AST/model/tree processing easier, so you may find that you
can do what you need with what's already there, once generic functions have
landed. Heck, if you can contribute some code-generation-from-UML stuff to
complement the existing generation-from-MOF, maybe I could simply consider
peak.metamodels "finished", and therefore keep it in the primary distro. :)
>PyProtocols: You mention there will be changes to PyProtocols for 1.0 -- no
>doubt you'll be using multiple dispatch quite a bit behind the scenes, but
>will there be much change at the API level?
I would like to get some input on revising the PyProtocols API for
1.0. After having used it extensively for some time, I have to say I find
the current spellings to be tedious at best, and error-prone for features I
use infrequently. The feedback I've had from the few people who've
commented is that they'd rather see the Zope-like:
implements(IFoo, IBar)
in place of:
advise(instancesProvide=[IFoo,IBar])
One reason this is coming up is because I want to have protocol function
decorators in 1.0, some way of saying:
[advise(thisFunctionReturns=[IFoo], asAdapterForTypes=[int])]
def intAsFoo(ob):
    ...
And also some way of saying:
[advise(thisFunctionProvides=[IFooFunction])]
def foo_by_wurbling(...):
    ...
That is, I'd like to be able to declare that 1) a function is an adapter
factory, and/or 2) that it conforms to some interface describing its call
signature.
However, all of the spellings above are not only hideous, but will also
bloat the number of 'advise' keyword arguments beyond all sense or
sensibility. It would probably be more Pythonic at this point to break
'advise' down into separate functions for specific use cases. (Keeping
'advise()' active through probably at least 1.1 for backward compatibility.)
I'm open to suggestions on what these new functions should look like.