[PEAK] Persistence styles, MDA, AOP, PyProtocols, and PEAK
Phillip J. Eby
pje at telecommunity.com
Fri Jul 9 11:08:56 EDT 2004
At 04:50 PM 7/8/04 -0700, John Landahl wrote:
> > The idea of the domain model is that you put an application's core
> > behaviors into objects that reflect the application domain.
> > Unfortunately, this idea scales rather poorly when dealing with large
> > numbers of objects. Object-oriented languages are oriented towards
> > dealing with individual objects more so than collections of them.
> > Loops over large collections are inefficient when compared to bulk
> > operations on an RDBMS, for example. This means that building practical
> > applications requires these bulk operations to be factored out, somehow.
>
>Indeed, you've really hit the nail on the head here. The domain model
>emphasis on converting data to objects before performing any operations on
>them is *terrible* for working on large data sets, and completely negates the
>performance advantages of RDBMSs. Without any standard way to get around
>this limitation, individual developers will come up with ad hoc solutions of
>varying efficiency and effectiveness. Usually this means going directly to
>SQL, but then the solution can only work with a SQL-based storage medium.
>I'll be quite interested to see how your "fact base" ideas play out.
The really sad thing is to realize that this "hard problem" I've been
struggling with is in fact easy once you move it into generic function
terms. I mean, as soon as you think of it as, "here, under these
circumstances do it like this, and under these circumstances do it like
that", it's like... where's the problem? There is no problem.
Indeed, I almost don't care about the labor-saving features of the O-R
mapping anymore, or having a super-duper "any backend" framework, because
for any given application, who cares? Chances are good you'll have only
one mapping plus whatever you use for testing. The main point is that
anything that's specific to the mapping should be properly
compartmentalized or parameterized, and with generic functions you can
easily do that.
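Just to make that concrete, here's a toy, hand-rolled sketch of the "under
these circumstances, do it like this" shape. None of these names are the
actual dispatch API; it's only meant to show where the mapping-specific code
ends up:

_save_rules = []

def when_saving(predicate, implementation):
    """Register an implementation for the circumstances 'predicate' matches."""
    _save_rules.append((predicate, implementation))

def save(ob, backend):
    """The only entry point the rest of the application ever calls."""
    for predicate, implementation in _save_rules:
        if predicate(ob, backend):
            return implementation(ob, backend)
    raise NotImplementedError("no rule for saving %r to %r" % (ob, backend))

# one mapping for production, one for testing; each one compartmentalized
when_saving(lambda ob, backend: backend == "sql",
            lambda ob, backend: "INSERT statement for %r" % ob)
when_saving(lambda ob, backend: backend == "memory",
            lambda ob, backend: "stored %r in a test dict" % ob)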
> > Going Beyond AOP (with 40-Year Old Technology!)
>
>Lisp is >40 years old, but isn't CLOS itself only about 20 years old?
Okay, so I exaggerated for dramatic effect. :)
> > Doesn't this look a *lot* easier to you than writing DM classes? It sure
> > does to me.
>
>Yes, *much* nicer. I'm wondering where the metadata mapping hooks would go,
>though. Perhaps more generic functions that would be called here?
More or less. My rough idea is that one mapping layer is from object-ness
to fact-ness (and vice versa). That layer is pretty boilerplate, and in
fact could probably work almost entirely off of existing peak.model
metadata right now. The second mapping layer is from the logical fact
schema to a physical one, and basically consists of relational algebra
operations. Finally, there's the relational algebra to SQL conversion.
If you look at what's in peak.query today, you'll see a relational algebra
implementation, where you can define a "relvar" as sort of a view over
other relvars (e.g. tables), and you can then translate that into physical
SQL. So, the basics of what's needed are there in prototype form, minus
what I considered the "hard bits". The hard bits primarily consisted of
issues like mapping functions to a specific SQL dialect, dealing with
cross-database joins, and various other "easy with multiple dispatch, hard
to do without it" problems.
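To illustrate the dialect issue, here's a hand-rolled lookup standing in for
real multiple dispatch (none of these names are meant to be peak.query's):
each dialect just contributes its own rendering case for a given function.

_renderers = {}   # (dialect class, logical function name) -> renderer

def renders(dialect_class, funcname, renderer):
    _renderers[(dialect_class, funcname)] = renderer

def render_call(dialect, funcname, args):
    """Pick the most specific renderer along the dialect's class hierarchy."""
    for klass in type(dialect).__mro__:
        if (klass, funcname) in _renderers:
            return _renderers[(klass, funcname)](args)
    raise NotImplementedError("%s has no rendering for %s" % (dialect, funcname))

class ANSI(object): pass
class Oracle(ANSI): pass

renders(ANSI, "concat", lambda args: " || ".join(args))
renders(Oracle, "concat", lambda args: "CONCAT(%s)" % ", ".join(args))

assert render_call(Oracle(), "concat", ["a", "b"]) == "CONCAT(a, b)"
assert render_call(ANSI(), "concat", ["a", "b"]) == "a || b"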
So at some point I'll need to go in and rip up what I prototyped there, and
replace it with a more scalable arrangement based on generic
functions. (The prototype has *no* support for different SQL dialects yet,
although it was starting to grow a framework to support it.)
Anyway, the "algebra to SQL" layer will be pretty much "write once" per SQL
dialect, and of course if you have custom functions to map, then you'll
write little implementations for those. And, the objects-to-facts layer
will likely be metadata driven, and thus another "write once"
situation. That leaves the logical-to-physical schema mapping, which will
essentially consist of view definitions.
IOW, the net result will be that everything should be largely metadata
driven, without writing any "code" per se, even though some of your
definitions might be in Python, or strings of Python-like
specification. However, you'll be able to *override* anything to deal with
special cases, using generic functions. (Including adding special "after
doing that, do this" sorts of rules, most likely.)
> > The parser will reduce
> > normal Python expressions to fast-executing objects, so there's no need to
> > eval() expression strings at runtime.
>
>At what point does this reduction occur? At import time?
When the function that 'when()' decorates is defined. So the sequence
will be, "when()" gets called, function is defined, and finally the string
gets reduced to a list of "Signature" objects that define the various
circumstances under which that function will be used. Then, the next
statement following the function definition will execute.
So, if a function is defined at the top level of a module, or a class
that's in the top-level, then yes, this will happen at import time. If the
function is defined inside another function (e.g. as a closure), then it
will happen when the enclosing function is called.
In case you're wondering why you'd want to use closures to create methods
for generic functions, it's because that's an easy way to create lots of
similar methods, but with certain parts already "partially evaluated" for
speed, or to create an API that doesn't look like generic functions. For
example, I might do something like this:
def classification(condition, name):
    from stereotypes import classify

    [when(condition)]
    def classify(age):
        return name

classification("age>=30", "thirtysomething")
classification("age<20", "teenager")
...
As a shortcut way of doing something rule-like without having to repeat all
the 'when's and 'def classify's and 'return's.
Anyway, in this example, each expression gets parsed just before the
'classification' routine returns.
>[snip praise of the algorithms used by the prototype]
The specific features you praised are due to the Chambers and Chen paper,
so I can't take credit for them.
>This is very impressive. Perhaps this belongs in Python core, but then there
>would no doubt be a battle over how multiple dispatch just isn't
>"Pythonic"...
I'm not sure. I've heard PyPy uses generic functions of some sort, but I
don't know what kind of rules they use. But in any case, I think I'll let
somebody else fight that battle. :)
>I've been looking into Hibernate lately, as well, and also Modeling. Will
>the
>PEAK "editing context" concept be anything like that in Modeling? Will there
>be support for nested editing contexts as in Modeling?
That's a good question. But it's hard to say for sure. Nested editing
contexts are relatively easy in "persistent root" and "document" models,
but are hard when you're doing a "fact base" over an RDBMS, unless you can
use nested transactions (or savepoints) in the DB. But, I suppose if the
needed mechanisms are generic functions, it might well be possible to
parameterize that so it can work.
What are your use cases for nested contexts?
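For what it's worth, a rough sketch of the savepoint route might look like
this, assuming a DB-API cursor on a backend that understands SAVEPOINT
('NestedContext' is just a placeholder name, not a planned peak.storage
class):

class NestedContext(object):
    _seq = 0

    def __init__(self, cursor):
        self.cursor = cursor
        NestedContext._seq += 1
        self.name = "nested_%d" % NestedContext._seq

    def begin(self):
        self.cursor.execute("SAVEPOINT " + self.name)

    def commit(self):
        # fold this context's changes into the enclosing context
        self.cursor.execute("RELEASE SAVEPOINT " + self.name)

    def abort(self):
        # discard only this context's changes
        self.cursor.execute("ROLLBACK TO SAVEPOINT " + self.name)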
>Will the
>"storage.beginTransaction(X)" approach be a thing of the past?
I don't know. Currently, the component hierarchy is an easy way to pass
the transaction around. But I suppose if there's an editing context, it
might naturally carry a transaction; after all, you'll need to have it there.
I do not, however, expect any significant changes to how the transaction
service itself works.
> > * Making a separate top-level package for generics (e.g. 'from generics
> > import when') instead of a subpackage of 'protocols'
>
>This seems like a good approach, since multiple dispatch functionality is
>generally useful on its own, regardless of whether other PyProtocols features
>are being used. You might also get a lot more adoption (and
>testers/feedback) if it was available separately to the Python community.
Well, I'll take that as one vote for the approach. PyProtocols will still
be required, though, as I'm using adaptation in the dispatch system's
current API, and expect to have uses for it in the expression builder.
>peak.model: You didn't get into specifics here, but mention the
>possibility of
>destabilizing changes. Since changes to the model layer of an application
>can have far-reaching effects throughout the application's code, I'd like to
>hear a lot more about what you have in mind here. Do you have specific
>changes in mind? How safe is it to continue using peak.model as it is today?
Primarily, I'm thinking about changes to the persistence mechanism,
internal attribute representation machinery, event/validation machinery,
and maybe the "method exporter" subsystem of peak.model. I'm not really
looking at changing the surface APIs, which I'd like to disturb as little
as possible.
So, if you're just defining metadata or methods you might not even notice
that anything's changing. But if you're doing certain kinds of validation
hooks or digging deep into the feature framework, there might be some changes.
Here are the specifics I see:
* The use of ZODB4 persistence will go away, probably replaced by making
objects' dictionaries persistent instead, or by using generic function
hooks (e.g. "before" methods) to load an object. Some of the semi-internal
interfaces that exist now (like binding.IBindableAttrs and
model.IFeatureSPI) may have to change a bit because of this. Also,
anything that begins with '_p_' (e.g. _p_oid, _p_jar) *will* be going away.
* Class-to-class linkages may happen internally as a generic function that
keys off the object's owning context. (This is to allow "class
replacement", one of the two remaining AOP use cases.) Admittedly, this
change may be tricky to do without changing some current code that uses
features directly from their class, without reference to the instance. It
may simply happen that in order to be "future compatible", one will use a
set of peak.model generic functions to access metadata, rather than using
the metadata off of feature objects directly. However, existing
applications would still work as long as they didn't use the new features.
* Call generic functions for get/set/del/add/remove events. This will
allow the definition of "before", "after", and "around" methods to do
validation, lazy loading, triggers, GUI view updates, etc. It may also be
that associations/bidirectional links get reimplemented to use generic
functions. But, I don't expect any of this to change the API for code that
*uses* model objects, and maybe not even the code that defines them.
So, while I can't promise I haven't left anything out, or that something
else might not come up while I'm working on it, I do want to try to limit
the scope of things. In particular, note that PEAK itself has quite a lot
of modules that use the existing model API, so I'm not going to change
anything gratuitously.
>peak.storage: Because of the separation of model and storage layers, I'm less
>concerned about the impact of peak.storage changes. The gains in flexibility
>will more than make up for any short-term inconvenience. How far along are
>your thoughts on the new peak.storage API?
Not very far, other than the things I've been saying for some time now:
1) There shall be an editing-context-like thing (I'm thinking of
calling it a "workspace")
2) It shall support execution of various kinds of queries, both simple
find-by-key operations, and queries of (almost) arbitrary complexity
3) It shall support metadata-driven mapping to tabular data sources
such as SQL
4) There will be no DM's. Not as a visible part of the API as they are
today, and not as part of the back end.
and then there are the main things that I've just added:
5) The services provided by the editing context-like thing will be
implemented as generic functions
6) To the extent that it's both meaningful and Pythonic to do so, the
API will resemble that of Hibernate, except that the query language will
likely be Python rather than anything resembling SQL. However, our API
will likely be smaller because e.g. no parameter binding API is needed.
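To make point 6 a bit more concrete, here's a purely speculative sketch of how
that surface might feel. Every name in it is a placeholder rather than a
committed API, and the in-memory backing is only there to make the example
self-contained:

class Workspace(object):
    """Toy stand-in for the editing-context-like thing."""

    def __init__(self):
        self._objects = {}              # (class, key) -> instance

    def add(self, klass, key, ob):
        self._objects[(klass, key)] = ob

    def get(self, klass, key):
        return self._objects[(klass, key)]      # simple find-by-key

    def find(self, klass, expr):
        # query spelled as a Python expression; the real thing would parse
        # this up front into dispatch criteria rather than eval() it
        return [ob for (k, oid), ob in self._objects.items()
                if k is klass and eval(expr, {}, {"ob": ob})]

class Customer(object):
    def __init__(self, name, age):
        self.name, self.age = name, age

ws = Workspace()
ws.add(Customer, 1, Customer("Alice", 34))
ws.add(Customer, 2, Customer("Bob", 15))

assert ws.get(Customer, 1).name == "Alice"
assert [c.name for c in ws.find(Customer, "ob.age >= 18")] == ["Alice"]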
>A customer is quite interested in using PEAK for general application
>development, with a specific interest in flexible object persistence. On the
>RDBMS side, they would like some of the higher-level ORM features of Modeling
>and Hibernate. The ideas you have in mind for queries look quite good. Do
>you have any specifics in mind yet for how the physical-to-logical schema
>mapping features will work?
I think I've laid out some of this already, earlier in this post. But to
try to answer what I think your question might be, let's say that you
wanted to use a Hibernate XML file to define the mapping. Well, that would
basically just entail writing something to read the mapping and define
generic function cases for the particular operations needed for that sort
of mapping. IOW, you'd do something like:
def define_hibernate_feature_X(klass, attrName, ...):
    from peak.storage.api import save_thingy, do_something, load_whatsis

    [when("isinstance(ob,klass) and ...")]
    def load_whatsis(ob, ...):
        # code to load a whatsis for the given class/feature

    [when("isinstance(ob,klass) and ...")]
    def save_thingy(ob, ...):
        # code to save a thingy for the given class/feature
for each kind of Hibernate mapping (collection, reference, inheritance,
etc.). Then, you'd call the outer function for each occurrence of that kind
of tag in the Hibernate XML file. Does that make sense?
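The driver loop for that would be something like the following sketch, where
the element and attribute names follow Hibernate's hbm.xml conventions and
'define_property' stands in for functions like define_hibernate_feature_X
above:

from xml.dom.minidom import parse

def load_mapping(filename, define_property):
    doc = parse(filename)
    for cls in doc.getElementsByTagName("class"):
        classname = cls.getAttribute("name")
        for prop in cls.getElementsByTagName("property"):
            # one set of generic-function cases per mapped attribute
            define_property(classname,
                            prop.getAttribute("name"),
                            prop.getAttribute("column"))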
It's possible we'll also make use of "method combining". That "Practical
Lisp" chapter discusses it briefly, but I haven't addressed it in my posts
much. The idea is that for some generic functions, you don't want to just
pick the "most specific" method, but rather you want some combination of
the return values from all the cases that apply. For example, the total of
the return values, a list of the return values, a boolean "and" of the
return values, etc.
In the case of an SQL mapping, we might effectively define methods that
represent sets of fields applicable to a given class, and define the method
combining rule such that it collects the fields for a class and all its
base classes.
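As a toy illustration of that kind of "collect and combine" rule (hand-rolled,
not the dispatch package's API), imagine field sets registered per class, with
the combining rule concatenating the results for a class and all its bases:

_fields = {}   # class -> fields declared directly for that class

def declare_fields(klass, names):
    _fields.setdefault(klass, []).extend(names)

def fields_for(klass):
    """Combine all applicable field sets, most-derived class first."""
    result = []
    for base in klass.__mro__:
        result.extend(_fields.get(base, []))
    return result

class Animal(object):
    pass

class Dog(Animal):
    pass

declare_fields(Animal, ["id", "name"])
declare_fields(Dog, ["breed"])

assert fields_for(Dog) == ["breed", "id", "name"]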
The dispatch system I've written allows this, in a manner as flexible as
CLOS, although I don't have any of the convenience features actually
written. You just pass a "method combining" function in when you initially
create the generic function, and I have two sample method combiners
currently written: "most-specific-only", and "most-specific-to-least with
call-next-method support". The production version of the dispatch system
will probably offer a superset of the CLOS options, or at any rate a
"subset plus extensions", since a few of the CLOS combining rules don't
have a direct translation in Python.
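For reference, the "most-specific-to-least with call-next-method support"
combiner behaves roughly like this tiny hand-rolled version, where each method
receives a 'next_method' callable that invokes the next applicable method:

def combine(methods):
    """Build one callable from 'methods', ordered most specific first."""
    def caller_for(i):
        def caller(*args):
            if i >= len(methods):
                raise RuntimeError("no next method")
            return methods[i](caller_for(i + 1), *args)
        return caller
    return caller_for(0)

def specific(next_method, x):
    return "specific(%s), then %s" % (x, next_method(x))

def general(next_method, x):
    return "general(%s)" % x

f = combine([specific, general])
assert f(3) == "specific(3), then general(3)"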
>peak.metamodels: With peak.metamodels I'm primarily interested in XMI-based
>code generation capabilities. Will this feature be retained? It's one that
>might help distinguish PEAK (and Python generally) as a platform for serious
>enterprise application development.
Sadly, I don't have the bandwidth right now to pursue any improvements to
it without funding. I'd like to put this in a separate distribution of
some kind, because I don't want to keep lots of unfinished experimental
things in mainline PEAK. So, IMO, it's got to "grow or go". My thought is
to move it into its own distribution, where it can sit until I have
time/interest/funding to move it forward again.
On the bright side, generic functions make code generation and almost any
other type of AST/model/tree processing easier, so you may find that you
can do what you need with what's already there, once generic functions have
landed. Heck, if you can contribute some code-generation-from-UML stuff to
complement the existing generation-from-MOF, maybe I could simply consider
peak.metamodels "finished", and therefore keep it in the primary distro. :)
>PyProtocols: You mention there will be changes to PyProtocols for 1.0 -- no
>doubt you'll be using multiple dispatch quite a bit behind the scenes, but
>will there be much change at the API level?
I would like to get some input on revising the PyProtocols API for
1.0. After having used it extensively for some time, I have to say I find
the current spellings to be tedious at best, and error-prone for features I
use infrequently. The feedback I've had from the few people who've
commented is that they'd rather see the Zope-like:
implements(IFoo, IBar)
in place of:
advise(instancesProvide=[IFoo,IBar])
One reason this is coming up is because I want to have protocol function
decorators in 1.0, some way of saying:
[advise(thisFunctionReturns=[IFoo], asAdapterForTypes=[int])]
def intAsFoo(ob):
    ...
And also some way of saying:
[advise(thisFunctionProvides=[IFooFunction])]
def foo_by_wurbling(...):
    ...
That is, I'd like to be able to declare that 1) a function is an adapter
factory, and/or 2) that it conforms to some interface describing its call
signature.
However, all of the spellings above are not only hideous, but will also
bloat the number of 'advise' keyword arguments beyond all sense or
sensibility. It would probably be more Pythonic at this point to break
'advise' down into separate functions for specific use cases. (Keeping
'advise()' active through probably at least 1.1 for backward compatibility.)
I'm open to suggestions on what these new functions should look like.