[PEAK] Persistence styles, MDA, AOP, PyProtocols, and PEAK

Wed Jul 7 20:42:49 EDT 2004

At 02:06 PM 7/7/04 -0700, Robert Brewer wrote:
>(in the cases where, for example, I've needed
>to hand-tune SQL, I've just bypassed the O-R mapper at that point)

The shame!  The shame!  ;)

Seriously, though, at that point the framework has failed, and the 
application now has a very leaky abstraction.  And the next guy who works 
on that code will widen the leak a little more...  and more...  until the 
application is unmaintainable.

And I've found that this happens even when the "next guy" is the same guy 
who wrote the app and the framework.  And yes, I'm talking about myself 
here...  :)

> > """[(invoice, invoice.customer) for invoice in Invoices
> >      if invoice.status == coerce(stat) and duedate > coerce(due)]"""
> >
> > IOW, the evaluation is within the current variable namespace, so local and
> > global variables are both available for use in the expression.  No string
> > interpolation or separate "bind variables" required.
>
>I take it you do that at declaration time?

I'll probably keep a cache mapping query strings to parse trees so the 
statement doesn't get reparsed each time, but there's really no 
"declaration" time.

>That is, if you're going to
>pass the "when" string around, you need to wrap up locals/globals in a
>closure somehow?

No, if we call something like 'find(some_querystring)', it will obtain the 
caller's locals and globals using sys._getframe() at the point of invocation.

>I've ended up supporting both early and late binding
>with separate mechanisms. Early is done with bytecode hacking; late is
>done via 1) keyword arguments to the expression, or 2) standardized
>functions like today(), which returns a datetime.date. App developers
>can supply their own functions.

This won't be using bytecode hacks.  I'm looking forward to future 
possibilities such as PyPy and IronPython being able to run this stuff 
faster than today's CPython.

> > Anyway, these queries will actually be easier to use than SQL.  I don't
> > like using strings to represent code, but the alternative approach of using
> > expression proxies the way SQLObject and some other frameworks do, doesn't
> > really work all that well for expressing arbitrary queries.
> > There's no way to specify arbitrary joins properly, for example.
>
>What would be the string-based way to specify such? This is where my
>framework really falls down, IMO.

"[(x,y) for x in table_X for y in table_Y]" is a simple cartesian 
product.  In general, you can translate any:

     SELECT c1,c2,...cN
       FROM t1 as a1, t2 as a2, ... tM as aM
      WHERE predicate

to:

     [(a1.c1,a2.c2,...) for a1 in t1 for a2 in t2 ...  if predicate]

Of course, ORDER BY, GROUP BY, HAVING, and a host of other things aren't so 
trivially translated, but I haven't started any serious work on those 
yet.  Most likely, they'll be expressed as functions over listcomps, like:

     sum([a for a in b])

     sort([a for a in b], key=something)

...but again, this is just speculation right now.
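
As a worked instance of the translation above, the following sketch runs a two-table join as a plain list comprehension over in-memory rows. The table contents and the 'Customer'/'Invoice' names are invented for illustration, and the built-in 'sorted' stands in for the speculative 'sort' function.

```python
from collections import namedtuple

# Hypothetical in-memory "tables"
Customer = namedtuple("Customer", "id name")
Invoice = namedtuple("Invoice", "id customer_id amount")

customers = [Customer(1, "Acme"), Customer(2, "Globex")]
invoices = [Invoice(10, 1, 250), Invoice(11, 2, 75), Invoice(12, 1, 40)]

# SELECT c.name, i.amount
#   FROM customers AS c, invoices AS i
#  WHERE c.id = i.customer_id AND i.amount > 50
rows = [(c.name, i.amount)
        for c in customers for i in invoices
        if c.id == i.customer_id and i.amount > 50]

# Aggregation and ordering expressed as functions over the comprehension
total = sum([amount for _, amount in rows])
by_amount = sorted(rows, key=lambda row: row[1])
```

Here 'rows' holds the joined pairs, 'total' plays the role of SUM(), and 'by_amount' plays the role of ORDER BY.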

> > Even if we have convenience functions to simplify the
> > typing  (e.g. 'register_sql_function(int,Oracle,lambda args:
> > "FLOOR(%s)" % args[0])'), the full extensibility will be
> > there if and when it's needed.
>
>Right. Then you could write:
>
>def register_oracle_function(func, disp):
>     register_sql_function(func, Oracle, disp)
>
>..or even <mischievous grin>:
>
>class OracleAdapter(SQLDialectAdapter):
>     ...
>     def int(self, args):
>         return "FLOOR(%s)" % args[0]
>
>for func in [int, ...]:
>     register_sql_function(func, Oracle,
>                           getattr(OracleAdapter, func.__name__))

Note that that only works in single dispatch if you're willing to extend 
OracleAdapter for your application when you have application-specific 
functions.  Also note that such an approach is not necessarily 
*composable*.  That is, if there are two components that each need a custom 
mapping, you have to figure out how to glue them together.  The multiple 
dispatch scenario, however, is *inherently* composable.  Each component 
defines its function mappings, and that's that.

Also, I think you've missed that the string "int(foo)" in a query is mapped 
by way of the object '<builtin type int>', and is *not* looked up by 
searching for some function registered under the *name* '"int"'.  IOW, 
queries that perform a 'foo(bar)' operation, and that are defined in modules 
with different 'foo' functions, should each get the kind of foo function 
they mean in context, rather than being limited to a single function named 
'"foo"'.
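
A rough sketch of the difference, assuming a simple lookup table rather than real generic functions: mappings are keyed on the function *object* plus the database type, so two unrelated functions that both happen to be named 'foo' never collide. Everything here ('Dialect', 'Postgres', '_sql_functions', 'make_foo', the particular signature of 'register_sql_function') is hypothetical and only illustrates the lookup rule.

```python
import math

class Dialect: pass
class Oracle(Dialect): pass
class Postgres(Dialect): pass

_sql_functions = {}  # (function object, dialect class) -> renderer

def register_sql_function(func, dialect, render):
    _sql_functions[func, dialect] = render

def func_to_sql(func, dialect, args):
    # Walk the dialect's MRO so a mapping on a base dialect is inherited.
    for cls in dialect.__mro__:
        render = _sql_functions.get((func, cls))
        if render is not None:
            return render(args)
    raise LookupError("no SQL mapping for %r on %s" % (func, dialect.__name__))

def make_foo():
    def foo(x):
        return x
    return foo

# Two distinct functions, both with __name__ == "foo"
billing_foo, shipping_foo = make_foo(), make_foo()

# Each component registers its own mappings independently.
register_sql_function(int, Oracle, lambda args: "FLOOR(%s)" % args[0])
register_sql_function(math.sqrt, Dialect, lambda args: "SQRT(%s)" % args[0])
register_sql_function(billing_foo, Oracle, lambda args: "BILLING_FOO(%s)" % args[0])
register_sql_function(shipping_foo, Oracle, lambda args: "SHIPPING_FOO(%s)" % args[0])
```

A name-based registry would have to pick one of the two 'foo' mappings; keyed on the object, each query gets the one its own module meant.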

As you can see, if you have a requirement like that, the single dispatch 
approach is going to be rather awkward, as you end up bending the 
requirements to try and make it easier to implement (both at the framework 
and end-developer levels).  This was one of the many issues I ran into when 
I first started trying to build out the 'peak.query' framework.

Specifically, I ran into the issue of defining "what is a function?" for 
purposes of SQL mappings.  The obvious solution now is to say, "anything", 
and let the rules sort it out.  Meanwhile, the "simplest thing that could 
possibly work" solution for my immediate goals at Verio was to implement 
the current "thunk" system in PEAK's SQL drivers, which is name-based and 
kludgy but solved the immediate problem.

For the long-term solution, multiple dispatch (on at least the function, db 
type, and schema) is a much better way to go, not requiring any "extra" 
registration mechanisms.

>Mischief aside, it might be a way to ease the transition from single
>dispatch (building such tools into the framework); then, when an app
>developer wants something more complex, the rewriting is already half
>done. Just a thought.

Possibly.  Keep in mind that you could also do something really simple like:

     oracle_func = lambda fname: "dbtype in Oracle and f is "+fname

     [when(oracle_func('int'))]
     def func_to_sql(...):
         ...

     [when(oracle_func('math.sqrt'))]
     def func_to_sql(...):
         ...
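
For readers without the dispatch machinery at hand, the same shape can be approximated in present-day Python with decorator syntax. This is only a sketch under loud assumptions: 'GenericFunction' is a toy written for this example, it takes callable predicates instead of predicate strings, and it picks the first matching rule rather than the most specific one. It is not the PyProtocols API.

```python
import math

class Oracle:
    """Hypothetical stand-in for an Oracle database type."""

class GenericFunction:
    """Toy predicate dispatcher: the first rule whose predicate holds wins."""
    def __init__(self):
        self._rules = []  # list of (predicate, implementation)

    def when(self, predicate):
        def decorate(impl):
            self._rules.append((predicate, impl))
            return impl
        return decorate

    def __call__(self, *args):
        for predicate, impl in self._rules:
            if predicate(*args):
                return impl(*args)
        raise LookupError("no applicable rule for %r" % (args,))

func_to_sql = GenericFunction()

def oracle_func(f):
    # Predicate factory: "dbtype in Oracle and func is f"
    return lambda func, db, args: isinstance(db, Oracle) and func is f

@func_to_sql.when(oracle_func(int))
def int_to_sql(func, db, args):
    return "FLOOR(%s)" % args[0]

@func_to_sql.when(oracle_func(math.sqrt))
def sqrt_to_sql(func, db, args):
    return "SQRT(%s)" % args[0]
```

With these rules, 'func_to_sql(int, Oracle(), ["price"])' renders a FLOOR() call, and a component can add further rules without touching the existing ones.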