[PEAK] Persistence styles, MDA, AOP, PyProtocols, and PEAK
Phillip J. Eby
pje at telecommunity.com
Wed Jul 7 20:42:49 EDT 2004
At 02:06 PM 7/7/04 -0700, Robert Brewer wrote:
>(in the cases where, for example, I've needed
>to hand-tune SQL, I've just bypassed the O-R mapper at that point)
The shame! The shame! ;)
Seriously, though, at that point the framework has failed, and the
application now has a very leaky abstraction. And the next guy who works
on that code will widen the leak a little more... and more... until the
application is unmaintainable.
And I've found that this happens even when the "next guy" is the same guy
who wrote the app and the framework. And yes, I'm talking about myself
here... :)
> > """[(invoice, invoice.customer) for invoice in Invoices
> > if invoice.status == coerce(stat) and duedate > coerce(due)]"""
> >
> > IOW, the evaluation is within the current variable namespace, so local
> > and global variables are both available for use in the expression. No
> > string interpolation or separate "bind variables" required.
>
>I take it you do that at declaration time?
I'll probably keep a cache mapping query strings to parse trees so the
statement doesn't get reparsed each time, but there's really no
"declaration" time.
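A minimal sketch of such a cache, assuming the query is a Python list-comprehension string. It uses the modern standard library's ast.parse and functools.lru_cache; parse_query is a hypothetical name, not a PEAK API:

```python
import ast
from functools import lru_cache

@lru_cache(maxsize=256)
def parse_query(query):
    """Hypothetical helper: parse a query string once, then reuse the tree.

    The returned AST is shared between callers, so it must be treated as
    read-only.
    """
    tree = ast.parse(query, mode="eval")
    if not isinstance(tree.body, ast.ListComp):
        raise ValueError("query must be a list comprehension")
    return tree

q = "[x for x in items if x > limit]"
first = parse_query(q)   # parsed here
second = parse_query(q)  # served from the cache, no reparse
```

There is no separate declaration step: the first call with a given string pays for the parse, and later calls with the identical string get the same tree back.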
>That is, if you're going to
>pass the "when" string around, you need to wrap up locals/globals in a
>closure somehow?
No. If we call something like 'find(some_querystring)', it will obtain the
caller's locals and globals using sys._getframe() at the point of invocation.
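Roughly, such a find() could look like the following hypothetical sketch (not PEAK code). It simply evaluates the query in Python rather than translating it to SQL, and the invoice rows are made-up dicts. The caller's globals and locals are merged into one dict because names used inside a comprehension's 'if' clause are looked up from the comprehension's own scope, which does not see a separate locals mapping passed to eval():

```python
import sys

def find(query):
    """Evaluate a list-comprehension query string in the caller's namespace.

    Sketch only: a real query engine would translate the parsed query to SQL.
    """
    # sys._getframe is a CPython implementation detail; other Python
    # implementations may not provide it (or may make it slow).
    caller = sys._getframe(1)
    # One merged dict, so names in the comprehension's 'if' clause resolve.
    namespace = dict(caller.f_globals)
    namespace.update(caller.f_locals)
    return eval(query, namespace)

def overdue(invoices, cutoff):
    # 'invoices' and 'cutoff' are plain locals; no bind variables are passed.
    return find("[inv for inv in invoices if inv['due'] < cutoff]")

print(overdue([{"due": 1}, {"due": 5}], 3))  # → [{'due': 1}]
```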
>I've ended up supporting both early and late binding
>with separate mechanisms. Early is done with bytecode hacking; late is
>done via 1) keyword arguments to the expression, or 2) standardized
>functions like today(), which returns a datetime.date. App developers
>can supply their own functions.
This won't be using bytecode hacks. I'm looking forward to future
possibilities such as PyPy and IronPython being able to run this stuff
faster than today's CPython.
> > Anyway, these queries will actually be easier to use than SQL. I don't
> > like using strings to represent code, but the alternative approach of
> > using expression proxies the way SQLObject and some other frameworks do,
> > doesn't really work all that well for expressing arbitrary queries.
> > There's no way to specify arbitrary joins properly, for example.
>
>What would be the string-based way to specify such? This is where my
>framework really falls down, IMO.
"[(x,y) for x in table_X for y in table_Y]" is a simple cartesian
product. In general, you can translate any:
SELECT c1,c2,...cN
FROM t1 as a1, t2 as a2, ... tM as aM
WHERE predicate
to:
[(a1.c1,a2.c2,...) for a1 in t1 for a2 in t2 ... if predicate]
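To make the correspondence concrete, here is that translation applied to two small in-memory tables. The Customer/Invoice rows are invented for illustration. Each FROM alias becomes a 'for' clause and the WHERE predicate becomes the 'if' clause. Python just runs the nested loops here, whereas the point of the query framework is to turn the same listcomp into SQL:

```python
from collections import namedtuple

# Made-up row types and data, for illustration only.
Customer = namedtuple("Customer", "id name")
Invoice = namedtuple("Invoice", "id cust_id total")

customers = [Customer(1, "Acme"), Customer(2, "Globex")]
invoices = [Invoice(10, 1, 250), Invoice(11, 2, 75), Invoice(12, 1, 40)]

# SELECT i.id, c.name
# FROM invoices AS i, customers AS c
# WHERE i.cust_id = c.id AND i.total > 50
rows = [(i.id, c.name)
        for i in invoices
        for c in customers
        if i.cust_id == c.id and i.total > 50]

print(rows)  # → [(10, 'Acme'), (11, 'Globex')]
```

Dropping the 'if' clause leaves the plain Cartesian product of the two tables.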
Of course, ORDER BY, GROUP BY, HAVING, and a host of other things aren't so
trivially translated, but I haven't started any serious work on those
yet. Most likely, they'll be expressed as functions over listcomps, like:
sum([a for a in b])
sort([a for a in b], key=something)
...but again, this is just speculation right now.
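For a rough feel of what "functions over listcomps" might mean, here are aggregates and ordering written with ordinary Python builtins over made-up rows. Python's builtin is sorted(), not sort(); the sort() above is speculative naming. Using itertools.groupby for GROUP BY is likewise only an assumption for illustration, not a settled design:

```python
from itertools import groupby
from operator import itemgetter

# Made-up (customer, total) rows.
sales = [("Acme", 250), ("Globex", 75), ("Acme", 40), ("Initech", 120)]

# SELECT SUM(total) FROM sales
grand_total = sum([t for c, t in sales])  # → 485

# SELECT customer, total FROM sales ORDER BY total DESC
by_total = sorted([(c, t) for c, t in sales], key=itemgetter(1), reverse=True)

# SELECT customer, SUM(total) FROM sales GROUP BY customer
# groupby() only merges adjacent items, so sort on the grouping key first.
per_customer = [(c, sum(t for _, t in grp))
                for c, grp in groupby(sorted(sales, key=itemgetter(0)),
                                      key=itemgetter(0))]

# ... HAVING SUM(total) > 100
having = [(c, s) for c, s in per_customer if s > 100]
# → [('Acme', 290), ('Initech', 120)]
```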
> > Even if we have convenience functions to simplify the
> > typing (e.g. 'register_sql_function(int,Oracle,lambda args:
> > "FLOOR(%s)" % args[0])'), the full extensibility will be
> > there if and when it's needed.
>
>Right. Then you could write:
>
>def register_oracle_function(func, disp):
>    register_sql_function(func, Oracle, disp)
>
>..or even <mischievous grin>:
>
>class OracleAdapter(SQLDialectAdapter):
>    ...
>    def int(self, args):
>        return "FLOOR(%s)" % args[0]
>
>for func in [int, ...]:
>    register_sql_function(func, Oracle,
>                          getattr(OracleAdapter, func.__name__))
Note that this only works with single dispatch if you're willing to extend
OracleAdapter for your application whenever you have application-specific
functions. Also note that such an approach is not necessarily
*composable*. That is, if there are two components that each need a custom
mapping, you have to figure out how to glue them together. The multiple
dispatch scenario, however, is *inherently* composable. Each component
defines its function mappings, and that's that.
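To illustrate the composability point, here is a tiny two-argument dispatch registry keyed on (function object, dialect class). It is a hypothetical sketch: register_sql_function and func_to_sql borrow names from this thread, but the implementation (a dict plus a walk over the dialect's MRO) is an assumption and far simpler than PEAK's predicate dispatch. Two independent components register their own mappings without extending a shared adapter class:

```python
# Hypothetical registry: rules keyed on (Python callable, dialect class).
_sql_rules = {}

def register_sql_function(func, dialect, render):
    """Register how 'func' renders as SQL for 'dialect' (and its subclasses)."""
    _sql_rules[(func, dialect)] = render

def func_to_sql(func, dialect, args):
    # Walk the MRO so a rule for a base dialect also covers subclasses.
    for cls in dialect.__mro__:
        render = _sql_rules.get((func, cls))
        if render is not None:
            return render(args)
    raise LookupError(f"no SQL mapping for {func!r} on {dialect.__name__}")

class SQLDialect:
    """Hypothetical base dialect."""

class Oracle(SQLDialect):
    pass

class Oracle9(Oracle):
    """Hypothetical sub-dialect; inherits Oracle's rules."""

# Component A registers its mapping...
register_sql_function(int, Oracle, lambda args: "FLOOR(%s)" % args[0])

# ...and component B registers its own; no glue code between them.
register_sql_function(abs, SQLDialect, lambda args: "ABS(%s)" % args[0])

print(func_to_sql(int, Oracle9, ["x"]))  # → FLOOR(x)
print(func_to_sql(abs, Oracle, ["y"]))   # → ABS(y)
```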
Also, I think you've missed something here: the mapping from the string
"int(foo)" in a query is going to go by way of '<builtin type int>', and is
*not* looked up by searching for some function registered under the *name*
'"int"'. IOW, queries that perform a 'foo(bar)' operation and that are
defined in modules with different 'foo' functions should each get the 'foo'
function they mean in context, rather than being limited to a single
function named '"foo"'.
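Here is a sketch of what keying on the object, rather than the name, buys you. It assumes two hypothetical modules (geo and fin) that each define a function named foo, and uses invented SQL function names GEO_SCALE and FIN_ROUND. render_call resolves 'foo' in the query's own namespace and then dispatches on the resulting object:

```python
import types

# Two hypothetical modules, each defining its own function named 'foo'.
geo = types.ModuleType("geo")
fin = types.ModuleType("fin")

def _geo_foo(x):
    return x * 2

def _fin_foo(x):
    return x + 1

_geo_foo.__name__ = _fin_foo.__name__ = "foo"
geo.foo, fin.foo = _geo_foo, _fin_foo

# SQL renderings keyed on the function *object*, never on the string "foo".
SQL_RENDER = {
    geo.foo: lambda arg: "GEO_SCALE(%s)" % arg,
    fin.foo: lambda arg: "FIN_ROUND(%s)" % arg,
}

def render_call(name, namespace, arg):
    """Resolve 'name' as the query's module would, then dispatch on the object."""
    func = namespace[name]
    return SQL_RENDER[func](arg)

print(render_call("foo", vars(geo), "amount"))  # → GEO_SCALE(amount)
print(render_call("foo", vars(fin), "amount"))  # → FIN_ROUND(amount)
```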
As you can see, if you have a requirement like that, the single dispatch
approach is going to be rather awkward, as you end up bending the
requirements to try and make it easier to implement (both at the framework
and end-developer levels). This was one of the many issues I ran into when
I first started trying to build out the 'peak.query' framework.
Specifically, I ran into the issue of defining "what is a function?" for
purposes of SQL mappings. The obvious solution now is to say, "anything",
and let the rules sort it out. Meanwhile, the "simplest thing that could
possibly work" solution for my immediate goals at Verio was to implement
the current "thunk" system in PEAK's SQL drivers, which is name-based and
kludgy but solved the immediate problem.
For the long-term solution, multiple dispatch (on at least the function, db
type, and schema) is a much better way to go, not requiring any "extra"
registration mechanisms.
>Mischief aside, it might be a way to ease the transition from single
>dispatch (building such tools into the framework); then, when an app
>developer wants something more complex, the rewriting is already half
>done. Just a thought.
Possibly. Keep in mind that you could also do something really simple like:
oracle_func = lambda fname: "dbtype in Oracle and f is "+fname

[when(oracle_func('int'))]
def func_to_sql(...):
    ...

[when(oracle_func('math.sqrt'))]
def func_to_sql(...):
    ...
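For readers without the dispatch package handy, here's a rough stand-in using today's @ decorator syntax and callable predicates instead of predicate strings. Everything here (when, the Oracle/Postgres classes, the rule list) is hypothetical. Rules are simply tried in registration order, whereas real predicate dispatch picks the most specific matching rule:

```python
import math

class Oracle:
    """Hypothetical database-type marker."""

class Postgres:
    """Hypothetical database-type marker."""

_rules = []  # (predicate, implementation) pairs, tried in registration order

def dispatcher(f, dbtype, args):
    for predicate, impl in _rules:
        if predicate(f, dbtype):
            return impl(f, dbtype, args)
    raise LookupError("no rule matches")

def when(predicate):
    """Hypothetical decorator: add a rule and rebind the name to the dispatcher."""
    def decorate(impl):
        _rules.append((predicate, impl))
        return dispatcher
    return decorate

def oracle_func(target):
    # Callable analogue of the "dbtype in Oracle and f is <name>" string above.
    return lambda f, dbtype: isinstance(dbtype, Oracle) and f is target

@when(oracle_func(int))
def func_to_sql(f, dbtype, args):
    return "FLOOR(%s)" % args[0]

@when(oracle_func(math.sqrt))
def func_to_sql(f, dbtype, args):
    return "SQRT(%s)" % args[0]

print(func_to_sql(int, Oracle(), ["x"]))        # → FLOOR(x)
print(func_to_sql(math.sqrt, Oracle(), ["y"]))  # → SQRT(y)
```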