[PEAK] The path of model and storage
Phillip J. Eby
pje at telecommunity.com
Wed Jul 28 18:37:46 EDT 2004
At 02:21 PM 7/28/04 -0700, Robert Brewer wrote:
> > ...conflicts will be resolved via the physical database's
> > locking facilities.
> >
> > So, there's absolutely no need to duplicate that logic, either at the
> > consumer code level or in the framework, unless for some
> > reason (that I can't presently fathom) you *want* to.
>
>To save processor time. I'd say about half of my data benefits from
>being retained in memory, as fully-formed objects, in a single process
>throughout the lifetime of the app. At the same time, it's the sort of
>data which can't simply be hard-coded--it needs to be user-configurable
>like other domain objects, and benefits greatly from the same framework
>tools that other domain objects enjoy. I'm talking about things like
>system tables for populating lists, which have lots of reads (sometimes
>every page view) but aren't _quite_ immutable; the users might have some
>interface for occasionally mutating the list. MRU's also fall in here,
>as does most of the content on any random Amazon page.
Ah. If I understand your use case correctly, this need is met by
workspaces with "longer-than-a-transaction" cache lifetimes. In PEAK
today, you'd do this with a DM whose 'resetStatesAfterTxn' flag is
False. (That is, don't ghostify the objects when a transaction is finished.)
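To make the ghostify-vs-keep distinction concrete, here is a toy sketch. It is not PEAK's actual DM class; only the 'resetStatesAfterTxn' flag name comes from the above, and the rest of the names are illustrative:

```python
class Obj:
    def __init__(self):
        self.state = "loaded"

class DM:
    resetStatesAfterTxn = True   # default: ghostify cached objects after each txn

    def __init__(self):
        self.cache = {1: Obj()}

    def finishTxn(self):
        if self.resetStatesAfterTxn:
            for ob in self.cache.values():
                ob.state = "ghost"   # drop loaded state; reload lazily on next access

class LongLivedDM(DM):
    resetStatesAfterTxn = False      # cached objects stay "live" across txns
```

With the flag False, the second transaction (and every one after it) reuses the fully-formed objects instead of paying the load cost again.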
For multithreaded scenarios, the idea is that you use an "application
pool". An application pool is just a list of zero or more top-level
application objects. When a worker thread picks up a request, it also pops
an application from the pool, or creates a fresh instance if the pool is
empty. At the end of the request, the worker thread returns whatever
application instance it used to the pool (via pool.append()). This of
course automatically and incrementally grows the application pool up to the
number of concurrent threads. Each application instance then has its
own workspace, and therefore its own cache, containing "live"
objects. And, with appropriate cache invalidation support, all cache
instances can clean themselves if any thread or process commits a change to
these long-lived objects. (Of course, to do this, there will have to be a
cache invalidation message queueing service, and it will have to be
thread-safe. But all this is for later implementation, anyway.)
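The pop/append pattern above can be sketched in a few lines. The class and method names here are illustrative, not PEAK API; the only synchronization needed is around the pool's own list:

```python
import threading

class Application:
    """Stand-in for a top-level app object with its own workspace/cache."""
    def __init__(self):
        self.cache = {}          # per-instance "workspace" cache of live objects

    def handle(self, request):
        return "handled %s" % request

class AppPool:
    def __init__(self, factory):
        self._factory = factory
        self._pool = []
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if self._pool:
                return self._pool.pop()
        return self._factory()   # pool empty: create a fresh instance

    def release(self, app):
        with self._lock:
            self._pool.append(app)

def worker(pool, request):
    app = pool.acquire()
    try:
        return app.handle(request)
    finally:
        pool.release(app)        # instance goes back for reuse
```

Since an instance is only ever held by one thread at a time, the application objects themselves need no locking, and the pool grows only as far as the peak number of concurrent requests.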
Anyway, the point is that regardless of your precise caching scenario, it
certainly can be implemented using the overall framework provided. I
haven't discussed the mapping layer's semantics yet, but it should be a
pretty straightforward process to implement a mapping API that gives you
some sort of support for sharing an in-process cache, should you decide
that the "application pool" approach consumes too much memory (i.e. a
cached copy of the object for each thread).
Last, but not least, I should note that use of threads in Python
applications is in general more of a way to *burn* processor time than to
save it. Don't forget the GIL!
> > It's a rare business application that has a use case for actual
> > concurrency (as opposed to asynchrony).
>
>It's not so much a question of concurrency vs asynchrony as it is one of
>overhead; plenty of enterprise apps could benefit from having a variety
>of lifecycle* mechanisms available. When 100 users request the same
>dynamic page (which takes 1 second to build) within a 1-minute window,
>one can't help but wonder if there is a way to avoid the continuous
Hm. If that dynamic page is *really* the same, then shouldn't web-tier
caching (e.g. via the Expires: header) be able to handle some of that? Or
for that matter, why not build a static page and redirect to it?
I'm not saying that you shouldn't cache in the application tier; I'm just
saying that if the page really is the same for everyone, then web-tier
caching seems to me the appropriate tool.
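For instance, here's a minimal WSGI sketch (names and content illustrative) that sets an Expires: header so browsers and proxies absorb the repeat requests themselves:

```python
from wsgiref.handlers import format_date_time
import time

def app(environ, start_response):
    body = b"<html><body>system table page</body></html>"
    headers = [
        ("Content-Type", "text/html"),
        ("Content-Length", str(len(body))),
        # Tell browsers/proxies the page is fresh for 60 seconds,
        # so repeat requests within that window never reach the app
        ("Expires", format_date_time(time.time() + 60)),
    ]
    start_response("200 OK", headers)
    return [body]
```

With 100 users hitting the same page inside a minute, the app builds it once and the caches serve the rest.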
>processes of 1) database reads, 2) ORM DB-to-object creation and
>coercion overhead, and 3) workspace creation and population overhead. In
>Python, these options are limited because calls to DB's (written in
>static lang's) often still outperform naive caching implemented in pure
>Python.
Actually, my experience has been that for Python web applications, page
*rendering* (i.e. the string processing to create the page) dominates
response time in the common case. If database retrieval and conversion to
objects dominates response time, it seems likely that there's a problem
with your database! (Or the application is not making optimal use of the
database.)
>So sharing such intermediate caches is not feasible for a good
>chunk of your data; a tool like the one I described would go a long way
>to helping that situation. And I won't even mention the Prevalence folks
>who have use cases where *all* your data fits in memory. Oops, I just
>mentioned them. Dang.
Workspaces are fine for prevalence, since prevalence requires all commands
to be serialized. So, with a worker thread that applies commands it reads
from a queue, a single in-memory workspace suffices. This is another
example of a common use case that doesn't need or want any "thread safety"
overhead.
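A sketch of that serialized-command arrangement (names illustrative): any thread may enqueue a command, but only the single worker thread ever touches the workspace, so the workspace itself needs no locking at all:

```python
import queue
import threading

workspace = {}                    # the single in-memory "workspace"
commands = queue.Queue()

def worker():
    while True:
        cmd = commands.get()
        if cmd is None:           # sentinel: shut down the worker
            break
        cmd(workspace)            # commands are applied strictly one at a time
        commands.task_done()

t = threading.Thread(target=worker)
t.start()

# Producer threads enqueue commands; only the worker mutates state.
commands.put(lambda ws: ws.__setitem__("answer", 42))
commands.put(None)
t.join()
```

The queue provides all the synchronization; the "thread safety" cost is paid once at the queue boundary instead of on every object access.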
>I've tried to write dejavu in such a way that such lifecycle decisions
>are deferred from developers to deployers, because:
>
>1) The number of possible combinations of DB's (or other storage),
>webservers, GUI's, platforms, distributed object utopian nightmares(!)
>and plain 'ol app-specific needs is vast, and
>2) Most developers who use your framework won't notice they have
>concurrency issues until their app has been deployed.
>
>I've tried to give deployers the tools to quickly and easily test which
>concurrency/lifecycle mechanisms work best for their particular
>combination of components; significantly, this is configurable per
>domain object, not per app.
Keep in mind that code that simply retrieves objects from a workspace and
modifies them (or adds them or deletes them) doesn't care how the workspace
internally manages the objects' lifecycles or caching or anything
else. Such concerns are actually properly part of what I've been calling
the "mapping layer".
A mapping layer directs the transformation from abstract model classes to
concrete implementation classes, and registers operations with the
workspace's generic functions to perform all necessary loading, saving,
cache management, etc.
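As a rough illustration of that division of labor (this is not actual PEAK code, and a plain registry stands in for PEAK's generic functions): the workspace knows nothing about storage, and the mapping layer installs the operations at setup time:

```python
class Workspace:
    """Storage-agnostic workspace; operations are supplied externally."""
    def __init__(self):
        self._ops = {}            # operation name -> callable

    def register(self, name, func):
        self._ops[name] = func    # mapping layer installs its operations here

    def load(self, oid):
        return self._ops["load"](oid)

class InMemoryMapping:
    """A 'trivial' mapping layer: everything lives in a dict."""
    def __init__(self):
        self._store = {"1": {"name": "example"}}

    def configure(self, ws):
        ws.register("load", self._store.__getitem__)
```

Swapping in a relational mapping layer would mean registering different load/save callables (driven by a schema), while consumer code keeps calling the same workspace methods.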
Initially, I'll be developing a "trivial" mapping layer that does
everything in-memory, providing no support for loading, saving, locking, etc.
A "schema" is effectively an instance of a mapping layer that includes
metadata to drive the mapping to the underlying storage machinery. If a
mapping layer can determine everything it needs to know from the model
classes, it will not need a schema. But e.g. a relational database mapping
layer will need a schema to configure it. (Whereas a mapping layer that
reads and writes XMI files might not need any data but the classes themselves.)
Anyway, the idea is that it should be possible to have more or less one
Workspace class used for all workspaces, but instances will have generic
functions that are configured by the mapping layer to do the right thing
for the specific schema and other issues.
Mapping layers/schemas should also be extensible by the deployer (i.e.
whoever controls the app's .ini files!) to install additional policies for
tuning.