[PEAK] The path of model and storage
Phillip J. Eby
pje at telecommunity.com
Wed Jul 28 18:37:46 EDT 2004
At 02:21 PM 7/28/04 -0700, Robert Brewer wrote:
> > ...conflicts will be resolved via the physical database's
> > locking facilities.
> >
> > So, there's absolutely no need to duplicate that logic, either at the
> > consumer code level or in the framework, unless for some
> > reason (that I can't presently fathom) you *want* to.
>
>To save processor time. I'd say about half of my data benefits from
>being retained in memory, as fully-formed objects, in a single process
>throughout the lifetime of the app. At the same time, it's the sort of
>data which can't simply be hard-coded--it needs to be user-configurable
>like other domain objects, and benefits greatly from the same framework
>tools that other domain objects enjoy. I'm talking about things like
>system tables for populating lists, which have lots of reads (sometimes
>every page view) but aren't _quite_ immutable; the users might have some
>interface for occasionally mutating the list. MRU's also fall in here,
>as does most of the content on any random Amazon page.
Ah. If I understand your use case correctly, this need is met by
workspaces with "longer-than-a-transaction" cache lifetimes. In PEAK
today, you'd do this with a DM whose 'resetStatesAfterTxn' flag is
False. (That is, don't ghostify the objects when a transaction is finished.)
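To make the ghostify-vs-keep distinction concrete, here is a toy sketch. It is not PEAK's actual DM class; only the 'resetStatesAfterTxn' flag name comes from the above, and the rest of the names are illustrative:

```python
class Obj:
    def __init__(self):
        self.state = "loaded"

class DM:
    resetStatesAfterTxn = True   # default: ghostify cached objects after each txn

    def __init__(self):
        self.cache = {1: Obj()}

    def finishTxn(self):
        if self.resetStatesAfterTxn:
            for ob in self.cache.values():
                ob.state = "ghost"   # drop loaded state; reload lazily on next access

class LongLivedDM(DM):
    resetStatesAfterTxn = False      # cached objects stay "live" across txns
```

With the flag False, the second transaction (and every one after it) reuses the fully-formed objects instead of paying the load cost again.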
For multithreaded scenarios, the idea is that you use an "application
pool". An application pool is just a list of zero or more top-level
application objects. When a worker thread picks up a request, it also pops
an application from the pool, or creates a fresh instance if the pool is
empty. At the end of the request, the worker thread returns whatever
application instance it used to the pool (via pool.append()). This of
course automatically and incrementally grows the application pool up to the
number of concurrent threads. Each application instance then has its
own workspace, and therefore its own cache, containing "live"
objects. And, with appropriate cache invalidation support, all cache
instances can clean themselves if any thread or process commits a change to
these long-lived objects. (Of course, to do this, there will have to be a
cache invalidation message queueing service, and it will have to be
thread-safe. But all this is for later implementation, anyway.)
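The pop/append pattern above can be sketched in a few lines. The class and method names here are illustrative, not PEAK API; the only synchronization needed is around the pool's own list:

```python
import threading

class Application:
    """Stand-in for a top-level app object with its own workspace/cache."""
    def __init__(self):
        self.cache = {}          # per-instance "workspace" cache of live objects

    def handle(self, request):
        return "handled %s" % request

class AppPool:
    def __init__(self, factory):
        self._factory = factory
        self._pool = []
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if self._pool:
                return self._pool.pop()
        return self._factory()   # pool empty: create a fresh instance

    def release(self, app):
        with self._lock:
            self._pool.append(app)

def worker(pool, request):
    app = pool.acquire()
    try:
        return app.handle(request)
    finally:
        pool.release(app)        # instance goes back for reuse
```

Since an instance is only ever held by one thread at a time, the application objects themselves need no locking, and the pool grows only as far as the peak number of concurrent requests.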
Anyway, the point is that regardless of your precise caching scenario, it
certainly can be implemented using the overall framework provided. I
haven't discussed the mapping layer's semantics yet, but it should be a
pretty straightforward process to implement a mapping API that gives you
some sort of support for sharing an in-process cache, should you decide
that the "application pool" approach consumes too much memory (i.e. a
cached copy of the object for each thread).
Last, but not least, I should note that use of threads in Python
applications is in general more of a way to *burn* processor time than to
save it. Don't forget the GIL!
> > It's a rare business application that has a use case for actual
> > concurrency (as opposed to asynchrony).
>
>It's not so much a question of concurrency vs asynchrony as it is one of
>overhead; plenty of enterprise apps could benefit from having a variety
>of lifecycle* mechanisms available. When 100 users request the same
>dynamic page (which takes 1 second to build) within a 1-minute window,
>one can't help but wonder if there is a way to avoid the continuous
Hm. If that dynamic page is *really* the same, then shouldn't web-tier
caching (e.g. via the Expires: header) be able to handle some of that? Or
for that matter, why not build a static page and redirect to it?
I'm not saying that you shouldn't cache in the application tier; I'm just
saying that if the page really is the same for everyone, then web-tier
caching seems to me the appropriate tool.
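For instance, here's a minimal WSGI sketch (names and content illustrative) that sets an Expires: header so browsers and proxies absorb the repeat requests themselves:

```python
from wsgiref.handlers import format_date_time
import time

def app(environ, start_response):
    body = b"<html><body>system table page</body></html>"
    headers = [
        ("Content-Type", "text/html"),
        ("Content-Length", str(len(body))),
        # Tell browsers/proxies the page is fresh for 60 seconds,
        # so repeat requests within that window never reach the app
        ("Expires", format_date_time(time.time() + 60)),
    ]
    start_response("200 OK", headers)
    return [body]
```

With 100 users hitting the same page inside a minute, the app builds it once and the caches serve the rest.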
>processes of 1) database reads, 2) ORM DB-to-object creation and
>coercion overhead, and 3) workspace creation and population overhead. In
>Python, these options are limited because calls to DB's (written in
>static lang's) often still outperform naive caching implemented in pure
>Python.
Actually, my experience has been that for Python web applications, page
*rendering* (i.e. the string processing to create the page) dominates
response time in the common case. If database retrieval and conversion to
objects dominates response time, it seems likely that there's a problem
with your database! (Or the application is not making optimal use of the
database.)
>So sharing such intermediate caches is not feasible for a good
>chunk of your data; a tool like the one I described would go a long way
>to helping that situation. And I won't even mention the Prevalence folks
>who have use cases where *all* your data fits in memory. Oops, I just
>mentioned them. Dang.
Workspaces are fine for prevalence, since prevalence requires all commands
to be serialized. So, with a worker thread that applies commands it reads
from a queue, a single in-memory workspace suffices. This is another
example of a common use case that doesn't need or want any "thread safety"
overhead.
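A sketch of that serialized-command arrangement (names illustrative): any thread may enqueue a command, but only the single worker thread ever touches the workspace, so the workspace itself needs no locking at all:

```python
import queue
import threading

workspace = {}                    # the single in-memory "workspace"
commands = queue.Queue()

def worker():
    while True:
        cmd = commands.get()
        if cmd is None:           # sentinel: shut down the worker
            break
        cmd(workspace)            # commands are applied strictly one at a time
        commands.task_done()

t = threading.Thread(target=worker)
t.start()

# Producer threads enqueue commands; only the worker mutates state.
commands.put(lambda ws: ws.__setitem__("answer", 42))
commands.put(None)
t.join()
```

The queue provides all the synchronization; the "thread safety" cost is paid once at the queue boundary instead of on every object access.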
>I've tried to write dejavu in such a way that such lifecycle decisions
>are deferred from developers to deployers, because:
>
>1) The number of possible combinations of DB's (or other storage),
>webservers, GUI's, platforms, distributed object utopian nightmares(!)
>and plain 'ol app-specific needs is vast, and
>2) Most developers who use your framework won't notice they have
>concurrency issues until their app has been deployed.
>
>I've tried to give deployers the tools to quickly and easily test which
>concurrency/lifecycle mechanisms work best for their particular
>combination of components; significantly, this is configurable per
>domain object, not per app.
Keep in mind that code that simply retrieves objects from a workspace and
modifies them (or adds them or deletes them) doesn't care how the workspace
internally manages the objects' lifecycles or caching or anything
else. Such concerns are actually properly part of what I've been calling
the "mapping layer".
A mapping layer directs the transformation from abstract model classes to
concrete implementation classes, and registers operations with the
workspace's generic functions to perform all necessary loading, saving,
cache management, etc.
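As a rough illustration of that division of labor (this is not actual PEAK code, and a plain registry stands in for PEAK's generic functions): the workspace knows nothing about storage, and the mapping layer installs the operations at setup time:

```python
class Workspace:
    """Storage-agnostic workspace; operations are supplied externally."""
    def __init__(self):
        self._ops = {}            # operation name -> callable

    def register(self, name, func):
        self._ops[name] = func    # mapping layer installs its operations here

    def load(self, oid):
        return self._ops["load"](oid)

class InMemoryMapping:
    """A 'trivial' mapping layer: everything lives in a dict."""
    def __init__(self):
        self._store = {"1": {"name": "example"}}

    def configure(self, ws):
        ws.register("load", self._store.__getitem__)
```

Swapping in a relational mapping layer would mean registering different load/save callables (driven by a schema), while consumer code keeps calling the same workspace methods.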
Initially, I'll be developing a "trivial" mapping layer that does
everything in-memory, providing no support for loading, saving, locking, etc.
A "schema" is effectively an instance of a mapping layer that includes
metadata to drive the mapping to the underlying storage machinery. If a
mapping layer can determine everything it needs to know from the model
classes, it will not need a schema. But e.g. a relational database mapping
layer will need a schema to configure it. (Whereas a mapping layer that
reads and writes XMI files might not need any data but the classes themselves.)
Anyway, the idea is that it should be possible to have more or less one
Workspace class used for all workspaces, but instances will have generic
functions that are configured by the mapping layer to do the right thing
for the specific schema and other issues.
Mapping layers/schemas should also be extensible by the deployer (i.e.
whoever controls the app's .ini files!) to install additional policies for
tuning.