[PEAK] The path of model and storage
Phillip J. Eby
pje at telecommunity.com
Wed Jul 28 13:51:47 EDT 2004
At 01:53 AM 7/28/04 -0400, Phillip J. Eby wrote:
>Here's a rough plan for the first stages of how I want to move towards
>implementing "workspace-based" object mappings in peak.model and storage,
>in the form of an list of "stories" or requirements to be
>implemented. I've tried to order them approximately in the order I would
>write the tests to develop them in a test-driven way. As you'll see, this
>list is quit concrete up to a point, and then fizzles out into general
>thoughts and directions that hopefully will be clearer by the time I get
>that far. :)
Wow, I really shouldn't write these things at 2am. That should be "in the
form of *a* list", and "this list is *quite* concrete". :)
>* Classes retrieved from a workspace or any module component thereof, have
>a 'find()' classmethod that iterates over all created instances of that
>class or any of its subclasses (within the corresponding workspace)
This actually needs to be "mutable peak.model classes", not all
classes. Immutables such as structs and enumerations won't have find() or
get(). Indeed, anything other than modules and mutable peak.model classes
should be returned from the workspace (or any child module components)
completely unchanged. So that's another thing that should be on the list
of tests.
>* The workspace offers a 'delete()' method which may be used to dispose of
>an instance of a workspace-specific class. Once deleted, the instance is
>no longer returned by 'find()'.
It should also be noted that, once an instance is created, it will always
be yielded by find() until delete() is called, even if no outside
references to the object are kept. And once delete() is called, any
attempt to use the object should result in an error. (E.g. perhaps its
type should be changed to a subclass with a 'DeletedObject' mixin that
raises an error for any operation, including attribute access.)
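
The find()/delete() behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not peak.model code: the toy `Workspace` class, its `create()` method, and `DeletedObjectError` are hypothetical names, and the mixin here only blocks attribute access (implicit special-method lookups such as repr() bypass `__getattribute__`, so "any operation" would need more work).

```python
class DeletedObjectError(Exception):
    """Hypothetical error raised when a deleted object is used."""


class DeletedObject(object):
    """Mixin that rejects attribute reads and writes on deleted instances."""

    def __getattribute__(self, name):
        raise DeletedObjectError("object was deleted from its workspace")

    def __setattr__(self, name, value):
        raise DeletedObjectError("object was deleted from its workspace")


class Workspace(object):
    """Toy workspace that keeps a strong reference to every instance."""

    def __init__(self):
        self._instances = []

    def create(self, cls, **attrs):
        # Hypothetical factory; the post does not say how instances are made.
        obj = cls()
        for name, value in attrs.items():
            setattr(obj, name, value)
        self._instances.append(obj)
        return obj

    def find(self, cls):
        # Yields instances of cls or of any subclass, even if the caller
        # kept no reference of its own.
        return (o for o in list(self._instances) if isinstance(o, cls))

    def delete(self, obj):
        self._instances.remove(obj)
        cls = type(obj)
        # Swap the instance's type for a subclass carrying the mixin.
        deleted_cls = type("Deleted" + cls.__name__, (DeletedObject, cls), {})
        object.__setattr__(obj, "__class__", deleted_cls)
```

Because the workspace holds the only required reference, find() keeps yielding an instance until delete() is called on it.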
>* Keys can have an estimated multiplicity, which may be 1 or higher. (1
>means unique, values higher than 1 are an estimate which may be used to
>drive caching rules)
Clarification: this is to support caching of certain types of
queries. But, given that the vast majority of the system's functionality
depends on unique keys, this should probably be simplified to deal only in
unique keys for now. We can always add support for non-unique keys and
indexes later.
>Workspace SPI
>-------------
>
>* Cache objects or lists thereof by key
>
>* Clear all cache entries touched during a transaction
>
>* Track instances modified during a transaction, and whether they've been
>saved
It's not clear to me yet whether workspaces' lifecycle will be directly
linked to their transaction service, or whether they will be more like
subtransactions. Notably, document-style persistence doesn't require a
workspace to look like a transaction participant, and it may be more useful
to have capabilities like a command history with undo/redo. For fact-base
persistence, a transaction could then simply be modeled as one large
command, that is undone if the transaction is rolled back. Having a
command mechanism is also attractive if one is implementing a "prevalent"
workspace, or creating a GUI that works on in-memory objects.
In fact, using commands and mementos is helpful even for implementing
fact-base persistence, in that an object's state can be compared with its
state at the beginning of the transaction, in order to decide what updates
should be done. Deleted and created objects could be tracked by the
command as well. For document persistence, an entire editing session could
be represented by a top-level command, retaining its undo history as
sub-commands. Policy for things like how much undo history to keep can be
encapsulated in the command object, so the workspace logic doesn't need to
get involved with any of that. The workspace just has a "current command"
(which may be a "null command") that objects talk to when they are
inserted/deleted/changed.
If command objects provide a notification interface for undo/redo, then
arbitrary system actions (like updating indexes or caches) can be tied to
the command using closures. (Although a special event source would be
required for 'undo', since it needs to fire callbacks in *reverse* from the
order they were added to the event.) Also, if the workspace provides an
event interface for objects added/changed/deleted, then command objects
that care can track these events to construct their history. A "null
command" implementation would simply ignore undo/redo notification
requests, and also ignore add/change/delete events.
What should this command interface look like? Presumably the workspace
will have a command stack, and offer a method to execute a
command. Perhaps a top-level command is supplied when the workspace is
created, but then we are not "executing" that command. So maybe the
difference is that we have ICommand and ICommandManager, where ICommand is
an ICommandManager that also can be executed. That is, an ICommandManager
doesn't have an 'execute()' method, because it doesn't know "what to
do". It represents a session or transaction of arbitrary behavior. An
ICommand has an 'execute()' method to carry out some specific operation,
and is used only by applications that have commands of that sort, or is
used to represent system operations like indexing or cache management.
Example: you create a workspace and modify an object. Modifying the object
means an in-memory index needs updating, so the 'around' method that
updates the index creates an IndexUpdateCommand and calls 'ws.run(cmd)' to
run the command. The command is added to the current command's undo
history, if any, and the index gets updated.
Hm. I don't like that. It seems to make more sense to have the
index-updating logic register a memento with the current command, than to
do the whole nested commands thing for stuff like this. The command
hierarchy logic should be strictly an application-specific thing;
workspaces should just keep track of what the "current command"
is. Commands then supply a memento registration interface and undo/redo
events. That should be all. Indeed, the memento registration interface
could just be a dictionary at a known attribute name, e.g.
'command.mementos'. Generally speaking, any operation that affects state
would check for a memento for that state in the current command. If there
is a memento, just proceed with the operation. If not, create the memento
and register an undo/redo closure that swaps the memento with the current
state. (Note that this sort of assumes that commands know whether they've
been undone or redone, at least if there is an undo/redo history, as
opposed to a single-command undo/redo.)
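
A minimal sketch of the memento idea above, assuming a plain dictionary at `command.mementos` and a list of swap closures. The `Command` class, its `register()` method, the `undone` flag, and the `set_attr()` helper are hypothetical names for illustration; undo fires the closures in reverse registration order, redo in forward order.

```python
class Command(object):
    """Toy command: a memento dictionary plus undo/redo callbacks."""

    def __init__(self):
        self.mementos = {}      # state key -> saved state
        self._callbacks = []    # swap closures, in registration order
        self.undone = False     # the command knows whether it was undone

    def register(self, callback):
        self._callbacks.append(callback)

    def undo(self):
        if self.undone:
            raise RuntimeError("command is already undone")
        # Undo must run callbacks in reverse of the order they were added.
        for callback in reversed(self._callbacks):
            callback()
        self.undone = True

    def redo(self):
        if not self.undone:
            raise RuntimeError("command has not been undone")
        for callback in self._callbacks:
            callback()
        self.undone = False


def set_attr(command, obj, name, value):
    """Change obj.name, saving a memento the first time it is touched.

    Assumes the attribute already exists on obj.
    """
    key = (id(obj), name)   # the closure below keeps obj alive
    if key not in command.mementos:
        command.mementos[key] = getattr(obj, name)

        def swap():
            # Exchange the saved state with the current state.
            current = getattr(obj, name)
            setattr(obj, name, command.mementos[key])
            command.mementos[key] = current

        command.register(swap)
    setattr(obj, name, value)
```

Since the same closure serves both directions, repeated changes to one attribute within a command cost a single memento and a single callback.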
Okay, cool. We'll use mementos for everything. We don't actually need
object add/delete/change events on the workspace, either. The generic
functions that wrap those operations to perform other state updates (such
as indexing, caching, and GUI notifications) will just interact with the
workspace's current command to mementoize and register for
undo/redo. Operations like flushing output to a database will read the
appropriate mementos and delete them.
Hm, that's not quite accurate. Database flushing should be done using a
nested command object, that represents "changes since last DB flush". When
a flush occurs, the command either resets itself or replaces itself with a
fresh command object. The mementos from the flushed command are then
migrated to the parent command. By migrated, I mean, "added to the parent
memento dictionary, unless the corresponding memento already exists in the
parent."
So, we can undo-to-last-flush by executing the flush command's undo()
method. And we can undo the whole transaction by reverting to the parent's
mementos as well.
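
The migration rule can be sketched like this, assuming mementos live in a plain dictionary on each command. The `Command` class, the `flush()` function, and the `write` callable are hypothetical; a real flush would compare mementos against current state to decide what to send to the database.

```python
class Command(object):
    """Toy command holding a memento dictionary and a parent link."""

    def __init__(self, parent=None):
        self.parent = parent
        self.mementos = {}


def flush(command, write):
    """Flush pending changes and return a fresh "since last flush" command.

    `write` is a stand-in for whatever sends the changes to the database.
    """
    write(command.mementos)
    for key, memento in command.mementos.items():
        # Keep the parent's memento if it already has one: it holds the
        # older state, which is the one a full rollback must restore.
        command.parent.mementos.setdefault(key, memento)
    return Command(command.parent)
```

With this rule the parent always ends up holding the state as of the start of the transaction, while each flush command only holds the state as of the previous flush.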
It may be that some operations want to use diffs rather than mementos,
though. (E.g. due to data size issues.) Can we accommodate that, even in
the nested command scenario? I guess that as long as "migration" transfers
the undo/redo callbacks to the parent command, it'll work.
Whew. It's all rather complex, but not necessarily complicated. :) That
is, it seems to be pretty decomposable, since each "concern" (such as
saving to a database or updating an in-memory cache) just uses the command
object to hold things or call things back, and the workspace just keeps
track of the current command.
Indeed, the workspace doesn't really even need a command stack. Commands
can just know their "parent command", and when they finish an operation
they just set 'ws.currentCommand = self.parentCommand', or some
such. (After doing memento/event migration, in the case of a db-flush
command.) Actually, self.parentCommand.finish(self) might be more
like it, since the parent command may wish to add the child command to its
undo history. (Assuming the command is undoable, which needs a flag, and
perhaps a 'cantUndo()' method that operations can call on the current
command if they are not undoable.)
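
A rough sketch of the stack-free arrangement, assuming each command knows its parent and the workspace only tracks `currentCommand`. The `begin()` and `done()` methods and the `undoable` and `history` attributes are hypothetical, and simply leaving non-undoable children out of the history is one possible policy, not a settled one.

```python
class Workspace(object):
    """Toy workspace: no command stack, just the current command."""

    def __init__(self):
        self.currentCommand = Command(self)   # top-level command


class Command(object):
    def __init__(self, ws, parentCommand=None):
        self.ws = ws
        self.parentCommand = parentCommand
        self.undoable = True    # the flag mentioned above
        self.history = []       # finished, undoable child commands

    def cantUndo(self):
        # Called by an operation that cannot be undone.
        self.undoable = False

    def begin(self):
        # Hypothetical helper: start a child and make it current.
        child = Command(self.ws, self)
        self.ws.currentCommand = child
        return child

    def done(self):
        # Hypothetical helper: hand control back to the parent.
        self.parentCommand.finish(self)

    def finish(self, child):
        # The parent decides whether the child joins its undo history.
        if child.undoable:
            self.history.append(child)
        self.ws.currentCommand = self
```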
This might even help with designing a coherent locking strategy, in the
sense that locking is somewhat like a flushable database operation. The
difference is that the memento or callback representing "things to unlock"
has to always get added to the command object representing the transaction
as a whole, rather than to the current "things to write to the DB" command
object. (Because all database locks must be held until the transaction as
a whole commits or aborts.)
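
The locking rule might look something like the following, assuming commands know their parent. The `root()` method, the `locks` list, and the `acquire()` and `release_all()` helpers are hypothetical, and `threading.Lock` merely stands in for a database lock.

```python
import threading


class Command(object):
    def __init__(self, parentCommand=None):
        self.parentCommand = parentCommand
        self.locks = []     # locks to release when this command ends

    def root(self):
        # Walk up to the command representing the whole transaction.
        cmd = self
        while cmd.parentCommand is not None:
            cmd = cmd.parentCommand
        return cmd


def acquire(current, lock):
    """Take a lock and record it on the transaction, not on `current`."""
    lock.acquire()
    current.root().locks.append(lock)


def release_all(txn):
    """Release every recorded lock at commit or abort, newest first."""
    while txn.locks:
        txn.locks.pop().release()
```

Recording the lock on the root means a database flush can discard or replace its own command without releasing anything early.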
>Directions for Enhancement
>--------------------------
>
>* 'find()' and 'get()' should allow other criteria besides value equality
>
>* query language
>
>* time-to-live caching support, including clear after each txn. (Note that
>some kind of cleanup is needed, because w/out Persistent, circular
>references in the cache will be retained indefinitely.)
I think these still have to be left as "future directions" for now. We'll
see how the base system develops first, and then have a better idea of the
requirements.
As for the last three items...
>* save mementos?
>
>* locking
>
>* event hooks (fire when object(s) change)
...I think I've fleshed out the issues pretty well above, except that I
should mention that as of this moment, I'm thinking that the "event hooks",
if any, will simply be provided via access to the generic functions that are called
by the workspace when objects are added, deleted, or changed. Since in
general only GUI apps will need this, they should be the only ones who pay
for the overhead.
One final issue that I haven't covered is constraints and
validation. We'll need to flesh that side of peak.model out a bit, using
generic functions, and the workspace (or at any rate, its current command),
will need to be able to manage a set of "active constraint
violations". Either that or commands may need to be able to request
validation of objects that were changed during the command's
execution. Maybe both.
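
As a very rough sketch of the "active constraint violations" idea, a command could keep a dictionary of current violations and re-check the objects changed during its execution. Everything here is hypothetical (the `changed` and `violations` attributes, the `validate()` function, and rules expressed as plain callables returning a message or None); the post leaves the actual design open.

```python
class Command(object):
    def __init__(self):
        self.changed = []       # objects changed during this command
        self.violations = {}    # (id(obj), rule name) -> message


def validate(command, rules):
    """Re-check changed objects; collect violations instead of raising.

    `rules` maps a rule name to a callable that returns an error message,
    or None when the object satisfies the rule.
    """
    for obj in command.changed:
        for name, check in rules.items():
            key = (id(obj), name)
            message = check(obj)
            if message is None:
                command.violations.pop(key, None)   # issue resolved
            else:
                command.violations[key] = message
    return sorted(command.violations.values())
```

A UI could display the returned list as its "current issues" and refresh it after each change, rather than stopping at the first error.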
Anyway, if validation can be integrated with the command framework, we'll
be able to do smart error reporting at the UI level. For example, imagine
an "apply for a loan" GUI that displays a list of current issues that would
(or might) keep the loan from being approved, or an IDE that lists style
issues and possible errors in your code. In other words, validation and
business rules in the real world are necessarily *much* more complex than
just "raise an error when you change the field", and I want PEAK to reflect
that at a very deep level. It's part of what will make PEAK a truly
"enterprise-class" framework.