[PEAK] The path of model and storage

Wed Jul 28 13:51:47 EDT 2004

At 01:53 AM 7/28/04 -0400, Phillip J. Eby wrote:
>Here's a rough plan for the first stages of how I want to move towards 
>implementing "workspace-based" object mappings in peak.model and storage, 
>in the form of an list of "stories" or requirements to be 
>implemented.  I've tried to order them approximately in the order I would 
>write the tests to develop them in a test-driven way.  As you'll see, this 
>list is quit concrete up to a point, and then fizzles out into general 
>thoughts and directions that hopefully will be clearer by the time I get 
>that far.  :)

Wow, I really shouldn't write these things at 2am.  That should be "in the 
form of *a* list", and "this list is *quite* concrete".  :)

>* Classes retrieved from a workspace or any module component thereof, have 
>a 'find()' classmethod that iterates over all created instances of that 
>class or any of its subclasses (within the corresponding workspace)

This actually needs to be "mutable peak.model classes", not all 
classes.  Immutables such as structs and enumerations won't have find() or 
get().  Indeed, anything other than modules and mutable peak.model classes 
should be returned from the workspace (or any child module components) 
completely unchanged.  So that's another thing that should be on the list 
of tests.

>* The workspace offers a 'delete()' method which may be used to dispose of 
>an instance of a workspace-specific class.  Once deleted, the instance is 
>no longer returned by 'find()'.

It should also be noted that, once an instance is created, it will always 
be yielded by find() until delete() is called, even if no outside 
references to the object are kept.  And once delete() is called, any 
attempt to use the object should result in an error.  (E.g. perhaps its 
type should be changed to a subclass with a 'DeletedObject' mixin that 
raises an error for any operation, including attribute access.)
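
A minimal sketch of the find()/delete() behavior described above, in
modern Python.  Workspace.create(), DeletedObjectError, and Customer are
hypothetical names for illustration, not peak.model API; only the
'DeletedObject' mixin name comes from the text.  The toy workspace keeps
its own strong references, and delete() retypes the instance so that later
attribute access fails:

```python
class DeletedObjectError(Exception):
    """Raised when a deleted object is used (hypothetical name)."""


class DeletedObject(object):
    """Mixin that turns every attribute operation into an error."""

    def __getattribute__(self, name):
        raise DeletedObjectError("object has been deleted")

    def __setattr__(self, name, value):
        raise DeletedObjectError("object has been deleted")

    def __delattr__(self, name):
        raise DeletedObjectError("object has been deleted")


class Workspace(object):
    """Toy workspace: owns every instance it creates."""

    def __init__(self):
        self._instances = []

    def create(self, cls, *args, **kw):
        obj = cls(*args, **kw)
        self._instances.append(obj)  # alive even with no outside references
        return obj

    def find(self, cls):
        # Yields instances of cls and of any of its subclasses
        return (ob for ob in self._instances if isinstance(ob, cls))

    def delete(self, obj):
        self._instances = [ob for ob in self._instances if ob is not obj]
        cls = type(obj)
        # Retype the instance to a subclass that mixes in DeletedObject
        obj.__class__ = type("Deleted" + cls.__name__, (DeletedObject, cls), {})


class Customer(object):
    def __init__(self, name):
        self.name = name


ws = Workspace()
ann = ws.create(Customer, "Ann")
bob = ws.create(Customer, "Bob")
ws.delete(ann)
```

After this runs, find(Customer) yields only 'bob', and reading 'ann.name'
raises DeletedObjectError.  Operations that bypass attribute lookup on the
instance (such as repr()) would need their own overrides in the mixin.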

>* Keys can have an estimated multiplicity, which may be 1 or higher.  (1 
>means unique, values higher than 1 are an estimate which may be used to 
>drive caching rules)

Clarification: this is to support caching of certain types of 
queries.  But, given that the vast majority of the system's functionality 
depends on unique keys, this should probably be simplified to deal only in 
unique keys for now.  We can always add support for non-unique keys and 
indexes later.

>Workspace SPI
>-------------
>
>* Cache objects or lists thereof by key
>
>* Clear all cache entries touched during a transaction
>
>* Track instances modified during a transaction, and whether they've been 
>saved

It's not clear to me yet whether workspaces' lifecycle will be directly 
linked to their transaction service, or whether they will be more like 
subtransactions.  Notably, document-style persistence doesn't require a 
workspace to look like a transaction participant, and it may be more useful 
to have capabilities like a command history with undo/redo.  For fact-base 
persistence, a transaction could then simply be modeled as one large 
command, that is undone if the transaction is rolled back.  Having a 
command mechanism is also attractive if one is implementing a "prevalent" 
workspace, or creating a GUI that works on in-memory objects.

In fact, using commands and mementos is helpful even for implementing 
fact-base persistence, in that an object's state can be compared with its 
state at the beginning of the transaction, in order to decide what updates 
should be done.  Deleted and created objects could be tracked by the 
command as well.  For document persistence, an entire editing session could 
be represented by a top-level command, retaining its undo history as 
sub-commands.  Policy for things like how much undo history to keep can be 
encapsulated in the command object, so the workspace logic doesn't need to 
get involved with any of that.  The workspace just has a "current command" 
(which may be a "null command") that objects talk to when they are 
inserted/deleted/changed.

If command objects provide a notification interface for undo/redo, then 
arbitrary system actions (like updating indexes or caches) can be tied to 
the command using closures.  (Although a special event source would be
required for 'undo', since it needs to fire callbacks in the *reverse* of
the order they were added to the event.)  Also, if the workspace provides an
event interface for objects added/changed/deleted, then command objects 
that care can track these events to construct their history.  A "null 
command" implementation would simply ignore undo/redo notification 
requests, and also ignore add/change/delete events.
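
A small sketch of the two pieces just mentioned: an event source that fires
its callbacks last-registered-first, and a "null command" that ignores
registrations.  The names UndoEvent, subscribe(), fire(), onUndo() and
onRedo() are assumptions made for this example, not an existing PEAK
interface:

```python
class UndoEvent(object):
    """Event source that fires callbacks in reverse registration order."""

    def __init__(self):
        self._callbacks = []

    def subscribe(self, callback):
        self._callbacks.append(callback)

    def fire(self):
        # The most recently registered change is undone first
        for callback in reversed(self._callbacks):
            callback()


class NullCommand(object):
    """Command that keeps no history: every registration is dropped."""

    def onUndo(self, callback):
        pass

    def onRedo(self, callback):
        pass
```

Reverse order matters when a later change depends on an earlier one (say,
an index entry added for an object created earlier in the same command).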

What should this command interface look like?  Presumably the workspace 
will have a command stack, and offer a method to execute a 
command.  Perhaps a top-level command is supplied when the workspace is 
created, but then we are not "executing" that command.  So maybe the 
difference is that we have ICommand and ICommandManager, where ICommand is 
an ICommandManager that also can be executed.  That is, an ICommandManager 
doesn't have an 'execute()' method, because it doesn't know "what to 
do".  It represents a session or transaction of arbitrary behavior.  An
ICommand has an 'execute()' method to carry out some specific operation, 
and is used only by applications that have commands of that sort, or is 
used to represent system operations like indexing or cache management.

Example: you create a workspace and modify an object.  Modifying the object 
means an in-memory index needs updating, so the 'around' method that 
updates the index creates an IndexUpdateCommand and calls 'ws.run(cmd)' to 
run the command.  The command is added to the current command's undo 
history, if any, and the index gets updated.

Hm.  I don't like that.  It seems to make more sense to have the 
index-updating logic register a memento with the current command, than to 
do the whole nested commands thing for stuff like this.  The command 
hierarchy logic should be strictly an application-specific thing; 
workspaces should just keep track of what the "current command" 
is.  Commands then supply a memento registration interface and undo/redo 
events.  That should be all.  Indeed, the memento registration interface 
could just be a dictionary at a known attribute name, e.g. 
'command.mementos'.  Generally speaking, any operation that affects state 
would check for a memento for that state in the current command.  If there 
is a memento, just proceed with the operation.  If not, create the memento 
and register an undo/redo closure that swaps the memento with the current 
state.  (Note that this sort of assumes that commands know whether they've 
been undone or redone, at least if there is an undo/redo history, as 
opposed to a single-command undo/redo.)
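
The check-then-register step can be sketched as follows.  This is only an
illustration under stated assumptions: the Command class, its onUndo() and
onRedo() methods, the 'undone' flag, and the set_attr() helper are all
hypothetical; 'mementos' is the dictionary-at-a-known-attribute idea from
the paragraph above.

```python
class Command(object):
    """Holds mementos plus undo/redo closures (hypothetical interface)."""

    def __init__(self):
        self.mementos = {}   # state key -> value saved before first change
        self._undo = []
        self._redo = []
        self.undone = False  # the command knows whether it has been undone

    def onUndo(self, callback):
        self._undo.append(callback)

    def onRedo(self, callback):
        self._redo.append(callback)

    def undo(self):
        if not self.undone:
            for callback in reversed(self._undo):
                callback()
            self.undone = True

    def redo(self):
        if self.undone:
            for callback in self._redo:
                callback()
            self.undone = False


def set_attr(command, obj, name, value):
    """Change obj.name, creating a memento on the first change only."""
    key = (id(obj), name)   # the closure below keeps obj alive
    if key not in command.mementos:
        command.mementos[key] = getattr(obj, name)

        def swap():
            # Exchange the memento with the current state
            current = getattr(obj, name)
            setattr(obj, name, command.mementos[key])
            command.mementos[key] = current

        command.onUndo(swap)
        command.onRedo(swap)
    setattr(obj, name, value)
```

Because the same closure swaps in both directions, changing an attribute
from 1 to 2 to 3 under one command leaves a single memento: undo() restores
1 and redo() restores 3.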

Okay, cool.  We'll use mementos for everything.  We don't actually need 
object add/delete/change events on the workspace, either.  The generic 
functions that wrap those operations to perform other state updates (such 
as indexing, caching, and GUI notifications) will just interact with the 
workspace's current command to mementoize and register for 
undo/redo.  Operations like flushing output to a database will read the 
appropriate mementos and delete them.

Hm, that's not quite accurate.  Database flushing should be done using a 
nested command object, that represents "changes since last DB flush".  When 
a flush occurs, the command either resets itself or replaces itself with a 
fresh command object.  The mementos from the flushed command are then 
migrated to the parent command.  By migrated, I mean, "added to the parent 
memento dictionary, unless the corresponding memento already exists in the 
parent."
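
As a sketch, "migration" as defined here is a merge in which the parent's
existing entries win.  The function name migrate() is an assumption, and
mementos are taken to be plain dictionaries:

```python
def migrate(child_mementos, parent_mementos):
    """Move mementos to the parent, keeping any the parent already has."""
    for key, memento in child_mementos.items():
        # The parent's memento is older, so it must not be overwritten
        parent_mementos.setdefault(key, memento)
    child_mementos.clear()  # the flush command starts over empty


parent = {"a": 1}
flushed = {"a": 5, "b": 2}
migrate(flushed, parent)
```

Here 'parent' ends up as {"a": 1, "b": 2}: the value of "a" from the start
of the transaction survives, and "b" is recorded for the first time.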

So, we can undo-to-last-flush by executing the flush command's undo() 
method.  And we can undo the whole transaction by reverting to the parent's 
mementos as well.

It may be that some operations want to use diffs rather than mementos,
though.  (E.g. due to data size issues.)  Can we accommodate that, even in
the nested command scenario?  I guess that as long as "migration" transfers
the undo/redo callbacks to the parent command, it'll work.

Whew.  It's all rather complex, but not necessarily complicated.  :)  That 
is, it seems to be pretty decomposable, since each "concern" (such as 
saving to a database or updating an in-memory cache) just uses the command 
object to hold things or call things back, and the workspace just keeps 
track of the current command.

Indeed, the workspace doesn't really even need a command stack.  Commands 
can just know their "parent command", and when they finish an operation 
they just set 'ws.currentCommand = self.parentCommand', or some 
such.  (After doing memento/event migration, in the case of a db-flush 
command.)  Actually, self.parentCommand.finish(self) might be more like it,
since the parent command may wish to add the child command to its
undo history.  (Assuming the command is undoable, which needs a flag, and 
perhaps a 'cantUndo()' method that operations can call on the current 
command if they are not undoable.)
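
A sketch of the stackless arrangement, assuming hypothetical Workspace and
Command classes.  'currentCommand', 'parentCommand', finish() and
cantUndo() are the names used above; the 'undoable' flag, the 'history'
list and the done() method are made up for the example:

```python
class Workspace(object):
    def __init__(self):
        self.currentCommand = None


class Command(object):
    def __init__(self, ws):
        self.ws = ws
        self.parentCommand = ws.currentCommand  # no stack, just a link
        self.undoable = True
        self.history = []
        ws.currentCommand = self

    def cantUndo(self):
        """Called by operations that cannot be undone."""
        self.undoable = False

    def finish(self, child):
        """A child is done: maybe record it, then become current again."""
        if child.undoable:
            self.history.append(child)
        self.ws.currentCommand = self

    def done(self):
        if self.parentCommand is not None:
            self.parentCommand.finish(self)
        else:
            self.ws.currentCommand = None
```

The chain of 'parentCommand' links plays the role of the stack, and each
parent decides for itself what to keep in its undo history.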

This might even help with designing a coherent locking strategy, in the 
sense that locking is somewhat like a flushable database operation.  The 
difference is that the memento or callback representing "things to unlock" 
has to always get added to the command object representing the transaction 
as a whole, rather than to the current "things to write to the DB" command 
object.  (Because all database locks must be held until the transaction as 
a whole commits or aborts.)
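
To illustrate the difference, here is a sketch in which unlock callbacks
are always registered on the outermost command.  Everything named here
(Command, 'unlockers', transaction_command(), acquire(), end_transaction())
is hypothetical, and threading.Lock merely stands in for a database lock:

```python
import threading


class Command(object):
    def __init__(self, parentCommand=None):
        self.parentCommand = parentCommand
        self.unlockers = []  # callbacks that release held locks


def transaction_command(command):
    """Follow parent links up to the command for the whole transaction."""
    while command.parentCommand is not None:
        command = command.parentCommand
    return command


def acquire(current, lock):
    lock.acquire()
    # Registered on the transaction, not on the current flush command
    transaction_command(current).unlockers.append(lock.release)


def end_transaction(txn):
    """On commit or abort, release locks, most recently taken first."""
    while txn.unlockers:
        txn.unlockers.pop()()


txn = Command()
flush = Command(txn)
row_lock = threading.Lock()
acquire(flush, row_lock)
```

Resetting or replacing 'flush' after a DB flush leaves 'row_lock' held; it
is released only when end_transaction(txn) runs.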

>Directions for Enhancement
>--------------------------
>
>* 'find()' and 'get()' should allow other criteria besides value equality
>
>* query language
>
>* time-to-live caching support, including clear after each txn. (Note that 
>some kind of cleanup is needed, because w/out Persistent, circular 
>references in the cache will be retained indefinitely.)

I think these still have to be left as "future directions" for now.  We'll 
see how the base system develops first, and then have a better idea of the 
requirements.

As for the last three items...

>* save mementos?
>
>* locking
>
>* event hooks (fire when object(s) change)

...I think I've fleshed out the issues pretty well above, except that I 
should mention that as of this moment, I'm thinking that the "event hooks",
if any, will simply be provided by access to the generic functions that are
called by the workspace when objects are added, deleted, or changed.  Since in
general only GUI apps will need this, they should be the only ones who pay 
for the overhead.

One final issue that I haven't covered is constraints and 
validation.  We'll need to flesh that side of peak.model out a bit, using 
generic functions, and the workspace (or at any rate, its current command), 
will need to be able to manage a set of "active constraint 
violations".  Either that or commands may need to be able to request 
validation of objects that were changed during the command's 
execution.  Maybe both.
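
One possible shape for a set of "active constraint violations", sketched
with plain functions rather than generic functions.  The Command class, its
'violations' dictionary, the validate() helper, and the Loan example are
all assumptions for illustration, not a peak.model design:

```python
class Command(object):
    def __init__(self):
        # (id(obj), constraint name) -> message for each active violation
        self.violations = {}


def validate(command, obj, constraints):
    """Re-check obj, adding new violations and clearing resolved ones."""
    for name, check, message in constraints:
        key = (id(obj), name)
        if check(obj):
            command.violations.pop(key, None)
        else:
            command.violations[key] = message


class Loan(object):
    def __init__(self, amount, income):
        self.amount = amount
        self.income = income


LOAN_RULES = [
    ("positive", lambda loan: loan.amount > 0, "amount must be positive"),
    ("affordable", lambda loan: loan.amount <= 5 * loan.income,
     "amount exceeds five times income"),
]

cmd = Command()
loan = Loan(amount=600000, income=100000)
validate(cmd, loan, LOAN_RULES)   # records the "affordable" violation
loan.amount = 400000
validate(cmd, loan, LOAN_RULES)   # the violation is cleared again
```

Nothing raises when a rule is broken; the violation is recorded, and it
disappears once the object is changed and re-validated.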

Anyway, if validation can be integrated with the command framework, we'll 
be able to do smart error reporting at the UI level.  For example, imagine 
an "apply for a loan" GUI that displays a list of current issues that would 
(or might) keep the loan from being approved, or an IDE that lists style 
issues and possible errors in your code.  In other words, validation and 
business rules in the real world are necessarily *much* more complex than 
just "raise an error when you change the field", and I want PEAK to reflect 
that at a very deep level.  It's part of what will make PEAK a truly 
"enterprise-class" framework.