The Urakawa SDK (Software Development Kit)

Reference Guide for the Data Model and API (Application Programmer Interface)

Author/Editor : Daniel WECK
Last revision : 2008 May 15

Table of Contents

  1. Introduction
    1. Project Organization
    2. Intended Purpose
    3. Java and UML
  2. Key Chapters
    1. Data Model Genericity
    2. How to read the Data Model
    3. Persistence with XUK
    4. Object Creation using Factories
    5. Specialized Object Managers
    6. Event Notification
    7. Undo-Redo, Commands
    8. Data Model structure
    9. Transparent Media asset management
    10. Pre-conditions and Exceptions
    11. Tree Navigator
  3. Glossary of Terms


Project Organization

The Urakawa project is an incubator for a number of open-source software deliverables, published under the business-friendly LGPL license (Lesser General Public License).

The primary outcome of the Urakawa project is the SDK (Software Development Kit), which can be seen as the combination of:

The SDK is specified using the Java programming language, essentially because:

The Urakawa project provides a concrete reference implementation of the SDK, written in the cross-platform C# language (although it is currently tested under Microsoft Windows only, it has been compiled successfully under Mono).

The Urakawa project also incubates 2 major applications based on the SDK:

Developers are strongly encouraged to check out the project source code from SourceForge's SVN (Subversion) project repository. For a quick look, an up-to-date online repository browser is also available.

To reflect the aforementioned deliverables, the Urakawa project is conveniently divided into 3 main activities, each of which lives in a distinct SVN directory:

In this present document, we will focus on the Architecture part of the project. The reader is invited to check out the corresponding SVN directory, which should contain the documentation of the abstract API:

Intended Purpose

The expected audience for this present document is:

This reference guide is designed to complement (i.e. not to replace) the abstract API documentation, in order to help the reader understand the "big picture":

Although the formal SDK specifications are written in the Java language, please bear in mind that the emphasis is on describing a generic (language-agnostic) object-oriented model. This guide therefore documents commonly-accepted design patterns, not language-specific features or optimizations.

As a result, this guide does not provide information related to language-specific SDK implementations. If you are developing an application in an existing supported programming language, please also refer to the appropriate documentation (e.g. the C#-specific API-doc).

Java and UML

The SDK's Data Model and API are specified in an Object-Oriented fashion. A set of UML class diagrams is provided to conveniently represent the most relevant modules of the software architecture.

Java was chosen as the declarative form for the abstract Object-Oriented architecture, not as a usable implementation of the SDK. Any programming language that supports OOP (Object-Oriented Programming) should be suitable for implementing a concrete version of the SDK, and some language-specific optimizations are likely to happen. The Urakawa project's own C# SDK implementation illustrates such possible optimizations (e.g. native events, generics, etc.).

The UML diagrams are distributed in the PNG image format, and are designed primarily to be viewed on a computer display, and optionally to be printed-out on paper. The UML diagrams are not drawn manually: they are generated automatically from a plain-text declarative format, which is the Java source code itself. The software architecture is therefore accessible to blind and visually-impaired users, using screen-readers.

The source material for the UML diagrams is Java code augmented with special annotations that are placed in the JavaDoc comments (primarily for configuring the UML generation process). Programmers who do not like reading UML diagrams will feel more at ease reading the Java source code directly, using their favorite IDE or editor.

The process that generates the UML diagrams involves a special parser for the Java sources, called UMLGraph. This tool produces "dot" files (plain-text declarative graph format) which are then interpreted by Graphviz to output a variety of formats (PNG, PDF, HTML image maps, etc.). In order to supply detailed UML entity information, special UMLGraph annotations are used in the source Java code (amongst the standard JavaDoc comments).

The full generation process is implemented as an Ant build file; see the "build.xml" file.

Key Chapters

Data Model Genericity


Currently, the Urakawa SDK aims at facilitating the authoring of DTBs (Digital Talking Books), such as DAISY 2.02 or DAISY 3 (XML DTBook).

However, the SDK provides an abstraction layer that does not depend on any existing standard. In other words, the core authoring Data Model should be future-proof when new versions of the standards are released. Furthermore, DAISY is in fact one specific distribution format. There are other multimedia standards.

The W3C SMIL (Synchronized Multimedia Integration Language), for example, offers the most generic (and by far the most complete) timing model. Although the Urakawa SDK currently supports only a fraction of SMIL, it will grow and evolve into a richer generic multimedia toolkit, with a DAISY-compatible layer to accommodate the specific DTB requirements. The goal is to ultimately support other use-cases than DTBs, such as accessible motion pictures (multilingual text subtitles, audio captioning, table of contents, etc.).

To make this possible, the Urakawa Data Model is modular and the API is extensible. Third-party vendors can target a different application domain by implementing Data Model extensions to the core Urakawa SDK, and an API that conveniently exposes the new features.

The document is a tree

A multimedia presentation represented in the Urakawa Data Model corresponds to a hierarchical document, architected around a single core tree. This logical structure supports enhanced semantic navigation, timing, events, metadata, media objects, etc. via a set of well-defined types that can be attached to its nodes. In other words, the tree itself is totally abstract (somewhat like DOM): behavioral characteristics are applied to the structure by associating "properties" with the nodes of the tree.

The SDK provides built-in properties (e.g. to attach an audio and a text media object to a node), but this can be extended very easily to meet specific application requirements (e.g. a "todo" flag to mark nodes that require editing). Custom constructs can be added by third-party developers by subclassing core base classes, and by configuring object factories (more on that later).

So, a presentation is made of a tree of TreeNode objects, and each TreeNode can have one or more Property objects of different types. Here is a simple, concrete example:

Remember, this is an authoring toolkit, so the fact that one TreeNode has 2 AudioMedia instances attached to it does not mean that the audio plays together in parallel. It is simply the representation of a hierarchical ownership, which can be displayed in an editing environment as 2 separate AudioChannels (an English and a Spanish one). Such an authoring tool may then expose a "DAISY export" feature, which would then only publish the user-selected audio language.

Note: the audio clips in this simple example might well be 10-minute MP3 files, of which the H1 text occupies only a small subclip (e.g. from 5.6s to 8.6s, 3s in total). The Urakawa SDK provides an abstraction layer over physical storage, so that programmers do not have to deal with that directly. More on that later.
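To make the tree-plus-properties idea more tangible, here is a minimal Java sketch. It is not the SDK API: the types below are simplified stand-ins, and the names MediaByChannelProperty, TodoProperty, TreeSketch and the file names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the SDK types; names and signatures are assumptions.
interface Property { }

// Hypothetical property that maps a channel name to a media description.
class MediaByChannelProperty implements Property {
    private final Map<String, String> mediaByChannel = new LinkedHashMap<>();

    void setMedia(String channel, String media) { mediaByChannel.put(channel, media); }
    String getMedia(String channel) { return mediaByChannel.get(channel); }
    int channelCount() { return mediaByChannel.size(); }
}

// Hypothetical application-level extension: a "todo" flag.
class TodoProperty implements Property {
    final String note;
    TodoProperty(String note) { this.note = note; }
}

class TreeNode {
    private TreeNode parent;
    private final List<TreeNode> children = new ArrayList<>();
    private final List<Property> properties = new ArrayList<>();

    void appendChild(TreeNode child) { child.parent = this; children.add(child); }
    TreeNode getParent() { return parent; }
    List<TreeNode> getChildren() { return children; }
    void addProperty(Property p) { properties.add(p); }

    // Returns the first attached property of the requested type, or null.
    <T extends Property> T getProperty(Class<T> type) {
        for (Property p : properties) {
            if (type.isInstance(p)) return type.cast(p);
        }
        return null;
    }
}

class TreeSketch {
    // Builds a root node with one heading node carrying text plus two narrations.
    static TreeNode buildHeading() {
        TreeNode root = new TreeNode();
        TreeNode h1 = new TreeNode();
        root.appendChild(h1);
        MediaByChannelProperty media = new MediaByChannelProperty();
        media.setMedia("Text", "Chapter 1");
        media.setMedia("EnglishAudio", "english.mp3 [5.6s-8.6s]");
        media.setMedia("SpanishAudio", "spanish.mp3 [4.9s-8.1s]");
        h1.addProperty(media);
        h1.addProperty(new TodoProperty("re-record the Spanish narration"));
        return root;
    }
}
```

The node itself carries no semantics; everything an editor would display comes from the attached properties, which is why an application can add its own property type without touching the tree classes.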

How to read the Data Model


The SDK object-oriented architecture is written in Java and makes extensive use of interfaces. It is very similar to how DOM (Document Object Model) is specified, often using the IDL (Interface Definition Language) format.

An interface is essentially an abstract set of methods that can describe either a particular service (re-usable in many concrete classes) or an actual object type definition (like a class, but without code, attributes and constructors). Bearing in mind that the interfaces in the Urakawa SDK are not instance containers, they still provide all the information required to establish the facade API to a concrete object type.

Interfaces have many advantages, especially when the aim is to describe a contract rather than to implement the actual behavior. They enable more flexibility with respect to multiple inheritance (some object-oriented programming languages do not support mix-ins, others do).

Concrete classes

In the Java code, there is a naming convention for concrete classes: their name ends with "Impl". For example, the TreeNode object definition is actually an interface. There is one (and only one) concrete implementation, called TreeNodeImpl. Because the Java code is not meant to be used as an actual implementation of the toolkit, such an "*Impl" class is usually empty, and exists as a place-holder. It simply means that a real implementation (like the C# one) needs to actually implement the interface.

Abstract classes are used specifically to limit the application developer's capacity to extend the design (in the object-oriented single-inheritance sense), and to force the developer to extend the class in order to implement application-level business logic. Obviously, abstract classes usually implement all the boiler-plate code and only require the implementation of key custom behaviour. The naming convention is that the class name ends with "AbstractImpl".

This separation between interface and class has real benefits, notably when using factories to create object instances of a given type. For example, a TreeNodeImpl instance can be created with a TreeNodeFactory, using its createTreeNode() method. What is interesting here is that the create method is declared to return the TreeNode type (which is the abstract interface), not the real TreeNodeImpl object type. This means that the code behind the create method could be transparently changed to return a custom AnotherTreeNodeImpl type, without any visible impact from the user side, because both concrete TreeNodeImpl and AnotherTreeNodeImpl types are guaranteed to realize the same interface: TreeNode. In other words, this functionality is one mechanism to allow pluggable implementations via the use of factories (more on this later). One could imagine that AnotherTreeNodeImpl is a highly-optimized implementation of a TreeNode for low-powered mobile devices (small memory requirements). Such an optimized implementation could be provided by a third-party vendor as a drop-in replacement for the built-in TreeNodeImpl SDK implementation, without any change of code on the client side (the SDK API remains exactly the same).
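The following Java sketch illustrates the drop-in replacement described above. TreeNode, TreeNodeImpl, AnotherTreeNodeImpl, TreeNodeFactory and createTreeNode() are the names used in the text; the describe() method, the OptimizedTreeNodeFactory subclass and the Client class are assumptions added only to make the example observable.

```java
// Simplified versions of the types named in the text; not the real SDK code.
interface TreeNode {
    String describe(); // hypothetical method, used only to tell implementations apart
}

class TreeNodeImpl implements TreeNode {
    public String describe() { return "default"; }
}

// Stand-in for a third-party, memory-optimized implementation.
class AnotherTreeNodeImpl implements TreeNode {
    public String describe() { return "optimized"; }
}

class TreeNodeFactory {
    // Declared to return the interface, never a concrete class.
    TreeNode createTreeNode() { return new TreeNodeImpl(); }
}

// Hypothetical vendor factory: swaps the concrete type behind the same method.
class OptimizedTreeNodeFactory extends TreeNodeFactory {
    @Override
    TreeNode createTreeNode() { return new AnotherTreeNodeImpl(); }
}

class Client {
    // Client code depends only on TreeNode and TreeNodeFactory,
    // so it compiles and runs unchanged with either factory.
    static String buildAndDescribe(TreeNodeFactory factory) {
        TreeNode node = factory.createTreeNode();
        return node.describe();
    }
}
```

Calling Client.buildAndDescribe() with either factory requires no change on the client side, which is the point of returning the interface type.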

Implications for actual implementations

Now, the "Impl" naming convention is likely to be removed in actual implementations for the sake of clarity and simplicity, because such implementations usually provide only 1 single built-in concrete type. This is the case with the C# implementation, where TreeNode is actually a concrete class. Removing the "middle man" interface obviously removes the benefits of configurable factories, but the justification lies in the fact that this is likely not a paramount design goal for most SDK implementations. Developers can easily identify this type of interface, as such interfaces are marked with the "@leafInterface" annotation (inside the JavaDoc comment).

On the other hand, there are interfaces which are used in the more traditional way, without a single atomic corresponding implementation but with a use in many host classes. Such an interface specifies only a small set of methods that several classes need to implement. These interfaces are used to split the design into smaller sub-contracts, or individual component services. They not only provide a reading convenience (useful in IDEs with auto-completion on method signatures), but they are also usable by application programmers for casting object types to a lower common denominator: this increases testability and therefore reduces the risk of bugs. Such interfaces are marked with the "@designConvenienceInterface" annotation, and SDK implementors may choose not to include them in their implementation.

One obvious example of optional interfaces is the "With*" interfaces (e.g. WithProperty). This naming convention is used for interfaces that contain getters and setters (e.g. Property getProperty() and void setProperty(Property obj)). The reason why they are externalized in separate interfaces is that it greatly improves the readability of the API specification. Getters and setters often come with native syntactic sugar in object-oriented languages, such as C#. Java does not have native support for this, and furthermore it makes sense to individually specify the contract for the getter and setter. The extra verbosity is moved out to an external interface, but SDK implementations are allowed to ignore this construct.
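A "With*" interface might look like the sketch below. WithProperty, getProperty() and setProperty(Property obj) come from the text; the PropertyHolder class and the rule that the setter rejects null (with IllegalArgumentException) are assumptions made for the sake of the example, not the documented SDK contract.

```java
interface Property { }

// Only the accessor pair lives here, each with its own contract.
interface WithProperty {
    // Getter contract (assumed): returns the current Property, or null if none was set.
    Property getProperty();

    // Setter contract (assumed): a null argument is rejected.
    void setProperty(Property obj);
}

// Hypothetical host class that realizes the small accessor contract.
class PropertyHolder implements WithProperty {
    private Property property;

    public Property getProperty() { return property; }

    public void setProperty(Property obj) {
        if (obj == null) {
            throw new IllegalArgumentException("obj must not be null");
        }
        this.property = obj;
    }
}
```

Code that only needs the accessor pair can accept a WithProperty parameter instead of the full host type, which is the "lower common denominator" cast mentioned earlier.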

Limitations of interfaces

Because interfaces cannot specify "composition" or "aggregation" relationships (in the UML "association" sense), the SDK design uses a special UML notation to describe these relationships at the interface level (instead of at the level of the concrete classes). The motivation for this non-UML-standard addition is simply that "Impl" concrete classes are usually not easy to read, because of the great number of methods they potentially contain. Given that the effectiveness of the SDK specification resides in its clarity, it makes sense to use the interfaces as first-class citizens of the object-model, as they are more fine-grained than the larger "Impl" classes. By placing the UML association links at the level of the base interfaces, the Urakawa architecture is a lot easier to read and navigate. The resulting UML diagrams obviously benefit from the improved readability, at the expense of a non-UML-standard notation.

The contract is everything

The Urakawa SDK (Data Model and API) promotes "programming by contract". This facilitates unit-testing, removes ambiguities and clarifies expectations amongst developer teams. To achieve this:

The Visitor pattern

A tree of TreeNodes is a pretty simple structure in essence, but an actual instance can be quite large, making its navigation difficult. The Urakawa SDK provides native support for browsing the document tree using the well-established Visitor design pattern. This pattern decouples the action of traversing the tree from the operations to perform when reaching certain nodes (which are filtered as per the developer specification).

This model makes the application programmer's life easier, by removing the maintenance cost of boiler-plate code, and by allowing the developer to focus on the business logic associated with the traversal of the tree.
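As a rough illustration of this decoupling, the Java sketch below keeps the traversal inside the node class and the per-node logic inside a visitor. The names TreeNodeVisitor, preVisit(), postVisit(), acceptDepthFirst() and LabelCollector are assumptions for this example and should not be read as the SDK's actual method signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor contract.
interface TreeNodeVisitor {
    // Called before the children are visited; return false to skip them.
    boolean preVisit(TreeNode node);

    // Called after the children (if any) have been visited.
    void postVisit(TreeNode node);
}

class TreeNode {
    final String label;
    private final List<TreeNode> children = new ArrayList<>();

    TreeNode(String label) { this.label = label; }

    TreeNode add(TreeNode child) { children.add(child); return this; }

    // The traversal boiler-plate is written once, here.
    void acceptDepthFirst(TreeNodeVisitor visitor) {
        if (visitor.preVisit(this)) {
            for (TreeNode child : children) {
                child.acceptDepthFirst(visitor);
            }
        }
        visitor.postVisit(this);
    }
}

// Business logic only: collect labels, and do not descend into "skip" nodes.
class LabelCollector implements TreeNodeVisitor {
    final List<String> labels = new ArrayList<>();

    public boolean preVisit(TreeNode node) {
        labels.add(node.label);
        return !node.label.startsWith("skip");
    }

    public void postVisit(TreeNode node) { }
}
```

A different operation (for example counting nodes, or exporting one language) would be another small visitor class; the traversal code in TreeNode stays untouched.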

Persistence with XUK

The Data Model of the Urakawa SDK provides native support for serialization into an XML format called XUK. This offers authoring applications an "out-of-the-box" mechanism for document persistence, and also allows lossless round-trip open/save operations. The XUK format is pretty much an image of the object-model for anything that is not binary (i.e. audio files are stored as real files in an associated media directory).

XUK uses XML namespaces to distinguish built-in elements provided by the SDK from custom extensions defined at the application level. The XML grammar therefore depends on the application domain.

The "Xukable" interface is realized by any object type that is serializable into the XUK format, which is basically most of the classes of the SDK. There are XukIn and XukOut methods, for parsing and serializing, respectively.

Support for importing and exporting other mainstream multimedia formats (e.g. DAISY, SMIL) is possible by converting to and from the XUK format. This is however out of scope for the SDK itself, so this functionality must be provided by third-party modules. A concrete example is Obi, which provides its own XSLT-based converter to open and save DAISY content directly. The DAISY Pipeline provides transformers for many multimedia document types, and there are plans to develop support for the XUK format.

Object Creation using Factories

The Urakawa API makes extensive use of factories for generating object instances (it's a widely-adopted Design Pattern).

This design choice stems from the fact that round-trip serialization with the XUK format requires the ability to create object instances based on an XML fully-qualified name. The create methods of a factory therefore make use of the namespace/local-name information to determine the real type of object to generate. As such, there is a direct mapping between element names in a XUK instance and object types in the data model. This mapping is explicitly hard-coded into the factories (native to the SDK, or custom to the application), and is not configurable otherwise.

Sub-classing the default SDK factories allows third-party applications to implement the additional routines required to support custom object types. There is one factory per type of object to create; however, there is one special global factory called the DataModelFactory, whose role is to produce the individual factories (it is a kind of factory of factories). This is the mechanism that allows applications to override the default SDK factories with their own, at startup when initializing the project.
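The sketch below shows, in Java, how a factory might map a namespace/local-name pair to a concrete type, and how an application could plug in its own factory through a DataModelFactory subclass. PropertyFactory and DataModelFactory are names from this guide; the method signatures, the two namespace URIs, BasicProperty, TodoProperty and the App* classes are hypothetical.

```java
interface Property { }

class BasicProperty implements Property { }   // stand-in for a built-in type
class TodoProperty implements Property { }    // hypothetical application extension

class PropertyFactory {
    static final String SDK_NS = "urn:example:sdk"; // placeholder namespace URI

    // Hard-coded mapping from qualified name to type; null when the name is unknown.
    Property createProperty(String localName, String namespaceUri) {
        if (SDK_NS.equals(namespaceUri) && "BasicProperty".equals(localName)) {
            return new BasicProperty();
        }
        return null;
    }
}

// The application factory handles its own names and delegates the rest.
class AppPropertyFactory extends PropertyFactory {
    static final String APP_NS = "urn:example:app"; // placeholder namespace URI

    @Override
    Property createProperty(String localName, String namespaceUri) {
        if (APP_NS.equals(namespaceUri) && "TodoProperty".equals(localName)) {
            return new TodoProperty();
        }
        return super.createProperty(localName, namespaceUri);
    }
}

// "Factory of factories": the default hands out the built-in factories.
class DataModelFactory {
    PropertyFactory createPropertyFactory() { return new PropertyFactory(); }
}

// Installed by the application at startup to override the default factory.
class AppDataModelFactory extends DataModelFactory {
    @Override
    PropertyFactory createPropertyFactory() { return new AppPropertyFactory(); }
}
```

With this arrangement, a parser that encounters an element it does not recognize simply asks the factory obtained from the DataModelFactory; only the application's factory knows about the application's element names.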

As a result of this, not much emphasis is put on constructors in the SDK design. Instead, a fully constructed object is the result of its creation (using the factory and a no-arg constructor) followed by its initialization (by setting some required attributes). However, actual implementations may choose to expose constructors that take as many parameters as necessary to realize both creation and initialization at the same time (factory methods may also include more initialization parameters than specified in the design). The setter methods that play a role in initializing object instances are marked with the "@stereotype Initialize" annotation. This stereotype appears in the resulting UML class diagrams as well, so that a reader knows when to expect object initialization.

Specialized Object Managers

The Urakawa SDK utilizes a few "Manager" classes to maintain a certain state via the ownership of specific object instances. In other words, Managers are containers for objects that would otherwise not belong to the main Data Model structure, and could potentially be lost (not referenced anywhere). Managed objects can sometimes be queried via a unique ID, but that is mostly an implementation detail. What is important here is that the role of Managers is to guarantee object ownership by providing a registry of objects of a certain type and specialized services. Objects that are owned by a Manager can be referenced safely by one or more external entity, without having to worry about ownership (this task being delegated to the Manager).
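A Manager in this sense can be pictured as a small registry, as in the Java sketch below. MediaDataManager and MediaData are names that appear later in this guide, but the methods shown here (addMediaData, getMediaData, isManagerOf, removeMediaData) and the UID format are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for a managed object type.
class MediaData {
    final String label;
    MediaData(String label) { this.label = label; }
}

// The manager owns the instances; other objects only keep the UID or a reference.
class MediaDataManager {
    private final Map<String, MediaData> registry = new LinkedHashMap<>();
    private long nextId = 0;

    // Registers the object and returns the generated unique ID (format is arbitrary).
    String addMediaData(MediaData data) {
        String uid = "UID" + nextId++;
        registry.put(uid, data);
        return uid;
    }

    // Returns the managed object, or null if the UID is unknown.
    MediaData getMediaData(String uid) { return registry.get(uid); }

    boolean isManagerOf(String uid) { return registry.containsKey(uid); }

    // Explicit removal is the only way an object leaves the registry.
    MediaData removeMediaData(String uid) { return registry.remove(uid); }

    int size() { return registry.size(); }
}
```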

Event Notification

Events are used in many different contexts, but the common denominator is to let various layers of an application emit event notifications to one or more listeners in the application. For example, the MVC (Model View Controller) design pattern relies on such a mechanism to update the GUI (Graphical User Interface) views of the application, as soon as changes happen in the underlying data model.

The Urakawa SDK provides a built-in architecture for event notifications, including a default hierarchy of event types and their registration mechanism. In other words, any change that happens in the data model instance of an SDK implementation is announced in the form of an event object, that is dispatched to all registered listeners at this point (of course, the application developer takes responsibility to register and de-register listeners as required).

Because the data model is essentially a hierarchical structure, events are "bubbled-up" in the structure as they happen. This allows an application developer to listen to all events that are raised within a certain depth of the data model. For example, one might want to be notified when some text content is modified: instead of registering a listener onto the text object itself (TextMedia, attached to a TreeNode), one can simply listen at the level of the Presentation. The event will be propagated upwards along the chain of containers that are involved, until it reaches the top-level. It can be consumed by many listeners along the way.

In the Urakawa SDK, event types are structured hierarchically, following the traditional object-oriented inheritance model. This means that an event listener can specialize into a specific type (which can itself be a super-type, and therefore be a proxy to represent several sub-types), in order to filter the events to receive. This way, a listener does not get flooded with irrelevant events; instead, it only gets the event types for which it was registered.
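Bubbling and type-based filtering can be sketched in a few lines of Java. This is an illustration under assumptions, not the SDK's event API: EventHub, DataModelEvent, TextChangedEvent and the method names are all hypothetical, and listeners are modelled with java.util.function.Consumer for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical event hierarchy: a base type and one specialized sub-type.
class DataModelEvent {
    final Object source;
    DataModelEvent(Object source) { this.source = source; }
}

class TextChangedEvent extends DataModelEvent {
    final String newText;
    TextChangedEvent(Object source, String newText) { super(source); this.newText = newText; }
}

// One hub per container (e.g. media, node, presentation), linked to its parent.
class EventHub {
    private static final class Registration {
        final Class<? extends DataModelEvent> type;
        final Consumer<DataModelEvent> listener;
        Registration(Class<? extends DataModelEvent> type, Consumer<DataModelEvent> listener) {
            this.type = type;
            this.listener = listener;
        }
    }

    private final EventHub parent;
    private final List<Registration> registrations = new ArrayList<>();

    EventHub(EventHub parent) { this.parent = parent; }

    // The listener receives events of the given type and of all its sub-types.
    void addListener(Class<? extends DataModelEvent> type, Consumer<DataModelEvent> listener) {
        registrations.add(new Registration(type, listener));
    }

    // De-registration is the caller's responsibility; returns true if something was removed.
    boolean removeListener(Consumer<DataModelEvent> listener) {
        return registrations.removeIf(r -> r.listener == listener);
    }

    // Dispatches locally (filtered by type), then bubbles up to the parent hub.
    void notifyListeners(DataModelEvent event) {
        for (Registration r : new ArrayList<>(registrations)) {
            if (r.type.isInstance(event)) {
                r.listener.accept(event);
            }
        }
        if (parent != null) {
            parent.notifyListeners(event);
        }
    }
}
```

A listener registered at the top-level hub for DataModelEvent sees every event raised below it, while one registered for TextChangedEvent is only called for text changes.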

The lifetime of an event corresponds to the time during which it is being consumed by registered listeners. In a garbage-collected environment, memory management is delegated to the runtime, so the developer just has to make sure not to hold a reference to the event instance.

The lifetime of listeners is equally important: one risk is to register a listener and to forget to unregister it once it is not needed anymore. This creates bug-prone applications because of lapsed listeners that receive events when they should not anymore. The developer therefore has a responsibility to keep track of event listener registration. Behind the scenes, the SDK registers and unregisters listeners as needed to perform the bubbling mechanism, but that is totally transparent to the user.

The extensibility mechanism of the event framework relies on creating a custom event subclass that contains the required data, and in implementing a Listener interface (for writing the callback method on the receiver end), as well as in implementing a Notifier interface which realizes the actual (un)registration of listeners and dispatching of events. Boiler-plate code is provided within the SDK so that application developers don't have to re-invent the wheel. In fact, the C# reference implementation of the SDK provides a variation of this mechanism, because the C# language has native EventArgs and EventHandlers.

Undo-Redo, Commands

General Principles

A Command is simply a container for an action, that can be executed once it has been defined (see the execute() method). For the purpose of Undo-Redo, a Command can not only be executed, but also undone (see the unExecute() method).

Although Commands are the fundamental underlying mechanism to implement undoable operations, they can be used to encapsulate user actions that are not at all reversible. The canUnExecute() method is used to check whether a Command is reversible.

The getShortDescription() and getLongDescription() methods provide human-friendly descriptions of what the command does. This information is typically used in the user interface presentation of the Data Model. Please note that the provided text must describe the business logic of the execute() method. Such a description should be easily interpretable into its negative form for the reversible counterpart method: unExecute().

Commands provide the low-level construct to encapsulate a reversible action. In order to maintain the integrity of the data model when changing the state of its instance, the UndoRedoManager registers done and undone Commands in the undo stack and the redo stack, respectively.

When a non-reversible Command is executed via the UndoRedoManager, the undo and redo stacks get invalidated (flushed), so further calls to undo() and redo() will be unsuccessful. However, when a reversible Command is registered via the execute() method, it is pushed onto the undo stack and the redo stack gets flushed.

The undo() and redo() methods can be called to navigate through the history of changes, and their canUndo() and canRedo() peer methods should be used to check whether there is any Command to undo or redo in the current history stacks.

The flushCommands() method can be used to clear all done and undone Commands from the stacks. This effectively resets the history and makes it impossible to undo previous changes!

The getRedoShortDescription() and getUndoShortDescription() methods provide access to the description for the next undoable or redoable Command available, if any. This typically delegates to the Command's own getShortDescription() method, so textual descriptions should be judiciously chosen so that simply prepending "Undo:" or "Redo:" yields a self-explanatory and unambiguous label (e.g. "Add a node to the tree.", "Redo: Add a node to the tree.", "Undo: Add a node to the tree.").

This can typically be used to render the undo-redo history in the user interface, and let the user select the level to undo or redo. To achieve this, the getListOfRedoStackCommands() and getListOfUndoStackCommands() methods can be used.

Final note: the undo/redo stacks maintained by the UndoRedoManager can be serialized into the XUK XML format. The Commands contained in both the Undo and the Redo stack become persistent, which allows for closing a project while saving the edition session.
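The stack behaviour described in this section can be summarized by the following Java sketch. The method names execute(), unExecute(), canUnExecute(), getShortDescription(), undo(), redo(), canUndo(), canRedo() and flushCommands() are taken from the text; everything else (the AppendCommand example, the use of IllegalStateException, the omission of descriptions on the manager and of XUK persistence) is a simplifying assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

interface Command {
    void execute();
    void unExecute();
    boolean canUnExecute();
    String getShortDescription();
}

class UndoRedoManager {
    private final Deque<Command> undoStack = new ArrayDeque<>();
    private final Deque<Command> redoStack = new ArrayDeque<>();

    // Executes the command; a new action always invalidates the redo history.
    void execute(Command command) {
        command.execute();
        redoStack.clear();
        if (command.canUnExecute()) {
            undoStack.push(command);
        } else {
            undoStack.clear(); // a non-reversible command flushes the whole history
        }
    }

    boolean canUndo() { return !undoStack.isEmpty(); }
    boolean canRedo() { return !redoStack.isEmpty(); }

    void undo() {
        if (!canUndo()) throw new IllegalStateException("nothing to undo");
        Command command = undoStack.pop();
        command.unExecute();
        redoStack.push(command);
    }

    void redo() {
        if (!canRedo()) throw new IllegalStateException("nothing to redo");
        Command command = redoStack.pop();
        command.execute();
        undoStack.push(command);
    }

    void flushCommands() {
        undoStack.clear();
        redoStack.clear();
    }
}

// Hypothetical reversible command: appends text to a buffer.
class AppendCommand implements Command {
    private final StringBuilder target;
    private final String text;

    AppendCommand(StringBuilder target, String text) { this.target = target; this.text = text; }

    public void execute() { target.append(text); }
    public void unExecute() { target.setLength(target.length() - text.length()); }
    public boolean canUnExecute() { return true; }
    public String getShortDescription() { return "Append \"" + text + "\"."; }
}
```

Note how executing a new command after an undo() discards the redo stack: the "future" that was undone no longer applies to the modified document.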

Composite Commands, Transactions

Sometimes, a complex operation on the data model needs to be materialized using several Commands that execute in sequence. However, from the user perspective, such a complex operation should be undoable in one go (not once for each individual Command).

This is the role of a CompositeCommand. It is actually a sub-type of Command that is used to register a sequence of individual Commands. These can themselves be CompositeCommands, recursively, so nested CompositeCommands are allowed by this mechanism.

The getShortDescription() and getLongDescription() methods can be set to return a custom human-friendly label, or by default a description will be generated by concatenating each individual Command's description (recursively). It is therefore recommended to use setShortDescription() and setLongDescription() in order to keep the text human-readable.

Now, instead of manually creating a CompositeCommand object, there is a "Transaction" mechanism built into the UndoRedoManager.

By using startTransaction() to notify the beginning of a lengthy operation, the UndoRedoManager executes any following Commands normally, but waits for a matching call to endTransaction() in order to encapsulate all the registered Commands in a CompositeCommand.

Because CompositeCommands can be nested, Transactions can also be nested, as long as the startTransaction() and endTransaction() methods are called in matching pairs.

If the need arises, a Transaction can be terminated prematurely using the cancelTransaction() method. The UndoRedoManager then rolls back all the affected Commands since the call to startTransaction().

The isTransactionActive() method indicates whether a Transaction is currently active. There are a number of operations that cannot be done while a Transaction is active, such as trying to undo() or redo() (such an attempt generates an exception). Once Transactions (potentially nested) are completed normally (the last call to endTransaction() matches the first call to startTransaction()), then the system is back to normal and a call to undo() will reverse the finished Transaction. Obviously, cancelTransaction() can also be used to terminate the current operation, roll back the committed Commands, and return to normal.

The getListOfCommandsInCurrentTransactions() method lists Commands that are NOT in the undo stack, because a Transaction places Commands in a temporary state until they are finally committed to the actual undo stack by using endTransaction().
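The Java sketch below illustrates how nested Transactions can be folded into CompositeCommands. It is a simplified model under stated assumptions: the redo stack is omitted, cancelTransaction() rolls back only the innermost open Transaction, the exception type is IllegalStateException, and AppendCommand and undoStackSize() are hypothetical helpers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

interface Command {
    void execute();
    void unExecute();
}

// Undoes its members in reverse order, so the whole group reverts in one go.
class CompositeCommand implements Command {
    private final List<Command> commands = new ArrayList<>();

    void append(Command command) { commands.add(command); }

    public void execute() { for (Command c : commands) c.execute(); }

    public void unExecute() {
        for (int i = commands.size() - 1; i >= 0; i--) commands.get(i).unExecute();
    }
}

class UndoRedoManager {
    private final Deque<Command> undoStack = new ArrayDeque<>();
    private final Deque<CompositeCommand> openTransactions = new ArrayDeque<>();

    boolean isTransactionActive() { return !openTransactions.isEmpty(); }

    void startTransaction() { openTransactions.push(new CompositeCommand()); }

    void execute(Command command) {
        command.execute();
        register(command);
    }

    // Commands go to the innermost open transaction, else to the undo stack.
    private void register(Command command) {
        if (isTransactionActive()) openTransactions.peek().append(command);
        else undoStack.push(command);
    }

    void endTransaction() {
        if (!isTransactionActive()) throw new IllegalStateException("no active transaction");
        register(openTransactions.pop()); // becomes one entry in the enclosing scope
    }

    // Assumption: only the innermost transaction is rolled back.
    void cancelTransaction() {
        if (!isTransactionActive()) throw new IllegalStateException("no active transaction");
        openTransactions.pop().unExecute();
    }

    void undo() {
        if (isTransactionActive()) throw new IllegalStateException("transaction is active");
        if (undoStack.isEmpty()) throw new IllegalStateException("nothing to undo");
        undoStack.pop().unExecute();
    }

    int undoStackSize() { return undoStack.size(); }
}

// Hypothetical reversible command used to observe the effects.
class AppendCommand implements Command {
    private final StringBuilder target;
    private final String text;
    AppendCommand(StringBuilder target, String text) { this.target = target; this.text = text; }
    public void execute() { target.append(text); }
    public void unExecute() { target.setLength(target.length() - text.length()); }
}
```

Until the outermost endTransaction() is reached, nothing appears on the undo stack; afterwards the whole operation is a single undoable entry.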

Data Model structure

The Data Model of the Urakawa SDK has a relatively simple core, but there are quite a few services surrounding it that are necessary for round-trip XML serialization and to provide a flexible extension mechanism. The Data Model is by nature an entire data structure that can be saved into the persistent XML format (XUK). Therefore, all of the entities described below implement the XukAble interface to realize the serialization and parsing to/from XUK. Here is an overview of the composition:


The top-level object type is the Project. This is essentially a container for one or more Presentations. It is also the end point for event bubbling (events are propagated upwards from deep in the Data Model structure, and stop here). Finally, it also owns the DataModelFactory, which is the mechanism that allows application developers to configure any custom factory for when they extend the Data Model (remember, factories are crucial as they are used behind the scenes to create concrete object instances when opening a XUK file, or when cloning/copying data fragments).


The Presentation object type corresponds to a single document, and therefore contains a single tree of TreeNodes (it owns the root of the tree). Of course, it is also a hub for the event bubbling mechanism, like its parent Project. The Presentation is where the document Metadata is stored.

The Presentation also owns a number of Managers and Factories. The Managers have specific roles, as shown by their names: ChannelsManager, MediaDataManager, DataProviderManager, UndoRedoManager. One could argue that the UndoRedoManager should be called CommandManager because it essentially maintains an undo and a redo stack of Commands. However, we thought that the particular undo-redo function of this component required a change of name to distinguish this Manager from the other types.

Regarding Factories: remember, the Project's DataModelFactory is the mechanism that allows an application developer to configure specific custom factories in order to accommodate the custom Data Model extension provided in the application domain. The Presentation queries the DataModelFactory in order to obtain the concrete implementations of the Factories needed to create instances of the Data Model's components. The Presentation owns the following factories, to create object instances (the names are self-descriptive): PropertyFactory, MediaFactory, MediaDataFactory, CommandFactory, TreeNodeFactory, ChannelFactory, DataProviderFactory, MetadataFactory.


The single tree that an application uses to represent the document is made of TreeNode objects. A TreeNode instance obviously contains a reference to its parent node (if any), and an ordered list of children. A TreeNode is natively Visitable (this is a built-in feature of the Data Model), which is just a simple application of the Visitor design pattern. Obviously, any TreeNode is a hub for event propagation and listener registration. When an event bubbles up through the ancestors in the tree and reaches the root node of the Presentation, it is forwarded to the Presentation, and so on. Now, a vital function of TreeNode is to own one or more Properties of specific types. Remember, TreeNode is totally abstract (like DOM); it is the Property attachments that provide semantics or behavior to nodes of the document tree.



A Channel can be anything that meets an application requirement, such as "EnglishChannel" vs "SpanishChannel", or "AudioChannel" vs "TextChannel". It can be seen as an equivalence class used to categorize Media objects. The concrete use of this mechanism is for extracting semantically-related content that is normally scattered across the tree. For example, suppose the tree is a simple book document translated (text+narration) into 2 different languages. The overall structure is identical for both languages (that's the unique Presentation tree), but each TreeNode refers to both languages at the same time. To obtain the equivalent document for one language only, a Channel can be used as a filter on the tree of TreeNodes, for example using the built-in Visitor navigation functionality.
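Using a Channel as a filter can be pictured with the following Java sketch. The data structures are deliberately reduced (each node maps a Channel directly to a text string), and the names ChannelExtractor, with() and add() are hypothetical; the real SDK attaches Media objects through Properties instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Channel {
    final String name;
    Channel(String name) { this.name = name; }
}

// Reduced node: media content is a plain string keyed by Channel.
class TreeNode {
    final Map<Channel, String> mediaByChannel = new LinkedHashMap<>();
    final List<TreeNode> children = new ArrayList<>();

    TreeNode with(Channel channel, String media) { mediaByChannel.put(channel, media); return this; }
    TreeNode add(TreeNode child) { children.add(child); return this; }
}

class ChannelExtractor {
    // Collects, in document order, the media attached to the given channel.
    static List<String> extract(TreeNode root, Channel channel) {
        List<String> result = new ArrayList<>();
        collect(root, channel, result);
        return result;
    }

    private static void collect(TreeNode node, Channel channel, List<String> out) {
        String media = node.mediaByChannel.get(channel);
        if (media != null) {
            out.add(media);
        }
        for (TreeNode child : node.children) {
            collect(child, channel, out);
        }
    }
}
```

Both languages share one tree; selecting a Channel yields the single-language view without duplicating the structure.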


The Media object type is abstract, and is refined into several categories materialized by AudioMedia, VideoMedia, TextMedia, etc. Note: SequenceMedia is a wrapper for a list of Media objects. Any Media type is also a hub for event propagation and listener registration. In the simple case, a Media object either stores the data natively in the instance, or it refers to an external file via a URL. A special case for media assets is ManagedMedia (and ManagedAudioMedia in particular, as it is the only concrete implementation in the current toolkit): this allows the developer to benefit from the built-in audio asset management system, which is tremendously useful for implementing transparent non-destructive authoring with full undo/redo support.


ManagedMedia is the interface between the "abstract" world of Media and the "concrete" world of MediaData (in terms of the actual binary data that constitutes a Media object). So, a ManagedMedia basically points to a MediaData object.


Transparent Media asset management

WavAudioMediaData (see above) really is the core of the mechanism for managing audio assets transparently during non-destructive authoring.

It maintains an ordered list of WavClips, each of which represents a chunk of an audio file. In other words, a WavClip owns a single FileDataProvider and constrains the audio data between a logical clip-begin and clip-end (like SMIL does). For example: a single FileDataProvider can be a 10-minute MP3 file, from which 10 distinct (non-overlapping) WavClips are created, lasting 1 minute each. However, a more typical scenario is when a single WavAudioMediaData obtains its PCM source from several different actual files (this happens when importing, deleting, recording, etc.). It would then be composed of several WavClips, using distinct instances of FileDataProvider.
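The clip list can be pictured with the following minimal sketch. It assumes times in milliseconds and identifies a data provider by a plain string; ClipSketch, AudioDataSketch and locate() are hypothetical names for illustration, not the WavClip or WavAudioMediaData API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a clip: a window [beginMs, endMs) on one backing file.
class ClipSketch {
    final String providerId;
    final long beginMs;
    final long endMs;

    ClipSketch(String providerId, long beginMs, long endMs) {
        if (beginMs < 0 || endMs < beginMs) {
            throw new IllegalArgumentException("invalid clip boundaries");
        }
        this.providerId = providerId;
        this.beginMs = beginMs;
        this.endMs = endMs;
    }

    long durationMs() { return endMs - beginMs; }
}

// Hypothetical stand-in for the media data: an ordered list of clips.
class AudioDataSketch {
    private final List<ClipSketch> clips = new ArrayList<>();

    void append(ClipSketch clip) { clips.add(clip); }

    long totalDurationMs() {
        long total = 0;
        for (ClipSketch clip : clips) {
            total += clip.durationMs();
        }
        return total;
    }

    // Maps an offset in the logical audio to "provider@position-in-file".
    String locate(long offsetMs) {
        if (offsetMs < 0) throw new IndexOutOfBoundsException("negative offset");
        long remaining = offsetMs;
        for (ClipSketch clip : clips) {
            if (remaining < clip.durationMs()) {
                return clip.providerId + "@" + (clip.beginMs + remaining);
            }
            remaining -= clip.durationMs();
        }
        throw new IndexOutOfBoundsException("offset beyond end of audio");
    }
}
```

Editing operations only change the clip list (boundaries and ordering), never the files themselves, which is what makes the authoring non-destructive.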

The consequence is that FileDataProviders can potentially be scattered all over the file-system (well, more likely inside a single application directory), and more importantly there can be unused audio data inside the files (unused in the Urakawa Data Model, that is). It is also important to realize that unused content in the Presentation tree may actually be used somewhere else: the undo/redo stack contains references to MediaData (and therefore DataProviders). One can also imagine that the application clipboard may contain references to important data, which should therefore not be deleted when cleaning-up the main Presentation tree.

Thanks to WavAudioMediaData's inside knowledge of which parts of a FileDataProvider are used, and thanks to the various Managers involved (e.g. FileDataProviderManager), the system has a clear picture of what is used and what is unused. As a result, a "cleanup" operation on the document that is currently being authored consists of a single call from the user perspective: the SDK API hides all the nitty-gritty details, thanks to the ManagedMedia abstraction.
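The bookkeeping behind such a cleanup can be reduced to a set difference, as in the sketch below. This is not the SDK's cleanup algorithm: the class, the method and the use of file names as provider identifiers are assumptions made for illustration. The point is only that every holder of references (the Presentation tree, the undo/redo stack, possibly the clipboard) must be consulted before a provider is considered unused.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

class CleanupSketch {
    // allProviders: every data provider known to the (hypothetical) manager.
    // referenceHolders: one collection of referenced providers per holder,
    // e.g. the tree, the undo/redo stack, the clipboard.
    static Set<String> unusedProviders(
            Set<String> allProviders,
            Collection<? extends Collection<String>> referenceHolders) {
        Set<String> unused = new LinkedHashSet<>(allProviders);
        for (Collection<String> holder : referenceHolders) {
            unused.removeAll(holder);   // anything still referenced is kept
        }
        return unused;
    }
}
```

Only the providers left in the returned set are safe candidates for deletion.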

Media asset management is clearly the most complex / heavy part of the SDK (design-wise), and involves low-level constructs such as binary memory streams, file IO, etc. Things like safe undo/redo would not be possible without the ability to control ManagedMedia's MediaData: it would either lead to memory leaks (waste of storage), or worse, to data corruption (missing audio files / fragments).

Convenience methods

The API of the Urakawa SDK contains a large number of methods marked as "convenient": they are not strictly necessary, but they greatly improve the usability of the API nonetheless.

They usually just implement repetitive boiler-plate code, such as delegating a method call to another level in the object hierarchy. A typical example is the getXXFactory() methods, which always delegate to the Presentation as this is the entity that owns the actual Factory instances. Another typical example is the getUID() methods that obtain a unique ID reference to a Managed object: this basically makes a query at the level of the Manager, because the Managed instance itself does not maintain its own ID (it's the responsibility of the Manager).
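The getUID() delegation can be sketched as follows. ManagerSketch and ManagedSketch are hypothetical classes, and the "UID" + counter scheme is an arbitrary choice for the example; only the delegation pattern itself reflects the text above.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical manager: the only place where IDs are stored.
class ManagerSketch {
    private final Map<ManagedSketch, String> uids = new IdentityHashMap<>();
    private int next = 0;

    ManagedSketch create() {
        ManagedSketch managed = new ManagedSketch(this);
        uids.put(managed, "UID" + (next++));
        return managed;
    }

    String getUidOf(ManagedSketch managed) {
        return uids.get(managed);
    }
}

// Hypothetical managed object: it does not keep its own ID.
class ManagedSketch {
    private final ManagerSketch manager;

    ManagedSketch(ManagerSketch manager) { this.manager = manager; }

    // Convenience method: a one-line delegation to the manager.
    String getUID() {
        return manager.getUidOf(this);
    }
}
```

Application code can then write managed.getUID() without needing a reference to the manager.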

Sometimes, convenience methods are more than just that: they provide a "facade" API to a complex part of the Data Model. Things sometimes happen behind the scenes, such as with the default AudioChannel and TextChannel that are created automatically when the application programmer uses the setAudioMedia() and setTextMedia() convenience methods on TreeNode. This example use-case involves invoking the ChannelsManager, setting the ChannelsProperty on the TreeNode, etc., which is a lot of code that does not have to be written at the application level.

Pre-conditions and Exceptions

Generally speaking, exceptions should never be used for controlling the execution flow. The exceptions used in the Urakawa SDK design describe forbidden values for the input parameters of methods (e.g. null pointers, empty string values, boundaries for numerals, etc.). The rationale is that fail-first is the most efficient way to prevent bugs due to wrong data input.

This is a convenient mechanism for defining the implementation contract, but it does not mean that implementations have to use actual checked exceptions (they tend to have a negative impact on performance and require explicit try/catch/finally control structures). Instead, implementations may decide to use unchecked exceptions (still useful, as they generate full stack-trace reports), HRESULT output parameter values (very portable), or may choose not to return an error at all. The latter is however not recommended, as asserting method parameter values can eliminate a great number of bugs, which are often difficult to trace if no adequate safeguard is implemented.
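As an illustration of the unchecked-exception option, the sketch below validates parameters at the top of each method using only standard Java exceptions. The class and its setters are hypothetical; an implementation of the SDK would use its own exception types as specified by the API contract.

```java
import java.util.Objects;

// Hypothetical class showing fail-first parameter checks with unchecked exceptions.
class PreconditionSketch {
    private String name = "untitled";
    private double volume = 1.0;

    void setName(String newName) {
        // Forbidden values: null and the empty string.
        Objects.requireNonNull(newName, "newName must not be null");
        if (newName.isEmpty()) {
            throw new IllegalArgumentException("newName must not be empty");
        }
        name = newName;
    }

    void setVolume(double newVolume) {
        // Forbidden values: anything outside [0.0, 1.0], including NaN.
        if (!(newVolume >= 0.0 && newVolume <= 1.0)) {
            throw new IllegalArgumentException("newVolume must be within [0.0, 1.0]");
        }
        volume = newVolume;
    }

    String getName() { return name; }

    double getVolume() { return volume; }
}
```

Because the checks run before any state is modified, a rejected call leaves the object unchanged and the stack trace points directly at the faulty caller.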

Tree Navigator

In large multimedia presentations (like typical Daisy books), browsing the tree using the conventional tree parsing methods quickly becomes overkill, even when using the built-in Visitor pattern.

The tree Navigator mechanism builds an abstraction of the full document tree, based on specific selection (filtering) criteria. The resulting virtual tree is much smaller and easier to navigate, and can even be modified without corruption of the Navigator state (as everything is evaluated dynamically, nothing is cached). This is similar to the "TreeWalker" of the Document Object Model (DOM).
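The idea of a dynamically evaluated virtual tree can be sketched as follows. This is not the Navigator API of the SDK: NavNode, NavigatorSketch and the rule used here (the virtual children of a node are its nearest descendants that pass the filter) are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical node of the full document tree.
class NavNode {
    final String label;
    final List<NavNode> children = new ArrayList<>();

    NavNode(String label) { this.label = label; }

    NavNode add(NavNode child) {
        children.add(child);
        return this;
    }
}

// Hypothetical navigator: a filtered view, computed on demand, never cached.
class NavigatorSketch {
    private final Predicate<NavNode> included;

    NavigatorSketch(Predicate<NavNode> included) { this.included = included; }

    // Virtual children: nearest descendants accepted by the filter.
    // Rejected nodes are "looked through" so that their accepted descendants surface.
    List<NavNode> getChildren(NavNode context) {
        List<NavNode> result = new ArrayList<>();
        for (NavNode child : context.children) {
            if (included.test(child)) {
                result.add(child);
            } else {
                result.addAll(getChildren(child));
            }
        }
        return result;
    }
}
```

Since getChildren() recomputes its result on every call, a modification of the underlying tree is reflected by the next query without any invalidation step.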

This feature is still untested. We are leaving it in the toolkit with the special mention: "use at your own risk" ! [grin]

Glossary of Terms

For your reference, here is a list of useful acronyms used in this document, in alphabetical order:
