Oracle 9i ‘PROJECT XDB’ – The XML Database

Sandeepan Banerjee, Oracle Corporation

Introduction

 

XML has arrived as a key technology for the next stage of evolution of the Internet.  In the beginning, its core characteristics of self-description and ad-hoc extensibility offered the flexibility needed for transport of messages between various applications. Lately, the next generation of XML standards -- such as XML Schema -- have enabled unification of both document modeling and data modeling.

Today, most application data and web content is stored either in a relational database or the file system or a combination of both.  XML is used mostly as an artifact for transport, generated from a database or a filesystem.  However, as the volume of XML being transported grows, and developers consider the costs of constant regeneration of XML documents there arises the question whether these storage methods effectively accommodate XML content.

To provide a more native support for XML, the first release of Oracle9i introduced the XMLType datatype and associated XML specific behavior. In addition, built in XML generation and aggregation operators greatly increase the

 

                                                                     Fig1:  Common XML Architectures

throughput of XML processing. With the second forthcoming release of Oracle9i, we will significantly add to XML support in the server. This will fully absorb the W3C XML data model into the Oracle server, and provide new standard access methods for navigating and querying XML – creating a native integrated XML database within the Oracle RDBMS.

 

Key XDB Technologies

 

The key XDB technologies can be grouped into two major classes – XMLType that provides a native XML storage and retrieval capability strongly integrated with SQL, and a XML Repository that provides foldering, access control, versioning etc. for XML resources. Let us look at each of these classes in detail.

 

XMLType

 

The XMLType datatype stores XML content, and can be used as the datatype of a column. XMLType includes a number of useful methods to operate on XML content.  XMLTypes can be stored with 2 storage options – LOB and Object-Relational storage. The former storage model maintains accuracy to the original XML (whitespaces and all), while the latter maintains DOM (Document Object Model) fidelity.  XMLType achieves DOM fidelity by maintaining information that SQL or Java objects normally don't provide for, such as:

·         Ordering of child elements and attributes

·         Distinguishing between elements and attributes

·         Unstructured content declared in the schema (e.g. content="mixed" or <any> declarations)

·         Undeclared data in instance documents, such as processing instructions, comments, and namespace declarations

·         Support for basic XML datatypes not available in SQL (Boolean, QName, etc.)

·         Support for XML constraints (facets) not supported directly by SQL, such as enumerated lists

Native XMLTYPE instances contain hidden columns that store this extra information that doesn't quite "fit" within the SQL object model.  This information can be accessed via APIs in SQL or Java, such as ExtractNode.

Changing XMLTYPE storage from object relational to LOB (or vice versa) is also possible (via database import & export), and your application code will not have to change in response.  This allows you to change XML storage when tuning your application, since each storage option has its own benefits.

 

                                                      Pros and cons of XML storage options

LOB (with Text index)

Object relational (with BTree index)

Very flexible when schemas change

Limited flexibility for schema changes (similar to the ALTER TABLE restrictions)

Maintains the original XML byte for byte, which may be important to some applications

Trailing newlines, whitespace within tags and data format for non-string datatypes is lost

Mediocre performance for DML

Excellent DML performance

 

Accesibility of existing SQL features (constraints, indexes, etc.)

 

 

Some of the other benefits of XMLType are:

 

XML Schema support:  Create tables and types automatically given a W3C standard XML Schema extending the normal SQL DDL. This means you have a standard data model for all your data (structured and unstructured), and can use the database to enforce this data model.

                     

XML Piecewise Update:  Use XPath to specify individual element(s) and attributes of your document to update, without rewriting the entire document. This is more efficient, especially for large XML documents.

 

XPath Search:  Specify elements to query against via XPath, and then use SQL operators (conformant to the emerging ANSI SQLX) on these elements. This helps you combine the best of SQL and XML.

                        

XML Indexes:  Use XPath to specify parts of your document to create indexes for XPath searches.

 

XML Operators:  New operators like XMLTABLE (to cast a list of nodes returned by XPath into a table), XMLELEMENT (to create XML elements on the fly), etc. to make XML queries and on-the-fly XML generation easy. XDB makes the SQL and XML metaphors interoperable.

 

XSL Transformations for XMLType:  Use an XSLT to transform XML documents via a SQL operator. You can get fast response-time, database-resident XSL transformations.

 

Lazy XML Load:  XMLType provides a virtual DOM; it only loads rows of data as they are requested, throwing away previously referenced sections of the document if memory usage grows too large. This helps you get high scalability when many concurrent users are dealing with large XML documents.

 

XML Views:  Create XML views to create permanent aggregations of various XML document fragments or relational tables. You can create your own efficient representations of XML.

 

Java Beans Interface: In addition to the DOM API, XDB provides a Java Beans Interface for fast access to structured XML data, and extensions that save only those parts that have been modified in memory. You can get static access to XML as well as dynamic access.

 

Schema Caching:  XDB keeps structural information (like element tags, datatypes, and storage location) in a schema cache, to minimize access time and storage costs. This helps you get high performance and scalability with large documents, as well as a large number of documents.

   

XML Repository

 

The second key aspect about XDB is that it provides an Internet repository for managing XML data and documents. Important items of Repository functionality include:

 

Access Control Lists (ACLs):  Create high-performance access control lists for any XMLType object, and define your own privileges in addition to the system-defined ones.

 

Foldering:  Enable folders to map resources (XML files) into database structures and enable directory traversal; also, use XMLTypes or views to map rows into URLs (via ALTER TABLE ENABLE FOLDERING), providing access control, modification date tracking, and other metadata management for those rows.

 

WebDAV and FTP Access:  Access any foldered XMLType row via WebDAV and FTP (Note that XMLType can manage arbitrary binary data as well, including any file format).

 

SQL Repository Search:  Operators like UNDER_PATH and DEPTH, allow applications to search folders, file metadata like owner and creation date, as well as file contents via SQL, and enable the SQL optimizer to choose the best execution plan.

 

Hierarchical Index:  XDB provides a special hierarchical index designed to speed pathname resolution and folder search. Additionally, you can automatically map hierarchical data in relational tables into folders, where the hierarchy is defined by existing relational information.

 

Navigational API:  Use JNDI (in addition or instead of JDBC) to access repository objects via a pathname, rename, delete, and copy files.

 

Servlet Access:  Users manipulating XML data in the Oracle server can use the servlet API to process XML via Java.

                          

 

Key XDB Benefits

 

The integration of a native XML capability within the database brings a number of benefits.

 

·         Users today manage structured data as tables and unstructured data as files or BLOBs, and have to subject their applications to different paradigms for managing different kinds of data. Systems channel application development down either the unstructured path (making document access transparent but table access complex) or the structured one (vice versa). XDB provides a unique ability to store and manage both structured and unstructured data, under a standard W3C XML data model. XDB provides complete transparency and interchangeability between the XML and SQL metaphors. You can perform XML operations over table data and SQL operations over XML documents. This opens up the database for a new class of ‘XML-shaped’ content.

·         XDB provides valuable Repository functionality – foldering, access control, FTP and WebDAV protocol support with versioning – enabling applications to retain the file abstraction when manipulating XML data brought into Oracle.

·         Users today face a performance barrier in storing and retrieving complex XML. XDB provides superior performance and scalability for XML operations

·         XDB provides better management of unstructured XML data through piecewise updates, indexing, search, multiple views on the data, managing intra-document and inter-document relationships and so on.

·         XDB enables data and documents from disparate systems to be accessed (e.g. through gateways, external tables) and combined into a standard data model. This integrative aspect reduces the complexity of developing applications that must deal with data from different stores.

 

XDB’s unique features will be attractive to B2B applications, developers of Web Services, Internet applications, content-management applications, as well as data- and application integrators. In the absence of strong database support for XML, many of these developers have leaned towards file-storage or unstructured storage of XML. If you store XML data in files or CLOBs, you are not exploiting several key capabilities of databases.

 

·         Indexing and Search:  Real applications need to do queries like "find me all of the product definitions created between March & April 2000", a query that is typically supported by a BTREE index on a date column.  This type of query is why most content management vendors need to use an RDBMS, since even document metadata requires BTREE indexes.  Content management vendors have had to build proprietary query APIs to handle this problem. XDB enables efficient structured search over XML data.

 

·         Updates & Transaction Processing:  Today’s commercial relational databases enable fast updates of subparts of a record, with minimal contention between users trying to update.  As traditionally document-centric data becomes more structured (via XML), this requirement gains in importance. File- or CLOB- storage cannot provide the granular concurrency control that XDB does.

 

·         Managing Relationships: Data with any structure will typically have some type of foreign key constraint.  Currently, XML data stores lack this feature, so you must implement these in application code. XDB enables you to constrain XML data to XML schemas, as well as relational constraints, and thus achieve the control over relationships that structured data has always enjoyed.

 

·         Multiple Views of Data:  Most enterprise applications need to group data together in different ways for different modules.  This is why relational views are necessary—to allow for these multiple ways to combine data. By allowing views on XML, XDB allows you to create different logical abstractions on XML.

 

·         Performance and Scalability:  People expect data storage, retrieval, and query to be fast.  Loading a file or CLOB and parsing is much slower than relational data access. XDB dramatically speeds up XML storage and retrieval.

 

·         Ease of Development:  Databases are foremost an application-development platform, that provide standard, easy ways to manipulate, transform and modify individual data elements.  While XML parsers give read access to XML data in a standard way, they don't provide an easy way to modify individual XML elements and store them. XDB supports a number of standard ways to store and retrieve data – using XML Schema, XPath, DOM, Java Beans, and JNDI.

 

On the other hand, if the drawbacks of XML file storage are forcing you to break down XML into database tables and columns, there are several advantages of XML you are leaving on the table.

·         Structure Independence:  The open content model of XML cannot be captured easily in the pure tables-and-columns world. XML Schema allows global element declarations (not just scoped to a container), so that you can find a particular data item regardless of where in the XML document it moves to as your application evolves.

 

·         Storage Independence:  When you use relational design, you client programs need to know where your data is stored, and in what format, what table, and what the relationships are between those tables.  XML Schema allows you to write applications without that knowledge, and allow the DBA to map structured data to physical table and column storage.

 

·         Ease of Presentation:  XML is understood as a native format by browsers, desktop applications like Microsoft Office XP, as well as various Internet applications.  Relational data isn't generally accessible directly from applications, but requires programming. XDB allows you to store data as XML and pump it out as XML, requiring zero programming for the contents of your database to be displayed.

 

·         Ease of Interchange:  XML is the language business is using to talk to business.  If you are forced to store XML into an arbitrary table structure, you are living with some sort of proprietary translation. Whenever you translate a language, information is lost, so interchange suffers. By natively understanding XML and providing DOM fidelity in the storage/retrieval process, XDB enables clean interchange.

 

USING XDB

 

Let us take a purchase order example. We want to store Purchase Orders marked up in XML as a column of a table. The data structure for that row or column can optionally be constrained by specifying an XML Schema during CREATE TABLE.  For example:

 

      CREATE TABLE xml_order ( info XMLTYPE )
            XMLSCHEMA 'http://www.oracle.com/xdb/orderSchema.xsd'

            ELEMENT 'PurchaseOrder';

 

OrderSchema.xsd may be created and edited using a popular XMLSchema editing environment such as XML Spy from Altova Corporation. Once the schema is created, mappings to XDB storage can be specified by annotating the schema from the XML Spy environment. We want to store instances of the XML documents in the Object-Relational storage to preserve DOM fidelity.

 

 

                                                               Fig2:  Creating a XML Schema using XMLSpy and Oracle

 

Fragments of the annotated schema are shown below. Note the use of the Oracle xdb namespace to qualify the annotation names.

 

<xs:schema xmlns:xdb="http://xmlns.oracle.com/xdb" xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0" xs:targetNamespace="xdbPurchaseOrder">

      <xs:complexType name="ActionsType" xdb:SQLType="XDBPO_ACTIONS_TYPE">

            <xs:sequence>

                  <xs:element name="Action" maxOccurs="4" xdb:SQLName="ACTION">

                        <xs:complexType xdb:SQLType="XDBPO_ACTION_TYPE">

                              <xs:sequence>

                                    <xs:element ref="User"/>

                                    <xs:element ref="Date"/>

                              </xs:sequence>

                        </xs:complexType>

                  </xs:element>

            </xs:sequence>

      </xs:complexType>

      <xs:complexType name="RejectType" xdb:SQLType="XDBPO_REJECTION_TYPE">

            <xs:all>

                  <xs:element ref="User"/>

                  <xs:element ref="Date"/>

                  <xs:element ref="Comments"/>

            </xs:all>

      </xs:complexType>

      <xs:complexType name="ShippingInstructionsType" xdb:SQLType="XDBPO_INST_TYPE">

            <xs:sequence>

                  <xs:element ref="name"/>

                  <xs:element ref="address"/>

                  <xs:element ref="telephone"/>

            </xs:sequence>

      </xs:complexType>

      <xs:complexType name="LineItemsType" xdb:SQLType="XDBPO_LINEITEMS_TYPE">

            <xs:sequence>

                  <xs:element name="LineItem" type="LineItemType" maxOccurs="unbounded" xdb:SQLName="LINEITEM">

            </xs:sequence>

      </xs:complexType>

      <xs:complexType name="LineItemType" xdb:SQLType="XDBPO_LINEITEM_TYPE">

            <xs:sequence>

                  <xs:element ref="Description"/>

                  <xs:element ref="Part"/>

            </xs:sequence>

            <xs:attribute name="ItemNumber" type="xs:integer" xdb:SQLName="ITEMNUMBER" xdb:SQLType="NUMBER"/>

      </xs:complexType>

      <!--

      xdb:tableProps="varray(sys_xmldata.lineitems.LineItem) store as table PO_LINEITEMREFS"

      -->

      <xs:element name="PurchaseOrder" xdb:SQLType="XDBPO_TYPE" xdb:defaultTable="PURCHASEORDER">

            <xs:complexType>

                  <xs:sequence>

                        <xs:element ref="Reference"/>

                        <xs:element name="Actions" type="ActionsType" xdb:SQLName="ACTIONS"/>

                        <xs:element name="Reject" type="RejectType" minOccurs="0" xdb:SQLName="REJECTION"/>

                        <xs:element ref="Requestor"/>

                        <xs:element ref="User"/>

                        <xs:element ref="CostCenter"/>

                        <xs:element name="ShippingInstructions" type="ShippingInstructionsType" xdb:SQLName="SHIPPINGINSTRUCTIONS"/>

                        <xs:element ref="SpecialInstructions"/>

                        <xs:element name="LineItems" type="LineItemsType" xdb:SQLName="LINEITEMS"/>

                  </xs:sequence>

            </xs:complexType>

            <!--              xdb:columnProps="foreign key (sys_xmldata.userid) references scott.emp(ename)"-->

      </xs:element>

      <xs:simpleType name="money">

            <xs:restriction base="xs:float">

                  <xs:fractionDigits value="2"/>

                  <xs:totalDigits value="12"/>

            </xs:restriction>

      </xs:simpleType>

      <xs:simpleType name="quantity">

            <xs:restriction base="xs:float">

                  <xs:fractionDigits value="4"/>

                  <xs:totalDigits value="8"/>

            </xs:restriction>

      </xs:simpleType>

      <xs:element name="User" xdb:SQLName="USERID" xdb:SQLType="VARCHAR2" xdb:defaultTable="">

            <xs:simpleType>

                  <xs:restriction base="xs:string">

                        <xs:length value="10"/>

                  </xs:restriction>

            </xs:simpleType>

      </xs:element>

     

      <xs:element name="PONumber" xdb:SQLName="PONUMBER" xdb:SQLType="VARCHAR2" xdb:defaultTable="">

            <xs:simpleType>

                  <xs:restriction base="xs:string">

                        <xs:length value="10"/>

                  </xs:restriction>

            </xs:simpleType>

      </xs:element>

      <xs:element name="SpecialInstructions" xdb:SQLName="SPECIALINSTRUCTIONS" xdb:SQLType="VARCHAR2" xdb:defaultTable="">

            <xs:simpleType>

                  <xs:restriction base="xs:string">

                        <xs:length value="2048"/>

                  </xs:restriction>

            </xs:simpleType>

      </xs:element>

      <xs:element name="name" xdb:SQLName="SHIPTONAME" xdb:SQLType="VARCHAR2" xdb:defaultTable="">

            <xs:simpleType>

                  <xs:restriction base="xs:string">

                        <xs:length value="20"/>

                  </xs:restriction>

            </xs:simpleType>

      </xs:element>

      <xs:element name="address" xdb:SQLName="ADDRESS" xdb:SQLType="VARCHAR2">

            <xs:simpleType>

                  <xs:restriction base="xs:string">

                        <xs:length value="256"/>

                  </xs:restriction>

            </xs:simpleType>

      </xs:element>

      <xs:element name="telephone" xdb:SQLName="PHONE" xdb:SQLType="VARCHAR2">

            <xs:simpleType>

                  <xs:restriction base="xs:string">

                        <xs:length value="24"/>

                  </xs:restriction>

            </xs:simpleType>

      </xs:element>

     

 

 

If OBJECT_RELATIONAL storage is not specified, we default to LOB storage.

 

Data Definition with XML Schema

To use an XML Schema to define data structure, you must first register the schema via a PL/SQL package.  To register the schema, all you need is the URL of the schema document.  XDB will load the schema file, and compile the information for efficient use at runtime.  XDB may also create SQL types and tables, indexes, and Java classes to make the schema ready for use.

 

The compiled schema stores all of the information needed to map XML elements and attributes onto SQL types and Java beans methods, the default table in which elements of a particular type will be stored, and the datatypes and data structures used to manage the XMLTYPE object in memory.  The application programmer can explicitly control this mapping information, or allow XDB to generate default mappings.  XDB subclasses the XML Schema definitions defined by the standard by adding extra attributes and elements to specify the mapping information.  For example, where the standard schema definition might be:

<xsd:attribute name="customerID" type="decimal"/>

 

the XDB version would be:

<xdb:attribute name="customerID" type="decimal" SQLname="customer_id"
      SQLtype="number" JavaType="int"/>

 

You can specify an existing SQL object type to map to rather than having XDB generate a mapping.  However, all of the attributes in the SQL object type must be mapped to some XML element.  It is possible to create many XML mappings for the same SQL object type, by registering different schema elements for each mapping.

 

Once the schema is registered with the database, XML documents to be constrained to the schema. The process of registration parses and validates the schema, and stores it into the Oracle data dictionary, and also creates the SQL object object definitions like:

 

CREATE TYPE purchaseOrderItem AS OBJECT (

      productName       varchar2(2000),

      quantity          integer,

      price             number,

      partNum           varchar2(2000)

);

 

CREATE TYPE purchaseOrderitems as VARRAY(2147483647)

of purchaseOrderItem;

 

CREATE TYPE purchaseOrder as OBJECT (

      customerName      varchar2(2000),

      orderDate         date,

      shipDate          date,

      items             purchaseOrderItems

);

 

 

Data Access With XPath and SQL 

The XPath standard provides the access abstraction on top of the storage model.  XPath traverses nested XML elements by specifying the elements to navigate through with a slash-separated list of element and attribute names.  SQL usage of XPath is via the ExtractNode and ExistsNode operators for XMLTYPE rows and columns.   ExistsNode tests the XMLTYPE object to see if a node matching the specified XPath expression exists:

 

SELECT xml_order FROM orders WHERE
EXISTSNODE(xml_order, '//ship_to/state') > 0

 

ExtractNode returns another XMLTYPE instance consisting of an XML document fragment with those elements and attributes matching the XPath expression.  Note that XMLTYPE can be converted to a VARCHAR or CLOB for purposes of output:

 

      SELECT xml_order FROM orders WHERE
      EXTRACTNODE(xml_order, '//ship_to/state').getClobVal() = 'CA'

 

The power of the XPath abstraction of XMLTYPE is that the storage used by XMLTYPE is totally isolated from the application.  The XMLTYPE column could be stored as a SQL object (with each element and attribute mapped onto rows and columns) or as a LOB in its original textual form.

When Oracle executes a query againstan XMLTYPE column, the underlying storage is examined to see if the query can be rewritten into object-relational SQL.  For example the query predicate above:

      EXTRACTNODE(xml_order, '//ship_to/state').getClobVal() = 'CA'

can be rewritten by the optimizer as:

      xml_order.ship_to.state = 'CA'

 

In general, ExtractNode expressions that can be written into SQL attribute / column references can be used in other SQL statements, allowing you to create BTree or bitmap indexes, specify relational integrity constraints, or use any other Oracle feature.

 

XMLTYPE columns that use LOB storage will make use of an Oracle Text index, when available.  Oracle Text has provided new XML querying capability as a part of XDB to support faster XPath queries on LOB content, as well as to transparently provide CONTAINS keyword search access across XML documents stored object-relationally.

 

Generating XML from SQL queries

XDB provides the SYS_XMLGEN() operator to generate XMLTYPE objects from arbitrary SQL queries. For example, a query such as:

 

SELECT SYS_XMLGEN(null, empno, empname, salary*100, hiredate)

AS xml_doc FROM scott.emp e;

 

results in

XML_DOC

----------------------------------------------

<?xml version='1.0'?>

<ROW>

<EMPNO>20</EMPNO>

<EMPNAME>John</EMPNAME>

<HIREDATE>12-20-1999 20:23:23.0</HIREDATE>

</ROW>

 

<?xml version="1.0"?>

<ROW>

<EMPNO>30</EMPNO>

</ROW>

 

returns two XML documents as above.

The benefits of using SYS_XMLGEN() instead of a utility running outside the database are as follows:

·         You can create XMLTYPE views from any SQL tables, allowing you to use most of the other XMLTYPE features

·         You can use XPath against existing SQL tables, while maintaining the same performance, using query rewrites

·         You can use an XML Schema to describe the mapping, allowing for more flexibility of representation without the need for an XSL stylesheet to transform the result

·         Performance— the generation is done close to the data, at the server and all the data is retrieved in one roundtrip

 

Repository View

We can look at purchase orders stored into XDB as a hierarchy of folders. This repository aspect of XDB involves access through standard protocols such as WebDAV and FTP.

WebDAV

XDB models pathnames to natively support the WebDAV (Distributed Authoring and Versioning) standards, developed by the IETF as extensions to HTTP.  WebDAV is not only a protocol, but also defines a model for finding and managing information. 

 

WebDAV calls any piece of data individually accessible via an HTTP request a resource, with an associated URL (Uniform Resource Locator).  WebDAV requires that a set of properties be maintained along with every resource, information like modification dates, owner, access control, and so forth.  WebDAV also defines an object called a collection, which basically corresponds to a directory or folder in an operating system.  Collections contain references to a set of children, each with a unique name.  A list of names separated by slashes (e.g. /docs/mydata/foo.xml) forms a pathname used in the URL.

 

XDB maintains all of the information required for a WebDAV resource in XDB.RESOURCE objects, which can be automatically associated with a REF to an XMLTYPE object (which therefore can map to any row in the database, since an XMLTYPE view can be created on any set of tables).  These are exposed in SQL via two fully updateable views, XDB.PATH_VIEW and XDB.RESOURCE_VIEW, which allow applications to query and update resource information.  The PATH_VIEW contains one row for each unique pathname is the XDB repository, whereas the RESOURCE_VIEW contains one row for each resource, regardless of how many folders it is linked into.  The RESOURCE_VIEW arbitrarily selects one of the pathnames to a resource when it is queried, and may be more efficient for queries, since all pathnames do not need to be traversed.

 

The XDB repository views provides operators to help manage resources, UNDER_PATH and EQUALS_PATH.  UNDER_PATH returns TRUE for each resource under a designated root folder (up to some optional depth limit), and provides two ancillary operators, PATH() and DEPTH(), that return the relative pathname and folder depth for each row returned.  EQUALS_PATH is a returns TRUE for a particular pathname.  For example:

SELECT extractValue(res, ‘//Title’), PATH(1)

FROM PATH_VIEW r

WHERE UNDER_PATH(r, 5 /*depth*/, ‘/companies/oracle’) > 0 AND

extractValue(res, ‘//Name’) = ‘King’;

 

would return all of the XMLType objects in the "/companies/oracle" folder that have an element called "Name" with a value of "King".

 

With WebDAV support, XDB resources can be viewed from standard WebDAV enabled operating systems such as Windows NT/2000/XP, Mac OS X, and Linux.  Here is a collection of XML Purchase Orders from the Purchase Order table viewed in Windows Explorer:

 

                                                                                              Fig 3:  Database foldering

Similarly, XML content in the database can be edited through desktop WedDAV-compliant tools, such as the Microsoft Office suite, content editing tools from Adobe and Macromedia, or development tools like Oracle JDeveloper.

 

 

                                                    Fig4: Saving to XDB directly from a WebDAV client

 

Other Protocol Access

XDB provides direct access to XDB resources via FTP and HTTP as well.  XDB direct access uses the Oracle shared server architecture to serve content directly out of the database server process, rather than first formatting the data for Net8 and processing through the normal program interfaces.  This provides a high performance way to move XML data into and out of the server.

 

Internet Application Programming

All XDB functionality is accessible via standard Java APIs, which makes Internet application programming a breeze.  Today, the most popular methods for building web applications are servlets/JSPs (Java Server Pages) and XSL/XSPs (XML Stylesheets / XML Server Pages).  JSPs are designed to access data via Java Beans, and XSL/XSPs are designed to access data via a DOM (Document Object Model) API implementation.  XDB supports both styles of APIs, each of which have their own pros and cons.

 

Java Beans are compiled objects, and require the application programmer to know the structure of the data in advance.  XDB Beans are created when the schema is registered, and compile in knowledge of the data structure.  Beans provide for the fastest, easiest access to XMLTYPE data.

 

DOM requires no knowledge of the data structure on the part of the application.  The application uses string names to identify pieces of content, and must dynamically walk through the data in each XMLTYPE object.  The DOM API is the most flexible API, but has a performance cost vs. Java Beans.

 

XDB uses JNDI (Java Naming and Directory Interface) to locate resources, and to manage collections.  JNDI returns XMLTYPE objects that support both a Java Bean and a DOM interface. Most typical filesystem-style operations (rename, link, delete, etc.) are provided via the JNDI interface.  JNDI also provides a simple interface for accessing attribute data (as do DOM and Java Beans), which is not as powerful as the other APIs.

 

Once an XMLTYPE object is retrieved (using either JNDI or JDBC/SQL), it can be cast to the matching Java Bean class, or the DOM Node class.  XDB keeps track of any changes to the XMLTYPE object that you make in memory via the Bean methods or DOM APIs, and you can save those incremental changes back to the database with the save() method on XMLTYPE.  Alternately, you can bind XMLTYPE objects into SQL statements to modify an entire XMLTYPE object at once, or alter individual pieces.

JNDI

Here is a way to find out properties of an XDB purchase order object using JNDI alone.  JNDI allows properties to be looked up via a String property name.  This API is useful when the properties to be retrieved are not known at compile-time, as well as to provide the same interface as JNDI service providers other than XDB, like an LDAP directory.  In this example, we are retrieving a String property “title” and a Boolean property “isCurrent” from an XMLTYPE resource:

 

          import javax.naming.*;

          

          Hashtable env = new Hashtable();

          env.put(Context.INITIAL_CONTEXT_FACTORY,

         "oracle.xdb.jndi.JNDIContextFactory");

 

          DirContext    ctx = new InitialDirContext(env);

          Attributes    po = ctx.getAttributes("/orders/123.xml");

          String        title = (String)po.get("title");

          boolean       isCurrent = ((Boolean)po.get("isCurrent")).booleanValue();

 

DOM

Here is the JNDI example using DOM for attribute access.  The environment setup is the same as above (and omitted):

 

          Context  ctx = new InitialContext(env);

          Document po = (Document)ctx.lookup("/orders/123.xml");

          Node     n;

 

          n = po.getElementsByTagName("Title").item(1);

          String  title = n.getNodeValue();

          n = po.getElementsByTagName("IsCurrent").item(1);

          boolean isCurrent = new Boolean(n.getNodeValue()).booleanValue();

 

Note that the DOM example requires a type conversion from a string to a Boolean to get the desired data (another minor performance hit)

Java Beans 

Here's the Java Beans version of the example above, using JNDI to locate the object with the same setup:

 

          Context       ctx = new InitialContext(env);

          PurchaseOrder po = ctx.lookup("/orders/123.xml");
          Boolean       isCurrent = po.isCurrent();

          String        title = po.getTitle();

 

Note how simple the Beans API is.  In this case, the return of lookup() is simply cast to the corresponding Bean class.  The application must already have knowledge of the datatype at a particular filename, or else use methods defined on XMLTYPE to find out what class to cast to.

JDBC 

Finally, here is an example of using SQL instead of JNDI to find the object.  We use Java Beans for object acces (although DOM would work with JDBC as well).  JDBC isn't as fast as JNDI, but provides all of the powerful of SQL select (potentially using criteria other than pathname) to find XMLTYPE objects.

 

    PreparedStatement pst = con.prepareStatement(

"SELECT EQUALS_PATH(res, '/companies/oracle') FROM resource_view";

 

    pst.executeQuery();

    PurchaseOrder po = (PurchaseOrder)pst.getObject(1);

          boolean       isCurrent = po.isCurrent();

          String        title = po.getTitle();

 

Hierarchical Access

As we have discusses before, XDB introduces a hierarchical index to speed pathname resolution and collection searching.  The structure of this index is similar to directory structures in a filesystem, with the difference that ROWIDs are used to identify resources.  The hierarchical index resolves a pathname with a similar cost to a single BTREE index search, with typical applications fetching one block per level of the directory tree.  Whether the hierarchical index is selected during a query involving EQUALS_PATH and UNDER_PATH is left to the optimizer, which makes its choice based on its understanding of whether a hierarchical index lookup is more selective than other clauses in the query. The hierarchical index can also be used directly via JNDI.

Performance Features In XDB

 

No discussion of XDB would be complete without discussing some of the measures that we have undertaken to improve XML performance. 

 

XML Query Performance

XML Query performance depends on good indexes.  If the functional implementation is used by a query, your XML performance won't scale very well (the functional implementation involves loading each XMLTYPE row into memory, and executing XPath evaluation on that). While the functional implementation is important for completeness, its performance is the equivalent of a full table scan.  XPath is powerful query syntax, and not all XPath expressions can be evaluated using the XDB indexes (for now), but here is what can be:

·         ExistsNode() expressions on XMLTYPEs stored as LOBs (using the text index)

·         ExistsNode() and ExtractNode() expressions on XMLTYPEs stored object-relationally that can be rewritten to one or more column references

·         XPath predicate expressions that can be mapped to equivalent SQL expressions

·         XPath expressions that match a functional index on the XMLTYPE column

 

It's clearly the case that any query for which a relational index can be used can also be used for XMLTYPE.  However, there is still more work to do in this area to improve XPath query performance.

 

Managing XML instances

One of the big complaints people have about using XML is that it takes up much more space than other representations.  Some surveys show that the same data in XML text is three times the size of the data in a Java object or in relational tables.  The basic reasons for this are twofold:

·         The tags (metadata describing the data) take up space

·         The data is all converted to string form.  In the case of numbers, for example, the string needs about twice as many bytes as the native representation

 

The first problem is addressed via the schema registry mechanism.  Each XMLTYPE object doesn't actually keep all of its tags—it just points to an entry in the Oracle library cache, containing all of the tag (and other schema) information.

 

XDB addresses the second problem via a technology called XOBs (or XML OBjects).  A XOB is laid out in memory like a C/C++ object, with computed offsets, rather than having each node stored in a dynamic structure, like a hash table.  Auxiliary information describing ordering, non-schema data, and so forth is stored separately in their own dynamic structures.  (Conveniently this same information is needed on disk to supplement the SQL objects so that they can maintain DOM fidelity.)  To give you an idea of its impact, the freely downloadable XML parsers available from Oracle, IBM, Microsoft and other vendors use around 80 bytes per DOM node.  XDB only stores instance data for elements with child elements or attributes, and uses only 20 bytes for each of those.  If you made a conservative assumption of an average of 9 children per non-leaf element, XDB would use an average of only 2 bytes per node, a 40x space improvement.  By using computed offsets for storage, if you have a description of the element you are looking for, you can find that element in a document with just a few machine instructions, as opposed to the hundreds or thousands of instructions needed to search a DOM tree for a child node, or even for simple hash lookups.

 

Hierarchical Index

The best way to understand the performance advantages of the hierarchical index is to compare it with the BTREE index.  Let's examine the case where we want to find an employee in the EMP table by traversing a hierarchical pathname (let us assume for the purposes of the argument that employee names are not globally unique, but only unique for a given manager, which is the case in most filesystems), so we’re given a path like "/king/clark/meyer".  Since the filename (employee name) is stored in a single table, the BTREE index will have one entry for each employee in the table, and we will have to search through all of the employees to find 'king', so we can see who his/her direct reports are.  Next, we will have to search through all of the employees again to find the people who work for 'clark'.  If there are n employees in the table, I will have to search through 3*log(n) records, since each index search requires log(n) records to be searched.  In general, I will search d * log(n) records, where d is the depth of the path (one for each slash in the pathname).  This is the performance characteristic seen from CONNECT BY queries running against indexed columns.

 

The hierarchical index groups all of the direct reports for each manager into one record.  With a "normal" fanout (average number of children associated with each parent) less than 100, most folders (directories) will fit into a single disk block.  This allows us to search only one disk block per directory, or an average of d records.  While d goes up as the number of employees (files) increases, typically on the order of log(n), there is still a clear advantage to the hierarchical index pathname resolution.  Empirical data shows improvements on the order of a factor of 100.

 

If a single parent contains a very large number of children, the hierarchical index uses a BTREE index to manage the list of children.  Thus, if the hierarchy were only one level deep (like putting all of your files into the root directory), performance is equivalent to that of a BTREE.

 

Commit-Time Collection update

One issue that comes up with maintaining directory-style hierarchical structures in the database is that collection access is transacted.  In a filesystem, each user modifying a directory quickly locks the directory, makes the change (adding, renaming, or deleting a file) and then immediately releases the lock, and the changes are visible to other users.  In a database, however, the modifications to the collection do not become visible to other users until the current transaction is committed.  Sometimes applications have other long-running SQL statements that would mean a significant wait before committing.  Therefore, locking the entire directory record (as filesystems do) can be a concurrency problem.

 

This problem is seen most commonly in situations where a number of applications are generating files quickly in the same directory, such as when maintaining a spool directory for printing or email delivery, or when generating trace or log files.  (These problems were experienced by IFS in a number of instances.)

 

XDB solves this problem by providing for name-level locking rather than collection-level locking.  When a name is added to a collection, an exclusive write lock is not grabbed on the collection, only that name within the collection is locked.  The name modification is put on a queue, and the collection is locked and modified only at commit time.  Queuing collection modifications until commit time also has the side benefit of minimizing I/O when a number of changes are made to a single collection in the same transaction.

Direct XMLTYPE Load

Hierarchical pathname access is very simple (while still providing a universal naming system).  In contrast, SQL is a very complex access mechanism that provides incredible flexibility.  The cost of SQL's flexibility is performance.  Empirical measurements show that an operating system can locate data by pathname (e.g. a stat() system call) one the order of 100 times as fast as popular SQL implementations (including Oracle) can locate data via primary key via program interfaces such as JDBC or ODBC .  To provide the same performance in pathname resolution that operating systems do, XDB provides for direct load.  This means that XMLTYPE read access operates directly against the buffer cache, without going through the SQL engine.  With this, XDB performance is competitive with filesystems.

 

Java Native API Implementation

One of the major performance costs in implementing XML applications is moving data back and forth between native code and Java code.  The more complex the data structure, the higher the cost, and XML DOM trees are very complex.  Moving data back and forth between the native server implementation and Java application needs to be fast.

 

Another bottleneck for Java XML applications comes in the parser.  Even natively compiled or JIT compiled Java still runs at least twice as slow at XML parsing vs. C applications.  Java's performance has the most disadvantage when doing lots of string manipulation, which XML data is full of.

 

XDB addresses these issues by implementing all of the Java APIs natively.  XDB's APIs were designed for Java by Java programmers, but implemented in C.  This provides for language-neutral XML support (since PL/SQL and SQL support is also required), as well as for performance.

 

Minimizing Type Conversions

One of the biggest bottlenecks in JDBC implementations is the cost of type conversions—moving data from database datatypes to Java datatypes.  Most databases do not store character data in UCS-2, the character set used internally by Java.  Java doesn't have native support for Oracle's decimal number support, which requires a conversion to a Java datatype.  Also, creation of Java objects can be expensive, so just initializing an array of 1000 Java objects can be a bottleneck.

 

XDB addresses these problems with lazy type conversions.  Often, large trees of parsed XML elements are never accessed, or if they are accessed, their values aren't used by the application.  XDB only converts the data for Java when the Java application first asks for it.  Consider the case of a JSP that wants to load a name from the database and print it out in the generated HTML output.  Typical JSP implementations would read the name from the database (which probably contains data in the ASCII or ISO8859 character sets) convert the data to UCS-2, and return it to Java as a String.  The JSP wouldn't look at the contents of the string, but merely print it out after printing the enclosing HTML, probably converting back to the same ASCII or ISO8859 for the client browser.  XDB provides a write interface on XMLTYPE so that any element can write itself directly to a stream (such as a ServletOutputStream) without conversion through Java character sets.

 

Conclusion

XDB is a high-performance XML storage and retrieval technology available with the Oracle9i Rel 2 server. It fully absorbs the W3C XML data model into the Oracle server, and provides new standard access methods for navigating and querying XML. With XDB, you get all the advantages of relational database technology and XML technology at the same time.