EP1192561A1 - Database management system with capability of fine-grained indexing and querying - Google Patents

Database management system with capability of fine-grained indexing and querying

Info

Publication number
EP1192561A1
EP1192561A1 EP00904610A EP00904610A EP1192561A1 EP 1192561 A1 EP1192561 A1 EP 1192561A1 EP 00904610 A EP00904610 A EP 00904610A EP 00904610 A EP00904610 A EP 00904610A EP 1192561 A1 EP1192561 A1 EP 1192561A1
Authority
EP
European Patent Office
Prior art keywords
document
xml
documents
update
objects
Prior art date
Legal status
Withdrawn
Application number
EP00904610A
Other languages
German (de)
French (fr)
Other versions
EP1192561A4 (en)
Inventor
Stephanos Bacon, Object Design Inc.
Peter Fein, Object Design Inc.
Paul Rabin, Object Design Inc.
Michael L. Resnick, Object Design Inc.
Daniel Weinreb, Object Design Inc.
Current Assignee
Excelon Corp
Original Assignee
Excelon Corp

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/83 Querying
    • G06F16/81 Indexing, e.g. XML tags; Data structures therefor; Storage structures

Definitions

  • This invention relates generally to databases and more particularly to database management systems using XML as a primary data model.
  • XML is a pared-down version of Standard Generalized Markup Language (SGML), designed especially for Web documents; however, XML is also useful for object-oriented databases. XML enables Web designers and other programmers to create customized tags to provide functionality not available with Hypertext Markup Language (HTML). For example, XML supports links from a point in a document to multiple other documents, as opposed to HTML links, which can reference just one destination each.
  • XML is based on the concept of documents composed of a series of entities or objects. Each entity can contain one or more logical elements. Each of these elements can have certain attributes (properties) that describe the way in which the element is to be processed.
  • XML provides a formal syntax for describing the relationships between the entities, elements and attributes that make up an XML document, which can be used to tell the computer how it can recognize the component parts of each document.
  • XML describes a class of data objects called XML documents and also partially describes the behavior of computer programs which process the documents.
  • XML documents are conforming SGML documents.
  • XML differs from other markup languages in that it does not simply indicate where a change of appearance occurs, or where a new element starts. XML sets out to clearly identify the boundaries of every part of a document, whether it be a new chapter, a piece of boilerplate text, or a reference to another publication.
  • The entities, or objects, of XML contain either parsed or unparsed data. Parsed data is made up of characters, some of which form the character data in the document, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. The markup portions of an XML document are also called "tags."
  • XML provides a mechanism to impose constraints on the storage layout and logical structure. A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application.
  • XML is designed to make it easy to interchange structured documents over the Internet.
  • XML files always clearly mark where the start and end of each of the logical parts (called elements) of an interchanged document occurs.
  • XML restricts the use of SGML constructs to ensure that fall back options are available when access to certain components of the document is not currently possible over the Internet. It also defines how Internet Uniform Resource Locators can be used to identify component parts of XML data streams.
  • Using a Document Type Definition (DTD), users of XML can check that each component of the document occurs in a valid place within the interchanged data stream.
  • The DTD declares each of the permitted entities, elements and attributes and the relationships between them.
  • An XML DTD allows computers to check, for example, that users do not accidentally enter a third-level heading without first having entered a second-level heading, something that cannot be checked using the HyperText Markup Language (HTML) previously used to code documents that form part of the World Wide Web (WWW) of documents accessible through the Internet.
  • XML does not require the presence of a DTD. If no DTD is available, either because all or part of it is not accessible over the Internet or because the user failed to create it, an XML system can assign a default definition for undeclared components of the markup.
  • XML enables users to bring multiple files together to form compound documents where illustrations may be incorporated into text files, and the format used to encode each illustration provides processing control information to supporting programs, such as document validators and browsers.
  • XML is a formal language that can be used to pass information about the component parts of a stored document.
  • XML is flexible enough to be able to describe any logical text structure, whether it be a form, memo, letter, report, book, encyclopedia, dictionary or database.
  • XML is a common syntax for expressing structure in data. Structured data refers to data that is tagged for its content, meaning, or use.
  • <H1> tags might identify the author of a document
  • <PRICE> tags could contain an item's cost in an inventory list, all the way down to <DOGFOODBRAND> if that is the level of detail required.
  • XML separates structure and content from presentation, allowing the same XML source document to be written once, then displayed in a variety of ways, e.g. on a computer monitor, within a cellular-phone display, or a personal digital assistant. Therefore, an XML document can outlive and move beyond the particular authoring and display technologies available when it was written including the Internet.
  • A relational database management system (RDBMS) stores data in the form of related tables.
  • Relational databases are powerful because they require few assumptions about how data is related or how it will be extracted from the database. As a result, the same database can be viewed in many different ways.
  • An important feature of relational systems is that a single database can be spread across several tables. This differs from flat-file databases, in which each database is self-contained in a single table.
  • A join operation matches records in two tables.
  • The two tables must be joined by at least one common field. That is, the join field is a member of both tables.
  • Object-oriented databases (OODBs) directly map computational objects to persistent data structures.
  • The fundamental argument in favor of OODBs, as compared to relational databases (RDBs), relies on a measure of data complexity.
  • The need for OODB representations quickly increases with data complexity, which depends on a number of factors including: 1. the absence of a permanent and unique identification of the represented objects; 2. the existence of many-to-many relationships; 3. the evolution of data structures; 4. complex relationships between object instances and object features; 5. the existence of a large variety of data types; and 6. complex hierarchical structures between data types.
  • RDBs were originally designed to represent "simple" data types, such as customer names (strings) or product prices (numbers) and simple relationships that can be represented in tables (matrices).
  • OODBs were designed to represent complex and "natural" data structures that often exist in the "real" world. Such objects tend to have a complex hierarchical structure with distributed and sparse relationships between objects and object features. As the complexity of object structures increases, OODBs have tended to scale up "linearly" with object relationships, whereas RDBs tend to require an exponentially increasing number of join tables.
  • RDBs may be used to store complex objects; however, there are problems.
  • When a database representation does not match the computational representation, there is an impedance mismatch.
  • Such a mismatch may generate substantial problems in RDB representations of complex objects, including: excessively large amounts of memory necessary to represent complex data relationships in join tables; significant performance problems during queries; and exponentially increasing difficulty in maintaining and extending RDB tables as data complexity increases.
  • Relational and hybrid object-relational database management systems have been developed; however, they are not well suited for component-based computing applications.
  • RDBMSs and ORDBMSs deconstruct objects into tables for storage, and in the process, lose the explicit relationships between objects.
  • RDBMSs and ORDBMSs have to reconstruct the relationships each time data is retrieved by executing CPU-intensive joins. Database performance degrades exponentially as the number of tables involved in a join increases.
  • RDBMSs were designed long before distributed architectures emerged. Database processing is server centric so queries, joins, stored procedures, and triggers occur on the server and as a result, database requests from distributed components across the network must all queue up at a central database server, which results in server bottlenecks and excessive network traffic.
  • RDBMSs do not integrate well with object-oriented languages. A mismatch exists between RDBMSs, which manage simple data as rows and columns, and components written, for example, in Java or C++, which manage data as objects. It is the developer's responsibility to address this mismatch by writing mapping code to translate between objects and rows. Mapping code adds no value to solving problems and often makes up 25-40% of an application, increasing time-to-market, decreasing flexibility, and creating an ongoing maintenance burden.
  • Relational joins limit performance and scalability.
  • components written in Java and C++ are rich entities of interrelated object data.
  • In order for an RDBMS to manage component data models, it must perform time-consuming joins to relate the data at run time. This severe performance penalty prevents large-scale deployment or requires exorbitant hardware investments.
  • RDBMSs do not support rich, user-defined data types.
  • RDBMSs support alphanumeric data types only, and they cannot be extended to support new user-defined data types.
  • Object-relational hybrids allow you to define new data types, which means you have to wait for joins, and extensibility is still limited by the need to write code to convert component object models to tables. It remains desirable to have a database management system with a data model that is richer than the relational model, i.e. without having to do join computations, while allowing flexibility in the model so that the schema can be changed easily. It is an object of the present invention to provide a method and apparatus to manage data stored in existing databases.
  • a transactional store of XML documents maintains an index that can be queried at a fine grain level.
  • the documents may be any type of XML documents, which encompasses a broad range of data types.
  • the invention makes available a document object interface that enables an index to be built by breaking down a received document into its component parts.
  • the index is a decomposed form of each document with pointers into the documents.
  • the XML grammar enables the invention to maintain relationships inside documents and across documents so that querying can be done more quickly than if documents had to be parsed with each query.
  • the index also allows documents to be queried in a consistent manner.
  • An update language integrates a new document into the existing index, as well as removes documents to be deleted and updates the database when a change is made.
  • Figure 1 is a block diagram of a computer system using a data management system according to principles of the invention
  • Figure 2 is a block diagram of the computer system of FIG. 1 showing more components of the data management system according to principles of the invention
  • Figure 3 is a block diagram of a document in the file system
  • Figure 4 is a first structure index according to principles of the present invention
  • Figure 5 is an example document in the filesystem
  • Figure 6 is a flow chart of constructing a structure index from a document
  • Figure 7 is a structure index of the document of Figure 5; and
  • Figure 8 is a flow chart of parsing and executing an update request according to principles of the invention.
  • FIG. 1 is a computer system 10 having a data management system 15 according to the principles of the present invention.
  • the computer system 10 has a client server architecture.
  • a client 20 and the database management system 15 are connected by a network of some type, e.g. a LAN, a WAN, or the Internet.
  • the client 20 is a thin client, i.e. it is a small embeddable component that runs in the same process as a client application.
  • the database management system 15, also referred to as the server, has the functional modules that process data.
  • the database management system 15 has an admin (administrative) server 30, XML stores 35, 40, a configuration database 45, and caches 50, 55.
  • the administrative server 30 and the caches each have a client interface 60.
  • the client interface 60 is a COM interface (Component Object Model, a model developed by Microsoft Corporation). In alternative embodiments, other interface standards may be used.
  • the admin server 30 can perform all the functions of the database management system 15 including operating a default cache.
  • the admin server 30 can launch caches as required by configuration data and client requests.
  • the admin server 30 will be described more fully below.
  • the XML stores 35, 40 have any data of interest to the users of the system.
  • the XML stores 35, 40 may store any type of XML documents. Those items that are not XML documents are stored as blobs.
  • the caches 50, 55 are processes having associated caches of data from the XML stores 35, 40.
  • the configuration database 45 generally contains the configuration information. That is, it maintains information about the XML stores 35, 40 and the caches the admin server 30 is supposed to launch for particular processes.
  • the configuration database 45 also contains security information.
  • a user program on the client 20 sends a request to the admin server 30 for data.
  • the admin server 30 examines the configuration database 45 and sends the configuration information which includes the caches which are involved in the operations, e.g. storing and querying, and whether the caches have already been created. Using the configuration information, the client 20 then connects to existing caches. The admin server 30 creates any new caches which are needed and then the client 20 connects to those. The client 20 then routes requests for data to the appropriate cache for the particular operation.
  • the server 15 provides persistence of the data objects stored in the XML stores. Persistence is defined as the preservation of a data object and its relationships through transactions involving the data object.
  • the database management system manages data as XML objects.
  • FIG. 2 shows the functional components of a server process.
  • the admin server 30 and the caches 50, 55 of Figure 1 are all server processes and the server process of Figure 2 illustrates any one of them.
  • the server process has an engine 100, an engine API 110, and a JAVA API 120, as well as the COM API 60 described above with regard to Figure 1.
  • the engine 100 is the core of the database management system 15 and has a number of processes including an XML parser, a query engine, an update engine, a DOM implementation, a file system, a configuration component, and a security component.
  • the XML parser parses XML.
  • the query engine implements XQL.
  • the update engine implements an update language which has an XML grammar and creates a document having attributes, etc.
  • the update engine interprets the update language and uses the information contained therein to update other documents.
  • the DOM is an implementation of the Document Object Model, the interface that defines the mechanisms for accessing data in the XML database.
  • the file system is a logical view of the XML store and acts as the API for manipulating files.
  • the configuration component is used for creating and deleting XML stores and caches in the configuration database.
  • the security component is related to the file system and the configuration component. The security component controls whether a particular user can perform a particular operation on the system.
  • the engine is a dynamic link library wrapped in APIs through which the outside applications link to the applications in the engine.
  • the engine API 110 exposes the applications inside the engine to the applications outside the engine.
  • the engine API 110 is implemented in C++.
  • the Java API 120 is a translation of the engine API 110 into Java.
  • the client interface, or COM API, allows the engine to be accessed remotely, e.g. by a remote client.
  • the COM API gives access to the server via a remote procedure call, as implemented in COM/DCOM (DCOM is Distributed Component Object Model, an extension of COM also developed by Microsoft Corporation).
  • the client interface permits no DOM level access to the engine.
  • FIG. 3 is a diagram of a typical document in a filesystem of the present invention.
  • the document 200 is hooked to the filesystem directory 205 via a filesystem object 210 that has the document name and permissions.
  • the document 200 is arranged in a hierarchical "tree."
  • the component parts are for example tags, elements and text substrings according to the XML standard.
  • the XML standard encourages rigorous detailing of document elements. This in turn enables the decomposing of an XML document into its component parts in a consistent manner.
  • Figure 4 shows a first example of a structure index that is built from a document of the type shown in Figure 3 according to principles of the present invention.
  • the structure index is the result of parsing and decomposing the document. This example shows how the element "foo" is indexed.
  • the document having the element is parsed and "foo" is decomposed into its component parts, which are then indexed into a structure index that is organized as a hierarchical tree, different from the hierarchical tree in the filesystem as shown in Figure 3.
  • Figures 5-7 illustrate the decomposition of a document in greater detail.
  • Figure 5 shows a directory having a filesystem object having a document in XML format.
  • XML enables structured data to be put into the format of a text file.
  • the document in Figure 5 is laid out in rigorous XML format where each element of the document is described by a tag (and the appropriate end tag).
  • Figure 6 is a flow chart of the process of parsing a document to create a structure index.
  • a document in XML is taken as input, block 300.
  • a filesystem object is created inside the directory of stored files, block 305.
  • a first element in the file is parsed, block 310.
  • An object is created, block 315, and then added to the structure index with pointers from the structure index tree to the document, block 320.
  • the process checks if the end of the file (EOF) has been reached, block 325, and stops if it finds the EOF. If not, the process parses a next element in the file and continues.
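The Figure 6 flow can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the function name `build_structure_index`, the use of slash-separated tag paths as index keys, and the use of Python's standard `xml.etree.ElementTree` parser are all choices made for the example. The filesystem object of block 305 is omitted, and "q" is modeled as a child element, which the text does not confirm.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_structure_index(xml_text):
    """Parse a document and index every element by its tag path.

    Returns (root, index), where index maps a path such as "a/b/c" to the
    list of parsed element objects found at that path. In this sketch the
    element objects stand in for the "pointers into the document".
    """
    root = ET.fromstring(xml_text)      # take the document as input and parse it
    index = defaultdict(list)

    def visit(element, parent_path):
        path = f"{parent_path}/{element.tag}" if parent_path else element.tag
        index[path].append(element)     # create an index entry pointing at the element
        for child in element:           # continue until no elements remain
            visit(child, path)

    visit(root, "")
    return root, dict(index)


# Hypothetical document shaped like the a/b/c/d/q example.
doc = "<a><b><c><q>5</q></c><d/></b></a>"
root, index = build_structure_index(doc)
print(sorted(index))  # ['a', 'a/b', 'a/b/c', 'a/b/c/q', 'a/b/d']
```

Because every element is reachable from the index by its path, a later lookup does not need to parse the document again.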
  • Figure 7 shows the index for the directory, filesystem object and document shown in Figure 5.
  • the document is a child of the filesystem object which in turn is a child of the directory.
  • the document is indexed according to its component parts.
  • The tag a is at the top of the document.
  • b is a child of a.
  • c and d are children of b.
  • "q" is a child of c and takes the value "5."
  • the document is broken down in the index to each of its component parts. One can arrive at the value of q efficiently using the index path. Additionally, because the document is broken down into its component parts, and each part is indexed, it is simple to do a search on all c's, or all a's in the directory rather than having to search through every document to find a "c" or an "a."
  • In order to access the data, a data path is used.
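To illustrate why a directory-wide index makes "find all c's" cheap, here is a small sketch. The names `index_directory` and `by_tag`, the two sample documents, and the tuple layout are hypothetical; the patent does not specify the index's storage format.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_directory(documents):
    """Index every element of every document by tag name.

    documents maps a document name to its XML text. The result maps a tag
    to a list of (document name, path, text value) entries.
    """
    by_tag = defaultdict(list)
    for name, text in documents.items():
        stack = [(ET.fromstring(text), "")]
        while stack:
            element, parent_path = stack.pop()
            path = f"{parent_path}/{element.tag}"
            value = (element.text or "").strip()
            by_tag[element.tag].append((name, path, value))
            # Push children in reverse so they are visited in document order.
            for child in reversed(list(element)):
                stack.append((child, path))
    return by_tag


docs = {
    "one.xml": "<a><b><c><q>5</q></c><d/></b></a>",
    "two.xml": "<a><c/></a>",
}
by_tag = index_directory(docs)

# One dictionary lookup answers "all c's in the directory".
all_c = [(name, path) for name, path, _ in by_tag["c"]]
print(all_c)  # [('one.xml', '/a/b/c'), ('two.xml', '/a/c')]
```

The index is built once when documents are stored; each query is then a lookup rather than a scan of every document.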
  • the data path is associated with content indices.
  • Update Language creates new elements in the database, removes elements from the database and updates the database.
  • the update language follows the following principles.
  • the update language is a declarative, or "rules-based" language using an XML grammar.
  • the update language provides all functions available via procedural document object model (DOM) that defines what attributes are associated with each object and how the objects and attributes can be manipulated.
  • the functions of the update language include creating new nodes in the document tree, new trees, new attributes and entire documents.
  • the update language operates on single nodes and bulk nodes. The content and attributes of existing nodes can be modified. Also, nodes and attributes can be removed from the document tree.
  • the update language operates across documents in the database.
  • the update language can store the grammar as a parse tree and optimize the grammar based on queries performed.
  • the implementation of the update language represents an update message internally as a parse tree, analyzes the internal representation, re-formulates it into an equivalent form for more efficient processing thereby optimizing it.
  • Updates to the database can be sent as web-based requests.
  • the update language makes updates re-entrant, that is, the update language allows concurrent updates on unrelated data in the database.
  • the update language can target pieces of elements (e.g. words of text) for update and removal.
  • the basic update element is "xlnupdate" which identifies a request as being a database management system update.
  • This element has a namespace reference as well as a required version number, indicating the version of the update grammar supplied.
  • the following is an example of a database update:
  • the content of the update element "xlnupdate" is one or more of these elements (also called "rules"): <foreach>
  • xlnupdate is allowed only the above-listed top level elements as direct children.
  • An update request can be processed in the context of an existing document. If an update request has document context, this is known at the time that processing begins. If an update request does not have such context, the specific document, or documents to be acted upon are provided in the update request itself. Any document context provided inside the update overrides the current context. The top-level elements in an update request are processed in order. If no further information is provided, the scope of the update is the current document context.
  • the <foreach> element is a basic looping construct with a mandatory select pattern.
  • the rules contained in the element apply to all matching patterns. There may be more than one element inside, and elements are applied sequentially. For example,
  • the query is performed once, and sub-queries are scoped to a lesser set. Also, nesting semantics are well defined for dealing with newly created or modified sub-elements.
  • the <create> element creates a new entity in the system.
  • <create> supports elements, attributes, comments, and other XML objects such as processing instructions (PIs), as well as files and XMLStores.
  • the <select> attribute is optional, but only if working in the context of a <foreach>; otherwise it is mandatory. If <select> is used while inside a <foreach>, it is evaluated relative to the current target element.
  • the target of the <select> attribute must be an element (can be root). It can select into existing text only in the case where a <text> element is being created. For example,
  • Element content goes here, and is created, uninterpreted. No content if <movefrom> or <copyfrom> attribute exists. </element>
  • false"] [exclusive="true | false"] />
  • a single <create> element contains only one of the following child elements: <element>, <attribute>, <document>, or <comment>.
  • Attributes of the <create> element further include "where", "movefrom", and "copyfrom". If either "*from" attribute is set, it specifies a node, or possibly a set of nodes, that are to be used as the source of the creation. If the attribute is "movefrom", a MOVE operation is performed on the target nodes. If the attribute is "copyfrom", the source is simply copied. If either of these attributes is set, then the content of the "element" element must be empty.
  • the "where" attribute tells the statement where, relative to the create target, to put the new element.
  • the values of "where" are: "insert", which creates a new child of the target element, that child inserted anywhere; "firstchild", which creates a new child of the target element, the new child to be inserted first; "lastchild", which creates a new child of the target element, that child inserted last; "before", which creates a new sibling, and inserts it before a target;
  • the <document> attribute in the <create> element creates a new document, of type directory or file.
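The placement rules for "where" can be sketched as follows. This is only an illustration of the four values listed above, written against Python's `xml.etree.ElementTree`; the function `create_element` and its parameters are hypothetical and are not part of the update language itself. Because "insert" allows the child to go anywhere, this sketch simply appends it.

```python
import xml.etree.ElementTree as ET


def create_element(parent, target, new_tag, where):
    """Create a new element and place it relative to target.

    parent is the parent of target; it is needed only for "before",
    because ElementTree elements do not know their own parent.
    """
    new = ET.Element(new_tag)
    if where == "firstchild":
        target.insert(0, new)                # new child, placed first
    elif where in ("lastchild", "insert"):
        target.append(new)                   # new child, placed last
    elif where == "before":
        if parent is None:
            raise ValueError("'before' needs the parent of the target")
        position = list(parent).index(target)
        parent.insert(position, new)         # new sibling ahead of target
    else:
        raise ValueError(f"unsupported where value: {where}")
    return new


root = ET.fromstring("<r><t><x/></t></r>")
t = root[0]
create_element(root, t, "first", "firstchild")
create_element(root, t, "last", "lastchild")
create_element(root, t, "sib", "before")

print([child.tag for child in t])     # ['first', 'x', 'last']
print([child.tag for child in root])  # ['sib', 't']
```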
  • the current XMLStore and file context is used for resolving the path for the new document, unless a storename attribute is used.
  • the type defaults to "file."
  • the <remove> element removes an entity from the system.
  • the <remove> element acts on elements, attributes, comments, and XML objects (e.g. PIs), as well as files and XMLStores.
  • the <update> element modifies an entity, in place. This element may replace attribute values, or content, including subtrees of elements.
  • the select attribute under <document> is optional and has the same semantics as in create, above, except that it can target an attribute.
  • An optional "newname" attribute allows an update to change the name of the targeted object, effectively modeling a <replace> attribute including name. For example,
  • Contextual elements help give context to the update, and identify nodes for update.
  • Creation elements tell the update what type of XML structure to create or modify. Contextual elements are "document" and "modify". Creation elements are <element>, <attribute>, <text>, <comment>, and <pi>. The use of <element>, <attribute>, <text>, <comment>, and <pi> is consistent between <update> and <create>.
  • <element>, <attribute>, <text>, <comment>, and <pi> are simple elements used only by the <create> and <update> elements, telling <create> and <update> what type of data to create.
  • the "type", "overwrite", and "exclusive" attributes are only used for the <create> element.
  • the "type" attribute tells the update language processor to create a directory or a file.
  • the "overwrite" attribute tells the update language processor what to do if it sees a file or directory or other object with the same name.
  • the "exclusive" attribute tells the update language processor whether access to the file is read or write.
  • the "from" element is used to control selection of target elements and documents for update. Its syntax is:
  • More than one <from> element may exist inside a single, top-level element.
  • <from> must be a direct child of a top-level element, such as <update>.
  • the parent element operation is performed over the set of all <from> elements.
  • <from> is a way to scope a single operation (e.g. <create> or <update>) across multiple documents and even XMLStores, without making the top-level element semantics overly complex.
  • All of the top-level elements have an optional "select" attribute. If this attribute is used, the element must not contain any <from> elements. The use of the "select" attribute at this level and the use of <from> elements are mutually exclusive.
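The mutual-exclusion rule between a top-level "select" attribute and <from> children could be checked as in the sketch below. The function `scope_of` and the sample requests are hypothetical, and the <from> elements are left without attributes because their syntax is not reproduced in the text above. Falling back to the current document context when neither is present follows the rule that the scope of an update defaults to that context.

```python
import xml.etree.ElementTree as ET


def scope_of(top_level):
    """Report how a top-level update element is scoped.

    Returns "select", "from", or "context" (no explicit scope, so the
    current document context applies). Raises ValueError when both a
    select attribute and <from> children are present.
    """
    has_select = "select" in top_level.attrib
    # findall with a bare tag name matches direct children only, which
    # mirrors the rule that <from> must be a direct child.
    from_children = top_level.findall("from")
    if has_select and from_children:
        raise ValueError("'select' and <from> are mutually exclusive")
    if has_select:
        return "select"
    return "from" if from_children else "context"


print(scope_of(ET.fromstring('<update select="a/b"/>')))          # select
print(scope_of(ET.fromstring("<update><from/><from/></update>")))  # from
print(scope_of(ET.fromstring("<remove/>")))                        # context
```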
  • FIG. 8 is a flow chart of the parsing and execution of an update request according to principles of the invention.
  • the update in XML format, block 400, is parsed into a document object model (DOM) tree, block 405.
  • the DOM tree is traversed, creating an execution list of objects for each major operation (e.g. foreach, update, etc.), block 410.
  • Each executable operation is modeled as an object of the same type.
  • the end result is an ordered list of linked execution objects.
  • Each execution object has its own specific state, based on the arguments passed in.
  • a first major element is executed, block 415. If the major element has nested elements, block 420, the nested objects are executed, block 425.
  • Elements which allow nesting (foreach, update) have their own execution lists, which are traversed as part of their own execution.
  • the major elements are executed sequentially until the end of the execution list is reached, block 430.
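A rough sketch of the Figure 8 flow follows, assuming a simplified request with no namespace or version attribute. The class `ExecObject`, the function `run_update`, and the sample request are hypothetical. The sketch only records the order in which operations would run; it does not modify any data and does not loop over target elements.

```python
import xml.etree.ElementTree as ET

# Elements that may contain nested operations, per the description.
NESTING = {"foreach", "update"}


class ExecObject:
    """One executable operation; every operation uses this same type."""

    def __init__(self, node):
        self.name = node.tag
        self.args = dict(node.attrib)   # state based on the arguments passed in
        # Nesting elements carry their own execution list.
        self.children = [ExecObject(c) for c in node] if node.tag in NESTING else []

    def execute(self, log, depth=0):
        log.append((depth, self.name))  # stand-in for the real operation
        for child in self.children:     # traverse the nested execution list
            child.execute(log, depth + 1)


def run_update(xml_text):
    root = ET.fromstring(xml_text)                   # parse into a tree (block 405)
    execution_list = [ExecObject(n) for n in root]   # build the list (block 410)
    log = []
    for obj in execution_list:                       # execute in order (blocks 415-430)
        obj.execute(log)
    return log


request = (
    "<xlnupdate>"
    "<foreach select='a/b'><create/><remove/></foreach>"
    "<update select='a'/>"
    "</xlnupdate>"
)
log = run_update(request)
print(log)  # [(0, 'foreach'), (1, 'create'), (1, 'remove'), (0, 'update')]
```

Modeling every operation as the same object type keeps the executor to a single loop, with nesting handled by each object's own list.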
  • the following are some examples where V represents the head of an execution list.
  • the <foreach> and <update> objects each have their own execution list, which is traversed once for each target element. In the case of nested updates, there is only one object; however, if one of the nested elements is a foreach, then it will loop.

Abstract

A transactional store of XML documents maintains an index that can be queried at a fine grain level. The documents may be any type of XML documents (300), which encompasses a broad range of data types. A document object interface enables an index to be built by breaking down a received document into its component parts. The index is a decomposed form of each document with pointers into the documents. The XML grammar enables the invention to maintain relationships inside documents and across documents so that the querying can be done more quickly than if documents had to be parsed with each query (310). The index also allows documents to be queried in a consistent manner. An update language integrates a new document into the existing index (320), as well as removes documents to be deleted and updates the database when a change is made.

Description

DATABASE MANAGEMENT SYSTEM WITH CAPABILITY OF FINE-GRAINED INDEXING AND QUERYING
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority of U.S. provisional application Serial No. 60/117,909, entitled "XML Database Management System," filed January 29, 1999.
FIELD OF THE INVENTION
This invention relates generally to databases and more particularly to database management systems using XML as a primary data model.
BACKGROUND OF THE INVENTION
With the improvements in computer capabilities, there has been an exponential increase in data stored in databases. Once stored, data needs to be accessible, preferably quickly and to a fine degree of granularity. The vast amounts of data stored in databases drive a need for data retrieval to be efficient and specific. The finer the degree of granularity, the more specific a data retrieval can be, and the more specific the data retrieval, the less time and other resources are wasted by the data requestor.
Extensible Markup Language (XML) was developed by the SGML Editorial Board formed under the auspices of the World Wide Web Consortium (W3C) in 1996. XML is a pared-down version of Standard Generalized Markup Language (SGML), designed especially for Web documents; however, XML is also useful for object-oriented databases. XML enables Web designers and other programmers to create customized tags to provide functionality not available with Hypertext Markup Language (HTML). For example, XML supports links from a point in a document to multiple other documents, as opposed to HTML links, which can reference just one destination each.
XML is based on the concept of documents composed of a series of entities or objects. Each entity can contain one or more logical elements. Each of these elements can have certain attributes (properties) that describe the way in which the element is to be processed. XML provides a formal syntax for describing the relationships between the entities, elements and attributes that make up an XML document, which can be used to tell the computer how it can recognize the component parts of each document. In other words, XML describes a class of data objects called XML documents and also partially describes the behavior of computer programs which process the documents. By construction, XML documents are conforming SGML documents.
XML differs from other markup languages in that it does not simply indicate where a change of appearance occurs, or where a new element starts. XML sets out to clearly identify the boundaries of every part of a document, whether it be a new chapter, a piece of boilerplate text, or a reference to another publication.
The entities, or objects, of XML contain either parsed or unparsed data. Parsed data is made up of characters, some of which form the character data in the document, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. The markup portions of an XML document are also called "tags." XML provides a mechanism to impose constraints on the storage layout and logical structure. A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application.
XML is designed to make it easy to interchange structured documents over the Internet. XML files always clearly mark where the start and end of each of the logical parts (called elements) of an interchanged document occur. XML restricts the use of SGML constructs to ensure that fall-back options are available when access to certain components of the document is not currently possible over the Internet. It also defines how Internet Uniform Resource Locators can be used to identify component parts of XML data streams. By defining the role of each element of text in a formal model, known as a Document Type Definition (DTD), users of XML can check that each component of the document occurs in a valid place within the interchanged data stream. The DTD declares each of the permitted entities, elements and attributes and the relationships between them. An XML DTD allows computers to check, for example, that users do not accidentally enter a third-level heading without first having entered a second-level heading, something that cannot be checked using the HyperText Markup Language (HTML) previously used to code documents that form part of the World Wide Web (WWW) of documents accessible through the Internet.
Unlike SGML, however, XML does not require the presence of a DTD. If no DTD is available, either because all or part of it is not accessible over the Internet or because the user failed to create it, an XML system can assign a default definition for undeclared components of the markup. XML enables users to bring multiple files together to form compound documents where illustrations may be incorporated into text files, and the format used to encode each illustration provides processing control information to supporting programs, such as document validators and browsers.
XML is a formal language that can be used to pass information about the component parts of a stored document. XML is flexible enough to be able to describe any logical text structure, whether it be a form, memo, letter, report, book, encyclopedia, dictionary or database. XML is a common syntax for expressing structure in data. Structured data refers to data that is tagged for its content, meaning, or use. For example, whereas the <H1> tag in HTML specifies text to be presented in a certain typeface and weight, an XML tag would explicitly identify the kind of information: <BYLINE> tags might identify the author of a document, <PRICE> tags could contain an item's cost in an inventory list, all the way down to <DOGFOODBRAND> if that is the level of detail required. XML separates structure and content from presentation, allowing the same XML source document to be written once, then displayed in a variety of ways, e.g. on a computer monitor, within a cellular-phone display, or on a personal digital assistant. Therefore, an XML document can outlive and move beyond the particular authoring and display technologies available when it was written, including the Internet. The highly detailed structure and flexibility of XML make it ideal for database applications. XML requires detailed marking of a stored document in a way that provides a rich, yet simple, data model for storing data in a way that makes the data easy to access even at a fine-grained level.
There are a number of types of databases, but currently large amounts of data are commonly stored in relational databases (RDBs). A relational database management system (RDBMS) stores data in the form of related tables. Relational databases are powerful because they require few assumptions about how data is related or how it will be extracted from the database. As a result, the same database can be viewed in many different ways.
An important feature of relational systems is that a single database can be spread across several tables. This differs from flat-file databases, in which each database is self-contained in a single table.
In relational databases, a join operation matches records in two tables. The two tables must be joined by at least one common field. That is, the join field is a member of both tables.
Object oriented data bases (OODBs) directly map computational objects to persistent data structures. The fundamental argument in favor of OODBs, as compared to relational data bases (RDBs), relies on a measure of data complexity. The need for OODB representations quickly increases with data complexity, which depends on a number of factors including: 1. the absence of a permanent and unique identification of the represented objects; 2. the existence of many-to-many relationships; 3. the evolution of data structures; 4. complex relationships between object instances and object features; 5. the existence of a large variety of data types; and 6. complex hierarchical structures between data types. RDBs were originally designed to represent "simple" data types, such as customer names (strings) or product prices (numbers), and simple relationships that can be represented in tables (matrices). On the other hand, OODBs were designed to represent complex and "natural" data structures that often exist in the "real" world. Such objects tend to have a complex hierarchical structure with distributed and sparse relationships between objects and object features. As the complexity of object structures increases, OODBs have tended to scale up "linearly" with object relationships, whereas RDBs tend to require an exponentially increasing number of join tables.
RDBs may be used to store complex objects; however, there are problems. When a database representation does not match the computational representation, there is an impedance mismatch. Such a mismatch may generate substantial problems in RDB representations of complex objects, including: excessively large amounts of memory necessary to represent complex data relationships in join tables; significant performance problems during queries; and exponentially increasing difficulty in maintaining and extending RDB tables as data complexity increases.
Relational and hybrid object-relational database management systems have been developed; however, they are not well suited for component-based computing applications. RDBMSs and ORDBMSs deconstruct objects into tables for storage and, in the process, lose the explicit relationships between objects. RDBMSs and ORDBMSs have to reconstruct the relationships each time data is retrieved by executing CPU-intensive joins. Database performance degrades exponentially as the number of tables involved in a join increases.
RDBMSs were designed long before distributed architectures emerged. Database processing is server-centric, so queries, joins, stored procedures, and triggers occur on the server. As a result, database requests from distributed components across the network must all queue up at a central database server, which results in server bottlenecks and excessive network traffic. RDBMSs do not integrate well with object-oriented languages. A mismatch exists between RDBMSs, which manage simple data as rows and columns, and components written, for example, in Java or C++, which manage data as objects. It is the developer's responsibility to address this mismatch by writing mapping code to translate between objects and rows. Mapping code adds no value to solving problems and often composes 25 - 40% of an application, increasing time-to-market, decreasing flexibility, and creating an on-going maintenance burden.
Relational joins limit performance and scalability. For example, components written in Java and C++ are rich entities of interrelated object data. In order for an RDBMS to manage component data models, it must perform time-consuming joins to relate the data at run time. This severe performance penalty prevents large-scale deployment or requires exorbitant hardware investments.
RDBMSs do not support rich, user-defined data types. RDBMSs support alphanumeric data types only, and they cannot be extended to support new user-defined data types. Object-relational hybrids allow new data types to be defined, but data retrieval still has to wait for joins, and extensibility is still limited by the need to write code to convert component object models to tables.
It remains desirable to have a database management system with a data model that is richer than the relational model, i.e. without having to do join computations, while allowing flexibility in the model so that the schema can be changed easily. It is an object of the present invention to provide a method and apparatus to manage data stored in existing databases.
SUMMARY OF THE INVENTION
The problems of storing and accessing data to a fine degree of granularity in a database having a flexible data model are solved by the present invention of an XML database system.
A transactional store of XML documents maintains an index that can be queried at a fine-grained level. The documents may be any type of XML documents, which encompasses a broad range of data types. The invention makes available a document object interface that enables an index to be built by breaking down a received document into its component parts. The index is a decomposed form of each document with pointers into the documents. The XML grammar enables the invention to maintain relationships inside documents and across documents so that querying can be done more quickly than if documents had to be parsed with each query. The index also allows documents to be queried in a consistent manner. An update language integrates a new document into the existing index, removes documents to be deleted, and updates the database when a change is made. The index with its pointers into the documents at a fine-grained level enables the documents to be manipulated as well as queried at the sub-document level.
The present invention together with the above and other advantages may best be understood from the following detailed description of the embodiments of the invention illustrated in the drawings, wherein:
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram of a computer system using a data management system according to principles of the invention;
Figure 2 is a block diagram of the computer system of Figure 1 showing more components of the data management system according to principles of the invention;
Figure 3 is a block diagram of a document in the file system;
Figure 4 is a first structure index according to principles of the present invention;
Figure 5 is an example document in the filesystem;
Figure 6 is a flow chart of constructing a structure index from a document;
Figure 7 is a structure index of the document of Figure 5; and
Figure 8 is a flow chart of parsing and executing an update request according to principles of the invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Figure 1 is a computer system 10 having a data management system 15 according to the principles of the present invention. The computer system 10 has a client-server architecture. A client 20 and the database management system 15 are connected by a network of some type, e.g. a LAN, a WAN, or the Internet. In the present embodiment of the invention, the client 20 is a thin client, i.e. it is a small embeddable component that runs in the same process as a client application.
The database management system 15, also referred to as the server, has the functional modules that process data. The database management system 15 has an admin (administrative) server 30, XML stores 35, 40, a configuration database 45, and caches 50, 55. The administrative server 30 and the caches each have a client interface 60. In the present embodiment of the invention, the client interface 60 is a COM interface (Component Object Model, a model developed by Microsoft Corporation). In alternative embodiments, other interface standards may be used.
The admin server 30 can perform all the functions of the database management system 15 including operating a default cache. The admin server 30 can launch caches as required by configuration data and client requests. The admin server 30 will be described more fully below.
The XML stores 35, 40 have any data of interest to the users of the system. The XML stores 35, 40 may store any type of XML documents. Those items that are not XML documents are stored as blobs. The caches 50, 55 are processes having associated caches of data from the XML stores 35, 40.
The configuration database 45 generally contains the configuration information. That is, it maintains information about the XML stores 35, 40 and the caches the admin server 30 is supposed to launch for particular processes. The configuration database 45 also contains security information.
In operation, a user program on the client 20 sends a request to the admin server 30 for data. The admin server 30 examines the configuration database 45 and sends the configuration information which includes the caches which are involved in the operations, e.g. storing and querying, and whether the caches have already been created. Using the configuration information, the client 20 then connects to existing caches. The admin server 30 creates any new caches which are needed and then the client 20 connects to those. The client 20 then routes requests for data to the appropriate cache for the particular operation.
The server 15 provides persistence of the data objects stored in the XML stores. Persistence is defined as the preservation of a data object and its relationships through transactions involving the data object. The database management system manages data as XML objects.
Figure 2 shows the functional components of a server process. The admin server 30 and the caches 50, 55 of Figure 1 are all server processes, and the server process of Figure 2 illustrates any one of them. The server process has an engine 100, an engine API 110, and a Java API 120, as well as the COM API 60 described above with regard to Figure 1. The engine 100 is the core of the database management system 15 and has a number of processes including an XML parser, a query engine, an update engine, a DOM implementation, a file system, a configuration component, and a security component. The XML parser parses XML. The query engine implements XQL. The update engine implements an update language which has an XML grammar and creates a document having attributes, etc. The update engine interprets the update language and uses the information contained therein to update other documents. The DOM is an implementation of the Document Object Model, the interface that defines the mechanisms for accessing data in the XML database. The file system is a logical view of the XML store and acts as the API for manipulating files. The configuration component is used for creating and deleting XML stores and caches in the configuration database. The security component is related to the file system and the configuration component. The security component controls whether a particular user can perform a particular operation on the system.
The engine is a dynamic link library wrapped in APIs through which the outside applications link to the applications in the engine. The engine API 110 exposes the applications inside the engine to the applications outside the engine. In the present embodiment of the invention, the engine API 110 is implemented in C++. The Java API 120 is a translation of the engine API 110 into Java. The client interface, or COM API, allows the engine to be accessed remotely, e.g. by a remote client. The COM API gives access to the server via a remote procedure call, as implemented in COM/DCOM (DCOM is Distributed Component Object Model, an extension of COM also developed by Microsoft Corporation). The client interface permits no DOM-level access to the engine. The Java API is used by user-written components called server extensions that run in the context of some server process, i.e. in either the admin server 30 or the caches 50, 55. Server extensions are named as elements in an XML store file system and can be invoked via the COM client API.
Figure 3 is a diagram of a typical document in a filesystem of the present invention. The document 200 is hooked to the filesystem directory 205 via a filesystem object 210 that has the document name and permissions. The document 200 is arranged in a hierarchical "tree." In the present invention, the component parts are, for example, tags, elements and text substrings according to the XML standard. The XML standard encourages rigorous detailing of document elements. This in turn enables the decomposing of an XML document into its component parts in a consistent manner.
Figure 4 shows a first example of a structure index that is built from a document of the type shown in Figure 3 according to principles of the present invention. The structure index is the result of parsing and decomposing the document. This example shows how the element "foo" is indexed. The document having the element is parsed and "foo" is decomposed into its component parts, which are then indexed into a structure index that is organized as a hierarchical tree, different from the hierarchical tree in the filesystem as shown in Figure 3.
Figures 5-7 illustrate the decomposition of a document in greater detail. Figure 5 shows a directory having a filesystem object having a document in XML format. XML enables structured data to be put into the format of a text file. The document in Figure 5 is laid out in rigorous XML format where each element of the document is described by a tag (and the appropriate end tag).
Figure 6 is a flow chart of the process of parsing a document to create a structure index. A document in XML is taken as input, block 300. A filesystem object is created inside the directory of stored files, block 305. A first element in the file is parsed, block 310. An object is created, block 315, and then added to the structure index with pointers from the structure index tree to the document, block 320. The process then checks whether the end of the file (EOF) has been reached, block 325, and stops if it finds the EOF. If not, the process parses a next element in the file and continues.
Figure 7 shows the index for the directory, filesystem object and document shown in Figure 5. The document is a child of the filesystem object, which in turn is a child of the directory. The document is indexed according to its component parts. The tag a is at the top of the document. Tag b is a child of a. Tags c and d are children of b. "q" is a child of c and takes the value "5." The document is broken down in the index to each of its component parts. One can arrive at the value of q efficiently using the index path. Additionally, because the document is broken down into its component parts, and each part is indexed, it is simple to do a search on all c's, or all a's, in the directory rather than having to search through every document to find a "c" or an "a."
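The decomposition described for Figures 6 and 7 can be illustrated with a short Python sketch. It is not the patented implementation: the sample document, the function name build_structure_index, and the choice of a dictionary keyed by tag name are all assumptions made for illustration, and the sketch parses the whole document up front rather than element by element as in the flow chart. It shows only the general idea of an index whose entries point back into the document, so that all occurrences of a tag can be found without re-walking the tree.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical document shaped like the one described for Figure 7:
# a at the top, b under a, c and d under b, q under c with value "5".
DOC = "<a><b><c><q>5</q></c><d/></b></a>"


def build_structure_index(root):
    """Map each tag name to a list of (path, element) pointers into the document."""
    index = defaultdict(list)

    def visit(element, parent_path):
        path = f"{parent_path}/{element.tag}"
        # Store the path and a reference to the element itself (the "pointer").
        index[element.tag].append((path, element))
        for child in element:
            visit(child, path)

    visit(root, "")
    return index


root = ET.fromstring(DOC)
index = build_structure_index(root)

# Find every <c> through the index instead of searching the document.
c_paths = [path for path, _ in index["c"]]

# Follow the index pointer for q to read its value.
q_value = index["q"][0][1].text

print(c_paths, q_value)
```

A production index would also span the directory and filesystem objects and would persist across transactions; the sketch keeps everything in memory for brevity.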
In order to access the data, a data path is used. The data path is associated with content indices. The content indices are the value texts and word/text indices. That is, the data is accessible using a path, and an index can be added to the path. For example, one can use the following to find a customer in Boston:
/customers/customer[address/city = "Boston"]
where /customers/customer is the path and address/city is the key.
Update Language
The update language creates new elements in the database, removes elements from the database and updates the database. The update language follows the following principles. The update language is a declarative, or "rules-based," language using an XML grammar. The update language provides all functions available via the procedural Document Object Model (DOM), which defines what attributes are associated with each object and how the objects and attributes can be manipulated. The functions of the update language include creating new nodes in the document tree, new trees, new attributes and entire documents. The update language operates on single nodes and bulk nodes. The content and attributes of existing nodes can be modified. Also, nodes and attributes can be removed from the document tree. The update language operates across documents in the database. The update language can store the grammar as a parse tree and optimize the grammar based on queries performed. The implementation of the update language represents an update message internally as a parse tree, analyzes the internal representation, and re-formulates it into an equivalent form for more efficient processing, thereby optimizing it. Updates to the database can be sent as web-based requests. The update language makes updates re-entrant; that is, the update language allows concurrent updates on unrelated data in the database. The update language can target pieces of elements (e.g. words of text) for update and removal.
The basic update element is "xlnupdate" which identifies a request as being a database management system update. This element has a namespace reference as well as a required version number, indicating the version of the update grammar supplied. The following is an example of a database update:
<?xml version="1.0"?>
<xlnupdate version="1.0" xmlns:xlnup="http://www.odi.com/excelon/update">
  content of update request
</xlnupdate>
The content of the update element "xlnupdate" is one or more of these elements (also called "rules"):
<foreach>
<create>
<remove>
<update>
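The envelope rules given so far (a root element named xlnupdate, a required version attribute, and only the four listed rules as direct children) can be checked mechanically. The following Python sketch is illustrative only: the function name top_level_rules, the error messages, and the sample request are assumptions, not part of the described system.

```python
import xml.etree.ElementTree as ET

# The four rule elements permitted as direct children of xlnupdate.
ALLOWED_RULES = {"foreach", "create", "remove", "update"}


def top_level_rules(request_xml):
    """Validate the update envelope and return its rule names in document order."""
    root = ET.fromstring(request_xml.strip())
    if root.tag != "xlnupdate":
        raise ValueError("root element must be xlnupdate")
    if root.get("version") is None:
        raise ValueError("xlnupdate requires a version attribute")
    rules = [child.tag for child in root]
    unexpected = [tag for tag in rules if tag not in ALLOWED_RULES]
    if unexpected:
        raise ValueError(f"unexpected top-level elements: {unexpected}")
    return rules


# Hypothetical request; the select patterns are placeholders.
REQUEST = """
<xlnupdate version="1.0" xmlns:xlnup="http://www.odi.com/excelon/update">
  <create select="/customers"><element location="lastchild">new</element></create>
  <remove select="/customers/customer"/>
</xlnupdate>
"""

rules = top_level_rules(REQUEST)
print(rules)
```

Returning the rule names in document order keeps the sketch compatible with sequential processing of the rules.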
"xlnupdate" is allowed only the above-listed top level elements as direct children. An update request can be processed in the context of an existing document. If an update request has document context, this is known at the time that processing begins. If an update request does not have such context, the specific document, or documents to be acted upon are provided in the update request itself. Any document context provided inside the update overrides the current context. The top-level elements in an update request are processed in order. If no further information is provided, the scope of the update is the current document context.
The <foreach> element is a basic looping construct with a mandatory select pattern. The rules contained in the element apply to all matching patterns. There may be more than one element inside, and elements are applied sequentially. For example,
<foreach select="pattern">
  <from/>
  <update/> or <create/> or <remove/>
  <foreach/>
</foreach>
For the nested <foreach> elements, the query is performed once, and sub-queries are scoped to a lesser set. Also, nesting semantics are well defined for dealing with newly- created, or modified sub-elements.
The <create> element creates a new entity in the system. <create> supports elements, attributes, comments, and other XML objects such as processing instructions (PIs), as well as files and XMLStores. The "select" attribute is optional, but only if working in the context of a <foreach>; otherwise it is mandatory. If "select" is used while inside a <foreach>, it is evaluated relative to the current target element. The target of the "select" attribute must be an element (which can be the root). It can select into existing text only in the case where a <text> element is being created. For example,
<create [select="pattern"]>
  <from/>
  <element location={"firstchild" | "lastchild" | "before" | "after" | "insert"} [movefrom="pattern" | copyfrom="pattern"]>
    Element content goes here, and is created, uninterpreted. No content if <movefrom> or <copyfrom> attribute exists.
  </element>
  <attribute name="attrname">attribute value</attribute>
  <comment location={"firstchild" | "lastchild" | "before" | "after" | "insert"}>
    text body of comment
  </comment>
  <text location={"before" | "after"}>
    text to be inserted. If the target is an existing element, the text is simply inserted as content as a first child node. If there is existing text, the existing text is concatenated onto the new text. If the target is inside existing text, the location attribute is used to determine if the new text goes before or after the target.
  </text>
  <pi name="name of PI">value of PI</pi>
  <document filepath="path_to_file" [type="{directory | file}"] [storename="XMLStore_name"] [overwrite="{true | false}"] [exclusive="{true | false}"] />
  <!-- other types of nodes can be added, as desired -->
</create>
A single <create> element contains only one of the following child elements: <element>, <attribute>, <document>, or <comment>.
Attributes of the <create> element further include "where," "movefrom," and "copyfrom." If either "*from" attribute is set, it specifies a node, or possibly a set of nodes, that are to be used as the source of the creation. If the attribute is "movefrom," a MOVE operation is performed on the target nodes. If the attribute is "copyfrom," the source is simply copied. If either of these attributes is set, then the content of the "element" element must be empty.
The "where" attribute tells the statement where, relative to the create target, to put the new element. The values of "where" are:
"insert" which creates a new child of the target element, that child inserted anywhere;
"firstchild" which creates a new child of the target element, the new child to be inserted first;
"lastchild" which creates a new child of the target element, that child inserted last;
"before" which creates a new sibling, and inserts it before a target;
"after" which creates a new sibling, and inserts it after a target.
The <document> element in the <create> element creates a new document, of type directory or file. The current XMLStore and file context is used for resolving the path for the new document, unless a storename attribute is used. The type defaults to "file."
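The placement values described for <create> ("firstchild," "lastchild," "before," "after," and "insert") can be illustrated on an in-memory tree. The Python sketch below is an assumption-laden illustration rather than the engine's behavior: the function name create_element is hypothetical, "insert" is treated as an append because the text allows that child to go anywhere, and the parent of the target is found by scanning the tree because ElementTree elements carry no parent pointer.

```python
import xml.etree.ElementTree as ET


def create_element(root, target, new, where):
    """Place `new` relative to `target` according to the placement value `where`."""
    if where == "firstchild":
        target.insert(0, new)
    elif where in ("lastchild", "insert"):
        # "insert" allows any position; appending is one valid choice.
        target.append(new)
    elif where in ("before", "after"):
        # Locate the parent by identity, since elements have no parent link.
        parent = next(
            (p for p in root.iter() if any(child is target for child in p)),
            None,
        )
        if parent is None:
            raise ValueError("target has no parent, so it cannot have siblings")
        position = list(parent).index(target)
        offset = 1 if where == "after" else 0
        parent.insert(position + offset, new)
    else:
        raise ValueError(f"unknown placement value: {where}")


# Hypothetical tree with a single child to use as the target.
root = ET.fromstring("<list><item>b</item></list>")
target = root[0]

create_element(root, target, ET.Element("before_b"), "before")
create_element(root, target, ET.Element("after_b"), "after")
create_element(root, root, ET.Element("first"), "firstchild")
create_element(root, root, ET.Element("last"), "lastchild")

order = [child.tag for child in root]
print(order)
```

Note that the description names this attribute "where" while the syntax summary writes it as "location"; the sketch takes the value as a plain argument and does not resolve that difference.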
The <remove> element removes an entity from the system. The <remove> element acts on elements, attributes, comments, and XML objects (e.g. PIs), as well as files and XMLStores.
The "select" attribute under the <remove> element is optional and has the same semantics as in <create>, above, except that it can target an attribute. For example,
<remove [select="pattern"]>
  <from/>
</remove>
The code in the example above simply removes the target document, element or attribute specified.
The <update> element modifies an entity in place. This element may replace attribute values, or content, including subtrees of elements. The "select" attribute under <update> is optional and has the same semantics as in <create>, above, except that it can target an attribute. An optional "newname" attribute allows an update to change the name of the targeted object, effectively modeling a <replace> attribute including name. For example,
<update [select="pattern"]>
  <from/>
  <element [newname="newname for element"]>
    New content for element goes here.
  </element>
  <attribute [newname="newname"]>newvalue</attribute>
  <text>new text to replace target</text>
  <comment>new text for comment</comment>
  <pi [newname="newname"]>new pi content</pi>
  <!-- add other objects as required -->
</update>
The following are elements used inside the top-level elements. Contextual elements help give context to the update, and identify nodes for update. Creation elements tell the update what type of XML structure to create or modify. Contextual elements are "document" and "modify." Creation elements are <element>, <attribute>, <text>, <comment>, and <pi>. The use of <element>, <attribute>, <text>, <comment>, and <pi> is consistent between <update> and <create>.
The <document> element is always optional, and helps scope a given update operation with respect to XMLStore and filename. It provides support required to deal with multiple documents and stores in a single request. For example,
<document filepath="path_to_file"
  [type="{directory | file}"]
  [storename="XMLStore_name"]
  [overwrite="{true | false}"]
  [exclusive="{true | false}"] />
<element>, <attribute>, <text>, <comment>, and <pi> are simple elements used only by the <create> and <update> elements, telling <create> and <update> what type of data to create.
The "type," "overwrite," and "exclusive" attributes are only used for the <create> element. The "type" attribute tells the update language processor to create a directory or a file. The "overwrite" attribute tells the update language processor what to do if it sees a file or directory or other object with the same name. The "exclusive" attribute tells the update language processor whether or not the access to the file is read or write.
The "from" element is used to control selection of target elements and documents for update. Its syntax is:
<from select="pattern"> [<document/>] </from>
More than one <from> element may exist inside a single top-level element. <from> must be a direct child of a top-level element, such as <update>. When more than one <from> exists, the parent element operation is performed over the set of all <from> elements. Essentially, <from> is a way to scope a single operation (e.g. <create> or <update>) across multiple documents and even XMLStores, without making the top-level element semantics overly complex. All of the top-level elements have an optional "select" attribute. If this attribute is used, the element must not contain any <from> elements. The "select" attribute at this level and <from> elements are mutually exclusive.
Implementation
Figure 8 is a flow chart of the parsing and execution of an update request according to principles of the invention. The update in XML format, block 400, is parsed into a document object model (DOM) tree, block 405. The DOM tree is traversed, creating an execution list of objects for each major operation (e.g. foreach, update, etc.), block 410. Each executable operation is modeled as an object of the same type. The end result is an ordered list of linked execution objects. Each execution object has its own specific state, based on the arguments passed in. A first major element is executed, block 415. If the major element has nested elements, block 420, the nested objects are executed, block 425. Elements which allow nesting (foreach, update) have their own execution lists, which are traversed as part of their own execution. The major elements are executed sequentially until the end of the execution list is reached, block 430. The following are some examples, where
-> represents an execution list link, and
V represents the head of an execution list.
A simple example is:
list
|
V
create_object -> update_object -> remove_object
The following example presents a more complex list, including a <foreach>:
list
|
V
create_object -> foreach_object -> remove_object
                 |
                 V
                 update_object -> create_object
The following example presents an even more complex list, with nesting inside the update object:
list
|
V
create_object -> foreach_object -> remove_object
                 |
                 V
                 update_object -> create_object
                 |
                 V
                 foreach_object -> create_object
                 |
                 V
                 (foreach object's list)
The <foreach> and <update> objects each have their own execution list, which is traversed once for each target element. In the case of nested updates, there is only one object; however, if one of the nested elements is a foreach, then it will loop.
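The parse, traverse, and execute sequence of Figure 8 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the described implementation: the class and function names, the wrapper element name used in the usage note, and the set of operation names are hypothetical. Execution is reduced to appending the operation name to a log, and each nested list is traversed once rather than once per target element.

```python
import xml.etree.ElementTree as ET

NESTING = {"foreach", "update"}  # elements that own a nested execution list
MAJOR = {"create", "update", "remove", "foreach"}  # assumed operation names


class ExecObject:
    """One execution object; every operation is modeled by this single type."""

    def __init__(self, elem):
        self.op = elem.tag
        self.state = dict(elem.attrib)  # per-object state from the arguments
        self.next = None                # link to the next object in the list
        # Nesting elements get their own execution list, built recursively.
        self.nested = build_list(elem) if self.op in NESTING else None

    def execute(self, log):
        log.append(self.op)    # stand-in for the real database operation
        run(self.nested, log)  # traverse this object's own list, if any


def build_list(parent):
    """Traverse the children of parent and return the head of a linked list."""
    head = tail = None
    for child in parent:
        if child.tag not in MAJOR:
            continue  # contextual and creation elements are ignored here
        obj = ExecObject(child)
        if head is None:
            head = obj
        else:
            tail.next = obj
        tail = obj
    return head


def run(head, log):
    """Execute a list sequentially until the end of the list is reached."""
    while head is not None:
        head.execute(log)
        head = head.next
    return log
```

For instance, parsing a request whose hypothetical root element contains <create/>, a <foreach> holding an <update> and a <create>, and a final <remove/> produces a three-object top-level list, with the foreach object carrying its own two-object list. Running the list visits the objects in document order, descending into each nested list before moving along the parent list.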
For parameters, it would be useful to cache pre-parsed templates, perhaps by creating a new file system type, like an update stored procedure. General abstraction principles should be followed. Specifically, the implementation of the execution tree should be as independent of the update grammar as possible. This provides for maximal code reuse if/when other updates are supported, as well as good engineering.
It is to be understood that the above-described embodiments are simply illustrative of the principles of the invention. Various and other modifications and changes may be made by those skilled in the art which will embody the principles of the invention and fall within the spirit and scope thereof.

Claims

What is claimed is:
1. A method for updating a computer database, comprising the steps of: providing an update request, said update request having a plurality of elements, each element in said update request delimited by a tag; parsing said update request; building a document object model in response to said parsing step; traversing said document object model to form an execution list of objects; and executing said execution list sequentially thereby updating the database.
2. The method of claim 1 wherein said providing step further comprises providing said update request over a web-based computer network.
3. The method of claim 1 wherein said update request is in XML format.
4. The method of claim 3 wherein said parsing step further comprises interpreting XML tags.
5. The method of claim 1 wherein building a document object model further comprises defining attributes and associating said attributes with objects.
6. The method of claim 5 wherein said building step further comprises defining rules by which said attributes and said objects are handled.
7. The method of claim 1 wherein said traversing step further comprises forming an execution list of a plurality of major execution objects wherein at least one of said major execution objects has at least one nested execution object.
8. The method of claim 7 wherein said executing step further comprises executing said at least one nested execution object in sequence with said at least one major execution object to which it belongs.
9. An apparatus for updating a computer database, comprising: an update request, said update request having a plurality of elements, each element in said update request delimited by a tag; a parser for parsing said update request; a document object model builder for building a document object model in response to said parsed update request; means for traversing said document object model, said means for traversing to form an execution list of objects from said document object model; and an execution list processor for executing said execution list of objects, whereby the database is updated as said execution list of objects is executed.
EP00904610A 1999-01-29 2000-01-28 Database management system with capability of fine-grained indexing and querying Withdrawn EP1192561A4 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US11790999P 1999-01-29 1999-01-29
US117909P 1999-01-29
PCT/US2000/002139 WO2000045304A1 (en) 1999-01-29 2000-01-28 Database management system with capability of fine-grained indexing and querying

Publications (2)

Publication Number Publication Date
EP1192561A1 (en) 2002-04-03
EP1192561A4 (en) 2005-04-20

Family

ID=22375461

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00904610A Withdrawn EP1192561A4 (en) 1999-01-29 2000-01-28 Database management system with capability of fine-grained indexing and querying

Country Status (4)

Country Link
EP (1) EP1192561A4 (en)
JP (1) JP2002536723A (en)
AU (1) AU2633800A (en)
WO (1) WO2000045304A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020305A (en) * 2012-12-29 2013-04-03 天津南大通用数据技术有限公司 Effective index for two-dimensional data table, and method for creating and querying effective index

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6981212B1 (en) * 1999-09-30 2005-12-27 International Business Machines Corporation Extensible markup language (XML) server pages having custom document object model (DOM) tags
FI20000637A0 (en) * 2000-03-17 2000-03-17 Codeonline Oy Procedure and system for presenting questions and receiving answers
JP2002123418A (en) 2000-10-13 2002-04-26 Nec Corp Method and device for updating data and mechanically readable recording medium with program recorded thereon
EP1204030A1 (en) * 2000-11-02 2002-05-08 Caplin Systems Limited Extending hypermedia documents by adding tagged attributes
WO2002042934A1 (en) * 2000-11-27 2002-05-30 Randal Walker Method and system for managing data modelled in the form of objects, and method and system for constructing and managing pages using said method
JP2002251350A (en) * 2001-02-22 2002-09-06 Sony Corp Transmitter, receiver, transmitter-receiver, transmitting method and receiving method
EP1402435A4 (en) * 2001-03-08 2007-04-25 Richard M Adler System and method for modeling and analyzing strategic business decisions
US6901585B2 (en) * 2001-04-12 2005-05-31 International Business Machines Corporation Active ALT tag in HTML documents to increase the accessibility to users with visual, audio impairment
US7028028B1 (en) 2001-05-17 2006-04-11 Enosys Markets,Inc. System for querying markup language data stored in a relational database according to markup language schema
AU2003204729B2 (en) * 2002-06-17 2006-02-02 Canon Kabushiki Kaisha Indexing and Querying Structured Documents
AUPS300402A0 (en) 2002-06-17 2002-07-11 Canon Kabushiki Kaisha Indexing and querying structured documents
EP1462949A1 (en) * 2003-03-22 2004-09-29 Cegumark AB A system and method relating to access of information
WO2005114494A1 (en) * 2004-05-21 2005-12-01 Computer Associates Think, Inc. Storing multipart xml documents

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5717925A (en) * 1993-10-08 1998-02-10 International Business Machines Corporation Information catalog system with object-dependent functionality
US5794030A (en) * 1995-12-04 1998-08-11 Objectivity, Inc. System and method for maintenance and deferred propagation of schema changes to the affected objects in an object oriented database
US5809497A (en) * 1995-05-26 1998-09-15 Starfish Software, Inc. Databank system with methods for efficiently storing non uniforms data records

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5970490A (en) * 1996-11-05 1999-10-19 Xerox Corporation Integration platform for heterogeneous databases

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ABITEBOUL S ET AL: "The Lorel query language for semistructured data" INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES, HEIDELBERG, DE, vol. 1, no. 1, April 1997 (1997-04), pages 68-88, XP002244491 ISSN: 1432-5012 *
FEATURES AND REQUIREMENTS FOR AN XML VIEW DEFINITION LANGUAGE, [Online] 5 December 1998 (1998-12-05), XP002318804 QUERY LANGUAGES 1998 CONFERENCE Retrieved from the Internet: URL:http://www.w3.org/TandS/QL/QL98/pp/xma s.html> [retrieved on 2005-02-23] *
MCHUGH J ET AL: "LORE: A DATABASE MANAGEMENT SYSTEM FOR SEMISTRUCTURED DATA" SIGMOD RECORD, ASSOCIATION FOR COMPUTING MACHINERY, NEW YORK, US, vol. 26, no. 3, September 1997 (1997-09), pages 54-66, XP000701384 *
See also references of WO0045304A1 *
SHAH A ET AL: "CREATING HIGH PERFORMANCE WEB APPLICATIONS USING TCL, DISPLAY TEMPLATES, XML, AND DATABASE CONTENT" PROCEEDINGS OF THE ANNUAL TCL/TK CONFERENCE. PROCEEDINGS OF USENIX ANNUAL TCL/TK, XX, XX, 18 September 1998 (1998-09-18), pages 121-130, XP001014253 *

Also Published As

Publication number Publication date
JP2002536723A (en) 2002-10-29
EP1192561A4 (en) 2005-04-20
WO2000045304A1 (en) 2000-08-03
AU2633800A (en) 2000-08-18

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010829

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

A4 Supplementary search report drawn up and despatched

Effective date: 20050309

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20060328