WO2004068270A2 - Method and system for xpath implementation - Google Patents

Publication number
WO2004068270A2
Authority
WO
WIPO (PCT)
Prior art keywords
command
xml
dpl
node
xpath
Prior art date
Application number
PCT/IL2004/000035
Other languages
French (fr)
Other versions
WO2004068270A3 (en)
Inventor
Yuri Steinschreiber
Original Assignee
Multiconn Technologies Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Multiconn Technologies Ltd
Publication of WO2004068270A2
Publication of WO2004068270A3

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/12: Use of codes for handling textual entities
    • G06F40/151: Transformation
    • G06F40/154: Tree transformation for tree-structured or markup documents, e.g. XSLT, XSL-FO or stylesheets
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/12: Use of codes for handling textual entities
    • G06F40/14: Tree-structured documents
    • G06F40/143: Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/30: Creation or generation of source code
    • G06F8/31: Programming languages or programming paradigms

Definitions

  • Fig. 4 illustrates the runtime process of an XSLT program according to the present invention methodology.
  • Each of the XSLT commands that activate an XPath expression is actually calling the respective XCP program.
  • The XCP commands (of the same XCP program) are programmed to execute an XPath expression and return the respective result values to the XPath client (the XSLT application in this case).
  • The result may be a pointer to an XML node or an XML object value.
  • Each XCP command performs an atomic step, which is part of the XCP
  • Each command has different characteristics that are designed to perform a specific role.
  • The first type of command is a basic access iterator for traversing the XML structure starting from a given node (the context node).
  • A second type of command is a filtering operator for checking the value of a specific node ("filtering command").
  • The third type of command is the JOIN command, designated for coordinating the operation of the access iterators.
  • An XCP program includes further types of commands, but for the purpose of this disclosure they do not need to be explained.
  • Each command includes an operation code, init flag, and context parameters such as current node of the XML tree structure.
  • The command operation can alternatively be explained as state modes, which represent the XCP program operation in real time, as seen in Fig. 5.
  • The first command is initiated by an XSLT statement referring to an XPath expression.
  • The command receives the current context node as a parameter. All other commands are initiated by former commands, receiving updated information on the current context and/or the output results of the former command (see state I in Fig. 5).
  • Each command tests its init flag or the output results of the former command, and according to these indications the command's action is determined, such as checking the next child or checking the current node value (a Boolean operation).
  • The respective action takes place (e.g. locating the next child, testing the value of the current node).
  • The command updates its inner parameters (state IV), such as the init flag and the current node, and the output results (state V) accordingly.
  • A hierarchical binary tree, as seen in Fig. 8, can represent the flow of command actions.
  • The root node is initiated and receives its input from a given XSLT statement.
  • Each command is represented by one node; the arrows illustrate the possible paths through the tree structure.
  • A command (other than the root node) may be initiated by its parent node, by its left child (“Lstep”) or by its right child (“Rstep”).
  • The Join command receives the current context node from the former commands.
  • The command was initiated by the Parent command and its init flag is true (as seen in row 1).
  • The desired action is to start the traversing process through the current context node.
  • The next active command is the Lstep node command.
  • The current context node updates the Lstep context node, and the program pointer is directed to the respective Lstep command.
  • The init flag changes its value to false; this flag will further indicate the future action of this command.
  • In case the command is initiated by the Lstep, it extracts the current node parameter from the inner memory of the Lstep command.
  • The Join command checks the current node value. In case it is null (row 4), meaning that no further "siblings" satisfy the required condition, or simply exist, in the XML tree, null is stored as the current node within the Join command and control is passed to the parent command (or returned to the XPath client if the Join command is the root node). Otherwise (the current node is not null, so the next sibling at the current XML level has been reached), the desired action is to check the descendants of the current node at the lower levels of the XML structure (see row 3), that is, to activate the Rstep node command.
  • In case the Join command is initiated by the Rstep command, it receives the current node from the inner memory of the Rstep command (it is actually the output result of the Rstep command). The Join command checks the current node. If it is null (row 6), meaning that no more descendants satisfy the condition at the lower levels of the XML tree, the Lstep command node is activated to try to select the next suitable sibling at the upper level of the XML tree structure. Otherwise (the current node is not null, so the next node satisfying the XPath subexpression under the Join node has been reached), the current node pointer is stored within the inner memory of the Join command, and control passes to the parent command (or the XPath client).
  • The Filtering command is designated for checking child nodes in the XML tree structure and testing their values according to a given Boolean condition.
  • The desired action is to check the children of the current context node of the XML structure; thus the Lstep node command is activated.
  • When the command is activated by the Lstep node, it extracts the current node parameter value. In case of a null value (no more children), the result is returned to the parent node command. Otherwise the Rstep node command is activated to check the current node value.
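The Join behaviour described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the XCP implementation: the class names ChildStep and Join and the methods start() and next() are hypothetical, and the branch taken when the parent resumes a Join whose init flag is already false is an assumption, since that row of the command table is not reproduced here. Each command keeps its whole state (init flag, context node, position, current node) in its own fields, so no lookup table or recursion stack is needed.

```python
import xml.etree.ElementTree as ET


class ChildStep:
    """Hypothetical access iterator: returns matching children one per call."""

    def __init__(self, name):
        self.name = name
        self.context = None   # context node handed over by the Join command
        self.position = 0     # index of the next child to examine

    def start(self, context):
        self.context = context
        self.position = 0

    def next(self):
        if self.context is None:
            return None
        while self.position < len(self.context):
            child = self.context[self.position]
            self.position += 1
            if child.tag == self.name:
                return child
        return None           # null result: no more matching children


class Join:
    """Hypothetical Join command coordinating an Lstep and an Rstep."""

    def __init__(self, lstep, rstep):
        self.lstep = lstep
        self.rstep = rstep
        self.init = True      # init flag
        self.current = None   # current node kept in the command's own memory

    def start(self, context):
        self.lstep.start(context)   # context node updates the Lstep context
        self.init = True
        self.current = None

    def next(self):
        need_left = self.init       # first activation by the parent: go to Lstep
        self.init = False
        while True:
            if need_left:
                left = self.lstep.next()
                if left is None:            # no further siblings
                    self.current = None
                    return None             # control returns to the parent
                self.rstep.start(left)      # descend: activate Rstep
            right = self.rstep.next()
            if right is not None:           # node satisfies the subexpression
                self.current = right
                return right                # control returns to the parent
            need_left = True                # Rstep exhausted: back to Lstep


# Evaluate the path child::A/child::B against a small document.
doc = ET.fromstring(
    "<r><A><B id='1'/><C/><B id='2'/></A><X/><A><B id='3'/></A></r>"
)
program = Join(ChildStep("A"), ChildStep("B"))
program.start(doc)

ids = []
node = program.next()
while node is not None:
    ids.append(node.get("id"))
    node = program.next()
```

Because Join exposes the same start() and next() methods as ChildStep, a Join can itself serve as the Lstep or Rstep of another Join, which mirrors the binary tree of commands described above.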

Abstract

The present invention provides a new programming language (DPL) for performing XPath expression commands as part of an XML processing module. The DPL commands themselves are used for maintaining the internal states which represent the updated location and result values of the querying and traversing process through the XML document. Thus, the memory allocated for the program commands is the only memory required for executing the XPath expression. Each DPL command performs an atomic step, which is part of the XPath expression, and its result represents a pointer to the respective node of the XML structure or its value. The results are stored within the command memory and are used for performing the next DPL command or for providing the output value to the XML processing module.

Description

METHOD AND SYSTEM FOR XPATH IMPLEMENTATION
BACKGROUND OF THE INVENTION
The present invention relates to XML document programming, and more specifically to a method and system for activating an XSLT conversion module.
The rapid growth of the Internet and more specifically the World Wide Web (WWW or Web) as a network for the delivery of applications and content, has resulted in software developers quickly beginning to shift their focus towards making the web browser a key tool to access information.
Most information is available as static content, composed of a variety of media, such as text, images, audio, and video, using the Hypertext Markup Language (HTML). HTML provides no mechanism for interacting with Web pages. At present, a Web browser uses the Hypertext Transfer Protocol (HTTP) to request an HTML file from a Web server for the rapid and efficient delivery of HTML documents.
Dynamic information retrieval, such as retrieving selected information from databases, is commonly implemented through the use of the Common Gateway Interface (CGI). While CGI allows specifically requested information to be accessed from databases across the Internet, CGI has very limited capabilities.
An alternative to using separate CGI scripts to define content is template-based HTML that embeds a request for the dynamic data within the HTML file itself. When a specific page is requested, a pre-processor scans the file for proprietary tags that are then translated into final HTML based on the request. The final HTML is then passed back to the server and on to the browser for the user to view on their computer terminal. While the examples given have been explained in the context of HTML, templates may be created with any Standard Generalized Markup Language (SGML) based markup language, such as the Handheld Device Markup Language (HDML). In fact, templates can be created with any markup language or text and are not limited to SGML-based languages. Other language types, such as MIME or HDML, are markup languages designed and developed to give handheld devices, such as phones, access to the resources of the Internet.
As HTTP-based technology has matured, these approaches have tended to move towards a clear separation of the HTML template from the underlying data. Recently there have been several other key advancements.
A subset and simplification of SGML, the Extensible Markup Language (XML), has evolved as a standard meta-data format in order to simplify the exchange of data. The Extensible Stylesheet Language (XSL) has evolved as the standard way to define stylesheets that accept XML as input. Non-HTML browsers accessing data over HTTP are becoming common and in the next few years will become more common than browsers on desktop computers.
As more content publishers and commercial interests deliver rich data in XML, the need for presentation technology increases in both scale and functionality. XSL meets the more complex, structural formatting demands that XML document authors have. XSL Transformations (XSLT), on the other hand, make it possible for one XML document to be transformed into another according to an XSL stylesheet. More generally, however, XSLT can turn XML into HTML or into any other textual output, whether or not that output is well-formed XML. As part of the document transformation, XSLT uses XPath (XML Path Language) to address the parts of an XML document that an author wishes to transform. XPath is also used by another XML technology, XPointer, to specify locations in an XML document.
The World Wide Web Consortium has defined the Extensible Stylesheet Language (XSL) as a standard method for addressing both XML-HTML and XML-XML conversions. There are several freely available and commercial XSL processor implementations for Java and C/C++ e-business applications. However, standards-compliance, stability and performance vary widely across implementations. Additionally, even the fastest current implementations are much slower than necessary to meet the throughput requirements for either B2C or B2B applications. The great flexibility provided by XML encoding generally means that such conversions are complex and time-consuming. The XSL World Wide Web Consortium Recommendation, which addresses the need to transform data from one XML format into another or from an XML format into HTML or another "output" format, as currently specified includes three major components in an XSL processor: an XSL transformation engine (XSLT), a node selection and query module (XPath), and a formatting and end-user presentation layer specification (Formatting Objects). XML-to-XML data translation is primarily concerned with the first two modules, while the Formatting Objects are most important for XML-to-HTML or XML-to-PDF document rendering. A typical XSL implementation comprises a parser for the transform, a parser for the source data, and an output stream generator: three distinct processes. Known XSL transformation engines (XSLT) typically rely on recursive processing of trees of nodes, where every XML element, attribute or text segment is represented as a node. The prior art has suggested implementations for simplifying and optimizing the transformation algorithms.
Known XSLT implementations suffer from severe performance limitations. While suitable for Java applets or small-scale projects, they are not yet fit to become part of the infrastructure. Benchmarks of the most popular XSLT processors show that throughput of 10-150 kilobytes/second is typical. This is 10 times slower than an average diskette drive and roughly equivalent to a 128 Kbit/s ISDN line. Many websites today have sustained bandwidths at or above T1 speeds (1500 Kbit/s), and the largest ones require 100 Mbit/s or faster connections to the Internet backbone. Clearly, unless XSLT processing is to become the chief performance barrier in B2C and B2B operations, its performance has to improve by orders of magnitude.
There are a number of reasons for such poor performance. To transform one XML vocabulary to another, the processor must parse the transform, parse the source data, walk the two parse trees to apply the transform and finally output the data into a stream. Some of the better implementations allow the transform parsing as a separate step, thereby avoiding the need to repeat that step for every document or data record to be processed by the same transform.
However, the transformation step is extremely expensive and consumes an overwhelming portion of processing time. Because XSLT relies on recursive processing of trees of nodes, where every XML element, attribute or text segment is represented as a node, merely optimizing the implementation of the algorithms cannot attain the necessary results. Thus current state-of-the-art XSLT implementations have to sacrifice performance in order to maintain the flexibility that is the very essence of XSLT and XML itself. So while XML and XSLT offer greater flexibility than older data interchange systems through the use of direct translation, self-describing data and dynamic transformation stylesheets, this flexibility comes with a great performance penalty.
The main limitation of XSLT is the inefficient performance of the XPath command language. The operation of XPath requires an indefinite amount of memory for traversing and querying through the hierarchical structure of the XML documents. The querying method of the XPath operation utilizes an expanding memory allocation policy to enable recursive navigation of the XML hierarchical structure. As a result of this allocation policy, the translation operation of XSLT is time-consuming.
It is the main object of the present invention to provide an efficient command language that performs XPath querying tasks using minimum memory space, thus enabling efficient translation operation of XSLT programs.
SUMMARY
The present invention provides a new designated programming language (DPL) for performing XPath expression commands as part of an XML processing module. The DPL commands themselves maintain the internal states which represent the updated location and result values of the querying and traversing process through the XML document, so that the memory allocated for the program commands is the only memory required for executing the XPath expression. Each DPL command performs an atomic step which is part of the XPath expression, and its result represents a pointer to the respective node of the XML structure or its value. The results are stored within the command memory and used for performing the next DPL command or as the output value for the XML processing module. The DPL program flow is represented by a hierarchical binary tree in which each command is represented by one tree node and can be initiated by its parent node or by one of its children (left child or right child). Each DPL command includes a command code, an activation state, and context parameters, including pointers to and values of the current navigation status in the XML structure.
BRIEF DESCRIPTION OF THE DRAWINGS
These and further features and advantages of the invention will become better understood in the light of the ensuing description of a preferred embodiment thereof, given by way of example only, with reference to the accompanying drawings, wherein:
Fig. 1 is a general diagram of the environment in which the present invention is practiced;
Fig. 2 is a flow chart illustrating the programming methodology for an XSLT module according to the present invention;
Figs. 3 and 4 are flow charts illustrating the compiling and runtime processes according to the present invention;
Fig. 5 illustrates the process of operating XCP program commands according to the present invention;
Figs. 6-7 are examples of the operation of specific programming commands;
Fig. 8 illustrates the representation of the program operation flow by a hierarchical tree;
Fig. 9 is an example of an XML hierarchical data structure.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The embodiments of the invention described herein are implemented as logical operations in a computing system. The logical operations of the present invention are preferably presented (1) as a sequence of computer-implemented steps running on the computing system and (2) as interconnected machine modules within the computing system. The implementation is a matter of choice dependent on the performance requirements of the computing system implementing the invention. Accordingly, the logical operations making up the embodiments of the invention described herein are referred to variously as operations, steps, or modules.
For a better understanding of the innovation of the present invention, it is necessary to briefly explain the operation of an XPath expression, which relates to queries for traversing an XML document structure and locating specific objects. XML allows hierarchical tree structures of almost any dimension to be easily described, and it is known in the art that using such a structure is the most efficient way to handle XML documents. When the data is grouped so that a new set of child nodes exists at each decision point, the parser no longer has to scan every single node in the structure to perform its search. The diagram of Fig. 9 illustrates a set of data in a hierarchical tree structure.
Let us assume that an XML processing program is required to execute an XPath query such as "T/A2/B1/*[name()='C3' or name()='C4']/D".
At each decision point we are eliminating large parts of the structure so by the time we get to the level of the nodes of name 'D' we only have a small subset of nodes to scan through. In this case, the nodes of name 'D' don't actually have to be evaluated at all.
A value inside XML can be stored in 3 ways:
• as an attribute value (e.g., <A id="#VALUE#"/>)
• as text (e.g., <A>#VALUE#</A>)
• or as the node name itself (e.g., <#VALUE#/>)
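The three storage forms listed above can be illustrated with a short Python sketch using the standard xml.etree.ElementTree module. The element names root, A and B and the helper read_three_ways are hypothetical, and the literal VALUE stands in for the #VALUE# placeholder, because '#' is not allowed in an XML element name.

```python
import xml.etree.ElementTree as ET

# "VALUE" replaces the #VALUE# placeholder used in the text.
doc = ET.fromstring(
    "<root>"
    '<A id="VALUE"/>'   # 1. value stored as an attribute value
    "<B>VALUE</B>"      # 2. value stored as text
    "<VALUE/>"          # 3. value stored as the node name itself
    "</root>"
)


def read_three_ways(root):
    """Return the same value read from each of the three storage forms."""
    as_attribute = root.find("A").get("id")
    as_text = root.find("B").text
    as_name = root.find("VALUE").tag
    return as_attribute, as_text, as_name


values = read_three_ways(doc)
```

In the first two forms the value lives in separately stored string data attached to the node, while in the third form it is the node name that the parser already holds.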
When a value is stored as an attribute or as text, it is assumed that the XML program has to allocate some memory and store the value in that location. If you are performing an XPath query such as *[@id="#VALUE#"], then the parser has to first iterate through the potential nodes, find the memory location where the attribute is stored, and perform a string comparison to determine whether it has found a valid match.
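As a rough illustration of the evaluation just described, the following Python sketch emulates an attribute predicate of the form *[@id="..."] over the children of a context node. The function name match_by_attribute and the sample document are assumptions made for the example only.

```python
import xml.etree.ElementTree as ET


def match_by_attribute(context, attr_name, wanted):
    """Emulate *[@attr_name="wanted"] over the children of `context`."""
    matches = []
    for child in context:                 # iterate through the potential nodes
        stored = child.get(attr_name)     # locate the stored attribute value
        if stored is not None and stored == wanted:   # string comparison
            matches.append(child)
    return matches


root = ET.fromstring('<r><A id="x"/><B id="VALUE"/><C/><D id="VALUE"/></r>')
found = match_by_attribute(root, "id", "VALUE")
tags = [node.tag for node in found]
```

Every candidate node costs one attribute lookup and one string comparison, which is the per-node work the paragraph above refers to.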
For executing such XPath parsing and evaluation of an XML document, the processor is required to generate a lookup table in memory of all the node names (which may be repeated several times throughout the structure) and give them unique IDs. When performing a query such as A/B/#VALUE#, it accesses the lookup table and uses it to quickly locate the correct matches.
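A minimal sketch of such a name lookup table, written in Python, is given below. The function build_name_table and the two dictionaries it returns are hypothetical and only illustrate the idea of assigning one unique ID per distinct node name; they are not taken from any particular XPath processor.

```python
import xml.etree.ElementTree as ET


def build_name_table(root):
    """Assign one integer ID per distinct node name and index nodes by ID."""
    name_to_id = {}
    nodes_by_id = {}
    for node in root.iter():              # document order, root included
        # A new name receives the next free ID; a repeated name reuses its ID.
        name_id = name_to_id.setdefault(node.tag, len(name_to_id))
        nodes_by_id.setdefault(name_id, []).append(node)
    return name_to_id, nodes_by_id


doc = ET.fromstring("<A><B><X/></B><B><Y/><X/></B></A>")
name_to_id, nodes_by_id = build_name_table(doc)
```

Both dictionaries grow with the number of distinct names and nodes in the document, so their size cannot be fixed before the document is seen.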
The lookup table in memory is of variable size; hence it is not possible to allocate a predetermined memory space for all possible queries. Such a memory allocation policy is inefficient and consumes valuable processing time.
According to the present invention it is suggested to use a new programming language (hereinafter referred to as "XPath command processing language - XCP") for executing XPath expression commands. The programming environment according to the new concept of the present invention is illustrated in Fig. 1.
According to the prior art, the programmed XPath commands were activated by the XSLT program in real time for executing queries on XML documents. According to the methodology of the present invention, it is suggested to compile the XPath commands into the new XCP programming language, as seen in the flow chart of Fig. 2.
(The XCP command language structure and its operations are based on the similarity between the hierarchical path tree structure of the XPath expression and the traversing path navigation structure required for executing the XPath expression queries.)
Fig. 1 illustrates the general work flow of programming with the XCP language. The XCP language, as mentioned above, is designated for processing XPath expressions. XPath expressions can be used by any XML processing application, such as XSLT application A, which is used for converting XML documents B. The programmer is provided with programming interface C for working with XPath expression command language D, and with XCP compiler E for creating an XCP program F. The created XCP program is used by the XSLT application for activating the XPath expression.
The flow chart of the programming methodology according to the present invention can be seen in Fig. 2. First, an XSLT application (or any other XML processing application) is programmed (step 210) according to a general design target for transforming the structure of XML documents (for example, converting the source XML structure into a result XML structure). XSLT command execution requires performing queries and traversing the XML structures. To perform these tasks, XPath expressions are defined (step 220), which represent the required queries to be executed on the XML documents (as described above). Those XPath expressions are compiled (step 230) to create the XCP language commands. As the XCP commands include the data parameters required for performing the XML queries, the compiling process automatically assigns the memory space needed for executing the XPath expressions. The XCP program commands are stored, to be used in real time during execution of the respective XSLT application.
Fig. 3 illustrates the flow chart of the XCP compiling process. First, the XPath expressions are parsed (step 310) according to the standard grammar rules of the XPath language specification, and each part of the expression is checked against the XPath syntax (step 320). At the next step (330), each expression is normalized to the unabbreviated form; for example, A/B is converted into "child::A/child::B". The normalized XPath expressions are translated (step 340) into XCP language commands, as will be further explained below. Once the XCP program is ready, it is possible to allocate (step 350) the exact memory space needed for activating the respective XPath expression.
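The normalization step (330) can be illustrated by a minimal sketch that expands the abbreviations defined by the XPath 1.0 specification. The function name normalize is hypothetical, and the sketch assumes simple location paths without predicates; it is not the compiler of the present invention.

```python
def normalize(path: str) -> str:
    """Expand abbreviated steps of a simple location path (no predicates)."""
    absolute = path.startswith("/")
    # '//' is the abbreviation of '/descendant-or-self::node()/'.
    path = path.replace("//", "/descendant-or-self::node()/")
    steps = []
    for step in path.strip("/").split("/"):
        if not step:
            continue                                  # bare '/' has no steps
        if step == ".":
            steps.append("self::node()")
        elif step == "..":
            steps.append("parent::node()")
        elif step.startswith("@"):
            steps.append("attribute::" + step[1:])
        elif "::" in step:
            steps.append(step)                        # already unabbreviated
        else:
            steps.append("child::" + step)            # default axis is child
    return ("/" if absolute else "") + "/".join(steps)

print(normalize("A/B"))       # child::A/child::B
print(normalize("A//B/@id"))
```

Once every step carries an explicit axis, each step can be translated into a command without further inspection of the abbreviated syntax.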
Fig. 4 illustrates the runtime process of an XSLT program according to the methodology of the present invention. Each of the XSLT commands that activates an XPath expression is actually calling the respective XCP program. The XCP commands (of the same XCP program) are programmed to execute an XPath expression and return the respective result values to the XPath client (the XSLT application in this case). The result may be a pointer to an XML node or an XML object value. Each XCP command performs an atomic step, which is part of the XCP program process designated to execute an XPath expression. Each command has different characteristics that are designed to perform a specific role. There are several basic command types. The first type is basic access iterators, for traversing the XML structure starting from a given node (the context node). The second type is a filtering operator for checking the value of a specific node (a "filtering command"). The third type is JOIN commands, designated for coordinating the operation of the access iterators. An XCP program includes further types of commands, but for the purpose of this disclosure they do not need to be explained.
Each command includes an operation code, init flag, and context parameters such as current node of the XML tree structure.
The command operation can alternatively be explained as a sequence of state modes that represent the XCP program operation in real time, as seen in Fig. 5. The first command is initiated by an XSLT statement referring to an XPath expression, and receives the current context node as a parameter. All other commands are initiated by former commands, receiving updated information on the current context and/or the output results of the former command (state I in Fig. 5). In state II, each command tests its init flag or the output results of the former command, and according to these indications the command's action is determined, such as checking the next child or checking the current node value (a Boolean operation). In state III, the respective action takes place (e.g., locating the next child, or testing the value of the current node). In accordance with the condition test and the action results (e.g., the next child pointer, or the true/false result of value testing), the command updates its inner parameters (state IV), such as the init flag and the current node, and updates its output results (state V) accordingly.
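The five state modes can be illustrated with a minimal sketch of one access iterator. The class name ChildIterator, its fields, and the use of Python's xml.etree.ElementTree are assumptions made for illustration; Figs. 5 to 7 are not reproduced here. The point of the sketch is that all traversal state is kept in the fields of the command itself.

```python
import xml.etree.ElementTree as ET

class ChildIterator:
    """Hypothetical access iterator: returns one matching child per step."""

    def __init__(self, name):
        self.name = name        # node name this command looks for
        self.init = True        # init flag
        self.context = None     # current context node
        self.pos = 0            # position of the next child to examine
        self.current = None     # output result of the last step

    def start(self, context):
        # State I: receive the context node from the initiating command.
        self.context = context
        self.init = True

    def step(self):
        # State II: test the init flag to decide on the action.
        if self.init:
            self.pos = 0
            self.init = False
        # State III: perform the action, i.e. locate the next matching child.
        while self.context is not None and self.pos < len(self.context):
            child = self.context[self.pos]
            self.pos += 1                 # State IV: update inner parameters
            if child.tag == self.name:
                self.current = child      # State V: update the output result
                return child
        self.current = None               # no further matching child
        return None

doc = ET.fromstring("<A><B/><C/><B/></A>")
it = ChildIterator("B")
it.start(doc)
found = []
node = it.step()
while node is not None:
    found.append(node.tag)
    node = it.step()
print(found)
```

No list of results is built inside the command: each call to step() resumes from the stored position, so the memory held by the command does not depend on the size of the document.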
A hierarchical binary tree, as seen in Fig. 8, can represent the flow of command actions. The root node is initiated by, and receives its input from, a given XSLT statement. Each command is represented by one node; the arrows illustrate the possible paths through the tree structure. A command (other than the root node) may be initiated by its parent node, by its left child ("Lstep"), or by its right child ("Rstep").
For further understanding of command operations, Figs. 6 and 7 describe examples of two types of commands.
The Join command receives the current context node from the former commands. In case the command was initiated by the Parent command and its init flag is true (as seen in row 1), the desired action is to start the traversing process through the current context node. Accordingly, the next active command is the Lstep node command. Hence, the current context node updates the Lstep context node and the program pointer is directed to the respective Lstep command. The Init flag changes its value to false; this flag will further indicate the future action of this command.
In case the command is initiated by the Lstep, it extracts the current node parameter from the Lstep command inner memory. The Join command checks the current node value: in case it's null (row 4) (no further "siblings" satisfy the required condition, or simply exist, in the XML tree), then null is stored as a current node within the Join command and control is passed to the parent command (or returned to the XPath client if the Join command is the root node). Otherwise (current node is not null, so we arrived at the next sibling at the current XML level), the desired action is to check the descendants of the current node at the lower levels of the XML structure (see Row 3): that is, activating Rstep node command.
In case the Join command is initiated by the Rstep command, it receives the current node from the inner memory of the Rstep command (it is actually the output result of the Rstep command). The Join command checks the current node. If it's null (row 6) (no more descendants satisfy the condition at the lower levels of the XML tree), the Lstep command-node is activated for trying to select the next suitable sibling at the upper level of the XML tree structure. Otherwise (current node is not null, so we arrived at the next node satisfying the XPath subexpression under the Join node), the current node pointer is stored within the inner memory of the Join command, and control passes to the parent command (or XPath client).
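The Join behaviour described in the rows above can be sketched as follows. The class names, the start()/step() protocol, and the use of xml.etree.ElementTree are assumptions for illustration. In particular, the text does not describe what happens when the parent re-initiates a Join whose init flag is false; the sketch assumes that the Rstep is asked for its next result in that case.

```python
import xml.etree.ElementTree as ET

class ChildIterator:
    """Hypothetical access iterator over the children named `name`."""
    def __init__(self, name):
        self.name, self.context, self.pos = name, None, 0
    def start(self, context):
        self.context, self.pos = context, 0
    def step(self):
        while self.context is not None and self.pos < len(self.context):
            child = self.context[self.pos]
            self.pos += 1
            if child.tag == self.name:
                return child
        return None

class Join:
    """Hypothetical Join command coordinating an Lstep and an Rstep."""
    def __init__(self, lstep, rstep):
        self.lstep, self.rstep = lstep, rstep
        self.init = True
        self.current = None
    def start(self, context):
        self.lstep.start(context)   # the context node updates the Lstep context
        self.init = True
        self.current = None
    def step(self):
        if self.init:
            self.init = False       # initiated by parent, init true: go to Lstep
            node = None
        else:
            node = self.rstep.step()    # assumption: re-entry resumes the Rstep
        while node is None:             # Rstep exhausted: try the next sibling
            left = self.lstep.step()
            if left is None:            # Lstep returned null: report null upward
                self.current = None
                return None
            self.rstep.start(left)      # Lstep node becomes the Rstep context
            node = self.rstep.step()
        self.current = node             # store the result, return to the parent
        return node

doc = ET.fromstring(
    '<r><A><B id="1"/><C/></A><A/><A><B id="2"/><B id="3"/></A></r>'
)
program = Join(ChildIterator("A"), ChildIterator("B"))  # child::A/child::B
program.start(doc)
ids = []
node = program.step()
while node is not None:
    ids.append(node.get("id"))
    node = program.step()
print(ids)
```

Because a Join exposes the same start()/step() protocol as an iterator, a Join may itself serve as the Rstep of another Join, which gives the binary tree of commands for longer paths such as A/B/C.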
The Filtering command is designated for checking child nodes in the XML tree structure and testing their values according to a given Boolean condition. In case the command is initiated by the parent node-command and the init flag is true, the desired action is to check the children of the current context node of the XML structure; thus the Lstep node-command is activated. In case the command is activated by the Lstep node, it extracts the current node parameter value. In case of a null value (no more children), the result is returned to the parent node-command. Otherwise, the Rstep node-command is activated to check the current node value.
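A Filtering command of this kind can be sketched in the same style. The class names Children and Filter and the predicate passed as the Rstep are hypothetical. The text does not state what follows the Boolean test; the sketch assumes that a node passing the test is returned to the parent and that a failing node causes the Lstep to be activated again. The init flag is folded into start() for brevity.

```python
import xml.etree.ElementTree as ET

class Children:
    """Hypothetical Lstep: steps through all children of the context node."""
    def __init__(self):
        self.context, self.pos = None, 0
    def start(self, context):
        self.context, self.pos = context, 0
    def step(self):
        if self.context is None or self.pos >= len(self.context):
            return None                   # no more children
        self.pos += 1
        return self.context[self.pos - 1]

class Filter:
    """Hypothetical Filtering command: Lstep supplies nodes, Rstep tests them."""
    def __init__(self, lstep, test):
        self.lstep = lstep
        self.test = test                  # Boolean condition (stands in for Rstep)
        self.current = None
    def start(self, context):
        self.lstep.start(context)         # check the children of the context node
    def step(self):
        while True:
            node = self.lstep.step()
            # Null from the Lstep is passed upward; otherwise the value is tested.
            if node is None or self.test(node):
                self.current = node
                return node

doc = ET.fromstring('<A><B id="x"/><B id="y"/><C id="x"/></A>')
flt = Filter(Children(), lambda n: n.get("id") == "x")   # roughly *[@id="x"]
flt.start(doc)
tags = []
node = flt.step()
while node is not None:
    tags.append(node.tag)
    node = flt.step()
print(tags)
```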

Claims

What is claimed is:
1. A designated programming language (DPL) for performing XPath expression commands as part of an XML processing module, said DPL utilizing the language commands themselves for maintaining internal states, which represent the updated location and result values of the querying and traversing process through the XML document, wherein the memory allocated for the program commands is the only memory required for executing the XPath expression.
2. The DPL of claim 1 wherein each DPL command performs an atomic step which is part of the XPath expression and its results represent a pointer to respective node of the XML structure or its values, said results are stored within command memory and used for performing the next DPL command or output value for the XML processing module.
3. The DPL of claim 2 wherein the flow process of the DPL program is presented by a hierarchical binary tree, each command represented by one tree node and wherein each command can be initiated by its parent node or by its Children: Left child or Right child.
4. The DPL of claim 3 wherein each command includes command code, activation state, and context parameters including pointers and values of XML structure current navigating status.
5. The DPL of claim 3 wherein the activation of each command is determined according to the initiating command hierarchical position (Parent, Left Child, Right child), activating state, and output results of the initiating command.
6. The DPL of claim 3 wherein the DPL commands are constructed from access Iterators for traversing the XML structure starting from a given context node, Filtering operators for checking values of specific nodes of the XML structure, and Join operators for coordinating single access Iterators.
7. A method for executing XPath expression commands as part of an XML processing module, said method comprising the steps of: compiling the XPath expression into a new programming language (DPL program), wherein said new language commands are utilized for storing internal states representing updated data of the querying and traversing process through the XML documents, and wherein the memory space allocated for the program commands is the only memory required for executing the XPath expression; storing the compiled program of each XPath expression; and executing the stored program according to the XML processing module requirements, in real time.
8. The method of claim 7 wherein the compiling process is further comprised of: parsing the XPath expression; checking XPath syntax; normalizing the XPath expression to the unabbreviated form; parsing the normalized XPath expression; and translating each normalized expression into the new programming language commands.
9. The method of claim 8 wherein each DPL command performs an atomic step which is part of the XPath expression and its results represent a pointer to a respective node of the XML structure or its values, said results are stored within command memory and used for performing the next DPL command or output value for the XML processing module.
10. The method of claim 9 wherein the flow process of the DPL program is presented by a hierarchical binary tree, each command represented by one tree node and wherein each command can be initiated by its parent node or by its Children: Left child or Right child.
11. The method of claim 10 wherein each DPL command includes command code, activation state and context parameters including pointers and values of XML structure current navigating status.
12. The method of claim 11 wherein the activation of each command is determined according to the initiating command hierarchical position (Parent, Left Child, Right child), activation state, and output results of the initiating command.
13. The method of claim 12 wherein the DPL language includes at least three command types: Join command which is designated for traversing through the XML structure starting from a given context node of the XML structure, filtering command which is designated for checking Children nodes at specific level of the XML structure, and Boolean command which is designated for checking values of specific nodes of the XML structure.
14. A computerized process for performing XPath expressions as part of an XML processing module wherein the process flow is represented by a hierarchical binary tree whose traversing path represents the XPath execution sequence, wherein each atomic step of the process is represented by one tree node and each atomic step can be initiated by its parent node or by its Children: Left child or Right child.
15. The process of claim 14 wherein the process results represent a pointer to any node of the XML structure or its values, said results are stored within internal memory of each atomic step and used for performing the next atomic step or output value for the XML processing module.
16. The process of claim 15 wherein each atomic step command includes command code, activation state, and context parameters including pointers and values of XML structure current navigating status.
17. The process of claim 16 wherein the activation of each command is determined according to the initiating command hierarchical position (Parent, Left Child, Right child), activation state, and output results of the initiating command.
18. The process of claim 17 wherein the atomic step commands are constructed from access Iterators for traversing the XML structure starting from a given context node, Filtering operators for checking values of specific nodes of the XML structure, and Join operators for coordinating single access Iterators.
PCT/IL2004/000035 2003-01-27 2004-01-14 Method and system for xpath implementation WO2004068270A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IL15415203A IL154152A0 (en) 2003-01-27 2003-01-27 METHOD AND SYSTEM FOR XPath IMPLEMENTATION
IL154152 2003-01-27

Publications (2)

Publication Number Publication Date
WO2004068270A2 true WO2004068270A2 (en) 2004-08-12
WO2004068270A3 WO2004068270A3 (en) 2004-11-18

Family

ID=29798478

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2004/000035 WO2004068270A2 (en) 2003-01-27 2004-01-14 Method and system for xpath implementation

Country Status (2)

Country Link
IL (1) IL154152A0 (en)
WO (1) WO2004068270A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7472130B2 (en) 2005-10-05 2008-12-30 Microsoft Corporation Select indexing in merged inverse query evaluations
US7779396B2 (en) 2005-08-10 2010-08-17 Microsoft Corporation Syntactic program language translation

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040060007A1 (en) * 2002-06-19 2004-03-25 Georg Gottlob Efficient processing of XPath queries

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CLARK, JAMES ET AL XML PATH LANGUAGE (XPATH), [Online] 16 November 1999, *

Also Published As

Publication number Publication date
IL154152A0 (en) 2003-07-31
WO2004068270A3 (en) 2004-11-18

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase