US8898144B2 - Query-oriented message characterization - Google Patents

Query-oriented message characterization

Info

Publication number
US8898144B2
Authority
US
United States
Prior art keywords
message
information
queries
values
normalized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US13/922,725
Other versions
US20130282655A1 (en
Inventor
Nicholas A. Allen
Stefan R. Batres
Justin D. Brown
Jayteerth N. Katti
Stephen Maine
Edmund S. V. Pinto
Kenneth D. Wolf
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp
Priority to US13/922,725
Assigned to MICROSOFT CORPORATION (assignment of assignors interest; see document for details). Assignors: BROWN, JUSTIN D., ALLEN, NICHOLAS A., BATRES, STEFAN R., KATTI, JAYTEERTH N., MAINE, STEPHEN, PINTO, EDMUND S.V., WOLF, KENNETH D.
Publication of US20130282655A1
Application granted
Publication of US8898144B2
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC (assignment of assignors interest; see document for details). Assignors: MICROSOFT CORPORATION
Expired - Fee Related
Anticipated expiration

Classifications

    • G06F17/30442
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/107 Computer-aided management of electronic mailing [e-mailing]

Definitions

  • Computers and computing systems have affected nearly every aspect of modern living. Computers are generally involved in work, recreation, healthcare, transportation, entertainment, household management, etc.
  • Computing system functionality can be enhanced by a computing system's ability to be interconnected to other computing systems via network connections.
  • Network connections may include, but are not limited to, connections via wired or wireless Ethernet, cellular connections, or even computer-to-computer connections through serial, parallel, USB, or other connections. The connections allow a computing system to access services at other computing systems and to quickly and efficiently receive application data messages from other computing systems.
  • In a message-processing application it is not uncommon for a stage of processing to require only a portion of the message as input when making a decision.
  • An example of such a decision is message routing, which may be based on a single field or portion of the message as opposed to the entire message payload. Furthermore, the decision may only need a characterization of the message rather than the literal content. In the context of the message routing example, the routing decision may only require an equitable but repeatable distribution of messages. In other words, two messages with the same characteristic have the same routing applied, but messages with different characteristics should be apportioned equally among the routing targets. A characteristic may be used for message routing in a fashion analogous to using a hash code for lookup in a table.
  • There are a variety of information sources that may be drawn upon to compute a characteristic. It is typical for a message to be segmented into several component parts, such as a message envelope, message body, and message headers.
  • the message transmission protocol contributes information, such as HTTP headers, SMTP headers, TCP socket properties, and so on depending on the protocols in use.
  • a variety of other protocols that contribute information may be used in connection with the message as well, such as SSL session keys and WS-ReliableMessaging sequence identifiers. Both the quantities and kinds of information that might be used to form the characteristic are virtually limitless so an attempt to produce a complete enumeration may be a somewhat futile task.
  • the time at which the information needed to compute the characteristic becomes available varies. These variations take place not only from information source to information source but from fact to fact within an information source. As an extreme example, some information may not be known until a message is sent. Due to the highly concurrent nature of a distributed system, it is conceivable that the message might be received by another party, processed, and a response sent back before the calculation of the characteristic for sending the original message completes. The application needs to be able to handle these difficult race conditions.
  • Some embodiments described herein are directed to processing messages using a characteristic. Messages are processed based on a characteristic derived from at least one of: information in messages, metadata about messages, or other information external to messages.
  • One or more values for one or more pieces of information are received. At least one of the values for the one or more pieces of information is associated with a first message.
  • One or more queries are received. The queries specify one or more of the one or more pieces of information. At least a portion of the plurality of values for the one or more pieces of information are processed in conjunction with the one or more queries to create one or more normalized characteristics for the first message.
  • the one or more normalized characteristics for the first message are in a same format irrespective of the format of the pieces of information.
  • the first message, and/or other messages are processed based on at least one of the one or more normalized characteristics.
  • FIG. 1 illustrates a topology for calculating characteristics used to process messages using queries
  • FIG. 2 illustrates components for optimization of queries
  • FIG. 3 illustrates transformation of data when calculating characteristics
  • FIG. 4 illustrates a messaging infrastructure supplying information to a query engine after the information becomes available to facilitate calculating a characteristic
  • FIG. 5 illustrates a method of processing messages based on a characteristic.
  • One embodiment described herein formulates the construction of a message characteristic using a query over message content, message metadata or other data. Access to the storage locations for information may be normalized and integrated into the query. Additionally, in some embodiments, the calculation of a characteristic coordinates with the messaging infrastructure so that the system can reason effectively about when information is available and when computation of the characteristic will be complete.
  • FIG. 1 illustrates a query engine 102.
  • The query engine 102 includes functionality for processing queries 104, where the queries 104 are queries against various sources of data, such as messages 108, which include message data, or non-message data 110 obtainable from other sources.
  • some embodiments may be practiced such that a message characteristic may be specified using a query over message content, metadata or other information.
  • the query engine may include support for various languages 106 .
  • a query may be formulated using XPath expressions as a query language.
  • query languages 106 have native functionality for accessing information in a limited variety of formats and from a limited variety of sources, while not natively including functionality for access to other information.
  • XPath includes native functionality for accessing information in an XML structured data structure, such as a message formatted using XML, but may not include functionality for determining other information from other services. Nonetheless, the query language may be extended by including extensions 112 to include functionality for accessing other services. In the XPath query language, the extensions are referred to as selectors.
  • some embodiments may include functionality for normalization of access to different storage locations using extensions to the query language. In some embodiments, normalization of access to different storage locations may use a mutually-agreed upon data structure. Optimization of computing multiple characteristics for the same message may be performed, as will be described in more detail below, by merging queries and executing them simultaneously or in parallel.
  • information sources may include a message 108 which includes message data.
  • the message may include information such as envelope data, data in the message body, data in the headers of the message, etc.
  • the query engine 102 may include functionality for extracting the message data.
  • the query engine may include functionality supporting the XPath query language for extracting data from XML formatted messages.
  • Other query languages 106 may also or alternatively be used.
  • the query engine 102 may also include functionality for invoking various application programming interfaces (APIs) 114 .
  • APIs 114 include programmed functionality for interacting with information sources to obtain data from the sources.
  • the languages 106 may be considered in some respects to be APIs.
  • FIG. 1 further illustrates non-message data 110 .
  • the non-message data may be data from any one of a number of different sources and may include metadata about message data, or other data not directly present in the message data.
  • Metadata associated with data in the message 108 may include information such as protocol data indicating protocols used to send the message 108 ; environmental data, local properties, time of day, etc.
  • FIG. 1 illustrates that the query engine 102 performs queries 104 against the data sources. Based on the queries 104 , the query engine 102 generates intermediate results 116 , which may be instance values of data.
  • the intermediate results 116 may include tables of data or other forms of data.
  • the intermediate results 116 may include information such as a specific time of day (which may or may not be associated with the message 108 ), a specific protocol used to transmit a message 108 , or other information.
  • the intermediate results are typically not unit-less results, but rather represent some specific unit.
  • the intermediate results 116 may represent a time of day unit, a protocol unit, a transport unit, or some other specific unit. Additionally, the intermediate results may be of one or more different data types.
  • the intermediate results may be integers, floating points, strings, or other data types.
  • a set of intermediate results may have a mix of different data types.
  • a time may be expressed as one or more integers while a protocol may be expressed as one or more strings. Both the time integers and the protocol strings may be included in the same set of intermediate results 116 .
  • the intermediate results 116 can be used to create a characteristic 118 by a characteristic computation module 120 .
  • the characteristic 118 may be, for example, a number calculated using a hash algorithm or other numerical method to calculate a number based on the intermediate results 116 .
  • the characteristic 118 may be a unit-less 128-bit hash number that represents a globally-unique identifier.
  • the characteristic computation module 120 may be embodied using computer hardware and software configured to calculate a hash or other representation, such as for example, a numerical representation.
  • a messaging infrastructure may catalog the information that it can potentially supply for a query 104 .
  • the messaging infrastructure may be able to provide information regarding transports, information regarding protocols, etc.
  • the messaging infrastructure can promise availability of information at a particular time. In some embodiments, the promise is related to some functionality or performance of some action at the messaging infrastructure.
  • Analysis of a query 104 may be performed before characteristic computation at the characteristic computation module 120 to determine what information will be needed. Optimizations of characteristic computation may be performed so as to perform calculations for a characteristic computation at more convenient times, subject to constraints based on information availability.
  • Consider a message 108 for which it is desirable to compute a characteristic.
  • the existence of this message 108 can be presupposed without regard to how the message 108 is or was generated. Thus, this could be a message that is being sent, being received, or possibly even created out of thin air without any connection to a messaging operation.
  • the message may be represented in a variety of formats. As an example, consider a message represented using the Simple Object Access Protocol (SOAP) 1.2 format. Such a message will have storage locations for a message envelope, a message body, and any number of message headers.
  • the message may also have associated with it metadata that is not contained within the message envelope, such as local message properties, delivery properties, or information in the ambient environment.
  • This metadata may be represented by the non-message data illustrated at 110 .
  • sources of data may be referred to as either sources of information from within the message or sources of information from outside the message.
  • To compute a characteristic for the message 108, the available sources of information can be drawn upon.
  • the computation of the characteristic will frequently require only a subset of the available information. This subset is described by a query specification 105 including one or more queries 104 .
  • Each query includes an identifier and a query procedure.
  • the query procedure defines how a value is extracted from the available information.
  • a query procedure is specified using an XPath expression.
  • the message may be a purchase order in the SOAP format, a fragment of which is as follows:
  • the XPath expression “/s:Envelope/s:Body/po:PurchaseOrder/@purchaseOrderNumber” specifies a portion of the message.
  • the XPath expression specifies the value of the attribute named purchaseOrderNumber on the element named PurchaseOrder inside the element named Body inside the element named Envelope.
  • the XPath expression is named “PONumber” to create an association between the identifier PONumber and the fact resulting from evaluating the XPath expression, i.e. the number 123 which represents a purchase order unit 123 .
  • Upon supplying the query engine 102 with a query specification 105 including queries 104 and the necessary information sources, such as the message 108 and/or access to sources that generate the non-message data 110, the query engine 102 computes a table of named query results, illustrated as the intermediate results 116.
  • the calculation of the characteristic 118 is defined in terms of the named query results 116 to abstract the calculation process from how information was accessed or organized.
  • New information sources can be added to the system either by unifying them with existing information sources or by extending the query engine with a new access method.
  • the standard XPath language only provides access to message data.
  • the XPath language could be extended with a new function, as illustrated by the extensions 112 , to access non-message data.
  • the HTTP Referer header is not part of the message data but could be accessed in a similar fashion using the XPath expression “z:GetProtocolData( )/Referer” to specify a portion of the non-message data 110 .
  • the value of the Referer property in the protocol data is not contained within the message.
  • Although the SMTP From header comes from a different information source, it too could be accessed using the GetProtocolData function.
  • the grouping of information to the same or different access methods can be done at the convenience of the developer.
  • The query engine 102 includes a query optimization module 122 which first transforms the original query specification 105 into an optimized query specification 124 before executing the optimized query specification 124 using the language 106 (or the APIs 114, as illustrated in FIG. 1).
  • the optimized query specification 124 when processed, produces the same table of query results 116 .
  • the query optimizer 122 of the query engine 102 joins together queries that have common subexpressions so that a single common subexpression is only evaluated once.
  • the query engine 102 operating on a query specification 105 containing the two queries “/s:Envelope/s:Body/PurchaseOrder1” and “/s:Envelope/s:Body/PurchaseOrder2” may only have to scan through the Envelope and Body elements of the message 108 once to satisfy both queries.
  • the query engine 102 composes with other components 128 and 130 as part of a processing pipeline 126 . These components 128 and 130 act on the inputs and outputs respectively to the engine.
  • One or more transformations may be applied at component 128 to the information sources before they are read by the engine, and one or more transformations may be applied at component 130 to the query results before the characteristic 118 (see FIG. 1) is calculated.
  • Each fact in an information source and each named query result may have an individually crafted transformation applied to them; or, transformations may be applied to groups of facts or query results.
  • Prior to the message 108 being sent by an application 132, a messaging infrastructure 134 is introspected to identify the variety of information that this particular configuration will generate. The messaging infrastructure 134 may also make one or more statements about when each fact will be available. Statements may be promises that the facts will be available at a particular time or stage of processing. Also prior to the message 108 being sent, the query specification 105 (see FIG. 1) may be introspected to determine the variety of information that this particular query specification 105 will request.
  • FIG. 4 illustrates the messaging infrastructure 134 against a timeline axis T.
  • the timeline axis T illustrates increasing time in the down direction.
  • the message 108 is sent from the application 132 to the messaging infrastructure 134 .
  • the list of information identifiers that will be required by the queries 104 in the query specification 105 is associated with the message 108 .
  • embodiments may be implemented to associate the list of information identifiers prior to the message 108 being sent, at the time the message 108 is sent, or in some embodiments after the message 108 is sent.
  • the message 108 is associated with a callback that invokes the query engine 102 and characteristic computation module 120 (see FIG. 1 ).
  • Components 136-1 through 136-N may act on the message 108.
  • As components, referred to here generally as 136 and specifically as 136-X, where X is a number identifying a particular component, act on the message 108, they conceptually add checkmarks to the list of information identifiers that will be required by the queries 104 as each identified fact becomes available.
  • a fact becomes available when it is possible to execute a specific process that results in the value of the fact. This process might simply return a pre-calculated value for the fact, or alternatively might require performing additional calculations.
  • FIG. 4 shows information about the message 108 being made available to the query engine 102 .
  • information provided by component 136-1 is made available to the query engine 102.
  • information provided by component 136-2 is made available to the query engine 102.
  • information provided by component 136-N is made available to the query engine 102.
  • the timeline axis T includes a time T_Transmit representing the message 108 being transmitted out of the messaging infrastructure 134, such as by transmitting the message onto the communication wire.
  • the communication wire may be any one of a number of different media, including network cables or wireless transport media. The completion of the computation may take place earlier or later than the message being transmitted, depending on the promises made by the components 136.
  • the completion of the callback is used to solve races between sending and receiving messages.
  • the application 132 refrains from processing any received messages that might depend on the characteristic 118 of a previously sent message 108 until all of those characteristics have been computed.
  • FIG. 5 illustrates a method 500 of processing messages based on a characteristic derived from at least one of information in messages, metadata about messages, or other information external to messages.
  • the method includes receiving one or more values for one or more pieces of information (act 502 ). At least one of the values for one or more pieces of information may be associated with a first message.
  • the method 500 further includes receiving one or more queries (act 504 ). The queries may specify one or more of the one or more pieces of information.
  • FIG. 1 illustrates a query specification 105 including queries 104 that may be used to obtain values from the information in the message 108 and the non-message data 110 .
  • the method 500 includes processing the one or more values for the one or more pieces of information in conjunction with the one or more queries to create one or more normalized characteristics for the first message (act 506 ).
  • the one or more normalized characteristics for the first message are in a same format irrespective of the format of the pieces of information.
  • the one or more normalized characteristics may always be in the format of a 128-bit unit-less number that represents a globally unique identifier. Other representations may be used alternatively.
  • the method 500 further includes processing messages based on at least one of the one or more normalized characteristics (act 508 ).
  • the messages processed may be messages for which the characteristics were calculated.
  • the messages may be other messages.
  • the method 500 may include storing the one or more normalized characteristics for the first message and using the one or more normalized characteristics for processing one or more different messages.
  • the characteristic(s) may be calculated for a message that is sent while the characteristic(s) are used to process messages sent in reply to the first message.
  • Embodiments of the method 500 may further include creating an optimized query from one or more of the received queries.
  • processing the one or more values for the one or more pieces of information in conjunction with the one or more queries to create one or more normalized characteristics for the first message includes processing the optimized query.
  • Creating an optimized query may include determining a portion of a query that is common to two or more queries and when processing a query, merging common elements of the query to reduce overall processing.
  • creating an optimized query may include performing parallel execution on two or more queries.
  • the method 500 may be performed where processing the one or more values for the one or more pieces of information in conjunction with the one or more queries to create one or more normalized characteristics for the first message (act 506 ) includes computing one or more intermediate results and calculating the one or more normalized characteristics from the intermediate results.
  • the intermediate results may have units associated with them even when the one or more normalized characteristics do not.
  • the intermediate results may include results in time units, network address units, etc.
  • various transformations to data may occur, as illustrated at 128 and 130 of FIG. 3 .
  • the method 500 may be practiced to include transforming one or more of the values for the one or more pieces of information prior to computing the one or more intermediate results. Additionally or alternatively, the method 500 may further include transforming one or more of the values for the one or more pieces of information after computing the one or more intermediate results and prior to calculating the one or more normalized characteristics from the intermediate results.
  • the method 500 may be practiced to include analyzing the one or more queries to determine what information is needed. A promise is received that the needed information will be available at a particular time.
  • the act of processing the one or more values for the one or more pieces of information in conjunction with the one or more queries to create one or more normalized characteristics for the first message is performed after the information is received.
  • the particular time specified in the promise may be a present time, after a message is sent from a messaging infrastructure, or any other appropriate time.
  • Embodiments of the present invention may comprise or utilize a special purpose or general-purpose computer including computer hardware, as discussed in greater detail below.
  • Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures.
  • Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system.
  • Computer-readable media that store computer-executable instructions are physical storage media.
  • Computer-readable media that carry computer-executable instructions are transmission media.
  • embodiments can comprise at least two distinctly different kinds of computer-readable media: physical storage media and transmission media.
  • Physical storage media includes RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
  • a “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices.
  • a network or another communications connection can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
  • program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to physical storage media (or vice versa).
  • program code means in the form of computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer system RAM and/or to less volatile physical storage media at a computer system.
  • physical storage media can be included in computer system components that also (or even primarily) utilize transmission media.
  • Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions.
  • the computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code.
  • the invention may be practiced in network computing environments with many types of computer system configurations, including personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, switches, and the like.
  • the invention may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks.
  • program modules may be located in both local and remote memory storage devices.
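
The flow described above (named queries evaluated against message and non-message data, a table of intermediate results, and a normalized 128-bit characteristic) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions and not the patented implementation. The SOAP fragment, the `po` namespace URI, the `PROTOCOL_DATA` dictionary, and all function names are hypothetical. Python's standard ElementTree supports only a subset of XPath, so the purchase order path is written relative to the Envelope root and the attribute is read separately. BLAKE2b is chosen here only as one convenient 128-bit hash; the text does not name a hash algorithm.

```python
import hashlib
import xml.etree.ElementTree as ET

NS = {
    "s": "http://www.w3.org/2003/05/soap-envelope",  # SOAP 1.2 envelope namespace
    "po": "http://example.com/po",                   # hypothetical namespace
}

# Hypothetical message; the fragment from the original text is not reproduced here.
MESSAGE = """\
<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"
            xmlns:po="http://example.com/po">
  <s:Body>
    <po:PurchaseOrder purchaseOrderNumber="123"/>
  </s:Body>
</s:Envelope>"""

# Hypothetical non-message data (for example, HTTP protocol data).
PROTOCOL_DATA = {"Referer": "http://example.com/cart"}


def po_number(envelope, protocol_data):
    # Stand-in for /s:Envelope/s:Body/po:PurchaseOrder/@purchaseOrderNumber
    element = envelope.find("s:Body/po:PurchaseOrder", NS)
    return element.get("purchaseOrderNumber")


def referer(envelope, protocol_data):
    # Stand-in for the extension query z:GetProtocolData()/Referer
    return protocol_data["Referer"]


# Query specification: each query pairs an identifier with a query procedure.
QUERY_SPEC = {"PONumber": po_number, "Referer": referer}


def run_queries(spec, message_xml, protocol_data):
    """Evaluate every named query and return the table of intermediate results."""
    envelope = ET.fromstring(message_xml)
    return {name: proc(envelope, protocol_data) for name, proc in spec.items()}


def compute_characteristic(results):
    """Hash the named results into a unit-less 128-bit integer."""
    h = hashlib.blake2b(digest_size=16)  # 16 bytes = 128 bits
    for name in sorted(results):         # fixed order keeps the value repeatable
        h.update(f"{name}={results[name]!r};".encode("utf-8"))
    return int.from_bytes(h.digest(), "big")


results = run_queries(QUERY_SPEC, MESSAGE, PROTOCOL_DATA)
characteristic = compute_characteristic(results)
```

Because the characteristic is computed only from the named results, a new information source can be added by registering another entry in `QUERY_SPEC` without changing `compute_characteristic`.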

Abstract

Processing messages. Messages are processed based on a characteristic derived from information in messages, metadata about messages, or other information external to messages. Values for one or more pieces of information are received. At least one of the values for one or more pieces of information is associated with a first message. Queries are received. The queries specify one or more of the pieces of information. At least a portion of the plurality of values for the one or more pieces of information is processed in conjunction with the one or more queries to create one or more normalized characteristics for the first message. The one or more normalized characteristics for the first message are in a same format irrespective of the format of the pieces of information. The first message, and/or other messages, is processed based on at least one of the one or more normalized characteristics.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of and claims priority to and the benefit of U.S. patent application Ser. No. 12/203,790, entitled “QUERY-ORIENTED MESSAGE CHARACTERIZATION,” which was filed on Sep. 3, 2008, and which is expressly incorporated herein by this reference. (U.S. patent application Ser. No. 12/203,790 is to issue as U.S. Pat. No. 8,473,455 on Jun. 25, 2013.)
BACKGROUND
Background and Relevant Art
Computers and computing systems have affected nearly every aspect of modern living. Computers are generally involved in work, recreation, healthcare, transportation, entertainment, household management, etc.
Further, computing system functionality can be enhanced by a computing system's ability to be interconnected to other computing systems via network connections. Network connections may include, but are not limited to, connections via wired or wireless Ethernet, cellular connections, or even computer to computer connections through serial, parallel, USB, or other connections. The connections allow a computing system to access services at other computing systems and to quickly and efficiently receive application data messages from other computing systems.
In a message-processing application it is not uncommon for a stage of processing to require only a portion of the message as input when making a decision. An example of such a decision is message routing, which may be based on a single field or portion of the message as opposed to the entire message payload. Furthermore, the decision may only need a characterization of the message rather than the literal content. In the context of the message routing example, the routing decision may only require an equitable but repeatable distribution of messages. In other words, two messages with the same characteristic have the same routing applied but messages with different characteristics should be apportioned equally among the routing targets. Using a characteristic for message routing may be used in a fashion analogous to using a hash code for lookup in a table.
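By way of illustration only, the hash-code analogy can be sketched as follows. The function names, the target names, and the choice of BLAKE2b as the hash are assumptions made for this sketch and are not taken from the described embodiments.

```python
import hashlib


def characteristic(value: str) -> int:
    """Reduce a field value to a unit-less 128-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=16).digest()
    return int.from_bytes(digest, "big")


def route(value: str, targets: list[str]) -> str:
    """Pick a routing target; equal characteristics always map to the same target."""
    return targets[characteristic(value) % len(targets)]


# Hypothetical routing targets and field value.
targets = ["worker-a", "worker-b", "worker-c"]
first = route("order-123", targets)
second = route("order-123", targets)
```

In this sketch, repeated messages carrying the same field value are routed identically, while distinct values are spread across the targets to the extent the hash distributes them evenly.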
There are a variety of information sources that may be drawn upon to compute a characteristic. It is typical for a message to be segmented into several component parts, such as a message envelope, message body, and message headers. The message transmission protocol contributes information, such as HTTP headers, SMTP headers, TCP socket properties, and so on depending on the protocols in use. Moreover, a variety of other protocols that contribute information may be used in connection with the message as well, such as SSL session keys and WS-ReliableMessaging sequence identifiers. Both the quantities and kinds of information that might be used to form the characteristic are virtually limitless so an attempt to produce a complete enumeration may be a somewhat futile task.
Additionally, the time at which the information needed to compute the characteristic becomes available varies. These variations take place not only from information source to information source but from fact to fact within an information source. As an extreme example, some information may not be known until a message is sent. Due to the highly concurrent nature of a distributed system, it is conceivable that the message might be received by another party, processed, and a response sent back before the calculation of the characteristic for sending the original message completes. The application needs to be able to handle these difficult race conditions.
Previous solutions have had multiple storage locations for the different classes of information that might be used to construct a message characteristic. Due to the variety of access methods for these storage locations and timings for the availability of information, the construction of a characteristic has largely been a manual and ad hoc process.
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.
BRIEF SUMMARY
Some embodiments described herein are directed to processing messages using a characteristic. Messages are processed based on a characteristic derived from at least one of: information in messages, metadata about messages, or other information external to messages. One or more values for one or more pieces of information are received. At least one of the values for the one or more pieces of information is associated with a first message. One or more queries are received. The queries specify one or more of the one or more pieces of information. At least a portion of the plurality of values for the one or more pieces of information are processed in conjunction with the one or more queries to create one or more normalized characteristics for the first message. The one or more normalized characteristics for the first message are in a same format irrespective of the format of the pieces of information. The first message, and/or other messages, are processed based on at least one of the one or more normalized characteristics.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Additional features and advantages will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the teachings herein. Features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. Features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the manner in which the above-recited and other advantages and features can be obtained, a more particular description of the subject matter briefly described above will be rendered by reference to specific embodiments which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments and are not therefore to be considered to be limiting in scope, embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
FIG. 1 illustrates a topology for calculating characteristics used to process messages using queries;
FIG. 2 illustrates components for optimization of queries;
FIG. 3 illustrates transformation of data when calculating characteristics;
FIG. 4 illustrates a messaging infrastructure supplying information to a query engine after the information becomes available to facilitate calculating a characteristic; and
FIG. 5 illustrates a method of processing messages based on a characteristic.
DETAILED DESCRIPTION
One embodiment described herein formulates the construction of a message characteristic using a query over message content, message metadata or other data. Access to the storage locations for information may be normalized and integrated into the query. Additionally, in some embodiments, the calculation of a characteristic coordinates with the messaging infrastructure so that the system can reason effectively about when information is available and when computation of the characteristic will be complete.
Referring now to FIG. 1, an example is illustrated. FIG. 1 illustrates a query engine 102. The query engine 102 includes functionality for processing queries 104, where the queries 104 are queries against various sources of data, such as messages 108 which include message data or non-message data 110 obtainable from other sources. In particular, some embodiments may be practiced such that a message characteristic may be specified using a query over message content, metadata or other information. The query engine may include support for various languages 106. In one specific example, a query may be formulated using XPath expressions as a query language.
Often query languages 106, such as XPath, have native functionality for accessing information in a limited variety of formats and from a limited variety of sources, while not natively including functionality for access to other information. For example, XPath includes native functionality for accessing information in an XML structured data structure, such as a message formatted using XML, but may not include functionality for determining other information from other services. Nonetheless, the query language may be extended by including extensions 112 to include functionality for accessing other services. In the XPath query language, the extensions are referred to as selectors. Additionally, some embodiments may include functionality for normalization of access to different storage locations using extensions to the query language. In some embodiments, normalization of access to different storage locations may use a mutually-agreed upon data structure. Optimization of computing multiple characteristics for the same message may be performed, as will be described in more detail below, by merging queries and executing them simultaneously or in parallel.
As illustrated in FIG. 1, information sources may include a message 108 which includes message data. The message may include information such as envelope data, data in the message body, data in the headers of the message, etc. As noted above, the query engine 102 may include functionality for extracting the message data. For example, in one embodiment, the query engine may include functionality supporting the XPath query language for extracting data from XML formatted messages. Other query languages 106 may also or alternatively be used. Notably, the query engine 102 may also include functionality for invoking various application programming interfaces (APIs) 114. APIs 114 include programmed functionality for interacting with information sources to obtain data from the sources. Notably, the languages 106 may be considered in some respects to be APIs.
FIG. 1 further illustrates non-message data 110. The non-message data may be data from any one of a number of different sources and may include metadata about message data, or other data not directly present in the message data. Metadata associated with data in the message 108 may include information such as protocol data indicating protocols used to send the message 108; environmental data, local properties, time of day, etc.
As noted previously, FIG. 1 illustrates that the query engine 102 performs queries 104 against the data sources. Based on the queries 104, the query engine 102 generates intermediate results 116, which may be instance values of data. The intermediate results 116 may include tables of data or other forms of data. For example, the intermediate results 116 may include information such as a specific time of day (which may or may not be associated with the message 108), a specific protocol used to transmit a message 108, or other information. The intermediate results are typically not unit-less results, but rather represent some specific unit. For example, the intermediate results 116 may represent a time of day unit, a protocol unit, a transport unit, or some other specific unit. Additionally, the intermediate results may be of one or more different data types. For example, the intermediate results may be integers, floating points, strings, or other data types. Additionally, a set of intermediate results may have a mix of different data types. For example, a time may be expressed as one or more integers while a protocol may be expressed as one or more strings. Both the time integers and the protocol strings may be included in the same set of intermediate results 116.
The intermediate results 116 can be used to create a characteristic 118 by a characteristic computation module 120. The characteristic 118 may be, for example, a number calculated using a hash algorithm or other numerical method to calculate a number based on the intermediate results 116. For example, in one embodiment, the characteristic 118 may be a unit-less 128-bit hash number that represents a globally-unique identifier. The characteristic computation module 120 may be embodied using computer hardware and software configured to calculate a hash or other representation, such as for example, a numerical representation.
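A minimal sketch of one possible characteristic computation is given here, assuming the intermediate results are held in a dictionary of named values. The function name, the length-prefixed encoding, and the use of BLAKE2b are assumptions of this sketch rather than details of the characteristic computation module 120.

```python
import hashlib
import uuid


def compute_characteristic(results: dict[str, object]) -> uuid.UUID:
    """Hash named results of mixed types into a single 128-bit value."""
    h = hashlib.blake2b(digest_size=16)
    # Sort by name so the characteristic does not depend on insertion order.
    for name in sorted(results):
        value = results[name]
        # Length-prefix each part so adjacent fields cannot run together.
        for part in (name, type(value).__name__, repr(value)):
            data = part.encode("utf-8")
            h.update(len(data).to_bytes(4, "big"))
            h.update(data)
    return uuid.UUID(bytes=h.digest())


# Integers and strings may appear in the same set of intermediate results.
a = compute_characteristic({"TimeOfDay": 930, "Protocol": "http"})
b = compute_characteristic({"Protocol": "http", "TimeOfDay": 930})
```

Here the typed, unit-bearing intermediate values go in, and a value in a single fixed format comes out regardless of how many results were supplied or what types they had.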
As will be discussed in more detail below, some embodiments may be practiced where coordination occurs between the computation of a message characteristic 118 and the messaging infrastructure. In particular, a messaging infrastructure may catalog the information that it can potentially supply for a query 104. For example, the messaging infrastructure may be able to provide information regarding transports, information regarding protocols, etc. The messaging infrastructure can promise availability of information at a particular time. In some embodiments, the promise is related to some functionality or performance of some action at the messaging infrastructure. Analysis of a query 104 may be performed before characteristic computation at the characteristic computation module 120 to determine what information will be needed. Optimizations of characteristic computation may be performed so as to perform calculations for a characteristic computation at more convenient times subject to constraints based on information availability.
As will be discussed in more detail below, some embodiments may be practiced where transformation of information is performed before and/or after a query.
With reference once again to FIG. 1, a more detailed example with specifics is illustrated. Consider a message 108 for which it is desirable to compute a characteristic. The existence of this message 108 can be presupposed without regard to how the message 108 is or was generated. Thus, this could be a message that is being sent, being received, or possibly even created out of thin air without any connection to a messaging operation. The message may be represented in a variety of formats. As an example, consider a message represented using the Simple Object Access Protocol (SOAP) 1.2 format. Such a message will have storage locations for a message envelope, a message body, and any number of message headers. The message may also have associated with it metadata that is not contained within the message envelope, such as local message properties, delivery properties, or information in the ambient environment. This metadata may be represented by the non-message data illustrated at 110. Thus, sources of data may be referred to as either sources of information from within the message or sources of information from outside the message.
To compute a characteristic for the message 108, the available sources of information can be drawn upon. The computation of the characteristic will frequently require only a subset of the available information. This subset is described by a query specification 105 including one or more queries 104. Each query includes an identifier and a query procedure. The query procedure defines how a value is extracted from the available information.
As an example of a query specification 105, in one embodiment, a query procedure is specified using an XPath expression. For example, the message may be a purchase order in the SOAP format, a fragment of which is as follows:
<s:Envelope>
  <s:Header>
    . . . header data included in the message . . .
  </s:Header>
  <s:Body>
    <po:PurchaseOrder purchaseOrderNumber="123">
    . . . purchase order data defined by the application . . .
    </po:PurchaseOrder>
  </s:Body>
</s:Envelope>
The XPath expression “/s:Envelope/s:Body/po:PurchaseOrder/@purchaseOrderNumber” specifies a portion of the message. In the present example, the XPath expression specifies the value of the attribute named purchaseOrderNumber on the element named PurchaseOrder inside the element named Body inside the element named Envelope. In the present example, the XPath expression is named “PONumber” to create an association between the identifier PONumber and the fact resulting from evaluating the XPath expression, i.e. the number 123 which represents a purchase order unit 123.
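As a hedged illustration, the named query above might be evaluated with Python's standard xml.etree.ElementTree module as sketched here. The fragment above does not declare its namespaces, so the "po" namespace URI is a placeholder assumed for this sketch; the "s" URI is the SOAP 1.2 envelope namespace. Because ElementTree supports only a subset of XPath and cannot return attribute nodes, the sketch selects the element and then reads the attribute.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map; "urn:example:po" is a hypothetical placeholder.
NS = {
    "s": "http://www.w3.org/2003/05/soap-envelope",
    "po": "urn:example:po",
}

MESSAGE = """<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"
            xmlns:po="urn:example:po">
  <s:Header/>
  <s:Body>
    <po:PurchaseOrder purchaseOrderNumber="123"/>
  </s:Body>
</s:Envelope>"""

root = ET.fromstring(MESSAGE)  # root is the s:Envelope element

# Equivalent of /s:Envelope/s:Body/po:PurchaseOrder/@purchaseOrderNumber.
order = root.find("s:Body/po:PurchaseOrder", NS)

# Associate the identifier PONumber with the extracted fact.
results = {"PONumber": order.get("purchaseOrderNumber")}
# results == {"PONumber": "123"}
```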
Upon supplying the query engine 102 with a query specification 105 including queries 104 and necessary information sources, such as the message 108 and/or access to sources that generate the non-message data 110, the query engine 102 computes a table of named query results illustrated in the intermediate results 116.
In the illustrated example, the calculation of the characteristic 118 is defined in terms of the named query results 116 to abstract the calculation process from how information was accessed or organized. New information sources can be added to the system either by unifying them with existing information sources or by extending the query engine with a new access method. For example, the standard XPath language only provides access to message data. The XPath language could be extended with a new function, as illustrated by the extensions 112, to access non-message data.
In one embodiment, the HTTP Referer header is not part of the message data but could be accessed in a similar fashion using the XPath expression “z:GetProtocolData( )/Referer” to specify a portion of the non-message data 110. In this case, the value of the Referer property in the protocol data is not contained within the message. Although the SMTP From header comes from a different information source, it too could be accessed using the GetProtocolData function. Thus, the grouping of information to the same or different access methods can be done at the convenience of the developer.
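The grouping of information sources behind access methods can be sketched as follows. This is a simplified stand-in rather than an XPath extension: the make_engine helper, the (method, key) query form, and the sample values are all hypothetical, and only the GetProtocolData name is taken from the example above.

```python
from typing import Callable, Mapping

# An access method looks up one fact in one kind of information source.
AccessMethod = Callable[[str], object]


def make_engine(access_methods: Mapping[str, AccessMethod]):
    """Return a function that evaluates named queries via registered access methods."""
    def run(queries: Mapping[str, tuple[str, str]]) -> dict[str, object]:
        return {
            name: access_methods[method](key)
            for name, (method, key) in queries.items()
        }
    return run


# Hypothetical sample data standing in for the message and the protocol data.
message_fields = {"purchaseOrderNumber": "123"}
protocol_data = {"Referer": "http://example.com/cart", "From": "buyer@example.com"}

run = make_engine({
    "message": message_fields.__getitem__,
    # HTTP and SMTP facts are grouped behind one access method here.
    "GetProtocolData": protocol_data.__getitem__,
})

results = run({
    "PONumber": ("message", "purchaseOrderNumber"),
    "Origin": ("GetProtocolData", "Referer"),
    "Sender": ("GetProtocolData", "From"),
})
```

The resulting table of named results has the same shape whether a fact came from inside or outside the message, so the later calculation need not know which access method produced it.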
With reference now to FIG. 2, additional details of the query engine 102 will now be discussed, and in particular with respect to optimizing query processing. If multiple queries 104 are performed against the same sources of information, such as the message data and non-message data 108/110, it is often possible to perform the collection of queries more efficiently when taken together than one at a time. To do this in one embodiment, the query engine 102 includes a query optimization module 122 which first transforms the original query specification 105 into an optimized query specification 124 before executing the optimized query specification 124 using the language 106 (or APIs 114, as illustrated in FIG. 1). The optimized query specification 124, when processed, produces the same table of query results 116.
In one embodiment, the query optimizer 122 of the query engine 102 joins together queries that have common subexpressions so that a single common subexpression is only evaluated once. Thus, the query engine 102 operating on a query specification 105 containing the two queries “/s:Envelope/s:Body/PurchaseOrder1” and “/s:Envelope/s:Body/PurchaseOrder2” may only have to scan through the Envelope and Body elements of the message 108 once to satisfy both queries.
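The merging of common subexpressions may be sketched, under simplifying assumptions, as grouping queries by their shared leading path and walking that path once per group. The nested-dictionary message, the evaluate function, and the prefix_walks counter are illustrative devices of this sketch and do not represent the query optimizer 122 itself.

```python
from collections import defaultdict


def evaluate(message: dict, queries: dict[str, str]):
    """Evaluate path queries, walking each shared prefix only once."""
    # Group queries by everything before their final step.
    groups = defaultdict(list)
    for name, path in queries.items():
        prefix, _, leaf = path.strip("/").rpartition("/")
        groups[prefix].append((name, leaf))

    results, prefix_walks = {}, 0
    for prefix, leaves in groups.items():
        node = message
        for step in filter(None, prefix.split("/")):
            node = node[step]
        prefix_walks += 1  # one scan of the common subexpression
        for name, leaf in leaves:
            results[name] = node[leaf]
    return results, prefix_walks


# A nested dictionary stands in for the parsed message 108.
message = {"s:Envelope": {"s:Body": {"PurchaseOrder1": "A", "PurchaseOrder2": "B"}}}
queries = {
    "First": "/s:Envelope/s:Body/PurchaseOrder1",
    "Second": "/s:Envelope/s:Body/PurchaseOrder2",
}
results, prefix_walks = evaluate(message, queries)
# results == {"First": "A", "Second": "B"}; prefix_walks == 1
```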
Referring now to FIG. 3, additional features of the query engine 102 with respect to transforming data prior to and after processing are illustrated. In the illustrated embodiment, the query engine 102 composes with other components 128 and 130 as part of a processing pipeline 126. These components 128 and 130 act on the inputs and outputs respectively to the engine. One or more transformations may be applied at component 128 to the information sources before being read by the engine and one or more transformations may be applied at component 130 to the query results before the characteristic 118 (see FIG. 1) is calculated. Each fact in an information source and each named query result may have an individually crafted transformation applied to them; or, transformations may be applied to groups of facts or query results.
Applications often have a preferred time for computing a characteristic 118. It is typical for the application to want to compute the characteristic 118 as late as possible or as early as possible depending on the type of decision being made. However, an application may be unable to compute the characteristic 118 until all of the necessary information is available. An example of this conflict occurs when sending a message. It may be desirable to compute the characteristic as early as possible so that the characteristic is known before any responses to sending the message are observed. However, the information necessary to compute the characteristic may not be available until the message is partially or fully sent. An example of information that is not available until very late is a message identifier that is assigned by the delivery system when the message is written onto the wire.
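One way to picture the pipeline is sketched here, assuming per-fact and per-result transformations are supplied as dictionaries of functions. The run_pipeline helper and the sample transformations (trimming and lower-casing a header, bucketing a time into hours) are hypothetical.

```python
def identity(value):
    """Default transformation: leave the value unchanged."""
    return value


def run_pipeline(sources, pre, query, post):
    """Apply input transformations, run the query, then apply output transformations."""
    prepared = {name: pre.get(name, identity)(value) for name, value in sources.items()}
    results = query(prepared)
    return {name: post.get(name, identity)(value) for name, value in results.items()}


# Hypothetical facts as they arrive from the information sources.
sources = {"Referer": " HTTP://Example.COM/Cart ", "SecondsSinceMidnight": 3725}

# Transformation applied before the engine reads the sources.
pre = {"Referer": lambda v: v.strip().lower()}


def query(facts):
    """Stand-in for the query engine: produces named query results."""
    return {"Origin": facts["Referer"], "Hour": facts["SecondsSinceMidnight"]}


# Transformation applied to a named result before the characteristic is calculated.
post = {"Hour": lambda seconds: seconds // 3600}

results = run_pipeline(sources, pre, query, post)
# results == {"Origin": "http://example.com/cart", "Hour": 1}
```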
Referring now to FIG. 4, an example of one embodiment that addresses these concerns is illustrated. To reason about a conflict, what information will be used by the characteristic computation and when that information will be available should be known. Prior to the message 108 being sent by an application 132, a messaging infrastructure 134 is introspected to identify the variety of information that this particular configuration will generate. The messaging infrastructure 134 may also make one or more statements about when each fact will be available. Statements may be promises that the facts will be available at a particular time or stage of processing. Also prior to the message 108 being sent, the query specification 105 (see FIG. 1) may be introspected to determine the variety of information that this particular query specification 105 will request.
FIG. 4 illustrates the messaging infrastructure 134 against a timeline axis T. The timeline axis T illustrates increasing time in the down direction. At TStart, the message 108 is sent from the application 132 to the messaging infrastructure 134. In some embodiments, about the time the message 108 is sent, the list of information identifiers that will be required by the queries 104 in the query specification 105 is associated with the message 108. Notably, embodiments may be implemented to associate the list of information identifiers prior to the message 108 being sent, at the time the message 108 is sent, or in some embodiments after the message 108 is sent. Additionally, the message 108 is associated with a callback that invokes the query engine 102 and characteristic computation module 120 (see FIG. 1). Components 136-1-136-N may act on the message 108. As components, referred to here generally as 136 and specifically by 136-X where X is a number identifying a particular component, act on the message 108, they conceptually add checkmarks to the list of information identifiers that will be required by the queries 104 as each identified fact becomes available. In one embodiment, a fact becomes available when it is possible to execute a specific process that results in the value of the fact. This process might simply return a pre-calculated value for the fact, or alternatively might require performing additional calculations. Thus, although a fact may be made available to query engine 102 at a particular time, the value of the fact may not be known in a definite sense until query engine 102 requests the value of the fact at a later time, if query engine 102 ever chooses to do so. Once all of the identified information is available the callback can be invoked to complete the characteristic computation. In the example illustrated, FIG. 4 shows information about the message 108 being made available to the query engine 102. 
At time T1, information provided by component 136-1 is made available to the query engine 102. At time T2, information provided by component 136-2 is made available to the query engine 102. At time TN, information provided by component 136-N (which signifies that any number of components 136 may be implemented in the messaging infrastructure 134) is made available to the query engine 102.
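The checklist-and-callback arrangement may be sketched as follows. The PendingCharacteristic class is a hypothetical, single-threaded illustration; each fact is supplied as a zero-argument callable so that its value is only produced when requested, mirroring the distinction drawn above between a fact being available and its value being known.

```python
class PendingCharacteristic:
    """Tracks which required facts are available and fires a callback once all are."""

    def __init__(self, required, on_ready):
        self._missing = set(required)   # information identifiers not yet checked off
        self._providers = {}            # identifier -> callable producing the value
        self._on_ready = on_ready
        self._done = False

    def make_available(self, name, provider):
        """Called by an infrastructure component when a fact becomes available."""
        if name in self._missing:
            self._providers[name] = provider
            self._missing.discard(name)
        if not self._missing and not self._done:
            self._done = True
            # Values are requested only now, when the computation actually runs.
            self._on_ready({n: p() for n, p in self._providers.items()})


computed = []
pending = PendingCharacteristic({"PONumber", "MessageId"}, computed.append)

pending.make_available("PONumber", lambda: "123")       # e.g. at time T1
ready_after_first = list(computed)                      # still empty
pending.make_available("MessageId", lambda: "msg-1")    # e.g. at time TN
```

After the second call, the callback has received both facts and the characteristic computation can proceed; whether that happens before or after transmission depends on when the last component reports.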
The timeline axis T includes a time TTransmit representing the message 108 being transmitted out of the messaging infrastructure 134, such as by transmitting the message onto the communication wire. The communication wire may be any one of a different number of media including network cables or wireless transport media. The completion of the computation may take place earlier or later than the message being transmitted depending on the promises made by the components 136.
In one embodiment, the completion of the callback is used to solve races between sending and receiving messages. The application 132 refrains from processing any received messages that might depend on the characteristic 118 of a previously sent message 108 until all of those characteristics have been computed.
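A simplified, single-threaded sketch of this race handling is given below. The ReceiveGate class and its method names are assumptions of the sketch; a real application would also need to decide which received messages actually depend on a pending characteristic, whereas this sketch conservatively defers all of them.

```python
class ReceiveGate:
    """Defers received messages while any sent message lacks its characteristic."""

    def __init__(self, handler):
        self._handler = handler
        self._pending = set()    # sent messages whose characteristic is not yet computed
        self._deferred = []      # received messages waiting on those characteristics

    def sent(self, message_id):
        self._pending.add(message_id)

    def characteristic_ready(self, message_id):
        """Invoked when the callback for a sent message completes."""
        self._pending.discard(message_id)
        if not self._pending:
            deferred, self._deferred = self._deferred, []
            for message in deferred:
                self._handler(message)

    def received(self, message):
        if self._pending:
            self._deferred.append(message)
        else:
            self._handler(message)


handled = []
gate = ReceiveGate(handled.append)
gate.sent("m1")
gate.received("reply-to-m1")        # arrives before the characteristic is computed
handled_before = list(handled)      # nothing processed yet
gate.characteristic_ready("m1")     # deferred reply is now processed
```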
The following discussion now refers to a number of methods and method acts that may be performed. It should be noted that although the method acts may be discussed in a certain order or illustrated in a flow chart as occurring in a particular order, no particular ordering is necessarily required unless specifically stated, or required because an act is dependent on another act being completed prior to the act being performed.
FIG. 5 illustrates a method 500 of processing messages based on a characteristic derived from at least one of information in messages, metadata about messages, or other information external to messages. The method includes receiving one or more values for one or more pieces of information (act 502). At least one of the values for one or more pieces of information may be associated with a first message. The method 500 further includes receiving one or more queries (act 504). The queries may specify one or more of the one or more pieces of information. For example, FIG. 1 illustrates a query specification 105 including queries 104 that may be used to obtain values from the information in the message 108 and the non-message data 110.
The method 500 includes processing the one or more values for the one or more pieces of information in conjunction with the one or more queries to create one or more normalized characteristics for the first message (act 506). The one or more normalized characteristics for the first message are in a same format irrespective of the format of the pieces of information. For example, the one or more normalized characteristics may always be in the format of a 128-bit unit-less number that represents a globally unique identifier. Other representations may be used alternatively.
The method 500 further includes processing messages based on at least one of the one or more normalized characteristics (act 508). Notably, the messages processed may be messages for which the characteristics were calculated. In alternative embodiments, the messages may be other messages. For example, the method 500 may include storing the one or more normalized characteristics for the first message and using the one or more normalized characteristics for processing one or more different messages. For example, the characteristic(s) may be calculated for a message that is sent while the characteristic(s) are used to process messages sent in reply to the first message.
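As one hedged illustration of using a stored characteristic to process a different message, the sketch below records the characteristic of a sent message and looks it up when a reply arrives. The message fields ("id", "relates_to", "conversation"), the helper names, and the hash choice are all assumptions made for the sketch.

```python
import hashlib


def characteristic(value: str) -> int:
    """Unit-less 128-bit characteristic of a value (hash choice is an assumption)."""
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=16).digest()
    return int.from_bytes(digest, "big")


# Stored characteristics of sent messages, mapped to application state.
conversations = {}


def on_send(message: dict) -> None:
    """Store the characteristic computed for the first (sent) message."""
    conversations[characteristic(message["id"])] = message["conversation"]


def on_reply(reply: dict):
    """Process a different message using the stored characteristic."""
    return conversations.get(characteristic(reply["relates_to"]))


on_send({"id": "msg-1", "conversation": "order-123"})
matched = on_reply({"id": "msg-2", "relates_to": "msg-1"})
unmatched = on_reply({"id": "msg-3", "relates_to": "msg-9"})
```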
Embodiments of the method 500 may further include creating an optimized query from one or more of the received queries. In some of these embodiments, processing the one or more values for the one or more pieces of information in conjunction with the one or more queries to create one or more normalized characteristics for the first message (act 506) includes processing the optimized query. Creating an optimized query may include determining a portion of a query that is common to two or more queries and when processing a query, merging common elements of the query to reduce overall processing. In an alternative embodiment, creating an optimized query may include performing parallel execution on two or more queries.
As noted and illustrated at 116 of FIG. 1, the method 500 may be performed where processing the one or more values for the one or more pieces of information in conjunction with the one or more queries to create one or more normalized characteristics for the first message (act 506) includes computing one or more intermediate results and calculating the one or more normalized characteristics from the intermediate results. The intermediate results may have units associated with them even when the one or more normalized characteristics do not. For example, the intermediate results may include results in time units, network address units, etc. In some embodiments, various transformations to data may occur, as illustrated at 128 and 130 of FIG. 3. For example, the method 500 may be practiced to include transforming one or more of the values for the one or more pieces of information prior to computing the one or more intermediate results. Additionally or alternatively, the method 500 may further include transforming one or more of the values for the one or more pieces of information after computing the one or more intermediate results and prior to calculating the one or more normalized characteristics from the intermediate results.
As noted, some embodiments may be directed to environments where information is not immediately available to calculate a characteristic. Thus, some embodiments include functionality for handling these situations. In particular, the method 500 may be practiced to include analyzing the one or more queries to determine what information is needed. A promise is received for the information at a particular time for the information needed. In this example, the act of processing the one or more values for the one or more pieces of information in conjunction with the one or more queries to create one or more normalized characteristics for the first message (act 506) is performed after the information is received. The particular time specified in the promise may be a present time, after a message is sent from a messaging infrastructure, or any other appropriate time.
Embodiments of the present invention may comprise or utilize a special purpose or general-purpose computer including computer hardware, as discussed in greater detail below. Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are physical storage media. Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, embodiments can comprise at least two distinctly different kinds of computer-readable media: physical storage media and transmission media.
Physical storage media includes RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
Further, upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to physical storage media (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer system RAM and/or to less volatile physical storage media at a computer system. Thus, it should be understood that physical storage media can be included in computer system components that also (or even primarily) utilize transmission media.
Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
Those skilled in the art will appreciate that the invention may be practiced in network computing environments with many types of computer system configurations, including personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, switches, and the like. The invention may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (19)

What is claimed is:
1. A system for deriving one or more unitless characteristics of a message to facilitate processing the message based on the derived one or more characteristics, the system operative in a computing environment which processes messages, and wherein the message processing is based on one or more characteristics of each message derived from different classes of information formatted differently from one another, one class of information being information contained within the message itself, and another class of information being other information external to the message, the system comprising one or more computer processors and computer memory comprising computer executable instructions which, when executed upon the one or more processors, performs a method of deriving one or more unitless characteristics of a message to facilitate processing the message based on the derived one or more characteristics comprising:
receiving a first value, the first value associated with a first message, comprising information contained within the message, and representing a specific first unit;
receiving a second value associated with the first message, comprising information from a source external to the message, and representing a specific second unit which is different from the specific first unit;
receiving one or more queries, the queries specifying at least the first and second received values;
at a characteristic computation module configured to compute one or more characteristics for a message from intermediate results, using the one or more queries to process the first and second values to create one or more normalized characteristics for the first message, the one or more normalized characteristics for the first message being in a unitless format which is the same format irrespective of the format of the first and second values and irrespective of the specific units of the first and second values, at least one of the created one or more normalized characteristics being a unitless number representing a globally unique identifier;
processing the first message based on at least one of the one or more of the derived normalized characteristics;
storing within the computing environment the one or more normalized characteristics derived for the first message; and
processing within the computing environment a second message based on at least one of the one or more of the stored normalized characteristics derived for the first message.
2. The system of claim 1, further comprising creating an optimized query from one or more of the received queries, and wherein using the one or more queries to process the first and second values comprises using the optimized query to perform said processing of the first and second values.
3. The system of claim 2, wherein creating an optimized query comprises:
determining a portion of a query that is common to two or more queries; and
when processing a query, merging common elements of the query to reduce overall processing.
4. The system of claim 2, wherein creating an optimized query comprises performing parallel execution on two or more queries.
5. The system of claim 1, wherein using the one or more queries to process the first and second values comprises:
evaluating the one or more queries to obtain one or more intermediate results; and
calculating the one or more normalized characteristics from the intermediate results.
6. The system of claim 5, further comprising transforming one or more of the received values prior to computing the one or more intermediate results.
7. The system of claim 5, further comprising transforming one or more of the received values after computing the one or more intermediate results and prior to calculating the one or more normalized characteristics from the intermediate results.
8. The system of claim 1, wherein the one or more queries comprise one or more XPath queries.
9. The system of claim 1, wherein the one or more characteristics are one or more unique identifiers.
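As an informal illustration of the flow recited in claims 1, 5, and 8, the following Python sketch evaluates queries against a message, combines the results with a value from outside the message, and collapses them into a single unitless integer. It is a minimal sketch under stated assumptions, not the patented implementation: the message layout, the `endpoint` external value, the function names, and the use of `uuid.uuid5` to obtain a 128-bit identifier are all hypothetical choices made for illustration, and the queries use only the XPath subset that Python's `xml.etree.ElementTree` supports.

```python
import uuid
import xml.etree.ElementTree as ET

# Hypothetical messages; the element names are illustrative only.
FIRST = "<order><customer>C-17</customer><total currency='USD'>42.50</total></order>"
SECOND = "<order><customer>C-17</customer><total currency='USD'>9.99</total></order>"

# Queries over message content, written in ElementTree's XPath subset.
QUERIES = ["./customer"]


def evaluate_queries(message_xml, queries, external):
    """Evaluate each query to obtain intermediate results, then append the
    values that come from a source external to the message."""
    root = ET.fromstring(message_xml)
    results = [root.findtext(q, default="") for q in queries]
    results.extend(str(external[key]) for key in sorted(external))
    return results


def normalized_characteristic(intermediate):
    """Collapse the intermediate results into one unitless integer.

    Assumption: a name-based UUID (uuid5) stands in for the globally unique
    identifier; its 128-bit integer form carries no unit or source format.
    """
    joined = "\x1f".join(intermediate)  # unit separator keeps fields distinct
    return uuid.uuid5(uuid.NAMESPACE_URL, joined).int


def characterize(message_xml, external):
    return normalized_characteristic(evaluate_queries(message_xml, QUERIES, external))


# Store the characteristic derived for the first message ...
store = {}
first_id = characterize(FIRST, {"endpoint": "queue-a"})
store[first_id] = "instance-1"

# ... and process a second message based on the stored characteristic.
second_id = characterize(SECOND, {"endpoint": "queue-a"})
route = store.get(second_id)
```

Because the second message shares the queried customer value and the external endpoint value with the first, both yield the same integer, so the lookup routes the second message to the handler recorded for the first.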
10. A system for deriving one or more unitless characteristics of a message to facilitate processing the message based on the derived one or more characteristics, the system operative in a computing environment which processes messages, and wherein the message processing is based on one or more characteristics of each message derived from different classes of information formatted differently from one another, one class of information being information contained within the message itself, and another class of information being other information external to the message, the system comprising one or more computer processors and computer memory comprising computer executable instructions which, when executed upon the one or more processors, performs a method of deriving one or more unitless characteristics of a message to facilitate processing the message based on the derived one or more characteristics, the method comprising:
identifying at a processor at least two values for at least two types of information associated with a first message, at least one value comprising information contained within the message, and representing a specific first unit, and at least one other value comprising information from a source external to the message, and representing a specific second unit which is different from the specific first unit;
receiving at the processor one or more queries which specify one or more of the identified values to be extracted for the at least two types of information;
at a characteristic computation module configured to compute one or more characteristics for a message from intermediate results, the processor using the one or more queries to extract and then process the at least two values to derive from the processed values one or more unitless normalized characteristics for the first message, the one or more normalized characteristics for the first message being in a unitless format which is the same format irrespective of the format of the two types of information and irrespective of the specific units of the values from which the normalized characteristics for the first message were derived, at least one of the created one or more normalized characteristics being a unitless number representing a globally unique identifier;
storing the one or more normalized characteristics derived for the first message; and
the processor then processing a second message based on at least one of the one or more of the stored normalized characteristics derived for the first message.
11. The system of claim 10, further comprising:
analyzing the one or more queries to determine information that is needed for a particular normalized characteristic;
receiving a promise for the needed information that the needed information will be available at a particular time; and
wherein using the one or more queries to extract and then process the at least two values is performed after the needed information is received.
12. The system of claim 11, wherein the promise is that the needed information will be immediately available.
13. The system of claim 11, wherein the promise is that the needed information will be available once the first message is sent from a messaging infrastructure.
14. The system of claim 11, wherein the promise is that the needed information will be available after a particular time or date.
15. The system of claim 10, wherein the one or more normalized characteristics are one or more unique identifiers.
16. The system of claim 10, wherein the one or more queries comprise one or more XPath queries.
17. The system of claim 10, wherein using the one or more queries to extract and then process the at least two values to create one or more normalized characteristics for the first message comprises:
evaluating the one or more queries to obtain one or more intermediate results; and
calculating the one or more normalized characteristics from the intermediate results.
18. The system of claim 17, wherein the one or more intermediate results comprise results extracted from data associated with information contained within or with the first message itself.
19. The system of claim 17, wherein the one or more intermediate results comprise results extracted from data based on the information external to the first message.
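The promise mechanism of claims 11 through 14 can be pictured with a short Python sketch. It is only one possible reading: the `Promise` class, the `Availability` values, the `(source, name)` query representation, and the function names are hypothetical and do not come from the patent. The sketch analyzes the queries to find the information they need and defers computation until every promise for that information is fulfilled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Availability(Enum):
    IMMEDIATE = "immediate"    # claim 12: available immediately
    AFTER_SEND = "after_send"  # claim 13: available once the message is sent
    AFTER_TIME = "after_time"  # claim 14: available after a particular time


@dataclass
class Promise:
    name: str
    availability: Availability
    not_before: Optional[float] = None  # only meaningful for AFTER_TIME

    def is_ready(self, now: float, message_sent: bool) -> bool:
        if self.availability is Availability.IMMEDIATE:
            return True
        if self.availability is Availability.AFTER_SEND:
            return message_sent
        return self.not_before is not None and now >= self.not_before


def needed_information(queries):
    """Analyze the queries to determine which pieces of information they need.

    Assumption: each query is a (source, name) pair.
    """
    return sorted({name for _source, name in queries})


def ready_to_compute(queries, promises, now, message_sent):
    """True once every piece of needed information has been promised and is available."""
    return all(
        promises[name].is_ready(now, message_sent)
        for name in needed_information(queries)
    )


queries = [("message", "customer"), ("external", "delivery_receipt")]
promises = {
    "customer": Promise("customer", Availability.IMMEDIATE),
    "delivery_receipt": Promise("delivery_receipt", Availability.AFTER_SEND),
}

before_send = ready_to_compute(queries, promises, now=0.0, message_sent=False)
after_send = ready_to_compute(queries, promises, now=0.0, message_sent=True)
```

In this sketch the characteristic that depends on `delivery_receipt` cannot be computed until the message has left the messaging infrastructure, so `before_send` is false and `after_send` is true.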
US13/922,725 2008-09-03 2013-06-20 Query-oriented message characterization Expired - Fee Related US8898144B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/922,725 US8898144B2 (en) 2008-09-03 2013-06-20 Query-oriented message characterization

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/203,790 US8473455B2 (en) 2008-09-03 2008-09-03 Query-oriented message characterization
US13/922,725 US8898144B2 (en) 2008-09-03 2013-06-20 Query-oriented message characterization

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US12/203,790 Continuation US8473455B2 (en) 2008-09-03 2008-09-03 Query-oriented message characterization

Publications (2)

Publication Number Publication Date
US20130282655A1 US20130282655A1 (en) 2013-10-24
US8898144B2 true US8898144B2 (en) 2014-11-25

Family

ID=41726820

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/203,790 Active 2029-12-04 US8473455B2 (en) 2008-09-03 2008-09-03 Query-oriented message characterization
US13/922,725 Expired - Fee Related US8898144B2 (en) 2008-09-03 2013-06-20 Query-oriented message characterization

Country Status (1)

Country Link
US (2) US8473455B2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8099498B2 (en) * 2008-09-03 2012-01-17 Microsoft Corporation Probabilistic mesh routing
US8473455B2 (en) 2008-09-03 2013-06-25 Microsoft Corporation Query-oriented message characterization

Patent Citations (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5892923A (en) 1994-12-28 1999-04-06 Hitachi, Ltd. Parallel computer system using properties of messages to route them through an interconnect network and to select virtual channel circuits therewithin
US6243360B1 (en) 1996-09-18 2001-06-05 International Business Machines Corporation Network server having dynamic load balancing of messages in both inbound and outbound directions
US20080291030A1 (en) * 1998-03-09 2008-11-27 Aginfolink Holdings, Inc. A Bvi Corporation Enhanced Claim Validation
US20080178197A1 (en) * 1998-03-09 2008-07-24 Aginfolink Holdings, Inc. A Bvi Corporation Enhanced product tracking
US6732273B1 (en) 1998-10-21 2004-05-04 Lucent Technologies Inc. Priority and security coding system for electronic mail messages
US6567380B1 (en) 1999-06-30 2003-05-20 Cisco Technology, Inc. Technique for selective routing updates
US7039050B1 (en) 2000-02-16 2006-05-02 Tibco Software Inc. Intra-process message switch
US6826148B1 (en) 2000-07-25 2004-11-30 Sun Microsystems, Inc. System and method for implementing a routing scheme in a computer network using intention packets when fault conditions are detected
US20020184344A1 (en) 2001-03-14 2002-12-05 Ferhan Elvanoglu Executing dynamically assigned functions while providing services
US20050198169A1 (en) 2002-06-06 2005-09-08 Arc-E-Mail Ltd. Storage process and system for electronic messages
US20040018839A1 (en) 2002-06-06 2004-01-29 Oleg Andric Protocol and structure for mobile nodes in a self-organizing communication network
US7234109B2 (en) 2002-11-19 2007-06-19 Microsoft Corp. Equality of extensible markup language structures
US20040098667A1 (en) * 2002-11-19 2004-05-20 Microsoft Corporation Equality of extensible markup language structures
US7774831B2 (en) * 2002-12-24 2010-08-10 International Business Machines Corporation Methods and apparatus for processing markup language messages in a network
US20060265689A1 (en) 2002-12-24 2006-11-23 Eugene Kuznetsov Methods and apparatus for processing markup language messages in a network
US7809698B1 (en) 2002-12-24 2010-10-05 International Business Machines Corporation System and method remapping identifiers to secure files
US20040148334A1 (en) 2003-01-28 2004-07-29 Sbc Properties, L.P. Coordination platform and method for dynamic aggregation of services
US20050108340A1 (en) 2003-05-15 2005-05-19 Matt Gleeson Method and apparatus for filtering email spam based on similarity measures
US20050044197A1 (en) 2003-08-18 2005-02-24 Sun Microsystems.Inc. Structured methodology and design patterns for web services
US20060029198A1 (en) 2004-06-09 2006-02-09 Honeywell International Inc. Communications system based on real-time neurophysiological characterization
US20060041647A1 (en) 2004-08-17 2006-02-23 Michael Perham System and method for profiling messages
US20060059165A1 (en) 2004-09-13 2006-03-16 Solace Systems, Inc. Highly scalable subscription matching for a content routing network
US20060085505A1 (en) 2004-10-14 2006-04-20 Microsoft Corporation Validating inbound messages
US20060089931A1 (en) * 2004-10-22 2006-04-27 Microsoft Corporation Presenting message attachments independent of electronic messages at a user-interface
US20060123467A1 (en) 2004-12-06 2006-06-08 Sandeep Kumar Performing message payload processing functions in a network element on behalf of an application
US20060123479A1 (en) 2004-12-07 2006-06-08 Sandeep Kumar Network and application attack protection based on application layer message inspection
US20060129650A1 (en) 2004-12-10 2006-06-15 Ricky Ho Guaranteed delivery of application layer messages by a network element
US20060149767A1 (en) 2004-12-30 2006-07-06 Uwe Kindsvogel Searching for data objects
US20060173899A1 (en) 2005-02-02 2006-08-03 Microsoft Corporation Efficient transformation of interchange format messages
US20060215660A1 (en) 2005-03-11 2006-09-28 Alcatel Device and a method for generating routing messages for a GMPLS control plane communications network
US20080168150A1 (en) 2005-04-14 2008-07-10 Ying Chen Method and apparatus for metadata driven web service mediation
US20060242292A1 (en) 2005-04-20 2006-10-26 Carter Frederick H System, apparatus and method for characterizing messages to discover dependencies of services in service-oriented architectures
US20070011256A1 (en) 2005-05-31 2007-01-11 Yonatan Klein Presentation layer adaptation in multimedia messaging
US20070064702A1 (en) 2005-09-20 2007-03-22 Anthony Bates Modifying operation of peer-to-peer networks based on integrating network routing information
US20070064703A1 (en) 2005-09-22 2007-03-22 Hernandez Albert A Autonomous routing of network messages
US20070168546A1 (en) 2006-01-18 2007-07-19 Microsoft Corporation Efficient Dispatch of Messages Based on Message Headers
US7899871B1 (en) 2006-01-23 2011-03-01 Clearwell Systems, Inc. Methods and systems for e-mail topic classification
US20070276956A1 (en) 2006-05-23 2007-11-29 International Business Machines Corporation Method and system for content similarity-based message routing and subscription matching
US20100030798A1 (en) 2007-01-23 2010-02-04 Clearwell Systems, Inc. Systems and Methods for Tagging Emails by Discussions
US7983701B2 (en) * 2007-11-01 2011-07-19 International Business Machines Corporation Alerts based on significance of free format text messages
US20090234924A1 (en) 2008-03-14 2009-09-17 Microsoft Corporation Automatic Association of Messages with Conversations
US20100037161A1 (en) * 2008-08-11 2010-02-11 Innography, Inc. System and method of applying globally unique identifiers to relate distributed data sources
US20100057933A1 (en) 2008-09-03 2010-03-04 Microsoft Corporation Probabilistic mesh routing
US20100057707A1 (en) 2008-09-03 2010-03-04 Microsoft Corporation Query-oriented message characterization

Non-Patent Citations (15)

* Cited by examiner, † Cited by third party
Title
"A Routing Scheme for Content-Based Networking" INFOCOM 2004. Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies Issue Date: Mar. 7-11, 2004 on pp. 918-928 vol. 2.
"Apache Synapse Enterprise Service Bus (ESB)," Author Unknown, Date Unknown, 1 pg. [online] [retrieved on Jul. 28, 2008]. Retrieved from the Internet: http://synapse.apache.org/.
"Cisco AON: A Network-Based Intelligent Message Routing System," Author Unknown, Cisco Application Networking Services Modules, Copyright 1992-2008 Cisco Systems, Inc., 8 pgs. [online] [retrieved on Jul. 25, 2008]. Retrieved from the Internet: http://www.cisco.com/en/US/prod/collateral/modules/ps6438/product-data-sheet0900aecd802c1f9c.html.
"Constraint-Based Personalization Model: Multi-Channel Messaging," by Kal Toth and Sai Rajkiran Nagboth, Department of Computer Science, Oregon State University, Date Unknown, 5 pages. [online] [retrieved on Jul. 28, 2008]. Retrieved from the Internet: http://www.research.att.com/~rjana/TothNagboth.pdf.
"Constraint-Based Personalization Model: Multi-Channel Messaging," by Kal Toth and Sai Rajkiran Nagboth, Department of Computer Science, Oregon State University, Date Unknown, 5 pages. [online] [retrieved on Jul. 28, 2008]. Retrieved from the Internet: http://www.research.att.com/˜rjana/TothNagboth.pdf.
"Demaq: A Foundation for Declarative XML Message Processing," by Alexander Bohm, Carl-Christian Kanne and Guido Moerkotte, Department of Mathematics and Computer Science, University of Mannheim, Germany, Date Unknown, 11 pgs. [online] [retrieved on Jul. 28, 2008]. Retrieved from the Internet: http://www.cidrdb.org/cidr2007/papers/cidr07p04.pdf.
"Sonic ESB B5.5 Developers Guide," Author Unknown, Sonic Software Corporation, Copyright 2004, 422 pages, [online] [retrieved on Jul. 25, 2008]. Retrieved from the Internet: http://www.sonicsoftware.com/developer/documentation/docs/esb-dev.pdf.
Andrew T. Campbell, et al., "Design, Implementation and Evaluation of Cellular IP," IEEE Personal Communications, vol. 7, Issue: 4, pp. 42-49, Publication Year: 2000. See pp. 44-45.
U.S. Appl. No. 12/203,790, filed Dec. 6, 2011, Office Action.
U.S. Appl. No. 12/203,790, filed Feb. 27, 2013, Notice of Allowance.
U.S. Appl. No. 12/203,790, filed Jul. 17, 2012, Office Action.
U.S. Appl. No. 12/203,790, filed Jul. 6, 2011, Office Action.
U.S. Appl. No. 12/203,790, filed Mar. 29, 2011, Office Action.
U.S. Appl. No. 12/203,826, filed Apr. 27, 2011, Office Action.
U.S. Appl. No. 12/203,826, filed Sep. 20, 2011, Notice of Allowance.

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10394658B1 (en) * 2015-12-14 2019-08-27 EMC IP Holding Company LLC High speed snapshots mechanism
US10977289B2 (en) * 2019-02-11 2021-04-13 Verizon Media Inc. Automatic electronic message content extraction method and apparatus
US11663259B2 (en) 2019-02-11 2023-05-30 Yahoo Assets Llc Automatic electronic message content extraction method and apparatus
US20230267138A1 (en) * 2019-02-11 2023-08-24 Yahoo Assets Llc Automatic electronic message content extraction method and apparatus

Also Published As

Publication number Publication date
US20130282655A1 (en) 2013-10-24
US8473455B2 (en) 2013-06-25
US20100057707A1 (en) 2010-03-04

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ALLEN, NICHOLAS A.;BATRES, STEFAN R.;BROWN, JUSTIN D.;AND OTHERS;SIGNING DATES FROM 20080902 TO 20080903;REEL/FRAME:030653/0705

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034544/0541

Effective date: 20141014

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551)

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20221125