CN100430896C - Hardware parser accelerator - Google Patents

Publication number
CN100430896C
Authority
CN
China
Prior art keywords
character
palette
state table
state
string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2003801061657A
Other languages
Chinese (zh)
Other versions
CN1726464A (en)
Inventor
Michael C. Dapp (迈克尔·C·达普)
Eric C. Lett (埃里克·C·莱特)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lockheed Martin Corp
Original Assignee
Lockheed Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lockheed Corp
Publication of CN1726464A
Application granted
Publication of CN100430896C
Anticipated expiration
Expired - Fee Related

Abstract

The present invention relates to a device that parses documents (such as XML™ documents) in dedicated hardware in greatly reduced time, while removing a large processing burden from the host CPU. The conventional use of a state table is divided into a character palette, an abbreviated state table, and a next-state palette. The palettes can be implemented in dedicated high-speed memory, and a cache can be used to accelerate access to the abbreviated state table. Processing is performed in partially parallel pipelines. Dedicated registers can be updated concurrently, and parsing can be further accelerated by skipping, under control of a flag bit, over runs of special characters of arbitrary length that the character palette can accommodate.

Description

Hardware parser accelerator
Specification
Background of the invention
Field of the invention
The present invention relates generally to the processing of application programs for controlling the operation of general-purpose computers and, more particularly, to performing parse operations on application programs, documents, and/or other logical sequences of network data packets.
Description of the prior art
In recent years, the field of digital communication between computers, and the linking of computers into networks, has developed rapidly, in many ways resembling the rapid proliferation of personal computers a few years earlier. This growth in interconnectivity and in the possibility of remote processing has greatly increased the effective capability and functionality of individual computers in such networked systems. Nevertheless, the variety of uses of individual computers and systems, the preferences of their users, and the state of the art at the time the computers were placed in service have produced a substantial degree of variation in the capabilities and configurations of individual machines and their operating systems. Individual machines and their operating systems are collectively referred to as "platforms", and these platforms are generally incompatible with one another to some degree, particularly at the level of operating system and programming language.
This incompatibility of platform characteristics, together with the simultaneous demand for communication and remote processing capability and for enough compatibility to support it, has led to the development of object-oriented programming (which accommodates the concept of assembling an application, as well as data, as a more or less generic collection of modules through a referencing system of entities, attributes, and relationships) and of a number of programming languages that embody this concept. Extensible Markup Language™ (XML™) is one such language; it has come into widespread use and can be transmitted as a document over networks of arbitrary configuration and architecture.
In such a language, certain character strings correspond to certain commands or identifiers, including special characters and other important data (collectively referred to as control words). These allow data or operations to, in effect, identify themselves so that they can thereafter be treated as "objects". Associated data and commands can then be translated into the appropriate formats and commands of different applications in different languages, producing a degree of compatibility among the connected platforms that is sufficient to support the desired processing at a given machine. These character strings are detected by an operation known as parsing, which is similar to the more conventional sense of resolving an expression (such as a sentence) into its component parts and describing them grammatically.
When an XML™ document is parsed, a large part, and possibly the majority, of the central processing unit (CPU) execution time is spent traversing the document in search of control words, special characters, and other important data defined by the particular XML™ standard being processed. This is typically done by software that examines each character and determines whether it belongs to a predefined set of strings of interest, for example a set of character strings comprising "<command>", "<data type=dataword>", "</command>", and so on. If any of the target strings is detected, a token is saved together with a pointer to the position in the document where the token starts and the length of the token. These tokens are accumulated until the entire document has been parsed.
The conventional approach is to implement a table-based finite state machine (FSM) in software to search for the strings of interest. The state table resides in memory and is designed to search for specific patterns in the document. The current state is used as the base address into the state table, and the ASCII representation of the input character is the index into the table. For example, if the state machine is in state 0 (zero) and the first input character has the ASCII value 02, the absolute address of the state entry is the sum (or concatenation) of the base address (state 0) and the index, that is, the ASCII character code (02). The FSM begins with the CPU fetching the first character of the input document from memory. The CPU then constructs the absolute address in the in-memory state table that corresponds to the initialized/current state and the input character, and fetches the state data from the state table. Based on the state data returned, the CPU updates the current state to the new value if it is different (indicating that the character corresponds to the first character of a string of interest) and performs any other action indicated in the state data (for example, issuing a token or an interrupt if the single character is a special character, or if the current character is found, after further repetitions of the foregoing, to be the last character of a string of interest).
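The addressing scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `NUM_CODES`, `entry_address`, and `step`, the flat-list table layout, and the two-state example table are hypothetical and do not come from the patent. Because each row holds 256 entries (a power of two), adding the index to the row base gives the same result as concatenating the state number with the 8-bit character code.

```python
NUM_CODES = 256  # one column per 8-bit character code


def entry_address(state, char_code):
    """Absolute index of a state-table entry: row base plus character index."""
    if not 0 <= char_code < NUM_CODES:
        raise ValueError("character code must fit in 8 bits")
    return state * NUM_CODES + char_code


def step(table, state, char_code):
    """Fetch the (next_state, action) entry for the current state and character."""
    return table[entry_address(state, char_code)]


# Hypothetical two-state table: every entry defaults to "go to state 0, no action".
table = [(0, None)] * (2 * NUM_CODES)
table[entry_address(0, ord("<"))] = (1, None)          # '<' advances state 0 to state 1
table[entry_address(1, ord(">"))] = (0, "emit token")  # '>' completes the string "<>"

state = 0
actions = []
for ch in b"a<>b":
    state, action = step(table, state, ch)
    if action is not None:
        actions.append(action)
```

Every character costs one table fetch plus the pointer and state updates, which is the per-character overhead the text goes on to describe.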
The above process is repeated, and the state is changed, as successive characters of a string of interest are found. That is, if the initial character is the first character of a string of interest, the state of the FSM can be advanced to a new state (for example, from initial state 0 to state 1). If the character is not of interest, the state machine (generally) remains in the same state, either because the state table entry returned from the state table address specifies the same state (for example, state 0) or because it does not command a state update. Possible actions include, but are not limited to, setting interrupts, storing tokens, and updating pointers. The process is then repeated for the following character. It should be noted that while a string of interest is being followed, and the FSM is in a state other than state 0 (or another state indicating that no string of interest has yet been found or is currently being followed), a character may be found that is inconsistent with the current string but is the initial character of another string of interest. In that case, the state table entry indicates the appropriate action to mark and identify the string fragment or portion previously being followed, and the possible new string of interest is followed until it is either completely identified or found not to be a string of interest. In other words, strings of interest can be nested, and the state machine must be able to detect a string of interest inside another string of interest, and so on. This can require the CPU to traverse the XML™ document many times in order to parse it completely.
The entire XML™ document, or a document in another language, is parsed character by character in the manner described above. As potential target strings are recognized, the FSM steps through various states character by character until a string of interest is fully identified or a character inconsistent with any possible string of interest is encountered (that is, when the string has been completely matched or a character deviates from the target string). In the latter case, no action is generally taken other than returning to the initial state or to a state corresponding to detection of the initial character of another target string. In the former case, the token is stored in memory together with its starting address in the input document and its length. When parsing is complete, all objects will have been identified, and processing in accordance with the local or given platform can begin.
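As a rough illustration of the character-by-character process just described, the following Python sketch builds a dense state table from a list of target strings and accumulates tokens as (token id, start position, length). It is not the patent's implementation: `build_table` and `parse` are hypothetical names, the input is assumed to be ASCII, and the sketch assumes that no target string is a prefix of another.

```python
def build_table(targets):
    """Dense table: table[state][byte] -> (next_state, token_id or None)."""
    table = [[(0, None)] * 256]  # state 0: every character defaults to "stay in state 0"
    for token_id, word in enumerate(targets):
        data = word.encode("ascii")
        state = 0
        for i, byte in enumerate(data):
            next_state, token = table[state][byte]
            if i == len(data) - 1:
                table[state][byte] = (0, token_id)  # command plus "go to state 0"
            else:
                if next_state == 0:                 # no row for this prefix yet
                    table.append([(0, None)] * 256)
                    next_state = len(table) - 1
                    table[state][byte] = (next_state, token)
                state = next_state
    return table


def parse(document, table):
    """Return (token_id, start, length) for each target string found."""
    data = document.encode("ascii")
    tokens = []
    state, start, pos = 0, 0, 0
    while pos < len(data):
        if state == 0:
            start = pos                             # a new candidate string starts here
        next_state, token = table[state][data[pos]]
        if token is not None:                       # complete string of interest
            tokens.append((token, start, pos - start + 1))
            state, pos = 0, pos + 1
        elif next_state == 0 and state != 0:        # mismatch while following a string
            state, pos = 0, start + 1               # resume just after the string's start
        else:
            state, pos = next_state, pos + 1
    return tokens


table = build_table(["<command>", "</command>"])
tokens = parse("x<command>42</command>", table)
```

The backtracking branch, which resumes at the character after the start of the abandoned string, is one reason a software parser may revisit parts of the document several times.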
Because the search is normally conducted for multiple strings of interest, the state table can provide multiple transitions from any given state. This approach allows the current character to be analyzed against multiple target strings at the same time while conveniently accommodating nested strings.
It can be seen from the foregoing that parsing a document such as an XML™ document requires many repetitions and many memory accesses for each repetition. The processing time on a general-purpose CPU is therefore necessarily substantial. A further major complexity of handling multiple strings lies in generating the large state tables, which is done off-line from real-time packet processing. Even so, a large number of CPU cycles are needed to fetch the input character data, fetch the state data, and update the various pointers and state addresses for each character of the document. It is therefore relatively common for the parsing of a document such as an XML™ document to fully pre-empt other processing on the CPU or platform and to delay the requested processing substantially.
It has been recognized in the art that general-purpose hardware can be programmed to emulate the function of special-purpose hardware, and that special-purpose data processing hardware will often run faster than programmed general-purpose hardware, even when the structure and the program correspond exactly to each other, because less overhead is needed to manage and control special-purpose hardware. Nevertheless, the hardware resources that special-purpose hardware requires for a particular kind of processing can be prohibitively large, especially where the gain in processing speed may be marginal. Further, special-purpose hardware necessarily has functional limitations, and providing sufficient flexibility for a particular application, such as the ability to search for an arbitrary number of arbitrary combinations of characters, can also be prohibitive. Thus, to be practical, special-purpose hardware must provide a large gain in processing speed while also providing very substantial hardware economy, and these requirements become increasingly difficult to satisfy simultaneously as more functional flexibility or programmability is needed in the required processing function.
In this regard, both interconnectivity and the amount of processing time needed to parse a document such as an XML™ document also raise the issue of system security. Any process that requires a very large amount of processing time at relatively high priority resembles, in some respects, a denial-of-service (DOS) attack on the system or one of its nodes, or can serve as a tool in such an attack.
DOS attacks frequently present frivolous or malformed service requests to a system in order to consume available resources maliciously and eventually overload them. Proper configuration of a hardware accelerator can greatly reduce or eliminate the potential for overloading available resources. In addition, systems often fail or expose security weaknesses when overloaded. Eliminating overloads is thus an important security consideration.
Further, because the state table must be able to contain CPU commands at a basic level, it is possible for some processing to begin and some commands to be executed before parsing is complete, and this is difficult or impossible to secure against without severely compromising system performance. In short, the potential for a security compromise would necessarily be reduced by reducing the processing time of operations such as XML™ parsing.
Summary of the invention
The invention provides a dedicated processor and associated hardware for accelerating the parsing of documents such as XML™ documents while limiting the amount of hardware and memory required.
To accomplish these and other objects of the invention, a hardware parser accelerator is provided, comprising: a document memory; a character palette containing addresses corresponding to characters in the document; a state table containing a plurality of entries corresponding to the characters; a next-state palette containing a state address or offset; and a token buffer, wherein the entries in the state table include at least one of an address into the next-state palette and a token.
Brief description of the drawings
The foregoing and other objects, aspects, and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:
Fig. 1 shows a portion of a state table used in parsing a document;
Fig. 2 is a high-level schematic diagram of the parser accelerator in accordance with the invention;
Fig. 3 illustrates a preferred character palette format as depicted in Fig. 2;
Figs. 4A and 4B illustrate a state table format, and a state table control register used with the state table, in a preferred form of the invention as depicted in Fig. 2;
Fig. 5 illustrates a preferred next-state palette format as depicted in Fig. 2; and
Fig. 6 shows a preferred token format as depicted in Fig. 5.
Detailed description of a preferred embodiment of the invention
Referring now to the drawings, and more particularly to Fig. 1, there is shown a representation of a portion of a state table that is useful in understanding the invention. It should be understood that the state table shown in Fig. 1 is a very small portion of a state table usable for parsing an XML™ document and is intended to be exemplary in nature. It should also be understood that an XML™ document is used here as an example of one type of logical data sequence that can be processed with an accelerator in accordance with the invention. Other logical data sequences can also be constructed from the contents of network data packets, for example user terminal command strings intended for execution by shared server computers. Although the full state table does not physically exist in the invention (at least not in the form shown), and although Fig. 1 can also be used to facilitate an understanding of the operation of known software parsers, no portion of Fig. 1 is admitted to be prior art with respect to the present invention.
It is also helpful to observe that many entries in the portion of the state table illustrated in Fig. 1 are duplicative. It is important to an appreciation of the invention that hardware accommodating the entire state table represented by Fig. 1 is not required. Conversely, although the invention could be implemented in software (possibly using a dedicated processor), the hardware requirements in accordance with the invention are sufficiently limited that the penalty of increased processing time for parsing in software is not justified by any possible hardware economy.
In Fig. 1, the state table is divided into an arbitrary number of rows, each having a base address corresponding to a state. Each row is divided into a number of columns corresponding to the number of codes that can be used to represent characters in the document to be parsed; in this example there are two hundred fifty-six (256) columns, corresponding to the basic eight-bit byte of a character, which is used as an index into the state table.
It is helpful to note several aspects of the state table entries shown, particularly for conveying an understanding of how even the small portion of the exemplary state table illustrated in Fig. 1 supports the detection of many words:
1. In the state table shown, only two entries in the row for state 0 contain anything other than "stay in state 0", the entry that maintains the initial state when the character being tested does not match the initial character of any string of interest. The single entry that provides for progress to state 1 corresponds to the special case in which all strings of interest begin with the same character. Any other character that provides progress to another state will generally, but not necessarily, progress to a state other than state 1; a further reference to the same state, reachable through another character, can be useful for detecting nested strings, for example. The combination of a command (for example, "special interrupt") with "stay in state 0", shown at {state 0, FD}, can be used to detect and act on special single characters.
2. In states above state 0, an entry of "stay in state n" provides for the state to be maintained through potentially long runs of one or more characters, such as are commonly encountered in, for example, the numerical arguments of commands. The invention provides special handling of this type of character string to enhance acceleration, as described in detail below.
3. In states above state 0, an entry of "go to state 0" signifies detection of a character that distinguishes the string from every string of interest, regardless of how many matching characters have previously been detected, and returns the parsing process to the initial/default state to begin searching for another string of interest. (For this reason, "go to state 0" will generally be by far the most frequent and numerous entry in the state table.) Returning to state 0 may require the parsing operation to return to the character in the document that follows the character which began the string being followed when the distinguishing character was detected.
4. An entry that includes a command together with "go to state 0" indicates that detection of a complete string of interest has finished. In general, the command will be to store a token (with the address and length of the token), which thereafter allows the string to be treated as an object. A command together with "go to state n", however, provides for an operation to be launched at an intermediate point while the parser continues to follow a string that could potentially match a string of interest.
5. To avoid ambiguity at any point where the search branches between two strings of interest (for example, strings having n-1 identical initial characters but different n-th characters, or strings having different initial characters), it is generally necessary to proceed to different (for example, non-consecutive) states, as shown at {state 1, 01} and {state 1, FD}. Except in the special circumstances of included strings of special characters and of strings of interest with common initial characters, complete identification of a string of n characters requires n-1 states. For this reason, the number of states, and of rows in the state table, must usually be very large, even for a relatively modest number of strings of interest.
6. In contrast to the preceding paragraph, most states can be fully characterized by one or two unique entries and a default "go to state 0". The invention exploits this feature of the state table of Fig. 1 to achieve a high degree of hardware economy and to accelerate the parsing process substantially for the general case of strings of interest.
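The last observation in the list above, that most rows reduce to one or two unique entries plus a default, can be illustrated with a small sketch. The dictionary representation and the names `GO_TO_STATE_0`, `compress_row`, and `lookup` are assumptions made for illustration only; the target states 2 and 5 are invented, while the column positions 01 and FD mirror the {state 1, 01} and {state 1, FD} entries mentioned in the text.

```python
GO_TO_STATE_0 = (0, None)  # default entry: return to the initial state, no command


def compress_row(row, default=GO_TO_STATE_0):
    """Keep only the entries of a 256-column row that differ from the default."""
    return {code: entry for code, entry in enumerate(row) if entry != default}


def lookup(sparse_row, code, default=GO_TO_STATE_0):
    """Return the stored entry, or the default when the column was omitted."""
    return sparse_row.get(code, default)


# Dense row for a hypothetical state 1 with two unique entries.
dense_row = [GO_TO_STATE_0] * 256
dense_row[0x01] = (2, None)
dense_row[0xFD] = (5, None)

sparse_row = compress_row(dense_row)
```

Here 256 stored entries shrink to two, and lookups still return the same result for every character code.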
As alluded to above, the parsing process (as conventionally performed) begins with the system in a given default/initial state, depicted in Fig. 1 as state 0, and then progresses to higher-numbered states as matching characters are found on repetitions of the process. When a string of interest has been completely identified, or when a special operation is specified at an intermediate position in a potentially matching string, an operation such as storing a token or issuing an interrupt is performed. At each repetition, for each character of the document, however, the character must be fetched from CPU memory, the state table entry must be fetched (again from CPU memory), and various pointers (for example, to the character of the document and to the base address in the state table) and registers (for example, the address of the initial matched character of the string and the accumulated length) must be updated in sequential operations. It is therefore readily appreciated that the parsing process consumes a large amount of processing time.
A high-level schematic block diagram of the parser accelerator 100 in accordance with the invention is shown in Fig. 2. As those of ordinary skill in the art will appreciate, Fig. 2 can also be understood as a flow diagram of the steps performed to carry out parsing in accordance with the invention. As discussed in detail below in connection with Figs. 3, 4A, 4B, 5, and 6, the invention exploits certain hardware economies in representing the state table, so that a plurality of hardware pipelines can be developed which operate essentially in parallel, although slightly skewed in time. Thus, pointers and registers can be updated substantially in parallel and concurrently with other operations, while the time required for memory accesses is greatly reduced, both by faster-access hardware operated in parallel and by prefetching the state table and the document from CPU memory.
As a general overview, a document such as an XML™ document is stored externally in DRAM 120, which is indexed by registers 112, 114, and is transferred, preferably as 32-bit words, to input buffer 130, which serves as a multiplexer for the pipelines. Each pipeline includes a copy of character palette 140, state table 160, and next-state palette 170, each of which accommodates part of the state table in compressed form. The output of next-state palette 170 contains both the next-state address portion of the address of an entry in state table 160 and the token value to be stored (if any). The operations in character palette 140 and next-state palette 170 are simple memory accesses into high-speed internal SRAM, which can be performed in parallel with each other and in parallel with the simple memory accesses into the external high-speed DRAM forming state table 160 (which may also be implemented as a cache). Therefore, only relatively few clock cycles are required, for the evaluation of each character of the document, of the CPU that initially controls these hardware elements (which, once started, can function autonomously with only occasional CPU operation calls to refresh the document data and to store tokens). The basic acceleration gain is the reduction from the sum of the durations of all per-character memory operations in the CPU, plus CPU overhead, to the duration of a single memory operation performed autonomously in high-speed SRAM or DRAM.
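The data flow of a single pipeline, as outlined above, can be mimicked in software to make the three lookups concrete. The sketch below is a behavioral model only, under assumptions that are not in the patent: the function name `run`, the list and dictionary layouts, the three-column palette, and the tiny example that recognizes the string "ab" are all hypothetical. The reference numerals in the comments point to the elements named in the text.

```python
def run(document, char_palette, state_table, next_state_palette):
    """Behavioral model of one pipeline: palette -> state table -> next-state palette."""
    offset = 0                                  # address offset for state 0
    tokens = []
    for pos, byte in enumerate(document):
        index = char_palette[byte]              # character palette 140
        key = state_table[offset + index]       # offset combined with index, then state table 160
        token, offset = next_state_palette[key] # next-state palette 170
        if token is not None:
            tokens.append((pos, token))         # would be written to the token buffer
    return tokens


# Character palette: 'a' -> column 1, 'b' -> column 2, everything else -> column 0.
char_palette = [0] * 256
char_palette[ord("a")] = 1
char_palette[ord("b")] = 2

# Abbreviated state table, three columns per state (state 0 at offset 0, state 1 at offset 3).
GO_0, GO_1, TOKEN_AB = 0, 1, 2
state_table = [
    GO_0, GO_1, GO_0,       # state 0: only 'a' makes progress
    GO_0, GO_1, TOKEN_AB,   # state 1: 'a' stays a candidate, 'b' completes "ab"
]

# Next-state palette: key -> (token or None, address offset of the next state).
next_state_palette = {GO_0: (None, 0), GO_1: (None, 3), TOKEN_AB: ("AB", 0)}

tokens = run(b"xabab", char_palette, state_table, next_state_palette)
```

In this model the state table needs 6 entries instead of the 512 that two full 256-column rows would occupy, which is the kind of economy the palettes are meant to provide.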
It should be understood that referring to memory structures here as "external" is intended to connote a configuration of memories 120, 140 that the inventors presently prefer, in view of the amount of storage required and of access from the hardware parser accelerator and/or the host CPU. In other words, for the handling of tokens and some other operations, it can be advantageous to provide an architecture of the parser accelerator in accordance with the invention that facilitates sharing of the memory, or at least access to it by both the host CPU and the hardware accelerator. No other connotation is intended, and in view of this discussion those of ordinary skill in the art will recognize a wide range of hardware alternatives, such as synchronous DRAM (SDRAM), as suitable.
Referring now to Figs. 3 to 6, the formats of character palette 140, state table 160, next-state palette 170, and the next state and token will be discussed as examples of the hardware economies that support the preferred implementation of Fig. 2. Other techniques and formats can be used as well, and the formats illustrated should be understood as exemplary, although presently preferred.
Fig. 3 illustrates the preferred form of the character palette, which corresponds to the characters that are or may be included in the strings of interest. This format preferably provides entries numbered 0-255, corresponding to the number of columns in the state table of Fig. 1. (The term "palette" is used in much the same sense as in the term "color palette", which contains data for each color supported, the colors being collectively referred to as a gamut. Use of a palette reduces the number of entries/columns in the state table.) For example, characters that do not cause any change of state, referred to as "null characters", can be represented in a single column of the state table rather than in many columns. It is desirable to test for a null character output at 144, which can substantially accelerate parsing because it allows the next character to be processed immediately, without a further memory operation to access the state table. The format can be accommodated by a single register, or by memory locations configured as such by, for example, data in base address register 142 that points to a particular character palette (depicted schematically by overlapping memory planes in Fig. 2). The current eight-bit character from the document (for example, an XML™ document), one of four provided from input buffer 130 as received in a four-byte word from external DRAM 120, addresses an entry in the palette, which then outputs an address that serves as an index or partial pointer into the state table. Thus, by providing a palette in this format, part of the function of Fig. 1 can be provided in the form of a single register of relatively limited capacity, which allows a plurality of them to be formed and operated in parallel while maintaining substantial hardware economy and supporting the other functions in state table 160.
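The role of the character palette and of the null-character test can be sketched as follows. This is an illustration under assumptions, not the patent's format: the reserved index `NULL_INDEX`, the function names, and the rule of numbering the characters of interest in sorted order are all hypothetical.

```python
NULL_INDEX = 0  # hypothetical palette entry shared by all "null" characters


def build_palette(chars_of_interest):
    """Map each 8-bit code to a small column index; uninteresting codes share NULL_INDEX."""
    palette = [NULL_INDEX] * 256
    for index, code in enumerate(sorted(set(chars_of_interest)), start=1):
        palette[code] = index
    return palette


def count_state_table_accesses(document, palette):
    """Count the characters that would still need a state-table access."""
    return sum(1 for byte in document if palette[byte] != NULL_INDEX)


palette = build_palette(b"<>/")
document = b"<a>text</a>"
needed = count_state_table_accesses(document, palette)
```

In this example the state table would need only four columns instead of 256, and six of the eleven characters in the document are recognized as null characters at the palette stage.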
Fig. 4A shows the preferred state table format, which is constituted or configured similarly to the character palette (that is, substantially as a register). The principal difference from the character palette of Fig. 3 is that the length of the register depends on the number of responses to characters that are desired and on the number and length of the strings of interest. It is therefore considered desirable to provide for the possibility of implementing this memory in CPU or other external DRAM (possibly with an internal or external cache) if the amount of internal memory that can be provided economically is insufficient in a particular case. Nevertheless, substantial hardware economy is clearly provided, because the highly duplicative entries in the state table of Fig. 1 can be reduced to a single entry, whose address is accommodated by the data provided, as described above, in accordance with the character palette of Fig. 3. The output of state table 160 is preferably one, two, or four bits, but providing for as many as 32 bits can offer increased flexibility, as discussed below with reference to Fig. 4B. In any case, the output of the state table provides an address or pointer into next-state palette 170.
Referring now to Fig. 4B, as a perfecting feature of the invention, the preferred implementation includes state table control register 162, which allows a further substantial hardware economy, particularly if a 32-bit output of state table 160 is to be provided. In essence, the state table control register provides compression of the state table information by allowing variable-length words to be stored in, and read out of, the state table.
More particularly, state table control register 162 stores and provides the length of each entry in state table 160 of Fig. 4A. Because some state table entries in Fig. 1 are highly duplicative (for example, "go to state 0" and "stay in state n"), these entries not only can be represented by a single entry in state table 160, or at least by far fewer entries than in Fig. 1, but can also be represented by fewer bits. These fewer bits can yield substantial hardware economy even if most or all of the duplicative entries are included in the state table, as may be found convenient for some state tables. Those of ordinary skill in the art will recognize the principle of this reduction as similar to so-called entropy coding.
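One way to picture variable-length entries whose lengths are held separately is the bit-packing sketch below. The patent text does not specify the encoding, so everything here is an assumption: the names `pack_entries` and `read_entry`, the choice to place the first entry in the lowest bits, and the example values. The `lengths` list plays the part that the text assigns to the state table control register.

```python
def pack_entries(values, lengths):
    """Pack variable-length entries into one integer, first entry in the lowest bits."""
    packed, offset = 0, 0
    for value, length in zip(values, lengths):
        if value >> length:
            raise ValueError("value does not fit in its declared length")
        packed |= value << offset
        offset += length
    return packed


def read_entry(packed, lengths, index):
    """Read entry `index` using the per-entry lengths to locate its bit field."""
    offset = sum(lengths[:index])
    mask = (1 << lengths[index]) - 1
    return (packed >> offset) & mask


# Frequent entries such as "go to state 0" get 1 bit; a rarer entry gets 8 bits.
values = [0, 1, 0x2A, 0]
lengths = [1, 1, 8, 1]
packed = pack_entries(values, lengths)
total_bits = sum(lengths)
```

With these example lengths the four entries occupy 11 bits, compared with 32 bits if each entry were stored in a fixed 8-bit field.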
Referring now to Fig. 5, the preferred format of next-state palette 170 will be discussed. Next-state palette 170 is preferably implemented in much the same manner as character palette 140 discussed above. However, as with state table 160, the number of entries that may be required is not known in advance, and each entry is preferably much longer (e.g., two 32-bit words). On the other hand, since only a relatively small and predictable range of addresses needs to be held at any given time, next-state palette 170 can be operated as a cache (e.g., using next-state palette base address register 172). Moreover, if 32-bit outputs of state table 160 are provided, some of that data can be used to supplement the data in the entries of next-state palette 170, possibly allowing shorter entries in the latter or possibly bypassing the next-state palette altogether, as indicated by dashed line 175.
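Operating the next-state palette as a cache over a small address window can be sketched as follows. The class name, the window-aligned reload policy, and the miss counter are assumptions made for illustration; the text states only that a base address register can be used for this purpose.

```python
# Illustrative sketch only: the reload policy and names are assumptions.

class NextStatePaletteCache:
    """Holds one window of next-state entries copied from slower memory."""

    def __init__(self, backing, window):
        self.backing = backing  # full table held in slower memory
        self.window = window    # number of entries held locally
        self.base = 0           # plays the role of a base address register
        self.lines = list(backing[:window])
        self.misses = 0

    def read(self, address):
        """Return the entry at `address`, reloading the window on a miss."""
        if not (self.base <= address < self.base + self.window):
            self.base = address - address % self.window
            self.lines = list(self.backing[self.base:self.base + self.window])
            self.misses += 1
        return self.lines[address - self.base]
```

As long as successive states fall within one window, every read is served locally and slower memory is touched only when the active address range moves.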
As shown in Fig. 5, the lower-address 32-bit word output from next-state palette 170 is the token to be saved. This token is preferably formed of a 16-bit token value, 8 bits of token flags, and 8 bits of control flags. The token value and token flags are stored in token buffer 190 at an address given jointly by pointer 192 to the beginning of the string and by the length accumulated by counting successful character comparisons. The control flags set interrupts to the host CPU or control processing in the parser accelerator. One of these control flags is preferably used to enable a skip function for characters that do not cause a change of state in a state other than state 0, such as a run of identical or related characters of arbitrary length occurring within a string of interest, as alluded to above. In such a case, the next-state table entry can be reused without being fetched from SRAM/SDRAM, and input buffer address 112 can be incremented without additional processing, allowing substantial additional acceleration of parsing for certain strings of characters. The second 32-bit word is an address offset that is fed back to register 180 and adder 150, to be concatenated with the index output from the character palette to form the pointer into the state table for the next character. The initial address corresponding to state 0 is supplied by register 182.
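The 32-bit token word can be sketched as a packed integer. The field widths (16, 8, and 8 bits) come from the description above, but the ordering of the fields within the word and the bit position of the skip-enable flag are assumptions.

```python
# Illustrative sketch only: field order and the SKIP_ENABLE bit are assumptions.

SKIP_ENABLE = 0x01  # hypothetical position of the skip-enable control flag


def pack_token_word(value, token_flags, control_flags):
    """Build a 32-bit word from a 16-bit value and two 8-bit flag fields."""
    return ((value & 0xFFFF) << 16) | ((token_flags & 0xFF) << 8) | (control_flags & 0xFF)


def unpack_token_word(word):
    """Split a 32-bit word into (token value, token flags, control flags)."""
    return (word >> 16) & 0xFFFF, (word >> 8) & 0xFF, word & 0xFF


def skip_enabled(word):
    """True if the control flags request skipping of non-transition characters."""
    return bool(unpack_token_word(word)[2] & SKIP_ENABLE)
```

Only the token value and token flags would be written to the token buffer; the control flags are consumed by the accelerator itself.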
Thus it can be seen that the use of a character palette, a state memory in abbreviated form, and a next-state palette articulates the function of conventional state table operation into separate stages, each of which can be performed extremely rapidly with a relatively small amount of high-speed memory. These stages can thus be duplicated and operated as parallel pipelines, so that individual characters of a document are processed in turn in parallel with other operations on tokens and memory. The parsing process can therefore be greatly accelerated, even relative to a dedicated processor that must perform all of these functions in sequence before processing of another character can begin.
In summary, the accelerator has access to the program memory of the host CPU, where the character data (sometimes referred to as packets, connoting network transmission) and the state table are located. Accelerator 100 is under the control of the host CPU via memory-mapped registers. The accelerator can interrupt the host CPU to indicate exceptions, alarms, and terminations. When parsing is to begin, pointers (112, 114) are set to the beginning and end of the data in input buffer 130 that is to be parsed, and the state table to be used (as indicated by base address 182) and other control information (e.g., 142) are set up within the accelerator.
To initiate operation of the accelerator, the CPU issues a command to the accelerator. In response, the accelerator fetches a first 32-bit word from the CPU program memory (e.g., 120 or a cache) and places it in input buffer 130, from which the first byte/ASCII character is selected. The accelerator then fetches the state information corresponding to the input character (i.e., Fig. 4A, corresponding to a single character or a single row of the full state table of Fig. 1) and the current state. This state information includes the next-state address and any special operation to be performed, such as interrupting the CPU or terminating processing. The state information thus supports detection not only of individual strings of interest but also of nested strings (as alluded to above) and of sequences of strings or corresponding tokens (e.g., words or phrases of text in a document). The interrupts and/or exceptions issued in response to such detection are not limited to internal control of the parser and the issuing of tokens; they may also raise alarms or otherwise initiate processing that provides other functions, such as blocking unwanted e-mail or objectionable subject matter, or content-based routing (possibly implemented by issuing special tokens).
The accelerator next selects the next byte to be analyzed from input buffer 130 and repeats the above process using the new state information, which is already available to adder 150. The operation or the storage of token information can be carried out concurrently. This continues until all four characters of the input word have been analyzed. Then (or, by prefetching, concurrently with the analysis of the fourth character) pointers 112 and 114 are compared to determine whether the end of document buffer 120 has been reached; if so, an interrupt is sent back to the CPU. Otherwise, a new word is fetched, pointer 112 is updated, and the process is repeated.
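The word-at-a-time loop described above can be sketched in software. The function name, the `step` callback signature, and returning a count of fetched words in place of raising an interrupt are assumptions for illustration.

```python
# Illustrative sketch only: names and the return convention are assumptions.

WORD_BYTES = 4  # one 32-bit input word holds four byte-wide characters


def run(memory, start, end, step, state=0):
    """Process memory[start:end] one 32-bit word at a time.

    `step(state, byte)` returns the next state. The function returns the
    final state and the number of words fetched; reaching `end` is the
    point at which the hardware would interrupt the CPU.
    """
    pointer = start
    words_fetched = 0
    while pointer < end:  # compare the current pointer with the end pointer
        word = memory[pointer:pointer + WORD_BYTES]
        words_fetched += 1
        for byte in word[:end - pointer]:  # ignore bytes past the end pointer
            state = step(state, byte)
        pointer += WORD_BYTES
    return state, words_fetched


def count_a(state, byte):
    """Toy step function: the state counts occurrences of the letter 'a'."""
    return state + (byte == ord("a"))
```

For instance, `run(b"banana!!", 0, 6, count_a)` walks the six bytes of `banana` in two word fetches and leaves the trailing `!!` unprocessed.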
Because the pointers and counters are implemented in dedicated hardware, they can be updated in parallel rather than serially, as would be required if they were implemented in software. This reduces the time to parse a byte of data to the time required to fetch the character from the local input buffer, generate the state table address from the high-speed local character palette memory, fetch the corresponding state table entry from memory, and fetch the next-state information, again from local high-speed memory. Some of these operations can be performed concurrently in separate parallel pipelines, and other operations specified in the state table information (provided partly or wholly through the next-state palette) can be carried out while parsing of further characters continues.
It can therefore be clearly seen that the invention provides substantial acceleration of the parsing process using a small and economical amount of dedicated hardware. Although the parser accelerator can interrupt the CPU, the processing operation is moved entirely from the CPU to the parser accelerator after the initial command.
Although the invention has been described in terms of a single preferred embodiment, those of ordinary skill in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.

Claims (14)

1. A parser accelerator comprising:
a document memory;
a character palette containing a plurality of addresses, said addresses corresponding to characters in a document;
a state table containing a plurality of entries corresponding to said characters;
a next-state palette containing state addresses or offsets; and
a token buffer for storing tokens in accordance with state table entries, wherein
said entries in said state table include at least one of an address into said next-state palette and a token, and token addresses in the state table are stored in the token buffer.
2. The parser accelerator as claimed in claim 1, wherein said character palette, said state table, and said next-state palette form a pipeline.
3. The parser accelerator as claimed in claim 2, wherein said character palette, said state table, and said next-state palette each contain a respective portion of state table information in compressed form.
4. The parser accelerator as claimed in claim 1, wherein said next-state palette contains a next-state address portion, being the address of an entry in said state table, and a token value to be stored.
5. The parser accelerator as claimed in claim 1, further comprising means for detecting, in a string, characters that do not cause a change of state.
6. The parser accelerator as claimed in claim 5, further comprising means for rapidly processing a next character without a further memory operation for state table access.
7. The parser accelerator as claimed in claim 2, wherein said pipeline is implemented in hardware.
8. The parser accelerator as claimed in claim 2, wherein said pipeline forms a loop that includes means for combining a next-state address with a state table index from said character palette.
9. A method of parsing an electronic document to identify strings of interest, said method comprising the steps of:
storing a plurality of addresses corresponding to characters in a character palette, storing a plurality of entries in a state table, and storing state addresses in a next-state palette, so as to form a looped pipeline for detecting a portion of said string of interest;
fetching token information from said state table and storing a token in a token buffer; and
performing the storing of said token information and the detecting of a portion of said string of interest in parallel.
10. The method as claimed in claim 9, comprising the further steps of:
detecting a sequence of strings of interest; and
issuing a special token, in response to said step of detecting the sequence, to control further processing.
11. The method as claimed in claim 10, wherein said sequence of strings of interest includes nested strings.
12. The method as claimed in claim 10, wherein said sequence of strings of interest corresponds to words or phrases of text in a document.
13. The method as claimed in claim 10, wherein said further processing performs blocking of messages.
14. The method as claimed in claim 10, wherein said further processing performs content-based routing.
CNB2003801061657A 2002-10-29 2003-10-03 Hardware parser accelerator Expired - Fee Related CN100430896C (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US42177302P 2002-10-29 2002-10-29
US60/421,774 2002-10-29
US60/421,775 2002-10-29
US60/421,773 2002-10-29
US10/331,315 2002-12-31

Publications (2)

Publication Number Publication Date
CN1726464A CN1726464A (en) 2006-01-25
CN100430896C true CN100430896C (en) 2008-11-05

Family

ID=35925173

Family Applications (3)

Application Number Title Priority Date Filing Date
CNB2003801061642A Expired - Fee Related CN100357846C (en) 2002-10-29 2003-10-03 Intrusion detection accelerator
CNB2003801061661A Expired - Fee Related CN100380322C (en) 2002-10-29 2003-10-03 Hardware accelerated validating parser
CNB2003801061657A Expired - Fee Related CN100430896C (en) 2002-10-29 2003-10-03 Hardware parser accelerator

Country Status (1)

Country Link
CN (3) CN100357846C (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4945410B2 (en) * 2006-12-06 2012-06-06 株式会社東芝 Information processing apparatus and information processing method
US8117347B2 (en) * 2008-02-14 2012-02-14 International Business Machines Corporation Providing indirect data addressing for a control block at a channel subsystem of an I/O processing system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5414833A (en) * 1993-10-27 1995-05-09 International Business Machines Corporation Network security system and method using a parallel finite state machine adaptive active monitor and responder
CN1118899A (en) * 1994-09-13 1996-03-20 松下电器产业株式会社 Editing and translating programmer
CN1173676A (en) * 1996-07-11 1998-02-18 株式会社日立制作所 Documents retrieval method and system
US5995963A (en) * 1996-06-27 1999-11-30 Fujitsu Limited Apparatus and method of multi-string matching based on sparse state transition list

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5319776A (en) * 1990-04-19 1994-06-07 Hilgraeve Corporation In transit detection of computer virus with safeguard
US5799307A (en) * 1995-10-06 1998-08-25 Callware Technologies, Inc. Rapid storage and recall of computer storable messages by utilizing the file structure of a computer's native operating system for message database organization
JP3958902B2 (en) * 1999-03-03 2007-08-15 富士通株式会社 Character string input device and method
US6427202B1 (en) * 1999-05-04 2002-07-30 Microchip Technology Incorporated Microcontroller with configurable instruction set
CA2307529A1 (en) * 2000-03-29 2001-09-29 Pmc-Sierra, Inc. Method and apparatus for grammatical packet classifier
AUPQ849500A0 (en) * 2000-06-30 2000-07-27 Canon Kabushiki Kaisha Hash compact xml parser
CN1132390C (en) * 2001-03-16 2003-12-24 北京亿阳巨龙智能网技术有限公司 Telecom service developing method based on independent service module

Also Published As

Publication number Publication date
CN100380322C (en) 2008-04-09
CN100357846C (en) 2007-12-26
CN1726464A (en) 2006-01-25
CN1735850A (en) 2006-02-15
CN1726465A (en) 2006-01-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20081105

Termination date: 20101003