WO1997007452A1

WO1997007452A1 - Programmable compiler

Info

Publication number: WO1997007452A1
Application number: PCT/US1996/013169
Authority: WO
Inventors: Gregory C. Thompson
Original assignee: International Software Machines
Priority date: 1995-08-15
Filing date: 1996-08-14
Publication date: 1997-02-27
Also published as: AU6846196A

Abstract

A programmable compiler converts a set of instructions in any language into corresponding instructions in another language, and re-architects the program structure to control the CPU requirements and the I/O requirements of the target language. First, software source modules are examined using a lexical analyzer that produces a stream of grammar and symbol values. Identifiers are compared against a list of keywords to determine their level of grammatical content and level of abstraction. A stream of grammar is partitioned into grammatical segments of arbitrary length and content using an arbitrary and virtual concept of segment separators as determined by a programmable set of segmentation conditions.

Description

PROGRAMMABLE COMPILER

BACKGROUND OF THE INVENTION

This invention is a programmable compiler for computer programs. In particular, it is a process for converting a set of instructions in any language into corresponding instructions in another language, and re-architecting the program structure so as to control the CPU requirements and the I/O requirements of the target language. The fact that the compiler is programmable makes it possible to make such a conversion in a relatively short time.

A compiler is a program that reads a program written in one language - the source language - and translates it into an equivalent program in another language - the target language. As an important part of this translation process, the compiler reports to its user the presence of errors in the source program. This description is taken from the text "Compilers: Principles, Techniques, and Tools," by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, Addison-Wesley Publishing Company, Reading MA 1986; reprinted with corrections 1988. The principal use of compilers in the past has been to translate from a source language such as Fortran or Basic into machine language to control a computer, and many compilers performing these functions are now in use.

The present state of the art is represented in Goss et al., U.S. Patent 4,667,290, entitled "Compilers Using a Universal Intermediate Language;" Bourne, U.S. Patent 4,787,035, entitled "Meta-Interpreter," and Bourne, U.S. Patent 4,905,138, also entitled "Meta-Interpreter," a division of the '035 patent, all of which are incorporated here by reference as if set forth fully.

The '290 patent teaches the translation of a computer program written in a source or high-level language such as BASIC, FORTRAN, C, COBOL, PASCAL, PL/I, ADA, or RPG, into an intermediate language which is then translated into object code for a target machine. The '290 patent lists as an object to develop a group of compilers, one for each of the high-level languages. This differs from the present invention, which comprises a single compiler that is adapted to different high-level languages.

The '035 and '138 patents (the specifications appear to be identical except for the claims) teach an interpreter that includes a parser that examines a message using grammar and lexical tables to produce a parse table. Parsing is a combination of partitioning and language validation, a step that is not a part of the present invention and the absence of which is a particular advantage of the present invention. The parse table of the '035 and '138 patents is compared to data needed in a semantics table to fire one or more conditional rules, causing a function table to be evaluated and perform some action based on the meaning of the message. This is a comparison of data with a look-up table to call for an action, which is in contrast to the operation of the present invention, which does not use data to call for an action, but instead uses the partition of the input instruction to call for an action. The particular application for which the Meta-Interpreter is designed is the control of a machining workcell that includes machines from different vendors.

The three patents noted above all perform much like conventional compilers in that they transform a single language, usually a high-level language, into a lower-level language. This is usually, although not necessarily, machine language. The meta-interpreter appears to have a separate embodiment for each source language, transforming that source language into a single intermediate language. In contrast, the present invention is adaptable to a number of input source languages and a number of target languages. This is of especial utility in converting custom software written for main-frame computers into programs useful in personal computers and workstations.

SUMMARY OF THE INVENTION

It is an object of the invention to make a better compiler.

It is a further object of the present invention to provide a single compiler that translates directly from any of a number of source languages into any of a number of target languages.

It is a further object of the present invention to provide a programmer with the ability to modify existing source code and to produce large quantities of new source code by program or algorithmic means.

It is a further object of the present invention to remove the 1:1 translation restrictions of previous language transformation systems, enabling n:m transformations.

It is a further object of the present invention to describe a set of languages that operate on language and operate on grammar.

It is a further object of the present invention to provide the programmer of a system with the ability to segment and arbitrarily transform variable length segments.

It is a further object of the present invention to detect redundant code segments with single-subroutine implementations.

It is a further object of the present invention to permit the programmer to alter the architecture of an application.

It is a further object of the present invention to permit the programmer to reorganize source code along new paradigm lines.

It is a further object of the present invention to permit the programmer to convert lines of code into functions.

It is a further object of the present invention to permit the programmer to describe algorithmically the steps necessary to convert organizations into calling sequences.

It is a further object of the present invention to permit the programmer to describe algorithmically the restructuring of code into object/classes.

It is a further object of the present invention to permit a programmer to describe algorithmically how to modify the grammar of partitioned segments.

It is a further object of the present invention to permit the programmer to manipulate large quantities of language instances by modifying the grammatical pattern by means of defined algorithms.

It is a further object of the present invention to describe a computer-like organization and description of a language and grammar computational system in which the sequence of instructions determines the language of grammar transformations.

It is a further object of the present invention to eliminate the parser as an integral component of a language and grammar computational system, given input and output language processors that perform syntax analysis.

It is a further object of the present invention to describe a system in which the level of abstraction and detail is controlled through the dynamically alterable description of words that constitute keywords or reserved words.

It is a further object of the present invention to describe a system which, through the elimination of the requirement to perform syntax analysis, permits the system to operate in a language- and grammar-independent fashion.

These and other objects are achieved by a single general-purpose programmable language and grammar computing system that is capable of pipelined operations. The language and grammar computing system examines software source modules using a lexical analyzer producing a stream of grammar and symbol values. Should the output destination language require it, the grammar stream can be preprocessed by a context and/or ambiguity resolver such that the grammar stream itself is modified to produce the stream needed by follow-on stages of the system.

Identifiers are compared against a list of keywords to determine their level of grammatical content and level of abstraction. The grammatical representation is such that the semantics of the original program are immediately apparent and no additional semantic attachment is necessary. The stream of grammar is partitioned into grammatical segments of arbitrary length and content using an arbitrary and virtual concept of segment separators as determined by a programmable set of segmentation conditions. Variable length segments permit m:n transformations of language. The segment is immediately and uniquely identified using any combination of sorting, hashing, B-tree or other searching algorithms, or addressing to identify the address of a segment handling routine to which control is optionally transferred by direction of the executing instructions. Segment handling is performed under program control that results in the capability to transform language, grammar and semantics. The present invention is particularly well suited to the re-architecting, reengineering, re-structuring, re-languaging, and re-housing of software systems and is well suited to the development of language and application programs.
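The keyword comparison and hash-based segment identification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the application: the names `KEYWORDS`, `lex`, `signature`, `HANDLERS`, and `transform`, the token names `ID` and `NUM`, and the COBOL-like input and C-like output are all hypothetical.

```python
import re

# Hypothetical keyword list. It can be changed at run time to alter how much
# grammatical detail the token stream carries.
KEYWORDS = {"MOVE", "TO", "ADD"}

# Words, integers, or any single non-space character (punctuation).
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*|\d+|\S")

def lex(source, keywords=KEYWORDS):
    """Yield atoms: (token, lexeme) pairs. Token names are illustrative."""
    for lexeme in WORD.findall(source):
        upper = lexeme.upper()
        if upper in keywords:
            token = upper          # a keyword keeps its identity as a token
        elif lexeme.isdigit():
            token = "NUM"
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            token = "ID"
        else:
            token = lexeme         # punctuation stands for itself
        yield token, lexeme

def signature(segment):
    """The token sequence alone identifies the segment."""
    return tuple(token for token, _ in segment)

# Hypothetical handler table keyed by signature. A dict lookup is a hash
# lookup, one of the identification methods the text mentions.
HANDLERS = {
    ("MOVE", "ID", "TO", "ID", "."): lambda lx: f"{lx[3]} = {lx[1]};",
    ("ADD", "NUM", "TO", "ID", "."): lambda lx: f"{lx[3]} += {lx[1]};",
}

def transform(segment):
    """Dispatch a segment to its handler; pass it through if none exists."""
    handler = HANDLERS.get(signature(segment))
    lexemes = [lexeme for _, lexeme in segment]
    if handler is None:
        return " ".join(lexemes)
    return handler(lexemes)
```

Under these assumptions, `transform(list(lex("MOVE A TO B.")))` returns `"B = A;"`. Calling `lex` with an empty keyword set turns every word into an `ID` token, which illustrates how the keyword list controls the level of abstraction of the grammar stream.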

A key differentiation of this language and grammar computational system is the recognition that the language functions of lexical analysis and partitioning may be performed in any order to accomplish language transformations. This recognition has led directly to the stored program instruction set description of this invention as a means of conveying the programmable nature and use of language to operate upon language. Differentiating this language programming system are the concepts of operations on a continuous stream of grammar tokens, the division of the stream of tokens into arbitrarily sized grammatical segments, and further operations on the segments to achieve high levels of programmer productivity. Also differentiating this system from others is the general-purpose, programmable, algorithmic, instruction-stream-oriented nature of this invention. Also differentiating this system from others is the complete lack of syntax analysis, table lookups, or comparisons necessary to determine or attach semantics. The partitioned segment is in and of itself sufficient to produce a recognition and identify the semantic content of the segment. This system is also differentiated by not requiring any pre-built or in-process derived parsing tables. The system is capable of working on data as well as source languages. The system may also offer electronic equipment manufacturers modifications to their hardware designs that will better facilitate language processing.

SUMMARY

DIFFERENTIATING PARTITIONING FROM PARSING

Prior art performs parsing, which is the state analysis of incoming token streams designed to determine the legality of a particular sentence. In a parser the current state of a machine is compared with an incoming token to determine the next state.

This process continues until an error or legal sentence is recognized. The process of parsing is always associated in concert with the simultaneous act of language validation. This combination of language validation and partitioning is what ultimately results in single language implementations of compiler technology. Parsing as described in the scientific literature is a combination of partitioning and language validation.

This patent describes a process of partitioning which involves solely the recognition of starting and ending points within token streams. The method by which a starting and ending point is recognized is programmable, variable, and may consist of any combination of real and virtual methods of determining the beginning and ending points. Partitioning may consist in its simplest form of looking for a period or semicolon commonly used as terminators in machine languages. It may include specific tokens within a stream (if, then, do, while, etc.) and it may involve even more complex decisions involving grammatical classification (token), symbolic value, and counted values (3rd if from the paragraph 314000-section, for example). With the combination of virtual and real partitioning the detection of beginning and ending points can be made arbitrarily precise. Partitioning never involves validation and therefore is not limited in the number of languages that it may handle.
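A programmable partitioner of the kind described above might look like the following sketch. It is an assumption-laden illustration only: the function names `partition` and `nth_token`, the rule signatures, and the representation of atoms as `(token, lexeme)` pairs are hypothetical and are not taken from the application.

```python
def partition(atoms, ends_segment, starts_segment=lambda atom: False):
    """Split a stream of (token, lexeme) atoms into segments.

    ends_segment(atom)   -> True if the atom closes the current segment
                            (a "real" separator such as '.' or ';').
    starts_segment(atom) -> True if the atom opens a new segment
                            (a "virtual" separator: nothing in the stream
                            marks the boundary except the rule itself).
    No validation is performed; any sequence of atoms is accepted.
    """
    segment = []
    for atom in atoms:
        # Always evaluate the rule so that stateful (counting) rules see
        # every atom, even when the current segment is still empty.
        opens = starts_segment(atom)
        if opens and segment:
            yield segment
            segment = []
        segment.append(atom)
        if ends_segment(atom):
            yield segment
            segment = []
    if segment:
        yield segment

def nth_token(token, n):
    """Build a counted rule that fires only on the n-th occurrence of a token."""
    seen = 0
    def rule(atom):
        nonlocal seen
        if atom[0] == token:
            seen += 1
            return seen == n
        return False
    return rule
```

For example, `partition(atoms, lambda a: a[0] == ";")` splits on semicolons alone, adding `starts_segment=lambda a: a[0] == "THEN"` also breaks before each `THEN`, and `starts_segment=nth_token("IF", 3)` places a boundary before the third `IF` only. Because the rules are ordinary functions, they can be swapped without changing the partitioner itself.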

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is an overall functional block diagram of a system for the practice of the present invention.

Fig. 2 is an overall functional block diagram that is an expanded version of Fig. 1, and includes more detail of the operation of the invention.

Figs. 3A - 3D together are a flow chart of the steps of the present invention.

Fig. 4 is an overall functional block diagram of a system for the practice of the present invention that describes the physical organization of the system.

DETAILED DESCRIPTION OF THE INVENTION

Fig. 1 is an overall functional block diagram of a system for the practice of the present invention. In Fig. 1, after a start 10, a read input instruction 12 selects information that is caused to generate a grammar stream 14. A read control language instruction 16 directs the performance of control instructions 18 which control the transformation 20 of the grammar stream into an output 22, followed by an end instruction 24. There are no parsing actions and no abstract syntax tree, in contrast to the art.

Fig. 2 is an overall functional block diagram that is an expanded version of Fig. 1, and includes more detail of the operation of the invention. In Fig. 2, a block 30 directs the reading of the source, and a block 32 generates a grammar stream, comprising both the grammatical tokens and the lexemes, that is taken to a decision block 34. The decision block 34 decides whether to partition the grammar stream according to a rule. If there is no partition, a block 36 adds a grammatical atom (a token plus the detected symbol) to the segment resulting from the partition, a block 38 directs the saving of atoms, and the output of the block 38 is taken back to block 30. A token is an encoded symbol representing the grammatical content of the word that has just been detected by the lexical analyzer that generates the grammatical stream, and an atom is the token plus the detected word or symbol. If there is a partition from decision block 34, a block 40 adds the token to the segment, which is a concatenation of atoms to an arbitrary length. A block 42 directs the saving of atoms, a block 44 directs the unique identification of like segments, and a block 46 directs the transformation of like segments according to their identification. The output of the block 46 is the translated code from the source. This is in contrast to the definition of a compiler given in Aho, who writes, "There are two parts to compilation: analysis and synthesis. The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program. The synthesis part constructs the desired target program from the intermediate representation.... During analysis, the operations implied by the source program are determined and recorded in a hierarchical structure called a tree." The present invention proceeds directly to the target program without creating an intermediate representation or requiring a tree. This allows an increase in speed of compiling and enables the user to compile different programs with the same compiler.

Figs. 3A - 3D together are a flow chart of the steps of the present invention. In Figs. 3A-3D, the stored instructions of the program to be translated are made available to the programmable compiler. The next instruction is fetched from storage, it is decoded, and it is tested for "read token," which means a call to the lexical analyzer. If the instruction is "read token," the next token is read and another instruction is fetched. If it is not, the instruction is tested for "load token." If that is the instruction, the token is loaded from memory; if it is not, the tests continue in sequence for "write token to media," "add token to segment register," "compare token," "conditional branching" and "finer lex," which calls for lexical analysis at a higher order than that applied to read the token.

The tests for instructions continue through the possible instructions as follows. If the next instruction is not "finer lex," the program tests for "discard token," "exchange token," "push context," "pop context," "call," "change source stream," "divide segment," "concatenate segments," "another standard language function," "action," "goto," "shift," "reduce," "accept," "error," "symbolic store," "exchange identifier," "make identifier a keyword," "remove keyword," "input token to text," "output token to text," and "is next instruction illegal." If control passes all the way to "is next instruction illegal," or if any of the tests is met, the appropriate action is taken and the program fetches the next instruction.
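The fetch, decode, and test sequence of Figs. 3A-3D can be pictured with a small interpreter loop. The sketch below is only an illustration under stated assumptions: it covers a handful of the listed instruction names, the operand formats and register names are hypothetical, the `EOF` sentinel token is an assumption, and the opcode "emit segment" is an invented helper that does not appear in the instruction list above.

```python
EOF = ("EOF", "")  # assumed sentinel atom returned when the stream is exhausted

def run(program, atoms):
    """Fetch, decode, and execute stored instructions over a stream of atoms.

    program: list of (opcode, operand) pairs.
    atoms:   iterable of (token, lexeme) pairs from a lexical analyzer.
    Returns the list of segments produced.
    """
    stream = iter(atoms)
    current = EOF      # token register
    flag = False       # result of the most recent compare
    segment = []       # segment register
    output = []        # segments emitted so far
    pc = 0             # index of the next instruction to fetch
    while pc < len(program):
        opcode, operand = program[pc]   # fetch and decode
        pc += 1
        if opcode == "read token":
            current = next(stream, EOF)
        elif opcode == "add token to segment register":
            segment.append(current)
        elif opcode == "compare token":
            flag = current[0] == operand
        elif opcode == "conditional branching":
            if flag:
                pc = operand
        elif opcode == "goto":
            pc = operand
        elif opcode == "emit segment":  # hypothetical opcode, see lead-in
            output.append(segment)
            segment = []
        else:
            raise ValueError(f"illegal instruction: {opcode!r}")
    if segment:
        output.append(segment)          # flush any unterminated segment
    return output

# A stored program that partitions the stream at every ';' token.
SPLIT_ON_SEMICOLON = [
    ("read token", None),                     # 0
    ("compare token", "EOF"),                 # 1
    ("conditional branching", 9),             # 2: past the end, so the loop stops
    ("add token to segment register", None),  # 3
    ("compare token", ";"),                   # 4
    ("conditional branching", 7),             # 5
    ("goto", 0),                              # 6
    ("emit segment", None),                   # 7
    ("goto", 0),                              # 8
]
```

Running `run(SPLIT_ON_SEMICOLON, atoms)` yields one segment per semicolon-terminated run of atoms, plus a final segment for any trailing atoms. The point of the design is that the partitioning behaviour lives in the stored program rather than in the interpreter, so a different program changes the behaviour without changing `run`.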

Fig. 4 is an overall functional block diagram of a system for the practice of the present invention that describes the physical organization of the system. In Fig. 4, blocks 1 through 4 describe modifications to the source language preparing the source for language processing. Block 4 indicates that any number of source file edits may be made in the source file pipeline. Blocks 9-11 indicate the active modification/editing of the input stream to account for contextual blocks: BEGIN-END, WHILE, DO, IF-THEN-ELSE, etc. Block 11 indicates that any number of input stream modifications may be made in the grammar/atom input pipeline. Block 14 is the partitioning function; block 15 outputs unique segments, and block 16 dispatches the appropriate segment or rule specific transformation to rule blocks 19, 20, or 21. Block 23 indicates that multiple segments may be stored internally in registers for n:m operations and the construction of larger grammatical patterns. The result is the ability to describe algorithms and procedures for operating on language and also on grammar to enable the translation of a source language into the target language and to control the modularity of the program - how it is broken into pieces and how the pieces communicate with each other - and how to change the semantics of the input language into new and different semantics in the target language. The invention has been described in terms of a specific embodiment, but it should be understood that the invention extends to the breadth of the claims.
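The input stream stage that accounts for contextual blocks before partitioning, as described for Fig. 4, could be sketched as a pair of chained generators. Everything named here is an assumption made for illustration: the token names in `OPENERS` and `CLOSERS`, the functions `with_depth` and `top_level_segments`, and the choice to track context as a simple nesting depth.

```python
OPENERS = {"BEGIN", "DO"}   # hypothetical tokens that open a contextual block
CLOSERS = {"END"}           # hypothetical tokens that close one

def with_depth(atoms):
    """Attach the current block-nesting depth to each (token, lexeme) atom."""
    depth = 0
    for token, lexeme in atoms:
        if token in CLOSERS:
            depth = max(depth - 1, 0)   # no validation: a stray END is tolerated
        yield token, lexeme, depth
        if token in OPENERS:
            depth += 1

def top_level_segments(atoms, separator=";"):
    """Partition on the separator only where it occurs outside any block."""
    segment = []
    for token, lexeme, depth in with_depth(atoms):
        segment.append((token, lexeme))
        if token == separator and depth == 0:
            yield segment
            segment = []
    if segment:
        yield segment
```

With this arrangement a statement such as `a ; BEGIN b ; c ; END ;` produces two segments, the second holding the whole `BEGIN ... END ;` block, because the inner semicolons occur at depth 1. Further stages can be chained in the same way, each consuming the stream produced by the one before it.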

Claims

1. A process for transforming a first statement written according to a first set of grammatical rules into a second statement written according to a second set of grammatical rules using a computer, the process comprising the steps of: analyzing said first statement to obtain an information stream, said information stream including a plurality of grammar elements and a plurality of value elements associated with said grammar elements; partitioning at least a portion of said information stream into at least one segment; and transforming said segment into said second statement in accordance with at least one transformation rule associated with the grammar elements in said segment.

2. The process of claim 1 comprising in addition the steps of: removing said value elements from said segments; storing said value elements; and inserting said stored value elements into said second statement during said transformation step.

3. The process of claim 1 wherein said analyzing step comprises the step of performing a lexical analysis on said first statement to yield a plurality of tokens corresponding to said grammar elements, and to yield a plurality of lexemes corresponding to said value elements.

4. The process of claim 1 wherein said transforming step comprises the steps of: identifying said segment as being of a particular type; calling a transformation routine associated with the type of said identified segment; and running said transformation routine to transform said identified statement into said second statement.

5. The process of claim 4 wherein said step of running comprises the steps of: accessing a rule table containing a plurality of transformation rules; and determining which of said transformation rules is to be used to transform said identified segment.

6. A process for transforming a first statement written in a first language into a second statement written in a second language comprising the steps of: performing a lexical analysis on said first statement to yield an information stream including a plurality of tokens and a plurality of lexemes associated with said tokens; partitioning at least a portion of said information stream into a plurality of segments in accordance with at least one partitioning rule; removing said lexemes from said segments and storing said lexemes; transforming said segments into a transformed grammar stream in accordance with at least one transformation rule; and inserting said stored lexemes into said transformed grammar stream to yield said second statement written in said second language.

7. The process of claim 6 wherein the step of partitioning comprises the steps of: scanning said information stream to locate punctuation in said information stream; and dividing said information stream based upon said located punctuation.

8. The process of claim 6 wherein said step of transforming comprises the steps of: identifying each of said segments as being of a particular type; calling a transformation routine associated with each type of said identified segments; and running said transformation routine to transform the identified segment associated with said transformation routine based upon said at least one transformation rule.

9. The process of claim 8 wherein said step of running comprises the steps of: accessing a rule table containing a plurality of transformation rules; and determining which of said transformation rules is to be used to transform said identified segments.

10. A process for transforming a plurality of first statements written according to a first set of grammatical rules into a plurality of second statements written according to a second set of grammatical rules comprising the steps of: analyzing said plurality of first statements to obtain an information stream, said information stream including a plurality of grammar elements and a plurality of value elements associated with said grammar elements; partitioning at least some portions of said information stream into a plurality of segments in accordance with at least one partitioning rule; identifying at least some of said segments as being of a particular type; and transforming each of said identified segments, based upon at least one transformation rule associated with said identified segment, to yield a second statement written according to said second set of grammatical rules.

11. The process of claim 10 further comprising: removing said value elements from said segments; storing said removed value elements; and inserting said stored value elements into said second statement during said transforming step.

12. The process of claim 10 wherein said transforming step includes: accessing a list of transformation rules; selecting at least one of the transformation rules in said list to transform each of the identified segments; and applying the selected transformation rule to an identified segment to obtain a second statement.

13. The process of claim 10 wherein said transforming step includes: appending a transformed identified segment onto a stream of previously transformed identified statements.

14. A process for transforming a plurality of first statements written according to a first set of grammatical rules into a plurality of second statements written according to a second set of grammatical rules comprising the steps of: performing a lexical analysis on said plurality of first statements to yield an information stream, said stream including a plurality of tokens and a plurality of lexemes; partitioning said information stream into a plurality of segments in accordance with at least one partitioning rule; removing said plurality of lexemes from said partitioned information stream; storing said plurality of lexemes; determining the meaning of at least some of said segments; grouping the segments that have similar meanings into the same group to yield a plurality of groups; identifying each segment in the same group; calling at least one transformation routine associated with each of said identified segments; running said at least one transformation routine to transform each of said identified segments based upon at least one transformation rule to yield a plurality of transformed grammar streams; and inserting said stored lexemes into said transformed grammar streams to yield said plurality of second statements.

15. The process of claim 14 wherein the step of running comprises the step of transforming each of the identified segments in the same group into the same transformed grammar stream.

16. The process of claim 14 wherein said step of partitioning comprises the steps of: scanning said information stream to locate punctuation in said information stream; and dividing said information stream into segments based upon said located punctuation.