WO2000031626A1 - Method of identifying recurring code constructs - Google Patents

Method of identifying recurring code constructs

Publication number
WO2000031626A1
Authority
WO
WIPO (PCT)
Prior art keywords
code
block
fingerprint
statement
statements
Prior art date
Application number
PCT/CA1999/000993
Other languages
French (fr)
Inventor
Autumn Umanetz
Lingyan Shen
Original Assignee
Netron Inc.
Priority date
Filing date
Publication date
Application filed by Netron Inc.
Priority to AU64557/99A (patent AU6455799A)
Publication of WO2000031626A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/30: Creation or generation of source code
    • G06F8/36: Software reuse

Definitions

  • Formal boundaries are those provided by the language for the isolation of a set of statements required to accomplish some task.
  • FORTRAN and C functions provide formal boundaries.
  • COBOL paragraphs and sections provide formal boundaries.
  • Informal boundaries are boundaries that a programmer might identify as separating sets of statements. These boundaries might be identified by white space, comments or the beginning and end of block oriented statements in the language (e.g., an if and an end-if).
  • the last information required from the parser is what data is referenced or manipulated by the statement. All parsers provide this information. For each statement we need to know what data: a) is input to the statement (i.e., referenced or read); and b) is output or modified by the statement.
  • Location (j) + 1 ! WrapAround (LOWER, Location (j) + 1, UPPER) ) count++ ;
  • NumReturned = upper - (lower - num) + 1; return NumReturned; } long Bounded (long lower, long num, long upper) { long NumBounded; [...] From the C Example 1 above, within the function WrapAround, if (num > upper )
  • the present invention provides an object oriented wrapper interface to the syntax tree 16. This insulates the fingerprinting methodology from changes made to the API (Application Programming Interface) of the parser, and provides a level of abstraction.
  • the object oriented wrapper of the present invention utilizes two types of objects.
  • One type represents database objects stored in the syntax tree 16, and the other provides a database interface allowing access to the database objects within the syntax tree 16.
  • Database objects may be considered as nodes and relationships within the syntax tree 16.
  • Database objects are of three forms: Nodes, Relationships, and Types.
  • Nodes are things which actually exist in the source code. Examples of nodes are:
  • Relationships indicate interaction between nodes. They are directional. Examples include: a) an "OF" relationship points from a variable usage to its definition; b) a "HAS" relationship points from a paragraph or function to a statement; c) another "HAS" relationship points from a statement to
  • Types describe the attributes of an object. For example, a type of STATEMENT_MOVE would be used to describe a COBOL move statement, and a type of USAGE_VARIABLE_MOD indicates usage of a variable such that its value is modified.
  • the fingerprinting process gets the parser data according to the following algorithm:
  • SetCodeBlocks Get the statement types of interest from the user, save in SetStatementTypes
  • the classification engine 28 of the preferred embodiment is a software package known as AutoClass C v3.2.1, developed by the computational sciences division at NASA Ames Research Center. AutoClass makes use of Bayesian statistical methods. More information on AutoClass may be obtained at the web site: http://ic-www.arc.nasa.gov/ic/projects/bayes-group/autoclass/.
  • the classification engine 28 works best when each item to be classified is represented by a set of characteristics that have numeric values.
  • the input to the classification engine 28 comprises:
  • a set of control information dictating characteristics of the classification process (duration, number of attempts, expected number of classes or groups, convergence algorithm to be used); and ii) a list of items to be classified, where each item is represented by a tag for identification and a value for each of the characteristics of interest for the classification.
  • the classification engine 28 does not assign any meaning to any of the characteristics. Nor does it, a priori, weight any characteristics differently than others. It determines a weighting during classification based on the utility of the characteristic for grouping things. For example, if a characteristic has the same value for every single case then it is useless and has a weighting of zero. Nor does it care how many characteristics items have, though each item must have a value specified for each characteristic and the characteristics must be in the same order for each of the items being classified.
  • Non numeric or discretely valued (index) numeric values can be assigned to a characteristic.
  • the classification engine 28 recognizes when such values are the same, but makes no other attempt to compare non numeric input.
  • the characteristics of each item being input for classification represent the fingerprint for a block of code.
  • the output from the classification engine 28 is a set of groupings containing the items that were submitted for classification. Each submitted item will be placed in exactly one group.
  • the set of groupings represents the classification engine's best classification of the input.
  • each group represents a set of blocks of code that have similar fingerprints. If the blocks of code have similar fingerprints, they must contain a similar set of statements. The fact that every member of a group contains similar statements makes them candidates for further review to determine if blocks of code are in fact repetitions of the same code.
  • Create coarse fingerprint 18 reads the syntax tree 16 and counts the number of occurrences of each distinct type of statement supported by the language. For COBOL this means counting the number of occurrences of:
  • Create coarse fingerprint 18 then counts the total number of bytes of input to all occurrences of each statement type within the block of code being fingerprinted. Finally, it counts the total number of bytes of output from each statement type within the block of code.
  • This information is made persistent by storing it in the Raw fingerprint database 20 to speed up subsequent classifications.
  • variable sizes given as:
  • a block of code can be treated as a gestalt, having inputs and outputs like any single statement within it.
  • the characteristics of these inputs and outputs comprise the macroscopic attributes of the block of code.
  • Languages such as COBOL do not explicitly declare their inputs and outputs, and even languages such as C do not necessarily distinguish between inputs and outputs.
  • This section describes a method for automatic determination of those types from the structure of the code block and the code which surrounds it.
  • Loops complicate variables within and surrounding them as they make the order of statements less than clear.
  • the following examples help to illustrate how the preferred embodiment of the present invention determines whether a variable associated with a loop is classified as input or output.
  • Solution: Redefine the problem. For input, given the last out-of-target modification prior to the first in-target reference, we simply require no interposing modification. For output the situation is reversed: there must be no interposing modification between the last in-target modification and the first subsequent out-of-target reference.
  • Algorithm: Search starting at the outermost in-target use, looking for the corresponding out-of-target use (forward for output, backward for input). Branch whenever a loop is exited (top or bottom), continuing one search as though there were no loop, and continuing the other search at the opposite end of the loop.
  • An in-loop search branch fails instantly after completely traversing the loop once or returning to the original in-target use.
  • a search branch fails instantly upon striking an in-target modification (input) or non-target modification (output).
  • a search succeeds instantly upon striking a non-target modification (input) or a non-target reference (output). If any branch succeeds, the test has succeeded. It is potentially possible for a target to enclose non-target code. For example, the most general case of an object selected for re-use may not include several lines which appear somewhere in the input code. Therefore, the user may wish to remove them from the definition of the re-usable object.
  • the logic of the preferred embodiment of the present invention includes a way to not include the unwanted code, for the purpose of fingerprinting.
  • aggregate fingerprint process 22 then reads the raw fingerprint data 20 output by create coarse fingerprint 18.
  • the preferred embodiment has the following aggregations for COBOL: a) PERFORM & GO are aggregated to represent control flow; b) IF & EVALUATE are aggregated to represent conditionals; c) MOVE & SET are aggregated to represent assignment; d) all IO statements are aggregated to represent file access; e) all math statements are aggregated to represent arithmetic; and f) all non COBOL statements are aggregated together.
  • Figures 2 and 4 illustrate this use of the aggregation. Note, for example, that in Figure 2 the "flow" box is checked to enable aggregation of Perform and Go statements, whereas in Figure 4 it is not checked, so that distinct counts are provided for Perform and Go statements.
  • the advanced control screen of Figures 2 and 4 has been provided for knowledgeable analysts to allow them to manipulate aggregation for any of the above groups. This allows them to differentiate between occurrences of statements with similar behaviour and in doing so, highlight more subtle differences in blocks of code. This can be important when the analyst is interested in:
  • the filter step 26 filters out characteristics that should be excluded from the current classification. This is done under user control. Excluding some characteristics is useful to draw out different similarities between blocks of code. As illustrated in Figure 6, an end user of the product can filter out the following characteristics:
  • a) I/O - excluding these statements desensitizes the classification to the stylistic differences between programmers who isolate their IO statements and those who code them inline;
  • b) data - excluding the consideration of data highlights similarities in the types of statements. This is useful when searching for commonality in code structure that may be repeated for use with very different sets of data. For example the same algorithm may be applied against a single instance of data in some cases and against arrays of varying sizes in other cases. Excluding data desensitizes the classification to variances in data; and
  • c) Logic & flow - excluding these statements filters out the control structure around algorithmic logic. This flattens the code structure and highlights similarities in data movement and transformation.
  • In use, the user would select the source code modules 14 that comprise the project of interest, and input them to the parser 12 so that a syntax tree 16 may be generated.
  • the create coarse fingerprint process 18 analyzes the syntax tree 16 and creates raw fingerprint database 20.
  • Raw fingerprint database 20 contains counts of each type of statement and the number of input and output fields within each functional block of the selected source code 14.
  • the user may then manipulate the data contained in raw fingerprint database 20 to provide the desired input to classification engine 28.
  • the typical user will have access only to the classification setting screen of Figure 6 which allows them to simply select aggregate groups of statements or data. Analysts more familiar with the functionality of classification engine 28 will have access to the advanced classification settings screen illustrated in Figures 2 and 4.
  • Figure 2 illustrates a screen capture of the advanced classification settings available to analysts more familiar with the functionality of the classification engine 28.
  • the minimum statements option 32 has been set to five statements.
  • the use data option 34 has been turned off so as not to consider the size of data fields when determining the characteristics to be selected for input to the classification engine 28.
  • the statement type menu 36 has all of the high level grouping attributes selected. Thus, for example, both Perform and Go statements will be considered identical for the purpose of determining the characteristic of a block of code.
  • the advanced classification settings screen provides for finer control over the level of granularity than that provided by the default aggregation classification settings provided to the typical user.
  • the autoclass settings section 38 enables a user knowledgeable about the functionality of the classification engine 28 to set a plurality of input variables.
  • the classification engine 28 uses a random seed to direct the search for its initial classification. Subsequent classifications are refinements of the first one. Using a known seed, rather than a random one, results in a reproducible (albeit potentially useless) set of classifications. This facility is particularly useful for demonstration purposes.
  • the search length selection box 41 indicates the duration of the search for new classifications that better satisfy the criteria selected. It can be specified in seconds, or in terms of the number of attempts at refinement. As can be appreciated, any of the input variables accepted by the classification engine 28 may be provided for the user in this advanced classification settings interface.
  • Figure 3 illustrates the default groupings created by the classification engine 28 on a set of source code 14 input to the parser 12.
  • the formal output process 30 has broken the source code files selected into a number of groups 42. Within each group 42 is a paragraph heading (as this is a COBOL example) 44, the file name 46 which contains the paragraph 44, and a probability ranking 48.
  • the probability ranking is a simple comparison of the attributes of a given code item to the mean values for the classification in which it appears.
  • the displaying of groups 42 in the output screen allows the user to focus on the types of logic that they are primarily interested in.
  • a user can look at a couple of representative members of a group 42, and quickly characterize the group as I/O handling. Such a group can then be deleted and discarded, or deferred for later analysis.
  • the user may now re-classify the code blocks using a finer classification pattern.
  • in a screen capture of the advanced classification settings screen, the use data option 34 has been selected.
  • the high level attributes have been turned off to eliminate grouping of statements and provide much finer statement resolution.
  • the intent of this finer resolution is to create a group of sub-classifications, which in general terms may be similar but differ in minute ways.
  • An example might be a coarse classification which finds a classification containing several date handling routines.
  • Fine classification might create two sub groups of the date handling routines, one of which is year 2000 compliant and another which is not.
  • Figure 5 illustrates the output of the finer classification requested by the user in Figure 4.
  • the groups of Figure 5 are much different from the groups of Figure 3.
  • Further, group 1 of Figure 5 has been sub-divided into two sub groups. This creation of sub groups is a result of the characteristics of the code items being significantly different, although similar enough to have been grouped together. Re-classifying with different settings changes the nature of the attribute set that is input to the classification engine 28 and thus enables it to find sub groupings.
  • Figure 6 is a capture of the code classification settings screen available to the user when filtering the output of the aggregate fingerprint process 22. As shown the user may choose to exclude three general types of code from consideration for a particular classification:
  • fast classification is a pre-defined setting for the user dialogue indicating a value of ten tries.
  • Non-fast classification equates to fifty tries.
  • the user has also requested that there be a minimum of ten statements in a paragraph (this being a COBOL example) in order for a block of code to be considered for fingerprinting.
  • various characteristics other than the characteristics selected for the preferred embodiment may be utilized in creating a fingerprint for a block of code. The inventors have found promise in the following characteristics:
  • the McCabe metric for the block: a measure of the cyclomatic complexity of the directed acyclic graph which represents the flow of control within each block; this reflects the number of logic decisions within the block;
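The McCabe metric mentioned in the last item can be illustrated with a minimal sketch. It assumes the flow of control of a single block is available as a list of (from, to) edges forming one connected graph, and uses the standard formula M = E - N + 2. The function name and the edge representation are illustrative assumptions, not part of the described system.

```python
def cyclomatic_complexity(edges):
    """McCabe metric M = E - N + 2 for one connected control-flow graph.

    `edges` is a list of (source, target) pairs; node names are arbitrary.
    """
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2


# Straight-line block: no decisions.
straight_line = [("s1", "s2"), ("s2", "s3")]

# A single if/else: one decision, two paths that rejoin at "exit".
if_else = [("cond", "then"), ("cond", "else"), ("then", "exit"), ("else", "exit")]
```

Under these assumptions the straight-line block scores 1 and the if/else block scores 2, so the value rises by one for each logic decision in the block.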

Abstract

A method of identifying recurring or common logical code elements within the source code of a computer application. Parsing of the source code of a set of files within the application produces a syntax tree which is then traversed to identify blocks of code. A fingerprint is created for each block of code, each fingerprint containing a characteristic for each type of statement located within the block. The characteristic consisting of a vector containing: the statement type, the number of occurrences of the statement, the number of bytes of data input to the statement and the number of bytes of data output from the statement. The user may select that only certain types of statements are to be considered in creating a fingerprint. The user may also choose to aggregate types of statements into a single characteristic when creating the fingerprint. The fingerprints for each block of code are then submitted to a Bayesian classification engine which places the blocks of code into common groups based upon their fingerprints and displays them to the user. The user may then browse the selected groups to determine if there exist modules within the application that may be reused, redeployed or re-engineered.

Description

Title: METHOD OF IDENTIFYING RECURRING CODE CONSTRUCTS
FIELD OF THE INVENTION
The present invention is a software application designed to facilitate the re-use or re-engineering of existing software applications. The invention identifies areas of code commonality, facilitating the extraction of the common code elements into reusable objects.
BACKGROUND OF THE INVENTION
It is well known that many organizations continue to use and maintain what are known in the computer software industry as "legacy" applications. These legacy applications have recently gained media attention with regard to the year 2000 problem. Regardless of the year 2000 problem, organizations are repeatedly deciding whether to renovate or replace their existing legacy applications, for example to make an application available on an internet, or to structure the application so that it may be distributed. Renovation is selected in the hope of making legacy applications easier to maintain as well as more easily modified. The problem with renovating or re-engineering legacy applications is that quite often the original developers are unavailable to help explain the structure of the application, and neither development nor design documentation is available. In re-engineering legacy applications there are two traditional approaches: a) code analysis; and b) re-engineering.
The purpose of code analysis is to provide understanding of the structure of a legacy application. In analyzing code the type of information that is generally provided is: a) graphs showing program flow; b) charts showing data flow; c) listings of variable aliases; and d) where-used information.
Re-engineering tools on the other hand are utilized to convert an existing application into a different type of application. The most common target for conversion is to turn a mainframe application into a client server application. Neither of these solutions addresses the problem of identifying re-usability within the existing legacy application.
In any legacy application, there will be programmed a number of "business rules" or logic components, for example the method used to determine the sales taxes on a specific class of item. To effectively renovate legacy applications, an organization needs to first identify the inventory of its current business rules and then be able to reuse them in new applications. The present invention aids in analyzing an existing application, to identify the business rules in the application.
The ability to recognize common business rules or logic components has a number of advantages, namely:
a) when maintaining an application, having the ability to ensure that all instances of the logic to be altered have been located; b) improving the flexibility of the legacy application by replacing inline code with components; c) creating new applications based upon the business rule or logic components which are embedded in existing applications; and d) obtaining a better understanding of the components in the legacy application, in particular being able to understand what gets done where, and reorganizing code that may be duplicated in many portions of the application.
Thus there is a need for a software tool to identify, and thus provide for isolation of, common business rules within a legacy application.
BRIEF SUMMARY OF THE INVENTION
A method for identifying recurring code constructs within the source code of a software application, the method comprising the steps of:
a) parsing the source code to create a syntax tree; b) traversing the syntax tree to identify blocks of code; c) creating a fingerprint for each block of code; d) submitting the fingerprints obtained in step c) to a classification engine; and e) providing the output from step d) to a user for analysis.
A system for identifying recurring code constructs within the source code of a software application comprising: a) a code parser; b) a create coarse fingerprint module for analyzing the output of said parser to produce a raw fingerprint file; c) a classification engine to classify data contained within the raw fingerprint file; and d) an output module to format the data output from the classification engine to the user for analysis.
A method for determining whether variables in a block of code are to be considered as input to or output from said block of code, dependent upon access to or modification of said variables within and without said block of code.
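The input/output determination can be sketched for the simplest case. The following is a deliberately simplified illustration that handles straight-line code only and ignores the loops and branches the full method must deal with. The `Stmt` type, the `stmt` helper and the half-open `[start, end)` block range are assumptions made for this example.

```python
from typing import NamedTuple


class Stmt(NamedTuple):
    """Hypothetical statement record: variables read and variables modified."""
    reads: frozenset
    writes: frozenset


def stmt(reads="", writes=""):
    return Stmt(frozenset(reads.split()), frozenset(writes.split()))


def block_inputs(program, start, end):
    """Variables the block [start, end) references before modifying them itself."""
    inputs, written = set(), set()
    for s in program[start:end]:
        inputs |= s.reads - written   # read before any in-block modification
        written |= s.writes
    return inputs


def block_outputs(program, start, end):
    """Variables modified in the block and referenced later before being overwritten."""
    pending = set()
    for s in program[start:end]:
        pending |= s.writes
    outputs = set()
    for s in program[end:]:
        outputs |= s.reads & pending  # out-of-block reference with no interposing change
        pending -= s.writes           # an out-of-block modification hides the block's value
    return outputs


program = [
    stmt(writes="price qty"),                     # before the block
    stmt(reads="price qty", writes="subtotal"),   # block starts here (index 1)
    stmt(reads="subtotal", writes="tax"),
    stmt(reads="subtotal tax", writes="total"),   # block ends after this statement
    stmt(reads="total"),                          # after the block
    stmt(writes="tax"),
    stmt(reads="tax"),
]
```

For the block spanning indices 1 to 3, this sketch reports `price` and `qty` as inputs and `total` as the only output: `tax` is overwritten outside the block before it is next referenced, and `subtotal` is never referenced again.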
A database interface, said interface providing methods for accessing an object within a syntax tree, said interface comprising methods for: a) retrieving said object from said syntax tree by type or reference; b) retrieving information regarding the attributes of said object; and c) retrieving abstract type or relationship data of said object, given a string representing the name of said object.
A method for excluding statements from within a block of code, said statements being excluded from consideration when calculating a fingerprint for said block of code.
BRIEF DESCRIPTION OF THE DRAWINGS
For a better understanding of the present invention, and to show more clearly how it may be carried into effect, reference will now be made, by way of example, to the accompanying drawings which show a preferred embodiment of the present invention and in which:
Figure 1 is a schematic diagram illustrating the components utilized in fingerprinting a legacy application;
Figure 2 is an illustration of the advanced classifications settings screen;
Figure 3 is an illustration of the results provided by the classification engine;
Figure 4 is an illustration of a finer classification screen;
Figure 5 is a screen capture of the results obtained by running a finer classification; and
Figure 6 is an illustration of the filter settings screen.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
The term "fingerprint" for this specification including the claims means a set of characteristics that are present within a block of source code. In the preferred embodiment these characteristics are: statement type, frequency of statement type, bytes input to statement type, and bytes output from statement type. A fingerprint may, however, be based upon other characteristics within a block of code; such other characteristics are described later. The present invention relates to the developing of a "fingerprint" to identify the characteristics of a block of source code. The purpose of the preferred embodiment of the present invention is to provide a method of identifying recurring code constructs without depending on:
a) naming conventions; b) consistency of coding style; or c) consistency in alignment of code.
Further, the code fingerprint should be desensitised to arbitrary differences in ordering of statements, i.e. differences in order that do not impact the functionality of the code.
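A minimal sketch of such a fingerprint follows. It assumes each parsed statement is available as a record with a type, a byte count of data read and a byte count of data modified. The `Statement` record and the `fingerprint` function are hypothetical names for this illustration, not the parser's actual output format.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    """One parsed statement (hypothetical shape, not the parser's real output)."""
    kind: str        # statement type, e.g. "MOVE" or "IF"
    bytes_in: int    # total size of the data the statement reads
    bytes_out: int   # total size of the data the statement modifies


def fingerprint(block):
    """Return {statement type: (occurrences, bytes in, bytes out)} for a block."""
    totals = defaultdict(lambda: [0, 0, 0])
    for s in block:
        entry = totals[s.kind]
        entry[0] += 1
        entry[1] += s.bytes_in
        entry[2] += s.bytes_out
    return {kind: tuple(values) for kind, values in totals.items()}


block_a = [Statement("MOVE", 8, 8), Statement("IF", 4, 0), Statement("MOVE", 2, 2)]
block_b = [block_a[2], block_a[0], block_a[1]]  # same statements, different order
```

Because the fingerprint only accumulates per-type totals, `block_a` and `block_b` produce the same result, which is one way to obtain the insensitivity to statement ordering described above. Variable names, layout and comments never enter the computation.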
Most computer languages support the same basic types of operations:
1. Flow control such as branching, procedure calls or iterative constructs;
2. Conditionals such as if, case and evaluate;
3. Assignments such as set, move and equals;
4. Arithmetic; and
5. I/O
Many different statement types may be used to achieve essentially the same result. For example the choice between a FOR or WHILE loop control is arbitrary; it is most often dictated by the type of boundary condition the programmer was most comfortable with.
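Treating interchangeable statement types as one characteristic can be sketched as a simple mapping. In the sketch below, the PERFORM/GO, IF/EVALUATE and MOVE/SET pairings follow the COBOL aggregations listed for the preferred embodiment; the remaining entries, the category names and the `aggregate` function are illustrative assumptions.

```python
from collections import Counter

# Hypothetical mapping from statement type to operation category. Only the
# first three pairings come from the document; the rest are assumed examples.
CATEGORY = {
    "PERFORM": "flow", "GO": "flow",
    "IF": "conditional", "EVALUATE": "conditional",
    "MOVE": "assignment", "SET": "assignment",
    "ADD": "arithmetic", "COMPUTE": "arithmetic",
    "READ": "io", "WRITE": "io",
}


def aggregate(counts, enabled):
    """Merge per-statement counts into categories, for the enabled groups only."""
    merged = Counter()
    for kind, n in counts.items():
        category = CATEGORY.get(kind)
        # Statement types whose group is not enabled keep their own count.
        key = category if category in enabled else kind
        merged[key] += n
    return dict(merged)


raw = {"PERFORM": 3, "GO": 1, "MOVE": 4, "SET": 2, "IF": 2}
coarse = aggregate(raw, enabled={"flow", "assignment"})
fine = aggregate(raw, enabled=set())
```

Here `coarse` merges PERFORM with GO and MOVE with SET, while `fine` leaves every statement type distinct, which gives a finer-grained classification.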
Referring first to Figure 1, a fingerprinting system 10 comprises a parser 12, which takes source code 14 as input and produces a syntax tree 16. In the preferred embodiment, parser 12 is the Revolve COBOL parser provided by MicroFocus Corporation. As can be appreciated by one skilled in the art, the present invention is not restricted to the Revolve parser or the COBOL language. Syntax tree 16 provides input data for the create coarse fingerprint process 18, which analyzes syntax tree 16 and creates a persistent raw fingerprint database 20. Raw fingerprint database 20 serves as input to the aggregate fingerprint process 22. A user request 24 provides input parameters to the aggregate fingerprint process 22 to aid in selecting the source code components within source code 14 that are of greatest interest with regard to re-usability. The output of the aggregate fingerprint process 22 is then passed to a filter 26. The user may interact with the filter 26 to prepare classification data for the source code modules of interest; the filter 26 passes that classification data to a classification engine 28. The classification engine 28 then outputs the result of the classification to a formatting process 30, which presents it to the user for review in determining whether any source code modules are candidates for reuse.
In order to establish a fingerprint for a block of code, the following information is required from the parser 12:
a) identification of boundaries for each block of code; b) identification of the set of statements within each block of code; and c) identification of the data acted on by each statement.
All of this information is readily available from any parser. Because the classification is most meaningful when done on large numbers of programs, it is helpful if the fingerprint information is persistent. Thus, the invention uses the raw fingerprint database 20 to retain fingerprint information. This allows multiple passes of the classification to be done without re-parsing the input source code 14.
Boundary of a Block of Code
Most programs are written with a combination of formal boundaries and informal boundaries. Formal boundaries are those provided by the language for the isolation of a set of statements required to accomplish some task. In FORTRAN and C, functions provide formal boundaries. In COBOL, paragraphs and sections provide formal boundaries. Informal boundaries are boundaries that a programmer might identify as separating sets of statements. These boundaries might be identified by white space, comments or the beginning and end of block oriented statements in the language (e.g., an if and an end-if).
This means that the identifier for each block of code, as delimited by these boundaries, needs to be extracted from the parser.
COBOL Example 1
Process-Transfer Section.
    Move "Y" to Process-Trans,
    Subtract Transfer-Amount from Current-Balance giving New-Balance.
    If New-Balance < 0
        Move Account-Number to Check-Account-Number
        Perform Check-Account
        If New-Balance < Safety-Balance AND Process-Trans = "Y"
            Move Customer-Number to Warn-Customer-Number
            Perform Warn-Customer.
    If Process-Trans = "Y"
        Perform Calculate-Transaction-Fee
        Compute New-Balance = Current-Balance - Transfer-Amount - Trans-Fee.
Check-Account Section.
    If Account-Type = "Z"
        Move "Y" to Process-Trans
        Go Politely-Remind-Customer.
    If Account-Type = "M"
        Go Friendly-Check-Account.
    Go Abort-Transfer.
Calculate-Transaction-Fee Section.
    Compute Trans-Fee = Transfer-Amount * .001.
    Perform Special-Discount-Check
        Varying Discount-Type From 1 by 1
        until Discount-Type Greater than Num-Discount-Types.
    If Trans-Fee > 100
        Move 100.00 to Trans-Fee.
    If Account-Type = "M"
        Divide Trans-Fee By 2 Giving Trans-Fee.
    If Account-Type = "Z"
        Move 0 to Trans-Fee.
Statements Within a Block of Code
All parsers produce an ordered set of statements. For fingerprinting purposes, the order is not significant. What is significant is the use of the verbs in the statement.
In the COBOL section titled "Calculate-Transaction-Fee" of COBOL Example 1 above, the statements found in the section and their number of occurrences are:
• 1 Compute
• 1 Perform
• 3 If
• 2 Move
• 1 Divide
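As a rough illustration of this order-insensitive tally, the following Python sketch counts statement verbs in the "Calculate-Transaction-Fee" section. It is not the parser-based method of the preferred embodiment: it assumes each statement starts on its own line and simply inspects the first word of the line, and the names VERBS, SECTION and count_verbs are hypothetical.

```python
from collections import Counter

# Statement verbs recognized by this sketch (a small subset of COBOL verbs).
VERBS = {"COMPUTE", "PERFORM", "IF", "MOVE", "DIVIDE", "GO", "SUBTRACT"}

# The Calculate-Transaction-Fee section of COBOL Example 1, one statement per line.
SECTION = """
Compute Trans-Fee = Transfer-Amount * .001.
Perform Special-Discount-Check
    Varying Discount-Type From 1 by 1
    until Discount-Type Greater than Num-Discount-Types.
If Trans-Fee > 100
    Move 100.00 to Trans-Fee.
If Account-Type = "M"
    Divide Trans-Fee By 2 Giving Trans-Fee.
If Account-Type = "Z"
    Move 0 to Trans-Fee.
"""

def count_verbs(block: str) -> Counter:
    """Tally verbs by the first word of each line; statement order is ignored."""
    counts = Counter()
    for line in block.splitlines():
        words = line.split()
        if words and words[0].upper() in VERBS:
            counts[words[0].upper()] += 1
    return counts

print(count_verbs(SECTION))
```

Continuation lines such as the Varying and until clauses are skipped because their first word is not a recognized verb. A real parser would identify statements from the syntax tree rather than from line layout.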
Data Acted on by a Statement
The last information required from the parser is what data is referenced or manipulated by the statement. All parsers provide this information. For each statement we need to know what data: a) is input to the statement (i.e., referenced or read); and b) is output or modified by the statement.
Consider the following statement from the section titled "Calculate-Transaction-Fee" of COBOL Example 1 above:
    Perform Special-Discount-Check
        Varying Discount-Type From 1 by 1
        until Discount-Type Greater than Num-Discount-Types.
We see that for this statement, Num-Discount-Types and 1 are input, while Discount-Type is output.
C Example 1
{
    for (j = 1; j <= upper; j++)
    {
        if (Location(j) == 1 ||
            Location(j) + 1 != WrapAround(LOWER, Location(j) + 1, UPPER))
            count++;
    }
    return count;
}
long WrapAround(long lower, long num, long upper)
{
    long NumReturned;   // a long is 4 bytes
    NumReturned = num;
    if (num > upper)
        NumReturned = lower + (num - upper) - 1;
    if (num < lower)
        NumReturned = upper - (lower - num) + 1;
    return NumReturned;
}
long Bounded(long lower, long num, long upper)
{
    long NumBounded;
}
From the C Example 1 above, within the function WrapAround:
    if (num > upper)
    - num and upper are both input
    NumReturned = lower + (num - upper) - 1
    - lower, num, and upper are input, NumReturned is output.
Accessing Parser Information
The present invention provides an object oriented wrapper interface to the syntax tree 16. This insulates the fingerprinting methodology from changes made to the API (Application Programming Interface) of the parser, and provides a level of abstraction.
The object oriented wrapper of the present invention utilizes two types of objects. One type represents database objects stored in the syntax tree 16, and the other provides a database interface allowing access to the database objects within the syntax tree 16. Database objects may be considered as nodes and relationships within the syntax tree 16.
Database Objects
Database objects are of three forms: Nodes, Relationships, and Types.
Nodes are things which actually exist in the source code. Examples of nodes are:
a) a statement; b) a paragraph label; c) a variable definition; and d) a usage of a variable by a statement
Relationships indicate interaction between nodes. They are directional. Examples include:
a) an "OF" relationship points from a variable usage to its definition; b) a "HAS" relationship points from a paragraph or function to a statement; c) another "HAS" relationship points from a statement to its variable usage; and d) a "HAVING" relationship is a recursive "HAS" relationship.
Types describe the attributes of an object. For example, a type of STATEMENT_MOVE would be used to describe a COBOL move statement, and a type of USAGE_VARIABLE_MOD indicates usage of a variable such that its value is modified.
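The three forms of database object can be pictured with a small Python model. This is only a sketch under stated assumptions: the class names, the related and having methods, and the type names SECTION and VARIABLE_DEFINITION are hypothetical and are not the methods of the wrapper itself; only STATEMENT_MOVE and USAGE_VARIABLE_MOD come from the description above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    node_id: int
    type_name: str   # a Type, e.g. "STATEMENT_MOVE"
    text: str

@dataclass(frozen=True)
class Relationship:
    kind: str        # "HAS" or "OF"; directional, from source to target
    source: int
    target: int

class SyntaxTreeDB:
    """Hypothetical read-only view over nodes and relationships."""

    def __init__(self, nodes, relationships):
        self.nodes = {n.node_id: n for n in nodes}
        self.relationships = list(relationships)

    def related(self, node_id, kind):
        """Targets reached from node_id by one relationship of the given kind."""
        return [r.target for r in self.relationships
                if r.source == node_id and r.kind == kind]

    def having(self, node_id):
        """Recursive "HAS": everything contained in node_id, at any depth."""
        found, queue = [], self.related(node_id, "HAS")
        while queue:
            current = queue.pop(0)
            found.append(current)
            queue.extend(self.related(current, "HAS"))
        return found

db = SyntaxTreeDB(
    nodes=[
        Node(1, "SECTION", "Calculate-Transaction-Fee"),
        Node(2, "STATEMENT_MOVE", "Move 0 to Trans-Fee"),
        Node(3, "USAGE_VARIABLE_MOD", "Trans-Fee"),
        Node(4, "VARIABLE_DEFINITION", "Trans-Fee"),
    ],
    relationships=[
        Relationship("HAS", 1, 2),   # section HAS statement
        Relationship("HAS", 2, 3),   # statement HAS variable usage
        Relationship("OF", 3, 4),    # usage OF definition
    ],
)
print(db.having(1), db.related(3, "OF"))
```

Keeping relationships as separate directional records, rather than as fields on the nodes, mirrors the description of "OF", "HAS" and "HAVING" and lets the recursive query be expressed as repeated one-step lookups.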
Database Interface
The following table lists the methods provided by the object oriented wrapper of the present invention that are utilized to obtain information from the syntax tree 16.
Figure imgf000013_0001
Figure imgf000014_0001
The fingerprinting process gets the parser data according to the following algorithm:
Get the programs of interest from the user, save in SetOfProgramsOfInterest
Get the code blocks inside SetOfProgramsOfInterest, save in SetCodeBlocks
Get the statement types of interest from the user, save in SetStatementTypes
For each CodeBlock in SetCodeBlocks:
    For each StatementType in SetStatementTypes:
        Tally number of statements of StatementType in CodeBlock
        Tally size of data referenced by statements of StatementType in CodeBlock
        Tally size of data modified by statements of StatementType in CodeBlock
Classification Engine Interface
The classification engine 28 of the preferred embodiment is a software package known as AutoClass C v3.2.1, developed by the Computational Sciences Division at NASA Ames Research Center. AutoClass makes use of Bayesian statistical methods. More information on AutoClass may be obtained at the web site: http://ic-www.arc.nasa.gov/ic/projects/bayes-group/autoclass/. The classification engine 28 works best when each item to be classified is represented by a set of characteristics that have numeric values. The input to the classification engine 28 comprises:
i) a set of control information dictating characteristics of the classification process (duration, number of attempts, expected number of classes or groups, convergence algorithm to be used); and ii) a list of items to be classified, where each item is represented by a tag for identification and a value for each of the characteristics of interest for the classification.
The classification engine 28 does not assign any meaning to any of the characteristics. Nor does it, a priori, weight any characteristics differently than others. It determines a weighting during classification based on the utility of the characteristic for grouping things. For example, if a characteristic has the same value for every single case then it is useless and has a weighting of zero. Nor does it care how many characteristics items have, though each item must have a value specified for each characteristic and the characteristics must be in the same order for each of the items being classified.
Non-numeric or discretely valued (index) numeric values can be assigned to a characteristic. The classification engine 28 recognizes when such values are the same, but makes no other attempt to compare non-numeric input. For the purpose of the present invention, the characteristics of each item being input for classification represent the fingerprint for a block of code.
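The requirement that every item carry a value for every characteristic, in the same order, can be sketched as follows in Python. The function to_rows and the characteristic names such as "IF.count" are hypothetical, the sketch does not reproduce the actual AutoClass input file format, and treating a missing count as zero is an assumption.

```python
def to_rows(fingerprints, characteristics):
    """Return (tag, values) rows with one value per characteristic, in a fixed order.

    A characteristic that a block never exhibits is given the value 0 (an
    assumption that suits counts; other characteristics may need another default).
    """
    rows = []
    for tag, fingerprint in fingerprints.items():
        values = [fingerprint.get(name, 0) for name in characteristics]
        rows.append((tag, values))
    return rows

# Statement counts for two sections of COBOL Example 1.
fingerprints = {
    "Calculate-Transaction-Fee": {"IF.count": 3, "MOVE.count": 2},
    "Check-Account": {"IF.count": 2, "MOVE.count": 1, "GO.count": 3},
}
characteristics = ["IF.count", "MOVE.count", "GO.count"]

for tag, values in to_rows(fingerprints, characteristics):
    print(tag, values)
```

The tag identifies the block of code in the engine's output, while the fixed column order ensures that the same position always refers to the same characteristic.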
Output
The output from the classification engine 28 is a set of groupings containing the items that were submitted for classification. Each submitted item will be placed in exactly one group. The set of groupings represents the classification engine's best classification of the input.
For the purpose of the present invention, each group represents a set of blocks of code that have similar fingerprints. If the blocks of code have similar fingerprints, they must contain a similar set of statements. The fact that every member of a group contains similar statements makes them candidates for further review to determine if blocks of code are in fact repetitions of the same code.
Fingerprinting
The create coarse fingerprint process 18 reads the syntax tree 16 and counts the number of occurrences of each distinct type of statement supported by the language. For COBOL this means counting the number of occurrences of:
PERFORM
GO
IF
EVALUATE
MOVE
SET
COMPUTE
ADD
SUBTRACT
MULTIPLY
DIVIDE
READ
WRITE
OPEN
CLOSE
non-COBOL statements, e.g. EXEC CICS
ACCEPT
CALL
DISPLAY
DELETE
ENTRY
INITIALIZE
INSPECT
MERGE
RECEIVE
REWRITE
SEND
SORT
START
STRING
UNSTRING
Create coarse fingerprint 18 then counts the total number of bytes of input to all occurrences of each statement type within the block of code being fingerprinted. Finally, it counts the total number of bytes of output from each statement type within the block of code.
This means that the raw characteristics of a block of code consist of three numbers for each statement type supported by the language of interest:
1. number of occurrences;
2. total bytes of input; and
3. total bytes of output
This information is made persistent by storing it in the Raw fingerprint database 20 to speed up subsequent classifications.
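The three raw numbers per statement type can be sketched in Python as below. The field sizes in SIZES are invented for illustration, the (verb, inputs, outputs) statement records stand in for what a parser would supply, and the names are hypothetical. Literal operands are sized like the statement's target, following the convention this description adopts for constant values.

```python
from collections import defaultdict

# Hypothetical field sizes in bytes; real sizes come from the data definitions.
SIZES = {"Trans-Fee": 6, "Transfer-Amount": 8}

CONST = object()  # marks a literal operand such as .001, 0 or 2

# (verb, inputs, outputs) for three statements of Calculate-Transaction-Fee.
STATEMENTS = [
    ("COMPUTE", ["Transfer-Amount", CONST], ["Trans-Fee"]),  # Compute Trans-Fee = ...
    ("MOVE", [CONST], ["Trans-Fee"]),                        # Move 0 to Trans-Fee
    ("DIVIDE", ["Trans-Fee", CONST], ["Trans-Fee"]),         # Divide ... Giving Trans-Fee
]

def raw_fingerprint(statements, sizes):
    """Map each verb to [occurrences, total bytes of input, total bytes of output]."""
    fingerprint = defaultdict(lambda: [0, 0, 0])
    for verb, inputs, outputs in statements:
        out_bytes = sum(sizes[name] for name in outputs)
        # A constant is treated as having the size of the statement's target.
        target_size = sizes[outputs[0]] if outputs else 0
        in_bytes = sum(target_size if op is CONST else sizes[op] for op in inputs)
        entry = fingerprint[verb]
        entry[0] += 1
        entry[1] += in_bytes
        entry[2] += out_bytes
    return dict(fingerprint)

print(raw_fingerprint(STATEMENTS, SIZES))
```

Because the result is a plain mapping from statement type to three numbers, it can be written to and read back from any persistent store without re-parsing the source.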
Referring to the following portion of COBOL Example 1 above;
Calculate-Transaction-Fee Section.
    Compute Trans-Fee = Transfer-Amount * .001.
    Perform Special-Discount-Check
        Varying Discount-Type From 1 by 1
        until Discount-Type Greater than Num-Discount-Types.
    If Trans-Fee > 100
        Move 100.00 to Trans-Fee.
    If Account-Type = "M"
        Divide Trans-Fee By 2 Giving Trans-Fee.
    If Account-Type = "Z"
        Move 0 to Trans-Fee.
if we use variable sizes given as:
Figure imgf000018_0001
Then we have the raw characteristics for this block as:
Figure imgf000019_0001
With regard to constant values, a constant is treated as having the same number of bytes as the size of its target.
Similarly, for the C function within C Example 1 above, given by:
long WrapAround(long lower, long num, long upper)
{
    long NumReturned;   // a long is 4 bytes
    if (num > upper)
        NumReturned = lower + (num - upper) - 1;
    if (num < lower)
        NumReturned = upper - (lower - num) + 1;
    return NumReturned;
}
Then we have the raw characteristics for this block as:
Figure imgf000019_0002
These characteristics are then stored in the raw fingerprint database 20.
To this point, we have discussed those attributes of code groups which are defined by microscopic characteristics of specific types of logic within those groups. We will now discuss attributes which derive from the macroscopic behaviour of the code group.
Specifically, a block of code can be treated as a gestalt, having inputs and outputs like any single statement within it. The characteristics of these inputs and outputs comprise the macroscopic attributes of the block of code. Languages such as COBOL do not explicitly declare their inputs and outputs, and even languages such as C do not necessarily distinguish between inputs and outputs. This section describes a method for automatic determination of those types from the structure of the code block and the code which surrounds it.
Figure imgf000020_0001
Loops complicate variables within and surrounding them as they make the order of statements less than clear. The following examples help to illustrate how the preferred embodiment of the present invention determines whether a variable associated with a loop is classified as input or output.
Figure imgf000021_0001
Problem: Loops complicate everything.
Solution: Redefine the problem. For input, take the last out-of-target modification prior to the first in-target reference; we simply require that no modification is interposed between the two. For output the situation is reversed: there must be no interposing modification between the last in-target modification and the first subsequent out-of-target reference.
Algorithm: Start at the outermost in-target use and search for the corresponding out-of-target use (forward for output, backward for input). Branch whenever a loop is exited (top or bottom): one branch continues the search as though there were no loop, and the other continues at the opposite end of the loop.
Conditions: An in-loop search branch fails instantly after completely traversing the loop once or returning to the original in-target use. A search branch fails instantly upon striking an in-target modification (input) or non-target modification (output). A search succeeds instantly upon striking a non-target modification (input) or a non-target reference (output). If any branch succeeds, the test has succeeded.
It is possible for a target to enclose non-target code. For example, the most general case of an object selected for re-use may not include several lines which appear somewhere in the input code. Therefore, the user may wish to remove them from the definition of the re-usable object. As some parsers, including the Revolve parser, do not provide the ability to create a new parse tree from arbitrary code (i.e. the re-usable object without the unwanted code), the existing parse tree of the original source must be utilized. Therefore, the logic of the preferred embodiment of the present invention includes a way to exclude the unwanted code for the purpose of fingerprinting.
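The input and output tests can be illustrated for the simplest case, straight-line code with no loops. The Python sketch below deliberately omits the loop-branching part of the search, so it is a simplification rather than the full method; the Stmt record and the function names are hypothetical. Membership of the target is a per-statement flag, so a target may enclose statements that are marked as non-target.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stmt:
    reads: frozenset      # variables referenced by the statement
    writes: frozenset     # variables modified by the statement
    in_target: bool       # True if the statement belongs to the target block

def is_input(stmts, var):
    """Search backward from the first in-target reference of var."""
    first_ref = next((i for i, s in enumerate(stmts)
                      if s.in_target and var in s.reads), None)
    if first_ref is None:
        return False
    for s in reversed(stmts[:first_ref]):
        if var in s.writes:
            # Fails on an in-target modification, succeeds on a non-target one.
            return not s.in_target
    return False

def is_output(stmts, var):
    """Search forward from the last in-target modification of var."""
    last_mod = max((i for i, s in enumerate(stmts)
                    if s.in_target and var in s.writes), default=None)
    if last_mod is None:
        return False
    for s in stmts[last_mod + 1:]:
        if s.in_target:
            continue
        if var in s.reads:
            return True    # non-target reference reached first
        if var in s.writes:
            return False   # non-target modification reached first
    return False

program = [
    Stmt(frozenset(), frozenset({"Transfer-Amount"}), False),
    Stmt(frozenset({"Transfer-Amount"}), frozenset({"Trans-Fee"}), True),
    Stmt(frozenset({"Trans-Fee"}), frozenset({"Trans-Fee"}), True),
    Stmt(frozenset({"Trans-Fee", "Transfer-Amount"}), frozenset({"New-Balance"}), False),
]
print(is_input(program, "Transfer-Amount"), is_output(program, "Trans-Fee"))
```

In this example Transfer-Amount is an input, because it is set outside the target before its first in-target reference, and Trans-Fee is an output, because it is referenced outside the target after its last in-target modification. Trans-Fee is not an input, since the target itself modifies it before first referencing it.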
Figure imgf000022_0001
Returning now to Figure 1, aggregate fingerprint process 22 then reads the raw fingerprint data 20 output by create coarse fingerprint 18.
In this step, the fact that many languages provide more than one way to accomplish the same thing is taken into account. For an end user of the product, the preferred embodiment has the following aggregations for COBOL:
a) PERFORM & GO are aggregated to represent control flow;
b) IF & EVALUATE are aggregated to represent conditionals;
c) MOVE & SET are aggregated to represent assignment;
d) all IO statements are aggregated to represent file access;
e) all math statements are aggregated to represent arithmetic; and
f) all non-COBOL statements are aggregated together.
Referring to the calculate-transaction-fee section of COBOL Example 1 above;
Figure imgf000023_0001
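A minimal Python sketch of this aggregation is given below, applied to the statement counts of the Calculate-Transaction-Fee section. The group names and the aggregate function are hypothetical, and the exact membership of the "io" group is an assumption; verbs that belong to no enabled group are left as distinct characteristics.

```python
# Verb groups following the aggregations described above.
GROUPS = {
    "flow": {"PERFORM", "GO"},
    "conditional": {"IF", "EVALUATE"},
    "assignment": {"MOVE", "SET"},
    "arithmetic": {"COMPUTE", "ADD", "SUBTRACT", "MULTIPLY", "DIVIDE"},
    "io": {"READ", "WRITE", "OPEN", "CLOSE"},  # assumed membership
}

def aggregate(counts, enabled):
    """Sum counts per enabled group; verbs outside enabled groups stay distinct."""
    result = {}
    for verb, n in counts.items():
        group = next((g for g in enabled if verb in GROUPS[g]), None)
        key = group if group is not None else verb
        result[key] = result.get(key, 0) + n
    return result

counts = {"COMPUTE": 1, "PERFORM": 1, "IF": 3, "MOVE": 2, "DIVIDE": 1}

print(aggregate(counts, GROUPS))     # every group enabled
print(aggregate(counts, ["flow"]))   # only flow aggregated
```

With every group enabled, Compute and Divide are combined into a single arithmetic count of 2. With only "flow" enabled they remain separate, which corresponds to the finer resolution available to an analyst who switches aggregation off for a group.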
Figures 2 and 4 illustrate this use of aggregation. Note, for example, that in Figure 2 the "flow" box is checked to enable aggregation of Perform and Go statements, whereas in Figure 4 it is not checked, so that distinct counts are provided for Perform and Go statements. The advanced control screen of Figures 2 and 4 has been provided for knowledgeable analysts to allow them to manipulate aggregation for any of the above groups. This allows them to differentiate between occurrences of statements with similar behaviour and, in doing so, highlight more subtle differences in blocks of code. This can be important when the analyst is interested in:
a) stylistic differences in code; or b) identifying usage of a particular construct that they may have determined is problematic.
Consider, for example, a COBOL calculation routine. If there were two versions of the same routine utilized in various places within the application, one of which used COMPUTE statements and the other of which used the older ADD or MULTIPLY statements, with aggregation on they would be lumped together. With aggregation off, they would be categorized separately. One of the versions may be problematic in that it may be wrong and need replacing with the other, because there may be variances in how the two statements are implemented. If they are meant to perform the same function, consistency in behaviour can be assured by using the same coding construct.
The filter step 26 filters out characteristics that should be excluded from the current classification. This is done under user control. Excluding some characteristics is useful to draw out different similarities between blocks of code. As illustrated in Figure 6, an end user of the product can filter out the following characteristics:
a) I/O - excluding these statements desensitizes the classification to the stylistic differences between programmers who isolate their IO statements and those who code them inline;
b) data - excluding the consideration of data highlights similarities in the types of statements. This is useful when searching for commonality in code structure that may be repeated for use with very different sets of data. For example the same algorithm may be applied against a single instance of data in some cases and against arrays of varying sizes in other cases. Excluding data desensitizes the classification to variances in data; and
c) Logic & flow - excluding these statements filters out the control structure around algorithmic logic. This flattens the code structure and highlights similarities in data movement and transformation.
In use, the user would select the source code modules 14 that comprise the project of interest, and input them to the parser 12 so that a syntax tree 16 may be generated.
Once the source code 14 has been parsed, the create coarse fingerprint process 18 analyzes the syntax tree 16 and creates the raw fingerprint database 20. The raw fingerprint database 20 contains counts of each type of statement and the number of input and output fields within each functional block of the selected source code 14. The user may then manipulate the data contained in raw fingerprint database 20 to provide the desired input to classification engine 28. The typical user will have access only to the classification settings screen of Figure 6, which allows them to simply select aggregate groups of statements or data. Analysts more familiar with the functionality of classification engine 28 will have access to the advanced classification settings screen illustrated in Figures 2 and 4.
Figure 2 illustrates a screen capture of the advanced classification settings available to analysts more familiar with the functionality of the classification engine 28. As shown in Figure 2, the minimum statements option 32 has been set to five statements. The use data option 34 has been turned off so as not to consider the size of data fields when determining the characteristics to be selected for input to the classification engine 28. The statement type menu 36 has all of the high level grouping attributes selected. Thus, for example, both Perform and Go statements will be considered identical for the purpose of determining the characteristics of a block of code. As can readily be appreciated, the advanced classification settings screen provides finer control over the level of granularity than the default aggregation classification settings provided to the typical user. The autoclass settings section 38 enables a user knowledgeable about the functionality of the classification engine 28 to set a plurality of input variables. The classification engine 28 uses a random seed to direct the search for its initial classification. Subsequent classifications are refinements of the first one. Using a known seed, rather than a random one, results in a reproducible (albeit potentially useless) set of classifications. This facility is particularly useful for demonstration purposes. The search length selection box 41 indicates the duration of the search for new classifications that better satisfy the criteria selected. It can be specified in seconds, or in terms of the number of attempts at refinement. As can be appreciated, any of the input variables accepted by the classification engine 28 may be provided for the user in this advanced classification settings interface.
Figure 3 illustrates the default groupings created by the classification engine 28 on a set of source code 14 input to the parser 12. The formatting process 30 has broken the source code files selected into a number of groups 42. Within each group 42 is a paragraph heading (as this is a COBOL example) 44, the file name 46 which contains the paragraph 44, and a probability ranking 48. The probability ranking is a simple comparison of the attributes of a given code item to the mean values for the classification in which it appears. The displaying of groups 42 in the output screen allows the user to focus on the types of logic that they are primarily interested in. For example, if a user is primarily interested in analyzing business logic, the user can look at a couple of representative members of a group 42, and quickly characterize the group as I/O handling. Such a group can then be deleted and discarded, or deferred for later analysis.
Having selected the groups of interest, the user may now re-classify the code blocks using a finer classification pattern. Referring now to Figure 4, a screen capture of the advanced classification settings screen, the use data option 34 has been selected. Additionally, within the statement type menu 36 the high level attributes have been turned off to eliminate grouping of statements and provide much finer statement resolution. The intent of this finer resolution is to create a group of sub-classifications, which in general terms may be similar but differ in minute ways. An example might be a coarse classification which finds a classification containing several date handling routines. Fine classification might create two sub groups of the date handling routines, one of which is year 2000 compliant and another which is not.
Figure 5 illustrates the output of the finer classification requested by the user in Figure 4. As can be seen the groups of Figure 5 are much different from the groups of Figure 3. Further group 1 of Figure 5 has been sub-divided into two sub groups. This creation of sub groups is a result of the characteristics of the code items being significantly different, although similar enough to have been grouped together. Re-classifying with different settings changes the nature of the attribute set that is input to the classification engine 28 and thus enables it to find sub groupings.
Figure 6 is a capture of the code classification settings screen available to the user when filtering the output of the aggregate fingerprint process 22. As shown the user may choose to exclude three general types of code from consideration for a particular classification:
a) logic and flow; b) I/O; and c) data.
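The three exclusions can be pictured as operations on an aggregated fingerprint, as in the Python sketch below. It rests on several assumptions: the fingerprint is a mapping from group name to (count, bytes in, bytes out), "logic and flow" is taken to mean the conditional and flow groups, excluding data is taken to mean dropping the byte totals, and the byte values shown are invented.

```python
def apply_filter(fingerprint, exclude_io=False, exclude_data=False,
                 exclude_logic_flow=False):
    """Drop excluded groups, and drop byte totals when data is excluded."""
    dropped = set()
    if exclude_io:
        dropped.add("io")
    if exclude_logic_flow:
        dropped.update({"flow", "conditional"})

    filtered = {}
    for group, (count, bytes_in, bytes_out) in fingerprint.items():
        if group in dropped:
            continue
        filtered[group] = (count,) if exclude_data else (count, bytes_in, bytes_out)
    return filtered

# Hypothetical aggregated fingerprint: group -> (count, bytes in, bytes out).
fingerprint = {
    "flow": (1, 4, 2),
    "conditional": (3, 20, 0),
    "arithmetic": (2, 26, 12),
    "io": (1, 80, 0),
}

print(apply_filter(fingerprint, exclude_io=True, exclude_data=True))
print(apply_filter(fingerprint, exclude_logic_flow=True))
```

Filtering only changes which characteristics reach the classification engine; the stored raw fingerprint is left untouched, so the same data can be re-classified with different settings.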
In this example the user has also requested fast classification. In the preferred embodiment fast classification is a pre-defined setting for the user dialogue indicating a value of ten tries. Non-fast classification equates to fifty tries. The user has also requested that there be a minimum of ten statements in a paragraph (this being a COBOL example) in order for a block of code to be considered for fingerprinting.
As can be appreciated by those skilled in the art, various characteristics other than the characteristics selected for the preferred embodiment may be utilized in creating a fingerprint for a block of code. The inventors have found promise in the following characteristics:
a) number of input parameters to a block;
b) number of output parameters to a block;
c) (number of bytes input)/(number of bytes output) to a function or component;
d) parameter types passed or returned to a function or component interface;
e) label matching to identify similarities between functional blocks due to common labels;
f) number of lines of code in a block;
g) total variable references within a block;
h) the number of flows into the block in the control flow graph, i.e. the number of different places in the code that invoke the block, sometimes referred to as fan-in;
i) number of flows out of the block in the control flow graph, i.e. the number of different blocks of code invoked by the block under consideration, sometimes referred to as fan-out;
j) the McCabe metric for the block, a measure of cyclomatic complexity of the directed acyclic graph which represents the flow of control within each block; this reflects the number of logic decisions within the block;
k) (lines of code)/(lines of non-blank comment), a measure of the level of code documentation;
l) (McCabe metric)/(lines of code in block), a measure of the complexity of code density;
m) the number of literal string and numeric constants used in the block;
n) various Halstead complexity metrics which are based on the number of operators and operands in the block; and
o) the value of Halstead/(lines of code).
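Three of these characteristics, fan-in, fan-out and the McCabe metric, can be sketched in Python as follows. The flows are the Perform and Go targets read from COBOL Example 1; the function names are hypothetical. Fan-in is approximated here by the number of distinct invoking blocks, and the McCabe value is estimated as the number of decisions plus one, which holds for a single-entry, single-exit block with two-way decisions.

```python
# Flows between blocks of COBOL Example 1 (Perform and Go targets).
FLOWS = [
    ("Process-Transfer", "Check-Account"),
    ("Process-Transfer", "Warn-Customer"),
    ("Process-Transfer", "Calculate-Transaction-Fee"),
    ("Check-Account", "Politely-Remind-Customer"),
    ("Check-Account", "Friendly-Check-Account"),
    ("Check-Account", "Abort-Transfer"),
    ("Calculate-Transaction-Fee", "Special-Discount-Check"),
]

def fan_in(block, flows):
    """Number of distinct blocks that transfer control to the given block."""
    return len({src for src, dst in flows if dst == block})

def fan_out(block, flows):
    """Number of distinct blocks to which the given block transfers control."""
    return len({dst for src, dst in flows if src == block})

def mccabe_estimate(decision_count):
    """Cyclomatic complexity estimated as the number of decisions plus one."""
    return decision_count + 1

print(fan_out("Process-Transfer", FLOWS))          # blocks invoked by Process-Transfer
print(fan_in("Calculate-Transaction-Fee", FLOWS))  # blocks invoking the fee calculation
print(mccabe_estimate(3))                          # three If statements in the section
```

The estimate for Calculate-Transaction-Fee counts only its three If statements; a fuller count would also treat the until condition of the Perform statement as a decision.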
As will be apparent to those skilled in the art, various modifications and adaptations of the method and system described above are possible without departing from the present invention, the scope of which is defined in the appended claims.

Claims

WE CLAIM:
1. A method for identifying recurring code constructs within the source code of a software application, the method comprising the steps of:
a) parsing the source code to create a syntax tree; b) traversing the syntax tree to identify blocks of code; c) creating a fingerprint for each block of code; d) submitting the fingerprints obtained in step c) to a classification engine; and e) providing the output from step d) to a user for analysis.
2. The method of claim 1 wherein each fingerprint for said block of code comprises one or more characteristics, each characteristic comprising:
a) a statement type; b) the number of occurrences of said statement type within said block of code; c) the number of bytes of data input to said statement type; and d) the number of bytes output by said statement type.
3. The method of claim 2 which includes aggregating occurrences of statement types counted by step b) into a single group based upon similarity of statement type, when creating said fingerprints.
4. The method of claim 2 or 3 which includes counting the number of bytes of data in selected fingerprints, when creating said fingerprints.
5. The method of claim 3 wherein said groups comprise statements of the type:
a) flow control; b) conditional; c) assignment; d) arithmetic; e) Input/Output; and f) all statements not native to the language being parsed.
6. The method of claim 2, which includes selecting the minimum number of statements required within a block of code for fingerprinting said block of code.
7. The method of claim 1 wherein said blocks of code are defined by a Section and/or Paragraph in the COBOL programming language.
8. The method of claim 1 wherein said blocks of code are defined by a function or procedure in the C programming language.
9. The method of claim 2 additionally comprising a filtering step to filter out characteristics or groups of characteristics prior to the submission of said fingerprints to said classification engine.
10. The method of claim 9 wherein said filtering step filters out at least one of the following characteristics:
a) Input /Output; b) Data; and c) Logic and flow.
11. A system for identifying recurring code constructs within the source code of a software application comprising: a) a code parser; b) a create coarse fingerprint module for analyzing the output of said parser to produce a raw fingerprint file; c) a classification engine to classify data contained within the raw fingerprint file; and d) an output module to format the data output from the classification engine to the user for analysis.
12. The system of claim 11 wherein said classification engine is the AutoClass software package provided by NASA.
13. The system of claim 11 or 12, which includes: an aggregate fingerprint module including a user input for selection of code components of interest, the aggregate fingerprint module being connected to the create coarse fingerprint module; and a filter connected between the aggregate fingerprint module and the classification engine, for filtering out selected characteristics.
14. A method for determining whether variables in a block of code are to be considered as input to or output from said block of code, independent upon access to or modification of said variables within and without said block of code.
15. A database interface, said interface providing methods for accessing an object within a syntax tree, said interface comprising methods for: a) retrieving said object from said syntax tree by type or reference; b) retrieving information regarding the attributes of said object; and c) retrieving abstract type or relationship data of said object, given a string representing the name of said object.
16. A method for excluding statements from within a block of code, said statements being excluded from consideration when calculating a fingerprint for said block of code.
17. The method of claim 2 which includes instructing said classification engine to utilize a known seed, rather than a random seed, to create a reproducible set of classifications.
PCT/CA1999/000993 1998-11-19 1999-10-22 Method of identifying recurring code constructs WO2000031626A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU64557/99A AU6455799A (en) 1998-11-19 1999-10-22 Method of identifying recurring code constructs

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CA2,254,494 1998-11-19
CA 2254494 CA2254494A1 (en) 1998-11-19 1998-11-19 Method of identifying recurring code constructs

Publications (1)

Publication Number Publication Date
WO2000031626A1 true WO2000031626A1 (en) 2000-06-02

Family

ID=4163046

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA1999/000993 WO2000031626A1 (en) 1998-11-19 1999-10-22 Method of identifying recurring code constructs

Country Status (3)

Country Link
AU (1) AU6455799A (en)
CA (1) CA2254494A1 (en)
WO (1) WO2000031626A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8819619B2 (en) * 2004-03-15 2014-08-26 Ramco Systems Limited Method and system for capturing user interface structure in a model based software system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5452449A (en) * 1991-07-03 1995-09-19 Itt Corporation Interactive multi-module source code analyzer that matches and expands call and entry statement parameters
US5649201A (en) * 1992-10-14 1997-07-15 Fujitsu Limited Program analyzer to specify a start position of a function in a source program
US5742827A (en) * 1992-11-02 1998-04-21 Fujitsu Limited Method of automatically forming program specifications and apparatus therefor
US5625801A (en) * 1993-08-05 1997-04-29 Hitachi, Ltd. Method and apparatus for producing standardized software specifications and software products
US5838965A (en) * 1994-11-10 1998-11-17 Cadis, Inc. Object oriented database management system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
COLLOFELLO J S ET AL: "Syntactic information useful for software maintenance", AFIPS CONFERENCE PROCEEDINGS: 1985 NATIONAL COMPUTER CONFERENCE, CHICAGO, IL, USA, 15-18 JULY 1985, 1985, Reston, VA, USA, AFIPS Press, USA, pages 547 - 555, XP002135813 *
CREAUSILLET B ET AL: "INTERPROCEDURAL ARRAY REGION ANALYSES", INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING,US,PLENUM PRESS, NEW YORK, vol. 24, no. 6, 1 December 1996 (1996-12-01), pages 513 - 546, XP000635514, ISSN: 0885-7458 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2004318207B2 (en) * 2004-03-08 2009-12-17 Syngenta Participations Ag Self-processing plants and plant parts
EP1577758A2 (en) 2004-03-15 2005-09-21 Ramco Systems Limited User interfaces and software reuse in model based software systems
EP1577758A3 (en) * 2004-03-15 2007-04-25 Ramco Systems Limited User interfaces and software reuse in model-based software systems
US8307339B2 (en) 2004-03-15 2012-11-06 Ramco Systems Limited Software reuse in model based software systems
US8572563B2 (en) 2004-03-15 2013-10-29 Ramco Systems Limited User interfaces and software reuse in model based software systems
CN110347428A (en) * 2018-04-08 2019-10-18 北京京东尚科信息技术有限公司 A kind of detection method and device of code similarity
CN113204571A (en) * 2021-04-23 2021-08-03 新华三大数据技术有限公司 SQL execution method and device related to write-in operation and storage medium
CN113204571B (en) * 2021-04-23 2022-08-30 新华三大数据技术有限公司 SQL execution method and device related to write-in operation and storage medium

Also Published As

Publication number Publication date
AU6455799A (en) 2000-06-13
CA2254494A1 (en) 2000-05-19

Similar Documents

Publication Publication Date Title
CN107704265B (en) Configurable rule generation method for service flow
KR101517460B1 (en) Graphic representations of data relationships
CN107644323B (en) Intelligent auditing system for business flow
Salay et al. Managing requirements uncertainty with partial models
Kim et al. Identifying and summarizing systematic code changes via rule inference
JP5166519B2 (en) Consistent method system and computer program for developing software asset based solutions
Zgraggen et al. (s|qu)eries: Visual regular expressions for querying and exploring event sequences
Maggi et al. Parallel algorithms for the automated discovery of declarative process models
WO2014058805A1 (en) System and method for recursively traversing the internet and other sources to identify, gather, curate, adjudicate, and qualify business identity and related data
Lu et al. Semi-supervised log pattern detection and exploration using event concurrence and contextual information
CN112416787A (en) JAVA-based project source code scanning analysis method, system and storage medium
EP1121637B1 (en) Method for generating component-based source code
Nasirloo et al. Semantic code clone detection using abstract memory states and program dependency graphs
Saied et al. Towards assisting developers in API usage by automated recovery of complex temporal patterns
Markovtsev et al. STYLE-ANALYZER: fixing code style inconsistencies with interpretable unsupervised algorithms
Acheli et al. Discovering and analyzing contextual behavioral patterns from event logs
WO2000031626A1 (en) Method of identifying recurring code constructs
Souza et al. Provenance of dynamic adaptations in user-steered dataflows
CN113535799A (en) Mining network training method and system based on artificial intelligence
Zerbato et al. Supporting provenance and data awareness in exploratory process mining
Rostami et al. BIGGR: Bringing GRADOOP to applications
Maqbool et al. Metarule-guided association rule mining for program understanding
WO2001079996A1 (en) Method for extracting business rules
Carme et al. The lixto project: Exploring new frontiers of web data extraction
JP6802109B2 (en) Software specification analyzer and software specification analysis method

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref country code: AU

Ref document number: 1999 64557

Kind code of ref document: A

Format of ref document f/p: F

AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase