WO1991001530A2

WO1991001530A2 - Methods and apparatus for checking the integrity of data base data entries

Info

Publication number: WO1991001530A2
Application number: PCT/US1990/003767
Authority: WO
Inventors: Susan Savell Davey
Original assignee: Bell Communications Research, Inc.
Priority date: 1989-07-11
Filing date: 1990-07-03
Publication date: 1991-02-07
Also published as: CA2020711A1; WO1991001530A3

Abstract

A method and apparatus for testing the integrity of the data in a large data base is described. The constraints on the values of the data are represented by data descriptor records having the same format as the data records. These constraints specify intra-record constraints, inter-record constraints and conditional constraints, i.e., constraints which depend upon the satisfaction of some condition such as a particular data value. These descriptor records are compiled directly into executable code which, in turn, can test each record of the data base for conformance with the constraints specified. Such code can be compiled 'on the fly' as needed to test particular records, or compiled off-line and called whenever needed. The same facilities used to process the application data base records can thus be used to create, modify and delete the constraint descriptor records which may therefore form part of the same data base.

Description

METHODS AND APPARATUS FOR CHECKING THE INTEGRITY OF DATA BASE DATA ENTRIES

Technical Field

This invention relates to data storage and retrieval systems and, more particularly, to the maintenance of accurate data records in a large data base forming the basis for such systems.

Background of the Invention

The number of data bases has been increasing at a rapid rate to support all kinds of new business, commercial, social and economic activities. Moreover, the size of such data bases has likewise increased to reflect the growing amount of information for which electronic access is desired. As is well-known, the contents of such data bases often contain errors due to erroneous input, machine or software faults, or changes in external events. Maintaining the accuracy of such data bases has become a large and difficult task, requiring the development of error detection techniques the complexity of which rivals the complexity of the data storage and retrieval system itself. While errors in data base entries can often be tolerated in applications where these errors cause little or no economic penalties, it has become increasingly necessary to rely on such data for supporting decisions having large economic or social consequences if erroneous. In such circumstances, it is necessary to expend very considerable efforts in developing systems for checking the integrity of the data base. Such integrity checking systems must be able to detect all or essentially all errors and must operate sufficiently fast to insure error-free data contents essentially continually, even in the face of rapidly changing data entries. Developing such checking systems has therefore become a long, arduous, expensive and complex procedure.

Summary of the Invention

In accordance with the illustrative embodiment of the present invention, data entries are checked in an information storage and retrieval system by using special data descriptor entries which describe the various constraints on the application data entries. These data descriptor entries are formatted in the same manner as the application data entries so that the same data base management software that is used to access application data records can be used to access constraint descriptor records. The constraint descriptor records, in turn, are used to generate executable code which tests for adherence to the constraints therein described. The code generation can be carried out "on the fly" while the data base is being used, or can be generated off-line, when the data base is not being used or on an entirely different data processing system. In the preferred embodiment, the code generation is completed off-line to avoid any degradation in accessing the application data. The generated code is, of course, invoked to check the data at all appropriate times, e.g., when the application data records are added to the data base or when an existing application data record is modified.

In accordance with one feature of the present invention, the constraints can be intra-record, i.e., within the same record, such as proper syntax, proper domains, and required fields, or the constraints can be inter-record, i.e., in different records, such as matching backward and forward linking pointers, matching field values in different records, and insuring required relationships between records.

In accordance with another feature of the present invention, the constraint descriptors can specify conditional constraints that are applied only if a specified condition is met. Such conditional constraints can also be intra- or inter-record. The sequence of a plurality of such conditional constraints forms an integrity checking hierarchy, the various paths of which need be visited only when required by the satisfaction or failure to satisfy the conditional constraints. In the preferred embodiment, these constraints comprise if, then and else types of conditions.

In accordance with yet another feature of the present invention, the integrity checking technique is applied to a special type of entity-relationship database known as the hypergraph model. Entity "nodes" in the model can be connected with other entity nodes by means of "edges" representing relationships between the entities. Hyperedges are formed when a single relationship involves three or more nodes. The resulting nodes and edges can be represented by a directed graph and hence the name "hypergraph" model. Such hypergraph model data bases are well-known and are disclosed in R. S. Ferrer et al. patent 4,479,196, granted October 23, 1984.

Brief Description of the Drawings

A complete understanding of the present invention may be gained by considering the following detailed description in conjunction with the accompanying drawing, in which:

FIG. 1 is a graphical representation of a directed graph representing a hypergraph model of a data base with which the integrity checking system of the present invention will find use;

FIG. 2 is a tabular representation of the contents of one of the records represented by a node in the graph of FIG. 1;

FIG. 3 is a block diagram of an integrity checking system in accordance with the present invention;

FIG. 4 is a graphical illustration of the contents of the constraint index table of FIG. 3;

FIGS. 5, 6 and 7 are tabular representations of the contents of three different application records of a hospital data base, useful in explaining the operation of the present invention;

FIGS. 8, 9, 10, and 11 are tabular representations of the contents of configuration descriptor records in accordance with the present invention; and

FIG. 12 is a flowchart of the procedure used for generating and executing integrity checking code for checking records for integrity in accordance with the present invention.

To facilitate reader understanding, identical reference numerals are used to designate elements common to the figures.

Detailed Description

Before proceeding to a detailed description of the drawings, it is first necessary to discuss the structure of data bases. There are four general types of data bases: hierarchical, network, relational and entity-relationship. Each type is well suited for particular types of data and less effective for other types. The entity-relationship type of data base is the most general type of data base and is capable of representing complex data interrelationships. An entity-relationship type of data base will therefore be used in the following detailed embodiment of the present invention. It is to be understood, however, that the principles of the present invention are readily applicable to any of the other forms of data bases by persons of ordinary skill in the data base art.

The entity-relationship type of data base can be represented by a directed graph in which the data entities are represented by the vertices or nodes of the graph while the relationships between these entities are represented by the edges or directed arrows of the graph. Typically, each data record is a node including both the data item itself (the entity) and pointers to other nodes of the graph (the edges). An even higher level of complexity can be introduced into the entity-relationship type of data base by introducing hyperedges into the graph representation. That is, a single edge is permitted to point to more than one other node. This hypergraph model can then be used to represent relationships involving more than two entities, e.g., the parenthood relationship involves both mother and father as well as child. The hypergraph model and a powerful means for representing nodes in this model are shown in the aforementioned patent 4,479,196, granted to R. S. Ferrer et al. on October 23, 1984. This hypergraph model will be more fully described in connection with FIGS. 1 and 2.

Referring then to FIG. 1, there is shown a pictorial representation of a directed hypergraph. The circles in FIG. 1 are called "vertices" of the graph and represent the information entities in the data structure. The arrows of FIG. 1 are called "edges" and represent the relationships between the entities of the vertices. The node 10, for example, is connected to node 12 by the relationship 16. The entity 10, for example, may be a piece part in an inventory, entity 12 may be a supplier, and relationship 16 may be the "purchased from" relationship. Entity 10 is also related to entity 13 by the relationship 17. Entity 13 may be the warehouse in which piece part 10 is being stored. Note that edges 16 and 17 are single-valued. That is, edges 16 and 17 connect one vertex to one other vertex. Entity 10 is also related to entities 14 and 15 by the relationship 18.

The "edge" 18 of the graph of FIG. 1 is called a "hyperedge" because the edge 18 establishes a relationship from one vertex to more than one other vertex. A graph including a hyperedge is called a hypergraph. Entity 15 may, in accordance with the previous example, constitute a subassembly of which piece part 10 is a part while entity 14 may comprise a tool by means of which part 10 is integrated into subassembly 15. It can be seen that the hypergraph model is particularly suited for representing data entities having relationships including more than two entities. The parts-assembly-tool relationship described above is one such relationship. The wires connecting a particular piece of equipment at particular terminals is another such relationship. The assignment of a particular bed in a particular room of a hospital is yet another such relationship. A hospital data base of the entity-relationship type will be used as a vehicle for the following description of the preferred embodiment of the present invention.

In FIG. 2 there is shown a generalized or canonical data base record which can be used to efficiently represent one node of the hypergraph of FIG. 1, in particular, node 10. It will be noted in FIG. 1 that the node 10 has one "body" comprising the entity of node 10 itself and a plurality of "edges" comprising all of the directed arrows emanating from node 10. In FIG. 2, the "paragraph" (a) represents the entity body having the "attributes" represented by subparagraphs (1) through (4). If the entity is a piece part, then the attributes might well comprise the generic name of the part, the specific serial number and other information important to the particular application. The simple edges 16 and 17 are represented by paragraphs (b) and (c), each also including an entity type and identity pointed to by the edge, along with whatever data is necessary to support the data base application. The hyperedge 18 is represented in FIG. 2 by paragraph (d), comprising an edge pointing to two different entities (1 and 2). The format of the representation is identical to that used for simple edges 16 and 17, the only difference being that more than one "connected to" node is associated with the hyperedge. This representation greatly simplifies the accessing and processing of the data records in the hypergraph model. Although only two edges are shown in the hyperedge 18 of FIG. 1, a hyperedge can involve any number of nodes simply by including the appropriate node or entity data in the hyperedge data.
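
The canonical record described above (one body plus any number of edges, where a hyperedge is simply an edge with more than one "connected to" node) might be sketched as follows. This is only an illustration under stated assumptions: the class names `Node` and `Edge`, the field names, and the edge names `stored_in` and `integrated_into` are hypothetical and do not come from FIG. 2; only "purchased from" and the node numbers 10 and 12 through 15 are taken from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """One relationship; more than one target makes it a hyperedge."""
    name: str
    targets: list                      # list of (entity_type, node_id) pairs
    data: dict = field(default_factory=dict)

    @property
    def is_hyperedge(self) -> bool:
        return len(self.targets) > 1


@dataclass
class Node:
    """Canonical record: one body (the entity) plus its outgoing edges."""
    node_id: int
    body: dict
    edges: list = field(default_factory=list)


# Node 10 of FIG. 1: a piece part with two simple edges and one hyperedge.
part = Node(
    node_id=10,
    body={"entity_type": "piece_part"},
    edges=[
        Edge("purchased_from", [("supplier", 12)]),
        Edge("stored_in", [("warehouse", 13)]),          # assumed name
        Edge("integrated_into", [("tool", 14), ("subassembly", 15)]),  # assumed name
    ],
)

hyperedges = [edge.name for edge in part.edges if edge.is_hyperedge]
```

Because simple edges and hyperedges share one representation, code that walks `part.edges` needs no special case for either kind.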

With the above background in mind, the error detection system of the present invention will now be described. In FIG. 3 there is shown a block diagram of an error detection system in accordance with the present invention comprising an application data base 32 and a data base manager 30 designed to access and process the data records from data base 32. In accordance with the present invention, the constraints on the data contents of each type or configuration of a record in the data base 32 are represented by a constraint descriptor in data base 31. A constraint descriptor is itself a data base record which specifies all of the internal constraints on the data items in the data base record, all of the constraints on the contents of related data base records and all of the conditions upon which these constraints depend. In accordance with the present invention, the constraint descriptor records have the same format as the data base records themselves. More particularly, the internal intra-record constraints for the associated data record contents correspond to paragraph (a) in FIG. 2 and can be thought of as the "body" of the descriptor record. The inter-record constraints on all related records correspond to paragraphs (b), (c) and (d) in FIG. 2 and can be thought of as the "edges" of the descriptor record. Since the constraint descriptor records in data base 31 have the same format as the application data base records in data base 32, the same data base manager 30 can be used to access and process the records in both data bases. This alignment of application records and descriptor records greatly simplifies the creation and maintenance of the error detection system of the present invention.

In operation, the data base manager 30, wishing to check the integrity of one or more data records of a particular type from application data base 32, can access the constraint descriptor record for that record type from data base 31. The contents of the records of data base 31 represent the constraints on the data contents and data relationships in such a format as to simplify the conversion of the constraints represented in the constraint descriptor records into actual code which tests whether the application data meets the constraints there described. The actual code generation takes place in constraint code generator 34. The code execution processor 33 executes this code to carry out the actual testing of the application data.

In order to insure orderly testing of the application data, a constraint index table 35 is consulted to insure that the constraint code generated in generator 34 is generated in the appropriate logical order. That is, some types of constraints are conditional in that the constraint requirement depends on some condition such as the value of the application data. Moreover, complete hierarchies of if, then and else types of conditional constraints are possible. Hence the generation of the constraint code must begin at the root of the conditional testing hierarchy. The function of table 35 is to keep track of all of the constraint hierarchy starting places in order to insure the initiation of the constraint code generation at the root of the constraint hierarchy.

The result of executing the constraint code in processor 33 is the detection of errors or faults in the application data retrieved from application data base 32. If such a fault is detected, the faulty data is identified and entered into data fault table 36. The entries in fault table 36 can, in turn, be used to correct the erroneous data in data base 32 by way of data base editor 37. Data base editor 37 may, of course, comprise a portion of the data base manager 30 or may be any available editor, provided the record representation used in the data base 32 of FIG. 3 is amenable to such text editing. The format of FIG. 2 does have this advantage.

It should be noted that only a few constraint descriptor records are required in data base 31 for each generic type or configuration of application data base record and hence data base 31 is much smaller than data base 32. It should also be noted that new application data base record types can be introduced at any time, if, at the same time, appropriate constraint descriptor records are introduced into data base 31. In this way, the error detection system of the present invention is able to dynamically track changes in the content or usage of the data records in application data base 32. It should also be noted that the code generator 34 of FIG. 3 can be invoked "on the fly" to generate test code for a single type of data record or can be invoked "off line" to generate all of the test code for the entire data base. The constraint code generator 34 may therefore be designed to generate code "on the fly" or to generate the constraint code in advance for all known application data record types. In either case, processor 33 selects the appropriate code to be used for the particular record to be tested at the time actual testing takes place.
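
The choice between generating test code "on the fly" and generating it "off line" can be pictured as a cache of generated checkers that is filled either lazily or in advance. The sketch below is an assumption-laden illustration, not the structure of generator 34 or processor 33: `CheckerCache`, `generate` and the required-field table are hypothetical stand-ins.

```python
class CheckerCache:
    """Holds one generated checker per record type."""

    def __init__(self, generate):
        self._generate = generate      # callable: record_type -> checker function
        self._checkers = {}
        self.generated = []            # generation order, kept for illustration

    def precompile(self, record_types):
        """'Off line' mode: generate every checker before any testing starts."""
        for record_type in record_types:
            self.get(record_type)

    def get(self, record_type):
        """'On the fly' mode: generate a checker the first time it is needed."""
        if record_type not in self._checkers:
            self._checkers[record_type] = self._generate(record_type)
            self.generated.append(record_type)
        return self._checkers[record_type]


def generate(record_type):
    # Hypothetical stand-in for the code generator: a required-field test only.
    required = {"room": ["room_number"], "patient": ["patient_number"]}[record_type]

    def check(record):
        return [f"missing {name}" for name in required if name not in record]

    return check


cache = CheckerCache(generate)
faults = cache.get("room")({"room_type": "private"})   # generated on first use
cache.precompile(["room", "patient"])                  # "room" is not regenerated
```

Either way, the caller selects the checker by record type at the moment of testing, which mirrors the selection step attributed to processor 33.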

In FIG. 4 there is shown a graphical representation of the constraint index table 35 of FIG. 3. The index table of FIG. 4 includes a list of each starting point configuration for the various constraint testing hierarchies in constraint descriptor data base 31 of FIG. 3. For each starting point configuration listed in the index table of FIG. 4, there is listed a node identification number for the descriptor node in data base 31 forming the root of the descriptor hierarchy for that configuration. For example, the first entry in FIG. 4 has a configuration called "room" and the associated node identifier of "200." As will be seen in connection with FIGS. 8-11, one of the configuration hierarchies to be described is called "room" and the root descriptor record for the room hierarchy is 200. All other starting point configurations are likewise listed in the constraint index table of FIG. 4 to enable code generator 34 of FIG. 3 to initiate code generation at the logically correct root of the testing hierarchy.

Before proceeding to a more detailed description of the illustrative embodiment of the present invention, a simplified data base will be described which can then be used to illustrate the principles of the present invention. In FIG. 5 there is shown a typical record from a simplified hospital data base of the entity-relationship type. The hospital entities which must be included in such a data base would include hospital rooms, patients, nurses, doctors, drugs, testing facilities, operating facilities, and so forth. Only data records for rooms and for patients will be shown in detail since the integrity checking code generation of the present invention can be described in terms of this limited portion of the data base.

FIG. 5 illustrates the contents of a typical data node for a hospital room in a hospital data base. Each node of the data base has an internal identifier (100 in FIG. 5), a body portion called "entity" and edge portions called "relationships." Included in the room node of FIG. 5, under the paragraph heading "entity," are the entity attributes entity_type (room), room_number (202), room_type (semi-private) and bed_count (2). Other hospital rooms would be represented by nodes with the same format, but with a different node identifier and with different attribute data values.

Also included in room node 100 are two relationships, one for each of the two patients occupying the beds. Each of these relationships includes the attributes of related_entity_type (patient), related_node_id (101 or 102) and a bed_number for the bed occupied by that patient (1 or 2). For the purposes of this simplified illustration, these basic data elements are all that are required for a room node.

In FIGS. 6 and 7 there are shown data nodes for two patients (101 and 102) corresponding to the patients assigned to the room represented by the room node of FIG. 5. Each of these patient nodes includes an entity portion including the attributes of entity_type (patient), patient_number (55 or 65), and an indication of the room type desired by the patient (room_desired = semi-private). In addition, each of these patient nodes includes a relationship portion indicating the room assigned to that patient. The relationship portion therefore includes the attributes of related_entity_type (room) and the node identifier for the room assigned to that patient (related_node_id = 100). Other patient nodes would, of course, be represented by nodes with the same format, but with different node identifiers and different attribute data values.
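
For readers who prefer to see the three records side by side, the nodes of FIGS. 5, 6 and 7 could be written out as plain dictionaries, as in the sketch below. The attribute names and values are those given in the description; the dictionary layout, the helper `related_nodes`, and the pairing of patient number 55 with node 101 (and 65 with node 102) are assumptions made for illustration.

```python
database = {
    100: {
        "entity": {"entity_type": "room", "room_number": 202,
                   "room_type": "semi-private", "bed_count": 2},
        "relationships": [
            {"related_entity_type": "patient", "related_node_id": 101, "bed_number": 1},
            {"related_entity_type": "patient", "related_node_id": 102, "bed_number": 2},
        ],
    },
    101: {
        "entity": {"entity_type": "patient", "patient_number": 55,
                   "room_desired": "semi-private"},
        "relationships": [{"related_entity_type": "room", "related_node_id": 100}],
    },
    102: {
        "entity": {"entity_type": "patient", "patient_number": 65,
                   "room_desired": "semi-private"},
        "relationships": [{"related_entity_type": "room", "related_node_id": 100}],
    },
}


def related_nodes(db, node_id, entity_type):
    """Follow every edge of one node that points at the given entity type."""
    return [
        db[rel["related_node_id"]]
        for rel in db[node_id]["relationships"]
        if rel["related_entity_type"] == entity_type
    ]


# Patients occupying room node 100, and the room assigned to patient node 101.
occupants = [p["entity"]["patient_number"] for p in related_nodes(database, 100, "patient")]
rooms = [r["entity"]["room_number"] for r in related_nodes(database, 101, "room")]
```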

All of the "edges" in FIGS. 5, 6 and 7 are simple single-headed pointers. As shown in FIG. 2, an edge can also point to more than one entity and thus become a hyperedge. For example, each patient can be prescribed different drugs by different doctors. The prescription relationship would therefore be represented by an edge pointing to both the doctor node and to the drug node.

It is apparent that an application data base of the form described in connection with FIGS. 5-7 could very advantageously be used for functions such as assigning empty hospital beds to new patients. For this purpose, the room records would be scanned, looking for an empty (unassigned) bed. The room entry would then be modified to indicate the assignment of the empty bed to the new patient, and the new patient record modified to indicate the assignment of the new patient to the empty bed. Such uses of the data base, however, are heavily dependent on all of the entries in the data base being accurate. Previous errors in entering data could, for example, result in more than one patient being assigned to the same bed while other beds remain empty. The present invention comprises a convenient, efficient and accurate mechanism for continually insuring the accuracy of all of the data in such data records.
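
The bed assignment use just described can be sketched as a scan followed by two coordinated updates. This is a minimal illustration, assuming a dictionary layout for the records and a hypothetical helper `assign_bed`; node 103 and patient number 75 are invented for the example, and the sketch deliberately ignores the desired room type.

```python
def assign_bed(db, patient_id):
    """Find the first unassigned bed and record the assignment in both nodes."""
    for node_id, node in db.items():
        entity = node["entity"]
        if entity["entity_type"] != "room":
            continue
        taken = {rel["bed_number"] for rel in node["relationships"]
                 if rel["related_entity_type"] == "patient"}
        for bed in range(1, entity["bed_count"] + 1):
            if bed not in taken:
                # Update the room record and the patient record together.
                node["relationships"].append(
                    {"related_entity_type": "patient",
                     "related_node_id": patient_id, "bed_number": bed})
                db[patient_id]["relationships"].append(
                    {"related_entity_type": "room", "related_node_id": node_id})
                return node_id, bed
    return None


db = {
    100: {"entity": {"entity_type": "room", "room_number": 202,
                     "room_type": "semi-private", "bed_count": 2},
          "relationships": [{"related_entity_type": "patient",
                             "related_node_id": 101, "bed_number": 1}]},
    101: {"entity": {"entity_type": "patient", "patient_number": 55},
          "relationships": [{"related_entity_type": "room", "related_node_id": 100}]},
    103: {"entity": {"entity_type": "patient", "patient_number": 75},
          "relationships": []},
}

assignment = assign_bed(db, 103)
```

The scan trusts the existing edges completely, which is why a stale or duplicated edge in a room record leads directly to a wrong assignment.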

It is also readily apparent when considering the data nodes of FIGS. 5, 6 and 7 that many of the data attribute values have inherent relationships which must be maintained in order to insure accuracy of the data base. Such inherent relationships are called "constraints" in that they constrain the values of the data in some way. The room node of FIG. 5, for example, must have a bed count of two if the room type is semi-private, and can have only two edges pointing to assigned patients, one for each bed. These constraints are internal to the room node and hence are called intra-node or intra-record constraints. It should also be noted, however, that the contents of the nodes of FIGS. 6 and 7 must also conform to the contents of the node of FIG. 5. That is, the patient identified in FIG. 5 as occupying one of the beds must also be so identified in FIG. 6 or FIG. 7. These are called inter-node or inter-record constraints. Finally, some constraints are conditional in that the data value of one attribute may depend upon the value of another data attribute. For example, a patient can be assigned to a room of a given type (semi-private or private) only if the patient record indicates a desire for that type of room. Similarly, a room whose room type is semi-private must have a bed count of two. The process of checking the integrity of the data base must include the ability to determine if all of these constraints on the data values have been observed. The present invention is directed toward an improved technique for checking the integrity of data by checking for the proper observation of all constraints on that data.
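
To make the three kinds of constraint concrete, the sketch below checks one room node by hand. It is not the generated code of the invention, only a hand-written illustration of what such code has to test; the function `check_room`, the dictionary layout and the fault messages are assumptions, and patient node 102 is given a deliberately wrong room pointer (999) so that one inter-record fault is reported.

```python
def check_room(db, room_id):
    """Return a list of fault messages for one room node."""
    faults = []
    room = db[room_id]
    entity = room["entity"]
    patients = [rel for rel in room["relationships"]
                if rel["related_entity_type"] == "patient"]

    # Conditional intra-record constraint: semi-private implies two beds.
    if entity["room_type"] == "semi-private" and entity["bed_count"] != 2:
        faults.append("bed_count must be 2 for a semi-private room")

    # Intra-record constraint: at most two edges to assigned patients.
    if len(patients) > 2:
        faults.append("more than two patient edges")

    for rel in patients:
        patient_id = rel["related_node_id"]
        patient = db[patient_id]
        # Inter-record constraint: the patient node must point back to this room.
        back = [r["related_node_id"] for r in patient["relationships"]
                if r["related_entity_type"] == "room"]
        if room_id not in back:
            faults.append(f"patient node {patient_id} does not point back to room node {room_id}")
        # Conditional inter-record constraint: room type must match the patient's wish.
        if patient["entity"]["room_desired"] != entity["room_type"]:
            faults.append(f"patient node {patient_id} did not ask for a {entity['room_type']} room")
    return faults


db = {
    100: {"entity": {"entity_type": "room", "room_type": "semi-private", "bed_count": 2},
          "relationships": [
              {"related_entity_type": "patient", "related_node_id": 101, "bed_number": 1},
              {"related_entity_type": "patient", "related_node_id": 102, "bed_number": 2}]},
    101: {"entity": {"entity_type": "patient", "room_desired": "semi-private"},
          "relationships": [{"related_entity_type": "room", "related_node_id": 100}]},
    102: {"entity": {"entity_type": "patient", "room_desired": "semi-private"},
          "relationships": [{"related_entity_type": "room", "related_node_id": 999}]},
}

faults = check_room(db, 100)
```

Writing such tests by hand for every record type is the effort that the descriptor records are intended to replace.
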

In accordance with the present invention, constraints on the values of data fields in a data base are represented in constraint descriptor records or nodes. To this end, a descriptor node is provided for each type or configuration of data node (room, patient, nurse, doctor, drug, etc.), for each related data node, for each condition to be tested in the configuration node, and for each relevant outcome of the test. Using the same entity-relationship data record format as is used for the data itself, the constraint descriptor nodes contain all of the constraints on the data values in the corresponding data nodes, including intra-node constraints, inter-node constraints and conditional constraints. As an example, an "any_room" configuration room descriptor node is shown in FIG. 8 for specifying all of the constraints on the data values in any room node in the data base. In the entity portion of the data node of FIG. 8 are the attributes of entity_type (descriptor), entity_type_described (room), the node type or configuration (any_room), the possible values of the variable room_type (private or semi-private), and representations of conditional constraints called the if-configuration and the then-configuration. The if-configuration attribute has the value "is_semi-private," which is the value tested to determine whether the if-condition is satisfied. If, and only if, the if-condition is satisfied, is the then-configuration invoked. The then-configuration calls for further testing by a separate descriptor node called "semi-private."

The relationship portion of the room descriptor node of FIG. 8 has the attributes of related_entity_type_described (patient), the max_pointer_count (2, for up to two possible assigned patients) and the related entity type descriptor node identifier (related_descriptor_id = 201). The maximum pointer count could, in proper circumstances, be a minimum pointer count or a fixed pointer count (e.g., exactly two pointers). Using the entity-relationship format of FIG. 2, the room descriptor node of FIG. 8 specifies internal, external and conditional constraints on the data values, and points to other descriptor nodes where further constraints are specified. Moreover, these constraints are described in a format which renders the generation of testing code relatively easy.
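
The room descriptor node of FIG. 8 can be written in the same dictionary layout that one might use for the application records, which is the point of the shared format. The sketch below does so and then applies two of its constraints (the domain of room_type and the maximum pointer count) by reading the descriptor directly. Interpreting the descriptor at run time is a simplification chosen for brevity; the description instead compiles descriptors into code. The key names `if_configuration` and `then_configuration`, the function name and the fault messages are assumptions.

```python
descriptor_200 = {
    "entity": {
        "entity_type": "descriptor",
        "entity_type_described": "room",
        "configuration": "any_room",
        "room_type": ["private", "semi-private"],     # possible values
        "if_configuration": "is_semi-private",
        "then_configuration": "semi-private",
    },
    "relationships": [
        {"related_entity_type_described": "patient",
         "max_pointer_count": 2,
         "related_descriptor_id": 201},
    ],
}


def check_against_descriptor(descriptor, node):
    """Apply the domain and pointer-count constraints of one descriptor."""
    faults = []
    allowed = descriptor["entity"]["room_type"]
    value = node["entity"]["room_type"]
    if value not in allowed:
        faults.append(f"room_type {value!r} is not one of {allowed}")
    for rel in descriptor["relationships"]:
        kind = rel["related_entity_type_described"]
        count = sum(1 for r in node["relationships"]
                    if r["related_entity_type"] == kind)
        if count > rel["max_pointer_count"]:
            faults.append(f"{count} {kind} edges exceed the maximum of {rel['max_pointer_count']}")
    return faults


bad_room = {
    "entity": {"entity_type": "room", "room_type": "ward", "bed_count": 2},
    "relationships": [{"related_entity_type": "patient", "related_node_id": n}
                      for n in (101, 102, 103)],
}
good_room = {
    "entity": {"entity_type": "room", "room_type": "private", "bed_count": 1},
    "relationships": [{"related_entity_type": "patient", "related_node_id": 101}],
}

bad_faults = check_against_descriptor(descriptor_200, bad_room)
good_faults = check_against_descriptor(descriptor_200, good_room)
```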

Since the relationship portion of the room descriptor of FIG. 8 includes a related_entity_type_described attribute having the value "patient," it is necessary to provide a related descriptor node for this related entity type. Thus, in FIG. 9 there is shown a patient descriptor node which describes all of the constraints (internal, external, conditional) on the data values in patient records of the application data base which are related to the room configuration. The body of the patient descriptor record of FIG. 9 includes the attributes of entity_type (descriptor), entity_type_described (patient), configuration (part of any_room), attribute_name (room_type_desired) and attribute_variable (room_type). Note that the patient descriptor node of FIG. 9 is subordinate to and part of a hierarchical constraint tree identified as the any_room configuration and beginning with the room descriptor node of FIG. 8. A separate patient configuration descriptor hierarchy will also exist to test the contents of all patient configuration records of the application data base. The purpose of the patient descriptor node of FIG. 9 is merely to test the patient data relationships with the hospital room record data. Therefore, under the relationship portion of the patient descriptor node of FIG. 9, there are only the two attributes: related_entity_type_described (room) and the related_descriptor_id (200).

It will be noted that the room descriptor node of FIG. 8 specifies both an if-configuration and a then-configuration. The descriptor hierarchy must therefore include a descriptor node for each of these configurations. These descriptor nodes are shown in FIGS. 10 and 11.

In FIG. 10 there is shown a room descriptor node for the "is_semi-private" configuration, which room descriptor node specifies the if-condition (is the room type semi-private?). The room descriptor node of FIG. 10 has only a body portion and includes the attributes of entity_type (descriptor), entity_type_described (room), configuration (is_semi-private), attribute_name (room_type) and attribute_value (semi-private). The "is_semi-private" if-configuration of the descriptor node of FIG. 8 results in the invocation of the room descriptor node of FIG. 10 to test the condition. Moreover, the satisfaction of the if-condition (room type = semi-private) specified in the descriptor node of FIG. 10 results in the invocation of the semi-private then-configuration room descriptor node of FIG. 11.

The semi-private configuration room descriptor node of FIG. 11 has only a body portion which includes the attributes of entity_type (descriptor), entity_type_described (room), configuration (semi-private), attribute_name (bed_count) and attribute_value (2). Thus the room descriptor node of FIG. 11 specifies that, if a room is semi-private, then the bed count must be two. Note that neither FIG. 10 nor FIG. 11 has a relationship portion and neither includes any further if-attributes. This indicates that no further descriptor nodes beyond FIG. 10 or FIG. 11 need be invoked to complete this branch of the data checking function. The any_room hierarchy of data descriptors is fully satisfied when all if-conditions have been tested, when all then or else configurations have been satisfied, and when no further relationship edges remain which have not been investigated.

The descriptor nodes of FIGS. 8, 9, 10 and 11 therefore form a closed hierarchy of data constraint descriptions which completely specify all of the constraints, internal, external and conditional, on the contents of any room configuration application data base record. Similar hierarchies would exist for other configurations of application data base records, i.e., patients, nurses, doctors, drugs, testing equipment, operating rooms, and so forth. Moreover, each such hierarchy has a root descriptor record at which the constraint descriptions logically start. For the any_room configuration hierarchy, it is the room descriptor record of FIG. 8. The purpose of the constraint index table of FIG. 4, then, is to keep track of the root descriptor record identifier for each descriptor record configuration hierarchy necessary to describe the constraints on all record configurations or record types in the application data base. This ensures a logical starting place for generating the constraint testing code.
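
The way the index table leads to a root descriptor, and the root's if- and then-configurations lead to the nodes of FIGS. 10 and 11, might be traced as in the sketch below. It is a simplified walk under stated assumptions: descriptor bodies are flattened to single dictionaries, the relationship portion and the patient descriptor of FIG. 9 are left out, and the identifiers 202 and 203 for the nodes of FIGS. 10 and 11 are hypothetical (the description gives only 200 and 201).

```python
# Constraint index table of FIG. 4: starting configuration -> root descriptor id.
index_table = {"room": 200}

descriptors = {
    200: {"configuration": "any_room",
          "if_configuration": "is_semi-private",
          "then_configuration": "semi-private"},
    202: {"configuration": "is_semi-private",          # assumed node id
          "attribute_name": "room_type", "attribute_value": "semi-private"},
    203: {"configuration": "semi-private",             # assumed node id
          "attribute_name": "bed_count", "attribute_value": 2},
}
by_configuration = {d["configuration"]: d for d in descriptors.values()}


def attribute_matches(configuration, entity):
    """Test the single attribute named by a body-only descriptor node."""
    d = by_configuration[configuration]
    return entity.get(d["attribute_name"]) == d["attribute_value"]


def check(starting_configuration, entity):
    """Start at the root given by the index table and follow if/then."""
    root = descriptors[index_table[starting_configuration]]
    faults = []
    if "if_configuration" in root and attribute_matches(root["if_configuration"], entity):
        then_name = root["then_configuration"]
        if not attribute_matches(then_name, entity):
            faults.append(f"{then_name} constraint failed")
    return faults


semi_faults = check("room", {"room_type": "semi-private", "bed_count": 3})
private_faults = check("room", {"room_type": "private", "bed_count": 1})
```

The then-configuration is consulted only when the if-condition holds, so the private room is never tested against the two-bed rule.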

It will be noted that the format of the constraint descriptor records of FIGS. 8-11 is the same as the format of the application data records of FIGS. 5-7. It is therefore possible to use the same data base manager (30 in FIG. 3) to access, modify or delete the constraint descriptor records as is used to access, modify or delete the application data base records. Indeed, the constraint descriptor data base 31 of FIG. 3 can be a physical part of and intermingled with the application data base 32. A single data base therefore can be used to contain not only the application data, but also an exhaustive description of all of the constraints on the data values in that application data. The constraints therefore are replicated along with the data base, are always available for testing the integrity of the data base, and can be augmented or modified at the same time that the application data is augmented or modified. Indeed, different versions of the same data base can have different, but locally appropriate, versions of the data constraints.

Given a complete set of constraint descriptors of the type described in connection with FIGS. 8-11, it is then necessary to convert the logical testing described in the constraint descriptor records into executable code which actually performs the testing. The constraint descriptor records actually comprise this code in a high level language where field name-field value pairs are used to specify all of the data value tests. For example, the any-room descriptor node contains an explicit specification of all of the possible room types. A relatively simple parser can therefore recover these specifications directly from the descriptor records and compile the necessary object code to actually perform the tests. In particular, the constraint code generator 34 of FIG. 3 generates all global declarations by declaring a variable for each descriptor node. The name of the descriptor node variable is simply a concatenation of the configuration name and the type of node described. These variables are then used to store the node identifiers. Similar declarations are made for all variables specified as attribute names in the descriptor nodes. A main program is compiled which calls a function associated with the root or starting point configuration in the constraint descriptor hierarchy. The main program fetches each node in the application data base with a configuration type that matches the configuration type in the starting point descriptor node. The main program includes all of the intra-record tests and also includes a series of calls to functions containing the tests specified in each of the descriptor nodes related to the root node, i.e., by nodes having a matching node type. These functions are compiled by compiling the data test and condition test from the subservient descriptor node contents.
Once the testing code for a particular configuration of application data record is completed, the next configuration type descriptor node is selected from the constraint index table and the process is continued for all configuration types. All of the configuration types are exhausted when all of the entries in the constraint index table of FIG. 4 have been used.
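The declaration and main-program steps just described might be sketched as follows. This is an illustration under stated assumptions rather than the generator of FIG. 3: the dictionary layout of the descriptor, the underscore used to join the configuration name and node type, and the function names `declare_globals` and `generate_main` are all hypothetical. The generator emits Python source text in place of object code.

```python
def declare_globals(descriptor):
    """Return declaration lines for one descriptor node.

    The node variable name concatenates the configuration name and the
    node type; the "_" separator is an assumption of this sketch.
    """
    node_var = f"{descriptor['configuration']}_{descriptor['node_type']}"
    lines = [f"{node_var} = None  # stores the current node identifier"]
    for attribute in descriptor["attributes"]:
        lines.append(f"{attribute} = None")
    return lines


def generate_main(descriptor):
    """Return source for a main program that visits every matching data node
    and calls the function named after the starting point configuration."""
    config = descriptor["configuration"]
    node_type = descriptor["node_type"]
    return "\n".join([
        "def main(data_base):",
        "    failures = []",
        "    for record_id, record in data_base.items():",
        f"        if record['entity_type'] == {node_type!r}:",
        f"            if not {config}(record_id, data_base):",
        "                failures.append(record_id)",
        "    return failures",
    ])


descriptor = {"configuration": "any_room", "node_type": "room",
              "attributes": ["room_type", "bed_count"]}
declarations = declare_globals(descriptor)
main_source = generate_main(descriptor)
```

The generated `main` expects a function called `any_room` to exist in the namespace where it is executed; producing that function is the job of the per-node function generation described next.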

The code generator 34 of FIG. 3 generates the function code for each subservient descriptor node configuration. The name of the function is the configuration name. Each function expects to be passed the identifier for an application data node of the same type as the starting point node for the configuration hierarchy. The specification of an if-configuration in a descriptor node causes a call to the function corresponding to the if-configuration value. Similarly, then-configurations and else-configurations in a descriptor node cause calls to functions having names corresponding to the then- or else-configuration values. The result of this process is shown for the any-room configuration of FIGS. 8-11 by the pseudo-code in the APPENDIX to this specification. The code necessary to perform other tests on other types of configurations will be readily apparent to those skilled in the art by considering this description and the attached pseudo-code.
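As a hedged illustration of how if-, then- and else-configurations could translate into function calls, the sketch below emits one Python function per descriptor node. The key names `if_configuration`, `then_configuration` and `else_configuration`, and the helper `generate_function`, are assumptions of this example and do not appear in the specification.

```python
def generate_function(descriptor):
    """Emit source for the test function of one descriptor node.

    The function is named after the configuration. An if-configuration
    becomes a call to the condition function; then- and else-configurations
    become calls to the functions bearing those configuration names.
    """
    name = descriptor["configuration"]
    lines = [f"def {name}(record_id):"]
    if "if_configuration" in descriptor:
        lines.append(f"    if {descriptor['if_configuration']}(record_id):")
        then_name = descriptor.get("then_configuration")
        lines.append(f"        return {then_name}(record_id)"
                     if then_name else "        return True")
        else_name = descriptor.get("else_configuration")
        lines.append(f"    return {else_name}(record_id)"
                     if else_name else "    return True")
    else:
        lines.append("    return True")
    return "\n".join(lines)


descriptor = {"configuration": "any_room",
              "if_configuration": "is_semi_private",
              "then_configuration": "semi_private"}
source = generate_function(descriptor)
```

For this descriptor the emitted function calls `is_semi_private` and, only when that succeeds, returns the result of `semi_private`, which mirrors the last part of the any-room pseudo-code in the APPENDIX.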

Occasionally a necessary integrity test cannot be readily represented by the entity-relationship type of descriptor record. One such test is the condition that at least two of the rooms assigned to each nurse in the hospital must be semi-private. In such a case, it is possible to manually write the necessary test function and to provide an attribute in the constraint descriptor which specifies this manual code as its value. This attribute could be called "trap_door_function" and its value, in the above instance, "check_two_semi-private_rooms." The code generator 34 of FIG. 3 would recognize the attribute "trap_door_function" as requiring such manual function code and merely insert a call to a function with a name corresponding to the trap_door_function value. Automatic test code generation is, of course, a significant advantage where the data tests are changing often and in complicated ways.
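A minimal sketch of the trap-door mechanism follows, assuming a Python target. The function name is spelled `check_two_semi_private_rooms` because a hyphen is not legal in a Python identifier; the record layouts, the `MANUAL_FUNCTIONS` table and `generate_trap_door_call` are hypothetical names introduced only for this example.

```python
def check_two_semi_private_rooms(nurse, rooms):
    """Hand-written test: at least two of the nurse's rooms are semi-private."""
    assigned = [rooms[room_id] for room_id in nurse["rooms"]]
    count = sum(1 for room in assigned if room["room_type"] == "semi-private")
    return count >= 2


# Manually written functions the generated code is allowed to call.
MANUAL_FUNCTIONS = {"check_two_semi_private_rooms": check_two_semi_private_rooms}


def generate_trap_door_call(descriptor):
    """Return the source line calling the manual function, or None when the
    descriptor carries no trap_door_function attribute."""
    name = descriptor.get("trap_door_function")
    if name is None:
        return None
    return f"ok = {name}(nurse, rooms)"


rooms = {101: {"room_type": "semi-private"},
         102: {"room_type": "semi-private"},
         103: {"room_type": "private"}}
nurse = {"name": "nurse_a", "rooms": [101, 102, 103]}

call_line = generate_trap_door_call(
    {"configuration": "nurse_rooms",
     "trap_door_function": "check_two_semi_private_rooms"})
namespace = dict(MANUAL_FUNCTIONS, nurse=nurse, rooms=rooms)
exec(call_line, namespace)  # runs the inserted call against the sample data
```

The generator never inspects the body of the manual function; it only inserts the call, so the hand-written test can change without touching the generator.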

The procedure for generating the executable code is shown in the flowchart of FIG. 12. In FIG. 12, leaving start box 40, box 41 is entered where the next starting point is retrieved from index table 35 of FIG. 3. Using the node identifier from the index table, in box 42 the descriptor node itself is retrieved. In box 43, the global variable declarations are created using the name formed by concatenating the configuration name and the node type and storing the data node identifiers. Declarations are also required for all of the attribute variables specified in the descriptor node. Box 43 also generates a standard data node access loop which sequentially accesses each of the data nodes having a configuration type corresponding to the descriptor node configuration. In box 44, the requisite testing functions are generated recursively by retrieving the related descriptor nodes and converting the data value constraints therein described into data value testing code. In box 45, calls to these functions are added to the main program in the sequence in which they are encountered in the descriptor records. Finally, the code thus generated is executed in box 46 to test the data values in the application data base. Alternatively, as indicated by dashed arrow 47, all of the other constraint testing code can be generated and stored, and executed only when requested by the data base manager 30 (FIG. 3).
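The flow of FIG. 12 can be approximated by the simplified Python sketch below. It is a sketch under several assumptions: descriptors are plain dictionaries whose `test` entry is already a Python expression, `related` lists the subservient descriptor nodes, global declarations are omitted, and the bed count of 2 in the sample data is an invented value. None of these names comes from the specification.

```python
def generate_functions(node_id, descriptors, out):
    """Box 44: recursively emit one test function per descriptor node."""
    node = descriptors[node_id]
    related = node.get("related", [])
    for child_id in related:
        generate_functions(child_id, descriptors, out)
    checks = [f"({node.get('test', 'True')})"]
    checks += [f"{descriptors[c]['configuration']}(record)" for c in related]
    out.append(f"def {node['configuration']}(record):\n"
               f"    return {' and '.join(checks)}\n")


def generate_program(index_table, descriptors):
    """Boxes 41-45: build the source text of the whole checking program."""
    out = []
    loop = ["def main(data_base):",
            "    errors = []",
            "    for record in data_base:",
            "        pass  # keeps the loop valid if the index table is empty"]
    for node_id in index_table:                        # box 41: next starting point
        root = descriptors[node_id]                    # box 42: fetch descriptor node
        generate_functions(node_id, descriptors, out)  # box 44: test functions
        # boxes 43 and 45: access-loop filter plus the call to the root function
        loop.append(f"        if record['entity_type'] == {root['node_type']!r} "
                    f"and not {root['configuration']}(record):")
        loop.append("            errors.append(record['id'])")
    loop.append("    return errors")
    return "\n".join(out + loop)


descriptors = {
    1: {"configuration": "any_room", "node_type": "room", "related": [2],
        "test": "record['room_type'] in ('private', 'semi-private')"},
    2: {"configuration": "semi_private", "node_type": "room",
        "test": "record['room_type'] != 'semi-private' or record['bed_count'] == 2"},
}
source = generate_program([1], descriptors)

namespace = {}
exec(source, namespace)                                # box 46: execute the code
data_base = [
    {"id": 10, "entity_type": "room", "room_type": "private", "bed_count": 1},
    {"id": 11, "entity_type": "room", "room_type": "semi-private", "bed_count": 3},
    {"id": 12, "entity_type": "patient"},
]
errors = namespace["main"](data_base)
```

Keeping `source` as a string corresponds to the alternative of dashed arrow 47: the generated text can be stored and executed later on request instead of immediately.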

It can be seen from the above description that the present invention provides a compact, easily altered, readily coded description of all constraints on the values of data fields in a data base. Such constraint descriptions can accompany the data base, be altered as required by alterations in the data base, and compiled into testing code whenever desired, either globally or "on the fly" during use of the application data base.

It should also be clear to those skilled in the art that further embodiments of the present invention may be made by those skilled in the art without departing from the teachings of the present invention.

APPENDIX

/* Pseudo-Code for Generated Program */

main()
{
    for entire data base
        if (entity_type = room)
        {
            retrieve_record
            get record_id
            call any_room(record_id)
        }
}

/* Function any_room */

any_room(record_id)
{
    if (room_type(record_id) != [...] or semi-private)
    {
        then print(error)
             return(failure)
        else current_room_type = room_type(record_id)
    }
    for (related_entity_type = patient)
    {
        if (room_desired != current_room_type)
        {
            then print(error)
                 return(failure)
            else if (number_pointers(current_room_type) != 1)
                 then print(error)
                      return(failure)
        }
    }
    if number_pointers(record_id) > 2
    {
        then print(error)
             return(failure)
    }
    call is_semi_private(record_id)
    if (return = success)
    {
        call semi_private(record_id)
        if (return = failure)
        {
            then print(error)
                 return(failure)
        }
    }
    return(success)
}

/* Function is_semi_private */

is_semi_private(record_id)
{
    if (room_type(record_id) != semi-private)
    {
        then return(failure)
    }
    return(success)
}

/* Function semi_private */

semi_private(record_id)
{
    if (bed_count(record_id) != [...])
    {
        then print(error)
             return(failure)
    }
    return(success)
}

Claims

What is claimed is:

1. A system for detecting errors in a data base comprising means for specifying constraints on the values of data items in said data base, means, responsive to said means for specifying constraints, for generating data checking code, and means for executing said data checking code to detect errors in said data base.

2. The system for detecting errors according to claim 1 wherein said means for specifying constraints further comprises means for specifying constraints on intra-record data values.

3. The system for detecting errors according to claim 1 wherein said means for specifying constraints further comprises means for specifying constraints on inter-record data values.

4. The system for detecting errors according to claim 1 wherein said means for specifying constraints further comprises means for specifying conditional constraints on data values.

5. The system for detecting errors according to claim 1 wherein said means for generating data checking code further comprises means for generating a function to test each data constraint, and means for generating sequential calls to all of said functions.

6. The system for detecting errors according to claim 1 wherein said means for generating data checking code includes means for generating code on the fly during operations with said data base.

7. The system for detecting errors according to claim 1 wherein said means for generating data checking code includes means for generating code off line from operations with said data base.

8. A method for detecting errors in a data base comprising specifying constraints on the values of data items in said data base, generating data checking code in response to constraint specifications resulting from said step of specifying constraints, and executing said data checking code to detect errors in said data base.

9. The method according to claim 8 wherein said step of specifying constraints further comprises the step of specifying constraints on intra-record data values.

10. The method according to claim 8 wherein said step of specifying constraints further comprises the step of specifying constraints on inter-record data values.

11. The method according to claim 8 wherein said step of specifying constraints further comprises the step of specifying conditional constraints on data values.

12. The method according to claim 8 wherein said step of generating data checking code further comprises the steps of generating a function to test each data constraint, and generating sequential calls to all of said functions.

13. The method according to claim 8 wherein said step of generating data checking code includes the step of generating code on the fly during operations with said data base.

14. The method according to claim 8 wherein said step of generating data checking code includes the step of generating code off line from operations with said data base.

15. In a data base system including data records in the entity-relationship format, a data checking system comprising data constraint descriptor records in the same entity-relationship format for describing intra-record, inter-record and conditional constraints on the values in said data records, means for compiling said constraint descriptor records into object code for testing said data records, and means for selectively executing said compiled object code.

16. The combination according to claim 15 wherein each of said data records and each of said constraint descriptor records includes a body portion specifying the entity represented by that data record and optional relationship portions specifying relationships between that data record and other data records.

17. The combination according to claim 15 wherein said conditional constraints comprise if, then and else types of conditions.

18. The combination according to claim 15 wherein said means for compiling further comprises means for accessing said data records in a regular sequence.