WO2005124586A1

WO2005124586A1 - Database interactions and applications

Info

Publication number: WO2005124586A1
Application number: PCT/AU2005/000184
Authority: WO
Inventors: Dorothy Luther; Todor Petkantchin
Original assignee: Arearguard Consulting Pty Ltd
Priority date: 2004-06-17
Filing date: 2005-02-14
Publication date: 2005-12-29
Also published as: AU2004202620B2; AU2004202620A1

Abstract

A method of determining relationships between business data and storage locations of a database that is associated with a respective application, which interacts with the database. The method includes determining application inputs, and first relationships between the business data and the application inputs. Sample input data is then used as an input to the application to allow analyses of the database to determine into which of the number of storage locations the sample data is stored. This allows second relationships to be determined between the application inputs and the database storage locations. The first and second relationships are then used to derive third relationships between the business data and the database storage locations.

Description

DATABASE INTERACTIONS AND APPLICATIONS

Background of the Invention

The present invention relates to a method and apparatus for determining the semantics of a database structure of an application and for producing a data dictionary.

Description of the Prior Art

The reference to any prior art in this specification is not, and should not be taken as, an acknowledgement or any form of suggestion that the prior art forms part of the common general knowledge.

It is common in the computer industry to capture definitions of the types of data that an organisation, such as a company, business or government, uses and the relationships between these types of data. This information is referred to as the 'semantics' of the data. The semantics are used to store the data in an organised way in a database. Over time, the organisation of the database is modified to meet new business requirements. The knowledge of the database organisation and its relationship to the semantics is gradually lost because of these modifications and staff turnover. In the case of an external package system, the organisation using it may never have known the organisation of the data.

The organisation needs to recover the semantics and organisation of the database in a number of cases, including, but not limited to:

• Enhancing the application eg creating new reports from the application.

• Merging the data of existing applications into a single corporate wide database, to be used by the same applications.

• Purchasing of a new computer application eg a Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) package to replace existing application(s).

• Needing to communicate data with other computer applications, e.g. that of a business partner or associate. The task of recovery requires high level expertise about the business and about the application at the data field level. The task is usually very expensive as it is time and resource intensive and requires skilled staff from both IT and business.

Several tools are available in the marketplace that assist the recovery task, particularly with graphical user interfaces and simple reverse engineering functions, but essentially, the task remains a laborious manual process, and expert domain knowledge is still required.

The following are books covering the prior art:

Whitten, J.L. Bentley, L.D. and Dittman, K.C. (2000) Systems Analysis and Design Methods (5^th Ed) - McGraw-Hill Irwin

Shelly, G., Cashman, T. Rosenblatt, H. (1998) Systems Analysis and Design. Course Technology - International Thomson Publishing.

A summary of current research on the prior art can be found in:

Jean Henrard, Jean-Marc Hick, Philippe Thiran, Jean-Luc Hainaut (2002) Strategies for Data Reengineering Notre Dame de la Paix University

Jesύs Bisbal, Deirdre Lawless, Bing Wu, Jane Grimson (1999) Legacy Information System Migration: A Brief Review of Problems, Solutions and Research Issues Computer Science Department, Trinity College, Dublin, Ireland

Summary of the Present Invention

In a first broad form the present invention provides a method of determining relationships between business data and storage locations of a database, the database being associated with a respective application which interacts with the database, the method including: (a) determining application inputs including at least one of: (i) an entity; and (ii) an attribute; (b) determining first relationships between the business data and the application inputs; (c) using sample input data as an input to the application and analysing the database to determine into which of the number of storage locations the sample data is stored, to thereby determine second relationships between the application inputs and the database storage locations; (d) determining third relationships between the business data and the database storage locations using the first and second relationships.

Typically the method further includes: (a) causing the application to generate sample output data, to thereby determine fourth relationships between the business data and application outputs; (b) analysing the database to determine from which of the number of storage locations the sample output data is obtained to thereby determine fifth relationships between the application outputs and the database; and, (c) determining sixth relationship between the business data and the database storage locations using the fourth and fifth relationships.

In a second broad form the present invention provides a method of determining relationships between business data and storage locations of a database, the database being associated with an application which interacts with the database, the method including: (a) causing the application to generate sample output data, to thereby determine fourth relationships between the business data and application outputs; (b) analysing the database to determine from which of the number of storage locations the sample output data is obtained to thereby determine fifth relationships between the application outputs and the database; and, (c) determining sixth relationship between the business data and the database storage locations using the fourth and fifth relationships.

Typically the method includes: (a) determining application inputs including at least one of: (i) an entity; and (ii) an attribute; (b) determining first relationships between the business data and the application inputs; (c) using sample input data as an input to the application and analysing the database to determine into which of the number of storage locations the sample data is stored, to thereby determine second relationships between the application inputs and the database storage locations; (d) determining third relationships between the business data and the database storage locations using the first and second relationships.

The method may further include storing an indication of the relationships in a repository.

The method can further include storing the data in the repository.

The method may further include: (a) determining Input Descriptions from the Application; (b) determining Output Descriptions from the Application; (c) determining additional Input Descriptions and Output Descriptions from Business Knowledge; (d) identifying business entities and attributes described by the application; (e) linking the business entities and attributes to input and output fields of the application using the determined descriptions; (f) extracting information about the physical organisation of the database; and, (g) notifying experts of any database storage locations not yet identified and linked to an input or output, to thereby enable an identity of the storage locations to be resolved.

The method can further include transferring data between a source database and a target database, each database having an associated application, wherein the method includes: (a) identifying seventh relationships between the business data of the source application and the business data of the target application; and, (b) determining eighth relationships between the data storage locations of the source application and data storage locations of the target application, by relating the third and sixth relationships of the source application to the third and sixth relationships of the target application, using the seventh relationship.

The method typically further includes generating data transfer scripts for extraction, transfer and loading data from the source database to the target database.

The method can further include correctly transferring data from one or more source databases to one or more target databases.

The method may include merging at least two source databases into a combined target database, each source database having an associated source application, wherein the method includes: (a) identifying seventh relationships between the business data of all the source applications; and, (b) deducing eighth relationships between the data storage locations of the source applications, by relating the third and sixth relationships of each source application to the third and sixth relationships of the each other source application, using the seventh relationship.

The method can further include: (a) generating scripts for extraction, transfer and loading the data from each of the source databases into the combined database; and, (b) generating view scripts for each source application which presents the combined database to the respective source application as if it is accessing its original database.

In a third broad form the present invention provides apparatus for determining relationships between business data and storage locations of a database, the database being associated with a respective application which interacts with the database, the apparatus including a processor for: (a) determining application inputs including at least one of: (i) an entity; and (ii) an attribute; (b) determining first relationships between the business data and the application inputs; (c) using sample input data as an input to the application and analysing the database to determine into which of the number of storage locations the sample data is stored, to thereby determine second relationships between the application inputs and the database storage locations; (d) determining third relationships between the business data and the database storage locations using the first and second relationships.

In a fourth broad form the present invention provides apparatus for determining relationships between business data and storage locations of a database, the database being associated with a respective application which interacts with the database, the apparatus including a processor for: (a) causing the application to generate sample output data, to thereby determine fourth relationships between the business data and application outputs; (b) analysing the database to determine from which of the number of storage locations the sample output data is obtained to thereby determine fifth relationships between the application outputs and the database; and, (c) determining sixth relationship between the business data and the database storage locations using the fourth and fifth relationships.

The apparatus can be adapted to perform the method of the first or second broad forms of the invention.

In a fifth broad form the present invention provides a database repository for defining relationships between business data and storage locations of a database, the database being associated with a respective application which interacts with the database, the repository including descriptions of: (a) first relationships between the business data and application inputs; (b) second relationships between the application inputs and the data storage locations; and, (c) third relationships between the business data and the data storage locations using the first and second relationships.

In a sixth broad form the present invention provides a database repository for defining relationships between business data and storage locations of a database, the database being associated with a respective application which interacts with the database, the repository including descriptions of: (a) fourth relationships between the business data and application outputs; (b) fifth relationships between the application outputs and the data storage locations; (c) sixth relationships between the business data and the application data storage locations via the fourth and fifth relationships.

The repository can further include descriptions of: (a) seventh relationships between the business data of an application and the business data of another application; and, (b) eighth relationships between the data storage locations of the application and data storage locations of another application.

The repository may further include descriptions of: (a) Multiple applications and their associated databases; (b) Input & output experiment logs; and, (c) Sample data categorised into experiments.

The data can be stored in any one of or a combination of a relational database, object serialisation, object oriented database, logical clauses, lists, files or other technology known to the art.

The repository is typically populated with data using the method of the first or second broad forms of the invention.

In a seventh broad form the present invention provides a method of transferring data between a source database and a target database, each database having an associated application and associated storage locations, wherein the method includes: (a) identifying seventh relationships between the business data of the source application and the business data of the target application; and, (b) determining eighth relationships between the data storage locations of the source application and data storage locations of the target application, by relating the third and sixth relationships of the source application to the third and sixth relationships of the target application, using the seventh relationship, the third and sixth relationships being determined using the first or second broad forms of the invention.

The method can further include generating data transfer scripts for extraction, transfer and loading the data from the source database to the target database.

The method may be used for transferring data from one or more source databases to one or more target databases.

In a eighth broad form the present invention provides a method of merging at least two source databases into a combined target database, each database having an associated application and associated storage locations, wherein the method includes: (a) identifying seventh relationships between the business data of all the source applications; and, (b) determining eighth relationships between the data storage locations of the source applications, by relating third and sixth relationships of each source application to third and sixth relationships of the each other source application, using the seventh relationship, the third and sixth relationships being determined using the method of the first or second broad forms of the invention.

The method can further include: (a) generating scripts for extraction, transfer and loading the data from each source database into the combined database; and, (b) generating view scripts for each source application which presents the combined database to the respective source application as if it is accessing its original database.

Brief Description of the Drawings

An example of the present invention will now be described with reference to the accompanying drawings, in which: -

Figure 1 is an overview of the database matching concept (the Matching Process);

Figure 2 is a schematic diagram of an example of apparatus for performing the processes of Figures 1 through 8; Figure 3 is a data model of the Repository database created by the Matching Process;

Figure 4 is an overview flow chart of the Matching Process;

Figure 4.1 is a flow chart of the first part of the Matching Process, using input data to analyse the database;

Figure 4.2 is a flow chart of the second part of the Matching Process, using output data to analyse the database;

Figure 5 is an overview of the process of converting the database of one application to a database of a second application using the Matching Process of Figure 1 ;

Figure 6 is a flow chart of the conversion process outlined in figure 5;

Figure 7 is an overview of the process of merging multiple databases into a common database using the Matching Process of Figure 1 ;

Figure 7.1 shows the source situation from Figure 7 where there are multiple applications with databases to be merged;

Figure 7.2 shows the target situation from Figure 7 where there are multiple applications with one combined database;

Figure 8 is a flow chart of the merging process outlined in figure 7.

Overview of the Preferred Embodiment

An overview of a preferred embodiment for determining a database data dictionary will now be described with respect to Figure 1 in which:

Description

An example of a methodology which can be used for determining the semantics of a database. This uses, in functional terms, an Application 100 which is an apparatus to perform a useful business process with the aid of a computer. It has: • Inputs 102 which may be screens, files and other input interfaces used to enable data to be provided to the application.

• Outputs 104 which are the results of the application processing. These outputs include reports, enquiry screens in human readable format, output files to be read by other applications and other output interfaces. • A Database 106 which stores the application's files. This may not resemble the Inputs 102 and Outputs 104 in an obvious way.

A Repository 110 is created and filled in by a Matching Process and describes the structure and organisation of the Application's Inputs 102, Outputs 104 and Database

106. It also stores descriptions of: • Business Entities 112 are a class of persons, places, entities, events, or concepts which is of interest to the users of this application. Information about the entity is manipulated by the application. The Matching Process needs to capture and store data about the entity.

• Sample Data 108, which are specific instances of data typically used by the application or produced by the application.

Business Knowledge 114 is information known to business experts about Application 100, which may be stored in documents such as User Manuals or may only be in the business experts' heads.

The methodology broadly consists of two main parts. 1. The Matching Process - a series of steps to progressively reduce the uncertainty about the structure and semantics of the database 106.

2. Applying the Matching Process to one or more application's databases so that the data can be made available for other business purposes, eg to be manipulated by other applications than the application which created the database. The Matching Process causes an apparatus to create and populate the Repository 110 of knowledge about the Database 106 and its relationship to the Application's Inputs 102 and Outputs 104. The steps in the process use Sample Data 108 that make sense to the users and gather existing Business Knowledge 114 from the Business Experts.

Applying the Matching Process causes the apparatus to gather the information about one or more Applications and their Databases and the relationships between the Databases.

These processes may be performed by a Process Expert or performed automatically by the apparatus under the control of the Process Expert. Eg the Process Expert operates the apparatus and communicates with the Business Experts.

It will be appreciated by persons skilled in the art that this is typically performed utilising apparatus which is capable of interacting with data in a database. An example of a suitable processing system will now be described with reference to Figure 2.

In particular, the processing system 10 generally includes at least a processor 20, a memory 21 , an input/output (I/O) device 22, such as a keyboard and display, and an external interface 23 coupled together via a bus 24. The processing system can be coupled to a database 11 via the external interface 25, as shown.

The processing system must be capable of writing data to and from the database, typically by using a suitable query language. Furthermore, the processing system 10 must be able to perform data manipulations as required by the method outlined above.

Accordingly, it will be appreciated that the processing system may be any form of processing system 10 suitably programmed to interact with the database 11. The processing system 10 may therefore be a suitably programmed computer, lap-top, palm computer, or the like. Alternatively, specialised hardware or the like may be used. Detailed Description of Preferred Embodiment

A more detailed description of a preferred embodiment for determining a database data dictionary will now be described with respect to Figure 3 in which:

In Figure 4 the reference numerals are as follows:

In Figure 4.1 the reference numerals are as follows:

In Figure 4.2 the reference numerals are as follows:

Detailed Description

Figure 3 is an expanded description of the Repository 110 from Figure 1. The Repository 110 holds detailed information created and used by the Invention Processes. It contains the data dictionary of each application being studied, plus other information now disclosed.

The Repository 110 contains information about many instances of each entity. Figure 3 and the following description use the field of the art convention of describing each entity in the singular. Only some attributes of entities are listed, and the full list will be obvious to persons skilled in the art.

In Figure 3:

• Application Description 300 describes the Application 100 as shown in Figure 1.

• Input Description 304 describes a particular Input 102 as shown in Figure 1.

• Input Field 314 describes each particular field of the Input Description 304. It records the name used on the screen, and field properties such as length, data type, text descriptions of validation rules.

• Database Description 302 describes the Database 106 as shown in Figure 1.

• DB Table 308 describes a table in the Database 302. It records the table name and any properties such as primary keys and whether the table has indexes. • DB Table Relationship 306 is a link between two Tables 308 where data is related. It records any properties such as the type, cardinality and meaning of the relationship.

• DB Field 318 is a field of data in the Table 308. It records the physical field name and field properties such as size and data type, whether the field is a key field.

• Input Experiment Log 316 records results of the input Matching Process described in Figure 4.1. It shows the relationship between an Input Field 314 and a DB Field 318.

• Output Description 312 describes a particular Output 104 as shown in Figure 1. • Output Field 322 describes each particular field of the Output Description 312. It records the name used on the output, and field properties such as length, data type, text descriptions of calculation rules.

• Output Experiment Log 320 Records results of the output Matching Process described in Figure 4.2. It shows the relationship between an Output Field 322 and a DB Field 318. It records the report field, DB field, experiment number, the probability of each match, the resolution and who resolved it.

• Business entity 328 is a class of persons, places, entities, events, or concepts about which the Matching Process needs to capture and store data. The entity has meaning to business users. It can consist of multiple sub-entities, e.g. an invoice is an entity which has sub-entities customer, product, etc. The entities must adhere to best practice database design standards such as normalisation.

• Business Attribute 326 is an atomic piece of data of interest to the business users, which describes an aspect of a Business Entity 328 e.g. price, weight, age, etc.

• Sample Input Data 330 is all the sample data sets to be used by the Input Matching Process Figure 4.1 , as related to the Business Entities 328. There are enough sample sets to cover all the types of data which have meaning to the business. E.g. all types of accounts or products. The data is internally consistent, is real data not just made up data, and it has meaning to the application users.

• Input Experiment 334 is a view of the Sample Input Data 330 which will be applied to a particular Input Description 304. For example, it may consist of a set of data which can be input on 1 data entry screen of the application.

• Sample Output Data 310 is all the sample data to be used by the output Matching Process Figure 4.2, as related to the Business Entity 328. The sample data is a consistent set of outputs produced by the Application 100, from a particular instance of its Database 106.

• Output Experiment 336 is a view of the Sample Output Data 310 which is described by a particular Output Description 312. it corresponds to one report format or one output file format, for example.

• Attribute Synonym 324 is an alternative name known to the business people for the Business Attribute 326. It records the name and who uses that name.

• Entity Synonym 332 is an alternative name known to the business people for the Business Entity 328. It records the name and who uses that name.

The Matching Process

The Repository 110 is used in the Matching Process as described below. Each stage in the Matching Process records further information in the Repository 110.

The matching process seeks to establish the relationships between 1 ) the business data, 2) the application's inputs and outputs and 3) the application's database.

This can be achieved by using sample business data, which is input to the application according to its interfaces, as used by the business organisation. The matching process then investigates the database to find where the data has been placed by the application, thus finding the input relationships. For outputs, the matching process produces sample outputs & then find where the application obtained the data from the database, thus finding the output relationships. •

Figure 4 gives a high level view of the Matching Process, in which:

• Describe Inputs 400 is a process to gather and record Input Descriptions 304 of Figure 3 from the Application 100 of figure 1. • Describe Outputs 402 is a process to gather and record Output Descriptions 312 of Figure 3 from the Application 100 of Figure 1.

• Other Sources 404 is a process to gather and record additional Input Descriptions 304 and Output Descriptions 312 of Figure 3 from Business Knowledge 114. • Business Entities And Attributes 406 is a process to identify the business entities and attributes described by the application and link them to the input and output fields of the application, gathered and recorded by steps 400, 402, 404 above.

• Extract DB Schema 408 is a process to automatically extract information about the physical organisation of the database, such as the way data is grouped into tables.

• Construct Repository 410 is a process to record the information from steps 400 to 408 in the Repository 110 of Figure 1. Each stage in the Matching Process records further information in the Repository 110.

• Collect and Process Inputs 412 is described in more detail in Figure 4.1. • Select and Process Outputs 414 is described in more detail in Figure 4.2.

• Resolve Unmatched DB Fields 416 is a process to notify the Process experts of any DB Fields 318 not yet identified and linked to an input or output, thus enabling the field identities to be resolved.

The Matching Process is used by a Process Expert to cause information to be recorded in the Repository 110 about the Application 100 and its components (Inputs 102, Outputs 104 and Database 106). The Process Expert gathers the information by using the Application 100 itself and from Business Knowledge 114.

As the following steps describe recording information in the Repository 110, extensive reference is made to Figure 3, which describes the Repository 110 in detail.

In the process Describe Inputs 400, the Process Expert gathers and records Input Descriptions 304 of Figure 3 gathered and recorded from the application Inputs 102 from Figure 1 , where business experts interpret the physical embodiment. The purpose is to identify all input fields of the Application that are used by the business users and to identify business rules that define what processes are performed on the data. This ensures the collection of sample data that falls within each business rule.

In the process Describe Outputs 402 the Process Expert gathers and records Output Descriptions 312 of Figure 3 gathered and recorded from the application Outputs 104 from Figure 1 , where business experts interpret the physical embodiment. The purpose is to identify all output fields of the Application that are used by the business users and to identify business rules that interpret the content of each output field. ln the process Other Sources 404 the Process Expert gathers and records Input Descriptions 304 and Output Descriptions 312 of Figure 3 gathered and recorded from Business Knowledge 114, where business experts interpret the information. The purpose is to identify additional business rules not found by steps 400 and 402. This ensures the collection of sample data that falls within each business rule.

In the process Business Entities And Attributes 406 the business experts identify business entities and business attributes from the Input Descriptions 304 and Output Descriptions 312 discovered by steps 400, 402 and 404. The Process Expert records Business Entities 328 of Figure 3 and Business Attributes 326 of Figure 3. Then the Process Expert links the Business Attributes 326 of Figure 3 to the Input Fields 314 of Figure 3 and Output Fields 322 of Figure 3 of the application.

The process Extract DB Schema 408 automatically extracts the structural organisation of the database and records it in Database Description 302, DB Table 308 of Figure 3, DB Field 318 of Figure 3 and DB table Relationship 306 of Figure 3.

The process Construct Repository 410 automatically uses the information from the processes:

• Describe Inputs 400,

• Describe Outputs 402,

• Other Sources 404, • Business Entities And Attributes 406 and

• Extract DB Schema 408

to create the working data structures: Input Experiment Log 316, Output Experiment Log 320, • Sample Output Data 310, Attribute Synonym 324, Entity Synonym 332, Sample Input Data 330, Input Experiment 334, • Output Experiment 336 and the relationships: Input Field 314 To Input Experiment Log 316, Input Experiment Log 316 To DB Field 318, Output Field 322 To Output Experiment Log 320, Output Experiment Log 320 To DB Field 318, Input Field 314 To Business Attribute 326 And Business Attribute 326 To Output Field 322

of the Repository 110 of Figure 1.

The process Collect and Process Inputs 412 is described in more detail in Figure 4.1 , which discovers th relationships between Input Fields 314 and DB Fields 318. This step requires the use of a test instance of the Application 100 and its Database 106.

The process Select and Process Outputs 414 is described in more detail in Figure 4.2 which discovers th relationships between Output Fields 322 and DB Fields 318.

After completion of processes Collect and Process Inputs 412 and Select and Process Outputs 414, the process Resolve Unmatched DB Fields 416 is used to notify the experts of any DB Fields 318 from Figure 3 that have not yet been matched. The said DB fields have not been matched either because:

• they are internal to the application or because

• they do not appear on any of the inputs or outputs of interest to the users of the application.

The experts must decide whether they need knowledge about these DB fields. If the experts do decide to investigate the unmatched DB fields, the task will be finished by methods known to the prior art. The investigation will be greatly reduced by the processes 412 and 414, as compared to investigation methods known to the prior art.

It will be appreciated from the above that the databases to be analysed by the Matching Process must allow access by independent means outside the applications that created and maintain them. That is, they must support an external DB query Application Programming Interface (API) such as JDBC, ODBC, ADO, etc to enable the requisite tests to be performed. The preferred software tools used to implement the Matching Process are based on SQL with a relational database for the Repository 110. Any other method known to the current art could be used, such as an 00 database.

As will now be described in more detail with respect to Figures 4.1 and 4.2, the first part of the Matching Process uses a test instance of the application and its database, which is exclusively used by the Matching Process. The second part of the Matching Process uses a frozen copy of the production instance of the application database, plus contents of all relevant enquiry screens and reports produced from the same database instance.

Collect and Process Inputs

The Collect and Process Inputs process is the first sub-process of the matching process. It seeks to establish the relationships between 1 ) the business data, 2) the application's inputs and 3) the application's database. This can be achieved by using sample business data, which is input to the application according to its interfaces, as used by the business organisation. The Collect and Process Inputs process then investigates the database to find where the data has been placed by the application, thus finding the input relationships.

Figure 4.1 describes the first part of the Matching Process in more detail. • Collect Sample Input Data 418 is a process which selects a subset of input items from the Sample Data 108 of Figure 1 and stores the subset tests in Sample Input Data 330 of Figure 3.

• Apply An Input Experiment 420 is a process to input an unmatched input sample set of data to the application's input methods. • Application Stores In Database 422 is a process where the application 100 performs its usual processes on the input sample set from step 420.

• Identify Changes To DB 424 is a process to find the changes made to the database by step 422.

• Match Changes To Input Experiment 426 is a process to find the equivalence between the input sample set and the database changes found in step 424. • Log in Repository 428 is a process to record the equivalences found in step 426.

• Finished All Sets? 430 is a process to repeat steps 418 to 428 for each input sample set collected.

• Results Approved? 432 is a process to repeat steps 418 to 430 if a domain expert deems that further tests are required.

The operation of the Collect and Process Inputs process requires a working test instance of the application, preferably with no business data in the database. The operation is as follows:

In the process Collect Sample Input Data 418, the Process Expert selects a subset of input items from the Sample Data 108 from Figure 1 and stores the subset of tests in Sample Input Data 330 from Figure 3. The Sample Data must be common examples of real data which conforms to business rules. The Sample Data should cover all common conditions and boundary conditions, but error condition tests are not required. Such data is commonly available as standard extracts from the production data of the Application 100 of Figure 1.

The process Apply an Input Experiment 420 is used by the Process Expert to cause:

. the current status of the application Database 106 of Figure 1 to be marked;

. an unapplied Input Experiment 334 from Figure 3 to be entered via the application Inputs 102 as for the normal production processes of the Application 100 of Figure 1.

The process Application Stores In Database 422 is used by the Process Expert to cause the Application 100 of Figure 1 to perform its normal function of writing data into its application Database 106 of Figure 1. The data may be stored as it was input, or may be manipulated by the Application 100 of Figure 1 before storage.

The process Identify Changes To DB 424 automatically finds the changes made by the Application 100 of Figure 1 when processing the Input Experiment 334.

The process Match Changes To Input Experiment 426 automatically identifies the equivalence between the Input Fields 314 of Figure 3 used by this test and the changed fields of the application Database 106 of Figure 1. Data converted to a foreign key is also discovered.

The process Log in Repository 428 automatically records the equivalences found by process 424 and 426 into Input Experiment Log 316 of Figure 3.

The process Finished All Sets? 430 is used by the Process Expert to return to the process Apply an Input Experiment 420, if there are still unused Input Experiments 334 of Figure 3.

In the process Results Approved? 432 a Business expert reviews the results stored into Input Experiment Log 316 of Figure 3. If the results are incomplete, further input tests may be required, by returning to Collect Input Sets 418.

Where data is manipulated by the application before storage, the Collect And Process Inputs process may not be able to find a match directly. For example if the data is modified, eg encrypted. The process can identify changed fields so a Business Expert can resolve the matching. This will only occur occasionally in practice and the resolution is known to the prior art.

The process Apply An Input Experiment 420 causes an unapplied input sample set of data to be input to the application screens. This can be done manually as for the normal production processes of the application or by using a test tool known to the prior art to automatically read and process a sample input file.

Select and Process Outputs

The Select and Process Outputs process is the second sub-process of the Matching Process. It seeks to analyse the derived data calculated by the application, such as charges and accumulated totals. This can establish the relationships between the remaining 1 ) business data, 2) application outputs and 3) application's database, not found from the Collect and Process Inputs process in Figure 4.1.

This can be achieved by using sample business data, which is output from the application according to its interfaces, as used by the business organisation. The Select and Process Outputs process then investigates the database to find where the application obtained the data from the database, thus finding the output relationships.

Figure 4.2 describes the second part of the Matching Process in more detail, where::

• Select Output Experiments 434 is a process which selects a subset of output items from the Sample Data 108 of Figure 1.

• Produce An Output Experiment 436 is a process to use the Application 100 to prepare output sample data and store it in the Repository 110 of Figure 1.

• Match Changes to Output Experiment 438 is a process to compare an Output Experiment 336 with any changes made to the Database 106 in process 436. • Match Database to Output Experiment 440 is a process to compare an Output Experiment 336 with the DB Fields 318, to find possible matches of the output experiment data.

• Finished All Sets? 442 is a process to repeat steps 436 to 440 for each Output Experiment 336. • Calculate Match Probabilities 444 is a process to calculate the probability of each Output Field 322 matching each DB Field 318, based on the number of different matching fields found.

• Expert Selects Results To Evaluate 446 is a process to produce a report or display for the process expert, to consider which probable matches from step 444 are actual matches.

• Log Preferred Matches In Repository 448 is a process to record each match selected by the expert during step 446 in the Repository 110.

• Finished All Sets? 450 is a process to return to step 446 if the process expert wishes. The process expert drives this process and can start and stop at will until all fields have been reviewed.

• Results Approved? 452 is a process to return to step 434 if the results of the tests are inconclusive, so that further tests can be created and run.

The Operation of the Select and Process Outputs process requires a copy of a production database which can be used to produce the selected output experiments. The operation is as follows:. An output experiment will usually contain both matched and unmatched output fields. The matched output fields have been input and new output fields have been calculated by the application. The matched output fields are used to select only rows of the database that refer to that instance of the business entity. Unmatched DB fields in these rows are then searched for occurrences of the unmatched output fields.

In the process Select Output Experiments 434 the Process Expert chooses output items to produce from the Sample Data 108 from Figure 1. The Sample Data 108 are actual outputs of the Application 100 of Figure 1. The same instance of the Database 106 used to produce the Sample Data 108 is also used for the following processes.

In the process Produce An Output Experiment 436, the Process Expert uses the Application 100 of Figure 1 to produce its standard Outputs 104. The Outputs 104 are treated as Sample Data 108 and are stored in Sample Output Data 310 of Figure 3. Each separate output 104 produced is treated as an Output Experiment 336. Any changes to the Database 106 are also gathered and recorded automatically.

The process Match Changes to Output Experiment 438 automatically compares the Output Experiment 336 of Figure 3 with the changes gathered and recorded in process 436. Results are recorded in the Output Experiment Log 320 of Figure 3. This step allows for the Application 100 making changes to its Database 106 while producing its standard Outputs 104.

The process Match Database to Output Experiment 440 automatically compares the Output Experiment 336 from Figure 3 with the content of Database 106 from Figure 1. Matches are recorded in the Output Experiment Log 320 from Figure 3. Many matches may be found for each item in the Output Experiment. These are possible sources of the Output Experiment 336 data.

Step 440 only matches DB Fields 318 not yet matched by the Input Matching Process described in Figure 4.1 ie not yet associated with a Business Attribute 326. Where the Output Experiment 336 contains fields which are already associated with a Business Attribute 326, this information is used to narrow the field of search for step 440. The Process Expert uses the process Finished All Sets? 442 to return to the process Produce An Output Experiment 436, if there are still unused Output Experiments 336 from Figure 3.

The process Calculate Match Probabilities 444, automatically finds the probability of each Output Field 322 matching a sub-set of the DB Field 318, based on the number of different matches found. Results are recorded in the Output Experiment Log 320 from Figure 3.

Many matches may be found for each item in the Output Experiment 336. This is why probabilities are used.

The process expert uses the process Expert Selects Results To Evaluate 446 to display, as requested:

• An Output Field 322 and all the matching DB Fields 318 in probability order

• A DB Field 318 and all the matching Output Fields 322 in probability order

The process expert identifies which probable matches are actual matches.

In the process Log Preferred Matches In Repository 448, each match selected by the process expert is automatically recorded in the Repository in Output Experiment Log 320, along with the name of the expert.

The process expert uses the process Finished Review? 450, to return to the process Expert Selects Results To Evaluate 446 if the expert chooses. The expert drives this process and can start and stop at will until all fields have been reviewed.

The Process Expert uses the process Results Approved? 452 to return to the process Select Sample Output Data 434 if the results of the tests are inconclusive. Further tests are then created and run.

Following the matching process described above, the determined repository can be used to perform a number of database interactions as will now be described. Converting a Source Database to a Target Database

An example of the process for converting a source database to a target database will now be described with respect to Figure 5 in which:

In Figure 6 the reference numerals are as follows:

In this example, the repository is used to convert a source database to a target database as shown in overview in Figure 5, in which: • The Source Application 500 is an instance of Application 100 from Figure 1. It maintains the Source Database 514, which data will be transferred to the Target Database 518. • The Target Application 502 is an instance of Application 100 from Figure 1. It maintains the Target Database 518, which will receive the data from the Source Database 514. • The External User View - Source 504 is an instance of Inputs 102 and Outputs 104 from Figure 1. • User Mapping 506 is the process of discovering the relationship between the two applications' external views. It is described in detail in Figure 6. • The External User View - Target 508 is an instance of Inputs 102 and Outputs 104 from Figure 1. • Matching - Source 510 is the process of discovering the relationship between the external and internal views of the Application 500, using the Matching Process of Figure 4.

• Matching - Target 512 is the process of discovering the relationship between the external and internal views of the Application 502, using the Matching Process of Figure 4.

• The Source Database 514 contains data as it is stored by the Application 500. It is an instance of Database 106 from Figure 1.

• Automatic Mapping 516 is the process of deriving the relationship between the databases of the Source Application 500 and the Target Application 502.

• The Target Database 518 contains data as it is stored by the Application 502. It is an instance of Database 106 from Figure 1.

The database conversion process is shown in more detail in Figure 6, in which:

• Capture Equivalence 600 is a process to gather and record the equivalence between the Source External User View 504 and the Target External User View 508 of Figure 5.

• Apply Matching 602 is a process to apply the Matching Process (Figure 4) to the Source Application 500 and Target Application 502.

• Deduce Equivalence 604 is a process to deduce the equivalence between the DB Fields 318 of the Source Database 514 and the Target Database 518.

• Report Unmatched Fields 606 is a process to report to the experts on any DB Fields 318 not matched by step 604.

• Generate Conversion Script 608 is a process to create the Automatic Mapping 516 for transferring the data from the Source Database 514 to the Target Database 518.

• Cleanse Source Database 610 is a process to perform additional processes on the Source Database 514 using methods known to the prior art.

• Run Conversion Script 612 is a process to transfer the required data from the Source Database 514 to the Target Database 518.

The Source Application 500 and the Target Application 502 cover the same business area, perform essentially the same business functions which are of interest to the business users and store essentially the same data, probably in different structures. The applications may be capable of performing additional business functions, but this is not of interest in the current situation. It is a common business requirement to transfer the contents of the Source Database 514 to the Target Database 518.

During operation, in the process Capture Equivalence 600, the Process Expert gathers and records the equivalence between the Source External User View 504 and the Target External User View 508 of Figure 5. This information is provided by the business experts. The Repository 110 of Figure 1 then has the information about the equivalence between Business Attributes 326, Inputs 304 and Outputs 312 of Figure 3 for Source Application 500 and Target Application 502.

The Process Expert uses the process Apply Matching 602 to apply the Matching Process (Figure 4) to the Source Application 500. The Repository 110 of Figure 1 then has the information about the equivalence between Inputs 304, Outputs 312 and DB Fields 318 of Figure 3 for Source Application 500.

The Process Expert also uses the process Apply Matching 602 to apply the Matching Process (Figure 4) to the Target Application 502. The Repository 110 of Figure 1 then has the information about the equivalence between Inputs 304, Outputs 312 and DB Fields 318 of Figure 3 for Target Application 502.

The process Deduce Equivalence 604 uses the information collected in the Repository 110 from steps 600 and 602 to automatically deduce the equivalence between the DB Fields 318 of the Source Database 514 and the Target Database 518.

The Process Expert uses the process Report Unmatched Fields 606 to report on any unmatched DB Fields 318. Cases include:

• DB fields in the Source Database 514 have no equivalence in the Target Database 518. • DB fields in the Target Database 518 have no equivalence in the Source Database 514.

The Business Experts must decide what to do about these cases, using methods from the prior art. The effort required for this is greatly reduced by the equivalences already resolved by the process. The process Generate Conversion Script 608 automatically creates the Automatic Mapping 516 for transferring the data from the Source Database 514 to the Target Database 518. This uses methods known to the prior art, known as a conversion ETL (Extract, Transfer, Load) script.

The process Cleanse Source Database 610 must be done before step 612 can be done. It uses methods known to the prior art to perform additional steps on the Source Database 514:

1. Data integrity checks to identify and correct any invalid data which may have crept into the database over the years. 2. Structural (referential) integrity checks to identify and correct any rows which are not correctly linked to their parent rows.

The process Run Conversion Script 612 automatically transfers the required data from the Source Database 514 to the Target Database 518, using the Auto Mapping 516 from step 608.

Capturing and recording the equivalence between the input and output of both applications can be largely a manual task, using the input and output sample sets as tools to facilitate discussion. The same sample sets are used for both applications. This step may be done by skilled users of each application sitting together and entering the same data into their respective applications. A software mechanism known to the prior art can be used to record their input actions so that the actions can be recorded in the Repository.

Merging Multiple Databases

An example of the process for merging multiple source databases into a target database will now be described with respect to Figure 7 in which:

In Figure 7.1 the reference numerals are as follows:

712 Database3 714 DatabaseN In Figure 7.2 the reference numerals are as follows:

In Figure 8 the reference numerals are as follows:

The source situation is shown in Figure 7.1 , in which: Application 1 700 is the first application to be processed. Application 2 702 is the second application to be processed. Application 3 704 is the third application to be processed. Application N 706 is the 'Nth' application to be processed. Databasel 708 is the Database of Application 1 700 Database2 710 is the Database of Application2 702 Database3 712 is the Database of Applications 704 DatabaseN 714 is the Database of ApplicationN 706

The target situation is shown in Figure 7.2, in which: • Application 720 is the first application after processing. • Application2 722 is the second application after processing. • Applications 724 is the third application after processing. • ApplicationN 726 is the 'Nth' application after processing. • Viewl 728 is the view of the Combined Database 738 that is presented to Applicationl 720. It is indistinguishable from the Databasel 708 that the Applicationl 720 used before.

• View2 730 is the view of the Combined Database 738 that is presented to Application2 722. It is indistinguishable from the Database2 710 that the Application2 722 used before.

• View3 732 is the view of the Combined Database 738 that is presented to Applications 724. It is indistinguishable from the Database3 712 that the Applications 724 used before. • ViewN 734 is the view of the Combined Database 738 that is presented to ApplicationN 726. It is indistinguishable from the DatabaseN 714 that the ApplicationN 726 used before.

• Database View Layer 736 is the process of combining the data from the various applications into one database. • Combined Database 738 is the single database that contains all the data used by all the applications.

In the following description, the applications involved cover different business areas but store some common data, probably in different structures. It is a common business situation to have several applications 700, 702, 704, ...706 with separate databases 708, 710, 712, ...714 as in Figure 7.1 'Current Situation'.

There is a requirement to merge the databases into one combined database as in Figure 7.2 'Required Situation' so the common data is only stored once.

This enables business people to better manage their business by having faster access to accurate information both for analysis and managing day to day business transactions. At the system level, it provides the benefits of referential integrity, data quality, data consistency and transactional integrity.

The required situation is as in Figure 7.2 'Required Situation', in which:

• Applicationl 700 of Figure 7.1 is the same as Applicationl 720 of Figure 7.2.

• Application2 702 of Figure 7.1 is the same as Application2 722 of Figure 7.2. • Applications 704 of Figure 7.1 is the same as Applications 724 of Figure 7.2.

• ApplicationN 706 of Figure 7.1 is the same as ApplicationN 726 of Figure 7.2. • Applicationl 720 accesses the Combined Database 738 through the Database View Layer 736, using Viewl 728.

• Application2 722 accesses the Combined Database 738 through the Database View Layer 736, using View2 730. • Applications 724 accesses the Combined Database 738 through the Database View Layer 736, using View3 732.

• ApplicationN 726 accesses the Combined Database 738 through the Database View Layer 736, using ViewN 734.

The merging process can be applied to any number of applications.

An example of the the merging process is shown in Figure 8, in which:

• Capture Equivalence 800 is a process to gather and record the equivalence between the input and output of the Applications 700, 702, 704, 706.

• Apply Matching 802 is a process to apply the Matching Process from Figure 4 to each of the source Applications 700, 702, 704, 706. • Deduce Equivalence 804 is a process to automatically deduce the equivalence between the Databases 708, 710, 712, 714 of source applications 700, 702, 704, 706.

• Design Database 806 is a process to design a Combined Database which includes all unique fields of the Databases 708, 710, 712, 714 of source Applications 700, 702, 704, 706.

• Generate View Sets 808 is a process to generate the Views 728, 730, 732, 734 of the Combined Database 738 to match each of the Databases 708, 710, 712, 714 and create the conversion scripts to convert Databases 708, 710, 712, 714 to the Combined Database 738. • Cleanse Source Databases 810 is a prior art process to resolve any data errors in the Databases 708, 710, 712, 714.

• Populate Combined Database 812 is a process to automatically populate the Combined Database 738, including all unique fields of the Databases 708, 710, 712, 714.

Operation of the process will now be described. The Process Expert uses the process Capture Equivalence 800 to gather and record the equivalence between the source external user views of the Applications 700, 702, 704, 706 of Figure 7. The Repository 110 of Figure 1 then has the information about the equivalence between Business Attributes 326, Inputs 304 and Outputs 312 of Figure 3 for the Applications 700, 702, 704, 706 of Figure 7.

The Process Expert uses the process Apply Matching 802 to apply the Matching Process (Figure 4) to the Applicationl 700 of Figure 7.1. The Repository 110 of Figure 1 then has the information about the equivalence between Business Attributes 326, Inputs 304, Outputs 312 and DB Fields 318 of Figure 3 for the Applicationl 700 of Figure 7.1.

The Process Expert uses the process Apply Matching 802 to apply the Matching Process (Figure 4) to the Application2 702 of Figure 7.1. The Repository 110 of Figure 1 then has the information about the equivalence between Business Attributes 326, Inputs 304, Outputs 312 and DB Fields 318 of Figure 3 for the Application2 702 of Figure 7.1.

The Process Expert uses the process Apply Matching 802 to apply the Matching Process (Figure 4) to the Applications 704 of Figure 7.1. The Repository 110 of Figure 1 then has the information about the equivalence between Business Attributes 326, Inputs 304, Outputs 312 and DB Fields 318 of Figure 3 for the Applications 704 of Figure 7.1.

The Process Expert uses the process Apply Matching 802 to apply the Matching Process (Figure 4) to the ApplicationN 706 of Figure 7.1. The Repository 110 of Figure 1 then has the information about the equivalence between Business Attributes 326, Inputs 304, Outputs 312 and DB Fields 318 of Figure 3 for the ApplicationN 706 of Figure 7.1.

The process Deduce Equivalence 804 automatically uses the information collected in the Repository 110 from steps 800 and 802 to deduce the equivalence between the DB Fields 318 of the source Databases 708, 710, 712, and 714. The Process Experts use the process Design Database 806 to finalise the design of the Combined Database 738 which includes all unique fields of the Databases 708, 710, 712, 714 of source Applications 700, 702, 704, 706. The database design is recorded in the Repository as the Business Attributes 326 and Business Entities 328 of the combined database. The Process Expert uses the process Design Database 806 to create:

• the Database Description 302;

• the DB Table 308 and the DB Relationships 306 for each Business Entity 328;

• the DB Field 318 for each Business Attribute 326

for the Combined Database 738.

The process Generate View Sets 808 automatically creates the Viewl 728, View2 730, View3 732 and ViewN 734. A view is a set of DB views, DB procedures and/or programs which presents the combined database to the relevant application as if it is accessing its original database.

The process Generate View Sets 808 also automatically creates a conversion ETL script for transferring the data from the Databasel 708, the Database2 710, the Database3 712, the DatabaseN 714 to the Combined Database 738. This uses methods known to the prior art, known as ETL (Extract, Transfer, Load) scripts.

The process Cleanse Source Database 810 must be done before step 812 can be done. It uses methods known to the prior art to perform additional steps on the databasel 708, the database2 710, the database3 712, the databaseN 714:

1. Data integrity checks to identify and correct any invalid data which may have crept into the database over the years.

2. Structural (referential) integrity checks to identify and correct any rows which are not correctly linked to their parent rows.

The process Populate Combined Database 812 automatically runs the ETL scripts to populate the combined database 738, including all unique fields of the databases 708, 710, 712 and 714.

Conclusion Accordingly, whilst prior art solutions attempt to solve the problem addressed by matching business attributes directly to database fields, the process described above uses two smaller steps, first mapping business attributes to application input and output, then mapping the input and output to the database. This is a much simpler process than those used in the prior art.

A number of additional variations are also within the scope of the above. Some examples will now be described

Periodic process to detect and transfer only the changes between databases. This uses the technique explained in Figure 4 to detect the changes. This technique could be used for exchanging updates:

• between applications belonging to the same organisation, where a decision has been made not to fully integrate them.

• Business to Business (B2B),

• Business to Customer (B2C),

The format of the updates may be in EDI, XML or other formats known to the art.

Another use of the Matching Process is to enable the database structure to be understood by application developers employed by the organisation so that they can add functions to the application, e.g. additional reporting.

The Merging Multiple Databases embodiment described in Figures 7 and 8 can be used in a number of ways. The organisation may adopt a gradual approach, merging two databases and then adding further databases one at a time, or any other combination.

The system therefore provides a methodology and an apparatus for discovering the semantics of a database structure, which thereby aids with database interactions such as merging databases, conversion of databases, or accessing the information from one combined database. This therefore increases the level of automation and improves the process of recovery of the organisation and semantics of databases, which in turn significantly reduces the demands on the time of people with domain knowledge in both the IT and business areas. The process described above not only discovers the data dictionary for the database of one application, it can be used to discover the data dictionaries for the databases of several applications and the relationships between them. The data dictionaries and their relationships can then be used in database interactions.

Throughout the above and according to Davis, G. "Encyclopedic Dictionary of Management Information Systems" Blackwell 1999, "The data dictionary is a reference work of data about data, i.e. metadata. It defines each data element contained in an information system, specifies both its logical & physical characteristics, and provides information concerning how it is used. Historically, the data dictionary was created to extend the information about data provided by the database schema. The database schema ...contains sufficient information for computer access & processing. However, the database schema usually contains insufficient information for those who use, manage & maintain the database."

Persons skilled in the art will appreciate that numerous variations and modifications will become apparent. All such variations and modifications which become apparent to persons skilled in the art should be considered to fall within the spirit and scope that the invention broadly appearing before described.

APPENDIX A

An example illustrating the matching process, will now be described with respect to figure 4.

Step 400 - Describe Inputs.

The example has 2 screens - add product & receive goods, with fields as follows: Input Screen - Add Product Fields Stock Number Stock Item Name Buy price Reorder level Order units

Input Screen - Receive Goods Attribute Item code Qty received

Step 402 - describe outputs.

The example has 1 report - inventory report, with fields as follows: Output Report - Inventory Report Fields Product code Product name Buy price Stock on hand Reorder level Order units

Step 404 - Other Sources.

None for this simple example.

Step 406 - Business Entities and Attributes.

There are 2 entities - product and receipt of stock, with attributes and synonyms as follows:

Step 408 - Extract DB Schema.

The database has 1 table with field names in Russian, as follows: DB Fields Nomer Tovara Tovar Tzena Nalitchnost Novo Zakaza Blok Zakaza

Step 410 - Construct Repository.

All the above information is stored in the repository.

Step 412 - Collect and Process Inputs.

Here we go to Figure 4.1 and execute those steps in sequence.

Step 418 - Collect Test Input Data.

Business Entity - Product. A sample set is selected:

Step 420 - Apply An Input Experiment.

Sample 1 is applied to the Input Screen - Add Product.

Step 422 - Application Stores In Database

Step 424 - Identify Changes To DB.

The database now contains the following information:

Step 428 - Log In Repository.

The relationships discovered are recorded in the repository. Step 430 - Finished All Sets?

We now apply the second example set of data.

Step 420 - Apply An Input Experiment.

Sample 1 is applied to the Input Screen - Add Product.

Step 422 - Application Stores In Database

Step 432 - Results Approved?

The Process Expert reviews and approves the results. The Qty Received is wrongly related to Nalitchnost (Stock on Hand). This will be discovered by the next step.

Step 414 - Select and Process Outputs.

Here we go to Figure 4.2 and execute those steps in sequence.

Step 434 - Select Output Experiments

Only 1 experiment is selected for this example. It is a report from the live database.

Step 436 - Produce An Output Experiment Run the Inventory Report from the application, which produces:

Step 438 - Match Changes To Output Experiment

The application does not make any changes to the database while producing this report.

Step 440 - Match Database To Output Experiment

Creates the following relationships which are potential matches. Because we are only using 1 sample, we only get

1 set of relationshi s:

Step 442 - Finished All Sets? Yes

Step 444 - Calculate Match Probabilities

All are 100% matches because we only have 1 sample.

Step 446 - Expert Selects Results To Evaluate

We have a conflict because the DB Field Nalitchnost (Stock on Hand) is related to both Qty Received and Stock on Hand. Additional Receipt of Stock and Inventory Report experiments will resolve this.

Step 448 - Log Preferred Matches In Repository Step 450 - Finished Review? Yes Step 452 - Results Approved? Yes

The result is that the following relationships between business attributes, inputs and outputs and the database have been created.

Claims

THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS:

1 ) A method of determining relationships between business data and storage locations of a database, the database being associated with a respective application which interacts with the database, the method including: (a) determining application inputs including at least one of: (i) an entity; and (ii) an attribute; (b) determining first relationships between the business data and the application inputs; (c) using sample input data as an input to the application and analysing the database to determine into which of the number of storage locations the sample data is stored, to thereby determine second relationships between the application inputs and the database storage locations; (d) determining third relationships between the business data and the database storage locations using the first and second relationships.

2) A method according to claim 1 , wherein the method further includes: (a) causing the application to generate sample output data, to thereby determine fourth relationships between the business data and application outputs; (b) analysing the database to determine from which of the number of storage locations the sample output data is obtained to thereby determine fifth relationships between the application outputs and the database; and, (c) determining sixth relationship between the business data and the database storage locations using the fourth and fifth relationships.

3) A method of determining relationships between business data and storage locations of a database, the database being associated with an application which interacts with the database, the method including: (a) causing the application to generate sample output data, to thereby determine fourth relationships between the business data and application outputs; (b) analysing the database to determine from which of the number of storage locations the sample output data is obtained to thereby determine fifth relationships between the application outputs and the database; and, (c) determining sixth relationship between the business data and the database storage locations using the fourth and fifth relationships. 4) A method according to claim 3, the method including: (a) determining application inputs including at least one of: (i) an entity; and (ii) an attribute; (b) determining first relationships between the business data and the application inputs; (c) using sample input data as an input to the application and analysing the database to determine into which of the number of storage locations the sample data is stored, to thereby determine second relationships between the application inputs and the database storage locations; (d) determining third relationships between the business data and the database storage locations using the first and second relationships.

5) A method according to any one of the claims 1 to 4, the method further including storing an indication of the relationships in a repository. 6) A method according to claim 5, wherein the method further includes storing the data in the repository.

7) A method according to any one of the claims 1 to 6, the method further including: (a) determining Input Descriptions from the Application; (b) determining Output Descriptions from the Application; (c) determining additional Input Descriptions and Output Descriptions from Business Knowledge; (d) identifying business entities and attributes described by the application; (e) linking the business entities and attributes to input and output fields of the application using the determined descriptions; (f) extracting information about the physical organisation of the database; and, (g) notifying experts of any database storage locations not yet identified and linked to an input or output, to thereby enable an identity of the storage locations to be resolved.

8) A method according to claim 2 or claim 4, wherein the method further includes transferring data between a source database and a target database, each database having an associated application, wherein the method includes: (a) identifying seventh relationships between the business data of the source application and the business data of the target application; and, (b) determining eighth relationships between the data storage locations of the source application and data storage locations of the target application, by relating the third and sixth relationships of the source application to the third and sixth relationships of the target application, using the seventh relationship. 9) A method according to claim 8, wherein the method further includes: (a) generating data transfer scripts for extraction, transfer and loading data from the source database to the target database. 10)A method according to claim 8, wherein the method further includes correctly transferring data from one or more source databases to one or more target databases.

11 )A method according to claim 2 or claim 4, wherein the method includes merging at least two source databases into a combined target database, each source database having an associated source application, wherein the method includes: (a) identifying seventh relationships between the business data of all the source applications; and, (b) deducing eighth relationships between the data storage locations of the source applications, by relating the third and sixth relationships of each source application to the third and sixth relationships of the each other source application, using the seventh relationship. 12)A method according to claim 11 , wherein the method further includes: (a) generating scripts for extraction, transfer and loading the data from each of the source databases into the combined database; and, (b) generating view scripts for each source application which presents the combined database to the respective source application as if it is accessing its original database.

13)Apparatus for determining relationships between business data and storage locations of a database, the database being associated with a respective application which interacts with the database, the apparatus including a processor for: (a) determining application inputs including at least one of: (i) an entity; and (ii) an attribute; (b) determining first relationships between the business data and the application inputs; (c) using sample input data as an input to the application and analysing the database to determine into which of the number of storage locations the sample data is stored, to thereby determine second relationships between the application inputs and the database storage locations; (d) determining third relationships between the business data and the database storage locations using the first and second relationships.

14)Apparatus for determining relationships between business data and storage locations of a database, the database being associated with a respective application which interacts with the database, the apparatus including a processor for: (a) causing the application to generate sample output data, to thereby determine fourth relationships between the business data and application outputs; (b) analysing the database to determine from which of the number of storage locations the sample output data is obtained to thereby determine fifth relationships between the application outputs and the database; and, (c) determining sixth relationship between the business data and the database storage locations using the fourth and fifth relationships. 15)Apparatus according to claim 13 or claim 14, the apparatus being adapted to perform the method of any one of the claims 1 to 12. 16)A database repository for defining relationships between business data and storage locations of a database, the database being associated with a respective application which interacts with the database, the repository including descriptions of: (a) first relationships between the business data and application inputs; (b) second relationships between the application inputs and the data storage locations; and, (c) third relationships between the business data and the data storage locations using the first and second relationships.

17)A database repository for defining relationships between business data and storage locations of a database, the database being associated with a respective application which interacts with the database, the repository including descriptions of: (a) fourth relationships between the business data and application outputs; (b) fifth relationships between the application outputs and the data storage locations; (c) sixth relationships between the business data and the application data storage locations via the fourth and fifth relationships.

18)A database repository according to claim 16 or claim 17, the repository further including descriptions of: (a) seventh relationships between the business data of an application and the business data of another application; and, (b) eighth relationships between the data storage locations of the application and data storage locations of another application. 19)A database repository according to any one of the claims 16 to 18, wherein the repository further includes descriptions of: (a) Multiple applications and their associated databases; (b) Input & output experiment logs; and, (c) Sample data categorised into experiments.

20)A database repository according to any one of the claims 16 to 19, where the data can be stored in any one of or a combination of a relational database, object serialisation, object oriented database, logical clauses, lists, files or other technology known to the art.

21 )A database repository according to any one of the claims 16 to 20, wherein the repository is populated with data using the method of any one of the claims 1 to 12. 22)A method of transferring data between a source database and a target database, each database having an associated application and associated storage locations, wherein the method includes: (a) identifying seventh relationships between the business data of the source application and the business data of the target application; and, (b) determining eighth relationships between the data storage locations of the source application and data storage locations of the target application, by relating the third and sixth relationships of the source application to the third and sixth relationships of the target application, using the seventh relationship, the third and sixth relationships being determined using the method of claim 2 or the method of claim 4. 23)A method according to claim 22, wherein the method further includes generating data transfer scripts for extraction, transfer and loading the data from the source database to the target database.

24)A method according to claim 21 or claim 22, wherein the method is used for transferring data from one or more source databases to one or more target databases.

25)A method of merging at least two source databases into a combined target database, each database having an associated application and associated storage locations, wherein the method includes: (a) identifying seventh relationships between the business data of all the source applications; and, (b) determining eighth relationships between the data storage locations of the source applications, by relating third and sixth relationships of each source application to third and sixth relationships of the each other source application, using the seventh relationship, the third and sixth relationships being determined using the method of claim 2 or the method of claim 4. 26)A method according to claim 25, wherein the method further includes: (a) generating scripts for extraction, transfer and loading the data from each source database into the combined database; and, (b) generating view scripts for each source application which presents the combined database to the respective source application as if it is accessing its original database.