WO2003023687A2 - An advanced method for profile analysis of continuous data

Info

Publication number: WO2003023687A2
Application number: PCT/US2002/027805
Authority: WO
Inventors: Justin Neway; Brent Rognlie
Original assignee: Aegis Analytical Corporation
Priority date: 2001-09-12
Filing date: 2002-09-04
Publication date: 2003-03-20
Also published as: WO2003023687A3; AU2002323532A1

Abstract

The present invention provides a method for analysis comprising the steps of: generating plots for each of a plurality of batches of a process manufacturing process based on data for at least one continuous parameter; aligning the plots based on at least one aligning continuous parameter of each of the plurality of batches; selecting a plurality of profiles for the aligned plots; analyzing the profiles using a regression method to provide analysis results indicating the level of success of the process manufacturing process; and displaying the analysis results to a user or storing the analysis results in a machine readable medium. The present invention also provides a machine readable medium that may be used to implement the method of the present invention.

Description

AN ADVANCED METHOD FOR PROFILE ANALYSIS OF CONTINUOUS DATA

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims the priority of U.S. Provisional Application No. 60/318,329 entitled "A Method for Advanced Profile Analysis," filed September 12, 2001, the entire disclosure and contents of which is hereby incorporated by reference.

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to a method for analyzing the data from a production process.

Description of the Prior Art

Some manufacturing processes, e.g., large scale fermentations or synthetic operations, are very time consuming and expensive to operate. Predicting bad outcomes early and aborting a process can save substantial time and money. Improving control of these operations based on real-time or retrospective data analysis can also result in substantial time and monetary savings. Currently there is not a tool on the market that provides a sufficient method of developing statistical models relating the individual values of one or more continuous parameters at many time points to determine if there is an optimum time point where the values of the continuous parameters can be used to predict the final process outcome with a high degree of certainty.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide for a method of semi-automated data analysis which will predict with a high degree of certainty, unfavorable outcomes as early as possible in a production process, and that can be used to more comprehensively identify interacting critical process parameters and to define the appropriate range to maximize the probability for the success of future batches or production runs.

It is another object of the present invention to provide for a method of semi-automated statistical analysis that will identify and predict with a high degree of certainty, useful parameters for improving process control.

It is yet another object of the present invention to provide an advanced method of profile analysis that relates values of multiple continuous parameters at various time points to some discrete outcome parameter.

It is yet another object of the present invention to provide a semi-automated profile analysis capability for better understanding the relationship between the values of multiple process parameters at specific times, i.e., the same time or different times for each parameter, and one or more outcome parameters.

It is yet another object of the present invention to provide a method for allowing a user to align the parameters across multiple batches or production runs based on some criterion such as the time a parameter reaches a specified value, e.g., when dissolved oxygen reaches 50%.

It is yet another object of the present invention to provide a method for allowing a user to align continuous parameters across multiple batches or production runs with offset times representing the times that similar process states are reached and which can also be used to perform data analysis.

It is yet another object of the present invention to provide a method for allowing a user to determine the value of each continuous parameter that is the actual value at the given time point or an average of the parameter values within a specified range of occurrences.

It is yet another object of the present invention to provide a method for allowing a user to specify a statistical technique, e.g., multiple regression, that will be performed using the data values at each time point that relates the values of the continuous parameters, to the outcome parameter(s) of interest.

It is yet another object of the present invention to provide a method for predicting bad outcomes early so that an unsuccessful process can be aborted early saving substantial time and money in several manufacturing processes, e.g., large scale fermentations and other synthetic processes, that are very time consuming and expensive to operate.

It is yet another object of the present invention to provide a tool in the market that can be used in a semi-automated fashion to develop statistical models relating the individual values of one or more continuous parameters at many time points to the outcome parameter(s) of interest.

It is yet another object of the present invention to provide a semi-automated method for developing statistical models to determine if there is an optimum time point where the values of the continuous parameters can be used to predict the final process outcome with a high degree of certainty.

According to a first broad aspect of the present invention, there is provided a method for analysis comprising the steps of: generating plots for each of a plurality of batches of a process manufacturing process based on data for at least one continuous parameter; aligning the plots based on at least one aligning continuous parameter of each of the plurality of batches; selecting a plurality of profiles for the aligned plots; analyzing the profiles using a regression method to provide analysis results indicating the level of success of the process manufacturing process; and displaying the analysis results to a user.

According to a second broad aspect of the invention, there is provided a machine readable medium storing instructions that, if executed by a computer system, causes the computer system to perform a set of operations comprising: generating plots for each of a plurality of batches of a process manufacturing process based on data for at least one continuous parameter; aligning the plots based on at least one aligning continuous parameter of each of the plurality of batches; selecting a plurality of profiles for the aligned plots; analyzing the profiles using a regression method to provide analysis results indicating the level of success of the process manufacturing process; and displaying the analysis results to a user.

According to a third broad aspect of the invention, there is provided a method for analysis comprising the steps of: generating plots for each of a plurality of batches of a process manufacturing process based on data for at least one continuous parameter; aligning the plots based on at least one aligning continuous parameter of each of the plurality of batches; selecting a plurality of profiles for the aligned plots; analyzing the profiles using a regression method to provide analysis results indicating the level of success of the process manufacturing process; and storing the analysis results in a machine readable medium.

According to a fourth broad aspect of the invention, there is provided a machine readable medium storing instructions that, if executed by a computer system, causes the computer system to perform a set of operations comprising: generating plots for each of a plurality of batches of a process manufacturing process based on data for at least one continuous parameter; aligning the plots based on at least one aligning continuous parameter of each of the plurality of batches; selecting a plurality of profiles for the aligned plots; analyzing the profiles using a regression method to provide analysis results indicating the level of success of the process manufacturing process; and storing the analysis results in a second machine readable medium.

Other objects and features of the present invention will be apparent from the following detailed description of the preferred embodiment.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be described in conjunction with the accompanying drawings, in which:

FIG. 1 is a flow chart illustrating a preferred embodiment of the method of the present invention;

FIG. 2 is a screenshot of the operation of a program used to implement a preferred embodiment of the method of the present invention showing plots for parameters of a process being analyzed prior to alignment;

FIG. 3 is a screenshot of the operation of the program of FIG. 2 showing how the plots of FIG. 2 may be shifted using a preferred embodiment of the method of the present invention;

FIG. 4 is a screenshot of the operation of the program of FIG. 2 showing one parameter on which a linear regression may be carried out in accordance with a preferred embodiment of the method of the present invention;

FIG. 5 is a screenshot of the operation of the program of FIG. 2 showing five continuous parameters and a series of selected time-based profiles for a process being analyzed by the program of FIG. 2;

FIG. 6 is a screenshot of the operation of the program of FIG. 2 showing one-line summaries of the multiple regression analysis output of the process being analyzed by the program of FIG. 2; and

FIG. 7 is a screenshot of the operation of the program of FIG. 2 showing a detailed output of a model of the process being analyzed by the program of FIG. 2.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Definitions

It is advantageous to define several terms before describing the invention. It should be appreciated that the following definitions are used throughout this application.

Where the definition of a term departs from the commonly used meaning of the term, applicant intends to utilize the definitions provided below, unless specifically indicated.

For the purposes of the present invention, a "raw data" source refers to unadjusted data contained in the original data sources.

For the purposes of the present invention, the term "profile" refers to the collection of values of one or more continuous parameters of one or more "parameter sets" at a given common time point.

For the purposes of the present invention, the term "the level of success of a manufacturing process" refers to the current relationship of the process to the predicted outcome of interest, such as an impurity or yield outcome parameter.

For the purposes of the present invention, the term "process manufacturing process" refers to a process that uses processing steps that exert chemical and physical changes on the raw materials and intermediate materials so that they are physically transformed into products that no longer resemble the starting materials. Examples of process manufacturing processes are the production of antibiotics or genetically engineered proteins from sugars and salts using living microorganisms, or the production of gasoline from crude oil.

For the purposes of the present invention, the term "discrete manufacturing process" refers to a process that is essentially an assembly operation. Examples of discrete manufacturing are the manufacturing of an automobile or a shirt that consists of assembling various parts to make a whole. The finished product looks like an assembly of the parts that the process started with. Some of the parts used to assemble a car may be produced by process manufacturing processes. For example, an engine block for an automobile is made from molten alloy that is produced by a process manufacturing process starting with chunks of mineral ores that are melted and processed so that they no longer resemble the mined starting materials.

For the purposes of the present invention, the term "bioprocess" refers to any process manufacturing process that involves the use of cell cultures, including living cell cultures such as bacterial cultures.

For the purposes of the present invention, the term "user" refers not only to end-users of software employing the method of the present invention, but also to individuals, such as software developers or database designers, who carry out one or more steps of the method of the present invention.

For the purposes of the present invention, the term "process" refers to any process. The method of the present invention may be used to access and analyze data from processes for producing one or more products including manufacturing processes, purification processes, chemical synthesis processes, etc. or may be used for other types of processes such as tracking the shipment of goods, tracking inventory in a store, etc. A process of the present invention includes one or more steps.

For the purposes of the present invention, the term "parameter" refers to any property or characteristic used to classify an individual piece or multiple pieces of data. For the purposes of the present invention, there are two characteristics of "parameters": "identification codes" and "parameter values." Any parameter that is not used as an identification code for an analysis group is a parameter value. Parameters may include characteristics such as the temperature at a particular time, the pH of a solution, the purity of a compound, the source of a raw material, etc.

For the purposes of the present invention, the term "parameter value" refers to the specific piece of data resulting from a measurement associated with a specific parameter. Examples of specific parameter values include the particular batch number for a batch, the measured temperature associated with a batch material at a particular time in its production cycle, the test outcome for a specific parameter, etc.

For the purposes of the present invention, the term "identification code" refers to a code, name etc. that uniquely identifies a particular parameter.

For the purposes of the present invention, the term "parameter set" refers to a group of parameters that relate to the same batch of manufactured product. A parameter set may be obtained from a single data set or multiple data sets. A parameter set may have one or more "parameter values" associated with each parameter in the parameter set.

For the purposes of the present invention, the term "equipment parameter" refers to a parameter relating to one or more pieces of equipment used in a process manufacturing process of the present invention. Examples of such equipment parameters include: RPMs of an agitator, pressure of a vessel, etc.

For the purposes of the present invention, the term "material parameter" refers to a parameter relating to a material that is processed by a process manufacturing process of the present invention. Examples of such material parameters include: pH of a solution, concentration of metabolite, temperature of a liquid, etc.

For the purposes of the present invention, the term "environmental parameter" refers to a parameter relating to the environment to which material processed by the process manufacturing process is exposed. Examples of environmental parameters include: room air temperature, humidity, dew point, etc.

For the purposes of the present invention, the terms "material parameter", "environmental parameter" and "equipment parameter" refer to different species of parameters.

For the purposes of the present invention, the term "raw material" refers to starting materials used in a process for producing a product.

For the purposes of the present invention, the term "intermediate material" refers to a material produced at any point in the process prior to producing the final product of the process. An intermediate material may be produced by manufacturing the intermediate material from raw materials or other intermediate materials, by purifying raw materials or other intermediate materials, by the synthesis from raw materials or other intermediate materials, etc.

For the purposes of the present invention, the term "batch" refers to a given amount of product and the materials and conditions used to make that given amount of product, regardless of the amount of raw materials used, the amount of product produced, or the time taken to produce a given amount of product. Several types of discrete data, continuous data, and replicate data may all be related to a particular batch of product. The term batch as used in the present invention may refer to a production run of several hours, days, weeks, months, etc.

For purposes of the present invention, the term "data source" refers to any source of data such as a database or a data storage file, data directly produced by a measurement device, data electronically sent from a remote location, data entered into a database from paper records, etc. Two data sources are considered to be "different" if the data sources employ different file formats or different data structures or have different physical locations.

For the purposes of the present invention, the term "data set" refers to a set of data or a database. A data set may be classified into a particular "complete data set type" based on the data set's primary data set type, secondary data set type and the same tertiary data set type.

For the purposes of the present invention, the term "discrete data" refers to data parameter values that are obtained only once during the process of producing one batch of product. Examples of discrete data include: the amount of an ingredient added at some step in a process, the source of an ingredient added at a particular step in a process, the date of production of an ingredient used in a process, etc.

For the purposes of the present invention, the term "continuous data" refers to data parameter values that are obtained at several times during the same step of the process of producing a batch of product, with each collection having an associated time. Examples of continuous data include: the temperature at a particular step of a process measured in 5 second intervals for the duration of the step, the moisture content of the effluent air at a particular step measured in 10 second intervals for the duration of the step, the amount of contamination present at a particular step measured in 15 minute intervals, etc.

For the purposes of the present invention, the term "replicate data" refers to data parameter values that are obtained from several measurements of the same parameter made independent of the time of the measurement, i.e. replicate data includes data obtained from multiple measurements of the same parameter made at the same time and data obtained from multiple measurements of the same parameter taken with no regard as to the time that the measurements were made. Replicate data may also be discrete data or continuous data.

For the purposes of the present invention, the term "analysis group" refers to a collection of parameter sets that may be selected by a user wherein all of the parameter sets meet the "parameter restrictions" for one or more parameters. For example, an analysis group could include all of the parameter sets which have median temperature parameter values of 35 to 38° C for three different time points, a minimum pH parameter value above 7, have the same raw materials supplier parameter, have a raw materials supplied date parameter value of January, etc. An analysis group is a structured data container that supports fast, efficient utilization of data via standardized interfaces. The structure of an analysis group permits it to hold all types of data concurrently, e.g. discrete, continuous, replicate, etc. An analysis group can be thought of as a sparsely populated multidimensional data cube, with parameter sets (that relate to individual batches of manufactured product) making up one axis, parameter names making up another axis, and time offsets (for continuous parameters) making up another axis, and replicate information making up another axis. Analysis groups also allow the dynamic creation of additional parameters within the analysis groups, allow for the data within them to be subsetted for subsequent operations and allow themselves to be updated with new data from the data sources on an on-demand basis. Analysis groups of the type employed by the method of the present invention are described in greater detail in U.S. Patent Application No. 09/816,547, entitled "System, Method and Computer Program Product for Mapping Data of Multi-Database Origins," which provides a detailed description of continuous data of the type used in the method of the present invention, and the entire contents and disclosure of this patent application is hereby incorporated by reference.
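The data cube view of an analysis group described above can be illustrated with a minimal sketch. The class name `AnalysisGroup`, its methods, and the sample values are hypothetical and chosen only for illustration; they are not taken from the referenced application or from any product. The sketch assumes that each stored cell is keyed by batch, parameter name, time offset and replicate index, and that a discrete parameter simply has no time offset.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical cell key: (batch_id, parameter_name, time_offset, replicate_index).
# time_offset is None for discrete parameters, which are measured once per batch.
CubeKey = Tuple[str, str, Optional[float], int]


class AnalysisGroup:
    """Illustrative sparse container: only cells that hold data are stored."""

    def __init__(self) -> None:
        self._cells: Dict[CubeKey, float] = {}

    def add(self, batch: str, parameter: str, value: float,
            time_offset: Optional[float] = None, replicate: int = 0) -> None:
        self._cells[(batch, parameter, time_offset, replicate)] = value

    def continuous_series(self, batch: str, parameter: str) -> List[Tuple[float, float]]:
        """Return time-sorted (time_offset, value) pairs for one batch."""
        points = [(key[2], value) for key, value in self._cells.items()
                  if key[0] == batch and key[1] == parameter and key[2] is not None]
        return sorted(points)

    def subset(self, batches: Iterable[str]) -> "AnalysisGroup":
        """Return a new group restricted to the given batches."""
        keep = set(batches)
        out = AnalysisGroup()
        out._cells = {k: v for k, v in self._cells.items() if k[0] in keep}
        return out


group = AnalysisGroup()
group.add("B1", "temperature", 36.5, time_offset=0.0)
group.add("B1", "temperature", 37.0, time_offset=5.0)
group.add("B1", "impurity_A", 0.8)  # discrete outcome parameter, no time offset
group.add("B2", "temperature", 35.9, time_offset=0.0)

print(group.continuous_series("B1", "temperature"))
```

A dictionary keyed by the four axes keeps the container sparse, since batches that lack a given parameter or time offset consume no storage.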

For the purposes of the present invention, the term "computer system" refers to any type of computer system that implements software including an individual computer such as a personal computer, mainframe computer, mini-computer, etc. or a network of computers, such as a network of computers in a business, the Internet, personal data assistant, cell phone, etc.

For the purposes of the present invention, the term "visual display device" includes any type of visual display device such as a CRT monitor, LCD screen, etc.

Description

The present invention provides a method for statistical analysis of large multi-parameter data sets and finding correlation between values of multiple continuous parameters and selected outcome parameter(s). In a preferred embodiment of the analysis method of the present invention, finding the correlation between the multiple continuous parameters and the selected outcome parameters is done through modeling these large sets of data. In other words, the continuous parameters along with the discrete outcome parameter are first fitted to a statistical model, such as linear regression or multiple regression or non-linear regression. The output from the statistical model is a mathematical relationship between the continuous parameters and the selected outcome parameter. The selected outcome parameter may then be predicted at several time points during the process to decide or determine the fate of the manufacturing process with a high degree of certainty and for better process control.
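As an illustration of the kind of mathematical relationship such a model may produce, a multiple linear regression fitted at a single profile time could take the form below. The notation is an assumption introduced here and does not appear in the original: y_b is the discrete outcome parameter for batch b, x_{j,b}(t) is the value of continuous parameter j for batch b at aligned time t, p is the number of continuous parameters, the beta terms are the fitted coefficients, and epsilon_b is the residual.

```latex
y_b = \beta_0(t) + \sum_{j=1}^{p} \beta_j(t)\, x_{j,b}(t) + \varepsilon_b
```

Fitting one such relationship per candidate time t allows the resulting models to be compared across time points.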

Advanced Profile Analysis (APA) is a method of a preferred embodiment of the present invention for performing statistical analysis that relates values of multiple continuous parameters at various time points to some selected outcome parameter for the purpose of identifying which parameters are the best predictors of the chosen process outcome and for predicting unfavorable outcomes as early as possible in a production process with a high degree of certainty. It may also be used for understanding the relationship between the values of multiple process parameters at specific times (the same time or different times for each parameter) and one or more outcome parameters for the purpose of better process control. The multiple continuous parameters may come from a single piece of equipment or unit operation, such as a fermentor or from multiple pieces of equipment at one or more process steps. There are various aspects to this technique as follows:

The method may first allow the user to align all the parameters across multiple batches or production runs based on some criterion such as time at which a certain parameter value occurs, e.g., when dissolved oxygen reaches 50%. This allows the absolute times at which all the parameters are measured for a particular batch to be adjusted by the amount needed to align that batch to a process state similar to the rest of the batches.

The method of the present invention may then allow the user to specify a single time point for the continuous parameters that is the best predictor of the selected process outcome. In another embodiment, the method of the present invention may allow different time points to be used for each of the parameters. In yet another embodiment, the method of the present invention may use multiple time points for the continuous parameters based on some interval, e.g. every hour, where the value of each continuous parameter is the actual value at the given time point. Alternatively, the method of the present invention may employ the average or some other derivation of each parameter that includes a user-specified number of time points or a specified time interval on either side of a particular time point.

The method may then allow the user to specify a statistical technique, e.g., multiple regression, that will be performed at each time point that relates the values of the continuous parameters, as specified above, to the outcome parameter(s) of interest.

The resulting statistical models may be presented to the user ordered by the time of the profile or in order of significance.

Variations may consist of using any of a number of statistical techniques to relate the values of multiple continuous parameters to a discrete outcome parameter, for example, multiple linear regression, non-linear regression, principal component analysis, etc.
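The steps of fitting a model at each time point and ordering the resulting models can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used by any particular software: the function names `solve`, `fit_ols` and `rank_time_points` and the sample data are hypothetical, ordinary least squares solved through the normal equations stands in for whichever regression technique the user selects, and R squared is used as one possible measure of significance. Singular or ill-conditioned inputs are not handled.

```python
from typing import Dict, List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[List[float], float]:
    """Least-squares fit with an intercept; returns (coefficients, R squared)."""
    x = [[1.0, *row] for row in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * v for bj, v in zip(beta, r)) for r in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def rank_time_points(profiles: Dict[float, List[List[float]]],
                     outcome: List[float]) -> List[Tuple[float, float]]:
    """Fit one model per profile time; order (time, R squared) by R squared."""
    results = [(t, fit_ols(rows, outcome)[1]) for t, rows in profiles.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical data: five batches, two continuous parameters per profile time.
outcome = [4.0, 5.5, 9.0, 10.5, 14.0]
profiles = {
    1.0: [[1, 1], [2, 0], [1, 0], [2, 1], [1, 1]],
    2.0: [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]],
}
ranked = rank_time_points(profiles, outcome)
print(ranked)
```

Sorting the same list by its first element instead would give the alternative presentation ordered by the time of the profile.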

Bioprocesses are an example of data-intensive process manufacturing processes to which the analysis method of the present invention may have particular utility. A typical pharmaceutical bioprocess will now be described. A pharmaceutical bioprocess may begin with raw materials like glucose (a sugar), salts (such as sodium chloride and phosphates), and water, along with a starting living cell culture of a microorganism like E. coli.

The E. coli is inoculated and grown in a sterilized flask of these raw materials that has been equilibrated to, and then incubated at, a suitable temperature until it replicates into a large number of cells, approximately 1 X 10 per milliliter. At this point the cells have nearly depleted the raw materials in the flask and built up waste products. They must be transferred to a fresh sterile container with fresh raw materials so that they can continue growing. This is generally accomplished by transferring them to a seed fermentor.

A fermentor is a sealable, stainless steel tank that contains raw materials in a water-based medium that can be sterilized by heat and pressure much like a pressure cooker. Once the medium is sterilized and cooled to the right temperature, the E. coli is transferred into it in an aseptic manner, i.e., in a manner that excludes the possibility of contamination by other, unwanted microorganisms. After this is done, the fermentor is stirred, sterile air and/or oxygen is pumped in, the pH, temperature, pressure and dissolved oxygen concentrations are held constant at controlled setpoints and additional sterile raw materials are often pumped in to promote growth of the microorganism.

During this time in the process, a large number of measurements are made using measuring devices that measure the environmental conditions in the fermentor. These measurements are used to control the conditions in the fermentor and they are also stored in a data archive for later use.

Once the E. coli cells have grown in number, depleted the raw materials in the seed fermentor and built up near toxic levels of waste products, they are transferred progressively into larger sterile fermentors until reaching the final stage of fermentation called a production fermentor. The production fermentor is operated in much the same way as the seed fermentors, accumulating more data for later use. At the end of the production fermentor stage of the process, the microorganism has produced the maximum level of the desired product and the product is ready to be harvested and purified. The maximum level of product may be produced naturally or by means of an induction mechanism that introduces natural or non-natural biochemical pathways.

Typically the harvest procedure consists of concentrating the cells away from the liquid fermentation broth in which they are suspended. This is usually done by, e.g., a filtration or centrifugation process. Once the cells are concentrated, and if they contain the product of interest, they may be broken, typically using a force called hydrodynamic shear. The resulting product concentrate is then ready for purification. If the product of interest does not accumulate in the cells but in the medium instead, then the medium is taken on for further processing in a similar way to the contents of the broken cells. During this harvest operation, much data is again collected from the measuring devices that control the various processes and from the batch records. Examples of data collected include pressures, weights, volumes, flow rates, temperatures, pH's, operator name, start time, stop time, amount of base/acid added, amount of waste, etc.

The purification process that follows is aimed at achieving a higher concentration of the desired product at the same time as removing the contaminating materials. This is accomplished by subjecting the harvested mixture to separation techniques that selectively favor the desired product over the contaminating materials. These techniques may consist of any or all of the following: salt fractionation, selective precipitation, crystallization, size exclusion, affinity binding, hydrophobic separation, ion exchange, diafiltration, etc. During each of these processes, many measuring devices are used to make measurements and control the conditions in the process stream and much data is accumulated for later use. Examples of measurements include temperatures, flow rates, volumes, start times, stop times, ionic strength, pH, color, etc.

Once the desired product has reached its maximum practical level of concentration and purity that may be the result of the several similar or different operations of the purification process, it passes into the final stage of manufacturing called filling and finishing. During this stage of the process, it is mixed with neutral carrier molecules so that it may be prepared in the right dosage form for administration to the patients who need it. Often it is also sterilized if it is intended for injection into patients. The final product is placed into suitable containers and labeled with the ingredients and expiration date. Once again, much data is accumulated during this part of the process for later use. Examples of data that are accumulated include the names of operators, the vendors of materials used, room air quality measurements, calibration dates, equipment service dates, pyrogen levels, particulates, color, pH, ionic strength, potency levels, contaminant levels, etc.

The analysis method of the present invention employs Advanced Profile Analysis (APA) to search through time-based profiles of the continuous process parameters of a process manufacturing process to determine the time (or combination of times) that correspond to specific features in the profiles that provide the best statistical model for predicting an outcome parameter of interest. For example, in the pharmaceutical bioprocess described above, the analysis method of the present invention may be used to find the parameters and times most useful for predicting the outcome, and to determine as early in the fermentation process as possible if a product is likely to fail an in-process or final product specification later in the process. That way, time and resources need not be wasted producing a bad product.

The analysis method of the present invention also allows users to select specific features of continuous data profiles of a process manufacturing process that are then extracted and quantified, and used as discrete parameters singly or together for statistical analysis in combination with other process parameters. This gives the ability to find out, for example, what upstream parameters may be driving specific features in the oxygen uptake rate of the production fermentation. These results may be used to give improved control within the physical constraints of the equipment or the cost constraints of the process. Users could also determine what features in the post-induction CER (Carbon Dioxide Evolution Rate) of the production fermentation in combination with measurements made in the recovery process, are associated with a troublesome contaminant in the final product. The only evidence for the contaminant might be a shoulder on the main peak in one of the downstream chromatography steps which shoulder may be quantified using a number of methods. This information may give better process control, lower failure rates and higher quality and predictability. Scale-up problems that might result if such determinations were made on smaller experimental scale may be avoided if the analyses are done on data taken from full-scale operations.

In a preferred embodiment of the analysis method of the present invention, a user may select any number of continuous parameters from a single operation for any number of batches together with a discrete outcome parameter of interest, e.g., Product Impurity A. Next, the continuous data from the parameter sets are aligned. For a number of different reasons, the profile plots may not be aligned across batches, perhaps because product is transferred into a fermentor in different physiological states equating to different times in its growth phase. In this case, it may be appropriate to align the profiles at the point where the dissolved oxygen reaches, for example, 30%, and the system will have the ability to do so. Other means of aligning the parameter sets may also be applied. Next, time-based profiles are chosen and values are assigned to temporary discrete parameters derived at the intersections of the profiles with the lines on the continuous data plots. The time-based profiles, and therefore the corresponding discrete parameter values, may be chosen manually or by using an automatically generated time interval approach, e.g., run a model that chooses time-based profiles at thirty-minute intervals. For each time-based profile, the user may choose to use a single value of the profiled parameter at that time, or use a neighborhood mean for each of the profile parameters. For example, if the data in any one time region shows a high degree of variability, the user may prefer to use the average of a certain number of nearby values, i.e., neighborhood values, in each profile of a particular parameter across all the batches to which the analysis method of the present invention is being applied. The data is then analyzed.
The user has the option of saving the time-based profile data as new parameters for further analysis, or using it directly in a series of regression models to determine the time-based profile, or combination of time-based profiles, that give the best overall model that predicts the chosen outcome. The user will be able to choose from among the standard regression techniques available in commercially available software, such as Discoverant®, a software product made by Aegis Analytical Corporation, the assignee of the present invention.
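
The automatically generated time interval approach and the creation of temporary discrete parameters described above might be sketched as follows. This is only an illustration under stated assumptions: the function names `profile_times` and `extract_discrete`, the nested-dictionary data layout, the `CER@30` naming scheme and the sample values are all hypothetical and do not come from the software described here.

```python
def profile_times(start, stop, step):
    """Profile times from start to stop (inclusive) at a fixed interval, in minutes."""
    return list(range(start, stop + 1, step))


def extract_discrete(batches, times):
    """Turn continuous data into one row of discrete parameters per batch.

    batches: {batch_id: {parameter_name: {time: value}}} (assumed layout)
    Returns {batch_id: {"parameter@time": value}}; a time with no recorded
    value is simply skipped here.
    """
    table = {}
    for batch_id, params in batches.items():
        row = {}
        for name, series in params.items():
            for t in times:
                if t in series:
                    row[f"{name}@{t}"] = series[t]
        table[batch_id] = row
    return table


# Made-up values for two batches and one continuous parameter.
batches = {
    "104a": {"CER": {0: 10.0, 30: 12.5, 60: 15.0}},
    "108a": {"CER": {0: 9.0, 30: 11.0, 60: 14.0}},
}
times = profile_times(0, 60, 30)      # thirty-minute intervals
table = extract_discrete(batches, times)
```

Each key of a row, such as `CER@30`, then plays the role of one temporary discrete parameter that can be saved or passed to a regression.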

When Discoverant is used to perform a multiple regression in the method of the present invention, the multiple regression feature of Discoverant allows a user to perform regression using more than one independent parameter. One example is to estimate the effects on dissolution rate, i.e., the response parameter, of different factors, i.e., the independent continuous parameters, such as time-based profiles of dryer air humidity, KW input to a wet granulating mixer, and/or the addition rate of granulating solution to a granulator. The user may perform multiple regression using the following techniques: all parameters, forward selection, backward elimination, stepwise, etc. When the all parameters technique is employed, multiple regression is performed using all of the parameters selected, whether their coefficient estimates are significant or not. When the forward selection technique is employed, predictor parameters are added to the regression model one at a time. The candidate parameter for inclusion at each step is the one that produces the biggest decrease in residual sum of squares (SSE). If the calculated significance at each step is smaller than the specified p-value, then the parameter is added to the model. Otherwise the process is terminated. When the backward elimination technique is employed, regression is performed starting with all of the parameters in the model. The candidate parameter considered for removal at each step is the one that produces the smallest increase in SSE from the previous step. The procedure terminates when all the parameters are removed from the model or the calculated significance is greater than the specified p-value. The stepwise technique is a combination of the forward selection and backward elimination techniques. After the first parameter is added to the model, each time a new one is added, the procedure looks back to see if any of the parameters that were added earlier are no longer significant in the model.
Any parameter that is no longer significant is dropped from the model before looking forward for the next possible candidate for inclusion. The procedure terminates when no new parameters are added to the model based on the specified p-values.

FIG. 1 illustrates in flow chart 100 how a preferred embodiment of the method of the present invention may be implemented in a software program. When using such a software program, various options will be displayed to a user on a visual display device (not shown in FIG. 1) such as a Windows type screen, and the time-based profiles for various batches will be displayed on a graph or plot on a screen on the visual display device. At step 102, a user creates an analysis group from raw data sources by using a method such as described in US Patent Application No. 09/816,547, entitled "System, Method and Computer Program Product for Mapping Data of Multi-Database Origins," the entire contents and disclosure of which is hereby incorporated by reference. At step 104 the user selects continuous parameters of interest and particular batches to be analyzed for those continuous parameters. At step 106, the user aligns the continuous data from each of the batches. The user selects a single continuous parameter such as the dissolved oxygen concentration in the fermentor, the carbon dioxide evolution rate of the E. coli cells, the RPM of the agitator, the airflow through the fermentation tank, the optical density of the fermentation broth, etc. The user chooses an alignment criterion, such as the point where dissolved oxygen reaches 80%. Then the user activates the display button on a screen displayed on a visual display device to display the aligned batches on the visual display device. At step 108, the user selects a multiple regression method such as forward selection, backward elimination, etc. to analyze the aligned profiles. At step 110 the user selects a single discrete outcome parameter of interest such as fermentation yield.
At step 112 the user selects multiple time-based profiles, either automatically at repeating intervals or interactively at particular times chosen by the user via on-screen interactions, upon which the previously selected regression method will be performed. At step 114, the user clicks on an "analyze button" on a screen on the visual display device, and the software program performs the analysis of the profiles.
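
The forward selection technique described earlier in this section can be sketched as follows. This is a minimal illustration and not the Discoverant implementation: the function names `solve`, `fit_sse` and `forward_select` are hypothetical, ordinary least squares is solved through the normal equations, and a fixed F-to-enter threshold is swapped in for the p-value test, because converting the partial F-statistic into a p-value requires an F-distribution routine that the Python standard library does not provide.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_sse(columns, y):
    """Fit y on an intercept plus the given columns; return the residual sum of squares."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def forward_select(candidates, y, f_to_enter=4.0):
    """Add parameters one at a time, each time taking the one that lowers SSE most.

    candidates: {name: list of values, one per batch}
    f_to_enter: assumed threshold standing in for the specified p-value.
    Returns the parameter names in order of entry.
    """
    chosen = []
    sse_current = fit_sse([], y)            # intercept-only model
    while len(chosen) < len(candidates):
        best_name, best_sse = None, sse_current
        for name in candidates:
            if name in chosen:
                continue
            sse = fit_sse([candidates[n] for n in chosen + [name]], y)
            if sse < best_sse:
                best_name, best_sse = name, sse
        if best_name is None:
            break                           # no candidate reduces SSE
        df_resid = len(y) - (len(chosen) + 2)   # intercept + chosen + candidate
        if df_resid <= 0:
            break
        if best_sse <= 1e-12:
            f_stat = float("inf")           # perfect fit
        else:
            f_stat = (sse_current - best_sse) / (best_sse / df_resid)
        if f_stat < f_to_enter:
            break                           # candidate not significant: stop
        chosen.append(best_name)
        sse_current = best_sse
    return chosen
```

Backward elimination would run the same loop in reverse, starting from the full model and removing the parameter whose removal increases SSE the least.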

When the "analyze button" is clicked, the software program first creates the data set from which to build the specified regression model. For example, if a user is analyzing 5 batches and 3 continuous parameters where the user wants to use neighborhood means using +/- 3 data points for profiles at 5 and 10 minutes, the process might look like profiles in Tables 1 and 2 below:

In Tables 1 and 2 above, Batch refers to batch number, Yield refers to percentage yield, CER refers to Carbon Dioxide Evolution Rate, BaseFlow refers to the amount of base fed into the fermentor in liters/minute and %DO refers to percentage of dissolved oxygen in the fermentor. Yield is the discrete dependent parameter and CER, BaseFlow and %DO are the continuous independent parameters.

In Profile 1, the value of 28.2 for CER is determined by taking the average of the actual value of CER at 5 minutes and the values at 2, 3, 4, 6, 7, and 8 minutes, assuming that the parameter has values recorded once per minute. Once the data sets have been constructed for the selected time profiles, the user-specified regression technique may be used to construct the "best fit" model. Using stepwise regression produces results such as the following:

Profile 1: Yield = 56.2 + 0.52*CER - 2.31*%DO

Profile 2: Yield = 63.4 + 3.45*BaseFlow

If 2nd order terms were included, Profile 2 might look like this:

Yield = 63.4 + 3.5*CER + 2.3*BaseFlow - 0.89*CER*BaseFlow

The above results show that the yield of this particular fermentation may be predicted based on the values of the continuous independent parameters at the given times.

The software program will create a statistical model, which predicts the outcome based on the values of the independent continuous parameters, for each time-based profile or a single model for all the profiles together. If there are multiple data models, the software program may provide one-line summaries for each, sorted by order of significance followed by more detailed output. If the selected dependent parameter, i.e., the outcome parameter of interest, is used to determine the success or failure of a batch, the earliest significant time profile could be used to predict the likelihood of success or failure of a future batch. In any event, establishing correlations with desired outcomes can provide insight into where to focus root cause investigations of failures, thus saving time and money. A particular implementation of a software program of the type illustrated in FIG. 1 for use on a particular process manufacturing process is described in the example below.
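
The sorting of one-line summaries and the choice of the earliest significant time profile might be sketched as follows. The record layout, the function names `rank_models` and `earliest_significant`, the sample values and the 0.05 cut-off used as the default are assumptions made for illustration only.

```python
def rank_models(summaries):
    """Sort one-line model summaries from most to least significant (smallest p-value first)."""
    return sorted(summaries, key=lambda s: s["p_value"])


def earliest_significant(summaries, alpha=0.05):
    """Return the summary with the smallest profile time whose p-value is below alpha, or None."""
    significant = [s for s in summaries if s["p_value"] < alpha]
    return min(significant, key=lambda s: s["time"]) if significant else None


# Made-up summaries for three time-based profiles (time in minutes).
summaries = [
    {"time": 30, "p_value": 0.20},
    {"time": 60, "p_value": 0.03},
    {"time": 90, "p_value": 0.001},
]
ranked = rank_models(summaries)
earliest = earliest_significant(summaries)
```

In this made-up case the 90-minute profile gives the most significant model, while the 60-minute profile is the earliest one that could be used for prediction.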

Although the analysis method of the present invention has been described above with respect to analyzing an example of a bioprocess, the method of the present invention may be used to analyze other types of process manufacturing processes such as, for example, the production of tablets or capsules containing active drug substances, the production of chemicals or specialty chemicals, the production of refined minerals, the production of gasoline from crude oil, etc.

Although the analysis method of the present invention has been described above as presenting results to a user on a visual display device, results may be presented in written or printed form, audio form, or any other means of communicating the results to a user.

The present invention will now be described by way of example.

EXAMPLE

An Advanced Profile Analysis (APA) was carried out on a bioprocess. First, the data was aligned. In a bioprocess, production batches or production runs do not always start in a fermentor in the same physiological state. Therefore they must be aligned so that comparisons from batch to batch at specific time intervals make sense. To accomplish this, any number of parameters could be used to determine how much to time-shift the data from each batch, e.g., Carbon Dioxide Evolution Rate (CER). In the screen shot of FIG. 2, the weight of Tank A is the plotted parameter used for alignment. FIG. 2 shows the plots before alignment. The alignment criteria may be where the maximum or minimum occurs, or as in the present example, where the amount of base used intersects the value 2. Only one parameter is displayed here in the present example to better show the changes in the plots after the alignment is applied. In the present example, TankWt A is highlighted to indicate the chosen independent continuous parameter. The highlighted items to the right of TankWt A, e.g., 104a, 108a, etc., indicate the batches that were chosen for the graphical display and subsequent analysis. Clicking the display button plots the values of the continuous parameter, TankWt A in this case, on the y-axis vs. the time (in minutes) from the start of the step on the x-axis.

The screen shot of FIG. 3 shows how the plots are shifted using the method of the present invention so that they all intersect the value 2 on the y-axis at the same point in time. All the batches are shifted to the left to line up with the batch that intersects the value 2 the earliest. The amount by which each batch is shifted may in itself be an important parameter to correlate with various outcomes of interest. The x-axis is rescaled to accommodate the batch that has to shift the most to the left to achieve the alignment criteria. For example, if the batch that takes the longest time to reach the value 2 takes 350 minutes longer than the fastest one to reach 2, the minimum value on the x-axis has to be no larger than -350 to show all the data. Because all shifts are to the left, the maximum value displayed for the x-axis may get smaller due to the shift. The alignment operation by itself provides a major timesaving over conventional analysis methods. To do a similar operation in a commercially available spreadsheet program, one would have to do the following:

1. Find the time point where each of the many batches first intersects the specified value.

2. Determine the batch that intersects the specified value at the latest time, i.e., the one that needs to shift the most.

3. Insert blank rows in the spreadsheet (the number of rows being determined in step 2) to accommodate the shifting of all the batches.

4. Shift all the rows associated with each batch according to the number that corresponds to intersection point in step 1.

5. Insert correct negative time stamps for all the values of all the batches that shifted.
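
The five spreadsheet steps above amount to a time shift per batch, which might be sketched as follows. The function names `first_crossing` and `align`, the data layout and the sample values are hypothetical, and linear interpolation between samples to locate the crossing is an assumption; the text only states that each batch is shifted so that it intersects the specified value at the same time as the earliest batch.

```python
def first_crossing(times, values, target):
    """First time at which the series reaches or crosses target, or None if it never does.

    Between two samples the crossing is located by linear interpolation (assumption).
    """
    for i in range(len(values)):
        if values[i] == target:
            return times[i]
        if i > 0 and (values[i - 1] - target) * (values[i] - target) < 0:
            frac = (target - values[i - 1]) / (values[i] - values[i - 1])
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


def align(batches, target):
    """Shift every batch left so all batches cross target at the earliest crossing time.

    batches: {batch_id: (times, values)}
    Returns (aligned, shifts); shifts[batch_id] is the amount subtracted from
    that batch's time stamps, so shifted batches can get negative times.
    """
    crossings = {b: first_crossing(t, v, target) for b, (t, v) in batches.items()}
    missing = [b for b, c in crossings.items() if c is None]
    if missing:
        raise ValueError(f"batches never reach the target value: {missing}")
    earliest = min(crossings.values())
    shifts = {b: c - earliest for b, c in crossings.items()}
    aligned = {b: ([t - shifts[b] for t in times], values)
               for b, (times, values) in batches.items()}
    return aligned, shifts


# Made-up data: two batches sampled every 10 minutes, aligned where the value reaches 2.
batches = {
    "104a": ([0, 10, 20, 30], [0.0, 1.0, 3.0, 5.0]),
    "108a": ([0, 10, 20, 30], [0.0, 0.0, 1.0, 3.0]),
}
aligned, shifts = align(batches, target=2.0)
```

The returned `shifts` are kept because the amount by which each batch is shifted may itself be a parameter worth correlating with outcomes.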

Once the alignment is performed, the user may specify multiple time points (profiles), where the value(s) of the parameter(s) at the intersection point(s) can be used in a regression analysis. In the screenshot shown in FIG. 4 there is only one parameter shown so the regression will be a simple linear regression rather than a multiple regression that will be described below, unless all the profiles are used at once, in which case it would be a three parameter multiple regression model. In the present example, each vertical line in the graph represents a time-based profile at a particular time. Consequently, a new discrete parameter will be created for TankWt A at each of the selected time profiles where the profile line intersects that parameter. For each batch, the created values correspond to values of TankWt A at each of the time points that correspond to intersection of the time-based profile lines with the continuous parameter.

Rather than use a single intersection point for the values that are used in the regression model, an option is provided to use neighborhood means instead. In this option, an average may be taken using the single intersection value and including a specified number of points on both sides of it to help average out noisy (i.e. highly variable) parameter values, thus resulting in more robust regression models. Also, if there is not a value at the intersection point for some of the batches, a linear interpolation may be used so that there will be a complete data set. In any case, all the values that are used in the program to build the regression model may be saved as new parameters so that they may be used elsewhere in the system to create additional models, e.g., principal component regression.
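
The neighborhood mean and the linear interpolation options might be sketched as follows. The function names, the truncation of the window at the ends of the series and the sample values are assumptions for illustration; the series is taken to be recorded once per minute, as in the earlier Profile 1 discussion.

```python
def neighborhood_mean(values, index, half_width):
    """Mean of values[index] and up to half_width points on each side.

    Near either end of the series the window is truncated (assumption).
    """
    lo = max(0, index - half_width)
    hi = min(len(values), index + half_width + 1)
    window = values[lo:hi]
    return sum(window) / len(window)


def interpolate_at(times, values, t):
    """Linearly interpolate the series at time t; times must be strictly increasing."""
    for i in range(1, len(times)):
        if times[i - 1] <= t <= times[i]:
            frac = (t - times[i - 1]) / (times[i] - times[i - 1])
            return values[i - 1] + frac * (values[i] - values[i - 1])
    raise ValueError("t lies outside the recorded time range")


# Made-up CER readings at minutes 0 through 8.
cer = [20.0, 22.0, 24.0, 26.0, 28.0, 30.0, 32.0, 34.0, 36.0]
profile_value = neighborhood_mean(cer, index=5, half_width=3)   # minutes 2..8
filled_value = interpolate_at([0, 10], [1.0, 3.0], 5)            # no reading at minute 5
```

With `half_width=3` at minute 5 the window covers minutes 2 through 8, which matches the +/- 3 data point neighborhood used in the earlier description.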

A more typical situation is when there are multiple continuous parameters chosen to determine the profile that results in the best model in terms of predicting the outcome of interest, also known as the 'Dependent Parameter.' The screenshot of FIG. 5 shows five continuous parameters and a series of selected time-based profiles. To avoid having potentially hundreds of lines on a single graph, there is a multi-page display where each page shows the chosen continuous parameters for each batch. The y-axis provides a standardized or normalized scale that ranges from 0 to 100, i.e., range-scaled. To determine the actual value of a particular parameter, the user can select the Point ID mode and click on a point. The system will then put a label on the plot that indicates the time offset and actual value. In the present example, MassRate A, Rate PI, TankWt A, TankWt B, and TankWt C are highlighted to indicate the chosen independent continuous parameters. The highlighted items to the right, e.g., 104a, 108a, etc., indicate the batches that are chosen for the graphical display and subsequent analysis. Clicking the display button plots the values of the continuous parameters (in terms of % of range) on the y-axis vs. the time (in minutes) from the start of the step on the x-axis.
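
Range scaling to a 0 to 100 axis might be sketched as follows. The function name `range_scale` is hypothetical, and scaling against the minimum and maximum of the values passed in is an assumption, since the text does not state which range is used.

```python
def range_scale(values):
    """Rescale values to percent of range: the minimum maps to 0 and the maximum to 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no range; mapping it to 0 is an arbitrary choice.
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


scaled = range_scale([2.0, 4.0, 6.0])
```

Scaling each parameter this way lets parameters with very different units share one y-axis.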

In the present example, five multiple regression models were developed, one for each time-based profile, and upon clicking the 'Analyze' button, a detailed report is provided which starts with a one-line summary for each model and lists them in descending order of significance as seen in the example in FIG. 6. The first column of the Regression Summary table indicates the time of the time-based profile, and the second column indicates the calculated F-Statistic, which is used in connection with the degrees of freedom in the model (column 3) and error degrees of freedom (column 4) to determine the calculated p-value (column 5). A calculated p-value of less than 0.05 is generally considered to indicate that the model is statistically significant, i.e., the parameters in the model can be reliably used to help predict the value of the dependent (outcome) parameter of interest, which is Response A in the present example. Column 6 gives the value of R-Squared, which is the percentage of variability in the dependent parameter that can be explained by the model. Column 7 is an adjusted R-Squared, which is similar to R-Squared but is adjusted downward to account for the number of parameters and degrees of freedom and to help prevent misinterpretation from overfitting the model.
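
The columns of the Regression Summary table can be related to one another with the standard least squares definitions, sketched below. The text does not spell out the formulas, so the usual textbook forms are assumed; the function name `regression_summary` and the sample numbers are hypothetical. The p-value is omitted because it requires the upper tail of the F distribution, which the Python standard library does not provide.

```python
def regression_summary(sst, sse, n_obs, n_params):
    """Summary statistics for a least squares model with an intercept.

    sst: total sum of squares of the dependent parameter about its mean
    sse: residual sum of squares of the fitted model
    n_obs: number of batches
    n_params: number of independent parameters (intercept not counted)
    """
    df_model = n_params
    df_error = n_obs - n_params - 1
    if df_error <= 0:
        raise ValueError("not enough batches for this many parameters")
    ssr = sst - sse                                   # sum of squares explained by the model
    f_stat = (ssr / df_model) / (sse / df_error)      # model mean square / residual mean square
    r_squared = 1.0 - sse / sst
    adj_r_squared = 1.0 - (1.0 - r_squared) * (n_obs - 1) / df_error
    return {
        "f_stat": f_stat,
        "df_model": df_model,
        "df_error": df_error,
        "r_squared": r_squared,          # as a fraction; multiply by 100 for a percentage
        "adj_r_squared": adj_r_squared,
    }


# Made-up sums of squares: 13 batches, 2 independent parameters.
summary = regression_summary(sst=100.0, sse=20.0, n_obs=13, n_params=2)
```

The adjusted value can never exceed the unadjusted one, which is why it guards against a model that looks good only because it has many parameters.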

A one-page detailed output is provided for each model so the user can see which of the continuous parameters are the most significant. In the screenshot of FIG. 7, Column 1 of the first table provides the parameter name, Column 2 provides the coefficient estimate for the parameter, Column 3 provides the standard error associated with the coefficient estimate, Column 4 provides the calculated t-Statistic (Column 2/Column 3), and Column 5 provides the calculated p-value that indicates the level of significance for each of the parameters. The second table provides the sum of squares and degrees of freedom for each source, along with mean square errors for the model and residual, and the calculated F-Ratio and p-value that indicate the level of significance of the overall model. The R-Squared and adjusted R-Squared values are the same as in the one-line summaries in FIG. 6.

Although the present invention has been fully described in conjunction with the preferred embodiment thereof with reference to the accompanying drawings, it is to be understood that various changes and modifications may be apparent to those skilled in the art. Such changes and modifications are to be understood as included within the scope of the present invention as defined by the appended claims, unless they depart therefrom.

Claims

WHAT IS CLAIMED IS:

1. A method for analysis comprising the steps of: generating plots for each of a plurality of batches of a process manufacturing process based on data for at least one continuous parameter; aligning said plots based on at least one aligning continuous parameter value of each of said plurality of batches; selecting a plurality of profiles for said aligned plots; analyzing said profiles using a regression method to provide analysis results indicating the level of success of said process manufacturing process; and displaying said analysis results to a user.

2. The method of claim 1, wherein at least one of said batches is a production run.

3. The method of claim 1, wherein said data is measured at a plurality of time points for each of said batches.

4. The method of claim 1, wherein said data is obtained from one measuring device.

5. The method of claim 1, wherein said data is obtained from a plurality of measuring devices.

6. The method of claim 1, wherein said data is from a single step of said process manufacturing process.

7. The method of claim 1, wherein said data is from a plurality of steps of said process manufacturing process.

8. The method of claim 1, wherein said data is for a plurality of continuous parameters.

9. The method of claim 1, wherein said data is for at least one environmental parameter.

10. The method of claim 1, wherein said data is for at least one equipment parameter.

11. The method of claim 1, wherein said data is for at least one material parameter.

12. The method of claim 1, wherein said data is for at least two different species of parameters.

13. The method of claim 1, wherein said data is for at least one environmental parameter and at least one equipment parameter.

14. The method of claim 1, wherein said data is for at least one environmental parameter and at least one material parameter.

15. The method of claim 1, wherein said data is for at least one equipment parameter and at least one material parameter.

16. The method of claim 1, wherein said data is for at least one environmental parameter, at least one equipment parameter and at least one material parameter.

17. The method of claim 1, wherein said regression method comprises a linear regression method.

18. The method of claim 1, wherein said regression method comprises a multiple regression method.

19. The method of claim 1, wherein said regression method comprises a non-linear regression method.

20. The method of claim 1, wherein said process manufacturing process is a bioprocess.

21. The method of claim 20, wherein said bioprocess comprises fermenting a living cell culture.

22. The method of claim 21, wherein said bioprocess further comprises purifying an output material produced by the fermentation of said living cell culture.

23. The method of claim 21, wherein said living cell culture is a bacterial culture.

24. The method of claim 23, wherein said bioprocess further comprises purifying an output material produced by the fermentation of said bacterial culture.

25. The method of claim 1, wherein said aligning parameter comprises dissolved oxygen for a fermentor.

26. The method of claim 1, wherein said aligning parameter comprises carbon dioxide evolution rate.

27. The method of claim 1, wherein said aligning parameter comprises the RPM of an agitator.

28. The method of claim 1, wherein said aligning parameter comprises the airflow through a fermentation tank.

29. The method of claim 1, wherein said aligning parameter comprises the optical density of a fermentation broth.

30. The method of claim 1, wherein said analysis results are displayed to said user on a visual display device.

31. The method of claim 1, wherein said method is implemented in a computer system.

32. The method of claim 1, wherein said plurality of profiles are selected automatically.

33. The method of claim 1, wherein said plurality of profiles are selected manually.

34. A machine readable medium storing instructions that, if executed by a computer system, causes the computer system to perform a set of operations comprising: generating plots for each of a plurality of batches of a process manufacturing process based on data for at least one continuous parameter; aligning said plots based on at least one aligning continuous parameter of each of said plurality of batches; selecting a plurality of profiles for said aligned plots; analyzing said profiles using a regression method to provide analysis results indicating the level of success of said process manufacturing process; and displaying said analysis results to a user.

35. The machine readable medium of claim 34, wherein at least one of said batches is a production run.

36. The machine readable medium of claim 34, wherein said data is measured at a plurality of time points for each of said batches.

37. The machine readable medium of claim 34, wherein said data is obtained from one measuring device.

38. The machine readable medium of claim 34, wherein said data is obtained from a plurality of measuring devices.

39. The machine readable medium of claim 34, wherein said data is from a single step of said process manufacturing process.

40. The machine readable medium of claim 34, wherein said data is from a plurality of steps of said process manufacturing process.

41. The machine readable medium of claim 34, wherein said data is for a plurality of continuous parameters.

42. The machine readable medium of claim 34, wherein said data is for at least one environmental parameter.

43. The machine readable medium of claim 34, wherein said data is for at least one equipment parameter.

44. The machine readable medium of claim 34, wherein said data is for at least one material parameter.

45. The machine readable medium of claim 34, wherein said data is for at least two different species of parameters.

46. The machine readable medium of claim 34, wherein said data is for at least one environmental parameter and at least one equipment parameter.

47. The machine readable medium of claim 34, wherein said data is for at least one environmental parameter and at least one material parameter.

48. The machine readable medium of claim 34, wherein said data is for at least one equipment parameter and at least one material parameter.

49. The machine readable medium of claim 34, wherein said data is for at least one environmental parameter, at least one equipment parameter and at least one material parameter.

50. The machine readable medium of claim 34, wherein said regression method comprises a linear regression method.

51. The machine readable medium of claim 34, wherein said regression method comprises a multiple regression method.

52. The machine readable medium of claim 34, wherein said regression method comprises a non-linear regression method.

53. The machine readable medium of claim 34, wherein said process manufacturing process is a bioprocess.

54. The machine readable medium of claim 53, wherein said bioprocess comprises fermenting a living cell culture.

55. The machine readable medium of claim 54, wherein said bioprocess further comprises purifying an output material produced by the fermentation of said living cell culture.

56. The machine readable medium of claim 54, wherein said living cell culture is a bacterial culture.

57. The machine readable medium of claim 56, wherein said bioprocess further comprises purifying an output material produced by the fermentation of said bacterial culture.

58. The machine readable medium of claim 34, wherein said aligning parameter comprises dissolved oxygen for a fermentor.

59. The machine readable medium of claim 34, wherein said aligning parameter comprises carbon dioxide evolution rate for E. coli cells.

60. The machine readable medium of claim 34, wherein said aligning parameter comprises the RPM of an agitator.

61. The machine readable medium of claim 34, wherein said aligning parameter comprises the airflow through a fermentation tank.

62. The machine readable medium of claim 34, wherein said aligning parameter comprises the optical density of a fermentation broth.

63. The machine readable medium of claim 34, wherein said analysis results are displayed to said user on a visual display device.

64. The machine readable medium of claim 34, wherein said plurality of profiles are selected automatically.

65. The machine readable medium of claim 34, wherein said plurality of profiles are selected manually.

66. A method for analysis comprising the steps of: generating plots for each of a plurality of batches of a process manufacturing process based on data for at least one continuous parameter; aligning said plots based on at least one aligning continuous parameter of each of said plurality of batches; selecting a plurality of profiles for said aligned plots; analyzing said profiles using a regression method to provide analysis results indicating the level of success of said process manufacturing process; and storing said analysis results in a machine readable medium.

67. The method of claim 66, wherein at least one of said batches is a production run.

68. The method of claim 66, wherein said data is measured at a plurality of time points for each of said batches.

69. The method of claim 66, wherein said data is obtained from one measuring device.

70. The method of claim 66, wherein said data is obtained from a plurality of measuring devices.

71. The method of claim 66, wherein said data is from a single step of said process manufacturing process.

72. The method of claim 66, wherein said data is from a plurality of steps of said process manufacturing process.

73. The method of claim 66, wherein said data is for a plurality of continuous parameters.

74. The method of claim 66, wherein said data is for at least one environmental parameter.

75. The method of claim 66, wherein said data is for at least one equipment parameter.

76. The method of claim 66, wherein said data is for at least one material parameter.

77. The method of claim 66, wherein said data is for at least two different species of parameters.

78. The method of claim 66, wherein said data is for at least one environmental parameter and at least one equipment parameter.

79. The method of claim 66, wherein said data is for at least one environmental parameter and at least one material parameter.

80. The method of claim 66, wherein said data is for at least one equipment parameter and at least one material parameter.

81. The method of claim 66, wherein said data is for at least one environmental parameter, at least one equipment parameter and at least one material parameter.

82. The method of claim 66, wherein said regression method comprises a linear regression method.

83. The method of claim 66, wherein said regression method comprises a multiple regression method.

84. The method of claim 66, wherein said regression method comprises a non-linear regression method.

85. The method of claim 66, wherein said process manufacturing process is a bioprocess.

86. The method of claim 85, wherein said bioprocess comprises fermenting a living cell culture.

87. The method of claim 86, wherein said bioprocess further comprises purifying an output material produced by the fermentation of said living cell culture.

88. The method of claim 86, wherein said living cell culture is a bacterial culture.

89. The method of claim 88, wherein said bioprocess further comprises purifying an output material produced by the fermentation of said bacterial culture.

90. The method of claim 66, wherein said aligning parameter comprises dissolved oxygen for a fermentor.

91. The method of claim 66, wherein said aligning parameter comprises carbon dioxide evolution rate for E. coli cells.

92. The method of claim 66, wherein said aligning parameter comprises the RPM of an agitator.

93. The method of claim 66, wherein said aligning parameter comprises the airflow through a fermentation tank.

94. The method of claim 66, wherein said aligning parameter comprises the optical density of a fermentation broth.

95. The method of claim 66, further comprising displaying said analysis results to said user.

96. The method of claim 95, wherein said analysis results are displayed to said user on a visual display device.

97. The method of claim 66, wherein said method is implemented in a computer system.

98. The method of claim 66, wherein said plurality of profiles are selected automatically.

99. The method of claim 66, wherein said plurality of profiles are selected manually.

100. A machine readable medium storing instructions that, if executed by a computer system, causes the computer system to perform a set of operations comprising: generating plots for each of a plurality of batches of a process manufacturing process based on data for at least one continuous parameter; aligning said plots based on at least one aligning continuous parameter of each of said plurality of batches; selecting a plurality of profiles for said aligned plots; analyzing said profiles using a regression method to provide analysis results indicating the level of success of said process manufacturing process; and storing said analysis results in a second machine readable medium.

101. The machine readable medium of claim 100, wherein at least one of said batches is a production run.

102. The machine readable medium of claim 100, wherein said data is measured at a plurality of time points for each of said batches.

103. The machine readable medium of claim 100, wherein said data is obtained from a measuring device.

104. The machine readable medium of claim 100, wherein said data is obtained from a plurality of measuring devices.

105. The machine readable medium of claim 100, wherein said data is from a single step of said process manufacturing process.

106. The machine readable medium of claim 100, wherein said data is from a plurality of steps of said process manufacturing process.

107. The machine readable medium of claim 100, wherein said data is for a plurality of continuous parameters.

108. The machine readable medium of claim 100, wherein said data is for at least one environmental parameter.

109. The machine readable medium of claim 100, wherein said data is for at least one equipment parameter.

110. The machine readable medium of claim 100, wherein said data is for at least one material parameter.

111. The machine readable medium of claim 100, wherein said data is for at least two different species of parameters.

112. The machine readable medium of claim 100, wherein said data is for at least one environmental parameter and at least one equipment parameter.

113. The machine readable medium of claim 100, wherein said data is for at least one environmental parameter and at least one material parameter.

114. The machine readable medium of claim 100, wherein said data is for at least one equipment parameter and at least one material parameter.

115. The machine readable medium of claim 100, wherein said data is for at least one environmental parameter, at least one equipment parameter and at least one material parameter.

116. The machine readable medium of claim 100, wherein said regression method comprises a linear regression method.

117. The machine readable medium of claim 100, wherein said regression method comprises a multiple regression method.

118. The machine readable medium of claim 100, wherein said regression method comprises a non-linear regression method.

119. The machine readable medium of claim 100, wherein said process manufacturing process is a bioprocess.

120. The machine readable medium of claim 119, wherein said bioprocess comprises fermenting a living cell culture.

121. The machine readable medium of claim 120, wherein said bioprocess further comprises purifying an output material produced by the fermentation of said living cell culture.

122. The machine readable medium of claim 120, wherein said living cell culture is a bacterial culture.

123. The machine readable medium of claim 122, wherein said bioprocess further comprises purifying an output material produced by the fermentation of said bacterial culture.

124. The machine readable medium of claim 100, wherein said aligning parameter comprises dissolved oxygen for a fermentor.

125. The machine readable medium of claim 100, wherein said aligning parameter comprises carbon dioxide evolution rate for E. coli cells.

126. The machine readable medium of claim 100, wherein said aligning parameter comprises the RPM of an agitator.

127. The machine readable medium of claim 100, wherein said aligning parameter comprises the airflow through a fermentation tank.

128. The machine readable medium of claim 100, wherein said aligning parameter comprises the optical density of a fermentation broth.

129. The machine readable medium of claim 128, further comprising displaying said analysis results to said user.

130. The machine readable medium of claim 129, wherein said analysis results are displayed to said user on a visual display device.

131. The machine readable medium of claim 100, wherein said plurality of profiles are selected automatically.

132. The machine readable medium of claim 100, wherein said plurality of profiles are selected manually.
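
The following is an illustrative sketch only, not part of the claims and not the applicants' implementation, of how the operations recited in claim 100 might be arranged in software. It assumes that each batch is a record holding time points, an aligning parameter (for example, dissolved oxygen), one continuous parameter, and a final outcome value; that alignment means shifting each batch so that time zero is the first point at which the aligning parameter falls to or below a threshold; that a profile is the set of continuous parameter values taken across all batches at one aligned time offset; and that the regression method is simple linear regression. All function names, field names, and the threshold rule are hypothetical. Plot generation is omitted.

```python
import json
from statistics import mean


def align_offset(times, aligning, threshold):
    """Return the first time at which the aligning parameter is <= threshold.

    Assumed alignment rule; the source does not prescribe one.
    """
    for t, a in zip(times, aligning):
        if a <= threshold:
            return t
    raise ValueError("aligning parameter never reaches the threshold")


def interpolate(times, values, t):
    """Linearly interpolate values at time t (times must be strictly increasing)."""
    points = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("time outside the measured range")


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R squared.

    Assumes the x values are not all identical.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r2


def analyze(batches, threshold, offsets):
    """Align batches, select one profile per offset, and regress outcome on it."""
    results = []
    for offset in offsets:
        xs, ys = [], []
        for b in batches:
            t0 = align_offset(b["time"], b["aligning"], threshold)
            xs.append(interpolate(b["time"], b["value"], t0 + offset))
            ys.append(b["outcome"])
        slope, intercept, r2 = linear_fit(xs, ys)
        results.append(
            {"offset": offset, "slope": slope, "intercept": intercept, "r2": r2}
        )
    return results


def store(results, path):
    """Write the analysis results to a file as JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(results, fh, indent=2)


# Made-up example data for three batches.
example_batches = [
    {"time": [0, 1, 2, 3, 4], "aligning": [90, 40, 30, 20, 10],
     "value": [1, 2, 3, 4, 5], "outcome": 10},
    {"time": [0, 1, 2, 3, 4], "aligning": [90, 80, 40, 30, 20],
     "value": [2, 4, 6, 8, 10], "outcome": 20},
    {"time": [0, 1, 2, 3, 4], "aligning": [40, 30, 20, 10, 5],
     "value": [0, 1, 2, 3, 4], "outcome": 6},
]
example_results = analyze(example_batches, threshold=50, offsets=[1])
```

Under these assumptions, comparing the R squared values across several offsets would be one way to look for an aligned time point at which the continuous parameter best predicts the final outcome; calling store(example_results, "results.json") would write the results to a file.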