WO2006125863A1 - Analysis techniques for liquid chromatography/mass spectrometry - Google Patents

Info

Publication number
WO2006125863A1
Authority
WO
WIPO (PCT)
Prior art keywords
peak
peaks
spectrum
retention time
mass
Application number
PCT/FI2006/050208
Other languages
French (fr)
Inventor
Matej Oresic
Mikko Katajamaa
Original Assignee
Valtion Teknillinen Tutkimuskeskus
Application filed by Valtion Teknillinen Tutkimuskeskus
Publication of WO2006125863A1


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/86 Signal analysis
    • G01N30/8675 Evaluation, i.e. decoding of the signal into analytical information
    • G01N30/62 Detectors specially adapted therefor
    • G01N30/72 Mass spectrometers
    • G01N30/7233 Mass spectrometers interfaced to liquid or supercritical fluid chromatograph
    • G01N30/8624 Detection of slopes or peaks; baseline correction
    • G01N30/8631 Peaks

Definitions

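  • Figures 3A and 3B show a simple example of the two peak detection steps: searching for 1-dimensional m/z peaks within each retention time scan, and joining them across successive scans into 2-dimensional peaks.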
  • The two peak picking methods differ in the implementation of the first step: the local maximum method picks each local maximum in a spectrum as an m/z peak, whereas the recursive threshold method considers only those maxima that have a suitable width, which differentiates noise peaks from real peaks.
  • The choice of methods for smoothing and peak detection depends on the nature of the input data. If the data is already pre-processed and/or in centroid form, smoothing is not necessary, and a peak detection method based on searching for local maxima is typically the best choice. With unprocessed data, the recursive threshold peak detection usually gives better results.
  • The detected peaks are aligned by an aligner block 220.
  • The analysis system implements an alignment technique that matches each individual peak list against a master peak list. For every peak in an individual peak list, the best matching row in the master peak list is defined as the one having the smallest distance measure.
  • In the distance measure of equation [1], k is an adjustable parameter for controlling the balance between the accuracy of m/z ratio and retention time values. Generally, k can be set to a larger number as the resolution of the mass detector increases.
  • Initially, the master peak list is empty, and all peaks of the first peak list are appended to new rows of the master peak list.
  • The distance between a peak and the best-matching row should preferably be within a user-definable threshold level; otherwise the peak needs to be appended to a new row at the end of the master peak list. If a single row of the master peak list is the best match for multiple peaks of a peak list, only the peak with the smallest distance measure is added to that row, while the others are assigned to their second-best matches.
  • The analysis system 200 also comprises a normalization block 225.
  • The purpose of the normalization is to reduce the systematic error in the data.
  • The embodiment shown in Figure 2 implements two different normalization approaches: a rather straightforward set of linear normalization methods, and a more ambitious approach that uses multiple internal standard compounds injected into the spectrometry samples.
  • Linear normalization methods divide all peak intensities of a single sample by some value calculated using data from that sample.
  • The linear normalization method in the analysis system may offer four different ways to calculate the normalization factor: average peak intensity, average squared peak intensity, maximum intensity, and total raw signal. All of these methods work globally, which means that they normalize the entire sample using a single normalization factor.
  • The analysis system 200 also comprises a more ambitious normalization method, which uses information from multiple standard compounds.
  • This method assumes that some standard compounds are injected into each of the spectrometry samples in known concentrations prior to LC/MS analysis.
  • The standard compound peaks can be used to calculate a set of normalization factors, one for each standard compound.
  • There are several ways to use this information in normalization. One possibility is to determine which standard compound peak is closest to a peak, and to normalize this peak using the corresponding normalization factor.
  • The distance function is the same as in equation [1].
  • A variation of this method is based on normalization using the weighted contribution of each standard compound. In this method, the same distance metric as in equation [1] can be used to calculate the distance from a peak to each standard compound. The contribution of each standard to the final normalization factor can be weighted by the inverse of the distance between the peak and the standard, as shown by equation [2], where:
  • $m$ is the number of injected standard compounds
  • $nf_i$ is the normalization factor calculated using the standard compound with index $i$
  • $d(p, IS_i)$ is the distance between the peak to be normalized and the peak of the standard compound with index $i$.
  • The processed spectral data is then ready to be exported from the analysis system as a peak intensity matrix.
  • This matrix can then be further processed with proprietary or off-the-shelf mathematics packages, such as Matlab® (MathWorks, Inc.) or the R statistical language, which already have a large collection of tools available for statistical analysis of multivariate data.
  • The analysis system 200 implements one or more visualization techniques for quickly previewing the processed results.
  • These visualization techniques plot the peak intensity matrix as a two-dimensional plot where one axis (e.g. the x-axis) is the retention time and the other (e.g. the y-axis) is the m/z ratio. Peaks are plotted at the intersection of the coordinates for retention time and m/z ratio using appropriate visual attributes, such as colour, shade, shape, size, line type, or the like.
  • Figure 4B shows a log ratio plot, which is particularly useful for displaying differences between two groups of samples. Differences are measured using a log ratio value calculated between the average peak intensities of two selected groups: $\mathrm{LR}(p) = \log\left(\bar{I}_1(p) / \bar{I}_2(p)\right)$ [3],
  • where $\bar{I}_1(p)$ and $\bar{I}_2(p)$ in equation [3] are the average intensities of peak p in the first and second groups of raw data files, respectively.
  • visual coding, such as colour coding
  • a first shade or visual attribute, e.g. red or a dark hue
  • a second shade or visual attribute, e.g. green or a light hue
  • A mathematically precise log ratio is easy to define and implement, but other continuous or stepwise functions can be used. In order to present difference values over a wide range, the function should have a derivative that decreases as its argument increases.
  • Another useful visualization method is a coefficient of variation plot, which displays the variation of peak intensities within one group of samples.
  • The coefficient of variation plot is drawn similarly to the log ratio plot, but colour or other visual coding is used for displaying the coefficient of variation between peak intensities within a selected group of samples: $\mathrm{CV}(p) = \sigma(p) / \mu(p)$ [4],
  • where $\mu(p)$ in equation [4] is the average peak intensity and $\sigma(p)$ is the standard deviation of peak intensities in the selected group of samples.
  • Figure 6 shows a view of an exemplary user interface 60.
  • Reference numerals 60A to 60H denote various sections of the user interface. Section 60A is a list of data files.
  • Section 60B shows a total ion chromatogram for a selected data file.
  • Section 60C shows a spectrum based on a selected data file.
  • Section 60D shows a list of peaks from the selected data file.
  • Section 60E shows the spectral peaks plotted on a two-dimensional coordinate system.
  • Section 60F shows a total ion chromatogram for another selected data file.
  • Section 60G is a list of peaks after alignment.
  • Section 60H is a list of alignment results after the second peak detection for selected files.

Abstract

A method for analyzing liquid chromatography/mass spectrometry ("LC/MS") data comprises: preparing (1-2) a plurality of sample runs; processing (1-4) each of the prepared sample runs in an LC/MS spectrometer to obtain a spectrum for each processed sample run; internally representing (1-10) each spectrum as a layout of mass/charge versus retention time; performing a first peak detection (1-12) to detect peaks of each spectrum; and visualizing peaks of each spectrum, wherein the visualizing step comprises: mapping (1-22) each peak to be visualized to a coordinate system in which a first coordinate indicates mass/charge ratio and a second coordinate indicates retention time; and assigning (1-24) a specific visual attribute to each peak to be visualized.

Description

Analysis techniques for liquid chromatography/mass spectrometry
BACKGROUND OF THE INVENTION
The invention relates to processing techniques, including methods, equipment and software products, for analysis of mass spectrometry data as used in connection with liquid or gas chromatography. Later in this document, LC and MS are abbreviations for liquid chromatography and mass spectrometry, respectively. Although the invention will be described in connection with liquid chromatography, it is also applicable to gas chromatography.
Liquid chromatography coupled to mass spectrometry (LC/MS) has been widely used in proteomics and metabolomics research. In this context, the technology has been increasingly used for differential profiling, i.e., broad screening of biomolecular components across multiple samples or sample runs, which correspond to different conditions, interventions, or time points, in order to elucidate the observed phenotypes and discover biomarkers. One of the major challenges in this domain is the development of better solutions for processing LC/MS data.
Typical LC/MS experiments include several analytical stages, starting with sample pre-treatment, which commonly includes sample cleanup and extraction methods. The sample can then be introduced to an LC column where the molecules separate based on their size (size exclusion chromatography), affinity to the stationary phase (affinity chromatography), polarity (ion exchange chromatography), and/or hydrophobicity (reversed phase chromatography). Retention time measures the time between the sample injection and the appearance of the compound peak maximum after chromatographic separation. In analyses of complex mixtures, it is likely that many analytes elute at the same time, and individual compound peaks cannot be resolved by LC techniques alone. Mass spectrometry (MS) can then be used to separate the co-elutants according to mass-to-charge ratio (m/z). The co-elutants enter the LC-MS interface, where they are ionized and introduced into the mass spectrometer, where the m/z ratio is measured. Several ionization methods exist; among the most commonly used are the soft ionization methods such as electrospray ionization (ESI) and atmospheric pressure chemical ionization (APCI). The principles of mass detection can also vary, with the most common instruments being triple quadrupole, (quadrupole) ion trap, and (quadrupole) time-of-flight mass spectrometers. Because of the large number of possible applications and approaches, it is a challenge to develop a generic solution for processing and analysis of LC/MS data.
One increasingly utilized type of LC/MS application is differential profiling, where the extraction, LC methods, and MS instrument setup are chosen to provide broad coverage of compounds, with the main aim of enabling relative quantitative comparisons for individual compounds across multiple samples. Applications of such an approach can be found in the domains of systems biology, functional genomics, and biomarker discovery. While such approaches cannot match targeted analytical measurements in their ability to accurately quantify individual analytes, it is the role of data processing methods to enable comparative studies of analytes, even if they may be unknown.
The data processing for differential profiling comprises several stages. Smoothing (spectral filtering) aims at reducing the complexity of spectra and removing noise. Peak detection finds the peaks corresponding to the compounds or fragments thereof.
One of the major challenges in the LC/MS domain is the development of better solutions for processing LC/MS data. A particular problem relates to the visualization of spectrum peaks.
BRIEF DESCRIPTION OF THE INVENTION
An object of the present invention is to provide a method, an apparatus and a computer program product for implementing the method so as to solve the above-mentioned problem. The objects of the invention are achieved by a method, program product and computer system which are characterized by what is stated in the independent claims. The preferred embodiments of the invention are disclosed in the dependent claims.
The invention is based on the idea of complementing conventional LC/MS spectrometry operations with an additional step of visualizing peaks of each spectrum, wherein the visualizing step comprises mapping each peak to be visualized to a coordinate system in which a first coordinate indicates mass/charge ratio and a second coordinate indicates retention time; and assigning a specific visual attribute to each peak to be visualized.
A preferred embodiment of the invention also comprises an alignment step and a gap filling step, or program instructions and data structures for executing these steps. Alignment aims at matching the corresponding peaks across multiple sample runs. Gap filling is a second peak detection operation, in which previously undetected peaks in a spectrum are searched for by using knowledge of "neighbour" spectra, i.e., the remaining spectra in the set of aligned spectra.
According to another preferred embodiment of the invention, normalization may be used to reduce systematic errors by adjusting the intensities within each sample run.
An advantage of the invention is improved processing of spectral data.
BRIEF DESCRIPTION OF THE DRAWINGS
In the following, the invention will be described in greater detail by means of preferred embodiments with reference to the attached drawings, in which:
Figure 1 is a flow chart illustrating main phases in a method according to the invention;
Figure 2 is a block diagram illustrating principal software blocks in an exemplary object-based implementation of the invention;
Figures 3A and 3B illustrate peak detection methods;
Figure 4A shows total ion chromatograms from one elicited and one control sample;
Figure 4B shows a log-ratio view for the top 20% most intense peaks in the control samples shown in Figure 4A;
Figure 5A shows differences between elicited and control groups;
Figure 5B shows distribution within the elicited and control groups;
Figure 6 shows a view of an exemplary user interface.
DETAILED DESCRIPTION OF THE INVENTION
Figure 1 is a flow chart illustrating main phases in a method according to the invention. The invention relates to processing of spectral data from a plurality of sample runs. Each sample run produces a spectrum (spectral data) from a sample. The samples used in the different sample runs can be subsamples from a common larger sample, or they can derive from different samples altogether.
Reference numeral 1-2 denotes sample preparation steps, which are known to those skilled in the art and which have been briefly discussed in the background section of this document. Reference numeral 1-4 denotes a step which comprises spectrometry operations, including recording of measured spectral data. Reference numeral 1-6 denotes an optional step in which the spectral data is converted from a vendor-specific data format to an open data format, such as netCDF. A benefit of this step, or of the corresponding routine and data structures in the software product, is the ability to support a wide variety of spectrometry instruments. In a further optional step 1-8, the spectral data is smoothed to suppress noise and other spurious data. In some implementations this step may be performed by the spectrometer itself. In step 1-10, the spectral data is internally represented in two dimensions, wherein one dimension corresponds to the mass-to-charge ratio m/z, while the other dimension corresponds to the retention time rt. The term 'internal representation' means that a visualization of the spectral data is not necessary, at least not at this stage. Reference numeral 1-12 denotes a peak detection step in which peaks in the spectral data are detected.
Steps 1-2 through 1-12 are known to those skilled in the art, and a detailed description is omitted for brevity. In these steps the several sample runs are typically processed serially, one sample run at a time. In the following steps the several sample runs are processed in parallel, interdependently.
Steps 1-14 to 1-18 relate to preferred embodiments and are not essential to the present invention. In step 1-14, data from the several sample runs are aligned such that there is a maximal correspondence between the peaks of the spectra. The verb 'align' may imply visualization, but visualization is not strictly necessary, and any equivalent data processing technique may be used. The alignment operation searches for corresponding peaks across different mass spectrometry runs. Peaks from the same compound usually match closely in m/z values, but the retention time may vary between runs. The retention time largely depends on the analytical method used.
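The master-peak-list alignment described for the aligner block 220 can be sketched in Python as follows. This is an illustrative sketch, not the system's implementation: equation [1] is not reproduced in this text, so `distance` assumes a hypothetical weighted sum k * |delta m/z| + |delta rt|, and representing a row by the mean m/z and retention time of its peaks is also an assumption. Contested rows are resolved greedily in order of increasing distance, which approximates the rule that only the closest peak joins a row while the others move on to their second-best matches.

```python
from dataclasses import dataclass, field


@dataclass
class Peak:
    mz: float
    rt: float
    intensity: float


@dataclass
class Row:
    """One row of the master peak list: at most one peak per sample run."""
    peaks: dict = field(default_factory=dict)  # run index -> Peak

    @property
    def mz(self) -> float:
        # Assumption: a row is represented by the mean m/z of its peaks.
        return sum(p.mz for p in self.peaks.values()) / len(self.peaks)

    @property
    def rt(self) -> float:
        return sum(p.rt for p in self.peaks.values()) / len(self.peaks)


def distance(peak: Peak, row: Row, k: float) -> float:
    # HYPOTHETICAL form of the distance measure of equation [1]:
    # k weights the m/z difference against the retention time difference.
    return k * abs(peak.mz - row.mz) + abs(peak.rt - row.rt)


def align(peak_lists, k=10.0, threshold=1.0):
    """Match each run's peak list against a growing master peak list."""
    master = []  # list of Row; empty before the first run
    for run, peaks in enumerate(peak_lists):
        existing = len(master)  # rows created for this run are not candidates
        candidates = []
        for p, peak in enumerate(peaks):
            for r in range(existing):
                d = distance(peak, master[r], k)
                if d <= threshold:  # user-definable threshold
                    candidates.append((d, p, r))
        candidates.sort()
        used_peaks, used_rows = set(), set()
        # Greedy assignment: a contested row goes to its closest peak; the
        # other peaks fall through to their next-best candidate row, if any.
        for d, p, r in candidates:
            if p in used_peaks or r in used_rows:
                continue
            master[r].peaks[run] = peaks[p]
            used_peaks.add(p)
            used_rows.add(r)
        # Unmatched peaks (all peaks of the first run) start new rows.
        for p, peak in enumerate(peaks):
            if p not in used_peaks:
                master.append(Row({run: peak}))
    return master
```

Each `Row` then holds the aligned peaks of one compound, keyed by run index; a missing key marks a gap for the second peak detection step.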
After completion of the alignment process, it is likely that the master peak list has some empty gaps, because it is not certain that every peak is detected and aligned in every sample run. The need to deal with these missing values often complicates further statistical analyses, and for this reason, a method according to the invention comprises a second peak detection step 1-16, the purpose of which is to fill these gaps. In one implementation, the second peak detection step employs the m/z_m and rt_m values of the corresponding row of the master peak list for estimating locations in which the missing peaks can be expected. A search is then conducted to find the highest local maximum over a range around the expected location in the raw spectral data. The search is performed over a search window, which is preferably user-settable.
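A minimal sketch of the second peak detection step 1-16 for one missing value, assuming the raw scans of the run are available as (rt, [(m/z, intensity), ...]) lists. The window half-widths `mz_tol` and `rt_tol` are hypothetical stand-ins for the user-settable search window. The sketch builds an extracted-ion chromatogram inside the window and returns the highest local maximum along retention time; treating the window edges as possible maxima is a simplifying choice.

```python
def fill_gap(scans, mz_m, rt_m, mz_tol=0.05, rt_tol=0.5):
    """Search raw scans around (mz_m, rt_m) for the highest local maximum.

    scans: list of (rt, [(mz, intensity), ...]) in retention-time order.
    Returns (rt, intensity) of the best maximum, or None if nothing is found.
    """
    # Extracted-ion chromatogram: strongest point inside the m/z window, per scan.
    xic = [
        (rt, max((i for mz, i in points if abs(mz - mz_m) <= mz_tol), default=0.0))
        for rt, points in scans
        if abs(rt - rt_m) <= rt_tol
    ]
    best = None
    for j, (rt, intensity) in enumerate(xic):
        # Window edges are compared against -inf, so an edge point can qualify.
        left = xic[j - 1][1] if j > 0 else float("-inf")
        right = xic[j + 1][1] if j + 1 < len(xic) else float("-inf")
        if intensity > 0.0 and intensity >= left and intensity >= right:
            if best is None or intensity > best[1]:
                best = (rt, intensity)
    return best
```

Returning `None` leaves the gap empty, which keeps a genuinely absent compound distinguishable from a small detected signal.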
Step 1-18 is a normalization step, which will be further described under the subheading "Normalization".
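The two normalization approaches of the normalization block 225 can be sketched as below. The four linear factors follow the list given for the linear method ("average squared peak intensity" is read literally as the mean of squared intensities). For the internal-standard method, each standard is passed as (m/z, rt, nf_i) with nf_i computed beforehand; `distance` is a hypothetical stand-in for equation [1], and the inverse-distance weighted mean is an assumed form of equation [2], which is not reproduced in this text. A peak would then be normalized by dividing its intensity by the selected factor, mirroring the linear methods.

```python
def linear_factor(intensities, method="average", total_raw_signal=None):
    """Global normalization factor for one sample run."""
    if method == "average":
        return sum(intensities) / len(intensities)
    if method == "average_squared":
        # Literal reading of "average squared peak intensity": mean of squares.
        return sum(x * x for x in intensities) / len(intensities)
    if method == "maximum":
        return max(intensities)
    if method == "total_raw_signal":
        # Sum of all raw intensities of the run, supplied by the caller.
        if total_raw_signal is None:
            raise ValueError("total_raw_signal must be given for this method")
        return total_raw_signal
    raise ValueError(f"unknown method: {method}")


def normalize_linear(intensities, method="average", total_raw_signal=None):
    """Divide every peak intensity of the run by one global factor."""
    factor = linear_factor(intensities, method, total_raw_signal)
    return [x / factor for x in intensities]


def distance(mz1, rt1, mz2, rt2, k=10.0):
    # HYPOTHETICAL stand-in for the distance measure of equation [1].
    return k * abs(mz1 - mz2) + abs(rt1 - rt2)


def nearest_standard_factor(mz, rt, standards, k=10.0):
    """Factor of the closest standard; standards: list of (mz, rt, nf_i)."""
    return min(standards, key=lambda s: distance(mz, rt, s[0], s[1], k))[2]


def weighted_standard_factor(mz, rt, standards, k=10.0):
    """Inverse-distance weighted mean of the nf_i values (assumed eq. [2])."""
    num = den = 0.0
    for s_mz, s_rt, nf in standards:
        d = distance(mz, rt, s_mz, s_rt, k)
        if d == 0.0:
            return nf  # the peak coincides with the standard itself
        num += nf / d
        den += 1.0 / d
    return num / den
```

Note the design difference: the linear methods use one factor per sample, whereas both standard-based variants compute a separate factor for every peak.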
Steps 1-22 and 1-24 collectively constitute the visualization of peaks. In step 1-22, each peak to be visualized is mapped to a coordinate system in which a first coordinate indicates mass/charge ratio and a second coordinate indicates retention time. In step 1-24, a specific visual attribute is assigned to each peak to be visualized. The visualization steps will be described under the subheading "Visualization".
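Steps 1-22 and 1-24 can be illustrated with the log ratio plot described for Figure 4B: each peak is mapped to plot coordinates (retention time on the x-axis, m/z on the y-axis) and given a colour derived from its log ratio between two sample groups. The sketch is illustrative only: the base-2 logarithm, the red/green assignment of positive/negative values, the saturation level, and the use of the sample standard deviation for the coefficient of variation are assumptions, and all function names are hypothetical.

```python
import math
import statistics


def log_ratio(group1, group2, base=2.0):
    """Log ratio of the mean intensities of one peak in two sample groups."""
    return math.log(statistics.mean(group1) / statistics.mean(group2), base)


def coefficient_of_variation(group):
    # Sample standard deviation is an assumption; pstdev is also defensible.
    return statistics.stdev(group) / statistics.mean(group)


def ratio_colour(lr, saturate_at=3.0):
    """Step 1-24 (assumed mapping): red for positive, green for negative."""
    level = round(255 * min(abs(lr) / saturate_at, 1.0))
    return (level, 0, 0) if lr >= 0 else (0, level, 0)


def to_pixel(rt, mz, width, height, rt_range, mz_range):
    """Step 1-22: x axis = retention time, y axis = m/z (high m/z at the top)."""
    (rt_lo, rt_hi), (mz_lo, mz_hi) = rt_range, mz_range
    x = round((rt - rt_lo) / (rt_hi - rt_lo) * (width - 1))
    y = round((1.0 - (mz - mz_lo) / (mz_hi - mz_lo)) * (height - 1))
    return x, y


def render_log_ratio(peaks, width=640, height=480,
                     rt_range=(0.0, 60.0), mz_range=(100.0, 1000.0)):
    """peaks: iterable of (rt, mz, group1_intensities, group2_intensities).

    Returns (x, y, rgb) markers that any plotting backend could draw.
    """
    return [
        (*to_pixel(rt, mz, width, height, rt_range, mz_range),
         ratio_colour(log_ratio(g1, g2)))
        for rt, mz, g1, g2 in peaks
    ]
```

A coefficient of variation plot would reuse `to_pixel` unchanged and only swap the colour function.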
Figure 2 is a block diagram illustrating an exemplary object-based implementation of an analysis system according to the invention. The analysis system is generally denoted by reference numeral 200. The block diagram in Figure 2 follows the conventions of the Unified Modelling Language (UML). This class model includes a set of core classes used for representing raw LC/MS data, and interfaces for different types of data processing methods and visualization blocks. In the implementation shown here, the core objects for representing raw LC/MS data do not store the actual measurements inside them, but retrieve the data from disk when necessary. This makes it possible to visualize and process several large raw data files at the same time. New data processing methods can be added to the toolbox by implementing a suitable interface.
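The on-demand loading of raw data and the pluggable processing interface can be sketched in Python as follows. The JSON-lines file format, the class names `RawDataFile`, `ProcessingMethod` and `TotalIonChromatogram`, and the `write_scans` helper are all hypothetical; the point illustrated is that the object keeps only a byte-offset index in memory and reads each scan from disk when it is requested.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class RawDataFile:
    """Raw LC/MS run that keeps only a scan index in memory.

    HYPOTHETICAL on-disk format: JSON lines, one scan per line, e.g.
    {"rt": 1.0, "mz": [...], "intensity": [...]}
    """

    def __init__(self, path):
        self.path = path
        self._offsets = []  # byte offset of each scan line
        offset = 0
        with open(path, "rb") as fh:
            for line in fh:
                if line.strip():
                    self._offsets.append(offset)
                offset += len(line)

    def __len__(self):
        return len(self._offsets)

    def scan(self, index):
        """Read one scan from disk on demand: (rt, [(mz, intensity), ...])."""
        with open(self.path, "rb") as fh:
            fh.seek(self._offsets[index])
            record = json.loads(fh.readline())
        return record["rt"], list(zip(record["mz"], record["intensity"]))

    def scans(self):
        for i in range(len(self)):
            yield self.scan(i)


class ProcessingMethod(ABC):
    """Interface that new data processing methods implement."""

    @abstractmethod
    def process(self, raw: RawDataFile):
        ...


class TotalIonChromatogram(ProcessingMethod):
    """Example plug-in: summed intensity of each scan."""

    def process(self, raw):
        return [(rt, sum(i for _, i in points)) for rt, points in raw.scans()]


def write_scans(path, scans):
    """Demo helper: store (rt, [(mz, intensity), ...]) scans as JSON lines."""
    with open(path, "w") as fh:
        for rt, points in scans:
            mz, inten = zip(*points) if points else ((), ())
            fh.write(json.dumps({"rt": rt, "mz": list(mz), "intensity": list(inten)}) + "\n")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "run1.jsonl")
    write_scans(path, [(0.5, [(100.0, 3.0), (101.0, 1.0)]), (1.0, [(100.0, 5.0)])])
    raw = RawDataFile(path)
    print(len(raw), TotalIonChromatogram().process(raw))
```

Because processing methods only see the `RawDataFile` interface, a new method is added by subclassing `ProcessingMethod` without touching the storage code.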
Input data formats and conversion
Input to the analysis system 200 should be unprocessed measurement data from the mass spectrometer. Such raw data is denoted by reference numeral 205. It is also possible to apply some pre-processing, such as centroiding, before loading the data into the analysis system 200. Such pre-processing may, for example, reduce the amount of storage space needed for the data and/or speed up data processing. This is particularly useful in connection with high-resolution mass spectrometers such as QTof and FTMS. However, the success of data processing with the analysis system depends on the quality of the input data and on the pre-processing methods being used. For supporting input data files in NetCDF format, the NetCDF Java Library from the Unidata community can be used in the analysis system 200. Many mass spectrometer vendors provide converters for translating raw data files from their proprietary formats into this common format. The system shown in Figure 2 can be made compatible with NetCDF proteomics and metabolomics data created from a wide variety of instruments, including Quattro Micro (Waters), QSTAR Pulsar (Applied Biosystems), LTQ-FTMS (Thermo Finnigan), and LCQ (Thermo Finnigan). The analysis system can be expanded to include support for upcoming mass spectrometry data formats such as mzData and mzXML.
Smoothing aims to remove noise in the measured spectra, which facilitates further peak detection. Smoothing is an optional stage in data processing and can be left out if the data is not noisy or if the input data is already available as centroids. For smoothing the spectral data, the analysis system 200 comprises a filter section 210, which can be implemented using any of a wide variety of techniques, including a moving average filter and a Savitzky-Golay filter.
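A moving average filter over a single spectrum can be sketched as follows. This is a minimal sketch, assuming the window is measured in m/z units (as the "m/z=0.3 window setting" in the example below suggests) and that the setting denotes the full window width; both points are assumptions, as is the function name.

```python
from typing import List, Sequence


def moving_average(mzs: Sequence[float], intensities: Sequence[float],
                   window_mz: float) -> List[float]:
    """Smooth one spectrum: each point is replaced by the mean intensity of all
    points whose m/z lies within +/- window_mz / 2 of it. This is O(n^2); a
    production filter would slide the window with two pointers over sorted m/z."""
    half = window_mz / 2
    smoothed = []
    for centre in mzs:
        in_window = [y for x, y in zip(mzs, intensities) if abs(x - centre) <= half]
        smoothed.append(sum(in_window) / len(in_window))
    return smoothed


print(moving_average([100.0, 100.5, 101.0, 101.5], [0.0, 4.0, 0.0, 4.0], window_mz=1.0))
# values: [2.0, 4/3, 8/3, 2.0]
```

Alternating high and low points are pulled towards their local mean, which is the noise-suppressing effect the filter section 210 relies on before peak detection.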
Peak detection
After the optional smoothing, a peak detector 215 finds the peaks in the spectral data. By way of example, the embodiment shown in Figure 2 implements two peak picking methods, namely local maxima detection and recursive threshold detection. Both of these detection methods operate in two steps: first, 1-dimensional m/z peaks are searched for within each retention time scan separately, and then 1-dimensional peaks in successive spectra are joined together to form 2-dimensional peaks. The joining occurs only between those m/z peaks which are located in successive spectra, have m/z values within a pre-set threshold of each other, and together form a well-shaped peak in the chromatographic direction.
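The two-step procedure can be sketched in Python as shown below, using local maxima detection for the first step. This is a minimal sketch under stated assumptions: chaining each open peak to the nearest unused m/z peak of the next scan, and the "well-shaped" test (at least three scans with the apex not at either end), are illustrative choices, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

MzPeak = Tuple[float, float]           # (m/z, height) found in one scan
ChainPoint = Tuple[int, float, float]  # (scan index, m/z, height)


def local_maxima(mzs: Sequence[float], intensities: Sequence[float]) -> List[MzPeak]:
    """Step 1: every local maximum of one spectrum becomes an m/z peak
    (the first point of a flat top is taken; spectrum end points are ignored)."""
    return [(mzs[i], intensities[i]) for i in range(1, len(intensities) - 1)
            if intensities[i - 1] < intensities[i] >= intensities[i + 1]]


def join_scans(per_scan: Sequence[List[MzPeak]], mz_tol: float) -> List[List[ChainPoint]]:
    """Step 2: extend each open chain with the closest unused m/z peak of the next
    scan within mz_tol; a chain that cannot be extended is closed."""
    closed: List[List[ChainPoint]] = []
    open_chains: List[List[ChainPoint]] = []
    for s, peaks in enumerate(per_scan):
        used, still_open = set(), []
        for chain in open_chains:
            last_mz = chain[-1][1]
            candidates = [(abs(mz - last_mz), k) for k, (mz, _) in enumerate(peaks)
                          if k not in used and abs(mz - last_mz) <= mz_tol]
            if candidates:
                _, k = min(candidates)
                used.add(k)
                chain.append((s, peaks[k][0], peaks[k][1]))
                still_open.append(chain)
            else:
                closed.append(chain)
        # m/z peaks not used to extend a chain start new chains
        still_open += [[(s, mz, h)] for k, (mz, h) in enumerate(peaks) if k not in used]
        open_chains = still_open
    return closed + open_chains


def is_well_shaped(chain: List[ChainPoint], min_scans: int = 3) -> bool:
    """Assumed shape test: enough scans, and the apex is not at either end."""
    heights = [h for _, _, h in chain]
    apex = heights.index(max(heights))
    return len(chain) >= min_scans and 0 < apex < len(chain) - 1


per_scan = [[(100.0, 5.0)], [(100.01, 9.0), (250.0, 2.0)], [(100.0, 4.0)]]
peaks_2d = [c for c in join_scans(per_scan, mz_tol=0.05) if is_well_shaped(c)]
print(peaks_2d)  # one chain through m/z ~100 with heights 5, 9, 4
```

The isolated m/z 250 signal appears in only one scan, so it never forms a chromatographic profile and is dropped by the shape test.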
Figures 3A and 3B show a simple example of these two peak detection steps.
The two peak picking methods differ in the implementation of the first step: the local maximum method picks each local maximum in a spectrum as an m/z peak, whereas the recursive threshold method considers only those maxima that have a suitable width, which differentiates noise peaks from real peaks. The choice of methods for smoothing and peak detection depends on the nature of the input data. If the data is already pre-processed and/or in centroid form, smoothing is not necessary, and a peak detection method based on searching for local maxima is typically the best choice. With unprocessed data, recursive threshold peak detection usually gives better results.
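The width criterion of the recursive threshold method can be illustrated with the following sketch. It is one plausible reading of the idea rather than the system's actual algorithm: runs of points above a threshold are accepted as peaks only if their width lies within a range, narrower runs are rejected as noise, and wider runs are re-examined at a raised threshold so that merged peaks can split. The parameters `min_width`, `max_width` and `step` are hypothetical.

```python
from typing import List, Sequence


def recursive_threshold(intensities: Sequence[float], threshold: float,
                        min_width: int, max_width: int, step: float) -> List[int]:
    """Return apex indices of runs above `threshold` whose width (in points) lies in
    [min_width, max_width]. Narrower runs are discarded as noise; wider runs are
    re-examined at threshold + step, which can split merged peaks. step must be > 0."""
    apexes: List[int] = []
    i, n = 0, len(intensities)
    while i < n:
        if intensities[i] <= threshold:
            i += 1
            continue
        start = i
        while i < n and intensities[i] > threshold:
            i += 1
        width = i - start  # the run covers indices start .. i-1
        if min_width <= width <= max_width:
            apexes.append(max(range(start, i), key=lambda j: intensities[j]))
        elif width > max_width:
            inner = recursive_threshold(intensities[start:i], threshold + step,
                                        min_width, max_width, step)
            apexes.extend(start + j for j in inner)
    return apexes


spike_and_peak = [0, 10, 0, 0, 1, 5, 8, 5, 1, 0]
print(recursive_threshold(spike_and_peak, 0.5, 3, 6, 1.0))  # [6]: the 1-point spike is rejected
merged = [1, 4, 8, 4, 3, 4, 8, 4, 1]
print(recursive_threshold(merged, 0.5, 3, 5, 1.0))          # [2, 6]: two peaks separated
```

A plain local-maximum search would report the single-point spike at index 1 as a peak, which is exactly the kind of noise the width condition is meant to suppress in unprocessed data.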
After peak detection, the detected peaks are aligned by an aligner block 220. In one exemplary but non-restrictive implementation, the analysis system implements an alignment technique that matches each individual peak list against a master peak list. For every peak in an individual peak list, the best-matching row in the master peak list is defined as the one having the smallest distance measure:
$$ d = k \cdot \lvert m/z_p - m/z_m \rvert + \lvert rt_p - rt_m \rvert \qquad [1] $$
where m/zp and rtp are the m/z ratio and retention time, respectively, of a peak in an individual peak list, while m/zm and rtm are the average m/z ratio and retention time, respectively, of all peaks from different peak lists assigned to the same row of the master peak list. k is an adjustable parameter that controls the balance between the accuracy of the m/z ratio and retention time values. Generally, k can be set to a larger number as the resolution of the mass detector increases.
Initially, the master peak list is empty, and all peaks of the first peak list are appended to new rows of the master peak list. When peaks of subsequent peak lists are added to the master list, both |m/zp - m/zm| and |rtp - rtm| between a peak and its best-matching row should preferably be within a user-definable threshold level; otherwise the peak is appended to a new row at the end of the master peak list. If a single row of the master peak list is the best match for multiple peaks of a peak list, then only the peak with the smallest distance measure is added to that row, while the others are assigned to their second-best matches.
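The alignment against a growing master peak list can be sketched as follows. This is a minimal sketch under stated assumptions: the distance is taken to be the weighted sum d = k|m/zp - m/zm| + |rtp - rtm| (one reading of equation [1] consistent with the description of k), the tolerances `mz_tol` and `rt_tol` are hypothetical parameter names, and conflicts are resolved greedily, so a peak that loses a contested row falls back to its next-best free row, a slight generalization of "second-best match".

```python
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]            # (m/z, retention time)
Row = List[Tuple[int, float, float]]  # (peak-list index, m/z, retention time)


def distance(mz_p: float, rt_p: float, mz_m: float, rt_m: float, k: float) -> float:
    """Assumed form of equation [1]: k weights the m/z difference against the RT difference."""
    return k * abs(mz_p - mz_m) + abs(rt_p - rt_m)


def align(peak_lists: Sequence[Sequence[Peak]], k: float,
          mz_tol: float, rt_tol: float) -> List[Row]:
    master: List[Row] = []
    for li, peaks in enumerate(peak_lists):
        # Row centres (mean m/z, mean RT) are frozen before this list is added.
        centres = [(sum(p[1] for p in row) / len(row), sum(p[2] for p in row) / len(row))
                   for row in master]
        pairs = sorted(
            (distance(mz, rt, mz_m, rt_m, k), pi, ri)
            for pi, (mz, rt) in enumerate(peaks)
            for ri, (mz_m, rt_m) in enumerate(centres)
            if abs(mz - mz_m) <= mz_tol and abs(rt - rt_m) <= rt_tol)
        # Greedy: closest pairs first, so a row contested by several peaks goes to
        # the nearest one and the others fall back to their next-best free row.
        used_peaks, used_rows = set(), set()
        for _, pi, ri in pairs:
            if pi not in used_peaks and ri not in used_rows:
                master[ri].append((li, *peaks[pi]))
                used_peaks.add(pi)
                used_rows.add(ri)
        # Peaks without an acceptable row start new rows at the end of the master list.
        master += [[(li, mz, rt)] for pi, (mz, rt) in enumerate(peaks) if pi not in used_peaks]
    return master


lists = [[(100.0, 60.0), (200.0, 120.0)],
         [(100.002, 62.0), (100.004, 61.0), (300.0, 30.0)]]
for row in align(lists, k=1000.0, mz_tol=0.01, rt_tol=10.0):
    print(row)
```

In this example both peaks near m/z 100 of the second list compete for the first row; the closer one (distance about 4 versus 5) wins it, and the other, having no second candidate within tolerance, starts a new row.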
Normalization
The analysis system 200 also comprises a normalization block 225. The purpose of the normalization is to reduce the systematic error in the data. The embodiment shown in Figure 2 implements two different normalization approaches: a rather straightforward set of linear normalization methods, and a more ambitious approach that uses multiple internal standard compounds injected into the spectrometry samples.
Linear normalization methods divide all peak intensities of a single sample by some value calculated using data from that sample. By way of example, the linear normalization method in the analysis system may offer four different ways to calculate the normalization factor: average peak intensity, average squared peak intensity, maximum intensity and total raw signal. All of these methods work globally, which means that they normalize the entire sample using a single normalization factor.
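The four global normalization factors listed above can be sketched as follows. This is a minimal sketch: the method names are hypothetical labels, "average squared peak intensity" is read literally as the mean of the squared heights (an implementation might instead take its square root), and the total raw signal is assumed to be the sum of all raw intensities of the sample.

```python
from typing import List, Optional, Sequence


def linear_normalize(heights: Sequence[float], method: str,
                     raw_signal: Optional[Sequence[float]] = None) -> List[float]:
    """Divide every peak height of one sample by a single sample-wide factor."""
    if method == "average":
        factor = sum(heights) / len(heights)
    elif method == "average_squared":
        # Literal reading: mean of the squared heights (RMS would be another option).
        factor = sum(h * h for h in heights) / len(heights)
    elif method == "maximum":
        factor = max(heights)
    elif method == "total_raw_signal":
        if raw_signal is None:
            raise ValueError("total_raw_signal needs the raw intensities of the sample")
        factor = sum(raw_signal)
    else:
        raise ValueError(f"unknown method: {method}")
    return [h / factor for h in heights]


print(linear_normalize([1.0, 3.0], "average"))  # [0.5, 1.5]
```

Whatever the factor, every peak of the sample is scaled by the same number, so relative intensities within a sample are preserved; only the sample-to-sample scale changes.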
As shown in Figure 2, the analysis system 200 also comprises a more ambitious normalization method which uses information from multiple standard compounds. This method assumes that some standard compounds are injected into each of the spectrometry samples in known concentrations prior to LC/MS analysis. The standard compound peaks can be used to calculate a set of normalization factors, one for each standard compound. There are several ways to use this information in normalization. One possibility is to determine which standard compound peak is closest to a peak, and to normalize this peak using the corresponding normalization factor. The distance function is the same as in equation [1]. A variation of this method is based on normalization using a weighted contribution from each standard compound. In this method, the same distance metric as in equation [1] can be used to calculate the distance from a peak to each standard compound. The contribution of each standard to the final normalization factor can be weighted by the inverse of the distance between the peak and the standard, as shown by equation [2]:
$$ nf(p) = \frac{\sum_{i=1}^{m} nf_i \, / \, d(p, IS_i)}{\sum_{i=1}^{m} 1 \, / \, d(p, IS_i)} \qquad [2] $$
In equation [2], m is the number of injected standard compounds, nfi is the normalization factor calculated using the standard compound with index i, and d(p, ISi) is the distance between the peak to be normalized and the peak of the standard compound with index i. Both methods reduce to the common single-standard calibration when m = 1, in which case only a single internal standard is used.
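Both internal-standard variants can be sketched as follows. This is a minimal sketch under stated assumptions: each standard is given as its m/z, retention time and an already computed factor nfi (how nfi is derived from the standard's peak and known concentration is not specified here), the distance reuses the assumed weighted-sum form of equation [1], and the combined factor is the inverse-distance-weighted mean of the nfi. All names are hypothetical.

```python
from typing import Sequence, Tuple

Standard = Tuple[float, float, float]  # (m/z, retention time, normalization factor nf_i)


def dist(mz: float, rt: float, s: Standard, k: float) -> float:
    """Assumed distance of equation [1] between a peak and a standard's peak."""
    return k * abs(mz - s[0]) + abs(rt - s[1])


def nearest_standard(mz: float, rt: float, height: float,
                     standards: Sequence[Standard], k: float) -> float:
    """Normalize with the factor of the closest standard compound peak."""
    closest = min(standards, key=lambda s: dist(mz, rt, s, k))
    return height / closest[2]


def weighted_standards(mz: float, rt: float, height: float,
                       standards: Sequence[Standard], k: float) -> float:
    """Normalize with an inverse-distance-weighted mean of all factors nf_i."""
    num = den = 0.0
    for s in standards:
        d = dist(mz, rt, s, k)
        if d == 0.0:
            return height / s[2]  # the peak coincides with this standard
        num += s[2] / d
        den += 1.0 / d
    return height / (num / den)


stds = [(300.0, 100.0, 2.0), (500.0, 400.0, 4.0)]
print(nearest_standard(310.0, 110.0, 8.0, stds, k=1.0))    # 4.0
print(weighted_standards(310.0, 110.0, 8.0, stds, k=1.0))  # 8 / 2.08, about 3.846
```

With a single standard the weights cancel and both functions return the same value, which mirrors the remark that both methods reduce to single-standard calibration when m = 1.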
After processing, the spectral data is ready to be exported from the analysis system as a peak intensity matrix. This matrix can then be further processed with proprietary or off-the-shelf mathematics packages, such as Matlab® (MathWorks, Inc.) or R Statistical Language, which already provide a large collection of tools for the statistical analysis of multivariate data.
Visualization
For visualization independently of external software packages, the analysis system 200 implements one or more visualization techniques for quickly previewing the processed results. These visualization techniques plot the peak intensity matrix as a two-dimensional plot in which one axis (e.g. the x-axis) is the retention time and the other (e.g. the y-axis) is the m/z ratio. Peaks are plotted at the intersection of their retention time and m/z coordinates using appropriate visual attributes, such as colour, shade, shape, size, line type, or the like.
Figure 4B shows a log ratio plot, which is particularly useful for displaying differences between two groups of samples. Differences are measured using a log ratio value, which is calculated from the average peak intensities of two selected groups:
$$ LR(p) = \log \frac{\bar{I}_{p,1}}{\bar{I}_{p,2}} \qquad [3] $$
In equation [3], $\bar{I}_{p,1}$ and $\bar{I}_{p,2}$ are the average intensities of peak p in the first and second group of raw data files, respectively.
In the log ratio plot, visual coding, such as colour coding, can be used for visualizing the log ratio values. For example, a first shade or visual attribute (e.g. red or a dark hue) can indicate positive log ratio values and a second shade or visual attribute (e.g. green or a light hue) negative log ratio values.
A mathematically precise log ratio is easy to define and implement, but other continuous or stepwise functions can also be used. In order to present difference values over a wide range, the function should have a derivative which decreases as its argument increases.
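The log ratio and a red/green colour coding can be sketched as follows. This is a minimal sketch under stated assumptions: the natural logarithm is used (another base only rescales the values), and the colour mapping, which brightens with |LR| and saturates at a hypothetical `limit`, is one illustrative choice of visual attribute.

```python
import math
from typing import Sequence, Tuple


def log_ratio(group1: Sequence[float], group2: Sequence[float]) -> float:
    """Log of the ratio of the mean intensities of one peak in two sample groups."""
    mean1 = sum(group1) / len(group1)
    mean2 = sum(group2) / len(group2)
    return math.log(mean1 / mean2)


def ratio_colour(lr: float, limit: float = 2.0) -> Tuple[int, int, int]:
    """RGB colour: red for positive, green for negative log ratios; brightness grows
    with |lr| and saturates at |lr| = limit."""
    level = round(255 * min(abs(lr) / limit, 1.0))
    return (level, 0, 0) if lr > 0 else (0, level, 0)


lr = log_ratio([4.0, 4.0], [1.0, 1.0])
print(lr, ratio_colour(lr))  # log(4), about 1.386, maps to (177, 0, 0)
```

The logarithm already has a decreasing derivative, so a four-fold and a forty-fold change remain distinguishable without the largest changes dominating the colour scale; the saturation limit adds a stepwise cap on top of that.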
Another useful visualization method is a coefficient of variation plot, which displays the variation of peak intensities within one group of samples. The coefficient of variation plot is drawn similarly to the log ratio plot, but colour or other visual coding is used for displaying the coefficient of variation between peak intensities within a selected group of samples:
$$ CV(p) = \frac{\sigma_p}{\bar{I}_p} \qquad [4] $$
In equation [4], $\bar{I}_p$ is the average peak intensity and $\sigma_p$ is the standard deviation of peak intensities in the selected group of samples.
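The per-peak coefficient of variation over a selected group of samples can be sketched as follows. This is a minimal sketch, assuming the peak intensity matrix has one row per aligned peak and one column per sample, and assuming the sample (n - 1) standard deviation; the group therefore needs at least two samples. The function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def cv_per_peak(matrix: Sequence[Sequence[float]], group: Sequence[int]) -> List[float]:
    """CV of every peak (row of the peak intensity matrix) over the sample columns in `group`."""
    return [coefficient_of_variation([row[c] for c in group]) for row in matrix]


matrix = [[1.0, 3.0, 10.0],
          [5.0, 5.0, 0.0]]
print(cv_per_peak(matrix, group=[0, 1]))  # about [0.707, 0.0]
```

The second peak is perfectly reproducible within the selected columns, so it would receive the "no variation" end of the colour scale, while the first peak would stand out as variable.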
These visualization techniques are particularly useful in quality control, because the analysis system permits the user to return to the raw data and visually verify the obtained results. A benefit of these techniques is the ability to clearly see the separation between two sample groups and/or differences and variability across different samples.
EXAMPLE: METABOLIC PROFILING
A concrete example of metabolic profiling of plant secondary compounds in Catharanthus roseus will now be described. Studies of plant metabolites are a demanding area, since plants produce a large number of metabolites of high chemical diversity, many of which are unknown. Plant secondary metabolites are produced in response to changes in environmental conditions. The biosynthetic pathways of secondary metabolites are largely unknown, and discovery-driven 'omics' approaches promise to enhance our knowledge in this domain. To illustrate the utility of the analysis system according to the invention, its use in the metabolic profiling of cell cultures of the medicinal plant Catharanthus roseus will be described. This plant has been extensively studied due to the presence of terpenoid indole alkaloids (TIA), several of which are in high demand for pharmaceutical use. We focused on the fraction containing the most important secondary metabolites leading to TIA (methods are described in a supplementary file). We profiled 20 samples, of which 10 were control strains and 10 were elicited strains. The replicates are the same strain in parallel cultures corresponding to the same time point, so they can be considered biological replicates. We also injected the internal standard compound vincamine (PubChem SID 390304).
Using the analysis system according to the invention with a moving average filter (m/z = 0.3 window setting), recursive threshold peak detection (default settings), alignment (100 s tolerance in retention time, otherwise default settings), gap filling (60 s tolerance in retention time), and normalization by total raw signal, 2175 peaks were detected. Representative total ion chromatograms from one elicited and one control sample are shown in Figure 4A. The log ratio view for the top 20% most intense peaks is shown in Figure 4B. After the processed data were exported in tabular format, further analyses of the data matrix were performed in Matlab® using PLS Toolbox (Eigenvector Research, Inc.) and with R Statistical Language. Principal components analysis revealed clear differences between the elicited and control groups, as shown in Figure 5A. Using factor analysis (not shown), we found that two of the main contributors to the clustering of the elicited group were ajmalicine (PubChem SID 153482) and tabersonine (PubChem SID 163306). The compounds were identified using our internal spectral library based on molecular weight and retention time. Their distribution within the elicited and control groups shows that the compounds are significantly upregulated after elicitation (see Figure 5B).
Figure 6 shows a view of an exemplary user interface 60. Reference numerals 60A to 60H denote various sections of the user interface. Section 60A is a list of data files. Section 60B shows a total ion chromatogram for a selected data file. Section 60C shows a spectrum based on a selected data file. Section 60D shows a list of peaks from the selected data file. Section 60E shows the spectral peaks plotted on a two-dimensional coordinate system. Section 60F shows a total ion chromatogram for another selected data file. Section 60G is a list of peaks after alignment, and Section 60H is a list of alignment results after the second peak processing for selected files.
Although the invention has been described in connection with liquid chromatography, it is equally applicable to gas chromatography. It will be apparent to a person skilled in the art that, as technology advances, the inventive concept can be implemented in various ways. The invention and its embodiments are not limited to the examples described above but may vary within the scope of the claims.

Claims

1. A method for analyzing liquid chromatography/mass spectrometry ("LC/MS") data, the method comprising: preparing (1-2) a plurality of sample runs; processing (1-4) each of the prepared sample runs in an LC/MS spectrometer to obtain a spectrum in respect of each processed sample run; internally representing (1-10) each spectrum as a layout of mass/charge versus retention time; performing a first peak detection (1-12) to detect peaks of each spectrum; visualizing peaks of each spectrum, wherein the visualizing step comprises: mapping (1-22) each peak to be visualized to a coordinate system in which a first coordinate indicates mass/charge ratio and a second coordinate indicates retention time; and assigning (1-24) a specific visual attribute to each peak to be visualized.
2. A method according to claim 1, further comprising internally aligning (1-14) the detected peaks of each spectrum; and performing a second peak detection (1-16) to detect peaks missed in the first peak detection.
3. A method according to claim 2, wherein the second peak detection comprises local maxima detection.
4. A method according to claim 2 or 3, wherein the second peak detection comprises recursive threshold detection.
5. A method according to claim 1, further comprising normalizing the spectra.
6. A method according to claim 5, further comprising injecting one or more standard compounds with a predetermined concentration into each sample run prior to the processing step (1-4) in order to obtain a set of standard compound peaks for each injected standard compound.
7. A method according to claim 6, further comprising searching for the standard compound peak closest to a peak being analyzed and normalizing the peak being analyzed based on a distance measure of the distance between the peak being analyzed and said closest standard compound peak.
8. A method according to any one of claims 2 to 7, wherein the aligning step (1-14) comprises: generating a peak list in respect of each spectrum; generating a master peak list; and, for each peak in each peak list, finding the corresponding peak in the master peak list by using a predetermined distance measure.
9. A method according to claim 7 or 8, wherein the distance measure is based on a weighted combination of |m/zp - m/zm| and |rtp - rtm|, wherein m/zp and rtp are the mass-to-charge ratio and retention time, respectively, of a peak in an individual peak list, and m/zm and rtm are the average m/z ratio and retention time, respectively, of all peaks from different peak lists assigned to the same row of the master peak list.
10. A method according to any one of the preceding claims, further comprising visualizing peaks of each spectrum, wherein the visualizing step comprises: mapping each peak to be visualized to a coordinate system in which a first coordinate indicates mass/charge ratio and a second coordinate indicates retention time; and assigning a specific visual attribute to each peak to be visualized.
11. A method according to claim 10, further comprising visualizing peaks from a first group and a second group of samples, wherein the specific visual attribute is based on a ratio of average intensities of corresponding peaks in the first group and the second group.
12. A method according to claim 10, further comprising visualizing peaks from a group of samples, wherein the specific visual attribute is based on a variation of peak intensities within the group of samples.
13. A computer program product, executable in a computer system, the computer program product comprising program code for instructing the computer system to perform the following steps on a plurality of spectra obtained from a spectrometer used to analyze a plurality of sample runs: internally representing (1-10) each spectrum as a layout of mass/charge versus retention time; performing a first peak detection (1-12) to detect peaks of each spectrum; and searching (1-18) for the standard compound peak closest to a peak being analyzed and normalizing (1-20) the peak being analyzed based on a distance measure of the distance between the peak being analyzed and said closest standard compound peak.
14. A computer system comprising the computer program product according to claim 13.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20055253 2005-05-26
FI20055253A FI20055253A (en) 2005-05-26 2005-05-26 Analytical technique for liquid chromatography / mass spectrometry

Publications (1)

Publication Number Publication Date
WO2006125863A1 true WO2006125863A1 (en) 2006-11-30

Family

ID=34630188

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2006/050208 WO2006125863A1 (en) 2005-05-26 2006-05-24 Analysis techniques for liquid chromatography/mass spectrometry

Country Status (2)

Country Link
FI (1) FI20055253A (en)
WO (1) WO2006125863A1 (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5119315A (en) * 1989-04-28 1992-06-02 Amoco Corporation Method of correlating a record of sample data with a record of reference data
US5175430A (en) * 1991-05-17 1992-12-29 Meridian Instruments, Inc. Time-compressed chromatography in mass spectrometry
US5885841A (en) * 1996-09-11 1999-03-23 Eli Lilly And Company System and methods for qualitatively and quantitatively comparing complex admixtures using single ion chromatograms derived from spectroscopic analysis of such admixtures
US5905192A (en) * 1997-07-23 1999-05-18 Hewlett-Packard Company Method for identification of chromatographic peaks
US20030111596A1 (en) * 2001-10-15 2003-06-19 Surromed, Inc. Mass specttrometric quantification of chemical mixture components
US20040096982A1 (en) * 2002-11-19 2004-05-20 International Business Machines Corporation Methods and apparatus for analysis of mass spectra
US20040113062A1 (en) * 2002-05-09 2004-06-17 Surromed, Inc. Methods for time-alignment of liquid chromatography-mass spectrometry data
US20040195500A1 (en) * 2003-04-02 2004-10-07 Sachs Jeffrey R. Mass spectrometry data analysis techniques
EP1600771A1 (en) * 2004-12-15 2005-11-30 Agilent Technologies, Inc. Peak pattern calibration

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KATAJAMAA M. ET AL.: "Processing methods for differential analysis of LC/MS profile data", BMC BIOINFORMATICS, vol. 6, no. 179, July 2005 (2005-07-01), pages 179, XP021000775 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014235088A (en) * 2013-06-03 2014-12-15 国立医薬品食品衛生研究所長 Quantitative method and program
JP2015117965A (en) * 2013-12-17 2015-06-25 株式会社島津製作所 Method and device for outputting data to printer
CN109507347A (en) * 2017-09-14 2019-03-22 湖南中烟工业有限责任公司 A kind of chromatographic peak selection method
EP3968355A1 (en) * 2020-09-11 2022-03-16 Thermo Fisher Scientific (Bremen) GmbH Method of control of a spectrometer
GB2598770A (en) * 2020-09-11 2022-03-16 Thermo Fisher Scient Bremen Gmbh Method of control of a spectrometer
GB2598770B (en) * 2020-09-11 2024-01-17 Thermo Fisher Scient Bremen Gmbh Method of control of a spectrometer

Also Published As

Publication number Publication date
FI20055253A0 (en) 2005-05-26
FI20055253A (en) 2006-11-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

NENP Non-entry into the national phase

Ref country code: RU

WWW Wipo information: withdrawn in national office

Country of ref document: RU

122 Ep: pct application non-entry in european phase

Ref document number: 06725965

Country of ref document: EP

Kind code of ref document: A1