WO2007072256A2 - Apparatus and method for classifying data

Info

Publication number: WO2007072256A2
Application number: PCT/IB2006/054529
Authority: WO
Inventors: Simona E. Grigorescu
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 2005-12-23
Filing date: 2006-11-30
Publication date: 2007-06-28
Also published as: WO2007072256A3

Abstract

An apparatus (2) for classifying medical image data is disclosed. The apparatus includes a processor (14) for receiving first data sets belonging to more than one class and including a plurality of features. The classes to which the first data sets belong are known, and the processor determines the feature which best discriminates between the known classes of the first data sets. Subsequently received second data sets are then classified on the basis of the same feature.

Description

Apparatus and method for classifying data

The present invention relates to an apparatus and method for classifying data, and relates particularly, but not exclusively, to an apparatus and method for adaptively classifying 3D medical image model data.

Computer aided detection and diagnosis systems (CAD) have been developed to assist radiologists in quickly and reliably detecting unhealthy structures in the human body from computer tomography (CT) or magnetic resonance (MR) scan data. Such systems generally take 3D model data obtained by scanning a patient, and output a list of locations of suspicious structures which can then be investigated further by the radiologist.

The operation of an existing CAD system of this type is explained with reference to Figures 1 to 3. Figure 1 represents a set of examples of lesions and false alarms for which two features f1 and f2 are computed. Prior to the CAD system going on-line, these examples are plotted in Figure 1, and define a first set of points 2 corresponding to false alarms and a second set of points 4 corresponding to lesions. The CAD system uses this representation to define a boundary 6 between the two sets of points 2,4. After the CAD system goes on-line, every new data example is classified on the basis of its position relative to this boundary 6.

After going on-line, the system can update the definition of the boundary 6 on the basis of each new example of lesion or false alarm it encounters. Referring to Figure 2, if a new known lesion 8 is introduced, the CAD system updates the boundary 6 between the lesions and the false alarms, and any data example subsequently received is classified according to its position relative to the updated boundary 6.
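
The boundary update described above can be illustrated with a minimal sketch. The source does not say how boundary 6 is defined, so the sketch assumes, purely for illustration, a nearest-centroid rule in the (f1, f2) plane, where the boundary is the perpendicular bisector between the two class centroids. The class names, the `CentroidBoundary` helper and all sample points are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RunningCentroid:
    """Running mean of the (f1, f2) points seen so far for one class."""
    count: int = 0
    mean: list = field(default_factory=lambda: [0.0, 0.0])

    def add(self, point):
        # Incremental mean update: m <- m + (x - m) / n
        self.count += 1
        for i, x in enumerate(point):
            self.mean[i] += (x - self.mean[i]) / self.count


class CentroidBoundary:
    """Hypothetical stand-in for boundary 6 (assumption: nearest-centroid rule)."""

    def __init__(self):
        self.classes = {"lesion": RunningCentroid(), "false_alarm": RunningCentroid()}

    def update(self, point, label):
        # Adding a labelled example shifts that class centroid, and so the boundary.
        self.classes[label].add(point)

    def classify(self, point):
        def squared_distance(centroid):
            return sum((x - m) ** 2 for x, m in zip(point, centroid.mean))

        return min(self.classes, key=lambda name: squared_distance(self.classes[name]))


# Examples available before going on-line (made-up values).
boundary = CentroidBoundary()
for p in [(1.0, 1.0), (1.5, 0.5), (0.5, 1.5)]:
    boundary.update(p, "false_alarm")
for p in [(4.0, 4.0), (4.5, 3.5), (3.5, 4.5)]:
    boundary.update(p, "lesion")

query = (2.4, 2.4)
before = boundary.classify(query)       # classified against the initial boundary
boundary.update((2.0, 2.0), "lesion")   # a new known lesion moves the boundary
after = boundary.classify(query)        # same point, updated boundary
```

In this toy data the query point changes class once the new lesion is added, which is the behaviour Figure 2 is described as showing.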

However, the arrangement described above suffers from the drawback that the features needed to discriminate between lesions and false alarms depend upon the scanning protocol used. For example, although the data received from a hospital using one scanning protocol may be easy to classify such as the data shown in Figures 1 and 2, the same data from a different hospital using a different scanning protocol could be as shown in Figure 3. In that case, it may be difficult to define a boundary between the two classes of data points on the basis of the position of points in the space defined by features f1 and f2. This results in many faulty classifications, which may cause the CAD system to perform well at one clinical site and poorly at a different one.

Preferred embodiments of the present invention seek to improve the reliability of classification of such data. According to an aspect of the present invention, there is provided an updating method for updating a method of classifying medical image data, the updating method comprising:

(i) determining, for a plurality of first data sets, wherein each said first data set represents at least part of an entity belonging to a respective known data class and has a plurality of features, correspondence between values of a plurality of said features and membership of said known data classes; and

(ii) selecting, on the basis of said correspondence, at least one said feature to form the basis of subsequent allocation of data classes to at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of said features.

By obtaining data having a plurality of parameters and selecting a parameter to form the basis of subsequent classification based on correspondence between that parameter and membership of the known data classes into which subsequent data is to be divided, this provides the advantage of enabling the classification of data to be optimized for the particular data values available. This in turn minimizes the extent to which classification based on results obtained from one location becomes unreliable when applied to results obtained from another location.

The method may further comprise selecting a plurality of said features on the basis of said correspondence. This provides the advantage of enabling an apparatus for carrying out the method to be provided with pre-settings suitable, for example, for medical imaging data obtained according to different scanning protocols.

The method may be offered to customers via the Internet.

The method may further comprise generating at least one said feature of said first data sets.

The method may further comprise determining the extent to which values of features of said first data sets can be represented as a plurality of separated regions, wherein each said region corresponds to a respective known data class. The method may further comprise determining the Mahalanobis distance between at least one pair of said regions.
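
For reference, one common way to write the Mahalanobis distance between two regions (classes) a and b is given below. The source does not state which covariance estimate is used; the pooled within-class covariance shown here is an assumption, and the symbols (class means, class covariances and sample counts n_a, n_b) are introduced only for this illustration.

```latex
D_M(a, b) = \sqrt{(\boldsymbol{\mu}_a - \boldsymbol{\mu}_b)^{\top}\,
            \boldsymbol{\Sigma}^{-1}\,
            (\boldsymbol{\mu}_a - \boldsymbol{\mu}_b)},
\qquad
\boldsymbol{\Sigma} = \frac{(n_a - 1)\,\boldsymbol{\Sigma}_a + (n_b - 1)\,\boldsymbol{\Sigma}_b}{n_a + n_b - 2}
```

For a single feature with pooled standard deviation \sigma this reduces to |\mu_a - \mu_b| / \sigma, so a larger value indicates better-separated regions.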

According to another aspect of the present invention, there is provided a classifying method for classifying medical image data, the classifying method comprising:

(i) receiving a plurality of first data sets, wherein each said first data set represents at least part of an entity belonging to a respective known data class and has a plurality of features;

(ii) updating the method by means of an updating method as defined above;

(iii) receiving at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of said features; and

(iv) allocating a respective data class to at least one said second data set on the basis of at least one said feature.

According to a further aspect of the present invention, there is provided a medical imaging method comprising:

(i) generating a plurality of first data sets, wherein each said first data set represents at least part of an entity belonging to a respective known data class and has a plurality of features;

(ii) generating at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of said features; and

(iii) classifying the or each said second data set by means of a classifying method as defined above.

According to a further aspect of the present invention, there is provided an updating apparatus for updating a method of classifying medical image data, the updating apparatus comprising:

(i) at least one determining device for determining, for a plurality of first data sets, wherein each said first data set represents at least part of an entity belonging to a respective known data class and has a plurality of features, correspondence between values of a plurality of said features and membership of said known data classes; and

(ii) at least one selecting device for selecting, on the basis of said correspondence, at least one said feature to form the basis of subsequent allocation of data classes to at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of said features.

At least one said selecting device may be adapted to select a plurality of said features on the basis of said correspondence.

The apparatus may further comprise at least one generating device for generating at least one said feature of said first data sets. At least one said determining device may be adapted to determine the extent to which values of features of said first data sets can be represented as a plurality of separated regions, wherein each said region corresponds to a respective known data class.

At least one said determining device may be adapted to determine the Mahalanobis distance between at least one pair of said regions.

According to a further aspect of the present invention, there is provided a classifying apparatus for classifying medical image data, the classifying apparatus comprising:

(i) at least one receiving device for receiving a plurality of first data sets, wherein each said first data set represents at least part of an entity belonging to a respective known data class and has a plurality of features;

(ii) an updating apparatus as defined above;

(iii) at least one second receiving device for receiving at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of said features; and

(iv) at least one allocating device for allocating data classes to at least one second data set on the basis of at least one said feature.

According to a further aspect of the present invention, there is provided a medical imaging apparatus comprising:

(i) at least one imaging device for generating at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of features; and

(ii) a classifying apparatus as defined above.

According to a further aspect of the present invention, there is provided an updating data structure for use by a computer system for updating a method of classifying medical image data, the updating data structure comprising:

(i) first computer code executable to determine, for a plurality of first data sets, wherein each said first data set represents at least part of an entity belonging to a respective known data class and has a plurality of features, correspondence between values of a plurality of said features and membership of said known data classes; and

(ii) second computer code executable to select, on the basis of said correspondence, at least one said feature to form the basis of subsequent allocation of data classes to at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of said features.

The second computer code may be executable to select a plurality of said features on the basis of said correspondence.

The updating data structure may further comprise third computer code executable to generate at least one said feature of said first data sets. The first computer code may be executable to determine the extent to which values of features of said first data sets can be represented as a plurality of separated regions, wherein each said region corresponds to a respective known data class.

The first computer code may be executable to determine the Mahalanobis distance between at least one pair of said regions.

According to a further aspect of the present invention, there is provided a classifying data structure for use by a computer system for classifying medical image data, the classifying data structure comprising:

(i) fourth computer code executable to receive a plurality of first data sets, wherein each said first data set represents at least part of an entity belonging to a respective known data class and has a plurality of features;

(ii) an updating data structure as defined above;

(iii) fifth computer code executable to receive at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of said features; and

(iv) sixth computer code executable to allocate a respective data class to at least one second data set on the basis of at least one said feature.

According to a further aspect of the present invention, there is provided a medical imaging data structure for use by a computer system for medical imaging, the medical imaging data structure comprising: seventh computer code executable to generate at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of features; and a classifying data structure as defined above.

According to a further aspect of the present invention, there is provided a computer readable medium carrying a data structure as defined above stored thereon.

A preferred embodiment of the invention will now be described, by way of example only, and not in any limitative sense, with reference to the accompanying drawings, in which:

Figs. 1 to 3 illustrate the principle of operation of an existing CAD system;

Fig. 4 is a schematic representation of a medical imaging apparatus embodying the present invention; and

Fig. 5 is a flowchart explaining the operation of a method embodying the present invention of updating the apparatus of Fig. 4.

Referring to Figure 4, a medical imaging apparatus 2 has a platform 4 for supporting a patient 6, and a plurality of opposed pairs of x-ray sources 8 and detectors 10 arranged around a generally circular frame 12 through which the platform 4 passes. The x-ray sources 8 and detectors 10 are controlled by means of a processor 14, which also controls a motor (not shown) for moving the platform 4 in the direction of arrow A relative to the frame 12. The processor 14 also receives input signals from the X-ray detectors 10 and forms a 3-D model of the part of the patient 6 under investigation, and enables an image to be shown on a display 16. This aspect of the operation of the apparatus 2 will be familiar to persons skilled in the art and will therefore not be explained in greater detail herein.

The operation of the present invention will now be described with reference to Figure 5.

Once the image data is available, the patient's organ of interest is selected by means of one or more methods which will be familiar to persons skilled in the art. Of the data relating to the organ of interest, only voxels having a certain property are chosen for further processing, in order to identify suspicious regions (i.e. suspected lesions) in step S10. Objects are then generated by adding neighboring voxels satisfying certain properties. For example, for polyp detection in CT images, first, the air in the colon is segmented by means of thresholding the CT data. Further, the voxels surrounding the segmented air are selected. These voxels represent the colon lumen. Next, those voxels in the colon lumen having a shape index close to 1 are retained for further analysis. Finally, objects are generated starting from the selected voxels by adding neighboring voxels that belong to the colon lumen and have a shape index of 0.75 or greater.
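
The polyp-candidate example above can be sketched as a small region-growing routine. This is only an illustration under stated assumptions: the air threshold of -800 HU, the reading of "close to 1" as a shape index of at least 0.9, the 6-connected neighbourhood and the dictionary-based volume representation are all hypothetical choices not given in the source. Only the growing threshold of 0.75 comes from the description.

```python
from collections import deque

AIR_HU = -800    # assumed CT threshold for air (not given in the source)
SEED_SI = 0.9    # assumed reading of "shape index close to 1"
GROW_SI = 0.75   # growing threshold stated in the description

OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def neighbours(voxel):
    """6-connected neighbours of a (z, y, x) voxel (assumed connectivity)."""
    z, y, x = voxel
    return [(z + dz, y + dy, x + dx) for dz, dy, dx in OFFSETS]


def generate_objects(hu, shape_index):
    """hu and shape_index map (z, y, x) -> value for every voxel of the organ."""
    # 1. Segment the air by thresholding the CT data.
    air = {v for v, value in hu.items() if value < AIR_HU}
    # 2. Select the voxels surrounding the segmented air ("colon lumen" in the text).
    lumen = {n for v in air for n in neighbours(v) if n in hu and n not in air}
    # 3. Retain lumen voxels whose shape index is close to 1 as seeds.
    seeds = {v for v in lumen if shape_index[v] >= SEED_SI}
    # 4. Grow each seed by adding neighbouring lumen voxels with shape index >= 0.75.
    objects, visited = [], set()
    for seed in sorted(seeds):
        if seed in visited:
            continue
        obj, queue = set(), deque([seed])
        visited.add(seed)
        while queue:
            v = queue.popleft()
            obj.add(v)
            for n in neighbours(v):
                if n in lumen and n not in visited and shape_index[n] >= GROW_SI:
                    visited.add(n)
                    queue.append(n)
        objects.append(obj)
    return objects


# Toy volume: one row of air (y=0), one row touching it (y=1), one row further away (y=2).
hu, shape_index = {}, {}
row_si = [0.2, 0.8, 0.95, 0.8, 0.5, 0.95]
for x in range(6):
    hu[(0, 0, x)], shape_index[(0, 0, x)] = -1000, 0.0
    hu[(0, 1, x)], shape_index[(0, 1, x)] = 40, row_si[x]
    hu[(0, 2, x)], shape_index[(0, 2, x)] = 40, 0.0

objects = generate_objects(hu, shape_index)
```

With this toy volume two candidate objects are produced, one grown from the seed at x=2 and one consisting of the isolated seed at x=5.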

At step S20, for the objects generated at the previous step, features describing object size, shape, and/or texture are computed. As an example, in the case of polyp detection, for the objects generated in the previous step one can determine the minimum ellipsoid enclosing the segmented object and use the lengths of its principal diameters as features. Other examples of features are statistics (i.e. average, standard deviation, minimum, maximum) computed over the grey values in an object.
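
As a minimal sketch of the grey-value statistics mentioned above, the following computes the four named quantities for one object. The function name and the use of the population standard deviation are assumptions; the ellipsoid-based size features are not covered here.

```python
import statistics


def grey_value_features(grey_values):
    """Summary statistics over the grey values of one object's voxels."""
    return {
        "mean": statistics.fmean(grey_values),
        # Population standard deviation; the source does not say which variant is meant.
        "std": statistics.pstdev(grey_values),
        "min": min(grey_values),
        "max": max(grey_values),
    }


# Made-up grey values for a single object.
features = grey_value_features([10, 20, 30, 40])
```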

A tuning step of the present invention can be either launched by a user, or carried out remotely on a user's behalf, for example via the Internet. Prior to the tuning step S30, data from one or more patients is loaded, and objects having a series of features are generated, as described above. Each object is then labeled by a human expert (such as a radiologist) as a true lesion or a faulty detection. These objects are referred to throughout this embodiment as the "example set". In the tuning step, feature selection is first carried out at step S30A. The feature selection procedure works as follows: for all objects labeled by the human expert as true lesions, the distribution of the values of each feature is computed, and the features are then sorted according to how similar their distribution is to a Gaussian distribution. This similarity can be measured by means of statistical tests which will be familiar to persons skilled in the art, such as the Kolmogorov-Smirnov test.
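
The Gaussianity ranking can be sketched as follows. The sketch uses the Kolmogorov-Smirnov statistic against a normal distribution whose mean and standard deviation are estimated from the same sample, which is one possible reading of the text and an assumption here. Feature names and values are made up, and a smaller statistic is taken to mean "more Gaussian".

```python
import math
import statistics


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ks_statistic_vs_normal(values):
    """KS statistic between the sample and a normal fitted to that sample."""
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)
    ordered = sorted(values)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        cdf = normal_cdf((x - mu) / sigma)
        # Largest gap between the empirical step function and the fitted CDF.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def sort_by_gaussianity(lesion_features):
    """Return feature names ordered from most to least Gaussian."""
    return sorted(lesion_features, key=lambda name: ks_statistic_vs_normal(lesion_features[name]))


# Feature values for objects labeled as true lesions (hypothetical data).
lesion_features = {
    "max_grey": [1, 1, 1, 1, 1, 1, 1, 1, 100],  # strongly skewed
    "diameter": [3.72, 4.16, 4.48, 4.75, 5.0, 5.25, 5.52, 5.84, 6.28],  # roughly bell-shaped
}
ranked = sort_by_gaussianity(lesion_features)
```

Only the true-lesion objects are passed in, matching the description, so the ranking says nothing yet about how well a feature separates the classes.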

An empty set of "good features" is created, and the most Gaussian feature is selected and used to compute the Mahalanobis distance between the group of objects labeled as true lesions and the other group of objects. If the Mahalanobis distance is larger than a certain threshold, for example larger than 0.01, the feature is added to the set of "good features" and removed from the original pool of sorted features. Otherwise, it is simply discarded from the pool of sorted features.

The most Gaussian feature from the pool of sorted features is then selected and, together with the features from the "good feature" set, is used to compute a new value for the Mahalanobis distance. If the difference between this new value and the previous one is greater than a certain threshold, for example 0.01, the feature is added to the set of "good features" and removed from the original pool of sorted features. Otherwise, it is simply discarded from the pool of sorted features. This process is then continued with each feature at the top of the sorted list until the list is empty or until a prescribed number of features has been added to the "good feature" set. As a result, only the selected "good features" are computed at step S20 for subsequent data acquisition, until the tuning process is next carried out. In this way, unnecessary feature acquisition and/or data processing is avoided.
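
The greedy selection loop of step S30A can be sketched as below. The pooled within-class covariance, the helper names and the toy data are assumptions made for illustration; the threshold of 0.01 is the example value from the description. The sketch assumes the pooled covariance is non-singular, so constant or perfectly collinear features would need extra handling.

```python
import math


def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter_matrix(rows, mu):
    """Sum of outer products of deviations from the class mean."""
    k = len(mu)
    s = [[0.0] * k for _ in range(k)]
    for row in rows:
        d = [x - m for x, m in zip(row, mu)]
        for i in range(k):
            for j in range(k):
                s[i][j] += d[i] * d[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis(lesions, others, names):
    """Distance between the two groups using only the named features."""
    a = [[obj[f] for f in names] for obj in lesions]
    b = [[obj[f] for f in names] for obj in others]
    mu_a, mu_b = mean_vector(a), mean_vector(b)
    sa, sb = scatter_matrix(a, mu_a), scatter_matrix(b, mu_b)
    dof = len(a) + len(b) - 2  # pooled covariance (assumption)
    pooled = [[(x + y) / dof for x, y in zip(ra, rb)] for ra, rb in zip(sa, sb)]
    diff = [x - y for x, y in zip(mu_a, mu_b)]
    return math.sqrt(sum(d * s for d, s in zip(diff, solve(pooled, diff))))


def select_good_features(lesions, others, sorted_features, threshold=0.01, max_features=None):
    good, previous = [], 0.0
    for name in sorted_features:  # most Gaussian first
        if max_features is not None and len(good) >= max_features:
            break
        distance = mahalanobis(lesions, others, good + [name])
        if distance - previous > threshold:
            good.append(name)      # keep: the distance grew by more than the threshold
            previous = distance
        # otherwise the feature is simply discarded
    return good


# Hypothetical example set: "noise" has the same distribution in both groups.
lesions = [{"size": s, "noise": n, "texture": t}
           for s, n, t in [(9, 4, 6), (11, 4, 8), (9, 6, 8), (11, 6, 6)]]
others = [{"size": s, "noise": n, "texture": t}
          for s, n, t in [(3, 4, 2), (5, 4, 4), (3, 6, 4), (5, 6, 2)]]
good = select_good_features(lesions, others, ["size", "noise", "texture"])
```

Starting `previous` at zero makes the first test identical to the absolute threshold check described for the first feature, so one loop covers both cases.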

Once generated, the "good feature" set selected in step S30A is then used at step S30B for supervised training of the classifier that is used at step S40. Generally speaking, a classifier has a set of internal parameters that are used together with object feature values in the classification phase (step S40) to decide to which class an object belongs. During step S30B, these internal parameter values are updated based on the values of the "good feature" set for the "example set". As an example, a linear classifier linearly combines the feature values of an object, using a set of weights, into a single quantity. This quantity is compared to a certain threshold to decide the class to which the object belongs. The values of these weights and that of the threshold are computed during the training phase S30B based on the "example set" feature values by means of mathematical formulas that are familiar to the person skilled in the art.
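
A minimal sketch of such a linear classifier follows. The source leaves the training formulas open, so the sketch assumes one of the simplest closed forms: the weights are the difference of the two class means and the threshold is the score of the midpoint between them. Function names and data are hypothetical.

```python
def train_linear_classifier(lesions, others):
    """Step S30B sketch: derive weights and threshold from the example set.

    Assumption: weights = difference of class means, threshold = score of the
    midpoint between the means. The source does not prescribe this formula.
    """
    mu_l = [sum(col) / len(lesions) for col in zip(*lesions)]
    mu_o = [sum(col) / len(others) for col in zip(*others)]
    weights = [a - b for a, b in zip(mu_l, mu_o)]
    midpoint = [(a + b) / 2 for a, b in zip(mu_l, mu_o)]
    threshold = sum(w * m for w, m in zip(weights, midpoint))
    return weights, threshold


def classify(feature_values, weights, threshold):
    """Step S40 sketch: weighted sum of the features compared to the threshold."""
    score = sum(w * x for w, x in zip(weights, feature_values))
    return "lesion" if score > threshold else "false_alarm"


# "Good feature" values (size, texture) for a made-up example set.
lesions = [[9, 6], [11, 8], [9, 8], [11, 6]]
others = [[3, 2], [5, 4], [3, 4], [5, 2]]
weights, threshold = train_linear_classifier(lesions, others)

label_a = classify([10, 6], weights, threshold)
label_b = classify([4, 4], weights, threshold)
```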

It will be appreciated by persons skilled in the art that the above embodiment has been described by way of example only, and not in any limitative sense, and that various alterations and modifications are possible without departure from the scope of the invention as defined by the appended claims. For example, the apparatus 2 could be set for a number of scanning protocols and selected features together with the associated trained classifier stored for each such protocol as a system preset. This would enable the user to choose, from knowledge of the scanning protocol, which of the available presets best suits the data being classified. Also, the tuning process of Figure 5 can be carried out either within a medical imaging apparatus, or remotely on behalf of a user of the medical imaging apparatus, for example as an updating service provided via the Internet.
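
The per-protocol presets mentioned above could be held in a simple lookup structure, as in the following sketch. The protocol names, feature names, weights and thresholds are all hypothetical, and a linear classifier is assumed for the stored model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    """Selected features plus the trained linear classifier for one protocol."""
    features: tuple
    weights: tuple
    threshold: float


# Hypothetical presets; real values would come from the tuning process.
PRESETS = {
    "protocol_a": Preset(("size", "texture"), (6.0, 4.0), 62.0),
    "protocol_b": Preset(("size",), (1.0,), 7.0),
}


def classify_with_preset(obj, protocol):
    """Classify one object using the preset chosen for the scanning protocol."""
    preset = PRESETS[protocol]
    score = sum(w * obj[f] for w, f in zip(preset.weights, preset.features))
    return "lesion" if score > preset.threshold else "false_alarm"


candidate = {"size": 8.0, "texture": 2.0}
result_a = classify_with_preset(candidate, "protocol_a")
result_b = classify_with_preset(candidate, "protocol_b")
```

In this made-up case the same object is classified differently under the two presets, which illustrates why the choice of preset depends on knowledge of the scanning protocol.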

Claims

1. An updating method for updating a method of classifying medical image data, the updating method comprising:

(i) determining, for a plurality of first data sets, wherein each said first data set represents at least part of an entity belonging to a respective known data class and has a plurality of features, correspondence between values of a plurality of said features and membership of said known data classes; and

(ii) selecting, on the basis of said correspondence, at least one said feature to form the basis of subsequent allocation of data classes to at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of said features.

2. An updating method according to claim 1, further comprising selecting a plurality of said features on the basis of said correspondence.

3. An updating method according to claim 1, wherein the method is offered to customers via the Internet.

4. An updating method according to claim 1, further comprising generating at least one said feature of said first data sets.

5. An updating method according to claim 1, further comprising determining the extent to which values of features of said first data sets can be represented as a plurality of separated regions, wherein each said region corresponds to a respective known data class.

6. An updating method according to claim 1, further comprising determining the Mahalanobis distance between at least one pair of said regions.

7. A classifying method for classifying medical image data, the classifying method comprising:

(i) receiving a plurality of first data sets, wherein each said first data set represents at least part of an entity belonging to a respective known data class and has a plurality of features;

(ii) updating the method by means of an updating method according to claim 1;

(iii) receiving at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of said features; and

(iv) allocating a respective data class to at least one said second data set on the basis of at least one said feature.

8. A medical imaging method comprising:

(i) generating a plurality of first data sets, wherein each said first data set represents at least part of an entity belonging to a respective known data class and has a plurality of features;

(ii) generating at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of said features; and

(iii) classifying the or each said second data set by means of a classifying method according to claim 7.

9. An updating apparatus for updating a method of classifying medical image data, the updating apparatus comprising:

(i) at least one determining device for determining, for a plurality of first data sets, wherein each said first data set represents at least part of an entity belonging to a respective known data class and has a plurality of features, correspondence between values of a plurality of said features and membership of said known data classes; and

(ii) at least one selecting device for selecting, on the basis of said correspondence, at least one said feature to form the basis of subsequent allocation of data classes to at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of said features.

10. An updating apparatus according to claim 9, wherein at least one said selecting device is adapted to select a plurality of said features on the basis of said correspondence.

11. An updating apparatus according to claim 9, further comprising at least one generating device for generating at least one said feature of said first data sets.

12. An updating apparatus according to claim 9, wherein at least one said determining device is adapted to determine the extent to which values of features of said first data sets can be represented as a plurality of separated regions, wherein each said region corresponds to a respective known data class.

13. An updating apparatus according to claim 9, wherein at least one said determining device is adapted to determine the Mahalanobis distance between at least one pair of said regions.

14. A classifying apparatus for classifying medical image data, the classifying apparatus comprising:

(i) at least one receiving device for receiving a plurality of first data sets, wherein each said first data set represents at least part of an entity belonging to a respective known data class and has a plurality of features;

(ii) an updating apparatus according to claim 9;

(iii) at least one second receiving device for receiving at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of said features; and

(iv) at least one allocating device for allocating data classes to at least one second data set on the basis of at least one said feature.

15. A medical imaging apparatus comprising:

(i) at least one imaging device for generating at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of features; and

(ii) a classifying apparatus according to claim 14.

16. An updating data structure for use by a computer system for updating a method of classifying medical image data, the updating data structure comprising:

(i) first computer code executable to determine, for a plurality of first data sets, wherein each said first data set represents at least part of an entity belonging to a respective known data class and has a plurality of features, correspondence between values of a plurality of said features and membership of said known data classes; and

(ii) second computer code executable to select, on the basis of said correspondence, at least one said feature to form the basis of subsequent allocation of data classes to at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of said features.

17. An updating data structure according to claim 16, wherein the second computer code is executable to select a plurality of said features on the basis of said correspondence.

18. An updating data structure according to claim 16, further comprising third computer code executable to generate at least one said feature of said first data sets.

19. An updating data structure according to claim 16, wherein the first computer code is executable to determine the extent to which values of features of first data sets can be represented as a plurality of separated regions, wherein each said region corresponds to a respective known data class.

20. An updating data structure according to claim 16, wherein the first computer code is executable to determine the Mahalanobis distance between at least one pair of said regions.

21. A classifying data structure for use by a computer system for classifying medical image data, the classifying data structure comprising:

(i) fourth computer code executable to receive a plurality of first data sets, wherein each said first data set represents at least part of an entity belonging to a respective known data class and has a plurality of features;

(ii) an updating data structure according to claim 16;

(iii) fifth computer code executable to receive at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of said features; and

(iv) sixth computer code executable to allocate a respective data class to at least one second data set on the basis of at least one said feature.

22. A medical imaging data structure for use by a computer system for medical imaging, the medical imaging data structure comprising: seventh computer code executable to generate at least one second data set, wherein the or each said second data set represents at least part of an entity belonging to a respective unknown data class and has a plurality of features; and a classifying data structure according to claim 21.

23. A computer readable medium carrying a data structure according to claim 16 stored thereon.