US20050033523A1 - Similarity calculation method and device - Google Patents

Info

Publication number
US20050033523A1
Authority
US
United States
Prior art keywords
distance calculation
vector
hierarchical
threshold value
transform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US10/489,012
Other versions
US7260488B2 (en)
Inventor
Mototsugu Abe
Masayuki Nishiguchi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Corp
Original Assignee
Sony Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Corp
Assigned to SONY CORPORATION. Assignment of assignors interest (see document for details). Assignors: NISHIGUCHI, MASAYUKI; ABE, MOTOTSUGU
Publication of US20050033523A1
Application granted
Publication of US7260488B2
Adjusted expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2131 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on a transform domain processing, e.g. wavelet transform
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/91 Television signal processing therefor

Definitions

  • the present invention relates to a similarity calculation method, a similarity calculation apparatus, a program and a recording medium which perform pattern matching between two vectors at a high speed.
  • the so-called full search, in which the similarity between the input value and every candidate is determined and the data with the shortest distance is then selected, is the simplest technology and has no detection leakage, and is frequently used in the case where data quantity is small.
  • a portion similar to an input image or input voice (sound) is retrieved from a large quantity of accumulated images or voices (sounds).
  • since the dimension of the feature vector per second is large and retrieval is conducted over feature vectors accumulated for several tens to several hundreds of hours, there is the problem that retrieval time becomes enormous when such a simple full search is performed.
  • the present invention has been proposed in view of such conventional actual circumstances, and its object is to provide a similarity calculation method and a similarity calculating apparatus which perform pattern matching between two vectors at a high speed while satisfying the above-described conditions, a program for allowing computer to execute the similarity calculation processing, and a computer readable recording medium where such program is recorded.
  • a similarity calculation method is directed to a similarity calculation method of determining similarity between two input vectors, and includes a hierarchical distance calculation step of performing distance calculation between the two input vectors in a hierarchical manner, a threshold value comparison step of comparing integrated value of distances calculated at respective hierarchies of the hierarchical distance calculation step with threshold value set in advance, a control step of controlling distance calculation at the hierarchical distance calculation step in accordance with comparison result at the threshold value comparison step, and an output step of outputting, as the similarity, integrated value of distances calculated up to the last hierarchy, wherein, at the control step, in the case where integrated value of distances calculated up to a certain hierarchy is above the threshold value at the threshold value comparison step, control is conducted so that distance calculation is truncated.
  • distance calculation between two vectors is conducted in a hierarchical manner, whereby in the case where integrated value of distances calculated up to a certain hierarchy is above a predetermined threshold value, it is only detected, without calculating actual distance, that the integrated value of distances is above the threshold value to thereby allow operation to be performed at a high speed.
  • this similarity calculation method may further include a transform step of implementing a predetermined transform operation to the two input vectors.
  • the predetermined transform operation is, e.g., a transform that reorders the respective components constituting the input vector in accordance with the magnitude of dispersion (variance) of the respective components, Discrete Cosine Transform, Discrete Fourier Transform, Walsh-Hadamard Transform or Karhunen-Loeve Transform.
  • this similarity calculation method may include a division step of taking out, in the predetermined order, respective components which constitute the two input vectors transformed at the transform step to divide them into hierarchical plural partial vectors.
  • distance calculation between respective components which constitute partial vectors is performed in a hierarchical manner in order from the partial vector of the uppermost hierarchy, whereby in the case where integrated value of calculated distances between all components which constitute partial vectors up to a certain hierarchy is below the threshold value, distance calculation between respective components which constitute partial vector of one hierarchy lower is performed.
  • a similarity calculating apparatus is directed to a similarity calculating apparatus adapted for determining similarity between two input vectors, and comprises hierarchical distance calculating means for performing distance calculation between the two input vectors in a hierarchical manner, threshold value comparing means for comparing integrated value of distances calculated at respective hierarchies by the hierarchical distance calculating means with threshold value set in advance, control means for controlling distance calculation by the hierarchical distance calculating means in accordance with comparison result by the threshold value comparing means, and output means for outputting, as the similarity, integrated value of distances calculated up to the last hierarchy, wherein the control means conducts a control so as to abort (truncate) distance calculation in the case where integrated value of distances calculated up to a certain hierarchy is above the threshold value as the result of comparison by the threshold value comparing means.
  • Such similarity calculating apparatus performs distance calculation between two vectors in a hierarchical manner, whereby in the case where integrated value of distances calculated up to a certain hierarchy is above a predetermined threshold value, it is only detected, without calculating actual distance, that the integrated value of distances is the threshold value or larger to thereby allow operation to be conducted at a high speed.
  • this similarity calculating apparatus may further comprise transform means for implementing a predetermined transform operation to the two input vectors.
  • the hierarchical distance calculating means conducts distance calculation between the two input vectors transformed by the transform means in a predetermined order based on the predetermined transform operation.
  • the predetermined transform operation is, e.g., a transform that reorders the respective components which constitute the input vector in accordance with the magnitude of dispersion (variance) of the respective components, Discrete Cosine Transform, Discrete Fourier Transform, Walsh-Hadamard Transform, or Karhunen-Loeve Transform.
  • this similarity calculating apparatus may comprise dividing means for taking out, in the predetermined order, respective components which constitute the respective two input vectors transformed by the transform means to divide them into hierarchical plural partial vectors.
  • the hierarchical distance calculating means performs, in a hierarchical manner, distance calculation between respective components which constitute partial vectors in order from the partial vector of the uppermost hierarchy, whereby in the case where integrated value of calculated distances between all components which constitute partial vectors up to a certain hierarchy is below the threshold value, distance calculation between respective components which constitute partial vectors of one hierarchy lower is performed.
  • a program according to the present invention serves to allow a computer to execute the above-described similarity calculation processing.
  • a recording medium according to the present invention is a computer readable recording medium where such a program is recorded.
  • FIG. 1 is a view for explaining outline of the configuration of a similarity vector detecting apparatus in the first embodiment.
  • FIG. 2 is a flowchart for explaining processing at the time of vector registration in the similarity vector detecting apparatus.
  • FIG. 3 is a flowchart for explaining processing at the time of vector retrieval in the similarity vector detecting apparatus.
  • FIG. 4 is a view for intuitively explaining processing in the first embodiment.
  • FIG. 5 is a view showing an example in which there exists deviation in distribution of vector within feature space.
  • FIG. 6 is a view for explaining outline of the configuration of a similarity vector detecting apparatus in the second embodiment.
  • FIG. 7 is a flowchart for explaining processing at the time of vector registration in the similarity vector detecting apparatus.
  • FIG. 8 is a flowchart for explaining processing at the time of vector retrieval in the similarity vector detecting apparatus.
  • FIG. 9 is a view for explaining outline of the configuration of a similarity vector detecting apparatus in the third embodiment.
  • FIG. 10 is a flowchart for explaining processing at the time of vector registration in the similarity vector detecting apparatus.
  • FIG. 11 is a flowchart for explaining processing at the time of vector retrieval in the similarity vector detecting apparatus.
  • FIG. 12 is a flowchart for explaining an example of processing for extracting acoustic feature vector from acoustic signal.
  • FIG. 13 is a view for explaining an example of processing for extracting acoustic feature vector from acoustic signal.
  • FIG. 14 is a view for explaining transform encoding in acoustic signal.
  • FIG. 15 is a flowchart for explaining an example of processing for extracting acoustic feature vector from encoded acoustic signal.
  • FIG. 16 is a view for explaining an example of processing for extracting acoustic feature vector from encoded acoustic signal.
  • FIG. 17 is a flowchart for explaining an example of processing for extracting image feature vector from video signal.
  • FIG. 18 is a view for explaining an example of processing for extracting image feature vector from video signal.
  • FIG. 19 is a flowchart for explaining another example of processing for extracting image feature vector from video signal.
  • FIG. 20 is a view for explaining a further example of processing for extracting image feature vector from video signal.
  • FIG. 21 is a flowchart for explaining a further example of processing for extracting image feature vector from encoded video signal.
  • FIG. 22 is a view for explaining a further example of processing for extracting image feature vector from encoded video signal.
  • the present invention is applied to a similarity vector detection method and an apparatus therefor which detect, at a high speed, vectors similar to input vector from plural registered vectors.
  • $f = (f[1], f[2], \ldots, f[N])^t$
  • $g = (g[1], g[2], \ldots, g[N])^t$
  • f[1], f[2], . . . represent respective components of vector f.
  • g[1], g[2], . . . represent respective components of vector g.
  • t represents transposition and N represents dimension of vector.
  • the similarity vector detecting apparatus 1 receives vector f and vector g as input and outputs the square distance between the vectors (or -1), and is composed of a recording unit 10 , a hierarchical distance calculating unit 11 , and a threshold value judgment unit 12 .
  • step S 1 the recording unit 10 ( FIG. 1 ) inputs in advance registered vector g.
  • there are generally plural registered vectors g, and in many cases their number is very large.
  • step S 2 the recording unit 10 records inputted vector g.
  • the apparatus since it is unnecessary to conduct special operation at the time of registration, the apparatus is simple and is suitable for processing on the real time basis.
  • the recording unit 10 is, e.g., magnetic disc, optical disc or semiconductor memory, etc.
  • the threshold value judgment unit 12 sets threshold value S of distance.
  • the hierarchical distance calculating unit 11 inputs vector f, and acquires one vector g recorded at the recording unit 10 .
  • step S 12 the hierarchical distance calculating unit 11 sets component number i serving as internal variable to 1, and sets integrated value sum of distance to 0.
  • the threshold value judgment unit 12 discriminates whether or not integrated value sum is smaller than threshold value S. In the case where integrated value sum is smaller than threshold value S (Yes), processing proceeds to step S 16 . In the case where integrated value sum is threshold value S or larger (No), the threshold value judgment unit 12 outputs -1 at step S 15 to complete processing.
  • the output value -1 is a numerical value used for convenience to indicate that the distance between inputted vector f and acquired vector g is threshold value S or larger, and this vector g is nullified.
  • the threshold value judgment unit 12 provides threshold value S and serves to truncate the integrating operation at the hierarchical distance calculating unit 11 in the case where integrated value sum reaches threshold value S at an intermediate hierarchy of the integrating operation, to thereby realize high speed processing.
  • step S 16 it is discriminated whether or not component number i is the number of dimensions N of vector f or vector g or smaller. In the case where the component number i is N or smaller (Yes), i is incremented at step S 17 to return to step S 13 . On the other hand, in the case where the component number i is larger than N (No), the threshold value judgment unit 12 outputs integrated value sum at step S 18 because integrating operation has been completed until the last component of vector f or vector g to complete processing. It is to be noted that integrated value sum at this time is square of distance between vectors.
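  • The retrieval flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `truncated_sq_distance` is hypothetical, and the integrating operation is assumed to add the squared difference of each component pair, which is consistent with the statement that the final integrated value is the square of the distance.

```python
from typing import Sequence


def truncated_sq_distance(f: Sequence[float], g: Sequence[float], threshold: float) -> float:
    """Squared distance between f and g, or -1.0 once the running sum reaches the threshold."""
    if len(f) != len(g):
        raise ValueError("f and g must have the same number of dimensions")
    total = 0.0  # the integrated value "sum"
    for f_i, g_i in zip(f, g):
        # Assumed integrating operation: add the squared component difference.
        total += (f_i - g_i) ** 2
        # "threshold value S or larger" nullifies the candidate and stops early.
        if total >= threshold:
            return -1.0
    return total


print(truncated_sq_distance([0.0, 0.0], [3.0, 4.0], threshold=30.0))
print(truncated_sq_distance([0.0, 0.0], [3.0, 4.0], threshold=20.0))
```

  • With these inputs the first call prints 25.0 (the full squared distance) and the second prints -1.0, because the running sum reaches the threshold at the second component. Note that the threshold is compared against the squared distance, so it plays the role of S rather than of the radius.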
  • this processing corresponds to calculating the precise distance only for those registered vectors, among the large number of registered vectors indicated by black circles in FIG. 4 , whose distance from the input vector indicated by x in the figure is within a hypersphere of radius √S, and to nullifying registered vectors outside that range at the time point when the integrated value of the distances along the respective axes reaches threshold value S (the square of the radius).
  • threshold value S of distance is set, thereby making it possible to conduct retrieval equivalent to full search at a high speed.
  • difference takes place in retrieval speed by this arrangement order. For example, in such cases that deviation exists in distribution of vectors within feature space as shown in FIG. 5 , retrieval speed greatly changes in dependency upon which of f[1] axis or f[2] axis is first integrated. In this example, employment of a method of first evaluating f[2] axis results in less extra integration to thereby realize high speed operation.
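  • The effect of the integration order can be illustrated by counting how many component integrations are needed before truncation. The sketch below uses made-up two-dimensional data that is spread widely along the second axis and narrowly along the first; the function name `count_integrations` and the data are hypothetical.

```python
def count_integrations(query, registered, threshold, order):
    """Count per-component integrations when components are visited in the given order."""
    operations = 0
    for g in registered:
        total = 0.0
        for i in order:
            total += (query[i] - g[i]) ** 2
            operations += 1
            if total >= threshold:  # truncate as soon as the threshold is reached
                break
    return operations


# Hypothetical registered vectors: small spread on axis 0, large spread on axis 1.
registered = [(0.1, 5.0), (-0.2, -4.0), (0.0, 3.0), (0.1, 0.2)]
query = (0.0, 0.0)

ops_axis1_first = count_integrations(query, registered, 1.0, order=(0, 1))
ops_axis2_first = count_integrations(query, registered, 1.0, order=(1, 0))
print(ops_axis1_first, ops_axis2_first)
```

  • For this data the script prints 8 and 5: evaluating the widely spread axis first rejects the three distant vectors after a single integration each.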
  • the similarity vector detecting apparatus 2 serves to input vectors f and g to output distance between the vectors (or -1), and is composed of vector transform units 20 , 21 , a recording unit 22 , a hierarchical distance calculating unit 23 , and a threshold value judgment unit 24 .
  • the vector transform units 20 , 21 serve to respectively implement similar transform operations to vectors g and f.
  • the recording unit 22 is, e.g., magnetic disc, optical disc or semiconductor memory, etc.
  • step S 20 the vector transform unit 20 ( FIG. 6 ) inputs registered vector g in advance.
  • step S 21 vector g is transformed as indicated by the above-described formula (5) to generate vector g′.
  • step S 22 the recording unit 22 records transformed vector g′.
  • the threshold value judgment unit 24 sets threshold value S of distance.
  • the vector transform unit 21 inputs vector f and the hierarchical distance calculating unit 23 acquires one vector g′ recorded at the recording unit 22 .
  • step S 32 the vector transform unit 21 transforms vector f as indicated by the above-described formula (4) to generate vector f′.
  • the hierarchical distance calculating unit 23 sets component number i serving as internal variable to 1, and sets integrated value sum of distance to 0.
  • the threshold value judgment unit 24 discriminates whether or not integrated value sum is smaller than threshold value S. In the case where integrated value sum is smaller than threshold value S (Yes), processing proceeds to step S 37 . In the case where integrated value sum is threshold value S or larger (No), the threshold value judgment unit 24 outputs -1 at step S 36 to complete processing.
  • step S 37 it is discriminated whether or not the component number i is the number of dimensions N or smaller of vector f′ and vector g′. In the case where the component number i is N or smaller (Yes), i is incremented at step S 38 to return to step S 34 . On the other hand, in the case where the component number i is larger than N (No), the threshold value judgment unit 24 outputs integrated value sum at step S 39 because integrating operation is completed up to the last component of vectors f′ and g′ to complete processing. It is to be noted that the integrated value sum at this time is square of distance between vectors.
  • the sequential matrix (a permutation matrix) is mentioned as the simplest orthogonal transform.
  • with this transform, the order of the vector components is simply rearranged.
  • sequential matrix P of the eighth order is expressed in a form as indicated by the following formula (8).
  • the orthogonal transform using this sequential matrix is effective in such cases that ways of spreading of respective vector components are different, and is high in speed since it is sufficient to perform sequencing so that multiplication/division and/or conditional branch are not necessary.
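  • A reordering of this kind might be derived from sample vectors as sketched below, assuming that "spreading" is measured by the per-component variance and that components are visited from largest to smallest variance. The helper names `variance_order` and `apply_order` are hypothetical. Because the transform only permutes components, the distance between two vectors is unchanged.

```python
from statistics import pvariance


def variance_order(samples):
    """Component indices sorted from largest to smallest variance over the samples."""
    dims = len(samples[0])
    variances = [pvariance([s[i] for s in samples]) for i in range(dims)]
    return sorted(range(dims), key=lambda i: variances[i], reverse=True)


def apply_order(vector, order):
    """Rearrange the components of a vector; no multiplication or branching is needed."""
    return [vector[i] for i in order]


samples = [(0.1, 5.0, 1.0), (-0.2, -4.0, 2.0), (0.0, 3.0, 0.0)]
order = variance_order(samples)
print(order, apply_order([10, 20, 30], order))
```

  • The same `order` must be applied to both the registered vectors and the input vector so that corresponding components are still compared with each other.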
  • Discrete Cosine Transform (DCT) represented by the following formulas (10), (11), and Discrete Fourier Transform (DFT) represented by the following formulas (12), (13) are used as the orthogonal transform, and integration is conducted in order from the low frequency component, thereby making it possible to perform integration in order from the component of highest significance.
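  • As an illustration of why integrating from the low frequency component helps, the sketch below applies an orthonormal DCT-II to two smooth vectors. The normalization is an assumption (formulas (10) and (11) are not reproduced in this text), and the function name `dct_orthonormal` is hypothetical. An orthonormal transform leaves the squared distance unchanged while concentrating it in the first components.

```python
import math


def dct_orthonormal(x):
    """Orthonormal DCT-II of a sequence (assumed normalization)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[j] * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                               for j in range(n)))
    return out


f = [1.0, 2.0, 3.0, 4.0]
g = [1.5, 2.5, 3.5, 4.5]
f_t, g_t = dct_orthonormal(f), dct_orthonormal(g)

sq_dist_original = sum((a - b) ** 2 for a, b in zip(f, g))
sq_dist_transformed = sum((a - b) ** 2 for a, b in zip(f_t, g_t))
first_component_share = (f_t[0] - g_t[0]) ** 2
print(sq_dist_original, sq_dist_transformed, first_component_share)
```

  • Here the two vectors differ by a constant offset, so the whole squared distance (1.0, up to rounding) appears in the lowest frequency component, and a truncation test would already succeed after the first integration.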
  • the Walsh-Hadamard Transform is an orthogonal transform where the respective elements of the transform matrix are constituted only by ±1, and is suitable for high speed transform because multiplication is not required at the time of transform.
  • sequency is used as concept close to frequency and components are arranged in order from low sequency so that high speed of distance calculation can be realized with respect to vectors where correlation relationship between adjacent component is large similarly to the above-described Discrete Cosine Transform or Discrete Fourier Transform.
  • the Walsh-Hadamard Transform matrix is constituted in accordance with the signs of the Fourier Transform matrix, or is constituted by a recursive expansion operation of the matrix.
  • the Walsh-Hadamard Transform matrix W of the eighth order arranged in order of sequency is indicated by the following formula (14).
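  • One way to build such a matrix by recursive expansion is sketched below: a Hadamard matrix is generated by the Sylvester construction and its rows are then sorted by the number of sign changes, which is used here as the sequency. The function names and the 1/√N scaling are assumptions made for illustration; formula (14) itself is not reproduced in this text.

```python
import math


def hadamard(order):
    """Hadamard matrix of the given order (a power of two) by recursive expansion."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def sign_changes(row):
    """Number of sign changes along a row, used as its sequency."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def walsh_matrix(order):
    """Hadamard rows rearranged in order of increasing sequency."""
    return sorted(hadamard(order), key=sign_changes)


def walsh_transform(x):
    """Sequency-ordered transform; entries are +1 or -1, so only sign flips and additions occur."""
    scale = 1.0 / math.sqrt(len(x))  # assumed scaling that makes the transform orthonormal
    return [scale * sum(s * v for s, v in zip(row, x)) for row in walsh_matrix(len(x))]


w8 = walsh_matrix(8)
print([sign_changes(row) for row in w8])
```

  • For order 8 the printed sequencies are 0 through 7, one per row, so integrating the transformed components in row order proceeds from low to high sequency.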
  • KL transform: Karhunen-Loeve Transform
  • the KL transform matrix T is the eigenvector matrix obtained by decomposing the dispersion (covariance) matrix V of sample vectors into eigenvalues, and is defined as indicated by the following formula (15), where the eigenvalues are denoted λ_1, . . . , λ_N.
  • the KL transform is an orthogonal transform which completely removes the correlation between the respective components, and the dispersion of each transformed vector component is equal to the eigenvalue λ_i.
  • the KL transform matrix T is constituted so that the eigenvalues λ_i are arranged in descending order of magnitude, whereby overlapping information among the components is removed and distances can be integrated starting from the axis where dispersion is the largest.
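  • A small numerical sketch of constructing such a matrix is given below. It estimates the covariance matrix from sample vectors, diagonalizes it with a simple cyclic Jacobi iteration (chosen only to keep the example dependency-free and adequate for small matrices; it is not taken from the patent), and arranges the eigenvectors as rows in descending order of eigenvalue. All function names and the sample data are hypothetical.

```python
import math


def covariance(samples):
    """Dispersion (covariance) matrix V of the sample vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    return [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / n
             for j in range(d)] for i in range(d)]


def jacobi_eigen(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric matrix."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4, a[p][q])
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for m in (a, v):  # column update: M <- M J
                    for k in range(n):
                        mp, mq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):  # row update: A <- J^t A
                    ap, aq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * ap - s * aq, s * ap + c * aq
    return [a[i][i] for i in range(n)], v


def kl_transform_matrix(samples):
    """Eigenvalues in descending order and the matching eigenvectors as rows of T."""
    eigenvalues, v = jacobi_eigen(covariance(samples))
    order = sorted(range(len(eigenvalues)), key=lambda i: eigenvalues[i], reverse=True)
    rows = [[v[k][i] for k in range(len(v))] for i in order]
    return [eigenvalues[i] for i in order], rows


def apply_transform(t, x):
    """Multiply the transform matrix t by the vector x."""
    return [sum(t_ik * x_k for t_ik, x_k in zip(row, x)) for row in t]


samples = [(1.0, 1.0), (-1.0, -1.0), (2.0, 2.0), (-2.0, -2.0), (0.1, -0.1), (-0.1, 0.1)]
eigenvalues, t = kl_transform_matrix(samples)
print(eigenvalues, apply_transform(t, (1.0, 1.0)))
```

  • In this example the samples lie almost on the diagonal, so nearly all of the dispersion falls on the first transformed axis, and a vector such as (1, 1) is represented almost entirely by its first transformed component.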
  • the KL transform in the above-described second embodiment corresponds to the analysis method called principal component analysis in the multivariate analysis field, and is an operation for extracting the principal components constituting a vector.
  • the main component of transformed vector g′ obtained in the second embodiment is recorded as index vector g 1
  • the remaining component is recorded as detail vector g 2 .
  • distance calculation is first performed with reference to index vector g 1 to acquire detail vector g 2 only in the case where that result is smaller than threshold value S to further perform distance calculation, thereby making it possible to shorten data read-in time.
  • the similarity vector detecting apparatus 3 serves to input vector f and vector g to output square distance between vectors (or -1), and is composed of vector transform units 30 , 31 , an index recording unit 32 , a detail recording unit 33 , a hierarchical distance calculating unit 34 , and a threshold value judgment unit 35 .
  • the vector transform units 30 , 31 serve to respectively implement transform operation similar to the above-described second embodiment to the vectors g and f.
  • the index recording unit 32 and the detail recording unit 33 are, e.g., magnetic disc, optical disc or semiconductor memory, etc.
  • the vector transform unit 30 ( FIG. 9 ) inputs registered vector g in advance.
  • vector g is transformed as indicated by the above-described formula (5) to generate vector g′.
  • the vector transform unit 30 divides it into index vector g 1 having a predetermined number M (1 ≤ M < N) of components and detail vector g 2 having the remaining components, in order from the component having the smallest component number, i.e., the component having large dispersion or eigenvalue in the above-described transform operations, or the low frequency component.
  • the index recording unit 32 records index vector g 1 .
  • the detail recording unit 33 records detail vector g 2 .
  • the threshold value judgment unit 35 sets threshold value S of distance.
  • the vector transform unit 31 inputs vector f, and the hierarchical distance calculating unit 34 acquires one index vector g 1 recorded at the index recording unit 32 .
  • the vector transform unit 31 transforms vector f as indicated by the above-described formula (4) to generate vector f′. Further, the vector transform unit 31 divides it into index vector f 1 having a predetermined number M (1 ≤ M < N) of components and detail vector f 2 having the remaining components, in order from the component having the smallest component number.
  • the hierarchical distance calculating unit 34 sets component number i serving as internal variable to 1 and sets integrated value sum of distance to 0.
  • the threshold value judgment unit 35 discriminates whether or not integrated value sum is smaller than threshold value S. In the case where integrated value sum is smaller than threshold value S (Yes), processing proceeds to step S 57 . In the case where integrated value sum is threshold value S or larger (No), the threshold value judgment unit 35 outputs -1 at step S 56 to complete processing.
  • the output value -1 is a numerical value used for convenience to indicate that the distance is the threshold value or larger, so that the vector is nullified.
  • step S 57 it is discriminated whether or not component number i is the number of dimensions M of index vector f 1 and index vector g 1 or smaller. In the case where the component number i is M or smaller (Yes), i is incremented at step S 58 to return to the step S 54 . On the other hand, in the case where component number i is larger than M (No), the hierarchical distance calculating unit 34 acquires one detail vector g 2 recorded at the detail recording unit 33 .
  • the hierarchical distance calculating unit 34 performs integrating operation as indicated by the above-described formula (16) between the i-th component f′[i] of vector f′ and the i-th component g′[i] of vector g′.
  • the threshold value judgment unit 35 discriminates whether or not integrated value sum is smaller than threshold value S. In the case where the integrated value sum is smaller than threshold value S (Yes), processing proceeds to step S 63 . In the case where integrated value sum is threshold value S or larger (No), the threshold value judgment unit 35 outputs -1 at step S 62 to complete processing.
  • step S 63 it is discriminated whether or not the component number i is the number of dimensions N of vector f′ or vector g′ or smaller. In the case where the component number i is N or smaller (Yes), i is incremented at step S 64 to return to the step S 60 . On the other hand, in the case where the component number i is larger than N (No), the threshold value judgment unit 35 outputs integrated value sum at step S 65 since integration is completed until the last component of vector g′ to complete processing. At this time, the integrated value sum results in square of distance between vectors.
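  • The two-stage retrieval of the third embodiment can be sketched as follows, assuming that the detail vector is fetched through a callable so that the read is skipped whenever the index stage already reaches the threshold. The names `split_vector`, `two_stage_sq_distance` and `load_detail` are hypothetical and stand in for the detail recording unit access.

```python
def split_vector(vector, m):
    """Split a transformed vector into an index part (first m components) and a detail part."""
    return list(vector[:m]), list(vector[m:])


def two_stage_sq_distance(f_index, f_detail, g_index, load_detail, threshold):
    """Squared distance with truncation; the detail vector is loaded only if needed."""
    total = 0.0
    for a, b in zip(f_index, g_index):
        total += (a - b) ** 2
        if total >= threshold:
            return -1.0  # rejected using the index vector alone
    g_detail = load_detail()  # deferred read of the detail vector
    for a, b in zip(f_detail, g_detail):
        total += (a - b) ** 2
        if total >= threshold:
            return -1.0
    return total


# Hypothetical usage with M = 2 and a counter that records detail reads.
detail_reads = []
registered = [(5.0, 0.0, 0.0, 0.0), (0.1, 0.1, 0.2, 0.1)]
f_index, f_detail = split_vector((0.0, 0.0, 0.0, 0.0), 2)

results = []
for g in registered:
    g_index, g_detail = split_vector(g, 2)

    def load_detail(detail=g_detail):
        detail_reads.append(1)
        return detail

    results.append(two_stage_sq_distance(f_index, f_detail, g_index, load_detail, 1.0))
print(results, len(detail_reads))
```

  • In this run the first registered vector is rejected from its index vector alone, so only one detail vector is read, which is the source of the shortened data read-in time.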
  • while the vector is divided into two stages of index vector and detail vector in the above description, the index vector may similarly be further divided into a higher-order index vector and a detailed index vector so that a three-stage configuration is provided.
  • acoustic feature vector and/or image feature vector are extracted to use them as the above-described vectors f and g, thereby making it possible to retrieve, at a high speed, similar acoustic or video signal from registered acoustic signal or video signal by using the techniques of the above-described first to third embodiments in the case where acoustic signal or video signal is inputted.
  • step S 70 acoustic signals with respect to each time period T are acquired from acoustic signal within object time period.
  • q is an index representing discrete frequency, and Q is the maximum discrete frequency.
  • step S 72 it is discriminated whether or not calculation within object time period is completed. In the case where such calculation is completed (Yes), processing proceeds to step S 73 . In the case where such calculation is not completed (No), processing returns to the step S 70 .
  • step S 73 average spectrum S′_q of the determined power spectrum coefficients S_q is calculated.
  • this average spectrum S′_q is changed into vector to generate acoustic feature vector a.
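  • The extraction described above might look like the following sketch, assuming non-overlapping frames, no window function, and a power spectrum taken as the squared magnitude of a plain DFT. These choices, and the names `power_spectrum` and `acoustic_feature`, are assumptions for illustration; the patent's own power spectrum formula is not reproduced in this text.

```python
import cmath
import math


def power_spectrum(frame, num_bins):
    """Power spectrum coefficients S_q for q = 0 .. num_bins - 1 of one frame."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * q * t / n)
                    for t, x in enumerate(frame))) ** 2
            for q in range(num_bins)]


def acoustic_feature(signal, frame_len, num_bins):
    """Average the per-frame power spectra over the object time period."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    spectra = [power_spectrum(frame, num_bins) for frame in frames]
    return [sum(s[q] for s in spectra) / len(spectra) for q in range(num_bins)]


# Two frames of a cosine with two cycles per 8-sample frame.
signal = [math.cos(2 * math.pi * 2 * t / 8) for t in range(16)]
feature = acoustic_feature(signal, frame_len=8, num_bins=5)
print([round(v, 6) for v in feature])
```

  • For this test tone the averaged spectrum is concentrated in the bin q = 2, and the resulting list is what would be used as acoustic feature vector a.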
  • since the acoustic signal is vast, there are many instances where such a signal is recorded or transmitted after being compression-encoded. While it is possible to extract acoustic feature vector a by using the above-described technique after the encoded acoustic signal is decoded into a baseband signal, the extracting processing can be conducted efficiently and at a high speed if acoustic feature vector a can be extracted by partial decoding only.
  • acoustic signal serving as original sound is divided into frames with respect to each time period T, as shown in FIG. 14 .
  • orthogonal transform such as Modified Discrete Cosine Transform (MDCT), etc. is implemented to acoustic signal with respect to each frame, and the coefficients thereof are quantized and encoded.
  • scale factors serving as normalization coefficient of magnitude are extracted with respect to each frequency band, and are separately encoded.
  • At step S80, the encoded acoustic signal within the time period T in the object time period is acquired.
  • At step S81, the scale factors of each frame are partially decoded.
  • At step S82, it is determined whether or not decoding within the object time period is completed. In the case where such decoding is completed (Yes), processing proceeds to step S83. In the case where such decoding is not completed (No), processing returns to step S80.
  • At step S83, the maximum scale factor is detected for each band from the scale factors within the object time period.
  • At step S84, those scale factors are converted into a vector to generate acoustic feature vector a.
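  • Steps S83 and S84 can be illustrated with a minimal Python sketch. It assumes the partially decoded scale factors are already available as one list per frame with one value per frequency band; the function name scale_factor_feature and the sample values are hypothetical and are not taken from the embodiment.

```python
def scale_factor_feature(frames):
    """Build an acoustic feature vector from per-frame scale factors.

    frames: list of lists; frames[k][b] is the scale factor of band b in
    frame k (assumed input layout, not specified by the description).
    """
    if not frames:
        raise ValueError("at least one frame is required")
    num_bands = len(frames[0])
    if any(len(frame) != num_bands for frame in frames):
        raise ValueError("every frame must have the same number of bands")
    # Step S83: keep the maximum scale factor observed in each band.
    # Step S84: the per-band maxima, in band order, form the feature vector.
    return [max(frame[b] for frame in frames) for b in range(num_bands)]


# Three frames, three bands (illustrative numbers only).
feature_a = scale_factor_feature([[3, 7, 1], [5, 2, 4], [4, 6, 0]])
```

  • Because only the scale factors are read, the quantized transform coefficients never need to be decoded in this sketch, which is the point of the partial-decoding approach.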
  • At step S90, an image frame is acquired from the video signal within the object time period T.
  • A time average image 100 is prepared on the basis of all acquired image frames.
  • At step S92, the prepared time average image 100 is divided into X × Y small blocks in the horizontal and vertical directions to prepare block average image 110, in which the pixel values within the respective blocks are averaged.
  • At step S93, these small blocks are arranged in the order of R, G, B, e.g., from the upper left toward the lower right, to generate one-dimensional image feature vector v.
  • This image feature vector v is represented by, e.g., the following formula (18).
  • $$v = (R_{00}, \ldots, R_{X-1,Y-1}, G_{00}, \ldots, G_{X-1,Y-1}, B_{00}, \ldots, B_{X-1,Y-1}) \qquad (18)$$
  • Since the time change of a video signal is not so rapid in the ordinary state, the same effect can also be obtained by selecting one frame within the object time period as a representative image in place of the time average image 100.
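  • The block-average feature of formula (18) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: frames are nested lists of (R, G, B) tuples, the image size is divisible by the block grid, and blocks are visited in raster order (the description only says "from the upper left toward the lower right"). The name block_average_feature is hypothetical.

```python
def block_average_feature(frames, blocks_x, blocks_y):
    """Return (R blocks..., G blocks..., B blocks...) averaged over time and space.

    frames[k][y][x] is an (R, G, B) tuple (assumed layout).
    """
    height, width = len(frames[0]), len(frames[0][0])
    if height % blocks_y or width % blocks_x:
        raise ValueError("image size must be divisible by the block grid")
    block_h, block_w = height // blocks_y, width // blocks_x
    count = len(frames)
    feature = []
    for channel in range(3):              # R, then G, then B
        for by in range(blocks_y):        # raster order over blocks (assumption)
            for bx in range(blocks_x):
                total = 0.0
                # Averaging over time and then over the block equals one
                # average over all pixels of the block in all frames.
                for frame in frames:
                    for y in range(by * block_h, (by + 1) * block_h):
                        for x in range(bx * block_w, (bx + 1) * block_w):
                            total += frame[y][x][channel]
                feature.append(total / (count * block_h * block_w))
    return feature


frame_1 = [[(0, 0, 0), (2, 2, 2)], [(4, 4, 4), (6, 6, 6)]]
frame_2 = [[(2, 0, 0), (2, 2, 2)], [(4, 4, 4), (6, 6, 8)]]
feature_v = block_average_feature([frame_1, frame_2], blocks_x=2, blocks_y=2)
```

  • Passing a single frame to the same function corresponds to the representative-image variant mentioned above.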
  • At step S100, an image frame is acquired from the video signal within the object time period T.
  • At step S101, a histogram of the signal values of the respective colors, e.g., R, G, B, is prepared from the signal values of the respective image frames.
  • At step S102, these histograms are arranged in the order of, e.g., R, G, B to generate one-dimensional image feature vector v.
  • This image feature vector v is represented by the following formula (19).
  • $$v = (R_{0}, \ldots, R_{N-1}, G_{0}, \ldots, G_{N-1}, B_{0}, \ldots, B_{N-1}) \qquad (19)$$
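  • Formula (19) can be illustrated with the following Python sketch. The number of bins N, the 8-bit value range, and the function name histogram_feature are assumptions made for the example; the description does not fix them.

```python
def histogram_feature(frames, num_bins, max_value=256):
    """Concatenate R, G and B histograms (N bins each) over all frames.

    frames[k][y][x] is an (R, G, B) tuple with values in [0, max_value).
    """
    hist = [[0] * num_bins for _ in range(3)]
    for frame in frames:
        for row in frame:
            for pixel in row:
                for channel in range(3):
                    # Map the signal value to one of num_bins equal-width bins.
                    bin_index = pixel[channel] * num_bins // max_value
                    hist[channel][bin_index] += 1
    # Arrange in the order R, G, B as in formula (19).
    return hist[0] + hist[1] + hist[2]


sample_frame = [[(0, 255, 128), (64, 0, 0)]]
feature_v = histogram_feature([sample_frame], num_bins=4)
```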
  • At step S110, the encoded video signal of the encoded group (Group of Pictures: GOP) nearest to the object time period T to be converted into a vector is acquired, and the intra-frame encoded picture (I picture) 120 within that GOP is acquired.
  • The frame image is encoded in units of macro blocks MB (16×16 pixels or 8×8 pixels), and the Discrete Cosine Transform (DCT) is used.
  • At step S111, these DC coefficients are acquired.
  • These coefficients are arranged in the order of, e.g., Y, Cb, Cr to generate one-dimensional image feature vector v.
  • This image feature vector v is represented by, e.g., the following formula (20).
  • $$v = (Y_{00}, \ldots, Y_{X-1,Y-1}, Cb_{00}, \ldots, Cb_{X-1,Y-1}, Cr_{00}, \ldots, Cr_{X-1,Y-1}) \qquad (20)$$
  • A hierarchical distance integrating operation is performed when detecting similar vectors on the basis of the distance between vectors, and the integrating operation is truncated at the time when the integrated value of distances exceeds a distance threshold value set in advance, thereby making it possible to detect similar vectors at high speed.
  • distance calculation can be truncated at the early stage.
  • detection time can be shortened to a large extent.
  • Acoustic feature vectors and/or image feature vectors are extracted and registered in advance, whereby, in the case where an arbitrary acoustic signal or video signal is inputted, similar acoustic or video signals can be retrieved at high speed while maintaining structural simplicity and retrieval accuracy comparable to full search.
  • While the present invention has been explained in the above-described embodiments as a hardware configuration, the present invention is not limited to such implementation, and arbitrary processing may also be realized by causing a CPU (Central Processing Unit) to execute a computer program.
  • The computer program may be provided recorded on a recording medium, or may be provided by transmission through another transmission medium such as the Internet.

Abstract

In a similarity vector detecting apparatus (2), vector transform units (20), (21) apply a transform by a sequential matrix, the Discrete Cosine Transform, the Discrete Fourier Transform, the Walsh-Hadamard Transform, or the Karhunen-Loève Transform to registered vector g and input vector f. A hierarchical distance calculating unit (23) performs, in a hierarchical manner, distance calculation between the two vectors in order from the vector component having the highest significance, i.e., the component having the largest dispersion or eigenvalue in the above-described transform operations, or from the low frequency component. Further, in the case where a threshold value judgment unit (24) judges that the integrated value of distances calculated up to a certain hierarchy is above threshold value S of distance, distance calculation is truncated and only an output indicating that the integrated value is above the threshold value S is provided.

Description

    TECHNICAL FIELD
  • The present invention relates to a similarity calculation method, a similarity calculation apparatus, a program and a recording medium which perform pattern matching between two vectors at a high speed.
  • This Application claims priority of Japanese Patent Application No. 2002-200481, filed on Jul. 9, 2002, the entirety of which is incorporated by reference herein.
  • BACKGROUND ART
  • Hitherto, in order to detect a pattern that is substantially the same as an already known pattern from an unknown input signal, or to evaluate similarity between two signals, judgment of similarity or coincidence of data has been conducted in all technical fields related to signal processing, such as acoustic processing technology, image processing technology, communication technology, and radar technology. In general, analogous data are detected by representing data as feature vectors and judging similarity by the magnitude of the distance or angle (correlation) between them.
  • In particular, the so-called full search, in which similarities between the input value and all candidates are determined and the data at the shortest distance is then selected, is the simplest technique, has no detection leakage, and is frequently used in the case where data quantity is small. However, in the case where, e.g., a portion similar to an input image or input sound is retrieved from a large quantity of accumulated images or sounds, the dimension of the feature vector per second is large and retrieval is conducted over feature vectors accumulated for several tens to several hundreds of hours, so there is the problem that retrieval time becomes vast when such simple full search is performed.
  • On the other hand, in order to retrieve large quantity of data, in such cases that complete coincidence retrieval of coded data, e.g., document retrieval is conducted, high speed operation technology such as binary tree search or Hash method is used. In accordance with this technology, data are stored in advance in the state where they are put in order, to omit comparison of branch or table different from input data at the time of retrieval to thereby realize high speed operation. However, in the case where physical signal, e.g., image or sound, etc. is taken as subject, since distortion and/or noise essentially exist in data, it is rare that coded data completely coincide with each other. As a result, in the case where high speed operation technology is used, a large number of detection leakages would take place. In addition, since data is essentially multi-dimensional, there is the problem that it is difficult to implement in advance univocal sequencing to data.
  • In view of the above, there is proposed, in the Japanese Patent Publication Laid Open No. H08-123460, a technology in which processing for grouping plural vectors close in distance to represent the grouped vectors by one representative vector is performed at the time of data registration to first calculate distance between input vector and representative vector at the time of retrieval to conduct comparison with all vectors within group only with respect to vectors of the group close in distance to thereby permit similar (analogous) vector retrieval to be performed at high speed, and to have ability to reflect distortion of vector at multi-dimension.
  • Further, there is proposed, in the Japanese Patent Publication Laid Open No. 2001-134573, a technology in which vectors are encoded to index them by short code to thereby suppress increase in the number of times of distance calculations to permit high speed similar (analogous) data retrieval.
  • However, in the technology described in the above-described Japanese Patent Publication Laid Open No. H08-123460, there was the problem that suitable grouping and selection of representative vectors are required at the time of registration, so that the registration operation becomes troublesome. Moreover, there was also the problem that, since the registered vector at minimum distance from the input vector does not necessarily belong to the group represented by the representative vector at minimum distance from the input vector, the operation for determining the group to be retrieved becomes troublesome.
  • Further, in the technology described in the above-described Japanese Patent Publication Laid Open No. 2001-134573, there was the problem that distance relationship between vectors is lost when encoding is performed, or there results in complicated distance relationship in non-additive or non-monotonous manner so that mechanism of registration and/or retrieval becomes troublesome.
  • Here, since image and/or sound are essentially time-series data, it is desirable that registration is conducted on a real-time basis, and it is desirable that time order can be reflected at the time of retrieval. In other words, techniques that require a registration operation that disturbs the time-series order, and/or that require redistribution (reshuffling) of the data or index of already registered data at the time of registration, as in the technologies described in the above-described Japanese Patent Publication Laid Open No. H08-123460 and Japanese Patent Publication Laid Open No. 2001-134573, are in some instances not suitable for retrieval of time-series data.
  • That is, there is desired such a mechanism that retrieval is performed in a time extremely shorter than that at full search while satisfying the conditions where
      • (a) structural simplicity and robustness with respect to distortion of full search are not lost,
      • (b) registration and/or deletion are conducted within real time, and
      • (c) operation with respect to other already registered data is not required by registration or deletion.
    DISCLOSURE OF THE INVENTION
  • The present invention has been proposed in view of such conventional actual circumstances, and its object is to provide a similarity calculation method and a similarity calculating apparatus which perform pattern matching between two vectors at a high speed while satisfying the above-described conditions, a program for allowing computer to execute the similarity calculation processing, and a computer readable recording medium where such program is recorded.
  • To attain the above-described object, a similarity calculation method according to the present invention is directed to a similarity calculation method of determining similarity between two input vectors, and includes a hierarchical distance calculation step of performing distance calculation between the two input vectors in a hierarchical manner, a threshold value comparison step of comparing integrated value of distances calculated at respective hierarchies of the hierarchical distance calculation step with threshold value set in advance, a control step of controlling distance calculation at the hierarchical distance calculation step in accordance with comparison result at the threshold value comparison step, and an output step of outputting, as the similarity, integrated value of distances calculated up to the last hierarchy, wherein, at the control step, in the case where integrated value of distances calculated up to a certain hierarchy is above the threshold value at the threshold value comparison step, control is conducted so that distance calculation is truncated.
  • In such similarity calculation method, distance calculation between two vectors is conducted in a hierarchical manner, whereby in the case where integrated value of distances calculated up to a certain hierarchy is above a predetermined threshold value, it is only detected, without calculating actual distance, that the integrated value of distances is above the threshold value to thereby allow operation to be performed at a high speed.
  • Moreover, this similarity calculation method may further include a transform step of implementing a predetermined transform operation to the two input vectors. In this case, at the hierarchical distance calculation step, distance calculation between the two input vectors transformed at the transform step is performed in a predetermined order based on the predetermined transform operation. Here, the predetermined transform operation is, e.g., a transform that rearranges the order of the respective components constituting the input vector in accordance with the magnitude of dispersion of the respective components, the Discrete Cosine Transform, the Discrete Fourier Transform, the Walsh-Hadamard Transform, or the Karhunen-Loève Transform.
  • Further, this similarity calculation method may include a division step of taking out, in the predetermined order, respective components which constitute the two input vectors transformed at the transform step to divide them into hierarchical plural partial vectors. In this case, at the hierarchical distance calculation step, distance calculation between respective components which constitute partial vectors is performed in a hierarchical manner in order from the partial vector of the uppermost hierarchy, whereby in the case where integrated value of calculated distances between all components which constitute partial vectors up to a certain hierarchy is below the threshold value, distance calculation between respective components which constitute partial vector of one hierarchy lower is performed.
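  • The division into hierarchical partial vectors and the layer-by-layer distance calculation can be sketched in Python as follows. This is only an illustrative sketch: the layer sizes, the squared-distance measure, the −1 return value for a truncated calculation, and the names split_into_layers and layered_sq_distance are assumptions made for the example.

```python
def split_into_layers(vector, layer_sizes):
    """Divide a (transformed) vector into hierarchical partial vectors."""
    if sum(layer_sizes) != len(vector):
        raise ValueError("layer sizes must add up to the vector dimension")
    layers, start = [], 0
    for size in layer_sizes:
        layers.append(vector[start:start + size])
        start += size
    return layers


def layered_sq_distance(f_layers, g_layers, threshold):
    """Accumulate squared distance one layer at a time, uppermost layer first.

    Returns -1 as soon as the integrated value reaches the threshold;
    otherwise returns the full squared distance.
    """
    total = 0.0
    for f_part, g_part in zip(f_layers, g_layers):
        total += sum((a - b) ** 2 for a, b in zip(f_part, g_part))
        if total >= threshold:
            return -1  # truncate: lower layers are never evaluated
    return total


sizes = [1, 2, 3]
f_layers = split_into_layers([0, 0, 0, 0, 0, 0], sizes)
near = layered_sq_distance(f_layers, split_into_layers([1, 0, 0, 0, 0, 0], sizes), 2)
far = layered_sq_distance(f_layers, split_into_layers([2, 0, 0, 0, 0, 0], sizes), 2)
```

  • In this sketch the threshold is checked once per layer rather than once per component, so a layer is the unit at which truncation can occur.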
  • Further, in order to attain the above-described object, a similarity calculating apparatus according to the present invention is directed to a similarity calculating apparatus adapted for determining similarity between two input vectors, and comprises hierarchical distance calculating means for performing distance calculation between the two input vectors in a hierarchical manner, threshold value comparing means for comparing integrated value of distances calculated at respective hierarchies by the hierarchical distance calculating means with threshold value set in advance, control means for controlling distance calculation by the hierarchical distance calculating means in accordance with comparison result by the threshold value comparing means, and output means for outputting, as the similarity, integrated value of distances calculated up to the last hierarchy, wherein the control means conducts a control so as to abort (truncate) distance calculation in the case where integrated value of distances calculated up to a certain hierarchy is above the threshold value as the result of comparison by the threshold value comparing means.
  • Such similarity calculating apparatus performs distance calculation between two vectors in a hierarchical manner, whereby in the case where integrated value of distances calculated up to a certain hierarchy is above a predetermined threshold value, it is only detected, without calculating actual distance, that the integrated value of distances is the threshold value or larger to thereby allow operation to be conducted at a high speed.
  • Further, this similarity calculating apparatus may further comprise transform means for implementing a predetermined transform operation to the two input vectors. In this case, the hierarchical distance calculating means conducts distance calculation between the two input vectors transformed by the transform means in a predetermined order based on the predetermined transform operation. Here, the predetermined transform operation is, e.g., a transform that rearranges the order of the respective components constituting the input vector in accordance with the magnitude of dispersion of the respective components, the Discrete Cosine Transform, the Discrete Fourier Transform, the Walsh-Hadamard Transform, or the Karhunen-Loève Transform.
  • Further, this similarity calculating apparatus may comprise dividing means for taking out, in the predetermined order, respective components which constitute the respective two input vectors transformed by the transform means to divide them into hierarchical plural partial vectors. In this case, the hierarchical distance calculating means performs, in a hierarchical manner, distance calculation between respective components which constitute partial vectors in order from the partial vector of the uppermost hierarchy, whereby in the case where integrated value of calculated distances between all components which constitute partial vectors up to a certain hierarchy is below the threshold value, distance calculation between respective components which constitute partial vectors of one hierarchy lower is performed.
  • In addition, program according to the present invention serves to allow computer to execute the above-described similarity calculation processing, and recording medium according to the present invention is a computer readable recording medium where such program is recorded.
  • Still further objects of the present invention and practical merits obtained by the present invention will become more apparent from the description of the embodiments which will be given below.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a view for explaining outline of the configuration of a similarity vector detecting apparatus in the first embodiment.
  • FIG. 2 is a flowchart for explaining processing at the time of vector registration in the similarity vector detecting apparatus.
  • FIG. 3 is a flowchart for explaining processing at the time of vector retrieval in the similarity vector detecting apparatus.
  • FIG. 4 is a view for intuitively explaining processing in the first embodiment.
  • FIG. 5 is a view showing an example in which there exists deviation in distribution of vector within feature space.
  • FIG. 6 is a view for explaining outline of the configuration of a similarity vector detecting apparatus in the second embodiment.
  • FIG. 7 is a flowchart for explaining processing at the time of vector registration in the similarity vector detecting apparatus.
  • FIG. 8 is a flowchart for explaining processing at the time of vector retrieval in the similarity vector detecting apparatus.
  • FIG. 9 is a view for explaining outline of the configuration of a similarity vector detecting apparatus in the third embodiment.
  • FIG. 10 is a flowchart for explaining processing at the time of vector registration in the similarity vector detecting apparatus.
  • FIG. 11 is a flowchart for explaining processing at the time of vector retrieval in the similarity vector detecting apparatus.
  • FIG. 12 is a flowchart for explaining an example of processing for extracting acoustic feature vector from acoustic signal.
  • FIG. 13 is a view for explaining an example of processing for extracting acoustic feature vector from acoustic signal.
  • FIG. 14 is a view for explaining transform encoding in acoustic signal.
  • FIG. 15 is a flowchart for explaining an example of processing for extracting acoustic feature vector from encoded acoustic signal.
  • FIG. 16 is a view for explaining an example of processing for extracting acoustic feature vector from encoded acoustic signal.
  • FIG. 17 is a flowchart for explaining an example of processing for extracting image feature vector from video signal.
  • FIG. 18 is a view for explaining an example of processing for extracting image feature vector from video signal.
  • FIG. 19 is a flowchart for explaining another example of processing for extracting image feature vector from video signal.
  • FIG. 20 is a view for explaining a further example of processing for extracting image feature vector from video signal.
  • FIG. 21 is a flowchart for explaining a further example of processing for extracting image feature vector from encoded video signal.
  • FIG. 22 is a view for explaining a further example of processing for extracting image feature vector from encoded video signal.
  • BEST MODE FOR CARRYING OUT THE INVENTION
  • Explanation will be given below in detail with reference to the attached drawings in connection with practical embodiments to which the present invention is applied. In this embodiment, the present invention is applied to a similarity vector detection method and an apparatus therefor which detect, at a high speed, vectors similar to input vector from plural registered vectors.
  • Specifically, in the similarity vector detection method and the apparatus therefor of this embodiment, in calculating the distance between two vectors, the distance is calculated when it is below a predetermined threshold value, and when it is above the predetermined threshold value it is only detected, without calculating the actual distance, that the distance is larger than the threshold value, thereby allowing similarity vector detection to be conducted at high speed. It is to be noted that, in the similarity vector detecting apparatus in this embodiment, −1 is assumed to be outputted for convenience in the case where the distance is above the threshold value.
  • Hereinafter, two vectors f and g for calculating distance are represented by the following formulas.
    $$f = (f[1], f[2], \ldots, f[N])^{t} \qquad (1)$$
    $$g = (g[1], g[2], \ldots, g[N])^{t} \qquad (2)$$
  • Here, in the formula (1), f[1], f[2], . . . represent respective components of vector f. In the formula (2), g[1], g[2], . . . represent respective components of vector g. In addition, t represents transposition and N represents dimension of vector.
  • (1) First Embodiment
  • Outline of the configuration of the similarity vector detecting apparatus in the first embodiment is shown in FIG. 1. As shown in FIG. 1, the similarity vector detecting apparatus 1 serves to input vector f and vector g to output square distance between the vectors (or −1), and is composed of a recording unit 10, a hierarchical distance calculating unit 11, and a threshold value judgment unit 12.
  • The processing at the time of registration in this similarity vector detecting apparatus 1 will be explained by using the flowchart of FIG. 2. First, at step S1, the recording unit 10 (FIG. 1) inputs in advance registered vector g. In general, vector g is plural numbers and may become vast number in many cases. Further, at the subsequent step S2, the recording unit 10 records inputted vector g.
  • As stated above, in the first embodiment, since it is unnecessary to conduct special operation at the time of registration, the apparatus is simple and is suitable for processing on the real time basis. In this example, the recording unit 10 is, e.g., magnetic disc, optical disc or semiconductor memory, etc.
  • Subsequently, the processing at the time of retrieval in the similarity vector detecting apparatus 1 will be explained by using the flowchart of FIG. 3. First, at step S10, the threshold value judgment unit 12 (FIG. 1) sets threshold value S of distance. At the subsequent step S11, the hierarchical distance calculating unit 11 inputs vector f, and acquires one vector g recorded at the recording unit 10.
  • Subsequently, at step S12, the hierarchical distance calculating unit 11 sets component number i serving as internal variable to 1, and sets integrated value sum of distance to 0. At step S13, integrating operation as indicated by the following formula (3) is performed between the i-th component f[i] of vector f and the i-th component g [i] of vector g.
    $$\mathrm{sum} = \mathrm{sum} + (f[i] - g[i])^{2} \qquad (3)$$
  • At step S14, the threshold value judgment unit 12 discriminates whether or not integrated value sum is smaller than threshold value S. In the case where integrated value sum is smaller than threshold value S (Yes), processing proceeds to step S16. In the case where integrated value sum is threshold value S or larger (No), the threshold value judgment unit 12 outputs −1 at step S15 to complete processing. Here, as described above, −1 which is outputted is convenient numerical value indicating that distance between inputted vector f and acquired vector g is above threshold value S, and this vector g is nullified. As stated above, the threshold value judgement unit 12 provides threshold value S and serves to truncate integrating operation at the hierarchical distance calculating unit 11 in the case where integrated value sum is above threshold value S at the middle hierarchy of integrating operation to thereby realize high speed processing.
  • At step S16, it is discriminated whether or not component number i is the number of dimensions N of vector f or vector g or smaller. In the case where the component number i is N or smaller (Yes), i is incremented at step S17 to return to step S13. On the other hand, in the case where the component number i is larger than N (No), the threshold value judgment unit 12 outputs integrated value sum at step S18 because integrating operation has been completed until the last component of vector f or vector g to complete processing. It is to be noted that integrated value sum at this time is square of distance between vectors.
  • While the processing with respect to one registered vector g has been indicated above in the flowchart of FIG. 3, similar processing is performed with respect to registered all vectors g in practice to output, as vector similar to vector f, all vectors g in which integrated value sum of distances with respect to vector f is below the threshold value S.
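  • The retrieval flow of FIG. 3 (steps S10 to S18), applied to all registered vectors, can be sketched in Python as follows. This is a minimal sketch rather than the apparatus itself: the function names truncated_sq_distance and find_similar are hypothetical, and the loop visits exactly the N components of each vector.

```python
def truncated_sq_distance(f, g, threshold):
    """Integrate (f[i] - g[i])^2 component by component (formula (3)).

    Returns -1 as soon as the integrated value is the threshold S or larger
    (steps S14 and S15); otherwise returns the squared distance (step S18).
    """
    if len(f) != len(g):
        raise ValueError("vectors must have the same dimension")
    total = 0.0
    for f_i, g_i in zip(f, g):
        total += (f_i - g_i) ** 2
        if total >= threshold:
            return -1  # remaining components are skipped
    return total


def find_similar(f, registered, threshold):
    """Return (index, squared distance) for every registered vector below S."""
    hits = []
    for index, g in enumerate(registered):
        distance = truncated_sq_distance(f, g, threshold)
        if distance != -1:
            hits.append((index, distance))
    return hits


registered = [[1, 0, 0], [3, 0, 0], [0, 1, 1]]
hits = find_similar([0, 0, 0], registered, threshold=4)
```

  • For the second registered vector the first component alone already contributes 9, so the remaining two components are never compared; this skipped work is the source of the speed-up over a simple full search.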
  • Explained intuitively, the processing in the first embodiment described above corresponds to calculating the precise distance only for those registered vectors, among the large number of registered vectors indicated by black circles in FIG. 4, whose distance from the input vector indicated by x in the figure is within the range of a hypersphere having radius √S, and to nullifying registered vectors outside that range at the time point when the integrated value of the squared distances along the respective axes reaches the squared radius S.
  • It is to be noted that while the square distance between vectors has been used in the above-described explanation, a similar technique may be used with an arbitrary distance scale without being limited to the square distance. It should be noted that in the case where the square distance is used, there is no possibility that erroneous nullification takes place, because the integrated value sum increases monotonically as the distances between respective components are integrated. Moreover, since the sum total of distances between respective components corresponds to the distance between vectors, exactly the same distances as in the simple full search method are outputted for vectors f and g whose distance is smaller than √S, so that there is no possibility that error may take place.
  • Further, in the case of this technique, since it is unnecessary to prepare reference table, etc. which may break the time series relationship, updating and/or deletion of data can be conducted in accordance with time series order, so processing and/or management are easy. In addition, it is also easily possible to conduct retrieval in accordance with time series order, or to designate time series range to be retrieved.
  • (2) Second Embodiment
  • In the above-described first embodiment, threshold value S of distance is set, thereby making it possible to conduct retrieval equivalent to full search at a high speed. However, in the case of this technique, since from which vector component execution of retrieval begins is dependent upon arrangement order of vectors, difference takes place in retrieval speed by this arrangement order. For example, in such cases that deviation exists in distribution of vectors within feature space as shown in FIG. 5, retrieval speed greatly changes in dependency upon which of f[1] axis or f[2] axis is first integrated. In this example, employment of a method of first evaluating f[2] axis results in less extra integration to thereby realize high speed operation.
  • In view of the above, in the second embodiment which will be explained below, as indicated by the following formulas (4) and (5), multiplication of normal orthogonal transform matrix U is conducted with respect to input vector f and registered vector g to perform orthogonal transform operation to conduct retrieval in order of significance by using the orthogonally transformed vectors f′ and g′ to thereby allow retrieval to be conducted at higher speed.
    f′=Uf  (4)
    g′=Ug  (5)
  • It is to be noted that the square distance d² between two vectors g and f is not changed by normal orthogonal transform matrix U, as indicated by the following formula (6).
    $$d^{2} = \|f' - g'\|^{2} = \|U(f - g)\|^{2} = (f - g)^{t} U^{t} U (f - g) = (f - g)^{t}(f - g) = \|f - g\|^{2} \qquad (6)$$
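  • Formula (6) can be checked numerically with a short Python sketch. A two-dimensional rotation matrix is used here as one example of an orthonormal matrix U; the choice of matrix, the sample vectors, and the helper names mat_vec and sq_distance are assumptions made for illustration.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def sq_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


theta = math.pi / 6
# A rotation matrix satisfies U^t U = I, so it is orthonormal.
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

f = [3.0, 1.0]
g = [-1.0, 2.0]
before = sq_distance(f, g)                          # ||f - g||^2
after = sq_distance(mat_vec(U, f), mat_vec(U, g))   # ||Uf - Ug||^2
```

  • The two values agree up to floating-point rounding, which is why the threshold S set for the original vectors can be applied unchanged to the transformed vectors f′ and g′.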
  • Outline of the configuration of the similarity vector detecting apparatus in the second embodiment is shown in FIG. 6. As shown in FIG. 6, the similarity vector detecting apparatus 2 serves to input vectors f and g to output distance between the vectors (or −1), and is composed of vector transform units 20, 21, a recording unit 22, a hierarchical distance calculating unit 23, and a threshold value judgment unit 24. Here, the vector transform units 20, 21 serve to respectively implement similar transform operations to vectors g and f. In addition, the recording unit 22 is, e.g., magnetic disc, optical disc or semiconductor memory, etc.
  • The processing at the time of registration in this similarity vector detecting apparatus 2 will be explained by using the flowchart of FIG. 7. First, at step S20, the vector transform unit 20 (FIG. 6) inputs registered vector g in advance. At the subsequent step S21, vector g is transformed as indicated by the above-described formula (5) to generate vector g′. Further, at step S22, the recording unit 22 records transformed vector g′.
  • Next, the processing at the time of retrieval in the similarity vector detecting apparatus 2 will be explained by using the flowchart of FIG. 8. First, at step S30, the threshold value judgment unit 24 (FIG. 6) sets threshold value S of distance. At the subsequent step S31, the vector transform unit 21 inputs vector f and the hierarchical distance calculating unit 23 acquires one vector g′ recorded at the recording unit 22.
  • Subsequently, at step S32, the vector transform unit 21 transforms vector f as indicated by the above-described formula (4) to generate vector f′.
  • At step S33, the hierarchical distance calculating unit 23 sets component number i serving as internal variable to 1, and sets integrated value sum of distance to 0. At step S34, integrating operation as indicated by the following formula (7) is performed between the i-th component f′[i] of vector f′ and the i-th component g′[i] of vector g′.
    $$\mathrm{sum} = \mathrm{sum} + (f'[i] - g'[i])^{2} \qquad (7)$$
  • At step S35, the threshold value judgment unit 24 determines whether or not the integrated value sum is smaller than the threshold value S. In the case where the integrated value sum is smaller than the threshold value S (Yes), processing proceeds to step S37. In the case where the integrated value sum is the threshold value S or larger (No), the threshold value judgment unit 24 outputs −1 at step S36 to complete processing.
  • At step S37, it is determined whether or not the component number i is the number of dimensions N of vector f′ and vector g′ or smaller. In the case where the component number i is N or smaller (Yes), i is incremented at step S38 and processing returns to step S34. On the other hand, in the case where the component number i is larger than N (No), the integrating operation has been completed up to the last component of vectors f′ and g′, so the threshold value judgment unit 24 outputs the integrated value sum at step S39 to complete processing. It is to be noted that the integrated value sum at this time is the square of the distance between the vectors.
  • While the flowchart of FIG. 8 shows the processing with respect to one registered vector g′, in practice similar processing is performed with respect to all registered vectors g′, and all vectors g′ for which the integrated value sum of distance with respect to vector f′ is below the threshold value S are output as vectors similar to vector f′.
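The retrieval flow of FIG. 8 may be sketched as follows. This is a minimal illustration only, assuming that the vectors have already been transformed and that the threshold is compared directly with the running sum of squared differences, as in step S35. The function names `truncated_sq_distance` and `search` are hypothetical and do not appear in the embodiment.

```python
def truncated_sq_distance(f_t, g_t, threshold):
    """Accumulate squared differences component by component (formula (7)).

    Returns -1.0 as soon as the running sum reaches the threshold
    (the "No" branch of step S35), otherwise the full squared distance.
    """
    total = 0.0
    for fi, gi in zip(f_t, g_t):
        total += (fi - gi) ** 2
        if total >= threshold:
            return -1.0  # truncated: this vector cannot be similar
    return total


def search(f_t, registered, threshold):
    """Apply the truncated calculation to every registered vector.

    Returns (position, squared distance) pairs for vectors below the threshold.
    """
    hits = []
    for position, g_t in enumerate(registered):
        d = truncated_sq_distance(f_t, g_t, threshold)
        if d >= 0.0:
            hits.append((position, d))
    return hits
```

Since most registered vectors are expected to be non-similar, most calls return after examining only a few leading components.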
  • Here, while various matrices may be used as the above-described normal orthogonal (orthonormal) transform matrix U, four practical examples will be explained below.
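As a brief check of why any such matrix may be used, the following derivation sketches that an orthonormal transform leaves the squared distance unchanged. It assumes that the above-described formulas (4) and (5) have the form f′ = Uf and g′ = Ug, and that the product of U^t and U equals the identity matrix.

    $$\|f' - g'\|^2 = (Uf - Ug)^{t}(Uf - Ug) = (f - g)^{t}\, U^{t} U \,(f - g) = (f - g)^{t}(f - g) = \|f - g\|^2$$

Under this assumption, the integrated value obtained after the last component equals the squared distance between the original vectors; the choice of U affects only how quickly the partial sums grow.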
  • (2-1) Practical Example of Orthogonal Transform
  • (2-1-1)
  • The sequential matrix (i.e., a permutation matrix) is mentioned as the simplest orthogonal transform. This sequential matrix simply rearranges the order of the vector components. For example, a sequential matrix P of the eighth order is expressed in a form as indicated by the following formula (8).
    $$P = \begin{bmatrix} 0&1&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&0&0&1&0&0 \\ 0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&0&0&1 \\ 0&0&0&0&0&0&1&0 \end{bmatrix} \qquad (8)$$
  • In the case where the distributions of the respective vector components differ, as in the case of the above-described FIG. 5, it is obvious that the larger the dispersion of a component is, the larger its contribution to the distance becomes. Accordingly, in determining the order of sequencing, it is optimum to prepare in advance a sufficient number (I) of sample vectors g_i and to set the sequential matrix so that the components are arranged in descending order of the elements of the dispersion vector V calculated by the following formula (9), where the square is taken component by component.
    $$V = \sum_{i=1}^{I} (g_i - \bar{g})^2, \qquad \bar{g} = \frac{1}{I} \sum_{i} g_i \qquad (9)$$
  • It is to be noted that the orthogonal transform using this sequential matrix is effective in such cases that the spreads of the respective vector components differ, and is high in speed because only reordering is required, so that multiplication/division and conditional branches are unnecessary.
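The ordering by dispersion may be sketched as follows. This is an illustrative example only; the function names `dispersion_order` and `apply_order` are hypothetical, and the permutation is represented as a list of component positions rather than as an explicit matrix P.

```python
def dispersion_order(samples):
    """Return component positions sorted by descending dispersion (formula (9))."""
    count = len(samples)
    dim = len(samples[0])
    means = [sum(s[k] for s in samples) / count for k in range(dim)]
    dispersion = [sum((s[k] - means[k]) ** 2 for s in samples) for k in range(dim)]
    return sorted(range(dim), key=lambda k: dispersion[k], reverse=True)


def apply_order(vec, order):
    """Reorder a vector; equivalent to multiplying by a sequential matrix."""
    return [vec[k] for k in order]


# Component 1 varies most, component 0 does not vary at all.
samples = [[0, 10, 1], [0, -10, 2], [0, 10, 3], [0, -10, 1]]
order = dispersion_order(samples)
reordered = apply_order([7, 8, 9], order)
```

The same order must be applied to both the registered vectors and the input vector so that corresponding components are compared.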
  • (2-1-2)
  • In a feature quantity where the correlation between adjacent components is large, such as an image feature quantity or an acoustic feature quantity, the energy in the case where the feature vector is considered as a discrete signal is concentrated in the low frequency components.
  • In view of the above, Discrete Cosine Transform (DCT) represented by the following formulas (10), (11), or Discrete Fourier Transform (DFT) represented by the following formulas (12), (13), is used as the orthogonal transform, and integration is conducted in order from the low frequency component, thereby making it possible to perform integration in order from the component of high significance. Thus, distance calculation is performed at a high speed.
    $$D = \begin{bmatrix} D_{11} & \cdots & D_{1N} \\ \vdots & \ddots & \vdots \\ D_{N1} & \cdots & D_{NN} \end{bmatrix} \qquad (10)$$
    $$D_{mn} = \alpha(m-1) \cos\frac{(m-1)(2n-1)\pi}{2N}, \qquad \alpha = \begin{cases} \dfrac{1}{\sqrt{N}} & (n = 1) \\ \sqrt{\dfrac{2}{N}} & (n \neq 1) \end{cases} \qquad (11)$$
    $$F = \begin{bmatrix} F_{11} & \cdots & F_{1N} \\ \vdots & \ddots & \vdots \\ F_{N1} & \cdots & F_{NN} \end{bmatrix} \qquad (12)$$
    $$F_{mn} = \begin{cases} \dfrac{1}{\sqrt{N}} \cos\left(-\dfrac{2\pi\,(n/2 - 1)(m-1)}{N}\right) & (n:\ \text{even}) \\ \dfrac{1}{\sqrt{N}} \sin\left(-\dfrac{2\pi\,((n+1)/2 - N/2)(m-1)}{N}\right) & (n:\ \text{odd}) \end{cases} \qquad (13)$$
  • Here, since fast transform algorithms can be used for Discrete Cosine Transform or Discrete Fourier Transform, and since it is unnecessary to hold the entire transform matrix, memory usage and operation speed in the case where the operation is realized by a computer are far more advantageous as compared to the case where the full matrix calculation is performed.
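The concentration of energy in the low frequency components may be illustrated by the following sketch. It assumes the standard orthonormal DCT-II definition with zero-based indices, which is an assumption made for illustration and is written as a full matrix product for clarity rather than as a fast transform. The names `dct_matrix`, `transform`, and `low_frequency_share` are hypothetical.

```python
import math


def dct_matrix(n):
    """Build an n-by-n orthonormal DCT-II matrix (zero-based row index m)."""
    rows = []
    for m in range(n):
        scale = math.sqrt(1.0 / n) if m == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(m * (2 * k + 1) * math.pi / (2 * n))
                     for k in range(n)])
    return rows


def transform(matrix, vec):
    """Multiply a matrix by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def low_frequency_share(coeffs, count):
    """Fraction of total energy held by the first `count` coefficients."""
    total = sum(c * c for c in coeffs)
    return sum(c * c for c in coeffs[:count]) / total


# A smooth vector: adjacent components are strongly correlated.
smooth = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7]
coeffs = transform(dct_matrix(8), smooth)
share = low_frequency_share(coeffs, 2)
```

For such a smooth vector nearly all of the energy falls in the first few coefficients, so the running sum of squared differences approaches its final value early.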
  • (2-1-3)
  • The Walsh-Hadamard Transform is an orthogonal transform in which the respective elements of the transform matrix consist only of ±1, and is suitable for high speed transform because multiplication is not required at the time of transform. Here, sequency is used as a concept close to frequency, and components are arranged in order from low sequency, so that high speed distance calculation can be realized with respect to vectors where the correlation between adjacent components is large, similarly to the above-described Discrete Cosine Transform or Discrete Fourier Transform.
  • The Walsh-Hadamard Transform matrix is constituted in accordance with the signs of the Fourier Transform matrix, or is constituted by a recursive expansion operation of the matrix. As an example, the Walsh-Hadamard Transform matrix W of the eighth order arranged in order of sequency is indicated by the following formula (14).
    $$W = \frac{1}{\sqrt{8}} \begin{bmatrix} 1&1&1&1&1&1&1&1 \\ 1&1&1&1&-1&-1&-1&-1 \\ 1&1&-1&-1&-1&-1&1&1 \\ 1&1&-1&-1&1&1&-1&-1 \\ 1&-1&-1&1&1&-1&-1&1 \\ 1&-1&-1&1&-1&1&1&-1 \\ 1&-1&1&-1&-1&1&-1&1 \\ 1&-1&1&-1&1&-1&1&-1 \end{bmatrix} \qquad (14)$$
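The recursive expansion and the ordering by sequency may be sketched as follows. This is an illustration only, assuming the order n is a power of two and omitting the normalization factor; the function names are hypothetical. Sequency is counted here as the number of sign changes along a row.

```python
def hadamard(n):
    """Build an n-by-n matrix of +1/-1 by recursive doubling (n a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = ([row + row for row in h] +
             [row + [-x for x in row] for row in h])
    return h


def sign_changes(row):
    """Count sign changes along a row; this is the row's sequency."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def walsh_sequency(n):
    """Rows of the Hadamard matrix rearranged in order from low sequency."""
    return sorted(hadamard(n), key=sign_changes)


w8 = walsh_sequency(8)
```

For n = 8 the rows of `w8` coincide with the matrix entries inside the brackets of formula (14).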
  • (2-1-4)
  • In the case where a sufficient number of sample vectors are collected in advance, and where a certain amount of cost can be tolerated for the transform operation, it is effective to use the Karhunen-Loeve Transform (hereinafter referred to as KL transform), which is optimum, as the orthogonal transform.
  • The KL transform matrix T is the eigen matrix obtained by decomposing the dispersion matrix V of the sample vectors into eigen values, and is defined as indicated by the following formula (15) in the case where the eigen values are assumed as λ1, . . . , λN.
    $$V = T^{t} \Lambda T, \qquad \Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_N\} \qquad (15)$$
  • Here, the KL transform is an orthogonal transform which completely removes the correlation between the respective components, and the dispersion of each transformed vector component equals the corresponding eigen value λi. Accordingly, by constituting the KL transform matrix T so that the eigen values λi are arranged in descending order of magnitude, overlapping (redundant) information among the components is removed, and the distances can be integrated in order from the axis where dispersion is the largest.
  • It is to be noted that, in the technique using this KL transform, since it is necessary to hold KL transform matrix T over the entire dimension in principle at the time of operation, and since it is necessary to perform matrix operation of all order with respect to all vectors, operation cost is high. However, since this operation is performed at the time of registration, it cannot be said that time required for retrieval processing for which high speed is required is particularly increased.
  • In addition, although slight degradation of accuracy is involved, it is also possible to hold only the vector components having large eigen values, without holding the vector components having small eigen values, to thereby compress the vector itself. This makes it possible to reduce the memory area of the recording unit 22 (FIG. 6) and/or the data read-in time.
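The decorrelation property of the KL transform can be sketched as follows. The derivation assumes that T is orthogonal (the product of T and T^t equals the identity matrix), that the transformed vector is g′ = Tg, and that V is the dispersion (covariance) matrix of g about its mean; the expectation symbol E[·] and the symbol V′ are notation introduced here for illustration.

    $$V' = E\big[(g' - \bar{g}')(g' - \bar{g}')^{t}\big] = T\,E\big[(g - \bar{g})(g - \bar{g})^{t}\big]\,T^{t} = T V T^{t} = T T^{t} \Lambda\, T T^{t} = \Lambda$$

Since Λ is diagonal, the transformed components are uncorrelated under these assumptions, and the dispersion of the i-th transformed component is λi, consistent with formula (15).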
  • (3) Third Embodiment
  • While the retrieval operation is caused to be conducted at a high speed by realization of high speed of distance calculation in the above-described first and second embodiments, data read-in time from the recording unit, e.g., hard disc, etc. also results in cause of large overhead in performing retrieval.
  • Here, the KL transform in the above-described second embodiment corresponds to the analysis method called principal component analysis in the field of multivariate analysis, and is an operation for extracting the principal components constituting a vector. In view of the above, in the third embodiment which will be explained below, the principal components of the transformed vector g′ obtained in the second embodiment are recorded as index vector g1, and the remaining components are recorded as detail vector g2. At the time of retrieval, distance calculation is first performed with reference to index vector g1, and detail vector g2 is acquired for further distance calculation only in the case where that result is smaller than the threshold value S, thereby making it possible to shorten the data read-in time.
  • FIG. 9 outlines the configuration of the similarity vector detecting apparatus in the third embodiment. As shown in FIG. 9, the similarity vector detecting apparatus 3 receives vector f and vector g as inputs and outputs the square distance between the vectors (or −1), and is composed of vector transform units 30, 31, an index recording unit 32, a detail recording unit 33, a hierarchical distance calculating unit 34, and a threshold value judgment unit 35. Here, the vector transform units 30, 31 apply, to the vectors g and f respectively, a transform operation similar to that of the above-described second embodiment. In addition, the index recording unit 32 and the detail recording unit 33 are, e.g., a magnetic disc, an optical disc, or a semiconductor memory.
  • The processing at the time of registration in this similarity vector detecting apparatus 3 will be explained by using the flowchart of FIG. 10. First, at step S40, the vector transform unit 30 (FIG. 9) inputs registered vector g in advance. At the subsequent step S41, vector g is transformed as indicated by the above-described formula (5) to generate vector g′. Further, the vector transform unit 30 divides it into index vector g1 having a predetermined number M (1≦M<N) of components and detail vector g2 having the remaining component in order from component having small component number, i.e., component having large dispersion or eigen value in the above-described transform operations or low frequency component. Further, at step S42, the index recording unit 32 records index vector g1. At step S43, the detail recording unit 33 records detail vector g2.
  • Next, the processing at the time of retrieval in the similarity vector detecting apparatus 3 will be explained by using the flowchart of FIG. 11. First, at step S50, the threshold value judgment unit 35 (FIG. 9) sets threshold value S of distance. At the subsequent step S51, the vector transform unit 31 inputs vector f, and the hierarchical distance calculating unit 34 acquires one index vector g1 recorded at the index recording unit 32.
  • Subsequently, at step S52, the vector transform unit 31 transforms vector f as indicated by the above-described formula (4) to generate vector f′. Further, the vector transform unit 31 divides it into index vector f1 having a predetermined number M (1≦M<N) of components and detail vector f2 having the remaining component in order from component having small component number.
  • At step S53, the hierarchical distance calculating unit 34 sets component number i serving as internal variable to 1 and sets integrated value sum of distance to 0. At step S54, integrating operation as indicated by the following formula (16) is performed between the i-th component f′[i] of vector f′ and the i-th component g′[i] of vector g′.
    $$\mathrm{sum} = \mathrm{sum} + (f'[i] - g'[i])^2 \qquad (16)$$
  • At step S55, the threshold value judgment unit 35 determines whether or not the integrated value sum is smaller than the threshold value S. In the case where the integrated value sum is smaller than the threshold value S (Yes), processing proceeds to step S57. In the case where the integrated value sum is the threshold value S or larger (No), the threshold value judgment unit 35 outputs −1 at step S56 to complete processing. Here, as described above, the output value −1 is merely a convenient numerical value indicating that the distance is the threshold value or larger, so that the comparison is nullified.
  • At step S57, it is determined whether or not the component number i is the number of dimensions M of index vector f1 and index vector g1 or smaller. In the case where the component number i is M or smaller (Yes), i is incremented at step S58 and processing returns to step S54. On the other hand, in the case where the component number i is larger than M (No), the hierarchical distance calculating unit 34 acquires, at step S59, one detail vector g2 recorded at the detail recording unit 33.
  • At step S60, the hierarchical distance calculating unit 34 performs integrating operation as indicated by the above-described formula (16) between the i-th component f′[i] of vector f′ and the i-th component g′[i] of vector g′.
  • At step S61, the threshold value judgment unit 35 discriminates whether or not integrated value sum is smaller than threshold value S. In the case where the integrated value sum is smaller than threshold value S (Yes), processing proceeds to step S63. In the case where integrated value sum is threshold value S or larger (No), the threshold value judgment unit 35 outputs −1 at step S62 to complete processing.
  • At step S63, it is discriminated whether or not the component number i is the number of dimensions N of vector f′ or vector g′ or smaller. In the case where the component number i is N or smaller (Yes), i is incremented at step S64 to return to the step S60. On the other hand, in the case where the component number i is larger than N (No), the threshold value judgment unit 35 outputs integrated value sum at step S65 since integration is completed until the last component of vector g′ to complete processing. At this time, the integrated value sum results in square of distance between vectors.
  • While the processing with respect to one registered vector g′ is indicated above in the flowchart of FIG. 11, similar processing is performed with respect to all registered vectors g′ in practice to output, as vector similar to vector f′, all vectors g′ in which integrated value sum of distances with respect to vector f′ is below the threshold value S.
  • In the above-described third embodiment, memory capacity and accuracy are unchanged as compared to the first and second embodiments, and operating speed changes little. However, in the case where most comparisons are nullified at the stage of index vector g1, so that it is unnecessary to acquire detail vector g2, the overhead of the corresponding data access is eliminated.
  • While it is assumed in the above-described explanation that vector is divided into two stages of index vector and detail vector, it is a matter of course that there can be made expansion to multi-stage, such as, for example, index vector is further similarly divided into index vector of high order and detailed index vector so that three-stage configuration is provided.
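The two-stage processing of FIG. 11 may be sketched as follows. This is a minimal illustration, assuming that the detail vector is obtained through a caller-supplied function so that the detail recording unit is accessed only when the index stage stays below the threshold. The names `two_stage_sq_distance` and `load_detail` are hypothetical.

```python
def two_stage_sq_distance(f_t, index_vec, load_detail, threshold):
    """Compare against the index vector first; fetch the detail vector lazily.

    f_t         : transformed input vector (all N components)
    index_vec   : first M components of a registered vector (g1)
    load_detail : zero-argument function returning the remaining components (g2)
    Returns -1.0 on truncation, otherwise the squared distance.
    """
    m = len(index_vec)
    total = 0.0
    for fi, gi in zip(f_t[:m], index_vec):
        total += (fi - gi) ** 2
        if total >= threshold:
            return -1.0  # nullified before any detail data is read
    detail_vec = load_detail()  # detail data is read only at this point
    for fi, gi in zip(f_t[m:], detail_vec):
        total += (fi - gi) ** 2
        if total >= threshold:
            return -1.0
    return total
```

In this sketch the cost of `load_detail` stands in for the read-in time of the detail recording unit, which is avoided whenever the index stage already exceeds the threshold.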
  • (4) Extraction of Feature Vector
  • Explanation will be given below in connection with a technique of extracting feature vector from acoustic signal or video signal. In a manner described later, acoustic feature vector and/or image feature vector are extracted to use them as the above-described vectors f and g, thereby making it possible to retrieve, at a high speed, similar acoustic or video signal from registered acoustic signal or video signal by using the techniques of the above-described first to third embodiments in the case where acoustic signal or video signal is inputted.
  • (4-1) Extraction of Acoustic Feature Vector
  • (4-1-1)
  • Explanation will be given by using the flowchart of FIG. 12 and FIG. 13 in connection with the example of the case where power spectrum coefficients are used as feature quantity relating to acoustic signal. First, at step S70, as shown in FIG. 13, acoustic signals with respect to each time period T are acquired from acoustic signal within object time period.
  • Subsequently, at step S71, a spectrum operation, e.g., fast Fourier transform (FFT), is applied to the acquired acoustic signal to determine power spectrum coefficients Sq (q=0, 1, . . . , Q−1) with respect to each short time period. Here, q is an index representing discrete frequency and Q is the number of discrete frequencies.
  • Subsequently, at step S72, it is discriminated whether or not calculation within object time period is completed. In the case where such calculation is completed (Yes), processing proceeds to step S73. In the case where such calculation is not completed (No), processing returns to the step S70.
  • At step S73, average spectrum S′q of the determined power spectrum coefficients Sq is calculated. At step S74, this average spectrum S′q is changed into vector to generate acoustic feature vector a. This acoustic feature vector a is represented by, e.g., the following formula (17).
    $$a = (S'_0, \ldots, S'_{Q-1}) \qquad (17)$$
  • It is to be noted that while explanation has been given in the above-described example on the premise that acoustic signal within object time period is divided into each time period T, spectrum operation may be implemented without dividing into each time period T in the case where the object time period is short.
  • In addition, while the example using power spectrum coefficient has been explained in the above-described example, the present invention is not limited to such implementation but cepstrum coefficient having equivalent information, etc., may also be used. Further, in place of Fourier transform, similar effect can also be obtained by linear predictive coefficient using AR (Auto-Regressive) model.
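The extraction of FIG. 12 may be sketched as follows. This is an illustration only: a direct discrete Fourier transform is written out for clarity in place of a fast transform, the signal is assumed to be a list of samples, and frames are taken without overlap. The names `power_spectrum` and `acoustic_feature` are hypothetical.

```python
import cmath


def power_spectrum(frame):
    """Power spectrum coefficients of one frame, from DC up to half the frame length."""
    n = len(frame)
    spectrum = []
    for q in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * q * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def acoustic_feature(signal, frame_len):
    """Average the per-frame power spectra over the object time period."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    spectra = [power_spectrum(frame) for frame in frames]
    return [sum(column) / len(spectra) for column in zip(*spectra)]


# Two frames of a tone that completes two cycles per 8-sample frame.
signal = [1.0, 0.0, -1.0, 0.0] * 4
feature = acoustic_feature(signal, 8)
```

The resulting list plays the role of acoustic feature vector a and can then be transformed and registered as described in the embodiments.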
  • (4-1-2)
  • Since the acoustic signal is vast, there are many instances where such signal is recorded or is caused to undergo transmission after being compression-encoded. While it is possible to extract acoustic feature vector a by using the above-described technique after encoded acoustic signal is decoded into signal in the base band, extracting processing can be conducted efficiently and at a high speed if acoustic feature vector a can be extracted only by partial decoding.
  • Here, in the transform encoding which is encoding method generally used, acoustic signal serving as original sound is divided into frames with respect to each time period T, as shown in FIG. 14. Further, orthogonal transform such as Modified Discrete Cosine Transform (MDCT), etc. is implemented to acoustic signal with respect to each frame, and the coefficients thereof are quantized and encoded. In this instance, scale factors serving as normalization coefficient of magnitude are extracted with respect to each frequency band, and are separately encoded. In view of the above, by decoding only the scale factors, they can be used as acoustic feature vector a.
  • Explanation will be given by using the flowchart of FIG. 15 and FIG. 16 in connection with the example of the case where scale factors are used as feature quantity relating to acoustic signal. First, at step S80, encoded acoustic signal within the time period T in the object time period is acquired. At step S81, scale factors with respect to each frame are partially decoded.
  • Subsequently, at step S82, it is discriminated whether or not decoding within the object time period is completed. In the case where such decoding is completed (Yes), processing proceeds to step S83. In the case where such decoding is not completed (No), processing returns to the step S80.
  • At step S83, maximum scale factors are detected with respect to each band from scale factors within the object time period. At step S84, those scale factors are changed into vectors to generate acoustic feature vector a.
  • In this way, it is possible to extract, at a high speed, acoustic feature vector a equivalent to the above without completely decoding encoded acoustic signal.
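Steps S83 and S84 may be sketched as follows. The sketch assumes that the scale factors have already been partially decoded into one list per frame, with one value per frequency band; the decoding itself depends on the encoding system and is not shown. The function name `scale_factor_feature` is hypothetical.

```python
def scale_factor_feature(frames):
    """Take the maximum scale factor of each band over all frames.

    frames : list of per-frame lists, each holding one scale factor per band
    Returns one value per band, used as the acoustic feature vector.
    """
    return [max(band_values) for band_values in zip(*frames)]


# Three frames, three frequency bands (illustrative values only).
decoded = [[3, 1, 4],
           [1, 5, 9],
           [2, 6, 5]]
feature = scale_factor_feature(decoded)
```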
  • (4-2) Extraction of Image Feature Vector
  • (4-2-1)
  • Explanation will be given by using the flowchart of FIG. 17 and FIG. 18 in connection with the example of the case where luminance information and color information are used as feature quantity relating to video signal. First, at step S90, as shown in FIG. 18, image frame is acquired from video signal within the object time period T.
  • Subsequently, at step S91, time average image 100 is prepared on the basis of acquired all image frames.
  • Subsequently, at step S92, the prepared time average image 100 is divided into X×Y small blocks in the horizontal and vertical directions to prepare block average image 110 in which the pixel values within the respective blocks are averaged.
  • Further, at step S93, these small blocks are arranged in order of R, G, B, e.g., from the left upper direction toward the right lower direction to generate one-dimensional image feature vector v. This image feature vector v is represented by, e.g., the following formula (18).
    $$v = (R_{00}, \ldots, R_{X-1,Y-1}, G_{00}, \ldots, G_{X-1,Y-1}, B_{00}, \ldots, B_{X-1,Y-1}) \qquad (18)$$
  • It is to be noted that explanation has been given in the above-described example in connection with the example where pixel values of the block average image 110 in which the time average image 100 is divided are rearranged to generate one-dimensional image feature vector v, however, the present invention is not limited to such implementation, but there may be employed an approach to rearrange pixel values of the time average image 100 without preparing the block average image 110 to generate one-dimensional image feature vector v.
  • In addition, since time change of video signal is not so rapid in the ordinary state, it is also possible to obtain the same effects/advantages by employing an approach to select, as representative image, one frame within the object time period without preparing the time average image 100 to substitute it.
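Steps S91 to S93 may be sketched as follows. This is an illustration only, assuming that each image frame is held as a dictionary mapping "R", "G", and "B" to a two-dimensional list of pixel values, and that the image size is divisible by the number of blocks. The function names are hypothetical.

```python
def time_average(frames):
    """Average corresponding pixels over all frames, per color plane."""
    count = len(frames)
    height = len(frames[0]["R"])
    width = len(frames[0]["R"][0])
    return {c: [[sum(f[c][y][x] for f in frames) / count for x in range(width)]
                for y in range(height)]
            for c in "RGB"}


def block_average(plane, blocks_x, blocks_y):
    """Average the pixels within each block, scanning from upper left to lower right."""
    block_h = len(plane) // blocks_y
    block_w = len(plane[0]) // blocks_x
    values = []
    for j in range(blocks_y):
        for i in range(blocks_x):
            pixels = [plane[y][x]
                      for y in range(j * block_h, (j + 1) * block_h)
                      for x in range(i * block_w, (i + 1) * block_w)]
            values.append(sum(pixels) / len(pixels))
    return values


def image_feature(frames, blocks_x, blocks_y):
    """Concatenate the block averages in order of R, G, B (formula (18))."""
    averaged = time_average(frames)
    return [v for c in "RGB" for v in block_average(averaged[c], blocks_x, blocks_y)]
```

The block scan order is an assumption for illustration; any fixed order may be used provided the same order is applied to the registered and input signals.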
  • (4-2-2)
  • There are many instances where images are related to one another in that their overall color distributions are similar even though the corresponding video signals are not entirely the same, e.g., studio images of a news program photographed from the same angle. Thus, there is a demand for performing retrieval in the state where these images are considered to be the same. In such a case, it is effective to employ a method of discarding the spatial dependency of the image and preparing a histogram of the color distribution to make comparison.
  • In view of the above, explanation will be given by using the flowchart of FIG. 19 and FIG. 20 in connection with the example of the case where histogram of color distribution is used as feature quantity in this way. First, at step S100, as shown in FIG. 20, image frame is acquired from video signal within object time period T.
  • Subsequently, at step S101, histogram with respect to signal values of respective colors, e.g., R, G, B is prepared from signal values of respective image frames.
  • Further, at step S102, these colors are arranged in order of, e.g., R, G, B to generate one-dimensional image feature vector v. This image feature vector v is represented by the following formula (19).
    $$v = (R_0, \ldots, R_{N-1}, G_0, \ldots, G_{N-1}, B_0, \ldots, B_{N-1}) \qquad (19)$$
  • It is to be noted that while explanation has been given in the above-described example on the premise that histogram with respect to signal values of R, G, B is prepared, it is possible to obtain similar effects/advantages even if histogram with respect to signal values of luminance (Y) and color difference (Cb, Cr) is prepared.
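Steps S101 and S102 may be sketched as follows. This is an illustration only, assuming integer pixel values in the range 0 to 255, the same dictionary-based frame layout as an assumption of this sketch, and an arbitrarily chosen number of histogram bins. The function name `color_histogram_feature` is hypothetical.

```python
def color_histogram_feature(frames, bins=4, max_value=256):
    """Count pixel values per bin for each color and concatenate in order of R, G, B."""
    bin_width = max_value // bins
    histogram = {c: [0] * bins for c in "RGB"}
    for frame in frames:
        for c in "RGB":
            for row in frame[c]:
                for value in row:
                    histogram[c][min(value // bin_width, bins - 1)] += 1
    return histogram["R"] + histogram["G"] + histogram["B"]


frame = {"R": [[0, 255], [64, 64]],
         "G": [[0, 0], [0, 0]],
         "B": [[255, 255], [255, 255]]}
# The same pixels in different positions: spatial arrangement is ignored.
shuffled = {"R": [[64, 0], [255, 64]],
            "G": [[0, 0], [0, 0]],
            "B": [[255, 255], [255, 255]]}
feature = color_histogram_feature([frame])
```

Because only counts are kept, the two frames above yield the same feature vector, which reflects the rejection of spatial dependency described for this technique.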
  • (4-2-3)
  • Since video signal is vast, there are many cases where such signal is recorded or is caused to undergo transmission after being compression-encoded. While it is possible to extract image feature vector v by using the above-described technique after employing an approach to decode encoded video signal into signal of base band, extraction processing can be performed efficiently and at a high speed if image feature vector v can be extracted only by partial decoding.
  • Explanation will be given by using the flowchart of FIG. 21 and FIG. 22 in connection with the example of the case where image feature vector v is extracted from a video signal compression-encoded by MPEG1 (Moving Picture Experts Group 1) or MPEG2. First, at step S110, the encoded video signal of the encoded group (Group of Pictures: GOP) proximate to the object time period T to be changed into a vector is acquired, and the intra-frame encoded picture (I picture) 120 within that GOP is acquired.
  • Here, the frame image is encoded with macro block MB (16×16 pixels, or 8×8 pixels) as the unit, and Discrete Cosine Transform (DCT) is used. The DC coefficients obtained by this DCT correspond to the average value of the pixel values of the image within the macro block.
  • In view of the above, at step S111, these DC coefficients are acquired. At the subsequent step S112, these coefficients are arranged in order of, e.g., Y, Cb, Cr to generate one-dimensional image feature vector v. This image feature vector v is represented by, e.g., the following formula (20).
    $$v = (Y_{00}, \ldots, Y_{X-1,Y-1}, Cb_{00}, \ldots, Cb_{X-1,Y-1}, Cr_{00}, \ldots, Cr_{X-1,Y-1}) \qquad (20)$$
  • In this way, it is possible to extract image feature vector v at a high speed without completely decoding encoded video signal.
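For reference, the relation between a DC coefficient and the block average can be sketched as follows, assuming an orthonormal two-dimensional DCT over a block of B×B pixel values p(x, y). The symbols B, p, and C are introduced here for illustration, and the exact scaling used by an actual encoder, including quantization, may differ.

    $$C_{00} = \frac{1}{\sqrt{B}} \cdot \frac{1}{\sqrt{B}} \sum_{x=0}^{B-1} \sum_{y=0}^{B-1} p(x, y) = B \cdot \left( \frac{1}{B^{2}} \sum_{x=0}^{B-1} \sum_{y=0}^{B-1} p(x, y) \right)$$

Under this assumption the DC coefficient is B times the block mean, so a vector of DC coefficients carries the same information as a block average image up to a constant factor.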
  • It is to be noted that while explanation has been given in the above-described example that video signal which has been compression-encoded by the MPEG1 or the MPEG2 is assumed to be used, the present invention may also be applied to other compression-encoding system.
  • (5) Others
  • As explained above, in accordance with this embodiment, hierarchical distance integrating operation is performed in detecting analogous (similar) vector on the basis of distance between vectors to truncate distance integrating operation at the time when integrated value of distances is above threshold value with respect to distance set in advance, thereby making it possible to detect similar vector at a high speed. Particularly, in such cases that vector similar to input vector is detected from a large quantity of registered vectors, since most registered vectors are non-similar so that integrated value of distances is above threshold value, distance calculation can be truncated at the early stage. Thus, detection time can be shortened to a large extent.
  • In addition, by implementing sequential transform, Discrete Cosine Transform, Discrete Fourier Transform, Walsh-Hadamard Transform or KL Transform in advance to vector to perform integrating operation in order from vector component having high significance, i.e., component having large dispersion or eigen value in the above-described transform operations or in order from low frequency component, it is possible to detect similar vector efficiently and at a high speed, taking the distribution of vector components into consideration.
  • Accordingly, also in performing retrieval of acoustic signal or video signal, acoustic feature vector and/or image feature vector is extracted in advance to register the vector thus extracted, whereby in the case where arbitrary acoustic signal or video signal is inputted, similar acoustic or video signals can be retrieved at a high speed while maintaining structural simplicity and/or retrieval accuracy similar to full search.
  • While the invention has been described in accordance with certain embodiments thereof illustrated in the accompanying drawings and described in the above description in detail, it should be understood by those ordinarily skilled in the art that the invention is not limited to the embodiments, but various modifications, alternative embodiments or equivalents can be implemented without departing from the scope and spirit of the present invention as set forth and defined by the appended claims.
  • For example, while the present invention has been explained in the above-described embodiments as the configuration of hardware, the present invention is not limited to such implementation, but arbitrary processing may be also realized by allowing CPU (Central Processing Unit) to execute computer program. In this case, computer program may be provided in the state where it is recorded on recording medium, or may be provided by allowing it to undergo transmission through other transmission medium such as Internet.
  • INDUSTRIAL APPLICABILITY
  • In accordance with the above-described present invention, there is employed such approach to perform distance calculation between two vectors in a hierarchical manner, whereby in the case where that integrated value of distances calculated up to a certain hierarchy is above a predetermined threshold value, it is only detected, without calculating actual distance, that the integrated value of distances is threshold value or larger, thereby permitting operation to be conducted at a high speed. Particularly, in such cases that vector similar to input vector is detected from a large quantity of registered vectors, since most registered vectors are non-similar and thus integrated value of distances is above threshold value, distance calculation can be truncated at the early stage. Therefore, detection time can be shortened to a large extent.

Claims (26)

1. A similarity calculation method of determining similarity between two input vectors, including
a hierarchical distance calculation step of performing distance calculation between the two input vectors in a hierarchical manner,
a threshold value comparison step of comparing integrated value of distances calculated at respective hierarchies of the hierarchical distance calculation step with a threshold value set in advance,
a control step of controlling distance calculation at the hierarchical distance calculation step in accordance with comparison result at the threshold value comparison step, and
an output step of outputting, as the similarity, integrated value of the calculated distances up to the last hierarchy,
wherein, at the control step, control is conducted such that distance calculation is truncated in the case where integrated value of distances calculated up to a certain hierarchy is above the threshold value.
2. The similarity calculation method as set forth in claim 1, wherein distance calculation between respective components constituting the two input vectors is performed in a hierarchical manner at the hierarchical distance calculation step, whereby in the case where integrated value of distances calculated up to a certain hierarchy is below the threshold value, distance calculation between next components is performed.
3. The similarity calculation method as set forth in claim 2, which further includes a transform step of implementing a predetermined transform operation to the two input vectors,
wherein distance calculation between the two input vectors transformed at the transform step is performed in a predetermined order based on the predetermined transform operation at the hierarchical distance calculation step.
4. The similarity calculation method as set forth in claim 3, wherein the predetermined transform operation is a transform operation which performs sequencing of order of respective components constituting the two input vectors in accordance with magnitude of dispersion of the respective components, and
wherein distance calculation between the two input vectors transformed at the transform step is performed in order from components of large dispersion at the hierarchical distance calculation step.
5. The similarity calculation method as set forth in claim 3, wherein the predetermined transform operation is Discrete Cosine Transform operation or Discrete Fourier Transform operation, and
wherein distance calculation between the two input vectors transformed at the transform step is performed in order from low frequency component at the hierarchical distance calculation step.
6. The similarity calculation method as set forth in claim 3, wherein the predetermined transform operation is Walsh-Hadamard Transform operation, and
wherein distance calculation between the two input vectors transformed at the transform step is performed in order from low sequency component at the hierarchical distance calculation step.
7. The similarity calculation method as set forth in claim 3, wherein the predetermined transform operation is Karhunen-Loeve transform operation, and
wherein distance calculation between the two input vectors transformed at the transform step is performed in order from component of large eigen value at the hierarchical distance calculation step.
8. The similarity calculation method as set forth in claim 3, which further includes a division step of taking out respective components constituting the two input vectors transformed at the transform step in the predetermined order to divide them into hierarchical plural partial vectors,
wherein distance calculation between respective components constituting partial vectors is performed in a hierarchical manner in order from the partial vector of the uppermost hierarchy at the hierarchical distance calculation step, whereby in the case where integrated value of calculated distances between all components constituting partial vectors up to a certain hierarchy is below the threshold value, distance calculation between respective components constituting partial vector of one hierarchy lower is performed.
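The division into hierarchical partial vectors can be sketched as follows. The sketch assumes squared-difference distances and compares the running total against the threshold only after every component of a partial vector has been processed. The level sizes and the names `split_into_levels` and `level_wise_distance` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def split_into_levels(vec: Sequence[float], sizes: Sequence[int]) -> List[List[float]]:
    """Cut an already-ordered vector into consecutive partial vectors."""
    if sum(sizes) != len(vec):
        raise ValueError("level sizes must add up to the vector length")
    levels, start = [], 0
    for size in sizes:
        levels.append(list(vec[start:start + size]))
        start += size
    return levels


def level_wise_distance(f_levels: Sequence[Sequence[float]],
                        g_levels: Sequence[Sequence[float]],
                        threshold: float) -> Tuple[Optional[float], int]:
    """Return (distance or None if truncated, number of levels evaluated)."""
    total = 0.0
    for level, (fp, gp) in enumerate(zip(f_levels, g_levels), start=1):
        total += sum((a - b) ** 2 for a, b in zip(fp, gp))
        if total > threshold:          # above threshold: skip the lower levels
            return None, level
    return total, len(f_levels)
```

Checking once per level rather than once per component trades a few extra multiplications for fewer comparisons.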
9. The similarity calculation method as set forth in claim 1, wherein the input vector is obtained by changing an acoustic signal into feature vector, and
wherein the feature vector is obtained by changing power spectrum coefficients within a predetermined time period of the acoustic signal into vector.
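One possible way to form a power-spectrum feature vector from a short frame of audio samples is sketched below. It uses a direct DFT for clarity and divides by the frame length; the normalization, the absence of a window function, and the name `power_spectrum` are all assumptions, since the claim does not specify them.

```python
import cmath
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Power of DFT bins 0..N/2 for one frame of real-valued samples."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)   # assumed 1/N normalization
    return spectrum
```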
10. The similarity calculation method as set forth in claim 1, wherein the input vector is obtained by changing an acoustic signal into feature vector, and
wherein the feature vector is obtained by changing linear predictive coefficients within a predetermined time period of the acoustic signal into vector.
11. The similarity calculation method as set forth in claim 1, wherein the input vector is obtained by changing an encoded acoustic signal into feature vector, and
wherein the feature vector is obtained by changing parameters indicating intensities of frequency components within respective frames of the encoded acoustic signal into vectors.
12. The similarity calculation method as set forth in claim 1, wherein the input vector is obtained by changing a video signal into feature vector, and
wherein the feature vector is obtained by changing signal value of representative image within a predetermined time period of the video signal, average image of frame image within the predetermined time period, or small image obtained by dividing, on predetermined block unit basis, the representative image or the average image into vector.
13. The similarity calculation method as set forth in claim 1, wherein the input vector is obtained by changing a video signal into feature vector, and
wherein the feature vector is obtained by changing histogram with respect to luminance and/or color of frame image within a predetermined time period of the video signal into vector.
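A luminance histogram feature vector of the kind recited in claim 13 might be computed as in the following sketch. It assumes 8-bit integer luminance values, equal-width bins, and normalization by the pixel count; the bin count and the name `luminance_histogram` are illustrative only.

```python
from typing import List, Sequence


def luminance_histogram(pixels: Sequence[int], bins: int = 8,
                        max_value: int = 255) -> List[float]:
    """Normalized histogram of integer luminance values in [0, max_value]."""
    if not pixels:
        raise ValueError("at least one pixel is required")
    counts = [0] * bins
    for p in pixels:
        # Map the value to an equal-width bin, clamping to the last bin.
        idx = min(p * bins // (max_value + 1), bins - 1)
        counts[idx] += 1
    return [c / len(pixels) for c in counts]
```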
14. The similarity calculation method as set forth in claim 1, wherein the input vector is obtained by changing encoded video signal into feature vector, and
wherein the feature vector is obtained by changing signal values of DC components of respective blocks serving as encoding unit of intraframe encoding image proximate to a predetermined time period of the encoded video signal into vector.
15. A similarity calculating apparatus adapted for determining similarity between two input vectors, comprising
hierarchical distance calculating means for performing distance calculation between the two input vectors in a hierarchical manner,
threshold value comparing means for comparing integrated value of distances calculated at respective hierarchies by the hierarchical distance calculating means with a threshold value set in advance,
control means for controlling distance calculation by the hierarchical distance calculating means in accordance with comparison result by the threshold value comparing means, and
output means for outputting, as the similarity, integrated value of distances calculated up to the last hierarchy,
wherein the control means is operative so that, in the case where the integrated value of distances calculated up to a certain hierarchy is above the threshold value as the result of comparison by the threshold value comparing means, it conducts a control so as to truncate distance calculation.
16. The similarity calculating apparatus as set forth in claim 15, wherein the hierarchical distance calculating means performs distance calculation between respective components constituting the two input vectors in a hierarchical manner, whereby in the case where integrated value of distances calculated up to a certain hierarchy is below the threshold value, it performs distance calculation between next components.
17. The similarity calculating apparatus as set forth in claim 16, which further comprises transform means for implementing a predetermined transform operation to the two input vectors,
wherein the hierarchical distance calculating means performs distance calculation between the two input vectors transformed by the transform means in a predetermined order based on the predetermined transform operation.
18. The similarity calculating apparatus as set forth in claim 17, which comprises dividing means for taking out, in the predetermined order, respective components constituting the two input vectors transformed by the transform means to divide them into hierarchical plural partial vectors,
wherein the hierarchical distance calculating means performs, in a hierarchical manner, distance calculation between respective components constituting partial vectors in order from the partial vector of the uppermost hierarchy, whereby in the case where integrated value of distances calculated between all components constituting partial vectors up to a certain hierarchy is below the threshold value, the hierarchical distance calculating means performs distance calculation between respective components constituting partial vector of one hierarchy lower.
19. A program for allowing a computer to execute similarity calculation processing for determining similarity between two input vectors, including
a hierarchical distance calculation step of performing distance calculation between the two input vectors in a hierarchical manner,
a threshold value comparison step of comparing integrated value of distances calculated at respective hierarchies of the hierarchical distance calculation step with a threshold value set in advance,
a control step of controlling distance calculation at the hierarchical distance calculation step in accordance with comparison result at the threshold value comparison step, and
an output step of outputting, as the similarity, integrated value of distances calculated up to the last hierarchy,
wherein, at the control step, in the case where integrated value of distances calculated up to a certain hierarchy is above the threshold value at the threshold value comparison step, control is conducted in such a manner as to truncate distance calculation.
20. The program as set forth in claim 19, wherein distance calculation between respective components constituting the two input vectors is performed in a hierarchical manner at the hierarchical distance calculation step, whereby in the case where the integrated value of distances calculated up to a certain hierarchy is below the threshold value, distance calculation between next components is performed.
21. The program as set forth in claim 20, including a transform step of implementing a predetermined transform operation to the two input vectors,
wherein, at the hierarchical distance calculation step, distance calculation between the two input vectors transformed at the transform step is performed in a predetermined order based on the predetermined transform operation.
22. The program as set forth in claim 21, which further includes a division step of taking out, in the predetermined order, respective components constituting the two input vectors transformed at the transform step to divide them into hierarchical plural partial vectors,
wherein distance calculation between respective components constituting partial vectors is performed in a hierarchical manner in order from the partial vector of the uppermost hierarchy at the hierarchical distance calculation step, whereby in the case where integrated value of calculated distances between all components constituting partial vectors up to a certain hierarchy is below the threshold value, distance calculation between respective components constituting partial vector of one hierarchy lower is performed.
23. A computer readable medium on which is recorded a program for allowing a computer to execute similarity calculation processing which determines similarity between two input vectors,
the program including
a hierarchical distance calculation step of performing distance calculation between the two input vectors in a hierarchical manner,
a threshold value comparison step of comparing integrated value of distances calculated at respective hierarchies of the hierarchical distance calculation step with a threshold value set in advance,
a control step of controlling distance calculation at the hierarchical distance calculation step in accordance with comparison result at the threshold value comparison step, and
an output step of outputting, as the similarity, integrated value of distances calculated up to the last hierarchy,
wherein, at the control step, in the case where integrated value of distances calculated up to a certain hierarchy is above the threshold value at the threshold value comparison step, control is conducted in such a manner as to truncate distance calculation.
24. The recording medium as set forth in claim 23, wherein distance calculation between respective components constituting the two input vectors is performed in a hierarchical manner at the hierarchical distance calculation step, whereby in the case where integrated value of distances calculated up to a certain hierarchy is below the threshold value, distance calculation between next components is performed.
25. The recording medium as set forth in claim 24, wherein the program further includes a transform step of implementing a predetermined transform operation to the two input vectors, and
wherein, at the hierarchical distance calculation step, distance calculation between the two input vectors transformed at the transform step is performed in a predetermined order based on the predetermined transform operation.
26. The recording medium as set forth in claim 25, wherein the program further includes a division step of taking out, in the predetermined order, respective components constituting the two input vectors transformed at the transform step to divide them into hierarchical plural partial vectors, and
wherein distance calculation between respective components constituting partial vectors is performed in a hierarchical manner in order from the partial vector of the uppermost hierarchy at the hierarchical distance calculation step, whereby in the case where integrated value of calculated distances between all components constituting partial vectors up to a certain hierarchy is below the threshold value, distance calculation between respective components constituting partial vector of one hierarchy lower is performed.
US10/489,012 2002-07-09 2003-06-26 Similarity calculation method and device Expired - Lifetime US7260488B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2002-200481 2002-07-09
JP2002200481A JP4623920B2 (en) 2002-07-09 2002-07-09 Similarity calculation method and apparatus, program, and recording medium
PCT/JP2003/008142 WO2004006185A1 (en) 2002-07-09 2003-06-26 Similarity calculation method and device

Publications (2)

Publication Number Publication Date
US20050033523A1 (en) 2005-02-10
US7260488B2 US7260488B2 (en) 2007-08-21

Family

ID=30112514

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/489,012 Expired - Lifetime US7260488B2 (en) 2002-07-09 2003-06-26 Similarity calculation method and device

Country Status (7)

Country Link
US (1) US7260488B2 (en)
EP (1) EP1521210B9 (en)
JP (1) JP4623920B2 (en)
KR (1) KR101021044B1 (en)
CN (1) CN1324509C (en)
DE (1) DE60330147D1 (en)
WO (1) WO2004006185A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7539870B2 (en) * 2004-02-10 2009-05-26 Microsoft Corporation Media watermarking by biasing randomized statistics
JP4220449B2 (en) * 2004-09-16 2009-02-04 株式会社東芝 Indexing device, indexing method, and indexing program
JP2006101462A (en) * 2004-09-30 2006-04-13 Sanyo Electric Co Ltd Image signal processing device
US7552303B2 (en) * 2004-12-14 2009-06-23 International Business Machines Corporation Memory pacing
KR100687207B1 (en) * 2005-09-16 2007-02-26 주식회사 문화방송 Image transmitting apparatus and image receiving apparatus
US9568591B2 (en) * 2014-11-10 2017-02-14 Peter Dan Morley Method for search radar processing using random matrix theory
US9503747B2 (en) * 2015-01-28 2016-11-22 Intel Corporation Threshold filtering of compressed domain data using steering vector
KR102359556B1 (en) 2016-11-11 2022-02-08 삼성전자주식회사 User certification method using fingerprint image and generating method of coded model for user certification
CN108960537B (en) * 2018-08-17 2020-10-13 安吉汽车物流股份有限公司 Logistics order prediction method and device and readable medium
CN112861260B (en) * 2021-02-01 2022-03-11 中国人民解放军国防科技大学 Method, device and equipment for matching charging performance of solid rocket engine
CN114225361A (en) * 2021-12-09 2022-03-25 栾金源 Tennis ball speed measurement method

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS4934246A (en) * 1972-07-28 1974-03-29
JPS6227878A (en) * 1985-07-29 1987-02-05 Ricoh Co Ltd Matching method
JPH0711819B2 (en) * 1986-06-20 1995-02-08 株式会社リコー Pattern recognition method
JPS6339092A (en) * 1986-08-04 1988-02-19 Ricoh Co Ltd Dictionary retrieving system
JPS6339093A (en) * 1986-08-04 1988-02-19 Ricoh Co Ltd Dictionary retrieving system
JP2953706B2 (en) * 1989-04-15 1999-09-27 株式会社東芝 Pattern recognition device
JPH0743598B2 (en) * 1992-06-25 1995-05-15 株式会社エイ・ティ・アール視聴覚機構研究所 Speech recognition method
JP2700440B2 (en) * 1994-04-19 1998-01-21 エヌ・ティ・ティ・データ通信株式会社 Article identification system
JP3224955B2 (en) * 1994-05-27 2001-11-05 株式会社東芝 Vector quantization apparatus and vector quantization method
JPH1013832A (en) * 1996-06-25 1998-01-16 Nippon Telegr & Teleph Corp <Ntt> Moving picture recognizing method and moving picture recognizing and retrieving method
KR100247969B1 (en) * 1997-07-15 2000-03-15 윤종용 Apparatus and method for massive pattern matching
US6253201B1 (en) * 1998-06-23 2001-06-26 Philips Electronics North America Corporation Scalable solution for image retrieval
JP3252802B2 (en) * 1998-07-17 2002-02-04 日本電気株式会社 Voice recognition device
JP2002008027A (en) * 2000-06-20 2002-01-11 Ricoh Co Ltd Pattern recognition method, pattern recognition device, and storage medium with pattern recognition program recorded thereon
JP3816309B2 (en) 2000-06-26 2006-08-30 アマノ株式会社 Parking lot management device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5949908A (en) * 1994-11-24 1999-09-07 Victor Company Of Japan, Ltd. Method of reducing quantization noise generated during a decoding process of image data and device for decoding image data
US6167157A (en) * 1994-11-24 2000-12-26 Victor Company Of Japan, Ltd. Method of reducing quantization noise generated during a decoding process of image data and device for decoding image data
US5937101A (en) * 1995-01-20 1999-08-10 Samsung Electronics Co., Ltd. Post-processing device for eliminating blocking artifact and method therefor
US6535617B1 (en) * 2000-02-14 2003-03-18 Digimarc Corporation Removal of fixed pattern noise and other fixed patterns from media signals
US6968090B2 (en) * 2000-12-22 2005-11-22 Fuji Xerox Co., Ltd. Image coding apparatus and method
US6807305B2 (en) * 2001-01-12 2004-10-19 National Instruments Corporation System and method for image pattern matching using a unified signal transform
US6963667B2 (en) * 2001-01-12 2005-11-08 National Instruments Corporation System and method for signal matching and characterization

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100235352A1 (en) * 2006-11-26 2010-09-16 Algotec Systems Ltd. Comparison workflow automation by registration
US9280815B2 (en) * 2006-11-26 2016-03-08 Algotec Systems Ltd. Comparison workflow automation by registration
US8738633B1 (en) * 2012-01-31 2014-05-27 Google Inc. Transformation invariant media matching
US9508023B1 (en) 2012-01-31 2016-11-29 Google Inc. Transformation invariant media matching
US20170206202A1 (en) * 2014-07-23 2017-07-20 Hewlett Packard Enterprise Development Lp Proximity of data terms based on walsh-hadamard transforms
US10783268B2 (en) 2015-11-10 2020-09-22 Hewlett Packard Enterprise Development Lp Data allocation based on secure information retrieval
US11080301B2 (en) 2016-09-28 2021-08-03 Hewlett Packard Enterprise Development Lp Storage allocation based on secure data comparisons via multiple intermediaries
US11080480B2 (en) 2017-08-29 2021-08-03 Fujitsu Limited Matrix generation program, matrix generation apparatus, and plagiarism detection program

Also Published As

Publication number Publication date
KR20050016278A (en) 2005-02-21
EP1521210B1 (en) 2009-11-18
US7260488B2 (en) 2007-08-21
DE60330147D1 (en) 2009-12-31
JP2004046370A (en) 2004-02-12
EP1521210A4 (en) 2007-07-04
EP1521210A1 (en) 2005-04-06
KR101021044B1 (en) 2011-03-14
CN1324509C (en) 2007-07-04
EP1521210B9 (en) 2010-09-15
JP4623920B2 (en) 2011-02-02
CN1552042A (en) 2004-12-01
WO2004006185A1 (en) 2004-01-15

Similar Documents

Publication Publication Date Title
US7260488B2 (en) Similarity calculation method and device
JP3960151B2 (en) Similar time series detection method and apparatus, and program
CA2364798C (en) Image search system and image search method thereof
JP3550681B2 (en) Image search apparatus and method, and storage medium storing similar image search program
JP3997749B2 (en) Signal processing method and apparatus, signal processing program, and recording medium
US7457460B2 (en) Picture retrieving apparatus and method which converts orthogonal transform coefficients into color histogram data
JP2003507804A (en) Method and apparatus for variable complexity decoding of motion compensated block-based compressed digital video
WO2002033978A1 (en) Non-linear quantization and similarity matching methods for retrieving image data
US8175392B2 (en) Time segment representative feature vector generation device
JP2004350283A (en) Method for segmenting compressed video into 3-dimensional objects
KR100788642B1 (en) Texture analysing method of digital image
KR100486738B1 (en) Method and apparatus for extracting feature vector for use in face recognition and retrieval
JP2010183499A (en) Image comparison device and method, image retrieval device, program, and recording medium
Seales et al. Object recognition in compressed imagery
CN114979470A (en) Camera rotation angle analysis method, device, equipment and storage medium
KR100616229B1 (en) Method and Apparatus for retrieving of texture image
Pi et al. Image retrieval based on histogram of new fractal parameters
KR100333744B1 (en) Searching system, method and recorder of similar images using for characteristics of compressed images
JP4697111B2 (en) Image comparison apparatus and method, and image search apparatus and method
CN113179157B (en) Text-related voiceprint biological key generation method based on deep learning
McIntyre et al. Exploring content-based image indexing techniques in the compressed domain
JP3983981B2 (en) Digital video processing method and apparatus
Chiang et al. A hierarchical grid-based indexing method for content-based image retrieval
Ho et al. An effective histogram-based approach to JPEG-100 forensics
Fan et al. Retrieval based on Indexing for Compressed Domain

Legal Events

Date Code Title Description
AS Assignment

Owner name: SONY CORPORATION, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ABE, MOTOTSUGU;NISHIGUCHI, MASAYUKI;REEL/FRAME:015802/0017;SIGNING DATES FROM 20040213 TO 20040228

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 12