CN104715033A - Step type voice frequency retrieval method - Google Patents

Publication number
CN104715033A
Authority
CN
China
Prior art keywords
fingerprint
audio
retrieved
similarity
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510113675.1A
Other languages
Chinese (zh)
Inventor
牛保宁
姚姗姗
王运生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiyuan University of Technology
Original Assignee
Taiyuan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiyuan University of Technology
Priority to CN201510113675.1A
Publication of CN104715033A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63 Querying
    • G06F16/635 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1737 Details of further file system functions for reducing power consumption or coping with limited storage space, e.g. in mobile devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/61 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A staged audio retrieval method comprises the following steps: a hash index table is built for the original audio fingerprint database using the Fibonacci hash algorithm; the original audio fingerprint database is converted with the BoF algorithm into a corresponding intermediate audio fingerprint database; three rounds of screening yield a third sequence number set of possible results, narrowing the retrieval range at each round; and the original fingerprints corresponding to the third sequence number set are matched exactly against the original fingerprint of the audio clip to be retrieved to obtain the final retrieval result. When performing fast audio retrieval, the method reduces the amount of computation without lowering precision, which improves efficiency and reduces memory usage.

Description

A staged audio retrieval method
Technical field
The invention belongs to the field of content-based audio fingerprint retrieval and specifically relates to a staged audio retrieval method based on the Philips audio fingerprint and the Bag-of-Features (BoF) algorithm.
Background
With the worldwide spread of the Internet since the start of the new century, the rapid development of audio coding and decoding technology, and the advent of high-capacity storage media, the quantity of digital audio resources on the network has grown exponentially. While this mass of digital audio brings great convenience, current Internet digital audio management systems and copyright protection schemes are non-standard and incomplete: network users can freely upload or download digital audio resources and even alter audio content, which seriously infringes the legitimate rights and interests of the copyright owners.
Current audio retrieval methods fall into two broad classes, text-based and content-based, and content-based audio retrieval has become a focus of recent research in China and abroad.
Content-based audio fingerprint retrieval matches the fingerprint of the audio to be retrieved against the fingerprints in an audio fingerprint database and obtains the retrieval result by comparing similarities.
The Philips audio fingerprint is one of the more commonly used fingerprints at present. The most direct retrieval algorithm matches the fingerprint of the audio clip to be retrieved against the fingerprint of each reference audio in the library one by one, but as the library grows, the retrieval efficiency of this method falls far short of what users expect.
Audio fingerprints are usually high-dimensional, and similarity matching of high-dimensional fingerprints causes computation and storage costs to grow exponentially. Given the high-dimensionality problem in audio fingerprint retrieval, the key task is to design a fast and accurate retrieval method.
Different audio fingerprints call for suitable retrieval algorithms and similarity matching methods, chosen according to their data structure, characteristics, and application scenario. At present, research on fast retrieval mainly follows two directions: dimensionality reduction and indexing.
The idea of dimensionality reduction is to reduce the amount of fingerprint data, and with it the computation required during similarity matching, so as to improve retrieval efficiency.
The OPCA-based dimensionality reduction technique proposed by Diamantaras and Kung is very effective for identifying streaming media, but its classification results are unsatisfactory and unstable. Building on it, Hu et al. proposed a weighted w-PCA audio dimensionality reduction technique, which performs markedly well at reducing data to low dimensions but is sensitive to the choice of dimension. Shen et al. proposed a summarization algorithm that significantly improves retrieval speed but is applicable only when the largest eigenvalue is much larger than the others. In addition, Zheng et al. proposed a weighted self-similarity quantization method that reduces the dimension of multidimensional audio feature vector fingerprints before retrieval. Panagiotou et al. applied a PCA-based aggregation technique to reduce the dimension of delta Mel-frequency cepstral coefficients or delta chroma features and then built a Markov model, effectively speeding up retrieval. However, while dimensionality reduction improves retrieval efficiency, it lowers the precision and recall of the fingerprint, which runs counter to the goal of retrieval matching.
The purpose of indexing is to associate fingerprints with an index so that the search range shrinks quickly and retrieval is efficient. For Philips fingerprints, Haitsma et al. proposed building a fast lookup table over all possible audio fingerprints and associating the fingerprints in the audio fingerprint database with the table, so that the songs associated with a query fingerprint can be found quickly. However, retrieval performance drops significantly when the query audio is distorted. Chen et al. improved on this method with a fast retrieval method based on Fibonacci hashing, which adjusts the size of the hash table to the memory capacity and thus saves memory. Kurth et al. proposed indexing audio fingerprints in the form of a quantization codebook, which significantly improves retrieval speed, but the restriction placed on the feature error rate during index construction raises the false-positive rate. Kurth and Muller also proposed a retrieval method based on inverted-file indexing that combines CENS (chroma energy normalized statistics) features with several fault-tolerance mechanisms and a strategy of ranking and repeated querying. Vitola et al. used a hash function to extract fingerprints from frequency features and used the hash numbers to partition the search space, which scales well on parallel architectures. However, building an index requires extra storage space, and this becomes a serious problem as the amount of audio data keeps growing.
Summary of the invention
Existing retrieval algorithms have two deficiencies: dimensionality reduction lowers fingerprint precision while improving retrieval efficiency, and indexing requires extra storage space. To overcome them, the invention provides an efficient staged audio retrieval method that reduces the amount of computation without lowering precision, improves efficiency, and reduces memory usage.
The technical solution adopted by the invention to solve the technical problem is:
1. Build the original audio fingerprint database;
2. Use the Fibonacci hash algorithm to build a hash index table for the original fingerprint database;
3. Convert the original fingerprint database into an intermediate fingerprint database with the BoF algorithm;
4. Perform three screenings;
The three screenings are the first screening, the second screening, and the third screening;
The first screening filters using the Fibonacci hash index;
The second and third screenings both filter using the threshold-based fixed-interval sampling matching method;
5. Exactly match the original fingerprints corresponding to the results of the third screening against the original fingerprint of the query audio using the Philips algorithm to obtain the final retrieval result;
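The steps above amount to a cascade in which each screening only narrows the set of surviving candidates before the exact match runs. A minimal Python sketch of that control flow follows; `staged_search` and the `Stage` type are hypothetical names introduced here for illustration, and the logic inside each stage is left to the caller.

```python
from typing import Callable, Iterable, Sequence, Set

# Hypothetical type: a stage receives the surviving sequence numbers
# and returns the subset it considers possible results.
Stage = Callable[[Set[int]], Set[int]]


def staged_search(all_ids: Iterable[int], stages: Sequence[Stage]) -> Set[int]:
    """Apply each screening stage in order, narrowing the candidate set."""
    candidates = set(all_ids)
    for stage in stages:
        # Intersect so that a stage can only remove candidates, never add them.
        candidates = stage(candidates) & candidates
        if not candidates:
            break  # nothing left for the exact match
    return candidates


if __name__ == "__main__":
    keep_even = lambda ids: {i for i in ids if i % 2 == 0}
    keep_large = lambda ids: {i for i in ids if i > 4}
    print(staged_search(range(10), [keep_even, keep_large]))
```

In such a layout each stage would wrap one screening (hash lookup, intermediate-fingerprint filter, original-fingerprint filter), and the exact match would run only on the sequence numbers that remain.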
Based on the Philips audio fingerprint and the Bag-of-Features (BoF) technique, the invention designs an intermediate fingerprint with a smaller data volume, used to quickly filter out dissimilar audio. It also designs a threshold-based fixed-interval sampling matching method. When the fingerprint of the audio clip to be retrieved is matched against an audio in the library, the clip is assumed to be shorter than the library audio; a match is performed once every fixed distance, and within each match only sub-fingerprints a fixed interval apart are compared, with their similarity judged against thresholds. This reduces the number of comparisons and speeds up retrieval matching.
Adding the Fibonacci hash algorithm allows the size of the generated index to be adjusted to the available memory, which avoids excessive use of storage space.
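Fibonacci hashing in its common textbook form multiplies the key by 2^32 divided by the golden ratio and keeps the top bits of the product, so the table can be sized to any power of two that fits in memory. The sketch below assumes that form and 32-bit sub-fingerprints as keys; the source does not spell out its exact hash function, and `fibonacci_hash`, `build_index`, and `first_screening` are illustrative names rather than the patent's own.

```python
from typing import Dict, Iterable, List, Set

# floor(2**32 / golden ratio): the usual 32-bit Fibonacci hashing multiplier.
GOLDEN_32 = 2654435769


def fibonacci_hash(sub_fingerprint: int, table_bits: int) -> int:
    """Map a 32-bit key to one of 2**table_bits buckets."""
    if not 1 <= table_bits <= 32:
        raise ValueError("table_bits must be between 1 and 32")
    product = (sub_fingerprint * GOLDEN_32) & 0xFFFFFFFF  # keep the low 32 bits
    return product >> (32 - table_bits)  # the top bits select the bucket


def build_index(library: Dict[int, List[int]], table_bits: int) -> Dict[int, Set[int]]:
    """Assumed layout: bucket number -> set of audio sequence numbers."""
    index: Dict[int, Set[int]] = {}
    for audio_id, sub_fingerprints in library.items():
        for sub in sub_fingerprints:
            index.setdefault(fibonacci_hash(sub, table_bits), set()).add(audio_id)
    return index


def first_screening(index: Dict[int, Set[int]], query: Iterable[int],
                    table_bits: int) -> Set[int]:
    """Collect every audio that shares a bucket with some query sub-fingerprint."""
    candidates: Set[int] = set()
    for sub in query:
        candidates |= index.get(fibonacci_hash(sub, table_bits), set())
    return candidates
```

Lowering `table_bits` shrinks the table at the cost of more collisions, which is one way to read the statement that the index size can follow the available memory.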
When performing fast audio retrieval, the invention reduces the amount of computation without lowering precision, improves efficiency, and reduces memory usage.
The threshold-based fixed-interval sampling matching method is as follows:
1. Sub-fingerprint count threshold: if the total number of intermediate fingerprint frames of the audio clip to be retrieved is less than that of the reference audio, the reference audio is judged a possible result;
2. Single-frame distance threshold α and mean distance threshold ā: if the single-frame distance ε_i of the intermediate fingerprint of the audio clip to be retrieved is less than α, or the mean distance Σε_i/N_i of the first N_i frames is less than ā, the reference audio is directly judged a possible result. The fixed-interval sampling matching method is used in the calculation. α and ā are integers greater than 0; N_i is an integer greater than zero, in the range 0 to N_m/Q; N_m, an integer greater than zero, is the total number of intermediate fingerprint frames of the audio clip to be retrieved; Q is a constant in the range 1 to N_m, and one similarity comparison is made every Q frames; N_m/Q is the total number of similarity comparisons required for the intermediate fingerprint of the audio clip to be retrieved;
3. Cumulative distance threshold β and cumulative count threshold Ω: during item 2, the distance of the first m intermediate fingerprint frames is accumulated as ε_m; if ε_m is less than β, or m has not reached Ω, the reference audio is judged a possible result. β and Ω are integers greater than 0; m is an integer greater than zero, in the range 0 to N_m/Q, with N_m and Q as defined in item 2;
4. Front-t-frame similarity threshold γ: each time a sliding-window match is performed on the original fingerprint, the similarity S_t of the first t fingerprint frames is compared first; when S_t > γ, the reference audio is judged a possible result and the similarity S_v of the whole fingerprint is computed. γ, S_t, and S_v are real numbers greater than 0. The fixed-interval sampling matching method is used in the calculation. t is an integer greater than zero, in the range 0 to N_o/Q; N_o, an integer greater than zero, is the total number of original fingerprint frames of the audio clip to be retrieved; Q is a constant in the range 1 to N_o, and one similarity comparison is made every Q frames; N_o/Q is the total number of similarity comparisons required for the original fingerprint of the audio clip to be retrieved;
5. Cumulative similarity threshold η: during item 4, the similarity of the first n original fingerprint frames is accumulated as ε_n; if ε_n < η, the reference audio is judged a possible result. η and ε_n are real numbers greater than 0; n is an integer greater than zero, in the range 0 to N_o/Q, with N_o and Q as defined in item 4;
6. Sliding interval threshold θ: when the similarity between fingerprints is below θ, the number of slides is increased before similarity matching is performed again; θ is a real number greater than 0.
The fixed-interval sampling matching method is as follows:
For an audio clip to be retrieved of length N frames (N is an integer greater than zero), a segment of N frames is first selected in the reference audio. For the two segments, one sub-fingerprint is taken every Q frames and its similarity is computed (Q is a constant in the range 1 to N). If the similarity satisfies the single-frame distance threshold α, or the mean distance of the first N_i frames satisfies the mean distance threshold ā, or the front-t-frame similarity threshold γ is satisfied, the window slides backward, another segment of N frames is selected in the reference audio, and the judgment is repeated. The process stops when the thresholds are no longer satisfied or the window reaches the end of the audio; the overall similarity of the audio is then obtained, which completes one match.
Applying the above threshold-based fixed-interval sampling matching method to the filtering retrieval of both intermediate and original fingerprints enables fast similarity judgments and more efficient retrieval.
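A minimal sketch of fixed-interval sampling inside a sliding window, under simplifying assumptions: frames are compared with a caller-supplied `distance` function, a single mean-distance bound `max_mean` stands in for the several thresholds listed above, and the window advances by a fixed `step`. All function and parameter names are hypothetical, and this is not presented as the patent's exact procedure.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def sampled_window_distance(query: Sequence[Frame], reference: Sequence[Frame],
                            offset: int, Q: int,
                            distance: Callable[[Frame, Frame], float],
                            max_mean: float) -> Optional[float]:
    """Compare every Q-th frame of one window; give up early if the mean is too high."""
    total = 0.0
    count = 0
    for k in range(0, len(query), Q):  # only frames a fixed interval apart
        total += distance(query[k], reference[offset + k])
        count += 1
        if total / count > max_mean:
            return None  # threshold violated: abandon this window
    return total / count if count else None


def best_match(query: Sequence[Frame], reference: Sequence[Frame], Q: int,
               distance: Callable[[Frame, Frame], float],
               max_mean: float, step: int = 1) -> Optional[Tuple[int, float]]:
    """Slide the window over the reference and keep the lowest sampled mean distance."""
    best: Optional[Tuple[int, float]] = None
    for offset in range(0, len(reference) - len(query) + 1, step):
        d = sampled_window_distance(query, reference, offset, Q, distance, max_mean)
        if d is not None and (best is None or d < best[1]):
            best = (offset, d)
    return best
```

With N query frames and interval Q, each window costs at most about N/Q comparisons instead of N, and windows that violate the bound early cost even fewer.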
Brief description of the drawings
Fig. 1 is a logic diagram of the retrieval system of the invention.
Fig. 2 is a schematic diagram of the method for selecting the fixed bit positions.
Fig. 3 is a schematic diagram of the fixed-interval sampling matching method.
Detailed description
First step: build the original audio fingerprint database. Before retrieval, original fingerprints (Philips fingerprints) are extracted from the reference audio to build the original audio fingerprint database; the Fibonacci hash algorithm is then used to build a hash index table for the original fingerprint database;
Second step: convert the original fingerprint database into an intermediate fingerprint database with the BoF algorithm;
The BoF conversion turns the original fingerprint database into an intermediate fingerprint database with a relatively small data volume; the audio sequence numbers in the two databases correspond one to one, in order.
The algorithm for generating the intermediate fingerprint is as follows:
Input:
    F[m]    // Philips fingerprint containing m sub-fingerprints
    M       // number of bits by which sub-fingerprints are classified
    X       // frame interval between two adjacent intermediate sub-fingerprints
    Y       // number of original sub-fingerprints that form one intermediate sub-fingerprint
Output:
    MF[(m-Y)/X+1][2^M]    // intermediate fingerprint
Begin
    for i = 0 to (m-Y)/X do
        for j = 0 to 2^M-1 do
            MF[i][j] ← 0    // initialize the intermediate fingerprint
        end for
    end for
    i ← 0
    while i ≤ m-Y do
        for j = i to i+Y-1 do
            g ← last M bits of F[j]
            MF[i/X][g] ← MF[i/X][g] + 1
        end for
        i ← i+X
    end while
End
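The pseudocode above can be sketched in Python as follows. The sketch assumes that the "last M bits" are the M least significant bits of each 32-bit sub-fingerprint, which the source does not state explicitly, and it uses the embodiment's values M = 3, X = 32, Y = 480 as defaults; `intermediate_fingerprint` is an illustrative name.

```python
from typing import List


def intermediate_fingerprint(F: List[int], M: int = 3, X: int = 32,
                             Y: int = 480) -> List[List[int]]:
    """Build one histogram of 2**M class counts per window of Y sub-fingerprints."""
    if not 0 < X or not 0 < Y or not 1 <= M <= 32:
        raise ValueError("expected X > 0, Y > 0 and 1 <= M <= 32")
    mask = (1 << M) - 1  # assumption: class = M least significant bits
    MF: List[List[int]] = []
    # Window starts are 0, X, 2X, ... while a full window of Y frames still fits.
    for start in range(0, len(F) - Y + 1, X):
        histogram = [0] * (1 << M)
        for sub in F[start:start + Y]:
            histogram[sub & mask] += 1
        MF.append(histogram)
    return MF
```

Each row sums to Y, and an audio with m sub-fingerprints yields (m - Y) // X + 1 rows when m is at least Y, which matches the output dimension given in the pseudocode.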
First, the sub-fingerprints of the original fingerprint (i.e. the Philips fingerprint) are divided into 2^M classes according to the last M bits of each sub-fingerprint. These M bits are usually selected from the low-frequency region, which has higher energy. Because a sub-fingerprint of the original fingerprint consists of 32 bits, M ranges from 1 to 32 and should be as small as possible in order to reduce the fingerprint dimension. In experiments, the inventors varied M from small to large and found by comparison that M = 3 gave the best results.
Then the sub-fingerprints are divided into frames, each containing Y sub-fingerprints, with adjacent overlapping frames X sub-fingerprints apart, where 0 < X < Y < N_o and N_o, an integer greater than zero, is the total number of original fingerprint frames of the audio clip to be retrieved. In each frame, each of the Y consecutive sub-fingerprints is assigned to one of the 2^M classes, and the number of sub-fingerprints in each class is counted. These 2^M counts form the intermediate sub-fingerprint of the frame, and all intermediate sub-fingerprints together form the intermediate fingerprint of the audio.
In this embodiment, M is 3, X is 32, and Y is 480. That is, one frame of the intermediate fingerprint covers 480 sub-fingerprints, and adjacent frames are 32 sub-fingerprints apart. The 480 sub-fingerprints are divided into 8 classes according to their last 3 bits.
The number of sub-fingerprints in each class is counted, giving an intermediate sub-fingerprint of 8 integers. Fig. 2 shows how the M fixed bit positions are selected. Each bit takes the value 0 or 1, so 3 bits can represent 2^3 = 8 different cases, and the sub-fingerprints of the original fingerprint can be divided into 8 classes accordingly. For example, in Fig. 1, the last 3 bits of the first three sub-fingerprints are all 100, so they fall into the first class; those of the 4th and 7th sub-fingerprints are both 101, so they fall into the second class; those of the 5th and 8th sub-fingerprints are both 100, so they fall into the third class; and so on.
Third step:
The first screening is performed in the original fingerprint database: the original fingerprint of the audio clip to be retrieved is looked up in the hash index table, which yields sequence number set I of possible results and narrows the retrieval range;
The second screening is performed in the intermediate fingerprint database: the original fingerprint of the query audio is converted by Bag-of-Features (BoF) into the intermediate fingerprint of the audio clip to be retrieved, which is then compared, using the threshold-based fixed-interval sampling matching method, with the candidate intermediate fingerprints in the intermediate audio fingerprint database. This quickly yields sequence number set II of possible results and narrows the retrieval range further;
The similarity between an intermediate sub-fingerprint of the audio clip to be retrieved and a candidate intermediate sub-fingerprint in the intermediate audio fingerprint database can be computed in several ways. Here the Euclidean distance is used:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
where x_1, x_2, …, x_n denote the 1st, 2nd, …, nth intermediate sub-fingerprints of the audio clip to be retrieved, and y_1, y_2, …, y_n denote the 1st, 2nd, …, nth intermediate sub-fingerprints of a candidate in the intermediate audio fingerprint database.
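As a small illustration, the Euclidean distance between two equal-length sequences of values can be computed as below. Treating each x_i and y_i as a single number (for example one of the 2^M class counts) is an assumption made for the sketch, and `euclidean_distance` is an illustrative name.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Square root of the summed squared differences of paired elements."""
    if len(x) != len(y):
        raise ValueError("both sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


if __name__ == "__main__":
    # Two 8-bin intermediate sub-fingerprints (made-up counts for illustration).
    print(euclidean_distance([60, 60, 60, 60, 60, 60, 60, 60],
                             [63, 56, 60, 60, 60, 60, 60, 61]))
```

A smaller value means the two intermediate sub-fingerprints are more alike, which is why the thresholds that follow are stated as upper bounds on distance.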
If each similarity obtained is less than the single-frame distance threshold α, or the mean similarity of the first N_i frames is less than the mean distance threshold ā, a possible result is obtained. The distance of the first m intermediate fingerprint frames is accumulated as ε_m; if ε_m is less than the cumulative distance threshold β, or m has not reached the cumulative count threshold Ω, the narrowed possible result is put into sequence number set II. Here 0 < N_i < N_m/Q and 0 < m < N_m/Q, where N_m, an integer greater than zero, is the total number of intermediate fingerprint frames of the audio clip to be retrieved; Q is a constant in the range 1 to N_m, with one similarity comparison every Q frames; and N_m/Q is the total number of similarity comparisons required for the intermediate fingerprint of the audio clip to be retrieved.
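One possible reading of these tests, written as a sketch: every sampled distance must pass the per-frame test (below α, or running mean below ā) and the cumulative test (running total below β, or fewer than Ω comparisons so far); otherwise the reference is dropped. How the source intends the tests to combine is not fully specified, so the conjunction used here is an assumption, as are the function and parameter names.

```python
from typing import Iterable


def passes_second_screening(distances: Iterable[float], alpha: float,
                            mean_alpha: float, beta: float, omega: int) -> bool:
    """Return True if a reference survives all sampled comparisons.

    distances  -- sampled single-frame distances, one per Q-th frame
    alpha      -- single-frame distance threshold
    mean_alpha -- mean distance threshold (written as a-bar in the text)
    beta       -- cumulative distance threshold
    omega      -- cumulative count threshold
    """
    total = 0.0
    for count, d in enumerate(distances, start=1):
        total += d
        frame_ok = d < alpha or total / count < mean_alpha
        cumulative_ok = total < beta or count < omega
        if not (frame_ok and cumulative_ok):
            return False  # stop early: this reference is filtered out
    return True
```

Because the function returns at the first failed comparison, dissimilar references are usually rejected after only a few sampled frames.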
The third screening is performed in the original fingerprint database: using the original fingerprint of the audio clip to be retrieved, the threshold-based fixed-interval sampling matching method is applied to the candidate original fingerprints corresponding to the sequence numbers from the second screening, which yields sequence number set III of possible results and narrows the retrieval range again;
The similarity between the sub-fingerprints of the query audio's original fingerprint and those of a candidate original fingerprint is judged by the bit error rate (BER):
$$\mathrm{BER} = \frac{a}{b}$$
where a is the number of differing bits found during matching and b is the total length of the original fingerprint.
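A minimal sketch of the bit error rate between two aligned sequences of sub-fingerprints, assuming 32-bit sub-fingerprints stored as Python integers and taking b as the total number of bits compared; `bit_error_rate` is an illustrative name.

```python
from typing import Sequence


def bit_error_rate(query: Sequence[int], candidate: Sequence[int],
                   bits_per_sub: int = 32) -> float:
    """Fraction of differing bits: a / b in the notation of the text."""
    if len(query) != len(candidate) or not query:
        raise ValueError("sequences must be non-empty and of equal length")
    mask = (1 << bits_per_sub) - 1
    # XOR leaves a 1 wherever the two sub-fingerprints disagree.
    differing = sum(bin((q ^ c) & mask).count("1")
                    for q, c in zip(query, candidate))
    return differing / (len(query) * bits_per_sub)
```

Identical fingerprints give 0.0 and fully inverted ones give 1.0, so a lower bit error rate indicates a closer match.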
If the similarity S_t of the first t frames, judged by bit error rate, is greater than the front-t-frame threshold γ, a possible result is obtained, and the bit error rate S_v of the whole fingerprint is then computed. The fixed-interval sampling matching method is used in the calculation. Here 0 < t < N_o/Q, where N_o, an integer greater than zero, is the total number of original fingerprint frames of the audio clip to be retrieved; Q is a constant in the range 1 to N_o, with one similarity comparison every Q frames; and N_o/Q is the total number of similarity comparisons required for the original fingerprint of the audio clip to be retrieved.
Fourth step:
The original fingerprints corresponding to the results of the third screening are matched exactly against the original fingerprint of the query audio using the Philips algorithm, which gives the final retrieval result.
The fixed-interval sampling matching method is shown in Fig. 3, where Z is the fingerprint dimension. For an audio clip to be retrieved of length N frames (N is an integer greater than zero), a segment of N frames is first selected in the reference audio. For the two segments, one sub-fingerprint is taken every Q frames and its similarity is computed (Q is a constant in the range 1 to N). If the similarity satisfies the single-frame distance threshold α, or the mean distance of the first N_i frames satisfies the mean distance threshold ā, or the front-t-frame similarity threshold γ is satisfied, the window slides backward, another segment of N frames is selected in the reference audio, and the judgment is repeated. The process stops when the thresholds are no longer satisfied or the window reaches the end of the audio; the overall similarity of the audio is then obtained, which completes one match.
The Bag-of-Features (BoF) algorithm used in the invention is known and is published in: F. Precioso, M. Cord, D. Gorisse, and N. Thome, "Efficient bag-of-features kernel representation for image similarity search," Proc. of the 18th IEEE International Conference on Image Processing (ICIP 2011), Brussels, pp. 109-112, September 2011.

Claims (3)

1. A staged audio retrieval method, comprising the following:
(1) extracting original fingerprints from the reference audio and building an original audio fingerprint library;
(2) building a hash index table over the original audio fingerprint library using the Fibonacci hash algorithm;
(3) converting the original audio fingerprint library with the BoF algorithm to generate an intermediate audio fingerprint library corresponding to the original fingerprint library;
(4) first screening: extracting the original fingerprint from the audio fragment to be retrieved and performing an index search in the hash index table, filtering out a first sequence-number set of possible results and narrowing the retrieval range;
(5) second screening: converting the original fingerprint of the audio fragment to be retrieved with the BoF algorithm to generate its intermediate fingerprint, then using this intermediate fingerprint to perform filtering retrieval in the intermediate fingerprint library with the threshold-based fixed-interval sampling matching method, filtering out a second sequence-number set of possible results and further narrowing the retrieval range;
(6) third screening: using the original fingerprint of the audio fragment to be retrieved to perform filtering retrieval, with the threshold-based fixed-interval sampling matching method, among the original fingerprints corresponding to the first sequence-number set, filtering out a third sequence-number set of possible results and narrowing the retrieval range again;
(7) performing an exact-match search with the original fingerprint of the audio fragment to be retrieved among the original fingerprints corresponding to the third sequence-number set, obtaining the final retrieval result.
2. The staged audio retrieval method according to claim 1, characterized in that the threshold-based fixed-interval sampling matching method comprises the following:
1. Sub-fingerprint count threshold: if the total number of intermediate-fingerprint frames of the audio fragment to be retrieved is less than the total number of intermediate-fingerprint frames of the reference audio, the reference audio is judged to be a possible result;
2. Single-frame distance threshold α and mean distance threshold ā: if the single-frame distance ε_i of the intermediate fingerprint of the audio fragment to be retrieved is less than the single-frame distance threshold α, or the mean distance ∑ε_i/N_i of the first N_i frames is less than the mean distance threshold ā, the reference audio is directly judged to be a possible result; the fixed-interval sampling matching method is used in the calculation; α and ā are integers greater than 0; N_i is an integer greater than zero, in the range 0 to N_m/Q; N_m is the total number of intermediate-fingerprint frames of the audio fragment to be retrieved and is an integer greater than zero; Q is a constant in the range 1 to N_m, and one similarity match is performed every Q frames; N_m/Q is the total number of similarity matches required for the intermediate fingerprint of the audio fragment to be retrieved;
3. Cumulative distance threshold β and cumulative count threshold Ω: during process 2, the distance ε_m of the intermediate fingerprints of the first m frames is accumulated; if ε_m is less than β, or m has not reached Ω, the reference audio is judged to be a possible result; β and Ω are integers greater than 0; m is an integer greater than zero, in the range 0 to N_m/Q; N_m is the total number of intermediate-fingerprint frames of the audio fragment to be retrieved and is an integer greater than zero; Q is a constant in the range 1 to N_m, and one similarity match is performed every Q frames; N_m/Q is the total number of similarity matches required for the intermediate fingerprint of the audio fragment to be retrieved;
4. First-t-frame similarity threshold γ: each time the original fingerprint is used for sliding-window matching, the similarity S_t of the first t fingerprint frames is compared first; when S_t > γ, the reference audio is judged to be a possible result and the similarity S_v of the whole fingerprint is calculated; γ, S_t and S_v are real numbers greater than 0; the fixed-interval sampling matching method is used in the calculation; t is an integer greater than zero, in the range 0 to N_o/Q; N_o is the total number of original-fingerprint frames of the audio fragment to be retrieved and is an integer greater than zero; Q is a constant in the range 1 to N_o, and one similarity match is performed every Q frames; N_o/Q is the total number of similarity matches required for the original fingerprint of the audio fragment to be retrieved;
5. Accumulated similarity threshold η: during process 4, the similarity ε_n of the first n original-fingerprint frames is accumulated; if ε_n < η, the reference audio is judged to be a possible result; η and ε_n are real numbers greater than 0; n is an integer greater than zero, in the range 0 to N_o/Q; N_o is the total number of original-fingerprint frames of the audio fragment to be retrieved; Q is a constant in the range 1 to N_o, and one similarity match is performed every Q frames; N_o/Q is the total number of similarity matches required for the original fingerprint of the audio fragment to be retrieved;
6. Sliding interval threshold θ: when the similarity between fingerprints is lower than the sliding interval threshold θ, the number of slides is increased before similarity matching is performed again; θ is a real number greater than 0.
3. The staged audio retrieval method according to claim 2, characterized in that the fixed-interval sampling matching method comprises the following:
For an audio fragment to be retrieved that is N frames long, an audio fragment of N frames is first selected from the reference audio;
For the two fragments, one sub-fingerprint is taken every Q frames and its similarity is calculated; Q is a constant in the range 1 to N; N is an integer greater than zero; if the similarity reaches the single-frame distance threshold α, or the mean distance of the first N_i frames reaches the mean distance threshold ā, or the similarity of the first t frames reaches the similarity threshold γ, the window is slid along the reference audio, another fragment of N frames is selected from it, and the judgment is repeated; the process stops when a judgment fails to meet the thresholds or when the sliding window reaches the end of the audio, the overall similarity of the audio is obtained, and one match is complete.
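The staged structure of claim 1 (a cheap index lookup followed by successively stricter filters) can be sketched as below. This is an illustrative outline under assumptions, not the claimed implementation: the fingerprint library is assumed to map a track identifier to a list of 32-bit integer sub-fingerprints, the hash uses the common Fibonacci-hashing multiplier floor(2^32/φ), and `fib_hash`, `build_index`, `first_screen`, `staged_search` and the `filters` argument are hypothetical names.

```python
from collections import defaultdict

GOLDEN_32 = 2654435769  # floor(2**32 / golden ratio), usual Fibonacci-hashing multiplier


def fib_hash(key: int, bits: int = 16) -> int:
    """Fibonacci (multiplicative) hash of a 32-bit key into 2**bits buckets."""
    return ((key * GOLDEN_32) & 0xFFFFFFFF) >> (32 - bits)


def build_index(library):
    """library: {track_id: [sub-fingerprints]} -> {bucket: set of track_ids}."""
    index = defaultdict(set)
    for track_id, frames in library.items():
        for frame in frames:
            index[fib_hash(frame)].add(track_id)
    return index


def first_screen(query, index):
    """First screening: every track sharing at least one hash bucket with the query."""
    candidates = set()
    for frame in query:
        candidates |= index.get(fib_hash(frame), set())
    return candidates


def staged_search(query, library, index, filters):
    """Apply the index lookup, then each filter in order, shrinking the candidate set."""
    candidates = first_screen(query, index)
    for keep in filters:  # e.g. coarse filter, sampled filter, exact match
        candidates = {t for t in candidates if keep(query, library[t])}
    return sorted(candidates)
```

Each entry in `filters` is a predicate taking the query and a reference fingerprint; ordering them from cheapest to most expensive means the costly exact comparison only runs on the few candidates that survive the earlier stages.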
CN201510113675.1A 2015-03-16 2015-03-16 Step type voice frequency retrieval method Pending CN104715033A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510113675.1A CN104715033A (en) 2015-03-16 2015-03-16 Step type voice frequency retrieval method

Publications (1)

Publication Number Publication Date
CN104715033A true CN104715033A (en) 2015-06-17

Family

ID=53414360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510113675.1A Pending CN104715033A (en) 2015-03-16 2015-03-16 Step type voice frequency retrieval method

Country Status (1)

Country Link
CN (1) CN104715033A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105893549A (en) * 2016-03-31 2016-08-24 中国人民解放军信息工程大学 Audio retrieval method and device
CN107293307A (en) * 2016-03-31 2017-10-24 阿里巴巴集团控股有限公司 Audio detection method and device
CN108399913A (en) * 2018-02-12 2018-08-14 北京容联易通信息技术有限公司 High robust audio fingerprinting method and system
CN108509558A (en) * 2018-03-23 2018-09-07 太原理工大学 Anti-speed-variation-interference sampling counting audio retrieval method
CN110047515A (en) * 2019-04-04 2019-07-23 腾讯音乐娱乐科技(深圳)有限公司 Audio identification method, device, equipment and storage medium
CN110866141A (en) * 2018-08-28 2020-03-06 杭州网易云音乐科技有限公司 Audio file processing method, medium, device and computing equipment
CN110889010A (en) * 2018-09-10 2020-03-17 杭州网易云音乐科技有限公司 Audio matching method, device, medium and electronic equipment
CN112528069A (en) * 2020-12-04 2021-03-19 西安电子科技大学 Audio fingerprint retrieval method based on quantum Grover algorithm
CN112784099A (en) * 2021-01-29 2021-05-11 山西大学 Sampling counting audio retrieval method resisting tonal modification interference

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6185527B1 (en) * 1999-01-19 2001-02-06 International Business Machines Corporation System and method for automatic audio content analysis for word spotting, indexing, classification and retrieval
CN104391924A (en) * 2014-11-21 2015-03-04 南京讯思雅信息科技有限公司 Mixed audio and video search method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
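王运生 (Wang Yunsheng): "Efficient content-based retrieval of massive audio" (基于内容的海量音频高效检索), China Master's Theses Full-text Database, Information Science and Technology *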

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107293307B (en) * 2016-03-31 2021-07-16 阿里巴巴集团控股有限公司 Audio detection method and device
CN107293307A (en) * 2016-03-31 2017-10-24 阿里巴巴集团控股有限公司 Audio detection method and device
CN105893549A (en) * 2016-03-31 2016-08-24 中国人民解放军信息工程大学 Audio retrieval method and device
CN105893549B (en) * 2016-03-31 2019-11-19 中国人民解放军信息工程大学 Audio search method and device
CN108399913A (en) * 2018-02-12 2018-08-14 北京容联易通信息技术有限公司 High robust audio fingerprinting method and system
CN108509558A (en) * 2018-03-23 2018-09-07 太原理工大学 Anti-speed-variation-interference sampling counting audio retrieval method
CN108509558B (en) * 2018-03-23 2021-11-05 太原理工大学 Anti-speed-variation-interference sampling counting audio retrieval method
CN110866141A (en) * 2018-08-28 2020-03-06 杭州网易云音乐科技有限公司 Audio file processing method, medium, device and computing equipment
CN110889010A (en) * 2018-09-10 2020-03-17 杭州网易云音乐科技有限公司 Audio matching method, device, medium and electronic equipment
CN110047515A (en) * 2019-04-04 2019-07-23 腾讯音乐娱乐科技(深圳)有限公司 Audio identification method, device, equipment and storage medium
CN110047515B (en) * 2019-04-04 2021-04-20 腾讯音乐娱乐科技(深圳)有限公司 Audio identification method, device, equipment and storage medium
WO2020199384A1 (en) * 2019-04-04 2020-10-08 腾讯音乐娱乐科技(深圳)有限公司 Audio recognition method, apparatus and device, and storage medium
CN112528069A (en) * 2020-12-04 2021-03-19 西安电子科技大学 Audio fingerprint retrieval method based on quantum Grover algorithm
CN112528069B (en) * 2020-12-04 2024-03-01 西安电子科技大学 Audio fingerprint retrieval method based on quantum Grover algorithm
CN112784099A (en) * 2021-01-29 2021-05-11 山西大学 Sampling counting audio retrieval method resisting tonal modification interference
CN112784099B (en) * 2021-01-29 2022-11-11 山西大学 Sampling counting audio retrieval method resisting tonal modification interference

Similar Documents

Publication Publication Date Title
CN104715033A (en) Step type voice frequency retrieval method
CN103440313B (en) music retrieval system based on audio fingerprint feature
Fisher et al. A clustering-based framework to control block sizes for entity resolution
US9367887B1 (en) Multi-channel audio video fingerprinting
CN104050247B (en) The method for realizing massive video quick-searching
US20140310006A1 (en) Method to generate audio fingerprints
CN108509558B (en) Anti-speed-variation-interference sampling counting audio retrieval method
CN104598632B (en) Focus incident detection method and device
CN101334773A (en) Method for filtrating search engine searching result
CN112434553B (en) Video identification method and system based on deep dictionary learning
US8175392B2 (en) Time segment representative feature vector generation device
CN102890700A (en) Method for retrieving similar video clips based on sports competition videos
CN104090880A (en) Method and device for configuring equalizer parameters of audio files
Mou et al. Content-based copy detection through multimodal feature representation and temporal pyramid matching
Jiang et al. Video copy detection using a soft cascade of multimodal features
CN110767248B (en) Anti-modulation interference audio fingerprint extraction method
Ma et al. A novel approach for high‐dimensional vector similarity join query
CN113590818B (en) Government text data classification method based on integration of CNN (carbon fiber network), GRU (grid-like network) and KNN (K-nearest neighbor network)
Chiang et al. A multi-embedding neural model for incident video retrieval
CN108494620B (en) Network service flow characteristic selection and classification method
CN101515286A (en) Image matching method based on image feature multi-level filtration
CN109389172B (en) Radio signal data clustering method based on non-parameter grid
CN110110120B (en) Image retrieval method and device based on deep learning
CN103294696A (en) Audio and video content retrieval method and system
CN113158048A (en) Mobile internet browsing content intelligent recommendation method, system, equipment and storage medium based on feature recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150617
