US9330332B2 - Fast computation of kernel descriptors - Google Patents
- Publication number
- US9330332B2
- Authority
- US
- United States
- Prior art keywords
- patch
- kernel
- images
- patches
- processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
- G06K9/4633
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
- G06K9/6247
Definitions
- This invention relates to computation of kernel descriptors, and in particular to fast matching of image patches using fast computation of kernel descriptors.
- a convolutional GRBM method has been proposed to extract spatio-temporal features using a multi-stage architecture.
- convolutional independent subspace analysis (ISA)
- a two layer hierarchical sparse coding scheme has been used for learning image representations at the pixel level.
- An orientation histogram in effect uses a pre-defined d-dimensional codebook that divides the θ space into uniform bins, and uses hard quantization for projecting pixel gradients.
- the first layer codes are passed to the second layer for jointly encoding signals in the region.
- the orientation histograms and hierarchical sparse coding in effect define the following kernel for measuring the similarity between two patches P and Q:
- Kernel descriptors have been proposed to generalize these approaches by replacing the inner product of per-pixel feature vectors above with a match kernel k(z, z′), which allows one to induce arbitrary feature spaces (including infinite dimensional ones) from pixel level attributes. This provides a powerful framework for designing rich low-level features and has shown state-of-the-art results for image and object recognition.
- Kernel computations are generally costly, and hence kernel descriptors are slow to extract from densely sampled video patches.
- A kernel descriptor computation that takes O(1) operations per pixel in each patch, based on pre-computed kernel values, is used. This speeds up extraction of the kernel descriptor features under consideration to levels comparable with D-SIFT and color SIFT, and two orders of magnitude faster than STIP and HoG3D.
- kernel descriptors are applied to extract gradient, flow and texture based features for video analysis. In tests of the approach on a large database of internet videos, the flow based kernel descriptors are up to two orders of magnitude faster than STIP and HoG3D, and also produce significant performance improvements. Further, using features from multiple color planes produces small but consistent gains.
- a method for image processing makes use of precomputed stored tables (e.g., “kernel sum tables”), which are read.
- Each kernel table represents a mapping from a corresponding pixel attribute to a vector of values.
- Images are accepted for processing, and patches are identified within said images.
- a feature vector is computed based on summations of a product of terms over locations z in the patch.
- Each term within the product is obtained by a lookup in the corresponding kernel sum table according to the location z or an attribute of the patch at the location z.
- the feature vectors thus obtained can then be used for several downstream image/video processing applications, such as similarity computation between two patches P and Q.
- a method for image processing makes use of precomputed stored tables (e.g., “kernel sum tables”), which are read.
- Each kernel table represents a mapping from a corresponding feature to a vector of values.
- Images are accepted for processing, and patches are identified within said images.
- the processing includes repeatedly computing similarities between pairs of patches for images being processed.
- Computation of a similarity between a patch P and a patch Q comprises: (a) computing for patch P one or more summations over locations z in the patch P of terms, each term being a product of terms including a term obtained by a lookup in a corresponding kernel table according to the location z and/or an attribute of the patch P at the location z; (b) computing for patch Q one or more summations over locations z in the patch Q of terms, each term being a product of terms including a term obtained by a lookup in a corresponding kernel table according to the location z and/or an attribute of the patch Q at the location z; and (c) combining the one or more summations for P and the one or more summations for Q to determine a kernel descriptor similarity between P and Q.
- a result of processing the images is determined using the computed similarities between the patches.
- the kernel tables are precomputed prior to accepting the images for processing.
- An advantage of the approach is that the computational resources required are greatly reduced as compared to conventional approaches to image/video processing using kernel descriptors.
- FIG. 1 is a diagram of a video processing system.
- a computer implemented video processing system 100 includes a runtime processing system 130 , which accepts an input video 132 (e.g., a series of image frames acquired by a camera) and provides a video processor output 138 .
- A wide variety of well-known processing tasks may be performed by this system to produce the video processor output 138.
- a common feature of such tasks is repeated comparison of patches (e.g., pixel regions) of images of the input video.
- the input video 132 includes a large number of input images (e.g., video frames) 134 .
- Each input image may have a large number of patches 136 .
- a patch P is illustrated in one image and another patch Q is illustrated in another image.
- the runtime processing system 130 includes a computation module 140 that is configured to accept data representing two patches 136 (e.g., P and Q), and to provide a quantity K(P,Q) 142 representing a similarity between the two patches. It should be understood that this similarity computation is repeated a very large number of times, and therefore the computational resources required for this computation may represent a substantial portion of the total resources required to support the runtime system 130 .
- The similarity computation module 140 is presented in the context of a video processing system as an example; such a module is applicable in other image or video processing systems and, more generally, in other applications in which a comparable similarity computation may be used.
- One approach to similarity computation is based on a kernel representation approach.
- In the discussion below, an example with two kernels, one associated with orientation and one associated with position, is presented. However, it should be understood that the approach is applicable to other kernel representations with two or more components.
- a desired similarity between patches is computed as
- $K_{\mathrm{grad}}(P,Q) = \sum_{z \in P} \sum_{z' \in Q} \tilde{m}(z)\,\tilde{m}(z')\,k_o(\tilde{\theta}_z, \tilde{\theta}_{z'})\,k_p(z, z')$
- the sum over $z \in P$ is a sum over the pixel locations z in the patch P
- the sum over $z' \in Q$ is a sum over the pixel locations z′ in the patch Q.
- $K_{\mathrm{grad}}(P,Q) = F_{\mathrm{grad}}(P) \cdot F_{\mathrm{grad}}(Q)$
- each of these vectors $F_{\mathrm{grad}}$ can potentially be infinite dimensional depending on the kernels (such as $k_p$, $k_o$). This is addressed using an approximation that projects $F_{\mathrm{grad}}$ onto an orthonormal basis with a limited number (e.g., $1 \le t \le T$) of basis vectors. The finite dimensional approximation of the kernel similarity is then
- $\alpha^t$ are eigenvectors of a Kronecker product
- the corresponding eigenvalue $\lambda^t = \lambda_o^t \lambda_p^t$.
- $N_p$ is used such that $T_o^t(\tilde{\theta}(z)) \approx T_o^t(q_o(\tilde{\theta}(z)))$ and $T_p^t(z) \approx T_p^t(q_p(z))$
- a kernel preprocessor 120 is used to precompute a kernel table $T_o[\tilde{\theta}]$ of size $T \times N_o$ and a kernel table $T_p[z]$ of size $T \times N_p$ using the approach outlined above, generally before beginning processing of the input video.
- the kernel similarity computation element 140 reads the precomputed tables, and uses them to compute (i.e., approximate via the tables, either by direct lookup or by interpolation) the T dimensional vectors $F_{\mathrm{grad}}(P)$ and $F_{\mathrm{grad}}(Q)$, from which the similarity $K_{\mathrm{grad}}(P,Q)$ 142 is obtained by computing the inner product as described above.
- a “bag-of-words” framework is used to represent the information from different feature descriptors. This is done in two steps: in the coding step, the descriptors are projected onto a pre-trained codebook of descriptor vectors, and in the pooling step, the projections are aggregated into a fixed-length feature vector.
- the primal of this problem can be formulated as
- $d_k = \frac{1}{2\lambda} \left( \sum_k (\alpha^t H_k \alpha)^q \right)^{\frac{1}{q} - \frac{1}{p}} (\alpha^t H_k \alpha)^{\frac{q}{p}}$
- the SMO algorithm can be applied by selecting two variables at a time and optimizing until convergence.
- a number of different implementations of the runtime and preprocessing systems may be used, for example, using software, special-purpose hardware, or a combination of software and hardware.
- computation of the kernel tables is performed using a general-purpose computer executing software stored on a tangible non-transitory medium (e.g., magnetic or optical disk).
- the software can include instructions (e.g., machine level instructions or higher level language statements).
- the kernel similarity computation is implemented using special-purpose hardware and/or using a co-processor to a general purpose computer.
- the kernel tables which may be passed to the runtime system and/or stored on a tangible medium, should be considered to comprise software which imparts functionality to the kernel similarity computation (hardware and/or software-implemented) element.
- the kernel tables are integrated into a configured or configurable circuit, for example, being stored in a volatile or non-volatile memory of the circuit.
Abstract
Description
where
$k_p(z, z') = \exp(-\gamma_p \|z - z'\|^2)$
and
$k_o(\tilde{\theta}_z, \tilde{\theta}_{z'}) = \exp(-\gamma_o \|\tilde{\theta}(z) - \tilde{\theta}(z')\|^2)$.
where the sum over z∈P is a sum over the pixel locations z in the patch P and the sum over z′∈Q is a sum over the pixel locations z′ in the patch Q.
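As a point of reference for the cost being avoided, the double sum can be evaluated directly with the two Gaussian kernels above. The following is a minimal sketch, not the patented method: the pixel representation `(z, m, theta)`, the encoding of the orientation as a `(sin, cos)` unit vector, and the values of `gamma_o` and `gamma_p` are all assumptions made for illustration.

```python
import math

def orientation_vector(angle):
    """Assumed encoding of an orientation as a 2D unit vector (sin, cos)."""
    return (math.sin(angle), math.cos(angle))

def gaussian_kernel(a, b, gamma):
    """exp(-gamma * ||a - b||^2) for two equal-length tuples."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def k_grad_exact(P, Q, gamma_o=5.0, gamma_p=3.0):
    """Brute-force similarity: one kernel product per pixel pair.

    P and Q are lists of pixels (z, m, theta), where z is a position,
    m a normalized gradient magnitude, and theta an orientation vector.
    The cost is |P| * |Q| kernel evaluations.
    """
    total = 0.0
    for z, m, theta in P:
        for z2, m2, theta2 in Q:
            total += (m * m2
                      * gaussian_kernel(theta, theta2, gamma_o)
                      * gaussian_kernel(z, z2, gamma_p))
    return total
```

Because every pixel of P is paired with every pixel of Q, this form is only practical for small patches, which is the motivation for the factored form that follows.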
$K_{\mathrm{grad}}(P,Q) = F_{\mathrm{grad}}(P) \cdot F_{\mathrm{grad}}(Q)$
However, each of these vectors $F_{\mathrm{grad}}$ can potentially be infinite dimensional depending on the kernels (such as $k_p$, $k_o$). This is addressed using an approximation that projects $F_{\mathrm{grad}}$ onto an orthonormal basis with a limited number (e.g., $1 \le t \le T$) of basis vectors. The finite dimensional approximation of the kernel similarity is then
where $\{x_i\}$ and $\{y_j\}$ are preselected basis sets for the arguments of the kernel functions. For example, the set $\{x_i\}$ may represent $d_o = 25$ angles between 0 and $2\pi$ and the set $\{y_j\}$ may represent $d_p = 25$ 2D positions in a unit 5×5 square. In such an example, the double summation requires $d_o \times d_p = 625$ evaluations of the innermost term for each pixel of P.
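The example basis sets and the resulting operation count can be written out as a short sketch. The exact placement of the basis points is an assumption here: the angles are spaced uniformly over [0, 2π) and the positions lie on a 5×5 grid that includes the edges of the unit square; the text does not specify either choice.

```python
import math

D_O = 25   # number of basis orientations, from the example in the text
SIDE = 5   # grid side, giving d_p = 25 basis positions

# Assumed layout: uniformly spaced angles in [0, 2*pi).
x_basis = [2.0 * math.pi * i / D_O for i in range(D_O)]

# Assumed layout: 5x5 grid covering the unit square, edges included.
y_basis = [(col / (SIDE - 1), row / (SIDE - 1))
           for row in range(SIDE) for col in range(SIDE)]

def innermost_evaluations(num_pixels):
    """Evaluations of the innermost term for a patch with num_pixels pixels."""
    return num_pixels * len(x_basis) * len(y_basis)
```

For a hypothetical 16×16 patch (256 pixels), `innermost_evaluations(256)` gives 160,000 evaluations per basis vector, which illustrates why the per-pixel double summation dominates the cost.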
$K_{o,c} \otimes K_{p,c}$
where $K_{o,c}$ and $K_{p,c}$ denote the centered orientation and position kernel matrices corresponding to $K_o$ and $K_p$, respectively, and the elements of the kernel matrices are defined as
$K_o = [K_{o,ij}]$ and $K_p = [K_{p,st}]$
where
$K_{o,ij} = k_o(x_i, x_j)$ and $K_{p,st} = k_p(y_s, y_t)$.
$\alpha_{ij}^t = \alpha_{o,i}^t\,\alpha_{p,j}^t$
where $\alpha_o^t = [\alpha_{o,i}^t]$ is a ($d_o$ dimensional) eigenvector of $K_o = [K_{o,ij}]$ and $\alpha_p^t = [\alpha_{p,j}^t]$ is a ($d_p$ dimensional) eigenvector of $K_p = [K_{p,st}]$, and the corresponding eigenvalue is $\lambda^t = \lambda_o^t \lambda_p^t$.
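The factorization relies on a standard property of the Kronecker product: if u is an eigenvector of A with eigenvalue a and v is an eigenvector of B with eigenvalue b, then the Kronecker product of u and v is an eigenvector of the Kronecker product of A and B with eigenvalue ab. The sketch below checks this numerically on two toy 2×2 matrices; the matrices are illustrative stand-ins, not centered kernel matrices from the patent.

```python
def kron_mat(A, B):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def kron_vec(u, v):
    """Kronecker product of two vectors: entry (i, j) is u[i] * v[j]."""
    return [a * b for a in u for b in v]

def matvec(M, x):
    """Matrix-vector product."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

# Toy symmetric matrices with known eigenpairs.
K_o = [[2.0, 1.0], [1.0, 2.0]]      # K_o @ [1, 1]  = 3 * [1, 1]
K_p = [[4.0, 1.0], [1.0, 4.0]]      # K_p @ [1, -1] = 3 * [1, -1]
alpha_o, lam_o = [1.0, 1.0], 3.0
alpha_p, lam_p = [1.0, -1.0], 3.0

alpha = kron_vec(alpha_o, alpha_p)              # alpha_ij = alpha_o,i * alpha_p,j
lhs = matvec(kron_mat(K_o, K_p), alpha)         # (K_o kron K_p) alpha
rhs = [lam_o * lam_p * a for a in alpha]        # (lam_o * lam_p) alpha
```

In practice this means only the two small matrices need to be decomposed (25×25 each in the example above) rather than their 625×625 Kronecker product.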
which can be rearranged as
and the terms in brackets can be replaced with precomputed functions
$T_o^t(\tilde{\theta}(z)) \approx T_o^t(q_o(\tilde{\theta}(z)))$
and
$T_p^t(z) \approx T_p^t(q_p(z))$
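The quantized lookup can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the quantizers `q_o` and `q_p` (uniform angle bins and a uniform grid over the unit square), the per-pixel weighting by a magnitude `m`, and the table contents are all hypothetical. The tables `T_o` and `T_p` are taken as given, indexed as `T_o[t][bin]` and `T_p[t][cell]`.

```python
import math

def q_o(angle, n_o):
    """Assumed quantizer: nearest of n_o uniform angle bins over [0, 2*pi)."""
    frac = (angle % (2.0 * math.pi)) / (2.0 * math.pi)
    return int(frac * n_o + 0.5) % n_o

def q_p(z, side):
    """Assumed quantizer: cell index on a side x side grid over the unit square."""
    x, y = z
    col = min(int(x * side), side - 1)
    row = min(int(y * side), side - 1)
    return row * side + col

def feature_vector(patch, T_o, T_p, n_o, side):
    """Sum over pixels of m * T_o[t][q_o(angle)] * T_p[t][q_p(z)] for each t.

    patch is a list of (z, m, angle). Each pixel costs two quantizations
    plus one table lookup pair per component t; no kernel is evaluated.
    """
    T = len(T_o)
    F = [0.0] * T
    for z, m, angle in patch:
        io, ip = q_o(angle, n_o), q_p(z, side)
        for t in range(T):
            F[t] += m * T_o[t][io] * T_p[t][ip]
    return F

def similarity(F_P, F_Q):
    """Inner product of two finite-dimensional feature vectors."""
    return sum(a * b for a, b in zip(F_P, F_Q))
```

With this arrangement the per-pixel work no longer depends on the sizes of the basis sets, since those sums are folded into the tables ahead of time.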
$\alpha_i = f(x_i),\ i = 1 \ldots N$
h m =g({αi}i∈N
$z^T = [h_1^T \ldots h_M^T]$
where $c_k$ is the kth codeword. In soft quantization, the assignment of the feature vectors to codewords is distributed as
where β controls the soft assignment. In our experiments we find soft quantization to consistently outperform hard quantization.
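The difference between the two coding schemes can be illustrated with a small sketch. The soft-assignment formula itself is not reproduced in this text, so the form used here, weights proportional to exp(−β‖x − c_k‖²) and normalized to sum to one, is an assumption based on a common convention; average pooling is likewise an assumed choice.

```python
import math

def sqdist(x, c):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, c))

def hard_assign(x, codebook):
    """One-hot code: all weight on the nearest codeword."""
    d = [sqdist(x, c) for c in codebook]
    k = d.index(min(d))
    return [1.0 if i == k else 0.0 for i in range(len(codebook))]

def soft_assign(x, codebook, beta):
    """Assumed soft code: normalized exp(-beta * squared distance)."""
    d = [sqdist(x, c) for c in codebook]
    d_min = min(d)  # shift by the minimum so the largest exponent is 0
    w = [math.exp(-beta * (di - d_min)) for di in d]
    total = sum(w)
    return [wi / total for wi in w]

def average_pool(codes):
    """Pooling step: average the per-descriptor codes component-wise."""
    n = len(codes)
    return [sum(col) / n for col in zip(*codes)]
```

Under this assumed form, a large β makes the soft code approach the hard one, while a small β spreads the weight across several codewords.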
for each pair of samples x and y in the training set. Then, given a set of kernels $\{K_k\}$ for individual features, we learn a linear combination $K = \sum_k d_k K_k$ of the base kernels. The primal of this problem can be formulated as
and then computing the lp-MKL dual as
$A = \{\alpha \mid 0 \le \alpha \le C\mathbf{1},\ \mathbf{1}^t Y \alpha = 0\}$, $H_k = Y K_k Y$, and Y is a diagonal matrix with labels on the diagonal. The kernel weights can then be computed as
where $Th_i$ is the detection threshold. The final score P for a video is computed as $P = \sum_i w_i p_i / \sum_i w_i$. In our experiments, this approach consistently improved performance over any individual system.
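The final fusion step is a weighted average of the per-system scores. A minimal sketch follows; the function name `fuse_scores` is hypothetical, and the weights are passed in as given because the formula that derives them from the detection thresholds is not reproduced in this text.

```python
def fuse_scores(scores, weights):
    """Weighted average P = sum_i(w_i * p_i) / sum_i(w_i).

    scores:  per-system scores p_i
    weights: per-system weights w_i (assumed already computed)
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * p for w, p in zip(weights, scores)) / total_weight
```

For example, two systems scoring 0.2 and 0.8 with weights 1 and 3 fuse to (0.2 + 2.4) / 4 = 0.65, so the more heavily weighted system pulls the result toward its own score.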
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/046,194 US9330332B2 (en) | 2012-10-05 | 2013-10-04 | Fast computation of kernel descriptors |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261710355P | 2012-10-05 | 2012-10-05 | |
US14/046,194 US9330332B2 (en) | 2012-10-05 | 2013-10-04 | Fast computation of kernel descriptors |
Publications (2)
Publication Number | Publication Date |
---|---|
US20140099033A1 (en) | 2014-04-10 |
US9330332B2 (en) | 2016-05-03 |
Family
ID=49447829
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/046,194, US9330332B2 (en), Active, expires 2034-01-11 | 2012-10-05 | 2013-10-04 | Fast computation of kernel descriptors |
Country Status (2)
Country | Link |
---|---|
US (1) | US9330332B2 (en) |
WO (1) | WO2014055874A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10032110B2 (en) | 2016-12-13 | 2018-07-24 | Google Llc | Performing average pooling in hardware |
US10037490B2 (en) | 2016-12-13 | 2018-07-31 | Google Llc | Performing average pooling in hardware |
US11947622B2 (en) | 2012-10-25 | 2024-04-02 | The Research Foundation For The State University Of New York | Pattern change discovery between high dimensional data sets |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015196281A1 (en) * | 2014-06-24 | 2015-12-30 | Sportlogiq Inc. | System and method for visual event description and event analysis |
CN105139428B (en) * | 2015-08-11 | 2018-02-27 | 鲁东大学 | A kind of coloured image SURF character description methods and system based on quaternary number |
WO2017031630A1 (en) * | 2015-08-21 | 2017-03-02 | 中国科学院自动化研究所 | Deep convolutional neural network acceleration and compression method based on parameter quantification |
CN105550687A (en) * | 2015-12-02 | 2016-05-04 | 西安电子科技大学 | RGB-D image multichannel fusion feature extraction method on the basis of ISA model |
CN106650617A (en) * | 2016-11-10 | 2017-05-10 | 江苏新通达电子科技股份有限公司 | Pedestrian abnormity identification method based on probabilistic latent semantic analysis |
KR20180075220A (en) * | 2016-12-26 | 2018-07-04 | 삼성전자주식회사 | Method, Device and System for Processing of Multimedia signal |
US10475152B1 (en) | 2018-02-14 | 2019-11-12 | Apple Inc. | Dependency handling for set-aside of compute control stream commands |
CN108876723B (en) * | 2018-06-25 | 2020-04-24 | 大连海事大学 | Method for constructing color background of gray target image |
CN109902198A (en) * | 2019-03-11 | 2019-06-18 | 京东方科技集团股份有限公司 | A kind of method, apparatus and application system to scheme to search figure |
CN110188217A (en) * | 2019-05-29 | 2019-08-30 | 京东方科技集团股份有限公司 | Image duplicate checking method, apparatus, equipment and computer-readable storage media |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7310720B2 (en) * | 2004-06-08 | 2007-12-18 | Siemens Energy & Automation, Inc. | Method for portable PLC configurations |
US7787678B2 (en) * | 2005-10-07 | 2010-08-31 | Siemens Corporation | Devices, systems, and methods for processing images |
US7860874B2 (en) * | 2004-06-08 | 2010-12-28 | Siemens Industry, Inc. | Method for searching across a PLC network |
US7881878B2 (en) * | 2005-04-11 | 2011-02-01 | Siemens Medical Solutions Usa Inc. | Systems, devices, and methods for diffusion tractography |
US8313437B1 (en) * | 2010-06-07 | 2012-11-20 | Suri Jasjit S | Vascular ultrasound intima-media thickness (IMT) measurement system |
US8369967B2 (en) * | 1999-02-01 | 2013-02-05 | Hoffberg Steven M | Alarm system controller and a method for controlling an alarm system |
US8516266B2 (en) * | 1991-12-23 | 2013-08-20 | Steven M. Hoffberg | System and method for intermachine markup language communications |
US8805653B2 (en) * | 2010-08-11 | 2014-08-12 | Seiko Epson Corporation | Supervised nonnegative matrix factorization |
- 2013-10-04 US US14/046,194 patent/US9330332B2/en active Active
- 2013-10-04 WO PCT/US2013/063474 patent/WO2014055874A1/en active Application Filing
Non-Patent Citations (4)
Title |
---|
Bo et al., "Kernel Descriptors for Visual Recognition," Neural Information Processing Systems, Dec. 6, 2010. |
Bo et al., "Object Recognition with Hierarchical Kernel Descriptors," Computer Vision and Pattern Recognition (CVPR), IEEE Conference ON, IEEE, Jun. 20, 2011. |
Ren et al., "RGB-(D) Scene Labeling: Features and Algorithms," Computer Vision and Pattern Recognition (CVPR), IEEE Conference ON, IEEE, Jun. 16, 2012. |
Reubold et al., Kernel Descriptors in Comparison with Hierarchical Matching Pursuit, Robot Learning Seminar, Jul. 1, 2012. |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11947622B2 (en) | 2012-10-25 | 2024-04-02 | The Research Foundation For The State University Of New York | Pattern change discovery between high dimensional data sets |
US10032110B2 (en) | 2016-12-13 | 2018-07-24 | Google Llc | Performing average pooling in hardware |
US10037490B2 (en) | 2016-12-13 | 2018-07-31 | Google Llc | Performing average pooling in hardware |
US10679127B2 (en) | 2016-12-13 | 2020-06-09 | Google Llc | Performing average pooling in hardware |
US11232351B2 (en) | 2016-12-13 | 2022-01-25 | Google Llc | Performing average pooling in hardware |
Also Published As
Publication number | Publication date |
---|---|
US20140099033A1 (en) | 2014-04-10 |
WO2014055874A1 (en) | 2014-04-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: RAYTHEON BBN TECHNOLOGIES CORPORATION, MASSACHUSET; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WU, SHUANG;PRASAD, ROHIT;NATARAJAN, PREMKUMAR;SIGNING DATES FROM 20131209 TO 20140317;REEL/FRAME:035814/0935 |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
 | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 4 |
 | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 8 |