WO2016090439A1

WO2016090439A1 - Method for detecting the liveness of fingerprints using convolutional networks

Info

Publication number: WO2016090439A1
Application number: PCT/BR2015/000131
Authority: WO
Inventors: Rodrigo Frassetto NOGUEIRA; Roberto de Alencar LOTUFO; Rubens Campos MACHADO
Original assignee: Universidade Estadual De Campinas - Unicamp; Centro De Tecnologia Da Informação Renato Archer - Cti
Priority date: 2014-12-09
Filing date: 2015-08-28
Publication date: 2016-06-16
Also published as: BR102014030832A2; BR102014030832B1

Abstract

The present invention relates to a method for detecting the liveness of fingerprints using the feature-extraction technique known as convolutional networks and artificially augmenting the data set in order to improve the accuracy of the classification system. Dimensionality reduction is also applied using principal component analysis (PCA) with whitening, and a support vector machine (SVM) classifier with a Gaussian kernel is used to determine whether the samples are true or false. Pre-processing operations such as frequency filtering, region-of-interest (ROI) extraction and contrast equalization can be used to improve accuracy.

Description

"FINGERPRINT LIVENESS DETECTION METHOD USING CONVOLUTIONAL NETWORKS"

FIELD OF INVENTION

The present invention belongs to the field of electrical engineering and computing, more specifically to biometric authentication methods, with emphasis on false fingerprint detection.

[0002] The present invention relates to a fingerprint liveness detection method that uses convolutional networks as a machine learning technique for feature extraction and artificially augments the dataset to improve the accuracy of the classifier system.

BACKGROUND OF THE INVENTION

[0003] Biometric systems have become increasingly important in recent years. The main purpose of biometrics is to automatically and reliably discriminate individuals using signals derived from physical or behavioral traits such as fingerprints, face, iris, voice and hand. Biometric technologies have several advantages over classic security methods based on information (password, PIN, etc.) or on a physical device (card, key, etc.). Of all biometric systems, fingerprint recognition systems are the most popular. However, fake physical biometric traits can be an easy way to bypass system security. In particular, fingerprints can be counterfeited using common materials such as gelatin, silicone and wood glue. The creation of such counterfeits can be divided into two categories:

[0004] - Without cooperation: the molds are created from latent fingerprints only, that is, the forger has no direct access to the finger;

[0005] - With cooperation: the user presses his finger into the mold to create the impression, which usually produces higher-quality fake fingerprints.

[0006] A secure fingerprint system must be able to correctly distinguish a false impression from a true one. Additionally, it is desirable that the system be able to differentiate true from false fingerprints when presented with new counterfeiting techniques, which were therefore not considered during the development phase.

[0007] In practical applications, classification is usually done in real time, i.e., the decision whether the impression is false or true is made right after the sample is presented to the system. Thus, samples need to be classified in a short period of time (typically less than 5 seconds), especially when the system is used in places with large crowds, such as public buildings and banks.

Different methods for liveness detection have been proposed [2] [3] [4]. They can be divided into hardware-based techniques and software-based techniques.

[0009] In hardware approaches, a specific device is added to the sensor to detect particular properties of a living trait, such as blood pressure [5], temperature [6], odor [7] or perspiration [8] [9] [10]. A method proposed in [11] attempts to solve the problem by using skin distortion, which involves pressing and moving the finger on the sensor surface to create a distortion.

The movement of real, elastic skin is large compared with that of a fake finger. A method for detecting false fingerprints by measuring the electrical characteristics of different skin layers is proposed in [12]. The authors used different skin characteristics, such as the impedance of the stratum corneum, as liveness measures. Some of these methods are slow because they need the finger to be placed on the sensor surface for a few seconds until information such as perspiration and temperature is available.

In software approaches, fake fingerprints are detected as soon as a sample is captured by a common sensor. The features used to distinguish true from false prints are taken from the fingerprint image rather than from the finger. The advantages of software-based techniques are that the sensor does not need to be replaced as counterfeiting techniques evolve, and that the biometric system costs less, since no additional hardware is required, although more computational power may be needed to process the images in real time. On the other hand, the use of additional information not present in the still images captured by a common sensor is one of the advantages of hardware-based methods. There are software-based techniques in which the features used by the classifier are extracted from specific fingerprint measurements, such as the distance between papillary ridges [13] [14], and features extracted in the frequency domain using the Fourier transform [15]. There are also implementations in which features are extracted using generic extractors such as Wavelets and Local Binary Patterns (LBP).

[0012] In the article by J. Galbally, F. Alonso-Fernandez, J. Fierrez and J. Ortega-Garcia, "A high performance fingerprint liveness detection method based on quality related features," Future Generation Computer Systems, vol. 28, no. 1, pp. 311-321, 2012 [13], a variety of quality measures, such as the intensity, continuity and clarity of the ridges, are extracted from the fingerprint image using statistical measurements of local angles, power spectrum and pixel intensity. Feature selection is then performed in the validation phase, and a Linear Discriminant Analysis (LDA) classifier is used to make the final prediction. The results show an average accuracy of 90% on two standard benchmarks, which is lower than the 96.3% obtained by the present invention. Another disadvantage of the invention presented in [13] is that the best features for classification must be selected manually. Future counterfeiting techniques may eventually circumvent the selected features, requiring a new round of manual feature selection. The present invention does not suffer from this problem, as it automatically learns the features that best discriminate true from false fingerprints from the images provided during training. Thus, for the network to learn to detect new counterfeiting techniques, it suffices to retrain it using samples produced with those techniques.

[0013] In the article by S. Moon, J. S. Chen, K. C. Chan, K. So and K. C. Woo, "Wavelet based fingerprint liveness detection," Electronics Letters, vol. 41, no. 20, pp. 1112-1113, 2005 [16], wavelets are used as feature vectors. False and true fingerprints differ significantly in inter-ridge distance and ridge frequency. Wavelet analysis represents a fingerprint image at multiple resolutions and orientations through subbands. Due to this multi-resolution property, texture differences between true and false fingerprints are analyzed at various scales. Additionally, the subbands carry high-frequency components, which are very distinctive for texture characterization. However, the method has some disadvantages: the images used are those captured immediately after the finger is placed on the sensor surface. It is not known whether this method is applicable to every type of fingerprint image, as some devices wait a while (up to one second) before effectively capturing the image. Also, as the method is based on perspiration, the finger may need to be dried before capture.
The main differences between [16] and the present invention are that the former does not use data augmentation and that the feature extraction techniques are different. The accuracy obtained using wavelets is lower than the accuracy obtained using the convolutional networks proposed in the present invention.

[0014] In the article by S. Nikam and S. Agarwal, "Local binary pattern and wavelet-based spoof fingerprint detection," Int. J. Biometrics, vol. 1, no. 2, pp. 141-159, 2008 [17], a system combining several techniques is presented: different classifiers (k-NN, SVM and AdaBoost) are joined using the "majority vote" rule. They are trained using different feature extractors (LBP and wavelets). The authors note that the performances of the LBP and wavelet extractors are similar and that hybrid classifiers perform better than individual classifiers. The main difference between [17] and the present invention is the feature extraction technique used. Although combining techniques often improves accuracy, the accuracy obtained in [17] is lower than the accuracy obtained using only the convolutional networks proposed in the present invention.

In the article by X. Jia, X. Yang, K. Cao, Y. Zang, N. Zhang, R. Dai and J. Tian, "Multi-scale Local Binary Pattern with Filters for Spoof Fingerprint Detection," Information Sciences, vol. 268, pp. 91-102, 2013 [18], a multi-scale variant of LBP is presented that performs well on fingerprint liveness detection benchmarks. Because the original LBP cannot capture large regions, the texture of fingerprint images may be too complex to be represented by it. In addition, LBP is highly sensitive to noise [19]. The multi-scale LBP operator (MSLBP) is therefore introduced by applying multiple LBP filters with different radii and combining the LBP images by concatenating their histograms, forming the final feature vector. As the LBP scale increases, the large sampling distances make the LBP fragile, so the MSLBP is combined with a set of low-pass Gaussian filters. An SVM classifier is then trained. The use of a multi-scale LBP operator can be considered equivalent to a single-layer convolutional network trained using the scaling data augmentation technique. The advantage of the technique proposed in the present invention is that, through the use of multiple layers, data can be represented by the hierarchical combination of several filters, which requires an exponentially smaller number of filters to represent the same structure than single-layer techniques. Additionally, in convolutional networks the filters can be learned from the training data, while the technique proposed in [18] uses fixed-value filters. The advantage of learning the filters is that the system can adapt as the input data change. This is effective when, for example, new print forgery techniques emerge, as the system will learn to recognize them through new filters.
Since the type of feature extracted by the technique proposed in [18] is fixed (i.e., texture) and independent of the training data, the technique may become useless if a new counterfeiting technique is created whose texture is very similar to that of real prints, since a system based on this feature alone will not be able to differentiate true from false impressions.

[0016] In applications such as fingerprint liveness detection, image degradation may limit the applicability of systems that use texture as information for classification. One class of degradation is blurring due to movement or lack of focus during image capture. Because blur is difficult to remove and its removal introduces new artifacts, it is desirable that the system analyze texture in a way that is invariant to blur.

The article by L. Ghiani, G. L. Marcialis and F. Roli, "Fingerprint Liveness Detection by Local Phase Quantization," 21st International Conference on Pattern Recognition (ICPR), IEEE, 2012 [20], attempts to solve the problem using Local Phase Quantization (LPQ) descriptors, which use the quantized phase of the discrete Fourier transform computed locally in a window that slides over all positions of the image. The phases of the four low-frequency coefficients are decorrelated and uniformly quantized in an eight-dimensional space. A histogram of the resulting code words is created and used as a feature for texture classification. The phase of the low-frequency components is shown to be ideally invariant to centrally symmetric blur. Although this ideal invariance is not fully achieved with a finite-size window, the technique is quite insensitive to blur. Since only phase information is used, the technique is also invariant to uniform changes in illumination.

Thus, the authors argue that LPQ's effectiveness lies in its ability to represent the full spectrum of image characteristics in a very compact manner, avoiding redundant information. Because fingerprints may be captured by the sensor in different orientations, they adopted a rotation-invariant version of LPQ. The results show that LBP and LPQ have similar performance, and preliminary experiments show that the techniques are complementary, although more studies are needed. The disadvantages of the system proposed by L. Ghiani et al. [20] are similar to those of the system proposed by X. Jia et al. [18]. The technique extracts a single type of feature, related to blur, and if a new counterfeiting technique is created whose images resemble real prints with respect to blur, the system can be circumvented. In contrast, the technique proposed in the present invention can automatically learn the relevant features from the training data. So if a new counterfeiting technique appears, it suffices to train the existing system with images produced by the new technique, and the filters will learn the features relevant to discriminating the new data set. Furthermore, for the same number of filters, the technique proposed in [20] has exponentially lower data representation power than the convolutional network technique used in this invention, since the former uses a single layer to extract features while the latter uses multiple layers, allowing a hierarchical and, consequently, more compact representation of the data.

[0019] The article by L. Ghiani, A. Hadid, G. L. Marcialis and F. Roli, "Fingerprint liveness detection using Binarized Statistical Image Features," in Biometrics: Theory, Applications and Systems (BTAS 2013), pp. 1-6, September 2013 [21], attempts to combine the advantages of LBP and LPQ through a local descriptor called Binarized Statistical Image Features (BSIF). The idea is to learn a filter set automatically from a small set of natural images, rather than using manually chosen filters as in LBP and LPQ. To characterize texture properties, a histogram of the BSIF values is computed over each region of the fingerprint. The value of each element (i.e., each bit) of the BSIF binary code string is computed by binarizing the response of a linear filter with a threshold of zero. Each bit is associated with a different filter, and the length of the string determines the number of filters used. The filter set is learned through Independent Component Analysis, which maximizes the statistical independence of the filter responses. The results are promising, but since the experiments were made using filters previously learned from only 13 natural images, performance could certainly be improved if more training images, captured with a specific sensor, were used. In the present invention, the filters may be combined in several layers, which enables a hierarchical representation of structures. The advantage of multilayer techniques, such as the convolutional networks used in the present invention, over single-layer techniques, such as that of L. Ghiani et al. [21], is that they require an exponentially smaller number of filters to represent the same structure. In other words, multilayer techniques offer greater representation power for the same number of filters than single-layer techniques. Additionally, the filter responses used in [21] are binarized, while convolutional filters produce real values, which allows more detailed representations.

[0020] US20130294663 describes systems and methods for detecting fingerprint liveness through multi-resolution texture analysis. The technology described in that patent resembles the present invention in that it comprises a fingerprint liveness detection method which uses, among the aforementioned techniques, a method comprising the transfer of a plurality of images to said application (a feature similar to the data augmentation used in the present invention). However, the document only mentions that the system receives multiple images as input, without explaining how the multiple images are captured. In addition, the method reported in US20130294663 requires as input at least two images of the same fingerprint, whereas the present invention creates a plurality of images from a single captured image.

[0021] The technology described in US5825907 refers to a neural network system for fingerprint classification. The present invention resembles the technology described in that patent in that it uses a type of neural network. However, it differs in that the methodology presented in US5825907 classifies a fingerprint into 5 distinct classes, which can then be used to reduce the number of fingerprints to be searched in an identification search. The present invention, in contrast, is used for liveness detection, an application not mentioned in the cited patent. Another difference is that the present invention uses convolutional networks, which are a type of neural network whose implementation differs from that of a classical neural network. Convolutional networks offer better scalability and performance (accuracy) in several computer vision problems.

US7095880 discloses a method and apparatus for fingerprint capture. The described technology resembles the present invention in that it comprises a method of transferring a plurality of fingerprint images to said application. However, the proposed invention differs from that technology in that it creates the multiple images artificially, requiring only a single image to be captured by the sensor. The cited technology requires the sensor to capture multiple images for processing, so the user's finger must be held on the sensor until all of them are captured. This requirement may cause undesirable delay for the user. Another disadvantage is that the cited technology is restricted to sensors capable of capturing multiple images.

BRIEF DESCRIPTION OF THE INVENTION

The present invention relates to a fingerprint liveness detection method using convolutional networks, comprising the following main steps:

a) Artificial augmentation of image data;

b) Image preprocessing;

c) Extraction of image features by convolutional networks;

d) Dimensionality reduction and data normalization; and

e) Classification of the (convolved and reduced) samples as true or false using a binary classifier.

The method is performed in two phases: the first for training and the second for classification.

In the method presented, step c is performed at least once for system training.

Additionally, in the method presented above, steps c and e are required.

[0027] The method proposed in the present invention can be performed on cloud computers, local servers, desktop computers and notebooks.

BRIEF DESCRIPTION OF THE FIGURES

[0028] Figure 1 - Process flow overview.

[0029] Figure 2 - Original image (left), filtered with a low-pass filter (middle) and filtered with a high-pass filter (right).

[0030] Figure 3 - Sequence of steps for extracting the region of interest from a fingerprint image.

[0031] Figure 4 - Original images (left) and images with CLAHE applied (right).

[0032] Figure 5 - Illustration of the sequence of operations of a single-layer convolutional network applied to a fingerprint image.

[0033] Figure 6 - Illustration of a sequence of operations performed by a two-layer convolutional network.

[0034] Figure 7 - Original images (left) and images after divisive normalization (right).

Figure 8 - Illustration of three types of data augmentation transformations: horizontal reflections, rotations and translations.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a fingerprint liveness detection method using convolutional networks, comprising the following main steps:

a) Artificial augmentation of image data;

b) Image preprocessing:

b1) Image reduction by bilinear interpolation;

b2) Frequency filtering comprising a low-pass filter or a high-pass filter;

b3) Determination of the region of interest (ROI) of the image:

b3.1) Application of morphological opening to highlight the region where the fingerprint is located;

b3.2) Calculation of the center of mass and of the standard deviations of the height and width of the image obtained in step b3.1;

b3.3) Determination of the rectangle that will contain the region of interest;

b4) Image contrast equalization;

c) Extraction of image characteristics by convolutional networks;

d) Reduction of image dimensionality and data normalization; and

e) Classification of the (convolved and reduced) images as true or false using a binary classifier.

[0037] The method is performed in two phases: the first for training and the second for classification.

The training phase further comprises a validation sub-step which can be performed as follows:

1. Creation of sets of hyperparameter values and of combinations specifying whether or not each optional step is executed;

2. Choice of one of the sets created in 1 and application of its settings to the modules of steps a through e;

3. Division of the training set into two blocks, A and B;

4. The system comprising steps a through e, with the configuration chosen in step 2, is trained using dataset A and validated using dataset B;

5. The system comprising steps a through e, with the configuration chosen in step 2, is trained using dataset B and validated using dataset A;

6. Repetition of steps 3, 4 and 5 at least five times, each time using a different division to create blocks A and B;

7. Computation of the accuracy as the average of at least 10 validations;

8. Repetition of steps 2 through 7 for each set of hyperparameters created in step 1;

9. Choice of the hyperparameter values for each step, and of which steps will compose the final system, based on the best accuracy computed in step 7.

In the present invention, the hyperparameter values and the processing steps that make up the system were chosen using the 5x2-fold cross-validation scheme.
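As a minimal sketch of steps 3 to 7, assuming samples are referenced by integer indices and that `train_and_score` is a hypothetical callback that trains the configured pipeline on the first index list and returns its accuracy on the second, the 5x2-fold scheme can be written as follows. Each of the five random halvings is used in both directions, which yields the ten validations averaged in step 7.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def five_by_two_splits(n_samples: int, repeats: int = 5, seed: int = 0) -> List[Split]:
    """Steps 3 to 6: `repeats` random divisions into blocks A and B, each used both ways."""
    rng = random.Random(seed)
    splits: List[Split] = []
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a different division into A and B on every repetition
        half = n_samples // 2
        block_a, block_b = idx[:half], idx[half:]
        splits.append((block_a, block_b))  # step 4: train on A, validate on B
        splits.append((block_b, block_a))  # step 5: train on B, validate on A
    return splits


def cross_validated_accuracy(
    n_samples: int,
    train_and_score: Callable[[Sequence[int], Sequence[int]], float],
    repeats: int = 5,
    seed: int = 0,
) -> float:
    """Step 7: average accuracy over the 2 * repeats validations."""
    return mean(train_and_score(tr, va) for tr, va in five_by_two_splits(n_samples, repeats, seed))
```

Fixing `seed` makes every configuration in step 8 see the same divisions, so their accuracies are compared on equal terms; this is a design choice of the sketch, not something the description specifies.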

In the training phase, step c is performed at least once for system training.

Steps c and e are required for both phases (training and classification).

[0041] Steps a, b1, b2, b3, b4 and d are optional in both phases (training and classification).

[0042] The method proposed in the present invention can be performed on high-performance cloud computers to shorten training times. It can also be performed on local servers, desktops, laptops and Field-Programmable Gate Array (FPGA) devices that have sufficient memory (preferably more than 1 GB).

[0043] Whether each of these steps is executed can be better understood from Figure 1. The final model is decided in the validation sub-step of the training phase, which consists of choosing the system hyperparameters, such as the image resizing factor, the number of PCA components and the filter dimensions of the convolutional networks. However, the present invention extends traditional validation methods by also choosing which steps will make up the final system. In other words, besides choosing the best hyperparameter values for each step, the decision to execute (or not) each step is also made in the validation phase. The combination of steps with the best accuracy during the validation phase becomes the final model. More specifically, the following steps were performed:

(1) - Each step is treated as a Boolean parameter. If the parameter is true, the step is executed; otherwise it is not.

(2) - All possible combinations of these parameters are generated. An example of one combination: perform step a, perform step b1, do not perform step b2, perform step b3, do not perform step b4, and do not perform step d. Because steps c and e are always performed, there are six steps (a, b1, b2, b3, b4 and d) that may or may not be performed, giving a total of 2^6 = 64 possible combinations.

(3) - For each combination, the accuracy of the system is evaluated on the validation data set, which may consist of a few tens, preferably hundreds, of true and false images.

(4) - The combination with the best accuracy on the validation set is chosen to compose the final system.
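The enumeration in items (1) to (4) can be sketched in Python as below. `evaluate` is a hypothetical function standing for item (3), i.e. the validation accuracy of a configuration; the step labels mirror those used in this description.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List

OPTIONAL_STEPS = ("a", "b1", "b2", "b3", "b4", "d")
PIPELINE_ORDER = ("a", "b1", "b2", "b3", "b4", "c", "d", "e")  # c and e always run

Config = Dict[str, bool]


def step_configurations() -> Iterator[Config]:
    """Item (2): every True/False assignment to the six optional steps (2**6 = 64)."""
    for flags in product((False, True), repeat=len(OPTIONAL_STEPS)):
        yield dict(zip(OPTIONAL_STEPS, flags))


def pipeline(config: Config) -> List[str]:
    """Ordered list of the steps that a configuration actually executes."""
    return [s for s in PIPELINE_ORDER if s in ("c", "e") or config[s]]


def select_best(evaluate: Callable[[Config], float]) -> Config:
    """Items (3) and (4): keep the configuration with the best validation accuracy."""
    return max(step_configurations(), key=evaluate)
```

In practice each configuration would also be combined with the hyperparameter sets of the validation sub-step, and `evaluate` would run the 5x2-fold procedure on it.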

Below is a detailed description of each step involved related to the proposed method.

Step a - Data Augmentation

Data augmentation is a technique for artificially creating slightly modified samples from the original samples. The data augmentation techniques of the present invention are selected from the group consisting of rotations, translations, horizontal reflections, scaling, local alteration of pixel intensities, noise addition and sample creation from 3D models of the object to be categorized, preferably horizontal reflections and translations. By using an augmented data set during training, the classifier is expected to become more robust against small variations that may be present in the data, forcing it to learn more complex (and possibly more important) structures. Data augmentation has been used successfully in many computer vision applications [15], [16] and [17]. In the present invention, horizontal reflections and translations are used as the two types of artificial data augmentation transformations.
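A minimal NumPy sketch of the two preferred transformations, horizontal reflection and translation, is shown below. The shift amounts and the fill value for uncovered pixels are illustrative assumptions, as the description does not specify them; shifts are also assumed to be smaller than the image.

```python
from typing import List, Sequence, Tuple

import numpy as np


def translate(img: np.ndarray, dy: int, dx: int, fill: float = 0) -> np.ndarray:
    """Shift a 2-D image by (dy, dx) pixels, filling uncovered pixels with `fill`."""
    h, w = img.shape
    out = np.full_like(img, fill)
    src_y, dst_y = (slice(0, h - dy), slice(dy, h)) if dy >= 0 else (slice(-dy, h), slice(0, h + dy))
    src_x, dst_x = (slice(0, w - dx), slice(dx, w)) if dx >= 0 else (slice(-dx, w), slice(0, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out


def augment(
    img: np.ndarray,
    shifts: Sequence[Tuple[int, int]] = ((0, 0), (0, 8), (0, -8), (8, 0), (-8, 0)),
    fill: float = 0,
) -> List[np.ndarray]:
    """Return every translated copy of `img` followed by its horizontal reflection."""
    samples = []
    for dy, dx in shifts:
        moved = translate(img, dy, dx, fill)
        samples.append(moved)
        samples.append(np.fliplr(moved))  # horizontal reflection
    return samples
```

With the five hypothetical shifts above, each captured image yields ten training samples, all with the same label as the original.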

Step b - Preprocessing:

[0046] Four preprocessing operations can be used in the present invention: image reduction (step b1), frequency filtering (step b2), extraction of the region of interest (ROI) (step b3) and contrast equalization (step b4).

Step b1 - Image Reduction:

[0047] Images may be reduced in size to shorten training times. In general, training time is linearly proportional to image size; therefore, if the image is reduced to half its original size, processing time is expected to drop by almost half. In this invention, image reduction is done using bilinear interpolation, and images can be scaled from 100% (no reduction) down to 25% of their original size, preferably 100% (no reduction).
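Bilinear reduction can be sketched with NumPy alone as below. The pixel-centre alignment convention and the function name `bilinear_reduce` are assumptions of this sketch; image libraries such as OpenCV and scikit-image provide equivalent, optimized resizing routines.

```python
import numpy as np


def bilinear_reduce(img: np.ndarray, scale: float) -> np.ndarray:
    """Resize a 2-D image by `scale` (0.25 to 1.0) using bilinear interpolation."""
    if not 0.25 <= scale <= 1.0:
        raise ValueError("scale must be between 0.25 and 1.0")
    h, w = img.shape
    nh, nw = max(1, int(round(h * scale))), max(1, int(round(w * scale)))
    # Map output pixel centres back to input coordinates (pixel-centre alignment).
    ys = np.clip((np.arange(nh) + 0.5) * h / nh - 0.5, 0, h - 1)
    xs = np.clip((np.arange(nw) + 0.5) * w / nw - 0.5, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    f = img.astype(float)
    top = f[y0][:, x0] * (1 - wx) + f[y0][:, x1] * wx     # interpolate along x on row y0
    bottom = f[y1][:, x0] * (1 - wx) + f[y1][:, x1] * wx  # ... and on row y1
    return top * (1 - wy) + bottom * wy                   # then interpolate along y
```

For a 640x480 sensor image, a scale of 0.5 gives a 320x240 image with a quarter of the pixels.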

Step b2 - Frequency Filtering:

It is possible that removing noise through low-pass filtering may increase classifier performance. It is also possible that the information relevant to differentiating false from true fingerprints lies in the high-frequency components, so high-pass filtering can improve the accuracy of the system. The low-pass filter is preferably implemented as the convolution of the image with a Gaussian filter, and the high-pass filter is preferably implemented by subtracting the low-pass-filtered image from the original image. The filters used in frequency filtering have a standard deviation of 3 and a window size between 7x7 and 21x21 pixels, preferably 13x13 pixels. Figure 2 shows the effect of these filters on some fingerprint images.
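The two filters can be sketched as follows, using the preferred values (standard deviation 3, 13x13 window). The Gaussian is applied separably, one axis at a time, and image borders are handled by reflection; the border mode is an assumption, since the description does not specify it.

```python
import numpy as np


def gaussian_kernel_1d(size: int = 13, sigma: float = 3.0) -> np.ndarray:
    """Normalized 1-D Gaussian; the 2-D filter is its outer product (separable)."""
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    r = size // 2
    x = np.arange(-r, r + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()


def low_pass(img: np.ndarray, size: int = 13, sigma: float = 3.0) -> np.ndarray:
    """Convolve the image with a size x size Gaussian (reflected borders)."""
    k = gaussian_kernel_1d(size, sigma)
    padded = np.pad(img.astype(float), size // 2, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")  # filter each row
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")    # then each column


def high_pass(img: np.ndarray, size: int = 13, sigma: float = 3.0) -> np.ndarray:
    """Original image minus its low-pass-filtered version."""
    return img.astype(float) - low_pass(img, size, sigma)
```

Because the high-pass output is defined as a residual, the two outputs always sum back to the original image.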

Step b3 - Region of Interest (ROI):

In many fingerprints captured by some sensors, the print is not centered in the image and/or the background occupies much of the image. So that the classifier system receives samples with the highest possible fingerprint/background ratio, a simple technique for extracting the region of interest was created, consisting of the following steps:

Step b3.1 - Morphological Opening

[0050] Morphological opening is applied to highlight the region where the fingerprint is located. The structuring element should be larger than the maximum distance between papillary ridges, with a size in the range of 10 to 35 pixels for 640x480 pixel images, preferably 21 pixels, ensuring that the fingerprint becomes a continuous region after the operation.

Step b3.2 - Calculation of the center of mass and standard deviations

[0051] The center of mass and the standard deviations of the height and width of the image obtained in step b3.1 are calculated.

Step b3.3 - Determining the rectangle that will contain the region of interest

[0052] The region of interest is the rectangle centered on the center of mass whose height and width are three times the respective standard deviations calculated in the previous step.

The sequence of steps for extracting the region of interest from a fingerprint image can be seen in Figure 3.
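Steps b3.1 to b3.3 can be sketched with NumPy as below. Several details are assumptions not fixed by the description: a square structuring element, a dark fingerprint on a light background (so that the "mass" of each pixel is its darkness, 255 minus its grey level), and a rectangle whose total height and width equal three standard deviations.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_filter(img: np.ndarray, size: int, reducer) -> np.ndarray:
    """Apply `reducer` (np.min or np.max) over a size x size window; size must be odd."""
    r = size // 2
    padded = np.pad(img, r, mode="edge")
    return reducer(sliding_window_view(padded, (size, size)), axis=(-2, -1))


def grey_opening(img: np.ndarray, size: int = 21) -> np.ndarray:
    """Step b3.1: erosion (min filter) followed by dilation (max filter)."""
    return _window_filter(_window_filter(img, size, np.min), size, np.max)


def roi_rectangle(img: np.ndarray, size: int = 21, white: float = 255.0):
    """Steps b3.2 and b3.3: return (top, left, bottom, right) of the ROI."""
    opened = grey_opening(img.astype(float), size)
    mass = white - opened  # assumption: dark print on a light background
    total = mass.sum()
    h, w = img.shape
    if total == 0:
        return 0, 0, h, w  # blank image: keep everything
    rows, cols = np.indices((h, w))
    cy, cx = (rows * mass).sum() / total, (cols * mass).sum() / total  # center of mass
    sy = np.sqrt(((rows - cy) ** 2 * mass).sum() / total)  # std of the height
    sx = np.sqrt(((cols - cx) ** 2 * mass).sum() / total)  # std of the width
    # Rectangle centered on the center of mass, 3 standard deviations tall and wide.
    top, bottom = max(0, int(round(cy - 1.5 * sy))), min(h, int(round(cy + 1.5 * sy)))
    left, right = max(0, int(round(cx - 1.5 * sx))), min(w, int(round(cx + 1.5 * sx)))
    return top, left, bottom, right


def extract_roi(img: np.ndarray, size: int = 21) -> np.ndarray:
    top, left, bottom, right = roi_rectangle(img, size)
    return img[top:bottom, left:right]
```

The opening removes bright valleys narrower than the structuring element, which is why the element must exceed the largest inter-ridge distance for the print to become one dark, continuous blob.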

Step b4 - Contrast Equalization:

For contrast equalization, techniques selected from the group of histogram equalization techniques are used, preferably the technique called Contrast Limited Adaptive Histogram Equalization (CLAHE) [1], which is a variant of the Adaptive Histogram Equalization (AHE) technique [2]. AHE computes several histograms, each corresponding to a distinct section of the image, and uses them to redistribute the image pixel intensities. It is therefore suitable for improving the local contrast of images. However, AHE tends to amplify noise in relatively homogeneous regions of the image. CLAHE avoids this by limiting the amplification: histogram values are clipped at an upper limit before the cumulative distribution function of each neighborhood is computed. The histogram is computed over a disc-shaped neighborhood with a diameter between 5 and 60 pixels, preferably 30 pixels. Figure 4 compares original and CLAHE-filtered images.
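The core of CLAHE, clipping each neighborhood's histogram and redistributing the excess before building the cumulative mapping, is sketched below for 8-bit images. This is a simplified illustration: it uses square tiles instead of the disc-shaped neighborhood and omits the interpolation between neighboring mappings that full CLAHE performs, so tile borders may be visible. The clip limit of 1% of the tile size is an assumed value. For practical use, scikit-image's `exposure.equalize_adapthist` implements the complete algorithm.

```python
import numpy as np


def clipped_equalization_lut(region: np.ndarray, clip_limit: float = 0.01, n_bins: int = 256) -> np.ndarray:
    """Contrast-limited histogram-equalization mapping for one uint8 neighborhood."""
    hist = np.bincount(region.ravel(), minlength=n_bins).astype(float)
    limit = max(1.0, clip_limit * region.size)        # upper limit for each histogram bin
    excess = np.clip(hist - limit, 0.0, None).sum()
    hist = np.minimum(hist, limit) + excess / n_bins  # redistribute clipped counts uniformly
    cdf = np.cumsum(hist)
    cdf /= cdf[-1]                                    # cumulative distribution function
    return np.round(cdf * (n_bins - 1)).astype(np.uint8)


def tiled_clipped_equalization(img: np.ndarray, tile: int = 30, clip_limit: float = 0.01) -> np.ndarray:
    """Equalize each tile with its own clipped histogram (no inter-tile interpolation)."""
    out = np.empty_like(img)
    for y in range(0, img.shape[0], tile):
        for x in range(0, img.shape[1], tile):
            block = img[y:y + tile, x:x + tile]
            out[y:y + tile, x:x + tile] = clipped_equalization_lut(block, clip_limit)[block]
    return out
```

Lowering `clip_limit` flattens the histogram less aggressively, which is what keeps noise in homogeneous regions from being amplified.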

Step c - Convolutional Networks:

A classical convolutional network is composed of alternating layers of convolution and local pooling (i.e., sub-sampling) [39]. The objective of the first convolution layer is to extract patterns found within local regions of the input image that are common throughout the sample set. This is done by convolving a filter with the input image; the resulting output is a feature map c for each filter of the layer.

The resulting activations a = f(c) are then passed to the pooling layer, which aggregates the information within small local regions R, producing a pooled feature map s (usually smaller than c) as output. Denoting the aggregation function as pool(), for all feature maps c we have:

$$s_j = \mathrm{pool}\big(f(c_i)\big), \quad \forall i \in R_j$$

[0057] where R_j is the pooling region j in the feature map c and i is the index of each element within it. Among the various types of pooling, two are commonly used: average and max. Average pooling returns the average (or the sum) of the activation units a_i = f(c_i) in a neighborhood R_j:

$$s_j = \frac{1}{|R_j|} \sum_{i \in R_j} a_i$$

[0058] Max-pooling selects the maximum value within a region R_j:

$$s_j = \max_{i \in R_j} a_i$$
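Average and max pooling over non-overlapping square regions R_j can be sketched as follows; with size 2 this also performs the sub-sampling by a factor of 2 described for Figure 5. Only these two pooling types are shown; stochastic and pyramid pooling are not covered by this sketch.

```python
import numpy as np


def pool2d(a: np.ndarray, size: int = 2, mode: str = "max") -> np.ndarray:
    """Pool a 2-D activation map a = f(c) over non-overlapping size x size regions R_j."""
    h2, w2 = a.shape[0] // size, a.shape[1] // size
    # Crop to a multiple of `size`, then expose each region R_j as axes 1 and 3.
    regions = a[:h2 * size, :w2 * size].reshape(h2, size, w2, size)
    if mode == "max":
        return regions.max(axis=(1, 3))   # s_j = max of a_i over R_j
    if mode == "average":
        return regions.mean(axis=(1, 3))  # s_j = mean of a_i over R_j
    raise ValueError("mode must be 'max' or 'average'")
```

For a 4x4 map holding 0 to 15 row by row, max-pooling gives [[5, 7], [13, 15]] and average pooling gives [[2.5, 4.5], [10.5, 12.5]].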

The motivation for using pooling is that the activations of the pooled map s are less sensitive to the exact locations of image structures than those of the original feature map c. In a multi-layer model, convolutional layers that take the output of pooling layers as input can then extract features that are increasingly invariant to local transformations of the input image [3] [4]. This is important for classification, as these transformations can obscure the identity of the object. Pooling therefore seeks invariance to changes in position and lighting conditions, robustness to noise, and a compact representation.

In the present invention, the pooling operation is selected from the group consisting of average pooling, max-pooling, stochastic pooling and multi-scale/spatial pyramid pooling, preferably max-pooling.

Convolutional networks can have multiple convolution and pooling layers. The purpose of stacking these layers is to build a system able to capture more complex structures in the samples: the feature patterns obtained in one layer are used as input to the next layer, which yields a hierarchical representation in which increasingly complex structures are captured at higher levels of the hierarchy.

Since convolution is a linear operation and a composition of two or more linear operations is equivalent to a single linear operation, stacking only convolutional layers would add no expressive power over a single layer. Therefore, a nonlinear function is applied to each element of each feature map c, a = f(c), resulting in a network composed of multiple nonlinear layers. Several functions can be used for f(c); tanh(c) and the logistic function are popular choices. In this invention, the rectified linear function f(c) = max(c, 0) is used as the nonlinearity. In general, this function has been shown to perform better than the others [5].
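The argument above can be checked numerically. This sketch, using SciPy's `convolve2d` on random data (the sizes are arbitrary choices), shows that two stacked linear convolutions equal one convolution with the combined kernel, and that inserting the nonlinearity max(c, 0) between them breaks this equivalence.

```python
import numpy as np
from scipy.signal import convolve2d


def relu(c):
    """Rectified linear nonlinearity f(c) = max(c, 0)."""
    return np.maximum(c, 0.0)


rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))   # input image
k1 = rng.standard_normal((5, 5))    # first-layer filter
k2 = rng.standard_normal((5, 5))    # second-layer filter

# Two stacked linear convolutions ...
stacked = convolve2d(convolve2d(x, k1, mode="full"), k2, mode="full")
# ... equal a single convolution with the combined 9x9 kernel k1 * k2.
combined = convolve2d(x, convolve2d(k1, k2, mode="full"), mode="full")
print(np.allclose(stacked, combined))  # True

# With the nonlinearity in between, the single-kernel equivalence no longer holds.
nonlinear = convolve2d(relu(convolve2d(x, k1, mode="full")), k2, mode="full")
print(np.allclose(nonlinear, combined))
```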

Figure 5 illustrates the sequence of operations performed by a single-layer convolutional network. The input image is convolved with three random filters of size between 5x5 and 15x15, preferably 5x5 (enlarged in the figure for easier viewing), generating three feature maps. The nonlinear function max(x, 0) is then applied to these maps, followed by a max-pooling operation and sub-sampling by a factor of 2.

Figure 6 illustrates the sequence of operations of a two-layer convolutional network (the nonlinearity and max-pooling layers are not shown, to simplify the illustration). The output images of the first layer are used as input to the second layer. They are convolved with nine random filters and go through max-pooling and sub-sampling operations. The resulting images are then usually rasterized and concatenated, forming a one-dimensional vector that is used as input to the binary classifier of step e.
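The single- and two-layer networks of Figures 5 and 6 can be sketched with NumPy and SciPy as below. This is an illustration, not the implementation of [20]. Assumptions not fixed by the text: filters are drawn from a standard normal distribution; each second-layer filter spans all three first-layer maps, so nine filters yield nine maps; and cross-correlation is used, as in most CNN implementations (with random filters it is equivalent to convolution with flipped filters). All function names are hypothetical.

```python
import numpy as np
from scipy.signal import correlate2d


def conv_layer(maps, filters):
    """Filter bank: maps (n_in, H, W), filters (n_out, n_in, k, k) -> (n_out, H-k+1, W-k+1)."""
    return np.stack([
        sum(correlate2d(m, w, mode="valid") for m, w in zip(maps, bank))
        for bank in filters
    ])


def relu(c):
    """Nonlinearity f(c) = max(c, 0)."""
    return np.maximum(c, 0.0)


def max_pool(maps, size=2):
    """Max-pooling over non-overlapping size x size regions (sub-sampling by `size`)."""
    n, h, w = maps.shape
    h2, w2 = h // size, w // size
    regions = maps[:, : h2 * size, : w2 * size].reshape(n, h2, size, w2, size)
    return regions.max(axis=(2, 4))


def typical_layer(maps, filters):
    """Convolution -> max(c, 0) -> max-pooling with sub-sampling by 2."""
    return max_pool(relu(conv_layer(maps, filters)))


rng = np.random.default_rng(0)
image = rng.random((64, 64))                  # stand-in for a fingerprint image
filters1 = rng.standard_normal((3, 1, 5, 5))  # three random 5x5 filters (Figure 5)
filters2 = rng.standard_normal((9, 3, 5, 5))  # nine random filters over the 3 maps (Figure 6)

layer1 = typical_layer(image[np.newaxis], filters1)  # shape (3, 30, 30)
layer2 = typical_layer(layer1, filters2)             # shape (9, 13, 13)
features = layer2.ravel()                            # rasterized vector for step e
print(layer1.shape, layer2.shape, features.shape)
```

Each 5x5 "valid" convolution trims 4 pixels per dimension and each pooling halves the size, so a 64x64 input yields 9 x 13 x 13 = 1521 features.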

Convolutional filter weights may be random or learned [6]. The first approach is easier to implement and, since it skips filter learning, results in much shorter system training times. However, convolutional networks with learned filters achieve better accuracy than networks with random filters [7] [8] [9].

Several variations of the architecture presented above may also be used. A common approach is to add a layer that performs local contrast normalization (not to be confused with the contrast equalization described in the preprocessing step) between the convolution and pooling layers. The purpose of this layer is to normalize pixel intensities based on their surroundings. Two types of local contrast normalization are commonly used: subtractive normalization and divisive normalization. The subtractive normalization of an image x can be defined as:

$$v_{ijk} = x_{ijk} - \sum_{ipq} w_{pq}\, x_{i,\,j+p,\,k+q}$$

where w_{pq} is a window of Gaussian weights such that

$$\sum_{ipq} w_{pq} = 1$$

Here i refers to the index of the third dimension of the image (the feature map), j and k refer to the first two dimensions (height and width), and p and q index the neighborhood around position (j, k).

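Divisive normalization is calculated as:

$$y_{ijk} = \frac{v_{ijk}}{\max\left(c, \sigma_{jk}\right)}, \qquad \sigma_{jk} = \left(\sum_{ipq} w_{pq}\, v_{i,\,j+p,\,k+q}^{2}\right)^{1/2}$$

where c is a constant that prevents division by very small values.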
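The subtractive and divisive normalizations can be sketched with NumPy and SciPy as follows, assuming feature maps stored as an array of shape (n_maps, H, W) and a Gaussian window with σ = 2 truncated to 9x9 pixels (to match the 9x9 filters of Figure 7). Setting c to the mean of σ_jk follows the choice reported by Jarrett et al.; it is an assumption here, and the small floor on c is added only to avoid 0/0 on constant inputs.

```python
import numpy as np
from scipy import ndimage


def local_contrast_normalization(x, sigma=2.0, truncate=2.0):
    """Subtractive followed by divisive normalization of feature maps x, shape (n_maps, H, W).

    The Gaussian window has radius int(truncate * sigma + 0.5) = 4, i.e. 9x9
    pixels for the defaults, and its weights sum to 1 over the window and
    over all maps (sum over i, p, q of w_pq equals 1).
    """
    x = np.asarray(x, dtype=float)

    def weighted_sum(a):
        # Gaussian smoothing over height and width only (sigma 0 on the map axis),
        # then averaging over maps, which spreads the unit weight mass across them.
        smoothed = ndimage.gaussian_filter(a, sigma=(0, sigma, sigma), truncate=truncate)
        return smoothed.mean(axis=0)

    v = x - weighted_sum(x)                   # subtractive normalization
    sigma_jk = np.sqrt(weighted_sum(v ** 2))  # local standard deviation
    c = max(sigma_jk.mean(), 1e-8)            # assumed choice of c, floored to avoid 0/0
    return v / np.maximum(c, sigma_jk)        # divisive normalization


rng = np.random.default_rng(0)
maps = rng.random((3, 64, 64)) * np.array([1.0, 5.0, 20.0])[:, None, None]
normalized = local_contrast_normalization(maps)
print(normalized.shape, normalized.std())
```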

Figure 7 shows divisive normalization using 9x9 filters applied to some fingerprint images.

Step d - Dimensionality reduction using the Principal Component Analysis (PCA) method and normalization using the Whitening technique:

After feature extraction by the convolutional networks, each dimension of the dataset is independently normalized to zero mean and unit variance. This is usually necessary because the objective functions of many machine learning methods (such as the Support Vector Machine with a Gaussian kernel) assume that all dimensions are centered at zero and have variances of the same order. If one dimension has a variance much larger than the others, it may dominate the objective function and prevent the estimator from learning from the other dimensions as expected.
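A minimal NumPy sketch of this per-dimension normalization follows. Estimating the mean and standard deviation on the training set only and reusing them for validation or test data is a common practice assumed here, not something the text specifies; the helper names are hypothetical.

```python
import numpy as np


def fit_standardizer(X_train, eps=1e-12):
    """Per-dimension mean and standard deviation, estimated on training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std < eps, 1.0, std)  # constant dimensions are left unscaled
    return mean, std


def standardize(X, mean, std):
    """Zero mean and unit variance per dimension, using training statistics."""
    return (X - mean) / std


rng = np.random.default_rng(0)
# Feature dimensions whose scales differ by orders of magnitude.
scales = np.array([1.0, 10.0, 100.0, 1000.0])
X_train = rng.standard_normal((200, 4)) * scales + 5.0
X_val = rng.standard_normal((50, 4)) * scales + 5.0

mean, std = fit_standardizer(X_train)
Z_train = standardize(X_train, mean, std)
Z_val = standardize(X_val, mean, std)  # same statistics, no refitting
print(Z_train.mean(axis=0).round(6), Z_train.std(axis=0).round(6))
```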

The normalized data is then subjected to dimensionality reduction by a technique selected from the group consisting of Principal Component Analysis (PCA), Kernel PCA, Linear Discriminant Analysis (LDA), Independent Component Analysis (ICA) and auto-encoders, preferably PCA. Principal Component Analysis is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables called principal components [10]. The number of principal components is less than or equal to the original number of variables; in this invention, between 30 and 1300 principal components are retained. The transformation is defined so that the first principal component has the largest possible variance (that is, it accounts for as much of the variability in the data as possible), and each succeeding component has the largest possible variance under the constraint that it is orthogonal (that is, uncorrelated) to the preceding components.

After PCA, the whitening technique [11] [12], also known as sphering, is applied to normalize the variances of the principal components; this has been shown to be quite effective in computer vision applications [13]. The technique divides the principal components by their standard deviations, which yields an identity covariance matrix. Denoting by v_j the j-th rotated principal component and by σ_j its standard deviation, the whitened components s_j are computed as:

$$s_j = \frac{v_j}{\sigma_j}$$

Such a procedure is useful in classification models that make assumptions about signal isotropy, which is the case of SVM classifiers with a Gaussian kernel.
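PCA followed by whitening can be sketched in plain NumPy through an eigendecomposition of the training covariance matrix. The synthetic correlated data and the choice of five components are illustrative only (the invention retains between 30 and 1300), and the small `eps` guard is an addition for numerical safety. The check at the end confirms that the whitened components have an identity covariance matrix.

```python
import numpy as np


def fit_pca_whitening(X, n_components):
    """Mean, leading eigenvectors and their variances from training data X (n_samples, n_features)."""
    mean = X.mean(axis=0)
    centered = X - mean
    cov = centered.T @ centered / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]    # keep the largest-variance directions
    return mean, eigvecs[:, order], eigvals[order]


def pca_whiten(X, mean, components, variances, eps=1e-10):
    """Project onto the principal axes (v_j) and divide by their standard deviations (s_j)."""
    v = (X - mean) @ components
    return v / np.sqrt(variances + eps)


rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
X_train = rng.standard_normal((500, 10)) @ A  # correlated features
mean, comps, var = fit_pca_whitening(X_train, n_components=5)
S = pca_whiten(X_train, mean, comps, var)
print(np.round(np.cov(S, rowvar=False), 6))   # approximately the 5x5 identity
```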

Either whitening or sample normalization, preferably whitening, may be used as the data normalization technique.

Step e - Image Classification:

Samples whose features were extracted by the convolutional networks and whose dimensionality was reduced by PCA (this step is optional) serve as input to the classifier. Because this is a binary classification problem, i.e., samples must be classified into only two classes, true or false, any binary classifier can be used, such as k-nearest neighbors, Linear Discriminant Analysis (LDA), Naive Bayes, logistic regression, decision trees or neural networks (with a softmax function in the last layer), among others. In this invention, a Support Vector Machine (SVM) classifier is preferably used, which suits the problem because it is inherently a binary classifier and is widely used in machine learning problems [14].
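Steps d and e can be chained with scikit-learn as sketched below: standardization, PCA with whitening, then an SVM with a Gaussian (RBF) kernel. The synthetic "live" and "fake" feature vectors, the 10 components and `C=10.0` are illustrative assumptions, not values from the invention.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_per_class, n_features = 200, 50
# Synthetic stand-ins for convolutional feature vectors of live and fake samples.
X_live = rng.normal(0.0, 1.0, (n_per_class, n_features))
X_fake = rng.normal(0.7, 1.0, (n_per_class, n_features))
X = np.vstack([X_live, X_fake])
y = np.array([1] * n_per_class + [0] * n_per_class)  # 1 = live, 0 = fake

# Standardization, PCA with whitening, Gaussian-kernel SVM.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10, whiten=True),
    SVC(kernel="rbf", C=10.0, gamma="scale"),
)
model.fit(X[::2], y[::2])                 # even rows for training
accuracy = model.score(X[1::2], y[1::2])  # odd rows for testing
print(f"held-out accuracy: {accuracy:.3f}")
```

Bundling the steps in one pipeline ensures that the normalization and PCA statistics are learned from the training rows only.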

The following are the preferred configurations for each of the five main steps of the present invention: data augmentation, preprocessing, feature extraction using convolutional networks, dimensionality reduction and classification.

Data Augmentation:
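According to the claims, horizontal translations and reflections are the preferred augmentation techniques. The NumPy sketch below creates, from one image, the original and its horizontal reflection, each shifted horizontally by a few pixels. The shift amounts (±8 pixels) and the fill value for vacated columns (0.0; the sensor's background intensity may be more suitable) are assumptions, and the helper names are hypothetical.

```python
import numpy as np


def shift_horizontal(img, dx, fill=0.0):
    """Translate img by dx pixels along the width; vacated columns get `fill`."""
    out = np.full_like(img, fill)
    if dx > 0:
        out[:, dx:] = img[:, :-dx]
    elif dx < 0:
        out[:, :dx] = img[:, -dx:]
    else:
        out[:] = img
    return out


def augment(img, shifts=(-8, 8), fill=0.0):
    """Original and horizontally reflected image, each with horizontal translations."""
    samples = []
    for base in (img, img[:, ::-1]):  # original, then horizontal reflection
        samples.append(base.copy())
        samples.extend(shift_horizontal(base, dx, fill) for dx in shifts)
    return np.stack(samples)


rng = np.random.default_rng(0)
image = rng.random((20, 30))
augmented = augment(image)
print(augmented.shape)  # (6, 20, 30)
```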

Preprocessing:

In the preprocessing step, the sequence of operations described below can be performed. In the preferred embodiment of the invention, no preprocessing operation is used, since these operations bring little or no accuracy gain and increase processing time.

1. Extraction of the region of interest: for the morphological opening, a 21x21-pixel square is preferably used as the structuring element. The rectangle that marks the region of interest is defined by three times the standard deviations of the image along its height and width.

2. Image size reduction: preferably, the image is reduced to 250x250 pixels. However, other resolutions can be used, and the best one can be chosen in the model validation step.

3. Application of a low-pass or high-pass filter: Gaussian weights with a standard deviation of 3 and a window size of 13x13 pixels are preferably used.

4. Contrast equalization: a disk-shaped neighborhood with a diameter of 30 pixels is preferably used to calculate the histogram.
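Preprocessing operations 1 to 3 can be sketched with `scipy.ndimage` as below. Several details are assumptions: fingerprints are taken to be dark ridges on a light background (so the opening turns the print into a dark blob whose inverted intensities serve as weights); "three times the standard deviation" is read as a rectangle whose sides are 3σ long, centered on the center of mass; and `truncate=2.0` is chosen so that σ = 3 gives exactly a 13x13 window. Function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def reduce_size(img, shape=(250, 250)):
    """Bilinear (order=1) resampling to a fixed size."""
    factors = (shape[0] / img.shape[0], shape[1] / img.shape[1])
    return ndimage.zoom(img, factors, order=1)


def low_pass(img, sigma=3.0):
    """Gaussian low-pass filter; truncate=2.0 gives a 13x13 window for sigma=3."""
    return ndimage.gaussian_filter(img, sigma=sigma, truncate=2.0)


def high_pass(img, sigma=3.0):
    """The image minus its low-pass-filtered version."""
    return img - low_pass(img, sigma)


def roi_box(img, size=21, k=3.0):
    """Return (top, bottom, left, right) bounds of the region of interest.

    Assumes dark ridges on a light background. The rectangle is centered on the
    center of mass and each side is k standard deviations long (an assumed
    reading of "three times the standard deviation").
    """
    opened = ndimage.grey_opening(img, size=(size, size))  # square structuring element
    weights = opened.max() - opened  # dark fingerprint area -> large weight
    total = weights.sum()
    if total == 0:  # uniform image: keep the whole frame
        return 0, img.shape[0], 0, img.shape[1]
    rows = np.arange(img.shape[0])[:, None]
    cols = np.arange(img.shape[1])[None, :]
    # Weighted center of mass and standard deviations along height and width.
    r0 = (weights * rows).sum() / total
    c0 = (weights * cols).sum() / total
    sr = np.sqrt((weights * (rows - r0) ** 2).sum() / total)
    sc = np.sqrt((weights * (cols - c0) ** 2).sum() / total)
    top = max(int(r0 - k * sr / 2), 0)
    bottom = min(int(np.ceil(r0 + k * sr / 2)), img.shape[0])
    left = max(int(c0 - k * sc / 2), 0)
    right = min(int(np.ceil(c0 + k * sc / 2)), img.shape[1])
    return top, bottom, left, right


# Synthetic example: a dark 100x140 "fingerprint" on a white 300x300 background.
img = np.ones((300, 300))
img[100:200, 80:220] = 0.0
top, bottom, left, right = roi_box(img)
roi = img[top:bottom, left:right]
filtered = low_pass(reduce_size(roi))
print((top, bottom, left, right), filtered.shape)
```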

Feature extraction using convolutional networks:

The network architecture is chosen during the validation phase. Preferably, the network has between two and five layers. It was noted experimentally that architectures with more than five layers increase processing time without noticeable accuracy gains, whereas single-layer architectures have lower accuracy in most cases.

Preferably, max-pooling is used as the pooling operation. However, the invention may be used with other types of pooling, such as average, stochastic or pyramid pooling.

The local contrast normalization layer is not used in the preferred embodiment of the invention, because of the computational cost it adds and the small accuracy gains it brings.

Dimensionality Reduction:

Preferably, PCA with whitening is used for dimensionality reduction. The number of final dimensions should be chosen so that the retained dimensions account for at least 98% of the variance of the training data.
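With scikit-learn, the 98% criterion can be expressed by passing a fraction to `PCA(n_components=...)` together with `svd_solver="full"`. This keeps the fewest components whose cumulative explained variance exceeds that fraction, and `whiten=True` adds the whitening step. The synthetic data (eight latent factors mixed into 100 dimensions) is purely illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic training features: 8 latent factors mixed into 100 dimensions plus small noise.
latent = rng.standard_normal((300, 8))
mixing = rng.standard_normal((8, 100))
X_train = latent @ mixing + 0.01 * rng.standard_normal((300, 100))

# Keep the fewest components explaining more than 98% of the variance, then whiten them.
pca = PCA(n_components=0.98, whiten=True, svd_solver="full")
Z_train = pca.fit_transform(X_train)
print(pca.n_components_, pca.explained_variance_ratio_.sum())
```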

Other dimensionality reduction techniques may also be used, such as PCA without Whitening, Linear Discriminant Analysis (LDA) or Independent Component Analysis (ICA).

Alternatively, no dimensionality reduction technique may be used, since this step accounts for most of the system's memory consumption (about 90%). However, if the step is omitted, a drop in accuracy of between 1% and 2% is expected.

Classifiers:

When working with SVM classifiers, the kernel must be chosen. A Gaussian kernel was used in this invention. Another option among the various types of kernels is the linear kernel, which trains faster than the Gaussian kernel despite having lower accuracy in some cases [18]. The regularization parameter C and the kernel coefficient γ are chosen during the validation phase.
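For reference, the two kernels mentioned here are written below in the parameterization used by LIBSVM and by scikit-learn's `SVC`, where `gamma` is the coefficient γ:

$$K_{\mathrm{Gaussian}}(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right), \qquad K_{\mathrm{linear}}(\mathbf{x}, \mathbf{x}') = \mathbf{x}^{\top}\mathbf{x}'$$

A larger γ makes the Gaussian kernel narrower, so the decision boundary becomes more local. C sets the penalty on training samples that violate the margin, so the two parameters are usually tuned jointly on a logarithmic grid.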

Validation Phase:

In the present invention, the hyper-parameters and the processing steps that make up the system were chosen using the 5x2-fold cross-validation scheme [19], which works as follows:

1. Divide the training set into two blocks, A and B;

2. Train on A and validate on B;

3. Train on B and validate on A;

4. Repeat steps 1, 2 and 3 five times, each time using a different random division to create blocks A and B;

5. Compute the accuracy as the average of the 10 (5x2) validation results.
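The scheme above corresponds to scikit-learn's `RepeatedStratifiedKFold` with two splits repeated five times. The sketch below combines it with `GridSearchCV` to select C and γ for a Gaussian-kernel SVM. The search grid and the synthetic data are hypothetical choices. `best_score_` is the mean accuracy over the 10 validation folds, as in step 5.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (60, 10)), rng.normal(1.0, 1.0, (60, 10))])
y = np.array([0] * 60 + [1] * 60)

# 5x2-fold: two folds (blocks A and B), repeated five times with different random splits.
cv_5x2 = RepeatedStratifiedKFold(n_splits=2, n_repeats=5, random_state=0)

# Hypothetical search grid for the regularization parameter C and kernel coefficient gamma.
param_grid = {"C": [1.0, 10.0, 100.0], "gamma": [1e-3, 1e-2, 1e-1]}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv_5x2, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```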

Other cross-validation schemes, such as 10-fold, can also be used. However, the experiments showed that both schemes lead to similar choices of hyper-parameters, while 5x2-fold is faster because each training run uses only 50% of the training set instead of the 90% used by the 10-fold scheme (assuming that training is slower than classification and that both methods iterate 10 times over the dataset).

Table 1 summarizes the parameters used in each module during the validation phase.

Table 1 - List of hyper-parameters used during the validation phase.

Module | Hyper-parameter | Variations
Image pre-processing | Size reduction factor (in each dimension) | 25% to 100% of the original
Convolutional networks | |

When data augmentation is used, care must be taken not to mix samples between the training and validation sets; that is, samples derived from the same image must not be split between the training set and the validation set. Otherwise, near-identical samples would appear in both the training and validation stages, so the validation accuracy would overestimate the generalization capacity of the classifier, and the hyper-parameters selected with it may generalize poorly.
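One way to respect this constraint in the 5x2-fold scheme is to split the original images, rather than the augmented samples, into blocks A and B, as in the NumPy sketch below. The helper `five_by_two_group_splits` is hypothetical; for plain k-fold, scikit-learn's `GroupKFold` serves the same purpose.

```python
import numpy as np


def five_by_two_group_splits(groups, seed=0):
    """Yield 10 (train_idx, val_idx) pairs for 5x2-fold CV without splitting groups.

    groups[i] identifies the original image that sample i was derived from.
    """
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    unique = np.unique(groups)
    for _ in range(5):
        perm = rng.permutation(unique)
        in_a = np.isin(groups, perm[: len(perm) // 2])  # block A: half of the source images
        block_a, block_b = np.flatnonzero(in_a), np.flatnonzero(~in_a)
        yield block_a, block_b  # train on A, validate on B
        yield block_b, block_a  # train on B, validate on A


# 10 original images, each expanded into 6 augmented samples.
groups = np.repeat(np.arange(10), 6)
splits = list(five_by_two_group_splits(groups))
print(len(splits))  # 10
```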

Other Implementation Details:

The methods were written in Python, and most of the code uses functions from the NumPy, SciPy, scikit-image and scikit-learn packages. The exception is the convolutional networks, for which an efficient implementation provided by [20] was used. Any code that implements a classic convolutional network should yield the same results.

NumPy is a general-purpose array-processing package designed to manipulate large multidimensional arrays. Although NumPy is used as a Python extension, its core functions are written in C. Thus, any algorithm that can be expressed primarily as operations on vectors or arrays can run nearly as fast as its C equivalent.

An important aspect of this invention is that the methods were executed on cloud computers, where the user can rent virtual machines and pay only for the hours they are in use. Among the advantages offered by this type of service are the ability to rent preconfigured instances (with machine learning packages such as scikit-learn already installed), high availability (often greater than 99%) and instances optimized for high processing power, commonly used for web servers, on-demand processing, distributed analysis and video processing. For the system training phase, the fastest available instance was used, with 32 processing cores and 60 GB of RAM. This made it possible to train models on augmented datasets and to search exhaustively for the best hyper-parameters in just a few hours. On ordinary desktop or notebook computers, training would take days or weeks. Notably, although system training times are long, classification (prediction) times are less than half a second per image on a low-performance desktop or notebook computer (with 2 cores and 2 GB of RAM).

In the present invention, one classifier is trained for each sensor type. This makes learning easier, since each classifier learns from samples whose characteristics are similar to each other. However, it is also possible to train a single classifier for two or more sensor types. The advantage of this approach is that the classifier design (such as the choice of hyper-parameters) needs to be done only once. In that case, however, the images must be resized to a common size, since the classifier accepts only samples of the same size. Preferably, all samples are resized to the smallest size among them.
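The preferred resizing for a shared classifier can be sketched with SciPy, assuming grayscale images of different shapes. Each image is resampled bilinearly (`order=1` in `scipy.ndimage.zoom`) to the smallest height and the smallest width found in the set. The helper name `resize_to_smallest` and the three sensor resolutions are hypothetical.

```python
import numpy as np
from scipy import ndimage


def resize_to_smallest(images):
    """Bilinearly resample every image to the smallest height and width in the set."""
    target_h = min(im.shape[0] for im in images)
    target_w = min(im.shape[1] for im in images)
    resized = [
        ndimage.zoom(im, (target_h / im.shape[0], target_w / im.shape[1]), order=1)
        for im in images
    ]
    return np.stack(resized)


rng = np.random.default_rng(0)
# Images from three hypothetical sensors with different resolutions.
images = [rng.random((300, 300)), rng.random((480, 640)), rng.random((250, 400))]
batch = resize_to_smallest(images)
print(batch.shape)  # (3, 250, 300)
```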

Bibliography

[1] A. Wiehe, T. Sondrol, O. K. Olsen and F. Skarderud, "Attacking fingerprint sensors," Technical report, NISlab Authentication Laboratory, Gjøvik University College, 2004.

[2] Y. Chen and A. Jain, "Fingerprint deformation for spoof detection," Proc. IEEE Biometric Symposium, pp. 19-21, 2005.

[3] B. Tan and S. Schuckers, "Comparison of ridge- and intensity-based perspiration liveness detection methods in fingerprint scanners," Defense and Security Symposium, International Society for Optics and Photonics, 2006.

[4] P. Coli, G. L. Marcialis and F. Roli, "Fingerprint silicon replicas: static and dynamic features for vitality detection using an optical capture device," International Journal of Image and Graphics, vol. 8, no. 4, pp. 495-512, 2008.

[5] P. Lapsley, J. Lee, D. Pare and N. Hoffman, "Anti-fraud biometric scanner that accurately detects blood flow," US Patent 5,737,439, 1998.

[6] M. R. Arneson, B. L. Blan, H. M. Carim and D. W. Osten, "Biometric, personal authentication system," US Patent 5,719,950, 1998.

[7] D. Baldisserra, A. Franco, D. Maio and D. Maltoni, "Fake fingerprint detection by odor analysis," in Advances in Biometrics, Berlin Heidelberg, Springer, 2005, pp. 265-272.

[8] R. Derakhshani, S. Schuckers, L. Hornak and L. O'Gorman, "Determination of vitality from noninvasive biomedical measurement for use in fingerprint scanners," Pattern Recognition, vol. 36, no. 2, pp. 383-396, 2003.

[9] S. Parthasaradhi, R. Derakhshani, L. Hornak and S. Schuckers, "Time-series detection of perspiration as a liveness test in fingerprint scanners," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 35, no. 3, pp. 335-343, 2005.

[10] S. Schuckers and A. Abhyankar, "Detecting liveness in fingerprint scanners using wavelets: results of the test dataset," Proceedings of BioAW, pp. 100-110, 2004.

[11] A. Antonelli, R. Cappelli, D. Maio and D. Maltoni, "Fake finger detection by skin distortion analysis," IEEE Transactions on Information Forensics and Security, pp. 360-373, 2006.

[12] O. G. Martinsen, S. Clausen, J. B. Nysæther and S. Grimnes, "Utilizing electric properties of the epidermal skin layers to detect fake fingers in biometric fingerprint systems: a pilot study," IEEE Transactions on Biomedical Engineering, vol. 54, no. 5, pp. 891-894, 2007.

[13] J. Galbally, F. Alonso-Fernandez, J. Fierrez and J. Ortega-Garcia, "A high performance fingerprint liveness detection method based on quality related features," Future Generation Computer Systems, vol. 28, no. 1, pp. 311-321, 2012.

[14] A. K. Jain, Y. Chen and M. Demirkus, "Pores and ridges: high-resolution fingerprint matching using level 3 features," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 15-27, 2007.

[15] P. Coli, G. L. Marcialis and F. Roli, "Power spectrum-based fingerprint vitality detection," Proceedings of IEEE Workshop on Automatic Identification Advanced Technologies, pp. 169-173, 2007.

[16] Y. S. Moon, J. S. Chen, K. C. Chan, K. So and K. C. Woo, "Wavelet based fingerprint liveness detection," Electronics Letters, vol. 41, no. 20, pp. 1112-1113, 2005.

[17] S. Nikam and S. Agarwal, "Local binary pattern and wavelet-based spoof fingerprint detection," Int. J. Biometrics, vol. 1, no. 2, pp. 141-159, 2008.

[18] X. Jia, X. Yang, K. Cao, Y. Zang, N. Zhang, R. Dai and J. Tian, "Multi-scale local binary pattern with filters for spoof fingerprint detection," Information Sciences, vol. 268, pp. 91-102, 2013.

[19] T. Ojala, M. Pietikäinen and T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture analysis with local binary patterns," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971-987, Jul. 2002.

[20] L. Ghiani, G. L. Marcialis and F. Roli, "Fingerprint liveness detection by local phase quantization," 21st International Conference on Pattern Recognition (ICPR), IEEE, 2012.

[21] L. Ghiani, A. Hadid, G. L. Marcialis and F. Roli, "Fingerprint liveness detection using binarized statistical image features," 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), pp. 1-6, September 2013.

[22] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," NIPS, vol. 1, no. 2, 2012.

[23] D. C. Ciresan, U. Meier, J. Masci, L. M. Gambardella and J. Schmidhuber, "High-performance neural networks for visual object classification," arXiv:1102.0183, 2011.

[24] D. Ciresan, U. Meier and J. Schmidhuber, "Multi-column deep neural networks for image classification," 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3642-3649, June 2012.

[25] J. B. Zimmerman, S. M. Pizer, E. V. Staab, J. R. Perry, W. McCartney and B. C. Brenton, "An evaluation of the effectiveness of adaptive histogram equalization for contrast enhancement," IEEE Transactions on Medical Imaging, pp. 304-312, 1988.

[26] S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz, T. Greer and K. Zuiderveld, "Adaptive histogram equalization and its variations," Computer Vision, Graphics, and Image Processing, pp. 355-368, 1987.

[27] Y.-L. Boureau, J. Ponce and Y. LeCun, "A theoretical analysis of feature pooling in visual recognition," Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.

[28] M. D. Zeiler and R. Fergus, "Stochastic pooling for regularization of deep convolutional neural networks," arXiv preprint arXiv:1301.3557, 2013.

[29] D. Baldisserra, A. Franco, D. Maio and D. Maltoni, "Fake fingerprint detection by odor analysis," in Advances in Biometrics, Berlin Heidelberg, Springer, 2005, pp. 265-272.

[30] G. E. Hinton, S. Osindero and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.

[31] Y. LeCun, K. Kavukcuoglu and C. Farabet, "Convolutional networks and applications in vision," Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 253-256, May 2010.

[32] K. Jarrett, K. Kavukcuoglu, M. Ranzato and Y. LeCun, "What is the best multi-stage architecture for object recognition?," 2009 IEEE 12th International Conference on Computer Vision, IEEE, pp. 2146-2153, 2009.

[33] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.

[34] J. Jackson, A User's Guide to Principal Components, Wiley, 1991.

[35] A. Hyvärinen, J. Hurri and P. O. Hoyer, "Principal components and whitening," in Natural Image Statistics, London, Springer, 2009, pp. 93-130.

[36] "Whitening," [Online]. Available: http://ufldl.stanford.edu/wiki/index.php/Whitening. [Accessed 08 03 2014].

[37] A. Coates, A. Y. Ng and H. Lee, "An analysis of single-layer networks in unsupervised feature learning," International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.

[38] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.

[39] C.-W. Hsu, C.-C. Chang and C.-J. Lin, "A practical guide to support vector classification," 2003.

[40] T. G. Dietterich, "Approximate statistical tests for comparing supervised classification learning algorithms," Neural Computation, vol. 10, no. 7, pp. 1895-1923, 1998.

[41] G. Chiachia, [Online]. Available: https://github.com/giovanichiachia/convnet-rfw. [Accessed 17 05 2014].

[42] L. Wan, M. Zeiler, S. Zhang, Y. LeCun and R. Fergus, "Regularization of neural networks using DropConnect," Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 1058-1066, 2013.

[43] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville and Y. Bengio, "Maxout networks," arXiv preprint arXiv:1302.4389, 2013.

[44] Y. LeCun and Y. Bengio, "Convolutional networks for images, speech, and time series," in The Handbook of Brain Theory and Neural Networks, 1995, pp. 33-61.

[45] K. Jarrett, K. Kavukcuoglu, M. Ranzato and Y. LeCun, "What is the best multi-stage architecture for object recognition?," 2009 IEEE 12th International Conference on Computer Vision, pp. 2146-2153, 2009.

[46] J. Yang, K. Yu, Y. Gong and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," Computer Vision and Pattern Recognition, pp. 1794-1801, 2009.

[47] Y.-L. Boureau, F. Bach, Y. LeCun and J. Ponce, "Learning mid-level features for recognition," 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2559-2566, 2010.

Claims

1. A fingerprint liveness detection method comprising the following steps:

a) Artificial augmentation of image data;

b) Image preprocessing, comprising:

b1) Image reduction through bilinear interpolation;

b2) Filtering with a low-pass filter or a high-pass filter;

b3) Determination of the region of interest (ROI) of the image, comprising:

b3.1) Application of a morphological opening to highlight the region where the fingerprint is located;

b3.2) Calculation of the center of mass and of the standard deviations along the height and width of the image obtained in step b3.1;

b3.3) Determination of the region of interest;

b4) Image contrast equalization;

c) Extraction of image features by convolutional networks;

d) Reduction of image dimensionality and data normalization;

e) Classification of the images (convolved and reduced) as true or false using a binary classifier.

2. Method according to claim 1, characterized in that the choice of the processing steps that compose the system is made using the 5x2-fold cross-validation scheme.

3. Method according to claim 1, characterized in that step c is optionally used at least once for system training.

4. Method according to claim 1, characterized in that steps c and e are required.

5. Method according to claim 1, characterized in that steps a, b1, b2, b3, b4 and d are optionally performed.

6. Method according to claim 1, characterized in that step a of artificial data augmentation comprises techniques selected from rotations, translations, horizontal reflections, scaling, local alteration of pixel intensities, noise addition or creation of samples from 3D models of the object to be categorized, preferably horizontal translations and reflections, for the artificial creation of modified samples from the original samples.

7. Method according to claim 1, characterized in that in step b1 the images are scaled from 100% (no reduction) to 25% of their size.

8. Method according to claim 1, characterized in that in step b2 the low-pass filter is preferably implemented as the convolution of the image with a Gaussian filter, and the high-pass filter is preferably implemented as the subtraction of the low-pass-filtered image from the original image.

9. Method according to claim 8, characterized in that the filters have a standard deviation of 3 and a window size between 7x7 and 21x21 pixels, preferably 13x13 pixels.

10. Method according to claim 1, characterized in that in step b3.1 a structuring element with a size between 11x11 and 35x35 pixels, preferably 21x21 pixels, is used.

11. Method according to claim 1, characterized in that in step b4 a contrast equalization technique selected from histogram equalization, AHE and CLAHE, preferably CLAHE, is used.

12. Method according to claim 11, characterized in that a disk-shaped neighborhood with a diameter between 5 and 60 pixels, preferably 30 pixels, is used for the histogram calculation.

13. Method according to claim 1, characterized in that in step c the convolutional networks comprise at least one typical layer.

14. Method according to claim 13, characterized in that the typical layer comprises a convolution layer for the convolution operation, a nonlinearity operation, a pooling layer for performing a pooling operation, and sub-sampling.

15. Method according to claim 13, characterized in that the output images of the first typical layer are used as input to the second typical layer.

16. Method according to claim 13, characterized in that each typical layer comprises at least 32 convolutional filters of sizes between 3x3 and 15x15, preferably 7x7.

17. Method according to claim 16, characterized in that the weights of the convolutional filters are random or are learned using a learning method such as backpropagation.

18. Method according to claim 14, characterized in that the pooling operation aggregates information within small local regions R, producing a pooled feature map s through the generic equation 1, and is selected from the group of operations consisting of average pooling, max-pooling, stochastic pooling and multi-scale/spatial pyramid pooling, preferably max-pooling.

19. Method according to claim 18, characterized in that the max-pooling operation selects the maximum value of the activation units of a region R_j through equation 3, and the average pooling operation returns the average (or the sum) of the activation units of a neighborhood R_j through equation 2.

20. Method according to claim 1, characterized in that in step c the convolutional networks optionally use an additional layer between the convolution and pooling layers for the local contrast normalization operation.

21. Method according to claim 20, characterized in that the local contrast normalization layer normalizes the pixel intensities by subtractive normalization using equations 4 and 5, and by divisive normalization using equations 6 and 7.

22. Method according to claim 1, characterized in that the images resulting from step c are rasterized and concatenated to form a one-dimensional vector used as input to the binary classifier of step e.

23. Method according to claim 1, characterized in that the dimensionality reduction of step d is carried out by a technique selected from the group consisting of PCA, Kernel PCA, LDA, ICA and auto-encoders, preferably PCA, and the subsequent data normalization is performed by a technique selected from the group of data normalization techniques, such as whitening or sample normalization, preferably whitening; performing step d is optional.

24. Method according to claim 23, characterized in that the number of principal components of the PCA is between 30 and 1300.

25. Method according to claim 23 or 24, characterized in that the whitening technique divides the rotated principal components by their standard deviations according to equation 8, resulting in an identity covariance matrix.

26. Method according to claim 1, characterized in that in step e the binary classifier is selected from the group consisting of k-nearest neighbors, LDA, Naive Bayes, logistic regression, decision trees, neural networks (with a softmax function in the last layer) and SVM, preferably SVM.

27. Method according to claim 1, characterized in that in step a, artificial data augmentation techniques are used, such as rotations, translations, horizontal reflections, scaling, local changes in intensities, noise addition or creation of samples from 3D models of the object to be categorized, preferably horizontal translations and reflections, to artificially create slightly modified samples from the original samples.

28. Method according to any one of claims 1 to 27, characterized in that it is performed on cloud computers, local servers, desktop computers or standard notebooks, mobile devices, PLCs or Field Programmable Gate Arrays (FPGAs).