Real-time rotation, shift, scale and skew visual recognition system
Cross-Reference to Related Applications
The present application claims priority from the Australian provisional application 2013900288 filed on 30 January 2013 with NewSouth Innovations Pty Limited being the applicant and the contents of which are incorporated herein by reference.
Technical Field
The present invention relates to recognizing objects rapidly using artificial neural networks.
Background
Object recognition software requires a large amount of electrical power and is slow due to extensive, sequential mathematical operations. Such a large power consumption is, however, impractical for portable, battery-operated computers. The problem is further compounded because battery-operated computers typically utilise a central processing unit (CPU) that is less powerful than its mains-powered counterpart, thereby increasing computation time.
For example, a missile targeting a flying object (e.g., a plane) may encounter counter-measures (e.g., decoy flares) deployed by a plane to 'trick' the missile into hitting the flares. In such circumstances, the missile may employ object recognition software to discern the intended target from the decoys to prevent being 'tricked'. However, conventional power-hungry, slow object recognition software is impractical due to the missile's limited amount of power (i.e., battery-operated) and time-constraints in identifying objects, which means the missile may hit the flare long before any object is recognised.
In another example, rapid, low-power object recognition software can be used by autonomous robots to assist in navigation, fetching, cleaning and interacting with objects. Much of the current inability of autonomous robots to make a significant impact on people's daily lives stems from their limited ability to interact with real-world environments. A vision system employing such a rapid, low-power object recognition system facilitates interaction with the environment without increasing the size and power requirements of the autonomous robot.
Thus, a need exists to provide rapid, accurate visual recognition software that can run on battery-operated CPUs.
Summary
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
Disclosed are arrangements which seek to address the above problems by employing artificial neural networks to rapidly and accurately recognise objects. Aspects of the present disclosure provide a fast, low-power, real-time object recognition system. Aspects of the present disclosure can be used in applications where power, size and speed are of paramount importance, for example, missile defence, unmanned aerial vehicles (UAVs) (for military and civilian applications), autonomous vehicles, robotics (e.g., nano-robotics), etc.
Another aspect of the present disclosure requires no salience detection and is capable of recognizing a number of objects either simultaneously or sequentially. Another aspect of the present disclosure requires no image centring mechanism and can
dynamically reconfigure the artificial neural network to recognise different objects in a single image.
According to a first aspect of the present disclosure, there is provided a method of identifying an object by a battery-powered apparatus, the method comprising: capturing a first image of the object by an image capturing unit; transforming the first image to centre the object in a second image; pixelating the second image; projecting the pixelated image onto a static neural network (SNN), the SNN comprising a plurality of strands of neurons and a summing neuron, wherein each neuron of the plurality of strands of neurons comprises electronic components and/or software codes for generating a pulse output and the summing neuron comprises electronic components and/or software codes for summing pulse outputs, wherein the neurons in each strand are connected together and the plurality of strands of neurons are radially arranged, wherein the neuron at one end of each of the plurality of strands of neurons is connected to the summing neuron, wherein each projected pixel of the projected pixelated image is mapped onto a neuron of the plurality of strands of neurons such that an active pixel of the projected pixelated image generates the pulse output, wherein the pulse output in each of the plurality of strands is propagated toward the neuron connected to the summing neuron, and wherein the summing neuron sums the pulse output of the neurons connected to the summing neuron; generating a Time Series Signature (TSS) pattern by the summing neuron based on the sum of the pulse output; and identifying the object by a pattern identification unit based on the generated TSS pattern.
According to a second aspect of the present disclosure, there is provided a battery-powered apparatus for identifying an object, the apparatus comprising: an image capturing unit for capturing a first image of the object; an image processing unit for receiving the object of the first image, transforming the first image to centre the object in a
second image, pixelating the second image, and projecting the pixelated image; a static spiking neural network (SNN) comprises a plurality of strands of neurons and a summing neuron, wherein each neuron of the plurality of strands of neurons comprises electronic components and/or software codes for generating a pulse output and the summing neuron comprises electronic components and/or software codes for summing pulse outputs, wherein the neurons in each strand are connected together and the plurality of strands of neurons are radially arranged, wherein the neuron at one end of each of the plurality of strands of neurons is connected to the summing neuron, the SNN configured for receiving the projected pixelated image, wherein each projected pixel of the projected pixelated image is mapped onto a neuron of the plurality of strands of neurons such that an active pixel of the projected pixelated image activates the neuron, and wherein the neuron activation in each of the plurality of strands is propagated toward the neuron connected to the summing neuron; summing the pulse output of the neurons connected to the summing neuron; and generating a Time Series Signature (TSS) pattern by the summing neuron based on the sum of the pulse output; and a pattern identification unit for identifying the object based on the generated TSS pattern.
According to another aspect of the present disclosure, there is provided a method of identifying an object by a battery-powered apparatus, the method comprising: capturing an image of the object by an image capturing unit; pixelating the image; projecting the pixelated image onto a dynamic neural network (DNN), the DNN comprising a plurality of neurons, wherein each neuron comprises electronic components and/or software codes for generating an electrical output, wherein each pixel of the projected pixelated image is mapped onto a neuron of the plurality of neurons such that an active pixel generates the electrical output; distributing the generated electrical outputs of the neurons amongst neurons which surround the neurons generating the electrical output according to a Gaussian function; normalising the distributed electrical outputs in the neurons; thresholding the normalised distributed electrical outputs, wherein the electrical output of each neuron of the plurality of neurons is based on the distributing, the normalising and the thresholding steps; connecting adjacent neurons of the plurality of neurons based on the electrical outputs of the adjacent neurons, wherein a neuron of the adjacent neurons with a lower electrical output is unidirectionally connected to a neuron of the adjacent neurons with a higher electrical output; propagating the electrical outputs in accordance with the connections established in the connecting step; generating a Time Series Signature (TSS) pattern by a neuron of the plurality of neurons with the highest electrical output by summing the electrical outputs; and identifying the object by a pattern identification unit based on the generated TSS pattern.
According to another aspect of the present disclosure, there is provided a battery-powered apparatus for identifying an object, the apparatus comprising: an image capturing unit for capturing an image of the object; an image processing unit for receiving the object of the image, pixelating the image, and projecting the pixelated image; a dynamic neural network (DNN) comprising a plurality of neurons, wherein each neuron comprises electronic components and/or software codes for generating an electrical output, the DNN configured for receiving the projected pixelated image, wherein each pixel of the projected pixelated image is mapped onto a neuron of the plurality of neurons such that an active pixel generates an electrical output; distributing the generated electrical outputs of the neurons amongst neurons which surround the neurons generating the electrical output according to a Gaussian function; normalising the distributed electrical outputs in the neurons; thresholding the normalised distributed electrical outputs, wherein the electrical output of each neuron of the plurality of neurons is based on the distributing, the normalising and the thresholding steps; connecting adjacent neurons of the plurality of neurons based on the electrical outputs of the adjacent neurons, wherein a neuron of the adjacent neurons with a lower electrical output is unidirectionally connected to a neuron of the adjacent neurons with a higher electrical output; propagating the electrical outputs in accordance with the connections established in the connecting step; and generating a Time Series Signature (TSS) pattern by a neuron of the plurality of neurons with the highest electrical output by summing the electrical outputs; and a pattern identification unit for identifying the object based on the TSS pattern.
According to another aspect of the present disclosure there is provided a computer program product including a computer readable medium having recorded thereon a computer program for implementing any one of the methods described above.
Other aspects of the invention are also disclosed.
Brief Description of the Drawings
At least one embodiment of the present invention is described hereinafter with reference to the drawings, in which:
Fig. 1 shows a real-world application of an object recognition system;
Fig. 2 shows the components of the object recognition system of Fig. 1;
Fig. 3A is a first implementation of the object recognition system using a static neural network;
Fig. 3B is a flow diagram for a method for processing an image for use by the object recognition system of Fig. 3A;
Fig. 4 shows an example of centring an object on an image for use by the object recognition system of Fig. 3A;
Fig. 5A is a static neural network of the object recognition system of Fig. 3A;
Fig. 5B shows an alternative pattern for the static neural network of Fig. 5A;
Fig. 6 is a strand of neurons of the static neural network of Fig. 5A;
Fig. 7A is a schematic diagram for a neuron of the static neural network of Fig. 5A;
Fig. 7B is an implementation of the neuron of Fig. 7A on an analogue circuit;
Fig. 7C is an implementation of the summing neuron of Fig. 5A;
Figs. 8A and 8B are examples of Time Series Signatures (TSS) generated by the object recognition system of Fig. 2;
Fig. 9A is a second implementation of the object recognition system using a dynamic neural network;
Fig. 9B is a flow diagram for a method for processing an image for use by the object recognition system of Fig. 9A;
Fig. 10 is the dynamic neural network of the object recognition system of Fig. 9A;
Figs. 11A to 11C are the neurons for use by the dynamic neural network of Fig. 10;
Fig. 12 is a flow diagram for a method for use by the dynamic neural network of Fig. 10 to generate a Time Series Signature for an object;
Figs. 13A to 13E are neuron connectivity of the dynamic neural network for performing Gaussian Filtering;
Figs. 14A to 14D are examples of an image undergoing Gaussian Filtering iterations;
Fig. 15 is a flowchart of a method for generating a layer of neurons with a vector field;
Fig. 16 is an implementation of the layer of neurons with the generated vector field of Fig. 15; and
Fig. 17 shows a vector field of the dynamic neural network to recognise an object.
Detailed Description
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
Fig. 1 shows a missile 1 having an image capturing unit 8 and an object recognition system (ORS) 10 for identifying objects (e.g., a jet 3 and flare counter-measures 5). The missile 1 also has a missile control unit (MCU) 11 that controls the flight path of the missile 1.
As the missile 1 propels toward the intended target (i.e., the jet 3), the image capturing unit 8 captures images 6 of the area toward which the missile 1 is heading. At the same time, the jet 3 may deploy flare counter-measures 5 to 'trick' the missile 1 into hitting the flares 5. In such circumstances, the image 6 captured by the image capturing unit 8 includes, inter alia, the jet 3 and the flare counter-measures 5.
The ORS 10 processes the captured image 6 and identifies each object 3, 5 in the captured image. The processing of captured images 6 and recognising objects 3, 5 by the ORS 10 is described in detail hereinafter.
The MCU 11 uses the identification of the objects 3, 5 to determine a flight path toward the intended target jet 3. Thus, the missile 1 avoids being 'tricked' into hitting the flare counter-measures 5.
The application of the ORS 10 in Fig. 1 is an example. The ORS 10 may be deployed in other applications such as, inter alia, a robotic eye for rapid identification of objects, robotic and UAV navigation systems for recognising key features in visual scenes, mobile security systems where power and size are important considerations, remote scientific monitoring where access is limited and battery changes are restrictive, etc.
Fig. 2 is an implementation of the ORS 10 including an image processing unit 12, a neural network (NN) 14 and a pattern identification unit 16. The image processing unit 12 receives a captured image 6 from the image capturing unit 8 and processes the received image 6 into an image format that the NN 14 can process. The processed image is then projected onto the NN 14.
The NN 14 processes the projected image to output a Time Series Signature (TSS). A TSS is a time-series pattern identifying an object's features and is unique to different objects.
The pattern identification unit 16 is capable of matching an identified TSS with a TSS stored in a database. In this implementation, a Polychronous Network (PCN) 16 is used. The PCN 16, which is an artificial neural network, is capable of firing a neuron indicating that a time-series pattern (i.e., the TSS) is known when the PCN 16 recognizes that the time-series pattern has been previously identified. The functionality of the PCN 16 is similar to a content addressable memory.
Alternatively, the pattern identification unit compares the received TSS with a TSS database to determine the identity of the object 3, 5.
In the present disclosure, two implementations of the ORS 10 are described. The first implementation utilises a static neural network (SNN) 14A as the NN 14, shown in Fig. 3A. The second implementation utilises a dynamic neural network (DNN) 14B as the NN 14, shown in Fig. 9A.
Fig. 3B shows a flow chart of a method 300 for processing captured images 6 by the image processing unit 12. The method 300 commences with step 302 by centring the object 3, 5 on the captured image 6. As mentioned hereinbefore, images 6 are captured by the image capturing unit 8, which is, for example, an analogue camera, a digital camera, etc.
The image processing unit 12 employs a salience detection method to detect objects 3, 5 in the captured image 6 and to centre each object 3, 5 in an image. The salience detection method processes the captured image 6 to recognise areas in the image 6 belonging to an object 3, 5. The salience detection method determines areas of activity in a captured image 6 by transforming each object 3, 5 into a circle, which the image processing unit 12 determines as the object 3, 5.
Fig. 4 shows the result of the image processing unit 12 employing a salience detection method to identify areas in the image 6 belonging to objects 3, 5 and creating separate images 6A to 6D of each object 3, 5 so that each separate image 6A, 6B, 6C, 6D has one object 3, 5 centred on each image 6A, 6B, 6C, 6D. Each separate image 6A, 6B, 6C, 6D is then processed by the SNN 14A and the PCN 16 to identify each object 3, 5. Some examples of salience detection methods are discussed in "A Model of Saliency-Based Visual Attention for Rapid Scene Analysis," by L. Itti, C. Koch, and E. Niebur in IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 11, November 1998; and "Modelling Visual Attention via Selective Tuning," by J.K. Tsotsos, S.M. Culhane, W.Y.K. Wai, Y.H. Lai, N. Davis, and F. Nuflo in Artificial Intelligence, vol. 78, no. 1-2, pp. 507-545, Oct. 1995, the contents of which are incorporated herein by reference in their entirety. Alternatively, the salience detection method is performed by the SNN 14A.
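The centring of step 302 can be sketched in software. The sketch below is purely illustrative and is not one of the salience detection methods cited above: it treats above-threshold pixels as salient and shifts the image so that their centroid lands at the image centre; the function name and threshold value are assumptions.

```python
def centre_object(image, threshold=0.5):
    """Shift the image so that the centroid of salient (above-threshold)
    pixels lands at the image centre; a crude stand-in for salience
    detection followed by centring."""
    h, w = len(image), len(image[0])
    pts = [(y, x) for y in range(h) for x in range(w)
           if image[y][x] > threshold]
    cy = sum(p[0] for p in pts) / len(pts)       # centroid row
    cx = sum(p[1] for p in pts) / len(pts)       # centroid column
    dy, dx = round(h / 2 - cy), round(w / 2 - cx)
    # Wrap-around shift that moves the centroid to the centre.
    return [[image[(y - dy) % h][(x - dx) % w] for x in range(w)]
            for y in range(h)]

# A 2x2 object in the top-left corner of a 6x6 image is moved
# toward the centre of the frame.
img = [[0.0] * 6 for _ in range(6)]
for y in range(2):
    for x in range(2):
        img[y][x] = 1.0
centred = centre_object(img)
```

A production implementation would instead use one of the cited salience methods, or the laser-targeting and mechanical arrangement described below.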
In an alternative implementation, the image processing unit 12 in conjunction with a salience detection method and the image capturing unit 8 capture multiple images 6A, 6B, 6C, 6D, with each captured image 6A, 6B, 6C, 6D having an object 3, 5 centred in the captured image 6A, 6B, 6C, 6D. In this implementation, a laser targeting system determines the range of an object and a mechanical system moves the image capturing unit 8 until the object 3, 5 is centred in each captured image 6A, 6B, 6C, 6D. Processing continues at step 304.
At step 304, each image 6A, 6B, 6C, 6D is pixelated such that each pixel can be mapped onto a neuron of the SNN 14A. A one-to-one mapping of a pixel to a neuron of a neural network is called retinotopic mapping. The pixels of a pixelated image can be classified into active and passive pixels. Active pixels are pixels which represent a part of
an object's feature. Passive pixels are pixels that do not represent any part of an object's feature. For example, in a black and white image of a fish, the pixels representing the outline of the fish are black whilst the pixels representing the background of the image are white. In this example, the black pixels are the active pixels and the white pixels are the passive pixels.
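In software, the classification above reduces to thresholding pixel intensities into "1"s and "0"s. A minimal sketch under the black-and-white fish example (the function name and threshold are illustrative):

```python
def pixelate_to_binary(image, black_threshold=0.5):
    """Map each greyscale pixel to 1 (active: part of the object's
    feature, e.g. a black outline pixel) or 0 (passive: background)."""
    return [[1 if px < black_threshold else 0 for px in row]
            for row in image]

# Dark pixels (the outline) become active; light pixels become passive.
binary = pixelate_to_binary([[0.1, 0.9],
                             [0.9, 0.1]])
```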
Before each of the images 6A, 6B, 6C, 6D is pixelated, the images 6A, 6B, 6C, 6D may be filtered by a high-pass filter to accentuate the outline of the object 3, 5 to be identified. The neurons and other details of the SNN 14A are described further hereinafter in relation to Figs. 5 to 8. The method 300 then advances to step 306.
At step 306, the pixelated image of step 304 is projected onto the SNN 14A.
Before the pixelated image is projected onto the neurons of the SNN 14A, the pixelated image can be further processed by other filters (e.g., high-pass filter, Gabor filter, etc.) to further accentuate the features of the captured image 6, 6A, 6B, 6C, 6D.
In a hardware implementation, the image processing unit 12 outputs either a current or a voltage for each active pixel, whilst a passive pixel has no output.
In a software implementation, the image processing unit 12 outputs a "1" for an active pixel and a "0" for a passive pixel. The method 300 concludes at the completion of step 306.
Fig. 5A shows an SNN 14A for processing the pixelated image generated by the output of the image processing unit 12. The SNN 14A comprises a number of neuron strands 501 and a summing neuron 504. Each strand 501 comprises a number of neurons 502, connected together by connections 503, arranged radially as shown in Fig. 5A. Fig. 5B shows an alternative pattern for arranging the strands 501.
A neuron 502 is a circuit or a software program for generating a pulse if an active pixel is mapped onto the neuron. The neurons 502 used by the SNN 14A are spiking neurons, which means that each neuron 502 outputs a bi-stable pulse (e.g., a high or a low voltage/current output, a binary "1" or "0" output). Implementations of the neurons 502 are described in detail in relation to Fig. 7.
Fig. 6 shows a connection of one strand 501 of the plurality of neurons 502 connected to a centre neuron 505 and a summing neuron 504 of the SNN 14A of Fig. 5A. One end of each of the plurality of neuron strands 501 is connected to the summing neuron 504. In the SNN 14A depicted in Fig. 5A, the neuron located at the periphery of the SNN 14A is connected to the summing neuron 504. The neuron located at the periphery of the SNN 14A is referred to as 502P hereinafter. The other end of the plurality of neuron strands 501, which in the SNN 14A depicted in Fig. 5A is located at the centre of the SNN 14A, is then connected to the centre neuron 505. Each neuron 502, 504, 505 is connected to
another neuron 502, 504, 505 by the connection 503, which is a collection of wires for connecting the inputs and outputs of corresponding neurons 502, 504, 505.
When pixels of a pixelated image are projected onto the SNN 14A, neurons 502 corresponding to active pixels (i.e., pixels with output of "1") are activated. The operation of the SNN 14A is governed by a time-clock (not shown) such that, every Δt, an activated neuron 502 propagates the activation to a subsequent neuron on the strand 501. In the SNN 14A of Fig. 5A, the neuron activation is propagated to the periphery neurons 502P. Thus, after nΔt, the periphery neuron 502P is activated and the summing neuron 504 sums the total number of periphery neurons 502P activated after nΔt. The series of sums of activated periphery neurons 502P at different intervals of Δt is called a Time Series Signature (TSS).
For example, in Fig. 5A, a neuron 510 is turned on as the corresponding pixel is active. After 1Δt, as determined by a time clock (not shown), neuron 512 is activated. After 5Δt, the periphery neuron 514 is activated. At 5Δt, the summing neuron 504 sums the total number of the periphery neurons 502P being activated at time 5Δt.
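The propagation and summing described above can be sketched in software. The sketch below is illustrative only and is not the circuit of Figs. 7A to 7C: each strand is modelled as a list of pixel activations ordered from centre to periphery, the activations shift one neuron outward every Δt, and the summing neuron counts how many pulses reach the periphery at each step.

```python
def snn_tss(strands):
    """Generate a Time Series Signature from radially arranged strands.

    strands: one list of 0/1 pixel activations per strand, ordered from
    the centre neuron (index 0) to the periphery neuron (last index).
    A pulse at distance d from the periphery arrives there after d
    intervals of Δt, so at step t the summing neuron counts the strands
    whose neuron at index n-1-t was initially active."""
    n = len(strands[0])                 # neurons per strand
    return [sum(strand[n - 1 - t] for strand in strands)
            for t in range(n)]

# Two strands of three neurons each: both periphery neurons deliver a
# pulse at t = 0, then earlier neurons' pulses arrive on later steps.
tss = snn_tss([[1, 0, 1],
               [0, 1, 1]])
```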
In an alternative implementation, the centre neuron 505 is omitted and the plurality of neuron strands 501 is not connected at the centre of the SNN 14A. This implementation does not affect the output TSS materially.
In another alternative implementation, the summing neuron 504 is located at the centre neuron 505 and the propagation of neuron activation is in the opposite direction
(i.e., toward the centre of the SNN 14A). The TSS generated by such an implementation is the reverse of the TSS generated by the SNN 14A of Fig. 5A.
Fig. 7A is a schematic diagram of the neuron 502 including an input S1 for receiving a previous neuron's pulse output p, a pixel input S2 for receiving a signal of a corresponding pixel, a threshold input value Y, a control signal W and a pulse output p. The inputs S1 and S2 are either a voltage or a current for a hardware implementation or a binary "0" or "1" for a digital implementation. A pulse output p is generated if either input S1 or S2 is greater than the threshold input value Y. The time taken by the neuron 502 to output a pulse output p when either input S1 or S2 changes is modulated by the control signal W. The value of the user-determined control signal W determines the response time of the neuron 502. This implementation is called a leaky integrate-and-fire (LIF) neuron. The pulse output p is used as an S1 input for a subsequent neuron 502 to effect the propagation of neuron activation.
Fig. 7B is an analogue implementation of the neuron 502 including two neuron activation circuits 702 and 704 and a comparator circuit 710. Each of the neuron activation circuits 702, 704 comprises a MOSFET switch 703 and an SR (Set-Reset) latch 706 and
708, respectively. Each of the SR latches 706 and 708 comprises an S input, an R input
and a Q output. The S input is connected to a previous neuron output S1 for the SR latch 706, whilst the S input for the SR latch 708 is connected to a corresponding pixel for the neuron 502. The R input is connected to the output of the comparator circuit 710. The Q output is connected to the MOSFET switch 703.
If either input S1 or S2 is activated (i.e., either a previous neuron's pulse output p or an active pixel is received, respectively), the Q output is activated (i.e., a "1" output) and the MOSFET switch 703 is activated, thereby allowing current X to flow; at voltage node Vmem, the currents X and W are added. In the circuit implementation, the current W flows out of the voltage node Vmem and, hence, the net current is X - W. The control signal W is implemented as a leakage current such that input current X persists on the voltage node Vmem for a limited amount of time.
The comparator circuit 710 comprises a comparator 712, and associated circuitry, having inputs from a threshold value Y and the voltage node Vmem. The comparator 712 compares the value of Vmem and Y. If the value of Vmem is higher than Y, a pulse output p is generated. Otherwise, no pulse output p is generated. The pulse output p is connected to a Reset MOSFET switch 708 and the R input of the SR latches 706, 708 so that a generated pulse output p resets the voltage node Vmem to zero and resets the SR latches 706, 708. The voltage node Vmem and the outputs Q are therefore reset to zero after the comparator 712 outputs a pulse output p to ensure the Vmem and the SR latches are prepared for the next propagation of the neuron activation. The implementation of Fig. 7B is one example of many possible implementations of a LIF neuron in analogue circuits.
In a digital implementation, the SNN 14A is formed by a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC). Hence, the LIF neuron can be implemented by programming the FPGA or the ASIC as an adder circuit. The equation for implementing the LIF neuron is:
Vmem = Vmem·z⁻¹ + (X − W)/C; if Vmem > Y, then p = 1, else p = 0 (eqn. 1)
C is a constant scaling factor that can be implemented by a Look-Up Table (LUT). The LUT lists all possible divisions with associated results such that the program executed by the FPGA refers to the LUT to output a result for the division by C. Y is the threshold input value, p is the output of eqn. 1, X is either the pixel input value or an input from a pulse output p of a previous neuron 502, and W is a user-determined leak term that determines the responsiveness of the neuron 502. The pulse output p is generated with a compare function.
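A software rendering of eqn. 1 follows as a sketch of the adder-based LIF update; the post-pulse reset of Vmem is carried over from the analogue circuit of Fig. 7B (the function and variable names are illustrative).

```python
def lif_step(vmem, x, w, c, y):
    """One discrete LIF update per eqn. 1: accumulate (X - W)/C onto
    Vmem; if Vmem exceeds the threshold Y, emit a pulse (p = 1) and
    reset Vmem to zero, as in the analogue circuit of Fig. 7B."""
    vmem = vmem + (x - w) / c
    if vmem > y:
        return 0.0, 1       # fire and reset
    return vmem, 0          # no pulse this step

# With input X = 0.4, leak W = 0.1 and C = 1, the neuron charges by
# 0.3 per step and fires on the second step (threshold Y = 0.5).
v, pulses = 0.0, []
for _ in range(3):
    v, p = lif_step(v, x=0.4, w=0.1, c=1.0, y=0.5)
    pulses.append(p)
```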
Thus, in the digital implementation, the image processing unit 12 outputs a predetermined bit stream format to be input to the FPGA or the ASIC via a Serial
Peripheral Interface (SPI) or the like. The FPGA or the ASIC is then able to match the pixels to the corresponding neurons 502 based on the received bit stream from the image processing unit 12.
Fig. 7C shows an implementation of the summing neuron 504. The summing neuron 504 is similar to the neurons 502, except the summing neuron 504 has no comparator and associated pulse generation. The summing neuron 504 includes n number of neuron activation circuits 740 (i.e., 740A, 740B,..., 740N), a capacitor C and a leak current nW.
The number n of neuron activation circuits 740 corresponds to the number of periphery neurons 502P connected to the summing neuron 504. Each neuron activation circuit 740 includes an SR latch 742, a MOSFET switch 744 and a current X. The SR latch 742 comprises an S input, an R input and a Q output. The S input is connected to one of the periphery neurons 502P, whilst the R input is connected to a global clock (not shown) that periodically sends a reset signal to the SR latches 742. The SR latches 742 are reset to prepare the SR latches 742 to sum the neuron activation of the periphery neurons 502P at the next Δt. The Q output is connected to the MOSFET switch 744.
If input S is activated (i.e., a previous neuron's pulse output p is received), the Q output is activated (i.e., a "1" output) and the MOSFET switch 744 is activated, thereby allowing current X to flow. At voltage node Vmem, the currents X from the number n of activated periphery neurons 502P are added on the capacitor C. Thus, Vmem gives the TSS output of the SNN 14A. The current nW is a leakage current similar to that described in the LIF neuron of Fig. 7B hereinbefore and is determined such that the summed currents X persist on the voltage node Vmem for a limited amount of time, so that the sum has decayed before the subsequent summing of currents X at the next Δt.
Time normalisation is an additional step that can be performed when the pixelated image is projected onto the SNN 14A, and is used to normalise different size images on the SNN 14A in order to generate a normalised TSS. To normalise time, the total current drawn by the neurons 502 when the pixelated image is projected onto the SNN 14A is measured. Alternatively, the current drawn by each neuron 502 is individually monitored.
A large image translates to a high activation level in the neurons 502. In such circumstances, the propagation of neuron activation along the strands 501 of the SNN 14A needs to be accelerated by decreasing the level of the leak current W. On the other hand, a small image translates to a low activation level in the neurons 502. In such circumstances, the leak current W is increased to slow down the propagation of neuron activation. Thus, the measure of neuron activity is proportional to the current drawn by the neurons 502.
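As a sketch, the inverse relationship between measured activity and leak current might be expressed as follows; the linear scaling and the reference value are assumptions, and any monotonically decreasing mapping would serve the same purpose.

```python
def normalised_leak(activity, base_leak=1.0, reference_activity=100.0):
    """Scale the leak current W inversely with measured neuron activity
    (e.g. total current drawn): a large image yields high activity and
    a smaller W (faster propagation); a small image yields the reverse."""
    return base_leak * reference_activity / max(activity, 1e-9)

fast_w = normalised_leak(200.0)   # large image: reduce the leak
slow_w = normalised_leak(50.0)    # small image: increase the leak
```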
As mentioned hereinbefore, the Time Series Signature (TSS) is the output of the summing neuron 504. Figs. 8A and 8B are examples of the TSS generated by the summing neuron 504. The TSS is unique to different objects, is easy to store and is an efficient way of representing 2-D images in hardware. As shown in Fig. 8A, rotating an object produces a very similar TSS. Similarly, Fig. 8B shows that similar objects with different sizes also generate a similar TSS. Hence, the recognition of objects by the ORS 10 is not dependent on the object's rotation or scale. The TSS generated by the SNN 14A is then input to the pattern identification unit 16, which in this implementation is a PCN.
The PCN 16 is a pattern identification unit capable of matching an identified pattern with a pattern stored in a database of the PCN 16. Each pattern stored in the database is associated with a particular object. In one implementation, the PCN 16 is an artificial neural network that is capable of learning and recalling spatio-temporal patterns (e.g., the TSS). The PCN is described in several articles, one entitled "Polychronization: Computation with Spikes" by E.M. Izhikevich and the other entitled "An analogue VLSI implementation of polychronous spiking neural networks" by Runchun (Mark) Wang et al., the contents of which are incorporated herein by reference in their entirety.
The PCN 16 functions by learning patterns of TSS and identifying that a TSS pattern presented to the PCN 16 is associated with a previously learned TSS pattern. The PCN can either be trained by presenting the PCN 16 with the TSS of the object to be learned (i.e., supervised learning), or the PCN 16 can learn unsupervised. The supervised method is favoured due to the speed with which patterns can be learned. The unsupervised method is not as accurate as the supervised method. However, the unsupervised method allows an autonomous apparatus (e.g., autonomous vehicle, autonomous robot, etc.) to learn independently.
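The database comparison mentioned earlier as an alternative to the PCN can be sketched as a nearest-match search; the distance measure and names below are illustrative, and the PCN itself instead learns and recalls the spatio-temporal patterns directly.

```python
def identify_object(tss, tss_database):
    """Return the label of the stored TSS closest to the generated one,
    using a simple sum-of-squared-differences distance."""
    def distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(tss_database,
               key=lambda label: distance(tss, tss_database[label]))

# A noisy signature is still matched to the nearest stored object.
database = {"jet": [2, 1, 1], "flare": [0, 3, 0]}
label = identify_object([2, 1, 0], database)
```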
Fig. 9A is an ORS 10 using the DNN 14B. The DNN 14B is capable of detecting an object 3, 5 from a captured image 6 without having to centre the object 3, 5 on the image 6. Further, the DNN 14B may utilise non-spiking neurons which allows a more detailed image to be processed.
Fig. 9B shows a flow chart of a method 900 for processing of captured images 6. The method 900 commences with step 902 where the captured image 6, 6A, 6B, 6C, 6D is pixelated such that each pixel can be mapped onto a neuron 902 of the DNN 14B. As mentioned hereinbefore, images 6 are captured by the image capturing unit 8 which is, for example, an analogue camera, a digital camera, etc. Additionally, captured images 6, 6A, 6B, 6C, 6D may be further processed (e.g., high-pass filter, Gabor filter) before or after pixelation. The further processing is performed to accentuate particular features of the captured image 6, 6A, 6B, 6C, 6D (e.g., horizontal lines, vertical lines, diagonals, etc.). Step 902 then proceeds to step 904.
At step 904, the image processing unit 12 outputs the pixels of the pixelated image so that each pixel can be mapped onto a neuron 902 of the DNN 14B. In a hardware implementation, each active pixel is represented by either a current or a voltage and a passive pixel has no output. In a software implementation, each active pixel is represented by a "1", whilst a passive pixel is represented by a "0". Each pixel may also be
represented by a binary word (e.g., a binary number of more than 1 digit) if non-spiking neurons are utilised by the DNN 14B. The method 900 concludes at step 904.
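By way of a non-limiting software illustration, the pixelation and mapping of steps 902 and 904 may be sketched as follows. The grid size, intensity cut-off and function name are illustrative assumptions, not taken from the specification:

```python
import numpy as np

def pixelate_for_dnn(image, grid=32, cutoff=0.5):
    """Map a greyscale image onto a grid of neuron inputs.

    Active pixels become "1" and passive pixels "0", as in the
    software implementation described above. `grid` and `cutoff`
    are illustrative parameters only.
    """
    h, w = image.shape
    # Split the image into grid x grid blocks, one block per neuron.
    ys = np.linspace(0, h, grid + 1, dtype=int)
    xs = np.linspace(0, w, grid + 1, dtype=int)
    out = np.zeros((grid, grid), dtype=int)
    for i in range(grid):
        for j in range(grid):
            block = image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            # A block whose mean intensity exceeds the cut-off
            # activates the corresponding neuron.
            out[i, j] = 1 if block.mean() > cutoff else 0
    return out
```

Each "1" in the resulting grid would drive the corresponding neuron 902; a hardware implementation would instead present a current or voltage per active pixel.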
Fig. 10 shows an example of a grid like two-dimensional (2-D) sheet of neurons 902 of the DNN 14B. No shape is imposed on the 2-D sheet of neurons 902, unlike the SNN 14A. A projected pixelated image, which is the output of the method 900, is mapped onto the 2-D sheet of neurons 902.
Fig. 11A shows a schematic diagram of a neuron 902. The neuron 902 includes a pixel input X, a threshold value Y and a leakage current W.
In a hardware implementation, the pixel input X is either a voltage or a current for active pixels. The threshold value Y is either a predetermined current or a predetermined voltage. If the pixel input X is greater than the threshold value Y, a pulse output p is generated. The width of the pulse p is predetermined. The time between a change in the pixel input X and a pulse output p being generated is modulated by the leakage current W. The leakage current W is controlled externally to the neuron 902 and is not shown here. The neuron 902 is called a leaky integrate-and-fire (LIF) neuron. There are two types of neuron 902 that can be used by the DNN 14B: a spiking neuron 902A and a non-spiking neuron 902B.
Fig. 11B is an analogue circuit of a spiking neuron 902A having a pixel input X, a threshold value Y and a leakage current W. The pixel input X is a current, the threshold value Y is a voltage, the leakage current W is a current and Vmem is a voltage node where the currents of the pixel input X and the leakage current W are added. The operation of the spiking neuron 902A is similar to the operation of the neuron 502 as described in relation to Fig. 7B. Fig. 11B is one of many implementations of a LIF neuron.
In a digital implementation (e.g., on an FPGA), the LIF neuron can be implemented as per equation 1, as described hereinbefore.
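Equation 1 is set out earlier in the specification and is not reproduced here; the sketch below shows one common digital form of a LIF neuron update, with variable names assumed for illustration:

```python
def lif_step(v_mem, x_in, leak, threshold):
    """One update of a leaky integrate-and-fire (LIF) neuron.

    A generic digital LIF form (assumed, for illustration): integrate
    the input, subtract the leak, and emit a pulse when the membrane
    value crosses the threshold, resetting the membrane.
    Returns (new membrane value, spike flag).
    """
    v_mem = max(0.0, v_mem + x_in - leak)
    if v_mem > threshold:   # removing this comparison ("if" statement)
        return 0.0, 1       # yields the non-spiking neuron variant
    return v_mem, 0
```

With an input of 0.5 per step, a leak of 0.1 and a threshold of 1.0, the neuron fires on the third step; a larger leak lengthens the time to fire, consistent with the leakage current W modulating propagation delay.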
Fig. 11C is an analogue circuit of a non-spiking neuron 902B having a pixel input A and a capacitor C. The pixel input A is a current that is received from a corresponding pixel and is input into the capacitor C. A leakage current B draws current from the capacitor C. The current B, similar to the current W of Fig. 11B, is set external to the neuron 902 and determines the rate at which the voltage node Vmem charges. The current A is the sum of currents received from other neurons 902 connected to the neuron 902. The currents of the neurons 902 in the analogue implementation of the DNN 14B are added by summing the currents at a voltage node Vmem. The voltage node Vmem, which is the sum of currents A and B, is the output of the non-spiking neuron 902B.
The digital implementation of the non-spiking neuron 902B is the same as the spiking neuron 902A but without the comparison function (i.e., the "if" statement in equation 1 is removed).
Fig. 12 is a flow diagram of a method 1200 for the DNN 14B to generate a TSS for the objects 3, 5 of the captured image 6. The method 1200 commences at step 1202 by mapping the projected pixelated image of the image processing unit 12 onto a first layer of 2-D sheet of neurons 902 of the DNN 14B. The mapping of the pixels onto the first layer of neurons 902 transforms the pixelated image into activation of neurons 902. Step 1202 then proceeds to step 1204.
At step 1204, the activated neurons 902 are filtered using a Gaussian function or an approximate Gaussian function. The Gaussian filtering operation is used to effectively 'blur' the pixelated image that has been projected onto the neurons 902.
Gaussian Filtering on the DNN 14B is performed in two steps. The first step is to retinotopically map the first layer of neurons 902 to a second layer of neurons. The second step is to distribute the current in each neuron in the second layer to multiple surrounding neurons in the second layer. The connections between each neuron in the second layer and the multiple surrounding neurons in the second layer are weighted by the distance between the neuron and each of the surrounding neurons. Thus, Gaussian Filtering is performed by distributing the current of a neuron in the second layer onto multiple surrounding neurons in the second layer depending on the distance-weighted scale.
Fig. 13A shows parallel pathways between the neurons 902 in the first layer to the neurons 910 in the second layer. The neurons 902 employed can be either a spiking neuron 902A or a non-spiking neuron 902B. Similarly, the neurons 910 employed can be either a spiking neuron 910A or a non-spiking neuron 910B.
In this implementation, each neuron 902 in the first layer is retinotopically mapped to neurons 910 in the second layer with connectivity that is weighted with the inverse of distance to effect the first step of the Gaussian Filtering process. Further, each neuron 902 in the first layer is further connected to multiple neurons 910 in the second layer to effect the second step of the Gaussian Filtering process. This implementation accelerates the first iteration of the Gaussian Filtering as the two steps are performed simultaneously. The equation for connecting each neuron 902 to neurons 910 is:
Cw = 1/d (eqn. 2)

where Cw is the connection weight and d is the distance between the neuron 902 and the neuron 910. Thus, Cw determines the amount of current input to the neurons 910 from the neuron 902.
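A software sketch of this distance-weighted spreading is given below. The weight 1/(1 + d) is used as an illustrative stand-in for the inverse-distance weighting of eqn. 2 so that the self-weight (d = 0) remains finite; the radius and function name are likewise assumptions:

```python
import numpy as np

def spread_current(layer, radius=2):
    """Distribute each neuron's current to the surrounding neurons,
    weighted by distance, to approximate one Gaussian filtering step.

    The weight 1/(1 + d) stands in for eqn. 2's inverse-distance
    weighting, with a finite self-weight at d = 0.
    """
    n, m = layer.shape
    out = np.zeros_like(layer, dtype=float)
    for i in range(n):
        for j in range(m):
            if layer[i, j] == 0:
                continue
            # Collect target neurons and their distance weights.
            targets = []
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ti, tj = i + di, j + dj
                    if 0 <= ti < n and 0 <= tj < m:
                        d = np.hypot(di, dj)
                        targets.append((ti, tj, 1.0 / (1.0 + d)))
            # Normalise the weights so the total current is conserved.
            total = sum(w for _, _, w in targets)
            for ti, tj, w in targets:
                out[ti, tj] += layer[i, j] * w / total
    return out
```

A single active neuron is thereby 'blurred' over its neighbours, with the nearest neighbours receiving the most current.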
Fig. 13B shows an example implementation of the parallel connectivity between neurons 902 and neurons 910 using non-spiking neurons 902B to implement the Gaussian Filtering. A neuron 902 spreads its output current to adjacent neurons 910. The size of the transistors M of the neurons 910 that sink the current from the neuron 902 is scaled with the distance d between those neurons. Thus, the current/value of a neuron 902 is spread out over a number of neurons 910, resulting in a 'blurring' of the pixelated image. The distributed current in the neurons 910 is then returned to each of the neurons 910 itself by the transistor RT, which completes a first iteration of the Gaussian Filtering. Each of the returned currents is further processed by the next iteration of Gaussian Filtering.
Fig. 13C shows an example implementation of a neuron 910 in the second layer using spiking neurons 902A. The circuit of Fig. 13C includes neuron activation circuits 932 and 934, 936 which correspond to neuron 902 and surrounding neurons 910, respectively. Each neuron activation circuit 932, 934, 936 functions in the same way as described in relation to neuron activation circuit 702, 704 of Fig. 7B. In the circuit of Fig. 13C, the current X of neuron activation circuit 932 is the current from a corresponding retinotopically mapped neuron 902. The currents X of neuron activation circuits 934 and 936 are currents from surrounding neurons 910 which have been scaled according to the distance d from the neuron 910. The overall function of the circuit of Fig. 13C is the same as the circuit of Fig. 7B.
Figs. 13D and 13E show a neuron 910 connected to multiple surrounding neurons 910 to approximate the Gaussian filtering. As an illustration, neuron 940 is the neuron 910 whose current is to be distributed and neurons 941 are the surrounding neurons 910. Fig. 13D shows insufficient connectivity to achieve Gaussian filtering. Fig. 13E shows adequate connectivity between neurons 910 to achieve Gaussian filtering. The minimum connectivity required to achieve the Gaussian filtering is 8 connections per neuron 910. If the connectivity is greater than 8 connections, the connectivity must spread out from the central neuron 910 in a circular manner. Step 1204 then proceeds to step 1206.
At step 1206, normalization is performed on the neurons 910. As shown in Fig. 13 (which is a collective reference to Figs. 13A to 13E), current is spread from a neuron 910 to surrounding neurons 910. The spreading of current from each neuron 910 to multiple surrounding neurons 910 reduces the intensity of the current at the neuron 910, effectively 'blurring' the image. To prevent the currents from becoming too small as further Gaussian Filtering is performed, a normalization operation is performed. The normalization operation determines the maximum current (or maximum voltage Vmem) in any of the neurons 910 and divides the values of all the currents (or the voltage node Vmem) in the other neurons 910 by the determined maximum value. Thus, the maximum current level is restored to unity.
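In software, the divide-by-maximum normalization described above reduces to a few lines (function name illustrative):

```python
import numpy as np

def normalize_to_unity(currents):
    """Divide every neuron current by the maximum current so that
    the largest value is restored to unity, as in step 1206.
    Relative intensities between neurons are preserved.
    """
    peak = currents.max()
    if peak <= 0:
        return currents  # no active neurons; nothing to rescale
    return currents / peak
```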
The implementations of the Gaussian filtering as per Figs. 13B and 13C are costly because of the requirement to measure and compare the current of each neuron 910 during the normalization process. Alternatively, the normalization process can be implemented in a circuit by finding a local maximum to determine the maximum current. In an alternative implementation, the average current/voltage could be determined and all the currents/voltages of the neurons 910 scaled based on the determined average current/voltage. The average current can be determined by measuring the current drawn from the power supply, dividing by the number of neurons 910 and scaling by a predetermined factor. Step 1206 proceeds to step 1208.
At step 1208, a thresholding operation is performed on the neurons 910. The thresholding operation compares all the current/voltage values of the neurons 910 and deactivates neurons 910 whose current/voltage is smaller than a predetermined value. For spiking neurons 910A, the thresholding operation is performed by setting the threshold value Y. For non-spiking neurons 910B, the thresholding operation is performed by subtracting a threshold current/voltage value from the output of the circuit 910B. If the threshold current value is larger than the output of the circuit 910B, the output of the circuit 910B falls to 0 and the neuron 910 is deactivated. Step 1208 proceeds to step 1209.
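In software, the deactivation of weak neurons at step 1208 may be sketched as (cut-off value illustrative):

```python
import numpy as np

def threshold_neurons(currents, cutoff):
    """Deactivate neurons whose current/voltage is below `cutoff`
    by forcing their output to zero, as in step 1208; neurons at or
    above the cut-off are left unchanged.
    """
    return np.where(currents >= cutoff, currents, 0.0)
```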
At step 1209, the number of Gaussian Filtering iterations performed is evaluated. If the Gaussian Filtering is determined to be insufficient (NO), step 1209 proceeds to step 1204 (i.e., steps 1204 to 1208 are repeated). Eight iterations of the Gaussian filtering/normalization/thresholding operations are preferable. If fewer or more than eight iterations of Gaussian Filtering are performed, the Gaussian Filtering may produce a region of activity that is not of a circular nature, or may produce several regions of activity. The Gaussian Filtering produces a circular region of neuron activation around an object 3, 5 in the image 6, 6A, 6B, 6C, 6D. The repetition of these operations ensures that only one object 3, 5 is focused on. By changing the number of iterations of the Gaussian Filtering, however, multiple object recognition is possible by determining local maxima in the normalization procedure rather than a global maximum. Using local maxima, different regions of the captured image 6 are normalized and thresholded by different values and therefore are not deactivated, as would be the case if only a single object were focused on.
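The iterated filter/normalize/threshold loop of steps 1204 to 1209 can be sketched compactly in software. A simple 3x3 box blur stands in for the distance-weighted current spreading, and the iteration count and cut-off are illustrative:

```python
import numpy as np

def focus(image, iterations=8, cutoff=0.3):
    """Repeat blur -> normalize -> threshold so that activity
    collapses into a single region around the strongest object.

    A 3x3 box blur approximates the distance-weighted spreading;
    the constants are illustrative only.
    """
    a = image.astype(float)
    for _ in range(iterations):
        # Blur: average each neuron with its 8 neighbours.
        p = np.pad(a, 1)
        a = sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                for i in range(3) for j in range(3)) / 9.0
        a /= a.max()          # normalize the peak current to unity
        a[a < cutoff] = 0.0   # deactivate weak neurons
    return a
```

After the iterations, isolated weak activations are extinguished while the dominant object's region of activity survives at full strength.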
Figs. 14A to 14D show a pixelated image (i.e., regions of neuron activation) going through different numbers of Gaussian filtering iterations. Fig. 14A shows a pixelated image before undergoing Gaussian filtering, having multiple dark spots. Fig. 14B shows a pixelated image after undergoing two iterations of Gaussian filtering. As can be seen in Fig. 14B, there are two dark spots (i.e., regions of activity) which can be used to perform simultaneous object recognition. If more than one object is to be identified in a captured image 6, then the DNN 14B can inhibit the activity of the neurons 902 that were connected in the vector field. Thus, the Gaussian Filtering operation results in another object being focused on. Different regions of activity can be used to inhibit areas of the image in subsequent iterations of recognition until the entire scene is recognised.
Fig. 14C shows an intermediate step where the region of activity is not yet circular. The region of activity needs to be circular in order to obtain a rotation/scale invariant TSS (i.e., a TSS pattern not affected by rotation or scaling of the object 3, 5). Fig. 14D shows a pixelated image after undergoing eight iterations of Gaussian filtering, normalisation and thresholding. As can be seen in Fig. 14D, the region of activity is circular.
If the Gaussian Filtering is determined to be sufficient (YES), step 1209 proceeds to step 1210.
At step 1210, a second layer of neurons 902 with vector field is generated. Fig. 15A is a flowchart of a method 1500 for generating the second layer of neurons 902 with the vector field. The method 1500 commences at step 1502 where the Gaussian filtered regions of neuron activations are further filtered using a Gaussian function. The further Gaussian Filtering performed at step 1502 is for accentuating the current distribution of the neurons 902 to get directionality for the vector field to be generated.
Fig. 15B shows the current distribution in the neurons 902 of the DNN 14B after the Gaussian Filtering at step 1502 is performed. The z-axis of Fig. 15B shows the current level, and the plane of the x-y axes is the 2-D sheet of neurons 902. As can be seen in Fig. 15B, the centre of the object is represented by a maximum current 1502, whilst the remaining parts of the object are represented by decreasing amounts of current as compared to the maximum current 1502. If Gaussian Filtering is not performed at step 1502, the current distribution of neurons 902 is less like a little mountain and more like a mesa (i.e., a flat top). The process continues at step 1504.
At step 1504, the required vector field is determined by connecting neurons 902 to other neurons 902 having a larger current/voltage. For example, according to the current distribution of Fig. 15B, the neurons 902 connect toward the centre of the object, where the current is the largest, such that the propagation of the neuron activation according to the vector field is toward the centre of the object to be identified. Once a vector field is determined, the method 1500 proceeds to step 1506.
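A software sketch of the vector field determination at step 1504 follows: each neuron records the direction of its adjacent neuron with the largest current, so that activation propagates toward the current maximum (the object centre). The representation as a dictionary of direction steps is an illustrative assumption:

```python
import numpy as np

def vector_field(currents):
    """For each active neuron, find the direction of the adjacent
    neuron with the largest current. Returns a dict mapping
    (i, j) -> (di, dj); a neuron with no greater neighbour (the
    centre) maps to (0, 0).
    """
    n, m = currents.shape
    field = {}
    for i in range(n):
        for j in range(m):
            if currents[i, j] == 0:
                continue  # inactive neurons take no part
            best, step = currents[i, j], (0, 0)
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ti, tj = i + di, j + dj
                    if (di, dj) != (0, 0) and 0 <= ti < n and 0 <= tj < m:
                        if currents[ti, tj] > best:
                            best, step = currents[ti, tj], (di, dj)
            field[(i, j)] = step
    return field
```

This mirrors the hardware comparison of Figs. 16A and 16B, where current may only flow from the lesser neuron to the greater neuron.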
At step 1506, a second 2-D sheet of neurons 902 with a vector field, which is determined at step 1504, is generated.
In a hardware implementation, the vector field is established by performing comparison functions of adjacent neurons 902. In accordance with the Gaussian Filtered neuron activity region of step 1502, the currents in neurons 902 are determined such that the lesser neuron 902 (i.e., neuron 902 with less current) is connected to the greater neuron 902 (i.e., neuron 902 with more current) and that current can only travel from the lesser neuron 902 to the greater neuron 902. Fig. 16A is a hardware implementation to determine the vector field between two adjacent neurons A and B. The hardware implementation of the vector field includes the neuron A having a current A, the neuron B having a current B, a comparison circuit 1610 and power supply VDD, whilst Fig. 16B is the connectivity between the two adjacent neurons A, B.
In Fig. 16A, if the current A is greater than the current B, a voltage node 1602 increases above VDD/2 and the comparison circuit 1610 generates a high output for SAB. Conversely, if the current B is greater than the current A, the voltage node 1602 decreases below VDD/2 and the comparison circuit 1610 generates a high output for SBA. Thus, the circuit of Fig. 16A outputs SAB and SBA, and either SAB or SBA is high at a time.
Fig. 16B is an example of an implementation for the connectivity of the neurons A and B having a first unidirectional connection from neuron A to B with a switch SWBA, and a second unidirectional connection from neuron B to A with a switch SWAB. If the output SAB is high, the switch SWAB is closed and connection from neuron B to neuron A is established, otherwise (i.e., SBA is high), the switch SWBA is closed and connection from neuron A to neuron B is established.
Once connectivity between neurons 902 is determined, a second layer of 2D sheet of neurons 902 is generated by resetting the currents in the neurons 902 (i.e., currents of the neurons 902 are set to zero) and leaving the connectivity intact. Step 1210 proceeds to step 1212.
At step 1212, the originally captured image 6 is input to the second layer of 2-D sheet of neurons 902 with vector field of step 1210. Neurons 902 that are not connected to other neurons 902 (i.e., neurons having no current) do not propagate their activation while neurons 902 that are connected, propagate the neuron activation according to the vector field (i.e., effectively toward the central neuron). Fig. 17 is a high-level view of the neuron activation propagation for an object A. Step 1212 proceeds to step 1214.
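The propagation of step 1212 may be sketched in software as a walk along the vector field toward the central neuron; the distribution of arrival steps is a crude stand-in for the Time Series Sequence emitted by the central neuron at step 1214. The dictionary representation of the field and the function name are illustrative assumptions:

```python
def propagate(field, active, centre):
    """Walk each active pixel along the vector field toward the
    centre; return the sorted arrival times (in steps).

    `field` maps (i, j) -> (di, dj), as fixed at step 1210;
    `active` lists the pixels activated by the captured image.
    """
    arrivals = []
    for pos in active:
        steps = 0
        while pos != centre and pos in field:
            di, dj = field[pos]
            if (di, dj) == (0, 0):
                break  # no outgoing connection; activation stops here
            pos = (pos[0] + di, pos[1] + dj)
            steps += 1
        if pos == centre:
            arrivals.append(steps)
    return sorted(arrivals)
```

Pixels that are not connected into the field contribute nothing, consistent with unconnected neurons 902 not propagating their activation.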
At step 1214, a TSS is generated by the central neuron similar to that of the SNN 14A. The generated TSS is then processed by the PCN as described hereinbefore.
Industrial Applicability
The arrangements described are applicable to the computer and data processing industries and particularly for industries requiring automated rapid, accurate object identification.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.