WO2014194345A1 - Real-time rotation, shift, scale and skew visual recognition system - Google Patents

Real-time rotation, shift, scale and skew visual recognition system

Info

Publication number
WO2014194345A1
Authority
WO
WIPO (PCT)
Prior art keywords
neurons
neuron
image
summing
strands
Prior art date
Application number
PCT/AU2014/000059
Other languages
French (fr)
Inventor
Saeed AFSHAR
Tara Julia HAMILTON
Original Assignee
Newsouth Innovations Pty Limited
Priority claimed from AU2013900288A external-priority patent/AU2013900288A0/en
Application filed by Newsouth Innovations Pty Limited filed Critical Newsouth Innovations Pty Limited
Priority to AU2014277600A priority Critical patent/AU2014277600A1/en
Publication of WO2014194345A1 publication Critical patent/WO2014194345A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30212Military

Definitions

  • Fig. 1 shows a missile 1 having an image capturing unit 8 and an object recognition system (ORS) 10 for identifying objects (e.g., a jet 3 and flare counter-measures 5).
  • the missile 1 also has a missile control unit (MCU) 11 that controls the flight path of the missile 1.
  • the image capturing unit 8 captures images 6 of the area toward which the missile 1 is heading.
  • the jet 3 may deploy flare counter-measures 5 to 'trick' the missile 1 into hitting the flares 5.
  • the image 6 captured by the image capturing unit 8 includes, inter alia, the jet 3 and the flare counter-measures 5.
  • the ORS 10 processes the captured image 6 and identifies each object 3, 5 in the captured image.
  • the processing of captured images 6 and recognising objects 3, 5 by the ORS 10 is described in detail hereinafter.
  • the MCU 11 uses the identification of the objects 3, 5 to determine a flight path toward the intended target jet 3. Thus, the missile 1 avoids being 'tricked' into hitting the flare counter-measures 5.
  • the application of the ORS 10 in Fig. 1 is an example.
  • the ORS 10 may be deployed in other applications such as, inter alia, a robotic eye for rapid identification of objects, robotic and UAVs navigation system for recognising key features in visual scenes, mobile security systems where power and size are important considerations, remote scientific monitoring where access is limited and battery changes restrictive, etc.
  • Fig. 2 is an implementation of the ORS 10 including an image processing unit 12, a neural network (NN) 14 and a pattern identification unit 16.
  • the image processing unit 12 receives a captured image 6 from the image capturing unit 8 and processes the received image 6 into an image format that the NN 14 can process. The processed image is then projected onto the NN 14.
  • the NN 14 processes the projected image to output a Time Series Signature (TSS).
  • a TSS is a time-series pattern identifying an object's features and is unique to each object.
  • the pattern identification unit 16 is capable of matching an identified TSS with a TSS stored in a database.
  • a Polychronous Network (PCN) 16 is used.
  • the PCN 16, which is an artificial neural network, is capable of firing a neuron to indicate that a time-series pattern (i.e., a TSS) is known when the PCN 16 recognizes that the pattern has been previously identified.
  • the functionality of the PCN 16 is similar to a content addressable memory.
  • the pattern identification unit compares the received TSS with a TSS database to determine the identity of the object 3, 5.
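  • As an illustrative sketch only (the patent gives no code), the comparison of a generated TSS against the database can be pictured as a nearest-signature lookup; the names `tss_database` and `match_tss` and all values below are hypothetical, not from the patent.

```python
import numpy as np

# Hypothetical sketch of TSS matching, standing in for the PCN's
# content-addressable lookup; labels and signatures are illustrative.
tss_database = {
    "jet": np.array([0.0, 2.0, 5.0, 9.0, 7.0, 3.0, 1.0, 0.0]),
    "flare": np.array([0.0, 6.0, 8.0, 4.0, 1.0, 0.0, 0.0, 0.0]),
}

def match_tss(tss, database, max_distance=5.0):
    """Return the label of the stored TSS closest to `tss`, or None."""
    best_label, best_dist = None, float("inf")
    for label, stored in database.items():
        dist = float(np.linalg.norm(tss - stored))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist < max_distance else None
```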
  • the first implementation utilises a static neural network (SNN) 14A as the NN 14, shown in Fig. 3A.
  • the second implementation utilises a dynamic neural network (DNN) 14B as the NN 14, shown in Fig. 9A.
  • Fig. 3B shows a flow chart of a method 300 for processing of captured images 6 by the image processing unit 12.
  • the method 300 commences with step 302 by centring object 3, 5 on the captured image 6.
  • images 6 are captured by the image capturing unit 8, which is, for example, an analogue camera, a digital camera, etc.
  • the image processing unit 12 employs a salience detection method to detect objects 3, 5 in the captured image 6 to centre the object 3, 5 in an image.
  • the salience detection method processes the captured image 6 to recognise areas in the image 6 belonging to an object 3, 5.
  • the salience detection method determines areas of activity in a captured image 6 by transforming each object 3, 5 into a circular region, which the image processing unit 12 identifies as the object 3, 5.
  • Fig. 4 shows the result of the image processing unit 12 employing a salience detection method to identify areas in the image 6 belonging to objects 3, 5 and creating separate images 6A to 6D, each with one object 3, 5 centred on it.
  • Each separate image 6A, 6B, 6C, 6D is then processed by the SNN 14A and the PCN 16 to identify each object 3, 5.
  • salience detection methods are discussed in "A Model of Saliency-Based Visual Attention for Rapid Scene Analysis," by L. Itti, C. Koch, and E. Niebur in IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 11, 1998.
  • the salience detection method is performed by the SNN 14A.
  • the image processing unit 12, in conjunction with a salience detection method and the image capturing unit 8, captures multiple images 6A, 6B, 6C, 6D, each having an object 3, 5 centred in it.
  • a laser targeting system determines the range of an object and a mechanical system moves the image capturing unit 8 until the object 3, 5 is centred in each captured image 6A, 6B, 6C, 6D. Processing continues at step 304.
  • each image 6A, 6B, 6C, 6D is pixelated such that each pixel can be mapped onto a neuron of the SNN 14A.
  • a one-to-one mapping of a pixel to a neuron of a neural network is called retinotopical mapping.
  • the pixels of a pixelated image can be classified into active and passive pixels. Active pixels are pixels which represent a part of an object's feature. Passive pixels are pixels that do not represent any part of an object's feature. For example, in a black and white image of a fish, the pixels representing the outline of the fish are black whilst the pixels representing the background of the image are white. In this example, the black pixels are the active pixels and the white pixels are the passive pixels.
  • the images 6A, 6B, 6C, 6D may be filtered by a high-pass filter to accentuate the outline of the object 3, 5 to be identified.
  • the neurons and other details of the SNN 14A are described further hereinafter in relation to Figs. 5 to 8.
  • the method 300 then advances to step 306.
  • in step 306, the pixelated image of step 304 is projected onto the SNN 14A.
  • before the pixelated image is projected onto the neurons of the SNN 14A, the pixelated image can be further processed by other filters (e.g., high-pass filter, Gabor filter, etc.) to further accentuate the features of the captured image 6, 6A, 6B, 6C, 6D.
  • the image processing unit 12 outputs either a current or a voltage for each active pixel, whilst a passive pixel has no output.
  • the image processing unit 12 outputs a "1" for an active pixel and a "0" for a passive pixel.
  • the method 300 concludes at the completion of step 306.
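  • A minimal sketch of the pixelation and projection of steps 304-306, assuming a grayscale input image and the binary digital output described above ("1" for active, "0" for passive); the grid size, threshold and function name are assumptions, not values from the patent.

```python
import numpy as np

def pixelate(image, grid=(64, 64), threshold=0.5):
    """Downsample a grayscale image (values in [0, 1]) onto a neuron grid
    and binarise it into active ("1") and passive ("0") pixels."""
    h, w = image.shape
    gh, gw = grid
    # Average-pool the image into grid cells (trimming any remainder).
    trimmed = image[: h - h % gh, : w - w % gw]
    pooled = trimmed.reshape(gh, h // gh, gw, w // gw).mean(axis=(1, 3))
    # Dark pixels (e.g., an object outline on a white background) are active.
    return (pooled < threshold).astype(np.uint8)
```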
  • Fig. 5A shows a SNN 14A for processing the pixelated image generated by the output of the image processing unit 12.
  • the SNN 14A comprises a number of neuron strands 501 and a summing neuron 504.
  • Each strand 501 comprises a number of neurons 502, connected together by connections 503, being arranged radially as shown in Fig. 5A.
  • Fig. 5B shows an alternative pattern for arranging the strands 501.
  • a neuron 502 is a circuit or a software program for generating a pulse if an active pixel is mapped onto the neuron.
  • the neurons 502 used by the SNN 14A are spiking neurons, which means that each neuron 502 outputs a bi-stable pulse (e.g., a high or a low voltage/current output, a binary "1" or "0" output). Implementations of the neurons 502 are described in detail in relation to Fig. 7.
  • Fig. 6 shows a connection of one strand 501 of the plurality of neurons 502 connected to a central neuron 505 and a summing neuron 504 of the SNN 14A of Fig. 5A.
  • One end of each of the plurality of neuron strands 501 is connected to the summing neuron 504.
  • the neuron located at the periphery of the SNN 14A is connected to the summing neuron 504.
  • the neuron located at the periphery of the SNN 14A is referred to as 502P hereinafter.
  • the other end of each of the plurality of neuron strands 501, which as shown in Fig. 5A is located at the centre of the SNN 14A, is then connected to a centre neuron 505.
  • Each neuron 502, 504, 505 is connected to another neuron 502, 504, 505 by the connection 503, which is a collection of wires for connecting the inputs and outputs of corresponding neurons 502, 504, 505.
  • neurons 502 corresponding to active pixels are activated.
  • the operation of the SNN 14A is governed by a time-clock (not shown) such that, every Δt, an activated neuron 502 propagates the activation to a subsequent neuron on the strand 501.
  • the neuron activation is propagated to the periphery neurons 502P.
  • the periphery neuron 502P is activated and the summing neuron 504 sums the total number of periphery neurons 502P activated after each Δt.
  • the series of sums of activated periphery neurons 502P at successive intervals of Δt is called a Time Series Sequence (TSS).
  • a neuron 510 is turned on as the corresponding pixel is active. After 1Δt, as determined by a time clock (not shown), neuron 512 is activated.
  • the periphery neuron 514 is activated.
  • the summing neuron 504 sums the total number of the periphery neurons 502P being activated at time 5Δt.
  • the centre neuron 505 is omitted and the plurality of neuron strands 501 is not connected at the centre of the SNN 14A. This implementation does not affect the output TSS materially.
  • the summing neuron 504 is located at the central neuron 505 and the propagation of neuron activation is in the opposite direction
  • the TSS generated by such an implementation is the reverse of the TSS generated by the SNN 14A of Fig. 5A.
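  • The radial propagation and summing described above can be pictured with the following minimal simulation, assuming binary spiking neurons, one propagation step per Δt, and strands ordered from the centre (index 0) to the periphery neuron 502P (last index); this is a sketch, not the patent's implementation.

```python
import numpy as np

def generate_tss(strands):
    """Shift activations outward one neuron per Δt; the summing neuron
    counts how many periphery neurons fire each step, giving the TSS."""
    strands = [np.array(s, dtype=int) for s in strands]
    tss = []
    for _ in range(max(len(s) for s in strands)):
        # Summing neuron 504: total periphery neurons 502P active now.
        tss.append(sum(int(s[-1]) for s in strands))
        # Propagate each strand's activation one neuron toward the periphery.
        strands = [np.concatenate(([0], s[:-1])) for s in strands]
    return tss

# Two 5-neuron strands with active pixels at different radii.
print(generate_tss([[0, 1, 0, 0, 0], [0, 0, 0, 1, 0]]))  # -> [0, 1, 0, 1, 0]
```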
  • Fig. 7A is a schematic diagram of the neuron 502 including an input S1 for receiving a previous neuron's pulse output p, a pixel input S2 for receiving a signal of a corresponding pixel, a threshold input value Y, a control signal W and a pulse output p.
  • the inputs S1 and S2 are either a voltage or a current for a hardware implementation or a binary "0" or "1" for a digital implementation.
  • a pulse output p is generated if either input S1 or S2 is greater than the threshold input value Y.
  • the time taken by the neuron 502 to output a pulse output p when either input S1 or S2 changes is modulated by the control signal W.
  • the value of the user-determined control signal W determines the response time of the neuron 502. This implementation is called a leaky integrate-and-fire (LIF) neuron.
  • the pulse output p is used as an S1 input for a subsequent neuron 502 to effect the propagation of neuron activation.
  • Fig. 7B is an analogue implementation of the neuron 502 including two neuron activation circuits 702 and 704 and a comparator circuit 710.
  • Each of the neuron activation circuits 702, 704 comprises a MOSFET switch 703 and an SR (Set-Reset) latch 706, 708.
  • Each of the SR latches 706 and 708 comprises an S input, an R input and a Q output.
  • the S input is connected to a previous neuron output S1 for the SR latch 706, whilst the S input for the SR latch 708 is connected to a corresponding pixel for the neuron 502.
  • the R input is connected to the output of the comparator circuit 710.
  • the Q output is connected to the MOSFET switch 703.
  • when the S input is set, the Q output is activated (i.e., a "1" output) and the MOSFET switch 703 is turned on, thereby allowing current X to flow; at voltage node Vmem, the currents X and W are added.
  • the current W flows out of the voltage node Vmem and, hence, the net current is X - W.
  • the control signal W is implemented as a leakage current such that input current X persists on the voltage node Vmem for a limited amount of time.
  • the comparator circuit 710 comprises a comparator 712 having inputs from a threshold value Y and the voltage node Vmem and associated comparator circuit.
  • the comparator 712 compares the value of Vmem and Y. If the value of Vmem is higher than Y, a pulse output p is generated. Otherwise, no pulse output p is generated.
  • the pulse output p is connected to a Reset MOSFET switch 708 and the R input of the SR latches 706, 708 so that a generated pulse output p resets the voltage node Vmem to zero and resets the SR latches 706, 708.
  • the voltage node Vmem and the outputs Q are therefore reset to zero after the comparator 712 outputs a pulse output p to ensure the Vmem and the SR latches are prepared for the next propagation of the neurons activation.
  • Fig. 7B is one example of many possible implementations to implement a LIF neuron in analogue circuits.
  • the SNN 14A is formed by a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC).
  • the LIF neuron can be implemented by programming the FPGA or the ASIC as an adder circuit.
  • the equation for implementing the LIF neuron, eqn. 1, uses the following quantities:
  • C is a constant scaling factor that can be implemented by a Look-Up Table (LUT).
  • the LUT lists all possible divisions with associated results such that the program executed by the FPGA refers to the LUT to output a result for the division by C.
  • Y is the threshold input value
  • p is the output of eqn. 1
  • X is either the pixel input value or an input from a pulse output p of a previous neuron 502
  • W is a user-determined leak term that determines the responsiveness of the neuron 502.
  • the pulse output p is generated with a compare function.
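  • Since eqn. 1 itself is not reproduced in this text, the following is a hedged reading of the digital LIF update implied by the variable definitions above: Vmem accumulates the net input (X - W) scaled by 1/C, and the compare function against the threshold Y generates the pulse p and resets Vmem.

```python
def lif_step(vmem, X, W, C, Y):
    """One Δt of a digital LIF neuron; returns (new_vmem, p).
    A sketch assuming the variable roles defined for eqn. 1."""
    vmem += (X - W) / C      # integrate input X less leak W, scaled by 1/C
    vmem = max(vmem, 0.0)    # assume the leak cannot drive Vmem negative
    if vmem > Y:             # the compare function of eqn. 1
        return 0.0, 1        # fire: emit pulse p = 1 and reset Vmem
    return vmem, 0
```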
  • the image processing unit 12 outputs a predetermined bit stream format to be input to the FPGA or the ASIC via a Serial Peripheral Interface (SPI).
  • Fig. 7C shows an implementation of the summing neuron 504.
  • the summing neuron 504 is similar to the neurons 502, except the summing neuron 504 has no comparator and associated pulse generation.
  • the summing neuron 504 includes n number of neuron activation circuits 740 (i.e., 740A, 740B,..., 740N), a capacitor C and a leak current nW.
  • the number n of neuron activation circuits 740 corresponds to the number of periphery neurons 502P connected to the summing neuron 504.
  • Each neuron activation circuit 740 includes an SR latch 742, a MOSFET switch 744 and a current X.
  • the SR latch 742 comprises an S input, an R input and a Q output.
  • the S input is connected to one of the periphery neurons 502P, whilst the R input is connected to a global clock (not shown) that periodically sends a reset signal to the SR latches 742.
  • the SR latches 742 are reset to prepare the SR latches 742 to sum the neuron activation of the periphery neurons 502P at the next Δt.
  • the Q output is connected to the MOSFET switch 744.
  • the Q output is activated (i.e., a "1" output) and the MOSFET switch 744 is activated thereby allowing current X to flow.
  • the currents X from the number n of activated periphery neurons 502P are added on the capacitor C.
  • the voltage node Vmem gives the TSS output of the SNN 14A.
  • the current nW is a leakage current similar to that described for the LIF neuron of Fig. 7B and is chosen such that the summed currents X persist on the voltage node Vmem for a limited amount of time, so that the sum has decayed before the subsequent summing of currents X at the next Δt.
  • Time normalization is an additional step that can be performed when the pixelated image is projected onto the SNN 14A, and is used to normalize different size images on the SNN 14A in order to generate normalised TSS.
  • the total current drawn by the neurons 502 when the pixelated image is projected onto the SNN 14A is measured.
  • the current drawn by each neuron 502 is individually monitored.
  • a large image translates to a high activation level from the neurons 502.
  • the propagation of neuron activation along the strands 501 of the SNN 14A needs to be accelerated by decreasing the level of the leak current W.
  • a small image translates to a low activation level from the neurons 502.
  • the leak current W is increased to slow down propagation of neuron activation.
  • the measured neuron activity is proportional to the current drawn by the neurons 502.
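  • A sketch of the time normalisation just described, assuming the leak W is simply scaled inversely with the measured total current; the nominal constants and function name are illustrative only, not values from the patent.

```python
def normalised_leak(total_current, w_nominal=1.0, i_nominal=100.0):
    """Set the leak W from measured neuron activity: a large image (high
    total current) gets a smaller leak, accelerating propagation; a small
    image gets a larger leak, slowing propagation."""
    return w_nominal * i_nominal / max(total_current, 1e-9)
```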
  • the Time Series Sequence is the output of the summing neuron 504.
  • Figs. 8A and 8B are examples of the TSS generated by the summing neuron 504.
  • the TSS is unique to different objects, is easy to store and is an efficient way of representing 2D images in hardware. As shown in Fig. 8A, rotating an object produces very similar TSS. Similarly, Fig. 8B shows that similar objects with different sizes also generate similar TSS. Hence, the recognition of objects by the ORS 10 is not dependent on the object's rotation or scale.
  • the TSS generated by the SNN 14A is then input to the pattern identification unit 16, which in this implementation is a PCN.
  • the PCN 16 is a pattern identification unit capable of matching an identified pattern with a pattern stored in a database of the PCN 16. Each pattern stored in the database is associated with a particular object.
  • the PCN 16 is an artificial neural network that is capable of learning and recalling spatio-temporal patterns (e.g., the TSS).
  • the PCN is described in several articles, one entitled "Polychronization: Computation with Spikes" by E.M. Izhikevich and the other entitled "An analogue VLSI implementation of polychronous spiking neural networks" by Runchun (Mark) Wang et al., the contents of which are incorporated herein by reference in their entirety.
  • the PCN 16 functions by learning patterns of TSS and identifying that a TSS pattern presented to the PCN 16 is associated with a previously learned TSS pattern.
  • the PCN can either be trained by presenting the PCN 16 with the TSS of the object to be learned (i.e., supervised learning), or the PCN 16 can learn unsupervised.
  • the supervised method is favoured due to the speed with which patterns can be learned.
  • the unsupervised method is not as accurate as the supervised method. However, the unsupervised method allows an autonomous apparatus (e.g., autonomous vehicle, autonomous robot, etc.) to learn independently.
  • Fig. 9A is an ORS 10 using the DNN 14B.
  • the DNN 14B is capable of detecting an object 3, 5 from a captured image 6 without having to centre the object 3, 5 on the image 6. Further, the DNN 14B may utilise non-spiking neurons which allows a more detailed image to be processed.
  • Fig. 9B shows a flow chart of a method 900 for processing of captured images 6.
  • the method 900 commences with step 902 where the captured image 6, 6A, 6B, 6C, 6D is pixelated such that each pixel can be mapped onto a neuron 902 of the DNN 14B.
  • images 6 are captured by the image capturing unit 8, which is, for example, an analogue camera, a digital camera, etc.
  • captured images 6, 6A, 6B, 6C, 6D may be further processed (e.g., high-pass filter, Gabor filter) before or after pixelation. The further processing is performed to accentuate particular features of the captured image 6, 6A, 6B, 6C, 6D (e.g., horizontal lines, vertical lines, diagonals, etc.).
  • Step 902 then proceeds to step 904.
  • each active pixel is represented by either a current or a voltage and a passive pixel has no output.
  • each active pixel is represented by a "1", whilst a passive pixel is represented by a "0".
  • Each pixel may also be
  • the method 900 concludes at step 904.
  • Fig. 10 shows an example of a grid-like two-dimensional (2-D) sheet of neurons 902 of the DNN 14B. No shape is imposed on the 2-D sheet of neurons 902, unlike the SNN 14A. A projected pixelated image, which is the output of the method 900, is mapped onto the 2-D sheet of neurons 902.
  • Fig. 11A shows a schematic diagram of a neuron 902.
  • the neuron 902 includes a pixel input X, a threshold value Y and a leakage current W.
  • the pixel input X is either a voltage or a current for active pixels.
  • the threshold value Y is either a predetermined current or a predetermined voltage. If the pixel input X is greater than the threshold value Y, a pulse output p is generated. The width of the pulse p is predetermined.
  • the time between a change in the pixel input X and a pulse output p being generated is modulated by the leakage current W.
  • the leakage current W is controlled external to the neuron 902 and is not shown here.
  • the neuron 902 is called a leaky integrate-and-fire (LIF) neuron.
  • Fig. 11B is an analogue circuit of a spiking neuron 902A having a pixel input X, a threshold value Y and a leakage current W.
  • the pixel input X is a current
  • the threshold value Y is a voltage
  • the leakage current W is a current
  • Vmem is a voltage node where the currents of the pixel input X and the leakage current W are added.
  • the operation of the spiking neuron 902A is similar to the operation of the neuron 502 as described in relation to Fig. 7B.
  • Fig. 11B is one of many implementations of a LIF neuron.
  • the LIF neuron can be implemented as per equation 1, as described hereinbefore.
  • Fig. 11C is an analogue circuit of a non-spiking neuron 902B having a pixel input A and a capacitor C.
  • the pixel input A is a current that is received from a corresponding pixel and is input into the capacitor C.
  • a leakage current B draws current from the capacitor C.
  • the current B, similar to the current W of Fig. 11B, is set externally to the neuron 902 and determines the rate at which the voltage node Vmem charges.
  • the current A is the sum of currents received from other neurons 902 connected to the neuron 902.
  • the currents of the neurons 902 in the analogue implementation of the DNN 14B are added by summing the currents at a voltage node Vmem.
  • the voltage node Vmem, which reflects the sum of the currents A and B, is the output of the non-spiking neuron 902B.
  • the digital implementation of the non-spiking neuron 902B is the same as the spiking neuron 902A but without the comparison function (i.e., the "if" statement in equation 1 is removed).
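  • Continuing the eqn. 1 sketch above, a digital non-spiking neuron 902B is then the same accumulation with the compare and reset removed, the analogue level Vmem itself being the output; the input A and leak B follow the naming of Fig. 11C, and the function name is an assumption.

```python
def non_spiking_step(vmem, A, B, C=1.0):
    """One update of the non-spiking neuron 902B: accumulate input current A
    less leak current B; no compare, no pulse, Vmem is the output."""
    return max(vmem + (A - B) / C, 0.0)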
  • Fig. 12 is a flow diagram of a method 1200 for the DNN 14B to generate a TSS for the objects 3, 5 of the captured image 6.
  • the method 1200 commences at step 1202 by mapping the projected pixelated image of the image processing unit 12 onto a first layer of 2-D sheet of neurons 902 of the DNN 14B.
  • the mapping of the pixels onto the first layer of neurons 902 transforms the pixelated image into activation of neurons 902.
  • Step 1202 then proceeds to step 1204.
  • the activated neurons 902 are filtered using a Gaussian function or an approximate Gaussian function.
  • the Gaussian filtering operation is used to effectively 'blur' the pixelated image that has been projected onto the neurons 902.
  • Gaussian Filtering on the DNN 14B is performed in two steps.
  • the first step is to retinotopically map the first layer of neurons 902 to a second layer of neurons.
  • the second step is to distribute the current in each neuron in the second layer to multiple surrounding neurons in the second layer.
  • the connections between each neuron in the second layer to the multiple surrounding neurons in the second layer are weighted by the distance between the neuron and each of the surrounding neurons.
  • Gaussian Filtering is performed by distributing the current of a neuron in the second layer onto multiple surrounding neurons in the second layer depending on the distance-weighted scale.
  • Fig. 13A shows parallel pathways between the neurons 902 in the first layer to the neurons 910 in the second layer.
  • the neurons 902 employed can be either a spiking neuron 902 A or a non-spiking neuron 902B.
  • the neurons 910 employed can be either a spiking neuron 91 OA or a non-spiking neuron 910B.
  • each neuron 902 in the first layer is retinotopically mapped to neurons 910 in the second layer with connectivity that is weighted with the inverse of distance to effect the first step of the Gaussian Filtering process. Further, each neuron 902 in the first layer is further connected to multiple neurons 910 in the second layer to effect the second step of the Gaussian Filtering process. This implementation accelerates the first iteration of the Gaussian Filtering as the two steps are performed simultaneously.
  • the weighting of the connections from each neuron 902 to the neurons 910 is given by eqn. 2.
  • Cw determines the amount of current input to the neurons 910 from the neuron 902.
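  • Eqn. 2 is not reproduced in this text; as a hedged sketch consistent with the distance-weighted connectivity described, the fan-out weights can be pictured as a normalised kernel that falls off with distance (a Gaussian fall-off is assumed here), Cw being the per-connection fraction of the source neuron's current.

```python
import numpy as np

def connection_weights(radius=2, sigma=1.0):
    """Weights Cw for connecting one neuron 902 to its (2r+1) x (2r+1)
    neighbourhood of neurons 910, falling off with distance and normalised
    so the distributed current sums to the source neuron's current."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    w = np.exp(-(xs**2 + ys**2) / (2.0 * sigma**2))  # Gaussian fall-off
    return w / w.sum()
```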
  • Fig. 13B shows an example implementation of the parallel connectivity between neurons 902 and neurons 910 using non-spiking neurons 902B to implement a Gaussian Filtering.
  • Neuron 902 spreads the current output to adjacent neurons 910.
  • the size of the transistors M of neurons 910 that sink the current from neuron 902 is scaled with distance: a neuron 910 at a distance d has its transistor size scaled by d.
  • the current/value of a neuron 902 is spread out over a number of neurons 910 resulting in a 'blurring' of the pixelated image.
  • the distributed current in neurons 910 is then returned to each of the neurons 910 itself by transistor RT, which completes a first iteration of the Gaussian Filtering. Each of the returned currents is further processed by the next iteration of Gaussian Filtering.
  • Fig. 13C shows an example implementation of a neuron 910 in the second layer using spiking neurons 902A.
  • the circuit of Fig. 13C includes neuron activation circuits 932, 934 and 936, which correspond to the neuron 902 and the surrounding neurons 910, respectively.
  • Each neuron activation circuit 932, 934, 936 functions in the same way as described in relation to neuron activation circuit 702, 704 of Fig. 7B.
  • the current X of neuron activation circuit 932 is the current from a corresponding retinotopically mapped neuron 902.
  • the currents X of neuron activation circuits 934 and 936 are currents from surrounding neurons 910 which have been scaled according to the distance d from the neuron 910.
  • the overall function of the circuit of Fig. 13C is the same as the circuit of Fig. 7B.
  • Figs. 13D and 13E show a neuron 910 connected to multiple surrounding neurons 910 to approximate the Gaussian filtering.
  • neuron 940 is the neuron 910 whose current is to be distributed and neurons 941 are the surrounding neurons 910.
  • Fig. 13D shows insufficient connectivity to achieve Gaussian filtering.
  • Fig. 13E shows adequate connectivity between neurons 910 to achieve Gaussian filtering.
  • the minimum connectivity required to achieve the Gaussian filtering is 8 connections per neuron 910. If the connectivity is greater than 8 connections, the connectivity must spread out from the central neuron 910 in a circular manner. Step 1204 then proceeds to step 1206.
  • in step 1206, normalization is performed on the neurons 910. The normalization operation determines the maximum current (or maximum voltage Vmem) in any of the neurons 910 and divides the values of all the currents (or the voltage nodes Vmem) in the other neurons 910 by the determined maximum value. Thus, the maximum current level is restored to unity.
  • the implementations of the Gaussian filtering as per Figs. 13B and 13C are costly because of the requirement to measure and compare the current of each neuron 910 during the normalization process.
  • the normalization process can be implemented in a circuit by finding a local maximum to determine the maximum current.
  • alternatively, the average current/voltage could be determined and all the currents (or voltages) divided by that average.
  • Step 1206 proceeds to step 1208.
  • a thresholding operation is performed on the neurons 910.
  • the thresholding operation compares all the current/voltage values for the neurons 910 and deactivates neurons 910 whose current/voltage is smaller than a predetermined value.
  • the thresholding operation is performed by setting the threshold value Y.
  • for non-spiking neurons, the thresholding operation is performed by subtracting a threshold current/voltage value from the output of circuit 910B. If the threshold value is larger than the output of circuit 910B, the output of the circuit 910B falls to 0 and the neuron 910 is deactivated.
  • Step 1208 proceeds to step 1209.
  • in step 1209, the number of Gaussian Filtering iterations performed is assessed. If the Gaussian Filtering is determined to be insufficient (NO), step 1209 proceeds to step 1204 (i.e., steps 1204 to 1208 are repeated). Eight iterations of the Gaussian Filtering are typically sufficient, as shown in Fig. 14D.
  • with insufficient iterations, the Gaussian Filtering may produce a region of activity that is not circular, or may produce several regions of activity.
  • when sufficient iterations have been performed (YES), the Gaussian filtering produces a circular region of neuron activation around an object 3, 5 in the image 6, 6A, 6B, 6C, 6D. The repetition of these operations ensures that only one object 3, 5 is focused on.
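  • Steps 1204 to 1209 can be sketched as the loop below, assuming eight iterations (per Fig. 14D) and an illustrative threshold; scipy's gaussian_filter stands in for the neuron-level current spreading of Figs. 13A to 13E.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def focus_object(activity, iterations=8, threshold=0.3, sigma=1.5):
    """Repeatedly blur (step 1204), normalise to unity maximum (step 1206)
    and threshold (step 1208) until the region of activity is circular."""
    a = activity.astype(float)
    for _ in range(iterations):
        a = gaussian_filter(a, sigma)    # distribute current to neighbours
        a = a / max(a.max(), 1e-12)      # restore the maximum level to unity
        a[a < threshold] = 0.0           # deactivate weak neurons
    return a
```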
  • Figs. 14A to 14D show a pixelated image (i.e., regions of neuron activation) going through different numbers of Gaussian filtering iterations.
  • Fig. 14A shows a pixelated image with multiple dark spots before undergoing Gaussian filtering.
  • Fig. 14B shows a pixelated image after undergoing two iterations of Gaussian filtering.
  • there are two dark spots (i.e., regions of activity).
  • the DNN 14B can inhibit the activity of the neurons that were connected in the vector field.
  • the Gaussian Filtering operation results in another object being focused on. Different regions of activity can be used to inhibit areas of the image in subsequent iterations of recognition until the entire scene is recognised.
  • Fig. 14C shows an intermediate step where the region of activity is not yet circular.
  • the region of activity needs to be circular in order to obtain a rotation/scale invariant TSS (i.e., a TSS pattern not affected by rotation or scaling of the object 3, 5).
  • Fig. 14D shows a pixelated image after undergoing eight iterations of Gaussian filtering, normalisation and thresholding. As can be seen in Fig. 14D, the region of activity is circular.
  • step 1209 proceeds to step 1210.
  • in step 1210, a second layer of neurons 902 with a vector field is generated.
  • Fig. 15A is a flowchart of a method 1500 for generating the second layer of neurons 902 with the vector field.
  • the method 1500 commences at step 1502 where the Gaussian filtered regions of neuron activations are further filtered using a Gaussian function.
  • the further Gaussian Filtering performed at step 1502 is for accentuating the current distribution of the neurons 902 to get directionality for the vector field to be generated.
  • Fig. 15B shows the current distribution in the neurons 902 of the DNN 14B after Gaussian Filtering at step 1502 is performed.
  • the z-axis of Fig. 15B shows the current level, the plane of the x-y axes is the 2-D sheet of neurons 902.
  • the centre of the object is represented by a maximum current 1502
  • the remaining parts of the object are represented by decreasing amounts of current as compared to the maximum current 1502.
  • if Gaussian Filtering is not performed at step 1502, the current distribution of neurons 902 is less like a little mountain and more like a mesa (i.e., a flat top).
  • the process continues at step 1504.
  • the required vector field is determined by connecting neurons 902 to other neurons 902 having larger current/voltage. For example, according to the current distribution of Fig. 15B, the neurons 902 connect toward the centre of the object where the current is the largest, such that the propagation of the neuron activation according to the vector field is toward the centre of the object to be identified. Once a vector field is determined, the method 1500 proceeds to step 1506.
  • in step 1506, a second 2-D sheet of neurons 902 with the vector field determined at step 1504 is generated.
  • the vector field is established by performing comparison functions between adjacent neurons 902.
  • the currents in adjacent neurons 902 are compared such that the lesser neuron 902 (i.e., the neuron 902 with less current) is connected to the greater neuron 902 (i.e., the neuron 902 with more current) and current can only travel from the lesser neuron 902 to the greater neuron 902.
  • Fig. 16A is a hardware implementation to determine the vector field between two adjacent neurons A and B.
  • the hardware implementation of the vector field includes the neuron A having a current A, the neuron B having a current B, a comparison circuit 1610 and power supply VDD, whilst Fig. 16B is the connectivity between the two adjacent neurons A, B.
  • in Fig. 16A, if the current A is greater than the current B, a voltage node 1602 increases above VDD/2 and the comparison circuit 1610 generates a high output for SAB. Conversely, if the current B is greater than the current A, the voltage node 1602 decreases below VDD/2 and the comparison circuit 1610 generates a high output for SBA. Thus, the circuit of Fig. 16A outputs SAB and SBA, and only one of SAB or SBA is high at a time.
  • Fig. 16B is an example of an implementation for the connectivity of the neurons A and B having a first unidirectional connection from neuron A to B with a switch SWBA, and a second unidirectional connection from neuron B to A with a switch SWAB. If the output SAB is high, the switch SWAB is closed and connection from neuron B to neuron A is established, otherwise (i.e., SBA is high), the switch SWBA is closed and connection from neuron A to neuron B is established.
  • a second layer of 2D sheet of neurons 902 is generated by resetting the currents in the neurons 902 (i.e., currents of the neurons 902 are set to zero) and leaving the connectivity intact. Step 1210 proceeds to step 1212.
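  • A minimal sketch of the comparisons behind steps 1504 and 1506: each neuron is unidirectionally connected toward whichever neighbour holds the larger current, so activation can only travel 'uphill' to the maximum at the object's centre; 4-neighbour connectivity and the function name are assumptions, not from the patent.

```python
import numpy as np

def build_vector_field(current):
    """For each neuron (y, x) return the neighbour it connects toward
    (lesser -> greater), or None at the local maximum, where the TSS
    will ultimately be generated."""
    h, w = current.shape
    field = {}
    for y in range(h):
        for x in range(w):
            best, target = current[y, x], None
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and current[ny, nx] > best:
                    best, target = current[ny, nx], (ny, nx)
            field[(y, x)] = target
    return field
```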
  • in step 1212, the originally captured image 6 is input to the second layer of the 2-D sheet of neurons 902 with the vector field of step 1210.
  • neurons 902 that are not connected to other neurons 902 (i.e., neurons having no current) do not take part in the propagation of the neuron activation.
  • Fig. 17 is a high-level view of the neuron activation propagation for an object A.
  • Step 1212 proceeds to step 1214.
  • in step 1214, a TSS is generated by the central neuron, similar to that of the SNN 14A. The generated TSS is then processed by the PCN as described hereinbefore.

Abstract

A method of rapidly identifying an object (3, 5) by a battery-powered apparatus (1) using an artificial neural network (14). An image (6) of the object (3, 5) is captured by an image capturing unit (8). The captured image (6) is then processed by an image processing unit (12) to be in a format that the artificial neural network (14) is able to process. The processed captured image is projected onto the artificial neural network which produces a Time Series Signature (TSS) pattern. A pattern identification unit (16) then identifies the object (3, 5) based on the generated TSS pattern.

Description

Real-time rotation, shift, scale and skew visual recognition system
Cross-Reference to Related Applications
The present application claims priority from Australian provisional application 2013900288, filed on 30 January 2013 with NewSouth Innovations Pty Limited as the applicant, the contents of which are incorporated herein by reference.
Technical Field
The present invention relates to recognizing objects rapidly using artificial neural networks.
Background
Object recognition software requires a large amount of electrical power and is slow due to extensive, sequential mathematical operations. Such a large power consumption is, however, impractical for portable, battery-operated computers. The problem is further compounded when battery-operated computers typically utilize a central processing unit (CPU) that is less powerful than its mains-power-operated counterpart, thereby increasing computational time.
For example, a missile targeting a flying object (e.g., a plane) may encounter counter-measures (e.g., decoy flares) deployed by a plane to 'trick' the missile into hitting the flares. In such circumstances, the missile may employ object recognition software to discern the intended target from the decoys to prevent being 'tricked'. However, conventional power-hungry, slow object recognition software is impractical due to the missile's limited amount of power (i.e., battery-operated) and time-constraints in identifying objects, which means the missile may hit the flare long before any object is recognised.
In another example, a rapid, low-power object recognition software can be used by autonomous robotics to assist in navigation, fetching, cleaning and interacting with objects. Much of the current inability of autonomous robots to make a significant impact in the daily lives of people is the lack of ability to interact in real world environments. A vision system employing such a rapid, low-power object recognition system facilitates interaction with the environment without increasing the size and power requirements of the autonomous robot.
Thus, a need exists to provide rapid, accurate visual recognition software that can run on battery-operated CPUs.
Summary
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
Disclosed are arrangements which seek to address the above problems by employing artificial neural networks to rapidly and accurately recognise objects. Aspects of the present disclosure provide fast, low-power and real-time object recognition system. Aspects of the present disclosure can be used in applications where power, size and speed are of paramount importance, for example, missile defence, unmanned aerial vehicle (UAV) (for military applications and civilian applications), autonomous vehicles, robotics (e.g., nano-robotics), etc.
Another aspect of the present disclosure requires no salience detection and is capable of recognizing a number of objects either simultaneously or sequentially. Another aspect of the present disclosure requires no image centring mechanism and can
dynamically reconfigure the artificial neural network to recognise different objects in a single image.
According to a first aspect of the present disclosure, there is provided a method of identifying an object by a battery-powered apparatus, the method comprising: capturing a first image of the object by an image capturing unit; transforming the first image to centre the object in a second image; pixelating the second image; projecting the pixelated image onto a static neural network (SNN), the SNN comprising a plurality of strands of neurons and a summing neuron, wherein each neuron of the plurality of strands of neurons comprises electronic components and/or software codes for generating a pulse output and the summing neuron comprises electronic components and/or software codes for summing pulse outputs, wherein the neurons in each strand are connected together and the plurality of strands of neurons are radially arranged, wherein the neuron at one end of each of the plurality of strands of neurons is connected to the summing neuron, wherein each projected pixel of the projected pixelated image is mapped onto a neuron of the plurality of strands of neurons such that an active pixel of the projected pixelated image generates the pulse output, wherein the pulse output in each of the plurality of strands is propagated toward the neuron connected to the summing neuron, and wherein the summing neuron sums the pulse output of the neurons connected to the summing neuron; generating a Time Series Signature (TSS) pattern by the summing neuron based on the sum of the pulse output; and identifying the object by a pattern identification unit based on the generated TSS pattern.
According to a second aspect of the present disclosure, there is provided a battery-powered apparatus for identifying an object, the apparatus comprising: an image capturing unit for capturing a first image of the object; an image processing unit for receiving the object of the first image, transforming the first image to centre the object in a second image, pixelating the second image, and projecting the pixelated image; a static spiking neural network (SNN) comprising a plurality of strands of neurons and a summing neuron, wherein each neuron of the plurality of strands of neurons comprises electronic components and/or software codes for generating a pulse output and the summing neuron comprises electronic components and/or software codes for summing pulse outputs, wherein the neurons in each strand are connected together and the plurality of strands of neurons are radially arranged, wherein the neuron at one end of each of the plurality of strands of neurons is connected to the summing neuron, the SNN configured for receiving the projected pixelated image, wherein each projected pixel of the projected pixelated image is mapped onto a neuron of the plurality of strands of neurons such that an active pixel of the projected pixelated image activates the neuron, and wherein the neuron activation in each of the plurality of strands is propagated toward the neuron connected to the summing neuron; summing the pulse output of the neurons connected to the summing neuron; and generating a Time Series Signature (TSS) pattern by the summing neuron based on the sum of the pulse output; and a pattern identification unit for identifying the object based on the generated TSS pattern.
According to another aspect of the present disclosure, there is provided a method of identifying an object by a battery-powered apparatus, the method comprising: capturing an image of the object by an image capturing unit; pixelating the image; projecting the pixelated image onto a dynamic neural network (DNN), the DNN comprising a plurality of neurons, wherein each neuron comprises electronic components and/or software codes for generating an electrical output, wherein each pixel of the projected pixelated image is mapped onto a neuron of the plurality of neurons such that an active pixel generates the electrical output; distributing the generated electrical outputs of the neurons amongst neurons which surround the neurons generating the electrical output according to a Gaussian function; normalising the distributed electrical outputs in the neurons; thresholding the normalised distributed electrical outputs, wherein the electrical output of each neuron of the plurality of neurons is based on the distributing, the normalising and the thresholding steps; connecting adjacent neurons of the plurality of neurons based on the electrical outputs of the adjacent neurons, wherein a neuron of the adjacent neurons with a lower electrical output is unidirectionally connected to a neuron of the adjacent neurons with a higher electrical output; propagating the electrical outputs in accordance with the connections established in the connecting step; generating a Time Series Signature (TSS) pattern by a neuron of the plurality of neurons with the highest electrical output by summing the electrical outputs; and identifying the object by a pattern identification unit based on the generated TSS pattern.

According to another aspect of the present disclosure, there is provided a battery-powered apparatus for identifying an object, the apparatus comprising: an image capturing unit for capturing an image of the object; an image processing unit for receiving the object of the image, pixelating the image, and projecting the pixelated image; a dynamic neural network (DNN) comprising a plurality of neurons, wherein each neuron comprises electronic components and/or software codes for generating an electrical output, the DNN configured for receiving the projected pixelated image, wherein each pixel of the projected pixelated image is mapped onto a neuron of the plurality of neurons such that an active pixel generates an electrical output; distributing the generated electrical outputs of the neurons amongst neurons which surround the neurons generating the electrical output according to a Gaussian function; normalising the distributed electrical outputs in the neurons; thresholding the normalised distributed electrical outputs, wherein the electrical output of each neuron of the plurality of neurons is based on the distributing, the normalising and the thresholding steps; connecting adjacent neurons of the plurality of neurons based on the electrical outputs of the adjacent neurons, wherein a neuron of the adjacent neurons with a lower electrical output is unidirectionally connected to a neuron of the adjacent neurons with a higher electrical output; propagating the electrical outputs in accordance with the connections established in the connecting step; and generating a Time Series Signature (TSS) pattern by a neuron of the plurality of neurons with the highest electrical output by summing the electrical outputs; and a pattern identification unit for identifying the object based on the TSS pattern.
According to another aspect of the present disclosure there is provided a computer program product including a computer readable medium having recorded thereon a computer program for implementing any one of the methods described above.
Other aspects of the invention are also disclosed.
Brief Description of the Drawings
At least one embodiment of the present invention is described hereinafter with reference to the drawings, in which:
Fig. 1 shows a real-world application of an object recognition system;
Fig. 2 shows the components of the object recognition system of Fig. 1;
Fig. 3A is a first implementation of the object recognition system using a static neural network;
Fig. 3B is a flow diagram for a method for processing an image for use by the object recognition system of Fig. 3A;
Fig. 4 shows an example of centring an object on an image for use by the object recognition system of Fig. 3A;
Fig. 5A is a static neural network of the object recognition system of Fig. 3A;
Fig. 5B shows an alternative pattern for the static neural network of Fig. 5A;
Fig. 6 is a strand of neurons of the static neural network of Fig. 5A;
Fig. 7A is a schematic diagram for a neuron of the static neural network of Fig. 5A;
Fig. 7B is an implementation of the neuron of Fig. 7A on an analogue circuit;
Figs. 8A and 8B are examples of Time Series Signatures (TSS) generated by the object recognition system of Fig. 2;
Fig. 9A is a second implementation of the object recognition system using a dynamic neural network;
Fig. 9B is a flow diagram for a method for processing an image for use by the object recognition system of Fig. 9A;
Fig. 10 is the dynamic neural network of the object recognition system of Fig. 9A;
Figs. 11A to 11C are the neurons for use by the dynamic neural network of Fig. 10;
Fig. 12 is a flow diagram for a method for use by the dynamic neural network of Fig. 10 to generate a Time Series Signature for an object;
Figs. 13A to 13E show neuron connectivity of the dynamic neural network for performing Gaussian Filtering;
Figs. 14A to 14D are examples of an image undergoing Gaussian Filtering iterations;
Fig. 15 is a flowchart of a method for generating a layer of neurons with a vector field;
Fig. 16 is an implementation of the layer of neurons with the generated vector field of Fig. 15; and
Fig. 17 shows a vector field of the dynamic neural network to recognise an object.
Detailed Description
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
Fig. 1 shows a missile 1 having an image capturing unit 8 and an object recognition system (ORS) 10 for identifying objects (e.g., a jet 3 and flare counter- measures 5). The missile 1 also has a missile control unit (MCU) 11 that controls the flight path of the missile 1.
As the missile 1 propels toward the intended target (i.e., the jet 3), the image capturing unit 8 captures images 6 of the area toward which the missile 1 is heading. At the same time, the jet 3 may deploy flare counter-measures 5 to 'trick' the missile 1 into hitting the flares 5. In such circumstances, the image 6 captured by the image capturing unit 8 includes, inter alia, the jet 3 and the flare counter-measures 5.
The ORS 10 processes the captured image 6 and identifies each object 3, 5 in the captured image. The processing of captured images 6 and recognising objects 3, 5 by the ORS 10 is described in detail hereinafter.
The MCU 11 uses the identification of the objects 3, 5 to determine a flight path toward the intended target jet 3. Thus, the missile 1 avoids being 'tricked' into hitting the flare counter-measures 5.
The application of the ORS 10 in Fig. 1 is an example. The ORS 10 may be deployed in other applications such as, inter alia, a robotic eye for rapid identification of objects, robotic and UAV navigation systems for recognising key features in visual scenes, mobile security systems where power and size are important considerations, remote scientific monitoring where access is limited and battery changes are restrictive, etc.
Fig. 2 is an implementation of the ORS 10 including an image processing unit 12, a neural network (NN) 14 and a pattern identification unit 16. The image processing unit 12 receives a captured image 6 from the image capturing unit 8 and processes the received image 6 into an image format that the NN 14 can process. The processed image is then projected onto the NN 14.
The NN 14 processes the projected image to output a Time Series Signature (TSS). A TSS is a time-series pattern identifying an object's features and is unique to different objects.
The pattern identification unit 16 is capable of matching an identified TSS with a TSS stored in a database. In this implementation, a Polychronous Network (PCN) 16 is used. The PCN 16, which is an artificial neural network, fires a neuron to indicate that a time-series pattern (i.e., the TSS) is known when the PCN 16 recognises that the time-series pattern has been previously identified. The functionality of the PCN 16 is similar to a content addressable memory.
Alternatively, the pattern identification unit compares the received TSS with a TSS database to determine the identity of the object 3, 5.
In the present disclosure, two implementations of the ORS 10 are described. The first implementation utilises a static neural network (SNN) 14A as the NN 14, shown in Fig. 3A. The second implementation utilises a dynamic neural network (DNN) 14B as the NN 14, shown in Fig. 9A.
Fig. 3B shows a flow chart of a method 300 for processing captured images 6 by the image processing unit 12. The method 300 commences with step 302 by centring an object 3, 5 on the captured image 6. As mentioned hereinbefore, images 6 are captured by the image capturing unit 8, which is, for example, an analogue camera, a digital camera, etc.
The image processing unit 12 employs a salience detection method to detect objects 3, 5 in the captured image 6 and to centre each object 3, 5 in an image. The salience detection method processes the captured image 6 to recognise areas in the image 6 belonging to an object 3, 5. The salience detection method determines areas of activity in the captured image 6 by transforming each object 3, 5 into a circle which the image processing unit 12 treats as the object 3, 5.
Fig. 4 shows the result of the image processing unit 12 employing a salience detection method to identify areas in the image 6 belonging to objects 3, 5 and creating separate images 6A to 6D of each object 3, 5, so that each separate image 6A, 6B, 6C, 6D has one object 3, 5 centred on it. Each separate image 6A, 6B, 6C, 6D is then processed by the SNN 14A and the PCN 16 to identify each object 3, 5. Some examples of salience detection methods are discussed in "A Model of Saliency-Based Visual Attention for Rapid Scene Analysis" by L. Itti, C. Koch, and E. Niebur in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, November 1998, and "Modeling Visual Attention via Selective Tuning" by J.K. Tsotsos, S.M. Culhane, W.Y.K. Wai, Y.H. Lai, N. Davis, and F. Nuflo in Artificial Intelligence, vol. 78, no. 1-2, pp. 507-545, Oct. 1995, the contents of which are incorporated herein by reference in their entirety. Alternatively, the salience detection method is performed by the SNN 14A.
In an alternative implementation, the image processing unit 12 in conjunction with a salience detection method and the image capturing unit 8 capture multiple images 6A, 6B, 6C, 6D with each captured image 6A, 6B, 6C, 6D having an object 3, 5 centred in the captured image 6A, 6B, 6C, 6D. In this implementation, a laser targeting system determines the range of an object and a mechanical system moves the image capturing unit 8 until the object 3, 5 is centred in each captured image 6 A, 6B, 6C, 6D. Processing continues at step 304.
At step 304, each image 6A, 6B, 6C, 6D is pixelated such that each pixel can be mapped onto a neuron of the SNN 14A. A one-to-one mapping of a pixel to a neuron of a neural network is called retinotopic mapping. The pixels of a pixelated image can be classified into active and passive pixels. Active pixels are pixels which represent a part of an object's feature. Passive pixels are pixels that do not represent any part of an object's feature. For example, in a black and white image of a fish, the pixels representing the outline of the fish are black whilst the pixels representing the background of the image are white. In this example, the black pixels are the active pixels and the white pixels are the passive pixels.
Before each of the images 6A, 6B, 6C, 6D is pixelated, the images 6A, 6B, 6C, 6D may be filtered by a high-pass filter to accentuate the outline of the object 3, 5 to be identified. The neurons and other details of the SNN 14A are described further hereinafter in relation to Figs. 5 to 8. The method 300 then advances to step 306.
At step 306, the pixelated image of step 304 is projected onto the SNN 14A.
Before the pixelated image is projected onto the neurons of the SNN 14A, the pixelated image can be further processed by other filters (e.g., high-pass filter, Gabor filter, etc.) to further accentuate the features of the captured image 6, 6A, 6B, 6C, 6D.
In a hardware implementation, the image processing unit 12 outputs either a current or a voltage for each active pixel, whilst a passive pixel has no output.
In a software implementation, the image processing unit 12 outputs a "1" for an active pixel and a "0" for a passive pixel. The method 300 concludes at the completion of step 306.
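By way of illustration only, and not as part of the original disclosure, the software behaviour of steps 304 to 306 can be sketched as follows in Python. The function name and the binarisation threshold are assumptions; the point is simply that a grayscale image is reduced to active ("1") and passive ("0") pixels ready for retinotopic mapping.

```python
import numpy as np

def pixelate_to_binary(image, threshold=0.5):
    """Map a grayscale image (values in [0, 1]) to active/passive pixels.

    Active pixels (object features such as an outline) become 1; passive
    (background) pixels become 0, matching the "1"/"0" outputs of the
    image processing unit 12 described above.  The threshold is illustrative.
    """
    return (image > threshold).astype(np.uint8)

# Example: a 5x5 image with a bright diagonal "feature".
img = np.eye(5)
print(pixelate_to_binary(img))  # 1s on the diagonal (active), 0s elsewhere
```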
Fig. 5A shows an SNN 14A for processing the pixelated image generated by the output of the image processing unit 12. The SNN 14A comprises a number of neuron strands 501 and a summing neuron 504. Each strand 501 comprises a number of neurons 502, connected together by connections 503, arranged radially as shown in Fig. 5A. Fig. 5B shows an alternative pattern for arranging the strands 501.
A neuron 502 is a circuit or a software program for generating a pulse if an active pixel is mapped onto the neuron. The neurons 502 used by the SNN 14A are spiking neurons, which means that each neuron 502 outputs a bi-stable pulse (e.g., a high or a low voltage/current output, a binary "1" or "0" output). Implementations of the neurons 502 are described in detail in relation to Fig. 7.
Fig. 6 shows a connection of one strand 501 of the plurality of neurons 502 connected to a central neuron 505 and a summing neuron 504 of the SNN 14A of Fig. 5A. One end of each of the plurality of neuron strands 501 is connected to the summing neuron 504. In the SNN 14A depicted in Fig. 5A, the neuron located at the periphery of the SNN 14A is connected to the summing neuron 504. The neuron located at the periphery of the SNN 14A is referred to as 502P hereinafter. The other end of the plurality of neuron strands 501, which in the SNN 14A depicted in Fig. 5A is located at the centre of the SNN 14A, is then connected to a centre neuron 505. Each neuron 502, 504, 505 is connected to another neuron 502, 504, 505 by the connection 503, which is a collection of wires for connecting the inputs and outputs of corresponding neurons 502, 504, 505.
When pixels of a pixelated image are projected onto the SNN 14A, neurons 502 corresponding to active pixels (i.e., pixels with output of "1") are activated. The operation of the SNN 14A is governed by a time-clock (not shown) such that, every Δt, an activated neuron 502 propagates the activation to a subsequent neuron on the strand 501. In the SNN 14A of Fig. 5A, the neuron activation is propagated to the periphery neurons 502P. Thus, after nΔt, the periphery neuron 502P is activated and the summing neuron 504 sums the total number of periphery neurons 502P activated after nΔt. The series of sums of activated periphery neurons 502P at different intervals of Δt is called a Time Series Signature (TSS).
For example, in Fig. 5A, a neuron 510 is turned on as the corresponding pixel is active. After 1Δt, as determined by a time clock (not shown), neuron 512 is activated. After 5Δt, the periphery neuron 514 is activated. At 5Δt, the summing neuron 504 sums the total number of the periphery neurons 502P activated at time 5Δt.
In an alternative implementation, the centre neuron 505 is omitted and the plurality of neuron strands 501 is not connected at the centre of the SNN 14A. This implementation does not affect the output TSS materially.
In another alternative implementation, the summing neuron 504 is located at the central neuron 505 and the propagation of neuron activation is in the opposite direction (i.e., toward the centre of the SNN 14A). The TSS generated by such an implementation is the reverse of the TSS generated by the SNN 14A of Fig. 5A.
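For illustration only, the propagation along the strands and the periphery summation described above can be modelled with the following Python sketch. It abstracts away the pulse electronics (the leak W and the threshold Y) and assumes activations simply shift one neuron outward per Δt; the function name and array encoding are assumptions.

```python
import numpy as np

def generate_tss(strands):
    """Propagate activations along radial strands and sum at the periphery.

    `strands` is a 2-D boolean array of shape (num_strands, strand_length);
    strands[s, i] is True when the pixel mapped onto neuron i of strand s is
    active (index 0 = centre, index -1 = periphery).  Each time step dt,
    every activation moves one neuron outward; the summing neuron records
    how many periphery neurons fire at each step.  The series is the TSS.
    """
    strands = np.asarray(strands, dtype=bool)
    num_strands, length = strands.shape
    tss = []
    for step in range(length):
        # Activations that started (length - 1 - step) neurons from the
        # periphery arrive at the periphery neuron 502P at this step.
        tss.append(int(strands[:, length - 1 - step].sum()))
    return tss

# Example: 4 strands of 3 neurons; a ring of activity at the middle neuron
# reaches the periphery after one dt, giving TSS = [0, 4, 0].
print(generate_tss([[0, 1, 0]] * 4))
```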
Fig. 7A is a schematic diagram of the neuron 502 including an input S1 for receiving a previous neuron's pulse output p, a pixel input S2 for receiving a signal of a corresponding pixel, a threshold input value Y, a control signal W and a pulse output p. The inputs S1 and S2 are either a voltage or a current in a hardware implementation, or a binary "0" or "1" in a digital implementation. A pulse output p is generated if either input S1 or S2 is greater than the threshold input value Y. The time taken by the neuron 502 to output a pulse output p when either input S1 or S2 changes is modulated by the control signal W. The value of the user-determined control signal W determines the response time of the neuron 502. This implementation is called a leaky integrate-and-fire (LIF) neuron.
The pulse output p is used as the S1 input of a subsequent neuron 502 to effect the propagation of neuron activation.
Fig. 7B is an analogue implementation of the neuron 502 including two neuron activation circuits 702 and 704 and a comparator circuit 710. Each of the neuron activation circuits 702, 704 comprises a MOSFET switch 703 and an SR (Set-Reset) latch 706 and 708, respectively. Each of the SR latches 706 and 708 comprises an S input, an R input and a Q output. The S input of the SR latch 706 is connected to the previous neuron output S1, whilst the S input of the SR latch 708 is connected to the pixel corresponding to the neuron 502. The R input is connected to the output of the comparator circuit 710. The Q output is connected to the MOSFET switch 703.
If either input S1 or S2 is activated (i.e., either a previous neuron's pulse output p or an active pixel is received, respectively), the Q output is activated (i.e., a "1" output) and the MOSFET switch 703 is turned on, thereby allowing current X to flow; at voltage node Vmem, the currents X and W are added. In the circuit implementation, the current W flows out of the voltage node Vmem and, hence, the net current is X − W. The control signal W is implemented as a leakage current such that the input current X persists on the voltage node Vmem for only a limited amount of time.
The comparator circuit 710 comprises a comparator 712 having inputs from a threshold value Y and the voltage node Vmem, and associated comparator circuitry. The comparator 712 compares the values of Vmem and Y. If the value of Vmem is higher than Y, a pulse output p is generated; otherwise, no pulse output p is generated. The pulse output p is connected to a Reset MOSFET switch 708 and the R inputs of the SR latches 706, 708 so that a generated pulse output p resets the voltage node Vmem to zero and resets the SR latches 706, 708. The voltage node Vmem and the Q outputs are therefore reset to zero after the comparator 712 outputs a pulse output p, ensuring that Vmem and the SR latches are prepared for the next propagation of neuron activation. The implementation of Fig. 7B is one example of many possible analogue circuit implementations of a LIF neuron.
In a digital implementation, the SNN 14A is formed by a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC). Hence, the LIF neuron can be implemented by programming the FPGA or the ASIC as an adder circuit. The equation for implementing the LIF neuron is:

Vmem = Vmem·z⁻¹ + ((X − W)/C)·z⁻¹ + (X − W)/C; if Vmem > Y, then p = 1, else p = 0 (eqn. 1)

C is a constant scaling factor that can be implemented by a Look-Up Table (LUT). The LUT lists all possible divisions with associated results such that the program executed by the FPGA refers to the LUT to output the result of the division by C. Y is the threshold input value, p is the output of eqn. 1, X is either the pixel input value or an input from a pulse output p of a previous neuron 502, and W is a user-determined leak term that determines the responsiveness of the neuron 502. The pulse output p is generated with a compare function.
Thus, in the digital implementation, the image processing unit 12 outputs a predetermined bit stream format to be input to the FPGA or the ASIC via a Serial Peripheral Interface (SPI) or the like. The FPGA or the ASIC is then able to match the pixels to the corresponding neurons 502 based on the received bit stream from the image processing unit 12.
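A minimal software sketch of the digital LIF neuron of eqn. 1 follows, for illustration only. The class name is hypothetical, and the reset-to-zero after a pulse and the clamping of Vmem at zero are assumptions made for consistency with the analogue circuit of Fig. 7B rather than details taken from the disclosure.

```python
class DigitalLIFNeuron:
    """Sketch of the LIF neuron of eqn. 1.

    Vmem accumulates the current and the delayed (X - W)/C contributions
    (the z^-1 terms of eqn. 1); when Vmem exceeds the threshold Y, a pulse
    p = 1 is emitted and Vmem is reset.
    """

    def __init__(self, threshold_y, leak_w, scale_c=1.0):
        self.y = threshold_y      # threshold input value Y
        self.w = leak_w           # user-determined leak term W
        self.c = scale_c          # constant scaling factor C (a LUT division on an FPGA)
        self.vmem = 0.0
        self.prev_contrib = 0.0   # the delayed (X - W)/C term of eqn. 1

    def update(self, x):
        """x is the pixel input or the pulse output p of a previous neuron 502."""
        contrib = (x - self.w) / self.c
        # Assumption: the leak cannot drive Vmem below zero.
        self.vmem = max(self.vmem + self.prev_contrib + contrib, 0.0)
        self.prev_contrib = contrib
        if self.vmem > self.y:
            self.vmem = 0.0       # reset after firing, as in Fig. 7B (assumption)
            self.prev_contrib = 0.0
            return 1
        return 0

# Example: with Y = 2.5 and a small leak, a sustained unit input fires
# every second step: [0, 1, 0, 1, 0, 1, 0, 1].
neuron = DigitalLIFNeuron(threshold_y=2.5, leak_w=0.1, scale_c=1.0)
print([neuron.update(1.0) for _ in range(8)])
```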
Fig. 7C shows an implementation of the summing neuron 504. The summing neuron 504 is similar to the neurons 502, except that the summing neuron 504 has no comparator and associated pulse generation. The summing neuron 504 includes n neuron activation circuits 740 (i.e., 740A, 740B, ..., 740N), a capacitor C and a leak current nW.
The number n of neuron activation circuits 740 corresponds to the number of periphery neurons 502P connected to the summing neuron 504. Each neuron activation circuit 740 includes an SR latch 742, a MOSFET switch 744 and a current X. The SR latch 742 comprises an S input, an R input and a Q output. The S input is connected to one of the periphery neurons 502P, whilst the R input is connected to a global clock (not shown) that periodically sends a reset signal to the SR latches 742. The SR latches 742 are reset to prepare them to sum the neuron activation of the periphery neurons 502P at the next Δt. The Q output is connected to the MOSFET switch 744.
If the input S is activated (i.e., a previous neuron's pulse output p is received), the Q output is activated (i.e., a "1" output) and the MOSFET switch 744 is turned on, thereby allowing current X to flow. At voltage node Vmem, the currents X from the n activated periphery neurons 502P are added on the capacitor C. Thus, Vmem gives the TSS output of the SNN 14A. The current nW is a leakage current similar to that described for the LIF neuron of Fig. 7B and is set such that the summed currents X persist on the voltage node Vmem for only a limited amount of time, so that the sum of currents X has decayed before the subsequent summing of currents X at the next Δt.
Time normalisation is an additional step that can be performed when the pixelated image is projected onto the SNN 14A, and is used to normalise images of different sizes on the SNN 14A in order to generate a normalised TSS. To normalise time, the total current drawn by the neurons 502 when the pixelated image is projected onto the SNN 14A is measured. Alternatively, the current drawn by each neuron 502 is individually monitored.
A large image translates to a high activation level of the neurons 502. In such circumstances, the propagation of neuron activation along the strands 501 of the SNN 14A needs to be accelerated by decreasing the level of the leak current W. On the other hand, a small image translates to a low activation level of the neurons 502. In such circumstances, the leak current W is increased to slow down the propagation of neuron activation. The measure of neuron activity is thus proportional to the total current drawn by the neurons 502.
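A minimal sketch of the time-normalisation idea follows, assuming the leak W is simply scaled inversely with the measured activity; the function name and the reference_current parameter are hypothetical.

```python
def adjust_leak(total_current, base_leak, reference_current):
    """Scale the leak W inversely with measured neuron activity.

    A large projected image draws a large total current, so the leak is
    reduced to speed up propagation; a small image increases the leak to
    slow propagation.  `reference_current` is the activity level at which
    the nominal leak applies; all names here are illustrative.
    """
    if total_current <= 0:
        return base_leak
    return base_leak * (reference_current / total_current)

# Twice the reference activity halves the leak (faster propagation):
print(adjust_leak(total_current=2.0, base_leak=0.1, reference_current=1.0))  # 0.05
```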
As mentioned hereinbefore, the Time Series Signature (TSS) is the output of the summing neuron 504. Figs. 8A and 8B are examples of the TSS generated by the summing neuron 504. The TSS is unique to different objects, is easy to store and is an efficient way of representing 2-D images in hardware. As shown in Fig. 8A, rotating an object produces a very similar TSS. Similarly, Fig. 8B shows that similar objects of different sizes also generate similar TSSs. Hence, the recognition of objects by the ORS 10 is not dependent on an object's rotation or scale. The TSS generated by the SNN 14A is then input to the pattern identification unit 16, which in this implementation is a PCN.
The PCN 16 is a pattern identification unit capable of matching an identified pattern with a pattern stored in a database of the PCN 16. Each pattern stored in the database is associated with a particular object. In one implementation, the PCN 16 is an artificial neural network that is capable of learning and recalling spatio-temporal patterns (e.g., the TSS). The PCN is described in several articles, one entitled "Polychronization: Computation with Spikes" by E.M. Izhikevich and another entitled "An analogue VLSI implementation of polychronous spiking neural networks" by Runchun (Mark) Wang et al., the contents of which are incorporated herein by reference in their entirety.
The PCN 16 functions by learning patterns of TSS and identifying that a TSS pattern presented to the PCN 16 is associated with a previously learned TSS pattern. The PCN can either be trained by presenting the PCN 16 with the TSS of the object to be learned (i.e., supervised learning), or the PCN 16 can learn unsupervised. The supervised method is favoured due to the speed with which patterns can be learned. The unsupervised method is not as accurate as the supervised method. However, the unsupervised method allows an autonomous apparatus (e.g., autonomous vehicle, autonomous robot, etc.) to learn independently.
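The PCN itself is beyond a short example, but its content-addressable behaviour can be caricatured by a much simpler stand-in: nearest-neighbour matching of a TSS against a database of learned signatures, returning a label only when a stored pattern is sufficiently close. This is not the PCN algorithm; the tolerance rule and all names are illustrative.

```python
import numpy as np

def match_tss(tss, database, tolerance=0.1):
    """Toy stand-in for the PCN: match a TSS against stored signatures.

    `database` maps object labels to stored TSS vectors of the same length.
    Returns the best-matching label, or None if no stored pattern is close
    enough (a crude analogue of the PCN firing only for learned patterns).
    """
    tss = np.asarray(tss, dtype=float)
    best_label, best_dist = None, float("inf")
    for label, stored in database.items():
        dist = np.linalg.norm(tss - np.asarray(stored, dtype=float))
        if dist < best_dist:
            best_label, best_dist = label, dist
    limit = tolerance * (np.linalg.norm(tss) + 1e-9)
    return best_label if best_dist <= limit else None

db = {"jet": [0, 4, 7, 3, 0], "flare": [2, 2, 2, 2, 2]}
print(match_tss([0, 4, 7, 3, 0], db))  # "jet"
```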
Fig. 9A is an ORS 10 using the DNN 14B. The DNN 14B is capable of detecting an object 3, 5 in a captured image 6 without having to centre the object 3, 5 on the image 6. Further, the DNN 14B may utilise non-spiking neurons, which allows a more detailed image to be processed.
Fig. 9B shows a flow chart of a method 900 for processing captured images 6. The method 900 commences with step 902, where the captured image 6, 6A, 6B, 6C, 6D is pixelated such that each pixel can be mapped onto a neuron 902 of the DNN 14B. As mentioned hereinbefore, images 6 are captured by the image capturing unit 8, which is, for example, an analogue camera, a digital camera, etc. Additionally, captured images 6, 6A, 6B, 6C, 6D may be further processed (e.g., by a high-pass filter or a Gabor filter) before or after pixelation. The further processing is performed to accentuate particular features of the captured image 6, 6A, 6B, 6C, 6D (e.g., horizontal lines, vertical lines, diagonals, etc.). Step 902 then proceeds to step 904.
At step 904, the image processing unit 12 outputs the pixels of the pixelated image so that each pixel can be mapped onto a neuron 902 of the DNN 14B. In a hardware implementation, each active pixel is represented by either a current or a voltage and a passive pixel has no output. In a software implementation, each active pixel is represented by a "1", whilst a passive pixel is represented by a "0". Each pixel may also be represented by a binary word (e.g., a binary number of more than 1 digit) if non-spiking neurons are utilised by the DNN 14B. The method 900 concludes at step 904.
Fig. 10 shows an example of a grid-like two-dimensional (2-D) sheet of neurons 902 of the DNN 14B. No shape is imposed on the 2-D sheet of neurons 902, unlike the SNN 14A. A projected pixelated image, which is the output of the method 900, is mapped onto the 2-D sheet of neurons 902.
Fig. 11A shows a schematic diagram of a neuron 902. The neuron 902 includes a pixel input X, a threshold value Y and a leakage current W.
In a hardware implementation, the pixel input X is either a voltage or a current for active pixels. The threshold value Y is either a predetermined current or a predetermined voltage. If the pixel input X is greater than the threshold value Y, a pulse output p is generated. The width of the pulse p is predetermined. The time between a change in the pixel input X and a pulse output p being generated is modulated by the leakage current W. The leakage current W is controlled externally to the neuron 902 and is not shown here. The neuron 902 is called a leaky integrate-and-fire (LIF) neuron. There are two types of neuron 902 that can be used by the DNN 14B: a spiking neuron 902A and a non-spiking neuron 902B.
Fig. 11B is an analogue circuit of a spiking neuron 902A having a pixel input X, a threshold value Y and a leakage current W. The pixel input X is a current, the threshold value Y is a voltage, the leakage current W is a current and Vmem is a voltage node where the currents of the pixel input X and the leakage current W are added. The operation of the spiking neuron 902A is similar to the operation of the neuron 502 as described in relation to Fig. 7B. Fig. 11B is one of many possible implementations of a LIF neuron.
In a digital implementation (e.g., on an FPGA), the LIF neuron can be implemented as per eqn. 1, as described hereinbefore.
Fig. 11C is an analogue circuit of a non-spiking neuron 902B having a pixel input A and a capacitor C. The pixel input A is a current that is received from a corresponding pixel and is input into the capacitor C. A leakage current B draws current from the capacitor C. The current B, similar to the current W of Fig. 11B, is set externally to the neuron 902 and determines the rate at which the voltage node Vmem charges. The current A is the sum of the currents received from other neurons 902 connected to the neuron 902. The currents of the neurons 902 in the analogue implementation of the DNN 14B are added by summing the currents at the voltage node Vmem. The voltage node Vmem, which integrates the currents A and B, is the output of the non-spiking neuron 902B.
The digital implementation of the non-spiking neuron 902B is the same as that of the spiking neuron 902A but without the comparison function (i.e., the "if" statement in eqn. 1 is removed).
Fig. 12 is a flow diagram of a method 1200 for the DNN 14B to generate a TSS for the objects 3, 5 of the captured image 6. The method 1200 commences at step 1202 by mapping the projected pixelated image from the image processing unit 12 onto a first layer (a 2-D sheet) of neurons 902 of the DNN 14B. The mapping of the pixels onto the first layer of neurons 902 transforms the pixelated image into activations of neurons 902. Step 1202 then proceeds to step 1204.
At step 1204, the activated neurons 902 are filtered using a Gaussian function or an approximate Gaussian function. The Gaussian filtering operation is used to effectively 'blur' the pixelated image that has been projected onto the neurons 902.
Gaussian Filtering on the DNN 14B is performed in two steps. The first step is to retinotopically map the first layer of neurons 902 to a second layer of neurons. The second step is to distribute the current in each neuron in the second layer to multiple surrounding neurons in the second layer. The connections between each neuron in the second layer and the multiple surrounding neurons in the second layer are weighted by the distance between the neuron and each of the surrounding neurons. Thus, Gaussian Filtering is performed by distributing the current of a neuron in the second layer onto multiple surrounding neurons in the second layer according to the distance-weighted scale.
Fig. 13A shows parallel pathways between the neurons 902 in the first layer and the neurons 910 in the second layer. The neurons 902 employed can be either spiking neurons 902A or non-spiking neurons 902B. Similarly, the neurons 910 employed can be either spiking neurons 910A or non-spiking neurons 910B.
In this implementation, each neuron 902 in the first layer is retinotopically mapped to neurons 910 in the second layer with connectivity that is weighted with the inverse of distance to effect the first step of the Gaussian Filtering process. Further, each neuron 902 in the first layer is connected to multiple neurons 910 in the second layer to effect the second step of the Gaussian Filtering process. This implementation accelerates the first iteration of the Gaussian Filtering as the two steps are performed simultaneously. The equation for connecting each neuron 902 to the neurons 910 is:

Cw = 1/d (eqn. 2)

where Cw is the connection weight and d is the distance between the neuron 902 and the neuron 910. Thus, Cw determines the amount of current input to the neurons 910 from the neuron 902.
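For illustration, one distribution step of this approximate Gaussian filter can be sketched in software using the inverse-distance weighting of eqn. 2 on a 2-D sheet. The self-weight of 1 and the neighbourhood radius are assumptions, and no normalisation is applied here (that is step 1206 below).

```python
import numpy as np

def distribute_currents(layer, radius=1):
    """One distribution step of the approximate Gaussian filter (eqn. 2).

    Each neuron's current is spread over the surrounding neurons within
    `radius`, weighted by Cw = 1/d (the neuron keeps a weight of 1 for
    itself, an illustrative choice).  Repeating this step blurs the image.
    """
    layer = np.asarray(layer, dtype=float)
    out = np.zeros_like(layer)
    rows, cols = layer.shape
    for r in range(rows):
        for c in range(cols):
            if layer[r, c] == 0:
                continue
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        d = np.hypot(dr, dc)
                        w = 1.0 if d == 0 else 1.0 / d
                        out[rr, cc] += layer[r, c] * w
    return out

sheet = np.zeros((5, 5)); sheet[2, 2] = 1.0
print(np.round(distribute_currents(sheet), 2))  # current spread over 8 neighbours
```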
Fig. 13B shows an example implementation of the parallel connectivity between neurons 902 and neurons 910 using non-spiking neurons 902B to implement the Gaussian Filtering. A neuron 902 spreads its output current to adjacent neurons 910. The sizes of the transistors M of the neurons 910 that sink the current from the neuron 902 are scaled with distance, so that a neuron 910 at a distance d has its transistor size scaled by d. Thus, the current/value of a neuron 902 is spread over a number of neurons 910, resulting in a 'blurring' of the pixelated image. The distributed current in each neuron 910 is then returned to the neuron 910 itself by the transistor RT, which completes a first iteration of the Gaussian Filtering. Each of the returned currents is further processed by the next iteration of Gaussian Filtering.
Fig. 13C shows an example implementation of a neuron 910 in the second layer using spiking neurons 902A. The circuit of Fig. 13C includes neuron activation circuits 932, 934 and 936, which correspond to the neuron 902 and the surrounding neurons 910, respectively. Each neuron activation circuit 932, 934, 936 functions in the same way as described in relation to the neuron activation circuits 702, 704 of Fig. 7B. In the circuit of Fig. 13C, the current X of the neuron activation circuit 932 is the current from the corresponding retinotopically mapped neuron 902, whilst the currents X of the neuron activation circuits 934 and 936 are currents from surrounding neurons 910 which have been scaled according to the distance d from the neuron 910. The overall function of the circuit of Fig. 13C is the same as that of the circuit of Fig. 7B.
Figs. 13D and 13E show a neuron 910 connected to multiple surrounding neurons 910 to approximate the Gaussian filtering. As an illustration, neuron 940 is the neuron 910 whose current is to be distributed and neurons 941 are the surrounding neurons 910. Fig. 13D shows insufficient connectivity to achieve Gaussian filtering. Fig. 13E shows adequate connectivity between neurons 910 to achieve Gaussian filtering. The minimum connectivity required to achieve the Gaussian filtering is 8 connections per neuron 910. If the connectivity is greater than 8 connections, the connectivity must spread out from the central neuron 910 in a circular manner. Step 1204 then proceeds to step 1206.
At step 1206, normalisation is performed on the neurons 910. As shown in Figs. 13A to 13E, current is spread from a neuron 910 to surrounding neurons 910. The spreading of current from each neuron 910 to multiple surrounding neurons 910 reduces the intensity of the current at the neuron 910, effectively 'blurring' the image. To prevent the currents from becoming too small as further Gaussian Filtering is performed, a normalisation operation is performed. The normalisation operation determines the maximum current (or maximum voltage Vmem) in any of the neurons 910 and divides the values of all the currents (or the voltage node Vmem) in the other neurons 910 by the determined maximum value. Thus, the maximum current level is restored to unity.
The implementations of the Gaussian filtering as per Figs. 13B and 13C are costly because of the requirement to measure and compare the current of each neuron 910 during the normalisation process. Alternatively, the normalisation process can be implemented in a circuit by finding a local maximum to determine the maximum current. In an alternative implementation, the average current/voltage could be determined and all the currents/voltages of the neurons 910 scaled based on the determined average current/voltage. The average current can be determined by measuring the current drawn from the power supply, dividing by the number of neurons 910 and scaling by a predetermined factor. Step 1206 proceeds to step 1208.
At step 1208, a thresholding operation is performed on the neurons 910. The thresholding operation compares all the current/voltage values of the neurons 910 and deactivates neurons 910 whose current/voltage is smaller than a predetermined value. For spiking neurons 910A, the thresholding operation is performed by setting the threshold value Y. For non-spiking neurons 910B, the thresholding operation is performed by subtracting a threshold current/voltage value from the output of the circuit 910B. If the threshold current value is larger than the output of the circuit 910B, the output of the circuit 910B falls to 0 and the neuron 910 is deactivated. Step 1208 proceeds to step 1209.
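The normalisation (step 1206) and thresholding (step 1208) operations can be sketched in software as follows. The global-maximum variant is shown and the threshold value is illustrative.

```python
import numpy as np

def normalise_and_threshold(layer, threshold=0.2):
    """Steps 1206 and 1208: restore the maximum current to unity, then
    deactivate neurons whose value falls below the threshold.
    """
    layer = np.asarray(layer, dtype=float)
    peak = layer.max()
    if peak > 0:
        layer = layer / peak          # maximum current restored to unity
    layer[layer < threshold] = 0.0    # thresholding deactivates weak neurons
    return layer

blurred = np.array([[0.2, 0.5], [0.1, 2.0]])
print(normalise_and_threshold(blurred))
# [[0.   0.25]
#  [0.   1.  ]]  (normalised values below 0.2 are zeroed)
```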
At step 1209, the number of Gaussian Filtering iterations performed is checked. If the Gaussian Filtering is determined to be insufficient (NO), step 1209 proceeds to step 1204 (i.e., steps 1204 to 1208 are repeated). Eight iterations of the Gaussian filtering/normalisation/thresholding operations are preferable. If fewer or more than 8 iterations of Gaussian Filtering are performed, the Gaussian Filtering may produce a region of activity that is not circular, or may produce several regions of activity. The Gaussian Filtering produces a circular region of neuron activation around an object 3, 5 in the image 6, 6A, 6B, 6C, 6D. The repetition of these operations ensures that only one object 3, 5 is focused on. By changing the number of iterations of the Gaussian Filtering, however, multiple-object recognition is possible by determining local maxima in the normalisation procedure rather than a global maximum. Using local maxima, different regions of the captured image 6 are normalised and thresholded by different values and are therefore not deactivated, as would be the case if only a single object were focused on.
Figs. 14A to 14D show a pixelated image (i.e., regions of neuron activation) going through different numbers of Gaussian Filtering iterations. Fig. 14A shows a pixelated image, having multiple dark spots, before undergoing Gaussian Filtering. Fig. 14B shows the pixelated image after undergoing two iterations of Gaussian Filtering. As can be seen in Fig. 14B, there are two dark spots (i.e., regions of activity) which can be used to perform simultaneous object recognition. If more than one object is to be identified in a captured image 6, then the DNN 14B can inhibit the activity of the neurons 902 that were connected in the vector field. Thus, the Gaussian Filtering operation results in another object being focused on. Different regions of activity can be used to inhibit areas of the image in subsequent iterations of recognition until the entire scene is recognised.
Fig. 14C shows an intermediate step where the region of activity is not yet circular. The region of activity needs to be circular in order to obtain a rotation/scale invariant TSS (i.e., a TSS pattern not affected by rotation or scaling of the object 3, 5). Fig. 14D shows a pixelated image after undergoing eight iterations of Gaussian filtering, normalisation and thresholding. As can be seen in Fig. 14D, the region of activity is circular.
If the Gaussian Filtering is determined to be sufficient (YES), step 1209 proceeds to step 1210.
At step 1210, a second layer of neurons 902 with a vector field is generated. Fig. 15A is a flowchart of a method 1500 for generating the second layer of neurons 902 with the vector field. The method 1500 commences at step 1502, where the Gaussian filtered regions of neuron activation are further filtered using a Gaussian function. The further Gaussian Filtering performed at step 1502 accentuates the current distribution of the neurons 902 to obtain directionality for the vector field to be generated.
Fig. 15B shows the current distribution in the neurons 902 of the DNN 14B after the Gaussian Filtering of step 1502 is performed. The z-axis of Fig. 15B shows the current level, whilst the plane of the x-y axes is the 2-D sheet of neurons 902. As can be seen in Fig. 15B, the centre of the object is represented by a maximum current 1502, whilst the remaining parts of the object are represented by decreasing amounts of current relative to the maximum current 1502. If the Gaussian Filtering is not performed at step 1502, the current distribution of the neurons 902 is less like a little mountain and more like a mesa (i.e., a flat top). The process continues at step 1504.
At step 1504, the required vector field is determined by connecting neurons 902 to other neurons 902 having a larger current/voltage. For example, according to the current distribution of Fig. 15B, the neurons 902 connect toward the centre of the object where the current is the largest, such that the propagation of the neuron activation according to the vector field is toward the centre of the object to be identified. Once the vector field is determined, the method 1500 proceeds to step 1506.
At step 1506, a second 2-D sheet of neurons 902 with a vector field, which is determined at step 1504, is generated.
In a hardware implementation, the vector field is established by performing comparison functions of adjacent neurons 902. In accordance with the Gaussian Filtered neuron activity region of step 1502, the currents in neurons 902 are determined such that the lesser neuron 902 (i.e., neuron 902 with less current) is connected to the greater neuron 902 (i.e., neuron 902 with more current) and that current can only travel from the lesser neuron 902 to the greater neuron 902. Fig. 16A is a hardware implementation to determine the vector field between two adjacent neurons A and B. The hardware implementation of the vector field includes the neuron A having a current A, the neuron B having a current B, a comparison circuit 1610 and power supply VDD, whilst Fig. 16B is the connectivity between the two adjacent neurons A, B.
In Fig. 16A, if the current A is greater than the current B, a voltage node 1602 rises above VDD/2 and the comparison circuit 1610 generates a high output for SAB. Conversely, if the current B is greater than the current A, the voltage node 1602 falls below VDD/2 and the comparison circuit 1610 generates a high output for SBA. Thus, the circuit of Fig. 16A outputs SAB and SBA, and only one of SAB and SBA is high at a time.
Fig. 16B is an example of an implementation for the connectivity of the neurons A and B having a first unidirectional connection from neuron A to B with a switch SWBA, and a second unidirectional connection from neuron B to A with a switch SWAB. If the output SAB is high, the switch SWAB is closed and connection from neuron B to neuron A is established, otherwise (i.e., SBA is high), the switch SWBA is closed and connection from neuron A to neuron B is established.
Once connectivity between neurons 902 is determined, a second layer of 2D sheet of neurons 902 is generated by resetting the currents in the neurons 902 (i.e., currents of the neurons 902 are set to zero) and leaving the connectivity intact. Step 1210 proceeds to step 1212.
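As an illustrative software model of steps 1504 and 1506, the following sketch connects each neuron to its largest-valued adjacent neuron, yielding a unidirectional (lesser-to-greater) vector field that points toward the current peak. The dictionary representation is an assumption made for clarity.

```python
import numpy as np

def build_vector_field(layer):
    """Connect each neuron to its largest-valued adjacent neuron.

    Returns, for every neuron with a greater neighbour, the offset of the
    neighbour it feeds; current may only travel from the lesser neuron to
    the greater neuron, so activation propagates toward the current peak
    (the object centre).
    """
    layer = np.asarray(layer, dtype=float)
    rows, cols = layer.shape
    field = {}
    for r in range(rows):
        for c in range(cols):
            best, best_val = None, layer[r, c]
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                        if layer[rr, cc] > best_val:
                            best, best_val = (dr, dc), layer[rr, cc]
            if best is not None:
                field[(r, c)] = best   # unidirectional: lesser -> greater
    return field

peak = np.array([[0.1, 0.2, 0.1], [0.2, 1.0, 0.2], [0.1, 0.2, 0.1]])
print(build_vector_field(peak)[(0, 0)])  # (1, 1): points toward the centre
```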
At step 1212, the originally captured image 6 is input to the second layer of the 2-D sheet of neurons 902 with the vector field of step 1210. Neurons 902 that are not connected to other neurons 902 (i.e., neurons having no current) do not propagate their activation, while neurons 902 that are connected propagate the neuron activation according to the vector field (i.e., effectively toward the central neuron). Fig. 17 is a high-level view of the neuron activation propagation for an object A. Step 1212 proceeds to step 1214. At step 1214, a TSS is generated by the central neuron, similarly to the SNN 14A. The generated TSS is then processed by the PCN as described hereinbefore.
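Finally, a toy sketch of steps 1212 and 1214: activations of the original image are propagated along the vector field, and the count of pulses arriving at the peak neuron per time step forms the TSS. Pulse timing and merging are simplified, and the hand-built field below is illustrative only.

```python
# A toy 3x3 vector field: every neuron feeds the centre (1, 1), which has
# no outgoing connection because it holds the largest current.
field = {(r, c): (1 - r, 1 - c)
         for r in range(3) for c in range(3) if (r, c) != (1, 1)}

def propagate_to_centre(active, field, max_steps=10):
    """Propagate activations along the vector field toward the peak.

    Neurons absent from `field` have no outgoing connection (the peak);
    pulses reaching them are counted into the TSS.  Merging pulses collapse
    into one, a simplification of the summing behaviour.
    """
    tss = []
    current = set(active)
    for _ in range(max_steps):
        tss.append(sum(1 for n in current if n not in field))  # pulses at the peak
        current = {(r + field[(r, c)][0], c + field[(r, c)][1])
                   for (r, c) in current if (r, c) in field}
        if not current:
            break
    return tss

# Three activations one hop from the centre arrive together at step 1.
print(propagate_to_centre({(0, 0), (0, 2), (2, 1)}, field))  # [0, 1]
```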
Industrial Applicability
The arrangements described are applicable to the computer and data processing industries and particularly for industries requiring automated rapid, accurate object identification.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.

Claims

CLAIMS:
1. A method of identifying an object by a battery-powered apparatus, the method comprising:
capturing a first image of the object by an image capturing unit;
transforming the first image to centre the object in a second image;
pixelating the second image;
projecting the pixelated image onto a static neural network (SNN), the SNN comprising
a plurality of strands of neurons and a summing neuron, wherein each neuron of the plurality of strands of neurons comprises electronic components and/or software codes for generating a pulse output and the summing neuron comprises electronic components and/or software codes for summing pulse outputs,
wherein the neurons in each strand are connected together and the plurality of strands of neurons are radially arranged,
wherein the neuron at one end of each of the plurality of strands of neurons is connected to the summing neuron,
wherein each projected pixel of the projected pixelated image is mapped onto a neuron of the plurality of strands of neurons such that an active pixel of the projected pixelated image generates the pulse output,
wherein the pulse output in each of the plurality of strands is propagated toward the neuron connected to the summing neuron, and
wherein the summing neuron sums the pulse output of the neurons connected to the summing neuron;
generating a Time Series Signature (TSS) pattern by the summing neuron based on the sum of the pulse output; and
identifying the object by a pattern identification unit based on the generated TSS pattern.
2. A battery-powered apparatus for identifying an object, the apparatus comprising: an image capturing unit for capturing a first image of the object;
an image processing unit for receiving the first image of the object, transforming the first image to centre the object in a second image, pixelating the second image, and projecting the pixelated image;
a static spiking neural network (SNN) comprising a plurality of strands of neurons and a summing neuron, wherein each neuron of the plurality of strands of neurons comprises electronic components and/or software codes for generating a pulse output and the summing neuron comprises electronic components and/or software codes for summing pulse outputs, wherein the neurons in each strand are connected together and the plurality of strands of neurons are radially arranged, wherein the neuron at one end of each of the plurality of strands of neurons is connected to the summing neuron, the SNN configured for
receiving the projected pixelated image, wherein each projected pixel of the projected pixelated image is mapped onto a neuron of the plurality of strands of neurons such that an active pixel of the projected pixelated image activates the neuron, and wherein the neuron activation in each of the plurality of strands is propagated toward the neuron connected to the summing neuron;
summing the pulse output of the neurons connected to the summing neuron; and
generating a Time Series Signature (TSS) pattern by the summing neuron based on the sum of the pulse output; and
a pattern identification unit for identifying the object based on the generated TSS pattern.
3. A method of identifying an object by a battery-powered apparatus, the method comprising:
capturing an image of the object by an image capturing unit;
pixelating the image;
projecting the pixelated image onto a dynamic neural network (DNN), the DNN comprising a plurality of neurons, wherein each neuron comprises electronic components and/or software codes for generating an electrical output, wherein each pixel of the projected pixelated image is mapped onto a neuron of the plurality of neurons such that an active pixel generates the electrical output;
distributing the generated electrical outputs of the neurons amongst neurons which surround the neurons generating the electrical output according to a Gaussian function;
normalising the distributed electrical outputs in the neurons;
thresholding the normalised distributed electrical outputs, wherein the electrical output of each neuron of the plurality of neurons is based on the distributing, the normalising and the thresholding steps;
connecting adjacent neurons of the plurality of neurons based on the electrical outputs of the adjacent neurons, wherein a neuron of the adjacent neurons with a lower electrical output is unidirectionally connected to a neuron of the adjacent neurons with a higher electrical output;
propagating the electrical outputs in accordance with the connections established in the connecting step;
generating a Time Series Signature (TSS) pattern by a neuron of the plurality of neurons with the highest electrical output by summing the electrical outputs; and
identifying the object by a pattern identification unit based on the generated TSS pattern.
4. A battery-powered apparatus for identifying an object, the apparatus comprising: an image capturing unit for capturing an image of the object;
an image processing unit for receiving the image of the object, pixelating the image, and projecting the pixelated image;
a dynamic neural network (DNN) comprising a plurality of neurons, wherein each neuron comprises electronic components and/or software codes for generating an electrical output, the DNN configured for
receiving the projected pixelated image, wherein each pixel of the projected pixelated image is mapped onto a neuron of the plurality of neurons such that an active pixel generates an electrical output;
distributing the generated electrical outputs of the neurons amongst neurons which surround the neurons generating the electrical output according to a Gaussian function;
normalising the distributed electrical outputs in the neurons;
thresholding the normalised distributed electrical outputs, wherein the electrical output of each neuron of the plurality of neurons is based on the distributing, the normalising and the thresholding steps;
connecting adjacent neurons of the plurality of neurons based on the electrical outputs of the adjacent neurons, wherein a neuron of the adjacent neurons with a lower electrical output is unidirectionally connected to a neuron of the adjacent neurons with a higher electrical output;
propagating the electrical outputs in accordance with the connections established in the connecting step; and
generating a Time Series Signature (TSS) pattern by a neuron of the plurality of neurons with the highest electrical output by summing the electrical outputs; and
a pattern identification unit for identifying the object based on the TSS pattern.
PCT/AU2014/000059 2013-01-30 2014-01-30 Real-time rotation, shift, scale and skew visual recognition system WO2014194345A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2014277600A AU2014277600A1 (en) 2013-01-30 2014-01-30 Real-time rotation, shift, scale and skew visual recognition system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2013900288A AU2013900288A0 (en) 2013-01-30 Real-time rotation, shift, scale and skew visual recognition system
AU2013900288 2013-01-30

Publications (1)

Publication Number Publication Date
WO2014194345A1 true WO2014194345A1 (en) 2014-12-11

Family

ID=52007290

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2014/000059 WO2014194345A1 (en) 2013-01-30 2014-01-30 Real-time rotation, shift, scale and skew visual recognition system

Country Status (2)

Country Link
AU (1) AU2014277600A1 (en)
WO (1) WO2014194345A1 (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5239593A (en) * 1991-04-03 1993-08-24 Nynex Science & Technology, Inc. Optical pattern recognition using detector and locator neural networks
US5850470A (en) * 1995-08-30 1998-12-15 Siemens Corporate Research, Inc. Neural network for locating and recognizing a deformable object
US20020076088A1 (en) * 2000-12-15 2002-06-20 Kun-Cheng Tsai Method of multi-level facial image recognition and system using the same
US20040002928A1 (en) * 2002-06-27 2004-01-01 Industrial Technology Research Institute Pattern recognition method for reducing classification errors
US20100128975A1 (en) * 2008-11-24 2010-05-27 Toyota Motor Engineering & Manufacturing North America, Inc. Method and system for object recognition based on a trainable dynamic system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FARAWAY, JULIAN ET AL.: "Time series forecasting with neural networks: a comparative study using the air line data.", JOURNAL OF THE ROYAL STATISTICAL SOCIETY: SERIES C (APPLIED STATISTICS), vol. 47, no. 2, 1998, pages 231 - 250 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10019907B2 (en) 2015-09-11 2018-07-10 Qualcomm Incorporated Unmanned aerial vehicle obstacle detection and avoidance
CN105468023A (en) * 2016-01-20 2016-04-06 谭圆圆 Unmanned aerial vehicle control method, device and system
CN105739520A (en) * 2016-01-29 2016-07-06 余江 Unmanned aerial vehicle identification system and identification method thereof
US10452951B2 (en) 2016-08-26 2019-10-22 Goodrich Corporation Active visual attention models for computer vision tasks
US10776659B2 (en) 2016-08-26 2020-09-15 Goodrich Corporation Systems and methods for compressing data
CN106780546A (en) * 2016-12-06 2017-05-31 南京航空航天大学 The personal identification method of the motion blur encoded point based on convolutional neural networks
CN106780546B (en) * 2016-12-06 2019-08-16 南京航空航天大学 The personal identification method of motion blur encoded point based on convolutional neural networks
WO2018218640A1 (en) 2017-06-02 2018-12-06 SZ DJI Technology Co., Ltd. Systems and methods for multi-target tracking and autofocusing based on deep machine learning and laser radar
CN109479088A (en) * 2017-06-02 2019-03-15 深圳市大疆创新科技有限公司 The system and method for carrying out multiple target tracking based on depth machine learning and laser radar and focusing automatically
EP3613208A4 (en) * 2017-06-02 2020-06-24 SZ DJI Technology Co., Ltd. Systems and methods for multi-target tracking and autofocusing based on deep machine learning and laser radar
US11006033B2 (en) 2017-06-02 2021-05-11 SZ DJI Technology Co., Ltd. Systems and methods for multi-target tracking and autofocusing based on deep machine learning and laser radar
US11283986B2 (en) 2017-06-02 2022-03-22 SZ DJI Technology Co., Ltd. Systems and methods for multi-target tracking and autofocusing based on deep machine learning and laser radar

Also Published As

Publication number Publication date
AU2014277600A1 (en) 2015-09-17

Similar Documents

Publication Publication Date Title
US11600007B2 (en) Predicting subject body poses and subject movement intent using probabilistic generative models
US11651199B2 (en) Method, apparatus and system to perform action recognition with a spiking neural network
US10902615B2 (en) Hybrid and self-aware long-term object tracking
WO2014194345A1 (en) Real-time rotation, shift, scale and skew visual recognition system
US10198689B2 (en) Method for object detection in digital image and video using spiking neural networks
Subramaniam A neuromorphic approach to image processing and machine vision
Nguyen et al. Change detection by training a triplet network for motion feature extraction
US10268919B1 (en) Methods and apparatus for tracking objects using saliency
Junejo et al. Single-class SVM for dynamic scene modeling
Junejo et al. Dynamic scene modeling for object detection using single-class SVM
Molina-Cabello et al. Neural controller for PTZ cameras based on nonpanoramic foreground detection
Zou et al. A low-power VGA vision sensor with embedded event detection for outdoor edge applications
Fomin et al. Study of using deep learning nets for mark detection in space docking control images
WO2018052496A1 (en) Method for object detection in digital image and video using spiking neural networks
Kim et al. Implementation of visual attention system using artificial retina chip and bottom-up saliency map model
Jaramillo-Avila et al. Visual saliency with foveated images for fast object detection and recognition in mobile robots using low-power embedded GPUs
EP3572983A1 (en) Low dimensional neural network based architecture
Silveira et al. 3D robotic mapping: A biologic approach
Rodrigues et al. Fully convolutional networks for segmenting images from an embedded camera
Kadim et al. Training configuration analysis of a convolutional neural network object tracker for night surveillance application
Puck et al. Ensemble based anomaly detection for legged robots to explore unknown environments
Wei et al. A biologically inspired spatiotemporal saliency attention model based on entropy value
Cuperlier et al. Attention-based smart-camera for spatial cognition
Jin et al. Hierarchical template matching for robust visual tracking with severe occlusions
Alshaikh et al. Movement Classification Technique for Video Forensic Investigation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14808347

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2014277600

Country of ref document: AU

Date of ref document: 20140130

Kind code of ref document: A

122 Ep: pct application non-entry in european phase

Ref document number: 14808347

Country of ref document: EP

Kind code of ref document: A1