Image Fusion Systems
The present invention relates to an apparatus and method for identifying points of correspondence between multiple image sequences captured by multiple image acquisition devices each acquiring images in a different medium, such as different parts of the electromagnetic spectrum or sound waves.
Multi-sensor image fusion is the combination of images from sensors that are sensitive to different physical phenomena. The fused image can provide more information than any of the individual images, and as such multi-sensor image fusion is an increasingly important research area with many applications, including robotics, medical imaging, manufacturing, defence and remote sensing.
An important stage in image fusion is the coregistration of images from the different sensors. A popular coregistration approach is to identify points of correspondence (POC) between the different sensor images and use these to determine the parameters of the chosen registration transform. These points of correspondence are typically found by looking for similar features in the different images, such as intensity contours. Local or global correlation methods are often used in this process.
A disadvantage of this approach to coregistration is that, for many combinations of sensors, there is little correlation between the images they form, making it difficult to identify enough points of correspondence to fuse the images. The approach is thus limited because, typically, the less the individual images are correlated, the greater the benefit likely to be gained by fusing them.
It is an object of at least one embodiment of the present invention to provide an apparatus and method for registering images which obviates or mitigates the disadvantages in the prior art.
It is a further object of at least one embodiment of the present invention to provide an apparatus and method for registering images which, by virtue of the activity in a scene, uses the interframe differences found at similar locations in the images as points of correspondence to coregister the images.
According to a first aspect of the present invention, there is provided apparatus for automatically registering images from a plurality of image sequence acquisition devices each acquiring images in a different medium to form a single image sequence, the apparatus comprising means for combining the images by finding or locating
points of correspondence using non-static regions that appear in at least two of the images.
Preferably the media are selected from a group comprising any region of the electromagnetic spectrum and sound. In a preferred embodiment the media comprise visible light and infrared.
Preferably the apparatus further comprises means for building region maps from the at least two images.
Preferably also the apparatus further comprises means for overlapping the region maps. Additionally the apparatus may further comprise means for matching region markers which are close to each other as points of correspondence to coregister the images.
Advantageously the apparatus further comprises means to fuse the coregistered images into a single image sequence.
According to a second aspect of the present invention there is provided an imaging system comprising a plurality of image sequence acquisition devices each acquiring images in a different medium, image registration means for combining the images by finding or locating points of correspondence using non-static regions that appear in at least two of the images, image fusion means for fusing the images using the points of correspondence and image display means for displaying the fused single image sequence.
At least one of the plurality of image sequence acquisition devices may be a passive device. Additionally at least one of the plurality of image sequence acquisition devices may be an active device.
Advantageously the plurality of image sequence acquisition devices comprise at least two sensors. Preferably the two or more sensors are selected from a group comprising video cameras, thermal infrared cameras, radar, millimetre wave radar, ground penetrating radar, ultrasound, near-infrared and ultraviolet.
Where the two or more sensors include video cameras, the video cameras may be of any recognised format, for example CCIR format colour video cameras.
Where the two or more sensors include radar, the radar may be millimetre wave radar or ground penetrating radar.
Preferably the image display means is a standard colour monitor, such as a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD). Alternatively the image display means is a television, projector or head-up display system.
Preferably the imaging system includes processing means for further processing the fused image. The further processing means may carry out an automatic function. The automatic function may be quality inspection, motion detection or the setting of an intruder alarm.
Preferably, the at least two sensors directly output images in digital format.
According to a third aspect of the present invention there is provided a method of registering images comprising the steps of:
(a) finding or locating points of correspondence in multiple image sequences, the image sequences being captured by multiple image sequence acquisition devices each acquiring images in a different medium; and
(b) combining the outputs of the multiple image sequence acquisition devices into a single image sequence;
(c) characterised in that the points of correspondence are obtained using non-static regions that appear in two or more of the multiple images.
Preferably the points of correspondence are identified by building Interest Region maps from the multiple images. Additionally, the corresponding regions may be identified from the Interest Region maps. Preferably also, the corresponding region maps are overlapped and Interest Markers which are close to each other are matched as Points of Correspondence.
According to a fourth aspect of the present invention there is provided a method of finding points of correspondence in multiple images, each acquired in a different medium, using non-static regions that appear in all of the multiple images by:
(a) acquiring multiple images using image acquisition means;
(b) building Interest Region maps from the multiple images;
(c) identifying corresponding regions from the Interest Region maps of different images;
(d) overlapping the registered region maps from the different images; and
(e) matching Interest Markers which are close to each other as Points of Correspondence.
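Steps (d) and (e) of this method might be sketched as follows. This is an illustrative sketch only: the representation of markers as (x, y) tuples, the function name, the closeness tolerance max_dist, and the averaging of multiple close matches into a single point are all assumptions, not limitations of the claims.

```python
import math

def match_markers(map_a, map_b, max_dist=5.0):
    """Overlap two coregistered Interest Region maps and match markers
    that lie close to each other as Points of Correspondence.

    Markers in map_a with no neighbour in map_b within max_dist are
    discarded; where a marker is close to several markers in map_b,
    the matches are averaged into a single point."""
    poc = []
    for (xa, ya) in map_a:
        near = [(xb, yb) for (xb, yb) in map_b
                if math.hypot(xa - xb, ya - yb) <= max_dist]
        if near:  # discard markers visible to only one sensor
            mx = sum(x for x, _ in near) / len(near)
            my = sum(y for _, y in near) / len(near)
            poc.append(((xa, ya), (mx, my)))
    return poc
```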
An example embodiment of the invention will now be described with reference to the accompanying figures in which:
Figure 1 illustrates apparatus for combining the output of multiple image sequence acquisition devices for display or further processing;
Figure 2 illustrates an example implementation of the apparatus of the present invention having a two camera image fusion device in a typical urban environment;
Figure 3 is a flow chart illustrating the process for finding points of correspondence from different images in a two sensor implementation;
Figure 4 illustrates the first stage in the process described, wherein different region maps are generated from the image sequences; and
Figure 5 shows the matching of coregistered Interest Region maps.
The apparatus of the present invention automatically combines the output of multiple image sequence acquisition devices into a single image sequence for display or for further processing by a machine vision system. The apparatus enables more information to be provided in a single image sequence than can be provided by any of the individual image sequence acquisition devices alone. To this end, the devices are sensitive to different physical phenomena and capture the images in different media. Specifically, the apparatus of the present invention provides for registration of 2D multispectral images based on the moving portions of the images to be fused, as opposed to current systems where the points of correspondence are determined from static positions in the images. The apparatus of the present invention may find application in areas such as medical diagnosis, CCTV surveillance systems, security alarm systems, fire fighting, automatic inspection, surveying, aviation and wildlife watching.
Figure 1 illustrates an apparatus for combining the output of multiple image sequence acquisition devices 1 into a single image sequence which can be displayed at 2. The process involves a coregistration stage 3 and a fusion/combination stage 4.
The apparatus described in the present invention comprises image acquisition means, means to coregister the images, means to combine or fuse the images into a single image, and means to output the fused image sequence for display or further processing. Specifically, the image sequence acquisition devices comprise sensors that can form two-dimensional image sequences of the scenes that they are exposed to and can output the image sequences as a digital representation.
In a preferred embodiment there is one primary sensor and one or more secondary sensors. The example embodiment described herein uses a CCIR format colour video camera as a primary sensor and an NTSC thermal infrared camera as a secondary sensor. The embodiment also uses a digitiser to convert the CCIR/NTSC format images to a digital representation.
The sensors may be passive, as in the example embodiment described herein, or alternatively may be active sensors such as radar. Examples of alternative image sequence acquisition devices include Millimetre Wave Radar, Ground Penetrating Radar, Ultrasound, Near-Infrared and Ultraviolet. An important aspect is that the images taken do not need to be the same size. In addition, rather than using external means to digitise the images, the sensor system may directly output images in digital format.
Typically, the imaging devices will be positioned so that they observe and acquire images from the same target scene. Referring to Figure 2, a primary 5 and secondary 6 imaging device, in this case cameras, are attached to a pylon 7 and positioned to overlook a typical urban scene 8. The cameras are at similar heights and point in approximately the same direction. Although the individual cameras must overlook the same scene, the cameras may be at physically different locations and may view the scene from different elevations, rotations and distances. As in the example embodiment, the images may be at different resolutions, and it is therefore necessary to coregister the images before they can be combined.
Registration is achieved by means of any suitable image transform. This transform may be a global transform, such as the well known Projective Transform, or a local transform, such as the well known Elastic Deformation Transform. The Projective Transform is achieved by an electronic system that implements the following 8 parameter relationship between the input co-ordinates x = (x, y)^T and the output co-ordinates x' = (x', y')^T:

x' = (a1·x + a2·y + a3) / (a7·x + a8·y + 1)
y' = (a4·x + a5·y + a6) / (a7·x + a8·y + 1)

where a_i, i = 1..8 are the transform parameters. As the transform parameters are likely to change infrequently and slowly, they can be calculated in parallel more slowly than real time. The parameters of the chosen transform are determined from Points of Correspondence between the individual images. The projective transform has 8 parameters and so requires 4 Points of Correspondence to be specified. Given more than 4 Points of Correspondence, the well known Least Squares approach can be used to automatically determine the parameters.
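The Least Squares determination of the eight projective parameters can be sketched as follows. This is a sketch in Python with NumPy; the function names and the convention that points are passed as lists of (x, y) pairs are illustrative assumptions.

```python
import numpy as np

def estimate_projective(src, dst):
    """Estimate the 8 projective transform parameters a1..a8 from
    Points of Correspondence (>= 4 pairs) via least squares."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # From x' = (a1*x + a2*y + a3) / (a7*x + a8*y + 1),
        # cross-multiplying gives two linear equations per point pair.
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y]); b.append(yp)
    params, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float),
                                 rcond=None)
    return params  # a1..a8

def apply_projective(params, pt):
    """Map a point through the estimated projective transform."""
    a1, a2, a3, a4, a5, a6, a7, a8 = params
    x, y = pt
    d = a7 * x + a8 * y + 1
    return ((a1 * x + a2 * y + a3) / d, (a4 * x + a5 * y + a6) / d)
```

With exactly 4 Points of Correspondence the system is square and solved exactly; with more, the least squares solution averages out marker localisation noise.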
There is often very little correlation between images formed by different imaging sensors. However, activity in a scene is likely to yield interframe differences in similar locations. If these locations can be identified, then they can be used as points of correspondence (POC) to coregister the images. The process of finding points of correspondence from difference images is illustrated for a two sensor embodiment in Figure 3. Referring to the flow chart of Figure 3, it can be seen that the process of finding points of correspondence between image sequences can be achieved by building Difference Region maps through frame differencing, thresholding and region growing, finding an approximate transform, and then matching regions from the difference images.
The first stage of the process is therefore to build Interest Region Maps from the image sequences. This stage comprises 3 steps, which can be seen in Figure 4. Referring firstly to Figure 4a, in the example embodiment two reference images 9 and 10 are taken from the image sequence and the Interest Map is calculated by taking, for each pel in the image, the absolute difference between the earlier frame pel intensity and the later frame pel intensity. Next, according to Figure 4b, a threshold operator is applied to remove small differences, which can be attributed to noise in the sequence, thus yielding a binary image 11. Finally, referring to Figure 4c, a region growing operator is applied to find all regions in the binary image and the centres of gravity (average pel locations) of the regions are calculated; these are termed Interest Markers 12. This stage is applied to all the image sequences, and it is important that the reference images from each sequence are taken at the same time points, or very close together, so that the Interest Maps correspond to the same time period. Alternatively a time stamp may be attached to each Interest Marker, and only markers with similar time stamps in the different image sequences allowed to be points of correspondence. As well as, or instead of, using difference images, moving objects in image sequences may be tracked over time and their positions at each frame recorded as Interest Markers.
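The three steps of this first stage (frame differencing, thresholding, and region growing with centre-of-gravity calculation) might be sketched as follows. The threshold value, the use of 4-connectivity for region growing, and the function name are illustrative assumptions.

```python
import numpy as np

def interest_markers(frame_a, frame_b, threshold=25):
    """Build an Interest Region map from two reference frames and
    return the Interest Markers (region centres of gravity) as
    (x, y) tuples."""
    # Step 1: absolute inter-frame difference for each pel
    diff = np.abs(frame_a.astype(int) - frame_b.astype(int))
    # Step 2: threshold away small (noise) differences -> binary image
    binary = diff > threshold
    # Step 3: region growing (4-connected flood fill), then the
    # centre of gravity (average pel location) of each region
    visited = np.zeros_like(binary, dtype=bool)
    markers = []
    h, w = binary.shape
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] and not visited[sy, sx]:
                stack, pels = [(sy, sx)], []
                visited[sy, sx] = True
                while stack:
                    y, x = stack.pop()
                    pels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x),
                                   (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny, nx] and not visited[ny, nx]):
                            visited[ny, nx] = True
                            stack.append((ny, nx))
                ys, xs = zip(*pels)
                markers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return markers
```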
The second stage of the process to find points of correspondence is to identify corresponding regions from the Interest Region Maps of different image sequences. This is achieved by assuming that the registration between the difference maps can be approximated by a global transform such as the projective transform. The parameters ψ that minimise the distance between the Interest Markers in one sequence X_T and their closest neighbours (under the transform) in the second sequence X_R are identified. This is achieved by an electronic system that minimises the following equation:

ψ* = argmin_ψ Σ_(x ∈ X_T) min_(x_R ∈ X_R) ‖f(x; ψ) − x_R‖

where f(x; ψ) is the chosen transform of the point x using parameters ψ and ‖·‖ denotes the distance measure. X_T is chosen to be the image sequence with the smallest number of regions, or either sequence if both yield the same number of regions. In the example embodiment the distance measure is the Cartesian distance and the electronic system conducts an exhaustive search of the quantised transform parameter space. The search is constrained given that limits on the difference in translation/rotation/scaling can easily be identified. The level of quantisation of the parameter space is a compromise between the speed and the precision of the coregistration. This stage can only be carried out once sufficient Interest Markers are identified in the individual images to determine the parameters of the chosen transform. To improve robustness, time stamps may be attached to Interest Markers so that only Interest Markers with similar time stamps may be points of correspondence.
In the example embodiment, the transform was chosen to be a rotation by θ and a translation by b = (b_x, b_y)^T; thus there are three parameters ψ = {b_x, b_y, θ}. The search was constrained to horizontal/vertical translations of ±50 pels in 2 pel increments and rotations of ±10° in 1° increments.
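The constrained exhaustive search of the quantised parameter space might be sketched as follows, using the example embodiment's ranges (±50 pels in 2 pel increments, ±10° in 1° increments). The representation of Interest Markers as (x, y) tuples and the function name are assumptions.

```python
import math

def search_transform(markers_t, markers_r):
    """Exhaustively search the quantised rotation+translation space
    for the parameters (b_x, b_y, theta) minimising the summed
    distance from each marker in markers_t (transformed) to its
    closest neighbour in markers_r."""
    best, best_cost = None, float("inf")
    for deg in range(-10, 11):            # rotation: +/-10 deg, 1 deg steps
        th = math.radians(deg)
        c, s = math.cos(th), math.sin(th)
        for bx in range(-50, 51, 2):      # translation: +/-50 pels, 2 pel steps
            for by in range(-50, 51, 2):
                cost = 0.0
                for (x, y) in markers_t:
                    # rotate then translate the marker
                    xt, yt = c * x - s * y + bx, s * x + c * y + by
                    # Cartesian distance to the closest neighbour
                    cost += min(math.hypot(xt - xr, yt - yr)
                                for (xr, yr) in markers_r)
                if cost < best_cost:
                    best, best_cost = (bx, by, deg), cost
    return best
```

Because the parameters change only slowly, this search can run more slowly than real time, in parallel with the fusion itself.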
In the third and final stage of the process to find Points of Correspondence, the registered region maps from the different image sequences 13 and 14, as illustrated in Figure 5a, are overlapped as shown in Figure 5b. Interest Markers 15 that appear close to each other are matched as being Points of Correspondence, as illustrated in Figure 5c. By taking multiple samples over time, a number of Points of Correspondence can be recorded to improve accuracy. These can then be used to determine the transform parameters. Some difference regions may not appear in all of the image sequences, due to being out of sight of, or invisible to, some sensors. Such regions can be identified as having no matching regions, and hence discarded, as illustrated in Figure 5. Where one region is very close to two or more regions from a coregistered image, the multiple matches can be averaged to give a single region, as illustrated in Figure 5. This may be caused by problems in thresholding the Interest Map. The next step in the process is the fusion process, in which the coregistered images are combined and further processed, for example by using information from corresponding areas in multiple images to control an alarm system or by combining the different images into a single image. In the fusion step, information of interest to the application is preserved from the individual images. In the example embodiment presented here, an RGB colour visual image sequence is combined with a monochrome thermal infrared image sequence and an RGB colour fused image is created by an electronic device that implements the following relationship between its input and output:
where rF(rv), gF(gv) and bF(bv) are the red, green and blue intensity components of the Fused (Visual) pel respectively, and mIR is the monochrome intensity of the thermal infrared pel. This has the effect of making the fused image appear similar to the colour visual image but with hot objects slightly red and cold objects slightly blue.
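The exact fusion relationship is not reproduced here; a minimal sketch of one plausible form is given below, in which the thermal intensity tints the visual image red where hot and blue where cold. The linear tinting scheme and the gain value are illustrative assumptions, not the patented relationship.

```python
import numpy as np

def fuse_rgb_ir(rgb, ir, gain=0.25):
    """Fuse an RGB visual frame (H, W, 3) with a monochrome thermal
    IR frame (H, W) so the result resembles the visual image with
    hot objects tinted red and cold objects tinted blue."""
    ir = ir.astype(float)
    tint = gain * (ir - ir.mean())   # positive where hot, negative where cold
    fused = rgb.astype(float).copy()
    fused[..., 0] += tint            # hot pels pushed towards red
    fused[..., 2] -= tint            # cold pels pushed towards blue
    return np.clip(fused, 0, 255).astype(np.uint8)
```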
The fused image sequence output by the device is in a format suitable for an image display device or for further processing by a machine vision system. The image display device used in the example embodiment described
here is a standard colour monitor. Examples of alternative means of display include television, projectors and head-up display systems. Examples of further processing systems include automatic quality inspection and automatic motion detection.
The advantage of the present invention lies in the fact that there is provided an apparatus and method for automatically finding points of correspondence between images formed by sensors that are sensitive to different physical phenomena, wherein the technique assumes that temporal differences in the image sequences are likely to occur in similar places, rather than relying on the correlation of static features in the different images. Example results from a thermal infrared/visual image application demonstrate that this technique can successfully be applied to find points of correspondence in areas of motion in the scene. These are the areas of most interest in some applications, such as surveillance.
A further advantage of the present invention lies in the fact that the apparatus and method can be used to identify points of correspondence to coregister images using images formed by different imaging sensors where there is very little correlation.
Further modifications and improvements may be incorporated without departing from the scope of the invention herein intended.