METHOD AND SYSTEM FOR THE DETECTION OF PERSONS
The use of vision technology in industrial and production environments has been common for many years. The application of these vision techniques in a public environment is still in its infancy.
The vision techniques currently used record images of specific situations or events using cameras. The recorded images are then viewed and interpreted, immediately or later, by people.
A drawback is that operators are needed to interpret visual material, which is expensive. An object of the present invention is to obviate such a drawback.
The invention relates to a combination of a system and determined applications therefor. The system has for instance the functions of detecting people and objects, analyzing and tracking the movement of people and objects, and of recording, alerting and reporting.
The present invention relates to a method as according to claim 1.
The present invention also provides a system for detecting people, objects and/or events, comprising:
- one or more cameras for recording images of the vicinity thereof;
- image processing means for processing the recorded images; and
- analysis means for automatically analyzing the processed images so as to generate the desired detection, weighting, alerting and/or report.
According to the present invention a system can consist of one or more cameras or other sensors, one or more processing units, method(s) of communicating with the vicinity, and software.
Embodiments according to the present invention comprise for instance a combination of different cameras, sensors and measurement areas as indicated in the following figure.
Embodiments and possible applications therefor are shown in the table above. Given as example is application 8 (fire detection). This can be applied in a production environment (for instance a machine which could catch fire), and the application makes use of object behaviour (for instance movement of flames). The fire can be detected using an embodiment suitable for this purpose (such as a multi-sensor technique to enable detection of both the image and the sound of an explosion and fire).
Possible applications are:
Embodiments of the functional elements are:
A. Image analysis and reporting: object tracking, user interface, report
B. Image (pre)processing: movement detection, background subtraction, system calibration
C. System Architecture: number and type of cameras/optics, number and types of processing units and networks
A. Image analysis and reporting:
A.1 The possibility of incorporating knowledge concerning the situation in the image in the model of the movement so as to improve the estimation of movement (such as for instance in a queue, where you know that people move slowly in a determined direction in a determined part of the image).
A.2 The possibility of measuring determined things in specific parts of the image (passage/direction for instance, or average time spent in a particular area) and the possibility of defining these parts via a user interface (this latter is not done but is something which is taken into account in the blob-track software).
A.3 Use of Desk Top SPI to process stored images off-line with other settings, or use of Desk Top to do further classification of events.
B. Image (pre)processing:
B.1 Use of YUV as against other colour spaces, among other things to suppress light/dark variation. YUV is a format which is provided directly by almost all cameras and is thereby also very efficient.
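As a minimal sketch (not part of the original disclosure), the separation of light/dark variation from colour can be illustrated with the standard BT.601 RGB-to-YUV conversion; the function name `rgb_to_yuv` and the exact coefficients are assumptions for illustration:

```python
def rgb_to_yuv(r, g, b):
    """Convert an RGB pixel (0-255 per channel) to YUV using BT.601 weights.

    Y carries the luminance (light/dark) information, while U and V
    carry the chrominance (colour), so illumination changes affect
    mainly the Y channel and can be suppressed there.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

# A neutral grey pixel has no chrominance: U and V are (near) zero.
y, u, v = rgb_to_yuv(128, 128, 128)
```

Because U and V are insensitive to overall brightness, thresholding differences in the chrominance channels is more robust against light/dark variation than working directly on RGB.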
B.2 The possibility of learning a background image via bootstrapping, i.e. a technique for learning a full background image in a situation where the background is never wholly visible because (for instance) people are always walking in front of it. We use a histogram for this bootstrapping, whereby it is not even necessary for the background to be visible in the image for the greater part of the time as long as the background colour occurs often.
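The histogram bootstrapping described in B.2 can be sketched as follows; this is an illustrative reconstruction under the assumption of greyscale frames stored as nested lists, with the per-pixel histogram mode taken as the background value:

```python
from collections import Counter

def bootstrap_background(frames):
    """Learn a full background image from frames in which the background
    is never wholly visible at once (e.g. people keep walking in front
    of it). For each pixel position, the most frequently occurring
    value (the histogram mode) is taken as the background value:
    passing foreground objects contribute many different values,
    while the background colour recurs often.
    """
    height = len(frames[0])
    width = len(frames[0][0])
    background = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            hist = Counter(frame[row][col] for frame in frames)
            background[row][col] = hist.most_common(1)[0][0]
    return background
```

Note that, as the text states, the background need not even be visible most of the time at a given pixel; it suffices that the background colour occurs more often than any single foreground colour.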
B.3 Use of a combination of optical flow and other filter techniques to make a model of the movement in the image and to detect the movement.
B.4 Adjustment of cameras on the basis of persons found up to that moment.
C. System Architecture:
C.1 Use of 2 cameras with the same image field: IR and normal, for instance to eliminate luggage trolleys.
Applications/products
1. Counting and tracking people in and through spaces
2. Measuring queues
3. Checking direction
4. Monitoring passage/obstruction
5. Person object interaction
6. Monitoring object
7. Recognition of suspicious persons (on the basis of movement patterns, features and behaviour)
8. Detection of violence (multi-sensoring)
9. Fire detection etc.
Advantages of the invention are, among others, the reduction in the number of human operators needed for security at check-in desks and the like, with associated improved service and reliability and low hardware costs. Further advantages are the possibility of optimizing the lay-out of premises, in particular (covered) shopping centres and the like, through analysis of the people flows. The present invention also provides the possibility of enhancing safety through the detection of suspicious objects and persons.
Further advantages, features and details of the present invention will be elucidated on the basis of the following description of preferred embodiments thereof with reference to the accompanying drawing, in which:
figure 1 shows a schematic representation of the preferred embodiment of the operation of the device according to the present invention;
figure 2 is a schematic representation of the system shown in fig. 1 during off-line use;
figures 3A-3E show further preferred embodiments of use of the system according to the present invention;
figures 4A-C show a schematic top view of an application of the system in operation according to the present invention;
figure 5 is a schematic top view of a so-called switchback queue arrangement wherein the method and system according to the present invention can be applied;
figure 6 shows a graph of the distribution of the service to passengers plotted against time, obtained using the embodiment of fig. 5;
figure 7 shows a graph of the waiting time in minutes until the service is provided in the embodiment of fig. 5;
figure 8 is a schematic representation of a queue application of the present invention;
figures 9 and 10 show respective graphs associated with the application of fig. 8, corresponding to figs. 6 and 7;
figure 11 shows another queue application of the system and method according to the present invention;
figure 12 shows a histogram associated with the application of fig. 11;
figure 13 shows a graph associated with the application of fig. 11;
figure 14 is a schematic representation of the possible movement routes in a space such as a station hall or shopping centre;
figure 15 shows an actual recording associated with the schematic representation of fig. 14;
figure 16 shows an analysis associated with figure 15;
figure 17 is a schematic representation associated with figure 15; and
figure 18 is a flow diagram of the operation of the preferred embodiment of the system according to the present invention.
Described in fig. 1 is a system for:
1. people and object detection;
2. movement analysis and tracking; and
3. recording, alerting and reporting (or a combination of one or more of these three).
Camera 10 records images 11 of a scene with one or more stationary background objects (A2) and a person to be observed (object A1). Each recorded image 11 is compared to a composite reference image 12. This reference image 12 is a representation of the more or less stationary background A2. The reference image is made up of previously recorded images; this can for instance be an average image from a past period.
Processing 13 is the comparison of images 11 and 12. In its simplest form this can for instance be subtraction of the two images, so that the difference between the two images remains as resulting image 14. All that can be seen in resulting image 14 is the difference (the person to be observed) as object A1 (a so-called blob).
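In its simplest form, processing 13 can be sketched as a thresholded subtraction; this illustrative fragment (function name `difference_mask` and the greyscale list representation are assumptions) produces a binary mask in which only the blob remains:

```python
def difference_mask(image, reference, threshold=25):
    """Subtract the reference (background) image 12 from the recorded
    image 11. Pixels whose absolute difference exceeds the threshold
    form the resulting image 14, in which only the moving object
    (the so-called blob, object A1) remains as 1-valued pixels.
    """
    return [
        [1 if abs(p - r) > threshold else 0
         for p, r in zip(img_row, ref_row)]
        for img_row, ref_row in zip(image, reference)
    ]
```

The threshold absorbs small sensor noise and illumination drift; anything above it is treated as foreground.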
Recording and processing of images 11 and 14 go on continuously. Processing 15 is the determination of numerical information from the successive images 14. It is for instance possible to determine from the position of object A1 through time that object A1 is one person who is moving. When object A1 passes an (imaginary) line in the image, a counter can for instance be incremented. This is indicated as an example at location 17 in graph 16. This graph shows how often an object has passed during a determined period. Another example is that graph 16 shows the speed of movement of object A1.
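The imaginary-line counter of processing 15 can be sketched as follows; the function `count_line_crossings` and the use of a vertical line with successive x positions are illustrative assumptions:

```python
def count_line_crossings(positions, line_x):
    """Count how often a tracked object crosses an imaginary vertical
    line at x = line_x, given the successive x positions of the
    object's blob over time (processing 15). Each side change between
    two consecutive frames increments the counter.
    """
    crossings = 0
    for prev, cur in zip(positions, positions[1:]):
        if (prev < line_x) != (cur < line_x):
            crossings += 1
    return crossings
```

Accumulating such crossings per time interval yields exactly the kind of passage-frequency graph described for graph 16.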
In order to make operation more reliable, it is possible to assemble and store a plurality of reference images, for instance a reference image at a particular illumination (e.g. lighter, during the day) and a reference image with another illumination (e.g. darker, in the evening). It is then possible to use processing 13 with both reference images (or a combination thereof) so that image 14 represents objects A1 optimally.
Fig. 2 shows how the system can be used off-line (i.e. without interaction with a camera). The recorded images shown in fig. 1 are stored in an image database 21. These can be the directly recorded images but, in order to save storage capacity, can also be optionally processed or sorted images (e.g. sorted in accordance with whether there was anything to see in the image).
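Selecting among the stored reference images (e.g. day versus evening illumination) can be sketched as picking the one closest to the current image; `best_reference` and the total-absolute-difference criterion are illustrative assumptions, not the patented method itself:

```python
def best_reference(image, references):
    """From several stored reference images (e.g. one recorded in
    daylight and one in evening illumination), pick the one with the
    smallest total absolute pixel difference to the current image,
    so that the subsequent subtraction (processing 13) isolates the
    foreground objects A1 optimally.
    """
    def total_diff(ref):
        return sum(
            abs(p - r)
            for img_row, ref_row in zip(image, ref)
            for p, r in zip(img_row, ref_row)
        )
    return min(references, key=total_diff)
```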
The graph of occurred events already mentioned in fig. 1 (designated 16 in fig. 1 and 22 in fig. 2) is also available in the form of a numerical database. A user 25 of the system can select a determined event on the graph via user interface 23. The selection is also used to find the associated images (with a number of images before the event and a number of images thereafter) in the database and to show these images 24 to user 25.
The user can then view the images further and optionally draw conclusions and also add these to database 21 or to graph 22.
Fig. 3A shows the current situation at most so-called service centres: at a decentral location, images of a situation (in which noteworthy events usually occur only sporadically) are recorded with a number of cameras 31 and transmitted live over a network 32 to the central service centre, where the images are viewed by operators 34 on displays 33.
Interpretation and conclusions are made by operators 34. Frequently, the images show no noteworthy events taking place.
Fig. 3B shows a first application of the described system. The described system 35 receives the recorded images parallel to the "normal" display to the operators. The recorded images are analyzed automatically as specified in figs. 1 and 2, and only the relevant images (where a particular event takes place, for instance someone enters a determined area) are shown to the operator via monitor 36 (parallel to the current situation). The information on this monitor will thereby be more relevant. This process is known as "compression" of the flow of images. A compression of 90 to 95% is normal in most practical situations (for a specific camera, something from which an operator can extract useful information will take place in only 5 to 10% of the time).
In fig. 3C, in a subsequent application, the described system is placed locally at the cameras. Display monitor 36 is placed centrally.
In fig. 3D the described system is used to compress all incoming images and to show the interesting images to operator 37. Since there is less image material to view because of the compression that has taken place, fewer operators are required.
In fig. 3E the processing of the images takes place locally at the cameras, and only the interesting images are transmitted over a less heavily loaded network 38.
Different combinations and optimizations can of course be envisaged and applied. Implementation will depend on financial, functional and technical preconditions.
In fig. 4 the service time is calculated by the system as follows. The vending machine for which people are waiting (e.g. a ticket vending machine) is designated 46 in the figure. Area 41 is the waiting area and area 42 is the transaction area, where the person performing the transaction with the vending machine is located. Persons 43, 44 and 45 are tracked by the above described image processing software. In fig. 4A there is a queue of four persons 43.
Person 44 is carrying out the transaction and person 45 is finished and leaves area 42 at time t1.
In fig. 4B person 44 leaves the transaction area at time t2. There are at that moment four people waiting (person 47 has joined the queue). The service time is estimated as follows:
number of people waiting: four
transaction time of person 2: t2-t1
(estimated) waiting time at that moment: 4 x (t2-t1)
(estimated) transaction time at that moment: (t2-t1)
(estimated) service time at that moment: 4 x (t2-t1) + (t2-t1) = 5 x (t2-t1)
In fig. 4C person 48 leaves the transaction area. The service time is estimated as follows:
number of people waiting: three
transaction time of person 3: t3-t2
(estimated) waiting time at that moment: 3 x (t3-t2)
(estimated) transaction time at that moment: (t3-t2)
(estimated) service time at that moment: 3 x (t3-t2) + (t3-t2) = 4 x (t3-t2)
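The estimate worked through in figs. 4B and 4C can be written as a small function; `estimate_service_time` is an illustrative name, and the formula follows the example directly:

```python
def estimate_service_time(n_waiting, t_start, t_end):
    """Estimate the service time (waiting time plus transaction time)
    from the number of people waiting and the duration of the most
    recently observed transaction, as in the worked example:
    service time = n_waiting * (t_end - t_start) + (t_end - t_start).
    """
    transaction_time = t_end - t_start
    waiting_time = n_waiting * transaction_time
    return waiting_time + transaction_time
```

With four people waiting and a 30-second transaction this yields 5 x 30 = 150 seconds, matching the 5 x (t2-t1) of fig. 4B.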
In order to measure the service time directly, it is possible to follow the person just serviced back in time in the recorded images, and to determine for this one person when he or she joined the queue. Reliability is increased by searching specifically for one person.
Logistics and capacity
1. Counting passengers
2. Measuring kiosk utilization
3. Check on returning passengers (automatic boarding)
Security
4. Tracking people in public space
5. Checking in underground luggage areas for deviating movements of luggage
6. Checking doors: no passageway and no suspicious movement
7. Checking direction of opening of door
8. Checking for blocking of safety doors
9. Recognition of suspicious patterns of moving people
10. Checking multi-storey car park (driving back and more than one vehicle passing through a barrier at one time)
11. Detection of suspicious stationary packages
Operation
12. Measuring queues and waiting times (customs, check-in, etc.)
Luggage
13. Checking luggage (holders) for projecting items
Commerce
14. Counting people (passengers and visitors)
Using two cameras 51, 52 (fig. 5) the number of people passing is measured in determined areas in a so-called switchback queue arrangement such as is usual at the check-in at airports, on the basis of which the distribution of the number of passengers for servicing (fig. 6) as well as the waiting time until service (fig. 7) can be registered automatically. Using four cameras 82, 83, 84 and 85 connected to a processing module 86 (fig. 8), the number of people to be serviced in queue 81 is counted (camera 82), the number of people joining the queue is registered (camera 83), the number of people serviced is registered (camera 84), and the people serviced from the front of the queue are registered (camera 85). On the basis of the transaction capacity, i.e. the number of people per second or per minute who can be serviced, the so-called service time, i.e. the waiting time plus transaction time, can thus be determined. Figures 9 and 10 show the associated graphs.
In the so-called boarding process (fig. 11) the distribution of the passengers and the total (figs. 12 and 13 respectively) can be determined in a similar manner using the transaction capacity and a camera 111, wherein use is made of the departure time of 10.00 hours.
In an area 140 (fig. 14) it is possible, using a camera 141 on a square in a shopping centre, departure hall of an airport and the like, to determine the main directions as indicated with arrows in fig. 14. In the present case there are four main directions, with possibilities of movement in both directions. With camera 144 an image 150 or 151 (fig. 15) can for instance then be obtained, with associated analysis (fig. 16). Bottlenecks and jams, among other events, can hereby be automatically detected.
Fig. 17 also shows representations of specific analyses of the detected tracks. The flow diagram of figure 18 indicates, on the basis of the object model and a camera, how the blobs are found relative to the background model, how they are tracked and split/combined, and how areas of blobs are tracked.
Figs. 19A-F show the different options: 19A two cameras at two different points in time; 19B a single camera with a single image; 19C multiple cameras with a single image; 19D multiple cameras with other sensors and a single image; 19E overlapping multiple images with two or more cameras; and 19F multiple cameras with multiple, non-overlapping images.
Fig. 20 shows the modules required for the basic principle.
Fig. 21 shows a camera application 210 at a vending machine, coffee machine or the like, wherein a specific person I is recognized and a sound signal is generated in order to provide this known person with the correct coffee or to present his/her desired product. This application can also be seen in fig. 22A, while in fig. 22B the camera observes that a group of young people approaching the camera could be interested in a new drink.
Through the co-action of two cameras 241 and 242, for instance using a mirror 243 (fig. 24), wherein one of the cameras has infrared sensitivity, a luggage trolley can for instance be filtered out relative to a person. The SPI application software provides functionality with which shadow detection can be realized. Shadow detection is important because shadow areas can disrupt the measurements (shadow would otherwise be seen as a moving object/person). It is generally the case that shadow pixels should not be considered as different from the background.
Shadow detection is carried out for pixels where a great difference from the background model is found. The pixel colour is compared to the colour of the background pixel. The point of departure is that the colour (the ratio of red, green and blue) of a shadow pixel remains unchanged while its intensity decreases (e.g. the same colour red, but darker). In addition to inspection at pixel level, the surrounding structure is then also inspected. If the pixels in the vicinity of the candidate shadow pixel all have the same structure as in the background model (display the same image), of which only the intensity is darker, the pixel is classified as shadow.
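The pixel-level part of this shadow test can be sketched as follows; the function name `is_shadow_pixel` and the ratio thresholds are illustrative assumptions, and the neighbourhood-structure check described above is deliberately omitted from this minimal sketch:

```python
def is_shadow_pixel(pixel, background, low=0.4, high=0.95, tol=0.05):
    """Classify a pixel as shadow relative to the background model:
    all three colour channels (R, G, B) must be darkened by roughly
    the same factor, i.e. the same colour but lower intensity.

    `low`/`high` bound the plausible darkening factor; `tol` bounds
    how much the per-channel ratios may deviate from one another.
    """
    if any(b <= 0 for b in background):
        return False
    ratios = [p / b for p, b in zip(pixel, background)]
    mean = sum(ratios) / len(ratios)
    if not (low <= mean <= high):
        return False
    # Same colour means all channels darkened by (nearly) the same factor.
    return all(abs(r - mean) <= tol for r in ratios)
```

A uniformly darkened pixel passes the test, while a pixel whose channel ratios diverge (a genuinely different colour) is rejected and remains foreground.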