EP2590151A1

EP2590151A1 - A framework for the systematic study of vehicular mobility and the analysis of city dynamics using public web cameras

Info

Publication number: EP2590151A1
Application number: EP11187682.7A
Authority: EP
Inventors: Pan Hui; Gautam S. Thakur
Original assignee: Technische Universitaet Berlin; Deutsche Telekom AG
Current assignee: Technische Universitaet Berlin; Deutsche Telekom AG
Priority date: 2011-11-03
Filing date: 2011-11-03
Publication date: 2013-05-08

Abstract

A method of providing a framework for large-scale monitoring, collection, analysis, modelling and visualization of vehicular mobility over a communication network, wherein the method comprises the steps of: a) monitoring, collecting and storing a plurality of traffic snapshots from a traffic server on a regular basis using a network communication protocol; b) within an automated and self-learning process, detecting at least one snapshot deemed to contain erroneous and/or useless traffic information and removing the detected at least one snapshot from the plurality of snapshots; c) within a systematic process, extracting and storing large-scale traffic information from the traffic snapshot images; and d) using the extracted traffic information for the purpose of modelling and analysing vehicular traffic; and/or e) using the extracted traffic information for the purpose of knowledge discovery of vehicular traffic; and/or f) using the extracted traffic information for the purpose of realistic vehicular modelling, scenario generation and the analysis of network routing protocols in the case of moving vehicles; and/or g) using the extracted traffic information for the purpose of traffic visualization and knowledge of vehicular traffic levels at different instances of time and space; and/or h) using the extracted traffic information for the purpose of traffic visualization and knowledge of vehicular traffic levels at different instances of time and space; and/or i) using the extracted traffic information from traffic images, together with collected driving distances and driving times between the camera locations and places of attraction, the number of lanes and the width of roads, for city profiling, and using any or all of this information for the design and analysis of new metrics that capture spatial, temporal and spatio-temporal features to qualitatively and quantitatively model and profile cities and help to compare them with each other.

Description

Technical Field

The present invention relates to a method of providing a framework for a large scale monitoring, collecting, analysis, modelling and visualization of vehicular mobility over a communication network and to a computer program for implementation of said method. The framework can relate to a set of methods and a system for the large scale monitoring, collecting and processing of vehicular traffic images, using them for knowledge discovery, modelling and analyzing them for future vehicular networks and profiling cities.

Background of the invention

For various vehicular studies, frameworks that span multiple domains are of interest. Such domains include vehicular traffic monitoring and forecasting; spatial, temporal and spatio-temporal knowledge discovery; modelling and analysis of vehicular mobility traffic and its applicability to the design and evaluation of vehicular networks; and traffic visualization.
Vehicular studies are important for operating vehicle movement efficiently across cities and highways. Collecting realistic data from existing sources of vehicular movement, using it to design and optimize the current system, and drawing on that experience to design future roadways has therefore long been touted as an effective approach. Several departments of transportation use expensive sensor-operated devices to monitor statistics of the micro and macro mobility of vehicles. However, such efforts are few in number, largely undisclosed and carried out for short durations.
In most cases, the respective departments of transportation, as well as civil and engineering departments, perform the vehicular traffic measurements. Their studies focus on efficient and effective mitigation plans, reducing congestion on roadways, and optimizing the road network and connectivity among locations, to name a few goals. However, such activities have drawbacks. The first is the expense of the sensor devices, which must be placed beneath roads or mounted at specific locations in order to detect vehicular activity. The second is the limited availability of such data to the research community and the difficulty of validating it. The third is the limited size and duration of the data captured.
Since urban planning departments, transportation departments and others provide limited public access to vehicular data, research into long-term patterns and spatio-temporal impacts has been carried out, but not at a very broad level. For example, in [14], Bai et al. took two-day samples from two locations to demonstrate that the inter-arrival patterns follow an exponential distribution. In [23], the authors show the presence of self-similarity in the inter-arrival process of vehicles. However, the datasets used cover only a few days and span only specific subsections of highways. It could be helpful to question or improve such studies by analyzing longer durations covering large sections of cities and highways.
Around the world, the ever-increasing problem of vehicular traffic congestion on the roads has become very severe. According to the urban mobility report of 2010 [5,6], congestion caused urban Americans to travel 4.8 billion hours more and to purchase an extra 3.9 billion gallons of fuel, for a congestion cost of $115 billion. On average, an individual wasted 34 hours in traffic congestion, and the cost to the average commuter has increased by 230% in two decades. Congestion affects people not only during the peak period but also at other hours: approximately half of the total delay occurs in the midday and overnight.
The reasons for this problem range from population growth to the slow increase in infrastructure. First, the increase in population and migration to large cities has added complexity: the 100 largest metropolitan regions contribute 70% of the gross domestic product and have 69% of the jobs. Second, roadway expansion has not seen any significant growth in decades, while traffic has grown more than 30 percent faster. Third, many trips are delayed by events that are irregular but frequent. For example, crashes, vehicle breakdowns, improperly timed traffic signals, special events and weather cause a variety of traffic congestion problems. The effects of these events are made worse by increasing travel volumes.
In tackling these problems, most of the known approaches focus on isolated efforts to improve conditions as and when they arise at a few locations. To date, the transportation and engineering sciences have proposed ways to get as much service as possible: first, by timing the traffic signals so that more vehicles see green lights, improving road and intersection designs, or adding short sections of roadway; second, by adding more capacity in critical corridors through new streets and highways, new or expanded public transportation facilities, and larger bus and rail fleets; third, by changing usage patterns, for example through flexible work hours and avoiding travel during rush hours. These approaches can be regarded as insufficient unless they yield a comprehensive, collective picture of the city structure and the traffic distribution across its key intersections.
Several measures have been proposed recently to counter traffic congestion and provide better management of traffic throughput. From the transportation and civil engineering point of view, separate efforts have been made to improve junctions, bus and express lanes, and car pooling, to name a few. Although of limited relevance for short-term change, city planning and urban design practices [5,6] can have a huge impact on levels of future traffic congestion.
In several studies, clustering has been applied to study vehicular traffic. In one such study [7], vehicular traffic through a sequence of traffic lights on a highway, where all signals turn on and off synchronously, is studied and the dynamical behaviour of vehicles is characterized by analyzing traffic patterns. The results show that the clustering of vehicles varies with the cycle time of the signals and can be controlled by varying both the split and the cycle time of the signals. Along the same lines, the authors of [8] studied a simple aggregation model that mimics the clustering of traffic on a one-lane roadway and derived an analytical solution for the probability of a single car and an asymptotically exact expression for the joint mass-velocity distribution function.
Among modelling approaches, CORSIM [9,19] and VISSIM [10,22] are two of the prominent simulators used for micro modelling of traffic. However, simulating congestion and its resolution requires a lot of planning in designing scenarios, and the effectiveness is still questionable. In a few other approaches, researchers tried to model congestion and traffic using mathematical models and derived closed-form expressions. In [11], Bando et al. proposed a dynamical model of traffic congestion based on the equation of motion of each vehicle; the stability of traffic flow is analyzed and the evolution of traffic congestion is observed over time. The implications of empirical time headway distributions of traffic flow [12] and the underlying stochastic process have been shown to model fluctuations of traffic flow. However, these approaches also lack a comprehensive study of traffic congestion and its effect on other, non-congested segments of the city.
In order to show the current congestion of vehicular traffic, search companies like Google and Microsoft, and several departments of transportation, use crowdsourcing. They use sensor input provided by smartphone users to show the speed of current traffic on major sections of roads during rush hours. Recently, many departments of transportation have also deployed sensors that provide real-time information about the traffic on roads. However, there are drawbacks to these kinds of visualizations. First, they categorize the traffic as low, medium and fast, but lack information about travel time and quantitative properties of traffic such as speed. Also, there are no provisions to predict traffic patterns during weekends and weekdays or to use historic information as a tool for robust analysis. The work described here also focuses on these unresolved issues.
Thus, there is a need for a systematic method or framework for accurate monitoring, knowledge discovery, modelling and analysis, and for its application to the development of future vehicular networks and simulators. Alongside this, there is a timely need to develop modern graphical interfaces for the prediction and real-time demonstration of vehicular traffic.

Related publication

[2] J. F. Roddick. A bibliography of temporal, spatial and spatio-temporal data mining research. SIGKDD, 1999.
[3] C. Stauffer. Adaptive background mixture models for real-time tracking. In IEEE CVPR, 1999.
[4] Z. Sun, G. Bebis, and R. Miller. On-road vehicle detection: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28:694-711, 2006.
[5] T. L. David Schrank and S. Turner, ".
[6] J. Barnett, An introduction to urban design, 1st ed. Harper & Row, New York 1982.
[7] T. Nagatani, "Clustering and maximal flow in vehicular traffic through a sequence of traffic lights," Physica A: Statistical Mechanics and its Applications, vol. 377, no. 2, pp. 651 - 660, 2007.
[8] E. Ben-Naim, P. L. Krapivsky, and S. Redner, "Kinetics of clustering in traffic flows," Phys. Rev. E, vol. 50, no. 2, pp. 822-829, Aug 1994.
[9] A. Halati, H. Lieu, and S. Walker, "CORSIM-corridor traffic simulation model," in Proceedings of the Traffic Congestion and Traffic Safety in the 21st Century Conference, 1997, pp. 570-576.
[10] N. E. Lownes and R. B. Machemehl, "Vissim: a multiparameter sensitivity analysis," in Proceedings of the 38th conference on Winter simulation, ser. WSC '06. Winter Simulation Conference, 2006, pp. 1406-1413.
[11] M. Bando, K. Hasebe, A. Nakayama, A. Shibata, and Y. Sugiyama, "Dynamical model of traffic congestion and numerical simulation," Physical Review, vol. 51, pp. 1035- 1042, Feb. 1995.
[12] P. Wagner, "Modeling traffic flow fluctuations," Journal of the ICE, vol. 4, pp. 121-130, 1936.
[13] J. C. Dunn, "Well-separated clusters and optimal fuzzy partitions," Journal of Cybernetics, vol. 4, no. 1, p. 95, 1974.
[14] F. Bai and B. Krishnamachari. Spatio temporal variations of vehicle traffic in vanets: facts and implications. In Vanet, 2009.
[15] F Bai, N Sadagopan, and A Helmy. Important: A framework to systematically analyze the impact of mobility on performance of routing protocols for adhoc networks. In Infocom, 2003.
[16] J.J. Blum and et. al. Challenges of intervehicle ad hoc networks. ITS, IEEE Tran. on, 2004.
[17] L. Briesemeister and et. al., In Intelligent Vehicles Symposium, 2000.
[18] V Bychkovsky and et. al. A Measurement Study of Vehicular Internet Access Using In Situ Wi-Fi Networks. In Mobicom, 2006.
[19] Li Zhang; Jizhan Gou; Ke Liu; McHale, G.; Ghaman, R.; Ling Li; "Simulation Modeling and Application with Emergency Vehicle Presence in CORSIM," Vehicular Technology Conference Fall (VTC 2009-Fall), 2009 IEEE 70th, vol., no., pp.1-7, 20-23 Sept. 2009 doi: 10.1109/VETECF.2009.5378787
[20] P. Hui and et. al. Planet scale human mobility measurement. In ACM HotPlanet, 2010.
[21] J Jiru and D Eilers. Car to roadside communication using ieee 802.11p technology. Industrial Ethernet Book Issue, 2010.
[22] Chen, H.; Zhang, X.; Liu, G.P.; "Simulation and Visualization of Empirical Traffic Models Using VISSIM," Networking, Sensing and Control, 2007 IEEE International Conference on, vol., no., pp.879-882, 15-17 April 2007 doi: 10.1109/ICNSC.2007.372897
[23] Q Meng and H. L. Khoo. Self-similar characteristics of vehicle arrival pattern on highways. Jour. of Transportation Engineering, 135(11):864-872, 2009.
[24] J. Ott and D. Kutscher. Drive-thru internet: Ieee 802.11b for "automobile" users. In Infocom, 2004.
[25] M. Piórkowski and et. al, Trans: realistic joint traffic and network simulator for vanets. Sigmobile CCR, 2008.
[26] J. Singh, N. Bambos, B. Srinivasan, and D. Clawin. Wireless lan performance under varied stress conditions in vehicular traffic scenarios. In Vehicular Technology Conference, 2002. Proceedings. VTC 2002-Fall. 2002 IEEE 56th, .
[27] N. Wisitpongphan and et. al. Routing in sparse vehicular ad hoc wireless networks. IEEE Comm., 2007.
[28] J Yeo and et. al. Crawdad: a community resource for archiving wireless data at dartmouth. Sigcomm CCR, 2006.
[29] S Yousefi and et. al. Vehicular ad hoc networks (vanets): Challenges and perspectives. In ITS, 2006.
[30] X Zhang and et. al. Study of a bus-based disruption-tolerant network: mobility modeling and impact on routing. In MobiCom, 2007.
[31] M. Piccardi. Background subtraction techniques: a review. In Systems, Man and Cybernetics, 2004 IEEE International Conference on, .
[32] R Stanica, E Chaput, and A Beylot. Simulation of vehicular ad-hoc networks: Challenges, review of tools and recommendations. Computer Networks, 2011.
[33] L. Briesemeister, L. Schafers, and G. Hommel. Disseminating messages among highly mobile hosts based on inter-vehicle communication. In Intelligent Vehicles Symposium, 2000. .
[34] R. Lienhart and J. Maydt. An extended set of haar-like features for rapid object detection. In Image Processing. 2002. Proceedings. 2002 International Conference on, .
[35] A. Elgammal, D. Harwood, and L. Davis. Non-parametric model for background subtraction. In D. Vernon, editor, .

Summary of the invention

According to the invention this need is settled by a method as it is defined by the features of independent claim 1. Preferred embodiments are subject of the dependent claims.
In particular, the gist of the invention is the following: A method of providing a framework for large-scale monitoring, collection, analysis, modelling and visualization of vehicular mobility over a communication network, wherein the method comprises the steps of: a) monitoring, collecting and storing a plurality of vehicular traffic snapshots from a traffic server on a regular basis using a network communication protocol; b) within an automated and self-learning process, detecting at least one snapshot deemed to contain erroneous and/or useless traffic information and removing the detected at least one snapshot from the plurality of snapshots; c) within a systematic process, extracting and storing large-scale traffic information from the traffic snapshot images; and d) using the extracted traffic information for the purpose of modelling and analysing vehicular traffic; and/or e) using the extracted traffic information for the purpose of knowledge discovery of vehicular traffic; and/or f) using the extracted traffic information for the purpose of realistic vehicular modelling, scenario generation and the analysis of network routing protocols in the case of moving vehicles; and/or g) using the extracted traffic information for the purpose of traffic visualization and knowledge of vehicular traffic levels at different instances of time and space; and/or h) using the extracted traffic information for the purpose of traffic visualization and knowledge of vehicular traffic levels at different instances of time and space; and/or i) using the extracted traffic information from traffic images, together with collected driving distances and driving times between the camera locations and places of attraction, the number of lanes and the width of roads, for city profiling, and using any or all of this information for the design and analysis of new metrics that capture spatial, temporal and spatio-temporal features to qualitatively and quantitatively model and profile cities and help to compare them with each other.
In the context of the invention, the term "framework" can relate to a set of methods and a system for the large-scale monitoring, collecting and processing of vehicular traffic images, and for using them for knowledge discovery, modelling and analysis in order to study vehicular traffic patterns for future vehicular networks and to profile cities. The term "snapshot" can relate to an image of a traffic-related real-life situation, such as a road or the like. In particular, such an image can be a digital image. The term "regular basis" can relate to any systematically repeating schedule, e.g. once per hour or twice a minute or the like. The method according to the invention preferably is a computer-implemented method. The traffic server can particularly be a traffic server of a department of transportation. Here, department of transportation can relate to a private, official or governmental organisation dealing with the organisation of traffic and/or transportation.
A principal activity of the method according to the invention can be image processing and efficient retrieval of traffic information from the snapshots or images. With regard to image processing, many studies [33] have been carried out that look into aspects of both background subtraction [3, 31, 32] and object detection [34]. In the former methods [35], the difference between the current frame and a reference frame is used to identify objects. In detection approaches, learned object features (shape, size etc.) are used to detect and classify objects. Within the method of the invention, temporal methods can be used for background subtraction to calculate a relative numerical value instead of counting cars. In applying the method according to the invention, it has been found that background subtraction can be much faster than object detection, so that the efficiency of the method and system can be increased.
Within the method according to the invention, publicly available online web camera images can, e.g., be used as snapshots, providing an inexpensive way to measure traffic activity and allowing a comparably inexpensive method or system to be implemented. These cameras, spread across cities, can be monitored, e.g. over the past eight months, and the images can be filed on a server, e.g. a media server.
In order to address the problem underlying the invention, a complete and systematic method or process is provided to monitor, analyse, model and visualize vehicular traffic and mobility. Thus, the proposed invention can implement a reliable platform for the study of vehicular objects and the structural dynamics that affect cities.
The systematic process is divided into components, each built upon the previous one. It starts with the data collection process. The basis of the traffic analysis is to capture spatio-temporal data about traffic trends over a long time period, which allows systematic analysis with advanced statistical techniques.
In the first step, data can be collected from several sources that contain snapshots of vehicles passing by many locations. In recent times, departments of transportation (DoTs) have installed online traffic web cameras at key intersections to help the general public know current trends in the traffic flow. These web cameras face the roads at prominent intersections throughout cities and highways. At regular intervals, the cameras capture still pictures of the ongoing road traffic and send them in the form of feeds to the DoT's server, commonly a media server. According to the invention, an automatic script can be provided, and permission from the DoTs concerned can be obtained, to acquire images at a fine interval of around one image every 30 seconds. Furthermore, the location information of these cameras can be used to gather the structural and spatial dynamics of a city. All-pairs driving distances and driving times between all such locations can be gathered from Google Maps, together with the number of lanes, the width of roads and the places of attraction near the camera locations, for the purpose of investigating the backbone of cities and reasoning about their profiles. The snapshot or image data set can contain images from plural traffic web cameras, e.g. from 2700 traffic web cameras, with an overall size of 7.5 terabytes and around 125 million images. The structural dynamics data set can, e.g., contain location information for 2.2 million pairs.
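The periodic collection described above might be sketched as follows. This is a minimal illustration and not the script used by the inventors: the camera identifiers, the URLs, the file-naming scheme and the injectable `fetch`, `clock` and `sleep` parameters are assumptions introduced here; only the interval of about 30 seconds comes from the text.

```python
import time
import urllib.request
from pathlib import Path


def http_fetch(url, timeout=10.0):
    """Download one snapshot over HTTP and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def collect_snapshots(cameras, out_dir, rounds, interval_s=30.0,
                      fetch=http_fetch, clock=time.time, sleep=time.sleep):
    """Poll every camera once per round and store each image on disk.

    cameras: mapping of (hypothetical) camera id -> snapshot URL.
    Returns the list of files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for round_no in range(rounds):
        for cam_id, url in cameras.items():
            try:
                data = fetch(url)
            except OSError:
                continue  # camera unreachable; try again in the next round
            # Assumed naming scheme: camera id, capture time, round number.
            path = out / f"{cam_id}_{int(clock())}_{round_no}.jpg"
            path.write_bytes(data)
            written.append(path)
        if round_no < rounds - 1:
            sleep(interval_s)
    return written
```

Passing the download function as a parameter keeps the polling loop testable without network access; a real deployment would also need the permission of the camera operator, as the text notes.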
Within the method according to the invention, a fast background image subtraction algorithm can be provided to extract the traffic densities from the snapshots or images for the purpose of analyzing the traffic on roads. Semi-supervised learning and distance-based hierarchical clustering can be used to overcome the challenges of outlier detection and removal in millions of traffic images.
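The idea of deriving a relative traffic value by background subtraction can be illustrated with a small sketch. The specific choices here are assumptions and not taken from the patent: frames are modelled as nested lists of grayscale values, the background is a per-pixel temporal median, and `noise_threshold` is a hypothetical tuning parameter.

```python
from statistics import median


def estimate_background(frames):
    """Per-pixel temporal median over equally sized grayscale frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]


def traffic_density(frame, background, noise_threshold=20):
    """Relative traffic value: fraction of pixels deviating from the background."""
    total = 0
    changed = 0
    for row, bg_row in zip(frame, background):
        for pixel, bg_pixel in zip(row, bg_row):
            total += 1
            # Small deviations are treated as sensor noise or lighting drift.
            if abs(pixel - bg_pixel) > noise_threshold:
                changed += 1
    return changed / total
```

The result is a number between 0 and 1 per snapshot rather than a vehicle count, which matches the text's point that a relative value is computed instead of counting cars.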
Within the method according to the invention, a quantitative value can be generated that represents the intensity of the traffic on roads. This value can be associated with a location and can have temporal vectors that show the time at which the traffic was recorded. A logical extension is therefore to perform a spatio-temporal analysis investigating various aspects of traffic on weekdays, weekends and so on. Data mining techniques can also be applied to the huge imagery repository to perform spatio-temporal analysis of the traffic and to profile cities accordingly.
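A weekday versus weekend comparison of the kind mentioned above could start from a simple hourly profile. The following sketch assumes that each record is a pair of an ISO 8601 timestamp and a density value; this record layout and the function name are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_profile(records):
    """Mean density per (day type, hour) from (ISO timestamp, density) pairs."""
    buckets = defaultdict(list)
    for stamp, density in records:
        moment = datetime.fromisoformat(stamp)
        # weekday() returns 5 for Saturday and 6 for Sunday.
        day_type = "weekend" if moment.weekday() >= 5 else "weekday"
        buckets[(day_type, moment.hour)].append(density)
    return {key: mean(values) for key, values in buckets.items()}
```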
To provide a concrete analysis and to develop a prediction model, statistical modelling of the data can be provided. Furthermore, mixture models and Markov models can be used to develop prediction models. Realistic modelling [30] of vehicular mobility can be particularly challenging due to a lack of large libraries of measurements [20] in the research community. The imagery dataset can be one of the largest available libraries of vehicular traffic snapshots and can help to bridge this gap. Initial results obtained with the method according to the invention show that at least 82% of individual cameras from four cities follow a log-logistic distribution with less than 5% deviation, and that 94% of cameras from Toronto follow a gamma distribution. The aggregate results from each city also demonstrate that the log-logistic and gamma distributions pass the KS test with 95% confidence. Furthermore, many of the camera traces exhibit long-range dependence, with self-similarity evident in the aggregates of traffic (per city).
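The goodness-of-fit check mentioned above can be illustrated with a one-sample Kolmogorov-Smirnov (KS) statistic against a log-logistic model. This is a sketch under assumptions: the scale `alpha` and shape `beta` are taken as given, and the acceptance threshold uses the asymptotic 5% critical value 1.36/sqrt(n), which is only approximate for small samples and is not strictly valid when the parameters were fitted to the same data.

```python
import math


def log_logistic_cdf(x, alpha, beta):
    """CDF of the log-logistic distribution with scale alpha and shape beta."""
    if x <= 0:
        return 0.0
    return 1.0 / (1.0 + (x / alpha) ** (-beta))


def ks_statistic(samples, cdf):
    """Largest gap between the empirical CDF of the samples and a model CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare the model with the empirical CDF just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def passes_ks_test(samples, cdf, coefficient=1.36):
    """Asymptotic one-sample KS test; 1.36 corresponds to the 5% level."""
    return ks_statistic(samples, cdf) <= coefficient / math.sqrt(len(samples))
```

A call such as `passes_ks_test(densities, lambda x: log_logistic_cdf(x, alpha, beta))` would then report whether a camera's density trace is compatible with the assumed distribution.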
The framework within the method according to the invention also lays a strong foundation for the visualization environment. Both desktop and handheld platforms can be supported to demonstrate traffic conditions. For example, a combination of GoogleEarth, OpenEarth, KML and Matlab can be used to show the traffic on a desktop machine. For handheld devices, applications for both iOS and Android can be developed to implement the method according to the invention. In graphical visualization, the framework can provide a visual demonstration of traffic overlaid on maps. It can also provide controls to visualize the traffic in several different forms, and thus helps to predict the traffic not only for the current moment but also for a week ahead and for several hours of the day.
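Since KML is named as one of the visualization formats, a traffic overlay could, for instance, be exported as KML placemarks. The sketch below is an assumption about how such an export might look; the tuple layout of `readings` and the description text are invented for illustration, while the element names follow the KML 2.2 schema.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def traffic_kml(readings):
    """Build a KML document with one Placemark per camera reading.

    readings: iterable of (name, longitude, latitude, density) tuples.
    """
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lon, lat, density in readings:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = (
            f"traffic density {density:.2f}")
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude[,altitude].
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")
```

The resulting string can be saved with a `.kml` extension and opened in a viewer that supports the format.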
Preferably, the traffic server is accessible via the Internet, wherein the information requirement of a query comprises a location of a site of interest and information about the site of interest, and wherein a framework server is accessible via the communication network. In this context, the term "Internet" relates to the commonly known global system of interconnected computer networks that use the standard internet protocol suite (TCP/IP) to serve users worldwide. The World Wide Web (WWW), a system of interlinked hypertext documents, is implemented on and accessed via the Internet. With a web browser, web pages or web sites that may contain text, images, videos and other multimedia can be accessed and navigated. The site of interest can be any web site providing relevant data or information. The location of the site of interest can, e.g., be an Internet address such as an internet protocol (IP) address or a uniform resource locator (URL) of the WWW. The information about the site of interest can be an image or snapshot or the like.
Preferably, in step b) supervised machine learning algorithms are used for detecting at least one snapshot deemed to contain erroneous and/or useless traffic information (outliers) and removing the detected at least one snapshot from the plurality of snapshots. The method can use a combination of colour values and the sum total of the dispersion generated by an algorithm to identify appropriate snapshots or images. In other cases, snapshots or images can be detected and removed by means of a set threshold. Snapshots or images with zero size and consecutive duplicate images are removed.
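The two simplest removal rules named above, zero-size images and consecutive duplicates, can be sketched as follows. Treating two images as duplicates only when their bytes are identical (compared via a SHA-256 digest) is an assumption of this example; the learning-based outlier detection is not covered here.

```python
import hashlib


def drop_unusable(snapshots):
    """Remove zero-size images and consecutive duplicates.

    snapshots: list of (timestamp, image_bytes) in capture order for one camera.
    """
    kept = []
    previous_digest = None
    for timestamp, data in snapshots:
        if len(data) == 0:
            continue  # empty download, carries no traffic information
        digest = hashlib.sha256(data).hexdigest()
        if digest == previous_digest:
            continue  # feed was not refreshed since the last kept image
        kept.append((timestamp, data))
        previous_digest = digest
    return kept
```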
Preferably, the information requirement comprises a plurality of information requirement attributes and the information property comprises a plurality of information attributes.
Preferably, within the systematic process of step c) the method comprises methods in a highly scalable environment with unique weighted metrics. In particular, the method can comprise the use of historical information and previously downloaded snapshots to determine the traffic pattern. It can also use future information to normalize spikes caused by temporary activities.
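One way to use both earlier and later values to damp isolated spikes is a centred median filter. The patent does not name a specific filter, so the technique and the `half_window` parameter are assumptions chosen for illustration.

```python
from statistics import median


def smooth_spikes(values, half_window=2):
    """Replace each value by the median of its past and future neighbours."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_window)                # history
        hi = min(len(values), i + half_window + 1)  # future
        smoothed.append(median(values[lo:hi]))
    return smoothed
```

For example, `smooth_spikes([10, 11, 90, 12, 11], half_window=1)` replaces the isolated reading of 90 with 12.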
Preferably, the method further comprises the step of processing an image of pixels from each snapshot of the plurality of snapshots after the detected at least one snapshot is removed from the plurality of snapshots in step b). Thereby, the method preferably further comprises: attributing a traffic density at a time instance by summing the deviations in the pixel counts; and spatio-temporally storing traffic density information in a database with known column attributes. Such spatio-temporal storing of traffic density information in the database improves the performance of the retrieval process. Database normalization to third normal form (3NF) can be employed in order to share primary keys and to reduce the burden of a large database size and number of records. In this way, the traffic information can be stored efficiently according to the cameras from which the snapshots or images are downloaded, e.g. from a media server. The stored values can be non-normalized and hence are not suited for comparison between two cameras. However, the number of lanes and the width of the road can be used to normalize the traffic values so that they can be compared among different locations and their cameras.
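A normalized storage layout and a lane-based comparison could look like the following sketch. The table names, column names and the choice of SQLite are assumptions made for this example; the patent only specifies spatio-temporal storage, third normal form and normalization by lanes or road width.

```python
import sqlite3

SCHEMA = """
CREATE TABLE city (
    city_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE camera (
    camera_id INTEGER PRIMARY KEY,
    city_id   INTEGER NOT NULL REFERENCES city(city_id),
    latitude  REAL NOT NULL,
    longitude REAL NOT NULL,
    lanes     INTEGER NOT NULL
);
CREATE TABLE density (
    camera_id   INTEGER NOT NULL REFERENCES camera(camera_id),
    captured_at TEXT NOT NULL,  -- ISO 8601 timestamp
    value       REAL NOT NULL,  -- raw, non-normalized density
    PRIMARY KEY (camera_id, captured_at)
);
"""


def open_store(path=":memory:"):
    """Create the (hypothetical) schema and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def per_lane_density(conn, camera_id):
    """Density divided by lane count, so that cameras can be compared."""
    return conn.execute(
        "SELECT d.captured_at, d.value / c.lanes "
        "FROM density d JOIN camera c ON c.camera_id = d.camera_id "
        "WHERE d.camera_id = ? ORDER BY d.captured_at",
        (camera_id,),
    ).fetchall()
```

Camera attributes are stored once per camera and city names once per city, so each density row carries only a key, a timestamp and a value.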
Thereby, within knowledge discovery, spatio-temporal information about the cameras and the traffic preferably is used to understand patterns. The spatial information can be the geo-coordinates of the places, while the temporal information can come in the form of traffic information collected for each location between, e.g., the 12 hours of the day. The method or framework can be uniquely designed to use the information extracted from the images in knowledge discovery. The knowledge discovery can be categorized into spatial analysis, wherein the geo-coordinates of the camera locations are used to learn the structure and dynamics of the cities; temporal analysis, wherein the traffic information of the cameras is used to learn the patterns at various hours of the day and week and to make several other comparisons; and spatio-temporal analysis, wherein information is discovered using the method described above. Thus, causality methods and global knowledge about the city can be investigated.
Preferably, the extracted information is used in step d) as part of traffic density estimation. Such modelling is performed against known statistical distributions in order to investigate whether the data follow them and to trace the data to existing statistical parameters. In this way, the method can help to reveal the underlying patterns and to develop a prediction model.
Preferably, in step d) a prediction model is developed by using mixture models and hidden Markov models. This allows predictions to be developed from what is learned using the mixture models. Various Gaussian mixture models can thus help to understand the distribution of traffic across a wide range of time intervals.
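As a much simpler stand-in for the mixture and hidden Markov models named above, a first-order Markov chain over discretized traffic levels already illustrates the prediction idea. This is explicitly a simplification: the states are observed directly rather than hidden, and the level names are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(levels):
    """Estimate first-order transition probabilities from a sequence of levels."""
    counts = defaultdict(Counter)
    for current, following in zip(levels, levels[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


def predict_next(transitions, current):
    """Most probable next level, or None if the level was never observed."""
    options = transitions.get(current)
    if not options:
        return None
    return max(options, key=options.get)
```

Given a history such as `["low", "low", "high", "high", "high", "low"]`, the fitted chain predicts that a "high" interval is most likely followed by another "high" interval.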
Preferably, the method comprises the use of data mining techniques and machine learning techniques in any one of steps d) to i).
Preferably, standard available wireless communication technologies of plural types are used or created for vehicle-to-vehicle and vehicle-to-roadside-to-vehicle infrastructure communication.
Preferably, in step h) a traffic visualization application is developed that shows the traffic on cartographic maps of the respective cities for a desktop-like computer or a handheld-like device, and that is used in real or near-real-time scenarios to demonstrate traffic congestion and intensity, high-congestion zones, predictions of traffic for future events, and the history of traffic at a particular moment of the day at a given location.
Preferably, in step i) a profiling metric is developed to profile and compare cities. The profiling metric can include all or some of the information mentioned in step i). It is also suggested to collect this information and save it in a database for further processing and retrieval.
A further aspect of the invention relates to a computer program comprising code means adapted to implement the method of any one of the preceding claims when being executed. With such a computer program the method described above can conveniently be implemented and distributed and the corresponding effects and advantages can be provided.

Brief description of the figures

The method according to the invention and the computer program according to the invention are described in more detail herein below by way of exemplary embodiments and with reference to the attached drawings, in which:

Fig. 1: shows monitoring, collection and storage of vehicular image snapshots. As of now, snapshots from eleven cities or states across the world have been collected, covering five continents. The total number of cameras exceeds 3000 and the total number of images collected exceeds 110 million. To store these images, a storage unit of more than 10 TB is required;
Fig. 2: shows the complete framework. The framework is divided into four components: (i) measurements and pre-processing, used for monitoring, collection, and outlier detection and removal on the imagery data set; (ii) knowledge discovery, for spatio-temporal characterization, clustering and related activities to discover knowledge; (iii) modelling and analysis, for the analysis of traffic distribution, mixture modelling and the development of traffic prediction models; (iv) activities related to vehicular modelling in networks, communication between cars and from car to roadside, graphical visualization of traffic on devices and desktops, and finally the design and evaluation of vehicular communication protocols [18];
Fig. 3: shows the distribution and curve fitting of traffic for different hours of day;
Fig. 4: shows the deployment of cameras in the cities of London and Sydney. The cameras cover a large area that includes major intersections and highways, so traffic visualization is one of the important activities in this case. (a) shows the deployment of cameras for the city of London. (b) shows the deployment of cameras for the city of Sydney;
Fig. 5: shows different types of traffic images captured by the same camera. Their visual appearance shows the different types of traffic patterns found and used for the analysis, which are defined quantitatively using a background subtraction algorithm. (a) shows low traffic, (b) medium traffic, (c) high traffic and possibly congestion;
Fig. 6: shows an example of outlier detection and removal. (a) shows the traffic values, which contain falsely detected values; (b) shows the clean set of values that results after removing the outliers from the dataset. Outlier detection is an important part of the activity, as many of the downloaded images are erroneous snapshots, blacked-out images, etc.;
Fig. 7-a to 7-1: show the distribution of traffic against well-known probability distributions. The distribution is compared for three cities based on the traffic volume during the day, and shows that it can change over time, over the hours of the day and over weeks. Four cities are considered to model their traffic. The results show that well-known distributions such as normal and exponential do not satisfy the criteria, and the relative error in modelling the real data with these distributions can give misleading results. Instead, gamma and log-logistic like distributions are found to be better equipped to model the real data, which they follow closely;
Fig. 8: shows the first best fit of known distributions that can be used to model the real traffic data, with best fits covering locations in four cities. The values in the boxes show the deviation. The log-logistic distribution is found to give better modelling flexibility and also follows the traffic data closely;
Fig. 9: shows a sample of the distribution for a set of individual cameras and then the aggregate distribution deviation, which is calculated using the KS-test;
Fig. 10-a and 10-b: show the k-medoid clustering for the distance between camera coordinates and the clustering using the time to travel between the same set of camera pairs. If the driving distance and the travel time were synchronized, the two clusterings would look similar. Here, however, very little correlation is found, because of physical dynamics such as small intersections, or possibly large highways and different speeds. Fig. 10-c shows the distribution and sizes of the clusters found by the k-medoid clustering algorithm. The error bars show the camera pairs with different assignments because of the discrepancy between distance and time;
Fig. 11-a to 11-c: show the same comparison as Fig. 10 for the city of Toronto and for the greater Toronto area. The deviation between the distance and time clustering can be seen to be very small, indicating a higher correlation between the two;
Fig. 12-a to 12-c: show the same comparison as Fig. 10 for the city of London and for the greater Toronto area. The deviation between the distance and time clustering can be seen to be very high, indicating very low correlation between the metrics and hence very uneven traffic for the city;
Fig. 13-a to 13-c: show the same comparison as Fig. 10 for the city of Sydney and for the greater Toronto area. The deviation between the distance and time clustering can be seen to be relatively good, indicating significant correlation between the metrics and even traffic for that city across many locations;
Fig. 14-a to 14-d: show, on a plane, the geographical mapping of coordinates used to represent the locations, with the distance between them given by the length of the edges interconnecting these locations, for all four cities. The actual geographical coordinates are transformed (G(T) and G(L) in the figure) for better visualization of the links and locations. The spatio-temporal congestion is modelled as a graph whose vertices are the camera locations;
Fig. 15-a to 15-c: show the CDF correlation of traffic densities between hour differences of the day for three cities;
Fig. 16-a to 16-d: show a 42-day hour-by-hour average correlation for the traffic densities with different lags for the state of Connecticut;
Fig. 17-a to 17-d: show a 42-day hour-by-hour average correlation for the traffic densities with different lags for the city of London;
Fig. 18-a to 18-d: show a 42-day hour-by-hour average correlation for the traffic densities with different lags for the city of Toronto;
Fig. 19-a to 19-d: show a 42-day hour-by-hour average correlation for the traffic densities with different lags for the city of Sydney;
Fig. 20: shows the distribution of different levels of traffic congestion for the city of Sydney. The green dots show low congestion, the orange dots medium congestion and the red dots high congestion on the intersections and highways;
Fig. 21: shows the traffic hotspots for the city of Toronto. As can be seen, the relative traffic is higher at these locations over a period of 42 days; and
Fig. 22: shows various scenarios for the visualization of traffic on desktop and handheld devices;
Fig. 23: shows the CDF distribution fitting for 23(a) Connecticut 23(b) London 23(c) Sydney and 23(d) Toronto.

Detailed description of the invention

The present invention will now be described in view of several aspects such as the framework, system model, analysis and systematic process.

I. Framework in brief

The framework is the core of the method according to the invention. It integrates all aspects of the vehicular image monitoring processing and its further usage to model and perform knowledge discovery.

Table 1: Dataset

City	# of Cameras	Duration	Interval	Records	Database Size
Bangalore	160	30/Nov/10-01/Mar/11	180 sec.	2.8 million	357 GB
Beaufort	70	30/Nov/10-01/Mar/11	30 sec.	24.2 million	1150 GB
Connecticut	120	21/Nov/10-20/Jan/11	20 sec.	7.2 million	435 GB
Georgia	777	30/Nov/10-02/Feb/11	60 sec.	32 million	1400 GB
London	182	11/Oct/10-22/Nov/10	60 sec.	1 million	201 GB
London (BBC)	723	30/Nov/10-01/Mar/11	60 sec.	20 million	1050 GB
New York	160	20/Oct/10-13/Jan/11	15 sec.	26 million	1200 GB
Seattle	121	30/Nov/10-01/Mar/11	60 sec.	8.2 million	600 GB
Sydney	67	11/Oct/10-05/Dec/10	30 sec.	2.0 million	350 GB
Toronto	89	21/Nov/10-20/Jan/11	30 sec.	1.8 million	325 GB
Washington	240	30/Nov/10-01/Mar/11	60 sec.	5 million	400 GB
Total	2709	-	-	125.2 million	7468 GB

II. Monitoring and Data Collection process

There are thousands, if not millions, of outdoor cameras currently connected to the Internet, which are placed by governments, companies, conservation societies, national parks, universities, and private citizens. Outdoor webcams are usually mounted on a roadside pole with easy accessibility, installation and maintenance, and they have seen enormous applications not only in adaptive traffic control and information systems, but also in monitoring the weather conditions, advertising the beauty of a particular beach or mountain, or providing a view of animal or plant life at a particular location. The connected global network of webcams is viewed as a highly versatile platform, enabling an untapped potential to monitor global trends, or changes, in the flow of the city, and providing large-scale data to realistically model vehicular, or even human, mobility.
This section introduces the data collection methodology and gives high-level statistics of the data traces. Vehicular mobility traces are collected from online webcams by a dedicated crawler. The majority of these webcams are deployed by the Department of Transportation (DoT) of each city to provide real-time information about road traffic conditions to the general public. The web cameras are typically installed on traffic signal poles facing the roads at prominent intersections throughout the city and along highways. At regular intervals, these cameras capture still pictures of the ongoing road traffic and send them as feeds to the DoT's media server. For the purpose of this disclosure, 10 cities with large webcam coverage are chosen, and permission is obtained from the DoTs concerned to collect the vehicular imagery data for several months. Cities in North America, Europe, Asia, and Australia are covered. Fig. 1 shows the experimental infrastructure used to download and maintain the image data. Since these cameras provide better imagery during the daytime, downloading and analysis are limited to such hours. On average, 15 gigabytes of imagery data are downloaded per day from over 2700 traffic web cameras, giving an overall dataset of 7.5 terabytes containing around 125 million images. Table 1 shows the high-level statistics of the datasets collected. Each city has a different number of deployed cameras and a different image capture interval. For example, cameras for the city of Sydney capture images at an interval of one minute, while for the state of Connecticut the interval between two consecutive snapshots is only 20 seconds. The cameras are geographically widespread and cover major sections of cities and highways.
Fig.-4(a) and 4(b) give an example of the camera deployments in the city of London and Sydney by mapping the Global Positioning System (GPS) location of the cameras to Google maps. The area covered by the cameras in London is 950 km² and that in Sydney is 1500 km². Hence, this disclosure is believed to be comprehensive and to reflect major trends in traffic movement of cities.

III. Geo-Coordinates Data

The cameras are installed at critical intersections and hence carry vital spatial significance. For example, within the context of the disclosure it has been shown that a few of these cameras are installed on major highway stretches and in inner-city downtown areas, where traffic flow is more significant than at deserted locations. A disclosure involving such locations therefore gives a good estimate of their structural significance. The physical information of these cameras is collected to serve as reference points for the structural analysis. This information includes latitude and longitude coordinates, zip code and state, directional view, and camera location.
After gathering the physical information about these cameras, the geo-coordinates (latitude and longitude) that best fit the criteria for investigating structural dynamics are identified. First, all possible pairs of cameras are generated for each city; second, the driving distance and the corresponding driving time are calculated for each pair. The Google Maps API is used to gather this information for all such pairs of locations. The details of this data set are shown in Table 2. As can be seen, the average travel time is very high in the city of London.

IV. Outlier Detection and Removal

Collecting images on such a large scale requires automated processes to manage the data and extract useful information. Because different cameras have different refresh rates, images have to be downloaded continuously at a specific time interval for each camera. To ensure that not a single traffic snapshot is missed, the download interval is kept a little shorter than the camera refresh interval. This, however, results in a few duplicate images, which are filtered out as a first step of outlier detection and removal. Normally, the downloaded data set contains images that are snapshots of vehicular traffic on the roads. In many instances, however, the images are corrupted, being either zero-sized or containing extraneous bytes (noise). Furthermore, if the camera is non-functional or has mechanical errors, the traffic-monitoring server replaces the current traffic snapshot with an error notification image.
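By way of illustration only, the first filtering step (dropping zero-sized downloads and duplicate frames) could be sketched as follows. The disclosure does not state how duplicates are recognised; comparing SHA-256 digests of the raw image bytes is an assumption made here, and the function name `filter_snapshots` is hypothetical.

```python
import hashlib

def filter_snapshots(snapshots):
    """Drop zero-sized and byte-identical duplicate snapshots.

    `snapshots` is an iterable of (name, raw_bytes) pairs in download order.
    Returns the list of pairs that survive the filter.
    """
    seen = set()   # digests of the frames already kept
    kept = []
    for name, data in snapshots:
        if len(data) == 0:
            # Corrupted, zero-sized download.
            continue
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            # Same frame fetched twice (assumed duplicate criterion).
            continue
        seen.add(digest)
        kept.append((name, data))
    return kept
```

In practice such a filter would presumably be run per camera, so that identical frames from different cameras are not confused; error notification images and noisy files are not handled by this sketch.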
The challenge here is to detect all such errors and remove them before modelling and statistical analysis. The analysis becomes more complex because the underlying distribution is not necessarily known, and hence statistical techniques that rely on a particular distribution (boxplot etc.) cannot be used. Semi-supervised learning and data mining are used to overcome the challenges of outlier detection and removal in millions of traffic images.
In the present case, a data set X containing all types of images, $X = \{x_{1}, x_{2}, x_{3}, \dots, x_{n}\}$, is treated. This set is divided into two parts. The first part contains the data points $X_{l} = \{x_{1}, x_{2}, x_{3}, \dots, x_{l}\}$, which are mapped to the labels $Y_{l} = \{y_{1}, y_{2}, y_{3}, \dots, y_{l}\}$. The input features provided for detecting outliers include, but are not limited to, image size, color depth, multi-channel color arrays and image segmentation stderrs. The second part contains points with unknown labels, represented as $X_{u} = \{x_{l + 1}, x_{l + 2}, x_{l + 3}, \dots, x_{l + u}\}$

such that $u \gg l$. The labelled points that are already known and learned are later used to find cluster boundaries and to assign a class to each cluster.
In this case, low-density separation assumptions are used that help to cut the dataset into clusters. The identified clusters are separated out as outliers, which are mostly distant from the regular traffic density data. In Fig.-6(a) and 6(b), the results of detecting and removing the outliers are compared.
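The cluster-then-label idea described above can be illustrated with a minimal sketch. It is an assumption-laden simplification, not the disclosed procedure: it uses a single numeric feature (for instance the image size in bytes), cuts the sorted values wherever a gap exceeds a hypothetical parameter `min_gap` (standing in for a low-density region), and gives each cluster the majority label of the few labelled points that fall inside it.

```python
def split_at_gaps(values, min_gap):
    """Cut sorted 1-D feature values into clusters wherever two
    neighbouring values are more than `min_gap` apart."""
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > min_gap:
            clusters.append([])   # low-density gap: start a new cluster
        clusters[-1].append(cur)
    return clusters

def label_clusters(clusters, labelled):
    """Assign each cluster the majority label of its labelled members.

    `labelled` maps a feature value to its known label. Clusters that
    contain no labelled point are marked "unknown".
    """
    result = []
    for cluster in clusters:
        votes = [labelled[v] for v in cluster if v in labelled]
        label = max(set(votes), key=votes.count) if votes else "unknown"
        result.append((label, cluster))
    return result

# Illustrative image sizes in bytes; only two of them carry a label.
sizes = [500, 700, 30000, 31000, 33000, 35000, 120000]
known = {500: "outlier", 31000: "traffic"}
labelled_clusters = label_clusters(split_at_gaps(sizes, 10000), known)
```

With these made-up values the two tiny files form one cluster labelled as outliers, the mid-sized files form the regular traffic cluster, and the single very large file remains unlabelled for manual inspection.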

V. Background subtraction algorithm

The aim is to estimate traffic density on roads in terms of the vehicles or pedestrians crossing the road. A sequence of images $I_{1}(x, y), I_{2}(x, y), \dots, I_{z}(x, y)$ is captured by a webcam. The information needed, e.g. the number of vehicles and pedestrians, has to be separated from the background image, which normally consists of the road and the surrounding buildings. The main factor that distinguishes vehicles from the background is that vehicles do not remain stationary for long periods of time, whereas the background does. The solution is therefore to apply a kind of high-pass filtering over the sequence of images captured by a webcam over time. The high-pass filter removes the stationary part of the images (road, buildings, etc.) and keeps the moving components (mainly vehicles). To implement such a high-pass filter, the result of a low-pass filter over the image sequence is subtracted from each still image; this is practically equivalent to high-pass filtering the sequence. The low-pass filtering effect is obtained by running a moving average filter over the time sequence of images obtained from one webcam. The duration of the moving average filter can be adjusted in an ad hoc way. The moving average filter is implemented simply by averaging the intensity maps of several images within a certain duration: at its output, the intensity of each pixel is the average intensity of the corresponding pixels over the interval. The output of the moving average (low-pass) filter is normally the required background image, i.e. a still image of the street and buildings. Subtracting the output of the low-pass filter from each image therefore gives the moving components (e.g. vehicles), which are the high-pass component of the image over time.
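The low-pass and high-pass steps described above can be sketched as follows. This is a minimal illustration that assumes grayscale frames held as nested lists of pixel intensities; the function names `background` and `high_pass` are hypothetical, and the caller is assumed to pass in whichever window of frames the moving average should cover.

```python
def background(frames):
    """Low-pass estimate: pixel-wise mean over a window of frames.

    Each frame is a list of rows, each row a list of intensities,
    and all frames are assumed to have the same size.
    """
    z = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / z for c in range(cols)]
            for r in range(rows)]

def high_pass(frame, bg):
    """High-pass component: one frame minus the background estimate."""
    return [[p - b for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, bg)]

# Three tiny 1x2 frames; a bright object appears in the last one.
frames = [[[10, 10]], [[10, 10]], [[10, 40]]]
bg = background(frames)              # [[10.0, 20.0]]
moving = high_pass(frames[2], bg)    # [[0.0, 20.0]]
```

The stationary pixel cancels out, while the pixel covered by the object keeps a positive residual. With a longer averaging window the object would contribute less to the background estimate.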
With the high-pass component of the image, the vehicles stand out from the background. Regular object detection techniques [4] could then be used to identify and count the vehicles in the high-pass filtered image. However, such techniques can be computationally heavy and at the same time unnecessary. As an alternative, the number of active pixels (pixels with a value higher than a certain threshold) is simply counted. This can be much faster than detecting and counting objects in an image. It can also be more effective, because what is sought is the percentage of the street (road) covered by vehicles, as an indicator of how crowded the street is, rather than the number of vehicles. The number of vehicles is not necessarily a good indicator of crowdedness, since a long vehicle may introduce more traffic than a small one. Moreover, counting active pixels avoids the problems that object detection algorithms face under severe congestion, one of which is the poor visibility of the boundary contours used to separate objects from one another. In contrast, the number of active pixels indicates what percentage of the road is covered, no matter how many vehicles are on the road. That said, an image can be represented as $I(x, y) = L(x, y) + T(x, y) + N(x, y)$

where I(x,y) is the captured image, L(x,y) is the output of the low-pass filter, and T(x,y) and N(x,y) are respectively the traffic and the noise associated with the image. In the first step, the low-pass filter output is generated using the aforementioned moving average technique. A given pixel is averaged with its left and right neighbours in the sequence. For the purpose of this disclosure, the number of neighbours is kept at z = 100. The averaging removes the dominant trends, which are T(x,y) and N(x,y). This low-pass filter remains constant for one camera: $L(x, y) = \frac{I_{1}(x, y) + I_{2}(x, y) + \dots + I_{z}(x, y)}{z}$
To get the traffic density associated with an image, the low-pass filter output is subtracted and a threshold (τ) is set to reject resulting pixel values below it, so as to reduce the effect of the noise N(x,y) (shadows etc.). In summary, $I'(x, y) = I(x, y) - L(x, y)$
such that $I'(x, y) > \tau$. The thresholded image is then converted to grayscale, $I''(x, y)$, and its pixels are summed to get the traffic density (d): $d = \sum_{x = 0}^{m} \sum_{y = 0}^{n} I''(x, y)$
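The thresholding and summation steps can be sketched as follows. This is a minimal illustration that assumes the background-subtracted image is already grayscale and held as nested lists; the function names are hypothetical, and `covered_fraction` is an added helper showing the "percentage of road covered" reading of active pixels rather than a quantity defined by the equations above.

```python
def traffic_density(diff, tau):
    """Sum the pixels of a background-subtracted frame that exceed `tau`.

    Pixels at or below the threshold are treated as noise and ignored.
    """
    return sum(p for row in diff for p in row if p > tau)

def covered_fraction(diff, tau):
    """Share of pixels that are active, i.e. above `tau`."""
    total = sum(len(row) for row in diff)
    active = sum(1 for row in diff for p in row if p > tau)
    return active / total

# A 2x3 residual image with two clearly active pixels.
diff = [[0, 20, 3], [50, -5, 1]]
d = traffic_density(diff, tau=10)        # 20 + 50
share = covered_fraction(diff, tau=10)   # 2 of 6 pixels
```

The value of τ is camera- and lighting-dependent and would have to be chosen empirically; the value 10 above is purely illustrative.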
Table 2: Geo-coordinate pairing

City	# Pairs	Avg. Dis. (km)	Avg. Time (min)	Area
Connecticut	74801	6.4	42	60%
London	32580	1.56	26	53%
Sydney	4422	3.2	33	67%
Toronto	43055	12.5	88	66%

VI. Knowledge Discovery

a. Geo-Coordinates Pairing

After gathering the physical information about these cameras, the geo-coordinates (latitude and longitude) that best fit the criteria for investigating structural dynamics are identified. First, all possible pairs of cameras are generated for each city; second, the driving distance and the corresponding driving time are calculated for each pair. The Google Maps API is used to gather this information for all such pairs of locations. The details of this data set are shown in Table 2. As can be seen, the average travel time is very high in the city of London.
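The pair generation step can be illustrated with a short sketch. Treating the pairs as ordered (origin, destination) pairs is an assumption made here, on the grounds that driving distance and time may differ by direction; the function name `camera_pairs` is hypothetical, and the lookup of driving distance and time for each pair is deliberately left out.

```python
from itertools import permutations

def camera_pairs(camera_ids):
    """All ordered (origin, destination) pairs of distinct cameras."""
    return list(permutations(camera_ids, 2))

# 67 cameras give 67 * 66 ordered pairs.
pairs = camera_pairs(range(67))
```

For 67 cameras this yields 4422 pairs, which coincides with the Sydney entry in Table 2.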

b. Method

Data mining techniques are applied to the huge imagery repository to profile cities. In the first step, the spatial structure of the cities is identified. To do this, k-medoid clustering is applied separately to the driving distances and to the corresponding driving times between pairs of cameras for each city, and the deviation between the two is compared. This helps to discover irregularities caused by spatial features around these locations. Since this clustering is a hard partitioning, the Dunn index [13] is used to identify the correct number of clusters. In the second step, the autocorrelation function and temporal mining are applied to the traffic densities of individual and aggregated cameras for the cities. Finally, in the third step, similarity and spatio-temporal mining are used to profile cities and to reason about how they differ from each other. The process is divided into two stages: 1) data pre-processing and 2) knowledge discovery, as shown in Fig. 2.

i. Clustering using K-medoid algorithm

The K-medoid [2] is a hard partitioning algorithm whose objective is to partition the input data X into c clusters. It is used here with a distance measure that results from the geo-coordinate information. From the given data set X of camera pairs, the K-medoid algorithm allocates each data point to one of the c clusters so as to minimize the intra-cluster sum of squares $\sum_{i = 1}^{c} \sum_{k \in A_{i}} {‖ x_{k} - v_{i} ‖}^{2}$
where $A_{i}$ is the set of data points (distance values between camera pairs) in the i-th cluster and $v_{i}$ is the center of cluster i. This center is the object nearest to the mean of the data in the cluster, $V = \{v_{i} \in X \mid 1 \le i \le c\}$.
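A minimal sketch of a k-medoid iteration is given below for illustration. It uses the simple alternating scheme (assign points to the nearest medoid, then move each medoid to the cluster member with the smallest total distance to the others) rather than any particular published variant, and it starts from the first k points, which is an assumption; the function name and the `dist` parameter are hypothetical.

```python
def k_medoids(points, k, dist, max_iter=100):
    """Alternating k-medoids on a list of points.

    `dist(a, b)` is any dissimilarity function. Returns the final
    medoids and the clusters from the last assignment step.
    """
    medoids = list(points[:k])   # naive initialisation (assumption)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        clusters = [[] for _ in medoids]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, medoids[i]))
            clusters[nearest].append(p)
        # Update step: the new medoid is the member with the smallest
        # summed distance to the rest of its cluster.
        new_medoids = [
            min(c, key=lambda m: sum(dist(m, q) for q in c)) if c else medoids[i]
            for i, c in enumerate(clusters)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, clusters

# Illustrative driving distances for six camera pairs, two clear groups.
values = [10, 12, 8, 100, 105, 95]
medoids, clusters = k_medoids(values, 2, lambda a, b: abs(a - b))
```

Passing a squared difference as `dist` would correspond more closely to the sum-of-squares objective above. Because the result depends on the starting medoids, such a procedure is normally repeated from several initialisations.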
Cluster Size: Each city is defined by its own cluster sizes. Since K-medoid is a hard partitioning algorithm, it requires the number of clusters as input; however, this is not available a priori. The Dunn index [9] (D.I.) is used to measure the quality of the clustering for different values of k. This index is defined as $D(c) = \min_{i \in c} \{\min_{j \in c, i \neq j} \{\frac{\min_{x \in c_{i}, y \in c_{j}} d(x, y)}{\max_{k \in c} \{\max_{x, y \in c_{k}} d(x, y)\}}\}\}$
The main goal of the measure is to maximize the inter-cluster distances and minimize the intra-cluster distances. Therefore, the number of clusters that maximizes D(c) is taken as the optimal number of clusters.

Table 3: Clustering analysis

City	D.I	# Clusters	ρ
Connecticut	0.001	6	6%
London	0.0002	10	52%
Sydney	0.005	4	5%
Toronto	0.0008	5	10%
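For illustration, the Dunn index of a given partition can be computed with the following sketch. It assumes at least two clusters and at least one cluster containing two distinct points (otherwise the denominator is zero); the function name and the `dist` parameter are hypothetical.

```python
def dunn_index(clusters, dist):
    """Smallest between-cluster distance divided by the largest
    within-cluster distance (cluster diameter)."""
    min_between = min(
        dist(x, y)
        for i, ci in enumerate(clusters)
        for cj in clusters[i + 1:]
        for x in ci
        for y in cj
    )
    max_within = max(dist(x, y) for c in clusters for x in c for y in c)
    return min_between / max_within

# Two well-separated groups: closest gap 95 - 12, widest cluster 105 - 95.
score = dunn_index([[8, 10, 12], [95, 100, 105]], lambda a, b: abs(a - b))
```

To choose the number of clusters, the clustering would be run for several values of k and the k with the highest index kept, as described above.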

ii. Spatial Analysis of Cameras

First, the pairing of distance and time among the cameras of each city is examined. The distance and time distributions of all camera pairs are sampled to investigate whether there is a linear mapping between these metrics. Surprisingly, considerable deviation is found in these statistics, as shown in Figs. 10 to 13. The bar graphs show the sizes of the clusters generated after running 1000 iterations of the k-medoid algorithm. The error bars give the deviation, i.e. the data points (camera pairs) that are present in a distance cluster but not in the corresponding time cluster, and vice versa. In Table 3 this deviation is denoted by ρ. For the city of London, the intersections are found to be very close to each other, and hence a tendency towards slower traffic is inherent in the city infrastructure. In the city of Toronto, on the other hand, the locations are a mix of both small and well-planned highways, which distributes the traffic evenly; however, the internal structure of Toronto shows deviation among clusters, mainly in clusters 2 and 3. Similar behaviour is observed for Connecticut, where clusters 1, 4, 5 and 6 have large deviation and a comparatively larger number of short-distance intersections. For Sydney, the clusters have very low deviation, and an analysis of the clusters shows an even distribution of traffic across all its roads. This further suggests that Sydney is better planned and has had less organic growth than the other cities. Similar analyses were obtained for the other cities as well; for brevity, only four of the ten cities are discussed. The average deviation in cluster membership given in Table 3 shows the inconsistency between distance and time.

c. Temporal Analysis of Traffic Densities

The previous section on spatial analysis discussed the variation in clustering for geodesic driving distance and time among a set of critical intersection points of cities. At any given time these statistics remain the same unless the cities are structurally modified. This section focuses on the temporal aspect of traffic distribution in these cities. The time-series imagery data set of the cameras is used to perform this analysis. The following questions are asked:

Q. 1: What is the nature of traffic distribution for these cameras? Do all cameras have same distributions?
Q. 2: Is the nature of the distribution predictable over a long period of time?
Q. 3: Which events impact a deviation in the traffic distribution? How to identify such events?

To get an impression of the traffic distribution, the analysis starts with a qualitative look at the densities in four cities. This step provides a high-level representation of, and a way of reasoning about, the behaviour of traffic, and later serves as a logical basis for investigating precise quantitative information. To answer the first question, the data set is sampled into hourly representations for a period of 42 days, and a spectral method is used to generate the relationship between the various days. A sampled view taken from the four cities shows that the cameras have varying distributions, contrary to the popular notion of 'rush hours'. It is found to be difficult to estimate an aggregate statistical parameter that describes all cameras. For example, studies of 42 days of traffic density have shown at least four types of temporal activity: (i) the traffic peaks during the morning and evening hours and is relatively smooth during the afternoon hours; (ii) the traffic is evenly distributed across the whole time period; (iii) the traffic is high throughout the whole day, which suggests that the location is a busy market area; (iv) the traffic patterns are not very regular, and a correct traffic model is hard to estimate (random distribution of traffic). This kind of variation in traffic poses challenges for developing an aggregate forecasting mechanism for city-wide traffic. It also contradicts the popular 'rush hours' concept, since different cameras seem to possess their own distribution of traffic over the hours of the day.

i. Level of Traffic Congestion:

The level of traffic congestion is a function of the maximum amount of traffic that a camera has ever experienced during the trace analysis. Mathematically, it is defined as follows. For a camera $C_{1}$ at hour h of day k, the traffic density is $d(C_{1_{h}}^{k})$, and the level of congestion is $T = \frac{d(C_{p_{q}}^{r})}{\sum_{i = 1}^{n} \sum_{h = 1}^{12} \sum_{k = 1}^{m} \max(d(C_{i_{h}}^{k}))}$

ii. Traffic Congestion Correlation:

Correlation coefficients are used to measure the strength and direction of the linear relationship between the traffic congestion values of two cameras. In the described case, this technique is used to correlate the change in traffic over several hours of the day. The correlations are analyzed for lags of 1-4 hours for each camera with itself; for example, the correlation between the traffic at 7 AM and 8 AM, at 7 AM and 9 AM, etc. is investigated. The correlation coefficient ρ is calculated as follows. For two cameras $C_{1}$ and $C_{2}$ at hour h of day k, with traffic densities $d(C_{1_{h}}^{k})$ and $d(C_{2_{h}}^{k})$ respectively, the correlation of their densities is $ρ(C_{1_{h}}^{k}, C_{2_{h}}^{k}) = \frac{n \, Σ d(C_{1_{h}}^{k}) \, d(C_{2_{h}}^{k}) - (Σ d(C_{1_{h}}^{k})) \, (Σ d(C_{2_{h}}^{k}))}{\sqrt{n \, (Σ {d(C_{1_{h}}^{k})}^{2}) - {(Σ d(C_{1_{h}}^{k}))}^{2}} \, \sqrt{n \, (Σ {d(C_{2_{h}}^{k})}^{2}) - {(Σ d(C_{2_{h}}^{k}))}^{2}}}$
The value of ρ lies in the range -1 ≤ ρ ≤ +1, where the signs + and - denote positive and negative linear correlation, respectively. If two cameras have a strong positive linear correlation, ρ is close to +1; a value of exactly +1 indicates a perfect positive match. A positive value indicates a relationship such that as the values for one camera increase, the values for the other also increase. If two cameras have a strong negative linear correlation, ρ is close to -1; a value of exactly -1 indicates a perfect negative match. Negative values indicate a relationship such that as the density for one camera increases, the density for the other decreases. If there is no linear correlation or only a weak one, ρ is close to zero; a value near zero means that there is a random, nonlinear relationship between the two cameras. To analyze the traffic change, the correlation of traffic densities is investigated for four different time lags, from one to four hours. To do this, the traffic densities of each camera are sampled into hours from 7 AM to 7 PM, and correlations are then computed between consecutive hours over 42 days.
For example, the correlation coefficient between the traffic densities at 7 AM and at 8 AM, or at 7 AM and at 9 AM, over the 42 days is found. The results are accumulated for each camera separately and compiled in the form of a CDF, shown in Fig. 15. For Connecticut and Sydney, in Figs. 15(a) and 15(d), the hourly traffic change is highly correlated: for almost 80% of the cameras, the next hour's traffic is 70% correlated with the current hour. More surprising is the two-hour difference in traffic densities for the cameras of these two cities, where for 80% of the cameras the traffic two hours later is only 50% or less correlated with the current hour, and around 60% of the cameras have a 30% correlation for a time lag of 3-4 hours. In the case of London, the next hour's traffic density for 80% of the cameras is close to 60% correlated with the current hour. This drops to 30% for a two-hour lag and to around 15-20% for a 3-4 hour difference. For Toronto, in Fig. 15(c), the next hour's traffic for 80% of the cameras is nearly 55% correlated with the current hour, and the traffic density 2-4 hours later is nearly 30% correlated with the current hour. These observations indicate that traffic in London and Toronto fluctuates more than in Connecticut and Sydney, where it is rather smooth. Any prediction model built on this requires more insight into the transitioning traffic for London and Connecticut because of their high variability.
After this high-level picture of the traffic change over several consecutive hours, the variability contributed to the CDFs of Fig. 15 by the individual hours is analyzed. This helps to identify the hours of the day whose traffic changes on a large scale compared to the previous hours. The results of this analysis are shown in Figs. 16 to 19 for all four cities with hour lags. Starting with Connecticut, the correlation for the next hour in Fig. 16(a) is low at 7 AM and 12 noon, and it is also relatively low for 2-4 hour lags, as seen in Figs. 16(c)-16(d). The city of London normally has a relatively low correlation of around 50% at 7 AM in Fig. 17(a), but it suddenly drops to less than 1% for 2-4 hour lags in Figs. 17(b) and 17(c). However, afternoon traffic remains about 40% correlated for 1-2 hour lags, and the correlation is very high during the later hours of the day, at 60%-80% for 1-3 hour lags, as seen in Figs. 17(a)-17(d). It appears from the figures for London that the evening traffic density does not change very much for any of the four lags. This shows that London's hourly traffic fluctuates more during the morning and early evening but is relatively stable around noon, which is the opposite of Connecticut, although the average correlation for London is only about 50%-60%. As expected for the city of Sydney, the traffic is 60%-70% correlated for a one-hour lag in Fig. 18(a) and shows a nearly consistent correlation of 40%-55% for 2-4 hour lags in Figs. 18(b)-18(d). This analysis confirms the trends seen in the previous section, where the CDFs of Sydney are relatively more stable and highly correlated. The case of Toronto is similar to that of London: there is an average correlation of 50% for a one-hour lag and nearly 20%-30% correlation for the other lags. At some points, for both London and Toronto, the correlation is negative. These analyses provide a stepping stone towards a model for predicting the next hour's traffic based on the current condition.
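The lagged correlation described in this section can be sketched as follows. The sketch computes the correlation coefficient in its raw-sum form and applies it to one camera's density at a given hour and at that hour plus a lag, with one sample per day. The layout of `density` as a list of days, each holding a list of hourly values, is an assumption, as are the function names.

```python
from math import sqrt

def pearson(xs, ys):
    """Correlation coefficient in the computational (raw-sum) form."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2)
    )

def lagged_correlation(density, hour, lag):
    """Correlate a camera's density at `hour` with its density at
    `hour + lag`, using one sample per day.

    `density[k][h]` is the density on day k at hour index h.
    """
    now = [day[hour] for day in density]
    later = [day[hour + lag] for day in density]
    return pearson(now, later)

# Three illustrative days with three hourly samples each.
density = [[1, 2, 9], [2, 4, 7], [3, 6, 5]]
one_hour = lagged_correlation(density, hour=0, lag=1)   # rises together
two_hour = lagged_correlation(density, hour=0, lag=2)   # moves oppositely
```

The function fails with a division by zero if either series is constant over the days, so real data would need a guard for cameras with no variation at a given hour.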

d. Spatio-temporal Analysis

In this section, the findings from the spatial and temporal analyses are used to analyze the spatio-temporal patterns of traffic. The camera locations of each city are modelled as a graph G = (V,E), where V is the set of vertices representing camera locations and E is the set of edges (roads) connecting them. The width of an edge represents the correlation of traffic congestion between the two cameras it connects, and the size of a vertex represents the amount of congestion experienced by that camera. Modelling in this way yields a weighted graph view of the spatio-temporal activity pattern at the scale of the city. It in turn supports qualitative reasoning about the locations and their inter-connecting roads (as edges) that experience more traffic than usual. A thin link indicates that the congestion levels at the two locations are uncorrelated, while a thick link indicates that they are highly correlated.
On a plane, a geographical mapping of coordinates is used to represent the locations, with the distance between them as the length of the edges interconnecting these locations, for all four cities. The actual geographical coordinates are transformed (G(T) and G(L) in Figs. 14(a)-14(d)) for better visualization of the links and locations.
Analysis: To evaluate each city, the analysis is performed at various hours and the traffic congestion trends are examined. The 42 days of traffic congestion are systematically sampled into hours for each camera, and a correlation coefficient is calculated between all camera pairs. In the graph, this correlation is input as the width of the link interconnecting the two locations. To calculate the traffic congestion at each camera, the correlation coefficient of the traffic over the 42 days is taken. This is input to the graph as vertices of varying size representing the level of congestion experienced at each point in the graph. The results of this analysis are shown in Fig. 14. To make the picture clearer, the edges and nodes are filtered with 0.7 ≤ T. The sparseness of a graph is also a measure of how dense the congestion is in that city. The results show that for the city of London traffic is present on all links and during many hours of the day. However, few nodes experience more traffic than usual. This corroborates the previous results from the temporal analysis, where it was found that London traffic is variable in nature and relatively more congested than that of the other cities. Moreover, using this methodology, specific sections of the city with different traffic congestion estimates can be visually inspected. The city of Connecticut depicts similar statistics about the congestion trends. It is observed, however, that the total time during which traffic is present on the links is smaller for Connecticut than for London. Thus, the latter experiences traffic for longer hours. As expected, the city of Sydney has less traffic as well as less congestion. This city demonstrated less traffic than the other cities. Not only that, its links are less congested and individual locations also experience very little traffic.
A similar result is found for Toronto, with very few links experiencing congested traffic and only a few routes prone to congestion; otherwise the city has good traffic management. Note that, once a threshold is set, the level of sparseness in a graph also indicates the nature of the congestion experienced by that city. For London and Connecticut, the traffic on the roads is higher than for Sydney and Toronto. Not only do the locations experience congestion, but the links between correlated locations, that is, the road networks, also experience heavier traffic.
In summary, a spatio-temporal analysis of traffic cameras and traffic flow has been carried out. The analysis started by spatially mining the cameras to model distance and time. The spatial analysis showed the differences between the cities' structural dynamics; London in particular is more prone to congestion because of its inconsistency in driving distance and time. Secondly, the temporal analysis showed that London traffic is not only less correlated but also shows large fluctuations when predicting the traffic values. The spatio-temporal analysis showed that many of its links are congested. For the city of Connecticut and Sydney, on the other hand, the spatial and temporal correlations are higher and show a relatively stable traffic distribution. The traffic of Toronto lies between these two extremes: it is busy during a very small portion of the day and otherwise shows stable temporal and spatio-temporal behaviour.
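The construction of the weighted graph G = (V,E) described in this section can be sketched as follows. This is only an illustration under assumptions: the threshold T = 0.7 is applied here to the pairwise correlation, the vertex size is taken to be the mean density of the camera (the text does not give the exact vertex measure), and the names `build_graph`, `series` and the sample values are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    denom = math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
    return 0.0 if denom == 0 else (n * sxy - sx * sy) / denom

def build_graph(series, threshold=0.7):
    """Return (vertices, edges) for one city.

    vertices: camera id -> mean density (assumed measure for vertex size)
    edges:    (camera a, camera b) -> correlation (drawn as link width),
              kept only when the correlation reaches the threshold
    """
    vertices = {cam: sum(vals) / len(vals) for cam, vals in series.items()}
    edges = {}
    for a, b in combinations(sorted(series), 2):
        rho = pearson(series[a], series[b])
        if rho >= threshold:
            edges[(a, b)] = rho
    return vertices, edges

# Hypothetical hourly congestion values for three cameras.
series = {"A": [1.0, 2.0, 3.0, 4.0],
          "B": [2.0, 4.0, 6.0, 8.5],
          "C": [4.0, 3.0, 2.0, 1.0]}
vertices, edges = build_graph(series)
```

With a filter of this kind, a sparse edge set suggests a city with few strongly co-congested locations, which is how the sparseness of the graphs is read in the text above.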

VII. Modelling

As a first step toward realistic modelling of vehicular communication networks, this disclosure focuses on two studies of the traffic arrival process: modelling the densities (d) against well-known probability distributions, and analyzing the typical traffic burstiness using self-similarity analysis. The objective is to understand the underlying statistical patterns and to model the arrival processes. The models are selected based on their applicability in everyday statistical analysis and on several iterations of modelling, which showed that the traffic closely follows (with little deviation) one or more of the discussed probability distributions. Owing to space limits and because this is an early study, this section presents only the results from 4 representative cities (London, Sydney, Toronto, and Connecticut), with 458 cameras and 12 million images in total. An important underlying fact is that the traffic densities approximate the relative traffic on the roads. This is different from counting cars using loop detectors or other sensors. As shown in Figs. 5(a)-5(c), three traffic scenarios of varying intensity, from low traffic to a fully congested intersection, are depicted for the same camera as captured by the density parameter (d).

a. Traffic flow characterization

To investigate the nature of the traffic, a holistic approach is taken to systematically extract individual and aggregate flows of the traffic densities from the images. Each individual flow constitutes a distribution of traffic densities that describes the flow of traffic as viewed from an individual camera. This helps to better understand traffic intensity at the microscopic level of each intersection. The aggregate traffic combines the flows from all the cameras in time order. The main advantage of analyzing aggregate traffic is to understand the emergent properties; it helps to model and profile the city and to make informed guesses about a different city based on this aggregate.

When analyzing the traffic, an important step is to choose the granularity of the traffic data for the purpose at hand. For example, hourly patterns provide a good estimate of the nature of congestion during morning and evening times, which the flow at the level of individual density samples may not reveal. On the other hand, a finer granularity helps to understand sudden spikes in the traffic flow and to plan congestion mitigation [26]. In this disclosure, all these patterns are examined by modelling the flows against well-known probability distributions. Fig. 5 gives an example of the traffic density on an hourly basis for one of the cameras in Sydney. In general, there is high traffic density during the peak hours and low traffic density between 10am and 2pm (off-peak time), which provides positive confirmation that the algorithm can effectively detect traffic.
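The two views of the data described above, an aggregate flow merged from all cameras in time order and an hourly granularity derived from individual density samples, can be sketched as follows. This is a minimal sketch under assumptions: each flow is taken to be a list of (timestamp, density) pairs, and the names `hourly_means`, `aggregate` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def hourly_means(samples):
    """Average density per hour of day from (timestamp, density) samples."""
    buckets = defaultdict(list)
    for ts, density in samples:
        buckets[ts.hour].append(density)
    return {hour: sum(v) / len(v) for hour, v in sorted(buckets.items())}

def aggregate(flows):
    """Merge the per-camera flows into one time-ordered aggregate flow."""
    merged = [sample for flow in flows.values() for sample in flow]
    return sorted(merged, key=lambda sample: sample[0])

# Hypothetical individual flows of two cameras.
cam_a = [(datetime(2011, 11, 3, 8, 0), 0.75),
         (datetime(2011, 11, 3, 8, 30), 0.25),
         (datetime(2011, 11, 3, 12, 0), 0.125)]
cam_b = [(datetime(2011, 11, 3, 8, 15), 0.5)]

hourly = hourly_means(cam_a)                  # coarse, hourly view of one camera
city_flow = aggregate({"a": cam_a, "b": cam_b})  # aggregate view of all cameras
```

The hourly view smooths out short spikes, while the raw samples in `city_flow` keep them, which mirrors the trade-off between coarse and fine granularity discussed above.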

Table 4: Dominant Distribution by Ranking

City	1st Best Fit	2nd Best Fit	3rd Best Fit
Connecticut	L[87%]	G[11%]	E[0.5%]
London	L[42%]	G[39%]	W[16%]
Sydney	L[62%]	G[32%]	N[2%]
Toronto	G[46%]	W[31%]	L[21%]

E=Exponential, G=Gamma, L=Loglogistic, N=Normal, W=Weibull

Table 5: Dominant distribution by % deviation in KS-tests

City	≤ 3%	≤ 5%
Connecticut	L[62%], G[15%], W[3%]	L[94%], G[44%], W[19%]
London	G[34%], L[34%], W[10%], N[0.5%]	L[82%], G[70%], W[47%], N[7%]
Sydney	L[88%], G[61%], W[4%], N[2%]	L[98%], G[88%], W[44%], N[18%]
Toronto	G[75%], W[58%], L[34%]	G[94%], W[88%], L[87%], E[4%], N[1%]

Figs. 7(a)-7(i) show the cumulative distribution function of the traffic for three individual cameras in each city, with low, medium and high average traffic. Traffic at individual cameras can vary a lot, but in general the Log-logistic, Gamma and Weibull distributions capture some of the key features of the data. Log-logistic is the best approximation for the individual camera traffic in three of the four cities (Gamma ranks first for Toronto); the detailed statistics of the fitting are given in Table 4, which lists the best fits, that is, those with the least deviation in the KS-test.

In Table 5, the deviation from the empirical data is measured and the cameras are sampled at the 3% and 5% error levels. In Fig. 8, the results show the average dominance of each of the four distributions. It is found that even at the individual aggregation level, the log-logistic distribution provides a good estimate of the empirical data. As is evident, Log-logistic and Gamma closely match the empirical data distribution.
Finally, Figs. 23(a)-23(d) show the cumulative statistics of the aggregated traffic for each city. Different cities have different aggregated traffic; for example, London in general has more traffic than Connecticut.
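The kind of goodness-of-fit check summarised in Tables 4 and 5 can be sketched as follows for the log-logistic case. The sketch rests on several assumptions that are not taken from the source: the "% deviation" is read as the Kolmogorov-Smirnov statistic (largest gap between the empirical and the fitted CDF), the parameters are estimated with a simple moment-style fit on the logarithm of the data rather than by maximum likelihood, and the function names and the synthetic sample are hypothetical.

```python
import math

def fit_loglogistic(data):
    """Moment-style fit for strictly positive data.

    ln(x) of a log-logistic variable is logistic with location ln(alpha)
    and scale 1/beta, whose variance is pi**2 / (3 * beta**2).
    """
    logs = [math.log(x) for x in data]
    n = len(logs)
    mean = sum(logs) / n
    var = sum((v - mean) ** 2 for v in logs) / (n - 1)
    alpha = math.exp(mean)
    beta = math.pi / math.sqrt(3.0 * var)
    return alpha, beta

def loglogistic_cdf(x, alpha, beta):
    """CDF of the log-logistic distribution for x > 0."""
    return 1.0 / (1.0 + (x / alpha) ** (-beta))

def ks_deviation(data, cdf):
    """Kolmogorov-Smirnov statistic between the data and a model CDF."""
    xs = sorted(data)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))

# Hypothetical sample: 200 evenly spaced quantiles of a log-logistic
# distribution with alpha = 2 and beta = 4 stand in for camera densities.
n = 200
sample = [2.0 * (p / (1.0 - p)) ** (1.0 / 4.0)
          for p in ((i + 0.5) / n for i in range(n))]
alpha, beta = fit_loglogistic(sample)
deviation = ks_deviation(sample, lambda x: loglogistic_cdf(x, alpha, beta))
within_3_percent = deviation <= 0.03  # analogous to the 3% column of Table 5
```

Running the same check per camera for each candidate distribution, and counting the cameras that fall under the 3% or 5% level, would give percentages of the kind reported in Table 5.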

VIII. Visualization

The fourth and final component of the framework is the choice of visualization environment. Both desktop and handheld platforms are to be supported to demonstrate traffic conditions. A combination of Google Earth, OpenEarth, KML and MATLAB is used to show the traffic on desktop machines. For handheld devices, applications are developed for both iOS and Android. In graphical visualization, the framework provides a visual overlay of traffic on the maps. It also provides knobs to control and visualize the traffic in several different forms, and thus helps to predict the traffic not only for the current time but also over a week and across several hours of the day.

a. Visualization Scenarios

In this section, four scenarios are illustrated to study traffic mobility. They are shown in Fig. 22 and exhaustively model different types of spatio-temporal dimensions of traffic mobility. Graphical visualizations, the last component of the proposed framework, are used to show them. Client programs are planned to demonstrate them on desktop as well as on handheld devices (Android and iPhone). It is believed that this also aids in qualitative reasoning about the traffic patterns and will eventually prove to be a good exercise in extracting a set of metrics for developing vehicular simulators and mobility models.

i. Real Time Traffic Update: In this example, up-to-the-minute traffic density is demonstrated. To do this, the latest image is downloaded and evaluated, and its value is stored in the database server as shown in Fig. 1. At regular intervals, the client program pulls this information and overlays it on geographical maps.
ii. Traffic Similarity: It is interesting to visualize sections of a city bearing similar traffic patterns, an activity that helps in correlating different parts of the city based on traffic coherency. In the demonstration, it is planned to assign the traffic values to three separate classes (low, medium and high), with locations of the same class shown with identically coloured icons (green, orange and red, respectively). An example of a traffic snapshot for the city of Sydney is shown in Fig. 20. It can be inferred that nearby locations show similar traffic.
iii. Traffic Hot Spots: A set of locations that show unusual traffic patterns is of interest to the community. Such hot spots are also important to study because of their departure from regularity. For example, a highway with very low traffic density can always be chosen if nearby highways are congested. In Fig. 21, high traffic intensity hot spots are shown for the city of Toronto for a period of 21 days.
iv. Modelling and Prediction: The initial analysis shows that the distribution of traffic densities is multi-modal in nature. To reveal the underlying statistics and perform classification, it is planned to use Gaussian mixture models. This helps to approximate similar-looking traffic distributions of several cameras in a systematic manner by classifying them into a small set of mixtures. A mixture of four distributions for one camera over a period of 42 days is shown in Fig. 3. Hidden Markov Models are used to develop predictors and forecasting mechanisms for traffic mobility across several hours of the day and across weeks.
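The traffic similarity scenario, in which each camera location receives a green, orange or red icon, can be sketched as a small KML generator for use with Google Earth. This is a sketch under assumptions: the class boundaries (thirds of a density normalised to the range 0 to 1) are not given in the source, and the function names, camera names and coordinates are hypothetical.

```python
from xml.sax.saxutils import escape

# Assumed class boundaries on a density normalised to [0, 1].
LOW_MAX, MEDIUM_MAX = 1.0 / 3.0, 2.0 / 3.0
# KML colours are written as aabbggrr (alpha, blue, green, red).
COLOURS = {"low": "ff00ff00", "medium": "ff00a5ff", "high": "ff0000ff"}

def traffic_class(density):
    """Map a normalised density to one of the three traffic classes."""
    if density < LOW_MAX:
        return "low"
    if density < MEDIUM_MAX:
        return "medium"
    return "high"

def placemark(name, lat, lon, density):
    """One KML placemark whose icon colour encodes the traffic class."""
    colour = COLOURS[traffic_class(density)]
    return (
        "<Placemark>"
        f"<name>{escape(name)}</name>"
        f"<Style><IconStyle><color>{colour}</color></IconStyle></Style>"
        f"<Point><coordinates>{lon},{lat},0</coordinates></Point>"
        "</Placemark>"
    )

def to_kml(cameras):
    """Wrap the placemarks of all cameras in a KML document."""
    body = "".join(placemark(*cam) for cam in cameras)
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
            + body + "</Document></kml>")

# Hypothetical cameras: (name, latitude, longitude, normalised density).
cameras = [("Camera 1", -33.87, 151.21, 0.9),
           ("Camera 2", -33.88, 151.20, 0.1)]
kml = to_kml(cameras)
```

Note that KML writes coordinates as longitude, latitude, altitude, which is why `lon` precedes `lat` in the `coordinates` element.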

IX. Mobility Modelling for Vehicular Network

The experience gained from the analysis and modelling of traffic densities potentially aids the future design and evaluation of vehicular networks. Today, most simulation tools take generic or random scenarios as input and disregard the challenges brought by mobility in vehicular networks [15, 17, 29]. In the described case, a large library [28] of realistic traces coupled with modelling results proves very helpful in developing rich scenarios for testing protocols, network dynamics, scalability of traffic, topology size estimation, and the analysis of traffic patterns. Data-driven realistic simulation tools and mobility models are necessary for accurate evaluation of vehicular routing protocols [27] and services. However, the analysis shows that traffic characterization and communication network analysis tools (e.g. ns2) are developed separately and therefore lack tight integration [25, 29]. Gathering and analyzing real traffic data can aid in identifying metrics (e.g. spatio-temporal density) for developing data-driven mobility models and simulators. The unique challenges (e.g. high speed, intermittent connectivity) of inter-vehicle [16] and car-to-roadside [21] communication require the development of robust and efficient routing protocols [18]. The cameras' geo-coordinates and their traffic density distributions can be used to develop and test new performance metrics and protocols. These tests can be carried out at individual locations as well as at the city scale.
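One way in which fitted density distributions could feed a scenario generator is sketched below. This is purely illustrative and not a method stated in the source: it assumes that each camera has already been assigned log-logistic parameters (alpha, beta) and draws synthetic density values by inverse transform sampling; the function names and parameter values are hypothetical.

```python
import random

def sample_loglogistic(alpha, beta, rng):
    """Draw one value by inverting F(x) = 1 / (1 + (x / alpha) ** -beta)."""
    u = rng.random()  # uniform in [0, 1), so 1 - u is never zero
    return alpha * (u / (1.0 - u)) ** (1.0 / beta)

def synthetic_trace(params, steps, seed=0):
    """Generate `steps` density values per camera from (alpha, beta) pairs."""
    rng = random.Random(seed)
    return {cam: [sample_loglogistic(a, b, rng) for _ in range(steps)]
            for cam, (a, b) in params.items()}

# Hypothetical fitted parameters for two cameras.
params = {"cam-1": (0.2, 3.0), "cam-2": (0.5, 4.0)}
trace = synthetic_trace(params, steps=1000)
```

A trace of this kind treats successive samples as independent, so it reproduces the marginal density distribution of a camera but not the hour-to-hour correlation analyzed earlier; a simulator input would need an additional temporal model for that.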
While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive. It will be understood that changes and modifications may be made by those of ordinary skill within the scope and spirit of the following claims. In particular, the present invention covers further embodiments with any combination of features from different embodiments described above and below.
The invention also covers all further features shown in the Figs. individually although they may not have been described in the afore or following description. Also, single alternatives of the embodiments described in the figures and the description and single alternatives of features thereof can be disclaimed from the subject matter of the invention.
Furthermore, in the claims the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single unit or step may fulfill the functions of several features recited in the claims. The terms "essentially", "about", "approximately" and the like in connection with an attribute or a value particularly also define exactly the attribute or exactly the value, respectively. The term "about" in the context of a given numerical value or range refers to a value or range that is, e.g., within 20%, within 10%, within 5%, or within 2% of the given value or range. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet [18,24] or other wired or wireless telecommunication systems. In particular, e.g., a computer program can be a computer program product stored on a computer readable medium which computer program product can have computer executable program code adapted to be executed to implement a specific method such as the method according to the invention. Any reference signs in the claims should not be construed as limiting the scope.

Claims

1. A method of providing a framework for a large scale monitoring, collecting, analysis, modelling and visualization of vehicular movement, with an added advantage to extend and employ a communication network, wherein the method comprises the steps of:

a) monitoring, collecting and storing a plurality of vehicular traffic snapshots from a traffic server on a regular basis using a network communication protocol;

b) within an automated and self-learning process, detecting at least one snapshot deemed with error and/or useless traffic information and removing the detected at least one snapshot from the plurality of snapshots;

c) within a systematic process, extracting and storing large-scale traffic information from the traffic snapshot images; and

d) using the extracted traffic information for the purpose of modelling and analysing vehicular traffic; and/or

e) using the extracted traffic information for the purpose of knowledge discovery of vehicular traffic; and/or

f) using the extracted traffic information for the purpose of realistic vehicular modelling, scenario generator and the analysis of network routing protocols in case of moving vehicles; and/or

g) using the extracted traffic information for the development of new car to car, car to roadside, roadside to car to roadside message and information routing on wireless protocols based on the standard 802.X technologies; and/or

h) using the extracted traffic information for the purpose of traffic visualization and knowledge of vehicular traffic levels at different instances of time and space; and/or

i) using the extracted traffic information from traffic images, collection of driving distance and driving time between the camera locations and places of attraction, number of lanes, width of roads for city profiling and using any of all of these information for the design and analysis of new metric to capture spatial, temporal and spatio-temporal features that will qualitatively and quantitatively model and profile cities and help to compare them with each other.

2. Method according to claim 1, where the traffic server is accessible via Internet, wherein the information requirement of a query comprises a location of a site of interest and information about the site of interest and wherein a framework server is accessible via the communication network.

3. Method according to claim 1 or 2, wherein in step b) supervised machine learning algorithms are used for detecting at least one snapshot deemed with error and/or useless traffic information and removing the detected at least one snapshot from the plurality of snapshots.

4. Method according to any one of the preceding claims, wherein the information requirement comprises a plurality of information requirement attributes and the information property comprises a plurality of information attributes.

5. Method according to any one of the preceding claims, where within the systematic process of step c) comprises methods in a highly scalable environment with unique weighted metrics.

6. Method according to any one of the preceding claims, further comprising the step of processing an image of pixels from each snapshot of the plurality of snapshots after the detected at least one snapshot is removed from the plurality of snapshots in step b).

7. Method according to claim 6, further comprising
attributing a traffic density at a time instance by summing of deviation in counts of the pixels; and
spatio-temporally storing traffic density information in a database with known column attributes.

8. Method according to claim 7, where within knowledge discovery spatio-temporal information of the camera and the traffic is used to understand patterns.

9. Method according to any one of the preceding claims, wherein the extracted information is used in step d) as part of traffic density estimation.

10. Method according to any one of the preceding claims, wherein in step d) a prediction model is developed by using mixture models and hidden Markov models.

11. Method according to any one of the preceding claims, wherein the information requirement comprises a plurality of information requirement attributes and the information property comprises a plurality of information attributes.

13. Method according to any one of the preceding claims, comprising the use of data mining techniques and machine learning techniques in any one of steps d) to i).

14. Method according to any one of the preceding claims, wherein in step g) standard available and plural types of wireless communication technologies to communicate between vehicles-to-vehicles and vehicles-to-roadside-to-vehicles infrastructure are used or created.

15. Method according to any one of the preceding claims, wherein in step h) a traffic visualization application is developed to show it using cartographic maps of respective cities for a desktop like computer or a handheld like device, which is developed and used in real or near real time scenarios to demonstrate the traffic congestion and intensity, high congestion zones, prediction of traffic in future events, and history of traffic at a particular moment of the day on that location.

16. Method according to any one of the preceding claims, wherein in step i) a profiling metric is developed to profile and compare cities.

17. Computer program comprising computer executable program code adapted to be executed to implement the method of any one of the preceding claims when being executed.