WO2002033558A1 - Multimedia sensor network - Google Patents

Multimedia sensor network

Info

Publication number
WO2002033558A1
Authority
WO
WIPO (PCT)
Application number
PCT/US2001/031799
Other languages
French (fr)
Inventor
Andrew A. Kostrzewski
Sookwang Ro
Tomasz P. Jannson
Chih-Jung Judy Chen
Original Assignee
Physical Optics Corporation
Application filed by Physical Optics Corporation filed Critical Physical Optics Corporation
Priority to AU2002213121A priority Critical patent/AU2002213121A1/en
Publication of WO2002033558A1 publication Critical patent/WO2002033558A1/en

Classifications

    • G - PHYSICS
    • G08 - SIGNALLING
    • G08B - SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B25/00 - Alarm systems in which the location of the alarm condition is signalled to a central station, e.g. fire or police telegraphic systems
    • G08B25/007 - Details of data content structure of message packets; data protocols
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01V - GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V1/00 - Seismology; Seismic or acoustic prospecting or detecting
    • G01V1/22 - Transmitting seismic signals to recording or processing apparatus

Abstract

A multimedia network (10) includes a sensor network (12), a communication bridge (30) and a user network (e.g., the Internet). The sensor network includes a set of interconnected sensors coupled to a control module (56). The control module receives a set of sensed data from the sensors and generates a homogenized data stream based on the sensed data. The communication bridge is coupled to the sensor network and buffers the homogenized data stream. The user network is coupled to the communication bridge and receives the homogenized data stream from the sensor network. The user network transmits data back to the control module through the communication bridge.

Description

MULTIMEDIA SENSOR NETWORK
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an intelligent sensor network configured to process, compress and transmit highly synchronized data in real-time to a network. In particular, the invention relates to transmitting homogenized TCP/IP-packetized data streams of sensor data from a sensor network to a user network (e.g., the Internet).
2. Description of the Related Art
The purpose of human vision is not to detect the simple presence or absence of light, but rather to detect and identify objects. Objects are defined principally by their contours, and the visual cortex registers a greater difference in brightness between adjacent visual areas than is present in the actual physical difference in light intensity.
Based on studies of the electrical activity of individual neurons in the visual cortex, David Hubel and Torsten Wiesel (Nobel prize winners in 1981) discovered that these cells are highly sensitive to contours, responding best not to circular spots but rather to light or dark edges. These cells are classified using a complex hierarchical system based on different response characteristics. Different localized sets of neurons in the visual cortex are specialized to carry codes for contours, color, spatial position and movement. This segregation of functions in the brain results in a first-level extraction of object edges that isolates important objects from a background, even if an object is very close in color to the background plane, quickly followed by a more detailed review of the particular object. It is for this reason that, while watching streaming video, we are relatively tolerant of single-frame image quality insofar as the image "details" are secondary to the recognition of the objects. On the contrary, we easily and quickly detect frame-to-frame synchronization errors in the video by tracking the moving objects in each frame.
REAL-TIME MULTIMEDIA ON A NETWORK
The sensitivity to synchronization errors has directly impacted the development of transmitting real-time multimedia over asynchronous networks, including the Internet. Unfortunately, the TCP/IP, Ethernet and X.25 protocols used in mainstream data networks are designed for handling non-continuous data and are ill-suited for continuous media, because continuous data requires high-rate isochronous transmission. Non-continuous data is more error-sensitive than continuous data, but less demanding with respect to factors such as delay and jitter.
Traditional protocols for highly-synchronized data typically provide best-effort, point-to-point communication and therefore cannot reliably support isochronous traffic, nor can these protocols efficiently serve multipoint situations such as teleconferencing and co-operative work. Real-time capability ensures timely reaction to events such as the generation of video frames by a camera. When these frames are transmitted by a protocol without real-time capability for highly-synchronized data, and displayed by a non-real-time operating system, the display is jerky because of unpredictable delays in the transmission and the decoding. The values necessary to implement a real-time capability that guarantees bounded delays are defined using QoS (Quality of Service) specifications, which in turn depend on the multimedia network's synchronization requirements. A specific implementation of a QoS specification is discussed below in relation to transmitting multimedia data between a sensor network and a user network.
As a result of the rapid advances in, and decreasing cost of, processing hardware, storage devices and communication networks, the number of obstacles to the delivery of a highly-synchronized video stream over an asynchronous network is decreasing. Nevertheless, because of the complexity of delivering highly-synchronized data over an asynchronous network and the variation in access delays over the delivery path, current multimedia networks have not been able to provide real-time, full-frame streaming video with a guaranteed quality of service.
In particular, after the data is retrieved from a multimedia server, it is transmitted over a communication network to the end-user system, and finally it is displayed on the end-user system at a constant data rate. Each of these components (i.e., storage, network and end-system) imposes difficult constraints on the design and development of this type of multimedia network. In this regard, current multimedia requirements are 15-fps (frames per second) animation, 30-fps NTSC (National Television Systems Committee) television-quality video and 60-fps HDTV (High Definition Television) video, and require a correspondingly high bandwidth to transmit the data. In prior art networks, transmission latency and image quality are typically sacrificed, or else the transmission bandwidth must be increased. As technology advances, there is a growing need for a network that is capable of processing, compressing and transmitting real-time multimedia traffic such as voice, audio and video over packet data networks such as Local Area Networks (LANs), frame relay and the Internet at relatively low bandwidths without compromising system latency or image quality. Additionally, these systems often require the use of a lengthy set of different formats in order to transmit various forms of data (e.g., MPEG for video, wavelet or JPEG for still imagery, etc.). In addition to synchronization problems, current networks require entirely different subsystems and protocols to process different types of data. In particular, current networks are unable to process, compress and transmit all of the different types of sensor data into a single, homogeneous, TCP/IP-packetized stream. Additionally, current networks do not support the integration of high-resolution spatial data (still imagery) with compressed temporal data (video). The integration of the spatial and temporal data allows a user to view a high-resolution still image based on a highly compressed temporal data stream without the need for separate JPEG or similar still-image encoding and compression.
The integration of spatial and temporal data within a multimedia sensor network is compounded by the problem of preserving enough data to reconstruct an adequate amount of spatial structure of an important scene in a video to analyze the particular scene at a future date, while maintaining a relatively low transmission bandwidth. The balancing of the redundancies, of course, is difficult because a user does not have the a priori knowledge of which particular scene may be important in the future. Therefore, given the slow development of delivering real-time multimedia over a packetized network, none of the existing network systems can provide integrated and synchronized data from different sensors at a minimum transmission bandwidth while maintaining enough image resolution of the data to reconstruct any particular scene.
NETWORKS AND PROTOCOLS
Networks are designed to share resources and can be as simple as two computers connected together or as complex as over 20 million computers connected together (the Internet). Other devices such as printers, disk drives, terminal servers and communication servers are also connected to networks. Network rules (protocols) govern all operations within a network and define how devices outside of the network must interact with it.
In particular, the Internet includes client computers that access information and servers that sort and distribute information. A program becomes a client when it sends a request to a server and waits for a response. The client typically runs on a user's computer and facilitates access to information provided by the server (e.g., Netscape and Internet Explorer as WWW clients, xftp and Fetch as FTP clients).
Information is typically transmitted over the network in packets of data, and each packet includes source and destination addresses, a packet length and time-to-live information. The packets travel along links and are guided to the destination address by routers that optimize the travel path. After the packets reach the destination address, they are reassembled.
Networks are classified by how they operate (e.g., type of connection) and how they are configured (e.g., topology). Circuit-switched networks include a dedicated connection between two computers to guarantee capacity (e.g., the telephone system). Therefore, once a circuit is established, no other network activity will decrease the capacity of the circuit. Unfortunately, these networks are typically costly. On the other hand, packet-switched networks (e.g., the Internet) divide the network traffic into packets to allow concurrent connections among computers. Multiple computers share the network, thereby decreasing the number of connections required. As network activity increases, however, the network resources available between the communicating computers decrease.
Star, hierarchical and loop configurations are examples of point-to-point network topologies wherein each computer communicates directly with its neighbor and relies on its neighbor to relay data around the network. Bus and ring configurations are examples of broadcast network topologies wherein a message placed on the bus or ring contains the name of the intended receiving computer, all computers listen constantly, and if the name is identified by a listener, then the message is captured. In this configuration, only one node can broadcast at a time.
Current forms of multimedia place a large burden on the amount of data to be transmitted over the Internet. A measure of the amount of data that can be transmitted from one end of a medium to the other end, or the range of frequencies that can be passed, is the bandwidth. Therefore, the design of the physical layer plays a fundamental role in the amount, and hence the speed, of data transmission.
Local Area Networks (LANs) link several computers in a building, and a LAN can itself be linked to other LANs. LANs provide high-speed connections, but cannot span large distances. In this regard, typical LANs are usually bus-based (Ethernet) networks that span a small building and operate at between 4 Mbps and 2 Gbps.
Wide Area Networks are not constrained by the physical distance between the endpoints in that they are intended to be used over large distances, and they operate at speeds between 9.6 Kbps and 45 Mbps. As discussed above, networking protocols govern everything from the lowest levels of computer communication up to how application programs communicate.
Each step in this protocol is called a layer. The International Standards Organization (ISO) defines a 7-layer model for network communication protocols that is formally called the Open Systems Interconnection (OSI) model. Each layer is regarded as a black box with well-defined inputs and outputs, but the particular inner working details of the layer are independent, thereby allowing new versions, updates or better methods to be written without affecting the entire system.
Communication takes place only at the layer appropriate for the task.
The OSI model includes the following 7 layers in the particular top-down ordering:
1. The Physical Layer is the interface between the medium and the device that transmits bits (ones and zeros) and defines how the data is transmitted over the network, what control signals are used and the mechanical properties of the network.
2. The Data Link Layer provides low-level error detection and correction (e.g., if a packet is corrupted, this layer is responsible for retransmitting the packet).
3. The Network Layer is responsible for routing packets of data across the network (e.g., a large e-mail file is divided up into packets and each packet is addressed and sent out in this layer).
4. The Transport Layer is an intermediate layer that higher layers use to communicate to the network layer. This layer hides the complexities of low-level network communication from the higher layers.
5. The Session Layer is the user's (transparent) interface into the network. This layer manages the "current" connection (or session) to the network and maintains the communication flow.
6. The Presentation Layer ensures computers speak the same language by converting text to ASCII or EBCDIC form and encoding/decoding binary data for transport.
7. The Application Layer includes communication between the user's programs (e.g., a file transfer or e-mail program).
In particular, there are several network protocols that support specific layers (such as XTP and TP5), and other protocols that are in the form of entire protocol suites (such as the Heidelberg Transport System and Tenet). All of these protocols combine different transmission modes with the possibility of resource reservation. The Stream Protocol II (ST-II) and the Resource ReSerVation Protocol (RSVP) are both transport-level protocols that support guaranteed performance over one-way multicast (point-to-multipoint) communications.
As previously discussed, there is significant pressure to update the Internet from its original point-to-point, best-effort-type service to a multimedia-capable network. Next Generation IP is designed to include format provisions for expanded addressing, multicast routing, labeling flows by QoS specifications, and security and privacy. Unfortunately, few if any of these provisions have yet been developed or implemented. At best, current networks support a conventional teleconferencing system (160x120, 6-bit gray scale, 10 fps, 11.05 kHz audio, total bandwidth of 1 megabit/second). Clearly, any new network distributing real-time multimedia over the Internet will have to deliver highly-compressed data, with a correspondingly low transmission rate, that maintains frame-to-frame synchronization.
A. BUFFERING
Once the difficulties associated with bandwidth limitations have been overcome, any good transmission system needs to be able to deal with the finite and possibly dynamic delay between the sender and the receiver. In the case of the most familiar of multimedia data streams, the television signal, engineers need only worry about a small, fixed delay due to the distance between the transmitter and the television set, across which the television signal propagates as an electromagnetic wave at the speed of light.
For a modern packet switching internetwork, however, the delay is not fixed, and it dynamically varies with the bandwidth utilization that the network is currently enduring (as well as with other variables). When networks become congested, extreme delays, many orders of magnitude greater than the average expected delay, can result. Within a conventional networked connection, the ideal, timely delivered multimedia data stream is delayed, thereby distorting the presentation of timed sequential data beyond what is acceptable to a user.
Multimedia streaming was developed to overcome, or at least temporarily stave off, the effects of varying transmission delays. Multimedia streaming buffers an amount of the data before presenting it to the user. The rate of data output is independent of the input rate, as long as there is enough data in the cache to source the required amount of output. If the input rate begins to lag behind the output rate, eventually there will not be enough data in the cache to support the high output rate, and the data stream runs dry. Depending on the speed of the network that will be used to transport the multimedia data, streaming applications may buffer anywhere from a few seconds of data to a few minutes of data. Designers of streaming applications assume that the reservoir of data is never emptied because the input rate keeps pace with the output rate, and that the reservoir only dries up when the end of the multimedia stream is reached.

Although this method of multimedia data presentation offers high-fidelity output at the user's end, it utilizes quite a bit more of the available network bandwidth than is acceptable for what is typically regarded as a "nicely-behaved application." If the streaming protocol is built on top of a fully reliable protocol such as TCP, segments of data that are lost due to congestion will be retransmitted until they have been acknowledged properly, adding to the congestion of the network. Unfortunately, if difficulty is encountered in transmitting a particular segment, the protocol will not attempt to bypass it, and the transmission will become stuck on that segment. In many cases, these difficulties deplete the multimedia reservoir, forcing the incoming network data to completely source the presented data stream. Reliable data transmission protocols such as TCP are not designed to handle these special requirements of real-time applications. These protocols are designed for low-bandwidth, interactive applications such as telnet, and potentially high-bandwidth, non-interactive applications such as electronic mail handling and ftp. In this sense, the best-effort transport services inherent to UDP may be better suited for the delivery of multimedia data payloads.

In particular, TCP/IP breaks the data down into a series of packets or frames. Buffers are created to hold the frames and the data contained in them. Due to the inherent asynchronous nature of the network, the buffers must be allocated and released dynamically from a pre-allocated series of buffer pools or from the global memory pool. The buffers have a variable lifetime depending on various conditions, including network throughput, whether the included frame is part of a stream transmission, or whether the packet is a datagram.
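As a rough illustration of the streaming reservoir described above, a receiver-side playout buffer can be sketched as follows; the class name, pre-fill threshold and chunk granularity are illustrative assumptions, not part of any standard.

```python
import collections

class PlayoutBuffer:
    """Minimal sketch of a streaming playout ("reservoir") buffer.

    The receiver pre-fills several chunks of media before playback starts;
    as long as the reservoir holds data, the steady output rate is decoupled
    from the bursty, delay-prone input rate of the network.
    """

    def __init__(self, prefill_chunks=30):
        self.queue = collections.deque()
        self.prefill_chunks = prefill_chunks
        self.playing = False

    def push(self, chunk):
        """Called whenever a chunk arrives from the network."""
        self.queue.append(chunk)
        if not self.playing and len(self.queue) >= self.prefill_chunks:
            self.playing = True          # enough reservoir to start playback

    def pop(self):
        """Called by the presentation clock at a fixed output rate."""
        if not self.playing:
            return None                  # still pre-filling
        if not self.queue:
            self.playing = False         # reservoir ran dry; re-buffer
            return None
        return self.queue.popleft()
```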
A common TCP/IP buffering mechanism ("mbufs") allows for variable length frames. Small amounts of data are placed directly in the data area of the mbuf, but larger amounts of data require mbufs to be extended with clusters. A cluster is a data structure that holds heterogeneous data for extending the data-carrying capability of an mbuf. The mbuf contains header information and the data area that is extendable with the cluster if the frame does not fit into the data area. While these buffering mechanisms help reduce delay and increase synchronization, these protocols still do not support reliable transmission of real-time, multimedia data over a network.
B. TRANSMISSION CONTROL PROTOCOL / INTERNET PROTOCOL (TCP/IP)
A socket is a way to speak to other programs using standard Unix file descriptors.
Any I/O process in Unix programs reads or writes to a file descriptor. A file descriptor is simply an integer associated with an open file. This open file, however, can be a network connection, a FIFO, a pipe, a terminal, an on-disk file, or just about anything else. Therefore, file descriptors are the means by which current programs communicate with other programs over the Internet. In particular, a connection is established by calling a socket system routine that returns a socket descriptor, and communication is established through it using a set of specialized send and receive socket calls. The send and receive calls offer better control over the data transmission than calling the normal read and write calls directly.

There are a variety of different sockets, including DARPA Internet addresses (Internet Sockets), path names on a local node (Unix Sockets), and CCITT X.25 addresses (X.25 Sockets). In the current network environment, two general types of Internet sockets are "Stream Sockets" and "Datagram Sockets" (SOCK_STREAM and SOCK_DGRAM, respectively). Datagram sockets are often referred to as "connectionless sockets," and stream sockets are two-way connected communication streams (e.g., if two items are output into the socket in the order "1, 2", they arrive in the order "1, 2" at the opposite end).
Many current network applications use these stream sockets (e.g., telnet and WWW browsers using the HTTP protocol that, in turn, use stream sockets to fetch pages). These stream sockets achieve a certain level of data transmission quality using the Transmission Control Protocol (TCP: RFC-793). TCP is designed to ensure data arrives sequentially. The Internet Protocol (IP: RFC-791) only handles the Internet routing of the data.
Datagram sockets also use IP for routing, but they use the User Datagram Protocol (UDP: RFC-768) instead of TCP. Characteristic of datagram sockets is the fact that the packets may or may not arrive and they may arrive out of order. If, however, the packet arrives, the data within the packet will not contain errors. The datagram sockets are connectionless because an open connection does not have to be maintained. In particular, a packet is assembled, it is labeled with an IP header with the destination information, and the packet is transmitted without the need for any connection. These sockets are generally used for packet-by-packet transfers of information including tftp and bootp.
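The distinction between the two socket types can be illustrated with a short loopback sketch using the standard Berkeley-style socket API (Python's socket module here); the addresses and payloads are arbitrary.

```python
import socket

# Datagram (connectionless) socket: SOCK_DGRAM over UDP.
udp_rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_rx.bind(("127.0.0.1", 0))                  # let the OS pick a free port
port = udp_rx.getsockname()[1]

udp_tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_tx.sendto(b"sensor reading 42", ("127.0.0.1", port))   # no connection needed
data, addr = udp_rx.recvfrom(2048)
print("UDP datagram:", data, "from", addr)

# Stream (connection-oriented) socket: SOCK_STREAM over TCP.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
tcp_port = listener.getsockname()[1]

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", tcp_port))        # three-way handshake
server_side, _ = listener.accept()
client.sendall(b"1")
client.sendall(b"2")
# Bytes may arrive in one or more reads, but always in order.
print("TCP stream:", server_side.recv(2048))

for s in (udp_tx, udp_rx, client, server_side, listener):
    s.close()
```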
Each datagram application typically layers its own protocol on top of UDP. For example, the tftp protocol implements a rule that, for each packet sent, the recipient sends back a packet indicating receipt of the packet (an "ACK" packet). If the sender of the packet does not receive the ACK packet within a predetermined time period, the sender will retransmit the packet until the ACK packet is received.
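A stop-and-wait retransmission loop in the spirit of this ACK scheme might be sketched as follows; the ACK format, timeout value and retry limit are assumptions for illustration, not the actual TFTP encoding.

```python
import socket

ACK_TIMEOUT = 1.0      # seconds to wait before retransmitting (assumed value)
MAX_RETRIES = 5

def send_reliably(sock, packet, peer, block_no):
    """Stop-and-wait sketch: retransmit the block until the peer
    acknowledges it or the retry budget is exhausted."""
    sock.settimeout(ACK_TIMEOUT)
    for _ in range(MAX_RETRIES):
        sock.sendto(packet, peer)
        try:
            reply, _ = sock.recvfrom(512)
        except socket.timeout:
            continue                           # no ACK in time: retransmit
        # Hypothetical ACK format: b"ACK" followed by a 2-byte block number.
        if reply == b"ACK" + block_no.to_bytes(2, "big"):
            return True                        # block delivered
    return False                               # give up after MAX_RETRIES
```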
In the TCP/IP network, a packet is born and then it is wrapped ("encapsulated") in a header (and perhaps a footer) by the first protocol (e.g., the tftp protocol); then the entire packet, including the header (e.g., a tftp header), is encapsulated again by the next protocol (e.g., UDP), then again by the next protocol (e.g., IP), and then again by the final protocol on the hardware (physical) layer (e.g., Ethernet). Thereafter, when another computer receives the packet, the hardware strips the Ethernet header, the kernel strips the IP and UDP headers, the tftp program strips the tftp header, and then the receiving computer finally has the data.
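The nesting can be mimicked with a toy sketch in which each layer prepends its own header and the receiver strips them again in reverse order; the header layouts below are deliberately simplified stand-ins, not the real IP or UDP wire formats.

```python
import struct

def wrap(payload: bytes) -> bytes:
    """Encapsulate: each pseudo-layer prepends its own (simplified) header."""
    app = struct.pack("!H", len(payload)) + payload            # 2-byte app header
    udp = struct.pack("!HH", 69, len(app)) + app               # 4-byte pseudo-UDP header
    ip  = struct.pack("!4s4s", b"\x0a\x00\x00\x01", b"\x0a\x00\x00\x02") + udp  # 8-byte pseudo-IP header
    return ip

def unwrap(frame: bytes) -> bytes:
    """Decapsulate: strip the headers in the reverse order they were added."""
    ip_payload  = frame[8:]          # strip the 8-byte pseudo-IP header
    udp_payload = ip_payload[4:]     # strip the 4-byte pseudo-UDP header
    length, = struct.unpack("!H", udp_payload[:2])
    return udp_payload[2:2 + length] # strip the application header

assert unwrap(wrap(b"hello")) == b"hello"
```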
The current Internet environment uses a system of network functionality called a Layered Network Model based on the 7-layer OSI model discussed above. Using the layered network model, the same socket program is used without regard to how the data is physically transmitted (e.g., serial, thin Ethernet, AUI) because programs on the lower levels deal with these issues. Therefore, the actual network hardware and topology is transparent to the socket programmer. As illustrated in FIGURE 1, the "stack" corresponding to the Layered Network Model includes (1) Application/Presentation/Session; (2) Transport; (3) Network; (4) Data Link; and (5) Physical.
FIGURE 2 illustrates the current implementation of TCP encapsulation including a sequence number field and an acknowledgement number field to maintain the order of the individual datagrams. The state bits (Finish, Synch, Reset, Push, Acknowledge and Urgent) are used by the protocols on both ends to keep track of the state of the connection and manage the connection. The actual data transmission, however, is buffered and asynchronous. Unfortunately, this buffering and asynchronous transmission has resulted in little progress towards delivering real-time, highly-synchronous, streaming multimedia over a network. In order to support multimedia, a system would have to implement timer management associated with the retransmissions to compensate for lost packets and an intelligent, dynamic buffering manager to buffer packets until they are retransmitted or discarded.
The data link layer isolates the layers above from the physical and electrical transmission details. The majority of the implementations of the data link layer for TCP/IP do not implement any mechanism for reliability. The upper part of the data link layer handles the framing details and the interface with the upper layers. The lower part is often referred to as a device driver for the Network Interface Cards (NIC), including device management, interrupt handling and DMA control. FIGURE 3 illustrates the data link layer encapsulation that includes the upper part and the lower part. After prepending the LLC headers in front of the data, the lower half of the link layer picks up the data and sets up the DMA and hardware for frame transmission.

The network layer encompasses the Internet domain knowledge, contains the routing protocols, and understands the Internet addressing scheme. In this regard, the IP is the part of the network layer in the stack for handling the routing of packets across network interfaces. Domain naming and address management are also included in this layer. IP further includes a mechanism for fragmentation and reassembly of packets that exceed the data link layer's maximum transmission unit (MTU). The MTU is the maximum packet size that can be transmitted across a physical layer.
FIGURE 4 illustrates the IP layer encapsulation. The IP layer prepends its headers on the front of the data it receives from the transport protocols, establishes the route for the packet according to routing tables, and inserts the appropriate address in the IP header. The IP layer also calculates the checksum and sets the time-to-live (TTL) field.
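The checksum the IP layer computes is the standard Internet one's-complement sum over the header's 16-bit words; a small sketch follows. The sample header bytes are a commonly used textbook example (checksum field zeroed), not data taken from this patent.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """Internet checksum: one's-complement of the one's-complement sum of 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:                      # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# 20-byte IPv4 header with the checksum field zeroed; the TTL is the 9th byte (0x40 = 64).
hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(ipv4_checksum(hdr)))   # 0xb861 for this example header
```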
The transport layer implements sequenced packet delivery known as connection-oriented transfer by incorporating retrying and sequencing necessary to correct for lost information at the lower layers. As discussed above, the transport for IP provides two different protocols: TCP for the connection-oriented transmission, and UDP for connectionless transmission. IP does not make any assumptions about the reliability in the underlying data link and physical layers. If a mechanism for reliable transfer is implemented, it is implemented above the network layer. Associated with the transport layer are (1) sockets; and (2) the Application Programming Interface (API).
In network operating systems, the API defines a standard way of using network features by implementing various end-to-end protocols and interfacing these protocols with application programs. Unlike the interaction between the protocols that is buried deep in the operating system kernel, transport protocols are directly available to application programs through the API.
The session layer was originally intended to keep track of a logged-in user talking to a central time-sharing system (e.g., Telnet). TCP/IP only incorporates protocols through the transport layer, thereby resulting in the session layer not being differentiated from the application layer. Similarly, the presentation layer is included in the application layers and maps the user's view of the data as a structured entity to the lower layer protocol's view as a stream of bytes. Finally, the application layer generally encompasses all applications of TCP/IP networking, including network file systems, web server or browsers, and client/server transaction protocols. Moreover, network management is in the application layer, although there are hooks to the lower layers to gather statistics.
A variation of the Layered Network Model collapses the layers into (1) an Application Layer (telnet, ftp, etc.); (2) a Host-to-Host Transport Layer (TCP, UDP); (3) an Internet Layer (IP and routing); and (4) a Network Access Layer (previously Network, Data Link, and Physical). These layers directly correspond to the encapsulation of the original data.
Therefore, for stream sockets, the data is simply sent out using a send command. For datagram sockets, the packet is encapsulated in any of a variety of methods and sent out using a sendto command. The kernel builds the Transport Layer and Internet Layer on top of the packet and the hardware builds the Network Access Layer. The router strips the packet to the IP header, consults its routing table, and routes the packet.
C. USER DATAGRAM PROTOCOL (UDP)
The User Datagram Protocol is becoming an important player in the realm of multimedia protocols because it is essentially an interface to the low-level Internet Protocol, and it offers a fast checksum and I/O multiplexing through Berkeley sockets. In the current development environment, UDP is the choice of many multimedia network developers for applications that cannot be constrained by the flow control mechanism in TCP. In this environment, however, operation without any flow control in place quickly fills the local socket-level buffers, and the UDP datagrams are discarded before they even reach the physical network. In general, applications that generate datagrams faster than the kernel can handle them make poor use of CPU time, and degradations in performance are observed, resulting in larger Round Trip Times (RTTs) and slower network throughput. On the contrary, protocols such as TCP, with a highly-refined flow control mechanism, attempt to dynamically converge on the optimum transmission rate through the feedback loop formed by data transmission and subsequent data acknowledgment. Given this observation, UDP networks that operate without flow control are unlikely candidates in the current network environment to handle high-bandwidth, highly-synchronized multimedia data streams. As discussed above, any network that is expected to handle this type of real-time, multimedia data must preserve the linear presentation of time-sequenced data. Unfortunately, current networks or protocols that attempt to preserve this relationship significantly compromise the overall quality of the image.
In the current implementation of UDP, the data to be sent over the network is wrapped in a layer that is recognized by the delivery service that will be used and contains an address for the destination and an address of the sender. The best-effort delivery service of UDP involves one attempt at delivery and, upon failure, the discarding of the entire data packet.
D. REAL-TIME TRANSPORT PROTOCOL (RTP)
RTP is the Internet-standard protocol for the transport of real-time data, including audio and video. It is used for media-on-demand as well as interactive services such as Internet telephony. RTP consists of a data element and a control element (RTCP).
The data element of RTP is a thin protocol providing support for applications with real-time properties such as continuous media (e.g., audio and video), including timing reconstruction, loss detection, security and content identification.
RTCP provides support for real-time conferencing of groups within an Internet, including source identification and support for gateways like audio and video bridges, as well as multicast-to-unicast translators. UDP/IP is RTP's target networking environment, but there have been efforts to make RTP transport-independent so that it can be used over CLNP, IPX or other protocols. Unfortunately, RTP does not address the issue of resource reservation or quality of service control, but relies entirely on resource reservation protocols such as RSVP.
E. ATM NETWORK
Asynchronous transfer mode (ATM) is a high-performance, cell-oriented switching and multiplexing technology that utilizes fixed-length packets to carry different types of traffic. ATM is a technology that enables carriers to capitalize on a number of revenue opportunities through multiple ATM classes of services. Services based on asynchronous transfer mode (ATM), synchronous digital hierarchy (SDH) and synchronous optical network (SONET) architectures were developed to provide the infrastructure for the evolving multimedia market. Unfortunately, ATM provides little support for multicasting.
ATM technology has its history in the development of broadband ISDN in the 1970s and 1980s. From a technical view, ATM is an evolution of packet switching. Similar to packet switching for data (e.g., X.25, frame relay, transmission control protocol [TCP]/Internet protocol [IP]), ATM integrates the multiplexing and switching functions, and is typically a good match for bursty traffic (in contrast to circuit switching). Additionally, ATM allows communication between devices that operate at different speeds. Unlike packet switching, ATM generally supports high-performance, multimedia networking and has been implemented in a broad range of networking devices including PCs, workstations, server network interface cards, switched-Ethernet and token-ring workgroup hubs, workgroup and campus ATM switches, ATM enterprise network switches, ATM multiplexers, ATM-edge switches, and ATM-backbone switches.
ATM is also a capability that can be offered as an end-user service by service providers (as a basis for tariffed services) or as a networking infrastructure for these and other services. The most basic service building block is the ATM virtual circuit, which is an end-to-end connection that has defined end points and routes, but does not include dedicated bandwidth. Bandwidth is allocated on demand by the network as users have traffic to transmit. ATM also defines the following international standards to meet a broad range of application needs:
1. High performance via hardware switching;
2. Dynamic bandwidth for bursty traffic (e.g., audio is bursty, as both parties are neither speaking at once nor all the time; video is bursty, as the amount of motion and required resolution varies over time);
3. Class-of-service support for multimedia traffic allowing applications with varying throughput and latency requirements to be met on a single network;
4. Scalability in speed and network size supporting link speeds of T1/E1 to OC-12 (622 Mbps);
5. Common LAN/WAN architecture allowing ATM to be used consistently from one desktop to another (e.g., traditionally, LAN and WAN technologies have been very different, with implications for performance and interoperability);
6. Opportunities for simplification via switched VC architecture (e.g., specific to LAN-based traffic that is connectionless in nature including billing, traffic management, security, and configuration management); and
7. International standards compliance in central-office and customer- premises environments allowing for multivendor operation.
In ATM networks, all information is formatted into fixed-length cells consisting of 48 bytes (8 bits per byte) of payload and 5 bytes of cell header. The fixed cell size ensures that time-critical information such as voice or video is not adversely affected by long data frames or packets. The header is organized for efficient switching in high-speed hardware implementations and carries payload-type information, virtual-circuit identifiers, and header error check.
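A toy sketch of chopping a traffic stream into such 53-byte cells follows; the 5-byte header used here is deliberately simplified (a real ATM header carries VPI/VCI, payload type, cell-loss priority and a header error check rather than the identifier and sequence number assumed below).

```python
CELL_SIZE = 53          # 5-byte header + 48-byte payload, as described above
HEADER_SIZE = 5
PAYLOAD_SIZE = CELL_SIZE - HEADER_SIZE

def to_cells(vci: int, data: bytes):
    """Chop a byte stream into fixed-length cells with a simplified header."""
    cells = []
    for seq, offset in enumerate(range(0, len(data), PAYLOAD_SIZE)):
        payload = data[offset:offset + PAYLOAD_SIZE].ljust(PAYLOAD_SIZE, b"\x00")
        header = vci.to_bytes(3, "big") + seq.to_bytes(2, "big")   # 5 bytes total
        cells.append(header + payload)
    return cells

cells = to_cells(vci=42, data=b"x" * 130)
print(len(cells), "cells of", len(cells[0]), "bytes each")   # 3 cells of 53 bytes
```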
The ATM connection standard organizes different streams of traffic in separate calls, thereby allowing the user to specify the resources required and allowing the network to allocate resources based on these needs. Multiplexing multiple streams of traffic on each physical facility (between the end user and the network or between network switches), combined with the ability to send the streams to many different destinations, results in cost savings through a reduction in the number of interfaces and facilities required to construct a network.
ATM standards define (1) virtual path connections (VPCs), which contain (2) virtual channel connections (VCCs). A virtual channel connection (or virtual circuit) is the basic unit, which carries a single stream of cells, in order, from user to user. A collection of virtual circuits can be bundled together into a virtual path connection. A virtual path connection can be created from end-to-end across an ATM network. In this case, the ATM network does not route cells belonging to a particular virtual circuit. All cells belonging to a particular virtual path are routed the same way through the ATM network, thus resulting in faster recovery in case of major failures.
An ATM network also uses virtual paths internally for the purpose of bundling virtual circuits together between switches. Two ATM switches may have many different virtual channel connections between them, belonging to different users. These can be bundled by the two ATM switches into a virtual path connection that serves the purpose of a virtual trunk between the two switches. The virtual trunk is then handled as a single entity by, perhaps, multiple intermediate virtual path cross connects between the two virtual circuit switches. Virtual circuits are statically configured as permanent virtual circuits (PVCs) or dynamically controlled via signaling as switched virtual circuits (SVCs). They can also be point-to-point or point-to-multipoint, thus providing a rich set of service capabilities. SVCs are the preferred mode of operation because they can be dynamically established, thus minimizing reconfiguration complexity. ATM allows the user to specify the resources required on a per-connection basis (per SVC) dynamically. There are five classes of service defined for ATM (as per the ATM Forum UNI 4.0 specification). The QoS parameters for these service classes are as follows:
[Table in the original (figure): QoS parameters for the five ATM service classes (CBR, rt-VBR, nrt-VBR, ABR and UBR).]
The technical parameters of ATM include:
[Tables in the original (figures): technical parameters of ATM.]
Its extensive class-of-service capabilities make ATM the technology of choice for multimedia communications.
As real-time voice services have been traditionally supported in the WAN via circuit-based techniques (e.g., via T1 multiplexers or circuit switching), it is natural to map these circuits to ATM CBR PVCs using circuit emulation and ATM Adaptation Layer 1 (AAL1). There are, however, significant disadvantages in using circuit emulation in that the bandwidth must be dedicated for this type of traffic (whether there is useful information being transmitted or not), thereby providing a disincentive for corporate users to implement circuit emulation as a long-term strategy. For example, a T1 1.544-Mbps circuit requires 1.74 Mbps of ATM bandwidth when transmitted in circuit-emulation mode.
As technology evolves, the inherent burstiness of voice and many real-time applications can be exploited (along with sophisticated compression schemes) to decrease the cost of transmission through the use of VBR-RT connections over ATM.
The addition of more bandwidth-effective voice coding (e.g., standard voice is coded using 64-kbps PCM) is economically attractive, particularly over long-haul circuits and T1 ATM interfaces. Various compression schemes have been standardized in the industry (e.g., the ITU-T G.72x series of standards). Additionally, making these coding schemes dynamic provides the network operator the opportunity to free up bandwidth under network-congestion conditions.
QUALITY OF SERVICE (QoS)
Quality of Service (QoS) is a measure of the ability to control and assign network bandwidth to specific traffic so as to provide predictable levels of IP-based data throughput based on the relative importance of the processes associated with the particular traffic. Traditionally, implementing QoS on the Internet has not been a priority because it has not been needed on LANs, where bandwidth is cheap and overprovisioning is relatively easy. With the pressure to deliver real-time multimedia, however, the concept of QoS is increasingly important. Unfortunately, despite current efforts, the current Internet environment does not support QoS. Various applications have different requirements for the handling of their traffic in a network. Applications generate traffic at varying rates and generally require that the network be able to carry traffic at the rate at which they generate it. Additionally, applications are more or less tolerant of traffic delays in the network and of variation in traffic delay. In other words, certain applications can tolerate some degree of traffic loss while others cannot. These requirements are typically expressed using the following QoS-related parameters (a short sketch of estimating one of them, jitter, follows the list):
1. Bandwidth - the rate at which an application's traffic must be carried by the network;
2. Latency - the delay that an application can tolerate in delivering a packet of data;
3. Jitter - the variation in latency; and
4. Loss - the percentage of lost data.
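For illustration, the jitter parameter is often estimated by comparing packet spacing at the sender with packet spacing at the receiver; the sketch below applies a smoothing filter in the style of RTP's interarrival-jitter estimator. The smoothing constant of 1/16 and the sample timestamps are illustrative values.

```python
def interarrival_jitter(send_times, recv_times):
    """Smoothed jitter estimate: for each consecutive pair of packets, compare
    the spacing observed at the receiver with the spacing at the sender and
    low-pass filter the absolute difference."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter

# Packets sent every 20 ms but delivered with uneven network delay (seconds).
sent     = [0.000, 0.020, 0.040, 0.060, 0.080]
received = [0.050, 0.072, 0.089, 0.113, 0.130]
print("jitter estimate: %.4f s" % interarrival_jitter(sent, received))
```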
If infinite network resources were available, then all application traffic could be carried at the required bandwidth, with zero latency, zero jitter and zero loss. Obviously, network resources are not infinite, and as a result the network cannot always meet the demands placed on it. QoS mechanisms control the allocation of the network resources to application traffic in a manner that meets the application's service requirements.
MULTIMEDIA DATA COMPRESSION
As previously discussed, synchronized data refers to digital data streams that are highly correlated in the time domain. The data occurs at regular intervals, in contrast to the asynchronous communication between computers and devices.
The consequence of highly-correlated, synchronous data is that the data can be highly compressed due to the redundancies inherent in the data.
Image compression reduces the amount of data necessary to represent a digital image by eliminating spatial and/or temporal redundancies in the image information. Video compression is a four-dimensional problem: some scenes contain an object that changes continuously, without jumps and without edges, while other scenes contain cuts, which are large jumps in the temporal domain, as well as large jumps in the spatial domain (e.g., edges).
Image compression is generally classified into two categories: (1) lossless and (2) lossy. Lossless compression reduces the amount of image data stored and transmitted without any loss of information, thereby resulting in no image degradation. On the contrary, lossy compression results in information loss, thereby resulting in at least some image degradation. The majority of compression standards use lossy compression to compress image data to fit a set of network constraints (e.g., limited memory for storage of data and/or limited bandwidth available for transmission of data). Lossy compression, however, would be unnecessary if the image data could be compressed enough to meet the network constraints using lossless compression of the data. Lossy compression techniques typically use cosine-type transforms like DCT and wavelet compression that have a tendency to lose high-frequency information due to limited bandwidth. Fractal compression also suffers from high transmission bandwidth requirements and slow coding algorithms.

The compression of a digital signal reduces the bandwidth required for signal storage or transmission. For example, a high definition television ("HDTV") signal can require as much as 1 billion bits per second. By reducing the amount of data by as much as a factor of fifty (e.g., to 20 million bits per second), present-day compression techniques facilitate compact storage and real-time transmission of complex signals. There are several compression techniques that are typically applied to transmit video signals through satellite transmission and cable, and that enable the storage of the video on compact disk or in a computer memory.
Assuming an image of 512 x 512 pixels, an 8-bit gray level, and a 30 Hz full-motion video rate, a bandwidth of about 60 Mbps is required to transmit this data. To compress data into the required data rate of 128 kbps from such an uncompressed full-video bandwidth of 60 Mbps, a 486:1 still image compression ratio is required. In the case of VGA full-motion video at 221 Mbps, a 1726:1 compression is required. The 128 kbps is even more difficult to achieve given continuity-of-communication issues, where degradation of the power budget or multi-path errors of wireless media further reduce the 128 kbps data rate required for multimedia parallel channelization. Some well-known compression techniques include "JPEG," "MPEG-1," "MPEG-2," "MPEG-4," "H.261," and "H.263." The primary goal of most of these compression techniques is to take an input stream of full-length video or audio, determine redundancies that exist in the signal, and encode those redundancies such that the input signal is compressed to be shorter in length. Compression can be used to eliminate spatial and temporal redundancies. For example, pixel values in a region of an image frame may be converted into information indicating that the region can be reproduced based upon another part of the same image frame or of a previous image frame, respectively. Prior art compression algorithms generally rely on block-based, tile-based, or object-based encoding. One of the image frames is divided into a number of square tiles, and the frame is compressed so that relatively less data is used for the image representation. In a typical image compression, pixels for each tile will be separately compressed to remove either spatial redundancies within the same frame or temporal redundancies between frames.
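The arithmetic behind these figures can be reproduced directly; the small calculation below uses the 512 x 512, 8-bit, 30 Hz assumptions quoted above (the exact results differ slightly from the quoted 60 Mbps and 486:1 depending on the rounding convention).

```python
# Rough bandwidth arithmetic behind the figures quoted above.
pixels        = 512 * 512          # frame size
bits_per_pix  = 8                  # 8-bit gray level
frame_rate    = 30                 # Hz, full-motion video

uncompressed_bps = pixels * bits_per_pix * frame_rate
target_bps       = 128_000         # 128 kbps channel

print("uncompressed: %.1f Mbps" % (uncompressed_bps / 1e6))        # ~62.9 Mbps (quoted as ~60 Mbps)
print("required compression: %.0f:1" % (uncompressed_bps / target_bps))  # ~492:1 (quoted as ~486:1)
```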
A digital processing device compares pixels in each tile in one of the frames with image pixels found near the same tile location within the other frame. The digital processing device compares pixels from a reference tile with pixel subsets of a fixed "search window" to determine a "closest match." After locating the "closest match," the digital processing device calculates a motion vector and a set of pixel value differences called "residuals." The "search window" for each tile defines a maximum set of boundaries beyond which searching will not be performed for the "closest match."
In particular, a portion of one of the images includes many pixels, wherein a pixel is the smallest element of a picture, consisting only of a single color and intensity. An image frame typically consists of hundreds of tiles in both the X and Y directions, and each tile may have, for example, eight rows and eight columns of pixels. Searching for the "closest match" of a data block is conventionally performed within the fixed search window about an expected location of the closest match. Each square subset of contiguous pixels is sequentially compared to the data block, and the "closest match" is the particular subset which differs least from the data block. A motion vector identifies the location of the "closest match" with respect to the expected location, and the associated residuals contain pixel-by-pixel differences also arranged in a square tile.
The motion vector and residuals are then encoded in a compact manner, usually through "run-length coding," "quantization" and "Huffman coding." The digital processing device repeats this process for each tile until the entire image frame is compressed.
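A minimal sketch of this exhaustive ("full search") block matching is shown below; the 8 x 8 tile size, ±7-pixel search range and synthetic test data are illustrative only.

```python
import numpy as np

def best_match(ref_frame, cur_block, top, left, search=7):
    """Exhaustive block matching: slide the current tile over a small search
    window in the reference frame and keep the position with the smallest
    sum of absolute differences (the "closest match")."""
    h, w = cur_block.shape
    best = (None, None, np.inf)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue
            candidate = ref_frame[y:y + h, x:x + w]
            sad = np.abs(candidate.astype(int) - cur_block.astype(int)).sum()
            if sad < best[2]:
                best = ((dy, dx), cur_block.astype(int) - candidate.astype(int), sad)
    motion_vector, residuals, _ = best
    return motion_vector, residuals

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(ref, shift=(2, 3), axis=(0, 1))          # simulate rigid motion of the whole frame
mv, res = best_match(ref, cur[16:24, 16:24], top=16, left=16)
print("motion vector:", mv, "max residual:", np.abs(res).max())   # expect (-2, -3) and 0
```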
During later decompression, for example at a network, television station, editing facility, or at an end-viewer's computer or television, the image frame is completely recalculated from an already-decompressed frame by reconstructing each tile using motion vectors and residuals. The various standards mentioned above generally operate in this manner, although some new standards call for subdividing images into variable-size objects instead of tiles. The underlying compression principles, however, are similar.
A. MPEG-1
MPEG compression results from the desire to distribute and display motion pictures in digital form, such as by means of a computer system. MPEG ("Motion Pictures Experts Group") compression standards for motion picture video data prescribe a set of variable-length structures for encoding motion picture video and audio data, which is read and interpreted by a digital processor so as to display motion pictures. Established in 1988, the MPEG working group (formally known as ISO/IEC JTC1/SC29/WG11) is part of JTC1, the Joint ISO/IEC Technical Committee on Information Technology, and was founded by Leonardo Chiariglione.
MPEG achieves a high compression rate by storing only the changes from one frame to another, instead of each entire frame. The video information is then encoded using the DCT technique. MPEG uses a type of lossy compression, since some data is removed, but the diminishment of data is generally imperceptible to the human eye. The two major MPEG standards are: MPEG-1 and MPEG-2. The most common implementations of the MPEG-1 standard provide a video resolution of 352x240 at 30fps. This produces video quality slightly below the quality of conventional VCR videos. A second standard, MPEG-2, offers resolutions of 720x480 and 1280x720 at
60 fps, with full CD-quality audio. This is sufficient for all the major TV standards, including NTSC, and even HDTV. MPEG-2 is often used by DVD-ROMs. MPEG-2 can compress a 2-hour video into a few gigabytes.
The MPEG video compression algorithm employs two basic techniques: block-based motion compensation for the reduction of the temporal redundancy, and transform domain (DCT) coding for the reduction of spatial redundancy. The motion compensation technique is applied both in the forward (causal) and backward (non- causal) direction. The remaining signal (prediction error) is coded using the transform-based technique. The motion predictors (e.g. the motion vectors discussed above) are transmitted together with the spatial information.
The MPEG-2 standard uses the same set of algorithms as MPEG-1, and has an additional support for interlaced video sources and scalability options. Although there are minor differences in the syntax, the MPEG-2 standard is conceptually a super-set of MPEG-1.
B. TEMPORAL REDUNDANCY REDUCTION
To support random access to the stored video while exploiting the maximum redundancy reduction using temporal predictions, three types of pictures are defined in MPEG: intra (I) pictures, predicted (P) pictures, and bidirectionally interpolated (B) pictures. I pictures provide access points for random access, but only with a moderate compression. P pictures are coded with reference to a previous picture, which can be either an I or a P picture. B pictures are intended to be compressed with a low bit rate, using both the previous and future references. The B pictures are never used as references. The MPEG standard does not impose any limit on the number of B pictures between the two references.
The I-frame is sent every fifteen frames regardless of video content. This content-independent insertion of I-frames into the video bitstream by the encoder is wasteful and introduces artifacts because there is no correlation between the I-frames and the B and P frames of the video, thereby resulting in wasted bandwidth. In particular, if an I-frame has been inserted into B and P frames containing no motion, bandwidth is wasted because the I-frame was unnecessary.
On the other hand, if an I-frame is not inserted where there is significant motion in the video bitstream, significant errors and artifacts are introduced, thereby exceeding the available bandwidth and resulting in a blocking effect in the video image. If, however, an I-frame is inserted where there is motion, the B and P frames will already be correlated to the new motion sequence and the video image will not suffer from image degradation. Using standard MPEG compression, the probability of the I-frame being inserted where there is significant motion is relatively low as compared with implementing a compression technique that purposely inserts the I-frame as warranted by video content.
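Purely as an illustration of such a content-dependent alternative, and not as the specific method of this invention, an encoder could trigger I-frames from a frame-difference measure; the threshold, GOP limit and test frames below are assumptions.

```python
import numpy as np

MOTION_THRESHOLD = 12.0    # mean absolute frame difference (assumed value and units)

def frame_types(frames, max_gop=15):
    """Illustrative, content-driven frame-type decision: emit an I-frame when
    the scene changes sharply (large frame difference) or when the group of
    pictures grows too long; otherwise code the frame predictively."""
    types, since_i, prev = [], 0, None
    for frame in frames:
        diff = np.inf if prev is None else np.abs(frame.astype(int) - prev.astype(int)).mean()
        if diff > MOTION_THRESHOLD or since_i >= max_gop:
            types.append("I")
            since_i = 0
        else:
            types.append("P")
            since_i += 1
        prev = frame
    return types

rng = np.random.default_rng(1)
static = rng.integers(0, 256, (8, 8), dtype=np.uint8)   # a still scene
cut    = rng.integers(0, 256, (8, 8), dtype=np.uint8)   # an unrelated scene (a cut)
print(frame_types([static, static, static, cut, cut]))  # ['I', 'P', 'P', 'I', 'P']
```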
C. MOTION COMPENSATION
Motion-compensated prediction assumes that the current picture can be locally modeled as a translation of the pictures of some previous time. In the MPEG standard, each picture is divided into blocks of 16 x 16 pixels, called macroblocks. Each macroblock is predicted from the previous or future frame by estimating the amount of motion in the macroblock during the frame time interval. The MPEG syntax specifies how to represent the motion information for each macroblock. It does not, however, specify how such vectors are to be computed. Due to the block-based motion representation, many implementations use block-matching techniques, where the motion vector is obtained by minimizing a cost function measuring the mismatch between the reference and the current block. Although any cost function can be used, the most widely-used choice is the absolute error (AE).
To find the best matching macroblock producing the minimum mismatch error, the AE is calculated at several locations in the search range. The conceptually simplest, but most computationally-intensive, search method, known as the full search or exhaustive search, evaluates the AE at every possible pixel location in the search area. To lower the computational complexity, several algorithms with a reduced number of search points have been developed. One such algorithm is the Three-Step Search (TSS). This algorithm first evaluates the AE at the center and eight surrounding locations of a 32 x 32 search area. The location that produces the smallest AE then becomes the center of the next stage, and the search range is reduced by half. This sequence is repeated three times.
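A sketch of the Three-Step Search as just described follows; the starting step size of 8 (halved to 4 and then 2) is an assumption consistent with the 32 x 32 search area, and the routine can be called in place of the exhaustive search sketched earlier.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences used as the mismatch (AE) cost."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def three_step_search(ref, cur_block, top, left, first_step=8):
    """Three-Step Search: probe the centre and its eight neighbours at the
    current step size, re-centre on the best one, halve the step, and repeat
    for three stages (assumed step sizes 8, 4, 2)."""
    h, w = cur_block.shape
    cy, cx = top, left                               # current best position in ref
    best_cost = sad(ref[cy:cy + h, cx:cx + w], cur_block)
    step = first_step
    for _ in range(3):
        best_y, best_x = cy, cx
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                y, x = cy + dy, cx + dx
                if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                    continue
                cost = sad(ref[y:y + h, x:x + w], cur_block)
                if cost < best_cost:
                    best_cost, best_y, best_x = cost, y, x
        cy, cx = best_y, best_x
        step //= 2
    return (cy - top, cx - left), best_cost
# On real video the cost surface is smooth enough for this greedy search to
# track the motion at a fraction of the cost of an exhaustive search.
```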
D. SPATIAL REDUNDANCY REDUCTION
For the reduction of spatial redundancy in each I picture, or of the prediction error in P and B pictures, the MPEG standard employs a DCT-based coding technique. The two-dimensional DCT is separable, and it can be obtained by performing one-dimensional DCTs on columns and one-dimensional DCTs on rows. An explicit formula for the 8 x 8 two-dimensional DCT can be written in terms of the pixel values and the frequency-domain transform coefficients.
The transformed DCT coefficients are then quantized to reduce the number of bits to represent them and also to increase the number of zero-value coefficients. A combination of quantization and run-length coding (described below) contributes to most of the compression. A uniform quantizer is used in the MPEG standard, with a different step size for each DCT coefficient position. Since the subjective perception of the quantization error varies with the frequency, higher frequency coefficients are quantized more coarsely, using a visually-weighted step-size. In addition, different quantization matrices are used for intra-coded and inter-coded blocks, since the signal from intra-coding has different statistical characteristics from the signal resulting from prediction or interpolation. Intra-coded blocks contain energy in all frequencies and are likely to produce blocking artifacts if too coarsely quantized. On the other hand, blocks coded after the motion prediction contain predominantly high frequencies and can be subject to much coarser quantization.
E. ENTROPY CODING
The quantized DCT coefficients are then rearranged into a one-dimensional array by scanning them in a zig-zag order. This rearrangement puts the DC coefficient at the first location of the array and the remaining AC coefficients are arranged from the low to high frequency, in both the horizontal and vertical directions. The assumption is that the quantized DCT coefficients at higher frequencies would likely be zero, thereby separating the non-zero and zero parts. The rearranged array is coded into a sequence of the run-level pair. The run is defined as the distance between two non-zero coefficients in the array. The level is the non-zero value immediately following a sequence of zeros. This coding method produces a compact representation of the 8 x 8 DCT coefficients, since a large number of the coefficients have been already quantized to zero value.
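A small sketch of the zig-zag scan and run-level pairing described here follows; the example block values are arbitrary, and a real coder would follow the pairs with an end-of-block code and then Huffman-code the result.

```python
import numpy as np

def zigzag_order(n=8):
    """Index pairs of an n x n block in zig-zag order (DC term first)."""
    return sorted(((y, x) for y in range(n) for x in range(n)),
                  key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_level_pairs(quantized_block):
    """Turn a quantized 8 x 8 coefficient block into (run, level) pairs:
    'run' counts the zeros skipped before each non-zero 'level'.
    Trailing zeros are simply dropped (an end-of-block code would mark them)."""
    pairs, run = [], 0
    for y, x in zigzag_order(quantized_block.shape[0]):
        level = int(quantized_block[y, x])
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs

block = np.zeros((8, 8), dtype=int)
block[0, 0], block[0, 1], block[1, 0], block[2, 1] = 35, -3, 6, 2
print(run_level_pairs(block))   # [(0, 35), (0, -3), (0, 6), (5, 2)]
```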
F. MPEG-2
One of the motivations for creating the MPEG-2 standard was to introduce support for interlaced video sources. Since the MPEG-1 standard was targeted at a bit rate of around 1.5 Mbits/s, it was assumed that the source video signal would be digitized at around 352 x 240 for 60 Hz systems (e.g., in the U.S.) and 352 x 288 for 50 Hz systems (e.g., in Europe). Standard video signals carry twice as many scan lines as the above sampling rates, with an interlaced scanning order. Therefore, the simplest way of creating a half-size digital picture was simply sampling only one field from each frame. The other field was always discarded. Since only one field from every frame is used, these sampled fields form a progressively-scanned video sequence. Therefore, MPEG-1 only addresses the coding parameters and algorithms for progressively-scanned sequences. However, it should be noted that the syntax of the MPEG-1 standard does not constrain the bit rate or the picture size to any such values. As MPEG-2 is targeted at coding broadcast-quality video signals, it is necessary to digitize the source video at its full bandwidth, resulting in both even and odd field pictures in the sequence. Since these two fields are separated by a time interval, coding the sequence using the MPEG-1 algorithm does not produce good quality pictures, as MPEG-1 assumes that there is no time difference between successive lines in the picture. The MPEG-2 standard provides a means of coding interlaced pictures by including two field-based coding techniques: field-based prediction and field-based DCT.
In MPEG-2, the term picture refers to either a frame or a field. Therefore, a coded representation of a picture may be reconstructed to a frame or a field. During the encoding process, the encoder has a choice of coding a frame as one frame picture or two field pictures. If the encoder decides to code the frame as field pictures, each field is coded independently of the other, i.e., two fields are coded as if they were two different pictures, each with one-half the vertical size of a frame.
In frame pictures, each macroblock can be predicted (using motion compensation) on a frame or field basis. The frame-based prediction uses one motion vector per direction (forward or backward) to describe the motion relative to the reference frame. In contrast, field-based prediction uses two motion vectors, one from an even field and the other from an odd field. Therefore, there can be up to four vectors (two per direction, and forward and backward directions) per macroblock. In field pictures, the prediction is always field-based, but the prediction may be relative to either an even or odd reference field.
Independent of the prediction mode, each macroblock in a frame picture can be coded using either frame-based or field-based DCT. The frame-based DCT is the same as the DCT used in MPEG-1. The field-based DCT, however, operates on alternating rows, i.e., rows from the same field are grouped to form an 8 x 8 block.
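A minimal sketch of how the 8 x 8 DCT input blocks are formed in the two modes, assuming a 16 x 16 luminance macroblock stored as a NumPy array in which even rows belong to one field and odd rows to the other; the function names are illustrative.

```python
import numpy as np

def frame_dct_blocks(mb):
    """Frame-based DCT: four 8 x 8 blocks of spatially adjacent rows,
    exactly as in MPEG-1."""
    return [mb[r:r + 8, c:c + 8] for r in (0, 8) for c in (0, 8)]

def field_dct_blocks(mb):
    """Field-based DCT: rows from the same field are grouped, so each
    8 x 8 block holds only even-field lines or only odd-field lines."""
    top, bottom = mb[0::2, :], mb[1::2, :]      # 8 even rows, 8 odd rows
    return [field[:, c:c + 8] for field in (top, bottom) for c in (0, 8)]
```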
SUMMARY OF THE INVENTION
It is an object of this invention to provide a multimedia network including a sensor network, a communication bridge and a user network (e.g., the Internet). The sensor network includes a set of interconnected sensors coupled to a control module. The control module receives a set of sensed data from the sensors and generates a homogenized data stream based on the sensed data. The communication bridge is coupled to the sensor network and buffers the homogenized data stream. The user network is coupled to the communication bridge and receives the homogenized data stream from the sensor network. The user network transmits data back to the control module through the communication bridge.
Yet another object of this invention is to provide a method for providing multimedia data over a network including the steps of processing a set of multimedia information including a set of temporal data and a set of spatial data, compressing the set of temporal data and the set of spatial data, and interpreting the set of spatial data from the set of temporal data.
Another object of this invention is to provide a multimedia sensor network configured to integrate temporal data with spatial data. The network includes a plurality of sensors configured to generate multimedia data, and a processor. The processor processes, compresses and transmits the multimedia data. The processor includes an encoder coupled to a local area network, wherein the local area network transmits compressed temporal data through a first communication channel and the compressed spatial data through a second communication channel.
Another object of this invention is to provide a network including a sensor network, an intelligent compression module, a communication bridge and a user network. The sensor network includes a plurality of interconnected sensors coupled to a control module, wherein the control module receives a set of sensed data from the plurality of sensors including a set of temporal data and generates a homogenized data stream based on the sensed data. The intelligent compression module is coupled to the sensor network and a set of spatial data is interpreted from the set of temporal data. The communication bridge is coupled to the sensor network and buffers the homogenized data stream received from the sensor network. The user network is coupled to the communication bridge, receives the homogenized data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
Another object of this invention is to provide a multimedia network including a sensor network, an intelligent compression module, a communication bridge, and a user network. The sensor network includes a plurality of interconnected sensors coupled to a control module, wherein the control module receives a set of sensed data from the plurality of sensors including a set of temporal data and generates a data stream based on the sensed data. The intelligent compression module is coupled to the sensor network and a set of spatial data is interpreted from the set of temporal data. The communication bridge is coupled to the sensor network and includes a buffer manager to buffer the data stream received from the sensor network and a quality of service manager to guarantee a particular bandwidth for the transmission of the data stream. The user network is coupled to the communication bridge, receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
Another object of this invention is to provide a multimedia network including a sensor network, a communication bridge and a user network. The sensor network includes a plurality of interconnected sensors coupled to a control module, wherein the control module receives a set of sensed data from the plurality of sensors including a set of temporal data and generates a data stream based on the sensed data. The communication bridge is coupled to the sensor network and includes a buffer manager to buffer the data stream received from the sensor network and a quality of service manager to guarantee a particular bandwidth for the transmission of the data stream. The user network is coupled to the communication bridge, receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
Another object of this invention is to provide a tracking network including a sensor network, a communication bridge and a user network. The sensor network includes a plurality of interconnected sensors coupled to a control module, wherein the plurality of sensors track a moving object in a monitoring area, and the control module receives a set of sensed data from the plurality of sensors and generates a data stream based on the sensed data. The communication bridge is coupled to the sensor network and buffers the data stream received from the sensor network. The user network is coupled to the communication bridge, receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
Yet another object of this invention is to provide a tracking network including a sensor network, an intelligent compression module, a communication bridge and a user network. The sensor network includes a plurality of interconnected sensors coupled to a control module, wherein the plurality of sensors track a moving object in a monitoring area, and the control module receives a set of sensed data including a set of temporal data from the plurality of sensors and generates a data stream based on the sensed data. The intelligent compression module is coupled to the sensor network and a set of spatial data is interpreted from the set of temporal data. The communication bridge is coupled to the sensor network and buffers the data stream received from the sensor network. The user network is coupled to the communication bridge, receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
Another object of this invention is to provide a tracking network including a motion detection network, a communication bridge and a user network. The motion detection network includes a plurality of interconnected sensors coupled to a control module, wherein the plurality of sensors track at least one moving object in a monitoring area, and the control module receives a set of sensed data including a set of temporal data from the plurality of sensors, and generates a data stream based on the sensed data. The communication bridge is coupled to the motion detection network and buffers the data stream received from the motion detection network. The user network is coupled to the communication bridge, receives the data stream from the motion detection network and transmits a set of input data to the control module through the communication bridge. The control module receives at least a set of location coordinates corresponding to the at least one moving object from a first sensor and transmits the set of location coordinates to a second sensor that tracks the at least one moving object.
Yet another object of this invention is to provide a method of tracking at least one moving object in a monitoring area including the steps of testing a plurality of interconnected sensors coupled to a control module in a motion detection network to determine if any of the plurality of sensors is activated based on movement of the at least one moving object, processing a set of sensed data, including a set of temporal data, from a first sensor to calculate a set of location coordinates associated with each of the at least one moving objects, tracking each of the at least one moving objects with a second sensor based on the set of location coordinates associated with each of the at least one moving objects, transmitting a second set of data from the second sensor to an object recognition module coupled to the control module to determine if each of the at least one moving objects is in a set of significant objects, and transmitting the second set of data from the second sensor to a user network through a communication bridge.
Another object of this invention is to provide a method of tracking at least one moving object in a monitoring area including the steps of testing a plurality of interconnected sensors coupled to a control module in a motion detection network to determine if any of the plurality of sensors is activated based on movement of the at least one moving object, processing a set of sensed data, including a set of temporal data, from a first sensor to calculate a set of location coordinates associated with each of the at least one moving objects, tracking each of the at least one moving objects with a second sensor based on the set of location coordinates associated with each of the at least one moving objects, transmitting a second set of data from the second sensor to an object recognition module coupled to the control module to determine if each of the at least one moving objects is in a set of significant objects, intelligently compressing the second set of data by interpreting a set of spatial data from the set of temporal data, and transmitting the compressed second set of data from the second sensor to a user network through a communication bridge.
Another object of this invention is to provide a method of fusing data in a sensor network including the steps of approximating an initial draft of a set of fuzzy rules corresponding to a set of sensed data from a plurality of interconnected sensors coupled to a control module, mapping the initial draft of the set of fuzzy rules to a location and a curve of a set of membership functions, fine-tuning the location of the set of membership functions for optimal performance of the set of fuzzy rules using a neural network, submitting a set of training data to a fuzzy rule base and the neural network, generating a set of initial fuzzy membership functions using the neural network, submitting the set of initial fuzzy membership functions to the fuzzy rule base, generating an actual output from the fuzzy rule base, comparing the actual output with a desired output contained in the set of training data, adjusting a set of neural network weights, thereby adjusting the set of membership functions, and presenting the adjusted set of membership functions to the fuzzy rule base until a difference between the actual output and the desired output is below a predetermined minimum threshold value.
Yet another object of this invention is to provide a network including a sensor network, a gateway software agent, a host software agent, a communication bridge, and a user network. The sensor network includes a set of local area sensors, a set of middle area sensors, and a set of wide area sensors coupled to a control module. The control module receives a set of sensed data from the set of local area, the set of middle area and the set of wide area sensors, and generates a data stream based on the set of sensed data. The gateway software agent is coupled to the set of middle area sensors, intelligently filters a contextual meaning from the sensed data and determines whether the sensed data is meaningful. The host software agent is coupled to the set of wide area sensors and collects, processes and transmits the sensed data to the control module. The communication bridge is coupled to the sensor network and buffers the data stream received from the sensor network. The user network is coupled to the communication bridge, receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
Finally, another object of this invention is to provide a network including a sensor network, a communication bridge and a user network. The sensor network includes a set of local area sensors, a set of middle area sensors, and a set of wide area sensors coupled to a control module. The control module receives a set of sensed data from the set of local area, the set of middle area and the set of wide area sensors, and generates a data stream based on the set of sensed data. The communication bridge is coupled to the sensor network and buffers the data stream received from the sensor network. The user network is coupled to the communication bridge, receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge. Each of the set of local area sensors, the set of middle area sensors, and the set of wide area sensors monitors a limited region of a monitoring area, and a portion of each of the limited regions overlaps with the limited region corresponding to each of the set of local area, the set of middle area, and the set of wide area sensors.
BRIEF DESCRIPTION OF THE DRAWINGS
Preferred exemplary embodiments of the invention are illustrated in the accompanying drawings in which like reference numerals represent like parts throughout, and in which:
FIGURE 1 is a diagram of a Layered Network Model according to the prior art; FIGURE 2 is a diagram of TCP encapsulation according to the prior art; FIGURE 3 is a diagram of the Data Link Layer encapsulation according to the prior art; FIGURE 4 is a diagram of the IP layer encapsulation according to the prior art;
FIGURE 5 is a diagram of a multimedia network system according to the present invention;
FIGURE 6 is a diagram of the sensors and the Internet in the multimedia network system according to the present invention; FIGURE 7 is a diagram of the topography of a sensor network according to the present invention; FIGURE 8 is a diagram of a Gateway Software Agent (GSA) according to the present invention;
FIGURE 9 is a diagram of a network of Gateway Software Agents according to the present invention; FIGURE 10 is a flowchart of the transmitter flow of the sensor network according to the present invention;
FIGURE 11 is a flowchart of the receiver flow of the sensor network according to the present invention;
FIGURE 12 is a diagram of the hardware configuration for performing meaningful I-frame insertion according to the present invention;
FIGURE 13 is a flowchart of the error accumulation procedure in the motion estimation according to the present invention;
FIGURE 14 is a diagram of a system for encoding multiple channels of video data according to the present invention; FIGURE 15 is a diagram of the multimedia network system according to the present invention;
FIGURE 16 is a diagram of a LAN design of the multimedia network system according to the present invention;
FIGURE 17 is a diagram of a single-channel encoder with a communication interface according to the present invention;
FIGURE 18 is a diagram of a decoder with a communication interface according to the present invention;
FIGURE 19 is a diagram of the frame data content of an original frame, a compressed meaningful I-frame at low compression, a compressed I-frame at high compression in the MPEG stream, and compressed MPEG daughter frames according to the present invention;
FIGURE 20 is a diagram of a frame cycle corresponding to MPEG frames according to the prior art;
FIGURE 21 is a diagram of standard MPEG time performance according to the present invention;
FIGURE 22 is a diagram of time performance related to inserting an I-frame at the transition point between scenes according to the present invention; FIGURE 23 is a diagram of the transmission of only the meaningful I-frames at a low compression rate according to the present invention;
FIGURE 24 is a flowchart of the meaningful I-frame insertion procedure according to the present invention; FIGURE 25 is a diagram of a circular buffer according to the present invention;
FIGURE 26 is a diagram of the memory of a circular buffer according to the present invention;
FIGURE 27 is a flowchart of the encoder process implementing buffer management according to the present invention; FIGURE 28 is a flowchart of the decoder process implementing buffer management according to the present invention;
FIGURE 29 is a flowchart of the buffer write process according to the present invention;
FIGURE 30 is a flowchart of the buffer read process according to the present invention;
FIGURE 31 is a diagram of a neural network for a tracking system according to the present invention;
FIGURE 32 is a diagram of a tracking module according to the present invention; FIGURE 33 is a flowchart of the tracking process in the sensor network according to the present invention;
FIGURE 34 is a flowchart of the object identification process in the sensor network according to the present invention; and
FIGURE 35 is a diagram of the tracking of separate objects in the sensor network according to the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
MULTIMEDIA NETWORK SYSTEM
Referring to FIGURE 5, a multimedia network system 10 includes an intelligent, integrated sensor network 12 having a variety of sensors including a set of Local Area Sensors 14 (LAS), a set of Middle Area Sensors (MAS) 16, and a set of Wide Area Sensors (WAS) 18.
Sensor network 12 is coupled to a sensor fusion defect module 20, a tracking module 22, and a compression module 24. Compression module 24 includes a bit selective error correction module 26 and an intelligent agent module 28. Sensor network 12 is coupled to a bridge 30 over a communication line 32. Bridge 30 includes a QoS Manager 34 and a Buffer Manager 36. Bridge 30 connects a highly synchronous, real-time TCP/IP-packetized multimedia data stream on line 32 from sensor network 12 to an asynchronous user network 38 (e.g., the Internet) over a communication line 40.
Buffer Manager 36 intelligently and interactively changes the buffering settings of the data buffers in response to the network and process conditions, thereby minimizing latency in the network while maintaining frame-to-frame synchronization of the data. Buffer Manager 36 further minimizes the use of memory while maintaining the asynchronous communication in network 38 and over lines 32 and 40. Similarly, QoS Manager 34 reduces latency in network 38 while maintaining the quality of the full-frame multimedia data stream. In this regard, low-level implementation of TCP/IP in the present invention includes modifying the transport layer by implementing a Buffer Manager API and a QoS Manager API (as opposed to the standard Microsoft™ Winsock APIs) to optimally configure and manage the NIC transmitter (encoder) and NIC receiver (decoder) buffers on the physical layer. Accordingly, as opposed to simply modifying the application layer, the present invention operates on the layers below the application layer (e.g., the transport and physical layers) to ensure synchronous transmission of data in an asynchronous network environment. In general, bridge 30 provides full-frame, multimedia data to network 38 by synchronizing and integrating the synchronous multimedia data streams and the asynchronous network streams. In addition to real-time multimedia data, data that is transmitted on lines 32 and 40 also includes a variety of sensor data such as temporal data (video), spatial data (still imagery), audio data, sensor telemetry, infrared data, control data, messages, laser data, magnetic data, thermal data, seismic data, motion data, and chemical data. Accordingly, data transmitted by sensor network 12 to Internet 38 may comprise one-dimensional audio data or one-dimensional sensor data, as opposed to the "streaming" video and audio that is also transmitted to Internet 38. In the case of streaming video and audio, any delay in transmission is even more critical than the exact transmission of packets on lines 32 and 40. WAS 18 in network 12 is mounted on a mobile platform and configured to transmit data using wireless communication signals.
As illustrated in FIGURE 6, each of a plurality of sensors 42 in sensor network 12 is TCP/IP-addressed, thereby allowing a user at a remote location to connect to Internet server 38 and directly communicate with any of sensors 42. In one preferred embodiment of the present invention, sensors 42 are high-resolution cameras that can be placed along the U.S./Mexican border, and a government agent sitting in an office in Washington D.C. can connect to Internet server 38 and select any of sensors 42 to view real-time video of activity being recorded by cameras 42.
FIGURE 7 illustrates the hierarchic, "molecular" sensor network architecture of sensor network 12. Network 12 includes LAS sensors 14, MAS sensors 16 and WAS sensors 18, wherein a communication link 44 connects WAS sensors 18 together, a communication link 46 connects WAS sensors 18 to MAS sensors 16, a communication link 48 connects MAS sensors 16 to LAS sensors 14, and a communication link 54 connects WAS sensors 18 to a control module 56. An emergency communication link 50 connects LAS sensors 14 to each other and an emergency communication link 52 connects MAS sensors 16 to each other.
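Because each sensor 42 carries its own TCP/IP address, reaching a chosen sensor from a remote location amounts to opening an ordinary socket to it. A minimal sketch, assuming a sensor that serves its homogenized stream over a plain TCP port; the address, port number, and request string are illustrative placeholders rather than part of the disclosed system.

```python
import socket

SENSOR_ADDR = ("203.0.113.42", 5000)      # illustrative sensor IP address and port

def read_sensor_stream(addr, request=b"GET /stream\n", chunk=4096):
    """Connect to a TCP/IP-addressed sensor and yield raw stream bytes."""
    with socket.create_connection(addr, timeout=5.0) as sock:
        sock.sendall(request)              # hypothetical request understood by the sensor
        while True:
            data = sock.recv(chunk)
            if not data:                   # sensor closed the connection
                break
            yield data

# for packet in read_sensor_stream(SENSOR_ADDR):
#     feed_to_decoder(packet)              # hypothetical downstream handler
```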
In the preferred embodiment of the present invention, communication links 44, 46, 48, 50 and 52 are wireless communication links and communication link 54 is a satellite relay to control module 56. LAS sensors 14 and MAS sensors 16 each have omnidirectional antennae, thereby simplifying network reconfiguration and sensor relocation. MAS sensors 16 and WAS sensors 18 include stand-alone, highly- distributed 8 billion operations per second (BOPS) supercomputer-path processing power for the reduction of intra- and inter-network communication bandwidth, information overflow and for supporting redundancy and the self-healing capability of network 12. Compression module 24 (FIGURE 5) includes specific graphic ICs that perform simple arithmetic operations on 256 parallel processors at 8 BOPS performing application-specific integrated circuit (ASIC) operations.
All sensor data in sensor network 12, including video, is homogenized into TCP/IP-packetized streams, thereby significantly simplifying the real-time multimedia data transfer over network 38 (e.g., the Internet). The homogenization, or fusing of sensor data, is the joining together of data from a variety of different sensors (e.g., a sensor pair fusing of vision and GPS for autonomous navigation). The sensor fusion of data is described in detail in the invited paper "Soft Computing and Soft Communication For Synchronized Data," the Society of Photo-Optical Instrumentation Engineers Proceedings, Vol. 3812, pp. 55-67 (1999), incorporated herein by reference. Moreover, as illustrated by communication links 44, 46, 48, 50 and 52, sensor network 12 is a highly redundant system with a self-healing and configurable architecture that is extremely tolerant to jamming, multi-path errors, and sensor failure, with electromagnetic interference immunity in a frequency hopping spread spectrum wireless or fiber sensor communication network.
Sensor network 12 operates in a nearly-autonomous mode based on specialized compression in compression module 24 implementing bit selective error correction 26 and using intelligent agent module 28. In effect, sensor network 12 is insensitive to the physical communication layer because of the multiple options for communication paths between any two nodes on network 12 and the choice of a data routing network protocol that is redundant. In an alternative embodiment of the present invention, sensor data is optimized for a non-TCP protocol (e.g., TCP/IP with elements of ATM packetizing, UDP, etc.).
The high level of intranetworking (e.g., QoS low-latency information transmission and processing) is based on compression module 24 in which all types of multimedia data (including TV-grade video) are compressed into packetized digital streams up to 4000:1 (VGA) and negotiated through a Gateway Software Agent 58 (FIGURE 8). This intelligent, high compression ratio further allows data rates to be as low as a few kilobits per second - even for imagery. For even further data protection, bit selective error correction module 26 performs bit correction at the TCP level as well as further hierarchic lossless and lossy data compression. The minimum latency (100 ms or lower) is based on a low transmission bandwidth and bit-selective error correction, thereby reducing buffering. Sensor network 12 is autonomously adaptable and tolerant to sensor failure, insofar that network 12 relies on progressive accuracy and automatic sensor network reconfigurability to compensate for sensor failure. Sensor fusion defect module 20 processes any "requests for help" from sensor network 12 in the near-autonomous mode. In the preferred embodiment of the present invention, sensor network 12 is autonomous 90% of the time and only at critical moments will sensor fusion defect module 20 alert an operator for assistance in the form of 3-D teleoperation at WAS sensor 18.
In particular, the "request for help" from sensor network 12 that is processed by sensor fusion defect module 20 is in response to contradictory data acquired from stationary and/or mobile sensors. For example, a GPS system transmits satellite data of visual landmarks from a vertical point of view and this data is translated to a horizontal view using different techniques such as template matching. If the satellite data contradicts data being transmitted in network 12 from other sensors relating to the same visual landmarks, sensor fusion defect module processes a "request for help" to resolve the contradictory information.
Additionally, sensor network 12's 8 BOPS of processing power eliminates the power constraints of conventional small communication platforms, thereby making sensor network 12 data-centric instead of node-centric. Sensor network 12 is only limited by the fact that each sensor has a limited (but overlapping) view of a scene or a limited (but overlapping) monitoring area. In this sense, each sensor in network 12 synthesizes a partial scene based on its own data and that of its two closest neighbors. These data overlaps create redundancy that, in turn, increases the robustness of sensor network 12.
A. SENSOR NETWORK ARCHITECTURE
As introduced in reference to FIGURES 5 and 7, intelligent sensor network 12 includes the following categories of sensors interconnected through various communication links:
1. "Local" or "Point" Sensors (LAS) 14 such as magnetic, simple seismic, simple chemical, temperature, and wind sensors that detect locally (e.g., reading 1 m/s and 10Hz vibration), etc.;
2. "1-D" or "Middle Area Sensors" (MAS) 16 such as voice, spectrometers, x-rays, and complex chemical sensors that are non-imaging sensors of characteristics such as intensity vs. wavelength; and
3. "2-D" or "Wide Area Sensors" (WAS) 18 such as video, forward-looking infrared, imaging radar, and complex seismic sensors that include 2-D imagery, 3-D volumetric renderings, as well as 4-D video (3-D spectral video sequences) and higher dimensionality (e.g., hyperspectral video).
In the preferred embodiment of the present invention, sensor fusion of data from a large number of heterogeneous sensors 14, 16 and 18 in sensor network 12 is based on neuro-fuzzy processing. Fuzzy rules are generally derived based on human experts' knowledge and observation. This process of deriving the fuzzy rules is labor intensive and it may be impossible to retain the accuracy of the fuzzy rules as the number of sensors increases and new types of sensors are added. On the contrary, sensor network 12 is preferably a system based on neuro-fuzzy processing that relies on human experts approximating initial drafts of the fuzzy rules and then mapping the rules to the location and curve of the membership functions. The neural network in network 12 fine-tunes the location of the membership functions to ensure the optimal performance of the fuzzy rules.
In particular, training data is submitted to both the rule base and the neural network. The neural network presents the initial fuzzy membership functions to the fuzzy rule base. Using these functions, the fuzzy rule base generates an actual output that is compared with the desired output contained in the training data. The training algorithm changes the neural network weights, thus adjusting the membership functions. The new functions are then presented to the rule base, and the process repeats until the difference between the actual and the desired outputs is minimized. Additional sets of training data are iteratively applied until the final membership function parameters, and therefore the membership function shapes, converge to final values. Based on the definition of the membership functions, the operation of the fuzzy rule base in the present invention closely mimics the operation represented by the training data.
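A minimal sketch of this tuning loop, assuming the fuzzy rule base is exposed as a callable rule_base(params, x) whose membership-function locations are collected in a NumPy parameter vector; the finite-difference update below merely stands in for the neural-network weight adjustment described above, and every name is illustrative.

```python
import numpy as np

def tune_membership_functions(rule_base, params, training_data,
                              lr=0.05, tol=1e-3, max_iter=1000):
    """Adjust membership-function parameters until the fuzzy rule base
    reproduces the desired outputs in the training data.
    rule_base(params, x) -> actual output; params is a NumPy vector."""
    eps = 1e-3
    for _ in range(max_iter):
        sq_error = 0.0
        for x, desired in training_data:
            actual = rule_base(params, x)
            sq_error += (desired - actual) ** 2
            # finite-difference surrogate for the neural-network weight update
            grad = np.array([(rule_base(params + dp, x) - actual) / eps
                             for dp in np.eye(len(params)) * eps])
            params = params + lr * (desired - actual) * grad
        if sq_error / len(training_data) < tol:
            break          # actual and desired outputs agree closely enough
    return params          # converged membership-function parameters
```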
The 4000:1 compression ratio realized with compression module 24, and consequent bandwidth reduction, supports a highly redundant architecture as described above. This level of compression opens up additional channels, thereby offering sensor network 12 additional communication channels for situations that require a rerouting of the data or additional bandwidth. For example, if one of sensors 14, 16 or 18 is broken, open channels provide redundant communication paths for the data to bypass the broken sensor.
The natural network hierarchy of sensor network 12 is derived from the fact that the hierarchic level associated with each LAS sensor 14, MAS sensor 16 and
WAS sensor 18 defines its communication platform level. For example, LAS sensor 14 does not require digitization (or it can use low bandwidth digitization), thereby transmitting analog data to MAS sensor 16 over standard analog RF channels that require a few cubic centimeters of hardware and drawing around 0.1 W of power. These types of sensors are primarily in "sleep" mode and are activated by a "wakeup" signal from MAS sensor 16. In an alternative embodiment of the present invention, LAS sensor 14 includes sophisticated analog video.
In order to reduce bandwidth, MAS sensor 16 requires not only digitization but Gateway Software Agents 58 as illustrated in FIGURE 8. Gateway Software Agents 58 are self-organized fuzzy controllers that operate with a million operations per second (MOPS) processing power. Gateway Software Agent 58 includes an analog-to-digital converter 60, a template matching block 62 coupled to a filter bank 64, a decision making block 66 and a communication interface (CI) 68. In the preferred embodiment of the present invention, communication interface 68 is a "Harris prism" or some other common CI. In the alternative, template matching block 62 may be replaced with a similar target/signature recognition system. Template matching block 62 receives digitized input data from converter 60 and digitally cross-correlates the sample signal with a set of filter signals from filter bank 64, thereby generating a correlation peak D. In decision making block 66, the magnitude of D is compared with a predefined threshold value T. In the alternative, the magnitude of D is compared with multiple threshold values in an advanced fuzzy logic system. If the correlation peak D is greater than or equal to the threshold value T, then a positive decision is sent to communication interface 68 and transmitted to WAS sensor 18. The output signal is organized as a simple fuzzy sentence logic such as "audio data indicates human voice making it highly probable that a person is passing no farther than 100 m from the border". MAS sensor 16 also collects and digitizes analog signals from LAS sensors 14 and formulates the output in a fuzzy sentence as described above.
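A minimal sketch of the decision performed in blocks 62 and 66, assuming one-dimensional digitized samples and templates held as NumPy arrays; the normalization, the function names, and the wording of the fuzzy sentence are illustrative assumptions, not the disclosed implementation.

```python
import numpy as np

def correlation_peak(sample, filter_bank):
    """Cross-correlate the digitized sample with every template in the
    filter bank and return the largest correlation peak D."""
    s = (sample - sample.mean()) / (sample.std() + 1e-9)
    peaks = []
    for template in filter_bank:
        t = (template - template.mean()) / (template.std() + 1e-9)
        peaks.append(np.correlate(s, t, mode="valid").max() / len(t))
    return max(peaks)

def gateway_decision(sample, filter_bank, threshold):
    """Block 66: forward a fuzzy-sentence report only if D >= T."""
    d = correlation_peak(sample, filter_bank)
    if d >= threshold:
        return f"sensor data matches a stored signature (D={d:.2f} >= T={threshold})"
    return None            # not meaningful: nothing is forwarded to WAS sensor 18
```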
Therefore, the molecular architecture of sensor network 12 makes information filtering and transfer hierarchic. Moreover, sensor information filtering is performed intelligently — meaning within the "contextual meaning" of the information, thereby significantly reducing bandwidth because a fuzzy sentence output requires only a few bits to transmit. On the contrary, digitized input data requires at least a few kilobits of bandwidth.
WAS sensor 18 communication, however, is very complex as compared to LAS sensor 14 and MAS sensor 16 processing. As discussed above, WAS sensor 18 requires 8 BOPS of processing power or even higher because the full frames with Mbits or Gbits of data must be evaluated in real-time to maintain real-time operation. In the preferred embodiment of the present invention, each WAS sensor 18 includes a processor with 8 BOPS implanted on a 3 inch x 2 inch x 0.5 inch printed circuit board. The complexity of processing power required for WAS sensor 18 is derived from template matching in 2-D space and processing a full-image frame of approximately 7 Mb of data in a fraction of a second. Each WAS sensor 18 collects data from MAS sensor 16 and transmits the data over communication links 54 to control module 56.
B. INTELLIGENT COMPRESSION
Sensor network 12 reduces bandwidth using compression module 24 by implementing simple compression relying on redundancy or intelligent compression based on intelligent or software agents similar to Gateway Software Agent 58.
Standard MPEG compression is based on a simple subtraction operation in that the MPEG method includes a mother frame and daughter frames that represent a pixel-by-pixel difference between the mother frame and a subsequent frame. As discussed above, other prior art compression standards also eliminate redundancy in a similar manner, but these methods do not filter out important information from the frames. Performing a simple operation like subtraction at high speeds leads to operating with parallel systems. On the other hand, if a network is processing different operations, the network will suffer from internal latency problems. In the preferred embodiment of the present invention, however, compression module 24 includes significant processing power to calculate a simple subtraction operation numerous times. This processing power combined with the simple subtraction processing results in network system 10 evaluating a full frame of data within 3 frames latency (90 ms). Therefore, any latency issues in network system 10 are primarily based on operating with parallel systems.
Similarly, in an alternative embodiment of the present invention, compression module 24 of sensor network 12 includes a different compression algorithm such as template matching. Template matching is based on pattern recognition techniques that include template-by-template (or pixel-by-pixel) comparisons to stored data in a database. Sensor network 12 is configured to implement template matching based on a simple comparison of data, similar to the simple subtraction in MPEG. In standard networks, however, template matching is an extremely slow process, thereby motivating users to turn to Fourier processing. Unfortunately, Fourier processing is complex and not invariant to off-plane movement (rotation).
Gateway Software Agent 58 in FIGURE 8 implements template matching of the characteristics of multidimensional distributions. As explained above, Gateway Software Agent 58 is simply a "gateway" in that it only decides whether the data information is meaningful or not. If the information is meaningful, then agent 58 transmits the information to WAS sensor 18, otherwise the data is not transmitted. Considerably more complex than Gateway Software Agent 58 is a Host Software Agent 70 illustrated in FIGURE 9 that collects, processes and transmits visual information in the form of video, imagery, radar imagery and other sensed data, in addition to data from MAS sensor 16. Host Software Agent 70 collects, processes and transmits this data using a concept called progressive accuracy — meaning that the visual information transmission is organized in such a way that only an "information sample" is transmitted to control module 56.
Thereafter, Host Software Agent 70 negotiates with control module 56 to determine which subset of more complete information should be transmitted. Even at a compression ratio of 4000:1, the original VGA bandwidth of 221 Mbps in sensor network 12 is not always reduced to a level that avoids nodal overload. Therefore, the implementation of progressive accuracy in sensor network 12 guards against nodal overload by selectively transmitting a "first cut" of critical data, quickly followed by more detailed information as required by control module 56. In the preferred embodiment of the present invention, data from Host Software
Agent 70 to control module 56 is packetized in the transport layer using TCP/IP. In the alternative, the data is packetized using a combination of TCP/IP and ATM. Even though ATM includes fast flow control, hard QoS (as opposed to QoS emulation) and high voice quality, ATM also has a fixed cell size, a significant amount of operating system interrupts and high computational overhead. The preferred embodiment of packetizing using TCP/IP is based on variable-size packets, a 6:1 decrease in operating system interrupts as compared to ATM, cheap Ethernet, and efficient trunk packets. On the other hand, "standard" implementation of TCP/IP includes very slow flow control and no QoS provisions. Therefore, in implementing TCP/IP in the present application, sensor network 12 also includes bridge 30 supporting QoS Manager 34 and Buffer Manager 36.
As illustrated in FIGURE 9, LAS sensors 14 are coupled to MAS sensors 16 via communication links 48. LAS sensors 14 are also coupled to each other via communication links 50. MAS sensors 16 are coupled to WAS sensors 18 via communication links 46 and to each other via communication links 52. MAS sensor 16 is also coupled to Gateway Software Agent 58. WAS sensors 18 include a flash memory 72 and are coupled to Host Software Agent 70 and control module 56. A visualization module 74, a graphic overlay module 76 and a memory 78 are coupled to Host Software Agent 70, and visualization module 74 is also coupled to control module 56. Host Software Agent 70 also transmits intelligent, filtered, sensor information within the "contextual means" of the information, thereby reducing transmission bandwidth, in addition to applying progressive accuracy to transmit a "first cut" of the critical data.
Returning to the U.S./Mexican border example, a sudden influx of people trying to cross the border would quickly overload sensor network 12 if network 12 did not apply a "first cut" or progressive accuracy to the transmission of data. Therefore, in anticipation of such an overload, sensor network 12 is configured to identify and transmit to control module 56 an initial sampling of critical information based on processing by network 12. As discussed in relation to the way in which human vision works, this approach models the way in which a person first recognizes the contours and edges of an object, followed by a more detailed analysis of the entire scene. Alternatively, there are situations in which it is desirable to apply progressive accuracy in sensor network 12 to transmit information at low bandwidths (e.g., 16 kbps). As a comparison, the 4000:1 MPEG compression ratio discussed below and realized by compression module 24 (based on compressing from VGA 221 Mbps divided by 4,000 = 56 kbps and rounding up to 64 kbps for overhead) is a sufficient bandwidth for ADSL. In order to maintain 16 kbps, however, the compression ratio will have to be 16,000:1 (64/4 = 16). At this compression ratio, however, the image is significantly reduced in quality.
As an alternative to compressing the entire image at 16,000:1 for 16 kbps, network 10 defines and isolates a window of opportunity around a moving object, relying on 8 BOPS of processing power in network 10. In the preferred embodiment of the present invention, the object in the window is assigned the majority of the available bandwidth while the background is assigned a very small percentage (but not zero) of the available bandwidth. A user at control module 56 watches the defined window with the target object and determines whether the particular image requires further analysis. If the user requires additional data, a video clip from network system 10 is sent to control module 56. The user views either the original compressed video clip or a still image that has been compressed at a lower compression ratio. The window associated with a budget bandwidth (communication) of 16 kbps
is defined by B0/(k · CW), wherein B0 is the uncompressed VGA frame (7.3 Mbits), k = (CD/CI), wherein CD is the average compression ratio of an MPEG stream, CI is the compression ratio of only the I-frames in this synchronized I-frame MPEG cycle, and CW is the window compression. Furthermore, the compressed bandwidth for the background is defined by B0(k − 1)/(k · CB), wherein CB is the background compression. The frame bandwidth (CT) is equal to the compressed bandwidth for the window plus the compressed bandwidth for the background:

B0/(k · CW) + B0(k − 1)/(k · CB) = CT

Simplifying this equation:

1/CW + (k − 1)/CB = (CT · k)/B0
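A minimal numeric sketch of this budget, evaluating the relation above and echoing the spatial window/background split used in the example that follows; the constants are taken from the surrounding text and the helper name is illustrative.

```python
B0 = 7.3e6                     # uncompressed VGA frame, bits

def frame_bandwidth(k, cw, cb, b0=B0):
    """CT = B0/(k*CW) + B0*(k-1)/(k*CB), the relation given above."""
    return b0 / (k * cw) + b0 * (k - 1) / (k * cb)

# Spatial split from the example below: a 1/64 window at ~1000:1 with the
# remaining background at ~11,600:1 comes to roughly 0.73 kbits per frame,
# i.e. about 16 kbps at 22 frames per second.
window_bits = (B0 / 64) / 1000
background_bits = (B0 * 63 / 64) / 11600
print(window_bits + background_bits)           # ~734 bits per frame
print((window_bits + background_bits) * 22)    # ~16,100 bits/s, about 16 kbps
```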
In the example of transmitting critical high-resolution video to control module 56 with an available bandwidth of 16 kbps at 22 fps, the frame bit volume must be 0.73 kbits, which is equivalent to 10,000:1 compression. Obviously, this 10,000:1 compression ratio is a prohibitively high compression ratio for compression module 24.
Operating with this constraint, compression module 24 defines a window of opportunity that is 1/64 of the total area (640 x 480). Using the progressive accuracy model, the 1/64 window is compressed with a relatively low compression ratio (e.g., 1000:1) and the balance of the image is compressed with a high compression ratio (e.g., 11,600:1) just to approximate the background of the image. In the second step of the progressive accuracy, assuming that the window of opportunity defines an area of specific interest to the user, a full VGA video clip is transmitted to control module 56. Alternatively, still images are transmitted to control module 56 using one of the formats outlined below:
[Table of still-image format options]
Using the hypercompression method described below, the still image includes high resolution details not likely to be included in the compressed video clip. On the contrary, still images of video clips in the prior art network systems are not representative of meaningful data because the insertion of the I-frame occurs every 15 frames in accordance with MPEG standards independent of the context of the frames instead of in response to scene changes.
In particular, the still images from the video stream from sensor network 12 are meaningful because the I-frames are intelligently inserted as needed corresponding to the beginning of a scene, and therefore represent the full temporal event based on constantly monitoring an accumulated error that is derived from an actual error between corresponding blocks of a current frame and a predicted frame corresponding to the current frame.
Therefore, sensor network 12 and compression module 24 interpret meaningful spatial data (still imagery) from the temporal data (video) without resorting to an entirely different image compression method of the spatial data (e.g., JPEG), that would require additional processing resources. Moreover, in the context of progressive accuracy and a window of opportunity, compression module 24 simultaneously compresses in real-time the video data and the imagery at different compression ratios. As discussed above, the imagery is compressed at a relatively low compression ratio to preserve the image quality as much as possible.
Again, contrary to state-of-the-art network systems, a user views either the video or the high resolution still image, extracted from the video stream in sensor network 12, or both. The still image is meaningful because compression module 24 encodes the I-frames based on changes in the scenes. This encoding method results in a still image that represents the entire scene. Both the insertion of the meaningful I- frame and the integration of video and imagery are discussed in detail below. Additionally, U.S. Application Nos. 09/617,621 filed July 17, 2000, 09/136,624 filed August 19, 1998, and 08/901,832 filed July 28, 1997, incorporated herein by reference, fully disclose data compression relying on meaningful I-frames to reconstruct a scene from a video clip.
In the preferred embodiment of the current invention, the temporal and spatial data are transmitted to control module 56 in real-time. In an alternative embodiment, sensor network 12 stores the temporal and/or spatial data in flash memory 72, thereby providing the user with transmission data "off-line". A 2 GB flash memory can store approximately 1 minute of uncompressed data (40 fps), but implementing even 2000:1 compression results in being able to store 2000 minutes of data (approximately 33 hours of video).
C. SELF-HEALING ARCHITECTURE
The survivability of a network is defined as the ratio of the number of users a random user can communicate with after a given component failure to the number prior to the failure. Since sensor network 12 is highly redundant, the network survivability coefficient is close to unity. Sensor network 12 is self-healing insofar that emergency communication lines 50 between LAS sensors 14 and emergency communication lines 52 between MAS sensors 16 will replace communication lines 44, 46 and 48. In this regard, the network routing protocol is based on a "mapping" function or lookup table.
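Written out, with N denoting the number of users a randomly chosen user can reach, the coefficient just described is:

$$ S = \frac{N_{\text{after failure}}}{N_{\text{before failure}}}, $$

which stays near 1 when redundant links preserve connectivity.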
FIGURE 10 illustrates the transmitter flow of the self-healing process of sensor network 12. In a first step 80, data is acquired and packetized in a step 82. Sensor network 12 then establishes a communication channel with a receiving node in a step 84. A data channel is requested from the receiving node in a step 86 and a step 88 tests if an acknowledgment signal is received. If an acknowledgment signal is not received in step 88, the process continues to a step 90 to test if the process timed out. If the process is not timed out yet, control passes back to step 88 to continuously test whether the acknowledgment signal has been received. If, however, the process is timed out in step 90, another receiving node is selected in a step 92 and control passes back to step 84 to establish a connection with a receiving node.
After the acknowledgment signal is received in step 88, sensor network 12 determines in a step 94 if there is sufficient bandwidth available for transmission of the data. If sufficient bandwidth is not available, control passes back to step 84 to establish a communication channel with a receiving node. On the other hand, if sufficient bandwidth is available, the data header is sent in a step 96 and the data is sent to the destination node in a step 98.
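A minimal sketch of this transmitter flow, written against hypothetical sensor and node objects; every method name below is a placeholder standing in for the corresponding step of FIGURE 10 rather than an actual interface of the system.

```python
import time

def transmit(sensor, nodes, timeout=2.0):
    """Transmitter flow of FIGURE 10 (steps 80-98) against duck-typed
    `sensor` (data source) and `node` (receiving node) objects."""
    data = sensor.acquire()                            # step 80: acquire data
    packets = sensor.packetize(data)                   # step 82: packetize
    pending = list(nodes)
    while pending:
        node = pending.pop(0)
        channel = node.establish_channel()             # step 84
        channel.request_data_channel(len(packets))     # step 86
        deadline = time.time() + timeout
        while not channel.acknowledged():              # step 88: wait for ACK
            if time.time() > deadline:                 # step 90: timed out,
                channel = None                         # step 92: pick another node
                break
        if channel is None:
            continue
        if channel.available_bandwidth() < len(packets):   # step 94
            pending.append(node)                       # renegotiate a channel later
            continue
        channel.send_header(packets)                   # step 96
        channel.send_data(packets)                     # step 98
        return True
    return False                                       # no node could accept the data
```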
Similar to the transmitter flow process described above, FIGURE 11 illustrates the receiver flow of the self-healing sensor network 12. After data is received in a step 100, the process tests if the data was requested in a step 102. The process does not proceed past step 102 until the data matches the requested data. In a step 104, after the requested data is received, the communication bandwidth is checked for sending new data. Thereafter, the communication bandwidth limit is communicated to the requesting node in a step 106. In a step 108, the process waits for the data header and in a step 110 the data is received and buffered. The received data is then multiplexed with the acquired sensor data into a transport stream in a step 111. Finally, in a step 112, the data is transmitted to the destination node.
INTEGRATED TEMPORAL AND SPATIAL DATA
Currently, the most effective still image compression techniques, such as JPEG and wavelet, are derived from linear base decomposition. In fact, by using a Fourier orthogonal cosine basis (as in JPEG) or the Fourier-related bi-orthogonal basis (as in wavelet), a linear superposition principle is applied. The Fourier approach is perhaps the most common in signal processing and compression. These still image compression methods are based on Fourier-like linear and quasi-orthogonal bases that are infinite in principle but truncated in practice. In the particular case of object edges, truncation creates image disturbances and artifacts, which cause serious image degeneration.
Quite surprisingly, however, the novel approach to signal processing and compression disclosed in U.S. Application Nos. 09/136,624 filed August 19, 1998 and 08/901,832 filed July 28, 1997, incorporated herein by reference, is at least equally effective as the Fourier approach. The compression disclosed in these previous patent applications is based on Newtonian polynomial space ("hypercompression"). While this space is highly nonlinear, it nevertheless creates a very effective base for discretization of manifolds, in both non-singular and singular cases. In other words, by transforming a continuum (e.g., a manifold) into a discrete data set, the information content is significantly reduced, thereby leading to extremely high compression ratios (CR) for both still images and video. After polynomial data reduction, data is obtained from the image continuum, leading to datery (e.g., imagery in data form) with minimal information loss.
The hypercompression approach described in the present invention is based on soft computing technologies. Soft computing differs from conventional (hard) computing in that, unlike hard computing, it is tolerant of imperfections, uncertainty, and partial truth. Soft computing attempts to solve problems that are inherently well solved by people, but not well suited for classical algorithmic solution. The basic processing tools of soft computing include fuzzy logic, neural networks, genetic algorithms, and simulated annealing. Combining these tools or merging them into combinations such as fuzzy neural networks, genetically tuned neural networks, or fuzzified neural networks makes them more flexible and increases their efficiency. These soft computing techniques are efficient in search and optimization problems, especially when the search space is large, multidimensional, and not fully characterized. Standard optimization and search techniques such as steepest descent and dynamic programming fail under the same conditions. As discussed above, the preferred embodiment of the present invention relies on soft computing techniques (e.g., neuro-fuzzy processing) to solve the problem of fusing together the data from heterogeneous sensors 14, 16 and 18 in sensor network 12. Furthermore, the present invention relies on soft computing techniques (e.g., genetic algorithms) for the intelligent video and still image compression discussed below.
A. SINGLE CHANNEL I-FRAME INSERTION
The single channel "meaningful I-frame" hypercompression method is described in copending U.S. Patent Application Nos. 09/617,621 filed July 17, 2000, 08/901,832 filed July 28, 1997 and 09/136,624 filed August 19, 1998 (incorporated by reference). These applications disclose the insertion of I-frames into the video bitstream based on video content. The error or difference between all corresponding microblocks or segments of the current frame and the predicted frame are accumulated and compared to a threshold to determine whether the next subsequent frame sent should be an I-frame. If the error or difference is large (corresponding to high motion error), the I-frame is sent. If the error or difference is small, the I-frame is not sent and the frame sequence is unaltered.
Thus, full synchronization of the I-frame insertion with changes in the scene is achieved and bandwidth is significantly reduced because the I-frames are only inserted when necessary. In other words, the decision whether to insert an I-frame is based on analyzing the errors between the I-frame and the B and P frames into which it will be inserted. The differences are transmitted to a decoder. Motion estimation systems also "skip" a number of frames (intermediate frames) that can be readily estimated because they typically include relatively few motion changes from the previous frame.
Referring to FIGURE 12, the hardware for performing I-frame insertion is depicted in block format including a host computer 114 communicating with a video processor board 116 over a PCI bus 118. Host computer 114 is preferably a 500 MHz Pentium® PC (Pentium® is a registered trademark of Intel Corporation of Santa Clara, California). A PCI bus controller 120 controls communications over PCI bus 118. A memory (EPROM) 122 stores and transfers the compression coefficients to PCI bus controller 120 so all of the internal registers of PCI bus controller 120 are set upon start-up.
An input video processor 124 is a standard input processor responsible for scaling and dropping pixels from a frame. Input video processor 124 includes a standard composite NTSC signal input 126 and a high-resolution Y/C signal input 128 having separated luminance and chrominance signals to prevent contamination. Input video processor 124 scales the 720x480 resolution of the NTSC input 126 to the standard MPEG-1 resolution of 352x240. Video processor 124 further includes an A/D converter (not shown) to convert the input signals from analog to a digital output.
An audio input processor 130 includes a left input stereo signal 132 and a right input stereo signal 134 that are converted using an A/D converter (not shown). The output from audio input processor 130 is input into a digital signal processor (DSP) audio compression chip 132. The output from DSP 132 is input into PCI bus controller 120 that sends the compressed audio onto PCI bus 118 for communication with host computer 114.
Similarly, the output of input video processor 124 is input into an ASIC (Application Specific Integrated Circuit) 134 including a DCT-based compression chip 136 and a motion estimator chip 138. ASIC 134 is responsible for signal transport, buffering and formatting of the video data from input video processor 124 and controls both the DCT-based compression chip 136 and motion estimator chip 138. Output of ASIC 134, compression chip 136 and motion chip 138 are fed to PCI bus controller 120 for sending the compressed video on PCI bus 118 for communication with host computer 114. The compressed video stream from video processor board 116 undergoes lossless compression in host computer 114 using standard lossless compression techniques including statistical encoding and run-length coding. After lossless compression, the video and audio are multiplexed using standard methods into a standard video signal. The packets containing the video and audio are interleaved into a single bit stream with proper labeling for synchronization during playback.
The errors that are calculated in motion estimator 138 between the current frame and the predicted third subsequent frame are transmitted to host computer 114 over PCI bus 118 for transmission to the decoder (not shown) to recreate the current frame using that error or difference signal and the motion vectors generated during motion estimation.
In accordance with the motion estimation described in copending U.S. Patent Application Nos. 08/901,832 filed July 28, 1997 and 09/136,624 filed August 19, 1998, the errors between the current frame and the predicted third subsequent frame are accumulated in host computer 114.
Referring to FIGURE 13, the error accumulation in the motion estimation procedure includes reading the error buffer in the compression processor 136 through PCI bus 118 at a step 140. Thereafter, in a step 142, that error is accumulated in an error buffer created in software in host computer 114 so that the accumulated error will equal the preexisting error plus the present error. At a step 144, the accumulated error is compared to a threshold error. If the accumulated error is larger than the threshold error, then a new I-frame is sent in a step 146 and the error buffer in the compression processor 136 is not read again for that particular frame.
On the other hand, if the accumulated error is not greater than the threshold error, the process loops back up to a step 148 to choose the next subsequent microblock in that particular frame. If there is a subsequent microblock in that frame, then the process continues to step 140 to read the error buffer in compression processor 136. This error is accumulated in the accumulated buffer at step 142 and compared to the threshold at step 144. This iterative looping continues until the accumulated error exceeds the threshold, at which point it is no longer necessary to test any more microblocks for that particular frame because the accumulated error became so high that host computer 114 determined that a new I-frame should be sent to restart the motion sequence.
If, on the other hand, the accumulated error for all of the microblocks of an entire frame never exceeds the threshold, then at a step 150 the standard MPEG compression process will continue without modifications (e.g., the next B or P frame will be grabbed and compressed).
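A minimal sketch of this accumulated-error decision follows. It is an illustration only, not the patented implementation; the function and variable names are assumptions, and the per-microblock errors and the content-dependent threshold would come from the compression hardware in practice.

```python
# Sketch: accumulated-error I-frame insertion for a single channel.
# Block errors are summed per frame and compared to a threshold.

def choose_frame_type(block_errors, threshold):
    """Return 'I' as soon as the running error total exceeds the threshold;
    otherwise keep the standard predicted (B/P) frame."""
    accumulated = 0.0
    for err in block_errors:          # one entry per microblock (step 140)
        accumulated += err            # step 142: accumulate the error
        if accumulated > threshold:   # step 144: compare to threshold
            return "I"                # step 146: send a new I-frame
    return "P"                        # step 150: continue standard MPEG


# Example: a quiet scene keeps predicting; a scene change triggers an I-frame.
print(choose_frame_type([0.1, 0.2, 0.1], threshold=5.0))   # -> 'P'
print(choose_frame_type([2.0, 2.5, 1.0], threshold=5.0))   # -> 'I'
```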
B. MULTI-CHANNEL I-FRAME INSERTION
As previously disclosed in copending U.S. Application No. 08/901,832 filed July 28, 1997, stereoscopic data consists of multiple channels of input that typically include two sources looking at an object from two points of view. Clearly there is a significant amount of redundant information between the two sources. Another common application of multi-channels is capturing video in a "look-around" environment where multiple cameras are utilized to look at a range of scenery or a designated object, with each camera accounting for one channel of data representing a particular view of the designated object or scene (e.g., from a variety of angles). It is desirable to coordinate the multiple sources such that the redundant information between the sources does not have to be encoded and transmitted to re-create the entire video of each source, thereby maximizing throughput of data and conserving bandwidth. Similarly, a single camera may be used to look at a spectral image whereby the signal obtained is divided into separate channels based upon narrow bandwidth windows using filters. When looking at such images, hundreds of channels can be realized within a few nanometers. Notably, the image data in each such channel contains a tremendous amount of correlated data vis-a-vis adjacent channels, each channel corresponding to a slightly different bandwidth. It is very inefficient to transmit full video content of each of these channels.
In a related example, data captured by a single source at different times may have a significant amount of correlated data, as may be the case when using a video phone from a particular environment to send information over the Internet. For example, if the user transmits a video phone message over the Internet on a subsequent day from the same place as the previous day, much of the surrounding information will stay the same, and only certain aspects of the transmission will change. Due to the amount of similar data from each of the transmissions, it is inefficient to encode and transmit all the information contained in each message.
FIGURE 14 illustrates a system 152 for encoding multiple channels of video data according to the present invention. System 152 includes an encoder 154 with a series of multiple inputs 156 for receiving video data signals S1, S2, ..., SN from multiple sources or channels 158. In conjunction with firmware 160, encoder 154 processes the video data signals input from channels 158 in groups comprising a predetermined number of frames from each channel. Firmware 160 is preferably artificial intelligence (AI) fuzzy logic software that controls the encoding process, including determining when I-frames should be inserted. The AI/fuzzy logic software achieves high throughput, and consequently higher resolution of the video signals. Encoder 154 further includes compression software 162 for further compressing particular portions of the encoded video data in accordance with standard video compression techniques, such as MPEG intra-frame video data compression. This additional level of data compression enhances efficient use of available bandwidth without sacrificing video quality. Intelligent and interactive buffer management of a buffer 164 in encoder 154 and a parallel buffer 166 in a decoder 168 forms a bridge between the highly synchronized real-time video stream and an asynchronous network 170. In certain applications, after encoding at least a portion of the video data from each channel 158, encoder 154 transmits the resultant signals, in an appropriate sequence, to receiver/decoder 168 that includes firmware 172 to recreate the video images. After reconstructing the video images of each channel 158 based on the encoded transmitted signals, decoder 168 transmits the decoded signals to a display unit 174.
The number of skipped frames is typically dependent upon the type of video being compressed such that for a high action video, wherein the differences between successive frames of each channel are relatively large, fewer frames should be skipped because there is a higher risk that significant data may be lost, which may compromise video quality. Predicted "B" frames for each channel correspond to the number of skipped frames, the B frames being filler frames for the "skipped" frames. An intra-frame difference exceeding a threshold can trigger an I-frame, as can an initial inter-frame difference or a predicted inter-frame difference.
Encoder/transmitter 154 of FIGURE 14 preferably uses parallel processing such that while earlier encoded channels of data are being transmitted, the subsequent channels of video data are being encoded. Furthermore, although the method has been specifically described as encoding and transmitting one channel of data at a time, the channels can be encoded/transmitted in pairs to increase throughput.
C. DATA HOMOGENIZATION
As illustrated in FIGURE 15, network system 10 homogenizes sensor data from a variety of sensors 178 that includes, for example, a first video camera 180, a second video camera 182, an infrared sensor 184, a seismic sensor 186 and an imaging radar 188. The homogenized and packetized data is transmitted to a video encoder and a sensor data formatting module. In the preferred embodiment of the present invention, data from camera sensors 180 and 182 is transmitted over communication lines 194 and 196, respectively, to encoder 190. Similarly, data from IR sensor 184, seismic sensor 186 and imaging radar 188 is transmitted over communication lines 198, 200 and 202, respectively, to sensor data formatting module 192.
In the preferred embodiment of the present invention, a compression processor 204 is an application-specific integrated circuit (ASIC) board (e.g., PCMCIA packaging drawing < 0.5 W) configured for supercomputer-grade 8 BOPS video processing per non-local sensor. As previously discussed above, processor 204 is cost-effective because 256 ASICs work in parallel.
Encoder 190 further includes a buffer 206 configured to synchronize data over an asynchronous local area wireless network 208 to a buffer 210 in a decoder 212. Encoded data from video encoder 190 and formatted data from formatting unit 192 is transmitted to local area wireless network 208 via a bus 214 and a bus 216, respectively. Data from instrumentation controls 218 is also transferred via a data transfer unit 220 to local area wireless network 208 via a bus 222. Asynchronous spread spectrum network 208 transmits a synchronous stream of interactive video data to a display 224 and an instrumentation module 226 via decoder 212 over a bus 228 and 230, respectively. Local area wireless network 208 is also coupled to a remote network 232. Processor 204 integrates video with high-quality still imagery, thereby providing a user with temporal data (e.g., video) and spatial data (e.g., still imagery) through separate channels on bus 228. A user can display the video on a screen 234 and/or the still images on a separate screen 236. Referring to FIGURE 16, a simplified asymmetric network example of sensor network 12 includes a series of cameras 238 coupled to a series of processors 240 using a 64 kbps or 128 kbps channel including a high-speed trunk line (2 Mbps). Each processor 240 is further coupled to a series of displays 242 and a server 244.
FIGURE 17 illustrates a video encoder 246 for a single channel example that includes an image compression ASIC 248 and a motion estimation ASIC 250 coupled to a pair of SD RAM (256x32) 252 and 254, respectively. Composite and S-Video are coupled to video processor 256 via a series of low pass filters 258. A buffer 260 is coupled to SD RAM 252. Video processor 256 communicates with a PCI interface 262 via a bus 264. Similarly, an address decoder 266 communicates with PCI interface 262 via a bus 268.
A buffer 270, a buffer 272, a buffer 274 and a set of fractional T1 controllers 276 communicate over a PCI bus 278. A data module 280 is coupled to buffer 270 through a data interface 282. Audio is transmitted to an A/D converter 284 coupled to buffer 272 through an analog device unit 286. Similarly, a SRAM 288 is coupled to buffer 272 through analog device 286. A video logic control unit 290 also communicates over PCI bus 278. The compressed data is transmitted from buffer 274 to T1 controller 276 through an MPEG data FIFO unit 292, down to a line interface unit 294, and out to a network 296.
On the other side of the data transmission, a decoder 298 is illustrated in FIGURE 18. Data from network 296 is transmitted through a line interface unit 299 to a network controller 300 coupled to a C51 controller 302. In general, the data from the spread spectrum wireless network is processed by a chip set including an MPEG video decoder 304 and a video processor 306. Video overlay graphics 308 are superimposed on the video interconnection before it is displayed. Controller 300 transmits the data to a data FIFO queue 310 and then through a MUX 312 to decoder 304. A PCI bus 314 is coupled to a 16-bit register 316 and video processor 306. Register 316 is also coupled to decoder 304 through MUX 312. Decoder 304 transmits the audio data to an MPEG audio decoder 318 that is coupled to an audio D/A converter 320. On the other side of decoder 304, the RGB signal from processor 306 is transmitted to a video D/A converter 322 and then to a display 324. Compression processor 204 is configured for approximate real-time processing, computing, compression and transmission of multimedia data. In network 10, there is a strong correlation between software and hardware design, leading to minimized processing overhead, thereby maximizing transmission speed and efficiency. As a result, the processing speed of compression processor 204 is maximized at the expense of computing generality. For example, the processing speed of compression processor 204 is equivalent to 100 Pentiums, while the actual computing performed by processor 204 is restricted to only simple arithmetic operations. Additionally, fuzzy-logic, neural, genetic-algorithmic, and other artificial intelligence (AI) processing are widely applied. Stationary image and video are computed and/or transmitted with only statistical accuracy, based on statistical global evaluation, not only on the transmitter, but also on the receiver level, if necessary.
Therefore, network 10 in general and compression processor 204 in particular provide image processing, editing, or even doctoring in real-time, thereby creating an illusion that video about natural events is transmitted while, in fact, these events can be doctored "on the fly." In other words, due to imagery transmission in only approximate or statistical form (yet tolerable to the human eye) and, therefore, significant data reduction (or compression), processor 204 provides imagery/video processing/transmission with a minimum transmission delay (or latency). In this regard, processor 204 relies upon highly specialized chip-set integrated circuit (IC) electronics that minimizes processing overhead and provides selected operations with supercomputer-type speed (e.g., speeds equivalent to 100 Pentium Is or 20 Pentium IIIs).
As discussed above, compression processor 204 may be a compact 2" x 3" PC-board or a fully-upgradable PCMCIA-card with minimum power consumption (<1W) despite 8 BOPS (eight billion-operations-per-second) processing. Compression processor 204 primarily processes synchronized temporal data including video events and spatial data including high-resolution still imagery. Processor 204 also processes other multimedia data (e.g., audio sensor and data). Due to the high correlation (or high redundancy) of the data, the potential compression is also high. Therefore, in spite of the fact that the original data rates of video, audio, and data are drastically different, the compression ratios are also very different, thereby resulting in comparable data rates for all three media as follows.
[Table: original data rates, compression ratios, and resulting comparable channel data rates for video, audio, and data — not reproduced in the source image.]
*) Also, some sensor and telemetry data
D. SIGNIFICANT EVENTS
While transmitting significant events recorded by sensors 178, compression processor 204 transmits and stores both the temporal sequence of events and the precise spatial structure of each important scene that has been recorded in order to preserve any unanticipated critical event for further analysis. For example, video camera 178 may record all individuals crossing the border from Mexico into California on a given day of the week. Government agents (e.g., FBI, DEA, etc.) can remotely access surveillance camera 178 in real-time over the Internet to scan the video searching for individuals attempting to illegally enter the United States. After camera 178 has continuously recorded individuals crossing the border, an agent is alerted to the fact that a particular individual crossed the border illegally on a certain date. The agent accesses the particular video from the specific date corresponding to camera 178 to determine if there is any evidence of the individual illegally crossing the border.
Unfortunately, surveillance video on a network is typically of very poor quality. The best solution, of course, would be to preserve all time-space scenery of the individuals crossing the border, with the original data content (e.g., no compression at all). This is obviously an impractical scenario because the original transmission bandwidth for uncompressed video is too high (e.g., 221 Mbps for the VGA/NTSC-video standard). A practical solution that is feasible using compression processor 204 is to use image compression ASIC 248 to balance redundancies in the space-time domain, thereby allowing a user to view a past image on display 224 via a separate channel. Importantly, this particular still image is compressed at a lower ratio (e.g., 40:1) than the I-frames in the MPEG stream, so that the image contains enough data to adequately reconstruct the scene that the user is interested in. This still image is meaningful because image compression ASIC 248 compressed the MPEG video stream by modifying the standard MPEG compression algorithm and inserting an I-frame as needed to correspond with a scene change in the video. These scene changes can result from a variety of conditions including a change of a video clip (e.g., a movie), sudden camera movement, sudden object movement, sudden noise, etc.
In particular, motion estimation is important to compression because many frames in full-motion video are temporally correlated (e.g., a moving object on a solid background such as an image of a moving car will have high similarity frame to frame). Efficient compression can be achieved if each component or block of the current frame to be encoded is represented by its difference with the most similar component, called the predictor, in the previous frame and by a vector expressing the relative position of the two blocks from the current frame to the predicted frame. The original block can be reconstructed from the difference, the motion vector and the previous frame.
The frame to be compressed can be partitioned into microblocks which are processed individually. In a current frame, microblocks of pixels, for example 8x8, are selected and the search for the closest match in the previous frame is performed. As a criterion of the best match, the mean absolute error is typically used because of the trade-off between complexity and efficiency. The search for a match in the previous frame is performed in a, for example, 16x16 pixel window for an 8x8 reference or microblock. A total of, for example, 81 candidate blocks may be compared for the closest match. Larger search windows are possible using larger blocks 8x32 or 16x16 where the search window is 15 pixels larger in each direction, leading to 256 candidate blocks and as many motion vectors to be compared for the closest match. Once the third subsequent frame is predicted, the standard methods provide that each microblock in the current frame and the corresponding microblock in the predicted frame are compared and the error or difference between them is determined. This is done on a microblock by microblock basis until all of the microblocks in the current frame are compared to all of the microblocks in the predicted frame. In the standard process, these differences are sent to the decoder in real time to be used by the decoder to reconstruct the original block from the difference, the motion vector and the previous frame. The error information is not used in any other way in the prior art. On the contrary, in the current invention, the error or difference calculated between microblocks in the current frame and the predicted frame is accumulated or stored, and each time an error is calculated in real-time between a microblock in the current frame and the corresponding microblock in the predicted frame, that error is accumulated to the existing error for the frame. Once all of the errors for all of the blocks in the current frame as compared to the predicted frame are generated and summed, that accumulated error is then used to determine whether a new I-frame should be inserted.
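The following is an illustrative sketch of the exhaustive block-matching search with the mean-absolute-error criterion described above. It is not the ASIC implementation; the block size, search range, and function names are assumptions for demonstration.

```python
import numpy as np

def best_match(prev, cur, y, x, block=8, search=4):
    """Find the motion vector for the block at (y, x) of `cur` by searching
    a (2*search+1)^2 window in `prev`; returns (dy, dx, mean_abs_error)."""
    ref = cur[y:y+block, x:x+block].astype(float)
    best = (0, 0, float("inf"))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > prev.shape[0] or xx + block > prev.shape[1]:
                continue                            # candidate falls outside the frame
            cand = prev[yy:yy+block, xx:xx+block].astype(float)
            mae = np.mean(np.abs(ref - cand))       # mean absolute error criterion
            if mae < best[2]:
                best = (dy, dx, mae)
    return best

prev = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(prev, (1, 2), axis=(0, 1))            # simulate a small camera pan
print(best_match(prev, cur, 16, 16))                # -> roughly (-1, -2, 0.0)
```

The residual (ref minus the best candidate) is what the standard process transmits; in the present invention it is also accumulated toward the I-frame decision.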
This method is MPEG-compatible and results in high-quality video images. The accumulated error is used by comparing it to a threshold that is preset depending upon the content or type of the video, such as action, documentary or nature. If the threshold for a particular current frame is exceeded by the accumulated error, this means that there is a significant change in the scene that warrants sending an entire new I-frame.
Consequently, a new I-frame is compressed and sent, and the motion estimation sequence begins again with that new I-frame. If the threshold is not exceeded by the accumulated error, then the differences between the current frame and the predicted frame are sent as usual, and this process continues until the threshold is exceeded and the motion estimation sequence is begun again with the sending of a new I-frame. If the accumulated error for that frame ever exceeds the threshold, then the whole frame (e.g., every block) is coded as an I-frame. In the present invention, frames are coded either as I-frames or as block differences. The system analyzes the accumulated error on a block-by-block basis so that not all of the blocks in a frame need to be analyzed if the threshold is exceeded. At that point, an I-frame is encoded. Alternatively, the error could be computed for all blocks and then compared to the threshold.
E. COMPRESSING SPATIAL I-FRAME
The temporal data displayed on screen 234 is transmitted through a first communication channel with a data throughput identical to that of the spatial data displayed on screen 236, which is transmitted through a second digital communication channel. The temporal data transmitted through the first communication channel includes a highly compressed MPEG stream (approximately 4000:1), while the spatial data transmitted through the second communication channel includes compressed I-frames only that have been compressed at a ratio significantly lower than the MPEG stream in order to ensure high image quality.
In particular, the spatial signals are compressed using a low compression ratio, (CI)o. The value of (CI)o is significantly lower than the value of (CI), wherein CI is the compression ratio of an I-frame in the synchronized I-frame MPEG cycle. The average I-frame cycle contains N frames, including a single I-frame (mother frame) and N-1 daughter frames, the daughter frames having an average compression ratio of CD. Compression processor 204 is configured to determine the following:
1. The average compression ratio of the MPEG I-frame cycle (CR); and
2. The ratio of the average compression ratio of the I-frame in the video channel, (CI), to that of the single frame in the second channel, (CI)o, needed to preserve the same data throughputs for both the first and second channels transmitting temporal and spatial data, respectively.
Compression processor 204 computes the average compression ratio of the N-frame synchronized cycle based on:

(CR) = N·FR / [ FR/(CI) + (N-1)·FR/(CD) ]    (1)

where FR is the single-frame original content. For example, based on the VGA/NTSC-video standard, FR = 640 x 480 x 24 = 7.3 x 10^6 bits. Equation (1) then reduces to:

(CR) = N·(CI) / [ 1 + (N-1)/k ]    (2)

where

k = (CD)/(CI)    (3)

wherein CD is the compression ratio of the data (daughter) frames and CI is the compression ratio of an I-frame in the synchronized I-frame MPEG cycle.
There are the following three particular cases for k:
a) k = 1, then (CR) = (CI) = (CD)    (4)
b) k = ∞, then (CR) = N·(CI)    (5)
For example, compression processor 204 processes 90 frames of data (N=90) with CI = 300 and calculates the following:
(CR) = 27,000    (6)
c) k > 1, then (CR) > (CI) and (CR) < (CD)    (7)
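A short numerical check of equations (1)-(7), as reconstructed above, is sketched below. This is an illustration only and not part of the original disclosure; the function name is an assumption.

```python
def cycle_compression_ratio(N, CI, CD):
    """Average compression ratio of an N-frame cycle with one I-frame
    (ratio CI) and N-1 daughter frames (ratio CD); equation (2)."""
    k = CD / CI                                        # equation (3)
    return N * CI / (1 + (N - 1) / k)

print(cycle_compression_ratio(90, 300, 300))           # k = 1  -> 300 (= CI = CD)
print(cycle_compression_ratio(90, 300, 1e12))          # k -> oo -> ~27,000 = N*(CI)
print(round(cycle_compression_ratio(90, 350, 3500)))   # k = 10 -> ~3182 (cf. CR = 3181 below)
```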
Referring to FIGURE 19, the average compression ratio of the N-frame synchronized cycle is illustrated for an original frame 326, a compressed I-Frame corresponding to a significant event 328, a compressed I-Frame (MPEG) 330, and a series of daughter frames (MPEG) 332.
Compression processor 204 computes the value (CR) based on a typical value of N=90 frames of data, CI = 350 and k=10. Based on the following experimental table,
CR = 3181.
[Table: experimental compression values for N = 90, CI = 350, k = 10 — not reproduced in the source image.]
Compression processor 204 further computes the following ratio:
(CR)/(CI)o = constant = N    (8)
Now in order to preserve the same channel throughputs for first channel transmitting MPEG-1 video and second channel transmitting single I-frames, compression processor 204 calculates the following ratio:
η = (CI)/(CI)o = N·(CI)/(CR)    (9)

which is equivalent to the following equation:

η = (CI)/(CI)o = 1 + (N-1)/k    (10)

[Table: equivalent values of the η coefficient as a function of the k coefficient — not reproduced in the source image.]
* Typical Case
In other words, the above table presents equivalent values for the I-frame still imagery, given the η coefficient as a function of the k coefficient. In this regard, the η coefficient is the ratio of the compression ratio of the I-frame in the MPEG encoding using meaningful I-frame insertion, (CI), to that for still imagery, (CI)o, with the same channel bandwidth. Using a typical example of k=40, the η coefficient is 9.9 (approximately 10), so the bandwidth of a single stationary MPEG frame (with meaningful I-frame insertion) is typically 10 times higher than that of an MPEG video I-frame. Therefore, if (CI)=300, then (CI)o = 30, which is a sufficiently low value to preserve all of the critical stationary object details. The compression ratio of the I-frame (CI) can be high due to high stationary JPEG compression, reduction of pixel resolution, and reduction of color contrast.
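A sketch of the throughput-matching calculation for the second (still image) channel follows, using the reconstructed equations (8)-(10). The names and the chosen parameter values are illustrative assumptions.

```python
def still_image_ratio(N, CI, k):
    """Return the still-image compression ratio (CI)o and the eta coefficient
    that keep the two channels at the same data throughput."""
    CR = N * CI / (1 + (N - 1) / k)   # equation (2)
    CI_o = CR / N                     # equation (8): (CR)/(CI)o = N
    eta = CI / CI_o                   # equation (9), equal to 1 + (N-1)/k per (10)
    return CI_o, eta

CI_o, eta = still_image_ratio(N=90, CI=300, k=10)
print(round(eta, 1))    # -> 9.9, so the still image gets roughly 10x the bits of a video I-frame
print(round(CI_o, 1))   # -> ~30, i.e. (CI)o ≈ 30 when (CI) = 300
```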
F. COMPRESSION PROCESSOR AND RELATED HARDWARE
A typical scene change in a movie occurs approximately every 3 seconds, while the prior art MPEG compression changes the I-frame every 0.5 seconds. Therefore the immediate compression gain realized by the meaningful I-frame insertion in the present invention is 6:1. System latency in network 10 is significantly reduced by graphic integrated chips providing highly parallel operations (e.g., 256 parallel processors). Network 10 further includes fuzzy logic automatic control of the frame global error evaluation. In particular, compression processor 204 computes the normalized global error (GE) based on:
(GE) = (1/N) Σ_{i=1..N} | d_i − d_io |    (11)
where N is the number of pixels in the frame (e.g., 640 x 480 for VGA standard), and di and dio are RGB-pixel gray levels for a given I-frame and reference frame. During the processing of data, compression processor 204 further calculates:
(GE) ≤ T    (12)
(GE) ≥ T    (13)
where T is an a priori-defined threshold value.
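A minimal sketch of the global-error test of equations (11)-(13) is given below: the mean absolute pixel difference between a candidate frame and the reference I-frame is compared to the a priori threshold T. Frame sizes, gray-level ranges, and the threshold value are assumptions for illustration.

```python
import numpy as np

def global_error(frame, reference):
    """Equation (11): (GE) = (1/N) * sum_i |d_i - d_io| over all N pixels."""
    return np.mean(np.abs(frame.astype(float) - reference.astype(float)))

def needs_new_iframe(frame, reference, T):
    return global_error(frame, reference) >= T        # condition (13)

ref = np.zeros((480, 640), dtype=np.uint8)
same_scene = ref + 2                                  # small change: continue the cycle
new_scene = np.full_like(ref, 200)                    # scene cut: insert new I-frame 334
print(needs_new_iframe(same_scene, ref, T=25))        # -> False
print(needs_new_iframe(new_scene, ref, T=25))         # -> True
```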
Referring to FIGURES 20-23 illustrating various frame cycles, if (GE) < T, compression processor 204 continues the I-frame cycle, whereas in FIGURE 22 the standard I-frame cycle of FIGURE 21 is discontinued if (GE) ≥ T and a new I-frame 334 is inserted. Compression processor 204 statistically evaluates the global error (the difference between an I-frame and any other frames in the I-frame synchronized frame subset) or any block error within the I-frame synchronized frame subset. In standard MPEG encoding illustrated in FIGURES 20 and 21, the I-frame synchronized frame subset is fixed in length so that an I-frame 336 is inserted every 15 frames regardless of frame content. Therefore, due to the lack of meaning associated with standard I-frames in MPEG encoding because of the fixed insertion of the frame regardless of the content of the frame or the timing of the scene, a lower compression ratio applied to the standard I-frame in the MPEG encoding would still be meaningless. In other words, a user would not be able to rely on any of the I-frames in the standard MPEG encoding scheme, regardless of the compression ratio, because these I-frames do not represent the full temporal event in the form of a still image associated with a particular scene. In the preferred embodiment of the present invention as illustrated in FIGURE
22, the I-frame synchronized frame subset is variable in length (e.g., not always 15 frames as in state-of-the-art MPEG-1, but less or more than 15 depending on the duration of the scene). When a particular scene changes at a transition point 336, compression processor 204 evaluates in real-time the global error of a given frame. If the error is too high, new I-frame 334 is generated, thereby starting a new I-frame synchronized frame subset 338. New I-frame insertion and consequent new motion estimation can be triggered not only by overall motion within the entire frame, but also by motion in a selected region of the frame such as a particular target. In a comparison of FIGURES 21 and 22, inserting new I-frame 334 immediately after scene transition point 336 ensures that the global error level is minimized, thereby avoiding a series of motion artifacts 340 depicted in FIGURE 21 of standard MPEG encoding. Intelligent I-frame insertion implemented by compression processor 204 in network 10 is performed in real-time to ensure overall compression latency of less than 200 ms. Moreover, the global error evaluation is provided within 90 msec (three frames) latency. Paradoxically, the quality of the highly-compressed (4000:1) image using intelligent I-frame insertion may be even better than the quality of a standard MPEG image compressed at a relatively low compression ratio because of the elimination of artifacts 340 using intelligent I-frame insertion.
FIGURE 23 illustrates the frame cycle associated with the transmission of spatial data on the second channel. (CI)o is the compression ratio of an I-frame 342 that is compressed at a lower ratio than the compression ratio (CI) of I-frame 334 in the MPEG frame cycle of FIGURE 22 that synchronizes the insertion of the I-frames with the scene changes.
In the preferred embodiment of the present invention, compression processor 204 applies the soft computing algorithm to implement the insertion of a meaningful I-frame at a scene change in a video stream and includes processing power on the order of 8-10 BOPS. Moreover, network system 10 supports original, high-quality, full-motion color video (NTSC/VGA - 221 Mbps) at a compression ratio of 3946:1 (a ten-fold increase over the state-of-the-art compression schemes). A bandwidth of 56 kbps is supported with time-multiplexed control data. A bandwidth of 112 kbps is supported for 3-D. Finally, a bandwidth of just a few kbps is likewise supported for extremely low bandwidth applications similar to "cartooning".
Related to the preferred hardware implementation of network 10, there are the following three embodiments of the integrated chip packaging for compression processor 204:
[Table: three integrated-chip packaging embodiments for compression processor 204 — not reproduced in the source image.]
Based on the preferred hardware and software configuration of the preferred embodiment of the present invention, network 10 provides a real-time stream of digital video of TV-quality or NTSC/VGA (640x480, 24bpp, 30fps) that maintains a low bandwidth (64kbps) and low latency (<= 0.5 sec). Additionally, network 10 includes real-time frame evaluation: standard compression generates an I-frame regularly in 0.5 sec intervals (15 frames), while a typical change of scene occurs approximately every 3 sec. Therefore, a 6-times factor is obtained directly from the real-time frame evaluation of compression processor 204 when the I-frame is generated only if needed (when the actual accumulated error exceeds a particular threshold, thereby indicating movement corresponding to a change of scene).
G. PATTERN RECOGNITION AND SOFTWARE AGENTS
The role of software agents was discussed in relation to FIGURES 7-9, and is equally applicable to MAS sensors 16 and WAS sensors 18 corresponding to sensors 178 in FIGURE 15. In this regard, one of these sensors 178 could be a spectrometer that transmits measurement results in the form of a percentage. Sensor 178 transmits a conclusion in the form of a pathologic sentence (any sentence within vocabulary). Software agents intelligently filter information, thereby significantly reducing the bandwidth required to transmit a message while preserving the meaning of the transmission.
Similarly, these software agents are preferably applied by compression processor 204 to reduce bandwidth requirements associated with the compression of (1) the meaningful I-frame MPEG stream; and (2) the meaningful I-frames extracted from the MPEG stream in (1) and compressed at a lower ratio to preserve image quality.
As originally discussed in connection with FIGURES 8 and 9, intelligent processes or software agents are species of Artificial Intelligence (AI) and include applications that assist computer users in using the tools available on the computer system and that exhibit human intelligence and behavior, such as robots, expert systems, voice recognition, and natural and foreign language processing.
Digitization of video transmission, coupled with high-data compression, opens up a new area of applications that not only apply to compression into low-bandwidth communication, but also can restore some mature pattern recognition techniques, such as template matching. In this regard, the state-of-the-art compression systems have abandoned template matching because it cannot be performed in real-time without the processing power found in compression processor 204.
As an example, compression processor 204 performs template matching on two high-resolution frames having 1024 x 1024 pixels (or approximately N2 = 106 pixels) and full RGB color (24 bits/pixel). Given 8 BOPS, the two high-resolution frames can be compared within a few milliseconds or 1000 frames/second.
An analog of Fourier processing includes a cross-correlation operation comparing two identical-format frames with some translation or rotation about the center of gravity. If the two frames are identical, then for a particular translation and rotation there is an exact match; otherwise a match will not exist. In order to derive any possible practical applicability for purely-electronic, real-time template matching, consider an N²-dimensional vector with components a_ij, where i,j = 1, 2, ..., N. If a_ij is the RGB gray level for the ij-th pixel, then

a_ij ≥ 0    (14)

The slowly-varying vector <a_ij> represents the averaged background. Then

ā_ij = a_ij − <a_ij>    (15)

is a relative vector, and the match measure is

R = | ā_ij(1) − ā_ij(2) |    (16)

where R is the Euclidean distance between the two relative vectors (the first and second frames). This equation is simply the application of pixel-by-pixel subtraction, ij being the index of the pixel. If the corresponding components are equal, the distance is zero and there is a full match. On the other hand, if a value is left over, then only a partial correlation is present.
Another way to simplify this idea is to apply Boolean logic, wherein 1+1=0, 0+0=0, 1+0=1, and 0+1=1. If this summation is applied instead of subtraction, the same meaningful result appears: the summation is 0 when two pixels are the same and 1 if the pixels are different. This evaluation results in less overhead than computing a difference and a square.
If two frames are identical, and position-rotation adjusted, then

R(0, 0) = 0    (17)
Assuming two frames are being checked to determine whether they are identical, and assuming a 2-D case wherein the angular tolerance is Δθ = 3° (i.e., +/- 3°), and the translation tolerance is

± N/(2E)    (19)

where E is the reduction factor; for N = 1000 and E = 100, this corresponds to 1000/100 = 10 pixels of tolerance. The number of possible sampling points defined in the 3-D (x-translation, y-translation, rotation) space is then M [equation not reproduced in the source image], and for N = 10^3, E = 100 and Δθ = 5° ≅ 0.1,

M = 10^3 x 10^-1 x 10^2 = 10^4
The M-fold cross-correlation analysis is of the form: [equation not reproduced in the source image].
Assuming that the evaluation of a single sampling point takes 1 msec, the total time is 10 sec. The following table includes the total-time results for various N and Δθ, with E = 50.
[Table: total template-matching times for various N and Δθ with E = 50 — not reproduced in the source image.]
The following table includes results for E = 100.
[Table: total template-matching times for various N and Δθ with E = 100 — not reproduced in the source image.]
The outlined area in the above-identified table defines the total times of practical applicability for purely electronic real-time template matching implemented in network 10, simulating the cross-correlation operation. This template matching is an example of an alternative embodiment of the present invention as applied to object recognition and targeting instead of video compression.
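An illustrative sketch of this template-matching search follows. It evaluates the Euclidean distance of equation (16) over a coarse grid of candidate translations, in the spirit of the N/(2E) tolerance discussion above; rotation is omitted for brevity, and all parameter names and grid sizes are assumptions rather than values from the disclosure.

```python
import numpy as np

def relative(frame):
    """Equation (15): subtract the slowly-varying averaged background."""
    return frame.astype(float) - frame.mean()

def match_distance(a, b):
    """Equation (16): Euclidean distance between the two relative vectors."""
    return np.sqrt(np.sum((relative(a) - relative(b)) ** 2))

def best_translation(template, image, step=10, max_shift=50):
    """Scan shifts on a reduced (E-spaced) grid and return (dy, dx) minimizing R."""
    h, w = template.shape
    best = (0, 0, float("inf"))
    for dy in range(-max_shift, max_shift + 1, step):
        for dx in range(-max_shift, max_shift + 1, step):
            patch = np.roll(image, (dy, dx), axis=(0, 1))[:h, :w]
            d = match_distance(template, patch)
            if d < best[2]:
                best = (dy, dx, d)
    return best

img = np.random.randint(0, 256, (256, 256)).astype(np.uint8)
tmpl = np.roll(img, (-20, 30), axis=(0, 1))       # template is a shifted copy of the image
print(best_translation(tmpl, img)[:2])            # -> (-20, 30) on the coarse grid
```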
H. SINGLE CHANNEL INTEGRATION FOR LOW BANDWIDTH AND BIT-SELECTIVE ERROR CORRECTION
The integration of temporal and spatial data based on the insertion of the meaningful I-frame at a scene change in a video stream has further applications beyond transmitting the highly compressed video (4000:1), originally at 221 Mbps (VGA, 640x480, 30fps, 24bpp), down to 56 kbps over the first channel while simultaneously transmitting the corresponding low-compression, high-resolution images (40:1, 640x480, 24 bpp) at 56 kbps over the second channel to display 224. In particular, some platforms provide a maximum bandwidth of 16 kbps.
An automatic selection surveillance process control for temporal and spatial events through a single low-bandwidth channel is disclosed in FIGURE 24. After a start step 344, a user is given a choice in a step 346 whether to operate in autonomous mode. If the user selects not to operate autonomously, control passes to a step 348 for the user to select:
1. Data Rate;
2. Key frame spacing;
3. Enhanced video; and
4. Resolution (temporal and spatial).
Thereafter, a communication channel is established in a step 350. If, however, the user selects autonomous operation at step 346, the default values for the compression parameters are used in a step 352 and a two-stream connection of the temporal and spatial data is requested at a step 354. At a step 356, the quality of the transmission is compared against a threshold based on the bit error rate (BER). In general, a 10^-4 BER is the maximum acceptable error level for minimum artifacts with a compression ratio of 4000:1, as described above in connection with meaningful I-frame compression. If the transmission quality is not sufficient, additional error correction is applied at a step 358.
At step 358, bit selection error correction module 26 (FIGURE 5) applies bit-selective (BSEL) error reduction based on the observation that in the meaningful I-frame compression there are some bits (e.g., I-frame bits and synchronization control bits) that are more important than other bits. In typical bit-error correction, control bits are added to correct the error. In this case, if one bit changes, the sum becomes uneven and the error, in addition to the location of the error, is detected.
The approximate formula 2^k ≈ m, wherein m is the number of information bits and k is the number of control bits, works well assuming m is very large. For example, m=1000 and k=10 means that only 10 regulating bits (1%) are needed for every 1,000 bits. In practical situations, however, there is not simply a one-bit error. Therefore, using the meaningful I-frame compression, bit selection error correction module 26 applies an additional level of error correction on the transport layer. In particular, due to the importance of the bits specific to the specialized compression implemented by compression processor 204 (e.g., the synchronization bits that maintain the basic connections between the I-frames and the daughter frames), internal error correction is performed on the packets prior to the packets being transmitted over the network. The error correction applied by module 26 on the transport layer is in addition to the error correction that is performed on the physical layer inherent in the IP protocol.
Assuming a 10^-4 BER is considered the threshold for acceptable image quality, the bits arrive at module 26 already having been subjected to the IP error correction in the physical layer. Module 26 applies additional error correction to the significant bits to reduce the significant bits to a level of 10^-5 BER, while the other bits are left at a level of 10^-3 BER or reduced only to 10^-4 BER. In other words, the error rate of the significant bits is reduced by two orders of magnitude and that of the other bits by one order of magnitude, or not reduced at all. Error correction methods that are applied by module 26 include, as a simple example, traditional Hamming codes developed by Richard Hamming that are based on algebraic ideas originating in the number-theoretic research of Gauss in 1801. The method can also be applied to more complex codes such as the Reed-Solomon (RS) code that can correct multiple errors and bursts of errors based on finite fields and associated polynomial rings.
Module 26 uses the following table to determine how much bandwidth is required and an optimal size of the MTU-internal packets to significantly reduce BER by applying, for example, Hamming codes (one error plus position).
[Table: required overhead bits and optimal internal packet sizes for BER reduction via Hamming codes — not reproduced in the source image.]
If module 26 compresses from BER 10^-3 to BER 10^-5, then short streams of 20 bits (m=20) are used with 25% overhead bits (5 control bits are added to the 20 information bits). Longer streams of m=200 and 4% overhead bits are used to reduce from BER 10^-3 to BER 10^-4.
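The overhead figures quoted above can be checked with a short sketch using the standard Hamming bound 2^k ≥ m + k + 1 (the text's 2^k ≈ m approximation for large m). This is an illustration only; the function name is an assumption.

```python
def hamming_parity_bits(m):
    """Smallest number of control bits k such that 2**k >= m + k + 1."""
    k = 1
    while 2 ** k < m + k + 1:
        k += 1
    return k

for m in (20, 200, 1000):
    k = hamming_parity_bits(m)
    print(m, k, f"{100 * k / m:.0f}% overhead")
# -> 20 bits need 5 control bits (25%), 200 need 8 (4%), 1000 need 10 (1%),
#    matching the stream lengths and overhead percentages cited above.
```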
After module 26 applies bit-selective error correction to the data, the corrected data is retested at step 356 until the transmission quality meets the threshold. Thereafter, at a step 360, data is continuously transmitted until a termination step 362.
COMMUNICATION BRIDGE
As illustrated in FIGURE 5, communication bridge 30 between sensor network 12 and network 38 includes QoS manager 34 and intelligent and interactive buffer manager 36. In particular, bridge 30 is responsible for ensuring the delivery of the highly-synchronized multimedia streams despite the inherently asynchronous "on-demand" nature of networks. As discussed above, the fundamental challenge of transmitting real-time multimedia over the Internet is to create such a bridge to dynamically adjust timing and buffering network parameters to guarantee that the synchronized data as sent is synchronously received despite network latency (e.g., router and switch delay) and prioritization problems.
In the preferred embodiment of the present invention, there are three components that contribute to the bridge between the highly-synchronous multimedia data stream and the asynchronous network: (1) protocols; (2) buffer management/circular buffers; and (3) global network optimization/Quality of Service (QoS).
A. PROTOCOLS
Communication bridge 30 in network system 10 creates modifications in many of the standard layers in the stack including:
1. Modification on the transport layer to implement circular buffer management of multimedia data streams, apply intelligent and interactive buffering based on dynamic network conditions and prioritization schemes, and implement QoS with socket development using Internet protocols; and
2. Modification on the transport layer by implementing additional bit-selective error correction by module 26 (FIGURE 5) after initial error correction is performed on the physical layer. The TCP/IP sockets subprotocol is extended by incorporating intelligent and interactive buffering. In this regard, bridge 30 relies on Microsoft™ QoS emulation and an Application Program Interface (API) configured to implement a circular buffer that intelligently and interactively buffers multimedia data based on network conditions.
B. BUFFER MANAGEMENT/CIRCULAR BUFFER
Depending on bit rate and network conditions, buffer sizes are statistically predicted based on standard models. When a packet is going to be sent over the network, there is a statistic indicative of how much of the packet can safely be transmitted over the network and how much of the packet needs to be buffered based on the condition of the network and the relative priority of the packet.
The maximum allowable packet size (MTU) is configurable by the operating system. In the preferred embodiment of the present invention, the MTU is set at 1500 bytes (e.g., sent from encoder 190 to the receiver at 1500 bytes, while decoder 212 receives in 2k packets) to minimize network latency. Referring to FIGURE 15, transmission between encoder 190 and decoder 212 is controlled by network interface cards (NIC) (e.g., PCMCIA wireless LAN card, IEEE 802.11) having encoder buffer 206 and decoder buffer 210. Standard transmission rates are 64 kbps to 1.5 Mbps at 29.97 fps (standard for NTSC) with image size SIF (320x240) (standard MPEG). Current implementations of standard NIC cards and buffers 206 and 210 do not include the type of intelligent and interactive buffer management necessary to negotiate the synchronous multimedia stream through the asynchronous network. In this regard, buffer manager 36 also includes variable rate allocation to manage network bandwidth based on priorities of different users and processes.
For example, assuming a network includes 1,000 connected users and 1 Gbps of total bandwidth, based on an even distribution of bandwidth, each user is allocated 1 Mbps of bandwidth, which in implementation is actually closer to 700 kbps. Typically, only about 70% of bandwidth is useable given errors, collisions between packets, and buffering and waiting time. If one user needs 1.6 Mbps but is only allocated 700 kbps, while a different user is allocated 700 kbps but is only using 300 kbps, buffer manager 36 "negotiates" the process to allocate the excess bandwidth from the second user to the first user.
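A toy sketch of this negotiation is shown below: unused allocation from under-using connections is granted to connections whose demand exceeds their allocation. All numbers, field names, and the simple first-come allocation policy are illustrative assumptions, not the actual buffer manager 36 algorithm.

```python
def reallocate(allocations, usage, demand):
    """allocations/usage/demand are dicts of kbps per user; returns new allocations."""
    spare = sum(max(allocations[u] - usage[u], 0) for u in allocations)  # unused bandwidth
    new_alloc = dict(allocations)
    for u in allocations:
        shortfall = max(demand[u] - allocations[u], 0)
        grant = min(shortfall, spare)      # hand spare bandwidth to the needy user
        new_alloc[u] += grant
        spare -= grant
    return new_alloc

alloc = {"user_a": 700, "user_b": 700}
usage = {"user_a": 700, "user_b": 300}
demand = {"user_a": 1600, "user_b": 300}
print(reallocate(alloc, usage, demand))    # user_a gains user_b's ~400 kbps of unused bandwidth
```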
In the preferred embodiment of the present invention, two different methods are implemented for buffer management: (1) Global Smoothing; and (2) Local Smoothing. Global smoothing is a time-based method to determine network-bandwidth optimal operating parameters. Statistics are gathered over a particular period of time (e.g., 1 month) based on bandwidth usage correlated to time, and statistical optimization methods provide manager 36 with optimal operating bandwidth parameters. Similarly, using local smoothing, manager 36 creates a window of statistics based on a minimal time delay.
Referring to FIGURE 25, a circular (ring) buffer 364 is implemented in RAM memory (encoder buffer 206 and decoder buffer 210). Buffer 364 is a permanently allocated buffer with a read and a write position. The read position will never be greater than the write position, since if it were then the program would be reading data that had not been written to the buffer. Likewise, the write position will never wrap the read position, otherwise the data that had not been read would be overwritten. Since buffer 364 has no "ending" element, it is important during iteration to avoid infinite loops. In the present invention, these infinite loops are avoided by maintaining a count of the number of list entries and controlling the loop by measuring the number of values scanned. An alternative approach includes using a special, recognizable node in the linked list as a sentinel. The sentinel is never removed from the list. Iteration will check at each increment, and halt when the sentinel node is encountered.
Another variation includes maintaining links at each node that point both forward and backward. This arrangement results in a doubly-linked circular list that includes maintenance of two links in order to discover both the predecessor and successor of any node, thereby making insertions and removals possible with only a pointer to a single node instead of the two pointers required when defining operations using iterators.
FIGURE 26 illustrates a different view of circular buffer 364 residing in a memory 366. Buffer manager 36 in bridge 30 compensates for asynchronous network 10, over which synchronous data streams are sent, by implementing circular buffer 364 to accommodate latency and timing complications. In the preferred embodiment of the present invention, buffer manager 36 "tunes" the timing between a write process 368 and a read process 370 in memory 366. Memory 366 is divided into 2k increments 372.
In particular, when packets have to be sent, network conditions may force the sending process to wait, but writing process 368 continues. Thereafter, when network conditions accept the sending of the packet, the packet is sent over network 10. Writing process 368 is traditionally the limiting process in that packets can be sent faster than they can be written. In this regard, the MPEG encoding/decoding process is a slower process than sending the packets. Therefore, buffer manager 36 maintains writing process 368, and reading process 370 "catches up" to writing process 368 as network conditions allocate resources to write process 368. The timing between reading process 370 and writing process 368 results in a gap 372 that is managed by buffer manager 36.
FIGURE 27 illustrates the encoding process including the flow from an input video block to a video encoder block 376 to a video MPEG compression block 378, and then to a buffer in main memory (DRAM) 380. A network module block 382 includes a software driver to manage a circular network buffer 384 connected to a network block 386. Similarly, FIGURE 28 illustrates the decoding process including a network interface block 388 connected to a circular network buffer (DRAM) 390 that is connected to a decoding block 392 and then to a display 394. FIGURE 29 illustrates the write implementation of circular buffer 364 including data from a block 396 sent to a block 398 to test if circular buffer 364 is available. If buffer 364 is not available, the process waits. As soon as buffer 364 is available, data is written to buffer 364 in a step 400 and the total size count of the buffer is incremented at a block 402. Thereafter, control passes back to start block 396.
As illustrated in FIGURE 30, the read implementation of circular buffer 364 includes testing if buffer 364 is available in a step 404. If buffer 364 is not available, the process waits. As soon as buffer 364 is available, data is read from buffer 364 in a step 406 and the coπesponding memory is unallocated (released) in a step 408.
Finally, the total buffer size is decreased in a step 410 and control passes back to test block 404.
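A minimal sketch of such a circular (ring) buffer follows, mirroring the write flow of FIGURE 29 (wait while unavailable, write, increment the total size count) and the read flow of FIGURE 30 (wait while empty, read, release the memory, decrement the count). The capacity, the Condition-based waiting, and the class name are assumptions for illustration, not the actual encoder/decoder buffer code.

```python
from threading import Condition

class CircularBuffer:
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.count = 0                 # total size count (block 402 / step 410)
        self.read_pos = 0
        self.write_pos = 0
        self.cv = Condition()

    def write(self, item):
        with self.cv:
            while self.count == self.capacity:   # buffer not available: wait
                self.cv.wait()
            self.buf[self.write_pos] = item      # step 400: write data
            self.write_pos = (self.write_pos + 1) % self.capacity
            self.count += 1                      # increment total size count
            self.cv.notify_all()

    def read(self):
        with self.cv:
            while self.count == 0:               # nothing to read: wait
                self.cv.wait()
            item = self.buf[self.read_pos]       # step 406: read data
            self.buf[self.read_pos] = None       # step 408: release the memory
            self.read_pos = (self.read_pos + 1) % self.capacity
            self.count -= 1                      # step 410: decrease total buffer size
            self.cv.notify_all()
            return item

rb = CircularBuffer(4)
rb.write(b"packet-1"); rb.write(b"packet-2")
print(rb.read(), rb.read())   # -> b'packet-1' b'packet-2'
```

Because the reader never passes the writer and the writer never wraps the reader, the synchronous stream is preserved while the asynchronous network drains the buffer at its own pace.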
There are several parameters that manager 36 determines for buffer 364 that are forced from the server on each of the nodes. Traditionally, video on the Internet is transmitted asynchronously so that it is chopped based on frames arriving earlier and later with a variable delay. Buffer 364 is configured to maintain a constant stream, thereby maintaining a consistent timing of delivery of the frames in the stream. For example, each user on the network may have a different priority and a different video rate. The parameters controlling distributed transmitter and receiver buffers 206 (encoder) and 210 (decoder) are modified by manager 36 to force a different transmission rate or priority on particular sensors (e.g., a camera) and dynamically adjust the width of buffers 206, 210.
For example, in the event that cameras on the U.S. /Mexican border are continuously monitoring activity, and in a very short period of time the activity level significantly increases (e.g., an "invasion" of sorts), manager 36 dynamically adjusts buffers 364 and transmission rates for a set of "priority" cameras so that network system 10 does not overload. In this situation, some sensors (cameras) located in key locations are assigned a higher priority and more network resources (transmission bandwidth) than sensors (cameras) that are located in less strategic locations. The intelligent buffering is combined by manager 36 with QoS manager 34
(FIGURE 5) to minimize latency problems from the "on-demand" asynchronous network. QoS manager 34 engages in a negotiated process to maintain thresholds (e.g., certain cameras will not drop below a certain bandwidth like 512kbps). For example, transmission rates on priority cameras may be in the range of 512kbps - 1Mbps, and the non-priority cameras may have transmission rates in the 64 kbps - 512kbps range. If the network 10 is not loaded, then the non-priority cameras will transmit at full capacity (512kbps), and manager 36 "tunes" or optimizes buffer 364 to this 512kbps bandwidth. As the load on network 10 increases, buffer manager 36 adjusts the variety of buffering parameters to accommodate the load while maintaining synchronous transmission of the multimedia data stream based on optimization methods. In particular, QoS manager 34 negotiates each process through network 10 by dynamically adjusting bandwidth, latency, jitter and loss parameters depending on the operating load of the network and the priority of the data. The bandwidth parameter is the rate at which an application's traffic must be carried by the network. The latency parameter is the delay that an application can tolerate in delivering a packet of data. The jitter parameter is the variation in the latency and the loss is the percentage of lost data.
As discussed above, the network interface cards in encoder 190 and decoder 212 are physical connections to network 10. The interface cards receive data from the PC, process the data into an appropriate format, and transmit the data over a cable to another LAN interface card. The other card receives the data, translates the data into a form the PC understands, and transmits the data to the PC.
The role of NICs 190 and 212 is broken into eight tasks: host-to-card communications, buffering, frame formation, parallel-to-serial conversion, encoding and decoding, cable access, handshaking, and transmission and reception. In general, the first step in transmission is the communication between the personal computer and the network interface card. There are three ways to move data between the PC's memory and NICs 190 and 212: I/O, direct memory access, and shared memory. Data comes into NICs 190 and 212 faster than it can be converted from a serial or parallel format, depacketized, read, and sent. This is true in both directions.
NICs 190 and 212 also form frames - the basic units of transmission that include a header, data and trailer. The header includes an alert to signal that the frame is on its way, the frame's source address, destination address, and clock information to synchronize transmission. Headers also include preamble bits used for various purposes, including setting up parameters for transmission, a control field to direct the frame through the network, a byte count, and a message type field. The trailer contains error checking information (the cyclical redundancy check (CRC)). Before data is sent through network 10, there is a short period of communication between cards 190 and 212. During this period, cards 190 and 212 negotiate the parameters for the upcoming communication. The transmitting card sends the parameters it wants to use. The receiving card answers with its parameters. Parameters include the maximum frame size, how many frames should be sent before an answer, timer values, how long to wait for an answer, and buffer sizes. The card with the slower, smaller, less complicated parameters always wins because more sophisticated cards can "lower" themselves while less sophisticated cards are not capable of "raising" themselves.
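The following is an illustrative sketch of the frame-formation step just described: a header carrying source, destination, and byte count, the data payload, and a CRC-32 trailer used for error checking. The field layout is an assumption for demonstration, not the actual NIC frame format.

```python
import struct, zlib

def build_frame(src: bytes, dst: bytes, payload: bytes) -> bytes:
    """Assemble header (6-byte source, 6-byte destination, byte count) + data + CRC trailer."""
    header = struct.pack("!6s6sH", src, dst, len(payload))
    trailer = struct.pack("!I", zlib.crc32(header + payload))   # cyclical redundancy check
    return header + payload + trailer

def check_frame(frame: bytes) -> bool:
    """Recompute the CRC over header + data and compare with the trailer."""
    body, crc = frame[:-4], struct.unpack("!I", frame[-4:])[0]
    return zlib.crc32(body) == crc

f = build_frame(b"\x01\x02\x03\x04\x05\x06", b"\xaa\xbb\xcc\xdd\xee\xff", b"hello")
print(check_frame(f))                        # -> True
print(check_frame(f[:14] + b"X" + f[15:]))   # corrupted payload byte -> False
```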
C. GLOBAL NETWORK OPTIMIZATION/QUALITY OF SERVICE
As discussed above, QoS manager 34 (FIGURE 5) works in parallel with buffer manager 36 to optimize and dynamically adjust network parameters based on network load and prioritization. As priorities and transmission rates vary, a variety of parameters including frame resolution, frame rate, color depth, frame drop out frequency, and buffer width are adjusted to optimize the new bandwidth rate.
Manager 36 intelligently monitors the load of the network and dynamically calculates optimal operating parameters for buffers 364. This is a multidimensional optimization problem because each parameter is a dimension in the optimization space and the parameters are nonlinearly linked (e.g., reducing the frame rate does not necessarily mean that you are going to reduce the data rate by the same factor). QoS manager 34 applies a function (e.g., a genetic algorithm) to find the global minimum of the multifunction. If the continuous functions are not well-defined, then a fuzzy neural network teaches the network to recognize various cases.
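A compact genetic-algorithm sketch of the kind of multidimensional search described above is given below. The cost function is a stand-in (it simply penalizes distance from an arbitrary target), and the parameter names, ranges, and GA settings are all assumptions; it is meant only to show how nonlinearly-linked buffer and rate parameters could be searched jointly.

```python
import random

PARAMS = ["frame_rate", "resolution_scale", "color_depth", "buffer_width"]

def cost(p):
    # Hypothetical penalty: distance from a stand-in optimum (a real system
    # would measure latency, jitter, loss, and bandwidth instead).
    target = {"frame_rate": 15, "resolution_scale": 0.5, "color_depth": 16, "buffer_width": 8}
    return sum((p[k] - target[k]) ** 2 for k in PARAMS)

def random_individual():
    return {"frame_rate": random.uniform(1, 30), "resolution_scale": random.uniform(0.1, 1.0),
            "color_depth": random.choice([8, 16, 24]), "buffer_width": random.uniform(1, 32)}

def crossover(a, b):
    return {k: random.choice((a[k], b[k])) for k in PARAMS}

def mutate(p, rate=0.2):
    q = dict(p)
    for k in PARAMS:
        if random.random() < rate:
            q[k] *= random.uniform(0.8, 1.2)
    return q

def evolve(generations=50, pop_size=30):
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]                      # selection of the fittest half
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=cost)

print(evolve())   # parameter set near the stand-in optimum
```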
QoS implementation in an ATM network is fundamentally different from an implementation of QoS on the Internet because ATM is connection-based, so there is a pipe from the client to the server to guarantee the sending of a certain bandwidth. In LAN networks, connections are emulated. In the preferred embodiment of the present invention, QoS manager 34 negotiates bandwidth based on prioritization on the network using QoS schemes (e.g., Microsoft QoS emulation) to give priority to real-time video transfer so that the server emulates packets and guarantees bandwidth.
TRACKING MOVING OBJECTS
As discussed above, system network 10 including sensor network 12 is based on integrating fuzzy logic with neural networks to automate the tasks of sensor data fusion and determining alarms. One application of using the sensor fusion in sensor network 12 described above is in security systems having a number of camera sensors monitoring activity and transmitting surveillance data to control module 56 (e.g., tracking objects along the U.S./Mexican border and determining whether individuals are illegally crossing the border). In this regard, sensor fusion in network 12 includes the capability to adapt dynamically to changing external conditions, perform real-time tracking of objects using camera sensors and recognize objects based on intelligent neurofuzzy sensor control. The merged multisensor data is converted into knowledge and used to interpret whether an alarm condition exists (e.g., illegal border crossings). Current "security" systems spend a significant amount of resources interfacing with heterogeneous sensors, and a relatively insignificant amount of time "integrating" data. Contrary to these standard systems, system network 10 has a low false alarm rate due to intelligent multisensor data fusion and includes remote activation, manual control of each sensor 14, 16 or 18, real-time remote sensor information retrieval, and on-line programming of system 10 through a graphical interface. Some prior art security systems are based on neural networks that cannot adapt to sensor failures within a sensor suite and require extended training, a large amount of memory and powerful processors, thereby significantly increasing the complexity of the system. If the number of sensors is large, the capability to dynamically adapt to changing external conditions is particularly important. The traditional solution to sensor failure is to exclude the data from the malfunctioning sensor from the sensor fusion. Therefore, a neural network traditionally achieves a better recognition rate because the incorrect data is excluded from the input. Implementing this type of neural network includes the following steps:
1. Train a group of neural networks (N = N1, N2, N3, ... Nn) offline, with each neural network Ni corresponding to a sensor failure situation;
2. Determine automatically and continuously whether each sensor is working properly; and
3. Automatically select which network (N1, N2, N3, ... Nn) corresponds to the detected sensor failure situation.
On the contrary, the preferred embodiment of the present invention avoids the computational overhead associated with this implementation of a neural network by implementing a fuzzy logic-based functionality evaluator that detects sensor failures in real-time and a fuzzy logic generator that fuzzes the large number of neural networks into a manageable number of groups. Additionally, an intelligent decision aid based on a neurofuzzy network automates and assists in deciding whether an alarm is true or false.
A. FUNCTIONALITY EVALUATOR BASED ON NEUROFUZZY LOGIC
Sensor network 12 includes a functionality evaluator to determine the functionality of each sensor 14, 16, and 18 in real-time so that any malfunctioning sensor is eliminated from multisensor network 12 and its worthless or incorrect data is ignored. The evaluator is based on fuzzy logic and includes the following steps:
1. Fuzzify the input;
2. Apply a fuzzy operator;
3. Apply an implication operator;
4. Aggregate the outputs; and
5. Defuzzify the output.
For example, tracking sensor network 12 may include numerous red-sensitive CCD cameras as sensors. Five membership functions are used for the input gray scales (very high, high, medium, low and very low). Three membership functions are used to evaluate the output credibility of each CCD camera (high, medium and low). Based on the following fuzzy rules, the output for the input universe of discourse is generated. Using fuzzy logic, the outputs for all possible inputs are generated in advance, and the outputs are saved as a lookup table for use in real-time operation.
IF (gray level is very high), THEN (credibility is low).
IF (gray level is high), THEN (credibility is low).
IF (gray level is medium), THEN (credibility is medium).
IF (gray level is low), THEN (credibility is high).
IF (gray level is very low), THEN (credibility is high).
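A minimal sketch of such a lookup-table evaluator appears below, assuming triangular membership functions, min implication, max aggregation and centroid defuzzification; the membership breakpoints are illustrative assumptions rather than values taken from the system.

import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)), 0.0)

# Input memberships over gray level 0..255 (breakpoints assumed for illustration)
GRAY = {"very_low": (-1, 0, 64), "low": (0, 64, 128), "medium": (64, 128, 192),
        "high": (128, 192, 255), "very_high": (192, 255, 256)}
# Output memberships over credibility 0..1
CRED = {"low": (-0.01, 0.0, 0.5), "medium": (0.25, 0.5, 0.75), "high": (0.5, 1.0, 1.01)}
# The five rules from the text: gray level -> credibility
RULES = {"very_high": "low", "high": "low", "medium": "medium",
         "low": "high", "very_low": "high"}

def credibility(gray_level, resolution=101):
    """Fuzzify, apply the rules (implication = min), aggregate (max), defuzzify (centroid)."""
    y = np.linspace(0.0, 1.0, resolution)
    agg = np.zeros_like(y)
    for in_label, out_label in RULES.items():
        w = tri(float(gray_level), *GRAY[in_label])            # firing strength
        agg = np.maximum(agg, np.minimum(w, tri(y, *CRED[out_label])))
    return float((agg * y).sum() / (agg.sum() + 1e-9))

# Precompute the lookup table used at run time, as described above.
LOOKUP = [credibility(g) for g in range(256)]
print(LOOKUP[30], LOOKUP[200])   # high credibility for dark pixels, low for bright ones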
B. FUZZY WEIGHT GENERATOR
The fuzzy weight generator automatically groups the alternative neural networks that correspond to each sensor failure situation into clusters based on the correlation among the neural networks. All of the neural networks belonging to a given cluster are then fuzzed into a single fuzzy neural network, replacing the large number of neural networks otherwise required.
Unsupervised competitive learning is preferably implemented by sensor network 12 because the desired class labels among the available neural networks that correspond to each sensor failure situation are unknown. Given some input sample data, the output neurons of a competitive network compete amongst themselves for activation, and only one output neuron can win the competition at any time (e.g., "winner take all").
After the input sample data is categorized into the appropriate fuzzy neural networks, each fuzzy neural network is trained with its sample data using modified learning vector quantization. The objective of this quantization is to find the set of reproduction vectors that represents an information source with the minimum expected "distortion". Learning vector quantization is especially powerful in pattern recognition and signal processing applications.
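The following sketch illustrates winner-take-all competitive clustering followed by an LVQ-style codebook update, under assumed data, dimensions and learning rates; it illustrates the general technique rather than the specific training used in sensor network 12.

import numpy as np

rng = np.random.default_rng(0)

def competitive_cluster(samples, n_clusters, lr=0.1, epochs=20):
    """Unsupervised winner-take-all learning: only the winning prototype moves."""
    protos = samples[rng.choice(len(samples), n_clusters, replace=False)].copy()
    for _ in range(epochs):
        for x in samples:
            winner = np.argmin(np.linalg.norm(protos - x, axis=1))
            protos[winner] += lr * (x - protos[winner])       # winner takes all
    return protos

def lvq_train(samples, labels, protos, proto_labels, lr=0.05, epochs=20):
    """LVQ1-style update: pull the winning reproduction vector toward same-class samples
    and push it away from other-class samples (reducing expected distortion)."""
    protos = protos.copy()
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            w = np.argmin(np.linalg.norm(protos - x, axis=1))
            sign = 1.0 if proto_labels[w] == y else -1.0
            protos[w] += sign * lr * (x - protos[w])
    return protos

# Toy data standing in for per-failure-mode behaviour vectors (assumed for the example).
samples = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
labels = np.array([0] * 30 + [1] * 30)
protos = competitive_cluster(samples, 2)
proto_labels = np.array([0, 1])          # assumed cluster-to-class assignment for illustration
print(lvq_train(samples, labels, protos, proto_labels))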
C. INTELLIGENT DECISION AID
Referring to FIGURE 31, as generally described above, sensor network 12 that is configured to track a moving object also includes an intelligent decision aid 412 to process and integrate all of the information from heterogeneous sensors 14, 16 and 18 in a reasonable amount of time. In particular, intelligent decision aid 412 integrates a neural network-like structure that adaptively generates fuzzy logic rules. In this "neurofuzzy" approach, initial drafts of fuzzy rules are prepared by experts and mapped to the location and curve of membership functions to build a fuzzy rule base 414. A neural network 416 tunes the location of the membership functions to optimize the performance of the fuzzy rules. A set of training data 418 is submitted to both rule base 414 and neural network 416.
Neural network 416 presents the initial fuzzy membership functions to fuzzy rule base 414. Fuzzy rule base 414 then generates an actual output, which is compared with the desired output contained in training data 418. The training algorithm changes the neural network weights, thereby adjusting the membership functions. The new functions are then presented to rule base 414, and the process repeats until the difference between the actual and desired outputs is minimized. Additional sets of training data are iteratively applied until the final membership function parameters, and therefore the membership function shapes, converge to their final values. With membership functions defined in this manner, the operation of fuzzy rule base 414 closely mimics the operation represented by training data 418.
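One possible realization of this tuning loop is sketched below, assuming Gaussian membership functions and zero-order (constant) rule consequents; the gradient update rules, learning rate and training data are assumptions made for the illustration and are not the training algorithm of neural network 416.

import numpy as np

def gauss(x, c, s):
    """Gaussian membership function with center c and width s."""
    return np.exp(-0.5 * ((x - c) / s) ** 2)

def tune_memberships(x_train, y_train, centers, widths, consequents, lr=0.05, epochs=200):
    """Adjust membership-function centers/widths (the "neural network weights") so the
    rule-base output approaches the desired training output."""
    c, s, q = centers.copy(), widths.copy(), consequents.copy()
    for _ in range(epochs):
        for x, y_d in zip(x_train, y_train):
            mu = gauss(x, c, s)                      # firing strength of each rule
            w = mu / (mu.sum() + 1e-9)               # normalized rule weights
            y = float(w @ q)                         # actual rule-base output
            err = y - y_d                            # compare with desired output
            # Gradient steps move the membership functions, reshaping the rules.
            q -= lr * err * w
            c -= lr * err * (q - y) * w * (x - c) / s ** 2
            s -= lr * err * (q - y) * w * (x - c) ** 2 / s ** 3
            s = np.clip(s, 0.05, None)               # keep widths from collapsing
    return c, s, q

# Toy training set: approximate y = x^2 on [0, 1] with three tunable rules.
x_train = np.linspace(0, 1, 21)
y_train = x_train ** 2
c0, s0, q0 = np.array([0.2, 0.5, 0.8]), np.full(3, 0.25), np.zeros(3)
print(tune_memberships(x_train, y_train, c0, s0, q0))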
D. REAL-TIME OBJECT TRACKING AND IMAGE SEGMENTATION
Video motion detection is based on either analog detection or activity detection. Analog detectors respond to changes in the video output level from a camera, but any slight change in the video level often triggers a false alarm (e.g., blowing leaves or rain). Activity detection is a form of video motion detection using limited digital technology; instead of detecting changes in video levels, activity detection discerns changes in individual pixels or cells of a video image. In contrast, sensor network 12 is a true digital motion detector in that network 12 determines the size, direction and number of objects within a particular environment.
A tracking system 420 based on sensor network 12 includes a motion detection sensor module 422, a wide field of view (FOV) camera 424 coupled to a wide FOV channel 426, a narrow FOV camera 428 coupled to a narrow FOV channel 430, and a transmitter 432 coupled to control module 56.
In the preferred embodiment of the present invention, cameras 424 and 428 are digital cameras. Alternatively, cameras 424 and 428 are analog cameras and tracking system 420 further includes the additional interface elements corresponding to analog cameras 424 and 428 (e.g., A/D converters, etc.).
Tracking system 420 includes two stationary "fish eye" cameras 424, with each camera having a field of view of 180°, coupled to two narrow FOV cameras 428 that are capable of tracking multiple objects using time division multiplexing (e.g., dividing tracking time corresponding to a single camera between multiple objects). The polar coordinates of objects based on "fish eye" cameras 424 are converted to Cartesian coordinates by preprocessing the data provided to channel 426.
Each narrow FOV camera 428 has tilt/pan capabilities that are controlled via an RS-232 connection using the Video System Control Architecture (VISCA) protocol (up, down, left, right, zoom in, zoom out). In this configuration, the VISCA message packet has a packet header in the first byte containing the source address and the destination address, and a terminator in the last byte (FF). The maximum message is 14 bytes. Tracking system 420 is in "sleep mode" while wide FOV cameras 424 remain active to detect any potential objects that may be of interest to control module 56.
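As a non-limiting illustration, the following sketch assembles a VISCA-style message with the header and terminator described above. The header nibble layout and the example payload bytes are assumptions, and the actual command encoding should be taken from the camera's VISCA documentation.

def visca_packet(src: int, dst: int, payload: bytes) -> bytes:
    """Build a VISCA-style message: a 1-byte header carrying the source and destination
    addresses, a command payload, and the 0xFF terminator. The header nibble layout and
    the interpretation of the 14-byte limit as a payload limit are assumptions."""
    if len(payload) > 14:
        raise ValueError("VISCA message exceeds the 14-byte maximum stated above")
    header = 0x80 | ((src & 0x07) << 4) | (dst & 0x07)
    return bytes([header]) + payload + b"\xFF"

# Hypothetical pan/tilt payload bytes; consult the camera's VISCA reference for the
# real Pan-tiltDrive encoding before use.
pan_tilt_cmd = visca_packet(src=0, dst=1, payload=bytes([0x01, 0x06, 0x01, 0x05, 0x05, 0x01, 0x03]))
print(pan_tilt_cmd.hex())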
Each of the cameras 424 and 428 is IP-addressed in sensor network 12, thereby allowing a user to interrogate each camera over user network 38 (e.g., the Internet). Control commands are also taken from users over network 38 to control sensors (cameras) 424 and 428. In an alternative embodiment of the present invention, a series of motion detectors are placed along the border, thereby allowing wide FOV cameras 424 to remain in a "sleep mode" along with narrow FOV cameras 428 until one of the motion detectors indicates a need for system 420 to "wake up".
In yet another alternative in the present invention, a "bug eye" sensor replaces the motion detectors to alert wide FOV cameras 424 to "wake up" based on activity along the border. In particular, the "bug eye" sensor is a directional sensor having numerous small, nonimaging, optical elements placed on the hemisphere of the sensor with a specific separation between each element. When there is a change in the intensity on one of the elements with a very narrow field of view, the corresponding element is activated and tracking system 420 receives directional data from sensor (camera) 424. This sensor is based on a nonimaging optic multiaperture compound eye architecture satisfying the Liouville theorem. These "ommatidial" nonimaging optic fields-of-view in the bug eye overlap with neighboring ommatidia (e.g., single nonimaging optic tip "eyelets") to some degree, permitting fuzzy logic-based fuzzy metrology.
Finally, in another alternative embodiment of the invention, a series of motion detectors with wireless transmitters are distributed among narrow FOV cameras 428. When one of the motion detectors is activated, tracking system 420 "wakes up" and moves narrow FOV cameras 428 to the coordinates of the activated motion detector, thereby eliminating the use of wide FOV cameras 424 to determine the coordinates of a particular object.
In motion detection sensor module 422, an object in an input scene 434 is detected in a sensor block 436 based on a motion detector detecting movement of the object or a "bug eye" sensor detecting a change in intensity. If sensor 436 is a "bug eye" sensor, an object detection sensor 438 is triggered to indicate an object has been detected in input scene 434. A neurofuzzy processor 440 detects the direction of the detected object and the corresponding coordinates of the object are calculated in a block 442 and provided to channel 426.
Wide FOV camera 424 may include a data formatter 444 and/or a data digitalization block 446 depending on whether camera 424 is an analog or digital camera. In the preferred embodiment discussed above, camera 424 is a digital camera. The video from camera 424 is transmitted to a single frame buffer 448 coupled to a camera platform motion compensator 450. The video is also fed to a variable delay block 452 and fed to a reference frame buffer 454 coupled to a similar camera platform motion compensator 456. The current video frame stored in buffer 448 is compared to the reference video frame stored in buffer 454 in a frame subtraction unit 458.
Frame subtraction unit 458 determines any movement of any objects in input scene 434 based on a pixel-by-pixel subtraction of two consecutive frames. The data indicating which pixels have changed state between the current frame and a reference frame is transmitted to object unit 460 that is configured to determine the number of blobs in the image. A blob is a region of connected pixels that show similar properties, such as color and light intensity, and dissimilar properties from its neighboring pixels. If unit 460 determines that two blobs are "connected", the two blobs are merged into a larger blob. The similarity between pixels is calculated against predefined constants, and changing the values of these constants in unit 460 restricts or loosens the connectivity between the blobs. Furthermore, a wide FOV camera such as camera 424 yields proportionally small objects in the image. The threshold parameters defining the difference between noise and objects (blobs) are also adjustable within tracking system 420.
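A simplified sketch of the frame subtraction and blob-grouping steps is shown below; the noise threshold, the connectivity rule (4-connected flood fill), the minimum blob size and the synthetic frames are illustrative assumptions rather than the actual processing of units 458 and 460.

import numpy as np

def changed_pixels(frame, reference, noise_threshold=25):
    """Pixel-by-pixel subtraction of the current frame against the reference frame.
    The threshold separating noise from object pixels is adjustable."""
    return np.abs(frame.astype(int) - reference.astype(int)) > noise_threshold

def label_blobs(mask, min_size=20):
    """Group changed pixels into 4-connected blobs with a simple flood fill and return
    each blob's pixel list and centroid; small blobs are dropped as noise."""
    h, w = mask.shape
    labels = np.zeros((h, w), int)
    blobs = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if labels[r0, c0]:
            continue
        label = len(blobs) + 1
        stack, pixels = [(r0, c0)], []
        labels[r0, c0] = label
        while stack:
            r, c = stack.pop()
            pixels.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] and not labels[rr, cc]:
                    labels[rr, cc] = label
                    stack.append((rr, cc))
        if len(pixels) >= min_size:
            centroid = tuple(np.mean(pixels, axis=0))   # local center of gravity
            blobs.append({"pixels": pixels, "centroid": centroid})
    return blobs

# Two synthetic gray-level frames: an "object" of bright pixels appears in the scene.
ref = np.zeros((120, 160), np.uint8)
cur = ref.copy()
cur[40:70, 60:100] = 200
print([b["centroid"] for b in label_blobs(changed_pixels(cur, ref))])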
Unit 460 performs standard segmentation on the blobs to form object(s), so that various rectangles are formed around groups of similar pixels and a set of rectangles typically creates an "object". Tracking system 420 calculates a point representing the local center of gravity corresponding to each of the rectangular boxes, and then calculates a global center of gravity based on the local center of gravity points to form a global centroid coordinate of what is then defined as the "object". If a single object is defined, narrow FOV cameras 428 are moved to the global centroid coordinate of the object. If a single object is defined and multiple narrow FOV cameras 428 are available for tracking, either both cameras 428 may be moved to the global centroid coordinate or each of the cameras may be moved to a portion of the image by essentially dividing the single object in half.
Unit 460 calculates whether multiple objects exist in input scene 434 and the corresponding coordinates based on the interconnectivity between the blobs. In order to split input scene 434 to allow narrow FOV cameras 428 to track multiple blobs, unit 458 calculates the distance between the two farthest points included in the segmentation, (x1, y1) and (x2, y2). Unit 460 also calculates the coordinates (xc, yc) corresponding to the center of the line formed between the two farthest points included in the segmentation. Finally, unit 460 calculates the midpoints (x*1, y*1) and (x*2, y*2) of the segments between the center of the line and each of the respective end points [(xc, yc) and (x1, y1)] and [(xc, yc) and (x2, y2)]. The coordinates (x*1, y*1) and (x*2, y*2) are sent to each of the two narrow FOV cameras 428 to track the two objects. Unit 460 further iterates the division of input scene 434 based on either the line division algorithm described above, determining multiple coordinates based entirely on centroid analysis, or using convex hull theory in computational geometry to find the smallest polygon that covers all of the areas of the blobs.
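The line-division calculation can be sketched as follows; the brute-force farthest-pair search and the toy pixel coordinates are assumptions made for the example.

import numpy as np
from itertools import combinations

def split_targets(points):
    """Given the pixel coordinates of the segmented blobs, find the two farthest points
    (x1, y1) and (x2, y2), the midpoint (xc, yc) of the line between them, and the
    midpoints (x*1, y*1), (x*2, y*2) of each half, which are handed to the two
    narrow FOV cameras."""
    pts = np.asarray(points, float)
    p1, p2 = max(combinations(pts, 2), key=lambda ab: np.linalg.norm(ab[0] - ab[1]))
    pc = (p1 + p2) / 2.0                 # (xc, yc)
    target1 = (pc + p1) / 2.0            # (x*1, y*1)
    target2 = (pc + p2) / 2.0            # (x*2, y*2)
    return tuple(target1), tuple(target2)

# Two well-separated pixel clusters standing in for two segmented objects.
pixels = [(10, 12), (12, 14), (11, 13), (80, 90), (82, 92), (81, 91)]
print(split_targets(pixels))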
The convex hull method includes finding either the smallest polygon or, alternatively, fitting a circle around the pixels in a blob. In this case, the largest distance between any two points on the circle is calculated. The third point that defines the minimum area circle is calculated by unit 460. The largest angle of the triangle formed by these three points is then calculated (e.g., the angle opposite the longest side of the triangle). Finally, the midpoint of the segment defined by this largest angle approximates the separation point for the regions governed by the two narrow FOV cameras 428. Assuming "fish eye" lenses are used in both wide FOV cameras 424, the polar coordinates are preprocessed prior to unit 458 into rectangular coordinates using the relationships:
x = r cos θ and y = r sin θ
After the objects have been defined in unit 458, unit 460 may also include a first object recognition filter for applying various computational and pattern recognition techniques and condition tables to determine whether any objects of significance have entered the view of wide FOV camera 424. For example, the feature set defining an object includes object size, object direction, average speed of the object, path of the object, and location of the object in relation to specialized areas. Unit 460 compares the feature set parameters associated with each of the objects to a condition table to determine whether the particular object(s) are of any interest to tracking system 420. Assuming at least some of the objects are of interest to system 420, the coordinates of each object are fed from a block 462 to a zoom control block 464. In narrow FOV channel 430, zoom control block 464 moves narrow FOV cameras 428 to the coordinates received from object coordinate unit 462 and receives high-resolution video of the particular object of interest. After acquiring more detailed information, tracking system 420 automatically determines in a second object recognition unit 466 whether the objects of interest are significant objects that should be tracked, based on a comparison of each object's feature set to a condition table. If unit 466 determines there are significant objects to be tracked, the video signal is forwarded to a video multiplexer 468 that is coupled to other narrow FOV cameras in a block 470 to divide the tracking time of individual tracking cameras 428 between objects. In particular, if there are more significant objects that must be tracked than there are narrow FOV cameras 428, video multiplexer unit 468 divides the amount of camera time spent tracking significant objects between multiple objects depending on how many cameras are available.
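The condition-table comparison can be sketched as follows; the feature names and threshold values in the table are assumptions for illustration and are not the actual conditions used by units 460 and 466.

# A minimal sketch of the first-level object recognition filter: each object's feature
# set is compared against a condition table to decide whether the object is of interest.
CONDITION_TABLE = {
    "min_size_pixels": 50,        # ignore objects smaller than this
    "max_size_pixels": 20000,     # ignore implausibly large regions (e.g., lighting changes)
    "min_speed": 0.5,             # pixels per frame
    "directions_of_interest": {"north", "south"},
}

def is_significant(obj: dict, table: dict = CONDITION_TABLE) -> bool:
    """Return True if the object's feature set (size, direction, average speed, path,
    location) satisfies the condition table."""
    if not table["min_size_pixels"] <= obj["size"] <= table["max_size_pixels"]:
        return False
    if obj["avg_speed"] < table["min_speed"]:
        return False
    if obj["direction"] not in table["directions_of_interest"]:
        return False
    return not obj.get("inside_excluded_area", False)

candidate = {"size": 420, "avg_speed": 2.3, "direction": "north", "path": [(10, 12), (12, 20)]}
print(is_significant(candidate))   # True: coordinates would be passed to zoom control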
The video is then compressed in a video compression input module 472 and transmitted to a compression module based on motion estimation, which uses soft computing techniques in a processor 474 to apply hypercompression to the data. The compressed data from module 472 and processor 474 is transmitted to an intelligent I-frame insertion module 476 that inserts meaningful I-frames at the beginning of each scene change in the video as described above. The video is then forwarded to a compressed digital data stream formatter unit 478 and sent to wireless data spread spectrum transmitter 432 for wireless transmission to control module 56.
FIGURE 33 illustrates the autonomous operation of tracking system 420 in connection with the tracking example mentioned above. After system 420 is set up at a start block 480, all sensors 14, 16 and 18 are in a sleep mode 482. Every sensor including motion detector 436 is tested in a block 484, and a block 486 determines whether motion detector 436 has been activated. As long as motion detector 436 is not activated, tracking system 420 remains in "sleep" mode. However, as soon as motion detector 436 is activated, tracking system 420 switches from "sleep" mode into a "YELLOW" alarm mode in a block 488.
As described in connection with FIGURE 32, in an object block 490 two frames of the video are subtracted from one another to determine the pixels that have changed, and the blobs are segmented to form objects based on the centroids of the segments. Noise is filtered out of the picture in a block 492 and a first-level object recognition is applied in a block 494 to the objects identified in block 490. Based on this initial object recognition analysis in block 494, an alarm condition is set in a block 496 conditioned upon whether the object is recognized as a significant object that should be tracked. If block 496 determines the object is not significant, control passes back up to block 484 to wait for additional objects to pass in front of wide FOV camera 424. On the other hand, if an object is determined in block 496 to be significant, tracking system 420 switches from a YELLOW alarm to a RED alarm in a block 498.
As tracking system 420 is iterating through each object to determine whether a RED alarm condition should be set, tracking system 420 also passes control from block 494 to an object coordinate unit 500 to determine the object coordinates of each of the objects identified in block 490.
The coordinates from block 500 are fed to a block 502 to move narrow FOV cameras 428 to the coordinates of each of the objects. At this point, tracking system 420 performs a second object recognition on each of the objects to determine whether a significant object is detected. A block 504 tests whether each of the objects is a significant object. If the particular object is not significant, control passes back to object recognition block 494 to continue to iteratively filter through each of the objects until no more objects are left to test.
If, however, any of the objects out of block 502 is determined to be significant, tracking system 420 switches to the "RED" alarm in block 498 and narrow FOV cameras 428 in a block 506 track each of the significant objects by sending video of the objects in a block 508 to control module 56. In particular, the video of the significant objects can be sent in real time to display 224 (FIGURE 15) in a block 510 or the video can be stored for future reference in a clip 512. Finally, the video of the significant event can be sent to decision-making unit 514 for the user to take over manual override control of system 420.
Block 516 tests whether the significant object is out of the field of view. If the object is not out of the field of view, the video signal continues to be sent at block 508. On the other hand, as soon as the significant object moves out of the field of view, the RED alarm is canceled in a block 518, the YELLOW alarm is canceled in a block 520 and control passes to the next sensor in a box 522 and then to block 484 to continuously check all of the sensors.
FIGURE 34 summarizes the object tracking algorithm of tracking system 420. In order to determine relative object movement between frames, image subtraction is performed on a pixel-by-pixel basis in a step 524. Objects are defined in a block 526 and a first level object recognition is performed on the blobs from step 526 in a step 528. If the first level object recognition determines the object is not significant, decision block 530 passes control back to frame subtraction block 524. If, however, a significant object is going to be tracked, control passes from block 530 to a block 532 to calculate the center of gravity for each object.
A narrow FOV camera 428 is assigned to each of the objects in a step 534 and a second level object recognition is applied in a step 536 to determine whether narrow FOV cameras 428 should track the objects in a decision block 538. If the objects are significant and tracked, video is transmitted back to control module 56 in a step 540. On the other hand, if the object is not significant, control passes from decision block 538 back to frame subtraction block 524.
FIGURE 35 illustrates a pair of blobs 542 and 544 including rectangular segments 548 with each segment including a center of gravity point 550. Noise artifacts 552 are eliminated by tracking system 420 and narrow FOV cameras 428 can track blobs 542 and 544 based on global center of gravity points 554 and 556, respectively. Narrow FOV cameras may also track blobs 542 and 544 using points 558 and 560, respectively, based on taking a center point 562 between a first end 564 and a second end 566, and then dividing the segments formed between end points 564 and 566 and center point 562, respectively, in half to obtain points 558 and 560. Narrow FOV cameras 428 track objects 542 and 544 based on the coordinates corresponding to points 558 and 560. Alternatively, depending on the number of cameras and the number of objects, cameras 428 can multiplex between multiple objects.
OPERATION OF MULTIMEDIA SENSOR NETWORK
In operation, heterogeneous sensors 14, 16, 18 connected to one another in multimedia sensor network 12 transmit sensed data to control module 56 in the form of homogeneous TCP/IP packets through bridge 30 to user network 38 (e.g., the Internet). The sensed data is fused together by implementing a neurofuzzy network relying on gateway software agents 58 and host software agents 70 to form fuzzy sentence outputs that require significantly less bandwidth than transmitting the unprocessed, sensed data from each sensor 14, 16, 18 directly to control module 56. Compression module 24 coupled to sensor network 12 relies on bit selective error correction module 26 and intelligent agent module 28 to compress the sensed data into packetized digital streams and integrate the data through intelligent agent module 28. Sensor network 12 processes and transmits the sensed data nearly autonomously until compression module 24 is unable to fuse together the sensed data because the data from a particular set of sensors contradicts the data from a different set of sensors. In this case, sensor fusion defect module 20 requests user intervention to resolve the contradictory data.
The homogenized data is transmitted through bridge 30, which maintains the synchronization between the highly synchronous sensed data (e.g., multimedia streams) and the inherently asynchronous nature of user network 38 (e.g., the Internet). In particular, the ability to maintain the synchronous nature of the data despite inconsistent network resources is based on QoS manager 34, which guarantees bandwidth for particular processes based on variations in network resources, and buffer manager 36, which implements circular buffer 364 in NIC encoder buffer 206 and NIC decoder buffer 210.
In particular, buffer manager 36 dynamically and interactively adjusts buffer parameters (timing of read/write processes, buffer width, etc.) based on prioritization and network conditions in order to maximize resources. While the buffer is waiting for network resources to transmit data, the write process continues to write data into the buffer so that when network resources permit, data is transmitted and the read process catches up to the write process, thereby maintaining the synchronous transfer of data over asynchronous user network 38.
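A minimal sketch of this circular-buffer behavior is shown below, assuming an overwrite-oldest policy on overflow; the actual buffer 364 policy, widths and read/write timing are managed dynamically by buffer manager 36 as described above.

from collections import deque

class CircularBuffer:
    """Fixed-capacity circular buffer: the write process keeps appending frames while
    the network is busy; when bandwidth frees up, the read process drains the backlog
    and catches up. Overwriting the oldest entry on overflow is an assumption made for
    this illustration (a real policy might drop frames selectively)."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)   # deque drops the oldest item when full

    def write(self, frame):
        self.buf.append(frame)

    def read_all_available(self):
        drained = list(self.buf)
        self.buf.clear()
        return drained

buf = CircularBuffer(capacity=8)
for frame_no in range(12):                  # writer outpaces the blocked reader
    buf.write(f"frame-{frame_no}")
print(buf.read_all_available())             # reader catches up: the last 8 frames survive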
One application of network 10 implements sensor network 12 as a network that tracks moving objects. Motion detection sensor module 422 "wakes up" wide field-of-view camera 424 when motion is detected in a particular monitoring area. Wide field-of-view camera 424 transmits data to control module 56 that, based on a first level of object recognition, determines whether any of the identified objects in the monitoring area are significant. If any significant objects are detected, narrow field-of-view camera 428 similarly moves to the object's coordinates determined by wide field-of-view camera 424 and transmits additional data to control module 56.
After conducting a second level of object recognition, real-time video of the object is transmitted from control module 56 to user network 38 over bridge 30 if the object is determined to be significant. Narrow field-of-view camera 428 continues to track the significant object until the object is outside of the particular monitoring area. A user can access either the real-time video of the object over user network 38 (e.g., the Internet) or real-time high quality still images of particular scenes. The temporal and spatial data can also be stored in flash memory and viewed on an as-needed basis. The compression that enables the transmission of the still images that are interpreted from the video is based on the meaningful insertion of an I-frame at each scene change in the video. Therefore, the I-frames contain data representative of an entire scene in a video because the I-frames are not simply inserted at predetermined intervals regardless of the content of the video (which is standard practice in prior art compression methods).
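The scene-change test can be sketched as follows, assuming the global error GE is the mean absolute gray-level difference between the current frame and the last inserted I-frame and that the threshold T is fixed; both assumptions are made for illustration only.

import numpy as np

def global_error(frame, i_frame):
    """Mean absolute difference of pixel gray levels between the current frame and the
    last inserted I-frame (the GE measure discussed above, under the assumption stated)."""
    return np.abs(frame.astype(float) - i_frame.astype(float)).mean()

def choose_i_frames(frames, threshold=20.0):
    """Insert an I-frame whenever GE exceeds the threshold, i.e., at scene changes,
    rather than at fixed intervals. The threshold value is an assumption."""
    i_frame_indices = [0]
    last_i = frames[0]
    for idx, frame in enumerate(frames[1:], start=1):
        if global_error(frame, last_i) > threshold:
            i_frame_indices.append(idx)
            last_i = frame
    return i_frame_indices

# Synthetic sequence: a scene change at frame 3 triggers a new I-frame.
rng = np.random.default_rng(1)
scene_a = np.clip(40 + rng.normal(0, 5, (3, 64, 64)), 0, 255).astype(np.uint8)
scene_b = np.clip(200 + rng.normal(0, 5, (3, 64, 64)), 0, 255).astype(np.uint8)
print(choose_i_frames(list(scene_a) + list(scene_b)))   # expected: [0, 3]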
While the detailed drawings, specific examples, and particular formulations given describe exemplary embodiments, they serve the purpose of illustration only. For example, while compression processor 204 implements data compression based on an MPEG strategy, other standards (e.g., JPEG) may also be used as the basis for data compression and homogenization. Additionally, sensor network 12 can be implemented to fulfill a variety of business-to-business (B2B) and/or business-to-consumer (B2C) models (e.g., visual shopping, distance learning, etc.). For example, a camera could be mounted on a remote robot and a consumer shopping on Internet 38 would receive streaming video over lines 32 and 40 from sensor network 12. Alternatively, sensor network 12 could support interactive video with the consumer on Internet 38. Similarly, sensor network 12 could support interactive video of a professor in a classroom teaching students over Internet 38. Therefore, the configurations shown and described are not limited to the precise details and conditions disclosed. Furthermore, other substitutions, modifications, changes and omissions may be made in the design, operating conditions and arrangement of exemplary embodiments without departing from the spirit of the invention as expressed in the appended claims.

Claims

WHAT IS CLAIMED IS:
1. A network comprising: a sensor network including a plurality of interconnected sensors coupled to a control module, wherein the control module receives a set of sensed data from the plurality of sensors and generates a homogenized data stream based on the sensed data; a communication bridge coupled to the sensor network, wherein the bridge buffers the homogenized data stream received from the sensor network; and a user network coupled to the communication bridge, wherein the user network receives the homogenized data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
2. The network according to claim 1, wherein the homogenized data stream is a packetized data stream of multimedia data.
3. The network according to claim 2, wherein the user network is the Internet.
4. The network according to claim 3, wherein the plurality of interconnected sensors includes a set of local area sensors.
5. The network according to claim 4, wherein the plurality of interconnected sensors includes a set of middle area sensors.
6. The network according to claim 5, wherein the plurality of interconnected sensors includes a set of wide area sensors.
7. The network according to claim 6, wherein a first communication bus provides a first communication path between each of the wide area sensors.
8. The network according to claim 7, wherein a second communication bus provides a second communication path between the set of wide area sensors and the set of middle area sensors.
9. The network according to claim 8, wherein a third communication bus provides a third communication path between the set of middle area sensors and the set of local area sensors.
10. The network according to claim 9, wherein a fourth communication bus provides a fourth communication path between the set of wide area sensors and the control module.
11. The network according to claim 10, wherein a first emergency communication bus provides a first emergency communication path between each of the local area sensors.
12. The network according to claim 11, wherein a second emergency communication bus provides a second emergency communication path between each of the middle area sensors.
13. The network according to claim 12, wherein the first, the second and the third communication paths are wireless.
14. The network according to claim 13, wherein the fourth communication path is a satellite relay to the control module.
15. The network according to claim 12, wherein the set of local area sensors generates a set of local area sensed data, the set of middle area sensors generates a set of middle area sensed data, and the set of wide area sensors generates a set of wide area sensed data.
16. The network according to claim 15, wherein the control module fuses together the local area sensed data, the middle area sensed data and the wide area sensed data.
17. The network according to claim 16, wherein the fusing of the local area sensed data, the middle area sensed data and the wide area sensed data is based on neuro-fuzzy processing.
18. The network according to claim 17, wherein the fusing of the local area sensed data, the middle area sensed data and the wide area sensed data further comprises: approximating an initial draft of a set of fuzzy rules; mapping the initial draft of the set of fuzzy rules to a location and a curve of a set of membership functions; fine-tuning the location of the set of membership functions for optimal performance of the set of fuzzy rules using a neural network; submitting a set of training data to a fuzzy rule base and the neural network; generating a set of initial fuzzy membership functions using the neural network; submitting the set of initial fuzzy membership functions to the fuzzy rule base; generating an actual output from the fuzzy rule base; comparing the actual output with a desired output contained in the set of training data; adjusting a set of neural network weights, thereby adjusting the set of membership functions; and presenting the adjusted set of membership functions to the fuzzy rule base until a difference between the actual output and the desired output is below a predetermined minimum threshold value.
19. The network according to claim 18, wherein additional sets of training data are iteratively submitted to the fuzzy rule base and the neural network until a set of parameters associated with the set of membership functions converges to a final value.
20. The network according to claim 19, wherein based on a definition of the set of membership functions, the fuzzy rule base mimics the set of training data.
21. The network according to claim 20, wherein the sensor network is a redundant network providing a set of alternative communication paths between the local area sensors, the middle area sensors and the wide area sensors.
22. The network according to claim 21, wherein the sensor network operates primarily in an autonomous mode.
23. The network according to claim 15, wherein the set of local area sensors includes at least one of a magnetic sensor, a simple seismic sensor, a simple chemical sensor, a temperature sensor, and a wind sensor.
24. The network according to claim 23, wherein the set of local area sensors are in a sleep mode until activated by a wakeup signal transmitted from one of the middle area sensors to one of the local area sensors over the third communication bus.
25. The network according to claim 24, wherein one of the local area sensors transmits analog data to one of the middle area sensors over an analog RF channel.
26. The network according to claim 24, wherein the set of middle area sensors are one-dimensional sensors.
27. The network according to claim 26, wherein the set of middle area sensors includes at least one of a voice sensor, a spectrometer, an x-ray, and a complex chemical sensor.
28. The network according to claim 27, wherein the middle area sensed data is digitized.
29. The network according to claim 28, wherein the middle area sensed data is compressed using a gateway software agent.
30. The network according to claim 29, wherein the gateway software agent processes the middle area sensed data and determines whether the middle area sensed data is meaningful.
31. The network according to claim 30, wherein the middle area sensed data is transmitted from the set of middle area sensors to the set of wide area sensors over the second communication bus if the gateway software agent determines that the middle area sensed data is meaningful.
32. The network according to claim 31, wherein the gateway software agent is a self-organized fuzzy controller that operates with a million operations per second (MOPS) processing power.
33. The network according to claim 32, wherein the gateway software agent further comprises: an analog-to-digital converter; a template matching module; a filter bank coupled to the template matching module; a decision making module; and a communication interface.
34. The network according to claim 33, wherein the communication interface is a Harris prism.
35. The network according to claim 33, wherein a correlation peak representing the digital cross-correlation of a sample signal with a set of filter signals from the filter bank is compared to a predefined threshold value.
36. The network according to claim 35, wherein the decision making module sends a positive signal to the communication interface if the correlation peak is greater than or equal to the predefined threshold value.
37. The network according to claim 36, wherein the middle area sensed data is transmitted to the set of wide area sensors over the second communication bus if the communication interface receives the positive signal from the decision making module.
38. The network according to claim 37, wherein the middle area sensed data is not transmitted to the set of wide area sensors over the second communication bus if the communication interface does not receive the positive signal from the decision making module.
39. The network according to claim 27, wherein the set of wide area sensors are two-dimensional sensors.
40. The network according to claim 39, wherein the set of wide area sensors includes at least one of a forward-looking infrared sensor, an imaging radar, a complex seismic sensor, a two-dimensional imagery sensor, a three- dimensional volumetric sensor, a four-dimensional video with a set of three- dimensional spectral video sequences, a hyperspectral video sensor, and a video sensor.
41. The network according to claim 40, wherein the wide area sensed data is compressed using an intelligent agent.
42. The network according to claim 41, wherein the wide area sensed data is compressed using a host software agent.
43. The network according to claim 42, wherein the host software agent transmits the wide area sensed data and the middle area sensed data to the control module over the fourth communication bus.
44. The network according to claim 43, wherein the host software agent collects, processes and transmits the homogenized data stream including the middle area sensed data and the wide area sensed data to the control module.
45. The network according to claim 44, wherein the homogenized data stream includes a set of visual information in the form of video and imagery.
46. The network according to claim 45, wherein the host software agent transmits a sample of the homogenized data stream to the control module.
47. The network according to claim 46, wherein the host software agent negotiates with the control module to determine a subset of the homogenized data stream to be transmitted from the set of wide area sensors to the control module after the sample is transmitted to the control module.
48. The network according to claim 47, wherein the control module packetizes the homogenized data in a transport layer.
49. The network according to claim 48, wherein the homogenized data is packetized using a TCP/IP protocol.
50. The network according to claim 48, wherein the homogenized data is packetized using a TCP/IP protocol and an ATM protocol.
51. The network according to claim 48, wherein the control module packetizes the homogenized data in a transport layer.
52. The network according to claim 6, wherein the set of local area sensors and the set of middle area sensors include at least one omnidirectional antenna.
53. The network according to claim 6, wherein the set of middle area sensors and the set of wide area sensors include highly-distributed, 8 billion operations per second (BOPS) processing power.
54. The network according to claim 53, wherein the sensor network is data- centric.
55. The network according to claim 6, wherein each of the set of local area sensors, the set of middle area sensors, and the set of wide area sensors includes a limited view of a monitoring area.
56. The network according to claim 55, wherein a portion of the limited view overlaps with each of the set of local area sensors, the set of middle area sensors and the set of wide area sensors.
57. The network according to claim 56, wherein each of the set of local area sensors, the set of middle area sensors and the set of wide area sensors monitors a portion of the monitoring area and synthesizes a set of neighboring data based on a first set of sensed data from a first sensor adjacent to a particular sensor and a second set of sensed data from a second sensor adjacent to the particular sensor.
58. The network according to claim 57, wherein each of the set of local area sensors, the set of middle area sensors and the set of wide area sensors further processes the first set of sensed data and the second set of sensed data along with the set of local area sensed data, the set of middle area sensed data or the set of wide area sensed data associated with the particular sensor.
59. The network according to claim 58, wherein the sensor network is redundant based on the overlap of the portion of the monitoring area associated with each of the set of local area sensors, the set of middle area sensors and the set of wide area sensors.
60. The network according to claim 59, wherein the redundant sensor network is further based on a bit selective error correction module.
61. The network according to claim 60, wherein the bit selective error correction module is implemented on a transport layer.
62. The network according to claim 61, wherein the sensor network includes at least two communication paths between a particular local area sensor and a particular middle area sensor, and between the particular middle area sensor and a particular wide area sensor.
63. The network according to claim 62, wherein the sensor network is insensitive to a physical communication layer.
64. The network according to claim 63, wherein the control module routes the set of local area sensed data, the set of middle area sensed data or the set of wide area sensed data through one of the at least two communication paths if one of the set of local area sensors, the set of middle area sensors or the set of wide area sensors fails.
65. The network according to claim 64, wherein a sensor fusion defect module transmits a request to the control module if the sensor network is unable to resolve a conflict between the set of local area sensed data, the set of middle area sensed data, and the set of wide area sensed data.
66. The network according to claim 65, wherein a running time associated with the sensor network operates in an autonomous mode 90% of the time.
67. The network according to claim 66, wherein the sensor fusion defect module requests assistance from a user in the form of a three-dimensional teleoperation at the set of wide area sensors.
68. The network according to claim 6, wherein the set of local area sensors and the set of middle area sensors include at least one omnidirectional antenna.
69. The network according to claim 67, wherein the request is to resolve inherent contradictions in the sensed data.
70. The network according to claim 1, wherein the transmission of the sensed data is optimized for a TCP/IP protocol.
71. The network according to claim 1, wherein the transmission of the sensed data is optimized for an ATM protocol.
72. The network according to claim 1, wherein the transmission of the sensed data is optimized for a TCP/IP protocol and an ATM protocol.
73. The network according to claim 1, wherein the sensed data is compressed by a compression module and packetized into a digital data stream, and transmitted through a gateway software agent.
74. The network according to claim 73, wherein the sensed data is compressed at a rate of 4000:1.
75. The network according to claim 1, wherein the sensor network is a frequency hopping spread spectrum network.
76. The network according to claim 1, wherein the sensor network is a fiber sensor communication network.
77. The network according to claim 1, wherein the sensed data is transmitted from the control module through the bridge to the user network using a contextual meaning of the sensed data.
78. The network according to claim 77, wherein the sensed data is transmitted using a fuzzified output.
79. The network according to claim 78, wherein the output is in the form of a fuzzy sentence.
80. The network according to claim 1, wherein the sensor network is self-healing and includes a transmitter flow of data and a receiver flow of data.
81. The network according to claim 80, wherein the transmitter flow of data comprises: creating a set of packetized data based on the sensed data; establishing a communication channel with a receiving node; requesting a data channel from the receiving node; testing if sufficient bandwidth exists to transmit the sensed data if an acknowledgment signal is received from the receiving node for the request for the data channel; and transmitting a data header associated with the packetized data to a destination node.
82. The network according to claim 81, wherein a different receiving node is chosen if the acknowledgment signal is not received within a predetermined time limit.
83. The network according to claim 82, wherein a new communication channel is established with the receiving node if there is not sufficient bandwidth to transmit the sensed data.
84. The network according to claim 83, wherein the receiver flow of data comprises: communicating to a requesting node whether sufficient bandwidth exists to receive the sensed data if a set of received data is the same as a set of requested data; waiting for the data header; receiving and multiplexing the set of received data with a set of acquired sensor data into a transport stream; and transmitting the transport stream to the destination node.
85. The network according to claim 12, further comprising: a gateway software agent coupled to the set of middle area sensors; and a host software agent coupled to the set of wide area sensors and the control module, wherein the host software agent includes a visualization module, a graphic overlap module and a memory.
86. The network according to claim 85, wherein the visualization module is coupled to the control module.
87. The network according to claim 81, wherein a different receiving node is chosen if the acknowledgment signal is not received within a predetermined time limit.
88. The network according to claim 1, wherein the control module defines a window of opportunity around a moving object in a monitoring area.
89. The network according to claim 88, wherein the control module compresses the moving object at a lower compression rate than a set of background data in the window of opportunity associated with the moving object.
90. The network according to claim 89, wherein the lower compression rate is approximately a 1000:1 compression ratio and the set of background data is approximately a 11,600:1 compression ratio.
91. The network according to claim 81, wherein a different receiving node is chosen if the acknowledgment signal is not received within a predetermined time limit.
92. The network according to claim 1, wherein the sensed data includes a set of temporal data and a set of spatial data.
93. The network according to claim 92, wherein the set of spatial data is interpreted from the set of temporal data.
94. The network according to claim 93, wherein the temporal data is transmitted through a first communication channel and the set of spatial data is transmitted through a second communication channel.
95. The network according to claim 94, wherein the data throughput associated with the first communication channel is the same as the data throughput associated with the second communication channel.
96. A method of providing multimedia data over a network comprising the steps of: processing a set of multimedia information including a set of temporal data and a set of spatial data; compressing the set of temporal data and the set of spatial data; and interpreting the set of spatial data from the set of temporal data.
97. The method according to claim 96, wherein the set of temporal data is transmitted through a first communication channel and the set of spatial data is transmitted through a second communication channel.
98. The method according to claim 97, wherein the data throughput associated with the first communication channel is the same as the data throughput associated with the second communication channel.
99. The method according to claim 98, wherein the set of temporal data includes digital video of a scene having a plurality of frames.
100. The method according to claim 99, wherein the set of spatial data includes a segment of high-resolution still images corresponding to one of the plurality of frames of the digital video.
101. The method according to claim 100, wherein the number of the frames in the segment is variable based on a start location and an end location of the scene.
102. The method according to claim 101, wherein each of the segments includes an I-frame representing a corresponding scene.
103. The method according to claim 102, wherein the step of compressing the set of spatial data further includes inserting the I-frame at the start location of the corresponding scene.
104. The method according to claim 103, wherein the I-frame is inserted based on computing a difference between the I-frame and the remaining frames in the segment.
105. The method according to claim 104, wherein the difference is calculated based on:
GE = (1/N) Σᵢ |dᵢ − dᵢ₀|
where N is the number of pixels in the frame (e.g., 640 x 480 for the VGA standard), and dᵢ and dᵢ₀ are RGB-pixel gray levels for a given I-frame and a reference frame.
106. The method according to claim 105, wherein a new I-frame is implanted if
(GE) > T, wherein T is a predefined threshold value.
107. The method according to claim 106, wherein the digital video is transmitted in real-time to a display through the first communication channel simultaneously with the real-time transmission of the I-frames through the second communication channel to the display.
108. The method according to claim 107, wherein the same data throughput is maintained for the first channel and the second channel based on
(FC)·N/(CR) = (FC)/(CI)₀ or (CI)₀ = (CR)/N
where FC is the single frame original content, N is the number of frames, CR is the average compression ratio of the N-frame synchronized cycle, (CI)₀ is the compression ratio of the I-frames in the second channel and k is the ratio of CD (the average compression ratio) to CI (the compression ratio of the video in the first channel).
109. The method according to claim 108, wherein the same data throughput is maintained for the first channel and the second channel based on
η = (CI)/(CI)₀ = 1 + (N − 1)/k
where N is the number of frames, (CI)₀ is the compression ratio of the I-frames in the second channel and k is the ratio of CD (the average compression ratio) to CI (the compression ratio of the video in the first channel).
110. A multimedia sensor network configured to integrate temporal data with spatial data, the network comprising: a plurality of sensors configured to generate multimedia data; and a processor configured to process, compress and transmit the multimedia data, wherein the processor includes an encoder coupled to a local area network, wherein the local area network transmits compressed temporal data through a first communication channel and the compressed spatial data through a second communication channel.
111. The network according to claim 110, wherein the data throughput of the first communication channel is the same as the data throughput of the second communication channel.
112. The network according to claim 111, wherein the plurality of sensors include local sensors, non-imaging middle area sensors and image-based wide area sensors.
113. The network according to claim 112, wherein the encoder includes an image compression processor and a motion estimation processor.
114. The network according to claim 100, wherein the temporal data includes digital video of a scene having a plurality of frames.
115. The network according to claim 100, wherein the spatial data includes a segment of high-resolution still images corresponding to one of the plurality of frames of the digital video.
116. The network according to claim 115, wherein the number of the frames in the segment is variable based on a start location and an end location of the scene.
117. The network according to claim 116, wherein each of the segments includes an I-frame representing a corresponding scene.
118. The network according to claim 117, wherein the number of the frames in the segment is variable based on a start location and an end location of the scene.
119. The network according to claim 100, wherein an I-frame is inserted at the start location of the corresponding scene.
120. The network according to claim 119, wherein the I-frame is inserted based on computing a difference between the I-frame and the remaining frames in the segment.
121. The network according to claim 120, wherein the difference is calculated based on
GE = (1/N) Σᵢ |dᵢ − dᵢ₀|
where N is the number of pixels in the frame (e.g., 640 x 480 for the VGA standard), and dᵢ and dᵢ₀ are RGB-pixel gray levels for a given I-frame and a reference frame.
122. The network according to claim 121, wherein a new I-frame is implanted if (GE) > T, wherein T is a predefined threshold value.
123. The network according to claim 122, wherein the compressed digital video is transmitted in real-time to a display through the first communication channel simultaneously with the real-time transmission of the compressed I-frames through the second communication channel to the display.
124. The network according to claim 123, wherein the same data throughput is maintained for the first channel and the second channel based on
(FC)·N/(CR) = (FC)/(CI)₀ or (CI)₀ = (CR)/N
where FC is the single frame original content, N is the number of frames, CR is the average compression ratio of the N-frame synchronized cycle, (CI)₀ is the compression ratio of the I-frames in the second channel and k is the ratio of CD (the average compression ratio) to CI (the compression ratio of the video in the first channel).
125. The network according to claim 124, wherein the same data throughput is maintained for the first channel and the second channel based on
η = (CI)/(CI)₀ = 1 + (N − 1)/k
where N is the number of frames, (CI)₀ is the compression ratio of the I-frames in the second channel and k is the ratio of CD (the average compression ratio) to CI (the compression ratio of the video in the first channel).
126. The network according to claim 125, wherein the number of the frames in the segment is variable based on a start location and an end location of the scene.
127. A network comprising: a sensor network including a plurality of interconnected sensors coupled to a control module, wherein the control module receives a set of sensed data from the plurality of sensors including a set of temporal data and generates a homogenized data stream based on the sensed data; an intelligent compression module coupled to the sensor network, wherein a set of spatial data is interpreted from the set of temporal data; a communication bridge coupled to the sensor network, wherein the bridge buffers the homogenized data stream received from the sensor network; and a user network coupled to the communication bridge, wherein the user network receives the homogenized data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
128. The method according to claim 127, wherein the set of temporal data corresponds to video data and the set of spatial data corresponds to still imagery.
129. The method according to claim 128, wherein the compression module encodes the set of temporal data by inserting an I-frame at a start location of each of a plurality of scenes.
130. The method according to claim 129, wherein the I-frame is inserted based on computing a difference between the I-frame and a set of remaining frames in the scene.
131. The method according to claim 127, wherein the user network is the Internet.
132. A multimedia network comprising: a sensor network including a plurality of interconnected sensors coupled to a control module, wherein the control module receives a set of sensed data from the plurality of sensors including a set of temporal data and generates a data stream based on the sensed data; an intelligent compression module coupled to the sensor network, wherein a set of spatial data is interpreted from the set of temporal data; a communication bridge coupled to the sensor network, wherein the bridge includes a buffer manager to buffer the data stream received from the sensor network and a quality of service manager to guarantee a particular bandwidth for the transmission of the data stream; and a user network coupled to the communication bridge, wherein the user network receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
133. The network according to claim 132, wherein the set of temporal data corresponds to video data and the set of spatial data corresponds to still imagery.
134. The network according to claim 133, wherein the compression module encodes the set of temporal data by inserting an I-frame at a start location of each of a plurality of scenes.
135. The network according to claim 134, wherein the I-frame is inserted based on computing a difference between the I-frame and a set of remaining frames in the scene.
136. The network according to claim 133, wherein the communication bridge maintains synchronicity of the set of temporal data as the set of temporal data is transmitted through the asynchronous user network. 137. The network according to claim 136, wherein the communication bridge dynamically adjusts the timing of the transmission of the set of temporal data and a set of network buffer parameters to ensure the set of temporal data is synchronously received by the user network despite a network latency inherent in the user network. 138. The network according to claim 137, wherein the network latency results from a set of routers and a delay associated with a plurality of switches. 139. The network according to claim 138, wherein the buffer manager implements a circular buffer to buffer the data stream from the sensor network to the user network. 140. The network according to claim 139, wherein the user network is the Internet.
141. The network according to claim 140, wherein the circular buffer is implemented by modifying a transport layer.
142. The network according to claim 141, wherein an application program interface implements the circular buffer.
143. The network according to claim 142, wherein the transport layer is modified by extending a TCP/IP socket subprotocol.
144. The network according to claim 142, wherein the transport layer is modified by applying a bit selective error correction on the data stream after an initial error correction is implemented on a physical layer.
145. The network according to claim 132, wherein the control module packetizes the data stream.
146. The network according to claim 145, wherein the control module configures a maximum allowable packet size associated with the packetized data stream.
147. The network according to claim 146, wherein the maximum allowable packet size is 1500 bytes.
148. The network according to claim 147, wherein transmission of the packetized data stream between the sensor network and the user network is controlled by the communication bridge.
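As a rough sketch of the packetization limit in the claims above, the snippet below splits a homogenized byte stream into sequence-numbered chunks no larger than the 1500-byte maximum; the header fields are invented for the example and are not the packet format defined by the patent.

```python
MAX_PACKET_SIZE = 1500  # maximum allowable packet size recited in the claims

def packetize(stream: bytes, max_size: int = MAX_PACKET_SIZE):
    """Split a byte stream into chunks of at most max_size bytes with illustrative headers."""
    packets = []
    for seq, offset in enumerate(range(0, len(stream), max_size)):
        payload = stream[offset:offset + max_size]
        packets.append({"seq": seq, "length": len(payload), "payload": payload})
    return packets

# Example: a 4096-byte block of sensor data becomes packets of 1500, 1500 and 1096 bytes.
assert [p["length"] for p in packetize(b"\x00" * 4096)] == [1500, 1500, 1096]
```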
149. The network according to claim 148, wherein transmission of the packetized data is controlled by an encoder and a decoder, wherein the encoder includes an encoder network interface card and the decoder includes a decoder network interface card.
150. The network according to claim 149, wherein the encoder network interface card includes an encoder buffer and the decoder network interface card includes a decoder buffer. 151. The network according to claim 150, wherein the buffer manager buffers the data stream in the encoder buffer and the decoder buffer.
152. The network according to claim 151, wherein the buffer manager implements a circular buffer in each of the encoder buffer and the decoder buffer.
153. The network according to claim 152, wherein the buffer manager dynamically allocates a bandwidth for the transmission of the data stream based on a set of user network parameters.
154. The network according to claim 153, wherein the buffer manager negotiates the bandwidth for each transmission of the data stream to optimize the distribution of the bandwidth based on a priority parameter associated with a particular data stream and a condition of the user network based on the set of user network parameters.
155. The network according to claim 154, wherein the buffer manager implements global smoothing to dynamically allocate the bandwidth.
156. The network according to claim 154, wherein the buffer manager implements local smoothing to dynamically allocate the bandwidth.
157. The network according to claim 152, wherein the circular buffer is a permanently allocated buffer.
158. The network according to claim 157, wherein the circular buffer includes a read pointer and a write pointer, the read pointer does not pass the write pointer, and the write pointer does not pass the read pointer.
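A minimal sketch, assuming a fixed permanently allocated capacity, of the pointer discipline recited above: writes advance the write pointer only while free space remains and reads advance the read pointer only while unread data remains, so neither pointer passes the other. The class and method names are illustrative, not the patent's implementation.

```python
class CircularBuffer:
    """Permanently allocated ring buffer with non-passing read and write pointers."""

    def __init__(self, capacity: int):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.read_ptr = 0
        self.write_ptr = 0
        self.count = 0  # unread items currently held between the two pointers

    def write(self, item) -> bool:
        if self.count == self.capacity:  # writing would make the write pointer pass the read pointer
            return False
        self.buf[self.write_ptr] = item
        self.write_ptr = (self.write_ptr + 1) % self.capacity
        self.count += 1
        return True

    def read(self):
        if self.count == 0:  # reading would make the read pointer pass the write pointer
            return None
        item = self.buf[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % self.capacity
        self.count -= 1
        return item
```

On this reading, a stalled reader (claims 162-163) simply lets the writer fill the remaining capacity; once network conditions let reads resume, the read pointer drains the backlog and catches up to the write pointer.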
159. The network according to claim 151, wherein the buffer manager implements a doubly-linked circular buffer in each of the encoder buffer and the decoder buffer.
160. The network according to claim 158, wherein the buffer manager adjusts a timing parameter associated with the incrementing of the read pointer and the write pointer based on the set of user network parameters. 161. The network according to claim 160, wherein the buffer manager dynamically adjusts a buffer width parameter associated with each of the circular buffers based on the set of user network parameters.
162. The network according to claim 161, wherein the buffer manager adjusts the timing parameter to ensure the write pointer is continually incremented even if the user network condition forces the read pointer to stop incrementing.
163. The network according to claim 162, wherein after the read pointer is forced to stop incrementing, the buffer manager adjusts the timing parameter to allow the read pointer to catch up to the write pointer when the user network condition allows the read pointer to begin incrementing. 164. The network according to claim 132, wherein the control module assigns a priority parameter to each of the plurality of sensors. 165. The network according to claim 164, wherein each of the plurality of sensors that is assigned a high priority receives a greater amount of bandwidth than each of the plurality of sensors that is assigned a lower priority. 166. The network according to claim 165, wherein the quality of service manager optimizes at least one of a required bandwidth parameter, a latency parameter, a jitter parameter and a loss parameter to ensure a particular bandwidth for each transmission of the data stream. 167. The network according to claim 166, wherein the quality of service manager optimizes at least one of a frame resolution parameter, a frame rate parameter, a color depth parameter and a frame drop-out frequency parameter based on the priority parameter assigned to each of the sensors and an available bandwidth parameter based on a user network condition associated with each transmission of the data stream.
168. The network according to claim 167, wherein the frame resolution parameter, the frame rate parameter, the color depth parameter, the frame drop-out frequency parameter, the required bandwidth parameter, the latency parameter, the jitter parameter, and the loss parameter are nonlinearly linked, and each is a dimension in an optimization space.
169. The network according to claim 168, wherein the quality of service manager optimizes a multifunction optimization problem defined by each of the dimensions in the optimization space by finding a global minimum of the multifunction optimization problem.
170. The network according to claim 169, wherein the quality of service manager applies a genetic algorithm to find the global minimum of the multifunction optimization problem. 171. The network according to claim 169, wherein the quality of service manager applies a fuzzy neural network to the multifunction optimization problem.
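Claims 168-170 treat the linked quality-of-service parameters as dimensions of one optimization space whose global minimum is sought, for example by a genetic algorithm. The sketch below is a generic genetic minimizer over an assumed eight-dimensional placeholder cost function; the objective, bounds and settings are illustrative only, not the patent's actual formulation.

```python
import random

DIMS = 8  # e.g. resolution, frame rate, color depth, drop-out, bandwidth, latency, jitter, loss

def cost(x):
    # Placeholder nonlinear cost coupling the parameters; the real objective is not specified here.
    return sum((xi - 0.5) ** 2 for xi in x) + 0.1 * abs(x[0] * x[1] - x[2])

def genetic_minimize(pop_size=40, generations=200, mutation=0.1):
    pop = [[random.random() for _ in range(DIMS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[: pop_size // 2]          # selection: keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, DIMS)       # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(DIMS):                 # mutation, clipped to the assumed [0, 1] bounds
                if random.random() < mutation:
                    child[i] = min(1.0, max(0.0, child[i] + random.gauss(0, 0.1)))
            children.append(child)
        pop = survivors + children
    return min(pop, key=cost)

best = genetic_minimize()
```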
172. A multimedia network comprising: a sensor network including a plurality of interconnected sensors coupled to a control module, wherein the control module receives a set of sensed data from the plurality of sensors including a set of temporal data and generates a data stream based on the sensed data; a communication bridge coupled to the sensor network, wherein the bridge includes a buffer manager to buffer the data stream received from the sensor network and a quality of service manager to guarantee a particular bandwidth for the transmission of the data stream; and a user network coupled to the communication bridge, wherein the user network receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
173. The network according to claim 172, wherein the network further includes an intelligent compression module coupled to the sensor network, wherein a set of spatial data is interpreted from the set of temporal data.
174. The network according to claim 173, wherein the compression module encodes the set of temporal data by inserting an I-frame at a start location of each of a plurality of scenes.
175. The network according to claim 174, wherein the set of temporal data corresponds to video data and the set of spatial data corresponds to still imagery.
176. The network according to claim 175, wherein the I-frame is inserted based on computing a difference between the I-frame and a set of remaining frames in the scene. 177. The network according to claim 176, wherein the communication bridge maintains synchronicity of the set of temporal data as the set of temporal data is transmitted through the asynchronous user network.
178. The network according to claim 177, wherein the communication bridge dynamically adjusts the timing of the transmission of the set of temporal data and a set of network buffer parameters to ensure the set of temporal data is synchronously received by the user network despite a network latency inherent in the user network.
179. A tracking network comprising: a sensor network including a plurality of interconnected sensors coupled to a control module, wherein the plurality of sensors track a moving object in a monitoring area, and the control module receives a set of sensed data from the plurality of sensors and generates a data stream based on the sensed data; a communication bridge coupled to the sensor network, wherein the bridge buffers the data stream received from the sensor network; and a user network coupled to the communication bridge, wherein the user network receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
180. The network according to claim 179, wherein the data stream is a packetized and homogenized data stream of multimedia data.
181. The network according to claim 180, wherein the user network is the Internet.
The network according to claim 181, wherein the plurality of interconnected sensors includes a set of local area sensors.
182. The network according to claim 182, wherein the plurality of interconnected sensors includes a set of middle area sensors.
183. The network according to claim 183, wherein the plurality of interconnected sensors includes a set of wide area sensors.
184. The network according to claim 184, wherein a first communication bus provides a first communication path between each of the wide area sensors. 185. The network according to claim 185, wherein a second communication bus provides a second communication path between the set of wide area sensors and the set of middle area sensors.
186. The network according to claim 186, wherein a third communication bus provides a third communication path between the set of middle area sensors and the set of local area sensors.
187. The network according to claim 187, wherein a fourth communication bus provides a fourth communication path between the set of wide area sensors and the control module.
188. The network according to claim 188, wherein a first emergency communication bus provides a first emergency communication path between each of the local area sensors.
189. The network according to claim 189, wherein a second emergency communication bus provides a second emergency communication path between each of the middle area sensors.
190. The network according to claim 188, wherein the first, the second and the third communication paths are wireless.
191. The network according to claim 188, wherein the fourth communication path is a satellite relay to the control module.
192. The network according to claim 190, wherein the set of local area sensors generates a set of local area sensed data, the set of middle area sensors generates a set of middle area sensed data, and the set of wide area sensors generates a set of wide area sensed data.
193. The network according to claim 193, wherein the control module fuses together the local area sensed data, the middle area sensed data and the wide area sensed data.
194. The network according to claim 194, wherein the fusing of the local area sensed data, the middle area sensed data and the wide area sensed data is based on neuro-fuzzy processing.
195. The network according to claim 195, wherein the fusing of the local area sensed data, the middle area sensed data and the wide area sensed data further comprises: approximating an initial draft of a set of fuzzy rules; mapping the initial draft of the set of fuzzy rules to a location and a curve of a set of membership functions; fine-tuning the location of the set of membership functions for optimal performance of the set of fuzzy rules using a neural network; submitting a set of training data to a fuzzy rule base and the neural network; generating a set of initial fuzzy membership functions using the neural network; submitting the set of initial fuzzy membership functions to the fuzzy rule base; generating an actual output from the fuzzy rule base; comparing the actual output with a desired output contained in the set of training data; adjusting a set of neural network weights, thereby adjusting the set of membership functions; and presenting the adjusted set of membership functions to the fuzzy rule base until a difference between the actual output and the desired output is below a predetermined minimum threshold value.
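A highly simplified sketch of the training loop in claim 195, assuming Gaussian membership functions whose centers act as the adjustable neural-network weights and a zeroth-order (Sugeno-style) rule base; the structure, learning rate and stopping threshold are assumptions made for illustration, not the patent's actual neuro-fuzzy design.

```python
import numpy as np

def memberships(x, centers, sigma=1.0):
    # Gaussian membership value of input x for each rule's assumed membership function
    return np.exp(-((x - centers) ** 2) / (2 * sigma ** 2))

def rule_base_output(x, centers, consequents):
    w = memberships(x, centers)
    return float(np.dot(w, consequents) / (w.sum() + 1e-9))  # weighted rule consequents

def tune_membership_functions(train_x, train_y, centers, consequents,
                              lr=0.05, tol=1e-3, max_epochs=500):
    centers = np.asarray(centers, dtype=float)
    for _ in range(max_epochs):
        sq_err = 0.0
        for x, desired in zip(train_x, train_y):
            actual = rule_base_output(x, centers, consequents)
            err = actual - desired                            # compare actual vs. desired output
            sq_err += err ** 2
            for j in range(len(centers)):                     # nudge each centre down the error slope
                centers[j] += 1e-4
                bumped = rule_base_output(x, centers, consequents)
                centers[j] -= 1e-4
                centers[j] -= lr * err * (bumped - actual) / 1e-4
        if sq_err / len(train_x) < tol:                       # stop once error falls below threshold
            break
    return centers
```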
196. The network according to claim 196, wherein additional sets of training data are iteratively submitted to the fuzzy rule base and the neural network until a set of parameters associated with the set of membership functions converges to a final value.
197. The network according to claim 197, wherein based on a definition of the set of membership functions, the fuzzy rule base mimics the set of training data.
198. The network according to claim 179, wherein the sensor network is a redundant network providing a set of alternative communication paths between the local area sensors, the middle area sensors and the wide area sensors.
199. The network according to claim 199, wherein the sensor network operates primarily in an autonomous mode.
200. The network according to claim 198, wherein the set of local area sensors includes at least one of a magnetic sensor, a simple seismic sensor, a simple chemical sensor, a temperature sensor, and a wind sensor.
201. The network according to claim 201, wherein the set of local area sensors are in a sleep mode until activated by a wakeup signal transmitted from one of the middle area sensors to one of the local area sensors over the third communication bus.
202. The network according to claim 202, wherein one of the local area sensors transmits analog data to one of the middle area sensors over an analog RF channel.
203. The network according to claim 203, wherein the set of middle area sensors are one-dimensional sensors.
204. The network according to claim 204, wherein the set of middle area sensors includes at least one of a voice sensor, a spectrometer, an x-ray, and a complex chemical sensor.
205. The network according to claim 205, wherein the middle area sensed data is digitized.
206. The network according to claim 206, wherein the middle area sensed data is compressed using a gateway software agent.
207. The network according to claim 207, wherein the gateway software agent processes the middle area sensed data and determines whether the middle area sensed data is meaningful.
208. The network according to claim 208, wherein the middle area sensed data is transmitted from the set of middle area sensors to the set of wide area sensors over the second communication bus if the gateway software agent determines that the middle area sensed data is meaningful.
209. The network according to claim 209, wherein the gateway software agent is a self-organized fuzzy controller that operates with a million operations per second (MOPS) processing power.
210. The network according to claim 210, wherein the gateway software agent further comprises: an analog-to-digital converter; a template matching module; a filter bank coupled to the template matching module; a decision making module; and a communication interface.
211. The network according to claim 211, wherein the communication interface is a Harris prism.
212. The network according to claim 212, wherein a correlation peak representing the digital cross-correlation of a sample signal with a set of filter signals from the filter bank is compared to a predefined threshold value.
213. The network according to claim 213, wherein the decision making module sends a positive signal to the communication interface if the correlation peak is greater than or equal to the predefined threshold value.
214. The network according to claim 214, wherein the middle area sensed data is transmitted to the set of wide area sensors over the second communication bus if the communication interface receives the positive signal from the decision making module. 215. The network according to claim 215, wherein the middle area sensed data is not transmitted to the set of wide area sensors over the second communication bus if the communication interface does not receive the positive signal from the decision making module.
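To illustrate the decision step of the gateway software agent (claims 212-215), the sketch below cross-correlates a digitized sample against each template in a filter bank, takes the largest correlation peak, and reports a positive decision only if that peak reaches a threshold; the normalization scheme and the threshold value are assumed for the example.

```python
import numpy as np

def correlation_peak(sample: np.ndarray, template: np.ndarray) -> float:
    """Peak of the normalized cross-correlation between a sample and one filter template."""
    s = (sample - sample.mean()) / (sample.std() + 1e-9)
    t = (template - template.mean()) / (template.std() + 1e-9)
    return float(np.max(np.correlate(s, t, mode="full")) / len(t))

def is_meaningful(sample, filter_bank, threshold=0.6):
    # Positive decision (data forwarded) if any template's correlation peak reaches the threshold.
    return any(correlation_peak(sample, t) >= threshold for t in filter_bank)
```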
216. The network according to claim 216, wherein the set of wide area sensors are two-dimensional sensors.
217. The network according to claim 217, wherein the set of wide area sensors includes at least one of a forward-looking infrared sensor, an imaging radar, a complex seismic sensor, a two-dimensional imagery sensor, a three-dimensional volumetric sensor, a four-dimensional video with a set of three-dimensional spectral video sequences, a hyperspectral video sensor, and a video sensor.
218. The network according to claim 218, wherein the wide area sensed data is compressed using an intelligent agent.
219. The network according to claim 219, wherein the wide area sensed data is compressed using a host software agent.
220. The network according to claim 220, wherein the host software agent transmits the wide area sensed data and the middle area sensed data to the control module over the fourth communication bus.
221. The network according to claim 221, wherein the host software agent collects, processes and transmits the homogenized data stream including the middle area sensed data and the wide area sensed data to the control module. 222. The network according to claim 222, wherein the homogenized data stream includes a set of visual information in the form of video and imagery.
223. The network according to claim 223, wherein the host software agent transmits a sample of the homogenized data stream to the control module.
224. The network according to claim 224, wherein the host software agent negotiates with the control module to determine a subset of the homogenized data stream to be transmitted from the set of wide area sensors to the control module after the sample is transmitted to the control module.
225. The network according to claim 225, wherein the control module packetizes the homogenized data in a transport layer. 226. The network according to claim 226, wherein the homogenized data is packetized using a TCP/IP protocol.
227. The network according to claim 226, wherein the homogenized data is packetized using a TCP/IP protocol and an ATM protocol.
228. The network according to claim 195, wherein the set of middle area sensors and the set of wide area sensors include highly-distributed, 8 billion operations per second (BOPS) processing power.
229. The network according to claim 179, wherein the sensor network is data-centric.
230. The network according to claim 229, wherein each of the set of local area sensors, the set of middle area sensors, and the set of wide area sensors includes a limited view of a monitoring area.
231. The network according to claim 231, wherein a portion of the limited view overlaps with each of the set of local area sensors, the set of middle area sensors and the set of wide area sensors.
232. The network according to claim 232, wherein each of the set of local area sensors, the set of middle area sensors and the set of wide area sensors monitors a portion of the monitoring area and synthesizes a set of neighboring data based on a first set of sensed data from a first sensor adjacent to a particular sensor and a second set of sensed data from a second sensor adjacent to the particular sensor. 233. The network according to claim 233, wherein each of the set of local area sensors, the set of middle area sensors and the set of wide area sensors and the set of middle area sensors further processes the first set of sensed data and the second set of sensed data along with the set of local area sensed data, the set of middle area sensed data or the set of wide area sensed data associated with the particular sensor.
234. The network according to claim 234, wherein the sensor network is redundant based on the overlap of the portion of the monitoring area associated with each of the set of local area sensors, the set of middle area sensors and the set of wide area sensors. 235. A tracking network comprising: a sensor network including a plurality of interconnected sensors coupled to a control module, wherein the plurality of sensors track a moving object in a monitoring area, and the control module receives a set of sensed data including a set of temporal data from the plurality of sensors and generates a data stream based on the sensed data; an intelligent compression module coupled to the sensor network, wherein a set of spatial data is interpreted from the set of temporal data; a communication bridge coupled to the sensor network, wherein the bridge buffers the data stream received from the sensor network; and a user network coupled to the communication bridge, wherein the user network receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
236. The network according to claim 236, wherein the data stream is a packetized and homogenized data stream of multimedia data.
237. The network according to claim 237, wherein the user network is the Internet.
238. The network according to claim 238, wherein the plurality of interconnected sensors includes a set of local area sensors, a set of middle area sensors and a set of wide area sensors.
239. The network according to claim 239, wherein the set of wide area sensors includes a wide field-of-view camera and a narrow field-of-view camera.
240. The network according to claim 240, further including a functionality evaluator to determine the functionality of each of the sensors.
241. The network according to claim 241, wherein the functionality evaluator eliminates in real-time the set of sensed data from any malfunctioning sensor.
242. The network according to claim 242, wherein the functionality evaluator comprises the steps of: fuzzifying an input; applying a fuzzy operator to the fuzzified input; applying an implication operator to the input; aggregating a set of outputs; and defuzzifying the set of outputs.
243. The network according to claim 243, further comprising the step of applying a fuzzy weight generator to group a set of alternative neural networks into a set of clusters based on a correlation measurement among the set of alternative neural networks.
244. The network according to claim 244, further comprising the step of categorizing a set of input sample data into an appropriate fuzzy neural network, wherein each fuzzy neural network is trained with a corresponding set of sample data.
245. The network according to claim 245, wherein each fuzzy neural network is trained using a modified learning vector quantization method by finding a set of reproduction vectors representing an information source with a minimum expected distortion. 246. The network according to claim 236, wherein an intelligent decision aid processes and integrates the set of sensed data from the plurality of sensors.
247. The network according to claim 247, wherein the intelligent decision aid integrates a neural network that adaptively generates a set of fuzzy logic rules.
248. The network according to claim 240, further including a functionality evaluator to determine the functionality of each of the sensors.
249. A tracking network comprising: a motion detection network including a plurality of interconnected sensors coupled to a control module, wherein the plurality of sensors track at least one moving object in a monitoring area, and the control module receives a set of sensed data including a set of temporal data from the plurality of sensors, and generates a data stream based on the sensed data; a communication bridge coupled to the motion detection network, wherein the bridge buffers the data stream received from the motion detection network; a user network coupled to the communication bridge, wherein the user network receives the data stream from the motion detection network and transmits a set of input data to the control module through the communication bridge; and wherein the control module receives at least a set of location coordinates corresponding to the at least one moving object from a first sensor and transmits the set of location coordinates to a second sensor that tracks the at least one moving object.
250. The network according to claim 250, wherein the first sensor is at least one wide field-of-view camera and the second sensor is at least one narrow field-of-view camera.
251. The network according to claim 251, wherein the control module determines the size, direction and number of at least one moving object in the monitoring area.
252. The network according to claim 252, wherein the at least one wide field-of-view camera and the at least one narrow field-of-view camera are analog cameras.
253. The network according to claim 253, wherein an analog-to-digital converter is coupled to the at least one wide field-of-view camera and the at least one narrow field-of-view camera.
254. The network according to claim 252, wherein the at least one wide field-of-view camera and the at least one narrow field-of-view camera are digital cameras.
255. The network according to claim 250, wherein the first sensor includes a set of fish eye cameras and each of the fish eye cameras has a field of view of 180 degrees.
256. The network according to claim 256, wherein the control module converts a set of polar coordinates from the pair of fish eye cameras to a set of Cartesian coordinates, and the set of location coordinates transmitted to the narrow field-of-view camera corresponds to the set of Cartesian coordinates.
257. The network according to claim 255, wherein a first moving object and a second moving object are tracked by the at least one narrow field-of-view camera using time division multiplexing.
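The polar-to-Cartesian conversion of claim 256 can be sketched as follows, assuming each fish eye detection is reported as a radius and an azimuth angle measured from the camera position; the coordinate convention is an assumption made for the example.

```python
import math

def polar_to_cartesian(radius: float, azimuth_deg: float, origin=(0.0, 0.0)):
    """Convert an assumed (radius, azimuth) detection into (x, y) relative to the camera origin."""
    theta = math.radians(azimuth_deg)
    return (origin[0] + radius * math.cos(theta),
            origin[1] + radius * math.sin(theta))

# Example: an object 10 m from the fish eye camera at a 30 degree bearing maps to roughly (8.66, 5.0).
x, y = polar_to_cartesian(10.0, 30.0)
```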
258. The network according to claim 251, wherein the at least one narrow field-of-view camera includes a tilt capability and a pan capability controlled by the control module.
259. The network according to claim 259, wherein the at least one narrow field-of-view camera remains in a sleep mode while the at least one wide field-of-view camera remains in an active mode to detect the at least one moving object in the monitoring area.
260. The network according to claim 260, wherein the user network is the Internet.
261. The network according to claim 261, wherein the at least one narrow field-of-view camera and the at least one wide field-of-view camera are IP-addressed.
262. The network according to claim 262, wherein the at least one narrow field-of-view camera and the at least one wide field-of-view camera are controlled by a user transmitting the set of input data to the control module through the Internet.
263. The network according to claim 259, wherein the at least one narrow field-of-view camera and the at least one wide field-of-view camera remain in a sleep mode until a motion detector coupled to the control module detects the at least one moving object in the monitoring area.
264. The network according to claim 259, wherein the at least one narrow field-of-view camera and the at least one wide field-of-view camera remain in a sleep mode until a directional sensor coupled to the control module detects the at least one moving object in the monitoring area.
265. The network according to claim 265, wherein the directional sensor is a bug eye sensor. 266. The network according to claim 266, wherein the bug eye sensor includes a plurality of nonimaging optical elements.
267. The network according to claim 267, wherein at least one of the plurality of nonimaging optical elements is activated by the at least one moving object, and a location of the at least one moving object is transmitted to the at least one wide field-of-view camera based on a set of coordinates associated with the at least one of the plurality of nonimaging optical elements that is activated.
268. The network according to claim 268, wherein the intensity of the at least one of the plurality of nonimaging optical elements that is activated is different from the intensity of the plurality of nonimaging optical elements that are not activated.
269. The network according to claim 250, wherein the first sensor is a plurality of motion detectors that are distributed among the second sensor that includes at least one narrow field-of-view camera.
270. The network according to claim 270, wherein each of the motion detectors includes a wireless transmitter coupled to the control module.
271. The network according to claim 271, wherein the set of location coordinates transmitted to the at least one narrow field-of-view camera corresponds to the location of a particular motion detector that is activated by the at least one moving object.
272. The network according to claim 251, wherein a frame subtraction unit coupled to the control module determines any movement corresponding to the at least one moving object based on differences between a first set of pixels in a first frame of the monitoring area from the plurality of sensors and a second set of pixels in a consecutive second frame of the monitoring area from the plurality of sensors.
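A minimal sketch of the frame-subtraction step in claim 272: pixels whose gray level changes by more than an assumed threshold between two consecutive frames are flagged as moving, and motion is reported when enough pixels change. Both threshold values are illustrative, not values taken from the patent.

```python
import numpy as np

def motion_mask(prev_frame: np.ndarray, next_frame: np.ndarray, pixel_threshold: int = 25):
    """Boolean mask of pixels that changed significantly between two consecutive frames."""
    diff = np.abs(next_frame.astype(int) - prev_frame.astype(int))
    return diff > pixel_threshold

def movement_detected(prev_frame, next_frame, min_changed_pixels: int = 50) -> bool:
    # Report movement when the number of changed pixels exceeds an assumed minimum count.
    return int(motion_mask(prev_frame, next_frame).sum()) >= min_changed_pixels
```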
273. The network according to claim 273, wherein an object unit coupled to the frame subtraction unit and the control module determines a number of moving objects in the monitoring area and the set of location coordinates for each of the number of moving objects. 274. The network according to claim 274, wherein the object unit defines the at least one moving object as a region of connected pixels having a similar color and a similar light intensity and a set of dissimilar properties from a set of adjacent pixels.
275. The network according to claim 275, wherein the object unit adjusts a connectivity parameter that defines the at least one moving object.
276. The network according to claim 276, wherein the object unit divides the region into a plurality of segments to determine a boundary surrounding the at least one moving object.
277. The network according to claim 277, wherein the object unit calculates a local center of gravity coordinate corresponding to each of the plurality of segments in the region.
278. The network according to claim 278, wherein the object unit calculates a global center of gravity coordinate corresponding to a center point based on the local center of gravity coordinates.
279. The network according to claim 279, wherein the at least one narrow field-of-view camera tracks the at least one moving object based on the global center of gravity coordinate transmitted from the control unit to the at least one narrow field-of-view camera.
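The local and global center-of-gravity computation of claims 277-278 can be sketched as follows, assuming each segment is given as a list of (x, y) pixel coordinates; taking the global center as the unweighted mean of the local centers is one plausible reading, not necessarily the patent's exact rule.

```python
def local_center_of_gravity(segment):
    """Centroid of one segment, given as a list of (x, y) pixel coordinates."""
    xs, ys = zip(*segment)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def global_center_of_gravity(segments):
    """Center point derived from the local centroids of all segments in the region."""
    local_centers = [local_center_of_gravity(seg) for seg in segments]
    xs, ys = zip(*local_centers)
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```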
280. The network according to claim 276, wherein the object unit determines the set of location coordinates for the at least one moving object based on centroid analysis of the at least one moving object.
281. The network according to claim 281, wherein the centroid analysis includes the object unit calculating geometric convex hulls to determine a smallest polygon that maximizes the area of the region.
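For the convex-hull step of the centroid analysis in claim 281, a standard monotone-chain hull over the region's pixel coordinates is one generic way to obtain the enclosing polygon; it stands in for whatever hull computation the patent actually uses.

```python
def convex_hull(points):
    """Andrew's monotone-chain convex hull; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))            # points given as hashable (x, y) tuples
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of the cross product of vectors o->a and o->b
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                        # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):              # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]
```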
282. The network according to claim 276, wherein the object unit includes an object recognition filter to determine whether the at least one moving object is in a set of significant objects.
283. The network according to claim 283, wherein the set of location coordinates corresponding to the at least one moving object are transmitted to the at least one narrow field-of-view camera if the object unit determines the at least one moving object is in the set of significant objects.
284. The network according to claim 284, wherein the object recognition filter includes a set of condition tables that define a feature set associated with the set of significant objects.
285. The network according to claim 285, wherein the feature set includes at least one of an object size parameter, an object direction parameter, an average speed of object parameter, a path of object parameter, and a location of object parameter.
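A minimal sketch of the condition-table filter in claims 284-285: each entry of a hypothetical table gives allowed (min, max) ranges for the listed features, and an object counts as significant if it satisfies every range of at least one entry. All table values here are invented for illustration.

```python
# Hypothetical condition table: each entry lists (min, max) ranges for object features.
CONDITION_TABLE = [
    {"size": (50, 5000), "avg_speed": (0.5, 15.0)},     # e.g. a person or small vehicle
    {"size": (5000, 50000), "avg_speed": (0.0, 30.0)},  # e.g. a large vehicle
]

def is_significant(features: dict) -> bool:
    """True if the object's features fall inside every range of some condition-table entry."""
    for entry in CONDITION_TABLE:
        if all(lo <= features.get(name, float("nan")) <= hi for name, (lo, hi) in entry.items()):
            return True
    return False
```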
286. The network according to claim 286, wherein the set of sensed data corresponding to the at least one wide field-of-view camera is transmitted to the object recognition filter.
287. The network according to claim 287, wherein the set of sensed data corresponding to the at least one narrow field-of-view camera is transmitted to the object recognition filter.
288. The network according to claim 288, wherein the data stream corresponding to the at least one moving object is transmitted from the control module to the user network if the object recognition filter determines that the at least one moving object in the set of sensed data corresponding to the at least one narrow field-of-view camera is in the set of significant objects.
289. A method of tracking at least one moving object in a monitoring area, the method comprising the steps of: testing a plurality of interconnected sensors coupled to a control module in a motion detection network to determine if any of the plurality of sensors is activated based on movement of the at least one moving object; activating a first alarm condition in response to the activation of at least one of the plurality of sensors; processing a set of sensed data, including a set of temporal data, from a first sensor to calculate a set of location coordinates associated with each of the at least one moving objects; transmitting a first set of data from the first sensor to an object recognition module coupled to the control module to determine if each of the at least one moving objects is in a set of significant objects; activating a second alarm condition if any of the at least one moving objects is in the set of significant objects; tracking each of the at least one moving objects with a second sensor based on the set of location coordinates associated with each of the at least one moving objects; transmitting a second set of data from the second sensor to the object recognition module to determine if each of the at least one moving objects is in the set of significant objects; and transmitting the second set of data from the second sensor to a user network through a communication bridge.
290. The method according to claim 290, wherein the second set of data is transmitted to the user network if at least one of the moving objects is in the set of significant objects.
291. The network according to claim 291, wherein the user network is the Internet. 292. The network according to claim 292, wherein the temporal data is a set of video data of the monitoring area.
293. The network according to claim 293, wherein the control module interprets a set of spatial data from the temporal data.
294. The network according to claim 294, wherein the spatial data is a set of still imagery of the monitoring area. 295. The network according to claim 295, wherein the first sensor is at least one wide field-of-view camera.
296. The network according to claim 296, wherein the second sensor is at least one narrow field-of-view camera.
297. The network according to claim 297, wherein the centroid analysis includes the object unit calculating geometric convex hulls to determine a smallest polygon that maximizes the area of the region.
298. A method of tracking at least one moving object in a monitoring area, the method comprising the steps of: testing a plurality of interconnected sensors coupled to a control module in a motion detection network to determine if any of the plurality of sensors is activated based on movement of the at least one moving object; processing a set of sensed data, including a set of temporal data, from a first sensor to calculate a set of location coordinates associated with each of the at least one moving objects; tracking each of the at least one moving objects with a second sensor based on the set of location coordinates associated with each of the at least one moving objects; transmitting a second set of data from the second sensor to an object recognition module coupled to the control module to determine if each of the at least one moving objects is in a set of significant objects; and transmitting the second set of data from the second sensor to a user network through a communication bridge.
299. A method of tracking at least one moving object in a monitoring area, the method comprising the steps of: testing a plurality of interconnected sensors coupled to a control module in a motion detection network to determine if any of the plurality of sensors is activated based on movement of the at least one moving object; processing a set of sensed data, including a set of temporal data, from a first sensor to calculate a set of location coordinates associated with each of the at least one moving objects; tracking each of the at least one moving objects with a second sensor based on the set of location coordinates associated with each of the at least one moving objects; transmitting a second set of data from the second sensor to an object recognition module coupled to the control module to determine if each of the at least one moving objects is in a set of significant objects; intelligently compressing the second set of data by interpreting a set of spatial data from the set of temporal data; and transmitting the compressed second set of data from the second sensor to a user network through a communication bridge.
300. A method of fusing data in a sensor network comprising the steps of: approximating an initial draft of a set of fuzzy rules corresponding to a set of sensed data from a plurality of interconnected sensors coupled to a control module; mapping the initial draft of the set of fuzzy rules to a location and a curve of a set of membership functions; fine-tuning the location of the set of membership functions for optimal performance of the set of fuzzy rules using a neural network; submitting a set of training data to a fuzzy rule base and the neural network; generating a set of initial fuzzy membership functions using the neural network; submitting the set of initial fuzzy membership functions to the fuzzy rule base; generating an actual output from the fuzzy rule base; comparing the actual output with a desired output contained in the set of training data; adjusting a set of neural network weights, thereby adjusting the set of membership functions; and presenting the adjusted set of membership functions to the fuzzy rule base until a difference between the actual output and the desired output is below a predetermined minimum threshold value.
301. The method according to claim 301, wherein additional sets of training data are iteratively submitted to the fuzzy rule base and the neural network until a set of parameters associated with the set of membership functions converges to a final value.
302. The method according to claim 302, wherein based on a definition of the set of membership functions, the fuzzy rule base mimics the set of training data. 303. A network comprising: a sensor network including a set of local area sensors, a set of middle area sensors, and a set of wide area sensors coupled to a control module, wherein the control module receives a set of sensed data from the set of local area, the set of middle area and the set of wide area sensors, and generates a data stream based on the set of sensed data; a gateway software agent coupled to the set of middle area sensors that intelligently filters a contextual meaning from the sensed data and determines whether the sensed data is meaningful; a host software agent coupled to the set of wide area sensors that collects, processes and transmits the sensed data to the control module; a communication bridge coupled to the sensor network, wherein the bridge buffers the data stream received from the sensor network; and a user network coupled to the communication bridge, wherein the user network receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
304. The network according to claim 304, wherein the gateway software agent further comprises: an analog-to-digital converter; a template matching module; a filter bank coupled to the template matching module; a decision making module; and a communication interface.
305. A network comprising: a sensor network including a set of local area sensors, a set of middle area sensors, and a set of wide area sensors coupled to a control module, wherein the control module receives a set of sensed data from the set of local area, the set of middle area and the set of wide area sensors, and generates a data stream based on the set of sensed data; a communication bridge coupled to the sensor network, wherein the bridge buffers the data stream received from the sensor network; and a user network coupled to the communication bridge, wherein the user network receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge; wherein each of the set of local area sensors, the set of middle area sensors, and the set of wide area sensors monitors a limited region of a monitoring area, and a portion of each of the limited regions overlaps with the limited region corresponding to each of the set of local area, the set of middle area, and the set of wide area sensors.
306. A network comprising: a sensor network including a plurality of interconnected sensors coupled to a control module, wherein the control module receives a set of audio data from the plurality of sensors and generates a homogenized data stream based on the sensed data; a communication bridge coupled to the sensor network, wherein the bridge buffers the homogenized data stream received from the sensor network; and a user network coupled to the communication bridge, wherein the user network receives the homogenized data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
307. The method according to claim 307, wherein the user network is the Internet.
PCT/US2001/031799 2000-10-16 2001-10-11 Multimedia sensor network WO2002033558A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002213121A AU2002213121A1 (en) 2000-10-16 2001-10-11 Multimedia sensor network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US69014900A 2000-10-16 2000-10-16
US09/690,149 2000-10-16

Publications (1)

Publication Number Publication Date
WO2002033558A1 true WO2002033558A1 (en) 2002-04-25

Family

ID=24771291

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/031799 WO2002033558A1 (en) 2000-10-16 2001-10-11 Multimedia sensor network

Country Status (3)

Country Link
AU (1) AU2002213121A1 (en)
TW (1) TW569570B (en)
WO (1) WO2002033558A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7561877B2 (en) 2005-03-18 2009-07-14 Qualcomm Incorporated Apparatus and methods for managing malfunctions on a wireless device
US8595504B2 (en) 2008-08-12 2013-11-26 Industrial Technology Research Institute Light weight authentication and secret retrieval
US10516415B2 (en) * 2018-02-09 2019-12-24 Kneron, Inc. Method of compressing convolution parameters, convolution operation chip and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3604556A (en) * 1970-01-14 1971-09-14 Louis E Schwartz Tape cassette holder
US5982418A (en) * 1996-04-22 1999-11-09 Sensormatic Electronics Corporation Distributed video data storage in video surveillance system
US5987519A (en) * 1996-09-20 1999-11-16 Georgia Tech Research Corporation Telemedicine system using voice video and data encapsulation and de-encapsulation for communicating medical information between central monitoring stations and remote patient monitoring stations
US5861804A (en) * 1997-07-10 1999-01-19 Bakson, Inc. Computer controlled security and surveillance system
US6271752B1 (en) * 1998-10-02 2001-08-07 Lucent Technologies, Inc. Intelligent multi-access system
US6310548B1 (en) * 2000-05-30 2001-10-30 Rs Group, Inc. Method and system for door alert

Cited By (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7813221B2 (en) 2002-11-22 2010-10-12 Westerngeco L.L.C. Sensor and recorder communication
US7783930B2 (en) 2003-01-10 2010-08-24 Robert Bosch Gmbh Recording method for video/audio data
DE10301457A1 (en) * 2003-01-10 2004-07-29 Vcs Video Communication Systems Ag Recording method for video / audio data
US8051336B2 (en) 2003-01-10 2011-11-01 Robert Bosch Gmbh Recording method for video/audio data
WO2005036853A1 (en) * 2003-10-13 2005-04-21 Koninklijke Philips Electronics N.V. A network and a network element and method of operation therefor
CN100342410C (en) * 2005-06-06 2007-10-10 重庆大学 Time synchronizing method and apparatus for wireless physiological information sensor network
WO2008039653A3 (en) * 2006-09-27 2008-07-24 Schlumberger Ca Ltd Sensor and recorder communication
EP2631677A1 (en) * 2006-09-27 2013-08-28 Geco Technology B.V. Sensor and recorder communication
WO2008039653A2 (en) * 2006-09-27 2008-04-03 Schlumberger Canada Limited Sensor and recorder communication
EP1981243A1 (en) 2007-04-13 2008-10-15 E-Senza Technologies GmbH A data communication network system for multi-channel bidirectional wireless data communication
EP2009876A3 (en) * 2007-06-29 2009-03-11 Honeywell International Inc. Systems and method for publishing selectively altered sensor data in real time
EP2009876A2 (en) * 2007-06-29 2008-12-31 Honeywell International Inc. Systems and method for publishing selectively altered sensor data in real time
US8391563B2 (en) 2010-05-25 2013-03-05 Sony Corporation Using computer video camera to detect earthquake
CN107613413B (en) * 2012-03-21 2021-05-04 鲍尔卡斯特公司 Wireless sensor system, method and apparatus with switch and socket control
CN107613413A (en) * 2012-03-21 2018-01-19 鲍尔卡斯特公司 Wireless sensor system, method and apparatus with switch and socket control
CN102932217A (en) * 2012-11-20 2013-02-13 无锡成电科大科技发展有限公司 Household IOT (Internet of Things) system
US9846747B2 (en) 2014-01-08 2017-12-19 Tata Consultancy Services Limited System and method of data compression
WO2015187832A1 (en) * 2014-06-04 2015-12-10 Life Technologies Corporation Methods, systems, and computer-readable media for compression of sequencing data
US10254242B2 (en) 2014-06-04 2019-04-09 Life Technologies Corporation Methods, systems, and computer-readable media for compression of sequencing data
US10108462B2 (en) 2016-02-12 2018-10-23 Microsoft Technology Licensing, Llc Virtualizing sensors
US9756570B1 (en) 2016-06-28 2017-09-05 Wipro Limited Method and a system for optimizing battery usage of an electronic device
CN106170022B (en) * 2016-08-31 2024-03-29 上海交通大学 Distributed multimedia sensor control system
CN106170022A (en) * 2016-08-31 2016-11-30 上海交通大学 A kind of distributed multimedia sensory-control system
CN113141380B (en) * 2016-12-14 2024-04-30 微软技术许可有限责任公司 Coding optimization for blurred media
CN113141380A (en) * 2016-12-14 2021-07-20 微软技术许可有限责任公司 Encoding optimization for obfuscated media
CN108789456A (en) * 2017-05-02 2018-11-13 北京米文动力科技有限公司 A kind of remote video transmission method and system
US11694072B2 (en) 2017-05-19 2023-07-04 Nvidia Corporation Machine learning technique for automatic modeling of multiple-valued outputs
US10778547B2 (en) 2018-04-26 2020-09-15 At&T Intellectual Property I, L.P. System for determining a predicted buffer condition based on flow metrics and classifier rules generated in response to the creation of training data sets
CN110780590B (en) * 2018-07-27 2023-06-30 菲尼克斯电气公司 Techniques for providing safety control parameters for multi-channel control of a machine
CN110780590A (en) * 2018-07-27 2020-02-11 菲尼克斯电气公司 Techniques for providing safety control parameters for multi-channel control of a machine
WO2020146036A1 (en) * 2019-01-13 2020-07-16 Strong Force Iot Portfolio 2016, Llc Methods, systems, kits and apparatuses for monitoring and managing industrial settings
CN110536372A (en) * 2019-07-17 2019-12-03 长春工业大学 A kind of annular wireless sensor network Uneven Cluster algorithm based on fuzzy control
CN110536372B (en) * 2019-07-17 2022-05-31 长春工业大学 Non-uniform clustering method for annular wireless sensor network based on fuzzy control
US11774292B2 (en) 2020-03-06 2023-10-03 Butlr Technologies, Inc. Determining an object based on a fixture
US11644363B2 (en) 2020-03-06 2023-05-09 Butlr Technologies, Inc. Thermal data analysis for determining location, trajectory and behavior
GB2612680A (en) * 2020-03-06 2023-05-10 Butlr Tech Inc Monitoring human location, trajectory and behavior using thermal data
WO2021178145A1 (en) * 2020-03-06 2021-09-10 Butlr Technologies, Inc. Monitoring human location, trajectory and behavior using thermal data
US11959805B2 (en) 2020-03-06 2024-04-16 Butlr Technologies, Inc. Thermal data analysis for determining location, trajectory and behavior
CN112783671B (en) * 2021-01-20 2024-01-26 中国兵器工业集团第二一四研究所苏州研发中心 Fusion system suitable for image voice and data transmission
CN112783671A (en) * 2021-01-20 2021-05-11 中国兵器工业集团第二一四研究所苏州研发中心 Fusion system suitable for image voice and data transmission
CN113489952A (en) * 2021-06-30 2021-10-08 电子科技大学 Video monitoring facility layout method oriented to indoor three-dimensional scene
CN113489952B (en) * 2021-06-30 2022-03-22 电子科技大学 Video monitoring facility layout method oriented to indoor three-dimensional scene
CN113485274B (en) * 2021-07-28 2022-07-29 燕山大学 Data perception and dynamic priority transmission joint scheduling method for technological processes
CN113485274A (en) * 2021-07-28 2021-10-08 燕山大学 Data perception and dynamic priority transmission joint scheduling method for technological process
CN115150559A (en) * 2022-09-06 2022-10-04 国网天津市电力公司高压分公司 Remote vision system with self-adjusting acquisition and computational compensation, and computational compensation method
CN115796249A (en) * 2022-11-22 2023-03-14 辉羲智能科技(上海)有限公司 Chiplet interconnection-oriented neural network chip layer switching mapping method

Also Published As

Publication number Publication date
AU2002213121A1 (en) 2002-04-29
TW569570B (en) 2004-01-01

Similar Documents

Publication Publication Date Title
WO2002033558A1 (en) Multimedia sensor network
Radha et al. Scalable internet video using MPEG-4
US5140417A (en) Fast packet transmission system of video data
Kishino et al. Variable bit-rate coding of video signals for ATM networks
US6680976B1 (en) Robust, reliable compression and packetization scheme for transmitting video
USRE39955E1 (en) Multiple encoder output buffer apparatus for differential coding of video information
US5758194A (en) Communication apparatus for handling networks with different transmission protocols by stripping or adding data to the data stream in the application layer
FR2851397A1 (en) Video sequence analysis method for use in a communication network (e.g., TCP/IP), involving analysis of temporal information to decide whether a request should be sent to the camera to obtain special activity information
Jacobs et al. Providing video services over networks without quality of service guarantees
Eleftheriadis et al. Dynamic rate shaping of compressed digital video
Wakeman Packetized video—options for interaction between the user, the network and the codec
Dasen et al. An error tolerant, scalable video stream encoding and compression for mobile computing
Parthasarathy et al. Design of a transport coding scheme for high-quality video over ATM networks
Morrison et al. Two-layer video coding for ATM networks
EP1032881A1 (en) A robust, reliable compression and packetization scheme for transmitting video
Fankhauser et al. The WaveVideo system and network architecture: Design and implementation
Khilenko et al. Improving the Quality of Automated Vehicle Control Systems Using Video Compression Technologies for Networks with Unstable Bandwidth
Sharon et al. Modeling and control of VBR H.261 video transmission over frame relay networks
Jacobs et al. Adaptive video applications for non-QoS networks
Esteve et al. A flexible video streaming system for urban traffic control
Kassler et al. Classification and evaluation of filters for wavelet coded videostreams
Sharon et al. Accurate modeling of H.261 VBR video sources for packet transmission studies
Sharifinejad The quality of service improvement for multimedia over high-speed networks
Chi et al. Video sensor node for low-power ad-hoc wireless networks
Chung et al. Network-friendly video streaming via adaptive LMS bandwidth control

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: The EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP