WO2013057547A1 - Communication methods providing media content stream selection and related system - Google Patents

Communication methods providing media content stream selection and related system

Info

Publication number
WO2013057547A1
Authority
WO
WIPO (PCT)
Prior art keywords
media content
real time
time media
content stream
endpoint device
Prior art date
Application number
PCT/IB2012/000202
Other languages
French (fr)
Inventor
Eric Daniel GRÖNDAL
Bo Burman
Magnus Westerlund
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Publication of WO2013057547A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/1066 Session management
    • H04L65/1069 Session establishment or de-establishment
    • H04L65/1083 In-session procedures
    • H04L65/1089 In-session procedures by adding media; by removing media
    • H04L65/1093 In-session procedures by adding participants; by removing participants
    • H04L65/40 Support for services or applications
    • H04L65/403 Arrangements for multi-party communication, e.g. for conferences
    • H04L65/60 Network streaming of media packets
    • H04L65/61 Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio
    • H04L65/613 Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio for the control of the source by the destination
    • H04L65/75 Media network packet handling
    • H04L65/764 Media network packet handling at the destination

Definitions

  • the present invention relates to communications systems and, more particularly, to systems and methods providing real-time communications.
  • Multimedia conferencing may allow real time streaming between two or more endpoint devices over a network such as the Internet to provide real time communications.
  • Each of a plurality of endpoint devices participating in a video conference session may generate a multimedia content stream including video and audio content, and a central conference node may select a content stream or streams that is/are provided to each of the endpoint devices.
  • a conference node may select a content stream based on a comparison of audio content/volume associated with each of the content streams. Stated in other words, the conference node may attempt to select a content stream corresponding to a speaker that is currently most active in the conference call. In such an
  • selection of a content stream may be undesirably biased toward an endpoint device generating the greatest audio volume, for example, due to undesired noise.
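As a rough illustration of the volume-based selection described above, the following Python sketch picks the stream with the greatest audio level. The function name `select_loudest`, the stream IDs, and the level values are hypothetical and not taken from the disclosure; the sketch only shows how a noisy endpoint can win the comparison over the active speaker.

```python
from typing import Dict


def select_loudest(audio_levels: Dict[str, float]) -> str:
    """Return the stream ID with the greatest measured audio level.

    `audio_levels` maps a hypothetical stream ID to an audio level.
    The units are an assumption and do not matter for the comparison.
    """
    if not audio_levels:
        raise ValueError("no streams to select from")
    return max(audio_levels, key=audio_levels.get)


# Endpoint "b" is the active speaker, but "c" has loud background noise.
levels = {"a": 0.10, "b": 0.60, "c": 0.85}
selected = select_loudest(levels)  # the noisy stream "c" is selected
```

Because the comparison looks only at level, nothing distinguishes speech from noise, which is the bias the disclosure sets out to let the receiving endpoint override.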
  • an endpoint device may operate during a real time communication session including a plurality of real time media content streams provided by at least one other endpoint device.
  • a first one of the real time media content streams of the communication session may be received at the endpoint device from a remote communication node in accordance with a first selection criteria.
  • the first real time media content stream of the conference communication may be rendered for display at the endpoint device, and a selection message may be generated identifying a second selection criteria to be used by the remote communications node to select at least one of the media content streams of the communication session for reception at the endpoint device.
  • the selection message may be transmitted to the remote communications node, and a second one of the real time media content streams of the communication session may be received from the remote communication node in accordance with the second selection criteria.
  • the second real time media content stream of the communication session may then be rendered for display.
  • a user of the endpoint device may thus be able to control the media content stream or streams to be rendered at the endpoint device during a communication session including a plurality of media content streams from another/other endpoint devices involved in the communication session.
  • a participant using an endpoint device may thus elect to include or exclude a particular media content stream from another participant, to substitute one media content stream for another, reset a media content stream to a default selection determined remotely, or to reset all media content streams to a default determined remotely.
  • Identification information may be received from the remote communication node wherein the identification information includes first identification information for the first real time media content stream and second identification information for the second real time media content stream.
  • the selection message may include the first identification information and/or the second identification information for at least one of the first and/or the second real time media content streams.
  • the selection message may include the second identification information identifying the second real time media content stream to request that the second real time media content stream be included in transmissions to the endpoint device.
  • the selection message may include the first identification information for the first real time media content stream to request that the first real time media content stream be excluded from transmissions to the endpoint device.
  • the selection message may include the first identification information for the first real time media content stream and the second identification information for the second real time media content stream to request that the second real time media content stream be substituted for the first real time media content stream in transmissions to the endpoint device.
  • communications may include a network interface configured to provide a data coupling over a network and a processor coupled to the network interface.
  • the processor may be configured to receive a first one of a plurality of real time media content streams of a real time communication session through the network interface from a remote communication node in accordance with a first selection criteria, with the plurality of real time media content streams being provided by at least one other endpoint device, and to render the first real time media content stream of the conference communication for display.
  • the processor may be further configured to generate a selection message identifying a second selection criteria to be used by the remote communications node to select at least one of the media content streams of the communication session for reception at the endpoint device.
  • the processor may be configured to transmit the selection message through the network interface to the remote communications node, and to receive a second one of the real time media content streams of the communication session from the remote communication node in accordance with the second selection criteria, and to render the second real time media content stream of the communication session for display.
  • the processor may be further configured to receive identification information through the network interface from the remote communication node, wherein the identification information includes first identification information for the first real time media content stream and second identification information for the second real time media content stream, and wherein the selection message includes the first identification information and/or the second identification information for at least one of the first and/or the second real time media content streams.
  • the selection message may include the second identification information identifying the second real time media content stream to request that the second real time media content stream be included in transmissions to the endpoint device.
  • the selection message may include the first identification information for the first real time media content stream to request that the first real time media content stream be excluded from transmissions to the endpoint device.
  • the selection message may include the first identification information for the first real time media content stream and the second identification information for the second real time media content stream to request that the second real time media content stream be substituted for the first real time media content stream in transmissions to the endpoint device.
  • a communication node may support a real time communication session between a plurality of remote endpoint devices generating a plurality of real time media content streams.
  • the plurality of real time media content streams for the communication session may be received at the communication node from the plurality of remote endpoint devices.
  • a first one of the real time media content streams of the communication session may be provided to a first one of the endpoint devices in accordance with a first selection criteria for the first endpoint device.
  • a selection message may be received from the first endpoint device to identify a second selection criteria for the first endpoint device, and a second one of the real time media content streams of the communication session may be provided to the first endpoint device in accordance with the second selection criteria.
  • Identification information for each of the plurality of real time media content streams may be generated, and the identification information for each of the plurality of real time media content streams may be provided to each of the endpoint devices.
  • the selection message may include identification information for at least one of the first and/or the second real time media content streams.
  • the selection message may include the second identification information identifying the second real time media content stream to be included in transmissions to the first endpoint device.
  • the selection message may include the first identification information for the first real time media content stream to be excluded from transmissions to the first endpoint device.
  • the selection message may include the first identification information for the first real time media content stream and the second identification information for the second real time media content stream to substitute the second real time media content stream for the first real time media content stream in transmissions to the first endpoint device.
  • a communication node may support a real time communication session between a plurality of remote endpoint devices generating a plurality of real time media content streams.
  • the communication node may include a network interface configured to provide a data coupling over a network, and a processor coupled to the network interface.
  • the processor may be configured to receive the plurality of real time media content streams for the communication session through the network interface from the plurality of remote endpoint devices, and to provide a first one of the real time media content streams of the communication session through the network interface to a first one of the endpoint devices in accordance with a first selection criteria for the first endpoint device.
  • the processor may be further configured to receive a selection message from the first endpoint device through the network interface to identify a second selection criteria for the first endpoint device, and to provide a second one of the real time media content streams of the communication session to the first endpoint device in accordance with the second selection criteria.
  • the processor may be further configured to generate identification information for each of the plurality of real time media content streams, and to provide the identification information for each of the plurality of real time media content streams to each of the endpoint devices.
  • the selection message may include identification information for at least one of the first and/or the second real time media content streams.
  • the selection message may include the second identification information identifying the second real time media content stream to be included in transmissions to the first endpoint device.
  • the selection message may include the first identification information for the first real time media content stream to be excluded from transmissions to the first endpoint device.
  • the selection message may include the first identification information for the first real time media content stream and the second identification information for the second real time media content stream to substitute the second real time media content stream for the first real time media content stream in transmissions to the first endpoint device.
  • Figure 1 is a schematic diagram illustrating a plurality of endpoint devices and a conference node communicating through a network according to some embodiments;
  • Figure 2 is a block diagram illustrating an endpoint device of Figure 1 according to some embodiments;
  • Figure 3 is a block diagram illustrating a conference node of Figure 1 according to some embodiments.
  • Figures 4-5 are flow charts illustrating operations of endpoint devices and/or conference nodes according to some embodiments.
  • Figure 6 illustrates a media selection primitive where FCS represents Floor Control Server.
  • Figure 7 illustrates media selection attributes according to some embodiments.
  • Figure 8 illustrates a format of the OPERATION attribute of Figure 7 according to some embodiments.
  • Figure 9 illustrates defined entries (i.e., MESS Operations) for the OPERATION attribute.
  • Figure 10 illustrates a format of a MEDIA-IDENTIFICATION-HEADER according to some embodiments.
  • Figure 11 illustrates MESS Media Identification Types according to some embodiments.
  • Figure 12 illustrates additional Media Selection Error Codes according to some embodiments.
  • Figure 13 illustrates a structure of an Include message according to some embodiments.
  • an element(s), operation(s), step(s), etc. may be required with respect to a particular embodiment without being required for all embodiments. Accordingly, these terms should not be considered as limiting with respect to claims (in the present application and/or in future applications claiming priority from the present application) omitting the referenced element(s), operation(s), step(s), etc.
  • the present disclosure describes media content stream selection in both conferencing communication embodiments (also referred to as group communications) and peer-to-peer
  • endpoint devices also referred to as endpoints
  • all available media content streams in the session may be identifiable and secure transport of messages may be provided between endpoint devices and a network communications node (e.g., a conference node). Distribution of the identification information to all endpoint devices participating in a conferencing session is also discussed.
  • Necessary messages may potentially be mapped onto several different encodings, and one mapping is proposed that uses an extended version of the Binary Floor Control Protocol (BFCP).
  • BFCP Binary Floor Control Protocol
  • SIP Session Initiation Protocol
  • SDP Session Description Protocol
  • Some embodiments discussed herein may provide functionality that grants receiving endpoint devices capabilities to dynamically select the information and/or media content received from other participating clients (e.g., other endpoint devices participating in the session).
  • media content stream refers to media (e.g., a video and/or audio media content stream) being sent from one specific media capture device (such as a microphone for audio media and/or a video camera for video media) at an endpoint communication device.
  • endpoint refers to a communication device that handles media by originating one or more media content streams (e.g., originating audio and/or video streams using a microphone and/or video camera) and/or terminating one or more media content streams (e.g., generating audio and/or video output) received from one or more other endpoint devices.
  • an RTP (real-time transport protocol) Mixer may be considered as an endpoint.
  • an endpoint device in a two-way video conference may provide a plurality of video media content streams from multiple video cameras capturing different aspects/views of a room, and/or multiple endpoint devices in a group video conference (with more than two participants) may provide respective video media content streams from respective video cameras.
  • a receiving endpoint device may only be able to render one video media content stream (e.g., due to hardware limitations such as small screen size).
  • currently available RTP (real-time transport protocol) mixers may choose one of the video streams to display at the receiving endpoint device, for example, based on comparative levels of audio activity at speakers associated with the respective video cameras.
  • a conference node may act as an endpoint device, for example, where a conference node sees no reason to receive a specific media content stream from an endpoint and acts as an endpoint requesting an exclude from the media content stream provider.
  • MESS Media Stream Selection
  • MESS describes how to generate and distribute media content stream information in both group conferencing embodiments and in point to point communication embodiments. This disclosure also describes how to set up a control channel to send messages between endpoint devices and further defines a set of messages that can be used to handle media content stream requests.
  • Figure 1 is a schematic diagram illustrating a plurality of endpoint devices 111-1 to 111-n participating in a streaming communication session (such as a video conferencing session) through network 101 (e.g., the Internet) and conference node 115 according to some embodiments. While at least five endpoint devices 111 are shown in Figure 1 by way of example, embodiments of the present invention may be implemented using any number of two or more endpoint devices.
  • each endpoint device 111 included in a conference session may act as a sender endpoint device to generate a media content stream (including audio and video), and the respective media content streams from all endpoint devices 111 may be transmitted to conference node 115 through network 101.
  • Each endpoint device 111 involved in the conference session may also act as a receiver endpoint device to receive one or more of the media content streams of the conference session.
  • conference node 115 may then select a media content stream or streams, and the selected media content stream or streams may then be forwarded from the conference node 115 through network 101 to the respective endpoint nodes 111.
  • conference node 115 may select a media content stream to be sent to a respective endpoint device 111 responsive to input from the respective endpoint device 111.
  • each endpoint device 111 of a conference session may select a media content stream or streams of the conference session to be presented at that endpoint device 111.
  • MESS may also be used in peer to peer embodiments with endpoint devices (e.g., two endpoint devices) coupled through network 101 without a conference node.
  • Two endpoint devices in a peer to peer embodiment may each send and receive multiple media content streams, and each of the endpoint devices may use functionality of embodiments discussed herein to control the content media stream or streams that are received from the other endpoint device.
  • Figure 2 is a block diagram illustrating an endpoint device 111 of Figure 1 according to some embodiments.
  • Endpoint device 111 may include processor 131 coupled to display 121 (e.g., a liquid crystal display screen providing a video output) or display output, user input interface 129 (e.g., including a keypad, a touch sensitive surface of display 121, etc.), speaker 123 or speaker output, one or more video cameras 125 or video camera input(s), and one or more microphones 127 or microphone input(s).
  • Inputs/outputs discussed above may be interfaces (e.g., couplings, jacks, etc.) for wired inputs/outputs and/or wireless interfaces (e.g., Bluetooth, WiFi, etc.).
  • network interface 133 may provide a data/communications coupling between processor 131 and network 101.
  • Endpoint device 111 may be a smartphone, a tablet computer, a netbook computer, a laptop computer, a desktop computer, a stationary video conferencing telephone, or any other device supporting multimedia conferencing.
  • a coupling between network interface 133 and network 101 may be provided over a wired coupling (e.g., using a digital subscriber line modem, a cable modem, etc.), over a wireless coupling (e.g., over a 3G/4G wireless network, over a WiFi link, etc.), or over a combination thereof.
  • When implemented as a wireless mobile terminal such as a smartphone, a tablet computer, a netbook computer, or a laptop computer, for example, all elements of Figure 2 (including a video camera 125 and a microphone 127) may be provided within the housing of the mobile terminal.
  • the built-in video camera and/or microphone may provide one media content stream, and video/audio output may be provided using a built-in speaker and display.
  • endpoint device 111 may not include a built-in video camera, microphone, speaker, and/or display.
  • such a device may include inputs for one or more external video cameras and/or microphones and outputs for one or more displays and/or speakers.
  • In a video conferencing system for a larger conference room setting, for example, a plurality of external cameras and associated microphones may be positioned around the conference room and coupled to processor 131 through video/microphone inputs 125/127, and display and speaker outputs may be coupled to an external display/speaker (e.g., a large screen display, a projection display, etc.).
  • An endpoint device 111 may thus provide one or more media content streams responsive to one or more video/microphone pairs. If an endpoint device provides more than one media content stream, each media content stream may be separately identified for selection by other endpoint devices involved in the communication session.
  • each endpoint device 111 may also provide/render one or more media content streams of the communication session through display/speaker 121 and 123 (and/or through an external display/speaker), and each endpoint device 111 may dynamically include, exclude, and/or substitute one or more of the media content streams of the communication session that is/are to be provided/rendered during the communication session.
  • With an endpoint device 111 such as a smartphone with a limited display size, a single media content stream of the communication session may be selected at any time.
  • With a larger display (e.g., with a desktop computer, an external display, etc.), multiple media content streams may be selected at any time for simultaneous presentation on different portions of display 121.
  • Figure 3 is a block diagram illustrating a conference node 115 of Figure 1 according to some embodiments.
  • conference node 115 may include processor 231 and network interface 233, with network interface 233 providing a data/communications coupling between processor 231 and network 101.
  • Processor 231 may thus receive one or more media content streams from each endpoint device 111 involved in a communication session, and processor 231 may provide an identification for each of the media content streams for the communication session.
  • Processor 231 may further publish these identifications to each endpoint device 111 involved in the communication session.
  • Each endpoint processor 131 may thus receive the identifications of the media content streams of the communication session, and an endpoint processor 131 may use these identifications to dynamically select one or more of the media content streams of the communication session.
  • An endpoint processor 131 may transmit a selection instruction including an identification of a selected media content stream back to conference node 115 processor 231 to include/exclude/substitute/reset a selected media content stream(s) for the respective endpoint device.
  • an endpoint device 111 participating in a conference/group communication may receive a media content stream or streams (e.g., a video stream) from a centralized conference node 115. More particularly, all participating endpoint devices 111 publish information identifying the media content stream or streams offered by the respective endpoint devices 111.
  • the conference node 115 may present a media content stream(s) to the receiving endpoint device 111 based on a request/selection from the receiving endpoint device 111.
  • An endpoint device 111 (or user thereof) may select the media content stream to be received from another endpoint device 111 based on the published media content stream information for that endpoint device 111.
  • An endpoint device 111 can make new decisions about what content to receive dynamically at any time during the session.
  • An endpoint device 111 may choose to stop receiving content from another endpoint device 111 involved in the conference/group communication (also referred to as a session), for example, due to low quality or other reasons.
  • the set of excluded media content streams during a session may be subject to change and an endpoint device 111 (or a user thereof) can make new decisions to exclude content dynamically at any time during the session.
  • An endpoint device 111 may render a received media content stream, and the endpoint device (or user thereof) may choose to replace the received media content stream with some other available media content stream. This may be considered as an atomic combination of the Include and Exclude use-cases above, first excluding one media content stream, and effectively replacing it by including another media content stream. An endpoint device 111 can make new substitute decisions dynamically at any time during the session.
  • An endpoint device 111 may no longer have any specific wish to always include or always exclude a certain media content stream, but may instead want to return decisions regarding forwarding media content streams or not to the conference node 115.
  • An endpoint device 111 (or user thereof) can reset any previously included or excluded media content stream at any time during the session.
  • all media content streams may/shall have a state corresponding to being reset and may thus be under the conference node 115 policy control. This initial/reset condition may also be referred to as a default condition. If an endpoint device 111 is configured to simultaneously present a plurality of content media streams, for example, the Reset command may be used to selectively reset control of one of the plurality of content media streams.
  • An endpoint device 111 (or user thereof) may choose to remove all previous decisions about included and excluded media content streams. This method may be used as a shortcut to avoid repeated reset messages described above in the section "Reset Media Content Stream." When such a default condition is provided for an endpoint device 111, conference node 115 may select a media content stream corresponding to a greatest audio magnitude (e.g., assumed to be the currently speaking party).
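The Include, Exclude, Substitute, Reset, and Reset All use-cases above can be sketched as per-endpoint state kept at the conference node. This is a minimal Python sketch under stated assumptions: the class `EndpointSelection`, its method names, and the rule that explicit includes take priority over the default policy are all hypothetical choices for illustration, not the message formats or node behavior defined by the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class EndpointSelection:
    """Per-endpoint selection state kept by a hypothetical conference node."""
    included: Set[str] = field(default_factory=set)
    excluded: Set[str] = field(default_factory=set)

    def include(self, stream_id: str) -> None:
        self.excluded.discard(stream_id)
        self.included.add(stream_id)

    def exclude(self, stream_id: str) -> None:
        self.included.discard(stream_id)
        self.excluded.add(stream_id)

    def substitute(self, old_id: str, new_id: str) -> None:
        # Handled as one step: exclude the old stream, include the new one.
        self.exclude(old_id)
        self.include(new_id)

    def reset(self, stream_id: str) -> None:
        # Return the decision for this one stream to the node's policy.
        self.included.discard(stream_id)
        self.excluded.discard(stream_id)

    def reset_all(self) -> None:
        self.included.clear()
        self.excluded.clear()

    def streams_to_forward(self, audio_levels: Dict[str, float]) -> List[str]:
        """Assumed policy: explicit includes win; otherwise forward the
        loudest stream that has not been excluded (the default condition)."""
        if self.included:
            return sorted(s for s in self.included if s in audio_levels)
        candidates = {s: v for s, v in audio_levels.items()
                      if s not in self.excluded}
        if not candidates:
            return []
        return [max(candidates, key=candidates.get)]


levels = {"cam1": 0.2, "cam2": 0.9, "cam3": 0.5}
sel = EndpointSelection()
by_default = sel.streams_to_forward(levels)     # ["cam2"]
sel.exclude("cam2")
after_exclude = sel.streams_to_forward(levels)  # ["cam3"]
sel.include("cam1")
sel.substitute("cam1", "cam3")
after_substitute = sel.streams_to_forward(levels)  # ["cam3"]
sel.reset_all()
after_reset_all = sel.streams_to_forward(levels)   # ["cam2"]
```

Keeping the include and exclude sets separate makes Reset a simple removal from both sets, so a reset stream falls back to whatever the node's policy would choose.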
  • all different media content streams of a session are given respective unique media IDs (identifications).
  • the given IDs may/must also be distributed/published to all participating endpoint devices 111. The following sections describe how to generate such IDs and how to distribute them.
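A minimal Python sketch of assigning session-unique media IDs and building the list to publish follows. The class `StreamRegistry`, the counter-based IDs, and the dictionary layout of the published entries are assumptions made for illustration; the disclosure does not prescribe this structure.

```python
import itertools
from typing import Dict, List, Tuple


class StreamRegistry:
    """Assigns session-unique media IDs and lists them for publication."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._streams: Dict[int, Tuple[str, str]] = {}

    def register(self, endpoint: str, label: str) -> int:
        """Record one media content stream and return its new media ID."""
        media_id = next(self._counter)
        self._streams[media_id] = (endpoint, label)
        return media_id

    def publish(self) -> List[dict]:
        """Return the ID list that would be sent to every endpoint."""
        return [
            {"media_id": mid, "endpoint": ep, "label": label}
            for mid, (ep, label) in sorted(self._streams.items())
        ]


registry = StreamRegistry()
front = registry.register("111-1", "front camera")
board = registry.register("111-1", "whiteboard camera")
other = registry.register("111-2", "webcam")
published = registry.publish()
```

An endpoint offering two cameras gets two distinct IDs here, so each stream can be selected separately by the other participants.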
  • RTP Media Transport discloses a particular embodiment where RTP [see, RFC 3550, Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003] is used for media transport. Other media transports may be used, in which case the mapping to RTP may not apply and other mappings may instead be used.
  • Endpoint devices 111 wishing to join a session are responsible for sending information about media content streams they will make available to the other party or parties of the conferencing session (i.e., to the other endpoint devices 111 involved in the conferencing session). This may be done by generating media content stream IDs, or other sufficiently unique identifications that can be used to generate media content stream IDs, for all transmitted media content streams. Depending on the capabilities of the signaling protocol used, an endpoint device 111 can also have the opportunity to convey other information in addition to the media content stream ID, such as describing or naming a media content stream(s) explicitly.
  • User and endpoint device 111 information may be relevant in a scenario covering multiple users and/or endpoint devices 111 (e.g., where a middle node 115 is responsible for forwarding requests or making decisions about media content stream selection), but may be redundant for point to point embodiments.
  • Reception of media content stream information may depend on a context in which the receiving endpoint device 111 exists.
  • the distribution of media information may in general be different than distribution of media content stream information in a point to point session, which may/must be taken into account when defining use of MESS with media description protocols.
  • RTP Media Transport When RTP is used for transmission of media content streams, a single RTP session can be used to transfer a number of different media content streams. In such embodiments, every received data packet may/must carry an identifier, or something that can be used as an identifier, to separate individual media content streams. Without such an identifier it may not be possible to demultiplex incoming packets correctly. Use of other protocols for transmission may have similar problems when
  • SSRC may be used as the sole identifier, but to avoid changing a media content streaming ID if the SSRC changes (e.g. due to an SSRC collision), use of an identifier that is not dependent on, but related to, SSRC may be a better choice.
  • the SSRC may uniquely identify each content media stream of a communication session.
  • a sub element of <media> defines an element <src-id> that may/must be used to carry the SSRC (Synchronization Source) selected for the corresponding media content stream.
  • SSRC Synchronization Source
  • This may enable an endpoint device 111 to do reverse look-up of a media content stream ID on incoming packets using SSRC, or CSRC (Contributing Source) in the event that media content streams are aggregated by an RTP mixer.
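The reverse look-up described above can be illustrated with a minimal sketch. It assumes the standard RTP fixed header layout of RFC 3550 (CSRC count in the low four bits of the first octet, SSRC in octets 8 to 11, CSRC list from octet 12). The `StreamIdTable` class, its method names, and the string media IDs are hypothetical and are not taken from this disclosure.

```python
import struct


def rtp_sources(packet: bytes):
    """Return (ssrc, [csrc, ...]) read from an RTP fixed header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    csrc_count = packet[0] & 0x0F              # low 4 bits of the first octet
    (ssrc,) = struct.unpack_from("!I", packet, 8)
    if len(packet) < 12 + 4 * csrc_count:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack_from(f"!{csrc_count}I", packet, 12))
    return ssrc, csrcs


class StreamIdTable:
    """Hypothetical table mapping published SSRCs to media content stream IDs."""

    def __init__(self):
        self._by_ssrc = {}

    def publish(self, media_id, ssrc):
        # Keep the media ID stable when its SSRC changes (e.g. after an SSRC
        # collision): drop the stale SSRC entry and register the new one.
        self._by_ssrc = {s: m for s, m in self._by_ssrc.items() if m != media_id}
        self._by_ssrc[ssrc] = media_id

    def lookup(self, packet):
        ssrc, csrcs = rtp_sources(packet)
        # Assumption: when a mixer aggregates streams, the CSRC list names the
        # contributing streams; otherwise the SSRC names the single stream.
        sources = csrcs if csrcs else [ssrc]
        return [self._by_ssrc[s] for s in sources if s in self._by_ssrc]
```

Because the table is keyed by SSRC but stores a separate media ID, republishing a stream under a new SSRC leaves the media ID seen by the rest of the application unchanged.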
  • RTP media content streaming IDs may/must be included as SSRC attributes as described, for example, in RFC 5576 [Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media Attributes in the Session Description Protocol (SDP)", RFC 5576, June 2009].
  • RFC 4566 may be sufficient to name an individual media content stream. If a media block carries information about multiple SSRCs, this method may not be enough to name all different media content streams. For this purpose, a new source-specific attribute is proposed.
  • <description> as the i-line <session description> in SDP may provide a textual description of the media content stream represented by the SSRC included in the attribute declaration.
  • an intercepting node (e.g., a conference node 115) in the network may be responsible for generating media descriptions upon reception of the actual RTP media content stream.
  • such a solution may suffer if all media is not sent to that node at all times. This may introduce a delay of media description creation until the intercepting node has received RTP packets from all media sources.
  • MGCP Media Gateway Control Protocol
  • RFC 3435 January 2003]
  • 3GPP 3 rd Generation Partnership Project
  • IMS Internet Protocol Multimedia Subsystem
  • an MRFP Multimedia Resource Function Processor
  • MRFC Multimedia Resource Function Controller
  • SDP information e.g. through H.248 or SIP
  • the MRFC receives the SIP INVITE with SDP from participating endpoint devices 111 and therefore also information about what SSRCs the endpoint devices 111 intend to use.
  • the MRFP will see incoming SSRCs in the actual RTP media content streams, but not before any media traffic has occurred.
  • the MRFC may also be responsible for publishing the conference XML data (see, RFC 4575, cited above), e.g. as a body in SIP NOTIFY to SUBSCRIBED endpoint devices 111.
  • the MRFC or any other node 115 acting as a Conference AS (Application Server), may have the best information to generate and distribute media content streaming IDs and may be chosen as the responsible node 115.
  • the protocol to transfer media content streaming ID and SSRC information between Conferencing AS'es and/or MRFC's may be outside the scope of this disclosure.
  • a conference node 115 may/should try to locate information from endpoint devices 111 that name or describe individual media content streams in the SDP, and include the information in the body of the per-media <display-text> tag. According to some embodiments, the information may/should be taken from, in this order if more preferred information is missing:
  • the receiving client may, for example, use the <display-text> content to amend originating user and/or endpoint device 111 information presented to the receiving user with the media content stream specific information.
  • endpoint devices 111 may publish SSRC information using SDP in request and response. This may, for example, be valid for the SDP in both the SIP INVITE and the corresponding 200 OK, or in any provisional responses.
  • the list of published SSRCs may be the list of offered media content streams available for request. Also, the SDP can be searched for the information attribute described above in the section "Publishing Media Information From Endpoint Devices" to extract information about naming of media content stream.
  • media content streaming information may be distributed using an XML body following a schema defined in Conference package (see, RFC 4575, cited above), e.g. carried by a SIP NOTIFY.
  • SIP NOTIFY For use with SIP and once a client has SUBSCRIBEd for conference information, it may/should be prepared to receive SIP NOTIFYs. If the SIP NOTIFY carries this type of XML, the receiving endpoint device 111 can extract information about media content streaming IDs and media content stream descriptions by finding all <media> elements in the received XML. This produces a valid request list of available media IDs and their corresponding SSRC values.
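As an illustration of extracting media IDs, descriptions, and SSRC values from a received conference document, the following sketch walks the `<media>` elements of an RFC 4575 conference-info body. The sample XML, the SSRC value, and the `available_streams` helper are assumptions made for the example; only the element names and the namespace come from RFC 4575.

```python
import xml.etree.ElementTree as ET

# Namespace of the conference event package (RFC 4575).
NS = {"ci": "urn:ietf:params:xml:ns:conference-info"}

# Hypothetical NOTIFY body; entity URIs and the src-id value are made up.
SAMPLE = """<conference-info xmlns="urn:ietf:params:xml:ns:conference-info"
    entity="sip:conf@example.com" state="full" version="1">
  <users>
    <user entity="sip:alice@example.com">
      <endpoint entity="sip:alice@pc.example.com">
        <media id="1">
          <display-text>Alice cam</display-text>
          <type>video</type>
          <src-id>432424</src-id>
        </media>
      </endpoint>
    </user>
  </users>
</conference-info>"""


def available_streams(xml_text):
    """List every <media> element with its owner, media ID, text, and SSRC."""
    root = ET.fromstring(xml_text)
    streams = []
    for user in root.iterfind("ci:users/ci:user", NS):
        for endpoint in user.iterfind("ci:endpoint", NS):
            for media in endpoint.iterfind("ci:media", NS):
                src_id = media.findtext("ci:src-id", default=None, namespaces=NS)
                streams.append({
                    "user": user.get("entity"),
                    "endpoint": endpoint.get("entity"),
                    "media_id": media.get("id"),
                    "description": media.findtext(
                        "ci:display-text", default="", namespaces=NS),
                    # <src-id> carries the SSRC; it may be absent before media flows.
                    "ssrc": int(src_id) if src_id is not None else None,
                })
    return streams
```

The resulting list is the set of streams an endpoint device could name in a later selection message.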
  • a communication channel used for MESS may need to offer reliable transmission and a near real time response.
  • BFCP Binary Floor Control Protocol
  • RFC 4582 [Camarillo, G., Ott, J, and K. Drage, "The Binary Floor Control Protocol (BFCP)", RFC 4582, November 2006].
  • BFCP is a protocol that may already be supported by conference-aware nodes and clients (e.g., conference nodes 115 and endpoint devices 111). Existing implementations may thus be extended to handle any newly defined messages. Moreover, BFCP uses a reliable transport. In the context of media content stream selection, BFCP may be related and may thus be a feasible choice.
  • MESS messages defined in this disclosure may be provided as extensions to existing messages described in BFCP (see, RFC 4582, cited above). Accordingly, these MESS messages may be independent of any other message and may be implemented separately from legacy messages.
  • Legacy floor control functionality of BFCP may require additional protocols to handle floor creation, but this may not be needed by MESS and may thus be outside a scope of this disclosure.
  • Floor creation is described, for example, in SDP for BFCP [see, RFC 4583, Camarillo, G., "Session Description Protocol (SDP) Format for Binary Floor Control Protocol (BFCP) Streams", RFC 4583, November 2006].
  • BFCP (see, RFC 4582, cited above) defines 13 primitives used in BFCP. To implement MESS as an extension to BFCP may require this set of primitives to be extended with another one called “MediaSelection” having a value, for example, of 32.
  • MESS may use the same common header, referred to as COMMON-HEADER, as defined in BFCP (see, RFC 4582, cited above).
  • the attributes may also follow the same pattern as described in that RFC, i.e. they are in the format Type-Length-Value, as shown in the media selection primitive of Figure 6 where FCS represents Floor Control Server Media Selection Primitives.
  • MESS may also define a set of new attributes as shown by the media selection attributes of Figure 7.
  • the OPERATION attribute of Figure 7 may have a format according to some .
  • the Operation id field of the OPERATION attribute contains a 16-bit value that identifies an operation to be performed.
  • defined entries for the OPERATION attribute i.e., MESS Operations
  • MESS Operations may include: “Include”, “Exclude”, “Substitute”, “Reset”, and "Reset All”.
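A Type-Length-Value encoding of the OPERATION attribute might be sketched as follows. The sketch assumes the generic BFCP attribute header of RFC 4582 (7-bit Type, 1-bit M flag, 8-bit Length counting the header but not the padding, padding to a 32-bit boundary). The attribute type code `OPERATION_ATTR_TYPE` and the numeric operation ids are hypothetical placeholders, since the actual values are given only in figures not reproduced here.

```python
import struct

# Hypothetical values: the real codes are defined in the figures of the disclosure.
OPERATION_ATTR_TYPE = 0x20
OPERATION_IDS = {
    "Include": 1,
    "Exclude": 2,
    "Substitute": 3,
    "Reset": 4,
    "Reset All": 5,
}


def encode_attribute(attr_type, payload, mandatory=True):
    """Encode one BFCP-style TLV attribute, padded to a 32-bit boundary."""
    if not 0 <= attr_type < 128:
        raise ValueError("attribute type must fit in 7 bits")
    length = 2 + len(payload)                  # header octets plus contents
    if length > 255:
        raise ValueError("attribute too long for the 8-bit Length field")
    header = bytes([(attr_type << 1) | int(mandatory), length])
    padding = b"\x00" * (-length % 4)          # padding is not counted in Length
    return header + payload + padding


def encode_operation(name):
    """Encode an OPERATION attribute carrying a 16-bit operation id."""
    return encode_attribute(OPERATION_ATTR_TYPE,
                            struct.pack("!H", OPERATION_IDS[name]))
```

With a 2-octet header and a 16-bit operation id, the attribute occupies exactly one 32-bit word, so no padding is needed in this case.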
  • the MEDIA-IDENTIFICATION attribute is a grouped attribute consisting of a header, referred to as MEDIA-IDENTIFICATION-HEADER with identification type information followed by a sequence of other MEDIA-IDENTIFICATION attributes.
  • a format of the MEDIA-IDENTIFICATION- HEADER is illustrated in Figure 10.
  • the ID Type field is an 8-bit field describing the type of media ID. Defined types in this disclosure may include the MESS Media Identification Types illustrated in Figure 11.
  • the Media ID field may contain different information based on the ID Type.
  • the Media ID field in MEDIA-IDENTIFICATION attributes of type "User” may only be allowed to hold MEDIA-IDENTIFICATION of type "Endpoint”
  • Media ID field in MEDIA-IDENTIFICATION attributes of type "Endpoint” may only be allowed to hold MEDIA-IDENTIFICATION attributes of type "Media”.
  • the Media ID field in MEDIA- IDENTIFICATION attributes of type "Media” may hold the actual media ID number.
  • This format may allow expression of tree-like identifications with attributes of type User being root node with attributes of endpoint devices 111 as leaves containing only attributes of type "Media” using structures as discussed, for example, in RFC 5234 [Crocker, D., Ed., "Augmented BNF for Syntax Specifications: ABNF", RFC 5234, January 2008].
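The nesting rules for MEDIA-IDENTIFICATION attributes (type "User" holds only "Endpoint", type "Endpoint" holds only "Media", type "Media" holds the actual media ID number) can be sketched as a small tree type. The class and field names below are hypothetical and say nothing about the wire encoding.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Which child type each grouped identification type may hold.
ALLOWED_CHILD = {"User": "Endpoint", "Endpoint": "Media"}


@dataclass
class MediaIdentification:
    id_type: str                               # "User", "Endpoint" or "Media"
    media_id: Optional[int] = None             # only meaningful for "Media"
    children: List["MediaIdentification"] = field(default_factory=list)

    def validate(self):
        """Raise ValueError if the tree violates the nesting rules."""
        if self.id_type == "Media":
            if self.media_id is None or self.children:
                raise ValueError("Media must carry a media ID and no children")
            return
        expected = ALLOWED_CHILD.get(self.id_type)
        if expected is None:
            raise ValueError(f"unknown identification type {self.id_type!r}")
        for child in self.children:
            if child.id_type != expected:
                raise ValueError(
                    f"{self.id_type} may only hold {expected}, "
                    f"got {child.id_type}")
            child.validate()

    def media_ids(self):
        """Collect the media ID numbers found in this subtree, in order."""
        if self.id_type == "Media":
            return [self.media_id]
        return [m for child in self.children for m in child.media_ids()]
```

A receiver could run `validate()` before acting on a message, so that a malformed tree is rejected rather than partially applied.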
  • MEDIA-IDENTIFICATION (USER-SUB-IDENTIFICATION /
  • MEDIA-SUB-IDENTIFICATION (MEDIA-IDENTIFICATION-HEADER) [0098] Defined Messages:
  • MESS defines 5 messages that may be used to control the media content stream to be received by an endpoint device 111.
  • Floor participants may use the messages in this clause without having obtained a floor, and floor servers may accept the messages from participants not owning the floor.
  • the FLOOR-ID may/shall be ignored by receivers of this message implementing embodiments of this disclosure, and senders implementing embodiments of this disclosure may/shall set it to 0.
  • a floor chair requires a floor participant to own the floor before using messages of this clause, they may/shall both follow regular BFCP floor control procedures as defined in BFCP (see, RFC 4582, cited above). For example, a floor participant not allowed to access the floor may receive a BFCP Error message containing Error Code 5 (Not authorized).
  • Extension attributes that may be defined in the future are referred to as EXTENSION-ATTRIBUTE in the ABNF (Augmented Backus-Naur Form), similarly as was done in section 5.3. of BFCP (see, RFC 4582, cited above).
  • MESS "Include” messages may be sent as BFCP messages with primitive "Media Selection” and the OPERATION attribute set to value "Include”.
  • a list of media identifications then follows representing media content streams that are always to be included from now on. Requests to Include an already included media content stream may/shall be ignored. Note that the message may be defined in a way that makes it additive and identifications for previously included media may/should not be included for every new request.
  • MESS "Exclude” messages may be sent as BFCP messages with primitive "Media
  • Requests to "Exclude” an already excluded media may/shall be ignored. Note that the message is defined in a way that makes it additive and identifications for previously excluded media may/should not be included for every new request.
  • MESS "Substitute" messages are sent as BFCP messages with primitive "Media
  • a pair of MEDIA-IDENTIFICATIONs may then follow where the first MEDIA-IDENTIFICATION indicates which media content stream to replace and the second indicates the media content stream to replace it with. Note that the passed MEDIA-IDENTIFICATIONs typically need to be of type USER-SUB-IDENTIFICATION, since they in general do not refer to media from the same user, but other addressing may be sufficient.
  • MESS "Reset" messages are sent as BFCP messages with primitive "Media Selection” and the OPERATION attribute set to "Reset".
  • the message carries a list of MEDIA-IDENTIFICATION to be reset. It may not matter if the media content stream described by MEDIA-IDENTIFICATION has been previously excluded, previously included, or neither previously excluded nor included. The result at the floor control may always be the same, and the media associated with the received ID will no longer be subject to explicit inclusion/exclusion. Requests to "Reset" an already reset media may/shall be ignored.
  • MESS "Reset All" messages are sent as BFCP messages with primitive "Media Selection” and the OPERATION attribute set to "Reset All".
  • a "Reset All” message has no attributes.
  • the message is equivalent to a MESS Reset message including MEDIA-IDENTIFICATION attributes of all streams that have previously been specified in "Include”, “Exclude” or as second MEDIA-IDENTIFICATION attribute in "Substitute”, effectively releasing all existing media content streams from being subject to inclusion/exclusion. This operation can fully reset the inclusion/exclusion state even if the requesting endpoint device 111 has lost track of what restrictions were previously applied.
  • BFCP (see, RFC 4582, cited above) defines attributes for error handling.
  • the BFCP Error message in BFCP section 5.3.13 (see, RFC 4582, cited above) may/shall be used also for error reporting applicable to this RFC.
  • BFCP (see, RFC 4582, cited above) defines 9 error codes used in floor control. This disclosure defines five additional error codes that may be used in MESS responses as shown in Figure 12. An exact reason for a failure may be included as UTF8 (Unicode Transformation Format-8) encoded text in the field "Error specific details" of the BFCP ERROR-CODE attribute. The ERROR-INFO attribute MAY also be used.
  • RTP is a widely used protocol to transfer media content streams. Usage of MESS when media transport is handled using RTP might impact how RTCP reports may/must be handled when excluding media.
  • RTP Translator [see, RFC 5117, Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, January 2008] exists between endpoint devices 111 and if the RTP Translator is able to adjust its forwarding rules based on the signaling defined in this disclosure, RTCP reporting may become inconsistent for an excluded media content stream. As this potential issue may be outside the scope of the present disclosure, further discussion thereof is omitted.
  • a client e.g., a user of an endpoint device 1 1
  • a conference node 115 in the network then sends the following SIP NOTIFY sample body to subscribed clients (e.g., endpoint devices 111).
  • Any subscribing endpoint device 111 that receives this information can now actively request the "Alice cam" media from sip:alice@example.com to be explicitly included in received media content streams. This may be accomplished by sending an Include message as defined in this disclosure (some fields not encoded for clarity) as shown in Figure 13. The receiver of this message may/must send a response as soon as possible according to some embodiments.
  • a new registry may be started for this disclosure with:
  • RTCP traffic to/from endpoint devices 111 may expose information about endpoint devices 111 excluding other endpoint devices 111.
  • Previously received RTCP traffic replaced with no traffic or some kind of yet-to-be-defined exclusion report to keep RTCP behavior intact
  • FIG. 4 is a flow chart illustrating operations of conference node 115 according to some embodiments.
  • a conference session may be initiated at block 401, either by conference node 115 and/or by an endpoint device or devices 111.
  • processor 231 may be programmed to provide a conference session at an arranged time, and the conference session may be initiated once one or more invited endpoint devices 111 join the conference session.
  • processor 231 may receive media content streams from participating endpoint devices, and at block 403, processor 231 may initially provide media content streams to participating endpoint devices using default selection criteria when they join the conference session.
  • the initial default media content stream selection may be based on audio volumes associated with the respective media content streams.
  • processor 231 may select one of the media content streams that is provided to all endpoint devices, and processor 231 may continue to select this media content stream for each endpoint that remains in the default condition.
  • processor 231 may generate identification information (including media content streaming IDs such as SSRCs) for each of the initial media content streams generated by the initial endpoint devices at block 404.
  • Processor 231 may generate the identification information based on information provided by the respective endpoint devices 1 1 1.
  • a conference session may change at block 405 anytime another media content stream is added to the conference session (e.g., when another endpoint device 111 joins the conference session) or when a media content stream is no longer to be included in the conference session (e.g., when an endpoint device 111 leaves the conference session).
  • processor 231 may generate identification information (including media content streaming IDs) for each current media content stream generated by the current endpoint devices at block 407.
  • processor 231 may maintain current identification information for the current media content streams of the conference session.
  • processor 231 may publish the identification information (including the media content streaming IDs) to each of the endpoint devices 111 currently participating in the conference session. Accordingly, all endpoint devices 111 may be provided with current identification information for all available media content streams for the conference session, and each endpoint device may use this information to select one or more of the media content streams.
  • a streaming selection message may include one of the following messages (each of which is discussed above): "Include” message; “Exclude” message; “Substitute” message; “Reset” message; and "Reset All” message.
  • processor 231 may disregard a previous selection criteria for that endpoint device (e.g., default selection based on volume) and instead provide the media content stream identified in the "Include” message.
  • processor 231 may exclude the identified media content stream from selection for the endpoint device 111 from which the "Exclude” message is received. If media content stream selection for endpoint device 111 is currently based on comparative audio volumes, for example, volume based selection may continue for the endpoint device with the change that the excluded media content stream will not be considered and will be excluded even if its volume is the greatest.
  • processor 231 may substitute one of the two identified media content streams for the other. If the endpoint device 111 provides/renders multiple media content streams (e.g., using a split screen display, multiple displays, etc.), for example, the "Substitute” message may allow substitution of one media content stream for another using one command without affecting any of the other content media streams that are being provided/rendered.
  • processor 231 may reset any previously applied selections that may have been applied to the identified media content stream.
  • Processor 231, for example, may remove any explicit “Include” or “Exclude” selections that may have been applied to the identified media content stream.
  • processor 231 may reset any previous selections that may have been applied for the endpoint device with respect to all of the media content streams. For example, processor 231 may revert to a default selection criteria (e.g., based on volume) for the endpoint device 111 that sent the "Reset All” message.
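The per-endpoint handling of the five selection messages at the conference node can be summarized in a minimal sketch. The `EndpointSelection` class, the use of a volume dictionary as the default criterion, and the choice that "Include" also clears a prior "Exclude" for the same stream (and vice versa) are assumptions for illustration, not details stated in this disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointSelection:
    """Hypothetical selection state kept by a conference node per endpoint."""
    included: set = field(default_factory=set)
    excluded: set = field(default_factory=set)

    def apply(self, operation, media_ids=()):
        ids = list(media_ids)
        if operation == "Include":
            self.included.update(ids)
            self.excluded.difference_update(ids)   # assumption: include wins
        elif operation == "Exclude":
            self.excluded.update(ids)
            self.included.difference_update(ids)   # assumption: exclude wins
        elif operation == "Substitute":
            old, new = ids                         # replace `old` with `new`
            self.included.discard(old)
            self.included.add(new)
            self.excluded.discard(new)
        elif operation == "Reset":
            self.included.difference_update(ids)
            self.excluded.difference_update(ids)
        elif operation == "Reset All":
            self.included.clear()
            self.excluded.clear()
        else:
            raise ValueError(f"unknown operation {operation!r}")

    def select(self, volumes):
        """Pick streams to forward, given {media_id: current audio volume}."""
        available = set(volumes)
        explicit = sorted(self.included & available)
        if explicit:
            return explicit
        # Default criterion: loudest stream that has not been excluded.
        candidates = available - self.excluded
        if not candidates:
            return []
        return [max(candidates, key=lambda m: volumes[m])]
```

For example, an endpoint that excludes the loudest stream keeps volume-based selection among the remaining streams, and a later "Reset All" returns it to the plain default.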
  • Operations of Figure 4 may continue until the conference session is terminated at block 417.
  • the conference session may terminate, for example, if an allowed time for the conference session has expired, if all endpoint devices 111 have left the conference session, if an initiating endpoint device 111 terminates the conference session, etc.
  • FIG. 5 is a flow chart illustrating operations of an endpoint device 111 according to some embodiments.
  • an endpoint device 111 may initiate and/or join a conferencing session and/or peer to peer call session supported by conference node 115.
  • processor 131 may provide information regarding a media content stream or streams that will be provided by endpoint device 111 (e.g., responsive to input from video camera 125 and/or microphone 127) during the conference session. More particularly, processor 131 may provide sufficient information to allow conference node 115 to generate identification information for the respective media content stream or streams to be provided by endpoint device 111.
  • operations of block 502 may be performed when endpoint device 111 joins/initiates a conference session, and any time endpoint device 111 changes (e.g., adds or terminates) a media content stream that is provided during the conference session.
  • processor 131 may receive a content media stream or streams provided by conference node 115.
  • processor 131 may provide/render a media content stream or streams provided from conference node 115. More particularly, processor 131 may provide/render the media stream/streams using display 121 and speaker 123.
  • conference node 115 may provide the media content stream or streams according to a default (e.g., based on audio volume).
  • processor 131 may receive identification information (including media content stream IDs such as SSRCs) for each media content stream that is currently available for the conference session.
  • This identification information corresponds to the identification information published by conference node 115 at block 409. This information may be updated by conference node 115 any time there is a change of media content streams for the conference session.
  • this identification information may be used by endpoint device 111, for example, to modify a selection of a media content stream or streams using selection messages (e.g., "Include”, “Exclude”, “Substitute”, “Reset”, “Reset All”, etc.) as discussed above.
  • a selection message (e.g., "Include”, “Exclude”, “Substitute”, “Reset”, “Reset All”, etc.) is entered (e.g., responsive to user input through user input interface 129) at block 507, processor 131 may transmit the selection message at block 509.
  • a graphical user interface may be provided using a portion of display 121, and the graphical user interface may allow user selection of a content media stream or streams (based on identification information for the media content streams received at block 505) and selection message.
  • processor 131 may continue providing/rendering a media content stream received/provided from conference node at blocks 503 and/or 504 until the conference session is terminated and/or endpoint device 111 leaves the conference session at block 511.
  • an RTP mixer may be provided as a conference node 115 between a plurality of endpoint devices 111 participating in a communication session, and a plurality of transport channels may be provided between the RTP mixer and at least one of the endpoint devices.
  • endpoint device 111-1 may provide enough display capacity (e.g., multiple display screens and/or a sufficiently large display screen) and sufficiently powerful hardware to simultaneously decode/render/present two or more high definition (HD) media content streams in parallel, and multiple transport channels may be provided between the RTP mixer and endpoint device 111-1 to support multiple parallel media content data streams between the RTP mixer and endpoint device 111-1.
  • HD high definition
  • Endpoint device 111-1 may thus set up two media content stream transport channels to/from the RTP mixer (wherein the transport channels may or may not be full duplex channels).
  • the RTP mixer can then send two media content streams to endpoint device 111-1 in parallel.
  • endpoint device 111-1 may separately apply selection messages (e.g., "Include”, “Exclude”, “Substitute”, “Reset”, and/or "Reset All” messages) to the different media content streams being received by the endpoint device 111-1.
  • each selection message generated by endpoint device 111-1 may include an identification of the transport channel to which the selection message should apply, allowing selective control of the different media content streams.
  • Endpoint device 111-1 may thus generate a selection message that is applied to one of the transport channels without affecting the other.
  • the RTP mixer may use the identification of the transport channel in a selection message to separately control a media content stream(s) provided to endpoint device 111-1 over the identified transport channel.
  • Coupled may include wirelessly coupled, connected, or responsive.
  • the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise.
  • Well-known functions or constructions may not be described in detail for brevity and/or clarity.
  • the term “and/or”, abbreviated “/”, includes any and all combinations of one or more of the associated listed items.
  • the terms “comprise”, “comprising”, “comprises”, “include”, “including”, “includes”, “have”, “has”, “having”, or variants thereof are open-ended, and include one or more stated features, integers, nodes, steps, components or functions but do not preclude the presence or addition of one or more other features, integers, nodes, steps, components, functions or groups thereof.
  • the common abbreviation “e.g.”, which derives from the Latin phrase “exempli gratia” may be used to introduce or specify a general example or examples of a previously mentioned item, and is not intended to be limiting of such item.
  • the common abbreviation “i.e.”, which derives from the Latin phrase “id est,” may be used to specify a particular item from a more general recitation.
  • Example embodiments are described herein with reference to block diagrams and/or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits.
  • These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means
  • These computer program instructions may also be stored in a tangible computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the functions/acts specified in the block diagrams and/or flowchart block or blocks.
  • a tangible, non-transitory computer-readable medium may include an electronic, magnetic, optical, electromagnetic, or semiconductor data storage system, apparatus, or device. More specific examples of the computer-readable medium would include the following: a portable computer diskette, a random access memory (RAM) circuit, a read-only memory (ROM) circuit, an erasable programmable read-only memory (EPROM or Flash memory) circuit, a portable compact disc read-only memory (CD-ROM), and a portable digital video disc read-only memory (DVD/BlueRay).
  • RAM random access memory
  • ROM read-only memory
  • EPROM or Flash memory erasable programmable read-only memory
  • CD-ROM compact disc read-only memory
  • DVD/BlueRay portable digital video disc read-only memory
  • the computer program instructions may also be loaded onto a computer and/or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer and/or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
  • embodiments of the present invention may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.) that runs on a processor such as a digital signal processor, which may collectively be referred to as "circuitry,” "a module” or variants thereof.

Abstract

An endpoint device may operate in a real time communication session including a plurality of real time media content streams provided by at least one other endpoint device. The endpoint device may receive (503) a first one of the real time media content streams of the communication session from a remote communication node in accordance with a first selection criteria, and the endpoint device may render (504) the first real time media content stream of the conference communication for display. The endpoint device may then generate (507) a selection message identifying a second selection criteria to be used by the remote communications node to select at least one of the media content streams of the communication session for reception at the endpoint device, and the selection message may be transmitted (509) to the remote communications node. A second one of the real time media content streams of the communication session may be received (503) from the remote communication node in accordance with the second selection criteria, and the second real time media content stream of the communication session may be rendered (504) for display. Related methods and communication nodes are also discussed.

Description

COMMUNICATION METHODS PROVIDING MEDIA CONTENT STREAM SELECTION AND RELATED SYSTEM
TECHNICAL FIELD
[0001] The present invention relates to communications systems and, more particularly, to systems and methods providing real-time communications.
BACKGROUND
[0002] Multimedia conferencing may allow real time streaming between two or more endpoint devices over a network such as the Internet to provide real time communications. Each of a plurality of endpoint devices participating in a video conference session, for example, may generate a multimedia content stream including video and audio content, and a central conference node may select a content stream or streams that is/are provided to each of the endpoint devices. A conference node, for example, may select a content stream based on a comparison of audio content/volume associated with each of the content streams. Stated in other words, the conference node may attempt to select a content stream corresponding to a speaker that is currently most active in the conference call. In such an implementation, however, selection of a content stream may be undesirably biased toward an endpoint device generating the greatest audio volume, for example, due to undesired noise.
SUMMARY
[0003] It may therefore be an object to address at least some of the above mentioned disadvantages and/or to improve performance in a communication system.
[0004] According to some embodiments, an endpoint device may operate during a real time communication session including a plurality of real time media content streams provided by at least one other endpoint device. A first one of the real time media content streams of the communication session may be received at the endpoint device from a remote communication node in accordance with a first selection criteria. The first real time media content stream of the conference communication may be rendered for display at the endpoint device, and a selection message may be generated identifying a second selection criteria to be used by the remote communications node to select at least one of the media content streams of the communication session for reception at the endpoint device. The selection message may be transmitted to the remote communications node, and a second one of the real time media content streams of the communication session may be received from the remote communication node in accordance with the second selection criteria. The second real time media content stream of the communication session may then be rendered for display.
[0005] A user of the endpoint device may thus be able to control the media content stream or streams to be rendered at the endpoint device during a communication session including a plurality of media content streams from another/other endpoint devices involved in the communication session. In a video conferencing session, for example, a participant using an endpoint device may thus elect to include or exclude a particular media content stream from another participant, to substitute one media content stream for another, reset a media content stream to a default selection determined remotely, or to reset all media content streams to a default determined remotely.
[0006] Identification information may be received from the remote communication node wherein the identification information includes first identification information for the first real time media content stream and second identification information for the second real time media content stream. Moreover, the selection message may include the first identification information and/or the second identification information for at least one of the first and/or the second real time media content streams. The selection message may include the second identification information identifying the second real time media content stream to request that the second real time media content stream be included in transmissions to the endpoint device. The selection message may include the first identification information for the first real time media content stream to request that the first real time media content stream be excluded from transmissions to the endpoint device. The selection message may include the first identification information for the first real time media content stream and the second identification information for the second real time media content stream to request that the second real time media content stream be substituted for the first real time media content stream in transmissions to the endpoint device.
[0007] According to some other embodiments, an endpoint device for media content communications may include a network interface configured to provide a data coupling over a network and a processor coupled to the network interface. The processor may be configured to receive a first one of a plurality of real time media content streams of a real time communication session through the network interface from a remote communication node in accordance with a first selection criteria, with the plurality of real time media content streams being provided by at least one other endpoint device, and to render the first real time media content stream of the conference communication for display. The processor may be further configured to generate a selection message identifying a second selection criteria to be used by the remote communications node to select at least one of the media content streams of the communication session for reception at the endpoint device. The processor may be configured to transmit the selection message through the network interface to the remote communications node, and to receive a second one of the real time media content streams of the communication session from the remote communication node in accordance with the second selection criteria, and to render the second real time media content stream of the communication session for display.
[0008] The processor may be further configured to receive identification information through the network interface from the remote communication node, wherein the identification information includes first identification information for the first real time media content stream and second identification information for the second real time media content stream, and wherein the selection message includes the first identification information and/or the second identification information for at least one of the first and/or the second real time media content streams. The selection message may include the second identification information identifying the second real time media content stream to request that the second real time media content stream be included in transmissions to the endpoint device. The selection message may include the first identification information for the first real time media content stream to request that the first real time media content stream be excluded from transmissions to the endpoint device. The selection message may include the first identification information for the first real time media content stream and the second identification information for the second real time media content stream to request that the second real time media content stream be substituted for the first real time media content stream in transmissions to the endpoint device.
[0009] According to still other embodiments, a communication node may support a real time communication session between a plurality of remote endpoint devices generating a plurality of real time media content streams. The plurality of real time media content streams for the communication session may be received at the communication node from the plurality of remote endpoint devices. A first one of the real time media content streams of the communication session may be provided to a first one of the endpoint devices in accordance with a first selection criteria for the first endpoint device. A selection message may be received from the first endpoint device to identify a second selection criteria for the first endpoint device, and a second one of the real time media content streams of the communication session may be provided to the first endpoint device in accordance with the second selection criteria.
[0010] Identification information for each of the plurality of real time media content streams may be generated, and the identification information for each of the plurality of real time media content streams may be provided to each of the endpoint devices. Moreover, the selection message may include identification information for at least one of the first and/or the second real time media content streams. The selection message may include the second identification information identifying the second real time media content stream to be included in transmissions to the first endpoint device. The selection message may include the first identification information for the first real time media content stream to be excluded from transmissions to the first endpoint device. The selection message may include the first identification information for the first real time media content stream and the second identification information for the second real time media content stream to substitute the second real time media content stream for the first real time media content stream in transmissions to the first endpoint device.
[0011] According to yet further embodiments, a communication node may support a real time communication session between a plurality of remote endpoint devices generating a plurality of real time media content streams. The communication node may include a network interface configured to provide a data coupling over a network, and a processor coupled to the network interface. The processor may be configured to receive the plurality of real time media content streams for the communication session through the network interface from the plurality of remote endpoint devices, and to provide a first one of the real time media content streams of the communication session through the network interface to a first one of the endpoint devices in accordance with a first selection criteria for the first endpoint device. The processor may be further configured to receive a selection message from the first endpoint device through the network interface to identify a second selection criteria for the first endpoint device, and to provide a second one of the real time media content streams of the communication session to the first endpoint device in accordance with the second selection criteria.
[0012] The processor may be further configured to generate identification information for each of the plurality of real time media content streams, and to provide the identification information for each of the plurality of real time media content streams to each of the endpoint devices. Moreover, the selection message may include identification information for at least one of the first and/or the second real time media content streams. The selection message may include the second identification information identifying the second real time media content stream to be included in transmissions to the first endpoint device. The selection message may include the first identification information for the first real time media content stream to be excluded from transmissions to the first endpoint device. The selection message may include the first identification information for the first real time media content stream and the second identification information for the second real time media content stream to substitute the second real time media content stream for the first real time media content stream in transmissions to the first endpoint device.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this application, illustrate certain non-limiting embodiment(s) of the invention. In the drawings:
[0014] Figure 1 is a schematic diagram illustrating a plurality of endpoint devices and a conference node communicating through a network according to some embodiments;
[0015] Figure 2 is a block diagram illustrating an endpoint device of Figure 1 according to some embodiments;
[0016] Figure 3 is a block diagram illustrating a conference node of Figure 1 according to some embodiments;
[0017] Figures 4-5 are flow charts illustrating operations of endpoint devices and/or conference nodes according to some embodiments;
[0018] Figure 6 illustrates Media Selection Primitives, where FCS represents Floor Control Server, according to some embodiments;
[0019] Figure 7 illustrates media selection attributes according to some embodiments;
[0020] Figure 8 illustrates a format of the OPERATION attribute of Figure 7 according to some embodiments;
[0021] Figure 9 illustrates defined entries (i.e., MESS Operations) for the OPERATION attribute of Figure 8 according to some embodiments;
[0022] Figure 10 illustrates a format of a MEDIA-IDENTIFICATION-HEADER according to some embodiments;
[0023] Figure 11 illustrates MESS Media Identification Types according to some embodiments;
[0024] Figure 12 illustrates additional Media Selection Error Codes according to some embodiments; and
[0025] Figure 13 illustrates a structure of an Include message according to some embodiments.
DETAILED DESCRIPTION
[0026] In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present invention. Moreover, the terms "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" may be used herein with respect to particular embodiments without limiting the scope of the present invention. Stated in other words by way of example, an element(s), operation(s), step(s), etc., may be required with respect to a particular embodiment without being required for all embodiments. Accordingly, these terms should not be considered as limiting with respect to claims (in the present application and/or in future applications claiming priority from the present application) omitting the referenced element(s), operation(s), step(s), etc. Moreover, to the extent that terms such as "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" are used in the following disclosure, these terms may be interpreted in accordance with RFC (Request For Comments) 2119 [Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997].
[0027] The present disclosure describes media content stream selection in both conferencing communication embodiments (also referred to as group communications) and peer-to-peer communication embodiments. To allow endpoint devices (also referred to as endpoints) to select a specific media content stream or streams, all available media content streams in the session may be identifiable and secure transport of messages may be provided between endpoint devices and a network communications node (e.g., a conference node). Distribution of the identification information to all endpoint devices participating in a conferencing session is also discussed. Necessary messages may potentially be mapped onto several different encodings, and one mapping is proposed that uses an extended version of the Binary Floor Control Protocol (BFCP).
[0028] The importance and usage of multimedia conferencing are increasing. Setup of a multimedia conference may be well defined, for example, using SIP (Session Initiation Protocol) and/or SDP (Session Description Protocol). When SIP/SDP is used for session setup, however, little or no dynamic control may be allowed with respect to the media content to be received from other participants during the session. Some embodiments discussed herein may provide functionality that grants receiving endpoint devices capabilities to dynamically select the information and/or media content received from other participating clients (e.g., other endpoint devices participating in the session).
[0029] As used herein, the term media content stream refers to media (e.g., a video and/or audio media content stream) being sent from one specific media capture device (such as a microphone for audio media and/or a video camera for video media) at an endpoint communication device. The term endpoint refers to a communication device that handles media by originating one or more media content streams (e.g., originating audio and/or video streams using a microphone and/or video camera) and/or terminating one or more media content streams (e.g., generating audio and/or video output) received from one or more other endpoint devices. By way of example, an RTP (real-time transport protocol) Mixer may be considered as an endpoint.
[0030] In embodiments where one or more endpoint devices of a conference session offer more than one media content stream (e.g., multiple video cameras providing multiple video media content streams), but where a receiving endpoint device of the conference session cannot handle/output all simultaneous media content streams, there may be a need for the receiving endpoint device to actively and/or dynamically select the media content stream(s) to be received/output during an ongoing conference. According to some embodiments, for example, an endpoint device in a two-way video conference may provide a plurality of video media content streams from multiple video cameras capturing different aspects/views of a room, and/or multiple endpoint devices in a group video conference (with more than two participants) may provide respective video media content streams from respective video cameras. A receiving endpoint device, however, may only be able to render one video media content stream (e.g., due to hardware limitations such as small screen size). In such situations, currently available RTP (real-time transport protocol) mixers may choose one of the video streams to display at the receiving endpoint device, for example, based on comparative levels of audio activity at speakers associated with the respective video cameras.
[0031] According to embodiments discussed herein, it may be possible to let receiving endpoint devices choose which media content stream(s) to receive, given that each endpoint device and/or the conference node publishes information about the media content stream or streams that is/are available to all other endpoint devices in the communication and given that a protocol to request a specific media content stream(s) from other endpoint devices is provided. In certain conditions, a conference node may act as an endpoint device, for example, where a conference node sees no reason to receive a specific media content stream from an endpoint and acts as an endpoint requesting an exclude from the media content stream provider. This functionality may be provided using Media Stream Selection (MESS) as discussed herein. MESS describes how to generate and distribute media content stream information in both group conferencing embodiments and in point to point communication embodiments. This disclosure also describes how to set up a control channel to send messages between endpoint devices and further defines a set of messages that can be used to handle media content stream requests.
[0032] Figure 1 is a schematic diagram illustrating a plurality of endpoint devices 111-1 to 111-n participating in a streaming communication session (such as a video conferencing session) through network 101 (e.g., the Internet) and conference node 115 according to some embodiments. While at least five endpoint devices 111 are shown in Figure 1 by way of example, embodiments of the present invention may be implemented using any number of two or more endpoint devices. Once a video conference has been established, each endpoint device 111 included in a conference session may act as a sender endpoint device to generate a media content stream (including audio and video), and the respective media content streams from all endpoint devices 111 may be transmitted to conference node 115 through network 101. Each endpoint device 111 involved in the conference session may also act as a receiver endpoint device to receive one or more of the media content streams of the conference session. For each receiving endpoint device 111, conference node 115 may then select a media content stream or streams, and the selected media content stream or streams may then be forwarded from the conference node 115 through network 101 to the respective endpoint devices 111. More particularly, conference node 115 may select a media content stream to be sent to a respective endpoint device 111 responsive to input from the respective endpoint device 111. Stated in other words, each endpoint device 111 of a conference session may select a media content stream or streams of the conference session to be presented at that endpoint device 111. While Figure 1 shows five endpoint devices and a conference node by way of example, MESS may also be used in peer to peer embodiments with endpoint devices (e.g., two endpoint devices) coupled through network 101 without a conference node. Two endpoint devices in a peer to peer embodiment, for example, may each send and receive multiple media content streams, and each of the endpoint devices may use functionality of embodiments discussed herein to control the media content stream or streams that are received from the other endpoint device.
[0033] Figure 2 is a block diagram illustrating an endpoint device 111 of Figure 1 according to some embodiments. Endpoint device 111, for example, may include processor 131 coupled to display 121 (e.g., a liquid crystal display screen providing a video output) or display output, user input interface 129 (e.g., including a keypad, a touch sensitive surface of display 121, etc.), speaker 123 or speaker output, one or more video cameras 125 or video camera input(s), and one or more microphones 127 or microphone input(s). Inputs/outputs discussed above may be interfaces (e.g., couplings, jacks, etc.) for wired inputs/outputs and/or wireless interfaces (e.g., Bluetooth, WiFi, etc.). In addition, network interface 133 may provide a data/communications coupling between processor 131 and network 101. Endpoint device 111, for example, may be a smartphone, a tablet computer, a netbook computer, a laptop computer, a desktop computer, a stationary video conferencing telephone, or any other device supporting multimedia conferencing. Accordingly, a coupling between network interface 133 and network 101 may be provided over a wired coupling (e.g., using a digital subscriber line modem, a cable modem, etc.), over a wireless coupling (e.g., over a 3G/4G wireless network, over a WiFi link, etc.), or over a combination thereof.
[0034] When implemented as a wireless mobile terminal such as a smartphone, a tablet computer, a netbook computer, or a laptop computer, for example, all elements of Figure 2 (including a video camera 125 and a microphone 127) may be provided within the housing of the mobile terminal. In such a mobile terminal, the built-in video camera and/or microphone may provide one media content stream, and video/audio output may be provided using a built-in speaker and display. In other embodiments, endpoint device 111 may not include a built-in video camera, microphone, speaker, and/or display. Instead, such a device may include inputs for one or more external video cameras and/or microphones and outputs for one or more displays and/or speakers. With a video conferencing system for a larger conference room setting, for example, a plurality of external cameras and associated microphones may be positioned around the conference room and coupled to processor 131 through video/microphone inputs 125/127, and display and speaker outputs may be coupled to an external display/speaker (e.g., a large screen display, a projection display, etc.). An endpoint device 111 may thus provide one or more media content streams responsive to one or more video/microphone pairs. If an endpoint device provides more than one media content stream, each media content stream may be separately identified for selection by other endpoint devices involved in the communication session.
[0035] As discussed in greater detail below, each endpoint device 111 may also provide/render one or more media content streams of the communication session through display/speaker 121 and 123 (and/or through an external display/speaker), and each endpoint device 111 may dynamically include, exclude, and/or substitute one or more of the media content streams of the communication session that is/are to be provided/rendered during the communication session. In an endpoint device 111, such as a smartphone, with a limited display size, a single media content stream of the communication session may be selected at any time. When a larger display is provided (e.g., with a desktop computer, an external display, etc.), multiple media content streams may be selected at any time for simultaneous presentation on different portions of display 121.
[0036] Figure 3 is a block diagram illustrating a conference node 115 of Figure 1 according to some embodiments. As shown in Figure 3, conference node 115 may include processor 231 and network interface 233, with network interface 233 providing a data/communications coupling between processor 231 and network 101. Processor 231 may thus receive one or more media content streams from each endpoint device 111 involved in a communication session, and processor 231 may provide an identification for each of the media content streams for the communication session. Processor 231 may further publish these identifications to each endpoint device 111 involved in the communication session. Each endpoint processor 131 may thus receive the identifications of the media content streams of the communication session, and an endpoint processor 131 may use these identifications to dynamically select one or more of the media content streams of the communication session. An endpoint processor 131, for example, may transmit a selection instruction including an identification of a selected media content stream back to conference node 115 processor 231 to include/exclude/substitute/reset a selected media content stream(s) for the respective endpoint device.
[0037] Use Cases for MESS:
[0038] The following sections (including "Include Media Content Stream", "Exclude Media Content Stream", "Substitute Media Content Stream", "Reset Media Content Stream", and "Reset All") present some embodiments of use cases targeted by MESS. In some embodiments, an endpoint device 111 participating in a conference/group communication may receive a media content stream or streams (e.g., a video stream) from a centralized conference node 115. More particularly, all participating endpoint devices 111 publish information identifying the media content stream or streams offered by the respective endpoint devices 111. There may be more available media content streams from other participants in the conference/group communication than what the receiving endpoint device 111 can present (e.g., display/render) simultaneously, and the conference node 115 may present a media content stream(s) to the receiving endpoint device 111 based on a request/selection from the receiving endpoint device 111.
[0039] Include Media Content Stream:
[0040] An endpoint device 111 (or user thereof) may select the media content stream to be received from another endpoint device 111 based on the published media content stream information for that endpoint device 111. An endpoint device 111 can make new decisions about what content to receive dynamically at any time during the session.
[0041] Exclude Media Content Stream:
[0042] An endpoint device 111 (or user thereof) may choose to stop receiving content from another endpoint device 111 involved in the conference/group communication (also referred to as a session), for example, due to low quality or other reasons. The set of excluded media content streams during a session may be subject to change and an endpoint device 111 (or a user thereof) can make new decisions to exclude content dynamically at any time during the session.
[0043] Substitute Media Content Stream:
[0044] An endpoint device 111 may render a received media content stream, and the endpoint device (or user thereof) may choose to replace the received media content stream with some other available media content stream. This may be considered as an atomic combination of the Include and Exclude use-cases above, first excluding one media content stream, and effectively replacing it by including another media content stream. An endpoint device 111 can make new substitute decisions dynamically at any time during the session.
[0045] Reset Media Content Stream:
[0046] An endpoint device 111 (or user thereof) may no longer have any specific wish to always include or always exclude a certain media content stream, but may instead want to return to the conference node 115 the decisions on whether or not to forward that media content stream. An endpoint device 111 (or user thereof) can reset any previously included or excluded media content stream at any time during the session. According to some embodiments, at the beginning of a session, all media content streams may/shall have a state corresponding to being reset and may thus be under the conference node 115 policy control. This initial/reset condition may also be referred to as a default condition. If an endpoint device 111 is configured to simultaneously present a plurality of media content streams, for example, the Reset command may be used to selectively reset control of one of the plurality of media content streams.
[0047] Reset All:
[0048] An endpoint device 111 (or user thereof) may choose to remove all previous decisions about included and excluded media content streams. This method may be used as a shortcut to avoid repeated reset messages described above in the section "Reset Media Content Stream." When such a default condition is provided for an endpoint device 111, conference node 115 may select a media content stream corresponding to a greatest audio magnitude (e.g., assumed to be the currently speaking party).
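By way of non-limiting illustration, the Include, Exclude, Substitute, Reset, and Reset All use cases above may be sketched as per-receiver state kept at a conference node. The class name ReceiverSelection, the function streams_to_forward, the max_streams parameter, and the use of a single audio magnitude per stream are assumptions made for this sketch only and are not defined by this disclosure; the MESS message encodings are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ReceiverSelection:
    """Per-receiver selection state kept by a hypothetical conference node."""
    included: set = field(default_factory=set)   # streams the receiver always wants
    excluded: set = field(default_factory=set)   # streams the receiver never wants

    def include(self, media_id):
        self.excluded.discard(media_id)
        self.included.add(media_id)

    def exclude(self, media_id):
        self.included.discard(media_id)
        self.excluded.add(media_id)

    def substitute(self, old_id, new_id):
        # Handled as one request: exclude the old stream, then include the new one.
        self.exclude(old_id)
        self.include(new_id)

    def reset(self, media_id):
        # Return the decision for this stream to the conference node policy.
        self.included.discard(media_id)
        self.excluded.discard(media_id)

    def reset_all(self):
        self.included.clear()
        self.excluded.clear()


def streams_to_forward(selection, audio_levels, max_streams=1):
    """Pick the streams to forward to one receiver.

    audio_levels maps media ID -> current audio magnitude (assumed metric).
    Explicitly included streams come first; any remaining slots fall back to
    the default policy (loudest stream first), skipping excluded streams.
    """
    chosen = sorted(m for m in selection.included if m in audio_levels)[:max_streams]
    by_volume = sorted(audio_levels, key=lambda m: (-audio_levels[m], m))
    for media_id in by_volume:
        if len(chosen) >= max_streams:
            break
        if media_id not in selection.excluded and media_id not in chosen:
            chosen.append(media_id)
    return chosen
```

In this sketch, a receiver that has sent no selection message (or that has sent Reset All) receives whichever stream the default policy picks, while any Include or Exclude decision takes precedence over that policy until it is reset.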
[0049] Media Information:
[0050] To be able to identify the available media content streams according to some embodiments, all different media content streams of a session are given respective unique media IDs (identifications). The given IDs may/must also be distributed/published to all participating endpoint devices 111. The following sections describe how to generate such IDs and how to distribute them.
[0051] As discussed below in the section "SDP Media Description", according to some embodiments, a media description may be signaled using SDP [see, RFC 4566, Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006], but other signaling methods may be used according to other embodiments, in which case mappings to SDP-specific lines and attributes may not apply and other mappings may instead be used.
[0052] The section "RTP Media Transport" (provided below) discloses a particular embodiment where RTP [see, RFC 3550, Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003] is used for media transport. Other media transports may be used, in which case the mapping to RTP may not apply and other mappings may instead be used.
[0053] Unique Media ID:
[0054] To request a specific media content stream, all endpoint devices 111 involved in a session may need to agree on how to uniquely identify different media content streams of the session with unique media content stream IDs.
[0055] There may be no particular algorithm to generate unique media content stream IDs, as such an algorithm may depend on which media transport is used. In such an algorithm, all of the media content stream IDs of a communication session may be unique among all communicating endpoint devices 111 of a session, and all endpoint devices 111 may share the same definitions of what media content streams are identified by what media content stream IDs.
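By way of non-limiting illustration, one way to satisfy both properties (session-wide uniqueness and a shared definition of which ID names which stream) is a single registry that hands out IDs and publishes the resulting mapping. The class MediaIdRegistry, its counter-based allocation, and the (endpoint, capture device) key are assumptions made for this sketch; as noted above, an actual algorithm may depend on the media transport used.

```python
import itertools


class MediaIdRegistry:
    """Hypothetical session-wide registry assigning one ID per media stream."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._by_source = {}   # (endpoint, capture device) -> media ID
        self._by_id = {}       # media ID -> (endpoint, capture device)

    def register(self, endpoint, device):
        key = (endpoint, device)
        if key not in self._by_source:      # registering the same source twice is idempotent
            media_id = next(self._next_id)
            self._by_source[key] = media_id
            self._by_id[media_id] = key
        return self._by_source[key]

    def published_view(self):
        """Copy of the mapping that every endpoint would receive."""
        return dict(self._by_id)
```

Because every endpoint device would receive the same published mapping, a media content stream ID carried in a selection message refers to the same stream at the sender and at the receiver.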
[0056] Distribution of Media Information:
[0057] Assuming all available media content streams from all communicating endpoint devices 111 of a session are associated with respective unique media content stream IDs, those media content stream IDs may need to be distributed to endpoint devices 111 of the session wishing to actively control what content to receive. There might also be other interesting per-media related information to be distributed, such as naming or describing individual media content streams to aid selection.
[0058] Publishing Media Information from Endpoint Devices:
[0059] Endpoint devices 111 wishing to join a session are responsible for sending information about media content streams they will make available to the other party or parties of the conferencing session (i.e., to the other endpoint devices 111 involved in the conferencing session). This may be done by generating media content stream IDs, or other sufficiently unique identifications that can be used to generate media content stream IDs for all transmitted media content streams. Depending on the capabilities of the signaling protocol used, an endpoint device 111 can also have the opportunity to convey other information in addition to the media content stream ID, e.g., describing or naming a media content stream(s) explicitly.
[0060] Publishing Media Information from Conference Nodes:
[0061] The SIP (Session Initiation Protocol) Event Package for Conference State of RFC 4575 [Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session Initiation Protocol (SIP) Event Package for Conference State", RFC 4575, August 2006] defines an XML (Extensible Markup Language) schema that may be used to distribute conferencing information. This schema defines elements (among others) for users, endpoint devices 111, and media. The defined <media> element may include a media content stream ID attribute. This attribute may be used to carry generated media content stream IDs. This means that a media content stream ID may only need to be unique within an endpoint device 111 and session context, and referring clients may/must use both user endpoint information and media content stream ID to uniquely identify a media content stream. User and endpoint device 111 information may be relevant in a scenario covering multiple users and/or endpoint devices 111 (e.g. where a middle node 115 is responsible for forwarding requests or making decisions about media content stream selection), but may be redundant for point to point embodiments.
[0062] Any description or naming of individual media content streams published by endpoint devices (as described in the previous section) may/should be included in the XML as body of <display-text>, which may be another sub element of <media>. There may exist alternatives to obtain naming and description information, but it may in general depend on what is supported by the used media description protocol.
[0063] Receiving Media Information:
[0064] Reception of media content stream information may depend on a context in which the receiving endpoint device 111 exists. In conferencing session embodiments, the distribution of media information may in general be different than distribution of media content stream information in a point to point session, which may/must be taken into account when defining use of MESS with media description protocols.
[0065] RTP Media Transport:
[0066] When RTP is used for transmission of media content streams, a single RTP session can be used to transfer a number of different media content streams. In such embodiments, every received data packet may/must carry an identifier, or something that can be used as an identifier, to separate individual media content streams. Without such an identifier it may not be possible to demultiplex incoming packets correctly. Use of other protocols for transmission may have similar problems when multiplexing.
[0067] In the case of RTP, SSRC may be used as the sole identifier, but to avoid changing a media content streaming ID if the SSRC changes (e.g. due to an SSRC collision), use of an identifier that is not dependent on, but related to, SSRC may be a better choice. The SSRC may uniquely identify each content media stream of a communication session.
[0068] According to RFC 4575 (cited above), a sub element of <media> defines an element <src-id> that may/must be used to carry the SSRC (Synchronization Source) selected for the corresponding media content stream. This may enable an endpoint device 111 to do reverse look-up of a media content stream ID on incoming packets using SSRC, or CSRC (Contributing Source) in the event that media content streams are aggregated by an RTP mixer.
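The reverse look-up described above can be illustrated with a minimal Python sketch. The class name `MediaDirectory` and its methods are hypothetical and not part of this disclosure; the sketch only assumes that each published media content stream ID is paired with one SSRC value (as carried in <src-id>) and that a mixed packet lists its contributing sources as CSRCs.

```python
from typing import Dict, Iterable, List


class MediaDirectory:
    """Hypothetical helper: maps SSRC values to media content stream IDs."""

    def __init__(self) -> None:
        self._by_ssrc: Dict[int, str] = {}

    def publish(self, media_id: str, ssrc: int) -> None:
        # If the stream changed SSRC (e.g., after an SSRC collision),
        # drop the stale mapping so the media ID itself stays stable.
        stale = [s for s, m in self._by_ssrc.items() if m == media_id]
        for s in stale:
            del self._by_ssrc[s]
        self._by_ssrc[ssrc] = media_id

    def lookup(self, ssrc: int, csrcs: Iterable[int] = ()) -> List[str]:
        # For packets aggregated by an RTP mixer, the original sources are
        # listed as CSRCs; otherwise the SSRC identifies the stream.
        sources = list(csrcs) or [ssrc]
        return [self._by_ssrc[s] for s in sources if s in self._by_ssrc]
```

Keeping the media content stream ID as the stable key and the SSRC as a replaceable attribute reflects the preference stated above for an identifier that is related to, but not dependent on, the SSRC.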
[0069] SDP Media Description:
[0070] This section may apply for embodiments where SDP media description is used with RTP Media Transport. MESS may be used with other media transports in SDP according to other embodiments. The generated RTP media content streaming IDs may/must be included as SSRC attributes as described, for example, in RFC 5576 [Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media Attributes in the Session Description Protocol (SDP)", RFC 5576, June 2009].
[0071] Assuming a single media in an SDP media block, using an i-line (as described in SDP RFC 4566, cited above) may be sufficient to name an individual media content stream. If a media block carries information about multiple SSRCs, this method may not be enough to name all different media content streams. For this purpose, a new source-specific attribute is proposed:
a=ssrc:<ssrc> information:<description>
[0072] A new, optional, source-specific attribute, with identical syntax and semantics of <description> as the i-line <session description> in SDP (except that it is specified per SSRC) may provide a textual description of the media content stream represented by the SSRC included in the attribute declaration.
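As an illustration of the proposed attribute, the following Python sketch extracts per-SSRC descriptions from SDP text. The function name `ssrc_descriptions` is hypothetical, and stripping optional surrounding double quotes from the description is an assumption made here for convenience, not something the attribute syntax above requires.

```python
import re
from typing import Dict

# Matches lines of the form: a=ssrc:<ssrc> information:<description>
_SSRC_INFORMATION = re.compile(r"^a=ssrc:(\d+) information:(.*)$")


def ssrc_descriptions(sdp: str) -> Dict[int, str]:
    """Return a mapping from SSRC to its textual description."""
    found: Dict[int, str] = {}
    for raw_line in sdp.splitlines():
        match = _SSRC_INFORMATION.match(raw_line.strip())
        if match:
            # Assumption: tolerate optional quotes around the description.
            found[int(match.group(1))] = match.group(2).strip().strip('"')
    return found
```

Other source-specific attributes on the same SSRC (for example, cname) are ignored by this sketch because only the "information" attribute carries the naming text.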
[0073] In the case of RTP, an intercepting node (e.g., a conference node 115) in the network may be responsible for generating media descriptions upon reception of the actual RTP media content stream. However, such a solution may suffer if all media is not sent to that node at all times. This may introduce a delay of media description creation until the intercepting node has received RTP packets from all media sources.
[0074] In cases where a Media Gateway and its controller are separate entities [see, e.g., RFC 3435, Andreasen, F. and B. Foster, "Media Gateway Control Protocol (MGCP) Version 1.0", RFC 3435, January 2003], such as in a 3GPP (3rd Generation Partnership Project) IMS (Internet Protocol Multimedia Subsystem) split architecture where an MRFP (Multimedia Resource Function Processor) and an MRFC (Multimedia Resource Function Controller) exchange SDP information (e.g. through H.248 or SIP), the MRFC receives the SIP INVITE with SDP from participating endpoint devices 111 and therefore also information about what SSRCs the endpoint devices 111 intend to use. The MRFP will see incoming SSRCs in the actual RTP media content streams, but not before any media traffic has occurred. The MRFC may also be responsible for publishing the conference XML data (see, RFC 4575, cited above), e.g. as a body in SIP NOTIFY to SUBSCRIBED endpoint devices 111. In short, the MRFC, or any other node 115 acting as a Conference AS (Application Server), may have the best information to generate and distribute media content streaming IDs and may be chosen as the responsible node 115.
[0075] There may be no significant difference in a call-out conferencing embodiment where a conference node 115 calls out to invited participants. The initial SDP will hold information about the capabilities of the network node 115, and responding endpoints provide answer SDPs with media description (including SSRC) of the intended/offered media content streams.
[0076] In a distributed conference with several involved Conferencing ASes (and also if 3GPP IMS split architecture is not used), the protocol to transfer media content streaming ID and SSRC information between Conferencing ASes and/or MRFCs may be outside the scope of this disclosure.
[0077] A conference node 115 may/should try to locate information from endpoint devices 111 that name or describe individual media content streams in the SDP, and include the information in the body of the per-media <display-text> tag. According to some embodiments, the information may/should be taken from, in this order if more preferred information is missing:
1. The value from an "information" SSRC attribute described above;
2. The value from an i-line within the media block;
3. The value field of a label attribute [see, RFC 4574, Levin, O. and G. Camarillo, "The Session Description Protocol (SDP) Label Attribute", RFC 4574, August 2006] within the media block; and
4. The value from an i-line at the SDP session level.
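The preference order listed above amounts to a first-non-empty fallback. A minimal Python sketch follows; the function and parameter names are hypothetical, and the inputs are assumed to have already been extracted from the SDP.

```python
from typing import Optional


def pick_display_text(
    ssrc_information: Optional[str] = None,  # 1. "information" SSRC attribute
    media_i_line: Optional[str] = None,      # 2. i-line within the media block
    label: Optional[str] = None,             # 3. label attribute in the media block
    session_i_line: Optional[str] = None,    # 4. i-line at the SDP session level
) -> str:
    """Return the most preferred available description, or an empty string."""
    for candidate in (ssrc_information, media_i_line, label, session_i_line):
        if candidate:
            return candidate
    # No source available: <display-text> may be left empty.
    return ""
```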
[0078] Other sources of information may be used, may be more preferred, and/or the <display-text> may also be empty. The receiving client may, for example, use the <display-text> content to amend originating user and/or endpoint device 111 information presented to the receiving user with the media content stream specific information.
[0079] Point to Point Communications:
[0080] In point to point communication, endpoint devices 111 may publish SSRC information using SDP in request and response. This may, for example, be valid for the SDP in both the SIP INVITE and the corresponding 200 OK, or in any provisional responses.
[0081] The list of published SSRCs may be the list of offered media content streams available for request. Also, the SDP can be searched for the information attribute described above in the section "Publishing Media Information From Endpoint Devices" to extract information about naming of media content stream.
[0082] Conferencing Embodiments:
[0083] In conferencing embodiments, media content streaming information may be distributed using an XML body following a schema defined in the Conference package (see, RFC 4575, cited above), e.g. carried by a SIP NOTIFY. For use with SIP and once a client has SUBSCRIBEd for conference information, it may/should be prepared to receive SIP NOTIFYs. If the SIP NOTIFY carries this type of XML, the receiving endpoint device 111 can extract information about media content streaming IDs and media content stream descriptions by finding all <media> elements in the received XML. This produces a valid request list of available media IDs and their corresponding SSRC values.
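A receiving endpoint's extraction of <media> elements might look like the following Python sketch, which uses the standard library XML parser. The function name `list_media` and the shape of the returned dictionaries are assumptions for illustration; the namespace URI is the conference-info namespace of RFC 4575.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {"ci": "urn:ietf:params:xml:ns:conference-info"}


def list_media(xml_body: str) -> List[Dict[str, object]]:
    """Collect every <media> element with its user and endpoint context."""
    root = ET.fromstring(xml_body)
    found: List[Dict[str, object]] = []
    for user in root.findall("ci:users/ci:user", NS):
        for endpoint in user.findall("ci:endpoint", NS):
            for media in endpoint.findall("ci:media", NS):
                src_id = media.findtext("ci:src-id", default=None, namespaces=NS)
                found.append({
                    "user": user.get("entity"),
                    "endpoint": endpoint.get("entity"),
                    "media_id": media.get("id"),
                    "display_text": media.findtext(
                        "ci:display-text", default="", namespaces=NS),
                    # <src-id> carries the SSRC when RTP is used.
                    "ssrc": int(src_id) if src_id is not None else None,
                })
    return found
```

Returning the user and endpoint entities alongside each media ID reflects that a media ID may only be unique within an endpoint and session context.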
[0084] MESS Requests:
[0085] To request media content streams, a communication channel between the endpoint device 111 and the node 115 in control of the media content streams may need to be set up. This disclosure describes use of SIP/SDP for this purpose according to some embodiments, but other methods may be used. A communication channel used for MESS may need to offer reliable transmission and a near real time response.
[0086] Transport:
[0087] Binary Floor Control Protocol (BFCP) is described in RFC 4582 [Camarillo, G., Ott, J., and K. Drage, "The Binary Floor Control Protocol (BFCP)", RFC 4582, November 2006]. BFCP is a protocol that may already be supported by conference-aware nodes and clients (e.g., conference nodes 115 and endpoint devices 111). Existing implementations may thus be extended to handle any newly defined messages. Moreover, BFCP uses a reliable transport. In the context of media content stream selection, BFCP may be related and may thus be a feasible choice.
[0088] MESS messages defined in this disclosure may be provided as extensions to existing messages described in BFCP (see, RFC 4582, cited above). Accordingly, these MESS messages may be independent of any other message and may be implemented separately from legacy messages.
[0089] Legacy floor control functionality of BFCP may require additional protocols to handle floor creation, but this may not be needed by MESS and may thus be outside a scope of this disclosure. Floor creation is described, for example, in SDP for BFCP [see, RFC 4583, Camarillo, G., "Session Description Protocol (SDP) Format for Binary Floor Control Protocol (BFCP) Streams", RFC 4583, November 2006].
[0090] BFCP Extensions:
[0091] BFCP (see, RFC 4582, cited above) defines 13 primitives. Implementing MESS as an extension to BFCP may require this set of primitives to be extended with another one called "MediaSelection" having a value, for example, of 32. MESS may use the same common header, referred to as COMMON-HEADER, as defined in BFCP (see, RFC 4582, cited above). The attributes may also follow the same pattern as described in that RFC, i.e. they are in the format Type-Length-Value, as shown in the media selection primitive of Figure 6 where FCS represents Floor Control Server Media Selection Primitives. In addition to the new primitive of Figure 6, MESS may also define a set of new attributes as shown by the media selection attributes of Figure 7.
[0092] OPERATION:
[0093] The OPERATION attribute of Figure 7 may have a format according to some embodiments as illustrated in Figure 8. The Operation id field of the OPERATION attribute contains a 16-bit value that identifies an operation to be performed. As shown in Figure 9, defined entries for the OPERATION attribute (i.e., MESS Operations) according to this disclosure may include: "Include", "Exclude", "Substitute", "Reset", and "Reset All".
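To make the Type-Length-Value layout concrete, the following Python sketch packs a BFCP common header (layout per RFC 4582: 3-bit version, 5 reserved bits, 8-bit primitive, 16-bit payload length in 4-octet units, 32-bit conference ID, 16-bit transaction ID, 16-bit user ID) followed by a single OPERATION attribute (7-bit type, 1-bit M flag, 8-bit length, 16-bit operation id). The attribute type code 100 and the numeric operation ids are hypothetical placeholders, since the actual values appear only in the figures; the primitive value 32 is the example value given above. FLOOR-ID and MEDIA-IDENTIFICATION attributes are omitted for brevity.

```python
import struct

MEDIA_SELECTION_PRIMITIVE = 32  # example value given in the text
OPERATION_ATTR_TYPE = 100       # hypothetical 7-bit attribute type code
OPERATION_IDS = {               # hypothetical operation id values
    "Include": 1, "Exclude": 2, "Substitute": 3, "Reset": 4, "Reset All": 5,
}


def encode_operation(name: str, mandatory: bool = True) -> bytes:
    """Encode an OPERATION attribute as Type(7) | M(1) | Length(8) | id(16)."""
    first_octet = (OPERATION_ATTR_TYPE << 1) | int(mandatory)
    # Length counts the whole attribute in octets, including its 2-octet header.
    return struct.pack("!BBH", first_octet, 4, OPERATION_IDS[name])


def encode_message(payload: bytes, conference_id: int,
                   transaction_id: int, user_id: int) -> bytes:
    """Prefix the attribute payload with a BFCP-style common header."""
    if len(payload) % 4:
        raise ValueError("attributes must be padded to 32-bit boundaries")
    header = struct.pack(
        "!BBHIHH",
        1 << 5,                     # version 1 in the top three bits
        MEDIA_SELECTION_PRIMITIVE,
        len(payload) // 4,          # payload length in 4-octet units
        conference_id, transaction_id, user_id,
    )
    return header + payload
```

A 16-bit operation id fits exactly in one 32-bit word together with the attribute header, so no padding is needed for this attribute.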
[0094] MEDIA-IDENTIFICATION
[0095] The MEDIA-IDENTIFICATION attribute is a grouped attribute consisting of a header, referred to as MEDIA-IDENTIFICATION-HEADER, with identification type information followed by a sequence of other MEDIA-IDENTIFICATION attributes. A format of the MEDIA-IDENTIFICATION-HEADER is illustrated in Figure 10. The ID Type field is an 8-bit field describing the type of media id. Defined types in this disclosure may include the MESS Media Identification Types illustrated in Figure 11.
[0096] The following describes the format of the grouped attribute. The Media ID field may contain different information based on the ID Type. The Media ID field in MEDIA-IDENTIFICATION attributes of type "User" may only be allowed to hold MEDIA-IDENTIFICATION attributes of type "Endpoint", and the Media ID field in MEDIA-IDENTIFICATION attributes of type "Endpoint" may only be allowed to hold MEDIA-IDENTIFICATION attributes of type "Media". The Media ID field in MEDIA-IDENTIFICATION attributes of type "Media" may hold the actual media ID number.
[0097] This format may allow expression of tree-like identifications with attributes of type "User" being the root node with attributes of endpoint devices 111 as leaves containing only attributes of type "Media" using structures as discussed, for example, in RFC 5234 [Crocker, D., Ed., "Augmented BNF for Syntax Specifications: ABNF", RFC 5234, January 2008].
MEDIA-IDENTIFICATION = (USER-SUB-IDENTIFICATION /
                        ENDPOINT-SUB-IDENTIFICATION /
                        MEDIA-SUB-IDENTIFICATION)
USER-SUB-IDENTIFICATION = (MEDIA-IDENTIFICATION-HEADER)
                          [ENDPOINT-SUB-IDENTIFICATION]
ENDPOINT-SUB-IDENTIFICATION = (MEDIA-IDENTIFICATION-HEADER)
                              [MEDIA-SUB-IDENTIFICATION]
MEDIA-SUB-IDENTIFICATION = (MEDIA-IDENTIFICATION-HEADER)
[0098] Defined Messages:
[0099] MESS defines 5 messages that may be used to control the media content stream to be received by an endpoint device 111.
[00100] Floor participants may use the messages in this clause without having obtained a floor, and floor servers may accept the messages from participants not owning the floor. When floor control is bypassed in this way, the FLOOR-ID may/shall be ignored by receivers of this message implementing embodiments of this disclosure, and senders implementing embodiments of this disclosure may/shall set it to 0.
[00101] If a floor chair requires a floor participant to own the floor before using messages of this clause, they may/shall both follow regular BFCP floor control procedures as defined in BFCP (see, RFC 4582, cited above). For example, a floor participant not allowed to access the floor may receive a BFCP Error message containing Error Code 5 (Not authorized).
[00102] When a floor control server implementing embodiments of this disclosure sends a BFCP SUPPORTED-PRIMITIVES attribute, the codes for messages defined in this clause may/must be included in the Primitives list.
[00103] Extension attributes that may be defined in the future are referred to as EXTENSION-ATTRIBUTE in the ABNF (Augmented Backus-Naur Form), similarly as was done in section 5.3 of BFCP (see, RFC 4582, cited above).
[00104] "Include" Message:
[00105] MESS "Include" messages may be sent as BFCP messages with primitive "Media Selection" and the OPERATION attribute set to value "Include". A list of media identifications then follows representing media content streams that are always to be included from now on. Requests to Include an already included media content stream may/shall be ignored. Note that the message may be defined in a way that makes it additive and identifications for previously included media may/should not be included for every new request.
Include = (COMMON-HEADER)
          1*(FLOOR-ID)
          (OPERATION)
          1*(MEDIA-IDENTIFICATION)
          *[EXTENSION-ATTRIBUTE]
[00106] "Exclude" Message:
[00107] MESS "Exclude" messages may be sent as BFCP messages with primitive "Media Selection" and the OPERATION attribute set to value "Exclude". A list of media identifications representing media content streams that are to always be excluded from now on may then follow. Requests to "Exclude" an already excluded media may/shall be ignored. Note that the message is defined in a way that makes it additive and identifications for previously excluded media may/should not be included for every new request.
Exclude = (COMMON-HEADER)
          1*(FLOOR-ID)
          (OPERATION)
          1*(MEDIA-IDENTIFICATION)
          *[EXTENSION-ATTRIBUTE]
[00108] "Substitute" Message:
[00109] MESS "Substitute" messages are sent as BFCP messages with primitive "Media Selection" and the OPERATION attribute set to "Substitute". A pair of MEDIA-IDENTIFICATIONs may then follow where the first MEDIA-IDENTIFICATION indicates which media content stream to replace and the second indicates the media content stream to replace it with. Note that the passed MEDIA-IDENTIFICATIONs typically need to be of type USER-SUB-IDENTIFICATION, since they in general do not refer to media from the same user, but other addressing may be sufficient.
Substitute = (COMMON-HEADER)
             1*(FLOOR-ID)
             (OPERATION)
             1*(MEDIA-IDENTIFICATION MEDIA-IDENTIFICATION)
             *[EXTENSION-ATTRIBUTE]
[00110] "Reset" Message:
[00111] MESS "Reset" messages are sent as BFCP messages with primitive "Media Selection" and the OPERATION attribute set to "Reset". The message carries a list of MEDIA-IDENTIFICATION attributes to be reset. It may not matter if the media content stream described by MEDIA-IDENTIFICATION has been previously excluded, previously included, or neither previously excluded nor included. The result at the floor control may always be the same, and the media associated with the received ID will no longer be subject to explicit inclusion/exclusion. Requests to "Reset" an already reset media may/shall be ignored.
Reset = (COMMON-HEADER)
        1*(FLOOR-ID)
        (OPERATION)
        1*(MEDIA-IDENTIFICATION)
        *[EXTENSION-ATTRIBUTE]
[00112] "Reset All" Message:
[00113] MESS "Reset All" messages are sent as BFCP messages with primitive "Media Selection" and the OPERATION attribute set to "Reset All". A "Reset All" message carries no MEDIA-IDENTIFICATION attributes. The message is equivalent to a MESS "Reset" message including MEDIA-IDENTIFICATION attributes of all streams that have previously been specified in "Include", "Exclude" or as second MEDIA-IDENTIFICATION attribute in "Substitute", effectively releasing all existing media content streams from being subject to inclusion/exclusion. This operation can fully reset the inclusion/exclusion state even if the requesting endpoint device 111 has lost track of what restrictions were previously applied.
Reset All = (COMMON-HEADER)
            1*(FLOOR-ID)
            (OPERATION)
            *[EXTENSION-ATTRIBUTE]
[00114] MESS Responses:
[00115] This disclosure does not define any success responses, because the result of sent requests may/should be immediately apparent through which media content stream(s) is received.
[00116] BFCP (see, RFC 4582, cited above) defines attributes for error handling. The BFCP Error message in BFCP section 5.3.13 (see, RFC 4582, cited above) may/shall be used also for error reporting applicable to this disclosure.
[00117] BFCP (see, RFC 4582, cited above) defines 9 error codes used in floor control. This disclosure defines five additional error codes that may be used in MESS responses as shown in Figure 12. An exact reason for a failure may be included as UTF-8 (Unicode Transformation Format-8) encoded text in the field "Error specific details" of the BFCP ERROR-CODE attribute. The ERROR-INFO attribute may also be used.
[00118] RTP Implications:
[00119] RTP is a widely used protocol to transfer media content streams. Usage of MESS when media transport is handled using RTP might impact how RTCP reports may/must be handled when excluding media. In embodiments where an RTP Translator [see, RFC 5117, Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, January 2008] exists between endpoint devices 111 and if the RTP Translator is able to adjust its forwarding rules based on the signaling defined in this disclosure, RTCP reporting may become inconsistent for an excluded media content stream. As this potential issue may be outside the scope of the present disclosure, further discussion thereof is omitted.
[00120] Examples of Embodiments:
[00121] Note that only relevant portions of the SDP are discussed with respect to embodiments disclosed below.
[00122] Embodiments Where A Client joins a conference:
[00123] A client (e.g., a user of an endpoint device 111) may join a conference by sending an SDP according to the following:
s=MESS Example Client
m=audio 49200 RTP/AVP 96
a=rtpmap:96 G719/48000/2
a=ssrc:521923924 cname:alice@foo.example.com
a=mid:1
m=video 49300 RTP/AVP 96
a=rtpmap:96 H264/90000
a=ssrc:834753488 cname:alice@foo.example.com
a=ssrc:834753488 information:"Alice cam"
a=mid:2
a=content:main
In this SDP, Alice explicitly names her video content stream "Alice cam" using the new attribute defined in this disclosure. This information is associated with a specific SSRC. A conference node 115 in the network then sends the following SIP NOTIFY sample body to subscribed clients (e.g., endpoint devices 111).
<?xml version="1.0" encoding="UTF-8"?>
<conference-info
    xmlns="urn:ietf:params:xml:ns:conference-info"
    entity="sips:conf233@example.com"
    state="full" version="1">
  <!-- OTHER CONFERENCE INFO -->
  <users>
    <!-- USER -->
    <user entity="sip:alice@example.com" state="full">
      <display-text>Alice</display-text>
      <!-- ENDPOINTS -->
      <endpoint
          entity="sip:4kfk4j392jsu@example.com;grid=433kj4j3u">
        <status>connected</status>
        <!-- MORE INFORMATION -->
        <!-- MEDIA -->
        <media id="1">
          <display-text>main video</display-text>
          <type>Video</type>
          <label>Alice cam</label>
          <src-id>834753488</src-id>
          <status>sendrecv</status>
        </media>
        <!-- POSSIBLY ADDITIONAL MEDIA -->
      </endpoint>
      <!-- POSSIBLY ADDITIONAL ENDPOINTS -->
    </user>
    <!-- ADDITIONAL USERS -->
  </users>
  <!-- ADDITIONAL ELEMENTS -->
</conference-info>
Any subscribing endpoint device 111 that receives this information can now actively request the "Alice cam" media from sip:alice@example.com to be explicitly included in received media content streams. This may be accomplished by sending an Include message as defined in this disclosure (some fields not encoded for clarity) as shown in Figure 13. The receiver of this message may/must send a response as soon as possible according to some embodiments.
[00124] IANA (Internet Assigned Numbers Authority) Considerations:
[00125] Following the guidelines in SDP RFC 4566 (cited above), in SDP Grouping Framework [see, RFC 5888, Camarillo, G. and H. Schulzrinne, "The Session Description Protocol (SDP) Grouping Framework", RFC 5888, June 2010] and in RTP (see, RFC 3550, cited above), the IANA may be requested to register a new source-specific attribute named "information" as discussed above in the section "Publishing Media Information From Endpoint Devices." The following entries may be added to the BFCP (see, RFC 4582, cited above) registry:
o Primitives from Figure 6;
o Attributes from Figure 7; and
o Error Codes from Figure 12.
A new registry may be started for this disclosure with:
o Operations from Figure 9; and
o Media Identification Types from Figure 11.
[00126] Security Considerations:
[00127] When using MESS there is a potential risk of exposing client behavior to other participants. Consider an example where multiple endpoint devices 111 participate in a conference, and media transport is provided using RTP. If the network between endpoint devices 111 contains one (or more) RTP translators and even if MESS communication is strictly between floor server and floor participant, RTCP traffic to/from endpoint devices 111 may expose information about endpoint devices 111 excluding other endpoint devices 111. Previously received RTCP traffic replaced with no traffic (or some kind of yet-to-be-defined exclusion report to keep RTCP behavior intact) may leak information about one endpoint device 111 excluding media content of another endpoint device(s) 111.
[00128] Conference node operations:
[00129] Figure 4 is a flow chart illustrating operations of conference node 115 according to some embodiments. For example, a conference session may be initiated at block 401, either by conference node 115 and/or by an endpoint device or devices 111. For example, processor 231 may be programmed to provide a conference session at an arranged time, and the conference session may be initiated once one or more invited endpoint devices 111 join the conference session.
[00130] At block 402, processor 231 may receive media content streams from participating endpoint devices, and at block 403, processor 231 may initially provide media content streams to participating endpoint devices using default selection criteria when they join the conference session. The initial default media content stream selection, for example, may be based on audio volumes associated with the respective media content streams. In this condition, processor 231 may select one of the media content streams that is provided to all endpoint devices, and processor 231 may continue to select this media content stream for each endpoint that remains in the default condition. In addition, processor 231 may generate identification information (including media content streaming IDs such as SSRCs) for each of the initial media content streams generated by the initial endpoint devices at block 404. Processor 231, for example, may generate the identification information based on information provided by the respective endpoint devices 111.
[00131] Moreover, a conference session may change at block 405 anytime another media content stream is added to the conference session (e.g., when another endpoint device 111 joins the conference session) or when a media content stream is no longer to be included in the conference session (e.g., when an endpoint device 111 leaves the conference session). When a conference session changes at block 405, processor 231 may generate identification information (including media content streaming IDs) for each current media content stream generated by the current endpoint devices at block 407. Through blocks 404, 405, and 407, processor 231 may maintain current identification information for the current media content streams of the conference session. At block 409, processor 231 may publish the identification information (including the media content streaming IDs) to each of the endpoint devices 111 currently participating in the conference session. Accordingly, all endpoint devices 111 may be provided with current identification information for all available media content streams for the conference session, and each endpoint device may use this information to select one or more of the media content streams.
[00132] If a selection message is received by processor 231 from an endpoint device 111 at block 411, processor 231 may update the media content streaming selection for the endpoint device 111 that transmitted the selection command at block 415, and processor 231 may provide the updated media content streams to the respective endpoint devices at block 416. A streaming selection message may include one of the following messages (each of which is discussed above): "Include" message; "Exclude" message; "Substitute" message; "Reset" message; and "Reset All" message.
[00133] When the "Include" message is received from an endpoint device 111 (including identification information for the media content stream to be provided to that endpoint device 111), processor 231 may disregard a previous selection criteria for that endpoint device (e.g., default selection based on volume) and instead provide the media content stream identified in the "Include" message.
[00134] When the "Exclude" message is received from an endpoint device 111 (including identification information for the media content stream to be excluded from that endpoint device 111), processor 231 may exclude the identified media content stream from selection for the endpoint device 111 from which the "Exclude" message is received. If media content stream selection for endpoint device 111 is currently based on comparative audio volumes, for example, volume based selection may continue for the endpoint device with the change that the excluded media content stream will not be considered and will be excluded even if its volume is the greatest.
[00135] When the "Substitute" message is received from an endpoint device 111 (including identification information for two media content streams), processor 231 may substitute one of the two identified media content streams for the other. If the endpoint device 111 provides/renders multiple media content streams (e.g., using a split screen display, multiple displays, etc.), for example, the "Substitute" message may allow substitution of one media content stream for another using one command without affecting any of the other content media streams that are being provided/rendered.
[00136] When the "Reset" message is received from an endpoint device 111 (including identification information for the media content stream to be reset for that endpoint device), processor 231 may reset any previously applied selections that may have been applied to the identified media content stream. Processor 231, for example, may remove any explicit "Include" or "Exclude" selections that may have been applied to the identified media content stream.
[00137] When the "Reset All" message is received from an endpoint device 111 (without identifying any of the media content streams), processor 231 may reset any previous selections that may have been applied for the endpoint device with respect to all of the media content streams. For example, processor 231 may revert to a default selection criteria (e.g., based on volume) for the endpoint device 111 that sent the "Reset All" message.
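The per-endpoint handling of the five selection messages described above can be summarized in a short Python sketch. The class `StreamSelection` and its method names are hypothetical, the loudest-stream default is just the volume-based example mentioned above, and the treatment of "Substitute" (the replacement takes the position of the replaced stream in the explicit include list) is one possible interpretation rather than a behavior fixed by this disclosure.

```python
from typing import Dict, Iterable, List, Set


class StreamSelection:
    """Hypothetical per-endpoint selection state kept by a conference node."""

    def __init__(self) -> None:
        self.included: List[str] = []  # explicitly included media IDs
        self.excluded: Set[str] = set()  # explicitly excluded media IDs

    def include(self, media_ids: Iterable[str]) -> None:
        for media_id in media_ids:
            self.excluded.discard(media_id)
            if media_id not in self.included:  # repeated requests are ignored
                self.included.append(media_id)

    def exclude(self, media_ids: Iterable[str]) -> None:
        for media_id in media_ids:
            if media_id in self.included:
                self.included.remove(media_id)
            self.excluded.add(media_id)

    def substitute(self, old_id: str, new_id: str) -> None:
        self.excluded.discard(new_id)
        if new_id in self.included:
            self.included.remove(new_id)
        if old_id in self.included:
            # Swap in place so other rendered streams are unaffected.
            self.included[self.included.index(old_id)] = new_id
        else:
            self.included.append(new_id)

    def reset(self, media_ids: Iterable[str]) -> None:
        for media_id in media_ids:
            self.excluded.discard(media_id)
            if media_id in self.included:
                self.included.remove(media_id)

    def reset_all(self) -> None:
        self.included.clear()
        self.excluded.clear()

    def select(self, volumes: Dict[str, float]) -> List[str]:
        """Pick streams to forward, given current per-stream audio volumes."""
        explicit = [m for m in self.included if m in volumes]
        if explicit:
            return explicit
        candidates = {m: v for m, v in volumes.items() if m not in self.excluded}
        if not candidates:
            return []
        # Default criterion: the loudest stream that is not excluded.
        return [max(candidates, key=candidates.get)]
```

Keeping includes and excludes as separate sets per requesting endpoint makes "Reset All" trivial and leaves the default selection for other endpoints untouched.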
[00138] Operations of Figure 4 may continue until the conference session is terminated at block 417. The conference session may terminate, for example, if an allowed time for the conference session has expired, if all endpoint devices 111 have left the conference session, if an initiating endpoint device 111 terminates the conference session, etc.
[00139] Endpoint Device Operations:
[00140] Figure 5 is a flow chart illustrating operations of an endpoint device 111 according to some embodiments. At block 501, an endpoint device 111 may initiate and/or join a conferencing session and/or peer to peer call session supported by conference node 115. At block 502, processor 131 may provide information regarding a media content stream or streams that will be provided by endpoint device 111 (e.g., responsive to input from video camera 125 and/or microphone 127) during the conference session. More particularly, processor 131 may provide sufficient information to allow conference node 115 to generate identification information for the respective media content stream or streams to be provided by endpoint device 111. Moreover, operations of block 502 may be performed when endpoint device 111 joins/initiates a conference session, and any time endpoint device 111 changes (e.g., adds or terminates) a media content stream that is provided during the conference session.
[00141] At block 503, processor 131 may receive a content media stream or streams provided by conference node 1 15. At block 504, processor 131 may provide/render a media content stream or streams provided from conference node 1 15. More particulary, processor 131 may provide/render the media stream/streams using display 121 and speaker 123. Upon initially joining/initiating the conference session, for example, conference node 115 may provide the media content stream or streams according to a default (e.g., based on audio volume).
[00142] At block 505, processor 131 may receive identification information (including media content stream IDs such as SSRCs) for each media content stream that is currently available for the conference session. This identification information corresponds to the identification information published by conference node 1 15 at block 409. This information may be updated by conference node 1 15 any time there is a change of media content streams for the conference session. Moreover, this identification information may be used by endpoint device 111 , for example, to modify a selection of a media content stream or streams using selection messages (e.g., "Include", "Exclude", "Substitute", "Reset", "Reset All", etc.) as discussed above.
[00143] If a selection message (e.g., "Include", "Exclude", "Substitute", "Reset", "Reset All", etc.) is entered (e.g., responsive to user input through user input interface 129) at block 507, processor 131 may transmit the selection message at block 509. By way of example, a graphical user interface may be provided using a portion of display 121, and the graphical user interface may allow user selection of a media content stream or streams (based on identification information for the media content streams received at block 505) and of a selection message. Once a selection message has been sent at block 509, processor 131 may continue providing/rendering a media content stream received/provided from conference node 115 at blocks 503 and/or 504 until the conference session is terminated and/or endpoint device 111 leaves the conference session at block 511.
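The endpoint-side step of turning a user choice into a selection message (blocks 505 through 509) might look like the following sketch. It is illustrative only: the function name, the dictionary layout of the message, and the argument-count rules (exactly two stream IDs for "Substitute", none for "Reset All", at least one otherwise) are assumptions made for this example, not a wire format defined by the specification.

```python
# Message types named in the description.
VALID_TYPES = {"Include", "Exclude", "Substitute", "Reset", "Reset All"}


def build_selection_message(msg_type, stream_ids, available_ids):
    """Build a selection message as a plain dict (hypothetical format).

    stream_ids    -- stream IDs (e.g., SSRCs) chosen by the user
    available_ids -- IDs received from the conference node at block 505
    """
    if msg_type not in VALID_TYPES:
        raise ValueError(f"unknown selection message type: {msg_type}")

    if msg_type == "Reset All":
        # "Reset All" is sent without identifying any stream.
        if stream_ids:
            raise ValueError("'Reset All' takes no stream IDs")
    elif msg_type == "Substitute":
        # Assumed order: (stream to be replaced, replacement stream).
        if len(stream_ids) != 2:
            raise ValueError("'Substitute' needs exactly two stream IDs")
    elif not stream_ids:
        raise ValueError(f"'{msg_type}' needs at least one stream ID")

    # Only streams the node has advertised can be referenced.
    unknown = [s for s in stream_ids if s not in available_ids]
    if unknown:
        raise ValueError(f"stream IDs not advertised by the node: {unknown}")

    return {"type": msg_type, "streams": list(stream_ids)}


available = {0x1111, 0x2222, 0x3333}
msg = build_selection_message("Substitute", [0x1111, 0x2222], available)
```

Validating against the advertised IDs before transmission is a design choice of this sketch; an implementation could equally leave that check to the conference node.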
[00144] Multiple Transport Channel Embodiments
[00145] According to some embodiments, an RTP mixer may be provided as a conference node 115 between a plurality of endpoint devices 111 participating in a communication session, and a plurality of transport channels may be provided between the RTP mixer and at least one of the endpoint devices. By way of example, endpoint device 111-1 may provide enough display capacity (e.g., multiple display screens and/or a sufficiently large display screen) and sufficiently powerful hardware to simultaneously decode/render/present two or more high definition (HD) media content streams in parallel, and multiple transport channels may be provided between the RTP mixer and endpoint device 111-1 to support multiple parallel media content data streams between the RTP mixer and endpoint device 111-1. Endpoint device 111-1, for example, may thus set up two media content stream transport channels to/from the RTP mixer (wherein the transport channels may or may not be full duplex channels). The RTP mixer can then send two media content streams to endpoint device 111-1 in parallel. Accordingly, endpoint device 111-1 may separately apply selection messages (e.g., "Include", "Exclude", "Substitute", "Reset", and/or "Reset All" messages) to the different media content streams being received by the endpoint device 111-1. Each selection message generated by endpoint device 111-1 may therefore include an identification of the transport channel to which the selection message should apply, allowing selective control of the different media content streams. Endpoint device 111-1 may thus generate a selection message that is applied to one of the transport channels without affecting the other. The RTP mixer may use the identification of the transport channel in a selection message to separately control a media content stream(s) provided to endpoint device 111-1 over the identified transport channel.
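The per-channel control described above can be sketched by keying the mixer's selection state on both the endpoint and the transport channel. This is a minimal sketch under stated assumptions: the class name, the `"channel"` field of the message, and the scoping of "Reset All" to a single transport channel are hypothetical choices for illustration, and "Substitute" is omitted for brevity.

```python
from collections import defaultdict


class MixerSelections:
    """Selection state kept by a mixer per (endpoint, channel) pair (hypothetical)."""

    def __init__(self):
        # (endpoint_id, channel_id) -> {stream_id: "include" | "exclude"}
        self._by_channel = defaultdict(dict)

    def apply(self, endpoint_id, message):
        # The channel identification in the message picks which state to change.
        state = self._by_channel[(endpoint_id, message["channel"])]
        msg_type = message["type"]
        streams = message.get("streams", [])
        if msg_type == "Include":
            for stream_id in streams:
                state[stream_id] = "include"
        elif msg_type == "Exclude":
            for stream_id in streams:
                state[stream_id] = "exclude"
        elif msg_type == "Reset":
            for stream_id in streams:
                state.pop(stream_id, None)
        elif msg_type == "Reset All":
            # Assumption: cleared only for the identified channel.
            state.clear()
        else:
            raise ValueError(f"unsupported message type: {msg_type}")

    def selections(self, endpoint_id, channel_id):
        # Return a copy so callers cannot mutate the mixer's state.
        return dict(self._by_channel.get((endpoint_id, channel_id), {}))


mixer = MixerSelections()
mixer.apply("111-1", {"channel": 1, "type": "Include", "streams": [0xA]})
mixer.apply("111-1", {"channel": 2, "type": "Exclude", "streams": [0xA]})
mixer.apply("111-1", {"channel": 1, "type": "Reset All"})
```

After these calls, channel 1 has no explicit selections while the "Exclude" applied on channel 2 is unchanged, which mirrors a selection message being applied to one transport channel without affecting the other.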
[00146] Further Definitions and Embodiments:
[00147] When an element is referred to as being "connected", "coupled", "responsive", or variants thereof to another element, it can be directly connected, coupled, or responsive to the other element or one or more intervening elements may be present. In contrast, when an element is referred to as being "directly connected", "directly coupled", "directly responsive", or variants thereof to another element, there are no intervening elements present. Like numbers refer to like nodes/elements throughout. Furthermore, "coupled", "connected", "responsive", or variants thereof as used herein may include wirelessly coupled, connected, or responsive. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Well-known functions or constructions may not be described in detail for brevity and/or clarity. The term "and/or", abbreviated "/", includes any and all combinations of one or more of the associated listed items.
[00148] As used herein, the terms "comprise", "comprising", "comprises", "include", "including", "includes", "have", "has", "having", or variants thereof are open-ended, and include one or more stated features, integers, nodes, steps, components or functions but do not preclude the presence or addition of one or more other features, integers, nodes, steps, components, functions or groups thereof. Furthermore, as used herein, the common abbreviation "e.g.", which derives from the Latin phrase "exempli gratia," may be used to introduce or specify a general example or examples of a previously mentioned item, and is not intended to be limiting of such item. The common abbreviation "i.e.", which derives from the Latin phrase "id est," may be used to specify a particular item from a more general recitation.
[00149] Example embodiments are described herein with reference to block diagrams and/or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits. These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).
[00150] These computer program instructions may also be stored in a tangible computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the functions/acts specified in the block diagrams and/or flowchart block or blocks.
[00151] A tangible, non-transitory computer-readable medium may include an electronic, magnetic, optical, electromagnetic, or semiconductor data storage system, apparatus, or device. More specific examples of the computer-readable medium would include the following: a portable computer diskette, a random access memory (RAM) circuit, a read-only memory (ROM) circuit, an erasable programmable read-only memory (EPROM or Flash memory) circuit, a portable compact disc read-only memory (CD-ROM), and a portable digital video disc read-only memory (DVD/Blu-ray).
[00152] The computer program instructions may also be loaded onto a computer and/or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer and/or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. Accordingly, embodiments of the present invention may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.) that runs on a processor such as a digital signal processor, which may collectively be referred to as "circuitry," "a module" or variants thereof.
[00153] It should also be noted that in some alternate implementations, the functions/acts noted in the blocks may occur out of the order noted in the flowcharts. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Moreover, the functionality of a given block of the flowcharts and/or block diagrams may be separated into multiple blocks and/or the functionality of two or more blocks of the flowcharts and/or block diagrams may be at least partially integrated. Finally, other blocks may be added/inserted between the blocks that are illustrated. Moreover, although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.
[00154] Many different embodiments have been disclosed herein, in connection with the above description and the drawings. It will be understood that it would be unduly repetitious and obfuscating to literally describe and illustrate every combination and subcombination of these embodiments. Accordingly, the present specification, including the drawings, shall be construed to constitute a complete written description of various example combinations and subcombinations of embodiments and of the manner and process of making and using them, and shall support claims to any such combination or subcombination.
[00155] Other network elements, communication devices and/or methods according to embodiments of the invention will be or become apparent to one with skill in the art upon review of the present drawings and description. It is intended that all such additional network elements, devices, and/or methods be included within this description, be within the scope of the present invention, and be protected by the accompanying claims. Moreover, it is intended that all embodiments disclosed herein can be implemented separately or combined in any way and/or combination.

Claims

That Which Is Claimed Is:
1. A method of operating an endpoint device (111) during a real time communication session including a plurality of real time media content streams provided by at least one other endpoint device, the method comprising:
receiving (503) a first one of the real time media content streams of the communication session from a remote communication node in accordance with a first selection criteria;
rendering (504) the first real time media content stream of the conference communication for display;
generating (507) a selection message identifying a second selection criteria to be used by the remote communications node to select at least one of the media content streams of the communication session for reception at the endpoint device;
transmitting (509) the selection message to the remote communications node;
receiving (503) a second one of the real time media content streams of the communication session from the remote communication node in accordance with the second selection criteria; and
rendering (504) the second real time media content stream of the communication session for display.
2. The method according to Claim 1 further comprising:
receiving (505) identification information from the remote communication node wherein the identification information includes first identification information for the first real time media content stream and second identification information for the second real time media content stream;
wherein the selection message includes the first identification information and/or the second identification information for at least one of the first and/or the second real time media content streams.
3. The method according to Claim 2 wherein the selection message includes the second identification information identifying the second real time media content stream to request that the second real time media content stream be included in transmissions to the endpoint device (111).
4. The method according to Claim 2 wherein the selection message includes the first identification information for the first real time media content stream to request that the first real time media content stream be excluded from transmissions to the endpoint device (111).
5. The method according to Claim 2 wherein the selection message includes the first identification information for the first real time media content stream and the second identification information for the second real time media content stream to request that the second real time media content stream be substituted for the first real time media content stream in transmissions to the endpoint device (111).
6. An endpoint device (111) for media content communications, the endpoint device (111) comprising:
a network interface (133) configured to provide a data coupling over a network (101); and
a processor (131) coupled to the network interface (133), wherein the processor (131) is configured to receive a first one of a plurality of real time media content streams of a real time communication session through the network interface (133) from a remote communication node (115) in accordance with a first selection criteria with the plurality of real time media content streams being provided by at least one other endpoint device, to render the first real time media content stream of the conference communication for display, to generate a selection message identifying a second selection criteria to be used by the remote communications node (115) to select at least one of the media content streams of the communication session for reception at the endpoint device (111), to transmit the selection message through the network interface (133) to the remote communications node (115), to receive a second one of the real time media content streams of the communication session from the remote communication node (115) in accordance with the second selection criteria, and to render the second real time media content stream of the communication session for display.
7. The endpoint device (111) according to Claim 6 wherein the processor (131) is further configured to receive identification information through the network interface (133) from the remote communication node (115) wherein the identification information includes first identification information for the first real time media content stream and second identification information for the second real time media content stream, and wherein the selection message includes the first identification information and/or the second identification information for at least one of the first and/or the second real time media content streams.
8. The endpoint device (111) according to Claim 6 wherein the selection message includes the second identification information identifying the second real time media content stream to request that the second real time media content stream be included in transmissions to the endpoint device (111).
9. The endpoint device (111) according to Claim 6 wherein the selection message includes the first identification information for the first real time media content stream to request that the first real time media content stream be excluded from transmissions to the endpoint device (111).
10. The endpoint device (111) according to Claim 6 wherein the selection message includes the first identification information for the first real time media content stream and the second identification information for the second real time media content stream to request that the second real time media content stream be substituted for the first real time media content stream in transmissions to the endpoint device (111).
11. A method of operating a communication node (115) supporting a real time communication session between a plurality of remote endpoint devices (111) generating a plurality of real time media content streams, the method comprising:
receiving (402) the plurality of real time media content streams for the communication session at the communication node from the plurality of remote endpoint devices;
providing (403) a first one of the real time media content streams of the communication session to a first one of the endpoint devices in accordance with a first selection criteria for the first endpoint device;
receiving (411) a selection message from the first endpoint device to identify a second selection criteria for the first endpoint device; and
providing (416) a second one of the real time media content streams of the communication session to the first endpoint device in accordance with the second selection criteria.
12. The method according to Claim 11 further comprising:
generating (404) identification information for each of the plurality of real time media content streams; and
providing (409) the identification information for each of the plurality of real time media content streams to each of the endpoint devices;
wherein the selection message includes identification information for at least one of the first and/or the second real time media content streams.
13. The method according to Claim 12 wherein the selection message includes the second identification information identifying the second real time media content stream to be included in transmissions to the first endpoint device (111).
14. The method according to Claim 12 wherein the selection message includes the first identification information for the first real time media content stream to be excluded from transmissions to the first endpoint device (111).
15. The method according to Claim 12 wherein the selection message includes the first identification information for the first real time media content stream and the second identification information for the second real time media content stream to substitute the second real time media content stream for the first real time media content stream in transmissions to the first endpoint device (111).
16. A communication node (115) configured to support a real time communication session between a plurality of remote endpoint devices (111) generating a plurality of real time media content streams, the communication node (115) comprising:
a network interface (233) configured to provide a data coupling over a network (101); and
a processor (231) coupled to the network interface (233), wherein the processor is configured to receive the plurality of real time media content streams for the communication session through the network interface (233) from the plurality of remote endpoint devices, to provide a first one of the real time media content streams of the communication session through the network interface (233) to a first one of the endpoint devices in accordance with a first selection criteria for the first endpoint device, to receive a selection message from the first endpoint device through the network interface (233) to identify a second selection criteria for the first endpoint device, and to provide a second one of the real time media content streams of the communication session to the first endpoint device in accordance with the second selection criteria.
17. The communication node (115) according to Claim 16 wherein the processor (231) is further configured to generate identification information for each of the plurality of real time media content streams, and to provide the identification information for each of the plurality of real time media content streams to each of the endpoint devices, wherein the selection message includes identification information for at least one of the first and/or the second real time media content streams.
18. The communication node (115) according to Claim 17 wherein the selection message includes the second identification information identifying the second real time media content stream to be included in transmissions to the first endpoint device (111).
19. The communication node (115) according to Claim 17 wherein the selection message includes the first identification information for the first real time media content stream to be excluded from transmissions to the first endpoint device (111).
20. The communication node (115) according to Claim 17 wherein the selection message includes the first identification information for the first real time media content stream and the second identification information for the second real time media content stream to substitute the second real time media content stream for the first real time media content stream in transmissions to the first endpoint device (111).
PCT/IB2012/000202 2011-10-21 2012-02-06 Communication methods providing media content stream selection and related system WO2013057547A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161550034P 2011-10-21 2011-10-21
US61/550,034 2011-10-21

Publications (1)

Publication Number Publication Date
WO2013057547A1 (en) 2013-04-25

Family

ID=45757744

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2012/000202 WO2013057547A1 (en) 2011-10-21 2012-02-06 Communication methods providing media content stream selection and related system

Country Status (1)

Country Link
WO (1) WO2013057547A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016069286A1 (en) * 2014-10-30 2016-05-06 Microsoft Technology Licensing, Llc Application level audio connection and streaming

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040008635A1 (en) * 2002-07-10 2004-01-15 Steve Nelson Multi-participant conference system with controllable content delivery using a client monitor back-channel
US20040230651A1 (en) * 2003-05-16 2004-11-18 Victor Ivashin Method and system for delivering produced content to passive participants of a videoconference
US20050080849A1 (en) * 2003-10-09 2005-04-14 Wee Susie J. Management system for rich media environments
WO2011043886A1 (en) * 2009-10-09 2011-04-14 Sony Ericsson Mobile Communications Ab Live media stream selection on a mobile device.

Non-Patent Citations (14)

* Cited by examiner, † Cited by third party
Title
ANDREASEN, F.; B. FOSTER: "Media Gateway Control Protocol (MGCP) Version 1.0", RFC 3435, January 2003 (2003-01-01)
BRADNER, S.: "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997 (1997-03-01)
CAMARILLO, G.: "Session Description Protocol (SDP) Format for Binary Floor Control Protocol (BFCP) Streams", RFC 4583, November 2006 (2006-11-01)
CAMARILLO, G.; H. SCHULZRINNE: "The Session Description Protocol (SDP) Grouping Framework", RFC 5888, June 2010 (2010-06-01)
CAMARILLO, G.; OTT, J.; K. DRAGE: "The Binary Floor Control Protocol (BFCP", RFC 4582, November 2006 (2006-11-01)
CROCKER, ED. D.: "Augmented BNF for Syntax Specifications: ABNF", RFC 5234, January 2008 (2008-01-01)
GRONDAL B BURMAN M WESTERLUND ERICSSON AB D: "Media Stream Selection (MESS); draft-westerlund-dispatch-stream-selec tion-00.txt", MEDIA STREAM SELECTION (MESS); DRAFT-WESTERLUND-DISPATCH-STREAM-SELEC TION-00.TXT, INTERNET ENGINEERING TASK FORCE, IETF; STANDARDWORKINGDRAFT, INTERNET SOCIETY (ISOC) 4, RUE DES FALAISES CH- 1205 GENEVA, SWITZERLAND, 24 October 2011 (2011-10-24), pages 1 - 24, XP015078815 *
HANDLEY, M.; JACOBSON, V.; C. PERKINS: "SDP: Session Description Protocol", RFC 4566, July 2006 (2006-07-01)
LENNOX VIDYO H SCHULZRINNE COLUMBIA U J: "Mechanisms for Media Source Selection in the Session Description Protocol (SDP); draft-lennox-mmusic-sdp-source-selection-02.txt", MECHANISMS FOR MEDIA SOURCE SELECTION IN THE SESSION DESCRIPTION PROTOCOL (SDP); DRAFT-LENNOX-MMUSIC-SDP-SOURCE-SELECTION-02.TXT, INTERNET ENGINEERING TASK FORCE, IETF; STANDARDWORKINGDRAFT, INTERNET SOCIETY (ISOC) 4, RUE DES FALAISES CH- 1205 GENEVA, no. 2, 21 October 2010 (2010-10-21), pages 1 - 17, XP015072035 *
LENNOX, J.; OTT, J.; T. SCHIERL: "Source-Specific Media Attributes in the Session Description Protocol (SDP", RFC 5576, June 2009 (2009-06-01)
LEVIN, O.; G. CAMARILLO: "The Session Description Protocol (SDP) Label Attribute", RFC 4574, August 2006 (2006-08-01)
ROSENBERG, J.; SCHULZRINNE, H.; O. LEVIN: "A Session Initiation Protocol (SIP) Event Package for Conference State", RFC 4575, August 2006 (2006-08-01)
SCHULZRINNE, H.; CASNER, S.; FREDERICK, R.; V. JACOBSON: "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003 (2003-07-01)
WESTERLUND, M.; S. WENGER: "RTP Topologies", RFC 5117, January 2008 (2008-01-01)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12705908

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12705908

Country of ref document: EP

Kind code of ref document: A1