WO2004012426A1 - System and method for speakerphone operation in a communications device - Google Patents

System and method for speakerphone operation in a communications device

Info

Publication number
WO2004012426A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
path
inbound
outbound
signal
Prior art date
Application number
PCT/US2003/023113
Other languages
French (fr)
Inventor
Pratik Desai
Ali Behboodian
Chin Pan Wong
Original Assignee
Motorola, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motorola, Inc.
Priority to JP2004524756A (patent JP2005534258A)
Priority to GB0502502A (patent GB2407744B)
Priority to AU2003256725A (patent AU2003256725A1)
Publication of WO2004012426A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B1/00 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/38 Transceivers, i.e. devices in which transmitter and receiver form a structural unit and in which at least one part is used for functions of transmitting and receiving
    • H04B1/40 Circuits
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B3/00 Line transmission systems
    • H04B3/02 Details
    • H04B3/20 Reducing echo effects or singing; Opening or closing transmitting path; Conditioning for transmission in one direction or the other
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/60 Substation equipment, e.g. for use by subscribers including speech amplifiers

Definitions

  • the invention relates to the field of communications, and more particularly to techniques for generating clearer and more reliable speakerphone operation in a cellular telephone or other communications device.
  • One solution to the speakerphone problem is to attempt to physically isolate the speaker and microphone from each other in the handset. For instance, one may place the speaker used for speakerphone operation in a rear-facing part of the handset so that less sound impinges directly on the microphone from the speaker. However, this placement makes the sound harder to hear for a user from whom the speaker faces away, and some amount of speaker energy will still leak through the cellular or other case to the microphone.
  • a communications device such as a cellular telephone handset or other device may incorporate dual voice activity detection circuits to simultaneously monitor the signal energy and other characteristics in both speaker and microphone paths, and award control to one or the other path based on dynamic thresholds or other adaptive or other criteria.
  • problems such as premature dropouts caused by greater than average background noise may be prevented by applying hangtime parameters which keep the speaker path open until a minimum interval has passed, before transferring control to the microphone path.
  • the criteria applied to trigger a change in control from speaker path to microphone path and vice versa may also be adapted in embodiments of the invention, including to eliminate a lower threshold below which the speaker path switches out and passes control to the microphone path, automatically.
  • FIG. 1 illustrates a two-way communications platform including speakerphone operation, according to an embodiment of the invention.
  • Figs. 2(A)-2(C) illustrate processing of inbound and outbound speech in different regards, according to an embodiment of the invention.
  • Fig. 3 illustrates a speakerphone control operation, according to an embodiment of the invention.
  • Figs. 4(A) and 4(B) illustrate processing of inbound and outbound speech in different regards, according to an embodiment of the invention.
  • Fig. 5 illustrates inbound and outbound speech envelopes, according to an embodiment of the invention.
  • Fig. 6 illustrates a dynamic inbound break-in threshold and other speech processing, according to an embodiment of the invention.
  • Fig. 7 illustrates inbound break-in instances using a dynamic break-in threshold and other speech processing, according to an embodiment of the invention.
  • Fig. 8 illustrates a speakerphone control operation, according to an embodiment of the invention.
  • Figs. 9(A) and 9(B) illustrate processing of inbound and outbound speech in different regards, according to an embodiment of the invention.
  • Figs. 10(A) and 10(B) illustrate outbound and inbound path control including an interposed hangtime, according to an embodiment of the invention.
  • FIG. 11 illustrates a speakerphone control operation, according to an embodiment of the invention.
  • Figs. 12(A) and 12(B) illustrate processing of inbound and outbound speech in different regards, according to an embodiment of the invention.
  • Fig. 13 illustrates speaker path activation, according to conventional far-end processing during noisy conditions.
  • Figs. 14(A) and 14(B) illustrate speaker path activation during noisy conditions, according to an embodiment of the invention.

DETAILED DESCRIPTION OF EMBODIMENTS

  • Fig. 1 illustrates an architecture of a communications device having a speakerphone capability according to an embodiment of the invention.
  • the device illustrated in Fig. 1 may be or include, for instance, a cellular telephone handset, a voice-enabled wired or wireless device such as a networked Voice over IP (VoIP) or ISDN telephone device, a two-way radio communications device, a modem or hybrid telephone/modem device, a wired or wireless telephone connected to the public switched telephone network (PSTN) via a speakerphone base, or other communications devices or platforms.
  • the communications device may include a microphone path 128 which includes a microphone 102 or other acoustical or other input transducer, and a speaker path 130 which includes a speaker 120 or other acoustical or other output transducer.
  • the microphone path 128 may from time to time be referred to as the inbound or near-end channel, and the speaker path 130 as the outbound or far-end channel, respectively.
  • the microphone 102 in the microphone path 128 may be connected to a microphone gain control 104, to boost or attenuate the output of microphone 102 as appropriate.
  • the output of the microphone gain control 104 may be communicated to an echo canceller 106 to remove a portion of any feedback, including echo, leaking from speaker 120 to microphone 102.
  • Echo canceller 106 may for example be implemented in hardware, software, firmware or a combination thereof. Echo canceller 106 may for instance be implemented using commercially available parts such as dedicated integrated circuits manufactured by Oki Semiconductor or others, or using software modules such as echo canceller modules available for digital signal processors such as the DSP 56000 family manufactured by Motorola Corp., digital signal processors made by Texas Instruments Inc., or others.
  • the echo canceller 106 may incorporate or implement known echo cancellation algorithms, for instance algorithms related to or incorporated in International Telecommunications Union (ITU) standard G.165 or other cancellation algorithms or techniques.
  • the echo canceller 106 may reduce the echo or other feedback by as much as 35 dB or more, but may typically not eliminate the full degree of feedback present in the signal generated by the microphone 102.
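To put the 35 dB figure in perspective, the short sketch below converts an attenuation in decibels into the fraction of echo that remains. The function name is illustrative only and is not part of the described device.

```python
def residual_after_attenuation(attenuation_db: float) -> tuple:
    """Return (amplitude_ratio, power_ratio) remaining after the given attenuation."""
    # Amplitude ratios use the 20*log10 convention, power ratios the 10*log10 convention.
    amplitude_ratio = 10 ** (-attenuation_db / 20.0)
    power_ratio = 10 ** (-attenuation_db / 10.0)
    return amplitude_ratio, power_ratio
```

For 35 dB of cancellation, roughly 1.8 percent of the echo amplitude (about 0.03 percent of its power) remains, which is consistent with the statement that the echo canceller alone may not remove all feedback.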
  • the output of the echo canceller 106 may be communicated to a speech encoder 108, which compresses or otherwise processes speech input for purposes of wireless or other transmission.
  • the speech encoder 108 may be implemented using known speech compression or other algorithms, for instance algorithms related to or incorporated in ITU standards such as ITU G.711, G.723, G.726, G.729, or other protocols.
  • LD-CELP Low-Delay Code-Excited Linear Prediction
  • codec speech compression/decompression
  • the speech encoder 108 may likewise be implemented in hardware, software, firmware or a combination thereof, including using programmable digital signal processors or other components.
  • the modem transmit module 110 may prepare the encoded signal for wireless or other transmission via an antenna or other air or other interface, for instance generating wireless transmission in the 800/900 MHz, 1.9GHz or other cellular, PCS or other frequency spectra for voice or other communications.
  • a modem receive module 126 may likewise be coupled to a cellular antenna or other source of radio frequency (RF) or other wireless or other energy to capture, downconvert and/or demodulate wireless carrier signals.
  • the modem receive module 126 may communicate the demodulated received signal to a speech decoder 124.
  • the speech decoder 124 may in general perform the reverse type of operation from the speech encoder 108, for example to decompress far-end speech from a remote user of another cellular handset or other device.
  • the output of speech decoder 124 may be communicated to the speaker gain control 122, providing amplification or attenuation of the decoded speech for driving the speaker 120, such as the earpiece speaker in a cellular handset or other transducer.
  • the output of the speech decoder 124 may also be communicated to the echo canceller 106 to perform echo detection and cancellation processing.
  • the microphone path 128 and the speaker path 130 may each be coupled to further circuitry to monitor and manage the speakerphone operation of the communications device. More specifically, the output of the echo canceller 106 may also be communicated to an inbound voice activity detector (VAD) 114. The output of the speech decoder 124 may similarly be communicated to an outbound voice activity detector (VAD) 118. Each of inbound VAD 114 and outbound VAD 118 may also be implemented using hardware, software, firmware or a combination thereof. The inbound VAD 114 and outbound VAD 118 may, for instance, each be implemented using a microprocessor, a digital signal processor or other processors.
  • the VAD 114 and VAD 118 may each generate a speech energy envelope, speech sample, voice-present or other types of speech detection signals or functions used to identify the presence of speech information, as opposed to background or other types of noise.
  • Inbound VAD 114 and outbound VAD 118 may for instance be programmed to perform speech detection algorithms, such as those related to or incorporated in ITU standards or others, for instance according to or related to the ITU G.711, G.723, G.726, G.729 or other standards.
  • the inbound VAD 114 and outbound VAD 118 may also be coupled together, to permit direct communication therebetween.
  • the output of each of the inbound VAD 114 and the outbound VAD 118 may in turn be communicated to a duplex arbiter 116.
  • Duplex arbiter 116 may also be implemented using hardware such as a microprocessor or digital signal processor, in software, firmware or a combination thereof to perform supervisory tasks to arbitrate and manage the activation of the microphone path 128, speaker path 130 and other resources to enhance speakerphone and other operation.
  • the duplex arbiter 116 may, for instance, determine instances in time when the inbound (near-end, or handheld user of the communications device) speech energy is significant while the outbound (far-end, or remote user) speech energy is negligible so that the duplex arbiter 116 may activate the microphone path 128 to capture that local speech, while deactivating or muting the speaker path 130 since the far-end user is interpreted as not speaking or communicating.
  • the duplex arbiter 116 may activate the speaker path 130 while deactivating the microphone path 128, so that the far-end user's speech may be heard over the speaker 120.
  • the duplex arbiter 116 may apply selective criteria to decide which path to activate. As illustrated for instance in Figs. 2(A) - 2(C), intervals may occur when both the inbound VAD 114 (Fig. 2(B)) and outbound VAD 118 (Fig. 2(A)) have detected speech energy greater than their respective detection thresholds, and present duplex arbiter 116 with a speech-detected signal, illustrated as a gate function.
  • the duplex arbiter 116 may choose to activate one or the other path. As illustrated in that figure, in embodiments the duplex arbiter 116 may switch control to the microphone path 128 (inbound channel) when speech is recognized at the microphone 102, even when the absolute value of the energy presented by the presumed speech signal is less than the output of the outbound VAD 118. This decision criterion may be applied because the energy of the speech content in the microphone path 128 may typically be significantly less than that of the speaker path 130, even when a user is speaking with a normal voice close to the microphone 102, which intensity only decreases when the cellular handset or other device is placed farther away from the user.
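The selection rule described above can be sketched as a small decision function. This is a minimal illustration, not the device's actual logic: the function name, the string labels for the two paths, and the choice to keep the current path when neither detector reports speech are all assumptions.

```python
from typing import Literal

Path = Literal["microphone", "speaker"]


def arbitrate(inbound_speech: bool, outbound_speech: bool, current: Path) -> Path:
    """Pick the path to activate from the two VAD speech-detected gate signals."""
    if inbound_speech and not outbound_speech:
        # Near-end user is talking and the far end is quiet: open the microphone path.
        return "microphone"
    if outbound_speech and not inbound_speech:
        # Far-end user is talking and the near end is quiet: open the speaker path.
        return "speaker"
    if inbound_speech and outbound_speech:
        # Both detectors fire: favor the microphone path, even though its
        # absolute energy is typically lower than that of the speaker path.
        return "microphone"
    # Assumption: with no speech detected on either side, leave the state unchanged.
    return current
```

Because the function works on the detectors' gate outputs rather than on raw energy, near-end speech can win the channel even when the speaker path carries more energy.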
  • the duplex arbiter 116 may also communicate with a comfort noise generation and substitution module 112, likewise capable of being implemented in hardware, software or firmware or a combination thereof.
  • the comfort noise generation and substitution module 112 may in turn also communicate with the microphone gain control 104 and the speaker gain control 122, to output white noise or other comparatively pleasant or innocuous sounds during path transitions, dead spots such as when both the microphone path 128 and speaker path 130 may be muted, or at other times.
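As a rough illustration of comfort noise substitution, the sketch below produces one frame of low-level white noise that could be played while a path is muted. The function name, the uniform distribution, the default level and the sample scale (floating-point samples in the range -1 to 1) are assumptions made for this example.

```python
import random


def comfort_noise_frame(n_samples: int, level: float = 0.01, rng=None) -> list:
    """Return n_samples of uniform white noise bounded by +/- level."""
    rng = rng or random.Random()
    return [rng.uniform(-level, level) for _ in range(n_samples)]
```

Passing a seeded `random.Random` instance makes the output repeatable, which is convenient when testing the surrounding switching logic.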
  • the duplex arbiter 116 may award control to the microphone path 128 or the speaker path 130 under different fixed or dynamic criteria used for decision processing.
  • a threshold used to award control to the microphone path 128 may be dynamically computed based on the energy being produced by the speech encoder 108 and other parameters.
  • processing may begin.
  • microphone samples from the microphone 102 and speaker samples from the speaker 120 may be communicated to the echo canceller 106.
  • the speech encoder 108 may process the output of echo canceller 106.
  • a break-in threshold referred to as "ib_break_in_thresh" and used for deciding to award control to the microphone path 128 while muting the speaker path 130, may be dynamically computed based on the outbound speech (or speaker) energy for the present discrete speech frame (n) and speech encoder parameters. In embodiments, that calculation may be or include the following computations:
  • the output of the speech encoder 108 may also be communicated to an inbound speech envelope generator 132, which may in embodiments be integrated with or interface to inbound VAD 114.
  • Inbound speech envelope generator 132 may generate a moving envelope representing speech energy, such as a moving average or other representation of speech energy of the signal in the microphone path 128.
  • Outbound speech envelope generator 134 which also may be integrated with or interface to outbound VAD 118, may similarly generate an envelope output based on the signal in the speaker path 130.
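One simple way to realize such an envelope is a moving average of per-frame energy. The sketch below assumes that approach; the class name, the window length and the use of mean squared sample value as the frame energy are illustrative choices rather than details taken from the text.

```python
from collections import deque


class EnvelopeGenerator:
    """Moving average of per-frame energy over the most recent frames."""

    def __init__(self, window_frames: int = 8):
        if window_frames < 1:
            raise ValueError("window_frames must be at least 1")
        # deque with maxlen discards the oldest energy automatically.
        self._energies = deque(maxlen=window_frames)

    def update(self, frame) -> float:
        """Add one frame of samples and return the current envelope value."""
        energy = sum(s * s for s in frame) / len(frame) if frame else 0.0
        self._energies.append(energy)
        return sum(self._energies) / len(self._energies)
```

One instance would track the microphone path and a second instance the speaker path, each fed one frame of samples at a time.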
  • the resulting speech envelope may be compared to the current inbound break-in threshold (ib_break_in_thresh). If the envelope of the inbound
  • Figs. 4(A) and 4(B) illustrate speaker samples and echo-cancelled microphone samples, respectively, generated according to the embodiment illustrated in Fig. 3.
  • Fig. 5 depicts an illustrative speech envelope for the inbound and outbound signals generated according to that embodiment. As illustrated in that figure, at certain times the inbound signal may exceed the outbound signal, while at other times the outbound signal may be greater than the inbound signal.
  • Fig. 6 illustrates an overlay of the outbound (speaker path 130) speech energy on an illustrative inbound dynamic break-in threshold, with a fixed inbound break-in threshold also shown for comparison.
  • the inbound break-in threshold may be made a dynamic function of the parameters of Algorithm 1 or otherwise, resulting in a time-varying threshold which tracks, at least in part, the outbound speech energy with which the inbound speech is in competition.
  • the inbound break-in threshold rises to a relatively higher plateau, forcing near-end speech at the microphone 102 to be greater in intensity to capture the channel.
  • the inbound break-in threshold may be relaxed in intervals during which the outbound speech energy decreases, so that comparatively softer near-end speech may activate the microphone path 128, unlike the fixed threshold approach.
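The behavior of a threshold that tracks outbound energy can be illustrated as follows. The computations of Algorithm 1 are not reproduced in this text, so the formula below (a scaled copy of the outbound energy with a fixed floor) is purely a hypothetical stand-in, as are the function names and default parameter values.

```python
def dynamic_break_in_threshold(ob_energy: float,
                               floor: float = 0.01,
                               coupling: float = 0.5) -> float:
    """Hypothetical inbound break-in threshold that follows outbound energy."""
    # The threshold rises with far-end energy and never drops below the floor.
    return max(floor, coupling * ob_energy)


def inbound_breaks_in(ib_env: float, ob_energy: float) -> bool:
    """True when the inbound envelope exceeds the current dynamic threshold."""
    return ib_env > dynamic_break_in_threshold(ob_energy)
```

With these illustrative values, an inbound envelope of 0.1 captures the channel when the far end is silent but not when the outbound energy is 1.0, which mirrors the plateau and relaxation described above.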
  • Fig. 7 illustrates the inbound speech envelope, inbound break-in dynamic threshold and inbound break-in instances generated according to the embodiment shown in Fig. 3. As illustrated in that figure, the inbound break-in instances may consequently occur in those periods of time where a relatively quiet outbound channel has driven the inbound break-in threshold to a lower level, enabling the microphone path 128 to appropriately seize the channel even with less energetic speech.
  • When encoded speech is choppy or contains large swings in amplitude or other artifacts, those inputs may in some cases cause rapid switching between microphone path 128 and speaker path 130, or other "race" or other undesirable conditions.
  • the duplex arbiter 116 and other cooperating components may insert a delay interval or hangtime before permitting a transition of control from the microphone path 128 to the speaker path 130, and vice versa.
  • the introduction of a hangtime may serve to prevent such race conditions when one or both of the near-end and far-end speech contains rapidly varying amplitudes.
  • In step 802, processing may begin.
  • near-end samples from the microphone 102 may be processed by the speech encoder 108.
  • outbound speech from the far-end user may be processed by speech decoder 124.
  • the echo canceller 106 may receive the outputs of the speech encoder 108 and the speech decoder 124 to suppress echo and other feedback artifacts.
  • the echo-cancelled inbound speech and the decoded outbound speech may be communicated to inbound speech envelope generator 132 and outbound speech envelope generator 134, respectively, to generate speech energy envelopes or other functions.
  • an inbound break-in threshold (ib_break_in_threshold) and an outbound break-in threshold (ob_break_in_threshold) may be generated.
  • at least one of an inbound hangtime (ib_hang_time) and an outbound hangtime (ob_hang_time) may be decremented, or set to initial values if the communications device is in an initialization mode such as in a startup or reset operation.
  • a determination may be made whether the speaker path 130 is activated. If the speaker path 130 is not activated, processing may proceed to step 818 where a determination may be made whether the microphone path 128 is activated.
  • step 822 where the microphone path 128 may be activated or unmuted, while the speaker path 130 may be deactivated or muted.
  • control may proceed to step 840 where processing for the current frame may end, following which processing may repeat, proceed to other tasks or end.
  • If the determination at step 818 is that the microphone path 128 is on, processing may proceed to step 820 where a determination may be made whether the outbound speech envelope (ob_env) is greater than the outbound break-in threshold (ob_break_in_threshold). If the outbound speech envelope (ob_env) is greater than the outbound break-in threshold (ob_break_in_threshold), processing may proceed to step 824 where a determination may be made whether the inbound hangtime (ib_hang_time) has expired. If the inbound hangtime (ib_hang_time) has not expired, processing may proceed to step 822 where again the microphone path 128 may be activated or unmuted, while the speaker path 130 may be deactivated or muted.
  • If at step 824 the inbound hangtime (ib_hang_time) has expired, processing may proceed to step 826 where an outbound hangtime (ob_hangtime) may be set to begin a hangtime period for the speaker path 130.
  • the outbound hangtime (ob_hangtime) may for instance be set to a fixed amount of time, such as 4 seconds or another value according to implementation.
  • the outbound hangtime may be computed or set on a dynamic basis, for instance as a function of prior inbound or outbound hangtimes, detected speech energy in the inbound or outbound paths or other variables.
  • In step 828, the microphone path 128 may be deactivated or muted, while the speaker path 130 may be activated or unmuted, after which control may proceed to step 840 where processing for the current frame of time may end, following which processing may repeat, proceed to other tasks or end.
  • If at step 820 the outbound speech envelope (ob_env) is determined to not exceed the outbound break-in threshold (ob_break_in_threshold), processing may proceed to step 822 where again the microphone path 128 may be activated or unmuted, while the speaker path 130 may be deactivated or muted. Control may then also proceed to step 840 where processing for the current frame of time may end, following which processing may repeat, proceed to other tasks or end.
  • If at step 816 a determination is made that the speaker path 130 is on, processing may proceed to step 830 in which a determination may be made whether the inbound envelope (ib_envelope) exceeds the inbound break-in threshold (ib_break_in_threshold). If the inbound envelope (ib_envelope) does not exceed the inbound break-in threshold (ib_break_in_threshold), processing may proceed to step 832 where the speaker path 130 may be activated or unmuted while the microphone path 128 may be deactivated or muted. Following that step, control may then proceed to step 840 where processing for the current frame of time may end, following which processing may repeat, proceed to other tasks or end.
  • If at step 830 a determination is made that the inbound envelope (ib_envelope) exceeds the inbound break-in threshold (ib_break_in_threshold), processing may proceed to step 834 where a determination may be made whether the outbound hangtime (ob_hangtime) has expired. If the outbound hangtime (ob_hangtime) has not expired, processing may likewise proceed to step 832 where the speaker path 130 may be activated or unmuted while the microphone path 128 may be deactivated or muted.
  • If at step 834 a determination is made that the outbound hangtime (ob_hangtime) has expired, processing may proceed to step 836 where the inbound hangtime may be set to a fixed amount of time, such as 4 seconds or another value according to implementation.
  • the inbound hangtime may be computed or set on a dynamic basis, for instance as a function of prior inbound or outbound hangtimes, detected speech energy in the inbound or outbound paths or other variables.
  • Processing may then proceed to step 838, where the speaker path 130 may be deactivated or muted while the microphone path 128 may be activated or unmuted.
  • control may then proceed to step 840 where processing for the current frame of time may end, following which processing may repeat, proceed to other tasks or end.
  • the awarding of control to the microphone path 128 or the speaker path 130 may therefore depend on more than one criterion. Those criteria may include the exceeding of speech envelope thresholds but also interposing a hangtime during which the currently active path may retain control, regardless of the activity in the other path.
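The per-frame control flow of steps 814 through 838 can be sketched as a two-state machine. This is a simplified illustration under stated assumptions: hangtimes are counted in frames (200 frames would correspond to 4 seconds only at an assumed 20 ms frame length), the case where neither path is active is folded into the microphone state, and the class and attribute names are hypothetical.

```python
from dataclasses import dataclass

MIC, SPEAKER = "microphone", "speaker"


@dataclass
class HangtimeArbiter:
    ib_break_in_threshold: float
    ob_break_in_threshold: float
    hang_frames: int = 200   # assumed: 4 s at a 20 ms frame length
    active: str = MIC
    ib_hang_time: int = 0    # frames during which the microphone path keeps control
    ob_hang_time: int = 0    # frames during which the speaker path keeps control

    def step(self, ib_env: float, ob_env: float) -> str:
        """Process one frame of envelope values and return the active path."""
        # Step 814: decrement both hangtime counters, stopping at zero.
        self.ib_hang_time = max(0, self.ib_hang_time - 1)
        self.ob_hang_time = max(0, self.ob_hang_time - 1)
        if self.active == MIC:
            # Steps 820 and 824: the speaker path may take over only when the
            # outbound envelope breaks in and the inbound hangtime has expired.
            if ob_env > self.ob_break_in_threshold and self.ib_hang_time == 0:
                self.ob_hang_time = self.hang_frames  # step 826
                self.active = SPEAKER                 # step 828
        else:
            # Steps 830 and 834: the mirror-image test for the microphone path.
            if ib_env > self.ib_break_in_threshold and self.ob_hang_time == 0:
                self.ib_hang_time = self.hang_frames  # step 836
                self.active = MIC                     # step 838
        return self.active
```

Calling `step` once per frame with the two envelope values shows the intended effect: after the speaker path takes control, loud near-end speech cannot reclaim the channel until the outbound hangtime counter has run down to zero.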
  • the inbound and outbound hangtimes may in embodiments be fixed or dynamic, and may be incremented or decremented depending on conditions. For instance, during periods of increasing noise or other parameters, either or both of the hangtimes may be incremented, or during periods of decreasing noise or other parameters, either or both of the hangtimes may be decremented. Greater continuity in speech or other interaction may therefore be achieved.
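A dynamic hangtime of this kind might be adjusted once per frame as in the sketch below. The step size, the bounds and the use of a single noise estimate are assumptions chosen for illustration; the text does not specify them.

```python
def adapt_hangtime(hang_frames: int, prev_noise: float, noise: float,
                   step: int = 10, min_frames: int = 50, max_frames: int = 400) -> int:
    """Lengthen the hangtime when noise rises and shorten it when noise falls."""
    if noise > prev_noise:
        hang_frames += step
    elif noise < prev_noise:
        hang_frames -= step
    # Clamp so the hangtime stays within the assumed bounds.
    return max(min_frames, min(max_frames, hang_frames))
```

The clamp keeps the hangtime from growing without limit during a long noisy period or shrinking to zero during a quiet one.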
  • FIG. 9(A) illustrates speech samples from speaker 120 and Fig. 9(B) illustrates speech samples from microphone 102 which may be processed in one regard according to the embodiment illustrated in Fig. 8.
  • Fig. 10(A) illustrates the resulting outbound speech envelope (ob_env) along with the outbound break-in threshold (ob_break_in_threshold).
  • Fig. 10(A) also illustrates the application of an outbound hangtime
  • Fig. 10(B) illustrates the inbound speech envelope (ib_env) along with the inbound break-in threshold (ib_break_in_threshold).
  • Fig. 10(B) also illustrates the application of an inbound hangtime (ib_hangtime) interval during which the microphone path 128 may retain control and continue to be activated, despite the presence of energetic speech in the speaker path 130. The introduction of these delay intervals may increase the sense of continuity for the near-end and far-end users during speakerphone operation.
  • processing may begin in step 1102.
  • near-end samples from the microphone 102 may be processed by the speech encoder 108.
  • outbound speech from the far-end user may be processed by speech decoder 124.
  • the echo canceller 106 may receive the outputs of the speech encoder 108 and the speech decoder 124 to suppress echo and other feedback artifacts.
  • the echo-cancelled inbound speech and the decoded outbound speech may be communicated to inbound speech envelope generator 132 and outbound speech envelope generator 134, respectively, to generate speech energy envelopes or other functions.
  • an inbound on threshold (ib_on_threshold) and outbound on threshold (ob_on_threshold) may be generated, for instance similarly to the embodiment illustrated in Fig. 3 or otherwise.
  • the duplex arbiter 116 may apply control logic to lock to the microphone path 128 or the speaker path 130, according to the current speech envelopes of the paths.
  • In step 1116, a determination may be made whether the outbound envelope (ob_env) exceeds the outbound on threshold (ob_on_threshold).
  • In step 1118, a determination may be made whether the inbound envelope (ib_env) exceeds the inbound on threshold (ib_on_threshold). If the inbound envelope (ib_env) exceeds the inbound on threshold, processing may proceed to step 1120 where a determination may be made whether the speaker path 130 is locked, that is, currently has control of the communications channel, such as a wireless cellular or other connection. If the speaker path 130 is locked, the state of the microphone path 128 and speaker path 130 may remain unchanged from the start of processing at step 1102 and control may proceed to step 1128 where processing for the current frame may end, following which processing may repeat, proceed to other tasks or end.
  • If the determination at step 1120 is that the speaker path 130 is not locked, processing may proceed to step 1122 where the speaker path 130 may be deactivated or muted, while the microphone path 128 may be activated or unmuted. Processing then may likewise proceed to step 1128 to repeat, proceed to other tasks or end.
  • If the determination at step 1118 is that the inbound envelope (ib_env) does not exceed the inbound on threshold (ib_on_threshold), processing may proceed to step 1128 to repeat, proceed to other tasks or end.
  • If the determination at step 1116 is that the outbound envelope (ob_env) exceeds the outbound on threshold (ob_on_threshold), processing may proceed to step 1124 where a determination may be made whether the microphone path 128 is locked. If the microphone path 128 is not locked, control may proceed to step 1126 where the speaker path 130 may be activated or unmuted while the microphone path 128 may be deactivated or muted. Processing then may proceed to step 1128 to repeat, proceed to other tasks or end. Likewise, if the determination at step 1124 is that the microphone path 128 is locked, the state of the microphone path 128 and speaker path 130 may remain unchanged from the start of processing at step 1102, and control may proceed to step 1128 to repeat, proceed to other tasks or end.
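The decision sequence of steps 1116 through 1126 can be transcribed as a single function. The text does not spell out how a lock is set or released, so this sketch treats the lock as an optional input supplied by the caller; that treatment, along with the function and parameter names, is an assumption.

```python
from typing import Optional

MIC, SPEAKER = "microphone", "speaker"


def lock_step(ib_env: float, ob_env: float,
              ib_on_threshold: float, ob_on_threshold: float,
              active: str, locked: Optional[str] = None) -> str:
    """Return the active path after one frame of the on-threshold logic."""
    if ob_env > ob_on_threshold:          # step 1116
        if locked != MIC:                 # step 1124
            return SPEAKER                # step 1126
        return active                     # microphone path locked: no change
    if ib_env > ib_on_threshold:          # step 1118
        if locked != SPEAKER:             # step 1120
            return MIC                    # step 1122
        return active                     # speaker path locked: no change
    # Neither envelope exceeds its on threshold. There is no off threshold,
    # so the currently active path simply stays active.
    return active
```

The final branch is the point of this embodiment: when both envelopes fall below their on thresholds, for example during fricatives or background noise, the active path is left alone instead of being switched off.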
  • Fig. 12(A) illustrates samples from speaker 120 containing fricatives and other noise components
  • Fig. 12(B) illustrates samples from microphone 102 at the same time which may together be processed for instance according to the embodiment illustrated in Fig. 11.
  • Fig. 13 illustrates speakerphone control which might occur when operating upon such signals without the benefit of the invention, including rapid switching of the speaker path 130 between on and off states, due to the fricative and other noise artifacts.
  • Fig. 14(A) illustrates the resulting speakerphone operation according to the embodiment of the invention illustrated in Fig. 11, in which the speaker path 130 maintains control of the channel even during relatively noisy background periods, in part because the outbound off threshold is eliminated, allowing the speaker path 130 to remain active. Instead of choppy or punctuated switching, the speaker path remains activated until the microphone path 128 appropriately seizes control of the channel due to energetic speech exceeding the inbound on threshold, as illustrated in Fig. 14(B). Smoother, more continuous conversation results.
  • the communications device in which the invention may operate may be or include a cellular telephone, but could consist of other communications platforms such as wired or wireless telephones, two-way radios, base stations for wireless telephones, network-enabled wireless communications devices such as 802.11a, 802.11b, 802.11g or other short or long-range telephony or other units, or other equipment as well.
  • the intelligence may be embedded or shared in an attachment coupled to the communications device.
  • the intelligence may be embedded or shared in a detachable battery, a headphone device, a tabletop or other fixed or non-wearable speakerphone unit, or in other accessories or parts.
  • the intelligence may enable a speakerphone operation through a car audio system coupled to a cellular telephone.
  • the intelligence embedded in the add-on device may communicate with the electronics of the communications device through interfaces such as a serial port such as an RS-232, a universal serial bus (USB) or a universal asynchronous receiver/transmitter (UART) connection, an infrared data (IrDA) port, a radio frequency link, or other serial, parallel or other data ports or other connections.

Abstract

The invention provides a cellular telephone or other communications device with intelligence to manage speakerphone operation to more nearly approximate normal conversation, even when using a one-way only transmission mode. The microphone path (128) and speaker path (130) may be continuously monitored using dual voice activity detectors (114, 118) to assess the energy and other characteristics of each channel, and switch between one or the other depending on dynamic criteria. In noisy environments, a hangtime may be applied before permitting switching to avoid premature dropouts. Other criteria used to trigger the seizure of the channel may be adjusted, such as to eliminate a lower threshold below which the speaker path (130) switches out automatically.

Description

SYSTEM AND METHOD FOR SPEAKERPHONE OPERATION IN A COMMUNICATIONS DEVICE
FIELD OF THE INVENTION
[0001] The invention relates to the field of communications, and more particularly to techniques for generating clearer and more reliable speakerphone operation in a cellular telephone or other communications device.
BACKGROUND OF THE INVENTION
[0002] Convenient and effective speakerphone operation has become a desirable feature in cellular handsets and other communications devices. Communities concerned with traffic safety have in some instances banned the handheld operation of cellular phones while driving. Handsets and other devices equipped with a speakerphone feature permit users to place the device in a resting position in a car or other location while still carrying out normal conversations and other telephone access.
[0003] However, equipping a cellular telephone with an effective speakerphone capability is not a trivial integration task. One practical difficulty is that many cellular telephones are small devices which contain both an earpiece speaker and integrated microphone within a few inches of each other, to make the unit more compact. Therefore, duplex-type operation where both the speaker path and microphone path are active at the same time may generate unwanted feedback, since the output of the speaker leaks into the microphone via air and case vibration. This feedback problem only gets worse as speaker volumes are increased, such as they might be in a noisy car or room.
[0004] Echo canceling circuits are known which can be connected to the microphone path on a cellular phone or other device, and remove a portion of the feedback energy emanating from the speaker. Unfortunately, echo canceling circuits are currently only capable of about 35 dB of cancellation, and the energy from the speaker may be more than 35 dB greater than the energy delivered by the embedded microphone so that echo and feedback still occur, even when echo cancellation circuits are included.
[0005] One solution to the speakerphone problem is to attempt to physically isolate the speaker and microphone from each other in the handset. For instance, one may place the speaker used for speakerphone operation in a rear-facing part of the handset so that less sound impinges directly on the microphone from the speaker. However, this placement makes the sound harder to hear for a user from whom the speaker faces away, and some amount of speaker energy will still leak through the cellular or other case to the microphone.
[0006] Another solution to feedback is to prevent the speaker path and microphone path from operating at the same time. This simplex-type of operation makes direct feedback impossible but results in one-way communication only, which requires users at both ends to signal the end of their speech, and wait for a response. More effective and natural speakerphone operation is desirable. Other problems exist.
SUMMARY OF THE INVENTION
[0007] The invention overcoming these and other problems in the art relates in one regard to a system and method for speakerphone operation in a communications device, in which built-in intelligence simultaneously manages both the speaker path and the microphone path of the device to reduce unwanted echo and feedback while still preserving a perceived quality of conversational speech. In an embodiment of the invention, a communications device such as a cellular telephone handset or other device may incorporate dual voice activity detection circuits to simultaneously monitor the signal energy and other characteristics in both speaker and microphone paths, and award control to one or the other path based on dynamic thresholds or other adaptive or other criteria. In other embodiments, problems such as premature dropouts caused by greater than average background noise may be prevented by applying hangtime parameters which keep the speaker path open until a minimum interval has passed, before transferring control to the microphone path. The criteria applied to trigger a change in control from speaker path to microphone path and vice versa may also be adapted in embodiments of the invention, including to eliminate a lower threshold below which the speaker path switches out and passes control to the microphone path, automatically.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The invention will be described with reference to the accompanying drawings, in which like elements are referenced with like numbers, and in which:
[0009] Fig. 1 illustrates a two-way communications platform including speakerphone operation, according to an embodiment of the invention.
[00010] Figs. 2(A)-2(C) illustrate processing of inbound and outbound speech in different regards, according to an embodiment of the invention.
[00011] Fig. 3 illustrates a speakerphone control operation, according to an embodiment of the invention.
[00012] Figs. 4(A) and 4(B) illustrate processing of inbound and outbound speech in different regards, according to an embodiment of the invention.
[00013] Fig. 5 illustrates inbound and outbound speech envelopes, according to an embodiment of the invention.
[00014] Fig. 6 illustrates a dynamic inbound break-in threshold and other speech processing, according to an embodiment of the invention.
[00015] Fig. 7 illustrates inbound break-in instances using a dynamic break-in threshold and other speech processing, according to an embodiment of the invention.
[00016] Fig. 8 illustrates a speakerphone control operation, according to an embodiment of the invention.
[00017] Figs. 9(A) and 9(B) illustrate processing of inbound and outbound speech in different regards, according to an embodiment of the invention.
[00018] Figs. 10(A) and 10(B) illustrate outbound and inbound path control including an interposed hangtime, according to an embodiment of the invention.
[00019] Fig. 11 illustrates a speakerphone control operation, according to an embodiment of the invention.
[00020] Figs. 12(A) and 12(B) illustrate processing of inbound and outbound speech in different regards, according to an embodiment of the invention.
[00021] Fig. 13 illustrates speaker path activation, according to conventional far-end processing during noisy conditions.
[00022] Figs. 14(A) and 14(B) illustrate speaker path activation during noisy conditions, according to an embodiment of the invention.
DETAILED DESCRIPTION OF EMBODIMENTS
[00023] Fig. 1 illustrates an architecture of a communications device having a speakerphone capability according to an embodiment of the invention. The device illustrated in Fig. 1 may be or include, for instance, a cellular telephone handset, a voice-enabled wired or wireless device such as a networked Voice over IP (VoIP) or ISDN telephone device, a two-way radio communications device, a modem or hybrid telephone/modem device, a wired or wireless telephone connected to the public switched telephone network (PSTN) via a speakerphone base, or other communications devices or platforms. In general, according to the illustrated architecture the communications device may include a microphone path 128 which includes a microphone 102 or other acoustical or other input transducer, and a speaker path 130 which includes a speaker 120 or other acoustical or other output transducer. In embodiments, in general only one of the microphone path 128 and the speaker path 130 may be activated at the same time, to avoid feedback between the two transducers. Other modes are possible in other embodiments. The microphone path 128 may from time to time be referred to as the inbound or near-end channel, and the speaker path 130 as the outbound or far-end channel, respectively.
[00024] The microphone 102 in the microphone path 128 may be connected to a microphone gain control 104, to boost or attenuate the output of microphone 102 as appropriate. The output of the microphone gain control 104 may be communicated to an echo canceller 106 to remove a portion of any feedback, including echo, leaking from speaker 120 to microphone 102. Echo canceller 106 may for example be implemented in hardware, software, firmware or a combination thereof. Echo canceller 106 may for instance be implemented using commercially available parts such as dedicated integrated circuits manufactured by Oki Semiconductor or others, or using software modules such as echo canceller modules available for digital signal processors such as the DSP 56000 family manufactured by Motorola Corp., digital signal processors made by Texas Instruments Inc., or others. In embodiments, the echo canceller 106 may incorporate or implement known echo cancellation algorithms, for instance algorithms related to or incorporated in International Telecommunications Union (ITU) standard G.165 or other cancellation algorithms or techniques. In embodiments, the echo canceller 106 may reduce the echo or other feedback by as much as 35 dB or more, but may typically not eliminate the full degree of feedback present in the signal generated by the microphone 102. The output of the echo canceller 106 may be communicated to a speech encoder 108, which compresses or otherwise processes speech input for purposes of wireless or other transmission. The speech encoder 108 may be implemented using known speech compression or other algorithms, for instance algorithms related to or incorporated in ITU standards such as ITU G.711, G.723, G.726, G.729, or other protocols.
Those standards or protocols may incorporate or implement for example the Low-Delay Code-Excited Linear Prediction (LD-CELP) speech coding algorithm, which encodes 2.5 ms frames of digitized, telephone bandwidth speech or audio signals sampled at 8 kHz, or other digitizing or other techniques. Other speech compression/decompression (codec) algorithms, software or standards may be used. The speech encoder 108 may likewise be implemented in hardware, software, firmware or a combination thereof, including using programmable digital signal processors or other components.
[00026] After a user's speech input is encoded by the speech encoder 108, the encoded speech may be communicated to the modem transmit module 110. The modem transmit module 110 may prepare the encoded signal for wireless or other transmission via an antenna or other air or other interface, for instance generating wireless transmission in the 800/900 MHz, 1.9 GHz or other cellular, PCS or other frequency spectra for voice or other communications.
[00027] On the receiver side, a modem receive module 126 may likewise be coupled to a cellular antenna or other source of radio frequency (RF) or other wireless or other energy to capture, downconvert and/or demodulate wireless carrier signals. The modem receive module 126 may communicate the demodulated received signal to a speech decoder 124. The speech decoder 124 may in general perform the reverse type of operation from the speech encoder 108, for example to decompress far-end speech from a remote user of another cellular handset or other device. The output of speech decoder 124 may be communicated to the speaker gain control 122, providing amplification or attenuation of the decoded speech for driving the speaker 120, such as the earpiece speaker in a cellular handset or other transducer. The output of the speech decoder 124 may also be communicated to the echo canceller 106 to perform echo detection and cancellation processing.
[00028] In embodiments of the invention such as that illustrated in Fig. 1, the microphone path 128 and the speaker path 130 may each be coupled to further circuitry to monitor and manage the speakerphone operation of the communications device. More specifically, the output of the echo canceller 106 may also be communicated to an inbound voice activity detector (VAD) 114. The output of the speech decoder 124 may similarly be communicated to an outbound voice activity detector (VAD) 118. Each of inbound VAD 114 and outbound VAD 118 may also be implemented using hardware, software, firmware or a combination thereof. The inbound VAD 114 and outbound VAD 118 may, for instance, each be implemented using a microprocessor, a digital signal processor or other processors. The VAD 114 and VAD 118 may each generate a speech energy envelope, speech sample, voice-present or other types of speech detection signals or functions used to identify the presence of speech information, as opposed to background or other types of noise. Inbound VAD 114 and outbound VAD 118 may for instance be programmed to perform speech detection algorithms, such as those related to or incorporated in ITU standards or others, for instance according or related to the ITU G.711, G.723, G.726, G.729 or other standards. The inbound VAD 114 and outbound VAD 118 may also be coupled together, to permit direct communication therebetween. The output of each of the inbound VAD 114 and the outbound VAD 118 may in turn be communicated to a duplex arbiter 116. Duplex arbiter 116 may also be implemented using hardware such as a microprocessor or digital signal processor, in software, firmware or a combination thereof to perform supervisory tasks to arbitrate and manage the activation of the microphone path 128, speaker path 130 and other resources to enhance speakerphone and other operation.
The duplex arbiter 116 may, for instance, determine instances in time when the inbound (near-end, or handheld user of the communications device) speech energy is significant while the outbound (far-end, or remote user) speech energy is negligible so that the duplex arbiter 116 may activate the microphone path 128 to capture that local speech, while deactivating or muting the speaker path 130 since the far-end user is interpreted as not speaking or communicating.
[00030] Conversely, in instances when the inbound speech energy detected by the inbound VAD 114 is negligible while the outbound speech energy detected by the outbound VAD 118 is significant, the duplex arbiter 116 may activate the speaker path 130 while deactivating the microphone path 128, so that the far-end user's speech may be heard over the speaker 120.
[00031] On the other hand, during those intervals of time in which both the inbound VAD 114 and outbound VAD 118 detect significant speech energy in their respective paths, the duplex arbiter 116 may apply selective criteria to decide which path to activate. As illustrated for instance in Figs. 2(A) - 2(C), intervals may occur when both the inbound VAD 114 (Fig. 2(B)) and outbound VAD 118 (Fig. 2(A)) have detected speech energy greater than their respective detection thresholds, and present duplex arbiter 116 with a speech-detected signal, illustrated as a gate function.
[00032] As illustrated in Fig. 2(C), when both VAD signals are active, the duplex arbiter 116 may choose to activate one or the other path. As illustrated in that figure, in embodiments the duplex arbiter 116 may switch control to the microphone path 128 (inbound channel) when speech is recognized at the microphone 102, even when the absolute value of the energy presented by the presumed speech signal is less than the output of the outbound VAD 118. This decision criterion may be applied because the energy of the speech content in the microphone path 128 may typically be significantly less than that of the speaker path 130, even when a user is speaking with a normal voice close to the microphone 102, and that intensity only decreases when the cellular handset or other device is placed farther away from the user.
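The gate-based arbitration described for Figs. 2(A)-2(C) can be sketched in a few lines of Python. This is a minimal illustration only: the `Path` enumeration, the function name `arbitrate` and the handling of the case where neither gate is active are assumptions made for the example, not elements recited in the description.

```python
from enum import Enum


class Path(Enum):
    """Hypothetical labels for the arbiter's possible decisions."""
    MICROPHONE = "microphone path 128"
    SPEAKER = "speaker path 130"
    NONE = "both paths muted"


def arbitrate(inbound_voice: bool, outbound_voice: bool) -> Path:
    """Choose a path from the two VAD gate signals.

    inbound_voice  -- gate from the inbound VAD (near-end speech detected)
    outbound_voice -- gate from the outbound VAD (far-end speech detected)
    """
    if inbound_voice:
        # Near-end speech wins even when far-end speech is also present,
        # as described for Fig. 2(C).
        return Path.MICROPHONE
    if outbound_voice:
        return Path.SPEAKER
    # Assumption: with no speech on either side both paths may be muted.
    return Path.NONE
```

Giving priority to the inbound gate reflects the observation above that microphone speech energy is usually much lower than speaker energy, so a raw energy comparison would rarely let the near-end user break in.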
[00033] Operation of this type may permit seamless transitions between the near-end and far-end user's speech in conversation, and prevent artifacts such as channel lockouts. In embodiments, as illustrated the duplex arbiter 116 may also communicate with a comfort noise generation and substitution module 112, likewise capable of being implemented in hardware, software or firmware or a combination thereof. The comfort noise generation and substitution module 112 may in turn also communicate with the microphone gain control 104 and the speaker gain control 122, to output white noise or other comparatively pleasant or innocuous sounds during path transitions, dead spots such as when both the microphone path 128 and speaker path 130 may be muted, or at other times. In other embodiments or under other conditions, the duplex arbiter 116 may award control to the microphone path 128 or the speaker path 130 under different fixed or dynamic criteria used for decision processing. In an embodiment illustrated in Fig. 3, for example, a threshold used to award control to the microphone path 128 may be dynamically computed based on the energy being produced by speech encoder and other parameters. In step 302, processing may begin. In step 304, microphone samples from the microphone 102 and speaker samples from the speaker 120 may be communicated to the echo canceller 106. In step 306, the speech encoder 108 may process the output of echo canceller 106. In step 308, a break-in threshold, referred to as "ib_break_in_thresh" and used for deciding to award control to the microphone path 128 while muting the speaker path 130, may be dynamically computed based on the outbound speech (or speaker) energy for the present discrete speech frame (n) and speech encoder parameters. In embodiments, that calculation may be or include the following computations:
Algorithm 1
    ib_break_in_thresh(n) = β*ob_r0(n);
    IF (ib_break_in_thresh(n) > ib_break_in_thresh(n-1))
        ib_break_in_thresh(n) = β*ob_r0(n);
    ELSE
        ib_break_in_thresh(n) = α*ib_break_in_thresh(n-1) + (1-α)*β*ob_r0(n);
    END
Where:
    ob_r0(n) = outbound speech energy for a frame n;
    n = current speech frame;
    β = an energy scalar; and
    α = decay rate.
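As a hedged illustration, Algorithm 1 may be read as a fast-attack, slow-release tracker: the threshold jumps immediately to the scaled outbound energy when that energy rises, and decays gradually when it falls. The Python sketch below follows that reading. The function names and the default values of `beta` and `alpha` are assumptions chosen for the example; the description does not specify values.

```python
def update_break_in_threshold(prev_thresh, ob_r0, beta=2.0, alpha=0.9):
    """One frame of Algorithm 1.

    prev_thresh -- ib_break_in_thresh(n-1)
    ob_r0       -- outbound speech energy for the current frame n
    beta        -- energy scalar (value assumed for illustration)
    alpha       -- decay rate, between 0 and 1 (value assumed)
    """
    candidate = beta * ob_r0
    if candidate > prev_thresh:
        # Rising outbound energy: follow it immediately.
        return candidate
    # Falling outbound energy: decay smoothly toward the new level.
    return alpha * prev_thresh + (1.0 - alpha) * candidate


def track_threshold(energies, beta=2.0, alpha=0.9, initial=0.0):
    """Apply the update to a sequence of per-frame outbound energies."""
    thresholds = []
    thresh = initial
    for ob_r0 in energies:
        thresh = update_break_in_threshold(thresh, ob_r0, beta, alpha)
        thresholds.append(thresh)
    return thresholds
```

For example, with `beta=2.0` and `alpha=0.5`, the energies `[1.0, 0.0, 0.0]` give thresholds `[2.0, 1.0, 0.5]`: an immediate rise followed by a halving on each quiet frame.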
[00035] In step 310, the output of the speech encoder 108 may also be communicated to an inbound speech envelope generator 132, which may in embodiments be integrated with or interface to inbound VAD 114. Inbound speech envelope generator 132 may generate a moving envelope representing speech energy, such as a moving average or other representation of speech energy of the signal in the microphone path 128. Outbound speech envelope generator 134, which also may be integrated with or interface to outbound VAD 118, may similarly generate an envelope output based on the signal in the speaker path 130.
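A moving envelope of the kind described for the speech envelope generators 132 and 134 could be approximated as a moving average of per-frame energy. The sketch below is one possible reading under stated assumptions: the class name, the window length and the use of mean squared amplitude as the energy measure are all illustrative choices, not details given in the description.

```python
from collections import deque


def frame_energy(samples):
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


class EnvelopeGenerator:
    """Moving average of frame energy over the last `window` frames."""

    def __init__(self, window=4):
        # window length is an assumed tuning parameter
        self._history = deque(maxlen=window)

    def update(self, samples):
        """Add one frame of samples and return the current envelope."""
        self._history.append(frame_energy(samples))
        return sum(self._history) / len(self._history)
```

One instance would be fed the echo-cancelled microphone frames and a second instance the decoded speaker frames, giving the inbound and outbound envelopes that the arbiter compares against its thresholds.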
[00036] In step 312, the resulting speech envelope may be compared to the current inbound break-in threshold (ib_break_in_thresh). If the envelope of the inbound speech exceeds that threshold, processing proceeds to step 314 where the duplex arbiter 116 may mute the speaker path 130 and activate or unmute the microphone path 128, thus allowing the near-end user's speech to be captured and communicated to the far-end user. If the envelope of the inbound speech does not exceed the inbound break-in threshold (ib_break_in_thresh), processing proceeds to step 316 where processing for the current frame of time may end, following which processing may repeat, proceed to other tasks or end.
[00037] Figs. 4(A) and 4(B) illustrate speaker samples and echo-cancelled microphone samples, respectively, generated according to the embodiment illustrated in Fig. 3. Fig. 5 depicts an illustrative speech envelope for the inbound and outbound signals generated according to that embodiment. As illustrated in that figure, at certain times the inbound signal may exceed the outbound signal, while at other times the outbound signal may be greater than the inbound signal.
[00038] Fig. 6 illustrates an overlay of the outbound (speaker path 130) speech energy on an illustrative inbound dynamic break-in threshold, with a fixed inbound break-in threshold also shown for comparison. As illustrated in that figure, the inbound break-in threshold may be made a dynamic function of the parameters of Algorithm 1 or otherwise, resulting in a time-varying threshold which tracks, at least in part, the outbound speech energy with which the inbound speech is in competition. Thus, in intervals during which the outbound speech energy is comparatively high, the inbound break-in threshold rises to a relatively higher plateau, forcing near-end speech at the microphone 102 to be greater in intensity to capture the channel. Conversely, the inbound break-in threshold may be relaxed in intervals during which the outbound speech energy decreases, so that comparatively softer near-end speech may activate the microphone path 128, unlike the fixed threshold approach.
[00039] Fig. 7 illustrates the inbound speech envelope, inbound break-in dynamic threshold and inbound break-in instances generated according to the embodiment shown in Fig. 3. As illustrated in that figure, the inbound break-in instances may consequently occur in those periods of time where a relatively quiet outbound channel has driven the inbound break-in threshold to a lower level, enabling the microphone path 128 to appropriately seize the channel even with less energetic speech.
[00040] When encoded speech is choppy or contains large swings in amplitude or other artifacts, those inputs may in some cases cause rapid switching between microphone path 128 and speaker path 130, or other "race" or other undesirable conditions. In an embodiment of the invention illustrated in Fig. 8, the duplex arbiter 116 and other cooperating components may insert a delay interval or hangtime before permitting a transition of control from the microphone path 128 to the speaker path 130, and vice versa. The introduction of a hangtime may serve to prevent such race conditions when one or both of the near-end and far-end speech contains rapidly varying amplitudes.
[00041] As shown in Fig. 8, in step 802 processing may begin. In step 804, near-end samples from the microphone 102 may be processed by the speech encoder 108. In step 806, outbound speech from the far-end user may be processed by speech decoder 124. In step 808, the echo canceller 106 may receive the outputs of the speech encoder 108 and the speech decoder 124 to suppress echo and other feedback artifacts. In step 810, the echo-cancelled inbound speech and the decoded outbound speech may be communicated to inbound speech envelope generator 132 and outbound speech envelope generator 134, respectively, to generate speech energy envelopes or other functions.
[00042] In step 812, an inbound break-in threshold (ib_break_in_threshold) and outbound break-in threshold (ob_break_in_threshold) may be generated, for instance according to the embodiment illustrated in Fig. 3 or otherwise. In step 814, at least one of an inbound hangtime (ib_hang_time) and an outbound hangtime (ob_hang_time) may be decremented, or set to initial values if the communications device is in an initialization mode such as in a startup or reset operation. In step 816, a determination may be made whether the speaker path 130 is activated. If the speaker path 130 is not activated, processing may proceed to step 818 where a determination may be made whether the microphone path 128 is activated.
[00043] If the microphone path 128 is not activated, processing may proceed to step 822 where the microphone path 128 may be activated or unmuted, while the speaker path 130 may be deactivated or muted. After step 822, control may proceed to step 840 where processing for the current frame may end, following which processing may repeat, proceed to other tasks or end.
[00044] If the determination at step 818 is that the microphone path 128 is on, processing may proceed to step 820 where a determination may be made whether the outbound speech envelope (ob_env) is greater than the outbound break-in threshold (ob_break_in_threshold). If the outbound speech envelope (ob_env) is greater than the outbound break-in threshold (ob_break_in_threshold), processing may proceed to step 824 where a determination may be made whether the inbound hangtime (ib_hang_time) has expired. If the inbound hangtime (ib_hang_time) has not expired, processing may proceed to step 822 where again the microphone path 128 may be activated or unmuted, while the speaker path 130 may be deactivated or muted.
[00045] If at step 824 the inbound hangtime (ib_hangtime) has expired, processing may proceed to step 826 where an outbound hangtime (ob_hangtime) may be set to begin a hangtime period for the speaker path 130. The outbound hangtime (ob_hangtime) may for instance be set to a fixed amount of time, such as 4 seconds or another value according to implementation. In embodiments, the outbound hangtime may be computed or set on a dynamic basis, for instance as a function of prior inbound or outbound hangtimes, detected speech energy in the inbound or outbound paths or other variables. In step 828, the microphone path 128 may be deactivated or muted, while the speaker path 130 may be activated or unmuted, after which control may proceed to step 840 where processing for the current frame of time may end, following which processing may repeat, proceed to other tasks or end.
[00046] If at step 820 the outbound speech envelope (ob_env) is determined to not exceed the outbound break-in threshold (ob_break_in_threshold), processing may proceed to step 822 where again the microphone path 128 may be activated or unmuted, while the speaker path 130 may be deactivated or muted. Control may then also proceed to step 840 where processing for the current frame of time may end, following which processing may repeat, proceed to other tasks or end.
[00047] If at step 816 a determination is made that the speaker path 130 is on, processing may proceed to step 830 in which a determination may be made whether the inbound envelope (ib_envelope) exceeds the inbound break-in threshold (ib_break_in_threshold). If the inbound envelope (ib_envelope) does not exceed the inbound break-in threshold (ib_break_in_threshold), processing may proceed to step 832 where the speaker path 130 may be activated or unmuted while the microphone path 128 may be deactivated or muted. Following that step, control may then proceed to step 840 where processing for the current frame of time may end, following which processing may repeat, proceed to other tasks or end.
[00048] If at step 830 a determination is made that the inbound envelope (ib_envelope) exceeds the inbound break-in threshold (ib_break_in_threshold), processing may proceed to step 834 where a determination may be made whether the outbound hangtime (ob_hangtime) has expired. If the outbound hangtime (ob_hangtime) has not expired, processing may likewise proceed to step 832 where the speaker path 130 may be activated or unmuted while the microphone path 128 may be deactivated or muted.
[00049] If at step 834 a determination is made that the outbound hangtime (ob_hangtime) has expired, processing may proceed to step 836 where the inbound hangtime may be set to a fixed amount of time, such as 4 seconds or another value according to implementation. In embodiments, the inbound hangtime may be computed or set on a dynamic basis, for instance as a function of prior inbound or outbound hangtimes, detected speech energy in the inbound or outbound paths or other variables. Processing may then proceed to step 838, where the speaker path 130 may be deactivated or muted while the microphone path 128 may be activated or unmuted. Following that step, control may then proceed to step 840 where processing for the current frame of time may end, following which processing may repeat, proceed to other tasks or end.
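The per-frame decision flow of Fig. 8 (steps 814 through 838) might be sketched as a small state machine. The Python below is an illustrative reading rather than the claimed implementation: the names `ArbiterState` and `step`, the expression of hangtimes as frame counts, and the default of 200 frames (4 seconds at an assumed 20 ms per frame) are assumptions introduced for the example.

```python
from dataclasses import dataclass

MIC, SPEAKER, NONE = "mic", "speaker", "none"


@dataclass
class ArbiterState:
    active: str = NONE   # which path currently holds the channel
    ib_hang: int = 0     # inbound hangtime remaining, in frames
    ob_hang: int = 0     # outbound hangtime remaining, in frames


def step(state, ib_env, ob_env, ib_thresh, ob_thresh, hang_frames=200):
    """Process one frame and return the path that is active afterwards."""
    # Step 814: decrement both hangtimes, never below zero.
    state.ib_hang = max(0, state.ib_hang - 1)
    state.ob_hang = max(0, state.ob_hang - 1)

    if state.active == SPEAKER:                          # step 816
        # Steps 830 and 834: inbound break-in needs both a strong
        # envelope and an expired outbound hangtime.
        if ib_env > ib_thresh and state.ob_hang == 0:
            state.ib_hang = hang_frames                  # step 836
            state.active = MIC                           # step 838
        # Otherwise step 832: the speaker path keeps the channel.
    elif state.active == MIC:                            # step 818
        # Steps 820 and 824: outbound break-in needs both a strong
        # envelope and an expired inbound hangtime.
        if ob_env > ob_thresh and state.ib_hang == 0:
            state.ob_hang = hang_frames                  # step 826
            state.active = SPEAKER                       # step 828
        # Otherwise step 822: the microphone path keeps the channel.
    else:
        state.active = MIC                               # step 822
    return state.active
```

With `hang_frames=3`, for instance, a path that has just seized the channel holds it for the next two frames regardless of activity on the other side, and can only be displaced on the third.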
[00050] In the embodiment of the invention illustrated in Fig. 8, the awarding of control to the microphone path 128 or the speaker path 130 may therefore depend on more than one criterion. Those criteria may include the exceeding of speech envelope thresholds but also interposing a hangtime during which the currently active path may retain control, regardless of the activity in the other path. The inbound and outbound hangtimes may in embodiments be fixed or dynamic, and may be incremented or decremented depending on conditions. For instance, during periods of increasing noise or other parameters, either or both of the hangtimes may be incremented, or during periods of decreasing noise or other parameters, either or both of the hangtimes may be decremented. Greater continuity in speech or other interaction may therefore be achieved.
[00051] Fig. 9(A) illustrates speech samples from speaker 120 and Fig. 9(B) illustrates speech samples from microphone 102 which may be processed in one regard according to the embodiment illustrated in Fig. 8. Fig. 10(A) illustrates the resulting outbound speech envelope (ob_env) along with the outbound break-in threshold (ob_break_in_threshold).
[00052] Fig. 10(A) also illustrates the application of an outbound hangtime (ob_hangtime) interval during which the speaker path 130 may retain control and continue to be activated, despite the presence of energetic speech in the microphone path 128. Conversely, Fig. 10(B) illustrates the inbound speech envelope (ib_env) along with the inbound break-in threshold (ib_break_in_threshold). Fig. 10(B) also illustrates the application of an inbound hangtime (ib_hangtime) interval during which the microphone path 128 may retain control and continue to be activated, despite the presence of energetic speech in the speaker path 130. The introduction of these delay intervals may increase the sense of continuity for the near-end and far-end users during speakerphone operation.
[00053] In particularly noisy environments, such as for example in urban areas, when an automobile window may be open, during playback of a noisy voice message or at other times, the fricatives and other signal components may tend to trigger the speaker path 130 to be muted, even when still-intelligible speech is present. This may in one regard be due to the crossing of an outbound muting threshold ordinarily intended to switch the speaker path 130 off when the far-end user input has degraded into noise. In an embodiment of the invention illustrated in Fig. 11, this effect may be addressed in one regard by eliminating the outbound off threshold (ob_off_threshold) and permitting the speaker path 130 to occupy the channel until the microphone path 128 contains energetic speech, rather than configuring the speaker path 130 to switch itself off below that threshold.
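The effect of eliminating the outbound off threshold can be illustrated with a simplified model that compares speaker path switching with and without that threshold. The sketch below is an assumption-laden toy: the function names, the per-frame envelope lists and the threshold values are invented for illustration, and the path-locking logic of Fig. 11 is deliberately omitted.

```python
def speaker_gate(ob_env, ib_env, ob_on, ib_on, ob_off=None):
    """Return the per-frame on/off state of the speaker path.

    ob_off=None models the approach of this embodiment (no off
    threshold); a numeric ob_off models conventional self-muting.
    """
    on = False
    states = []
    for ob, ib in zip(ob_env, ib_env):
        if on:
            if ib > ib_on:
                on = False   # energetic near-end speech seizes the channel
            elif ob_off is not None and ob < ob_off:
                on = False   # conventional case: speaker mutes itself
        elif ob > ob_on:
            on = True        # far-end speech activates the speaker path
        states.append(on)
    return states


def count_switches(states):
    """Number of on/off transitions in a sequence of gate states."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)
```

For an outbound envelope that alternates between loud and quiet frames, such as `[5, 1, 5, 1, 5]` with no inbound speech, an off threshold of 2 produces a transition on every frame, whereas omitting the off threshold leaves the speaker path on throughout.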
[00054] As shown in that figure, processing may begin in step 1102. In step 1104, near-end samples from the microphone 102 may be processed by the speech encoder 108. In step 1106, outbound speech from the far-end user may be processed by speech decoder 124. In step 1108, the echo canceller 106 may receive the outputs of the speech encoder 108 and the speech decoder 124 to suppress echo and other feedback artifacts. In step 1110, the echo-cancelled inbound speech and the decoded outbound speech may be communicated to inbound speech envelope generator 132 and outbound speech envelope generator 134, respectively, to generate speech energy envelopes or other functions.
[00055] In step 1112, an inbound on threshold (ib_on_threshold) and outbound on threshold (ob_on_threshold) may be generated, for instance similarly to the embodiment illustrated in Fig. 3 or otherwise. In step 1114, the duplex arbiter 116 may apply control logic to lock to the microphone path 128 or the speaker path 130, according to the current speech envelopes of the paths.
[00056] In step 1116, a determination may be made whether the outbound envelope (ob_env) exceeds the outbound on threshold (ob_on_threshold). If the outbound envelope (ob_env) does not exceed the outbound on threshold (ob_on_threshold), processing may proceed to step 1118 where a determination may be made whether the inbound envelope (ib_env) exceeds the inbound on threshold (ib_on_threshold). If the inbound envelope (ib_env) exceeds the inbound on threshold, processing may proceed to step 1120 where a determination may be made whether the speaker path 130 is locked, that is, currently has control of the communications channel, such as a wireless cellular or other connection. If the speaker path 130 is locked, the state of the microphone path 128 and speaker path 130 may remain unchanged from the start of processing at step 1102 and control may proceed to step 1128 where processing for the current frame may end, following which processing may repeat, proceed to other tasks or end.
[00057] If the determination at step 1120 is that the speaker path 130 is not locked, processing may proceed to step 1122 where the speaker path 130 may be deactivated or muted, while the microphone path 128 may be activated or unmuted. Processing then may likewise proceed to step 1128 to repeat, proceed to other tasks or end.
[00058] If the determination at step 1118 is that the inbound envelope (ib_env) does not exceed the inbound on threshold (ib_on_threshold), processing may proceed to step 1128 to repeat, proceed to other tasks or end.
[00059] If the determination at step 1116 is that the outbound envelope (ob_env) exceeds the outbound on threshold (ob_on_threshold), processing may proceed to step 1124 where a determination may be made whether the microphone path 128 is locked. If the microphone path 128 is not locked, control may proceed to step 1126 where the speaker path 130 may be activated or unmuted while the microphone path 128 may be deactivated or muted. Processing then may proceed to step 1128 to repeat, proceed to other tasks or end. Likewise, if the determination at step 1124 is that the microphone path 128 is locked, the state of the microphone path 128 and speaker path 130 may remain unchanged from the start of processing at step 1102, and control may proceed to step 1128 to repeat, proceed to other tasks or end.
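The decision flow of steps 1116 through 1128 may be sketched per frame as follows. This is an illustrative sketch under stated assumptions rather than the implementation of Fig. 11: the names PathState and arbitrate_frame are hypothetical, the default initial state is arbitrary, and the lock flags are treated as inputs because the way the lock is set in step 1114 is not detailed in this passage.

```python
from dataclasses import dataclass


@dataclass
class PathState:
    """Hypothetical per-call state; the default values are assumptions."""
    speaker_active: bool = True
    microphone_active: bool = False
    speaker_locked: bool = False      # assumed to be set elsewhere (step 1114)
    microphone_locked: bool = False   # assumed to be set elsewhere (step 1114)


def arbitrate_frame(state: PathState, ib_env: float, ob_env: float,
                    ib_on_threshold: float, ob_on_threshold: float) -> PathState:
    """Apply steps 1116 to 1128 to one frame and return the updated state."""
    if ob_env > ob_on_threshold:                 # step 1116
        if not state.microphone_locked:          # step 1124
            state.speaker_active = True          # step 1126
            state.microphone_active = False
    elif ib_env > ib_on_threshold:               # step 1118
        if not state.speaker_locked:             # step 1120
            state.speaker_active = False         # step 1122
            state.microphone_active = True
    # Step 1128: processing for the frame ends. No branch compares ob_env
    # with an outbound off threshold, so a quiet speaker path stays active
    # until the microphone path takes the channel.
    return state
```

When neither envelope exceeds its on threshold, the function leaves the state untouched, which corresponds to the path from step 1118 directly to step 1128.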
[00060] Fig. 12(A) illustrates samples from speaker 120 containing fricatives and other noise components, and Fig. 12(B) illustrates samples from microphone 102 at the same time which may together be processed for instance according to the embodiment illustrated in Fig. 11. Fig. 13 illustrates speakerphone control which might occur when operating upon such signals without the benefit of the invention, including rapid switching of the speaker path 130 between on and off states, due to the fricative and other noise artifacts.
[00061] Fig. 14(A) on the other hand illustrates the resulting speakerphone operation according to the embodiment of the invention illustrated in Fig. 11, in which the speaker path 130 maintains control of the channel even during relatively noisy background periods, in part because the outbound off threshold is eliminated, allowing the speaker path 130 to remain active. Instead of choppy or punctuated switching, the speaker path remains activated until the microphone path 128 appropriately seizes control of the channel due to energetic speech exceeding the inbound on threshold, as illustrated in Fig. 14(B). Smoother, more continuous conversation results.
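The contrast between the switching of Fig. 13 and that of Fig. 14(A) may be illustrated with a small sketch that counts how often the speaker path changes state. The function name count_speaker_toggles is hypothetical, the envelope values are invented for illustration and are not taken from the figures, and the microphone path is assumed to be silent throughout.

```python
def count_speaker_toggles(ob_env_frames, ob_on_threshold, ob_off_threshold=None):
    """Count on/off transitions of the speaker path over a run of frames.

    Passing ob_off_threshold=None models the embodiment in which the
    outbound off threshold is eliminated. The microphone path is assumed
    silent, so nothing else can take the channel in this sketch.
    """
    speaker_on = False
    toggles = 0
    for ob_env in ob_env_frames:
        if not speaker_on and ob_env > ob_on_threshold:
            speaker_on = True
            toggles += 1
        elif (speaker_on and ob_off_threshold is not None
              and ob_env < ob_off_threshold):
            speaker_on = False
            toggles += 1
    return toggles


# Invented envelope that alternates between strong and weak frames.
noisy_frames = [0.9, 0.2, 0.9, 0.2, 0.9, 0.2]
with_off = count_speaker_toggles(noisy_frames, ob_on_threshold=0.5,
                                 ob_off_threshold=0.3)
without_off = count_speaker_toggles(noisy_frames, ob_on_threshold=0.5)
```

For this invented input the speaker path changes state on every frame when the off threshold is applied, and only once when it is eliminated.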
[00062] The foregoing description of the system and method for speakerphone operation according to the invention is illustrative, and variations in configuration and implementation will occur to persons skilled in the art. For instance, while the invention has generally been described as containing discrete voice detectors in the form of inbound VAD 114 and outbound VAD 118, in embodiments the functions or parts of the functions of the two voice activity detectors could be combined in one part, or in one software module. More than two paths could also be managed according to the invention. Similarly, while the invention has been described with respect to an inbound path including an echo canceller 106, in embodiments other types of noise suppressors could be implemented, or in embodiments that component could be omitted or modified.
[00063] It has likewise been noted that the communications device in which the invention may operate may be or include a cellular telephone, but could consist of other communications platforms such as wired or wireless telephones, two-way radios, base stations for wireless telephones, network-enabled wireless communications devices such as 802.11a, 802.11b, 802.11g or other short or long-range telephony or other units, or other equipment as well.
[00064] Yet further, while the invention has generally been described in terms of a speakerphone architecture in which the electronic intelligence governing the speakerphone operation is integral with the cellular telephone or other communications device, in other embodiments the intelligence may be embedded or shared in an attachment coupled to the communications device. For instance, the intelligence may be embedded or shared in a detachable battery, a headphone device, a tabletop or other fixed or non-wearable speakerphone unit, or in other accessories or parts. For example, the intelligence may enable a speakerphone operation through a car audio system coupled to a cellular telephone.
[00065] In the case of a detachable or coupleable unit which adds or enhances speakerphone capability in a communications device, the intelligence embedded in the add-on device may communicate with the electronics of the communications device through interfaces such as a serial port such as an RS-232, a universal serial bus (USB) or a universal asynchronous receiver/transmitter (UART) connection, an infrared data (IrDA) port, a radio frequency link, or other serial, parallel or other data ports or other connections. The scope of the invention is accordingly intended to be limited only by the following claims.

Claims

We claim:
1. A system for managing speakerphone operation in a communications device, comprising: a first voice activity detector, configured to communicate with an inbound path of the communications device, the first voice activity detector generating at least first voice data based upon a signal in the inbound path; a second voice activity detector, configured to communicate with an outbound path of the communications device, the second voice activity detector generating at least second voice data based upon a signal in the outbound path; and a processor, communicating with the first voice activity detector and the second voice activity detector, the processor controlling at least one of the inbound path and the outbound path based upon at least one of the first voice data and the second voice data.
2. A system according to claim 1, wherein the communications device comprises at least one of a cellular telephone, a voice-enabled network device, and a telephone device.
3. A system according to claim 1, wherein the first voice data comprises at least one of a first voice energy signal, a first voice envelope, a first voice sample, and a first voice present signal.
4. A system according to claim 1, wherein the second voice data comprises at least one of a second voice energy signal, a second voice envelope, a second voice sample and a second voice present signal.
5. A system according to claim 1, wherein the controlling performed by the processor comprises awarding control of a communications channel to one of the inbound path and the outbound path based upon a comparison of the first voice data and the second voice data.
6. A system according to claim 5, wherein the communications channel comprises a wireless communications channel.
7. A system for managing speakerphone operation in a communications device, comprising: voice activity detection means, communicating with each of an inbound path and an outbound path each of the communications device, the voice activity detection means generating at least first voice data based upon a signal in the inbound path and at least second voice data based upon a signal in the outbound path; and processing means, configured to communicate with the voice activity detection means, the processing means controlling at least one of the inbound path and the outbound path based upon at least one of the first voice data and the second voice data.
8. A system for managing speakerphone operation in a communications device, comprising: a first voice activity detector, configured to communicate with an inbound path of the communications device, the first voice activity detector generating at least a first voice detection signal based upon at least a first voice threshold applied to a signal in the inbound path; a second voice activity detector, configured to communicate with an outbound path of the communications device, the second voice activity detector generating at least a second voice detection signal based upon at least a second voice threshold applied to a signal in the outbound path; and a processor, communicating with the first voice activity detector and the second voice activity detector, the processor controlling at least one of the inbound path and the outbound path based upon at least a comparison of the first voice detection signal and the second voice detection signal.
9. A system for managing speakerphone operation in a communications device, comprising: a processor, the processor being configured to execute: voice activity detection code, receiving input from each of an inbound path and an outbound path each of the communications device, the executable voice activity detection code generating at least a first voice detection signal based upon a first voice threshold applied to a signal in the inbound path and at least a second voice detection signal based upon at least a second voice threshold applied to a signal in the outbound path, and arbitration code, the arbitration code controlling at least one of the inbound path and the outbound path based upon at least one of the first voice detection signal and the second voice detection signal.
10. A system for managing speakerphone operation in a communications device, comprising: a first voice activity detector, configured to communicate with an inbound path of the communications device, the first voice activity detector generating at least a first voice detection signal based upon a signal in the inbound path; a second voice activity detector, configured to communicate with an outbound path of the communications device, the second voice activity detector generating at least a second voice detection signal based upon a signal in the outbound path; and a processor, communicating with the first voice activity detector and the second voice activity detector, the processor controlling speakerphone operation to award control of a communications channel to at least one of the inbound path and the outbound path based upon at least a comparison of the first voice detection signal and the second voice detection signal and at least one of an inbound hangtime and an outbound hangtime.
PCT/US2003/023113 2002-07-26 2003-07-24 System and method for speakerphone operation in a communications device WO2004012426A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2004524756A JP2005534258A (en) 2002-07-26 2003-07-24 System and method for operating a speakerphone in a communication device
GB0502502A GB2407744B (en) 2002-07-26 2003-07-24 System and method for speakerphone operation in a communications device
AU2003256725A AU2003256725A1 (en) 2002-07-26 2003-07-24 System and method for speakerphone operation in a communications device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US39857802P 2002-07-26 2002-07-26
US60/398,578 2002-07-26

Publications (1)

Publication Number Publication Date
WO2004012426A1 WO2004012426A1 (en) 2004-02-05

Family

ID=31188421

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2003/023113 WO2004012426A1 (en) 2002-07-26 2003-07-24 System and method for speakerphone operation in a communications device

Country Status (6)

Country Link
JP (1) JP2005534258A (en)
KR (1) KR100736246B1 (en)
CN (1) CN1692618A (en)
AU (1) AU2003256725A1 (en)
GB (1) GB2407744B (en)
WO (1) WO2004012426A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104093178A (en) * 2013-04-01 2014-10-08 联想(北京)有限公司 Communication method and mobile terminal

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100720589B1 (en) * 2005-08-26 2007-05-22 엘지전자 주식회사 System and Method for controlling speaker-phone using asynchronous HDLC
KR101714546B1 (en) * 2011-01-11 2017-03-10 삼성전자주식회사 Device and method for processing voice communication in mobile terminal
US20150139428A1 (en) * 2013-11-20 2015-05-21 Knowles IPC (M) Snd. Bhd. Apparatus with a speaker used as second microphone

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6570985B1 (en) * 1998-01-09 2003-05-27 Ericsson Inc. Echo canceler adaptive filter optimization
US6611693B2 (en) * 1996-02-23 2003-08-26 Nokia Mobile Phones Ltd. Multi-service mobile station

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6223154B1 (en) 1998-07-31 2001-04-24 Motorola, Inc. Using vocoded parameters in a staggered average to provide speakerphone operation based on enhanced speech activity thresholds

Also Published As

Publication number Publication date
CN1692618A (en) 2005-11-02
JP2005534258A (en) 2005-11-10
KR100736246B1 (en) 2007-07-06
GB0502502D0 (en) 2005-03-16
GB2407744A (en) 2005-05-04
AU2003256725A1 (en) 2004-02-16
KR20050029280A (en) 2005-03-24
GB2407744B (en) 2006-06-07

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 1020057001469

Country of ref document: KR

Ref document number: 2004524756

Country of ref document: JP

ENP Entry into the national phase

Ref document number: 0502502

Country of ref document: GB

Kind code of ref document: A

Free format text: PCT FILING DATE = 20030724

WWE Wipo information: entry into national phase

Ref document number: 20038228203

Country of ref document: CN

WWP Wipo information: published in national office

Ref document number: 1020057001469

Country of ref document: KR

122 Ep: pct application non-entry in european phase