US20140172424A1 - Preserving audio data collection privacy in mobile devices - Google Patents


Info

Publication number
US20140172424A1
Authority
US
United States
Prior art keywords
audio
frames
analysis
continuous
analyzing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/186,730
Inventor
Leonard Henry Grokop
Vidya Narayanan
James W. Dolter
Sanjiv Nanda
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc
Priority to US14/186,730
Assigned to QUALCOMM INCORPORATED. Assignment of assignors' interest (see document for details). Assignors: DOLTER, JAMES W.; NANDA, SANJIV; NARAYANAN, VIDYA; GROKOP, LEONARD H.
Publication of US20140172424A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 - Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/02 - Protecting privacy or anonymity, e.g. protecting personally identifiable information [PII]

Definitions

  • Mobile devices are incredibly widespread in today's society. For example, people use cellular phones, smart phones, personal digital assistants, laptop computers, pagers, tablet computers, etc. to send and receive data wirelessly from countless locations. Moreover, advancements in wireless communication technology have greatly increased the versatility of today's mobile devices, enabling users to perform a wide range of tasks from a single, portable device that conventionally required either multiple devices or larger, non-portable equipment.
  • mobile devices can be configured to determine what environment (e.g., restaurant, car, park, airport, etc.) a mobile device user may be in through a process called context determination.
  • Context awareness applications that perform such context determinations seek to determine the environment of a mobile device by utilizing information from the mobile device's sensor inputs, such as GPS, WiFi and BlueTooth®.
  • classifying audio from the mobile device's microphone is highly valuable in making context determinations, but the process of collecting audio that may include speech can raise privacy issues.
  • Techniques disclosed herein provide for using the hardware and/or software of a mobile device to obscure speech in the audio data before a context determination is made by a context awareness application using the audio data.
  • a subset of a continuous audio stream is captured such that speech (words, phrases and sentences) cannot be reliably reconstructed from the gathered audio.
  • the subset is analyzed for audio characteristics, and a determination can be made regarding the ambient environment.
  • a method of privacy-sensitive audio analysis may include capturing a subset of audio data contained in a continuous audio stream.
  • the continuous audio stream may contain human speech.
  • the subset of audio data may obscure content of the human speech.
  • the method may include analyzing the subset of audio data for audio characteristics.
  • the method may include making a determination of an ambient environment, based, at least in part, on the audio characteristics.
  • Embodiments of such a method may include one or more of the following:
  • the subset of audio data may comprise a computed function of the continuous audio stream having a lesser number of bits than is needed to reproduce the continuous audio stream with intelligible fidelity.
  • the subset of audio data may comprise a plurality of audio data segments, each audio data segment comprising data from a different temporal component of the continuous audio stream.
  • the method may include making a determination of an identity of a person based, at least in part, on the audio characteristics.
  • the plurality of audio data segments may comprise between 30 ms to 100 ms of recorded audio.
  • Each temporal component of the continuous audio stream may be between 250 ms to 2s in length.
  • the method may include randomly altering an order of the plurality of audio data segments before analyzing the subset of audio data. Randomly altering the order of the plurality of audio data segments may be based, at least in part, on information from one of: a Global Positioning System (GPS) device, signal noise from circuitry within a mobile device, signal noise from a microphone, and signal noise from an antenna.
  • a device for obscuring privacy-sensitive audio may include a microphone.
  • the device may include a processing unit communicatively coupled to the microphone.
  • the processing unit may be configured to capture a subset of audio data contained in a continuous audio stream represented in a signal from the microphone.
  • the continuous audio stream may contain human speech.
  • the subset of audio data may obscure content of the human speech.
  • the processing unit may be configured to analyze the subset of audio data for audio characteristics.
  • the processing unit may be configured to make a determination of an ambient environment, based, at least in part, on the audio characteristics.
  • Embodiments of such a device may include one or more of the following:
  • the subset of audio data may comprise a computed function of the continuous audio stream having a lesser number of bits than is needed to reproduce the continuous audio stream with intelligible fidelity.
  • the subset of audio data may comprise a plurality of audio data segments, each audio data segment comprising data from a different temporal component of the continuous audio stream.
  • the processing unit may be configured to make a determination of an identity of a person based, at least in part, on the audio characteristics.
  • Each of the plurality of audio data segments may comprise between 30 ms to 100 ms of recorded audio.
  • Each temporal component of the continuous audio stream may be between 250 ms to 2s in length.
  • The processing unit may be further configured to randomly alter an order of the plurality of audio data segments before analyzing the subset of audio data. Randomly altering the order of the plurality of audio data segments may be based, at least in part, on information from one of: a Global Positioning System (GPS) device, signal noise from circuitry within a mobile device, signal noise from the microphone, and signal noise from an antenna.
  • a system for determining an environment associated with a mobile device may include an audio sensor configured to receive a continuous audio stream.
  • the system may include at least one processing unit coupled to the audio sensor.
  • the processing unit may be configured to capture a subset of audio data contained in the continuous audio stream, such that the subset of audio data obscures content of human speech included in the continuous audio stream.
  • the processing unit may be configured to analyze the subset of audio data for audio characteristics.
  • the processing unit may be configured to make a determination of an ambient environment, based, at least in part, on the audio characteristics.
  • Embodiments of such a system may include one or more of the following:
  • the system may include a network interface configured to send information representing the subset of audio data via a network to a location remote from the mobile device.
  • the at least one processing unit may be configured to make the determination of the ambient environment at the location remote from the mobile device.
  • the subset of audio data may comprise a plurality of audio data segments, each audio data segment comprising data from a different temporal component of the continuous audio stream.
  • the at least one processing unit may be configured to make a determination of an identity of a person based, at least in part, on the audio characteristics.
  • Each of the plurality of audio data segments may comprise between 30 ms to 100 ms of recorded audio.
  • Each temporal component of the continuous audio stream may be between 250 ms to 2s in length.
  • the processing unit may be further configured to randomly alter an order of the plurality of audio data segments before analyzing the subset of audio data.
  • a computer program product residing on a non-transitory processor-readable medium includes processor-readable instructions configured to cause a processor to capture a subset of audio data contained in a continuous audio stream.
  • the continuous audio stream may contain human speech.
  • the subset of audio data may obscure content of the human speech.
  • the processor-readable instructions may be configured to cause the processor to analyze the subset of audio data for audio characteristics.
  • the processor-readable instructions may be configured to cause the processor to make a determination of an ambient environment, based, at least in part, on the audio characteristics.
  • Embodiments of such a computer program product may include one or more of the following:
  • the subset of audio data may comprise a computed function of the continuous audio stream having a lesser number of bits than is needed to reproduce the continuous audio stream with intelligible fidelity.
  • the subset of audio data may comprise a plurality of audio data segments, each audio data segment comprising data from a different temporal component of the continuous audio stream.
  • the processor-readable instructions may be configured to cause the processor to make a determination of an identity of a person based, at least in part, on the audio characteristics.
  • Each of the plurality of audio data segments may comprise between 30 ms to 100 ms of recorded audio.
  • Each temporal component of the continuous audio stream may be between 250 ms to 2s in length.
  • the processor-readable instructions may be configured to randomly alter an order of the plurality of audio data segments before analyzing the subset of audio data.
  • the processor-readable instructions for randomly altering the order of the plurality of audio data segments may be based, at least in part, on information from one of: a Global Positioning System (GPS) device, signal noise from circuitry within a mobile device, signal noise from a microphone, and signal noise from an antenna.
  • a device for obscuring privacy-sensitive audio may include means for capturing a subset of audio data contained in a continuous audio stream represented in a signal from a microphone.
  • the continuous audio stream may contain human speech.
  • the subset of audio data may obscure content of the human speech.
  • the device may include means for analyzing the subset of audio data for audio characteristics.
  • the device may include means for determining an ambient environment, based, at least in part, on the audio characteristics.
  • Embodiments of such a device may include one or more of the following:
  • the means for capturing the subset of audio data may be configured to capture the subset of audio data in accordance with a computed function of the continuous audio stream having a lesser number of bits than is needed to reproduce the continuous audio stream with intelligible fidelity.
  • the means for capturing the subset of audio data may be configured to capture the subset of audio data such that the subset of audio data comprises a plurality of audio data segments, each audio data segment comprising data from a different temporal component of the continuous audio stream.
  • the means for determining the ambient environment may be configured to make a determination of an identity of a person based, at least in part, on the audio characteristics.
  • the means for capturing the subset of audio data may be configured to capture the subset of audio data such that each of the plurality of audio data segments comprises between 30 ms to 100 ms of recorded audio.
  • Items and/or techniques described herein may provide one or more of the following capabilities, as well as other capabilities not mentioned.
  • Obscuring of the content of speech that may be included in an audio stream used for a context determination while having little or no impact on the accuracy of the context determination. Utilizing a relatively simple method that can be executed in real time, using minimal processing resources. Including an ability to upload a subset of audio data (having obscured speech) to help improve the accuracy of models used in context determinations. While at least one item/technique-effect pair has been described, it may be possible for a noted effect to be achieved by means other than that noted, and a noted item/technique may not necessarily yield the noted effect.
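  • As a purely illustrative sketch of the capture-analyze-determine pipeline summarized above (the feature choices and the scikit-learn-style classifier interface are assumptions for illustration, not the patent's implementation):

```python
import numpy as np

def extract_audio_characteristics(subset, sample_rate=16000, t_frame=0.05):
    """Compute simple per-frame characteristics (RMS energy and
    zero-crossing rate) from a privacy-preserving audio subset; a real
    system would likely use richer spectral features."""
    frame_len = int(t_frame * sample_rate)
    frames = subset[: len(subset) // frame_len * frame_len].reshape(-1, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))                   # loudness per frame
    zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)  # zero-crossing rate
    return np.column_stack([rms, zcr])

def determine_ambient_environment(subset, classifier):
    """Summarize the characteristics over the window and hand them to a
    pre-trained classifier exposing a scikit-learn-style predict()."""
    features = extract_audio_characteristics(subset).mean(axis=0, keepdims=True)
    return classifier.predict(features)[0]   # e.g. "car", "office", "street"
```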
  • FIG. 1 is a simplified block diagram of basic components of a mobile device configured to support context awareness applications, according to one embodiment.
  • FIGS. 2a-2c are visualizations of processes for capturing sufficient audio information to classify the ambient environment of a mobile device without performance degradation, while helping ensure privacy of speech.
  • FIGS. 3a and 3b are flow diagrams of methods for providing the functionality shown in FIGS. 2b and 2c.
  • FIG. 4 is a graph illustrating results of an analysis computing an upper bound on the probability of a speech recognizer reconstructing n-grams of words, from audio data resulting from certain processing methods described herein.
  • Mobile devices such as personal digital assistants (PDAs), mobile phones, tablet computers, and other personal electronics, can be enabled with context awareness applications. These context awareness applications can determine, for example, where a user of the mobile device is and what the user might be doing, among other things. Such context determinations can help enable a mobile device to provide additional functionality to a user, such as enter a car mode after determining the user is in a car, or entering a silent mode when determining the user has entered a movie theater.
  • a subset of audio data may be captured from a continuous audio stream that may contain speech, whereby the nature of the sampling obscures any speech that might be contained in the continuous audio stream.
  • the nature of the sampling also preserves certain audio characteristics of the continuous audio stream such that a context determination—such as a determination regarding a particular ambient environment of a mobile device—suffers little or no reduction in accuracy.
  • FIG. 1 is a simplified block diagram illustrating certain components of a mobile device 100 that can provide for context awareness, according to one embodiment.
  • This diagram is an example and is not limiting.
  • the mobile device 100 may include additional components (e.g., user interface, antennas, display, etc.) omitted from FIG. 1 for simplicity. Additionally, the components shown may be combined, separated, or omitted, depending on the functionality of the mobile device 100 .
  • the mobile device 100 includes a mobile network interface 120 .
  • Such an interface can include hardware, software, and/or firmware for communicating with a mobile carrier.
  • the mobile network interface 120 can utilize High Speed Packet Access (HSPA), Enhanced HSPA (HSPA+), 3GPP Long Term Evolution (LTE), and/or other standards for mobile communication.
  • the mobile network interface 120 can also provide certain information, such as location data, that can be useful in context awareness applications.
  • the mobile device 100 can include other wireless interface(s) 170 .
  • Such interfaces can include IEEE 802.11 (WiFi), Bluetooth®, and/or other wireless technologies.
  • These wireless interface(s) 170 can provide information to the mobile device 100 that may be used in a context determination.
  • the wireless interface(s) 170 can provide information regarding location by determining the approximate location of a wireless network to which one or more of the wireless interface(s) 170 are connected.
  • the wireless interface(s) 170 can enable the mobile device 100 to communicate with other devices, such as wireless headsets and/or microphones, which may provide information useful in determining a context of the mobile device 100 .
  • the mobile device 100 also can include a global positioning system (GPS) unit 160 , accelerometer(s) 130 , and/or other sensor(s) 150 . These additional features can provide information such as location, orientation, movement, temperature, proximity, etc. As with the wireless interface(s) 170 , information from these components can help context awareness applications make a context determination regarding the context of the mobile device 100 .
  • the mobile device 100 additionally can include an analysis/determination module(s) 110 .
  • the analysis/determination module(s) 110 can receive sensor information from the various components to which it is communicatively coupled.
  • the analysis/determination module(s) 110 also can execute software (including context awareness applications) stored on a memory 180 , which can be separate from and/or integrated into the analysis/determination module(s) 110 .
  • the analysis/determination module(s) 110 can comprise one or many processing devices, including a central processing unit (CPU), microprocessor, digital signal processor (DSP), and/or components that, among other things, have the means capable of analyzing audio data and making a determination based on the analysis.
  • wireless interfaces 170 can greatly assist in determining location when the user is outdoors, near identifiable WiFi or BlueTooth access points, walking, etc.
  • these components have their limitations. In many scenarios they are less useful for determining environment and situation. For example, information from these components is less useful in distinguishing whether a user is in a meeting or in their office, or whether a user is in a grocery store or the gymnasium immediately next to it.
  • information from the audio capturing module 140 can provide highly valuable audio data that can be used to help classify the environment, as well as determine whether there is speech present, whether there are multiple speakers present, the identity of a speaker, etc.
  • the process of capturing audio data by a mobile device 100 for a context determination can include temporarily and/or permanently storing audio data to the phone's memory 180 .
  • the capture of audio data that includes intelligible speech can raise privacy issues. In fact, federal, state, and/or local laws may be implicated if the mobile device 100 captures speech from a user of the mobile device 100 , or another person, without consent. These issues can be mitigated by using the hardware and/or software of the mobile device 100 to pre-process the audio data before it is captured such that speech (words, phrases and sentences) cannot be reliably reconstructed from the captured audio data. Moreover, the pre-processing can still allow determination of an ambient environment (e.g., from background noise) and/or other audio characteristics of the audio data, such as the presence of speech, music, typing sounds, etc.
  • FIG. 2 a is a visualization of a process for capturing sufficient audio information to classify a mobile device and/or user's situation/environment without performance degradation. Additionally the process can also help ensure that speech (words, phrases and sentences) cannot be reliably reconstructed from the captured information.
  • This process involves reducing the dimensionality of an input audio stream. In other words, the bits (i.e., digital data) of an input stream of continuous audio are reduced such that the resultant audio stream has a lesser number of bits than is needed to reproduce the continuous audio stream with intelligible fidelity. Reducing the dimensionality therefore can be a computed function designed to ensure speech is irreproducible.
  • a continuous audio stream can comprise a window 210 of audio data lasting Twindow seconds.
  • the window 210 can be viewed as having a plurality of audio data segments. More specifically, the window 210 can comprise N temporal components, or blocks 220, where each block 220 lasts Tblock seconds and comprises a plurality of frames 230 of Tframe seconds each.
  • a microphone signal can be sampled such that only one frame 230 (with Tframe seconds of data) is collected in every block of Tblock seconds.
  • The values of these parameters can vary: Tframe can range from less than 30 ms to 100 ms or more, Tblock can range from less than 250 ms up to 2000 ms (2 s) or more, and Twindow can be as short as a single block (e.g., one block per window), up to one minute or more.
  • Different frame, block, and window lengths can impact the number of frames 230 per block 220 and the number of blocks 220 per window 210 .
  • the capturing of frames 230 can be achieved in different ways.
  • the analysis/determination module(s) 110 can continuously sample the microphone signal during a window 210 of continuous audio, discarding (i.e., not storing) the unwanted frames 230 .
  • For example, with Tframe = 50 ms and Tblock = 500 ms, the processing unit can simply discard 450 ms out of every 500 ms sampled.
  • the analysis/determination module(s) 110 can turn the audio capturing module 140 off during the unwanted frames 230 (e.g., turning the audio capturing module 140 off for 450 ms out of every 500 ms), thereby collecting only the frames 230 that will be inserted into the resulting audio information 240-a used in a context determination.
  • the resulting audio information 240-a is a collection of frames 230 that comprises only a subset of the continuous audio stream in the window 210. Even so, this resulting audio information 240-a can include audio characteristics that can help enable a context determination, such as determining an ambient environment, with no significant impact on the accuracy of the determination. Accordingly, the resulting audio information 240-a can be provided in real time to an application for context classification, and/or stored as one or more waveform(s) in memory 180 for later analysis and/or uploading to a server communicatively coupled to the mobile device 100.
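  • A minimal sketch of the FIG. 2a capture, assuming 16 kHz mono samples held in a NumPy array (the sample rate, function name, and default parameter values are illustrative assumptions, not requirements of the patent):

```python
import numpy as np

def capture_subset(window, sample_rate=16000, t_frame=0.05, t_block=0.5):
    """Keep only the first t_frame seconds of every t_block-second block.

    `window` holds one Twindow-long stretch of audio samples; the return
    value plays the role of the resulting audio information 240-a: too
    sparse to reconstruct intelligible speech, but still carrying
    ambient-environment characteristics.
    """
    frame_len = int(t_frame * sample_rate)    # samples kept per block
    block_len = int(t_block * sample_rate)    # samples per block
    frames = [window[start:start + frame_len]
              for start in range(0, len(window) - block_len + 1, block_len)]
    return np.concatenate(frames)

# A 10 s window keeps 20 frames x 50 ms = 1 s of audio, i.e. 10% of the stream.
subset = capture_subset(np.random.randn(10 * 16000))
```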
  • FIGS. 2b and 2c are visualizations of processes for capturing audio information, similar to the process shown in FIG. 2a. In FIGS. 2b and 2c, however, additional steps are taken to help ensure further privacy of any speech that may be captured.
  • Referring to FIG. 2b, a visualization is provided illustrating how, for every window 210 of Twindow seconds, the first frames 230 of each block 220 can be captured.
  • After the frame 230-1 of the final block 220 of the window 210 is captured, all the captured frames of the window 210 can be randomly permutated (i.e., randomly shuffled) to provide the resultant audio information 240-b.
  • The resultant audio information 240-b is similar to the resulting audio information 240-a of FIG. 2a, with the additional feature that the frames from which it is comprised are randomized, thereby further decreasing the likelihood that any speech included in the resultant audio information 240-b could be reproduced with intelligible fidelity.
  • FIG. 2c illustrates a process similar to the one shown in FIG. 2b, but further randomizing the frame 230 captured for each block 220. More specifically, rather than capture the first frame 230 of each block 220 of a window 210 as shown in FIGS. 2a and 2b, the process shown in FIG. 2c demonstrates that a random frame 230 from each block 220 can be selected instead.
  • the randomization of both the capturing of frames 230 of a window 210 and the ordering of frames 230 in the resultant audio information 240-c helps further ensure that any speech contained in a continuous audio stream within a window 210 is obscured and irreproducible.
  • the randomization used in processes shown in FIGS. 2b and 2c can be computed using a seed that is generated in numerous ways.
  • the seed may be based on GPS time provided by the GPS unit 160, noise from circuitry within the mobile device 100, noise (or other signal) from the audio capturing module 140, noise from an antenna, etc.
  • the permutation can be discarded (e.g., not stored) to help ensure that the shuffling effect cannot be reversed.
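  • The additional randomization of FIGS. 2b and 2c could be sketched as follows (again assuming NumPy and the same illustrative parameters; the seed argument stands in for whatever noise source the device uses and is deliberately not retained):

```python
import numpy as np

def capture_randomized_subset(window, sample_rate=16000,
                              t_frame=0.05, t_block=0.5, seed=None):
    """FIG. 2c-style capture: one randomly positioned frame per block,
    followed by a random permutation of the captured frames.

    `seed` represents entropy drawn from, e.g., GPS time, circuit noise,
    microphone noise, or antenna noise; it is used once here and never
    stored, so the shuffle cannot later be reversed.
    """
    rng = np.random.default_rng(seed)
    frame_len = int(t_frame * sample_rate)
    block_len = int(t_block * sample_rate)
    frames = []
    for start in range(0, len(window) - block_len + 1, block_len):
        offset = int(rng.integers(0, block_len - frame_len + 1))  # random frame position
        frames.append(window[start + offset:start + offset + frame_len])
    rng.shuffle(frames)               # randomly permute the captured frames
    return np.concatenate(frames)     # resultant audio information 240-c
```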
  • FIGS. 2a, 2b, and 2c are provided as examples and are not limiting. Other embodiments are contemplated.
  • the blocks 220 may be randomly permutated before frames 230 are captured.
  • frames 230 can be captured randomly throughout the entire window 210 , rather than capturing one frame 230 per block 220 .
  • FIG. 3a is a flow diagram illustrating an embodiment of a method 300-1 for providing the functionality shown in FIGS. 2b and 2c.
  • the method 300 - 1 can begin at stage 310 , where a block 220 of audio data from a continuous audio stream is received.
  • the continuous audio stream can be, for example, audio within a window 210 of time to which the audio capturing module 140 of a mobile device 100 is exposed.
  • a frame 230 of the block 220 of audio data is captured.
  • the frame 230 can be a predetermined frame (e.g. first frame) of each block 220 of audio data, or it can be randomly selected.
  • the frame 230 is captured, for example, by being stored (either temporarily or permanently) in the memory 180 of a mobile device 100 .
  • the capturing of a frame 230 can include turning an audio capturing module 140 on and off and/or sampling certain portions of a signal from an audio capturing module 140 representing a continuous audio stream.
  • The process then moves to stage 340, where the order of the captured frames is randomized.
  • These randomized frames can be stored, for example, in an audio file used for analysis by a context awareness application.
  • At stage 350, a determination of the ambient environment (or other context determination) is made, based, at least in part, on audio characteristics of the randomized frames.
  • The stages of the method 300-1 may be performed by one or more different components of the mobile device 100 and/or other systems communicatively coupled with the mobile device 100.
  • stages can be performed by any combination of hardware, software, and/or firmware.
  • certain stages such as stages 320-340 can be performed by hardware (such as the analysis/determination module(s) 110), randomizing captured frames, for instance, on a buffer before storing them on the memory 180 and/or providing them to a software application.
  • some embodiments may enable certain parameters (e.g., Twindow, Tblock, and/or Tframe) to be at least partially configurable by software.
  • a mobile device 100 may upload the resultant audio information 240 including the captured frames to a remote server.
  • the remote server can make the determination of the ambient environment of stage 350 .
  • the mobile device 100 can upload the resultant audio information 240 along with a determination of the ambient environment made by the mobile device 100 .
  • the remote server can use the determination and the resultant audio information 240 to modify existing models used to make ambient environment determinations. This enables the server to maintain models that are able to “learn” from input received by mobile devices 100 . Modified and/or updated models then can be downloaded to mobile devices 100 to help improve the accuracy of ambient environment determinations made by the mobile devices 100 . Thus, ambient environment determinations (or other contextual determinations) can be continually improved.
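  • One way the upload-and-learn loop could look in practice is sketched below; the endpoint URL, payload fields, and use of the requests library are purely illustrative assumptions rather than anything specified by the patent:

```python
import base64
import requests  # any HTTP client would do; assumed available here

def upload_audio_subset(subset_bytes, local_label, device_id,
                        url="https://example.com/context-model/update"):
    """Send the speech-obscured audio subset together with the device's own
    ambient-environment determination so a remote server can refine the
    models it later redistributes to devices (all field names invented)."""
    payload = {
        "device_id": device_id,
        "ambient_environment": local_label,   # e.g. "car", "office", "street"
        "audio_subset": base64.b64encode(subset_bytes).decode("ascii"),
    }
    response = requests.post(url, json=payload, timeout=10)
    response.raise_for_status()
    return response.json().get("model_version")  # hypothetical updated-model tag
```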
  • the techniques described herein can allow determination of not only an ambient environment and/or other contextual determinations, but other audio characteristics of the audio data as well. These audio characteristics can include the presence of speech, music, typing sounds, and more. Depending on the audio characteristics included, different determinations may be made.
  • FIG. 3b is a flow diagram illustrating an example of a method 300-2, which includes stages similar to the method 300-1 of FIG. 3a.
  • the method 300-2 of FIG. 3b includes an additional stage 360 where a determination is made regarding the identity of speaker(s) whose speech is included in the captured frames used to make a determination of an ambient environment.
  • the determination of stage 360 can be made by the mobile device 100 and/or a remote server to which the captured frames are uploaded.
  • the determination regarding identity can include the use of other information and/or models, such as models to help determine the age, gender, etc. of the speaker, stored information regarding audio characteristics of a particular person's speech, and other data.
  • To gauge the impact of these techniques on classifiers (e.g., probabilistic classifiers used in context awareness applications), a classifier was tested on subsets of audio data captured as described above. The data used was a commercially acquired audio data set of environmental sounds of a set of environments (e.g., in a park, on a street, in a market, in a car, in an airport, etc.) common among context awareness applications.
  • With Tframe = 50 ms, Table 1 indicates how reducing the dimensionality of the audio data by sampling only subsets of a continuous audio stream can have little impact on the accuracy of the classifier's determination of an ambient environment until Tblock approaches 2 seconds (i.e., the microphone is on for only 50 ms out of every 2 seconds, or 2.5% of the time). Results may be different for different classifiers.
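  • For intuition about the microphone duty cycle these settings imply, a quick back-of-the-envelope computation (consistent with the 2.5% figure above; not code from the patent):

```python
t_frame = 0.050                      # seconds of audio kept per block
for t_block in (0.25, 0.5, 1.0, 2.0):
    duty_cycle = t_frame / t_block
    print(f"Tblock = {t_block:4.2f} s -> microphone on {duty_cycle:.1%} of the time")
# Tblock = 2.00 s gives the 2.5% duty cycle noted above.
```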
  • configurations may be described as a process which is depicted as a flow diagram or block diagram. Although each may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process may have additional steps not included in the figure.
  • Computer programs incorporating various features of the present invention may be encoded on various non-transitory computer-readable and/or non-transitory processor-readable storage media; suitable media include magnetic media, optical media, flash memory, and other non-transitory media.
  • Non-transitory processor-readable storage media encoded with the program code may be packaged with a compatible device or provided separately from other devices.
  • program code may be encoded and transmitted via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet, thereby allowing distribution, e.g., via Internet download.

Abstract

Techniques are disclosed for using the hardware and/or software of the mobile device to obscure speech in the audio data before a context determination is made by a context awareness application using the audio data. In particular, a subset of a continuous audio stream is captured such that speech (words, phrases and sentences) cannot be reliably reconstructed from the gathered audio. The subset is analyzed for audio characteristics, and a determination can be made regarding the ambient environment.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of U.S. patent application Ser. No. 13/213,294, filed Aug. 19, 2011, which claims priority to U.S. Provisional Patent Application No. 61/488,927, filed May 23, 2011, entitled "PRESERVING AUDIO DATA COLLECTION PRIVACY IN MOBILE DEVICES," Attorney Docket No. 111174P1, all of which are hereby incorporated herein by reference for all purposes.
  • BACKGROUND
  • Mobile devices are incredibly widespread in today's society. For example, people use cellular phones, smart phones, personal digital assistants, laptop computers, pagers, tablet computers, etc. to send and receive data wirelessly from countless locations. Moreover, advancements in wireless communication technology have greatly increased the versatility of today's mobile devices, enabling users to perform a wide range of tasks from a single, portable device that conventionally required either multiple devices or larger, non-portable equipment.
  • For example, mobile devices can be configured to determine what environment (e.g., restaurant, car, park, airport, etc.) a mobile device user may be in through a process called context determination. Context awareness applications that perform such context determinations seek to determine the environment of a mobile device by utilizing information from the mobile device's sensor inputs, such as GPS, WiFi and BlueTooth®. In many scenarios, classifying audio from the mobile device's microphone is highly valuable in making context determinations, but the process of collecting audio that may include speech can raise privacy issues.
  • BRIEF SUMMARY
  • Techniques disclosed herein provide for using the hardware and/or software of a mobile device to obscure speech in the audio data before a context determination is made by a context awareness application using the audio data. In particular, a subset of a continuous audio stream is captured such that speech (words, phrases and sentences) cannot be reliably reconstructed from the gathered audio. The subset is analyzed for audio characteristics, and a determination can be made regarding the ambient environment.
  • In some embodiments, a method of privacy-sensitive audio analysis is presented. The method may include capturing a subset of audio data contained in a continuous audio stream. The continuous audio stream may contain human speech. The subset of audio data may obscure content of the human speech. The method may include analyzing the subset of audio data for audio characteristics. The method may include making a determination of an ambient environment, based, at least in part, on the audio characteristics.
  • Embodiments of such a method may include one or more of the following: The subset of audio data may comprise a computed function of the continuous audio stream having a lesser number of bits than is needed to reproduce the continuous audio stream with intelligible fidelity. The subset of audio data may comprise a plurality of audio data segments, each audio data segment comprising data from a different temporal component of the continuous audio stream. The method may include making a determination of an identity of a person based, at least in part, on the audio characteristics. The plurality of audio data segments may comprise between 30 ms to 100 ms of recorded audio. Each temporal component of the continuous audio stream may be between 250 ms to 2s in length. The method may include randomly altering an order of the plurality of audio data segments before analyzing the subset of audio data. Randomly altering the order of the plurality of audio data segments may be based, at least in part, on information from one of: a Global Positioning System (GPS) device, signal noise from circuitry within a mobile device, signal noise from a microphone, and signal noise from an antenna.
  • In some embodiments, a device for obscuring privacy-sensitive audio is presented. The device may include a microphone. The device may include a processing unit communicatively coupled to the microphone. The processing unit may be configured to capture a subset of audio data contained in a continuous audio stream represented in a signal from the microphone. The continuous audio stream may contain human speech. The subset of audio data may obscure content of the human speech. The processing unit may be configured to analyze the subset of audio data for audio characteristics. The processing unit may be configured to make a determination of an ambient environment, based, at least in part, on the audio characteristics.
  • Embodiments of such a device may include one or more of the following: The subset of audio data may comprise a computed function of the continuous audio stream having a lesser number of bits than is needed to reproduce the continuous audio stream with intelligible fidelity. The subset of audio data may comprise a plurality of audio data segments, each audio data segment comprising data from a different temporal component of the continuous audio stream. The processing unit may be configured to make a determination of an identity of a person based, at least in part, on the audio characteristics. Each of the plurality of audio data segments may comprise between 30 ms to 100 ms of recorded audio. Each temporal component of the continuous audio stream may be between 250 ms to 2s in length. The processing unit may be further configured to randomly alter an order of the plurality of audio data segments before analyzing the subset of audio data. Randomly altering the order of the plurality of audio data segments may be based, at least in part, on information from one of: a Global Positioning System (GPS) device, signal noise from circuitry within a mobile device, signal noise from the microphone, and signal noise from an antenna.
  • In some embodiments, a system for determining an environment associated with a mobile device is presented. The system may include an audio sensor configured to receive a continuous audio stream. The system may include at least one processing unit coupled to the audio sensor. The processing unit may be configured to capture a subset of audio data contained in the continuous audio stream, such that the subset of audio data obscures content of human speech included in the continuous audio stream. The processing unit may be configured to analyze the subset of audio data for audio characteristics. The processing unit may be configured to make a determination of an ambient environment, based, at least in part, on the audio characteristics.
  • Embodiments of such a system may include one or more of the following: The system may include a network interface configured to send information representing the subset of audio data via a network to a location remote from the mobile device. The at least one processing unit may be configured to make the determination of the ambient environment at the location remote from the mobile device. The subset of audio data may comprise a plurality of audio data segments, each audio data segment comprising data from a different temporal component of the continuous audio stream.
  • The at least one processing unit may be configured to make a determination of an identity of a person based, at least in part, on the audio characteristics. Each of the plurality of audio data segments may comprise between 30 ms to 100 ms of recorded audio. Each temporal component of the continuous audio stream may be between 250 ms to 2s in length. The processing unit may be further configured to randomly alter an order of the plurality of audio data segments before analyzing the subset of audio data.
  • In some embodiments, a computer program product residing on a non-transitory processor-readable medium is presented. The non-transitory processor-readable medium includes processor-readable instructions configured to cause a processor to capture a subset of audio data contained in a continuous audio stream. The continuous audio stream may contain human speech. The subset of audio data may obscure content of the human speech. The processor-readable instructions may be configured to cause the processor to analyze the subset of audio data for audio characteristics. The processor-readable instructions may be configured to cause the processor to make a determination of an ambient environment, based, at least in part, on the audio characteristics.
  • Embodiments of such a computer program product may include one or more of the following: The subset of audio data may comprise a computed function of the continuous audio stream having a lesser number of bits than is needed to reproduce the continuous audio stream with intelligible fidelity. The subset of audio data may comprise a plurality of audio data segments, each audio data segment comprising data from a different temporal component of the continuous audio stream. The processor-readable instructions may be configured to cause the processor to make a determination of an identity of a person based, at least in part, on the audio characteristics. Each of the plurality of audio data segments may comprise between 30 ms to 100 ms of recorded audio. Each temporal component of the continuous audio stream may be between 250 ms to 2s in length. The processor-readable instructions may be configured to randomly alter an order of the plurality of audio data segments before analyzing the subset of audio data. The processor-readable instructions for randomly altering the order of the plurality of audio data segments may be based, at least in part, on information from one of: a Global Positioning System (GPS) device, signal noise from circuitry within a mobile device, signal noise from a microphone, and signal noise from an antenna.
  • In some embodiments, a device for obscuring privacy-sensitive audio is presented. The device may include means for capturing a subset of audio data contained in a continuous audio stream represented in a signal from a microphone. The continuous audio stream may contain human speech. The subset of audio data may obscure content of the human speech. The device may include means for analyzing the subset of audio data for audio characteristics. The device may include means for determining an ambient environment, based, at least in part, on the audio characteristics.
  • Embodiments of such a device may include one or more of the following: The means for capturing the subset of audio data may be configured to capture the subset of audio data in accordance with a computed function of the continuous audio stream having a lesser number of bits than is needed to reproduce the continuous audio stream with intelligible fidelity. The means for capturing the subset of audio data may be configured to capture the subset of audio data such that the subset of audio data comprises a plurality of audio data segments, each audio data segment comprising data from a different temporal component of the continuous audio stream. The means for determining the ambient environment may be configured to make a determination of an identity of a person based, at least in part, on the audio characteristics. The means for capturing the subset of audio data may be configured to capture the subset of audio data such that each of the plurality of audio data segments comprises between 30 ms to 100 ms of recorded audio.
  • Items and/or techniques described herein may provide one or more of the following capabilities, as well as other capabilities not mentioned. Obscuring of the content of speech that may be included in an audio stream used for a context determination while having little or no impact on the accuracy of the context determination. Utilizing a relatively simple method that can be executed in real time, using minimal processing resources. Including an ability to upload a subset of audio data (having obscured speech) to help improve the accuracy of models used in context determinations. While at least one item/technique-effect pair has been described, it may be possible for a noted effect to be achieved by means other than that noted, and a noted item/technique may not necessarily yield the noted effect.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • An understanding of the nature and advantages of various embodiments may be facilitated by reference to the following figures. In the appended figures, similar components or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label by a dash and a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.
  • FIG. 1 is a simplified block diagram of basic components of a mobile device configured to support context awareness applications, according to one embodiment.
  • FIGS. 2a-2c are visualizations of processes for capturing sufficient audio information to classify the ambient environment of a mobile device without performance degradation, while helping ensure privacy of speech.
  • FIGS. 3a and 3b are flow diagrams of methods for providing the functionality shown in FIGS. 2b and 2c.
  • FIG. 4 is a graph illustrating results of an analysis computing an upper bound on the probability of a speech recognizer reconstructing n-grams of words, from audio data resulting from certain processing methods described herein.
  • DETAILED DESCRIPTION
  • The following description is provided with reference to the drawings, where like reference numerals are used to refer to like elements throughout. While various details of one or more techniques are described herein, other techniques are also possible. In some instances, well-known structures and devices are shown in block diagram form in order to facilitate describing various techniques.
  • Mobile devices, such as personal digital assistants (PDAs), mobile phones, tablet computers, and other personal electronics, can be enabled with context awareness applications. These context awareness applications can determine, for example, where a user of the mobile device is and what the user might be doing, among other things. Such context determinations can help enable a mobile device to provide additional functionality to a user, such as enter a car mode after determining the user is in a car, or entering a silent mode when determining the user has entered a movie theater.
  • Techniques are described herein for preserving privacy in speech that may be captured in audio used for context determinations of mobile devices. More particularly, a subset of audio data may be captured from a continuous audio stream that may contain speech, whereby the nature of the sampling obscures any speech that might be contained in the continuous audio stream. However, the nature of the sampling also preserves certain audio characteristics of the continuous audio stream such that a context determination—such as a determination regarding a particular ambient environment of a mobile device—suffers little or no reduction in accuracy. These and other techniques are described in further detail below.
  • FIG. 1 is a simplified block diagram illustrating certain components of a mobile device 100 that can provide for context awareness, according to one embodiment. This diagram is an example and is not limiting. For example, the mobile device 100 may include additional components (e.g., user interface, antennas, display, etc.) omitted from FIG. 1 for simplicity. Additionally, the components shown may be combined, separated, or omitted, depending on the functionality of the mobile device 100.
  • In this embodiment, the mobile device 100 includes a mobile network interface 120. Such an interface can include hardware, software, and/or firmware for communicating with a mobile carrier. The mobile network interface 120 can utilize High Speed Packet Access (HSPA), Enhanced HSPA (HSPA+), 3GPP Long Term Evolution (LTE), and/or other standards for mobile communication. The mobile network interface 120 can also provide certain information, such as location data, that can be useful in context awareness applications.
  • Additionally, the mobile device 100 can include other wireless interface(s) 170. Such interfaces can include IEEE 802.11 (WiFi), Bluetooth®, and/or other wireless technologies. These wireless interface(s) 170 can provide information to the mobile device 100 that may be used in a context determination. For example, the wireless interface(s) 170 can provide information regarding location by determining the approximate location of a wireless network to which one or more of the wireless interface(s) 170 are connected. Additionally or alternatively, the wireless interface(s) 170 can enable the mobile device 100 to communicate with other devices, such as wireless headsets and/or microphones, which may provide information useful in determining a context of the mobile device 100.
  • The mobile device 100 also can include a global positioning system (GPS) unit 160, accelerometer(s) 130, and/or other sensor(s) 150. These additional features can provide information such as location, orientation, movement, temperature, proximity, etc. As with the wireless interface(s) 170, information from these components can help context awareness applications make a context determination regarding the context of the mobile device 100.
  • The mobile device 100 additionally can include an analysis/determination module(s) 110. Among other things, the analysis/determination module(s) 110 can receive sensor information from the various components to which it is communicatively coupled. The analysis/determination module(s) 110 also can execute software (including context awareness applications) stored on a memory 180, which can be separate from and/or integrated into the analysis/determination module(s) 110. Furthermore the analysis/determination module(s) 110 can comprise one or many processing devices, including a central processing unit (CPU), microprocessor, digital signal processor (DSP), and/or components that, among other things, have the means capable of analyzing audio data and making a determination based on the analysis.
  • Although information from wireless interfaces 170, GPS unit 160, accelerometer(s) 130, and/or other sensor(s) 150, can greatly assist in determining location when the user is outdoors, near identifiable WiFi or BlueTooth access points, walking, etc., these components have their limitations. In many scenarios they are less useful for determining environment and situation. For example, information from these components is less useful in distinguishing whether a user is in a meeting or in their office, or whether a user is in a grocery store or the gymnasium immediately next to it. In these scenarios and others, information from the audio capturing module 140 (e.g., microphone(s) and/or other audio capturing means) of the mobile device 100 can provide highly valuable audio data that can be used to help classify the environment, as well as determine whether there is speech present, whether there are multiple speakers present, the identity of a speaker, etc.
  • The process of capturing audio data by a mobile device 100 for a context determination can include temporarily and/or permanently storing audio data to the phone's memory 180. The capture of audio data that includes intelligible speech, however, can raise privacy issues. In fact, federal, state, and/or local laws may be implicated if the mobile device 100 captures speech from a user of the mobile device 100, or another person, without consent. These issues can be mitigated by using the hardware and/or software of the mobile device 100 to pre-process the audio data before it is captured such that speech (words, phrases and sentences) cannot be reliably reconstructed from the captured audio data. Moreover, the pre-processing can still allow determination of an ambient environment (e.g., from background noise) and/or other audio characteristics of the audio data, such as the presence of speech, music, typing sounds, etc.
  • FIG. 2 a is a visualization of a process for capturing sufficient audio information to classify a mobile device and/or user's situation/environment without performance degradation. Additionally the process can also help ensure that speech (words, phrases and sentences) cannot be reliably reconstructed from the captured information. This process involves reducing the dimensionality of an input audio stream. In other words, the bits (i.e., digital data) of an input stream of continuous audio are reduced such that the resultant audio stream has a lesser number of bits than is needed to reproduce the continuous audio stream with intelligible fidelity. Reducing the dimensionality therefore can be a computed function designed to ensure speech is irreproducible.
  • For example, a continuous audio stream can comprise a window 210 of audio data lasting Twindow seconds. The window 210 can be viewed as having a plurality of audio data segments. More specifically, the window 210 can comprise N temporal components, or blocks 220, where each block 220 lasts Tblock seconds and comprises a plurality of frames 230 of Tframe seconds each. A microphone signal can be sampled such that only one frame 230 (with Tframe seconds of data) is collected in every block of Tblock seconds.
  • The values of Tframe and Tblock can vary depending on desired functionality. In one embodiment, for example Tframe=50 ms and Tblock=500 ms, but these settings can vary substantially with little effect on the accuracy of a context determination that uses the resulting audio information 240-a. For example, Tframe can range from less than 30 ms to 100 ms or more, Tblock can range from less than 250 ms up to 2000 ms (2s) or more, and Twindow can be as short as a single block (e.g., one block per window), up to one minute or more. Different frame, block, and window lengths can impact the number of frames 230 per block 220 and the number of blocks 220 per window 210.
  • The capturing of frames 230 can be achieved in different ways. For example, the analysis/determination module(s) 110 can continuously sample the microphone signal during a window 210 of continuous audio, discarding (i.e., not storing) the unwanted frames 230. Thus, in the example above where Tframe=50 ms and Tblock=500 ms, the processing unit can simply discard 450 ms out of every 500 ms sampled. Additionally or alternatively, the analysis/determination module(s) 110 can turn the audio capturing module 140 off during the unwanted frames 230 (e.g., turning the audio capturing module 140 off for 450 ms out of every 500 ms), thereby collecting only the frames 230 that will be inserted into the resulting audio information 240-a used in a context determination.
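  • As an illustration only, the block/frame subsampling of FIG. 2 a can be sketched in a few lines of Python. This sketch is not part of the described embodiments; the 16 kHz sample rate, array layout, and function name are assumptions made purely for the example.

      import numpy as np

      def capture_frames(window_samples, sample_rate=16000, t_frame=0.05, t_block=0.5):
          # Keep only the first t_frame seconds of every t_block-second block
          # of one window of continuous audio (a 1-D array of samples).
          frame_len = int(t_frame * sample_rate)   # 50 ms  -> 800 samples
          block_len = int(t_block * sample_rate)   # 500 ms -> 8000 samples
          frames = []
          for start in range(0, len(window_samples), block_len):
              block = window_samples[start:start + block_len]
              if len(block) >= frame_len:           # ignore a trailing partial block
                  frames.append(block[:frame_len])  # discard the remaining 450 ms
          return frames

      # A 10-second window sampled at 16 kHz yields 20 captured frames.
      window = np.random.randn(10 * 16000)
      print(len(capture_frames(window)))  # -> 20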
  • The resulting audio information 240-a is a collection of frames 230 that comprises only a subset of the continuous audio stream in the window 210. Even so, this resulting audio information 240-a can include audio characteristics that help enable a context determination, such as determining an ambient environment, with no significant impact on the accuracy of the determination. Accordingly, the resulting audio information 240-a can be provided in real time to an application for context classification, and/or stored as one or more waveform(s) in memory 180 for later analysis and/or uploading to a server communicatively coupled to the mobile device 100.
  • FIGS. 2 b and 2 c are visualizations of processes for capturing audio information, similar to the process shown in FIG. 2 a. In FIGS. 2 b and 2 c, however, additional steps are taken to help ensure further privacy of any speech that may be captured.
  • Referring to FIG. 2 b, a visualization is provided illustrating how, for every window 210 of Twindow seconds, the first frame 230 of each block 220 can be captured. After the frame 230-1 of the final block 220 of the window 210 is captured, all the captured frames of the window 210 can be randomly permuted (i.e., randomly shuffled) to provide the resultant audio information 240-b. Thus, the resultant audio information 240-b is similar to the resulting audio information 240-a of FIG. 2 a, with the additional feature that the frames from which the resultant audio information 240-b is composed are randomized, thereby further decreasing the likelihood that any speech included in the resultant audio information 240-b could be reproduced with intelligible fidelity.
  • FIG. 2 c illustrates a process similar to the one shown in FIG. 2 b, but further randomizing the frame 230 captured for each block 220. More specifically, rather than capturing the first frame 230 of each block 220 of a window 210 as shown in FIGS. 2 a and 2 b, the process shown in FIG. 2 c selects a random frame 230 from each block 220. Randomizing both which frame 230 of each block 220 is captured and the ordering of the frames 230 in the resultant audio information 240-c helps further ensure that any speech contained in a continuous audio stream within a window 210 is obscured and irreproducible.
  • The randomization used in the processes shown in FIGS. 2 b and 2 c can be computed using a seed that is generated in numerous ways. For example, the seed may be based on GPS time provided by the GPS unit 160, noise from circuitry within the mobile device 100, noise (or other signal) from the audio capturing module 140, noise from an antenna, etc. Furthermore, the permutation can be discarded (e.g., not stored) to help ensure that the shuffling effect cannot be reversed.
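  • A minimal sketch of the randomized capture of FIGS. 2 b and 2 c follows, again purely illustrative. The use of numpy and of os.urandom as the entropy source is an assumption; as noted above, the seed could equally come from GPS time, circuit noise, microphone noise, or antenna noise, and neither the seed nor the permutation is retained.

      import os
      import numpy as np

      def capture_frames_randomized(window_samples, sample_rate=16000,
                                    t_frame=0.05, t_block=0.5):
          # Capture one randomly positioned frame from each block (FIG. 2 c),
          # then randomly permute the captured frames (FIGS. 2 b and 2 c).
          frame_len = int(t_frame * sample_rate)
          block_len = int(t_block * sample_rate)
          rng = np.random.default_rng(int.from_bytes(os.urandom(8), "little"))
          frames = []
          for start in range(0, len(window_samples) - block_len + 1, block_len):
              block = window_samples[start:start + block_len]
              offset = rng.integers(0, block_len - frame_len + 1)  # random frame position
              frames.append(block[offset:offset + frame_len])
          rng.shuffle(frames)  # shuffle frame order within the window
          del rng              # the seed/permutation is discarded, not stored
          return frames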
  • The processes shown in FIGS. 2 a, 2 b, and 2 c are provided as examples and are not limiting. Other embodiments are contemplated. For example, the blocks 220 may be randomly permuted before frames 230 are captured. Alternatively, frames 230 can be captured randomly throughout the entire window 210, rather than capturing one frame 230 per block 220.
  • FIG. 3 a is a flow diagram illustrating an embodiment of a method 300-1 for providing the functionality shown in FIGS. 2 b and 2 c. The method 300-1 can begin at stage 310, where a block 220 of audio data from a continuous audio stream is received. The continuous audio stream can be, for example, audio within a window 210 of time to which the audio capturing module 140 of a mobile device 100 is exposed.
  • At stage 320, a frame 230 of the block 220 of audio data is captured. As discussed earlier, the frame 230 can be a predetermined frame (e.g., the first frame) of each block 220 of audio data, or it can be randomly selected. The frame 230 is captured, for example, by being stored (either temporarily or permanently) in the memory 180 of a mobile device 100. As discussed previously, the capturing of a frame 230 can include turning an audio capturing module 140 on and off and/or sampling certain portions of a signal from an audio capturing module 140 representing a continuous audio stream.
  • At stage 330, it is determined whether there are additional blocks 220 in the current window 210. If so, the process of capturing a frame 230 from a block 220 is repeated. This can be repeated any number of times, depending on desired functionality. For example, where Tblock=500 ms and Twindow=10 seconds, the process of capturing a frame 230 will be repeated 20 times, resulting in 20 captured frames 230.
  • If frames 230 from all blocks 220 in the current window 210 have been captured, the process moves to stage 340, where the order of the captured frames is randomized. These randomized frames can be stored, for example, in an audio file used for analysis by a context awareness application. Finally, at stage 350, a determination of the ambient environment (or other context determination) is made based, at least in part, on audio characteristics of the randomized frames.
  • Different stages of the method 300-1 may be performed by one or more different components of the mobile device 100 and/or other systems communicatively coupled with the mobile device 100. Moreover, stages can be performed by any combination of hardware, software, and/or firmware. For example, to help ensure that an entire audio stream (e.g., an audio stream that may have recognizable speech) is inaccessible to software applications executed by the mobile device 100, certain stages, such as stages 320-340, can be performed by hardware (such as the analysis/determination module(s) 110), randomizing captured frames, for instance, in a buffer before storing them in the memory 180 and/or providing them to a software application. Additionally or alternatively, some embodiments may enable certain parameters (e.g., Twindow, Tblock, and/or Tframe) to be at least partially configurable by software.
  • In yet other embodiments, a mobile device 100 may upload the resultant audio information 240 including the captured frames to a remote server. In this case, the remote server can make the determination of the ambient environment of stage 350. Alternatively, the mobile device 100 can upload the resultant audio information 240 along with a determination of the ambient environment made by the mobile device 100. In either case, the remote server can use the determination and the resultant audio information 240 to modify existing models used to make ambient environment determinations. This enables the server to maintain models that are able to “learn” from input received by mobile devices 100. Modified and/or updated models then can be downloaded to mobile devices 100 to help improve the accuracy of ambient environment determinations made by the mobile devices 100. Thus, ambient environment determinations (or other contextual determinations) can be continually improved.
  • As indicated above, the techniques described herein can allow determination not only of an ambient environment and/or other contextual information, but of other audio characteristics of the audio data as well. These audio characteristics can include the presence of speech, music, typing sounds, and more. Depending on the audio characteristics included, different determinations may be made.
  • FIG. 3 b is a flow diagram illustrating an example of a method 300-2, which includes stages similar to those of the method 300-1 of FIG. 3 a. The method 300-2 of FIG. 3 b, however, includes an additional stage 360 where a determination is made regarding the identity of the speaker(s) whose speech is included in the captured frames used to make a determination of an ambient environment. As with stage 350, the determination of stage 360 can be made by the mobile device 100 and/or a remote server to which the captured frames are uploaded. Additionally, the determination regarding identity can include the use of other information and/or models, such as models that help determine the age, gender, etc. of the speaker, stored information regarding audio characteristics of a particular person's speech, and other data.
  • Listening to captured audio files generated by the processes discussed above makes clear that words cannot be reliably reconstructed under this scheme. This notion can also be demonstrated mathematically by performing an analysis to compute an upper bound on the probability of a speech recognizer reconstructing an n-gram of words (a collection of n consecutive words) given the collected audio data, using publicly-available sources for developing commercial speech recognizers.
  • FIG. 4 is a graph illustrating the results of such an analysis, showing the upper bounds on probability of correctly guessing an n-gram given collected audio. Results are shown for correctly reconstructing a 1-gram 410 and 2-gram 420 where Tframe=50 ms, for variable lengths of Tblock. The probability of reconstructing an n-gram intuitively decreases with increasing n. This can be seen from FIG. 4 where, for Tblock=500 ms, the probability of correctly reconstructing a 1-gram 410 is 14%, while the probability of correctly reconstructing a 2-gram 420 is 8%. (It should be noted that this analysis does not include the permutation of the frames discussed herein, which can obscure language even further, reducing probability by roughly a factor of (Twindow/Tblock) factorial.)
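  • To make the permutation factor concrete: with the Twindow=10 s and Tblock=500 ms example used earlier, a window contains 20 captured frames, so there are 20! (roughly 2.4×10^18) equally likely frame orderings. The following arithmetic is illustrative only.

      import math

      t_window, t_block = 10.0, 0.5
      n_frames = int(t_window / t_block)    # 20 frames per window
      orderings = math.factorial(n_frames)  # (Twindow/Tblock)! possible permutations
      print(n_frames, orderings)            # 20 2432902008176640000 (about 2.4e18)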
  • Despite the reduced probabilities of reconstructing speech, the techniques discussed herein have no significant impact on the ability of classifiers (e.g., probabilistic classifiers used in context awareness applications) to discern the environment of the user. This is demonstrated in Table 1, which shows the precision and recall of a context awareness classifier, with statistical models having one mixture component and two mixture components, where Tframe=50 ms and Tblock is variable. The data used was a commercially acquired audio data set of environmental sounds of a set of environments (e.g., in a park, on a street, in a market, in a car, in an airport, etc.) common among context awareness applications.
  • TABLE 1

                      1 mixture component       2 mixture components
        Tblock      Precision (%)  Recall (%)  Precision (%)  Recall (%)
         50 ms          47.2          47.4         49.4          46.2
        250 ms          48.2          47.5         48.6          42.7
        500 ms          48.7          48.7         48.6          40.7
           1 s          48.0          45.8         43.9          33.3
           2 s          38.0          39.4         43.8          27.4
  • Because Tframe=50 ms, the precision and recall shown in Table 1 for Tblock=50 ms correspond to continuous audio (every frame is captured). Table 1 thus indicates that reducing the dimensionality of the audio data by sampling only subsets of a continuous audio stream can have little impact on the accuracy of the classifier's determination of an ambient environment until Tblock approaches 2 seconds (i.e., the microphone is on for only 50 ms out of every 2 seconds, or 2.5% of the time). Results may differ for different classifiers.
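  • The disclosure does not specify the classifier used to generate Table 1 beyond its use of statistical models with one or two mixture components. As a hedged sketch only, one plausible shape for such a classifier is a Gaussian mixture model per environment class over per-frame features; the feature representation, the scikit-learn API usage, and the class labels below are assumptions for illustration, not details from the disclosure.

      import numpy as np
      from sklearn.mixture import GaussianMixture

      class EnvironmentClassifier:
          # One GMM per environment class (e.g., park, street, market, car, airport).
          # Inputs are per-frame feature vectors computed from the captured,
          # shuffled frames; frame order does not matter to this kind of model.
          def __init__(self, n_components=2):
              self.n_components = n_components
              self.models = {}

          def fit(self, training_data):
              # training_data: dict mapping class label -> (n_frames, n_dims) array
              for label, feats in training_data.items():
                  gmm = GaussianMixture(n_components=self.n_components)
                  gmm.fit(feats)
                  self.models[label] = gmm

          def predict(self, features):
              # Sum per-frame log-likelihoods and return the best-scoring class.
              scores = {label: gmm.score_samples(features).sum()
                        for label, gmm in self.models.items()}
              return max(scores, key=scores.get)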
  • The methods, systems, devices, graphs, and tables discussed above are examples. Various configurations may omit, substitute, or add various procedures or components as appropriate. For instance, in alternative configurations, the methods may be performed in an order different from that described, and/or various stages may be added, omitted, and/or combined. Also, features described with respect to certain configurations may be combined in various other configurations. Different aspects and elements of the configurations may be combined in a similar manner. Also, technology evolves and, thus, many of the elements are examples and do not limit the scope of the disclosure or claims. Additionally, the techniques discussed herein may provide differing results with different types of context awareness classifiers.
  • Specific details are given in the description to provide a thorough understanding of example embodiments (including implementations). However, embodiments may be practiced without these specific details. For example, well-known circuits, processes, algorithms, structures, and techniques have been shown without unnecessary detail in order to avoid obscuring the configurations. This description provides example configurations only, and does not limit the scope, applicability, or configurations of the claims. Rather, the preceding description of the configurations will provide those skilled in the art with an enabling description for implementing described techniques. Various changes may be made in the function and arrangement of elements without departing from the spirit or scope of the disclosure.
  • Also, configurations may be described as a process which is depicted as a flow diagram or block diagram. Although each may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process may have additional steps not included in the figure.
  • Computer programs incorporating various features of the present invention may be encoded on various non-transitory computer-readable and/or non-transitory processor-readable storage media; suitable media include magnetic media, optical media, flash memory, and other non-transitory media. Non-transitory processor-readable storage media encoded with the program code may be packaged with a compatible device or provided separately from other devices. In addition, program code may be encoded and transmitted via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet, thereby allowing distribution, e.g., via Internet download.
  • Having described several example configurations, various modifications, alternative constructions, and equivalents may be used without departing from the spirit of the disclosure. For example, the above elements may be components of a larger system, wherein other rules may take precedence over or otherwise modify the application of the invention. Also, a number of steps may be undertaken before, during, or after the above elements are considered. Accordingly, the above description does not bound the scope of the claims.

Claims (31)

1. (canceled)
2. A method for performing an audio analysis, the method comprising:
receiving, by a computerized device, a continuous audio stream;
capturing, by the computerized device, from the continuous audio stream, a plurality of audio frames from a plurality of audio blocks of the continuous audio stream, wherein:
each audio block of the plurality of audio blocks includes multiple audio frames; and
a single audio frame is captured from each audio block of the plurality of audio blocks;
analyzing, by the computerized device, the plurality of audio frames; and
determining, based on analyzing the plurality of audio frames, a characteristic of an ambient environment of the continuous audio stream.
3. The method for performing the audio analysis of claim 2, wherein the continuous audio stream comprises human speech.
4. The method for performing the audio analysis of claim 3, the method further comprising:
determining, by the computerized device, based on analyzing the plurality of audio frames, an identity of a speaker of the human speech.
5. The method for performing the audio analysis of claim 2, the method further comprising:
shuffling, by the computerized device, the plurality of audio frames into a shuffled order, wherein analyzing the plurality of audio frames comprises analyzing the plurality of audio frames in the shuffled order.
6. The method for performing the audio analysis of claim 2, the method further comprising:
for each audio frame, randomizing, by the computerized device, selection of the audio frame from the multiple audio frames present within the corresponding audio block of the plurality of audio blocks.
7. The method for performing the audio analysis of claim 6, wherein randomizing selection of the audio frame is based, at least in part, on information selected from a source of the group comprising:
a global navigation satellite system (GNSS) device,
signal noise from circuitry within a mobile device,
signal noise from a microphone, and
signal noise from an antenna.
8. The method for performing the audio analysis of claim 2, further comprising:
uploading, by the computerized device, the plurality of audio frames to a remote server system, wherein determining, based on analyzing the plurality of audio frames, the characteristic of the ambient environment of the continuous audio stream is performed by the remote server system.
9. The method for performing the audio analysis of claim 2, wherein receiving the continuous audio stream occurs via a microphone of the computerized device and the computerized device is a cellular phone.
10. A system for performing an audio analysis, the system comprising:
one or more processors; and
a memory communicatively coupled with and readable by the one or more processors and having stored therein processor-readable instructions which, when executed by the one or more processors, cause the one or more processors to:
capture, from a continuous audio stream, a plurality of audio frames from a plurality of audio blocks of the continuous audio stream, wherein:
each audio block of the plurality of audio blocks includes multiple audio frames; and
a single audio frame is captured from each audio block of the plurality of audio blocks;
analyze the plurality of audio frames; and
determine, based on analyzing the plurality of audio frames, a characteristic of an ambient environment of the continuous audio stream.
11. The system for performing the audio analysis of claim 10, wherein the continuous audio stream captured by the processor comprises human speech.
12. The system for performing the audio analysis of claim 11, wherein the processor-readable instructions, when executed, further cause the one or more processors to:
determine, based on analyzing the plurality of audio frames, an identity of a speaker of the human speech.
13. The system for performing the audio analysis of claim 10, wherein the processor-readable instructions, when executed, further cause the one or more processors to:
shuffle the plurality of audio frames into a shuffled order, wherein analyzing the plurality of audio frames comprises analyzing the plurality of audio frames in the shuffled order.
14. The system for performing the audio analysis of claim 10, wherein the processor-readable instructions, when executed, further cause the one or more processors to:
for each audio frame, randomize selection of the audio frame from the multiple audio frames present within the corresponding audio block of the plurality of audio blocks.
15. The system for performing the audio analysis of claim 14, wherein the processor-readable instructions that, when executed, cause the one or more processors to randomize selection of the audio frame bases the randomization, at least in part, on information selected from a source of the group comprising:
a global navigation satellite system (GNSS) device,
signal noise from circuitry within a mobile device,
signal noise from a microphone, and
signal noise from an antenna.
16. The system for performing the audio analysis of claim 10, wherein the system is implemented as part of a cellular phone comprising a microphone.
17. A non-transitory processor-readable medium for performing an audio analysis, comprising processor-readable instructions configured to cause one or more processors to:
capture, from a continuous audio stream, a plurality of audio frames from a plurality of audio blocks of the continuous audio stream, wherein:
each audio block of the plurality of audio blocks includes multiple audio frames; and
a single audio frame is captured from each audio block of the plurality of audio blocks;
analyze the plurality of audio frames; and
determine, based on analyzing the plurality of audio frames, a characteristic of an ambient environment of the continuous audio stream.
18. The non-transitory processor-readable medium for performing the audio analysis of claim 17, wherein the continuous audio stream captured by the processor comprises human speech.
19. The non-transitory processor-readable medium for performing the audio analysis of claim 18, wherein the processor-readable instructions are further configured to cause the one or more processors to:
determine, based on analyzing the plurality of audio frames, an identity of a speaker of the human speech.
20. The non-transitory processor-readable medium for performing the audio analysis of claim 17, wherein the processor-readable instructions are further configured to cause the one or more processors to:
shuffle the plurality of audio frames into a shuffled order, wherein analyzing the plurality of audio frames comprises analyzing the plurality of audio frames in the shuffled order.
21. The non-transitory processor-readable medium for performing the audio analysis of claim 17, wherein the processor-readable instructions are further configured to cause the one or more processors to:
for each audio frame, randomize selection of the audio frame from the multiple audio frames present within the corresponding audio block of the plurality of audio blocks.
22. The non-transitory processor-readable medium for performing the audio analysis of claim 21, wherein the processor-readable instructions configured to cause the one or more processors to randomize selection of the audio frame bases the randomization, at least in part, on information selected from a source of the group comprising:
a global navigation satellite system (GNSS) device,
signal noise from circuitry within a mobile device,
signal noise from a microphone, and
signal noise from an antenna.
23. The non-transitory processor-readable medium for performing the audio analysis of claim 17, wherein the non-transitory processor-readable medium is implemented as part of a cellular phone comprising a microphone.
24. An apparatus for performing an audio analysis, the apparatus comprising:
means for receiving a continuous audio stream;
means for capturing from the continuous audio stream, a plurality of audio frames from a plurality of audio blocks of the continuous audio stream, wherein:
each audio block of the plurality of audio blocks includes multiple audio frames; and
a single audio frame is captured from each audio block of the plurality of audio blocks;
means for analyzing the plurality of audio frames; and
means for determining, based on analyzing the plurality of audio frames, a characteristic of an ambient environment of the continuous audio stream.
25. The apparatus for performing the audio analysis of claim 24, wherein the continuous audio stream comprises human speech.
26. The apparatus for performing the audio analysis of claim 25, the apparatus further comprising:
means for determining, based on analyzing the plurality of audio frames, an identity of a speaker of the human speech.
27. The apparatus for performing the audio analysis of claim 24, the apparatus further comprising:
means for shuffling the plurality of audio frames into a shuffled order, wherein analyzing the plurality of audio frames comprises analyzing the plurality of audio frames in the shuffled order.
28. The apparatus for performing the audio analysis of claim 24, the apparatus further comprising:
means for randomizing, for each frame, selection of the audio frame from the multiple audio frames present within the corresponding audio block of the plurality of audio blocks.
29. The apparatus for performing the audio analysis of claim 28, wherein the means for randomizing selection of the audio frame bases randomization, at least in part, on information selected from a source of the group consisting of:
a global navigation satellite system (GNSS) device,
signal noise from circuitry within a mobile device,
signal noise from a microphone, and
signal noise from an antenna.
30. The apparatus for performing the audio analysis of claim 24, further comprising:
means for uploading the plurality of audio frames to a remote server system, wherein the means for determining, based on analyzing the plurality of audio frames, the characteristic of the ambient environment of the continuous audio stream is present at the remote server system.
31. The apparatus for performing the audio analysis of claim 24, wherein the apparatus is integrated as part of a cellular phone.
US14/186,730 2011-05-23 2014-02-21 Preserving audio data collection privacy in mobile devices Abandoned US20140172424A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/186,730 US20140172424A1 (en) 2011-05-23 2014-02-21 Preserving audio data collection privacy in mobile devices

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201161488927P 2011-05-23 2011-05-23
US13/213,294 US8700406B2 (en) 2011-05-23 2011-08-19 Preserving audio data collection privacy in mobile devices
US14/186,730 US20140172424A1 (en) 2011-05-23 2014-02-21 Preserving audio data collection privacy in mobile devices

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/213,294 Continuation US8700406B2 (en) 2011-05-23 2011-08-19 Preserving audio data collection privacy in mobile devices

Publications (1)

Publication Number Publication Date
US20140172424A1 true US20140172424A1 (en) 2014-06-19

Family

ID=46178795

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/213,294 Active 2032-03-09 US8700406B2 (en) 2011-05-23 2011-08-19 Preserving audio data collection privacy in mobile devices
US14/186,730 Abandoned US20140172424A1 (en) 2011-05-23 2014-02-21 Preserving audio data collection privacy in mobile devices

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US13/213,294 Active 2032-03-09 US8700406B2 (en) 2011-05-23 2011-08-19 Preserving audio data collection privacy in mobile devices

Country Status (6)

Country Link
US (2) US8700406B2 (en)
EP (1) EP2715722B1 (en)
JP (1) JP5937202B2 (en)
KR (1) KR101580510B1 (en)
CN (1) CN103620680B (en)
WO (1) WO2012162009A1 (en)


Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130090926A1 (en) * 2011-09-16 2013-04-11 Qualcomm Incorporated Mobile device context information using speech detection
EP2590440B1 (en) * 2011-09-30 2019-10-30 Orange Method, apparatuses and application for the contextual obscuring attributes of a user profile
US8925037B2 (en) * 2013-01-02 2014-12-30 Symantec Corporation Systems and methods for enforcing data-loss-prevention policies using mobile sensors
US9300266B2 (en) * 2013-02-12 2016-03-29 Qualcomm Incorporated Speaker equalization for mobile devices
KR102149266B1 (en) * 2013-05-21 2020-08-28 삼성전자 주식회사 Method and apparatus for managing audio data in electronic device
US9305317B2 (en) 2013-10-24 2016-04-05 Tourmaline Labs, Inc. Systems and methods for collecting and transmitting telematics data from a mobile device
US10057764B2 (en) * 2014-01-18 2018-08-21 Microsoft Technology Licensing, Llc Privacy preserving sensor apparatus
JP6215129B2 (en) * 2014-04-25 2017-10-18 京セラ株式会社 Portable electronic device, control method and control program
US10404697B1 (en) 2015-12-28 2019-09-03 Symantec Corporation Systems and methods for using vehicles as information sources for knowledge-based authentication
US10326733B2 (en) 2015-12-30 2019-06-18 Symantec Corporation Systems and methods for facilitating single sign-on for multiple devices
US10116513B1 (en) 2016-02-10 2018-10-30 Symantec Corporation Systems and methods for managing smart building systems
US10375114B1 (en) 2016-06-27 2019-08-06 Symantec Corporation Systems and methods for enforcing access-control policies
US10462184B1 (en) 2016-06-28 2019-10-29 Symantec Corporation Systems and methods for enforcing access-control policies in an arbitrary physical space
US10469457B1 (en) 2016-09-26 2019-11-05 Symantec Corporation Systems and methods for securely sharing cloud-service credentials within a network of computing devices
US10812981B1 (en) 2017-03-22 2020-10-20 NortonLifeLock, Inc. Systems and methods for certifying geolocation coordinates of computing devices
GB2567703B (en) * 2017-10-20 2022-07-13 Cirrus Logic Int Semiconductor Ltd Secure voice biometric authentication
DE102019108178B3 (en) * 2019-03-29 2020-06-18 Tribe Technologies Gmbh Method and device for automatic monitoring of telephone calls
US11354085B2 (en) 2019-07-03 2022-06-07 Qualcomm Incorporated Privacy zoning and authorization for audio rendering
KR20210100368A (en) 2020-02-06 2021-08-17 삼성전자주식회사 Electronice device and control method thereof


Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7143028B2 (en) * 2002-07-24 2006-11-28 Applied Minds, Inc. Method and system for masking speech
JP4206876B2 (en) * 2003-09-10 2009-01-14 ヤマハ株式会社 Communication device and program for transmitting state of remote place
JP2006238110A (en) * 2005-02-25 2006-09-07 Matsushita Electric Ind Co Ltd Monitoring system
JP4914319B2 (en) * 2007-09-18 2012-04-11 日本電信電話株式会社 COMMUNICATION VOICE PROCESSING METHOD, DEVICE THEREOF, AND PROGRAM THEREOF
US8140326B2 (en) * 2008-06-06 2012-03-20 Fuji Xerox Co., Ltd. Systems and methods for reducing speech intelligibility while preserving environmental sounds
JP5222680B2 (en) * 2008-09-26 2013-06-26 セコム株式会社 Terminal user monitoring apparatus and system
US9159324B2 (en) * 2011-07-01 2015-10-13 Qualcomm Incorporated Identifying people that are proximate to a mobile device user via social graphs, speech models, and user context
US20130006633A1 (en) * 2011-07-01 2013-01-03 Qualcomm Incorporated Learning speech models for mobile device users
US20130090926A1 (en) * 2011-09-16 2013-04-11 Qualcomm Incorporated Mobile device context information using speech detection

Patent Citations (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4221931A (en) * 1977-10-17 1980-09-09 Harris Corporation Time division multiplied speech scrambler
US4600941A (en) * 1982-12-17 1986-07-15 Sony Corporation Scrambling system for audio frequency signals
US5267312A (en) * 1990-08-06 1993-11-30 Nec Home Electronics, Ltd. Audio signal cryptographic system
US5583888A (en) * 1993-09-13 1996-12-10 Nec Corporation Vector quantization of a time sequential signal by quantizing an error between subframe and interpolated feature vectors
US6018706A (en) * 1996-01-26 2000-01-25 Motorola, Inc. Pitch determiner for a speech analyzer
US20070174059A1 (en) * 1996-05-16 2007-07-26 Rhoads Geoffrey B Methods, Systems, and Sub-Combinations Useful in Media Identification
US6078666A (en) * 1996-10-25 2000-06-20 Matsushita Electric Industrial Co., Ltd. Audio signal processing method and related device with block order switching
US20140064484A1 (en) * 1998-03-16 2014-03-06 Intertrust Technologies Corporation Methods and apparatus for persistent control and protection of content
US6119086A (en) * 1998-04-28 2000-09-12 International Business Machines Corporation Speech coding via speech recognition and synthesis based on pre-enrolled phonetic tokens
US6978235B1 (en) * 1998-05-11 2005-12-20 Nec Corporation Speech coding apparatus and speech decoding apparatus
US20090041235A1 (en) * 1998-08-20 2009-02-12 Akikaze Technologies, Llc Secure Information Distribution System Utilizing Information Segment Scrambling
US7263489B2 (en) * 1998-12-01 2007-08-28 Nuance Communications, Inc. Detection of characteristics of human-machine interactions for dialog customization and analysis
US6937730B1 (en) * 2000-02-16 2005-08-30 Intel Corporation Method and system for providing content-specific conditional access to digital content
US20080222734A1 (en) * 2000-11-13 2008-09-11 Redlich Ron M Security System with Extraction, Reconstruction and Secure Recovery and Storage of Data
US7177808B2 (en) * 2000-11-29 2007-02-13 The United States Of America As Represented By The Secretary Of The Air Force Method for improving speaker identification by determining usable speech
US20040059918A1 (en) * 2000-12-15 2004-03-25 Changsheng Xu Method and system of digital watermarking for compressed audio
US20020099955A1 (en) * 2001-01-23 2002-07-25 Vidius Inc. Method for securing digital content
US20070180534A1 (en) * 2001-04-09 2007-08-02 Toshihiro Ishizaka Recording apparatus that records information for protecting intangible property right, recording method thereof, record medium thereof, and program thereof
US20030138100A1 (en) * 2001-04-09 2003-07-24 Toshihiro Ishizaka Recording apparatus, recording method, recording medium, and program for recording information protecting intangible property right
US8520843B2 (en) * 2001-08-07 2013-08-27 Fraunhofer-Gesellscaft zur Foerderung der Angewandten Forschung E.V. Method and apparatus for encrypting a discrete signal, and method and apparatus for decrypting
US20060143018A1 (en) * 2002-09-06 2006-06-29 Densham Rodney H Processing digital data
US20050289063A1 (en) * 2002-10-21 2005-12-29 Medialive, A Corporation Of France Adaptive and progressive scrambling of audio streams
US20060167682A1 (en) * 2002-10-21 2006-07-27 Medialive Adaptive and progressive audio stream descrambling
US20050180311A1 (en) * 2004-02-17 2005-08-18 Nokia Corporation OFDM transceiver structure with time-domain scrambling
US7720012B1 (en) * 2004-07-09 2010-05-18 Arrowhead Center, Inc. Speaker identification in the presence of packet losses
US20080293397A1 (en) * 2005-05-16 2008-11-27 Sony Ericsson Mobile Communications Ab Method for Disabling a Mobile Device
US20070110237A1 (en) * 2005-07-07 2007-05-17 Verance Corporation Watermarking in an encrypted domain
US20080223627A1 (en) * 2005-10-19 2008-09-18 Immersion Corporation, A Delaware Corporation Synchronization of haptic effect data in a media transport stream
US20140226068A1 (en) * 2005-10-19 2014-08-14 Immersion Corporation Synchronization of haptic effect data in a media transport stream
US20110035034A1 (en) * 2006-01-06 2011-02-10 Google Inc. Serving Media Articles with Altered Playback Speed
US20070276661A1 (en) * 2006-04-24 2007-11-29 Ivan Dimkovic Apparatus and Methods for Encoding Digital Audio Data with a Reduced Bit Rate
US20090307779A1 (en) * 2006-06-28 2009-12-10 Hyperquality, Inc. Selective Security Masking within Recorded Speech
US20080243492A1 (en) * 2006-09-07 2008-10-02 Yamaha Corporation Voice-scrambling-signal creation method and apparatus, and computer-readable storage medium therefor
US20080215315A1 (en) * 2007-02-20 2008-09-04 Alexander Topchy Methods and appratus for characterizing media
US8050931B2 (en) * 2007-03-22 2011-11-01 Yamaha Corporation Sound masking system and masking sound generation method
US20090003600A1 (en) * 2007-06-29 2009-01-01 Widevine Technologies, Inc. Progressive download or streaming of digital media securely through a localized container and communication protocol proxy
US20090103728A1 (en) * 2007-10-09 2009-04-23 Sarvar Patel Secure wireless communication
US20090125304A1 (en) * 2007-11-13 2009-05-14 Samsung Electronics Co., Ltd Method and apparatus to detect voice activity
US20110224986A1 (en) * 2008-07-21 2011-09-15 Clive Summerfield Voice authentication systems and methods
US20100063803A1 (en) * 2008-09-06 2010-03-11 GH Innovation, Inc. Spectrum Harmonic/Noise Sharpness Control
US8244531B2 (en) * 2008-09-28 2012-08-14 Avaya Inc. Method of retaining a media stream without its private audio content
US20100114568A1 (en) * 2008-10-24 2010-05-06 Lg Electronics Inc. Apparatus for processing an audio signal and method thereof
US20100114344A1 (en) * 2008-10-31 2010-05-06 France Telecom Communication system incorporating ambient sound pattern detection and method of operation thereof
WO2010054373A2 (en) * 2008-11-10 2010-05-14 Google Inc. Multisensory speech detection
US20100121636A1 (en) * 2008-11-10 2010-05-13 Google Inc. Multisensory Speech Detection
US20130013316A1 (en) * 2008-11-10 2013-01-10 Google Inc. Multisensory Speech Detection
US20120278074A1 (en) * 2008-11-10 2012-11-01 Google Inc. Multisensory speech detection
US8428272B2 (en) * 2009-02-19 2013-04-23 Yamaha Corporation Masking sound generating apparatus, masking system, masking sound generating method, and program
US20100277579A1 (en) * 2009-04-30 2010-11-04 Samsung Electronics Co., Ltd. Apparatus and method for detecting voice based on motion information
US20110077946A1 (en) * 2009-09-30 2011-03-31 International Business Machines Corporation Deriving geographic distribution of physiological or psychological conditions of human speakers while preserving personal privacy
US20120310645A1 (en) * 2010-01-26 2012-12-06 Google Inc. Integration of embedded and network speech recognizers
US8861742B2 (en) * 2010-01-26 2014-10-14 Yamaha Corporation Masker sound generation apparatus and program
US20110208507A1 (en) * 2010-02-19 2011-08-25 Google Inc. Speech Correction for Typed Input
US20110218798A1 (en) * 2010-03-05 2011-09-08 Nexdia Inc. Obfuscating sensitive content in audio sources
US20110216905A1 (en) * 2010-03-05 2011-09-08 Nexidia Inc. Channel compression
US20120084089A1 (en) * 2010-09-30 2012-04-05 Google Inc. Progressive encoding of audio
US20120136658A1 (en) * 2010-11-30 2012-05-31 Cox Communications, Inc. Systems and methods for customizing broadband content based upon passive presence detection of users
US20120173880A1 (en) * 2010-12-29 2012-07-05 Viswanathan Swaminathan System And Method For Decrypting Content Samples Including Distinct Encryption Chains
US20120203491A1 (en) * 2011-02-03 2012-08-09 Nokia Corporation Method and apparatus for providing context-aware control of sensors and sensor data
US20120245941A1 (en) * 2011-03-21 2012-09-27 Cheyer Adam J Device Access Using Voice Authentication
US20130035893A1 (en) * 2011-03-31 2013-02-07 Qualcomm Incorporated Methods, devices, and apparatuses for activity classification using temporal scaling of time-referenced features

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140278391A1 (en) * 2013-03-12 2014-09-18 Intermec Ip Corp. Apparatus and method to classify sound to detect speech
US9076459B2 (en) * 2013-03-12 2015-07-07 Intermec Ip, Corp. Apparatus and method to classify sound to detect speech
US9299344B2 (en) 2013-03-12 2016-03-29 Intermec Ip Corp. Apparatus and method to classify sound to detect speech
US10540521B2 (en) 2017-08-24 2020-01-21 International Business Machines Corporation Selective enforcement of privacy and confidentiality for optimization of voice applications
US11113419B2 (en) 2017-08-24 2021-09-07 International Business Machines Corporation Selective enforcement of privacy and confidentiality for optimization of voice applications
WO2021107218A1 (en) * 2019-11-29 2021-06-03 주식회사 공훈 Method and device for protecting privacy of voice data

Also Published As

Publication number Publication date
US20120303360A1 (en) 2012-11-29
EP2715722A1 (en) 2014-04-09
KR20140021681A (en) 2014-02-20
CN103620680B (en) 2015-12-23
JP5937202B2 (en) 2016-06-22
JP2014517939A (en) 2014-07-24
KR101580510B1 (en) 2015-12-28
WO2012162009A1 (en) 2012-11-29
EP2715722B1 (en) 2018-06-13
US8700406B2 (en) 2014-04-15
CN103620680A (en) 2014-03-05

Similar Documents

Publication Publication Date Title
US8700406B2 (en) Preserving audio data collection privacy in mobile devices
CN110544488B (en) Method and device for separating multi-person voice
JP6730435B2 (en) System, method and program
US11482242B2 (en) Audio recognition method, device and server
EP2994911B1 (en) Adaptive audio frame processing for keyword detection
US11153430B2 (en) Information presentation method and device
CN106033419B (en) Method, device and system for pushing messages in real time
US10433256B2 (en) Application control method and application control device
WO2018228167A1 (en) Navigation method and related product
CN110069624A (en) Text handling method and device
CN110875036A (en) Voice classification method, device, equipment and computer readable storage medium
US9818427B2 (en) Automatic self-utterance removal from multimedia files
CN110347875B (en) Video scene classification method and device, mobile terminal and storage medium
CN106485246B (en) Character identifying method and device
CN107659603B (en) Method and device for interaction between user and push information
CN111787149A (en) Noise reduction processing method, system and computer storage medium
CN112926623A (en) Method, device, medium and electronic equipment for identifying composite video
CN116597828B (en) Model determination method, model application method and related device
WO2023160515A1 (en) Video processing method and apparatus, device and medium
CN114582332B (en) Audio processing method, device and storage medium
CN111460214B (en) Classification model training method, audio classification method, device, medium and equipment
CN108073566A (en) Segmenting method and device, the device for participle
CN106776659B (en) Method and device for sequencing search results based on scenic spot component identification, and user terminal
CN117114073A (en) Data processing method, device, equipment and medium
CN111460214A (en) Classification model training method, audio classification method, device, medium and equipment

Legal Events

Date Code Title Description
AS Assignment

Owner name: QUALCOMM INCORPORATED, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GROKOP, LEONARD H.;NARAYANAN, VIDYA;DOLTER, JAMES W.;AND OTHERS;SIGNING DATES FROM 20110824 TO 20110914;REEL/FRAME:032284/0729

AS Assignment

Owner name: QUALCOMM INCORPORATED, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GROKOP, LEONARD H.;NARAYANAN, VIDYA;DOLTER, JAMES W.;AND OTHERS;SIGNING DATES FROM 20110824 TO 20110914;REEL/FRAME:034308/0092

STCB Information on status: application discontinuation

Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION