CA2538981A1 - Method and device for processing audiovisual data using speech recognition - Google Patents

Method and device for processing audiovisual data using speech recognition

Info

Publication number
CA2538981A1
Authority
CA
Canada
Prior art keywords
basic units
audio signal
time codes
providing
recognized speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA002538981A
Other languages
French (fr)
Other versions
CA2538981C (en)
Inventor
Jocelyne Cote
Howard Ryshpan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
INDEKSO Inc
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Publication of CA2538981A1 publication Critical patent/CA2538981A1/en
Application granted granted Critical
Publication of CA2538981C publication Critical patent/CA2538981C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Classifications

    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/34Indicating arrangements 
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/034Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel

Abstract

A method and apparatus are disclosed for producing an audiovisual work. The method and apparatus are based on speech recognition. Extraction of basic units of speech with related time codes is performed. The invention may be advantageously used for performing post-production synchronization of a video source, dubbing assisting, closed-captioning assisting and animation generation assisting.

Claims (33)

1. A method for producing an audiovisual work, the method comprising the steps of:
providing an audio signal to a speech recognition module;
performing a speech recognition of said audio signal, the speech recognition comprising an extracting of a series of basic units of recognized speech and related time codes;
receiving the basic units of recognized speech and the related time codes from the speech recognition module;
processing the received basic units to provide synchronization information corresponding to the basic units of recognized speech for a production of said audiovisual work; and displaying on a user interface said synchronization information providing timing information for the basic units of recognized speech in said series.
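The pipeline of claim 1 (recognize speech, extract basic units with related time codes, derive synchronization information for display) can be illustrated with a minimal sketch. The `BasicUnit` class, its field names, and the timing values are hypothetical illustrations, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class BasicUnit:
    """A recognized basic unit of speech (e.g. a phoneme) with its time codes."""
    symbol: str
    start: float  # seconds from the start of the audio signal
    end: float

def synchronization_info(units):
    """Derive per-unit timing information suitable for display on a user interface."""
    return [{"symbol": u.symbol, "at": u.start, "duration": round(u.end - u.start, 3)}
            for u in units]

units = [BasicUnit("HH", 0.00, 0.08), BasicUnit("AH", 0.08, 0.20), BasicUnit("L", 0.20, 0.31)]
info = synchronization_info(units)
print(info[1]["duration"])  # 0.12
```

Carrying a time code on every basic unit, rather than on whole words, is what lets the later claims align sounds, graphemes, and animation frames to individual instants in the audio signal.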
2. The method as claimed in claim 1, wherein the production comprises post-production audio synchronization, said synchronization information comprises a graphic representation of a sound to be performed at each point in time over a span of time during said audiovisual work, and said interface controls said graphic representation over said span while facilitating synchronized recording of said sound in order to perform post-production.
3. The method as claimed in claim 2, wherein the basic units of recognized speech are phonemes.
4. The method as claimed in claim 2, further comprising the step of converting the basic units of recognized speech received with the time codes into words and words related time codes.
5. The method as claimed in claim 2, further comprising the step of converting the basic units of recognized speech received with the time codes into graphemes and graphemes related time codes, the graphemes being processed to provide synchronization information.
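The conversion of claims 4 and 5 — basic units with time codes into words or graphemes with inherited time codes — can be sketched as a table lookup. The mapping table below is hypothetical; real systems derive it from a pronunciation lexicon:

```python
# Hypothetical phoneme-to-grapheme table; an actual system would use a
# pronunciation lexicon rather than a hand-written dictionary.
P2G = {"K": "c", "AE": "a", "T": "t"}

def phonemes_to_graphemes(phonemes):
    """Convert (phoneme, start, end) triples into graphemes that inherit
    the time codes of the phonemes they were derived from."""
    return [(P2G.get(p, p), s, e) for p, s, e in phonemes]

out = phonemes_to_graphemes([("K", 0.0, 0.1), ("AE", 0.1, 0.25), ("T", 0.25, 0.3)])
print(out[0])  # ('c', 0.0, 0.1)
```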
6. The method as claimed in claim 5, further comprising the step of providing a conformed text source, further wherein the synchronization information provided to the user comprises an indication of a temporal location with respect to the audio signal.
7. The method as claimed in claim 5, further comprising the step of providing a script of at least one part of the audio signal, further wherein the synchronization information provided to the user comprises an indication of a temporal location with respect to the script provided.
8. The method as claimed in claim 5, wherein the displaying on a user interface of said synchronization information comprises the displaying of the graphemes using a horizontally sizeable font.
9. The method as claimed in claim 5, further comprising the step of detecting a Foley in the audio signal using a Foley detection unit, the detecting comprising the providing of an indication of the Foley and a related Foley time code.
10. The method as claimed in claim 5, further comprising the step of amending at least one part of the audio signal and audio signal related time codes using at least the graphemes and the synchronization information.
11. The method as claimed in claim 4, further comprising the providing of a plurality of words in accordance with the provided audio signal, the providing being performed by an operator.
12. The method as claimed in claim 11, further comprising the step of amending a recognized word in accordance with the plurality of words provided by the operator.
13. The method as claimed in claim 12, further comprising the step of creating a composite signal comprising at least the amended word, a video signal related to the audio source and the audio source.
14. The method as claimed in claim 1, wherein the displaying on a user interface of said synchronization information providing timing information for the basic units of recognized speech in said series is used to produce animation.
15. The method as claimed in claim 14, wherein for blocks of continuous spoken word, said synchronization information provides essential visem information for each sequential frame to be drawn by an animator.
16. The method as claimed in claim 15, further comprising the step of providing a storyboard database, further comprising the step of converting the basic units of recognized speech received with the time codes into words and words related time codes, the processing of the plurality of words and the words related time codes providing an indication of a current temporal location of the audio signal with respect to the storyboard.
17. The method as claimed in claim 16, wherein the basic units of recognized speech are phonemes, further comprising the step of providing a plurality of visems for each of the plurality of words, using a visem database and using the phonemes.
18. The method as claimed in claim 17, further comprising the step of outputting an adjusted voice track comprising the audio signal, at least one part of the storyboard and the plurality of visems.
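The per-frame visem information of claims 15, 17 and 18 amounts to asking, for each animation frame, which phoneme's time codes span that frame and which mouth shape it maps to. A minimal sketch, with a hypothetical phoneme-to-visem table (production tables differ per studio):

```python
# Hypothetical phoneme-to-visem table; not taken from the patent's visem database.
PHONEME_TO_VISEM = {"P": "closed", "B": "closed", "AA": "open", "F": "teeth-lip"}

def visems_per_frame(phonemes, fps=24):
    """For each animation frame, pick the visem of the phoneme whose time
    codes span that frame, giving the animator one visem per sequential frame."""
    frames = []
    if not phonemes:
        return frames
    total = phonemes[-1][2]            # end time code of the last phoneme
    for i in range(round(total * fps)):
        t = i / fps
        visem = "neutral"              # default when no phoneme spans t
        for p, s, e in phonemes:
            if s <= t < e:
                visem = PHONEME_TO_VISEM.get(p, "neutral")
                break
        frames.append(visem)
    return frames

print(visems_per_frame([("P", 0.0, 0.1), ("AA", 0.1, 0.3)], fps=10))
# ['closed', 'open', 'open']
```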
19. The method as claimed in claim 1, wherein the production comprises adaptation assisting, the adaptation assisting comprises a graphic representation of the basic units of recognized speech, the related time codes and a plurality of adapted basic units provided by a user, and said interface providing a visual indication of a matching of the plurality of adapted basic units with the basic speech units, the matching enabling synchronized adaptation of said audio signal.
20. The method as claimed in claim 19, wherein the plurality of adapted basic units is provided by performing a speech recognition of an adapted voice source.
21. The method as claimed in claim 20, wherein the speech recognition of the adapted voice source further provides related adapted time codes, further wherein the step of adapting the audio signal using said synchronization information and the plurality of adapted basic units is performed by attempting to match at least one of the series of basic units with at least one of the plurality of adapted basic units using the related time codes and the related adapted time codes.
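Claim 21's matching step — pairing original basic units with adapted ones by comparing their time codes — can be sketched as follows. The tolerance value and tuple layout are illustrative assumptions, not specified by the claims:

```python
def match_units(original, adapted, tolerance=0.05):
    """Match each original (symbol, time_code) unit to the first adapted unit
    with the same symbol whose time code lies within the tolerance (seconds).
    The 50 ms tolerance is an illustrative choice, not from the patent."""
    matches = []
    for sym, start in original:
        for a_sym, a_start in adapted:
            if sym == a_sym and abs(start - a_start) <= tolerance:
                matches.append((sym, start, a_start))
                break
    return matches

original = [("AH", 0.10), ("B", 0.30)]
adapted = [("AH", 0.12), ("B", 0.50)]
print(match_units(original, adapted))  # [('AH', 0.1, 0.12)]
```

The unmatched "B" illustrates the case claims 23 and 24 address: counting successful replacements and cancelling when too few units match.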
22. A method for performing closed-captioning of an audio source, the method comprising the steps of:
providing an audio signal of an audio/video signal to a speech recognition module;
performing a speech recognition of said audio/video signal, and incorporating text of said recognized speech of the audio signal as closed-captioning into a visual or non-visual portion of the audio/video signal in synchronization.
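Incorporating recognized text as synchronized closed captions, as in claim 22, requires grouping recognized words into timed caption cues. A minimal sketch; the gap-based grouping heuristic and `max_gap` value are illustrative assumptions, not from the claims:

```python
def words_to_cues(words, max_gap=0.7):
    """Group recognized (word, start, end) triples into caption cues,
    starting a new cue whenever the silence between consecutive words
    exceeds max_gap seconds. The max_gap heuristic is illustrative."""
    cues = []
    for w, s, e in words:
        if cues and s - cues[-1]["end"] <= max_gap:
            cues[-1]["text"] += " " + w
            cues[-1]["end"] = e
        else:
            cues.append({"text": w, "start": s, "end": e})
    return cues

cues = words_to_cues([("hello", 0.0, 0.4), ("world", 0.5, 0.9), ("bye", 2.0, 2.4)])
print([c["text"] for c in cues])  # ['hello world', 'bye']
```

Each cue carries the time codes needed to place it in the visual or non-visual portion of the audio/video signal in synchronization.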
23. The method as claimed in claim 21, further comprising the step of providing an indication of an amount of successful replacement of the plurality of basic units of recognized speech of the audio signal by the plurality of basic units of recognized speech of the adapted audio signal.
24. The method as claimed in claim 23, further comprising the step of providing a minimum amount required of successful replacement of the plurality of basic units of recognized speech of the audio signal by the plurality of basic units of recognized speech of the adapted audio signal, the method further comprising the step of canceling the providing of the at least one replaced plurality of basic units with related replaced time codes if the at least one replaced plurality of basic units is lower than the minimum amount required of successful replacement.
25. The method as claimed in claim 1, wherein the audio signal comprises a plurality of voices originating from a plurality of actors, further comprising the step of assigning each of the series of basic units and the related time codes to a related actor of the plurality of actors.
26. The method as claimed in claim 1, wherein the production comprises closed-captioning production of the audio source, said closed-captioning comprises a graphic representation of the recognized series of basic units, the method further comprising the incorporating of at least one of the series of basic units as closed-captioning in a visual or non-visual portion of the audio/video signal in synchronization.
27. The method as claimed in claim 26, further comprising the step of amending at least one part of the plurality of basic units.
28. The method as claimed in claim 1, further comprising the step of converting the basic units of recognized speech received with the time codes into words and words related time codes, further comprising the step of creating a database comprising a word and related basic units.
29. The method as claimed in claim 28, further comprising the step of amending a word of said database, wherein phonemes of the word and the amended word are substantially the same.
30. The method as claimed in claim 1, further comprising the step of converting the basic units of recognized speech received with the time codes into words and words related time codes, further comprising the step of amending at least one word.
31. The method as claimed in claim 30, further comprising the step of providing a visual indication of a word to amend.
32. The method as claimed in claim 1, wherein the audio signal comprises lyrics that are sung, further wherein the production of said audiovisual work comprises a karaoke generation using said audio signal, said karaoke generation comprises a graphic representation of lyrics to be sung at each point in time over a span of time during said audiovisual work using the series of basic units of recognized speech provided and related time codes, together with an index representation of a current temporal position with respect to the graphic representation of the lyrics to be sung.
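Claim 32's index representation of the current temporal position is, in effect, a lookup of the lyric unit whose time code span contains the playback time. A minimal sketch using sorted start time codes (the function name and sample values are illustrative):

```python
import bisect

def current_lyric_index(start_times, now):
    """Index of the lyric unit being sung at playback time `now`, used to
    place the highlight against the graphic representation of the lyrics.
    start_times must be the sorted start time codes of the lyric units."""
    return max(0, bisect.bisect_right(start_times, now) - 1)

starts = [0.0, 0.5, 1.2, 2.0]
print(current_lyric_index(starts, 1.5))  # 2
```

Because the time codes come directly from the speech recognition of the sung lyrics (claim 1), the highlight stays synchronized without any manually authored timing track.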
33. The method as claimed in claim 2, further comprising the step of detecting at least one note encoded in the audio signal according to an encoding scheme, further comprising the providing of the detected at least one note on said graphic representation.
CA2538981A 2001-09-12 2002-09-12 Method and device for processing audiovisual data using speech recognition Expired - Fee Related CA2538981C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US10/067,131 US7343082B2 (en) 2001-09-12 2001-09-12 Universal guide track
US10/067,131 2001-09-12
PCT/CA2002/001386 WO2003023765A1 (en) 2001-09-12 2002-09-12 Method and device for processing audiovisual data using speech recognition

Publications (2)

Publication Number Publication Date
CA2538981A1 true CA2538981A1 (en) 2003-03-20
CA2538981C CA2538981C (en) 2011-07-26

Family

ID=22073905

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2538981A Expired - Fee Related CA2538981C (en) 2001-09-12 2002-09-12 Method and device for processing audiovisual data using speech recognition

Country Status (6)

Country Link
US (2) US7343082B2 (en)
EP (1) EP1425736B1 (en)
AT (1) ATE368277T1 (en)
CA (1) CA2538981C (en)
DE (1) DE60221408D1 (en)
WO (1) WO2003023765A1 (en)

Families Citing this family (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9286941B2 (en) 2001-05-04 2016-03-15 Legend3D, Inc. Image sequence enhancement and motion picture project management system
US8897596B1 (en) 2001-05-04 2014-11-25 Legend3D, Inc. System and method for rapid image sequence depth enhancement with translucent elements
US8401336B2 (en) 2001-05-04 2013-03-19 Legend3D, Inc. System and method for rapid image sequence depth enhancement with augmented computer-generated elements
US7343082B2 (en) * 2001-09-12 2008-03-11 Ryshco Media Inc. Universal guide track
US7587318B2 (en) * 2002-09-12 2009-09-08 Broadcom Corporation Correlating video images of lip movements with audio signals to improve speech recognition
US8009966B2 (en) 2002-11-01 2011-08-30 Synchro Arts Limited Methods and apparatus for use in sound replacement with automatic synchronization to images
KR20050085344A (en) * 2002-12-04 2005-08-29 코닌클리즈케 필립스 일렉트로닉스 엔.브이. Synchronization of signals
US7142250B1 (en) * 2003-04-05 2006-11-28 Apple Computer, Inc. Method and apparatus for synchronizing audio and video streams
WO2004093059A1 (en) * 2003-04-18 2004-10-28 Unisay Sdn. Bhd. Phoneme extraction system
WO2004100128A1 (en) * 2003-04-18 2004-11-18 Unisay Sdn. Bhd. System for generating a timed phomeme and visem list
JP3945778B2 (en) * 2004-03-12 2007-07-18 インターナショナル・ビジネス・マシーンズ・コーポレーション Setting device, program, recording medium, and setting method
GB2424534B (en) * 2005-03-24 2007-09-05 Zootech Ltd Authoring audiovisual content
US20070011012A1 (en) * 2005-07-11 2007-01-11 Steve Yurick Method, system, and apparatus for facilitating captioning of multi-media content
US8060591B1 (en) 2005-09-01 2011-11-15 Sprint Spectrum L.P. Automatic delivery of alerts including static and dynamic portions
US7653418B1 (en) * 2005-09-28 2010-01-26 Sprint Spectrum L.P. Automatic rotation through play out of audio-clips in response to detected alert events
ATE440334T1 (en) * 2006-02-10 2009-09-15 Harman Becker Automotive Sys SYSTEM FOR VOICE-CONTROLLED SELECTION OF AN AUDIO FILE AND METHOD THEREOF
US8713191B1 (en) 2006-11-20 2014-04-29 Sprint Spectrum L.P. Method and apparatus for establishing a media clip
US7747290B1 (en) 2007-01-22 2010-06-29 Sprint Spectrum L.P. Method and system for demarcating a portion of a media file as a ringtone
US8179475B2 (en) * 2007-03-09 2012-05-15 Legend3D, Inc. Apparatus and method for synchronizing a secondary audio track to the audio track of a video source
US20080256136A1 (en) * 2007-04-14 2008-10-16 Jerremy Holland Techniques and tools for managing attributes of media content
US20080263433A1 (en) * 2007-04-14 2008-10-23 Aaron Eppolito Multiple version merge for media production
US8751022B2 (en) * 2007-04-14 2014-06-10 Apple Inc. Multi-take compositing of digital media assets
US20080295040A1 (en) * 2007-05-24 2008-11-27 Microsoft Corporation Closed captions for real time communication
TWI341956B (en) * 2007-05-30 2011-05-11 Delta Electronics Inc Projection apparatus with function of speech indication and control method thereof for use in the apparatus
US9390169B2 (en) * 2008-06-28 2016-07-12 Apple Inc. Annotation of movies
US8265450B2 (en) * 2009-01-16 2012-09-11 Apple Inc. Capturing and inserting closed captioning data in digital video
FR2955183B3 (en) * 2010-01-11 2012-01-13 Didier Calle METHOD FOR AUTOMATICALLY PROCESSING DIGITAL DATA FOR DOUBLING OR POST SYNCHRONIZATION OF VIDEOS
US8572488B2 (en) * 2010-03-29 2013-10-29 Avid Technology, Inc. Spot dialog editor
US8744239B2 (en) 2010-08-06 2014-06-03 Apple Inc. Teleprompter tool for voice-over tool
US8730232B2 (en) 2011-02-01 2014-05-20 Legend3D, Inc. Director-style based 2D to 3D movie conversion system and method
US8621355B2 (en) 2011-02-02 2013-12-31 Apple Inc. Automatic synchronization of media clips
US9241147B2 (en) 2013-05-01 2016-01-19 Legend3D, Inc. External depth map transformation method for conversion of two-dimensional images to stereoscopic images
US9407904B2 (en) 2013-05-01 2016-08-02 Legend3D, Inc. Method for creating 3D virtual reality from 2D images
US9288476B2 (en) 2011-02-17 2016-03-15 Legend3D, Inc. System and method for real-time depth modification of stereo images of a virtual reality environment
US9282321B2 (en) 2011-02-17 2016-03-08 Legend3D, Inc. 3D model multi-reviewer system
US9280905B2 (en) * 2011-12-12 2016-03-08 Inkling Systems, Inc. Media outline
WO2014018652A2 (en) 2012-07-24 2014-01-30 Adam Polak Media synchronization
US9007365B2 (en) 2012-11-27 2015-04-14 Legend3D, Inc. Line depth augmentation system and method for conversion of 2D images to 3D images
US9547937B2 (en) 2012-11-30 2017-01-17 Legend3D, Inc. Three-dimensional annotation system and method
US9007404B2 (en) 2013-03-15 2015-04-14 Legend3D, Inc. Tilt-based look around effect image enhancement method
US9438878B2 (en) 2013-05-01 2016-09-06 Legend3D, Inc. Method of converting 2D video to 3D video using 3D object models
US8719032B1 (en) 2013-12-11 2014-05-06 Jefferson Audio Video Systems, Inc. Methods for presenting speech blocks from a plurality of audio input data streams to a user in an interface
US20160042766A1 (en) * 2014-08-06 2016-02-11 Echostar Technologies L.L.C. Custom video content
GB2553960A (en) 2015-03-13 2018-03-21 Trint Ltd Media generating and editing system
US9609307B1 (en) 2015-09-17 2017-03-28 Legend3D, Inc. Method of converting 2D video to 3D video using machine learning
US10387543B2 (en) * 2015-10-15 2019-08-20 Vkidz, Inc. Phoneme-to-grapheme mapping systems and methods
GB201715753D0 (en) * 2017-09-28 2017-11-15 Royal Nat Theatre Caption delivery system
CN112653916B (en) * 2019-10-10 2023-08-29 腾讯科技(深圳)有限公司 Method and equipment for synchronously optimizing audio and video
US11545134B1 (en) * 2019-12-10 2023-01-03 Amazon Technologies, Inc. Multilingual speech translation with adaptive speech synthesis and adaptive physiognomy

Family Cites Families (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3170907D1 (en) 1981-01-19 1985-07-18 Richard Welcher Bloomstein Apparatus and method for creating visual images of lip movements
GB2101795B (en) 1981-07-07 1985-09-25 Cross John Lyndon Dubbing translations of sound tracks on films
CA1270063A (en) 1985-05-14 1990-06-05 Kouji Miyao Translating apparatus
US5155805A (en) 1989-05-08 1992-10-13 Apple Computer, Inc. Method and apparatus for moving control points in displaying digital typeface on raster output devices
US5159668A (en) 1989-05-08 1992-10-27 Apple Computer, Inc. Method and apparatus for manipulating outlines in improving digital typeface on raster output devices
EP0526064B1 (en) 1991-08-02 1997-09-10 The Grass Valley Group, Inc. Video editing system operator interface for visualization and interactive control of video material
US5434678A (en) 1993-01-11 1995-07-18 Abecassis; Max Seamless transmission of non-sequential video segments
US5481296A (en) 1993-08-06 1996-01-02 International Business Machines Corporation Apparatus and method for selectively viewing video information
JP3356536B2 (en) 1994-04-13 2002-12-16 松下電器産業株式会社 Machine translation equipment
US5717468A (en) 1994-12-02 1998-02-10 International Business Machines Corporation System and method for dynamically recording and displaying comments for a video movie
JP4078677B2 (en) 1995-10-08 2008-04-23 イーサム リサーチ デヴェロップメント カンパニー オブ ザ ヘブライ ユニヴァーシティ オブ エルサレム Method for computerized automatic audiovisual dubbing of movies
JP3454396B2 (en) 1995-10-11 2003-10-06 株式会社日立製作所 Video change point detection control method, playback stop control method based thereon, and video editing system using them
US5732184A (en) 1995-10-20 1998-03-24 Digital Processing Systems, Inc. Video and audio cursor video editing system
US5880788A (en) 1996-03-25 1999-03-09 Interval Research Corporation Automated synchronization of video image sequences to new soundtracks
US6154601A (en) 1996-04-12 2000-11-28 Hitachi Denshi Kabushiki Kaisha Method for editing image information with aid of computer and editing system
US5832171A (en) 1996-06-05 1998-11-03 Juritech, Inc. System for creating video of an event with a synchronized transcript
JPH1074204A (en) 1996-06-28 1998-03-17 Toshiba Corp Machine translation method and text/translation display method
EP0848850A1 (en) 1996-07-08 1998-06-24 Régis Dubos Audio-visual method and devices for dubbing films
US5969716A (en) 1996-08-06 1999-10-19 Interval Research Corporation Time-based media processing system
AU6313498A (en) 1997-02-26 1998-09-18 Tall Poppy Records Limited Sound synchronizing
US6134378A (en) 1997-04-06 2000-10-17 Sony Corporation Video signal processing device that facilitates editing by producing control information from detected video signal information
FR2765354B1 (en) 1997-06-25 1999-07-30 Gregoire Parcollet FILM DUBBING SYNCHRONIZATION SYSTEM
EP0899737A3 (en) 1997-08-18 1999-08-25 Tektronix, Inc. Script recognition using speech recognition
DE19740119A1 (en) * 1997-09-12 1999-03-18 Philips Patentverwaltung System for cutting digital video and audio information
US6174170B1 (en) * 1997-10-21 2001-01-16 Sony Corporation Display of text symbols associated with audio data reproducible from a recording disc
JPH11162152A (en) 1997-11-26 1999-06-18 Victor Co Of Japan Ltd Lyric display control information editing device
JPH11289512A (en) * 1998-04-03 1999-10-19 Sony Corp Editing list preparing device
US6490563B2 (en) * 1998-08-17 2002-12-03 Microsoft Corporation Proofreading with text to speech feedback
IT1314671B1 (en) * 1998-10-07 2002-12-31 Cselt Centro Studi Lab Telecom PROCEDURE AND EQUIPMENT FOR THE ANIMATION OF A SYNTHESIZED HUMAN FACE MODEL DRIVEN BY AN AUDIO SIGNAL.
US20010044719A1 (en) * 1999-07-02 2001-11-22 Mitsubishi Electric Research Laboratories, Inc. Method and system for recognizing, indexing, and searching acoustic signals
US7047191B2 (en) * 2000-03-06 2006-05-16 Rochester Institute Of Technology Method and system for providing automated captioning for AV signals
US7085842B2 (en) * 2001-02-12 2006-08-01 Open Text Corporation Line navigation conferencing system
US7343082B2 (en) * 2001-09-12 2008-03-11 Ryshco Media Inc. Universal guide track

Also Published As

Publication number Publication date
ATE368277T1 (en) 2007-08-15
EP1425736B1 (en) 2007-07-25
EP1425736A1 (en) 2004-06-09
US7343082B2 (en) 2008-03-11
WO2003023765A1 (en) 2003-03-20
US20030049015A1 (en) 2003-03-13
CA2538981C (en) 2011-07-26
US20040234250A1 (en) 2004-11-25
DE60221408D1 (en) 2007-09-06

Similar Documents

Publication Publication Date Title
CA2538981A1 (en) Method and device for processing audiovisual data using speech recognition
US11190855B2 (en) Automatic generation of descriptive video service tracks
US20060136226A1 (en) System and method for creating artificial TV news programs
US11942093B2 (en) System and method for simultaneous multilingual dubbing of video-audio programs
JP2004361965A (en) Text-to-speech conversion system for interlocking with multimedia and method for structuring input data of the same
CN110740275B (en) Nonlinear editing system
US11908449B2 (en) Audio and video translator
CN110781649A (en) Subtitle editing method and device, computer storage medium and electronic equipment
JP2020012855A (en) Device and method for generating synchronization information for text display
CN117596433B (en) International Chinese teaching audiovisual courseware editing system based on time axis fine adjustment
Lu et al. Visualtts: Tts with accurate lip-speech synchronization for automatic voice over
Jankowska et al. Reading rate in filmic audio description
Di Gangi et al. Automatic video dubbing at AppTek
JP2002344805A (en) Method for controlling subtitles display for open caption
CN115633136A (en) Full-automatic music video generation method
JP4595098B2 (en) Subtitle transmission timing detection device
CN117240983B (en) Method and device for automatically generating sound drama
CN101266790A (en) Device and method for automatic time marking of text file
JP2003244539A (en) Consecutive automatic caption processing system
JPS6315294A (en) Voice analysis system
US11947924B2 (en) Providing translated subtitle for video content
Bazaz et al. Automated Dubbing and Facial Synchronization using Deep Learning
Pamisetty et al. Subtitle Synthesis using Inter and Intra utterance Prosodic Alignment for Automatic Dubbing
JP2004336606A (en) Caption production system
Weiss A Framework for Data-driven Video-realistic Audio-visual Speech-synthesis.

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed

Effective date: 20200914