WO1998041978A1 - Method and apparatus for detecting the start and end points of a sound section in a video sequence (original French title: Procede et dispositif destines a detecter des points de depart et de terminaison d'une section son dans une sequence video)
- Publication number
- WO1998041978A1 (application PCT/JP1997/000905, JP 9700905 W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- sound
- point
- section
- audio
- Prior art date
Classifications
- G11B20/10—Digital recording or reproducing
- G11B27/105—Programmed access in sequence to addressed parts of tracks of operating discs
- G11B27/107—Programmed access in sequence to addressed parts of tracks of operating tapes
- G11B27/11—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information not detectable on the record carrier
- G11B27/22—Means responsive to presence or absence of recorded information signals
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information signals recorded by the same method as the main recording
- G11B27/34—Indicating arrangements
- G10L25/78—Detection of presence or absence of voice signals
- G10L2025/783—Detection of presence or absence of voice signals based on threshold decision
- G11B2220/20—Disc-shaped record carriers
- G11B2220/65—Solid state media wherein solid state memory is used for storing indexing information or metadata
- G11B2220/90—Tape-like record carriers
Definitions
- The present invention relates to a method and an apparatus for detecting the sound sections of the audio data contained in a video, such as one stored on video tape or disk, and to a method and an apparatus for simplifying the cueing of audio within a video. Background art
- In broadcasting stations, a CM management device, commonly called a CM bank, manages thousands of kinds of commercial (CM) videos and prepares arbitrary CM videos in broadcast order.
- In recent years, CM management devices that broadcast files of CM material directly, without using videotape, have come into use.
- CM materials supplied from producers such as advertising agencies are registered in the management device.
- CM materials are supplied individually on video tape for each CM; in addition to the CM itself, images such as the creator's name and the production date and time are recorded. Before and after the CM there are also several seconds of leader (play) video for adjusting the transmission timing. For this reason, when registering CM materials in the management device, in addition to copying the supplied mother material to another recording medium such as tape or disk, the start and end of the CM body must be registered.
- The present invention aims to automate the task of judging the start and end of a CM from the presence or absence of sound when the CM material is registered in the management device, thereby automating the registration operation.
- Another object of the present invention is to provide a method and apparatus for detecting the start and end points of the CM video body in real time and registering those positions. Disclosure of the invention
- The present invention provides envelope calculating means for calculating the envelope of an audio signal waveform input in time series during the interactive registration process to a video management device, threshold setting means for presetting a sound level threshold against the value of the envelope, and start/end point detection means for detecting the time points at which the envelope crosses the threshold as the start point or end point of a sound section. The presence or absence of sound can thus be determined quantitatively and automatically.
- The start/end point detection means includes lower-limit setting means for presetting a lower limit on the elapsed duration of the silent state, so that a fall of the envelope below the sound level threshold is accepted as an end point only after the silence persists for at least that duration.
- Likewise, it includes lower-limit setting means for presetting a lower limit on the elapsed duration of the sound state, so that a rise of the envelope above the threshold is accepted as a start point only after the sound persists for at least that duration.
- The envelope calculating means includes filtering means for applying a filtering process of fixed time width to the audio signal input in time series.
- Specifically, a maximum-value filter that sequentially obtains the maximum over a fixed time width and a minimum-value filter that sequentially obtains the minimum are applied to the time-series audio signal.
- The present invention further provides video reproducing means for reproducing the material video, audio input means for inputting the audio signal recorded in the audio track of the reproduced video as a time-series digital signal, audio processing means for detecting the start and end points of sound sections from the input audio signal, and display means for displaying the detection results, so that the positions of the start and end points of the sound sections in the material video can be presented to the operator.
- The audio processing means includes, in addition to the envelope calculating means, the sound level threshold setting means, and the start/end point detecting means, frame position determining means for determining the frame position of the video at the moment a start or end point of a sound section is detected.
- The frame position determining means includes timer means for counting the elapsed time from the start of the detection processing, means for reading the frame position of the video, elapsed-time storage means for storing the elapsed time at which a start or end point was detected and the elapsed time at which the frame position was read, and frame position correction means for correcting the read frame position from the difference between the two elapsed times, so that the delay between start/end point detection and the frame position read-out is compensated.
- The audio processing means is further provided with means for stopping reproduction at the detected start and end points, so that playback can be paused at the frame position of a start or end point.
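The frame position correction described above can be sketched as follows. This is an illustrative sketch only: the 30 fps rate and all names (FRAME_RATE, corrected_frame) are assumptions, not taken from the patent.

```python
# Illustrative sketch of the frame position correction: the frame number is
# read some time after the start/end point was detected, so the delay between
# the two stored elapsed times is converted to frames and subtracted.
# FRAME_RATE and the function name are assumptions, not from the patent.

FRAME_RATE = 30  # frames per second (assumed)

def corrected_frame(detect_elapsed_ms, read_elapsed_ms, frame_at_read):
    """Return the frame number at the detected start/end point."""
    delay_ms = read_elapsed_ms - detect_elapsed_ms
    delay_frames = round(delay_ms * FRAME_RATE / 1000.0)
    return frame_at_read - delay_frames
```

For example, if the point was detected at 1000 ms, the frame number 450 was read at 1500 ms, and the rate is 30 fps, the 500 ms delay corresponds to 15 frames, so the corrected frame is 435.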
- A video playback device whose playback can be controlled by a computer is used as the video reproducing means.
- For example, VCRs equipped with a VISCA (Video System Control Architecture) terminal, or VCRs commonly used for professional editing, are used. In this way, the detected sound section can be cued efficiently.
- The present invention also provides, in the audio processing means, frame position storage means for individually storing the start and end frame positions of each sound section, and display means for displaying them individually, so that the positions of the start point and end point of a sound section in the material video can be presented individually to the operator.
- Buffer memory means for accumulating the time-series audio signal in units of fixed time length and reproduction means for playing back the accumulated signal are provided, so that the operator can check the detected sound section both visually and audibly.
- Time length setting means for presetting an upper limit on the sound section length and an allowable tolerance of one or two seconds, and time length comparing means for comparing the detected length from start point to end point with the set length, are provided so that only sound sections of the fixed CM time lengths are detected. In addition, margin setting means for setting a margin before and after the detected sound section are provided, so that a CM video of fixed time length can be registered in the CM management device from the CM material.
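The time-length comparison and margin setting above can be sketched as follows. The 15/30-second CM lengths and the roughly one-second tolerance are taken from the text; all function and variable names are illustrative assumptions.

```python
# Sketch of the CM time-length rule and margin setting (names are assumed).

CM_LENGTHS_S = (15.0, 30.0)   # standard CM lengths stated in the patent
TOLERANCE_S = 1.0             # allowable deviation (one to two seconds)

def is_valid_cm_section(in_s, out_s):
    """Accept a detected sound section only if its length matches a CM length."""
    length = out_s - in_s
    return any(abs(length - l) <= TOLERANCE_S for l in CM_LENGTHS_S)

def with_margin(in_s, out_s, target_s):
    """Pad the section symmetrically so the registered clip has target length."""
    pad = (target_s - (out_s - in_s)) / 2.0
    return in_s - pad, out_s + pad
```

A detected section of 14.5 s would be accepted as a 15-second CM, and padding it to exactly 15.0 s yields the start and end points registered in the CM bank.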
- FIG. 1 is a system configuration diagram for realizing an embodiment of the present invention.
- FIG. 2 is a conceptual diagram of the sound section detection method of the present invention.
- FIG. 3 is a flowchart of the sound section detection method of the present invention.
- FIG. 4 is a diagram showing the start/end point determination conditions for a sound section of the present invention.
- FIG. 5 is a diagram showing an example of an operation screen for realizing the present invention.
- FIG. 6 is a flowchart showing the flow of the entire process.
- FIG. 7 is a diagram showing a control method for detecting a sound section according to the present invention.
- FIG. 9 is a flowchart showing the flow of a sound section detection process using the CM time length rule.
- FIG. 10 is a diagram showing an example of a data structure for realizing the sound section detection of the present invention. BEST MODE FOR CARRYING OUT THE INVENTION
- FIG. 1 is an example of a system configuration diagram for realizing the present invention.
- Reference numeral 101 denotes a display device such as a CRT, which displays an output screen of the audio processing device 104.
- Commands and threshold values for the audio processing device 104 are set using the input device 105, which includes a pointing device such as a mouse and a numerical input device such as a numeric keypad.
- the video reproducing device 110 is a device that reproduces a video recorded on a video tape, an optical disc, or the like.
- The audio signal in the video reproduced by the video reproducing device 110 is sequentially converted into a digital signal by the audio input device 103 and input to the audio processing device 104.
- Information used in the conversion to a digital signal, such as the sampling frequency, the number of sampling bits, and the number of channels (one for monaural, two for stereo), is also passed from the audio input device 103 to the audio processing device 104. Alternatively, values set in the audio processing device 104 may be given to the audio input device 103.
- the audio processing device 104 processes the received signal to control the video reproduction device 110.
- the audio processing device 104 and the video reproducing device 110 transmit and receive control commands and responses via the communication line 102.
- Each frame of the video is identified by a frame number (time code); the audio processing device 104 can request the frame number from the video reproducing device 110 and receive the current frame number of the video.
- The digital audio signal is input through the interface 108 to the memory 109 and processed by the CPU 107 according to the processing program stored in the memory 109.
- the processing program is stored in the auxiliary storage device 106, and is appropriately transferred to the memory 109 in accordance with an instruction from the CPU 107.
- Various data created by the processing described below are stored in the memory 109, and are referred to as needed.
- Various information, such as the digital audio signal and processing results, can also be stored in the auxiliary storage device 106.
- The speaker 111 reproduces the audio signal input from the audio input device 103 to the audio processing device 104 as it is received, or plays back audio signals stored in the memory 109 in response to a user request.
- FIG. 2 is a schematic diagram showing an outline of a method for detecting a sound section in a video according to the present invention.
- the moving image 201 and the audio waveform 202 indicate the image and audio signals contained in the video.
- the audio waveform 202 is shown in monaural for simplicity, but it may be stereo.
- When the target video is a CM material, the material contains several seconds of play video before and after the CM body. Because the same scene is often shot continuously before and after the CM body, it is frequently unclear, just by watching the moving image 201, from which point to which point the video should be broadcast. However, no sound is recorded in the play sections.
- As for the amplitude of the audio waveform 202, positive and negative values alternate frequently, and the magnitude is very often instantaneously zero. Simply examining the amplitude at a single moment therefore does not reliably indicate the presence or absence of sound in its vicinity.
- The presence or absence of sound is instead determined from the value of the envelope of the audio waveform 202.
- The value of the envelope reflects the presence or absence of nearby sound. The point at which the envelope exceeds a predetermined sound level threshold is detected as the start point (IN) of the sound section 203, and the point at which it falls below the threshold is detected as the end point (OUT).
- By the time such a change point is detected, the video playback device 110 has already played past it. Therefore, the frame number at the moment of detection is read from the video playback device 110 and corrected using the difference between the read time and the time of the change point, yielding the frame number of the change point. The correction method will be described later with reference to FIG.
- In this way, video sections in which the sound continuously exceeds a certain level can be extracted.
- Cueing to the frame at which the sound rises can thus be realized easily.
- Since the time length from the start point to the end point is known, it is easy to set, before and after the extracted video section, the margins needed to finish the broadcast CM video. As a result, CM videos of good quality, with no variation in time length, can be registered in the CM management device.
- To use the system shown in FIG. 1, the user only has to set a video tape or other medium containing the material in the video reproducing apparatus 110 and operate the console buttons of the audio processing apparatus 104 displayed on the display 101. An example of the console screen will be described later with reference to FIG. 5. The user does not need to search for the beginning or end of a sound section manually with a jog or shuttle dial, which simplifies the operation.
- FIG. 3 is a flowchart of the method for detecting the start and end points of a sound section in a video according to the present invention.
- Reference numerals 301 to 306 denote program steps, and reference numerals 311 to 316 denote the output data of the respective steps. All of these programs and data are stored in the memory 109 and processed by the CPU 107.
- Monaural audio (one channel) and stereo audio (two channels) can be handled in the same way.
- For stereo, the monaural processing described below is applied to the audio waveforms of the left and right channels, and the results of the two channels are combined: AND (logical product) gives the overlapping sound sections, while OR (logical sum) gives their union as the overall result.
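The AND/OR combination of per-channel results can be sketched as follows; the function name and 0/1 encoding are illustrative assumptions.

```python
# Sketch of combining per-channel voiced/silent decisions for stereo input.
# left_bits / right_bits are sequences of 0 (silent) / 1 (voiced) per sample.

def combine_stereo(left_bits, right_bits, mode="or"):
    """'and' keeps samples voiced in both channels (overlap);
    'or' keeps samples voiced in either channel (union)."""
    if mode == "and":
        return [l & r for l, r in zip(left_bits, right_bits)]
    return [l | r for l, r in zip(left_bits, right_bits)]
```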
- In step 301, audio data in the video is received from the audio input device 103.
- 311 is the waveform of the received audio data.
- In step 302, the absolute value of each sample of 311 is taken, folding the audio waveform to the positive side. This is because only the sound level matters, not the sign or meaning of the signal.
- 312 is the audio waveform obtained by folding 311 to the positive side.
- In steps 303 and 304, the envelope of the waveform 312 is obtained by maximum-minimum filtering. For each filtering step, filters of sizes 321 and 322 are prepared; the input data is read sequentially into the filter, and the maximum or minimum value within the filter is computed and output. In step 303, the maximum value within the filter is output for each sample of the waveform 312.
- In step 304, the minimum value within the filter is output for each sample of the maximum-value waveform 313.
- 314 is the resulting envelope.
- In step 305, threshold processing compares each sample of the envelope 314 with the predetermined sound level threshold 323. When the envelope 314 exceeds the threshold 323, 1 (sound) is output; when it is below the threshold, 0 (silence) is output.
- Reference numeral 315 denotes the voiced/silent binary data output in step 305.
- In step 306, the continuity of sound and silence in the binary data 315 is examined to detect a sound section 324, and the start and end points 316 of the sound section are output.
- The rising point of the section is output as the start point 325 (IN) of the sound, and the falling point of the sound section as the end point 326 (OUT).
- This step 306 will be described with reference to the timing chart of FIG. 4.
- Calculating the envelope by maximum-minimum filtering significantly reduces the amount of computation compared with calculating the power spectrum of the audio waveform and using the zero-order power as the envelope. It can therefore be realized even on a CPU of modest capacity.
- For the one-dimensional maximum-minimum filtering of steps 303 and 304, the method described in, for example, "High-speed calculation method for maximum-minimum image filtering" (Transactions of the Institute of Electronics, Information and Communication Engineers D-II, Vol. J78-D-II, No. 11, pp. 1598 et seq., November 1995) may be used.
- This is a sequential data processing method using a ring buffer that can store n + 1 data items for a filter of size n.
- The maximum and minimum values can each be obtained with about three comparison operations per data item on average, regardless of the data properties and filter size. It is therefore suitable for processing a large amount of data at high speed, as in the present case.
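The rectify/max-filter/min-filter pipeline of steps 302-304 can be sketched as below. Note this uses the well-known monotonic-deque sliding-window technique, which likewise needs only a few comparisons per sample; it is not the ring-buffer algorithm of the cited paper, and all names are assumptions.

```python
# Minimal sketch of envelope computation by maximum-minimum filtering.
from collections import deque

def sliding_extreme(data, size, take_max=True):
    """Running max (or min) over a window of `size` samples."""
    cmp = (lambda a, b: a <= b) if take_max else (lambda a, b: a >= b)
    dq, out = deque(), []              # dq holds candidate indices
    for i, x in enumerate(data):
        while dq and cmp(data[dq[-1]], x):
            dq.pop()                   # drop values dominated by x
        dq.append(i)
        if dq[0] <= i - size:
            dq.popleft()               # drop indices outside the window
        if i >= size - 1:
            out.append(data[dq[0]])
    return out

def envelope(samples, size):
    """Steps 302-304: rectify, then max filter, then min filter."""
    rectified = [abs(s) for s in samples]
    return sliding_extreme(sliding_extreme(rectified, size, True), size, False)
```

Comparing each envelope value with the sound level threshold then yields the binary voiced/silent data of step 305.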
- FIG. 4 is a diagram showing a method for determining the start and end points of a sound section.
- Ts [msec] is the lower limit of the elapsed time length of the sound state, and Tn [msec] is the lower limit of the elapsed time length of the silent state.
- The values of Ts and Tn are set in advance based on the length of one syllable of speech and the length of pauses between utterances. This prevents the detection of voiced states shorter than Ts or silent states shorter than Tn, so that detection is unaffected by sporadic noise or short breaks in sound such as the seams between phrases in speech.
- A robust method for detecting sound sections can thus be realized.
- Reference numeral 401 denotes a timing chart showing the processing in step 306 from the input data 315 to the start and end points 316 of the sound section. Four flags are used to track the status: a silence flag, a sound flag, a start flag, and an end flag.
- In step 306, the input data 315, indicating the binary voiced/silent state, is examined sequentially, and the numbers of 0 (silent) and 1 (voiced) samples are counted as the elapsed time of each state. Since the sampling frequency used to digitize the audio signal is passed from the audio input device 103 to the audio processing device 104, it is easy to convert the time conditions Ts and Tn into sample-count conditions. The voiced-state count is cleared when the silence flag turns ON, and the silent-state count is cleared when the sound flag turns ON. Initially, all flags are OFF and the counts are zero. First, when the silent state continues for Tn, the silence flag is turned ON (402).
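The conversion of the millisecond conditions Ts and Tn into sample counts mentioned above is a one-line calculation; the function name here is an illustrative assumption.

```python
# Convert a duration in milliseconds to a number of audio samples,
# given the sampling frequency passed from the audio input device.

def ms_to_samples(ms, sampling_hz):
    return int(ms * sampling_hz / 1000)
```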
- While the silence flag is ON, every point that changes from silence to sound is considered a start point candidate, and its data position is stored in the memory 109.
- The rising edge of the voiced state 403 is taken as a start point candidate, but because the voiced state does not continue for Ts, it is cancelled as sporadic noise.
- The rise of the voiced state 404 is taken as a start point candidate, and when the voiced state continues for Ts, the sound flag is turned ON (405).
- At this point both the silence flag and the sound flag are ON, satisfying the start point condition, and the start point 325 (IN) is determined.
- The start flag, once turned ON, returns to OFF after it has been read.
- On the time axis, the start point has been detected by point 420.
- the silence flag is turned off (406).
- While the sound flag is ON, every point that changes from sound to silence is considered an end point candidate, and its data position is stored in the memory 109. Since the elapsed time of the silent state 407 is less than Tn, its sample count is absorbed into the voiced-state count, and it is cancelled as a short break.
- When the silent state then continues for Tn, the silence flag is turned ON (409). As a result, both the sound flag and the silence flag are ON, satisfying the end point condition. The end flag is therefore turned ON, and the end point 326 (OUT) is determined.
- The end flag, once turned ON, returns to OFF after it has been read. The sound flag is also turned OFF in preparation for the next start point detection (410).
- On the time axis, the end point has been detected by point 421.
- By manipulating the flags as shown in FIG. 4, the start and end points of sound sections can be detected continuously, so that even if a single video contains multiple sound sections, each can be detected. The sound section detection method of the present invention is therefore applicable not only to CM materials but also to general video such as TV broadcast images and archive footage. If the processed video is CM material, the general CM time length rule that one CM is 15 or 30 seconds long can be used: even if multiple sound sections are detected, the start and end points of the proper CM body can be determined by grouping the sections according to this rule. The method of detecting start and end points using the CM rule will be described later with reference to FIG. 9.
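The flag logic of FIG. 4 reduces to a small state machine: a start point is confirmed only after Ts consecutive voiced samples, and an end point only after Tn consecutive silent samples. The sketch below works in sample counts, simplifies the initial silence-flag precondition by assuming the recording begins in silence, and uses assumed names throughout.

```python
# Condensed sketch of the start/end point state machine of FIG. 4.

def detect_sections(bits, ts, tn):
    """bits: iterable of 0/1 (silent/voiced) samples.
    ts: minimum voiced run to confirm a start point (samples).
    tn: minimum silent run to confirm an end point (samples).
    Returns a list of (start_index, end_index) pairs."""
    sections = []
    voiced = False          # corresponds to the sound flag
    run = 0                 # length of the current opposite-state run
    candidate = None        # position of the last state transition
    start = None
    for i, b in enumerate(bits):
        if b == (0 if voiced else 1):     # sample opposes the current state
            if run == 0:
                candidate = i             # remember the transition point
            run += 1
            if not voiced and run >= ts:  # sound persisted: start confirmed
                start, voiced, run = candidate, True, 0
            elif voiced and run >= tn:    # silence persisted: end confirmed
                sections.append((start, candidate))
                voiced, run = False, 0
        else:
            run = 0                       # short blip: cancel the candidate
    return sections
```

Because the machine resets after each end point, multiple sound sections in one video are detected one after another, as the text describes.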
- FIG. 5 is an example of an operation screen of the sound section detection device which realizes the present invention.
- the operation window 501 is displayed on the display device 101 as a console of the audio processing device 104, and provides an operation environment to the user.
- 501 has a QUIT button 502, a DETECT button 503, a detection result display panel 504, an audio waveform monitor 505, a sound section display panel 506, a PLAY button 509, and a video playback device operation panel 510.
- the QUIT button 502 is a command button for ending the operation processing and closing the operation window 501.
- the DETECT button 503 is a command button for executing a sound section detection process.
- When the DETECT button 503 is clicked, the audio processing device 104 clears the detection result display panel 504, starts sound section detection according to the program 300, and displays the result of the processing on the audio waveform monitor 505.
- the monitor 505 displays the calculated envelope 314 and the threshold value 323 of the sound level.
- The detected frame numbers are displayed on the panel 504 in time code form. The structure hh:mm:ss:ff (hh: hours, mm: minutes, ss: seconds, ff: frames) makes it easy for the user to grasp positions and lengths intuitively.
- The sound section display panel 506 displays the waveform 507 of the audio data input up to the detection of the start and end points, together with the detected sound section 508.
- The sound section 508 corresponds to the IN-to-OUT frame range shown on the detection result display panel 504.
- Since the maximum length of a CM video is 30 seconds, a 40-second audio waveform is displayed here.
- The PLAY button 509 reproduces the audio data of the sound section 508. The user can confirm the audio signal in the video visually from the audio waveform 507, and by clicking 509 and playing the sound, can confirm it by ear as well. The user can thus verify the result immediately after a sound section is detected, saving confirmation work.
- The end points may also be adjusted by dragging the ends of the sound section 508 to widen the section.
- The section length can then be calculated.
- The user sets margins before and after the section so that the total time length becomes the desired length.
- The system changes the frame numbers on panel 504 according to the set margin lengths, and uses the changed frame numbers as the start and end points of the CM video to be registered in the CM management device. This allows the user to proceed easily with registration in the CM management device. Also, by cutting out the video between the start and end points, the user can create a commercial video of the desired length for broadcasting.
- The video playback device operation buttons 511 are arranged on the video playback device operation panel 510.
- The operation buttons 511 are command buttons for fast-forwarding, rewinding, playing, frame-by-frame stepping, and pausing the video.
- When a button is clicked, the audio processing device 104 transmits the corresponding operation command to the video reproduction device 110.
- The video frame position is displayed as a time code in the video position display box 512.
- a parameter setting box 5 14 for setting parameters for detecting a sound section is arranged on the parameter setting panel 5 13.
- the following four changeable parameters are arranged on panel 513: the sound level threshold (Threshold Value), the filter time length (Filter Length), the lower limit of the sound elapsed time (Noise Limit), and the lower limit of the silence elapsed time (Silence).
- to change a parameter, the user clicks 514 and enters a value from the input device 105.
- to set the threshold of the sound level (Threshold Value in the figure), in addition to inputting a numerical value from the input device 105, the following method may be used.
- when the threshold setting box is clicked, the video playback device 110 is stopped or paused, and in that state audio data is input from the audio input device 103 to the audio processing device 104 for several seconds. Next, the maximum value of the sound level of the audio data input during those several seconds is set as the sound level threshold. By inputting for a few seconds, the random noise of the audio signal generated by the video reproduction device 110 and the audio input device 103 is captured by the audio processing device 104, and by setting its maximum value as the threshold, the noise generated in 110 and 103 can be prevented from affecting detection of the audio signal in the reproduced video.
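The threshold-setting method above amounts to measuring the device noise floor while playback is paused and taking its maximum; a minimal sketch (the function name is an assumption):

```python
def noise_threshold(noise_samples):
    """Sound-level threshold from a few seconds of audio captured while
    the video playback device is stopped or paused: the maximum level of
    the device noise, so that noise alone never exceeds the threshold."""
    return max(abs(s) for s in noise_samples)
```

Any level at or below this value observed during detection can then be attributed to noise generated in 110 and 103 rather than to the audio in the reproduced video.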
- FIG. 6 is a flowchart showing the overall flow of the processing.
- the CPU 107 reads out the program 600 from the auxiliary storage device 106, stores it in the memory 109, and executes it.
- various types of audio data and processing data are also stored in the memory 109, and the structure of these data will be described later with reference to FIG.
- Step 601 is an initialization process for starting the process.
- the CPU 107 secures and clears a memory area required for processing in the memory 109, and sets default values of parameters such as thresholds of the sound level.
- an operation window 501 of the audio processing device 104 is displayed on the display device 101.
- a control command is transmitted to the video playback device 110 to set the playback of the video playback device 110 to a pause state (STAND BY ON).
- this allows the video playback device 110 to respond immediately when another control command is sent, so that audio signals and frame numbers can be read out quickly.
- in step 602, it is determined whether or not there is a user end request, and the screen control of step 603 is executed repeatedly while there is none.
- in step 603, the process branches according to the instruction button specified by the user. For example, when the user clicks the DETECT button 503 of the operation window 501, steps 608 and 609 are executed; otherwise the process waits for user input.
- the number and types of branches may be increased or decreased according to the number and types of instruction buttons arranged in the operation window 501, so that the optimum processing can always be selected.
- Steps 604 to 609 are processing corresponding to each command button.
- Step 604 is processing when the video playback device operation button group 511 is designated.
- this control process is used not only when the operation button group 511 is clicked, but also whenever the video playback device 110 needs to be controlled.
- first, a control command is transmitted to the video playback device 110, and the response status of the video playback device 110 is received.
- the response status is determined, and if an error occurs, an error message is displayed on the display device 101 and the processing is interrupted. If the control is successful, the frame number is read out, displayed on the display box 512, and the process returns to step 603.
- step 605 is the parameter setting process executed when the parameter setting box 514 is specified.
- when the user changes a setting parameter by inputting a numerical value from the input device 105, the corresponding parameter stored in the memory 109 is rewritten.
- when a parameter related to time length is changed, the changed time length is converted into a number of data items according to the sampling frequency of the audio data.
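The conversion from a time-length parameter to a data count follows directly from the sampling frequency; for example, with the 11 kHz sampling used later in this description (the function name is an assumption):

```python
def seconds_to_samples(t_sec, fs_hz=11000):
    # a time-length parameter becomes a data count by multiplying
    # by the sampling frequency of the audio data
    return int(t_sec * fs_hz)
```
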
- step 606 is a sound reproduction process for reproducing the input audio data of the detected sound section 508 through the speaker 111. If the start and end points of the sound section are set in the detection result display panel 504, the audio data from the IN frame to the OUT frame of 504 is reproduced; that is, the audio data in the audio storage ring buffer 1050 is reproduced from data position 1052 to data position 1053. This allows the user to check the detection result by ear.
- step 607 is a margin setting process for providing a margin around the detected sound section.
- the user sets the margin by dragging an end of the sound section 508 to expand the section.
- first, the time length of the sound section from the IN frame to the OUT frame of the detection result display panel 504 is calculated. If the time length of a single CM video is fixed, the upper limit of the margin is uniquely determined from the time length of the sound section. The margin is determined while monitoring the user's operation so that this upper limit is not exceeded, and the frame numbers of the start and end points are corrected. As a result, a high-quality CM video with no variation in time length can be registered in the management device. As another method, an appropriate margin satisfying the upper limit may be added automatically before and after the section. If the time length is not limited, a margin is added as requested by the user. Conversely, it is also possible to shorten the sound section.
- Step 608 is a process of detecting the start and end points of a sound section.
- first, the video of the video playback device 110 is played, audio data is input from the audio input device 103, the start and end points of the sound section are detected, and the result is displayed on the detection result display panel 504. Details will be described later with the program 900 (Fig. 9).
- the program 900 is an example in which the method for detecting the start and end points of a sound section shown in the program 300 is applied to the sound section detection device.
- after detection, the video of the video playback device 110 may be cued to the start point of the sound section.
- the cueing can be realized by transmitting the frame number of the start point of the sound section and the search command from the audio processing device 104 to the video reproducing device 110.
- in step 609, the waveform 507 and the sound section 508 are displayed on the panel 506.
- the audio data input up to the detection of the start and end points of the sound section is displayed as the waveform 507, and the section from the IN frame to the OUT frame of the detection result display panel 504 is displayed as the sound section 508.
- that is, the audio data in the audio storage ring buffer 1050 is displayed as a waveform starting from the offset 1054 and going around the ring buffer, and the data section between 1052 and 1053 is displayed as 508. This allows the user to visually check the detection result.
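Reading out the section between two positions on the ring buffer has to handle the case where the section wraps past the buffer end; a minimal sketch (the function name is an assumption):

```python
def ring_slice(buf, start, end):
    """Samples between two data positions on a ring buffer, e.g. the
    start-point position 1052 and the end-point position 1053,
    allowing the section to wrap around the end of the buffer."""
    if start <= end:
        return buf[start:end]
    return buf[start:] + buf[:end]   # section wrapped past the buffer end
```
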
- Step 610 is the end processing.
- a control command is transmitted to the video playback device 110 to put it into a stop state (STAND BY OFF), and then the communication port is closed.
- the operation window 501 on the display device 101 is closed. Finally, the secured memory area is released and the processing ends.
- next, a control method and a filtering method used when the method for detecting the start and end points of a sound section shown in the program 300 is applied to a sound section detection device will be disclosed.
- with the program 300, the start and end points can be detected after the audio data of the entire video has been input, but if long audio data is input at once, the time lag until detection becomes long and the real-time property of detection is impaired. In order to maintain the real-time property of detection, it is better to divide the audio data into short segments and perform input processing and detection processing on each segment.
- FIG. 7 is a diagram showing a control method of the sound section detection device of the present invention, and shows a process until a start point of the sound section is detected.
- Each rectangle in the figure indicates a process to be controlled, and the width of the rectangle indicates the processing time length.
- Reference numeral 720 denotes audio data input processing in the audio input device 103.
- the input audio is accumulated in the audio input device 103 until an audio buffer of a certain time length becomes full; when it is full, an interrupt signal indicating that the audio buffer is full is sent to the audio processing device 104.
- the time length of 702 indicates the size of the audio buffer.
- reference numeral 703 denotes the acoustic analysis process in the audio processing device 104, which executes the program 300. The process 703 starts when the interrupt signal arrives and runs until the next interrupt signal arrives.
- the processing of 703 can take up to one second, which is sufficient as processing time.
- if Ts is set to 200 msec and Tn to 500 msec, the start and end points of a sound can be detected by processing at most two buffers of audio data. In this case the time lag from the start of input at 103 to detection at 104 can be kept to about 3 seconds at most, so detection can be performed almost in real time.
- Ts and Tn are the lower limits of the elapsed time of the voiced and silent states described in Fig. 4; these values correspond to the length of one syllable of speech and the length of a pause between utterances, respectively.
- if the sampling frequency is set to 11 kHz,
- the number of sampling bits is set to 8 bits,
- and the number of channels is set to 1 (monaural),
- the amount of data transferred to the memory 109 is 11 kbytes per 1-second buffer, so the transfer time is not an issue.
- FIG. 7 shows the flow of the process until a start point is detected.
- when the DETECT button 503 is clicked, the overall control process first starts video playback on the video playback device 110, starts the audio input process 702, prepares the sound section detection process, and starts counting the processing elapsed time with a timer (701).
- voice data is input by the voice input processing of 702
- the data arrival time T1 is recorded in the memory 109 in the acoustic analysis processing of 703 (704).
- the detection flag on the memory 109 is set to ON (705).
- when the acoustic analysis processing of 703 is completed, the detection flag is checked in the overall control processing.
- the intermediate result is displayed on the audio waveform monitor 505 (706).
- the frame number reading time T2 is obtained from the timer, and the frame number and the reading time are stored in the memory 109.
- the frame number is converted into the frame number of the start point of the sound and stored in the memory 109 (707). If the end point of the sound is to be detected continuously, the processing from 702 to 707 is repeated until the end point is detected. Since the processes from 702 to 707 can be repeated as many times as necessary, even if one video contains a plurality of sound sections, each of them can be detected.
- (Equation 2) TC0 = TC2 − 30 × (T2 − T0) / 1000 [frames]
- L is the audio buffer size (the number of data items), and dT is the time length of the audio buffer. If the audio data is 8-bit monaural, L equals the number of bytes in the audio buffer.
- the factor 30 in Equation 2 appears because the NTSC video signal is composed of 30 frames per second. The frame number of the end point can be calculated in the same way.
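Reading Equation 1 as T0 = T1 + dT × X / L and Equation 2 as TC0 = TC2 − 30 × (T2 − T0) / 1000, with times in milliseconds (this reading is an assumption reconstructed from the surrounding definitions, and it takes T1 as the time of the buffer's first sample), the frame-number conversion can be sketched as:

```python
def start_frame(tc2, t2, t1, x, buf_len, buf_ms):
    """tc2: frame number read from the player at time t2 [ms];
    t1: arrival time of the audio buffer [ms]; x: data position of the
    start point inside the buffer of buf_len samples spanning buf_ms ms."""
    t0 = t1 + buf_ms * x / buf_len              # Equation 1: start-point time
    return tc2 - round(30 * (t2 - t0) / 1000)   # Equation 2: NTSC 30 frames/sec
```
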
- the start and end points of a sound section can be detected almost in real time.
- FIG. 8 is a diagram showing the positional relationship between input and output data in the filtering processing of step 303 or 304. Each rectangle in the figure represents a data array. 801 is the input data array (L data items) and 802 is the filter buffer (Lf data items); 802 corresponds to the filter 321 in step 303 and to the filter 322 in step 304.
- in the filtering process, the data of 801 is read sequentially into the filter buffer 802, and the maximum or minimum value of all the data in 802 is output as the data at the current position.
- as a whole, the output data array 803 is obtained from the input data 801.
- since Lf data items are consumed to initialize 802, no output data can be obtained for the first part 804 and the last part 805 of the output data array. If 802 were initialized each time data is received from the audio input device 103 under the control method of FIG. 7, the envelope of the filtering result would be interrupted.
- therefore, the filter buffer 802 is initialized only once, at 701; thereafter it is never cleared, and the position at which the next input data is to be read and the data contents are kept at all times.
- the (n + 1) th acoustic analysis process can use the 800 data Lf inherited from the nth time and the (n + 1) th input data 8006 L data.
- thus, a total of L output data items, consisting of the 805 part and the 807 part, can be obtained.
- since L output data items are obtained for L input data items, audio data that is divided and input can be filtered continuously.
- however, the output data of the n-th 805 part is obtained only after the (n+1)-th input 806 has arrived.
- to calculate the time of a start or end point, the data position X of the point and the input data arrival time T1 read from the timer are used as shown in Equation 1. For this reason, the data arrival times of both the n-th and (n+1)-th inputs are recorded in the memory 109. If the start or end point of the sound is found in the 805 part, the n-th arrival time is used; if it is found in the 807 part, the (n+1)-th arrival time is used.
- the filter size Lf should be set so that L − Lf is positive. Since the fundamental frequency of human voice is generally 100 Hz or more, if Lf covers at least the reciprocal time length of 10 msec (for example, one frame time of 33 msec), there is no problem in calculating the envelope. The number of data items is obtained by multiplying the time length by the sampling frequency.
- by the above filtering method, the detection processing can be executed without any interruption in the audio data to be processed.
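The carry-over scheme of FIG. 8 can be sketched as a streaming maximum filter that keeps the last Lf − 1 samples between chunks instead of re-initializing the filter buffer. The class name is an assumption; a sliding minimum for step 304 is obtained by replacing max with min.

```python
class StreamingMaxFilter:
    """MAX filtering over chunked input without envelope breaks:
    the filter buffer is initialized once and its tail is carried
    over to the next chunk, as in FIG. 8."""
    def __init__(self, lf):
        self.lf = lf        # filter size Lf (number of data items)
        self.carry = []     # samples inherited from the previous chunk

    def process(self, chunk):
        data = self.carry + list(chunk)
        if len(data) < self.lf:     # still initializing the buffer
            self.carry = data
            return []
        # one output per window position; the first chunk yields
        # L - Lf + 1 outputs, every later chunk yields a full L
        out = [max(data[i:i + self.lf])
               for i in range(len(data) - self.lf + 1)]
        self.carry = data[-(self.lf - 1):]
        return out
```

Concatenating the outputs for successive chunks reproduces exactly the sliding-window maximum of the whole signal, so the divided input is filtered continuously.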
- FIG. 9 shows a flowchart of the process for detecting the start and end points of a sound section, reflecting the control method and filtering method described above.
- FIG. 10 shows the data structure of the audio data and control data stored in the memory 109.
- FIG. 9 is a flowchart showing a flow of a sound section detection process using the CM time length rule.
- the program 900 is a processing program for detecting a set of start and end points of a sound section, and is executed in step 608. 900 is roughly divided into the following four processes: (1) processing to detect the start point of a sound section, (2) processing to detect the end point of a sound section, (3) judgment processing using the CM time length rule, and (4) time-limit processing that terminates detection when a specified time has elapsed.
- the processing of (1) includes steps 902 to 904, and the processing of (2) includes steps 906, 907, and 910. With these, the control of the processes 703 to 707 shown in FIG. 7 is executed.
- the processing of (3) consists of steps 905 and 911 to 915. With these, only sound sections of a predetermined time length are selected.
- the processing of (4) consists of steps 908 and 909. With these, an upper limit is set on the execution time of the detection processing, and error processing is performed when the end point is not found.
- the minimum necessary processes for detecting a sound section are the processes (1) and (2), and the processes (3) and (4) can be omitted.
- step 901 is an initialization process. The audio data and control data stored in the memory 109 are initialized, and the control processing 701 of FIG. 7 is executed. That is, the audio buffer 1030, the audio storage buffer 1050, and the control parameters 1010 are initialized, and the empty flag 1042 of the filter buffer 1040 is set to TRUE. In step 902, the detection state of the start point of the sound section is determined. Step 903 is executed until the start point flag flagIN 1017 of the control parameters 1010 becomes TRUE.
- in step 903, the start point of the sound section is detected: the program 300 is executed and the result is displayed on the audio waveform monitor 505.
- when the start point is detected, flagIN 1017 is set to TRUE,
- the current frame number is read from the video playback device 110,
- and the frame number acquisition time T2 is read from the timer.
- in step 904, the frame number of the detected start point is calculated.
- the time TO of the start point is calculated by Equation 1, and the frame number TC0 of the start point is calculated by Equation 2.
- the starting point TC0 is displayed on the detection result display panel 504, and flagIN is returned to FALSE.
- in step 905, the detection state of the sound section is determined. The following steps are executed until a sound section of a certain time length is detected.
- in step 906, the detection state of the end point of the sound section is determined.
- steps 907 to 909 are executed until the end point flag flagOUT 1018 becomes TRUE.
- in step 907, the end point of the sound section is detected: the program 300 is executed and the result is displayed on the audio waveform monitor 505. When the end point is detected, flagOUT 1018 is set to TRUE, the current frame number is read from the video playback device 110, and the frame number acquisition time T2 is read from the timer. The frame number of the end point is then calculated in step 910.
- in step 908, the elapsed time of the detection processing is determined. If the elapsed time since the start point was detected in step 903 exceeds a specified detection time, it is judged that the video being processed does not contain a video of an appropriate time length, and step 909 is executed. The specified detection time is set, for example, to 60 seconds, twice the CM time length of 30 seconds. If the current input data arrival time T1 1022 satisfies T1 > T2 + 60 [sec] with respect to the T2 obtained in step 903, it is judged that there is no video of an appropriate time length.
- in step 909, the detection result is rejected and the detection process is interrupted: the previously detected start point is cancelled.
- in addition, data input from the audio input device 103 is stopped, video reproduction on the video reproduction device 110 is paused, and the audio buffer 1030 and the filter buffer 1040 are cleared.
- in step 910, the frame number of the detected end point is calculated.
- the end point time T0 is calculated by Equation 1, and the end point frame number TC0 is calculated by Equation 2.
- the end point TC0 is displayed on the detection result display panel 504, and flagOUT is returned to FALSE.
- in step 911, the time length T of the sound section is calculated as the difference between the start point time obtained in step 904 and the end point time obtained in step 910.
- step 912 is a judgment process using the CM time length rule. If the time length of the detected sound section satisfies a predetermined fixed time length, steps 913 and 914 are executed; if it exceeds the fixed time length, step 915 is executed; if it has not yet reached the fixed time length, the process returns to detection of the next end point. As a result, it becomes possible to detect only videos containing a sound section of a fixed time length.
- since a CM is generally 15 or 30 seconds long, the fixed time lengths are set to 15 seconds and 30 seconds, with an allowable range of 1 second for the 15-second length and 2 seconds for the 30-second length; these values may be changed appropriately according to the application.
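The CM time length rule of steps 912 to 915 can be sketched as a small classifier. The names and return values are assumptions; the lengths and tolerances are the ones stated above.

```python
CM_RULES = [(15.0, 1.0), (30.0, 2.0)]   # (fixed length, allowable range) [sec]

def classify_section(t_sec):
    """'adopt' when the section matches a CM length within tolerance
    (steps 913/914), 'reject' when it is already too long (step 915),
    'continue' when the next end point should still be sought."""
    for length, tol in CM_RULES:
        if abs(t_sec - length) <= tol:
            return "adopt"
    if t_sec > max(length + tol for length, tol in CM_RULES):
        return "reject"
    return "continue"
```
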
- in step 913, the detected start and end points are adopted as the start and end points of the sound section.
- in addition, data input from the audio input device 103 is stopped, video reproduction in the video reproduction device 110 is paused, and the audio buffer 1030 and the filter buffer 1040 are cleared.
- in step 915, the detection result is rejected and the detection processing is interrupted: the detected start and end points are cancelled and the display on the panel 504 is cleared. In addition, data input from the audio input device 103 is stopped, video reproduction in the video reproduction device 110 is paused, and the audio buffer 1030 and the filter buffer 1040 are cleared.
- FIG. 10 is a diagram showing an example of a data structure for realizing the sound section detection according to the present invention.
- the processing data is stored in memory 109 and read out to CPU 107 as necessary.
- the sampling frequency 1001 of the audio signal information is stored,
- the number of sampling bits 1002 is stored,
- and the number of channels 1003 used when digitizing the audio signal with the audio input device 103 (1 for monaural, 2 for stereo) is stored.
- 1010 is the control parameter area, which stores various parameters and flags used in the sound section detection processing.
- 1011 to 1014 are variable parameters that can be changed on the parameter setting panel 513.
- 1015 to 1018 are four flags indicating the state at the time of determining the start and end points of the sound section described in Fig. 4, and 1019 and 1020 are counters for counting the voiced/silent states.
- the start point flag 1017 and the end point flag 1018 are set to FALSE while the start and end points are undetected, and to TRUE when they are detected.
- the audio buffer 1030 is the data structure of a buffer for storing the processing data 311 to 315 transferred between the steps of the program 300.
- the data count 1032 is the number of data items stored in the buffer 1030. As described with reference to FIG. 8, the output data of the 804 and 805 parts cannot be obtained from the first input buffer alone, so the number of data items in the output buffer becomes smaller; therefore the data count 1032 is provided separately from the buffer size 1031. 1033 is the processing data.
- the filter buffer 1040 is the data structure of the ring buffer used for the maximum- and minimum-value filtering in steps 303 and 304. Two such buffers are prepared in the memory 109, one for MAX filtering and one for MIN filtering.
- the buffer size 1041 is calculated from the filter time length TLf of 1012.
- the empty flag 1042 indicates the initialization status of the filter buffer: it is set to TRUE while the buffer is empty, and to FALSE once the buffer has been filled.
- if the flag is TRUE, the buffer is initialized by copying input data amounting to the size 1041; if FALSE, it is not initialized. As a result, the envelope can be calculated without interruption.
- 1043 is an offset indicating the position at which the next input data is read, and 1044 is the read input data, which is the target of the filtering process.
- reference numeral 1050 denotes the audio storage ring buffer, which copies the audio data input from the audio input device 103 and continuously holds the most recent several seconds of data.
- the data stored in 1050 is used to display the audio data waveform 507 and to reproduce the sound with the PLAY button 509.
- 1051 is the buffer size. If the size of the buffer 1050 is set to an integral multiple of that of the audio buffer 1030, copying becomes easier.
- numeral 1052 is the data position on the ring buffer corresponding to the data position X of the start point of the sound section in FIG. 8.
- reference numeral 1053 denotes the data position on the ring buffer corresponding to the end point. 1052 and 1053 are first set to negative values, and are replaced with the actual data position values when the start and end points are output. 1054 is an offset indicating the beginning of the next input-data copy position.
- 1055 is the audio data.
- finally, the memory size of the data used for the sound section detection processing is estimated. For example, if the audio signal information 1000 specifies 11 kHz, 8-bit, monaural audio data and the time length that can be recorded in the input buffer is 1 second, the required capacity of the audio buffer 1030 is 11 kbytes, and the total for the three buffers is about 33 kbytes. Assuming that the time length of stored audio is 40 seconds, the required capacity of the audio storage ring buffer 1050 is about 440 kbytes. If the filter time length is 30 msec, the required capacity of the filter buffer 1040 is about 0.3 kbytes, and even two filters take less than 1 kbyte. Including the other parameters, the total memory required for the data is about 500 kbytes.
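The estimate above can be reproduced arithmetically; the buffer counts and time lengths below are the ones stated in the text (the "three buffers" are taken to be three buffers of the input-buffer size, as the text implies):

```python
FS, SAMPLE_BYTES = 11000, 1          # 11 kHz, 8-bit monaural

audio_buf   = FS * SAMPLE_BYTES * 1   # 1-second input buffer: 11 kbytes
three_bufs  = 3 * audio_buf           # three such buffers: ~33 kbytes
storage_buf = FS * SAMPLE_BYTES * 40  # 40-second storage ring buffer: ~440 kbytes
filter_bufs = 2 * int(FS * 0.030)     # two 30-msec filter buffers: <1 kbyte
total = three_bufs + storage_buf + filter_bufs
```

The sum lands comfortably under the roughly 500 kbytes quoted in the text.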
- the configuration of the present invention is capable of quantitatively and automatically detecting the presence or absence of a sound which has been conventionally judged by hearing, and thus has an effect of saving labor for detecting a sound section.
- the operator only needs to set the CM material on the video player and operate the buttons on the audio processing device screen; no complicated operations such as frequent repetition of playback, stop, and reverse playback are required, which simplifies the work.
- since audio signals are divided into short segments and input, sound sections can be detected in real time, which has the effect of improving work efficiency.
- the sound of the detected sound section is displayed and reproduced in a waveform, so that the detection result can be immediately confirmed visually and audibly, saving labor for confirmation work.
- a margin can be set in the detected sound section, a high-quality CM video with no variation in time length can be registered in the management device, which has the effect of improving the quality of the registered video.
- the filtering process used for calculating the envelope according to the present invention requires less computation than calculating the power spectrum of the audio signal, so it can be realized on a small computer such as a personal computer, and the calculation can be performed quickly even when the sampling rate of the input audio signal is high.
- a device that realizes such a method for detecting a sound section in a video can be realized by a small computer such as a personal computer, and an inexpensive detection device can be achieved.
- the method and apparatus for detecting a sound section according to the present invention are suitable for use in a CM registration apparatus that detects the start and end points of a CM consisting of video and audio and registers the CM video.
- the present invention can be used for a CM detection device that detects a section of a CM video inserted in a movie or a TV program.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP54032098A JP3753384B2 (ja) | 1997-03-19 | 1997-03-19 | 映像中の有音区間の終始点の検出装置 |
US09/341,471 US6600874B1 (en) | 1997-03-19 | 1997-03-19 | Method and device for detecting starting and ending points of sound segment in video |
PCT/JP1997/000905 WO1998041978A1 (fr) | 1997-03-19 | 1997-03-19 | Procede et dispositif destines a detecter des points de depart et de terminaison d'une section son dans une sequence video |
EP97907389A EP0977172A4 (en) | 1997-03-19 | 1997-03-19 | METHOD AND DEVICE FOR DETERMINING THE START AND END POINT OF A SOUND SECTION IN VIDEO |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP1997/000905 WO1998041978A1 (fr) | 1997-03-19 | 1997-03-19 | Procede et dispositif destines a detecter des points de depart et de terminaison d'une section son dans une sequence video |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1998041978A1 true WO1998041978A1 (fr) | 1998-09-24 |
Family
ID=14180261
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP1997/000905 WO1998041978A1 (fr) | 1997-03-19 | 1997-03-19 | Procede et dispositif destines a detecter des points de depart et de terminaison d'une section son dans une sequence video |
Country Status (4)
Country | Link |
---|---|
US (1) | US6600874B1 (ja) |
EP (1) | EP0977172A4 (ja) |
JP (1) | JP3753384B2 (ja) |
WO (1) | WO1998041978A1 (ja) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002049384A (ja) * | 2000-08-02 | 2002-02-15 | Sony Corp | ディジタル信号処理方法及びディジタル信号処理装置並びにプログラム格納媒体 |
WO2007017970A1 (ja) * | 2005-08-11 | 2007-02-15 | Mitsubishi Denki Kabushiki Kaisha | 映像記録装置、シーンチェンジ抽出方法、及び映像音声記録装置 |
JP2007516450A (ja) * | 2003-08-18 | 2007-06-21 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | デジタル音声信号におけるクリッキングノイズ検出 |
EP1953751A3 (en) * | 2007-01-30 | 2008-12-17 | Viktor Company of Japan Ltd. | Reproduction device, reproduction method and computer usable medium having computer readable reproduction embodied therein |
JP2009055620A (ja) * | 2008-09-29 | 2009-03-12 | Sony Corp | 情報処理装置および方法、並びにプログラム |
US7822569B2 (en) | 2005-04-20 | 2010-10-26 | Sony Corporation | Specific-condition-section detection apparatus and method of detecting specific condition section |
JP2011091859A (ja) * | 2011-01-14 | 2011-05-06 | Mitsubishi Electric Corp | 映像記録装置、映像記録方法、映像音声記録装置、及び映像音声記録方法 |
US8195472B2 (en) | 2001-04-13 | 2012-06-05 | Dolby Laboratories Licensing Corporation | High quality time-scaling and pitch-scaling of audio signals |
US8213775B2 (en) | 2004-12-27 | 2012-07-03 | Sony Corporation | Information processing apparatus and method, and program |
JP2012209958A (ja) * | 2012-06-08 | 2012-10-25 | Mitsubishi Electric Corp | 映像音声記録装置及び映像音声記録方法 |
US8488800B2 (en) | 2001-04-13 | 2013-07-16 | Dolby Laboratories Licensing Corporation | Segmenting audio signals into auditory events |
JP2019533189A (ja) * | 2016-09-28 | 2019-11-14 | 華為技術有限公司Huawei Technologies Co.,Ltd. | マルチチャネルオーディオ信号処理方法、装置、およびシステム |
WO2020198230A1 (en) * | 2019-03-27 | 2020-10-01 | On Time Staffing Inc. | Automatic camera angle switching to create combined audiovisual file |
US11023735B1 (en) | 2020-04-02 | 2021-06-01 | On Time Staffing, Inc. | Automatic versioning of video presentations |
US11127232B2 (en) | 2019-11-26 | 2021-09-21 | On Time Staffing Inc. | Multi-camera, multi-sensor panel data extraction system and method |
US11144882B1 (en) | 2020-09-18 | 2021-10-12 | On Time Staffing Inc. | Systems and methods for evaluating actions over a computer network and establishing live network connections |
US11423071B1 (en) | 2021-08-31 | 2022-08-23 | On Time Staffing, Inc. | Candidate data ranking method using previously selected candidate data |
US11727040B2 (en) | 2021-08-06 | 2023-08-15 | On Time Staffing, Inc. | Monitoring third-party forum contributions to improve searching through time-to-live data assignments |
US11907652B2 (en) | 2022-06-02 | 2024-02-20 | On Time Staffing, Inc. | User interface and systems for document creation |
US11961044B2 (en) | 2019-03-27 | 2024-04-16 | On Time Staffing, Inc. | Behavioral data analysis and scoring system |
Families Citing this family (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020120925A1 (en) * | 2000-03-28 | 2002-08-29 | Logan James D. | Audio and video program recording, editing and playback systems using metadata |
JP3070837U (ja) * | 2000-02-08 | 2000-08-15 | 船井電機株式会社 | ビデオテ―プレコ―ダ |
GB0029861D0 (en) * | 2000-12-07 | 2001-01-24 | Sony Uk Ltd | Replaying video information |
US7058889B2 (en) * | 2001-03-23 | 2006-06-06 | Koninklijke Philips Electronics N.V. | Synchronizing text/visual information with audio playback |
US7072908B2 (en) * | 2001-03-26 | 2006-07-04 | Microsoft Corporation | Methods and systems for synchronizing visualizations with audio streams |
US7161887B2 (en) * | 2001-11-13 | 2007-01-09 | Digeo, Inc. | Method and apparatus for extracting digital data from a medium |
US20050065915A1 (en) * | 2003-09-23 | 2005-03-24 | Allen Wayne J. | Method and system to add protocol support for network traffic tools |
FR2880462A1 (fr) | 2005-01-06 | 2006-07-07 | Thomson Licensing Sa | Method for reproducing documents comprising altered sequences, and associated reproduction device |
US20090226144A1 (en) * | 2005-07-27 | 2009-09-10 | Takashi Kawamura | Digest generation device, digest generation method, recording medium storing digest generation program thereon and integrated circuit used for digest generation device |
WO2007039998A1 (ja) * | 2005-09-30 | 2007-04-12 | Pioneer Corporation | Device for extracting scenes outside the main program, and program therefor |
JP4698453B2 (ja) * | 2006-02-28 | 2011-06-08 | Sanyo Electric Co., Ltd. | Commercial detection device and video playback device |
US7904056B2 (en) * | 2006-03-01 | 2011-03-08 | Ipc Systems, Inc. | System, method and apparatus for recording and reproducing trading communications |
JP4282704B2 (ja) * | 2006-09-27 | 2009-06-24 | Toshiba Corporation | Speech section detection device and program |
JP4909165B2 (ja) * | 2007-04-24 | 2012-04-04 | Renesas Electronics Corporation | Scene change detection device, encoding device, and scene change detection method |
JP4962783B2 (ja) * | 2007-08-31 | 2012-06-27 | Sony Corporation | Information processing device, information processing method, and program |
JP4950930B2 (ja) * | 2008-04-03 | 2012-06-13 | Toshiba Corporation | Device, method, and program for determining speech/non-speech |
JP2010074823A (ja) * | 2008-08-22 | 2010-04-02 | Panasonic Corp | Recording and editing device |
US8811793B2 (en) * | 2008-12-23 | 2014-08-19 | Sony Corporation | Camera event logger |
CN102073635B (zh) * | 2009-10-30 | 2015-08-26 | Sony Corporation | Program endpoint time detection apparatus and method, and program information retrieval system |
US9031384B2 (en) | 2011-06-02 | 2015-05-12 | Panasonic Intellectual Property Corporation Of America | Region of interest identification device, region of interest identification method, region of interest identification program, and region of interest identification integrated circuit |
US10133472B2 (en) * | 2013-03-15 | 2018-11-20 | Disney Enterprises, Inc. | Gesture based video clipping control |
CN105144200A (zh) * | 2013-04-27 | 2015-12-09 | 数据飞讯公司 | Content-based retrieval engine for processing unstructured digital data |
WO2015038121A1 (en) * | 2013-09-12 | 2015-03-19 | Thomson Licensing | Video segmentation by audio selection |
US8719032B1 (en) * | 2013-12-11 | 2014-05-06 | Jefferson Audio Video Systems, Inc. | Methods for presenting speech blocks from a plurality of audio input data streams to a user in an interface |
US10438582B1 (en) * | 2014-12-17 | 2019-10-08 | Amazon Technologies, Inc. | Associating identifiers with audio signals |
JP6060989B2 (ja) * | 2015-02-25 | 2017-01-18 | Casio Computer Co., Ltd. | Audio recording device, audio recording method, and program |
US10121471B2 (en) * | 2015-06-29 | 2018-11-06 | Amazon Technologies, Inc. | Language model speech endpointing |
JP6553111B2 (ja) | 2017-03-21 | 2019-07-31 | Toshiba Corporation | Speech recognition device, speech recognition method, and speech recognition program |
US11170760B2 (en) | 2019-06-21 | 2021-11-09 | Robert Bosch Gmbh | Detecting speech activity in real-time in audio signal |
CN110853622B (zh) * | 2019-10-22 | 2024-01-12 | 深圳市本牛科技有限责任公司 | Speech sentence segmentation method and system |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5596680A (en) * | 1992-12-31 | 1997-01-21 | Apple Computer, Inc. | Method and apparatus for detecting speech activity using cepstrum vectors |
JPH0990974A (ja) * | 1995-09-25 | 1997-04-04 | Nippon Telegr & Teleph Corp <Ntt> | Signal processing method |
JPH0991928A (ja) * | 1995-09-25 | 1997-04-04 | Nippon Telegr & Teleph Corp <Ntt> | Video editing method |
TW333610B (en) * | 1997-10-16 | 1998-06-11 | Winbond Electronics Corp | The phonetic detecting apparatus and its detecting method |
US6134524A (en) * | 1997-10-24 | 2000-10-17 | Nortel Networks Corporation | Method and apparatus to detect and delimit foreground speech |
1997
- 1997-03-19 US US09/341,471 patent/US6600874B1/en not_active Expired - Lifetime
- 1997-03-19 EP EP97907389A patent/EP0977172A4/en not_active Withdrawn
- 1997-03-19 JP JP54032098A patent/JP3753384B2/ja not_active Expired - Lifetime
- 1997-03-19 WO PCT/JP1997/000905 patent/WO1998041978A1/ja not_active Application Discontinuation
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS60498A (ja) * | 1983-06-17 | 1985-01-05 | Casio Computer Co., Ltd. | Speech detection device |
JPS6029800A (ja) * | 1983-07-29 | 1985-02-15 | Toshiba Corporation | Speech analysis method |
JPH0528717A (ja) * | 1991-07-22 | 1993-02-05 | Sony Corp | Display device |
JPH06302160A (ja) * | 1993-04-13 | 1994-10-28 | Sony Corp | Editing device |
JPH08205076A (ja) * | 1995-01-20 | 1996-08-09 | Canon Inc | Moving image editing device and moving image editing method |
JPH08279962A (ja) * | 1995-04-05 | 1996-10-22 | Nec Eng Ltd | Commercial (CM) transmission device |
Non-Patent Citations (2)
Title |
---|
See also references of EP0977172A4 * |
TAKAFUMI MIYATAKE, HITOSHI MATSUSHIMA, MASAKAZU EJIRI, "High-Speed Computing Method for Maximum/Minimum Image Filtering" (in Japanese), The Transactions of the IEICE, Vol. J78-D-II, No. 11, 25 November 1995, pp. 1598-1607. *
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4538704B2 (ja) * | 2000-08-02 | 2010-09-08 | Sony Corporation | Digital signal processing method, digital signal processing device, and program storage medium |
JP2002049384A (ja) * | 2000-08-02 | 2002-02-15 | Sony Corp | Digital signal processing method, digital signal processing device, and program storage medium |
US8842844B2 (en) | 2001-04-13 | 2014-09-23 | Dolby Laboratories Licensing Corporation | Segmenting audio signals into auditory events |
US8488800B2 (en) | 2001-04-13 | 2013-07-16 | Dolby Laboratories Licensing Corporation | Segmenting audio signals into auditory events |
US10134409B2 (en) | 2001-04-13 | 2018-11-20 | Dolby Laboratories Licensing Corporation | Segmenting audio signals into auditory events |
US8195472B2 (en) | 2001-04-13 | 2012-06-05 | Dolby Laboratories Licensing Corporation | High quality time-scaling and pitch-scaling of audio signals |
US9165562B1 (en) | 2001-04-13 | 2015-10-20 | Dolby Laboratories Licensing Corporation | Processing audio signals with adaptive time or frequency resolution |
JP4739023B2 (ja) * | 2003-08-18 | 2011-08-03 | Nuance Communications Austria GmbH | Clicking noise detection in a digital audio signal |
JP2007516450A (ja) * | 2003-08-18 | 2007-06-21 | Koninklijke Philips Electronics N.V. | Clicking noise detection in a digital audio signal |
US8213775B2 (en) | 2004-12-27 | 2012-07-03 | Sony Corporation | Information processing apparatus and method, and program |
US7822569B2 (en) | 2005-04-20 | 2010-10-26 | Sony Corporation | Specific-condition-section detection apparatus and method of detecting specific condition section |
JP4707713B2 (ja) * | 2005-08-11 | 2011-06-22 | Mitsubishi Electric Corporation | Video recording device and scene change extraction method |
JPWO2007017970A1 (ja) * | 2005-08-11 | 2009-02-19 | Mitsubishi Electric Corporation | Video recording device, scene change extraction method, and video/audio recording device |
WO2007017970A1 (ja) * | 2005-08-11 | 2007-02-15 | Mitsubishi Denki Kabushiki Kaisha | Video recording device, scene change extraction method, and video/audio recording device |
US8886014B2 (en) | 2005-08-11 | 2014-11-11 | Mitsubishi Electric Corporation | Video recording apparatus, scene change extraction method, and video audio recording apparatus |
US7714223B2 (en) | 2007-01-30 | 2010-05-11 | Victor Company Of Japan, Limited | Reproduction device, reproduction method and computer usable medium having computer readable reproduction program embodied therein |
EP1953751A3 (en) * | 2007-01-30 | 2008-12-17 | Victor Company of Japan Ltd. | Reproduction device, reproduction method and computer usable medium having computer readable reproduction program embodied therein |
JP2009055620A (ja) * | 2008-09-29 | 2009-03-12 | Sony Corp | Information processing device and method, and program |
JP2011091859A (ja) * | 2011-01-14 | 2011-05-06 | Mitsubishi Electric Corp | Video recording device, video recording method, video/audio recording device, and video/audio recording method |
JP2012209958A (ja) * | 2012-06-08 | 2012-10-25 | Mitsubishi Electric Corp | Video/audio recording device and video/audio recording method |
JP2019533189A (ja) * | 2016-09-28 | 2019-11-14 | Huawei Technologies Co., Ltd. | Multichannel audio signal processing method, apparatus, and system |
US10984807B2 (en) | 2016-09-28 | 2021-04-20 | Huawei Technologies Co., Ltd. | Multichannel audio signal processing method, apparatus, and system |
US11922954B2 (en) | 2016-09-28 | 2024-03-05 | Huawei Technologies Co., Ltd. | Multichannel audio signal processing method, apparatus, and system |
WO2020198230A1 (en) * | 2019-03-27 | 2020-10-01 | On Time Staffing Inc. | Automatic camera angle switching to create combined audiovisual file |
US11961044B2 (en) | 2019-03-27 | 2024-04-16 | On Time Staffing, Inc. | Behavioral data analysis and scoring system |
US11863858B2 (en) | 2019-03-27 | 2024-01-02 | On Time Staffing Inc. | Automatic camera angle switching in response to low noise audio to create combined audiovisual file |
US11457140B2 (en) | 2019-03-27 | 2022-09-27 | On Time Staffing Inc. | Automatic camera angle switching in response to low noise audio to create combined audiovisual file |
US11127232B2 (en) | 2019-11-26 | 2021-09-21 | On Time Staffing Inc. | Multi-camera, multi-sensor panel data extraction system and method |
US11783645B2 (en) | 2019-11-26 | 2023-10-10 | On Time Staffing Inc. | Multi-camera, multi-sensor panel data extraction system and method |
US11636678B2 (en) | 2020-04-02 | 2023-04-25 | On Time Staffing Inc. | Audio and video recording and streaming in a three-computer booth |
US11184578B2 (en) | 2020-04-02 | 2021-11-23 | On Time Staffing, Inc. | Audio and video recording and streaming in a three-computer booth |
US11861904B2 (en) | 2020-04-02 | 2024-01-02 | On Time Staffing, Inc. | Automatic versioning of video presentations |
US11023735B1 (en) | 2020-04-02 | 2021-06-01 | On Time Staffing, Inc. | Automatic versioning of video presentations |
US11720859B2 (en) | 2020-09-18 | 2023-08-08 | On Time Staffing Inc. | Systems and methods for evaluating actions over a computer network and establishing live network connections |
US11144882B1 (en) | 2020-09-18 | 2021-10-12 | On Time Staffing Inc. | Systems and methods for evaluating actions over a computer network and establishing live network connections |
US11727040B2 (en) | 2021-08-06 | 2023-08-15 | On Time Staffing, Inc. | Monitoring third-party forum contributions to improve searching through time-to-live data assignments |
US11966429B2 (en) | 2021-08-06 | 2024-04-23 | On Time Staffing Inc. | Monitoring third-party forum contributions to improve searching through time-to-live data assignments |
US11423071B1 (en) | 2021-08-31 | 2022-08-23 | On Time Staffing, Inc. | Candidate data ranking method using previously selected candidate data |
US11907652B2 (en) | 2022-06-02 | 2024-02-20 | On Time Staffing, Inc. | User interface and systems for document creation |
Also Published As
Publication number | Publication date |
---|---|
EP0977172A1 (en) | 2000-02-02 |
EP0977172A4 (en) | 2000-12-27 |
US6600874B1 (en) | 2003-07-29 |
JP3753384B2 (ja) | 2006-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO1998041978A1 (fr) | Method and device for detecting start and end points of a sound section in a video sequence | |
JP3454396B2 (ja) | Method for controlling detection of change points in moving images, playback stop control method based thereon, and moving image editing system using them | |
US7260306B2 (en) | Editing method for recorded information | |
US9269399B2 (en) | Capture, syncing and playback of audio data and image data | |
US5946445A (en) | Media recorder for capture and playback of live and prerecorded audio and/or video information | |
US20030040917A1 (en) | Device and method for selective recall and preservation of events prior to decision to record the events | |
US20070113182A1 (en) | Replay of media stream from a prior change location | |
KR100903160B1 (ko) | Signal processing device and method | |
JPWO2007029479A1 (ja) | Recording/playback device, recording/playback method, recording/playback program, and computer-readable recording medium | |
CA2477697A1 (en) | Methods and apparatus for use in sound replacement with automatic synchronization to images | |
WO2006134883A1 (ja) | Content tagging support device and content tagging support method | |
JP5444611B2 (ja) | Signal processing device, signal processing method, and program | |
US20030112260A1 (en) | Information retrieval system and information processing system | |
JPH10191248A (ja) | Video editing method and recording medium on which the procedure of the method is recorded | |
US20070055979A1 (en) | Method for recording of data stream on multiple recording media | |
US20080031108A1 (en) | Digital Dubbing Device | |
JP3138168B2 (ja) | Magnetic recording/playback device with speech-rate conversion function | |
JP3133698B2 (ja) | Recording/playback device for television broadcast signals | |
US20040062526A1 (en) | VCR manipulation of broadcast digital content | |
JP2008262647A (ja) | Program playback device | |
JP4836198B2 (ja) | Method for determining CM (commercial) broadcasts in television broadcasting, broadcast program playback method, and broadcast program recording/playback device | |
JP2001008157A (ja) | Commercial recording device | |
JP2006173715A (ja) | Automatic program selection device | |
CN115023760A (zh) | Control signal generation circuit, receiving device, system, generation method, and nonvolatile storage medium | |
JPH118835A (ja) | Recording/playback device for television broadcasts |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): CN JP KR US |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 09341471 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1997907389 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 1997907389 Country of ref document: EP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 1997907389 Country of ref document: EP |