PerlMonks  

Re: Adjust synchronizations of video and subtitles automatically by temporal distribution

by afoken (Chancellor)
on Jan 31, 2023 at 08:48 UTC


in reply to Adjust synchronizations of video and subtitles automatically by temporal distribution

Is it possible to filter speech frequencies in a video with significant accuracy to identify the passages where people talk?

Telephony started with very bad microphones, transmitting barely anything outside the range of 300 Hz to 3 kHz, but that was "good enough". Technical development improved the microphones, but analog telephony was and still is intentionally limited to roughly that frequency range (nominally 300 Hz to 3.4 kHz). Even after the switch to ISDN, the sampling rate was only 8 kHz, which caps audio below the 4 kHz Nyquist limit and in practice at about 3.4 kHz. Things changed only after the migration to SIP, with "HD" audio codecs that allow higher frequencies at the cost of more bandwidth and/or computing power.

So I would expect that a filter with that frequency range could be a usable indicator for speech.
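A minimal sketch of such an indicator, assuming the audio has already been decoded into short frames of float samples: it measures which fraction of a frame's spectral energy falls into the 300 Hz to 3 kHz band. The function names, the frame length and the naive DFT are illustrative assumptions, not part of any existing tool; a real implementation would use a proper FFT library.

```python
import cmath
import math

def band_energy_ratio(samples, sample_rate, low_hz=300.0, high_hz=3000.0):
    """Fraction of spectral energy (DC excluded) inside [low_hz, high_hz]."""
    n = len(samples)
    total = 0.0
    in_band = 0.0
    # Naive DFT over the positive-frequency bins; acceptable for short frames.
    for k in range(1, n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        power = abs(coeff) ** 2
        total += power
        if low_hz <= k * sample_rate / n <= high_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0

def tone(freq_hz, sample_rate=8000, n=200):
    """Hypothetical test signal: a pure sine wave of n samples."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

# A frame dominated by the speech band yields a ratio near 1,
# a low hum or a very high whistle yields a ratio near 0.
speech_like = band_energy_ratio(tone(1000), 8000)
hum = band_energy_ratio(tone(120), 8000)
```

A frame would then count as "possibly speech" when the ratio exceeds some threshold that has to be tuned on real material.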

Unfortunately, because the human ear is most sensitive in exactly this range, almost all audible warning signals also use that frequency range. So you will get some false positives. An FFT should let you identify the sharp peaks coming from all kinds of beepers, so that you can ignore them.
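One way to sketch the beeper check, under the assumption that a beeper puts nearly all of its energy into a single narrow spectral peak while speech spreads over many bins: compute the share of the frame's energy held by the strongest bin and its neighbours. The names and the 0.8 threshold are made up for illustration and would need tuning.

```python
import cmath
import math

def power_spectrum(samples):
    """Power of the positive-frequency DFT bins (DC excluded), naive DFT."""
    n = len(samples)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))) ** 2
            for k in range(1, n // 2 + 1)]

def peak_share(samples):
    """Share of total energy in the strongest bin plus its two neighbours."""
    spec = power_spectrum(samples)
    total = sum(spec)
    if total == 0:
        return 0.0
    k = max(range(len(spec)), key=spec.__getitem__)
    return sum(spec[max(0, k - 1):k + 2]) / total

def looks_like_beep(samples, threshold=0.8):
    """True when one narrow peak dominates the frame (assumed threshold)."""
    return peak_share(samples) >= threshold
```

Frames flagged by `looks_like_beep` could be excluded before the speech-band energy is evaluated. Musical notes held for a long time would trip the same check, which may or may not be desirable.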

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Replies are listed 'Best First'.
Re^2: Adjust synchronizations of video and subtitles automatically by temporal distribution
by cavac (Parson) on Feb 02, 2023 at 14:39 UTC

    I suspect you will also get some false positives with big budget movie musical scoring. For example, some tracks of the "Titanic"(*) movie use instruments that are supposed to sound like voices. Unless you want a lot of subtitles saying "aaaaaahhhh", more advanced filtering or access to a soundtrack without the music would be required.

    (*) "Take Her to Sea, Mr. Murdoch" by James Horner

    PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP
Re^2: Adjust synchronizations of video and subtitles automatically by temporal distribution
by LanX (Saint) on Feb 01, 2023 at 01:25 UTC
    Thanks.

    Let's simplify this to a decision problem to have a start.

    Let's suppose we have n SRT-files with different time-stamps, and one is a perfect match to a given soundtrack.

    Now we want to rank which ones fit best. (That's actually a real life scenario)

    With SRT-files I can easily identify sequences of non-speech gaps, like the 1.1 secs between 00:05:15,300 and 00:05:16,400 here:

    1
    00:05:00,400 --> 00:05:15,300
    This is an example of a subtitle.

    2
    00:05:16,400 --> 00:05:25,300
    This is an example of a subtitle - 2nd subtitle.
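Extracting those gaps can be sketched as follows. This is a minimal parser that only looks at the timing lines and assumes well-formed timestamps; the function names are illustrative.

```python
import re

# Matches an SRT timing line such as "00:05:00,400 --> 00:05:15,300".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)

def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def cue_intervals(srt_text):
    """Return sorted (start, end) pairs in seconds for every timing line."""
    cues = []
    for match in TIMING.finditer(srt_text):
        g = match.groups()
        cues.append((to_seconds(*g[:4]), to_seconds(*g[4:])))
    return sorted(cues)

def gaps(cues, min_gap=0.0):
    """Return (start, end) of pauses between cues that exceed min_gap seconds."""
    if not cues:
        return []
    result = []
    last_end = cues[0][1]
    for start, end in cues[1:]:
        if start - last_end > min_gap:
            result.append((last_end, start))
        last_end = max(last_end, end)  # tolerate overlapping cues
    return result

example = """1
00:05:00,400 --> 00:05:15,300
This is an example of a subtitle.

2
00:05:16,400 --> 00:05:25,300
This is an example of a subtitle - 2nd subtitle.
"""

pauses = gaps(cue_intervals(example))  # one pause, from 315.3 s to 316.4 s
```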

    I could check how the gaps of those n SRTs overlap with "silent" passages in the soundtrack (e.g. an XOR metric) and rank the SRTs by proximity.
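One possible reading of the XOR metric, as a sketch: treat the subtitle gaps and the silent passages as two sets of time intervals and measure the total time during which exactly one of them is "on". A smaller value means a better fit. The function names and the input layout (sorted, non-overlapping `(start, end)` pairs in seconds) are assumptions.

```python
def total_length(intervals):
    return sum(end - start for start, end in intervals)

def overlap_length(a, b):
    """Total overlap of two sorted lists of non-overlapping intervals."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if end > start:
            total += end - start
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total

def xor_distance(a, b):
    """Seconds covered by exactly one of the two interval sets."""
    return total_length(a) + total_length(b) - 2 * overlap_length(a, b)

def rank_candidates(candidates, silences):
    """candidates maps a name to its gap intervals; best match comes first."""
    return sorted(candidates,
                  key=lambda name: xor_distance(candidates[name], silences))
```

With silences at `[(10.0, 12.0), (20.0, 21.0)]`, a candidate whose gaps are shifted by half a second gets a distance of 2.0 seconds, while an exact match gets 0.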

    Question: how can I technically get the timestamps of silent passages of a soundtrack?

    Let's define silent as falling under a certain volume threshold after filtering frequencies.
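Under that definition, the timestamps could be obtained roughly like this. The sketch assumes the soundtrack has already been decoded and band-filtered into a list of float samples in the range -1 to 1; the function name, the frame length, the threshold and the minimum duration are all placeholder values to tune.

```python
import math

def silent_passages(samples, sample_rate, threshold=0.02,
                    frame_seconds=0.02, min_seconds=0.5):
    """Return (start, end) in seconds of runs whose frame RMS stays below threshold."""
    frame_len = max(1, round(sample_rate * frame_seconds))
    n_frames = len(samples) // frame_len
    passages = []
    run_start = None
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        t = f * frame_len / sample_rate
        if rms < threshold:
            if run_start is None:
                run_start = t  # a quiet run begins here
        elif run_start is not None:
            if t - run_start >= min_seconds:
                passages.append((run_start, t))
            run_start = None
    # Close a quiet run that lasts until the end of the data.
    end_t = n_frames * frame_len / sample_rate
    if run_start is not None and end_t - run_start >= min_seconds:
        passages.append((run_start, end_t))
    return passages
```

The resulting list has the same shape as the gap list derived from an SRT file, so the two can be compared directly.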

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery
