GNU social JP
GNU social JP is a Japanese GNU social server.

Conversation

Notices

  1. Yukari Hafner :v_lesbian: (shinmera@mastodon.tymoon.eu)'s status on Saturday, 04-Jan-2025 08:30:34 JST

    I'm off to bed now, but in case anyone has thoughts about this I'd be all ears:

    Any ideas on how to do very simple human voice recognition? I just want to detect whether an audio stream is likely to be a voice or not, to improve the accuracy over a simple volume based approach that most chat things use.

    The best I've come up with is checking the largest frequency bin and whether it lies in a normal vocal range (100-8k Hz), but that seems like it'd also have lots of false positives.
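The peak-bin check described above can be sketched as follows. This is only an illustration using a naive DFT from the Python standard library; the function names, the 16 kHz sample rate, and the 256-sample frame are assumptions and do not come from the thread.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the first N/2 DFT bins of a real frame (naive O(N^2))."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2)
    ]

def peak_in_vocal_range(frame, sample_rate, lo_hz=100.0, hi_hz=8000.0):
    """True if the strongest non-DC bin lies inside [lo_hz, hi_hz]."""
    mags = dft_magnitudes(frame)
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    peak_hz = peak_bin * sample_rate / len(frame)
    return lo_hz <= peak_hz <= hi_hz

def tone(freq_hz, sample_rate, n):
    """Hypothetical test input: a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

rate = 16000  # assumed sample rate
print(peak_in_vocal_range(tone(220.0, rate, 256), rate))   # voice-like pitch
print(peak_in_vocal_range(tone(50.0, rate, 256), rate))    # low hum below the range
print(peak_in_vocal_range(tone(1000.0, rate, 256), rate))  # a plain beep
```

The beep passes the check just as the voice-like tone does, which is the false-positive problem the post anticipates: the 100 Hz to 8 kHz band covers most everyday sounds.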

    • screwlisp (screwtape@mastodon.sdf.org)'s status on Saturday, 04-Jan-2025 08:30:33 JST

      @shinmera A matched filter approach, based on fragments of the person you expect to hear talking?

    • screwlisp (screwtape@mastodon.sdf.org)'s status on Saturday, 04-Jan-2025 08:32:17 JST

      @shinmera https://en.wikipedia.org/wiki/Matched_filter


      Attachments

      1. Matched filter
        In signal processing, the output of the matched filter is given by correlating a known delayed signal, or template, with an unknown signal to detect the presence of the template in the unknown signal. This is equivalent to convolving the unknown signal with a conjugated time-reversed version of the template. The matched filter is the optimal linear filter for maximizing the signal-to-noise ratio (SNR) in the presence of additive stochastic noise. Matched filters are commonly used in radar, in which a known signal is sent out, and the reflected signal is examined for common elements of the out-going signal. Pulse compression is an example of matched filtering. It is so called because the impulse response is matched to input pulse signals. Two-dimensional matched filters are commonly used in image processing, e.g., to improve the SNR of X-ray observations. Additional applications of note are in seismology and gravitational-wave astronomy. Matched filtering is a demodulation technique with LTI (linear time invariant) filters to maximize SNR. It was originally also known as a North filter. Derivation...
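The correlation described in this excerpt can be sketched in a few lines. For real-valued samples, convolving with the time-reversed template is the same as sliding the template over the signal and taking a dot product at each lag. The template values, the noise level, and the offset below are made up for illustration.

```python
import random

def cross_correlate(signal, template):
    """Dot product of the template with the signal at every valid lag."""
    m = len(template)
    return [
        sum(signal[lag + j] * template[j] for j in range(m))
        for lag in range(len(signal) - m + 1)
    ]

rng = random.Random(0)                    # fixed seed so the run is repeatable
template = [1.0, -1.0, 1.0, 1.0, -1.0]    # hypothetical known pattern
true_offset = 7                           # where the pattern is hidden

# Low-level noise with the template added at the chosen offset.
signal = [rng.uniform(-0.1, 0.1) for _ in range(20)]
for j, value in enumerate(template):
    signal[true_offset + j] += value

scores = cross_correlate(signal, template)
best_lag = max(range(len(scores)), key=lambda lag: scores[lag])
print(best_lag)  # the lag with the strongest match
```

The correlation peaks at the lag where the template was inserted, because that is where the most energy lines up with the known pattern.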
    • screwlisp (screwtape@mastodon.sdf.org)'s status on Saturday, 04-Jan-2025 08:37:14 JST

      @shinmera I guess you could base it on a range of people instead of one person; it would work better on average and worse in any particular case. This is a normal receiver operating characteristic (ROC) scenario, isn't it? I think there will be a lot of implementations of this sitting around. (Produces none.)

    • Yukari Hafner :v_lesbian: (shinmera@mastodon.tymoon.eu)'s status on Saturday, 04-Jan-2025 08:37:15 JST
      in reply to screwlisp

      @screwtape Hmm, yeah, I thought about similar stuff, but I really don't want to train on a specific voice or anything. I guess convolving with an inverse average voice frequency response and then checking deviation could work?

    • screwlisp (screwtape@mastodon.sdf.org)'s status on Saturday, 04-Jan-2025 09:28:14 JST

      @shinmera Oh, I'm aware people have done what you're describing in particular, but I've never seen such a thing myself. Basically, you do a discrete convolution/correlation of two arrays: the test sample and the kernel. Bins in the result that exceed some sensitivity threshold, which you choose by trial and error, count as detections of the kernel in the test sample. You judge the quality by the True Positive Fraction and False Positive Fraction at your chosen sensitivity.
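Judging a threshold by its True Positive Fraction and False Positive Fraction might look like the sketch below. The scores and labels are invented example data, standing in for detector outputs on audio windows that were hand-labelled as voice or not voice.

```python
def roc_point(scores, labels, threshold):
    """Return (TPF, FPF) when scores strictly above the threshold count as detections."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s > threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s > threshold)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives

# Hypothetical detector scores and ground-truth labels (True = voice present).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1, 0.7, 0.05]
labels = [True, True, True, False, False, False, True, False]

for threshold in (0.1, 0.3, 0.5, 0.75):
    tpf, fpf = roc_point(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  TPF={tpf:.2f}  FPF={fpf:.2f}")
```

Sweeping the threshold traces out the receiver operating characteristic: raising it lowers the false positive fraction at the cost of missed detections, so the trial-and-error choice is about which trade-off is acceptable.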

    • Yukari Hafner :v_lesbian: (shinmera@mastodon.tymoon.eu)'s status on Saturday, 04-Jan-2025 09:28:15 JST
      in reply to screwlisp

      @screwtape I have no idea, and in general signal processing theory stuff is extremely incompatible with my brain, so....

      Anyway, I just want to do something a bit smarter than the usual voice chat thing of thresholding by volume, with the hopes I'll prevent it from being triggered by random noises.

      Since I want to use it to drive the avatar's mouth open/close, having minor noise be treated as signal is far worse than in a voice chat app situation.
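For comparison, the volume-threshold baseline mentioned above can be sketched as a per-frame RMS gate. The frame contents and the 0.1 threshold are assumptions chosen to show the failure mode, not measurements.

```python
import math

def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def volume_gate(frames, threshold):
    """One boolean per frame: True means 'mouth open' under a pure volume rule."""
    return [rms(frame) > threshold for frame in frames]

# Hypothetical 10 ms frames at an assumed 16 kHz sample rate.
silence = [0.0] * 160
speech_like = [0.3 * math.sin(2 * math.pi * 200 * i / 16000) for i in range(160)]
click = [0.0] * 160
for i in range(80, 90):
    click[i] = 0.9  # short, loud transient such as a key press

print(volume_gate([silence, speech_like, click], threshold=0.1))
```

The short click opens the gate just like the speech-like frame does, which is the behaviour that matters more when the gate drives an avatar's mouth than when it only gates a voice chat stream.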

    • screwlisp (screwtape@mastodon.sdf.org)'s status on Saturday, 04-Jan-2025 09:28:59 JST

      @shinmera I'll do a demo of a matched filter for the show on Wednesday, since I was building Fourier transform pipeline demos right now anyway.

    • screwlisp (screwtape@mastodon.sdf.org)'s status on Saturday, 04-Jan-2025 09:36:03 JST

      @shinmera I would run a bank of filters and max the results. That's my feel for "any of multiple things are happening".
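A bank of filters with the results maxed could be sketched like this. The templates are unit-energy sine bursts used as stand-ins for different voice fragments; their shapes and lengths are assumptions, and real templates would come from recorded speech.

```python
import math

def unit_norm(samples):
    """Scale samples to unit energy so templates are comparable."""
    energy = math.sqrt(sum(x * x for x in samples))
    return [x / energy for x in samples]

def sine_burst(cycles, length=32):
    """Hypothetical template: a short sine burst with the given number of cycles."""
    return unit_norm([math.sin(2 * math.pi * cycles * i / length) for i in range(length)])

def peak_correlation(signal, template):
    """Largest correlation of the template with the signal over all valid lags."""
    m = len(template)
    return max(
        sum(signal[lag + j] * template[j] for j in range(m))
        for lag in range(len(signal) - m + 1)
    )

def bank_score(signal, templates):
    """Run every matched filter and keep the strongest response."""
    return max(peak_correlation(signal, t) for t in templates)

bank = [sine_burst(c) for c in (2, 3, 5)]

# A signal containing the second template, and an empty signal for contrast.
signal = [0.0] * 64
for j, value in enumerate(bank[1]):
    signal[10 + j] = value
quiet = [0.0] * 64

print(bank_score(signal, bank), bank_score(quiet, bank))
```

Taking the maximum means a detection fires if any one template matches well, at the cost of one false-positive opportunity per filter in the bank.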

GNU social JP is a social network, courtesy of GNU social JP管理人. It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.