Question
I am developing a new UWP app which should monitor sound and fire an event for each sudden loud sound (something like a gunshot or a clap).
- It needs to enable the default audio input and monitor live audio.
- It should have a configurable audio sensitivity so it can tell environment noise apart from a clap/gunshot.
- When there is a high-frequency sound like a clap or gunshot (ideally matching a configured frequency within a tolerance, say +/-40), it should raise an event.
There is no need to save the audio. I tried to implement this:
SoundMonitoringPage:
public sealed partial class SoundMonitoringPage : Page
{
    private AudioStateMonitor gameChatAudioStateMonitor;

    private async void Page_Loaded(object sender, RoutedEventArgs e)
    {
        string deviceId = Windows.Media.Devices.MediaDevice.GetDefaultAudioCaptureId(Windows.Media.Devices.AudioDeviceRole.Communications);
        gameChatAudioStateMonitor = AudioStateMonitor.CreateForCaptureMonitoringWithCategoryAndDeviceId(MediaCategory.GameChat, deviceId);
        gameChatAudioStateMonitor.SoundLevelChanged += GameChatSoundLevelChanged;
        // other logic
    }
}
Sound Level Change:
private void GameChatSoundLevelChanged(AudioStateMonitor sender, object args)
{
    switch (sender.SoundLevel)
    {
        case SoundLevel.Full:
            LevelChangeEvent();
            break;
        case SoundLevel.Muted:
            LevelChangeEvent();
            break;
        case SoundLevel.Low:
            // Audio capture should never be "ducked", only muted or full volume.
            Debug.WriteLine("Unexpected audio state change.");
            break;
    }
}
Environment: Windows 10 (v1809). IDE: Visual Studio 2017.
I am not sure if this is the right approach. It does not enable audio, and the level-changed event is never hit.
I see other options in WinForms with NAudio tutorials; perhaps I could detect events from the sampled frequency, but there aren't many tutorials on using NAudio with UWP to plot a graph and identify the frequency.
Update:
Following the suggestion from @Rob Caplan - MSFT, here is what I ended up with.
IMemoryBufferByteAccess.cs
// We are declaring a COM interface for use within this namespace.
// This interface allows access to memory at the byte level, which we need
// in order to populate the audio data that is generated.
[ComImport]
[Guid("5B0D3235-4DBA-4D44-865E-8F1D0E4FD04D")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
unsafe interface IMemoryBufferByteAccess
{
    void GetBuffer(out byte* buffer, out uint capacity);
}
GunFireMonitorPage.xaml.cs
public sealed partial class GunFireMonitorPage : Page
{
    private MainPage _rootPage;
    public static GunFireMonitorPage Current;

    private AudioGraph _graph;
    private AudioDeviceOutputNode _deviceOutputNode;
    private AudioFrameInputNode _frameInputNode;

    public double Theta;

    public GunFireMonitorPage()
    {
        InitializeComponent();
        Current = this;
    }

    protected override async void OnNavigatedTo(NavigationEventArgs e)
    {
        _rootPage = MainPage.Current;
        await CreateAudioGraph();
    }

    protected override void OnNavigatedFrom(NavigationEventArgs e)
    {
        _graph?.Dispose();
    }

    private void Page_Loaded(object sender, RoutedEventArgs e)
    {
    }
    private unsafe AudioFrame GenerateAudioData(uint samples)
    {
        // Buffer size is (number of samples) * (size of each sample).
        // We generate single-channel (mono) audio; for multi-channel, multiply by the channel count.
        uint bufferSize = samples * sizeof(float);
        AudioFrame audioFrame = new AudioFrame(bufferSize);
        using (AudioBuffer buffer = audioFrame.LockBuffer(AudioBufferAccessMode.Write))
        using (IMemoryBufferReference reference = buffer.CreateReference())
        {
            // Get the buffer from the AudioFrame.
            // ReSharper disable once SuspiciousTypeConversion.Global
            // ReSharper disable once UnusedVariable
            ((IMemoryBufferByteAccess)reference).GetBuffer(out var dataInBytes, out var capacityInBytes);

            // Cast to float since the data we are generating is float.
            var dataInFloat = (float*)dataInBytes;
            float freq = 1000; // choosing to generate a frequency of 1 kHz
            float amplitude = 0.3f;
            int sampleRate = (int)_graph.EncodingProperties.SampleRate;
            double sampleIncrement = (freq * (Math.PI * 2)) / sampleRate;

            // Generate a 1 kHz sine wave and populate the values in the memory buffer.
            for (int i = 0; i < samples; i++)
            {
                double sinValue = amplitude * Math.Sin(Theta);
                dataInFloat[i] = (float)sinValue;
                Theta += sampleIncrement;
            }
        }
        return audioFrame;
    }
    private void node_QuantumStarted(AudioFrameInputNode sender, FrameInputNodeQuantumStartedEventArgs args)
    {
        // GenerateAudioData can provide PCM audio data by directly synthesizing it or reading from a file.
        // We need to know how many samples are required; here the node runs at the same rate as the rest of the graph.
        // For minimum latency, only provide the required number of samples; extra samples introduce additional latency.
        uint numSamplesNeeded = (uint)args.RequiredSamples;
        if (numSamplesNeeded != 0)
        {
            AudioFrame audioData = GenerateAudioData(numSamplesNeeded);
            _frameInputNode.AddFrame(audioData);
        }
    }

    private void Button_Click(object sender, RoutedEventArgs e)
    {
        if (generateButton.Content != null && generateButton.Content.Equals("Generate Audio"))
        {
            _frameInputNode.Start();
            generateButton.Content = "Stop";
            audioPipe.Fill = new SolidColorBrush(Colors.Blue);
        }
        else if (generateButton.Content != null && generateButton.Content.Equals("Stop"))
        {
            _frameInputNode.Stop();
            generateButton.Content = "Generate Audio";
            audioPipe.Fill = new SolidColorBrush(Color.FromArgb(255, 49, 49, 49));
        }
    }
    private async Task CreateAudioGraph()
    {
        // Create an AudioGraph with default settings.
        AudioGraphSettings settings = new AudioGraphSettings(AudioRenderCategory.Media);
        CreateAudioGraphResult result = await AudioGraph.CreateAsync(settings);
        if (result.Status != AudioGraphCreationStatus.Success)
        {
            // Cannot create graph.
            _rootPage.NotifyUser($"AudioGraph Creation Error because {result.Status}", NotifyType.ErrorMessage);
            return;
        }
        _graph = result.Graph;

        // Create a device output node.
        CreateAudioDeviceOutputNodeResult deviceOutputNodeResult = await _graph.CreateDeviceOutputNodeAsync();
        if (deviceOutputNodeResult.Status != AudioDeviceNodeCreationStatus.Success)
        {
            // Cannot create device output node; bail out instead of dereferencing a null node.
            _rootPage.NotifyUser(
                $"Audio Device Output unavailable because {deviceOutputNodeResult.Status}", NotifyType.ErrorMessage);
            speakerContainer.Background = new SolidColorBrush(Colors.Red);
            return;
        }
        _deviceOutputNode = deviceOutputNodeResult.DeviceOutputNode;
        _rootPage.NotifyUser("Device Output Node successfully created", NotifyType.StatusMessage);
        speakerContainer.Background = new SolidColorBrush(Colors.Green);

        // Create the FrameInputNode with the same format as the graph, except explicitly mono.
        AudioEncodingProperties nodeEncodingProperties = _graph.EncodingProperties;
        nodeEncodingProperties.ChannelCount = 1;
        _frameInputNode = _graph.CreateFrameInputNode(nodeEncodingProperties);
        _frameInputNode.AddOutgoingConnection(_deviceOutputNode);
        frameContainer.Background = new SolidColorBrush(Colors.Green);

        // Initialize the frame input node in the stopped state.
        _frameInputNode.Stop();

        // Hook up an event handler so we can start generating samples when needed.
        // This event is triggered when the node is required to provide data.
        _frameInputNode.QuantumStarted += node_QuantumStarted;

        // Start the graph, since we will only start/stop the frame input node.
        _graph.Start();
    }
}
GunFireMonitorPage.xaml
<Page
    x:Class="SmartPileInspector.xLite.GunFireMonitorPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d" Loaded="Page_Loaded"
    HorizontalAlignment="Center"
    Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
    <ScrollViewer HorizontalAlignment="Center">
        <StackPanel HorizontalAlignment="Center">
            <!-- more page content -->
            <Grid HorizontalAlignment="Center">
                <Grid.ColumnDefinitions>
                    <ColumnDefinition Width="*"/>
                    <ColumnDefinition Width="*"/>
                </Grid.ColumnDefinitions>
                <Grid.RowDefinitions>
                    <RowDefinition Height="55"/>
                </Grid.RowDefinitions>
            </Grid>
            <AppBarButton x:Name="generateButton" Content="Generate Audio" Click="Button_Click" MinWidth="120" MinHeight="45" Margin="0,50,0,0"/>
            <Border x:Name="frameContainer" BorderThickness="0" Background="#4A4A4A" MinWidth="120" MinHeight="45" Margin="0,20,0,0">
                <TextBlock x:Name="frame" Text="Frame Input" VerticalAlignment="Center" HorizontalAlignment="Center" />
            </Border>
            <StackPanel>
                <Rectangle x:Name="audioPipe" Margin="0,20,0,0" Height="10" MinWidth="160" Fill="#313131" HorizontalAlignment="Stretch"/>
            </StackPanel>
            <Border x:Name="speakerContainer" BorderThickness="0" Background="#4A4A4A" MinWidth="120" MinHeight="45" Margin="0,20,0,0">
                <TextBlock x:Name="speaker" Text="Output Device" VerticalAlignment="Center" HorizontalAlignment="Center" />
            </Border>
        </StackPanel>
    </ScrollViewer>
</Page>
No graph is generated, and there is a continuous beep sound with the blue line. Any help is greatly appreciated.
Update: Implemented AudioVisualizer
With the help of AudioVisualizer, I was able to plot the live audio graph.
AudioGraph _graph;
AudioDeviceInputNode _inputNode;
PlaybackSource _source;
SourceConverter _converter;

protected override void OnNavigatedTo(NavigationEventArgs e)
{
    _rootPage = MainPage.Current;
    _rootPage.SetDimensions(700, 600);
    base.OnNavigatedTo(e);
    CreateAudioGraphAsync();
}

protected override void OnNavigatedFrom(NavigationEventArgs e)
{
    base.OnNavigatedFrom(e);
    _graph?.Stop();
    _graph?.Dispose();
    _graph = null;
}

async void CreateAudioGraphAsync()
{
    var graphResult = await AudioGraph.CreateAsync(new AudioGraphSettings(Windows.Media.Render.AudioRenderCategory.Media));
    if (graphResult.Status != AudioGraphCreationStatus.Success)
        throw new InvalidOperationException($"Graph creation failed {graphResult.Status}");
    _graph = graphResult.Graph;

    var inputNodeResult = await _graph.CreateDeviceInputNodeAsync(MediaCategory.Media);
    if (inputNodeResult.Status == AudioDeviceNodeCreationStatus.Success)
    {
        _inputNode = inputNodeResult.DeviceInputNode;
        _source = PlaybackSource.CreateFromAudioNode(_inputNode);
        _converter = new SourceConverter
        {
            Source = _source.Source,
            MinFrequency = 110.0f,                            // note A2
            MaxFrequency = 3520.0f,                           // note A7
            FrequencyCount = 12 * 5 * 5,                      // 5 octaves, 5 bars per note
            FrequencyScale = ScaleType.Linear,
            SpectrumRiseTime = TimeSpan.FromMilliseconds(20),
            SpectrumFallTime = TimeSpan.FromMilliseconds(200),
            RmsRiseTime = TimeSpan.FromMilliseconds(20),      // use RMS to gate noise:
            RmsFallTime = TimeSpan.FromMilliseconds(500),     // fast rise, slow fall
            ChannelCount = 1
        };
        NotesSpectrum.Source = _converter;
        _graph.Start();
    }
    else
    {
        _rootPage.NotifyUser("Cannot access microphone", NotifyType.ErrorMessage);
    }
}
Now the challenge is: how do I wire up an event that fires when the wave frequency/amplitude is above a threshold? In that event I would like to count the number of shots, their timestamps, and their intensity.
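For reference, here is a minimal sketch of the kind of event wiring I have in mind. Everything in it (the ShotDetector class, the ShotDetected event, the -20 dBFS threshold) is hypothetical and not part of AudioVisualizer; it only assumes frames arrive as normalized float samples:
using System;

// Hypothetical detector: raises ShotDetected when a frame's average
// amplitude (converted to dBFS) crosses a configurable threshold.
public class ShotDetector
{
    public event EventHandler<(DateTime Timestamp, double Decibels)> ShotDetected;

    public int ShotCount { get; private set; }
    public double ThresholdDb { get; set; } = -20.0; // must be tuned per environment

    private bool _aboveThreshold; // prevents re-firing while one blast decays

    public void ProcessFrame(float[] samples)
    {
        double sum = 0;
        foreach (var s in samples) sum += Math.Abs(s);
        double db = 20 * Math.Log10(sum / samples.Length + 1e-12); // epsilon avoids log(0)

        if (db > ThresholdDb && !_aboveThreshold)
        {
            _aboveThreshold = true;
            ShotCount++;
            ShotDetected?.Invoke(this, (DateTime.Now, db));
        }
        else if (db < ThresholdDb - 6) // hysteresis: wait for the sound to decay
        {
            _aboveThreshold = false;
        }
    }
}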
Example Sound
Here is my recording of the live sound. As you can hear, whenever there is a big hammer strike (every second or so), I would like to raise an event.
Answer 1:
You can find the decibel level of a frame by averaging the amplitude of all the PCM data in that frame. I believe you want to create a graph that handles the input, which looks like this:
private static event EventHandler<double> LoudNoise;
private static int quantum = 0;
static AudioGraph ingraph;
private static AudioDeviceInputNode deviceInputNode;
private static AudioFrameOutputNode frameOutputNode;
public static async Task<bool> CreateInputDeviceNode(string deviceId)
{
    Console.WriteLine("Creating AudioGraphs");

    // Create an AudioGraph with default settings.
    AudioGraphSettings graphsettings = new AudioGraphSettings(AudioRenderCategory.Media);
    graphsettings.EncodingProperties = new AudioEncodingProperties();
    graphsettings.EncodingProperties.Subtype = "Float";
    graphsettings.EncodingProperties.SampleRate = 48000;
    graphsettings.EncodingProperties.ChannelCount = 2;
    graphsettings.EncodingProperties.BitsPerSample = 32;
    graphsettings.EncodingProperties.Bitrate = 3072000;
    //settings.DesiredSamplesPerQuantum = 960;
    //settings.QuantumSizeSelectionMode = QuantumSizeSelectionMode.ClosestToDesired;
    CreateAudioGraphResult graphresult = await AudioGraph.CreateAsync(graphsettings);
    if (graphresult.Status != AudioGraphCreationStatus.Success)
    {
        // Cannot create graph.
        return false;
    }
    ingraph = graphresult.Graph;

    AudioGraphSettings nodesettings = new AudioGraphSettings(AudioRenderCategory.GameChat);
    nodesettings.EncodingProperties = AudioEncodingProperties.CreatePcm(48000, 2, 32);
    nodesettings.DesiredSamplesPerQuantum = 960;
    nodesettings.QuantumSizeSelectionMode = QuantumSizeSelectionMode.ClosestToDesired;
    frameOutputNode = ingraph.CreateFrameOutputNode(ingraph.EncodingProperties);
    quantum = 0;
    ingraph.QuantumStarted += Graph_QuantumStarted;

    DeviceInformation selectedDevice;
    string device = Windows.Media.Devices.MediaDevice.GetDefaultAudioCaptureId(Windows.Media.Devices.AudioDeviceRole.Default);
    if (!string.IsNullOrEmpty(device))
    {
        selectedDevice = await DeviceInformation.CreateFromIdAsync(device);
    }
    else
    {
        return false;
    }

    CreateAudioDeviceInputNodeResult result =
        await ingraph.CreateDeviceInputNodeAsync(MediaCategory.Media, nodesettings.EncodingProperties, selectedDevice);
    if (result.Status != AudioDeviceNodeCreationStatus.Success)
    {
        // Cannot create device input node.
        return false;
    }
    deviceInputNode = result.DeviceInputNode;
    deviceInputNode.AddOutgoingConnection(frameOutputNode);
    frameOutputNode.Start();
    ingraph.Start();
    return true;
}
private static unsafe void Graph_QuantumStarted(AudioGraph sender, object args)
{
    // Process every other quantum to halve the work.
    if (++quantum % 2 == 0)
    {
        AudioFrame frame = frameOutputNode.GetFrame();
        float[] dataInFloats;
        using (AudioBuffer buffer = frame.LockBuffer(AudioBufferAccessMode.Read))
        using (IMemoryBufferReference reference = buffer.CreateReference())
        {
            // Get the buffer from the AudioFrame and copy it into a managed array.
            ((IMemoryBufferByteAccess)reference).GetBuffer(out byte* dataInBytes, out uint capacityInBytes);
            float* dataInFloat = (float*)dataInBytes;
            dataInFloats = new float[capacityInBytes / sizeof(float)];
            for (int i = 0; i < capacityInBytes / sizeof(float); i++)
            {
                dataInFloats[i] = dataInFloat[i];
            }
        }

        // Average the absolute amplitude, then convert to decibels (dBFS).
        double decibels = 0f;
        foreach (var sample in dataInFloats)
        {
            decibels += Math.Abs(sample);
        }
        decibels = 20 * Math.Log10(decibels / dataInFloats.Length);

        // You can pass the decibel value wherever you'd like from here.
        // Note: for normalized float samples this value is <= 0 dBFS,
        // so the threshold must be negative; tune it for your environment.
        if (decibels > -30)
        {
            LoudNoise?.Invoke(null, decibels);
        }
    }
}
P.S. I made all of this static, but it will naturally work if it's all in the same instance.
I also copied this partially from my own project, so it may have some parts I forgot to trim. Hope it helps.
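For completeness, a minimal subscription sketch (assuming the EventHandler<double> declaration above; it must live in the same class since the event is private, and the handler body and deviceId argument are just placeholders):
// Hypothetical wiring, placed in some async method of the same class:
LoudNoise += (sender, db) =>
    Debug.WriteLine($"Loud noise at {DateTime.Now:HH:mm:ss.fff}: {db:F1} dBFS");
bool started = await CreateInputDeviceNode(deviceId);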
Answer 2:
Answering the "is this the right approach" question: no, AudioStateMonitor will not help with this problem.
AudioStateMonitor.SoundLevelChanged tells you if the system is ducking your sound so it doesn't interfere with something else. For example, it may mute music in favour of the telephone ringer. SoundLevelChanged doesn't tell you anything about the volume or frequency of recorded sound, which is what you'll need to detect your handclap.
The right approach will be along the lines of using an AudioGraph (or WASAPI, but not from C#) to capture the raw audio into an AudioFrameOutputNode, process the signal, and then run it through an FFT to detect sounds at your target frequencies and volumes. The AudioCreation sample demonstrates using an AudioGraph, but not AudioFrameOutputNode specifically.
Per https://home.howstuffworks.com/clapper1.htm, clapping will be in a frequency range of 2200 Hz to 2800 Hz.
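To illustrate the band-detection idea (a sketch, not code from the AudioCreation sample): a Goertzel filter estimates signal power at a single frequency more cheaply than a full FFT, so probing a few frequencies across that band gives a rough "clap energy" score. This assumes mono float samples at a known sample rate; the threshold is entirely hypothetical and must be tuned against ambient noise:
using System;

static class ClapBandDetector
{
    // Standard Goertzel recurrence: power of one frequency bin over the block.
    static double GoertzelPower(float[] samples, int sampleRate, double frequency)
    {
        double omega = 2.0 * Math.PI * frequency / sampleRate;
        double coeff = 2.0 * Math.Cos(omega);
        double sPrev = 0, sPrev2 = 0;
        foreach (var x in samples)
        {
            double s = x + coeff * sPrev - sPrev2;
            sPrev2 = sPrev;
            sPrev = s;
        }
        return sPrev * sPrev + sPrev2 * sPrev2 - coeff * sPrev * sPrev2;
    }

    // Sum the power at a few probe frequencies across the 2200-2800 Hz clap band.
    public static bool LooksLikeClap(float[] samples, int sampleRate, double threshold)
    {
        double bandEnergy = 0;
        for (double f = 2200; f <= 2800; f += 150)
            bandEnergy += GoertzelPower(samples, sampleRate, f);
        return bandEnergy > threshold;
    }
}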
Recognizing gunshots looks significantly more complicated, with different guns having very different signatures. A quick search found several research papers on this rather than trivial algorithms, so I suspect you'll want some sort of machine learning to classify them. Here's a previous thread discussing using ML to distinguish gunshots from non-gunshots: SVM for one Vs all acoustic signal classification
Source: https://stackoverflow.com/questions/54019461/using-uwp-monitor-live-audio-and-detect-gun-fire-clap-sound