Acoustic Echo Cancellation (AEC) in embedded software

闹比i 2021-02-11 03:45

I am working on a VoIP project on an embedded device. I have built a sample using a 32-bit MCU with a low-grade audio codec. Now I have found that there is an echo issue on my device, that is I

2 answers
  •  攒了一身酷
    2021-02-11 04:37

    I had a heck of a time with echo cancellation. I wrote a softphone, and the user can switch their audio input and output devices around to suit their fancy. I tried the Speex echo cancellation library and several other open source libs I found online. None worked well for me. I tried different speaker/mike configurations, and the echo was always there in some form or fashion.

    I believe it would be very hard to create AEC code that would work for all possible speaker configurations, room sizes, background noises, etc. Finally I sat down and wrote my own echo cancellation module for my softphone with this algorithm.

    It's somewhat crude, but it has worked well and is reliable.

    variable1: Keep a running average of the amplitude of the incoming audio while the person you're talking to is speaking. (Don't factor in quiet time.)

    variable2: Keep a running average of the amplitude on the input (mike), but only when there is voice. Again, don't factor in quiet time.

    As soon as there's audio to play, cut the mike. Then, assuming the person listening is not talking, turn the mike back on 150-300 ms after the last audible audio frame comes in to be played.

    If the audio from the microphone (which you're dropping during playback) is greater than, oh, say (variable2 * 1.5), start sending the audio input frames for a specified duration, resetting that duration every time the input amplitude reaches (variable2 * 1.5).

    That way the person talking will know they are being interrupted, and stop to see what the person is saying. If the person talking doesn't have too noisy of a background, they will probably hear most if not all of the interruption.
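
    The muting rules above can be sketched as a small state machine. This is a minimal sketch, not the code from this answer: the names gate_t, gate_far_audible and gate_should_send, the 500 ms hold (INTERRUPT_MS) and the caller-supplied millisecond timestamps are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical constants; tune them for the device. */
#define TAIL_MS      300u  /* keep the mike cut this long after far-end audio */
#define INTERRUPT_MS 500u  /* keep sending this long after a loud mike frame */

typedef struct {
    uint32_t last_far_ms;        /* time of the last audible far-end frame */
    uint32_t last_interrupt_ms;  /* time of the last loud mike frame */
    int      far_seen;           /* nonzero once any far-end audio was heard */
    int      interrupting;       /* nonzero while the interruption hold is live */
} gate_t;

/* Call for every far-end frame that is loud enough to be heard. */
static void gate_far_audible(gate_t *g, uint32_t now_ms) {
    g->last_far_ms = now_ms;
    g->far_seen = 1;
}

/* Returns 1 if the mike frame should be sent, 0 if it should be muted.
   mic_level is the frame's average absolute amplitude; voice_avg plays
   the role of "variable2", the average mike level when there is voice. */
static int gate_should_send(gate_t *g, uint32_t now_ms,
                            uint32_t mic_level, uint32_t voice_avg) {
    int speaker_on = g->far_seen &&
                     (uint32_t)(now_ms - g->last_far_ms) < TAIL_MS;
    if (!speaker_on) {
        g->interrupting = 0;
        return 1;  /* nothing is being played: the mike is live */
    }
    /* 2*level > 3*avg is level > 1.5*avg without floating point. */
    if (voice_avg != 0 && 2u * mic_level > 3u * voice_avg) {
        g->interrupting = 1;
        g->last_interrupt_ms = now_ms;  /* start or restart the hold */
    }
    if (g->interrupting &&
        (uint32_t)(now_ms - g->last_interrupt_ms) < INTERRUPT_MS)
        return 1;  /* local talker is interrupting: let the audio through */
    g->interrupting = 0;
    return 0;      /* probably echo: mute this frame */
}
```

    Passing the time in as a parameter keeps the logic independent of any particular clock API, so it can be exercised on a desktop before it is moved to the device.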

    Like I said, not the most graceful, but it doesn't use a lot of resources (CPU, memory) and it actually works pretty darn well. I am very pleased with how mine sounds.

    To implement it, I just made a few functions.

    On each received audio frame, I call a function I named audioin:

    void audioin( AEC *ec, short *frame ) {
        unsigned int tas=0; /* Total sum of all audio in frame (absolute value) */
        int i=0;
        for (;i<160;i++)
            tas+=ABS(frame[i]);
        tas/=160; /* 320 byte frames muLaw */
        if (tas>300) { /* I assume this is audible */
            lockecho(ec);
            ec->lastaudibleframe=GetTickCount64();
            unlockecho(ec);
        }
        return;
    }
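
    GetTickCount64() is a Windows API, so on an MCU the timestamp stored here has to come from somewhere else, often a free-running 32-bit millisecond tick counter that wraps roughly every 49.7 days. A hedged sketch of a wrap-safe "was the last audible frame recent?" check follows; within_echo_window and the tick readings it takes are hypothetical, and the 300 ms window is only an example value.

```c
#include <stdint.h>

#define ECHO_WINDOW_MS 300u  /* example timeout after the last audible frame */

/* Returns 1 while `now` is less than ECHO_WINDOW_MS after `last_audible`.
   Both arguments are readings of a 32-bit millisecond tick counter
   (hypothetical; on many MCUs a timer interrupt increments it).
   Unsigned subtraction is modulo 2^32, so the elapsed time stays
   correct even if the counter wraps between the two readings. */
static int within_echo_window(uint32_t now, uint32_t last_audible) {
    return (uint32_t)(now - last_audible) < ECHO_WINDOW_MS;
}
```

    Comparing the difference, rather than comparing last_audible + timeout against now, is what keeps the check correct across a wrap of the counter.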
    

    and before sending a frame, I do:

    #define ECHO_THRESHOLD 300 /* Time (ms) to keep suppression alive after last audible frame */
    #define ONE_MINUTE 3000 /* 3000 20ms frames */
    #define AVG_PERIOD 250 /* 250 20ms frames */
    #define ABS(x) ((x)>0?(x):-(x)) /* Fully parenthesized; must be defined before audioin() uses it */
    
    
    char removeecho( AEC *ec, short *aecinput ) {
        int tas=0; /* Average absolute amplitude in this signal */
        int i=0;
        unsigned long long *tot=0;
        unsigned int *ctr=0;
        unsigned short *avg=0;
        char suppressframe=0;
        lockecho(ec);
        if (ec->lastaudibleframe+ECHO_THRESHOLD > GetTickCount64() ) {
            /* If we're still within the threshold for echo (speaker state is ON) */
            tot=&ec->t_aiws;
            ctr=&ec->c_aiws;
            avg=&ec->aiws;
        } else {
            /* If we're outside the threshold for echo (speaker state is OFF) */
            tot=&ec->t_aiwos;
            ctr=&ec->c_aiwos;
            avg=&ec->aiwos;
        }
        for (;i<160;i++) {
            tas+=ABS(aecinput[i]);
        }
        tas/=160;
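        if (tas>200) { /* Only voiced frames feed the average */
            (*tot)+=tas;
            (*ctr)++; /* Count this frame before dividing */
            (*avg)=(unsigned short)((*tot)/(*ctr));
            if ((*ctr)>AVG_PERIOD) {
                /* Restart the window, keeping the current average as its first sample */
                (*tot)=(*avg);
                (*ctr)=1;
            }
        }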
        if ( (avg==&ec->aiws) ) {
            tas-=ec->aiwos;
            if (tas<0) {
                tas=0;
            }
            if ( ((unsigned short) tas > (ec->aiws*1.5)) && ((unsigned short)tas>=ec->aiwos) && (ec->aiwos!=0) ) {
                suppressframe=0;
            } else {
                suppressframe=1;
            }
        }
        if (suppressframe) { /* Silence frame */
            memset(aecinput, 0, 160*sizeof(short)); /* 160 samples per frame */
        }
        unlockecho(ec);
        return suppressframe;
    }
    

    This silences the frame if it needs to. I keep all my variables, like the timers and amplitude averages, in the AEC struct, which I return from a call to initecho:

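    AEC *initecho( void ) {
        AEC *ec=(AEC *)malloc(sizeof(AEC));
        if (!ec)
            return 0; /* Out of memory; the caller must check for this */
        memset(ec, 0, sizeof(AEC));
        ec->aiws=200; /* Just a default guess as to what the average amplitude would be */
        return ec;
    }

    The AEC struct itself (declare it before the functions above) is:
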
    typedef struct aec {
        unsigned long long lastaudibleframe; /* Time stamp of last audible frame */
        unsigned short aiws; /* Average mike input when speaker is playing */
        unsigned short aiwos; /* Average mike input when speaker ISN'T playing */
        unsigned long long t_aiws, t_aiwos; /* Internal running totals (sum of PCM) */
        unsigned int c_aiws, c_aiwos; /* Internal counters for number of frames for averaging */
        unsigned long lockthreadid; /* Thread ID with lock */
        int stlc; /* Same-thread lock count */
    } AEC;
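
    On a small MCU, the 64-bit running totals and the periodic reset may be more than is needed. One possible alternative, which is a different technique from the windowed mean used above, is an integer exponential moving average. A minimal sketch, with ema_t, ema_update, VOICE_FLOOR and EMA_SHIFT as hypothetical names and values:

```c
#include <stdint.h>

#define VOICE_FLOOR 200u  /* frames at or below this level count as quiet time */
#define EMA_SHIFT   3     /* smoothing factor 1/8; an arbitrary starting point */

typedef struct {
    uint32_t acc;  /* average scaled by 2^EMA_SHIFT; 0 means "not seeded yet" */
} ema_t;

/* Feed one frame's average absolute amplitude; returns the current average.
   Quiet frames leave the average untouched. */
static uint16_t ema_update(ema_t *e, uint16_t level) {
    if (level > VOICE_FLOOR) {
        if (e->acc == 0)
            e->acc = (uint32_t)level << EMA_SHIFT;  /* seed with the first voiced frame */
        else
            e->acc = e->acc - (e->acc >> EMA_SHIFT) + level;
    }
    return (uint16_t)(e->acc >> EMA_SHIFT);
}
```

    This keeps one 32-bit word per average and needs no division, at the cost of weighting recent frames more heavily than a plain mean would.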
    

    You can adapt it as you need to and play with the idea, but like I said, it actually sounds pretty dang good. The only problem I have is when the user has a lot of background noise. But for me, if they pick up their USB handset or are using a headset, they can turn echo cancellation off and not worry about it. Through PC speakers with a mike, though, I'm pretty happy with it.

    I hope it helps, or gives you something to build on...
