C/C++/Obj-C Real-time algorithm to ascertain Note (not Pitch) from Vocal Input

伪装坚强ぢ 2020-12-13 01:32

I want to detect not the pitch, but the pitch class of a sung note.

So, whether it is C4 or C5 is not important: they must both be detected as C.

9 answers
  • 2020-12-13 01:45

    Most of the frequency detection algorithms cited in other answers don't work well for voice. To see why intuitively, consider that every vowel in a language can be sung at one particular note. Even though those vowels have very different frequency content, they would all have to be detected as the same note. Any note detection algorithm for voices must take this into account somehow. Furthermore, human speech and song contain many fricatives, many of which have no inherent pitch.

    In the generic (non-voice) case, the feature you are looking for is called the chroma feature, and there is a fairly large body of work on the subject. It is equivalently known as the harmonic pitch class profile. The original reference paper on the concept is Takuya Fujishima's "Real-Time Chord Recognition of Musical Sound: A System Using Common Lisp Music". The Wikipedia entry has an overview of a more modern variant of the algorithm. There are a bunch of free papers and MATLAB implementations of chroma feature detection.

    However, since you are focusing on the human voice only, and since the human voice naturally contains tons of overtones, what you are practically looking for in this specific scenario is a fundamental frequency detection algorithm, or f0 detection algorithm. There are several such algorithms explicitly tuned for voice. Also, here is a widely cited algorithm that works on multiple voices at once. You'd then check the detected frequency against the equal-tempered scale and then find the closest match.

    Since I suspect that you're trying to build a pitch detector and/or corrector a la Autotune, you may want to use M. Morise's excellent WORLD implementation, which permits fast and good quality detection and modification of f0 on voice streams.

    Lastly, be aware that there are only a few vocal pitch detectors that work well within the vocal fry register. Almost all of them, including WORLD, fail on vocal fry as well as very low voices. A number of papers refer to vocal fry as "creaky voice" and have developed specific algorithms to help with that type of voice input specifically.

  • 2020-12-13 01:45

    Perform a Discrete Fourier Transform on samples from your input waveform, then sum the values that correspond to the same note in different octaves. Take the note with the largest sum as the dominant pitch class.

    You can likely find some existing DFT code in Objective-C that suits your needs.

  • 2020-12-13 01:46

    If you're a beginner, this may be very helpful. It is available for both Java and iOS.

    dywapitchtrack for iOS

    dywapitchtrack for Java

  • 2020-12-13 01:48

    As others have mentioned you should use a pitch detection algorithm. Since that ground is well-covered I will address a few particulars of your question. You said that you are looking for the pitch class of the note. However, the way to find this is to calculate the frequency of the note and then use a table to convert it to the pitch class, octave, and cents. I don't know of any way to obtain the pitch class without finding the fundamental frequency.

    You will need a real-time pitch detection algorithm. In evaluating algorithms pay attention to the latency implied by each algorithm, compared with the accuracy you desire. Although some algorithms are better than others, fundamentally you must trade one for the other and cannot know both with certainty -- sort of like the Heisenberg uncertainty principle. (How can you know the note is C4 when only a fraction of a cycle has been heard?)

    Your "smoothing" approach is equivalent to a digital filter, which will alter the frequency characteristics of the voice. In short, it may interfere with your attempts to estimate the pitch. If you have an interest in digital audio, digital filters are fundamental and useful tools in that field, and a fascinating subject besides. It helps to have a strong math background in understanding them, but you don't necessarily need that to get the basic idea.

    Also, your zero crossing method is a basic technique to estimate the period of a waveform and thus the pitch. It can be done this way, but only with a lot of heuristics and fine-tuning. (Essentially, develop a number of "candidate" pitches and try to infer the dominant one. A lot of special cases will emerge that will confuse this. A quick one is the letter 's'.) You'll find it much easier to begin with a frequency domain pitch detection algorithm.
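For reference, the bare zero-crossing idea without any of those heuristics could be sketched like this. It counts rising crossings only and interpolates their positions linearly; the function name is hypothetical, and it is shown to illustrate the baseline, not as a recommendation for voice.

```c
#include <stddef.h>

/* Estimate frequency from rising zero crossings.
 * Returns 0 if fewer than two crossings are found. */
double zero_crossing_hz(const float *x, size_t n, double fs)
{
    double first = 0.0, last = 0.0;
    size_t count = 0;
    for (size_t i = 1; i < n; i++) {
        if (x[i - 1] < 0.0f && x[i] >= 0.0f) {
            /* Linear interpolation of where the signal crosses zero. */
            double a = (double)x[i - 1], b = (double)x[i];
            double t = (double)(i - 1) + a / (a - b);
            if (count == 0) first = t;
            last = t;
            count++;
        }
    }
    if (count < 2) return 0.0;
    /* (count - 1) full periods lie between the first and last crossing. */
    return fs * (double)(count - 1) / (last - first);
}
```

On a clean periodic signal this works well, but a strong overtone or a noisy consonant adds extra crossings per cycle and the estimate jumps, which is where the heuristics come in.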

  • 2020-12-13 01:51

    Putting up information as I find it...

    Pitch detection algorithm on Wikipedia is a good place to start. It lists a few methods that fail for determining octave, which is okay for my purpose.

    A good explanation of autocorrelation can be found here (why can't Wikipedia put things simply like that??).
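A bare-bones sketch of autocorrelation pitch detection, for anyone who wants to try it: the signal is multiplied by a delayed copy of itself, and the delay (lag) giving the largest sum is taken as the period. The function name and the search-range parameters are assumptions, and picking the plain maximum is the simplest possible rule, not a tuned one.

```c
#include <stddef.h>

/* Estimate f0 as fs / lag for the lag with the largest autocorrelation
 * inside [fs/max_hz, fs/min_hz]. Assumes 0 < min_hz < max_hz.
 * Returns 0 if the buffer is too short or no positive peak exists. */
double autocorr_f0(const float *x, size_t n, double fs,
                   double min_hz, double max_hz)
{
    size_t min_lag = (size_t)(fs / max_hz);
    size_t max_lag = (size_t)(fs / min_hz);
    if (min_lag < 1) min_lag = 1;
    if (max_lag >= n) return 0.0;

    size_t best_lag = 0;
    double best = 0.0;
    for (size_t lag = min_lag; lag <= max_lag; lag++) {
        double sum = 0.0;
        for (size_t i = 0; i + lag < n; i++)
            sum += (double)x[i] * (double)x[i + lag];
        if (sum > best) {            /* keep the strongest lag seen so far */
            best = sum;
            best_lag = lag;
        }
    }
    return best_lag ? fs / (double)best_lag : 0.0;
}
```

If this picks a lag of two periods instead of one, the result is an octave too low, which does not change the pitch class.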

  • 2020-12-13 01:53

    If you are looking for the pitch class you should have a look at the chromagram (http://labrosa.ee.columbia.edu/matlab/chroma-ansyn/)

    You can also simply detect the f0 (using something like the YIN algorithm) and return the appropriate semitone. Most fundamental frequency estimation algorithms suffer from octave errors, but an octave error does not change the pitch class.
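A simplified sketch of the main YIN steps (difference function, cumulative mean normalisation, absolute threshold), for illustration only. It leaves out the parabolic interpolation and local refinement of the published algorithm, and the function name, parameters and the suggested threshold of about 0.1 are assumptions of this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified YIN-style f0 estimate. Needs n > fs/min_hz + 1 samples.
 * Returns 0 when no lag falls below `threshold` (treated as unvoiced). */
double yin_f0(const float *x, size_t n, double fs,
              double min_hz, double max_hz, double threshold)
{
    size_t tau_min = (size_t)(fs / max_hz);
    size_t tau_max = (size_t)(fs / min_hz);
    if (tau_min < 1) tau_min = 1;
    if (tau_max + 1 >= n) return 0.0;
    size_t window = n - tau_max;         /* samples compared per lag */

    double *d = malloc((tau_max + 1) * sizeof *d);
    if (!d) return 0.0;

    /* Difference function, normalised by its running mean as we go. */
    double running = 0.0;
    d[0] = 1.0;
    for (size_t tau = 1; tau <= tau_max; tau++) {
        double sum = 0.0;
        for (size_t j = 0; j < window; j++) {
            double diff = (double)x[j] - (double)x[j + tau];
            sum += diff * diff;
        }
        running += sum;
        d[tau] = running > 0.0 ? sum * (double)tau / running : 1.0;
    }

    /* First lag under the threshold, then slide down to the local minimum. */
    double f0 = 0.0;
    for (size_t tau = tau_min; tau <= tau_max; tau++) {
        if (d[tau] < threshold) {
            while (tau + 1 <= tau_max && d[tau + 1] < d[tau]) tau++;
            f0 = fs / (double)tau;
            break;
        }
    }
    free(d);
    return f0;
}
```

The returned frequency can then be rounded to the nearest semitone and reduced modulo 12 to get the pitch class.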
