How is audio represented with numbers in computers?

野的像风 2020-11-29 16:50

I like thinking about how everything can be and is represented by numbers. For example, plaintext is represented by a code like ASCII, and images are represented by RGB valu

10 answers
  • 2020-11-29 17:42

    Have you ever looked at a waveform close up? The Y value of each point (the amplitude) is simply stored as an integer, typically 16 bits wide.

  • 2020-11-29 17:49

    The simplest way to represent sound as numbers is PCM (Pulse Code Modulation). This means that the amplitude of the sound is recorded at fixed intervals, that is, at a set sampling frequency (each amplitude value is called a sample). CD-quality sound, for example, uses 16-bit samples (in stereo) at a sampling frequency of 44100 Hz.
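
To make the sampling idea concrete, here is a minimal sketch that samples a pure sine tone into 16-bit PCM values and works out the raw data rate of CD-quality audio. The function names (`sample_sine`, `bytes_per_second`) and the 440 Hz test tone are illustrative assumptions, not part of any standard API.

```python
import math

SAMPLE_RATE = 44100   # samples per second (CD quality)
BIT_DEPTH = 16        # bits per sample
CHANNELS = 2          # stereo

def sample_sine(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return 16-bit signed PCM samples of a full-scale sine wave (hypothetical helper)."""
    n = round(duration_s * sample_rate)
    peak = 2 ** (BIT_DEPTH - 1) - 1   # 32767, the largest positive 16-bit value
    return [round(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate))
            for i in range(n)]

def bytes_per_second(sample_rate=SAMPLE_RATE, bit_depth=BIT_DEPTH, channels=CHANNELS):
    """Raw (uncompressed) PCM data rate in bytes per second."""
    return sample_rate * (bit_depth // 8) * channels

samples = sample_sine(440.0, 0.01)   # 10 ms of a 440 Hz tone
print(len(samples))                  # 441
print(bytes_per_second())            # 176400
```

So one second of uncompressed CD audio is 176400 bytes, which is simply 44100 samples times 2 bytes times 2 channels.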

    A sample can be represented as an integer (usually 8, 12, 16, 24 or 32 bits) or a floating point number (a 32-bit float or a 64-bit double). Integer samples can be either signed or unsigned.

    For 16-bit signed samples the value 0 would be in the middle, and -32768 and 32767 would be the maximum amplitudes. For 16-bit unsigned samples the value 32768 would be in the middle, and 0 and 65535 would be the maximum amplitudes.

    For floating point samples the usual format is that 0 is in the middle, and -1.0 and 1.0 are the maximum amplitudes.
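
The three representations above describe the same waveform on different scales, so converting between them is just an offset or a scale factor. The sketch below assumes one common convention (dividing by 32768 for the float form); other software divides by 32767 instead, and all function names here are made up for illustration.

```python
def signed16_to_unsigned16(s):
    """Shift the midpoint from 0 to 32768."""
    return s + 32768

def unsigned16_to_signed16(u):
    """Shift the midpoint from 32768 back to 0."""
    return u - 32768

def signed16_to_float(s):
    """Scale to roughly [-1.0, 1.0); assumes the divide-by-32768 convention."""
    return s / 32768.0

def float_to_signed16(x):
    """Clamp to [-1.0, 1.0], scale, and clamp again so 1.0 does not overflow."""
    x = max(-1.0, min(1.0, x))
    return max(-32768, min(32767, round(x * 32768)))

print(signed16_to_unsigned16(0))      # 32768
print(signed16_to_float(-32768))      # -1.0
print(float_to_signed16(1.0))         # 32767
```

The second clamp in `float_to_signed16` matters because the signed range is asymmetric: there is a -32768 but no +32768.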

    The PCM data can then be compressed, for example using MP3.

  • 2020-11-29 17:50

    The answers all relate to sampling frequency, but don't address the question. A particular snapshot of a sound would, I imagine, include individual amplitudes for a lot of different frequencies (say you hit both an A and a C simultaneously on a keyboard, with the A being louder). How does that get recorded in a 16 bit number? If all you are doing is measuring amplitude (how loud the sound is), how do you get the different notes?

    Ah! I think I get it from this comment: "This number is then converted to the linear displacement of the diaphragm of your speaker." The notes come from how fast the diaphragm is vibrating. That's why you need roughly 44,000 values per second. A note is somewhere on the order of 1000 hertz, so a pure note would make the diaphragm move in and out about 1000 times per second. A recording of a whole orchestra has many different notes all over the place, and that miraculously can be converted into a single time history of diaphragm motion. About 44,000 times per second the diaphragm is instructed to move in or out a little bit, and that simple (long) list of numbers can represent anything from Beyonce to Beethoven!
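
A small experiment can illustrate why one number per instant is enough. The sketch below adds two sine waves (a louder one near A at 440 Hz and a quieter one near C at 523 Hz, rounded from 523.25 Hz to keep the arithmetic exact) into a single list of samples, then recovers each note's loudness by correlating the samples against a sine and cosine at that frequency. The 8000 Hz rate and the helper names `mix` and `strength_at` are assumptions chosen to keep the example fast and readable.

```python
import math

SAMPLE_RATE = 8000  # lower than the CD rate so the pure-Python loops stay fast

def mix(notes, duration_s=1.0, sample_rate=SAMPLE_RATE):
    """Sum sine waves given as (freq_hz, amplitude) pairs into one sample list."""
    n = round(duration_s * sample_rate)
    return [sum(a * math.sin(2 * math.pi * f * i / sample_rate) for f, a in notes)
            for i in range(n)]

def strength_at(samples, freq_hz, sample_rate=SAMPLE_RATE):
    """Estimate the amplitude of one frequency (a single bin of a discrete Fourier transform)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

# One number per sample, yet both notes are in there.
signal = mix([(440, 0.6), (523, 0.3)])
for f in (440, 523, 600):
    print(f, round(strength_at(signal, f), 3))
```

Running it should report about 0.6 at 440 Hz, about 0.3 at 523 Hz, and essentially nothing at 600 Hz, even though each entry of `signal` is just a single amplitude value.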

  • 2020-11-29 17:51

    Audio can be represented by digital samples. Essentially, a sampler (also called an analog-to-digital converter, or ADC) grabs a value of the audio signal every 1/fs seconds, where fs is the sampling frequency. The ADC then quantizes the signal, which is a rounding operation. So if your signal ranges from 0 to 3 volts (the full-scale range), each sample is rounded to the nearest level of, for example, a 16-bit number. In this example, a 16-bit number is recorded once every 1/fs seconds.
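
The quantization step can be sketched as follows, assuming an idealized ADC with a 0 to 3 V full-scale range and 16-bit unsigned output codes. Real converters differ in details such as offset and exact code mapping, and `quantize` and `dequantize` are illustrative names only.

```python
def quantize(voltage, full_scale=3.0, bits=16):
    """Round a voltage in [0, full_scale] to the nearest unsigned integer code."""
    max_code = 2 ** bits - 1                       # 65535 for 16 bits
    clamped = max(0.0, min(full_scale, voltage))   # out-of-range inputs clip
    return round(clamped / full_scale * max_code)

def dequantize(code, full_scale=3.0, bits=16):
    """Map an integer code back to the voltage it stands for."""
    return code / (2 ** bits - 1) * full_scale

for v in (0.0, 1.234, 3.0):
    code = quantize(v)
    print(v, code, dequantize(code))
```

The difference between the input voltage and the dequantized value is the quantization error, which in this model is at most half a step (about 23 microvolts for 3 V over 16 bits).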

    So for example, most WAV/MP3 files hold an audio signal sampled at 44.1 kHz. I don't know how much detail you want, but there's this thing called the "Nyquist Sampling Rate" that says the sampling frequency must be at least twice the highest frequency you want to capture. So on your WAV/MP3 file you are at best going to be able to hear frequencies up to about 22 kHz.
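
Here is a small sketch of what goes wrong above that limit. A tone above half the sampling rate produces exactly the same samples as a lower "alias" tone, so the two cannot be told apart after sampling. The helper names and the 30 kHz test tone are assumptions for illustration.

```python
import math

def nyquist_limit(sample_rate):
    """Highest frequency that can be represented at this sampling rate."""
    return sample_rate / 2

def alias_frequency(freq_hz, sample_rate):
    """Frequency a sampled tone appears at after folding into [0, sample_rate / 2]."""
    return abs(freq_hz - sample_rate * round(freq_hz / sample_rate))

def sample_cosine(freq_hz, sample_rate, n):
    """First n samples of a unit-amplitude cosine."""
    return [math.cos(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

fs = 44100
print(nyquist_limit(fs))              # 22050.0
print(alias_frequency(1000, fs))      # 1000 (below the limit, unchanged)
print(alias_frequency(30000, fs))     # 14100 (above the limit, folds down)

# The 30 kHz and 14.1 kHz tones give the same samples, up to rounding error.
high = sample_cosine(30000, fs, 100)
low = sample_cosine(14100, fs, 100)
print(max(abs(a - b) for a, b in zip(high, low)) < 1e-9)
```

This is why recording chains normally filter out content above half the sampling rate before the signal is sampled.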

    There is a lot of detail you can go into in this area. The simplest format to work with would certainly be WAV, which typically holds uncompressed audio. Formats like MP3 and Ogg have to be decompressed before you can work with the samples.
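
As a rough illustration of how thin the WAV wrapper is, the sketch below uses Python's standard `wave` and `struct` modules to write one second of a 440 Hz tone as mono 16-bit PCM and read it back. It writes to an in-memory buffer to avoid touching the disk; the helper names `write_wav` and `read_wav` are made up for this example, and a filename such as "tone.wav" could be passed in place of the buffer.

```python
import io
import math
import struct
import wave

def write_wav(fileobj, samples, sample_rate=44100):
    """Write mono 16-bit signed PCM samples to a WAV file or file-like object."""
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)            # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        # WAV stores 16-bit samples as little-endian signed integers.
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))

def read_wav(fileobj):
    """Return (sample_rate, samples) from a mono 16-bit WAV."""
    with wave.open(fileobj, "rb") as w:
        raw = w.readframes(w.getnframes())
        return w.getframerate(), list(struct.unpack("<%dh" % (len(raw) // 2), raw))

tone = [round(32767 * math.sin(2 * math.pi * 440 * i / 44100)) for i in range(44100)]
buf = io.BytesIO()
write_wav(buf, tone)
buf.seek(0)
rate, decoded = read_wav(buf)
print(rate, decoded == tone)         # 44100 True
```

The samples come back bit for bit, which is what "uncompressed" means here: the file is a short header followed by the raw list of numbers.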
