Error correcting codes for packet loss (UDP)

问题

I have no real idea what to look for, since all I get with "Error correcting codes" is stuff related to cases where you don't know the location of the error. Thus those codes are much more complicated an inefficient than I need them to be.

In the following, note that bits are equal to packets (because only a whole packet can be missing, thus the bit analogy fits very well).

Are there ECCs that take into account that you already know WHICH k-bits are missing and only provide you with a way to reconstruct the datastream in those k places? Additionally, the bits added by the ECC should be independent (preferably). Such that if packet loss occurs inside the ECC portion of the data, it still can reconstruct some of the original data (not always will there be k errors, mostly there will be none. So its important that the ECC is fault tolerant to its own ECC bits added).

That is a big difference IMO. For one missing bit its simple, I can just use one XOR bit. But I am not clever enough to generalize it to n-bits.

So again, I have a stream of n-bits, and I know that up to k bits are missing (I really know which ones exactly and that they are missing, corruption is not possible). Now I need a codec that can reconstruct them with as little overhead added to the datastream as possible. I am dreaming of having (n+k) bits to correct k random bit errors in an n bit stream :). On top of that, ideally, if any of those k ECC bits added to the n bit data stream gets corrupted, like say c bits of the k bits get corrupted, then it should still be able to reconstruct (k-c) bit errors in the n bit stream.

Please note ofc that I do not know the error positions in advance though xD.

Example:

One algorithm I can think of is this. Datastream of n bits to be protected against errors.

Let p be the smallest relative primes to n. Then iterate through the datastream with i = (p * j) mod n, by incrementing j, XORing the substream obtained by selecting bits of every even j. This substream has n/2 elements. After iterating, we have obtained a parity for n/2 the elements. We can obtain another parity for the other half in the same way (taking odd j).

This yields 50% error reduction for 2 bit losses.

The bright side is we can now get arbitrarily better. Just take the next higher relative prime and do the same again. Now we are at 25% error chance. Basically we can cut the error chance in a half each time we add two additional parity bits.

回答1:

You need an erasure code (not an error detection code). Error detection is taken care of by the link and transport layer. Since you are trying to mitigate UDP packet loss, you already know which parts are missing -- the dropped packet is missing.

There is no such thing as erasures or errors on a byte (or bit) level, at least not with any reasonable likelihood (there are at least two underlying layers of protocols, sometimes three, each with a checksum, which makes sure of that). You either receive a complete, intact datagram, or you don't. Never anything in between.

Cauchy Reed Solomon codes is one class of algorithms you may consider, these transform k blocks of data of some length into k+m blocks, and allow restoration of the original data given at most m erasures. A special case for this kind of algorithm is parity, for which both encoding and decoding is a simple xor operation, and m=1. This is the very algorithm used in Raid-5, which was mentioned in a comment above.

In one word, you want longhair.

As an alternative, if you have a lot of data to transmit to several parties, and you want to be fancy, you can consider fountain codes. These are much more complicated (and thus slower) and less bit-efficient, but they allow you to create an arbitrary number of packets, of which any k will reconstruct the k-lenght original message. This allows you to save a lot of bandwidth if you are able to multicast to a lot of clients who all want one large set of data, but don't necessarily start downloading at the same time.

回答2:

If you were sending infinite precision real/complex numbers, there would be a simple solution based on the fact that through any d points with distinct x coordinates, there is a unique polynomial of degree d-1. So, given d data points, you find this polynomial (say with Lagrange interpolation) and evaluate it on n more points. If you throw away the values at any n of the d+n points, you can still recover the polynomial from the values at the other n. In practice this is usually unusable because it is numerically unstable.

Over a large discrete alphabet, you would be asking about something related to secret sharing in cryptography, where you want to be able to decode a message when you have at least t keys out of n. Your goals are a little different, but some efficient secrete sharing techniques might work.

What you need is an error detecting code. This is not so different from an error correcting code. A binary code with minimum distance n is a set of binary vectors so that any two distinct vectors differ in at least n places. This code will let you detect n-1 errors, and you can correct n-1 errors in known places (or (n-1)/2 errors in unknown places).

If two code words differ in only n places, then if you lose those places, you can't distinguish the code words, so you can't recover the data.

Some of the simplest error correcting codes append check sums to the end. See BCH codes for an infinite family of codes with arbitrarily large minimum distances. It's good to have an infinite family, because while people send small fixed blocks of data in practice, in your problem you want to have one giant block if you want to optimize the number of errors you can correct for the expansion of the message. These generalize the idea of adding a parity bit: You choose some polynomial p(x), and then make sure the bits you send are the coefficients of a polynomial divisible by p(x), with coefficient arithmetic mod 2 (or in a finite field). For example, you can add 8 check bits to 7 data bits and then correct up to 4 errors with known positions. In BCH codes, recovering the message when there are errors is simpler than in many other codes. One decoding method is related to the interpolating polynomials I mentioned for real-valued data.

回答3:

The Reed-Solomon code RS(255,223,32) corrects all error patterns that affect 16 (or less) of the 255 bytes - no matter how they are corrupted. If you know in advance which bytes have been corrupted then the capability is even higher. This type of error is called erasure.

The RS(255, 255-k) decoder corrects all byte error/erasure patterns bounded by:

(2*errorCount + erasureCount) <= k

It means that the encoder takes the information block of (255-k) bytes and calculates k bytes of parity information, which is transmitted together with the information block as a block of 255 bytes.

And the good stuff is that you don't need to decide the errorCount/erasureCount distribution at the time you encode the data, just tell the decoder in the receiving end which byte positions to worry about and it will take care of these errors as well as any remaining errors in unknown positions (as long as the condition above is met).

In some transmission schemes multiple Reed-Solomon blocks are encoded and then transmitted in multiple packets. That way the missing bytes from a single packet loss is spread across multiple Reed-Solomon code words. This is often referred to as interleaving.

You can take a look at my Reed-Solomon Encoder/Decoder C-implementation. It handles both errors and erasures and has a nice test bench included.

来源：https://stackoverflow.com/questions/28924342/error-correcting-codes-for-packet-loss-udp

标签

c++

math

udp

protocols

error-correction