.NET Framework 3.5.
I\'m trying to calculate the average of some pretty large numbers.
For instance:
using System;
using System.Linq;
class Program
{
If you know approximately what the average will be (or, at least, that all pairs of numbers will have a max difference < long.MaxValue
), you can calculate the average difference from that value instead. I take an example with low numbers, but it works equally well with large ones.
// Let's say numbers cannot exceed 40.
List<int> numbers = new List<int>() { 31 28 24 32 36 29 }; // Average: 30
List<int> diffs = new List<int>();
// This can probably be done more effectively in linq, but to show the idea:
foreach(int number in numbers.Skip(1))
{
diffs.Add(numbers.First()-number);
}
// diffs now contains { -3 -6 1 5 -2 }
var avgDiff = diffs.Sum() / diffs.Count(); // the average is -1
// To get the average value, just add the average diff to the first value:
var totalAverage = numbers.First()+avgDiff;
You can of course implement this in some way that makes it easier to reuse, for example as an extension method to IEnumerable<long>
.
For two positive numbers (or two negative numbers) , I found a very elegant solution from here.
where an average computation of (a+b)/2
can be replaced with a+((b-a)/2
.
If you know in advance that all your numbers are going to be 'big' (in the sense of 'much nearer long.MaxValue
than zero), you can calculate the average of their distance from long.MaxValue
, then the average of the numbers is long.MaxValue
less that.
However, this approach will fail if (m)any of the numbers are far from long.MaxValue
, so it's horses for courses...
Averaging numbers of a specific numeric type in a safe way while also only using that numeric type is actually possible, although I would advise using the help of BigInteger in a practical implementation. I created a project for Safe Numeric Calculations that has a small structure (Int32WithBoundedRollover) which can sum up to 2^32 int32s without any overflow (the structure internally uses two int32 fields to do this, so no larger data types are used).
Once you have this sum you then need to calculate sum/total to get the average, which you can do (although I wouldn't recommend it) by creating and then incrementing by total another instance of Int32WithBoundedRollover. After each increment you can compare it to the sum until you find out the integer part of the average. From there you can peel off the remainder and calculate the fractional part. There are likely some clever tricks to make this more efficient, but this basic strategy would certainly work without needing to resort to a bigger data type.
That being said, the current implementation isn't build for this (for instance there is no comparison operator on Int32WithBoundedRollover, although it wouldn't be too hard to add). The reason is that it is just much simpler to use BigInteger at the end to do the calculation. Performance wise this doesn't matter too much for large averages since it will only be done once, and it is just too clean and easy to understand to worry about coming up with something clever (at least so far...).
As far as your original question which was concerned with the long data type, the Int32WithBoundedRollover could be converted to a LongWithBoundedRollover by just swapping int32 references for long references and it should work just the same. For Int32s I did notice a pretty big difference in performance (in case that is of interest). Compared to the BigInteger only method the method that I produced is around 80% faster for the large (as in total number of data points) samples that I was testing (the code for this is included in the unit tests for the Int32WithBoundedRollover class). This is likely mostly due to the difference between the int32 operations being done in hardware instead of software as the BigInteger operations are.
Simple answer with LINQ...
var data = new[] { int.MaxValue, int.MaxValue, int.MaxValue };
var mean = (int)data.Select(d => (double)d / data.Count()).Sum();
Depending on the size of the set fo data you may want to force data
.ToList()
or .ToArray()
before your process this method so it can't requery count on each pass. (Or you can call it before the .Select(..).Sum()
.)
You may try the following approach:
let number of elements is N, and numbers are arr[0], .., arr[N-1].
You need to define 2 variables:
mean and remainder.
initially mean = 0, remainder = 0.
at step i you need to change mean and remainder in the following way:
mean += arr[i] / N;
remainder += arr[i] % N;
mean += remainder / N;
remainder %= N;
after N steps you will get correct answer in mean variable and remainder / N will be fractional part of the answer (I am not sure you need it, but anyway)