I have a 2-dimensional byte array that looks something like this:
0 0 0 0 1
1 1 1 1 0
0 0 1 1 1
1 0 1 0 1
It's hard to work with multidimensional arrays in LINQ, but here's how you could do it:
var arr = new [,] { { 0, 0, 0, 0, 1 }, { 1, 1, 1, 1, 0 }, { 0, 0, 1, 1, 1 }, { 1, 0, 1, 0, 1 } };

var data =
    Enumerable.Range(0, 4)
              .Select(
                  row =>
                  new
                  {
                      index = row,
                      count = Enumerable.Range(0, 5).Select(col => arr[row, col]).Count(x => x == 1)
                  })
              .OrderByDescending(x => x.count)
              .Select(x => x.index)
              .First();
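For the sample array above, data evaluates to 1: the second row contains four 1s, more than any other row.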
Looking at the issue, this is really a two-part answer for whatever is "more efficient" for your code. The loop presented is already very lean on resources, but it could be clearer about its intent.
Given the amount of data being moved around, even at ten times this size, PLINQ is going to be more resource-intensive, simply because of the overhead of spinning up threads.
1.) Using LINQ can make this method more readable
Most 2D-array LINQ queries I've come across convert the array into a jagged array (an array of arrays) before searching. Here's a helper method that does that conversion for us and helps the query look cleaner:
public static T[][] GetJagged<T>(this T[,] raw)
{
    int lenX = raw.GetLength(0);
    int lenY = raw.GetLength(1);

    T[][] jagged = new T[lenX][];

    for (int x = 0; x < lenX; x++)
    {
        jagged[x] = new T[lenY];

        for (int y = 0; y < lenY; y++)
        {
            jagged[x][y] = raw[x, y];
        }
    }

    return jagged;
}
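As a quick sanity check, a minimal usage sketch (assuming GetJagged lives in a static class so it works as an extension method, and using the sample array from the question):

byte[,] arr =
{
    { 0, 0, 0, 0, 1 },
    { 1, 1, 1, 1, 0 },
    { 0, 0, 1, 1, 1 },
    { 1, 0, 1, 0, 1 }
};

byte[][] jagged = arr.GetJagged();
Console.WriteLine(string.Join(" ", jagged[1])); // prints: 1 1 1 1 0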
Now, all we have left is to query each row of the now-jagged array and return the sum of its members. Here I use the selector (b => b), essentially saying: for each byte, select it for the Sum method.
static int GetMaxIndexLINQ(byte[,] TwoDArray)
{
    byte[][] jagged = TwoDArray.GetJagged();

    IEnumerable<int> rowSums = from bitRows in jagged
                               select bitRows.Sum(b => b);

    int maxSum = rowSums.Max();
    int MaxRowIndex = Array.IndexOf(rowSums.ToArray(), maxSum);

    return MaxRowIndex;
}
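For the sample data this returns 1. One caveat: rowSums is a lazy query, so it gets enumerated twice here (once by Max and once by ToArray); for large inputs, materialize it with ToArray first and work off of that.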
This way comes out very legible, and even if the reader is new to coding, it's pretty easy to get the gist of what's happening here.
I'd like to point out that making your code more readable is itself a form of efficiency. Teamwork makes the dream work, and the quicker a teammate can make sense of what your code is doing, the better for everyone.
2.) Optimizing for performance
As I said before, there isn't a lot happening here that can be made any leaner; extra method calls or unnecessary checks will just slow the process down.
That being said, there is one small change that buys some easy optimization. Because in this instance we are dealing only with 1s and 0s, we can drop the comparison from the inner loop entirely: rather than checking whether a value is 0 or not, it is faster to just add the value to the running sum.
static int GetMaxIndex_EvenBetter(byte[,] TwoDArray)
{
    int NumRows = TwoDArray.GetLength(0);
    int NumCols = TwoDArray.GetLength(1);
    int RowCount, MaxRowCount = 0, MaxRowIndex = 0;

    for (int row = 0; row < NumRows; row++)
    {
        RowCount = 0;

        for (int col = 0; col < NumCols; col++)
        {
            RowCount += TwoDArray[row, col]; // see my change here: no check, just add
        }

        if (RowCount > MaxRowCount)
        {
            MaxRowCount = RowCount;
            MaxRowIndex = row;
        }
    }

    return MaxRowIndex;
}
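A minimal usage sketch (the Program class, Main method, and sample array are mine, not part of the original):

using System;

class Program
{
    // GetMaxIndex_EvenBetter from above is assumed to be pasted into this class.

    static void Main()
    {
        byte[,] data =
        {
            { 0, 0, 0, 0, 1 },
            { 1, 1, 1, 1, 0 },
            { 0, 0, 1, 1, 1 },
            { 1, 0, 1, 0, 1 }
        };

        // Row 1 holds four 1s, more than any other row.
        Console.WriteLine(GetMaxIndex_EvenBetter(data)); // prints 1
    }
}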
In most other cases you aren't working with just 1s and 0s, so you DO want to check those values before adding; here, however, it's unnecessary.
1) You can do it with LINQ this way...
private static int GetMaxIndex(byte[,] TwoDArray) {
    return Enumerable.Range(0, TwoDArray.GetLength(0))
                     .Select(
                         x => new {
                             Index = x,
                             Count = Enumerable.Range(0, TwoDArray.GetLength(1)).Count(y => TwoDArray[x, y] == 1)
                         })
                     .OrderByDescending(x => x.Count)
                     .First()
                     .Index;
}
... you'd have to test it to see whether LINQ is faster or slower; see the timing sketch after the PLINQ version below.
2) It is possible to use PLINQ. Just use ParallelEnumerable.Range for the row index generator:
private static int GetMaxIndex2(byte[,] TwoDArray) {
    return ParallelEnumerable.Range(0, TwoDArray.GetLength(0))
                             .Select(
                                 x => new {
                                     Index = x,
                                     Count = Enumerable.Range(0, TwoDArray.GetLength(1)).Count(y => TwoDArray[x, y] == 1)
                                 })
                             .OrderByDescending(x => x.Count)
                             .First()
                             .Index;
}
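If you do want to measure it, here's a rough sketch only (the matrix size, run count, and the Benchmark/Time names are mine; for serious numbers use something like BenchmarkDotNet):

using System;
using System.Diagnostics;
using System.Linq;

static class Benchmark
{
    // GetMaxIndex and GetMaxIndex2 from above are assumed to be pasted into this class.

    static void Main()
    {
        var rng = new Random(42);
        var data = new byte[1000, 1000];
        for (int r = 0; r < 1000; r++)
            for (int c = 0; c < 1000; c++)
                data[r, c] = (byte)rng.Next(2); // random 0s and 1s

        Time("LINQ ", () => GetMaxIndex(data));
        Time("PLINQ", () => GetMaxIndex2(data));
    }

    static void Time(string label, Func<int> f)
    {
        f(); // warm up the JIT before timing
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 100; i++) f();
        sw.Stop();
        Console.WriteLine($"{label}: {sw.ElapsedMilliseconds} ms for 100 runs");
    }
}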
// This code is extracted from
// http://www.codeproject.com/Articles/170662/Using-LINQ-and-Extension-Methods-in-C-to-Sort-Vect
private static IEnumerable<T[]> ConvertToSingleDimension<T>(T[,] source)
{
    for (int row = 0; row < source.GetLength(0); ++row)
    {
        var arRow = new T[source.GetLength(1)];
        for (int col = 0; col < source.GetLength(1); ++col)
            arRow[col] = source[row, col];
        yield return arRow;
    }
}
// Project each row of the byte[,] to an anonymous type { T[] Values, int Index } for the LINQ query
var result = (from item in ConvertToSingleDimension(Array2D).Select((i, index) => new { Values = i, Index = index })
              orderby item.Values.Sum(i => i) descending, item.Index
              select item.Index).FirstOrDefault();
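(Array2D here is the byte[,] from the question; for the sample data, result is 1. Note that FirstOrDefault returns 0 for an empty array, which is indistinguishable from "row 0 wins".)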
Here is how I would do it. It's more or less the same as the others, but without any Enumerable.Range (not that there's anything wrong with those; I use them all the time... it just makes the code more indented in this case).
This one also includes PLINQ stuff. TPL (async/await) wouldn't be suitable for this, because the work is computation-bound and async/await is better suited to I/O-bound operations. Your code would end up executing sequentially if you used async/await rather than PLINQ. This is because async/await doesn't go parallel until a thread is released (which can start the next task, which could then go parallel), and purely synchronous functions (such as pure CPU work) won't ever actually await; they'll just run all the way through. Basically, it would finish the first item in your list before it even started the next one, making the execution sequential. PLINQ explicitly starts parallel tasks and doesn't have this issue.
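To illustrate that point, a minimal sketch (CountRowAsync is a hypothetical name; assumes using System.Threading.Tasks): an async method that never awaits anything runs synchronously to completion, so awaiting it row by row is still sequential. The compiler even warns about this (CS1998).

// Hypothetical example: despite the async keyword, this runs synchronously
// because it contains no await (the compiler warns with CS1998).
static async Task<int> CountRowAsync(byte[,] arr, int row)
{
    int sum = 0;
    for (int col = 0; col < arr.GetLength(1); col++)
        sum += arr[row, col]; // pure CPU work, nothing to await
    return sum; // the returned Task is already completed
}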
// arry is your 2d byte array (byte[,] arry)
var maxIndex = arry
    .Cast<byte>()  // flatten the entire array into a sequence of bytes
    .AsParallel()  // make the transition to PLINQ (remove this to not use it)
    .AsOrdered()   // keep the original order so the computed indexes are stable
    .Select((b, i) => new // create indexes
    {
        value = b,
        index = i
    })
    .GroupBy(g => g.index / arry.GetLength(1)) // group it by rows
    .Select((g, i) => new
    {
        sum = g.Select(g2 => (int)g2.value).Sum(), // sum each row
        index = i
    })
    .OrderByDescending(g => g.sum) // max by sum
    .Select(g => g.index)          // grab the index
    .First();                      // the index of the row with the largest sum
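For the sample array, maxIndex comes out to 1. Note the AsOrdered call: without it, PLINQ doesn't guarantee that the indexes produced by the indexed Select line up with the elements' original positions, so the row grouping could come out wrong.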
In terms of raw efficiency, you would probably get better results with your for loop. The question I would ask is: which is more readable and clear?