How do I iterate through rows in an excel table using epplus?

后端 未结 6 1241
暖寄归人
暖寄归人 2021-01-30 20:19

I am new to epplus, and i\'m trying to read some values from an excel table.

This is what I have so far:

var fileInfo = new FileInfo(filename);
using(var e         


        
相关标签:
6条回答
  • 2021-01-30 20:34

    You can access the .Worksheet property of a table and index its cells. I wrote an extension method for this purpose, which generates a series of dictionaries mapping column name to cell value:

    public static IEnumerable<IDictionary<string, object>> GetRows(this ExcelTable table)
    {
        var addr = table.Address;
        var cells = table.WorkSheet.Cells;
    
        var firstCol = addr.Start.Column;
    
        var firstRow = addr.Start.Row;
        if (table.ShowHeader)
            firstRow++;
        var lastRow = addr.End.Row;
    
        for (int r = firstRow; r <= lastRow; r++)
        {
            yield return Enumerable.Range(0, table.Columns.Count)
                .ToDictionary(x => table.Columns[x].Name, x => cells[r, firstCol + x].Value);
        }
    }
    
    0 讨论(0)
  • 2021-01-30 20:40

    I had the same issue and I solved it using the ExcelTable to get the table boundary and the ExcelWorksheet to retrieve the data. So your code will look something like this:

    var fileInfo = new FileInfo(filename);
    using(var excelPackage = new OfficeOpenXml.ExcelPackage(fileInfo))
    {
        foreach (var sheet in excelPackage.Workbook.Worksheets)
        {
            foreach (ExcelTable table in sheet.Tables)
            {
                ExcelCellAddress start = table.Address.Start;
                ExcelCellAddress end = table.Address.End;
    
                for (int row = start.Row; row <= end.Row; ++row)
                {
                    ExcelRange range = sheet.Cells[row, start.Column, row, end.Column];
                    ...
                }
            }
        }
    }
    

    You need to check for table header or other things, but that did the trick for me.

    0 讨论(0)
  • 2021-01-30 20:44

    Im not sure of epplus, but I thought I would make a quick suggestion of using LinqToExcel

    var excel = new ExcelQueryFactory(excel);
    
    var info = excel.Worksheet("Sheet1")
                    .Select(z=> new
                         {
                          Name = row["Name"].Cast<string>(),
                          Age = row["Age"].Cast<int>(),
                         }).ToList();
    

    you can get it from NuGet

    Install-Package LinqToExcel
    
    0 讨论(0)
  • 2021-01-30 20:54

    While searching for help on the same problem, I stumbled across this link. It certainly worked for me! Definitely better than using Interop objects. :)

    I adapted it slightly though:

    var package = new ExcelPackage(new FileInfo("sample.xlsx"));
    
    ExcelWorksheet workSheet = package.Workbook.Worksheets[0];
    var start = workSheet.Dimension.Start;
    var end = workSheet.Dimension.End;
    for (int row = start.Row; row <= end.Row; row++)
    { // Row by row...
        for (int col = start.Column; col <= end.Column; col++)
        { // ... Cell by cell...
            object cellValue = workSheet.Cells[row, col].Text; // This got me the actual value I needed.
        }
    }
    
    0 讨论(0)
  • 2021-01-30 20:56

    Here's a way to get the complete row as ExcelRange which then can be iterated or used for LINQ:

    for (var rowNum = 1; rowNum <= sheet.Dimension.End.Row; rowNum++)
    {
        var row = sheet.Cells[string.Format("{0}:{0}", rowNum)];
        // just an example, you want to know if all cells of this row are empty
        bool allEmpty = row.All(c => string.IsNullOrWhiteSpace(c.Text));
        if (allEmpty) continue; // skip this row
        // ...
    }
    
    0 讨论(0)
  • 2021-01-30 20:59

    I also was trying to figure out how to properly iterate through the objects and get at the data that i need with this API.

    I collected info from various posts and the getting started page from the author and put it all together to assist myself and others.

    The main issue is your entry point for iteration. Most solutions I've seen go after the Worksheet, whereas this question is specific on the Table, i was curious about both so i'm presenting my findings on both.

    Worksheet Example:

    using (var package = new ExcelPackage(new FileInfo(file)))
    {
        //what i've seen used the most, entry point is the worksheet not the table w/i the worksheet(s)
        using (var worksheet = package.Workbook.Worksheets.FirstOrDefault())
        {
            if (worksheet != null)
            {
                for (int rowIndex = worksheet.Dimension.Start.Row; rowIndex <= worksheet.Dimension.End.Row; rowIndex++)
                {
                    var row = worksheet.Row(rowIndex);
                    //from comments here... https://github.com/JanKallman/EPPlus/wiki/Addressing-a-worksheet
                    //#:# gets entire row, A:A gets entire column
                    var rowCells = worksheet.Cells[$"{rowIndex}:{rowIndex}"];
                    //returns System.Object[,]
                    //type is string so it likely detects many cells and doesn't know how you want the many formatted together...
                    var rowCellsText = rowCells.Text;
                    var rowCellsTextMany = string.Join(", ", rowCells.Select(x => x.Text));
                    var allEmptyColumnsInRow = rowCells.All(x => string.IsNullOrWhiteSpace(x.Text));
                    var firstCellInRowWithText = rowCells.Where(x => !string.IsNullOrWhiteSpace(x.Text)).FirstOrDefault();
                    var firstCellInRowWithTextText = firstCellInRowWithText?.Text;
                    var firstCellFromRow = rowCells[rowIndex, worksheet.Dimension.Start.Column];
                    var firstCellFromRowText = firstCellFromRow.Text;
                    //throws exception...
                    //var badRow = rowCells[worksheet.Dimension.Start.Row - 1, worksheet.Dimension.Start.Column - 1];
    
                    //for me this happened on row1 + row2 beign merged together for the column headers
                    //not sure why the row.merged property is false for both rows though
                    if (allEmptyColumnsInRow)
                        continue;
    
                    for (int columnIndex = worksheet.Dimension.Start.Column; columnIndex <= worksheet.Dimension.End.Column; columnIndex++)
                    {
                        var column = worksheet.Column(columnIndex);
                        var currentRowColumn = worksheet.Cells[rowIndex, columnIndex];
                        var currentRowColumnText = currentRowColumn.Text;
                        var currentRowColumnAddress = currentRowColumn.Address;
                        //likely won't need to do this, but i wanted to show you can tangent off at any level w/ that info via another call
                        //similar to row, doing A:A or B:B here, address is A# so just get first char from address
                        var columnCells = worksheet.Cells[$"{currentRowColumnAddress[0]}:{currentRowColumnAddress[0]}"];
                        var columnCellsTextMany = string.Join(", ", columnCells.Select(x => x.Text));
                        var allEmptyRowsInColumn = columnCells.All(x => string.IsNullOrWhiteSpace(x.Text));
                        var firstCellInColumnWithText = columnCells.Where(x => !string.IsNullOrWhiteSpace(x.Text)).FirstOrDefault();
                        var firstCellInColumnWithTextText = firstCellInColumnWithText?.Text;
                    }
                }
            }
        }
    }
    

    Now things can get a bit messed up here, for me at least i had no tables to start with. Under the same package using statement, if i were to first iterate over the worksheet cells and then touch anything with the Tables property it threw an exception. If i re-instantiate a package and use the same/similar code it doesn't blow up when seeing if we have any Tables or not.

    Table Example:

    //for some reason, if i don't instantiating another package and i work with the 'Tables' property in any way, the API throws a...
    //Object reference not set to an instance of an object.
    //at OfficeOpenXml.ExcelWorksheet.get_Tables()
    //excetion... this is because i have data in my worksheet but not an actual 'table' (Excel => Insert => Table)
    //a parital load of worksheet cell data + invoke to get non-existing tables must have a bug as below code does not
    //throw an exception and detects null gracefully on firstordefault
    using (var package = new ExcelPackage(new FileInfo(file)))
    {
        //however, question was about a table, so lets also look at that... should be the same?
        //no IDisposable? :(
        //adding a table manually to my worksheet allows the 'same-ish' (child.Parent, aka table.WorkSheet) code to iterate
        var table = package.Workbook.Worksheets.SelectMany(x => x.Tables).FirstOrDefault();
    
        if (table != null)
        {
            for (int rowIndex = table.Address.Start.Row; rowIndex <= table.Address.End.Row; rowIndex++)
            {
                var row = table.WorkSheet.Row(rowIndex);
    
                var rowCells = table.WorkSheet.Cells[$"{rowIndex}:{rowIndex}"];
                var rowCellsManyText = string.Join(", ", rowCells.Select(x => x.Text));
    
                for (int columnIndex = table.Address.Start.Column; columnIndex <= table.Address.End.Column; columnIndex++)
                {
                    var currentRowColumn = table.WorkSheet.Cells[rowIndex, columnIndex];
                    var currentRowColumnText = currentRowColumn.Text;
                }
            }
        }
    }
    

    Essentially everything works and operates the same way, you just have to go after child.Parent, AKA table.WorkSheet to get at the same stuff. As others have mentioned, extension methods and possibly even wrapper class(es) could get you more granularity based on the specifics of your business needs but that was not the purpose of this question.

    In regards to the indexing comments and responses, I'd advise sticking with the 'Row' and 'Column' properties, first, last, for, foreach etc. instead of hard-coding index vs non-indexed base attributes, i had no issue here at least w/ the new version.

    0 讨论(0)
提交回复
热议问题