I'm trying to read data from .xlsx files, using SharpZipLib to unpack them (in memory) and reading the inner XML files. Everything works except recognizing the dates - they
You should find the numFmts section somewhere near the top of styles.xml, as part of the styleSheet element:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<numFmts count="3">
<numFmt numFmtId="164" formatCode="[$-414]mmmm\ yyyy;@" />
<numFmt numFmtId="165" formatCode="0.000" />
<numFmt numFmtId="166" formatCode="#,##0.000" />
</numFmts>
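As a rough illustration of reading that section, here is a minimal Python sketch (Python is used only to keep the example short; the same lookup applies to any XML parser). The function name `read_custom_formats` and the `SAMPLE` bytes are made up for this example, and it assumes styles.xml has already been unpacked into memory as bytes.

```python
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML parts such as xl/styles.xml.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_custom_formats(styles_xml):
    """Return {numFmtId: formatCode} for every numFmt node under numFmts."""
    root = ET.fromstring(styles_xml)
    formats = {}
    for node in root.findall("m:numFmts/m:numFmt", NS):
        formats[int(node.get("numFmtId"))] = node.get("formatCode")
    return formats


# Hypothetical stand-in for the bytes of xl/styles.xml (raw bytes literal,
# so the backslash in the first formatCode is kept as-is).
SAMPLE = rb"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <numFmts count="3">
    <numFmt numFmtId="164" formatCode="[$-414]mmmm\ yyyy;@" />
    <numFmt numFmtId="165" formatCode="0.000" />
    <numFmt numFmtId="166" formatCode="#,##0.000" />
  </numFmts>
</styleSheet>"""

custom_formats = read_custom_formats(SAMPLE)
print(custom_formats[165])  # 0.000
```

A workbook with no custom formats may have no numFmts section at all, in which case this sketch simply returns an empty dictionary.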
EDIT
I've been double-checking my xlsx reader code (it's been a long while since I delved into that part of the library), and there are built-in formats. Number format IDs (numFmtId) less than 164 are "built-in".
The list that I have is incomplete:
0 = 'General';
1 = '0';
2 = '0.00';
3 = '#,##0';
4 = '#,##0.00';
5 = '$#,##0;\-$#,##0';
6 = '$#,##0;[Red]\-$#,##0';
7 = '$#,##0.00;\-$#,##0.00';
8 = '$#,##0.00;[Red]\-$#,##0.00';
9 = '0%';
10 = '0.00%';
11 = '0.00E+00';
12 = '# ?/?';
13 = '# ??/??';
14 = 'mm-dd-yy';
15 = 'd-mmm-yy';
16 = 'd-mmm';
17 = 'mmm-yy';
18 = 'h:mm AM/PM';
19 = 'h:mm:ss AM/PM';
20 = 'h:mm';
21 = 'h:mm:ss';
22 = 'm/d/yy h:mm';
37 = '#,##0 ;(#,##0)';
38 = '#,##0 ;[Red](#,##0)';
39 = '#,##0.00;(#,##0.00)';
40 = '#,##0.00;[Red](#,##0.00)';
44 = '_("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_)';
45 = 'mm:ss';
46 = '[h]:mm:ss';
47 = 'mmss.0';
48 = '##0.0E+0';
49 = '@';
27 = '[$-404]e/m/d';
30 = 'm/d/yy';
36 = '[$-404]e/m/d';
50 = '[$-404]e/m/d';
57 = '[$-404]e/m/d';
59 = 't0';
60 = 't0.00';
61 = 't#,##0';
62 = 't#,##0.00';
67 = 't0%';
68 = 't0.00%';
69 = 't# ?/?';
70 = 't# ??/??';
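One way to use a table like this is to resolve the ID to its format code and then test whether the code contains any date or time tokens. The sketch below is only a heuristic under my own assumptions: the names `looks_like_date_format`, `_NON_TOKEN` and `_DATE_TOKEN` are invented for the example, only a handful of the built-in entries are copied in, and it will misjudge unusual codes (for instance localized keywords containing the letters d, m, y, h or s).

```python
import re

# Parts of a format code that may contain letters without being date tokens:
# "quoted text", bracketed items such as [Red] or [$-404] (but not the
# elapsed-time brackets [h], [m], [s]), backslash escapes, "_x" padding
# and "*x" fill characters.
_NON_TOKEN = re.compile(
    r'"[^"]*"|\[(?![hms]+\])[^\]]*\]|\\.|_.|\*.', re.IGNORECASE
)
_DATE_TOKEN = re.compile(r"[dmyhs]", re.IGNORECASE)


def looks_like_date_format(format_code):
    """Heuristic: True if the first section of the code has date/time tokens."""
    stripped = _NON_TOKEN.sub("", format_code)
    first_section = stripped.split(";")[0]
    return bool(_DATE_TOKEN.search(first_section))


# A few entries copied from the list above.
BUILTIN = {
    0: "General",
    2: "0.00",
    6: "$#,##0;[Red]\\-$#,##0",
    11: "0.00E+00",
    14: "mm-dd-yy",
    18: "h:mm AM/PM",
    27: "[$-404]e/m/d",
    46: "[h]:mm:ss",
    49: "@",
}

for num_fmt_id, code in sorted(BUILTIN.items()):
    print(num_fmt_id, code, looks_like_date_format(code))
```

The same function can be applied to custom codes read from the numFmts section, which avoids hard-coding ID ranges.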
There are two places to look for the date format of a cell.
Start by grabbing the cell's "s" attribute (the StyleIndex). Note the date in raw numeric format below (40667):
<row r="1">
<c r="A1" s="1">
<v>40667</v>
</c>
</row>
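For reference, the raw value is a day count, so once a cell is known to hold a date it can be converted with simple date arithmetic. This is a minimal Python sketch that assumes the workbook uses the default 1900 date system (not the 1904 one); `serial_to_datetime` and `EXCEL_EPOCH_1900` are names chosen for the example.

```python
from datetime import datetime, timedelta

# Using 1899-12-30 as day 0 compensates for Excel treating 1900 as a leap
# year, so results are only reliable for serials from 61 (1900-03-01) upward.
EXCEL_EPOCH_1900 = datetime(1899, 12, 30)


def serial_to_datetime(serial):
    """Convert an Excel serial (days, with the time of day as a fraction)."""
    return EXCEL_EPOCH_1900 + timedelta(days=float(serial))


print(serial_to_datetime("40667").date())  # 2011-05-04
```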
The "s" attribute on a cell node is a zero-based index into the xf nodes of the cellXfs section in styles.xml. This is the key to locating the date format, if any, that applies to the raw numeric date data. Here s=1, which points to the second xf node in the following cell formatting section of your workbook's styles.xml:
<cellXfs count="2">
<xf numFmtId="0" ... />
<xf numFmtId="14" ... />
</cellXfs>
In the second node you see numFmtId="14". This is the number format ID, and it identifies the format in which the date number should be presented. That ID can resolve to one of two places. If it is in the range 14-22, it is a built-in date style. If it is outside that range, it is (possibly) a custom date format added by the Excel file's author. You won't know until you check both places.
In the first case, if the value is 14-22, you need to map it to one of the built-in date formats that every Excel file shares (mm-dd-yy, etc.). You can find that table in the OpenXML SDK. Here is a sample of numFmtId values mapped to their built-in date formats:
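The lookup from "s" to numFmtId can be sketched in a few lines of Python. This is only an illustration: `style_index_to_numfmt` and the `STYLES` bytes are hypothetical, the xf nodes are trimmed to the one attribute that matters here, and a missing numFmtId is assumed to mean 0 (General).

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def style_index_to_numfmt(styles_xml):
    """Return a list whose position i holds the numFmtId of the i-th xf in cellXfs."""
    root = ET.fromstring(styles_xml)
    return [int(xf.get("numFmtId", "0")) for xf in root.findall("m:cellXfs/m:xf", NS)]


# Hypothetical, trimmed-down styles.xml content.
STYLES = b"""<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <cellXfs count="2">
    <xf numFmtId="0" />
    <xf numFmtId="14" />
  </cellXfs>
</styleSheet>"""

numfmt_by_style = style_index_to_numfmt(STYLES)
cell_s = 1  # the s="1" attribute of cell A1
num_fmt_id = numfmt_by_style[cell_s]
print(num_fmt_id)              # 14
print(14 <= num_fmt_id <= 22)  # True: inside the built-in date range
```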
14 mm-dd-yy
15 d-mmm-yy
16 d-mmm
17 mmm-yy
18 h:mm AM/PM
At this point you know it's a date and which format it should be presented in. If it's not one of those values, it's likely a custom number format, and you have to search styles.xml again for a numFmt node with a matching numFmtId value. Those nodes contain the custom format code, as follows:
<numFmts count="3">
<numFmt numFmtId="164" formatCode="mm/yyyy;@" />
<numFmt numFmtId="165" formatCode="0.000" />
<numFmt numFmtId="166" formatCode="#,##0.000" />
</numFmts>
Note that if your numFmtId was 164, you have found its custom date format. To catch all these date formats, custom and built-in, your best bet is to maintain a set of acceptable format strings in your code, look up the cell's formatCode, and check whether it matches one of them.
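The two-step check described here could look roughly like the following Python sketch. Everything named in it is an assumption made for illustration: `date_format_for` is not part of any library, `ACCEPTED_CUSTOM_DATE_FORMATS` is a whitelist you would have to maintain yourself, and `custom_formats` stands for the numFmtId to formatCode pairs already read from the numFmts section.

```python
# Built-in date/time formats (IDs 14-22) from the lists above.
BUILTIN_DATE_FORMATS = {
    14: "mm-dd-yy",
    15: "d-mmm-yy",
    16: "d-mmm",
    17: "mmm-yy",
    18: "h:mm AM/PM",
    19: "h:mm:ss AM/PM",
    20: "h:mm",
    21: "h:mm:ss",
    22: "m/d/yy h:mm",
}

# Hypothetical whitelist of custom format codes this reader accepts as dates.
ACCEPTED_CUSTOM_DATE_FORMATS = {"mm/yyyy;@", "m/d;@"}


def date_format_for(num_fmt_id, custom_formats):
    """Return the date format code for a numFmtId, or None if it is not a known date format."""
    # First place to look: the built-in table.
    if num_fmt_id in BUILTIN_DATE_FORMATS:
        return BUILTIN_DATE_FORMATS[num_fmt_id]
    # Second place: the custom formats declared in styles.xml.
    code = custom_formats.get(num_fmt_id)
    if code in ACCEPTED_CUSTOM_DATE_FORMATS:
        return code
    return None


custom_formats = {164: "mm/yyyy;@", 165: "0.000", 166: "#,##0.000"}
print(date_format_for(14, custom_formats))   # mm-dd-yy
print(date_format_for(164, custom_formats))  # mm/yyyy;@
print(date_format_for(165, custom_formats))  # None
```

A whitelist is strict by design: a custom date format that is not listed is treated as a plain number until it is added.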
Good Luck!
I would suggest that numFmtId="14" should be treated as the "Windows short date format": in Australia, for example, this format displays a date as "dd/mm/yy", not "mm/dd/yy".
Cells may have styles. These are uints that index cellXfs in the styleSheet. Each cellXfs item contains a set of attributes; the most important is NumberFormatId. If its value falls in the range 14-22, it is a "standard" date. If it falls in the range 165-180, it is treated here as a "formatted" date and will have a corresponding NumberingFormat (numFmt) element. Note that IDs in that range are custom formats, so it is the formatCode that actually determines whether the cell holds a date.
<x:c r="A2" s="2"><x:v>38046</x:v></x:c>
<x:xf numFmtId="14" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1" /> (ordinal position 2)
<x:c r="A4" s="4"><x:v>38048</x:v></x:c>
<x:xf numFmtId="166" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1" /> (ordinal position 4)
<x:numFmt numFmtId="166" formatCode="m/d;@" />
This code extracts a list of style IDs that correspond to these date formats.
private void GetDateStyles()
{
    // The only way to tell dates from numbers is by looking at the style index.
    // This indexes cellXfs, which contains NumberFormatIds, which index NumberingFormats.
    // This method creates a list of the style indexes that pertain to dates.
    WorkbookStylesPart workbookStylesPart = (WorkbookStylesPart) UriPartDictionary["/xl/styles.xml"];
    Stylesheet styleSheet = workbookStylesPart.Stylesheet;
    CellFormats cellFormats = styleSheet.CellFormats;
    int i = 0;
    foreach (CellFormat cellFormat in cellFormats)
    {
        // NumberFormatId is optional on an xf element; treat a missing value as 0 (General).
        uint numberFormatId = cellFormat.NumberFormatId?.Value ?? 0u;
        // 14-22 are the built-in date formats; 165-180 is the custom ID range treated as dates here.
        if ((numberFormatId >= 14 && numberFormatId <= 22)
            || (numberFormatId >= 165u && numberFormatId <= 180u))
        {
            // Store the zero-based style index, matching the "s" attribute on cells.
            _DateStyles.Add(i.ToString());
        }
        i++;
    }