OpenXml Excel: throw error in any word after mail address

Posted by 随声附和 on 2019-12-30 18:38:38

Question


I read Excel files using OpenXml. Everything works fine, but not if the spreadsheet contains a cell that holds an email address followed by a space and another word, such as:

abc@abc.com abc

It throws an exception immediately at the opening of the spreadsheet:

var _doc = SpreadsheetDocument.Open(_filePath, false); 

exception:

DocumentFormat.OpenXml.Packaging.OpenXmlPackageException
Additional information:
Invalid Hyperlink: Malformed URI is embedded as a hyperlink in the document.


Answer 1:


There is an open issue on the OpenXml forum related to this problem: Malformed Hyperlink causes exception

In the post they talk about encountering this issue with a malformed "mailto:" hyperlink within a Word document.

They propose a work-around here: Workaround for malformed hyperlink exception

The workaround is essentially a small console application that locates the invalid URL and replaces it with a hard-coded value. Here is the snippet from their sample that does the replacement; you could extend it to try to correct the passed brokenUri instead of discarding it:

private static Uri FixUri(string brokenUri)
{
    return new Uri("http://broken-link/");
}
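
As one way to extend it, here is a minimal sketch that tries to salvage the link before falling back to the placeholder. It assumes the damage is trailing text after whitespace, as in the "abc@abc.com abc" case from the question; the class name UriRepair is hypothetical and not part of the original sample. Note that truncating changes what the hyperlink points to, so this may not suit every document.

```csharp
using System;

public static class UriRepair
{
    // Hypothetical replacement for FixUri: keep the text before the first
    // whitespace if that parses as an absolute URI, otherwise use the
    // same placeholder as the original sample.
    public static Uri FixUri(string brokenUri)
    {
        string trimmed = (brokenUri ?? string.Empty).Trim();

        // e.g. "mailto:abc@abc.com abc" -> "mailto:abc@abc.com"
        int ws = trimmed.IndexOfAny(new[] { ' ', '\t', '\r', '\n' });
        string candidate = ws >= 0 ? trimmed.Substring(0, ws) : trimmed;

        Uri repaired;
        if (candidate.Length > 0 &&
            Uri.TryCreate(candidate, UriKind.Absolute, out repaired))
        {
            return repaired;
        }

        // Nothing salvageable: fall back to the hard-coded value.
        return new Uri("http://broken-link/");
    }
}
```

It can be passed to the fixer in place of the original handler, for example UriFixer.FixInvalidUri(fs, UriRepair.FixUri).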

The problem I had was actually with an Excel document (like you) and it had to do with a malformed http URL; I was pleasantly surprised to find that their code worked just fine with my Excel file.

Here is the entire work-around source code, just in case one of these links goes away in the future:

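using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

class Program
{
    static void Main(string[] args)
    {
        var fileName = @"C:\temp\corrupt.xlsx";
        var newFileName = @"c:\temp\Fixed.xlsx";
        var newFileInfo = new FileInfo(newFileName);

        if (newFileInfo.Exists)
            newFileInfo.Delete();

        // Work on a copy so the original file is left untouched.
        File.Copy(fileName, newFileName);

        try
        {
            // The file is a workbook, so open it as a SpreadsheetDocument
            // (the original Word sample used WordprocessingDocument here).
            using (var doc = SpreadsheetDocument.Open(newFileName, true))
            {
                ProcessDocument(doc);
            }
        }
        catch (OpenXmlPackageException e)
        {
            // The original sample used LINQPad's Dump() extension here.
            Console.WriteLine(e);
            // Matches the message from the question: "Invalid Hyperlink: Malformed URI ..."
            if (e.ToString().Contains("Invalid Hyperlink"))
            {
                using (FileStream fs = new FileStream(newFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite))
                {
                    UriFixer.FixInvalidUri(fs, brokenUri => FixUri(brokenUri));
                }
            }
        }
    }

    // Logs the broken target and replaces it with a placeholder.
    private static Uri FixUri(string brokenUri)
    {
        Console.WriteLine(brokenUri);
        return new Uri("http://broken-link/");
    }

    // Touches the workbook content to prove the package opens cleanly.
    private static void ProcessDocument(SpreadsheetDocument doc)
    {
        var elementCount = doc.WorkbookPart.Workbook.Descendants().Count();
        Console.WriteLine(elementCount);
    }
}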

public static class UriFixer
{
    public static void FixInvalidUri(Stream fs, Func<string, Uri> invalidUriHandler)
    {
        XNamespace relNs = "http://schemas.openxmlformats.org/package/2006/relationships";
        using (ZipArchive za = new ZipArchive(fs, ZipArchiveMode.Update))
        {
            foreach (var entry in za.Entries.ToList())
            {
                if (!entry.Name.EndsWith(".rels"))
                    continue;
                bool replaceEntry = false;
                XDocument entryXDoc = null;
                using (var entryStream = entry.Open())
                {
                    try
                    {
                        entryXDoc = XDocument.Load(entryStream);
                        if (entryXDoc.Root != null && entryXDoc.Root.Name.Namespace == relNs)
                        {
                            var urisToCheck = entryXDoc
                                .Descendants(relNs + "Relationship")
                                .Where(r => r.Attribute("TargetMode") != null && (string)r.Attribute("TargetMode") == "External");
                            foreach (var rel in urisToCheck)
                            {
                                var target = (string)rel.Attribute("Target");
                                if (target != null)
                                {
                                    try
                                    {
                                        Uri uri = new Uri(target);
                                    }
                                    catch (UriFormatException)
                                    {
                                        Uri newUri = invalidUriHandler(target);
                                        rel.Attribute("Target").Value = newUri.ToString();
                                        replaceEntry = true;
                                    }
                                }
                            }
                        }
                    }
                    catch (XmlException)
                    {
                        continue;
                    }
                }
                if (replaceEntry)
                {
                    var fullName = entry.FullName;
                    entry.Delete();
                    var newEntry = za.CreateEntry(fullName);
                    using (StreamWriter writer = new StreamWriter(newEntry.Open()))
                    using (XmlWriter xmlWriter = XmlWriter.Create(writer))
                    {
                        entryXDoc.WriteTo(xmlWriter);
                    }
                }
            }
        }
    }
}
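
Before rewriting anything, it can help to see which targets are being rejected. The following is a read-only sketch along the same lines as UriFixer: it walks the .rels entries and reports external targets that System.Uri will not parse as absolute URIs. The names UriInspector and FindInvalidTargets are hypothetical, and it uses Uri.TryCreate rather than the constructor, on the assumption that both reject the same strings on your runtime.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class UriInspector
{
    private static readonly XNamespace RelNs =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Returns (relationship part, target) pairs for external relationships
    // whose target does not parse as an absolute URI. Nothing is modified.
    public static List<KeyValuePair<string, string>> FindInvalidTargets(Stream package)
    {
        var found = new List<KeyValuePair<string, string>>();
        using (var za = new ZipArchive(package, ZipArchiveMode.Read, true))
        {
            foreach (var entry in za.Entries.Where(e => e.Name.EndsWith(".rels")))
            {
                XDocument doc;
                try
                {
                    using (var s = entry.Open())
                        doc = XDocument.Load(s);
                }
                catch (XmlException)
                {
                    continue; // skip relationship parts that are not valid XML
                }

                foreach (var rel in doc.Descendants(RelNs + "Relationship"))
                {
                    if ((string)rel.Attribute("TargetMode") != "External")
                        continue;

                    var target = (string)rel.Attribute("Target");
                    Uri ignored;
                    if (target != null &&
                        !Uri.TryCreate(target, UriKind.Absolute, out ignored))
                    {
                        found.Add(new KeyValuePair<string, string>(entry.FullName, target));
                    }
                }
            }
        }
        return found;
    }
}
```

Calling it on a File.OpenRead stream of the workbook and printing each pair shows which part holds the bad link and what its target is, which is useful when deciding how a FixUri handler should repair it.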



Answer 2:


Unfortunately, a solution where you have to open the file as a zip and replace the broken hyperlink would not help me.

I was wondering how it is possible that this works fine when your target framework is 4.0, even if the only installed .NET Framework is version 4.7.2. I found out that there is a private static field inside System.UriParser that selects the version of the URI RFC specification. So it is possible to set it to V2, as it is for .NET 4.0 and earlier versions of the .NET Framework. The only problem is that the field is private static readonly.

Maybe someone will want to set it globally for the whole application. I wrote UriQuirksVersionPatcher instead, which updates this version and restores the previous value in its Dispose method. It is obviously not thread-safe, but that is acceptable for my purpose.

using System;
using System.Diagnostics;
using System.Reflection;

namespace BarCap.RiskServices.RateSubmissions.Utility
{
#if (NET20 || NET35 || NET40)
        public class UriQuirksVersionPatcher : IDisposable
        {
            public void Dispose()
            {
            }
        }
#else

    public class UriQuirksVersionPatcher : IDisposable
    {
        private const string _quirksVersionFieldName = "s_QuirksVersion"; // See Source\ndp\fx\src\net\System\_UriSyntax.cs in the NetFX sources
        private const string _uriQuirksVersionEnumName = "UriQuirksVersion";
        /// <code>
        /// private enum UriQuirksVersion
        /// {
        ///     V1 = 1, // RFC 1738 - Not supported
        ///     V2 = 2, // RFC 2396
        ///     V3 = 3, // RFC 3986, 3987
        /// }
        /// </code>
        private const string _oldQuirksVersion = "V2";

        private static readonly Lazy<FieldInfo> _targetFieldInfo;
        private static readonly Lazy<int?> _patchValue;
        private readonly int _oldValue;
        private readonly bool _isEnabled;

        static UriQuirksVersionPatcher()
        {
            var targetType = typeof(UriParser);
            _targetFieldInfo = new Lazy<FieldInfo>(() => targetType.GetField(_quirksVersionFieldName, BindingFlags.Static | BindingFlags.NonPublic));
            _patchValue = new Lazy<int?>(() => GetUriQuirksVersion(targetType));
        }

        public UriQuirksVersionPatcher()
        {
            int? patchValue = _patchValue.Value;
            _isEnabled = patchValue.HasValue;

            if (!_isEnabled) //Disabled if it failed to get enum value
            {
                return;
            }

            int originalValue = QuirksVersion;
            _isEnabled = originalValue != patchValue;

            if (!_isEnabled) //Disabled if value is proper
            {
                return;
            }

            _oldValue = originalValue;
            QuirksVersion = patchValue.Value;
        }

        private int QuirksVersion
        {
            get
            {
                return (int)_targetFieldInfo.Value.GetValue(null);
            }
            set
            {
                _targetFieldInfo.Value.SetValue(null, value);
            }
        }

        private static int? GetUriQuirksVersion(Type targetType)
        {
            int? result = null;
            try
            {
                result = (int)targetType.GetNestedType(_uriQuirksVersionEnumName, BindingFlags.Static | BindingFlags.NonPublic)
                                        .GetField(_oldQuirksVersion, BindingFlags.Static | BindingFlags.Public)
                                        .GetValue(null);
            }
            catch
            {
#if DEBUG

                Debug.WriteLine("ERROR: Failed to find UriQuirksVersion.V2 enum member.");
                throw;

#endif
            }
            return result;
        }

        public void Dispose()
        {
            if (_isEnabled)
            {
                QuirksVersion = _oldValue;
            }
        }
    }
#endif
}

Usage:

using(new UriQuirksVersionPatcher())
{
    using(var document = SpreadsheetDocument.Open(fullPath, false))
    {
       //.....
    }
}

P.S. Later I found that someone had already implemented this patcher: https://github.com/google/google-api-dotnet-client/blob/master/Src/Support/Google.Apis.Core/Util/UriPatcher.cs




Answer 3:


I haven't used OpenXml, but if there's no specific reason for using it, I highly recommend the LinqToExcel library. Example code:

var sheet = new ExcelQueryFactory("filePath");
var allRows = from r in sheet.Worksheet() select r;
foreach (var r in allRows) {
    // Read the cell in the column named "Header" for each row.
    var cella = r["Header"].ToString();
}


Source: https://stackoverflow.com/questions/29970814/openxml-excel-throw-error-in-any-word-after-mail-address
