Transforming flat file to XML using XSLT-like technology

寵の児 posted on 2019-12-08 05:19:07

Question


I'm designing a system which is receiving data from a number of partners in the form of CSV files. The files may differ in the number and ordering of columns. For the most part, I will want to choose a subset of the columns, maybe reorder them, and hand them off to a parser. I would obviously prefer to be able to transform the incoming data into some canonical format so as to make the parser as simple as possible.

Ideally, I would like to be able to generate a transformation for each incoming data format using some graphical tool and store the transformation as a document in a database or on disk. Upon receipt of data, I would apply the correct transformation (never mind how I determine the correct transformation) to get an XML document in a canonical format. If the incoming files had contained XML, I would just have created an XSLT document for each format and been on my way.

I've used BizTalk's Flat File XSLT Extensions (or whatever they are called) for something similar in the past, but I don't want the hassle of BizTalk (and I can't afford it either) on this project.

Does anyone know if there are alternative technologies and/or XSLT extensions which would enable me to achieve my goal in an elegant way?

I'm developing my app in C# on .NET 3.5 SP1 (thus would prefer technologies supported by .NET).


Answer 1:


XSLT 2.0 provides new features that make it easier to parse non-XML files.

Andrew Welch posted an XSLT 2.0 example that converts CSV into XML.
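
For illustration, here is a minimal sketch of the kind of thing XSLT 2.0 makes possible. It is not Andrew Welch's stylesheet, just an example built on assumptions: it reads a CSV file with unparsed-text(), splits it with tokenize(), and treats the first line as column names. The parameter name csv-uri, the file name partner.csv and the records/record element names are made up for the example. It assumes simple CSV (no quoted fields containing commas) and headers that are valid XML element names. It also needs an XSLT 2.0 processor; the XslCompiledTransform class built into .NET only implements XSLT 1.0.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameter: URI of the CSV file to read. -->
  <xsl:param name="csv-uri" select="'partner.csv'"/>

  <!-- Entry point: there is no XML source document, only the CSV text. -->
  <xsl:template name="main">
    <!-- Read the whole file as text and keep the non-blank lines. -->
    <xsl:variable name="lines"
        select="tokenize(unparsed-text($csv-uri), '\r?\n')[normalize-space(.)]"/>
    <!-- Assumption: the first line holds the column names. -->
    <xsl:variable name="names" select="tokenize($lines[1], ',')"/>
    <records>
      <xsl:for-each select="$lines[position() gt 1]">
        <!-- Naive split: quoted fields with embedded commas are not handled. -->
        <xsl:variable name="cells" select="tokenize(., ',')"/>
        <record>
          <xsl:for-each select="$names">
            <xsl:variable name="i" select="position()"/>
            <!-- Assumption: every header is a valid XML element name. -->
            <xsl:element name="{normalize-space(.)}">
              <xsl:value-of select="normalize-space($cells[$i])"/>
            </xsl:element>
          </xsl:for-each>
        </record>
      </xsl:for-each>
    </records>
  </xsl:template>

</xsl:stylesheet>
```

Because the stylesheet has no source document, the transformation has to be started at the named template main; how you do that depends on the processor (with Saxon, for example, the -it option).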




Answer 2:


I think you need something like this (sorry, it's not supported by .NET, but the code is very simple):

http://csv2xml.sourceforge.net




Answer 3:


IIRC someone has created a "LINQ to CSV" library that might be a starting point to create the intermediate XML (in memory) as input into the transform.

Found it here.




Answer 4:


You might try LINQ to CSV. There is one offering from Microsoft's Eric White and another from Matt Perdeck. Others are out there...




Answer 5:


I found two potential solutions when looking into a similar problem space.

Progress Software has a set of tools and a .NET API which, when used in conjunction with .conv (flat-to-XML converter) files created in their Stylus Studio tool, allow any predefined flat file format to be transformed into XML at run time. More info here: http://www.datadirect.com/developer/data-integration/tutorials/converter-sample-code/index.ssp

In addition, there is an XML format called XFLAT that lets you describe flat files in a variety of formats (delimited, fixed width, etc.). There is a Java program that will convert flat files for which you've provided an XFLAT description into XML, so that you can continue with a standard XML-to-XML XSLT transformation. More details can be found here: http://www.unidex.com/overview.htm

I have never actually used either of these tools, but found them when researching a similar problem.




Answer 6:


Check out this article on implementing an XmlReader that processes non-XML input. It's not a terrifically difficult task, and once you've got it working you don't need an XSLT-like technology; you can just use XSLT.




Answer 7:


This will parse the output of the Linux ip route list command. It's just what I had lying around.

You must wrap the output from the command in an element called 'output', and the stylesheet will take it from there. The real key here is the tokenize function from the XPath 2.0 spec; I don't know how you could do this before that. Also, this doesn't produce a single root element, as that was not what I needed it for. In your case, instead of splitting on spaces, I'd split on ','.

<?xml version="1.0" encoding="UTF-8"?>
<!-- tokenize() and the select attribute on xsl:attribute need an XSLT 2.0 processor -->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes" />

<!-- identity template: copy anything that is not handled below -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
</xsl:template>

<xsl:template match="//output">
    <!-- split things up for each new line -->
    <xsl:variable name="line" select="tokenize(.,'\n')"/>
    <xsl:for-each select="$line">
        <!-- split each line into pieces based on space -->
        <xsl:variable name="split" select="tokenize(.,' +')"/>
        <xsl:if test="count($split) &gt; 1">
            <xsl:element name="route">
                <xsl:for-each select="$split">
                    <xsl:choose>
                        <!-- the first token is the route's address -->
                        <xsl:when test="position() = 1">
                            <xsl:attribute name="address" select="."/>
                        </xsl:when>
                        <xsl:otherwise>
                            <!-- the remaining tokens are name/value pairs: names sit at even positions -->
                            <xsl:variable name="index" select="position()"/>
                            <xsl:variable name="fieldName" select="."/>
                            <xsl:if test="$fieldName and position() mod 2 = 0">
                                <xsl:attribute name="{$fieldName}" select="$split[$index + 1]"/>
                            </xsl:if>
                        </xsl:otherwise>
                    </xsl:choose>
                </xsl:for-each>
            </xsl:element>
        </xsl:if>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
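
Adapted to the CSV case from the question, the same tokenize approach might look roughly like the sketch below. It assumes the CSV text is wrapped in an 'output' element as described above, that the first line holds the column names, and that no field contains a quoted comma. The wanted parameter, its column names (id, name, amount) and the records/record element names are hypothetical. The point is only that listing the canonical columns in one place gives you both the subset and the reordering.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical: the canonical columns, in the order the parser expects. -->
  <xsl:param name="wanted" select="('id', 'name', 'amount')"/>

  <xsl:template match="/output">
    <!-- Split the wrapped text into non-blank lines. -->
    <xsl:variable name="lines"
        select="tokenize(., '\r?\n')[normalize-space(.)]"/>
    <!-- Assumption: the first line names the columns of this partner's file. -->
    <xsl:variable name="header"
        select="for $h in tokenize($lines[1], ',') return normalize-space($h)"/>
    <records>
      <xsl:for-each select="$lines[position() gt 1]">
        <!-- Naive split on ','; quoted fields are not handled. -->
        <xsl:variable name="cells" select="tokenize(., ',')"/>
        <record>
          <xsl:for-each select="$wanted">
            <!-- Find where the wanted column sits in this file's header. -->
            <xsl:variable name="pos" select="index-of($header, .)[1]"/>
            <!-- A column missing from the file yields an empty attribute. -->
            <xsl:attribute name="{.}"
                select="normalize-space($cells[$pos])"/>
          </xsl:for-each>
        </record>
      </xsl:for-each>
    </records>
  </xsl:template>

</xsl:stylesheet>
```

With this shape, every record element carries exactly the attributes id, name and amount, whatever order or extra columns the partner file had. If a partner uses different header names, you would still need a per-partner mapping from their names to the canonical ones.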




Answer 8:


You can also take a look at Altova's MapForce.



Source: https://stackoverflow.com/questions/315074/transforming-flat-file-to-xml-using-xslt-like-technology
