XML Split of a Large file

心在旅途 2021-01-04 00:41

I have a 15 GB XML file that I would like to split. It has approximately 300 million lines. It doesn't have any top-level nodes that are interdependent. Is there any

  • 2021-01-04 01:02

    The open-source library comma has several tools for finding data in very large XML files and for splitting those files into smaller ones.

    https://github.com/acfr/comma/wiki/XML-Utilities

    The tools are built on the expat SAX parser, so, unlike xmlstarlet and Saxon, they do not fill memory with a DOM tree.
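For illustration only, here is a minimal sketch of the SAX-style approach using Python's standard `xml.parsers.expat` binding (this is not comma's own code). It feeds the file to expat in fixed-size blocks and counts elements with a given name from a start-element callback, so no tree is built and memory use does not grow with the file. The function name `count_elements`, the tag names, and the 1 MiB default block size are assumptions.

```python
import io
import xml.parsers.expat


def count_elements(stream, tag, block_size=1 << 20):
    """Count <tag> elements in a binary stream without building a tree."""
    count = 0

    def start_element(name, attrs):
        # Called by expat for every opening tag; only a counter is kept
        nonlocal count
        if name == tag:
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    # Feed the document one block at a time; only the current block is in memory
    while True:
        block = stream.read(block_size)
        if not block:
            break
        parser.Parse(block, False)
    parser.Parse(b"", True)  # tell expat the document is complete
    return count


if __name__ == "__main__":
    sample = io.BytesIO(b"<root><record/><record><x/></record><other/></root>")
    print(count_elements(sample, "record", block_size=8))  # 2
```

For a real file, open it in binary mode and pass the handle, e.g. `with open(path, "rb") as f: count_elements(f, "record")`, with `path` and `"record"` replaced by your own values.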

  • 2021-01-04 01:02
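    I used this to split the Yahoo Q&A dataset. It reads the file line by line, counts records by their closing tag, and starts a new file every 50,000 records:
    
        # Placeholders from the original answer: replace 'filepath', the
        # record's closing tag, and the root tag name ("endTag") with your own.
        SPLIT_TAG = "</your tag to split>"   # closing tag that ends one record
        ROOT_TAG = "endTag"                  # name of the document's root element
        RECORDS_PER_FILE = 50000
    
        def write_chunk(lines, number):
            with open('filepath/Split/file_' + str(number) + '.xml', 'w', encoding='utf-8') as split:
                split.writelines(lines)
    
        count = 0
        file_count = 1
        current_file = []  # collect lines in a list; repeated str += is quadratic
    
        with open('filepath', encoding='utf-8') as f:
            for line in f:
                current_file.append(line)
    
                # Line-based: assumes each record ends on its own line
                if SPLIT_TAG in line:
                    count += 1
    
                if count == RECORDS_PER_FILE:
                    # Close the root element so this chunk is well-formed XML
                    current_file.append("</" + ROOT_TAG + ">\n")
                    write_chunk(current_file, file_count)
                    file_count += 1
                    # The next chunk starts with a declaration and a fresh root element
                    current_file = ["<?xml version='1.0' encoding='UTF-8'?>\n<" + ROOT_TAG + ">\n"]
                    count = 0
    
        # The last chunk already ends with the original file's closing root tag,
        # so it is written as-is instead of being closed a second time.
        write_chunk(current_file, file_count)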
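The script above relies on each record's closing tag falling on its own line. If that may not hold, the same split can be driven by the parser rather than by lines using `xml.etree.ElementTree.iterparse`; this is a swapped-in technique, not the answer's method. The sketch assumes the records are direct children of the root and that no namespaces are used; `split_xml`, its parameters, and the `file_{}.xml` naming pattern are hypothetical.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET


def split_xml(source, record_tag, out_pattern, per_file=50000):
    """Write the root's <record_tag> children into files of per_file records each.

    Assumes no XML namespaces and that records are direct children of the root.
    Returns the number of files written.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the root element's start
    out = None
    file_count = 0
    written = 0
    for event, elem in context:
        if event != "end" or elem.tag != record_tag:
            continue
        if out is None:
            # Begin a new chunk: XML declaration plus an opening root tag
            file_count += 1
            out = open(out_pattern.format(file_count), "w", encoding="utf-8")
            out.write("<?xml version='1.0' encoding='UTF-8'?>\n<%s>\n" % root.tag)
        elem.tail = None  # drop whitespace after the record; a newline is added below
        out.write(ET.tostring(elem, encoding="unicode") + "\n")
        written += 1
        root.clear()  # detach finished records so memory stays flat
        if written == per_file:
            out.write("</%s>\n" % root.tag)
            out.close()
            out, written = None, 0
    if out is not None:
        # Close the last, partially filled chunk
        out.write("</%s>\n" % root.tag)
        out.close()
    return file_count


if __name__ == "__main__":
    sample = io.BytesIO(b"<root><r>1</r><r>2</r><r>3</r></root>")
    out_dir = tempfile.mkdtemp()
    pattern = os.path.join(out_dir, "file_{}.xml")
    print(split_xml(sample, "r", pattern, per_file=2))  # 2
```

Calling `root.clear()` after each written record is what keeps memory bounded: without it, iterparse leaves every parsed record attached to the root for the whole run.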
    
  • 2021-01-04 01:02

    I used the XmlSplit Wizard tool. It works really well, and you can choose how to split: by element, by rows, by number of files, or by file size. The only problem is that I had to buy it for $99, because the trial version won't let you split all of the data; it only produces the odd-numbered output files. I was able to split a 70 GB file!

  • 2021-01-04 01:07

    XmlSplit - A Command-line Tool That Splits Large XML Files

    • http://xponentsoftware.com/xmlSplit.aspx

    xml_split - split huge XML documents into smaller chunks

    • http://www.perlmonks.org/index.pl?node_id=429707
    • http://metacpan.org/pod/XML::Twig

    Split that XML by bhayanakmaut (no source code, and I could not get this one working)

    • http://sourceforge.net/projects/splitthatxml/

    A similar question: How do I split a large XML file?
