large-files

How to use XMLReader/DOMDocument with large XML file and prevent 500 error

≯℡__Kan透↙ submitted on 2019-12-30 05:29:04
Question: I have an XML file that is approximately 12 MB and contains about 16,000 products. I need to process it into a database; however, at about 6,000 rows it dies with a 500 error. I'm using the Kohana framework (version 3), just in case that has anything to do with it. Here's the code I have inside the controller: $xml = new XMLReader(); $xml->open("path/to/file.xml"); $doc = new DOMDocument; // Skip ahead to the first <product> while ($xml->read() && $xml->name !== 'product'); // Loop through
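The excerpt cuts off before the processing loop. Below is a minimal sketch of the streaming pattern the question is reaching for, not the asker's original code, assuming each record lives in a <product> element; a 500 at around 6,000 rows usually points at max_execution_time or memory_limit, which this pattern keeps flat by handling one product at a time.

```php
<?php
// Sketch: stream one <product> at a time so memory does not grow with the file.
set_time_limit(0); // a 500 after ~6000 rows is often the PHP execution-time limit

$xml = new XMLReader();
$xml->open('path/to/file.xml');
$doc = new DOMDocument();

// Skip ahead to the first <product>
while ($xml->read() && $xml->name !== 'product');

while ($xml->name === 'product') {
    // expand() turns only the current node into a small DOM subtree
    $product = simplexml_import_dom($doc->importNode($xml->expand(), true));
    // ... insert the fields of $product into the database here ...

    // jump to the next sibling <product> without re-walking its children
    $xml->next('product');
}
$xml->close();
```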

iostream and large file support

倖福魔咒の submitted on 2019-12-30 04:38:06
Question: I'm trying to find a definitive answer and can't, so I'm hoping someone might know. I'm developing a C++ app using GCC 4.x on Linux (32-bit OS). This app needs to be able to read files larger than 2 GB. I would really like to use iostreams rather than FILE pointers, but I can't find out whether the large-file #defines (_LARGEFILE_SOURCE, _LARGEFILE64_SOURCE, _FILE_OFFSET_BITS=64) have any effect on the iostream headers. I'm compiling on a 32-bit system. Any pointers would be helpful. Answer 1: This has already
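The answer excerpt is cut off; as a hedged illustration (not the original answer), one quick way to check whether a 32-bit GCC build gets 64-bit offsets is to compile a small probe with the large-file macros and inspect the offset types:

```cpp
// Build, for example, with:
//   g++ -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE probe.cpp -o probe
// If both sizes print 8, seeking beyond 2 GB with std::ifstream should work
// even in a 32-bit build.
#include <fstream>
#include <iostream>
#include <sys/types.h>

int main() {
    std::cout << "sizeof(std::streamoff) = " << sizeof(std::streamoff) << '\n';
    std::cout << "sizeof(off_t)          = " << sizeof(off_t) << '\n';

    std::ifstream in("big.bin", std::ios::binary);                  // "big.bin" is a placeholder
    in.seekg(static_cast<std::streamoff>(3) * 1024 * 1024 * 1024);  // try a 3 GB seek
    std::cout << "3 GB seek ok: " << (in ? "yes" : "no") << '\n';
    return 0;
}
```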

Reading Huge File in Python

…衆ロ難τιáo~ submitted on 2019-12-29 03:29:47
Question: I have a 384 MB text file with 50 million lines. Each line contains two space-separated integers: a key and a value. The file is sorted by key. I need an efficient way of looking up the values for a list of about 200 keys in Python. My current approach is included below. It takes 30 seconds. There must be more efficient Python foo to get this down to a couple of seconds at most. # list contains a sorted list of the keys we need to lookup # there is a sentinel at the end
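The asker's code is truncated above; as a sketch of one common fix (not the original answer), since both the file and the query keys are sorted, a single sequential merge pass avoids re-scanning the file for every key:

```python
# Sketch: one pass over the sorted file against a sorted list of query keys.
# Assumes "key value" per line with unique integer keys, as described above.
def lookup(path, wanted):
    wanted = sorted(wanted)
    found = {}
    i = 0
    with open(path) as f:
        for line in f:
            if i >= len(wanted):
                break                      # every requested key has been seen
            key, value = map(int, line.split())
            while i < len(wanted) and wanted[i] < key:
                i += 1                     # this requested key is not in the file; skip it
            if i < len(wanted) and wanted[i] == key:
                found[key] = value
                i += 1
    return found
```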

Parsing extremely large XML files in php

落爺英雄遲暮 submitted on 2019-12-28 04:15:16
Question: I need to parse XML files 40 GB in size, then normalize the data and insert it into a MySQL database. How much of the file I need to store in the database is not clear, nor do I know the XML structure. Which parser should I use, and how would you go about doing this? Answer 1: In PHP, you can read extremely large XML files with XMLReader (Docs): $reader = new XMLReader(); $reader->open($xmlfile); Extremely large XML files should be stored in a compressed format on disk. At least this makes sense as
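The answer is cut off mid-sentence; the sketch below (the element name <record> is a placeholder, since the real XML structure is unknown) shows the XMLReader pattern the answer describes, reading straight from a gzip-compressed file via PHP's compress.zlib stream wrapper:

```php
<?php
// Sketch: stream a huge (optionally gzip-compressed) XML file element by element,
// so memory use is independent of the 40 GB file size.
$reader = new XMLReader();
$reader->open('compress.zlib://data.xml.gz'); // or plain 'data.xml' if uncompressed

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'record') {
        $record = simplexml_load_string($reader->readOuterXml());
        // ... normalize $record and INSERT it into MySQL here ...
    }
}
$reader->close();
```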

Large file upload with WebSocket

。_饼干妹妹 submitted on 2019-12-28 03:30:47
Question: I'm trying to upload large files (at least 500 MB, preferably up to a few GB) using the WebSocket API. The problem is that I can't figure out how to write "send this slice of the file, release the resources used, then repeat". I was hoping I could avoid using something like Flash/Silverlight for this. Currently, I'm working with something along the lines of: function FileSlicer(file) { // randomly picked 1MB slices, // I don't think this size is important for this experiment this.sliceSize =
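The FileSlicer excerpt stops mid-definition; here is a hedged sketch (not the asker's implementation) of the slice-send-wait loop, using WebSocket.bufferedAmount as back-pressure so each Blob slice can be released before the next one is queued:

```javascript
// Sketch: send a File over a WebSocket in 1 MB Blob slices, waiting for the
// socket buffer to drain between slices instead of queuing the whole file.
function sendFile(socket, file, sliceSize = 1024 * 1024) {
  let offset = 0;

  function sendNext() {
    if (offset >= file.size) return;            // finished
    if (socket.bufferedAmount > sliceSize) {    // previous slices not flushed yet
      setTimeout(sendNext, 100);                // check again shortly
      return;
    }
    socket.send(file.slice(offset, offset + sliceSize)); // Blob slice, not a copy
    offset += sliceSize;
    setTimeout(sendNext, 0);
  }

  sendNext();
}
```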

How to email large files using c# windows application

≯℡__Kan透↙ submitted on 2019-12-25 05:46:00
Question: I'm developing a Windows application in which I need to send some files as email attachments. Code: public string SendMail(string mFrom, string mPass, string mTo, string mSub, string mMsg, string mFile, bool isDel) { string sql = ""; try { System.Net.Mail.MailAddress mailfrom = new System.Net.Mail.MailAddress(mFrom); System.Net.Mail.MailAddress mailto = new System.Net.Mail.MailAddress(mTo); System.Net.Mail.MailMessage newmsg = new System.Net.Mail.MailMessage(mailfrom, mailto); newmsg
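The SendMail excerpt is cut off before the attachment is added; the sketch below is a hypothetical, simplified helper (server, port, and parameter names are placeholders, not the asker's values) showing the System.Net.Mail attachment pattern with proper disposal so large files are released after sending:

```csharp
using System.Net;
using System.Net.Mail;

public static class Mailer
{
    // Sketch: send one file as an attachment and dispose everything afterwards.
    public static void SendWithAttachment(string from, string password, string to,
                                          string subject, string body, string filePath)
    {
        using (var message = new MailMessage(from, to, subject, body))
        using (var attachment = new Attachment(filePath))
        {
            message.Attachments.Add(attachment);

            using (var client = new SmtpClient("smtp.example.com", 587)) // placeholder server
            {
                client.EnableSsl = true;
                client.Credentials = new NetworkCredential(from, password);
                client.Send(message);
            }
        }
    }
}
```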

Python Write dynamically huge files avoiding 100% CPU Usage

十年热恋 submitted on 2019-12-24 16:33:48
Question: I am parsing a huge CSV file, approximately 2 GB, with the help of this great stuff. Now I have to generate a dynamic file for each column, using the column name as the file name. So I wrote this code to write the dynamic files: def write_CSV_dynamically(self, header, reader): """ :header - CSVs first row in string format :reader - CSVs all other rows in list format """ try: headerlist =header.split(',') #-- string headers zipof = lambda x, y: zip(x.split(','), y.split(',')) filename = "{}.csv"
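The asker's function is truncated; as a sketch of the usual fix (not their code), open one writer per column once, then stream the 2 GB file a row at a time. Repeatedly reopening the per-column files inside the row loop is what tends to drive CPU usage to 100%.

```python
import csv

# Sketch: write one output file per column, named after the column header.
def split_columns(path):
    with open(path, newline='') as src:
        reader = csv.reader(src)
        header = next(reader)                       # first row = column names
        outs = [open(f"{name}.csv", "w", newline='') for name in header]
        writers = [csv.writer(f) for f in outs]
        try:
            for row in reader:                      # stream: one row in memory at a time
                for writer, value in zip(writers, row):
                    writer.writerow([value])
        finally:
            for f in outs:
                f.close()
```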

Divide very large file into small ones following pattern

别说谁变了你拦得住时间么 submitted on 2019-12-24 13:05:44
Question: I have been working on this problem with only little success, so I am coming here for some fresh advice. I am trying to extract the data of every scan into a separate file. The problem is that after 3,196 files have been created, I receive the error message: awk "makes too many open files". I understand that I need to close the files created by awk, but I don't know how to do that. The input text file looks like this (up to 80,000 scans): Scan 1 11111 111 22222 221 ... Scan 2 11122 111 11122 111 ...
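As a sketch of the fix (the output file naming is an assumption), close() the previous output file whenever a new Scan header starts, so awk only ever holds one file open at a time:

```awk
# split_scans.awk — run as: awk -f split_scans.awk inputfile
/^Scan/ {
    if (out != "") close(out)      # release the previous scan's file descriptor
    out = "scan_" $2 ".txt"        # e.g. "scan_1.txt" for the line "Scan 1"
    next                           # do not write the header line itself
}
{ print > out }                    # everything else goes into the current scan's file
```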

Pandas.read_csv() MemoryError

南楼画角 submitted on 2019-12-24 11:49:15
Question: I have a 1 GB CSV file. The file has about 10,000,000 (10 million) rows. I need to iterate through the rows to get the max of a few selected rows (based on a condition). The issue is reading the CSV file. I use the Pandas package for Python. The read_csv() function throws a MemoryError while reading the CSV file. 1) I have tried to split the file into chunks and read them; now the concat() function has a memory issue. tp = pd.read_csv('capture2.csv', iterator=True, chunksize=10000, dtype={
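The read_csv call above is cut off; below is a hedged sketch of the chunked alternative (the column names and filter are placeholders for the asker's condition) that aggregates per chunk instead of concatenating everything back into one frame:

```python
import pandas as pd

# Sketch: keep only one 10,000-row chunk in memory at a time and fold the
# per-chunk maxima together, instead of concat()-ing all chunks.
best = None
for chunk in pd.read_csv('capture2.csv', chunksize=10000):
    selected = chunk[chunk['flag'] == 1]          # placeholder for the real condition
    if not selected.empty:
        chunk_max = selected['value'].max()       # placeholder column of interest
        best = chunk_max if best is None else max(best, chunk_max)

print(best)
```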

NegativeArraySizeException ANTLRv4

℡╲_俬逩灬. submitted on 2019-12-24 10:39:41
Question: I have a 10 GB file and I need to parse it in Java, but the following error arises when I attempt to do so: java.lang.NegativeArraySizeException at java.util.Arrays.copyOf(Arrays.java:2894) at org.antlr.v4.runtime.ANTLRInputStream.load(ANTLRInputStream.java:123) at org.antlr.v4.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:86) at org.antlr.v4.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:82) at org.antlr.v4.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:90) How can
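The question is truncated at "How can"; the stack trace shows ANTLRInputStream copying the whole 10 GB file into a single char array, which overflows. A hedged sketch of the usual workaround (MyLexer, MyParser, and startRule are placeholders for the generated grammar classes) is to use ANTLR's unbuffered streams so the file is never held in memory all at once:

```java
import java.io.FileInputStream;
import org.antlr.v4.runtime.*;

public class BigFileParse {
    public static void main(String[] args) throws Exception {
        // Unbuffered streams read the input incrementally instead of loading it all.
        CharStream input = new UnbufferedCharStream(new FileInputStream(args[0]));
        MyLexer lexer = new MyLexer(input);                    // placeholder generated lexer
        lexer.setTokenFactory(new CommonTokenFactory(true));   // copy token text out of the rolling buffer
        TokenStream tokens = new UnbufferedTokenStream(lexer);
        MyParser parser = new MyParser(tokens);                // placeholder generated parser
        parser.setBuildParseTree(false);                       // a 10 GB parse tree would not fit either
        parser.startRule();                                    // placeholder entry rule
    }
}
```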