large-files

Parsing a large XML file to multiple output xmls, using XmlReader - getting every other element

久未见 submitted on 2019-12-13 08:52:30
Question: I need to take a very large XML file and create multiple output XML files from what could be thousands of repeating nodes of the input file. There is no whitespace in the source file "AnimalBatch.xml", which looks like this:

    <?xml version="1.0" encoding="utf-8" ?><Animals><Animal id="1001"><Quantity>One</Quantity><Adjective>Red</Adjective><Name>Rooster</Name></Animal><Animal id="1002"><Quantity>Two</Quantity><Adjective>Stubborn</Adjective><Name>Donkeys</Name></Animal><Animal id="1003">
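The question is about C#'s XmlReader, but the streaming idea can be sketched roughly in Python with xml.etree.ElementTree.iterparse (the batch size and output file naming here are made up): emit each repeating <Animal> element as it is parsed instead of loading the whole document.

    import xml.etree.ElementTree as ET

    BATCH_SIZE = 1000                     # hypothetical number of <Animal> elements per output file
    batch, file_index = [], 0

    def flush(batch, file_index):
        # Wrap the collected elements in a root tag and write one output file.
        with open(f"Animals_{file_index:04d}.xml", "w", encoding="utf-8") as out:
            out.write("<Animals>" + "".join(batch) + "</Animals>")

    # iterparse streams the document, so the full file is never held in memory.
    for event, elem in ET.iterparse("AnimalBatch.xml", events=("end",)):
        if elem.tag == "Animal":
            batch.append(ET.tostring(elem, encoding="unicode"))
            elem.clear()                  # release the subtree we just serialized
            if len(batch) == BATCH_SIZE:
                flush(batch, file_index)
                batch, file_index = [], file_index + 1

    if batch:                             # write the final partial batch
        flush(batch, file_index)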

GREP REGEX LARGE FILE

蹲街弑〆低调 submitted on 2019-12-13 08:05:06
Question: On a Mac, how do I grep? I have a large TXT file (200 MB). The sample data is below. I want to run grep with a regex and get ONLY the following data value in my terminal output:

    00424730350000190100130JEAN DANIELE &

That is, I want everything up to 82700. Once I have this information, I can copy it into another file for other purposes. Right now I just get back tons of information. Sample record:

    00424730350000190100130JEAN DANIELE & 82700 TINEPORK CT LAT BORAN AK 12345
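The question asks for a grep invocation; as a sketch of the same extraction in Python (the pattern is a guess from the single sample record and would need adjusting to the real layout), streaming the 200 MB file line by line keeps memory use flat:

    import re

    # Hypothetical pattern: capture the leading digits plus the name field,
    # stopping just before the house number ("82700" in the sample record).
    pattern = re.compile(r"^(\d+[A-Z][A-Z &]*?)\s+\d{4,}\s")

    with open("records.txt", "r", encoding="utf-8", errors="replace") as src:
        for line in src:                  # stream line by line; the file is never fully loaded
            match = pattern.match(line)
            if match:
                print(match.group(1))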

Server fails when downloading large files with PHP

馋奶兔 submitted on 2019-12-13 07:05:41
Question: I'm using the SFTP functions of PHPSecLib to download files from an FTP server. The line $sftp->get($fname); works if the file is up to 200 MB, but if it's 300 MB, the browser responds with "Firefox can't find the file at [download.php]". That is, it says it can't find the PHP file I use for downloading the remote file. At first I thought this was due to the memory_limit setting in php.ini, but it doesn't matter whether it's set to 128M or 350M; 200 MB files still work, and 300 MB files fail. And it
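The question is about phpseclib in PHP; as a language-neutral illustration of the usual remedy, stream the remote file to disk (and from there to the client) in pieces instead of holding the whole 300 MB in memory. A minimal sketch with Python's paramiko, assuming a reachable host and made-up credentials and paths:

    import paramiko

    # Hypothetical connection details.
    transport = paramiko.Transport(("sftp.example.com", 22))
    transport.connect(username="user", password="secret")
    sftp = paramiko.SFTPClient.from_transport(transport)

    # SFTPClient.get() copies the remote file to a local path block by block,
    # so the full file never has to fit in memory at once.
    sftp.get("/remote/big.dat", "/tmp/big.dat")

    sftp.close()
    transport.close()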

Git - repository and file size limits

泄露秘密 submitted on 2019-12-13 05:50:19
Question: I've read in various internet resources that Git doesn't handle large files very well, and that Git also seems to have problems with large overall repository sizes. This seems to have spawned projects like git-annex, git-media, git-fat, git-bigfiles, and probably even more... However, after reading Git Internals, it looks to me like Git's pack-file concept should solve all the problems with large files. Q1: What's the fuss about large files in Git? Q2: What's the fuss about Git and large

How to process large images?

眉间皱痕 submitted on 2019-12-13 05:19:16
Question: I need to process (export subsections of) a large image (33600x19200) and I'm not sure how to start. I've tried simply allocating an image using openFrameworks, but I got this error:

    terminate called after throwing an instance of 'std::bad_alloc'
      what():  std::bad_alloc

I'm not experienced with processing images this large. Where should I start? Answer 1: std::bad_alloc occurs because you don't have enough memory available to hold the whole image. In order to work with such big things, one has
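For scale, a 33600x19200 image at 4 bytes per pixel (RGBA) is roughly 33600 * 19200 * 4 ≈ 2.6 GB, easily more than a 32-bit process can allocate in one block. One way to work tile by tile without ever loading the whole image, sketched here in Python/numpy rather than openFrameworks and assuming the pixels are available as a raw RGBA dump:

    import numpy as np

    W, H, CHANNELS = 33600, 19200, 4      # ~33600 * 19200 * 4 bytes ≈ 2.6 GB in total

    # Memory-map the raw pixel file; slicing only touches the pages it needs,
    # so a single tile never pulls the whole image into RAM.
    image = np.memmap("huge_image.rgba", dtype=np.uint8, mode="r", shape=(H, W, CHANNELS))

    TILE = 2048
    for y in range(0, H, TILE):
        for x in range(0, W, TILE):
            tile = np.array(image[y:y + TILE, x:x + TILE])   # copy just this subsection
            # ... export or process the tile here ...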

FORTRAN: Best way to store large amount of data which is readable in MATLAB

拥有回忆 submitted on 2019-12-13 05:13:22
Question: I am working on developing an application in Fortran where I have points defining quadrilateral panels on the surface of an object. I am calculating various parameters on these quadrilateral panels for a number of frequencies. The output file should look like:

    FREQUENCY,PANEL_NUMBER,X1,Y1,Z1,X2,Y2,Z2,X3,Y3,Z3,X4,Y4,Z4,AREA,PRESSURE,....
    0.01,1,....
    0.01,2,....
    0.01,3,....
    .
    .
    .
    .
    0.01,2000,....
    0.02,1,....
    0.02,2,....
    .
    .
    .
    0.02,2000,...
    .
    .

I am expecting a maximum of 300,000 rows with 30
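Assuming the truncated figure means roughly 30 numeric columns, 300,000 rows of 30 double-precision values is about 300000 * 30 * 8 bytes ≈ 72 MB, which is comfortably handled as a flat binary stream rather than formatted text. The question itself is about Fortran; the layout idea can be sketched language-neutrally in Python/numpy, and MATLAB reads the same stream with fread:

    import numpy as np

    ROWS, COLS = 300_000, 30              # hypothetical final size: 300000 * 30 * 8 bytes ≈ 72 MB

    data = np.random.rand(ROWS, COLS)     # stand-in for FREQUENCY, PANEL_NUMBER, X1, Y1, ...
    data.astype(np.float64).tofile("panels.bin")   # flat row-major stream of doubles

    # MATLAB side, for reference (fread fills column-major, hence the transpose):
    #   fid = fopen('panels.bin');
    #   A = fread(fid, [30, 300000], 'double')';
    #   fclose(fid);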

Large file upload via Zuul

青春壹個敷衍的年華 submitted on 2019-12-13 04:54:26
Question: I'm trying to upload a large file through Zuul. Basically I have the applications set up like this: UI: this is where the Zuul gateway is located. Backend: this is where the file must finally arrive. I used the functionality described here, so everything works fine if I use "Transfer-Encoding: chunked". However, this can only be set via curl; I haven't found any way to set this header in the browser (the header is rejected with the error message in the console " Refused to set unsafe header ..

What is the easiest way to load a filtered .tda file using pandas?

眉间皱痕 submitted on 2019-12-12 20:58:09
Question: Pandas has the excellent .read_table() function, but huge files result in a MemoryError. Since I only need to load the lines that satisfy a certain condition, I'm looking for a way to load only those. This could be done using a temporary file:

    with open(hugeTdaFile) as huge:
        with open(hugeTdaFile + ".partial.tmp", "w") as tmp:
            tmp.write(huge.readline())  # the header line
            for line in huge:
                if SomeCondition(line):
                    tmp.write(line)
    t = pandas.read_table(tmp.name)

Is there a way to avoid such a
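One way to avoid the temporary file is pandas' own chunked reading: read_table (or read_csv) accepts a chunksize, so the filter can be applied piece by piece. A minimal sketch, with a made-up column name and condition:

    import pandas as pd

    def some_condition(chunk):
        # Placeholder for the real row filter.
        return chunk["value"] > 0

    pieces = []
    for chunk in pd.read_table("huge.tda", chunksize=100_000):
        pieces.append(chunk[some_condition(chunk)])   # keep only matching rows of this chunk

    t = pd.concat(pieces, ignore_index=True)          # full filtered table, built piecewise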

Raw (binary) data too big to write to disk. How to write chunk-wise to disk (appending)?

帅比萌擦擦* submitted on 2019-12-12 18:14:43
Question: I have a large raw vector in R (i.e. an array of binary data) that I want to write to disk, but I'm getting an error telling me the vector is too large. Here's a reproducible example and the error I get:

    > writeBin(raw(1024 * 1024 * 1024 * 2), "test.bin")
    Error in writeBin(raw(1024 * 1024 * 1024 * 2), "test.bin") :
      long vectors not supported yet: connections.c:4147

I've noticed that this is linked to the 2 GB file limit. If I try to write a single byte less (1024 * 1024 * 1024 * 2 - 1), it works
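The title already suggests the usual workaround: open the file once and append the data in slices that each stay under the limit (in R, the analogous idea would be repeated writeBin calls on sub-vectors against an open connection). Here is the pattern sketched in Python rather than R, with a made-up stand-in buffer and chunk size:

    CHUNK = 1 << 20                       # write 1 MiB per call

    payload = bytes(64 * 1024 * 1024)     # stand-in buffer (64 MiB of zeros) for the real binary data
    view = memoryview(payload)

    # Append the buffer to disk in fixed-size slices instead of one giant write.
    with open("test.bin", "wb") as out:
        for start in range(0, len(view), CHUNK):
            out.write(view[start:start + CHUNK])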

Node.js: read a very large file (~10 GB), process it line by line, then write to another file

自作多情 submitted on 2019-12-12 13:12:42
Question: I have a 10 GB log file in a particular format. I want to process this file line by line and then write the output to another file after applying some transformations. I am using Node for this operation. The method works, but it takes a very long time: I was able to do this within 30-45 minutes in Java, but in Node it takes more than 160 minutes to do the same job. Following is the code; the initiation code below reads each line from the input:

    var path = '.
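The question is about Node.js, where the idiomatic tools are readline or a stream pipeline with backpressure; as a language-neutral sketch of the line-by-line streaming pattern (file names and the transformation are placeholders):

    # Stream the input and write transformed lines immediately, so memory use
    # stays flat regardless of the 10 GB input size.
    def transform(line: str) -> str:
        return line.upper()               # placeholder for the real transformation

    with open("input.log", "r", encoding="utf-8", errors="replace") as src, \
         open("output.log", "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(transform(line))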