large-files

Comparing multiple CSV files and finding matches

折月煮酒 submitted on 2019-12-11 23:49:56
Question: I have two folders of CSV files: a group of 'master' files and a group of 'unmatched' files. Within the master files (~25 files, about 50,000 lines in total) there are unique ids. Each row of the unmatched files (~250 files, about 700,000 lines in total) should have an id that matches a single id in one of the master files. Within each of the unmatched files, all ids should match a single master file. Further, all ids in an unmatched file should fall within a single master file.
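Since the ~50,000 master ids fit easily in memory, one straightforward check is to build an id-to-master lookup and then stream every unmatched file against it. A minimal Python sketch, assuming both folders hold headered CSV files and that the id column is literally named "id" (the folder and column names below are placeholders):

import csv
from pathlib import Path

# Placeholder paths and column name; adjust to the real layout.
MASTER_DIR = Path("master")
UNMATCHED_DIR = Path("unmatched")
ID_COLUMN = "id"

# Map every id to the master file it came from.
id_to_master = {}
for master_file in MASTER_DIR.glob("*.csv"):
    with master_file.open(newline="") as f:
        for row in csv.DictReader(f):
            id_to_master[row[ID_COLUMN]] = master_file.name

# For each unmatched file, collect the set of master files its ids point to;
# the expectation is that each set contains exactly one master file.
for unmatched_file in UNMATCHED_DIR.glob("*.csv"):
    masters_hit = set()
    with unmatched_file.open(newline="") as f:
        for row in csv.DictReader(f):
            masters_hit.add(id_to_master.get(row[ID_COLUMN], "NO MATCH"))
    print(unmatched_file.name, "->", sorted(masters_hit))

Any unmatched file that prints more than one master name, or "NO MATCH", violates the stated assumption.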

Optimization of sequential I/O operations on large file sizes

£可爱£侵袭症+ submitted on 2019-12-11 09:14:09
Question: Compiler: Microsoft C++ 2005. Hardware: AMD 64-bit (16 GB). Sequential, read-only access to an 18 GB file was measured with the following timing, file-access, and file-structure characteristics:

18,184,359,164 (file length)
11,240,476,672 (NTFS compressed file length)

Time    File        Method   Disk
14:33?  compressed  fstream  fixed disk
14:06   normal      fstream  fixed disk
12:22   normal      winapi   fixed disk
11:47   compressed  winapi   fixed disk
11:29   compressed  fstream  ram disk
10:37   compressed  winapi   ram disk
7:18

Serving large files (>2 GB) with libevent on a 32-bit system

拈花ヽ惹草 submitted on 2019-12-11 09:02:25
Question: Preamble: a lightweight HTTP server written in C based on libevent v2 (evhttp), Linux, ARM, glibc 2.3.4. I'm trying to serve big files (over 2 GB) using evbuffer_add_file() on a 32-bit system. libevent was compiled with the -D_FILE_OFFSET_BITS=64 flag. Here is the simplified code:

int fd = -1;
if ((fd = open(path, O_RDONLY)) < 0) {
    // error handling
}
struct stat st;
if (fstat(fd, &st) < 0) {
    // error handling
}
struct evbuffer *buffer = evbuffer_new();
evbuffer_set_flags(buffer, EVBUFFER_FLAG

When handling large file transfers in ASP.NET, what precautions should you take?

依然范特西╮ submitted on 2019-12-11 07:05:42
Question: My ASP.NET application allows users to upload and download large files. Both procedures involve reading and writing file streams. What should I do to ensure the application doesn't hang or crash when it handles a large file? Should the file operations be handled on a worker thread, for example?

Answer 1: Make sure you properly buffer the files so that they don't take inordinate amounts of memory in the system, e.g. this excerpt from a download application, inside the while loop that reads the file: //
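The code excerpt in the answer is cut off above; purely to illustrate the buffering idea it describes (this is not the original C# code), a rough Python sketch that copies a file in fixed-size chunks so the whole file never sits in memory at once:

# Chunked streaming: read and forward a fixed-size buffer at a time.
CHUNK_SIZE = 64 * 1024  # 64 KB buffer; tune as needed

def stream_file(src_path, dst):
    """Copy src_path to the writable file-like object dst chunk by chunk."""
    with open(src_path, "rb") as src:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            dst.write(chunk)
            dst.flush()

The same pattern applies regardless of language: a bounded buffer inside the read loop keeps memory use constant no matter how large the file is.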

Want to print the maximum entry of every email against it in another column

倖福魔咒の submitted on 2019-12-11 05:34:28
Question: I have a huge file, around 2 GB, with more than 20 million rows. What I want is the following. The input file will be like this:

07.SHEKHAR@GMAIL.COM,1
07SHIBAJI@GMAIL.COM,1
07.SHINDE@GMAIL.COM,1
07.SHINDE@GMAIL.COM,2
07.SHINDE@GMAIL.COM,3
07.SHINDE@GMAIL.COM,4
07.SHINDE@GMAIL.COM,5
07.SHINDE@GMAIL.COM,6
07.SHINDE@GMAIL.COM,7
07.SHOBHIT@GMAIL.COM,1
07SKERCH@RUSKIN.AC.UK,1
07SONIA@GMAIL.COM,1
07SONIA@GMAIL.COM,2
07SONIA@GMAIL.COM,3
07SRAM@GMAIL.COM,1
07SRAM@GMAIL.COM,2
07.SUMANTA@GMAIL.COM,1
07SUPRIYO@GMAIL.COM,1
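Because each line is just an address and a counter, a single streaming pass that keeps only the running maximum per address handles 20 million rows without loading the file into memory; for the sample above, 07.SHINDE@GMAIL.COM would come out with 7. A minimal Python sketch, with placeholder input/output file names:

import csv

# One pass over the big file: memory is proportional to the number of
# distinct addresses, not the number of rows.
max_per_email = {}
with open("input.csv", newline="") as src:
    for row in csv.reader(src):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        email, count = row[0], int(row[1])
        if count > max_per_email.get(email, 0):
            max_per_email[email] = count

with open("output.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for email, maximum in max_per_email.items():
        writer.writerow([email, maximum])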

Random sampling from XML file into data frame in R

北城以北 submitted on 2019-12-11 04:16:42
Question: How can I get a sample of a given size from a large XML file in R? Unlike reading random lines, which is simple, here it is necessary to preserve the structure of the XML file so that R can read it into a proper data.frame. A possible solution is to read the whole file and then sample rows, but is it possible to read only the necessary chunks? A sample from the file:

<?xml version="1.0" encoding="UTF-8"?>
<products>
  <product>
    <sku>967190</sku>
    <productId>98611</productId>
    ...
    <listingId/>
    <sellerId/>
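The question asks for R, but the "read only the necessary chunks" idea amounts to a streaming (event/SAX-style) parse combined with reservoir sampling; in R the XML package's xmlEventParse serves the same streaming role. Purely as an illustration of the approach, a Python sketch that assumes the structure shown above with repeated <product> elements:

import random
import xml.etree.ElementTree as ET

def sample_products(path, k):
    """Reservoir-sample k <product> elements without loading the whole file."""
    sample = []
    seen = 0
    # iterparse streams the document; only one <product> subtree is held at a time.
    for event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag != "product":
            continue
        seen += 1
        record = {child.tag: child.text for child in elem}
        if len(sample) < k:
            sample.append(record)
        else:
            j = random.randrange(seen)
            if j < k:
                sample[j] = record
        elem.clear()  # release the subtree we just processed
    return sample

The returned list of dicts corresponds to the rows of the desired data.frame, and the whole file is read exactly once with bounded memory.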

Google Earth fetchKml timeout

假如想象 submitted on 2019-12-11 04:08:26
Question: I am calling the Google Earth API function 'fetchKml' via JavaScript. When fetching large files, Firefox gives me a popup that says "A script on this page may be busy, or it may have stopped responding. You can stop the script now, open the script in the debugger, or let the script continue." I noticed a similar question on Google Groups, issue 331 ('fetchKml fails on slower connections or fast connections and large KML/KMZ files'). Well, alas, that issue was in 2009. Now it is 2012. How do I

Memory issue when reading a HUGE CSV file, storing as Person objects, and writing into multiple cleaner/smaller CSV files

有些话、适合烂在心里 submitted on 2019-12-11 03:04:53
Question: I have two text files with comma-delimited values. One is 150 MB and the other is 370 MB, so between them they hold three million+ rows of data. One document holds information about, let's say, soft-drink preferences, and the next might hold information about, let's say, hair colors. An example soft-drinks data file, though in the real file the UniqueNames are NOT in order, nor are the dates:

"UniqueName","softDrinkBrand","year"
"001","diet pepsi","2004"
"001","diet coke","2006"
"001","diet pepsi","2004"
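One way around holding millions of Person objects in memory is to never build them at all: stream the big file row by row and append each row to a smaller bucket file, for example keyed by a prefix of UniqueName, then post-process the small files individually. This is only a sketch of that alternative (not the asker's original design), in Python, with placeholder file names:

import csv

# Split the big file into per-bucket files in a single streaming pass.
open_files = {}
writers = {}

with open("softdrinks.csv", newline="") as src:
    reader = csv.reader(src)
    header = next(reader)
    for row in reader:
        if not row:
            continue
        bucket = row[0][:2]  # e.g. "00" for UniqueName "001"
        if bucket not in writers:
            f = open(f"softdrinks_{bucket}.csv", "w", newline="")
            open_files[bucket] = f
            writers[bucket] = csv.writer(f)
            writers[bucket].writerow(header)
        writers[bucket].writerow(row)

for f in open_files.values():
    f.close()

Each bucket file is small enough to sort or deduplicate in memory afterwards, so the only thing ever held for the full dataset is one open file handle per bucket.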

Creating a setup of large data with an NSIS script

风流意气都作罢 submitted on 2019-12-10 23:46:30
Question: I am creating a setup of large data, approximately 10 GB, with an NSIS script and trying to create a single setup (exe). It gives an error: Internal compiler error #12345: error mmapping file (xxxxxxxxxx, xxxxxxxx) is out of range. Note: you may have one or two (large) stale temporary file(s) left in your temporary directory (Generally this only happens on Windows 9x). Please tell me how to solve this issue. Is there any other way to create a setup for this kind of situation?

Answer 1: NSIS installers

Higher-speed options for executing a very large (20 GB) .sql file in MySQL

余生长醉 submitted on 2019-12-10 22:16:54
Question: My firm was delivered a 20+ GB .sql file in response to a request for data from the government. I don't have many options for getting the data in a different format, so I need options for how to import it in a reasonable amount of time. I'm running it on a high-end server (Windows 2008 64-bit, MySQL 5.1) using Navicat's batch execution tool. It's been running for 14 hours and shows no signs of being near completion. Does anyone know of any higher-speed options for such a transaction? Or is this what I
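Typical advice for this situation is to skip GUI batch tools and feed the dump straight to the mysql command-line client, temporarily relaxing the per-row checks that slow bulk inserts (SET unique_checks=0, foreign_key_checks=0 and autocommit=0 are documented MySQL bulk-loading switches). A rough Python sketch of such a wrapper, with placeholder connection details and file name; whether it helps depends on the dump's own settings and on the server's InnoDB configuration:

import subprocess

SQL_FILE = "delivery.sql"  # placeholder name for the 20+ GB dump

# Session-level switches wrapped around the dump, then piped into the mysql client.
preamble = (
    "SET autocommit=0;\n"
    "SET unique_checks=0;\n"
    "SET foreign_key_checks=0;\n"
)
postamble = (
    "COMMIT;\n"
    "SET unique_checks=1;\n"
    "SET foreign_key_checks=1;\n"
)

proc = subprocess.Popen(["mysql", "--user=root", "target_db"], stdin=subprocess.PIPE)
proc.stdin.write(preamble.encode())
with open(SQL_FILE, "rb") as dump:
    for chunk in iter(lambda: dump.read(1024 * 1024), b""):
        proc.stdin.write(chunk)
proc.stdin.write(postamble.encode())
proc.stdin.close()
proc.wait()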