large-data-volumes

How to plot large data vectors accurately at all zoom levels in real time?

你说的曾经没有我的故事 submitted on 2019-12-03 00:30:40
I have large data sets (10 Hz data, so 864k points per 24 hours) which I need to plot in real time. The idea is that the user can zoom and pan into highly detailed scatter plots. The data is not very continuous and there are spikes. Since the data set is so large, I can't plot every point each time the plot refreshes. But I also can't just plot every nth point, or I will miss major features like large but short spikes. Matlab does it right: you can give it an 864k-point vector full of zeros, set any single point to 1, and it will plot correctly in real time with zooms and pans. How does Matlab do it?
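
A common technique for this (not necessarily what Matlab uses internally) is min/max decimation: for each horizontal screen bucket, plot only the minimum and maximum sample, so even a one-sample spike survives downsampling. A minimal NumPy sketch, with the bucket count chosen arbitrarily for illustration; on zoom or pan you would re-run it on just the visible slice:

    import numpy as np

    def minmax_decimate(y, n_buckets):
        """Reduce y to 2*n_buckets points while preserving spikes.

        Each bucket contributes its min and max, so a single-sample
        spike is never dropped by the downsampling."""
        n = len(y) - (len(y) % n_buckets)        # trim to a multiple of n_buckets
        buckets = y[:n].reshape(n_buckets, -1)   # one row per screen-pixel bucket
        lo = buckets.min(axis=1)
        hi = buckets.max(axis=1)
        # interleave min/max so the plotted trace sweeps through both extremes
        return np.column_stack((lo, hi)).ravel()

    y = np.zeros(864_000)
    y[123_456] = 1.0                             # a single one-sample spike
    reduced = minmax_decimate(y, 1000)           # ~2000 points, spike preserved
    assert reduced.max() == 1.0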

Docker Data Volume Container - Can I share across swarm

淺唱寂寞╮ submitted on 2019-12-02 19:06:32
I know how to create and mount a data volume container to multiple other containers using --volumes-from, but I do have a few questions regarding its usage and limitations. Situation: I am looking to use a data volume container to store user-uploaded images for my web application. This data volume container will be used/mounted by many other containers running the web frontend. Questions: Can data volume containers be used/mounted in containers residing on other hosts within a Docker swarm? How is the performance? Is it recommended to structure things this way? Is there a better way to…

what changes when your input is giga/terabyte sized?

ⅰ亾dé卋堺 submitted on 2019-12-02 15:38:28
I took my first baby step into real scientific computing today when I was shown a data set where the smallest file is 48,000 fields by 1,600 rows (haplotypes for several people, for chromosome 22). And this is considered tiny. I write Python, so I've spent the last few hours reading about HDF5, NumPy, and PyTables, but I still feel like I'm not really grokking what a terabyte-sized data set actually means for me as a programmer. For example, someone pointed out that with larger data sets it becomes impossible to read the whole thing into memory, not because the machine has…
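
One practical consequence is that you stop materializing whole arrays and instead keep the data on disk (e.g. in HDF5) and iterate over it in slabs. A minimal sketch using h5py; the file name, dataset name, chunk size, and the per-chunk reduction are made up for illustration:

    import h5py
    import numpy as np

    CHUNK_ROWS = 10_000          # rows processed per slab; tune to available RAM

    with h5py.File("haplotypes.h5", "r") as f:      # hypothetical file
        dset = f["chr22"]                           # hypothetical dataset, shape (rows, fields)
        col_sums = np.zeros(dset.shape[1])
        for start in range(0, dset.shape[0], CHUNK_ROWS):
            block = dset[start:start + CHUNK_ROWS]  # only this slab is read into memory
            col_sums += block.sum(axis=0)           # any per-chunk reduction goes here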

NTFS directory has 100K entries. How much performance boost if spread over 100 subdirectories?

拜拜、爱过 submitted on 2019-12-01 19:16:19
Context: We have a homegrown filesystem-backed caching library. We currently have performance problems with one installation due to a large number of entries (up to 100,000). The problem: we store all filesystem entries in one "cache directory", and very large directories perform poorly. We're looking at spreading those entries over subdirectories, as git does, e.g. 100 subdirectories with ~1,000 entries each. The question: I understand that smaller directory sizes will help with filesystem access. But will "spreading into subdirectories" speed up traversing all entries, e.g. enumerating/reading all…
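
For reference, the git-style fan-out is easy to reproduce: hash the cache key and use the first two hex characters as a subdirectory, giving 256 buckets of a few hundred entries each. A minimal Python sketch (the cache root and key scheme are placeholders, not the library's actual API):

    import hashlib
    from pathlib import Path

    CACHE_ROOT = Path("/var/cache/myapp")          # hypothetical cache root

    def entry_path(key: str) -> Path:
        """Map a cache key to <root>/<2-hex-char bucket>/<full digest>."""
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return CACHE_ROOT / digest[:2] / digest    # e.g. .../ab/ab34f0...

    def store(key: str, data: bytes) -> None:
        path = entry_path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)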

SQL Server table structure for storing a large number of images

℡╲_俬逩灬. submitted on 2019-12-01 05:43:39
Question: What's the best practice for storing a large amount of image data in SQL Server 2008? I'm expecting to store around 50,000 images using approximately 5 GB of storage space. Currently I'm doing this using a single table with the columns: ID: int/PK/identity, Picture: Image, Thumbnail: Image, UploadDate: DateTime. I'm concerned because at around 10% of my expected total capacity it seems like inserts are taking a long time. A typical image is around 20-30 KB. Is there a better logical structure to…

javascript to find memory available

天涯浪子 submitted on 2019-11-30 19:44:34
Let's make it immediately clear: this is not a question about a memory leak! I have a page which allows the user to enter some data, and JavaScript to handle this data and produce a result. The JavaScript produces incremental output in a DIV, something like this:

    (function() {
        var newdiv = document.createElement("div");
        newdiv.innerHTML = produceAnswer();
        result.appendChild(newdiv);
        if (done) {
            return;
        } else {
            setTimeout(arguments.callee, 0);
        }
    })();

Under certain circumstances the computation will produce so much data that IE8 will fail with this message: "not enough storage" when dealing with…

Bad idea to transfer large payload using web services?

孤人 submitted on 2019-11-30 17:32:47
I gather that there basically isn't a limit to the amount of data that can be sent when using REST via a POST or GET. While I haven't used REST or web services, it seems that most services involve transferring limited amounts of data. If you want to transfer 1-5 MB worth of data (in either direction), are web services considered a bad idea? Update: The apps that we are considering connecting via a REST service are internal apps. We do have the option of picking other connectivity options (e.g. RMI). 1-5 MB using REST isn't really that large of a dataset; the limiting factor is likely memory.
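
On the memory point: if the client streams the request body instead of building it in RAM, a few megabytes over HTTP is unremarkable. A hedged sketch with the Python requests library (the URL and file name are placeholders, not part of the apps discussed above); passing a file object as data lets requests read it in chunks rather than load it all at once:

    import requests

    URL = "https://internal.example.com/api/upload"    # hypothetical internal endpoint

    with open("payload.bin", "rb") as body:            # hypothetical 1-5 MB payload
        # requests reads the file object incrementally, so the whole
        # payload is never held in memory at once
        resp = requests.post(URL, data=body, timeout=60)
    resp.raise_for_status()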

Processing apache logs quickly

对着背影说爱祢 submitted on 2019-11-30 14:32:34
I'm currently running an awk script to process a large (8.1 GB) access-log file, and it's taking forever to finish. In 20 minutes it wrote 14 MB of the (1000 ± 500) MB I expect it to write, and I wonder if I can process it much faster somehow. Here is the awk script:

    #!/bin/bash
    awk '{t=$4" "$5; gsub("[\[\]\/]"," ",t); sub(":"," ",t); printf("%s,",$1); system("date -d \""t"\" +%s");}' $1

EDIT: For non-awkers, the script reads each line, gets the date information, modifies it to a format the date utility recognizes, and calls it to represent the date as the number of seconds since 1970, finally…
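
The per-line system("date ...") call forks a new process for every record, which tends to dominate the runtime; parsing the timestamp in-process avoids that (GNU awk's mktime() is one option). A hedged Python sketch producing the same "ip,epoch" output, assuming the standard combined-log timestamp format such as [10/Oct/2000:13:55:36 -0700]:

    import sys
    from datetime import datetime

    # Apache combined-log lines look like:
    # 1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326 ...
    for line in sys.stdin:
        fields = line.split()
        ip = fields[0]
        stamp = fields[3].lstrip("[") + " " + fields[4].rstrip("]")
        epoch = int(datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z").timestamp())
        print(f"{ip},{epoch}")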

Need to compare very large files around 1.5GB in python

点点圈 submitted on 2019-11-30 10:50:57
"DF","00000000@11111.COM","FLTINT1000130394756","26JUL2010","B2C","6799.2" "Rail","00000.POO@GMAIL.COM","NR251764697478","24JUN2011","B2C","2025" "DF","0000650000@YAHOO.COM","NF2513521438550","01JAN2013","B2C","6792" "Bus","00009.GAURAV@GMAIL.COM","NU27012932319739","26JAN2013","B2C","800" "Rail","0000.ANU@GMAIL.COM","NR251764697526","24JUN2011","B2C","595" "Rail","0000MANNU@GMAIL.COM","NR251277005737","29OCT2011","B2C","957" "Rail","0000PRANNOY0000@GMAIL.COM","NR251297862893","21NOV2011","B2C","212" "DF","0000PRANNOY0000@YAHOO.CO.IN","NF251327485543","26JUN2011","B2C","17080" "Rail",