disk-io

Fast Linux File Count for a large number of files

社会主义新天地 submitted on 2019-11-29 18:43:26
I'm trying to figure out the best way to find the number of files in a particular directory when there are a very large number of files (> 100,000). When there are that many files, running "ls | wc -l" takes quite a long time to execute. I believe this is because it returns the names of all the files. I want to use as little disk I/O as possible. I have experimented with some shell and Perl scripts to no avail. Any ideas?

Answer 1: By default ls sorts the names, which can take a while if there are a lot of them. Also, there will be no output until all of the names are read and…
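
A typical workaround (not shown in the truncated answer above) is to avoid sorting and to read directory entries directly instead of spawning ls. A minimal Python sketch; the function name count_files and the "/tmp" path are illustrative, not taken from the question:

    import os

    def count_files(path):
        # os.scandir reads directory entries lazily and does not sort them;
        # is_file(follow_symlinks=False) can use the dirent type and so
        # usually avoids an extra stat() per entry.
        n = 0
        with os.scandir(path) as it:
            for entry in it:
                if entry.is_file(follow_symlinks=False):
                    n += 1
        return n

    print(count_files("/tmp"))

On the shell side the same idea is usually spelled as an unsorted listing piped to a counter, e.g. "ls -f | wc -l" (note that -f also lists dotfiles, "." and "..") or "find . -maxdepth 1 -type f | wc -l".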

Store/retrieve a data structure

给你一囗甜甜゛ submitted on 2019-11-29 13:36:06
Question: I have implemented a suffix tree in Python to do full-text searches, and it's working really well. But there's a problem: the indexed text can be very big, so we won't be able to keep the whole structure in RAM. [Image: suffix tree for the word BANANAS (in my scenario, imagine a tree 100,000 times bigger).] So, researching a little about it, I found the pickle module, a great Python module for "loading" and "dumping" objects from/into files, and guess what? It works wonderfully with my data…
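
For readers unfamiliar with pickle, a minimal sketch of the dump/load round trip; the nested-dict stand-in for the suffix tree and the file name index.pkl are illustrative, not taken from the question:

    import pickle

    # Tiny stand-in for the real suffix-tree object.
    tree = {"BANANAS": {"ANANAS": None, "NANAS": None, "ANAS": None,
                        "NAS": None, "AS": None, "S": None}}

    # Serialize the structure to disk ...
    with open("index.pkl", "wb") as f:
        pickle.dump(tree, f, protocol=pickle.HIGHEST_PROTOCOL)

    # ... and load it back later, e.g. in another process.
    with open("index.pkl", "rb") as f:
        tree = pickle.load(f)

Note that pickle.load still deserializes the entire object into memory, so by itself it does not solve the "structure is too big for RAM" problem; it only lets the index persist between runs.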

Why is MongoDB performance better on Linux than on Windows?

痴心易碎 submitted on 2019-11-29 07:40:37
I created a program to test sharded MongoDB performance on Linux (Ubuntu) and Windows (Server 2008). When inserting a large quantity of records, the disk's active time on Windows is very high (100%) and performance is very poor, whereas on Ubuntu the disk's util% is 60%–70% and performance is better. Can I say that MongoDB performs better on Linux?

Answer 1: First: all of the filesystems available on Windows Server 2008 are very, very inefficient. Compared to XFS or ext4, they are up to 40% slower when both the Windows and Linux file systems are optimized. Second: latency might be an issue. The…
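
The test program itself is not included in the excerpt; a minimal sketch of the kind of bulk-insert benchmark described, using pymongo (the connection string, the database name "benchmark" and the collection name "records" are assumptions):

    import time
    from pymongo import MongoClient

    # Assumed local connection; the question's sharded setup is not shown.
    client = MongoClient("mongodb://localhost:27017")
    coll = client["benchmark"]["records"]

    # Build a batch of synthetic documents and time the insert.
    docs = [{"seq": i, "payload": "x" * 256} for i in range(100_000)]
    start = time.perf_counter()
    coll.insert_many(docs, ordered=False)
    print("inserted %d docs in %.2f s" % (len(docs), time.perf_counter() - start))

Running the same script against both deployments while watching the disk counters (Resource Monitor's active time on Windows, iostat's util% on Linux) reproduces the comparison the question describes.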

How to measure file read speed without caching?

为君一笑 submitted on 2019-11-28 20:42:44
My Java program spends most of its time reading some files, and I want to optimize it, e.g., by using concurrency, prefetching, memory-mapped files, or whatever. Optimizing without benchmarking is nonsense, so I benchmark. However, during the benchmark the whole file content gets cached in RAM, unlike in the real run. Thus the run times of the benchmark are much shorter and most probably unrelated to reality. I'd need to somehow tell the OS (Linux) not to cache the file content, or better, to wipe out the cache before each benchmark run. Or maybe consume most of the available RAM (32 GB), so…
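
One standard way to do this on Linux is to sync dirty pages and then write 3 to /proc/sys/vm/drop_caches before each run, which requires root. A minimal Python sketch of that harness; "big.dat" is a placeholder file name:

    import subprocess
    import time

    def drop_page_cache():
        # Flush dirty pages, then ask the kernel to drop clean page-cache,
        # dentry and inode caches. Must be run as root; Linux only.
        subprocess.run(["sync"], check=True)
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")

    def timed_read(path, block_size=1 << 20):
        # Sequentially read the whole file and return elapsed seconds.
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(block_size):
                pass
        return time.perf_counter() - start

    drop_page_cache()
    print(timed_read("big.dat"))

An alternative, if changing the reading code is acceptable, is to open the files with O_DIRECT so the page cache is bypassed entirely, at the cost of alignment requirements on the buffers.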
