Interesting insights.
Thanks, guys and gals.
It seems like it really does depend on the entirety of the solution's design and usage (hence I don't always like agile when I can't see the requirements ahead of time).
I wanted to add some other considerations from the mainframe world.
Generally we deal with large data volumes and small batch windows, so performance is key, but so are reliability, maintainability, sustainability, scalability (ja, all the -abilities :-), and also consistency and integrity... OK, I think you get the picture.
What we learned here is that generally, when you process in batch mode (bulk data arrives at a single point in time, e.g. all debits due on the month-end date), we will use files. These are normally indexed or flat, depending on whether you need keyed access or have to read every last record, respectively.
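To make that keyed-vs-sequential distinction concrete, here is a minimal Python sketch (file names and record layout are made up for illustration; on the mainframe this would be something like VSAM/QSAM, not Python):

```python
import dbm

# Keyed access: an indexed store, good when you only need specific records.
with dbm.open("accounts", "c") as db:   # 'c' = create the index file if missing
    db[b"ACC001"] = b"balance=100.00"
    db[b"ACC002"] = b"balance=250.50"
    print(db[b"ACC002"])                # direct lookup by key, no scan

# Sequential access: a flat file, good when you must read every last record.
with open("debits.dat", "w") as f:
    f.write("ACC001|debit|10.00\nACC002|debit|25.00\n")
with open("debits.dat") as f:
    for record in f:                    # touch every record exactly once
        account, kind, amount = record.rstrip("\n").split("|")
```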
The files are generally pre-allocated and defined, and careful consideration is given to block sizes so they align with pages. Bear in mind that a sequential read actually retrieves blocks, which may contain multiple records in sequence, for every read command. Aligned properly, this spares the file system manager from having to retrieve data in less optimal ways (unaligned record sizes), dynamically rearrange its allocation and indexes (very similar to GC), lock records at page or block level when you cannot do dirty/uncommitted reads, or make space in the middle of an indexed file under random inserts.
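Roughly what proper blocking buys you, sketched in Python (the block and record sizes are invented; the point is that the block length is an exact multiple of the record length, so no record ever straddles a block boundary):

```python
RECORD_LEN = 64          # fixed-length records (assumed for illustration)
RECORDS_PER_BLOCK = 128  # block size is an exact multiple of the record size
BLOCK_LEN = RECORD_LEN * RECORDS_PER_BLOCK

def read_records(path):
    """Yield records one at a time, but hit the file system once per block."""
    with open(path, "rb") as f:
        while block := f.read(BLOCK_LEN):      # one I/O call, many records
            for i in range(0, len(block), RECORD_LEN):
                record = block[i:i + RECORD_LEN]
                if len(record) == RECORD_LEN:  # guard against a short tail
                    yield record
```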
So sometimes, even when a large percentage of the work is keyed inserts from one input to another, or matching between the two, you can still use flat files and do things like low-key logic matching as you run, using the most probable file as your driver file. We sometimes even find it more efficient to dump the DB to a flat file, process that, and reload afterwards (if you have to hit every record, for example), or even to read the underlying DB files directly. (PS: a nice option for resilience: if the DB connection or something else makes the DB unavailable and you need to continue reading, check the return code and read the DB file system directly when the return code indicates an error.)
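The low-key matching idea looks something like this sketch, assuming both files are pipe-delimited and pre-sorted ascending on the key in the first field:

```python
def key(line):
    return line.rstrip("\n").split("|", 1)[0]

def match(driver_path, lookup_path):
    """Two-file match: both files pre-sorted ascending on the first field.
    The driver file (your 'most probable' file) paces the whole loop."""
    with open(driver_path) as driver, open(lookup_path) as lookup:
        lk = next(lookup, None)
        for dr in driver:
            # advance the lookup side while its key is still behind the driver's
            while lk is not None and key(lk) < key(dr):
                lk = next(lookup, None)
            if lk is not None and key(lk) == key(dr):
                print("matched:", key(dr))
            else:
                print("no match:", key(dr))
```

Because both sides only ever move forward, one sequential pass over each file is all it costs.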
Driver file logic (which drives the main logic, i.e. for every one of the records on that master file, do XYZ) is generally, as a rule of thumb, used for flat-file processing, and these files are typically pre-sorted in a prior step using high-speed assembly-based sort utilities to suit the logic breaks expected while processing the records (i.e. if you want to process per client, pre-sort on client; per branch, pre-sort on branch; etc.).
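A minimal Python illustration of such a logic break: groupby only works here because the pre-sort guaranteed all of a client's records arrive adjacent to each other (the data itself is made up):

```python
import itertools

# Records pre-sorted on client in the prior step (what the sort utility is for):
records = [
    ("clientA", 10.0), ("clientA", 5.0),
    ("clientB", 7.5),  ("clientB", 2.5), ("clientB", 1.0),
]

# The logic break: every change of client key closes one unit of work.
for client, group in itertools.groupby(records, key=lambda r: r[0]):
    total = sum(amount for _, amount in group)
    print(f"{client}: {total:.2f}")   # do ...XYZ... once per client
```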
Anyway, when it comes to high-speed real-time requirements (the opposite of batch bulk), we would normally use a DB. Mostly this relates to UI, transactional (messaging), event-driven, or interactive use cases, which are better suited to direct/keyed access and hardly ever need the bulk throughput a batch process requires.
You also have the possibility to read and lock at record level automatically to ensure integrity (and you can go faster if you don't care about integrity and can afford to read uncommitted/dirty)...
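For example, a hedged sketch assuming a PostgreSQL back end via psycopg2 (the accounts table is made up, and note that PostgreSQL silently upgrades READ UNCOMMITTED to READ COMMITTED, though other engines honour it):

```python
import psycopg2  # assuming PostgreSQL; lock/isolation syntax varies by engine

conn = psycopg2.connect("dbname=bank")

# Integrity path: lock the row while you read-modify-write it.
with conn, conn.cursor() as cur:
    cur.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE", (1,))
    (balance,) = cur.fetchone()          # row stays locked until commit
    cur.execute("UPDATE accounts SET balance = %s WHERE id = %s",
                (balance - 10, 1))
# leaving the `with conn` block commits and releases the lock

# Speed path: accept uncommitted/dirty reads where the engine allows it.
with conn, conn.cursor() as cur:
    cur.execute("SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED")
    cur.execute("SELECT balance FROM accounts WHERE id = %s", (1,))
```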
If you think about it, the OS is file-system based, and therefore the DB ultimately has to be file-system based too. However, the DB is extremely optimized for the typical event/real-time, concurrent, high-integrity use cases. A dev would sometimes have to do a lot of work to get all that functionality in a real-time/event-driven scenario: think deadlocks, rollbacks, syncpoints, concurrency, etc., over a complex flow, while still remaining highly responsive.
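Here is a rough Python/psycopg2 sketch of the kind of plumbing involved: rollback on deadlock, back off, retry, and take locks in a fixed order to avoid the deadlock in the first place (table and schema are invented for illustration):

```python
import time
import psycopg2
from psycopg2 import errors

def transfer(conn, src, dst, amount, retries=3):
    """One unit of work with rollback and retry on deadlock."""
    for attempt in range(retries):
        try:
            with conn.cursor() as cur:
                # lock both rows in a fixed (sorted) order to avoid deadlocks
                for acc in sorted((src, dst)):
                    cur.execute("SELECT balance FROM accounts "
                                "WHERE id = %s FOR UPDATE", (acc,))
                cur.execute("UPDATE accounts SET balance = balance - %s "
                            "WHERE id = %s", (amount, src))
                cur.execute("UPDATE accounts SET balance = balance + %s "
                            "WHERE id = %s", (amount, dst))
            conn.commit()                     # the syncpoint: all or nothing
            return
        except errors.DeadlockDetected:
            conn.rollback()                   # undo the half-done work...
            time.sleep(0.1 * (attempt + 1))   # ...back off and try again
    raise RuntimeError("gave up after repeated deadlocks")
```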
Writing this just made me realize that there is much more to consider when deciding what to use, when, and why. Sometimes even the same data at different points in its life is best suited to different choices. I.e. you might acquire the data through front ends using DBs or other direct keyed-access methods optimized for that, but you may want to drop it onto the file system and process a flat file if you have to do a monthly recon on that data at a specific point in the month, for example.
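As a hedged example of that dump-to-file step (table and column names invented), unloading the month's transactions pre-sorted so the recon pass can then run as a straight flat-file read:

```python
import csv
import psycopg2

# Unload the month's transactions to a sorted flat file, then recon against
# that snapshot instead of hammering the live DB with a full-table pass.
conn = psycopg2.connect("dbname=bank")
with conn.cursor() as cur, open("recon_2024_06.dat", "w", newline="") as out:
    cur.execute("SELECT account_id, posted_on, amount FROM transactions "
                "WHERE posted_on >= %s AND posted_on < %s "
                "ORDER BY account_id, posted_on",   # pre-sort for the recon pass
                ("2024-06-01", "2024-07-01"))
    writer = csv.writer(out, delimiter="|")
    for row in cur:
        writer.writerow(row)
```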
It does seem that all of this is design dependent. Say, for example, you can't do certain things at the point of the transaction/event due to response-time requirements and have to defer them until later while still keeping integrity in place, as opposed to being able to do everything in one simpler process without deferring.
It will also be very dependent on the domain. I.e. will you know if Google missed a random hit when returning search results? But I bet you will know when your bank does not show a transaction (especially a credit to you... lol).