Using a Filesystem (Not a Database!) for Schemaless Data - Best Practices

挽巷 2021-02-05 03:25

After reading over my other question, Using a Relational Database for Schema-Less Data, I began to wonder if a filesystem is more appropriate than a relational database for stor

4 Answers
  • 2021-02-05 04:06

    One thing you may want to take into consideration is Oracle's BFILE datatype, which is a pointer to a file on disk. Perhaps that might be the best of both worlds? Microsoft SQL Server doesn't seem to offer this capability.

  • 2021-02-05 04:12

    You are welcome to take a look at our Solid File System, a virtual file system product with built-in support for file metadata and an SQL-like search mechanism that searches through this data. Also, please read the article that describes the benefits of storing different types of data in different kinds of storage.

  • 2021-02-05 04:27

    There's a big example of an implementation at Amazon's S3.

    http://aws.amazon.com/s3/

    A lot of companies are moving toward this sort of implementation because it scales fundamentally better than a relational database can. The approach is simple, it works, and for some problems it's a great solution. In the case of Amazon's S3, it's particularly nice for cloud storage if you don't want to worry about the hassles of storing the data yourself.

  • 2021-02-05 04:30

    Yes, a filesystem can be seen as a special case of a NoSQL-like database system. It has some limitations that should be weighed in any design decision:

    pros:

    • simple, intuitive
    • takes advantage of years of tuning and caching algorithms
    • easy backup, potentially easy clustering
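
To make the "simple, intuitive" point concrete, here is a minimal sketch of a directory used as a schemaless store, one JSON file per record. The class name `FileStore`, the `.json` suffix, and the write-to-a-temp-file-then-rename step are assumptions made for illustration; nothing in the answer prescribes them.

```python
import json
import os
import tempfile
from pathlib import Path


class FileStore:
    """Hypothetical one-file-per-record store; all names are illustrative."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Assumes keys are plain names; reject anything that could escape root.
        if not key or "/" in key or "\\" in key or key in (".", ".."):
            raise ValueError(f"unsafe key: {key!r}")
        return self.root / f"{key}.json"

    def put(self, key, record):
        # Write to a temp file in the same directory, then rename it over the
        # target so readers never observe a half-written record.
        target = self._path(key)
        fd, tmp = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(record, f)
            os.replace(tmp, target)
        except BaseException:
            os.unlink(tmp)
            raise

    def get(self, key):
        with open(self._path(key), encoding="utf-8") as f:
            return json.load(f)

    def keys(self):
        # Only finished records end in .json; temp files end in .tmp.
        return sorted(p.stem for p in self.root.glob("*.json"))


store = FileStore(tempfile.mkdtemp())
store.put("user-1", {"name": "Ada", "tags": ["admin", "beta"]})
store.put("user-2", {"name": "Lin"})  # different fields, no schema change
print(store.keys(), store.get("user-1")["tags"])
```

Records with different fields sit side by side without any migration, which is the appeal. Anything beyond lookup by key (searching by field, joins) would have to be built on top.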

    things to think about:

    • richness of metadata: what types of data does it store, how does it let you query them, and can you have hierarchical or multivalued attributes?
    • speed of querying metadata: not all filesystems are well optimized for anything other than size and dates.
    • inability to join queries (though that's pretty much common to NoSQL)
    • inefficient storage usage (unless the filesystem performs block suballocation, you'll typically blow 4-16K per item stored regardless of size)
    • may not have the kind of caching algorithm you want for its directory structure
    • tends to be less tunable, etc.
    • backup solutions may have trouble depending on how you store things (too deep, too many items per node, etc.), which might obviate an obvious advantage of such a structure.
    • locking works pretty well for a LOCAL filesystem, of course, if you call the right routines, but not necessarily for a network-based filesystem (those problems have been solved in various ways, but it's certainly a design issue)
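
The storage-usage point in the list above is easy to put numbers on. The following is a rough sketch, assuming space is allocated in whole blocks of 4096 bytes (the low end of the 4-16K range mentioned) and ignoring inode overhead and any suballocation or inlining of small files. The function names and the workload of one million 200-byte records are made up for illustration.

```python
def allocated_bytes(file_size, block_size=4096):
    """Bytes occupied on disk when space is handed out in whole blocks."""
    if file_size == 0:
        return 0  # assumption: an empty file consumes no data blocks
    blocks = -(-file_size // block_size)  # integer ceiling division
    return blocks * block_size


def slack_bytes(file_sizes, block_size=4096):
    """Total allocated-but-unused bytes across a collection of files."""
    return sum(allocated_bytes(s, block_size) - s for s in file_sizes)


# Hypothetical workload: one million 200-byte records, one file each.
sizes = [200] * 1_000_000
payload = sum(sizes)
wasted = slack_bytes(sizes)
print(f"payload: {payload / 1e6:.0f} MB, slack: {wasted / 1e6:.0f} MB")
```

Under these assumptions each 200-byte record leaves 3896 bytes of its block unused, so the slack is many times larger than the data itself. Whether that matters depends on the typical record size relative to the block size.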