Methods for storing metadata associated with individual files?

太阳男子 2020-12-31 16:02

Given a collection of files which will have associated metadata, what are the recommended methods for storing this metadata?

Some file formats support storing metad

5 Answers
  • 2020-12-31 16:24

    Plain text has some obvious advantages over anything else. Something like:

    FileName = 'ferrari.gif'
    Title = 'My brand new car'
    Tags = 'cars', 'cool'
    Related = 'michaelknight.mp3'
    

    Picasa's Picasa.ini files are a good example of this kind of metadata. Instead of inventing your own format, XML might also be worth considering: there are plenty of readily available DOM parsers for it.

    Then again, if the number of files and the relations between them are large, a database may be the better choice.
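
    A minimal sketch of reading such a plain-text file with Python's standard configparser module. It assumes an INI layout with one section per file name and unquoted values; the key names and the load_metadata helper are illustrative choices, not an existing format:

```python
import configparser

# Hypothetical metadata file content: one INI section per file (an assumed layout).
SAMPLE = """\
[ferrari.gif]
title = My brand new car
tags = cars, cool
related = michaelknight.mp3
"""


def load_metadata(text):
    """Parse INI text into {file_name: {key: value}}, splitting tags into a list."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    result = {}
    for file_name in parser.sections():
        entry = dict(parser[file_name])
        # Tags are stored as one comma-separated string in this sketch.
        entry["tags"] = [t.strip() for t in entry.get("tags", "").split(",") if t.strip()]
        result[file_name] = entry
    return result


metadata = load_metadata(SAMPLE)
print(metadata["ferrari.gif"]["title"])
print(metadata["ferrari.gif"]["tags"])
```

    Note that configparser lower-cases option names by default, so keys written as Title or Tags would still be read back as title and tags.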

  • 2020-12-31 16:31

    One option might be a relational database, structured like this:

    FILE
    f_id
    f_location
    f_title
    f_description
    
    ATTRIBUTE
    a_id
    a_label
    
    VALUE
    v_id
    v_label
    
    METADATA
    md_file
    md_attribute
    md_value
    

    This layout keeps some per-file information (title and description) on the file record itself, but is primarily targeted at repetitive groups of data.

    For some requirements, other less generic tables may be more useful.
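
    One way the four tables above could be created and queried, sketched with Python's built-in sqlite3 module. The column types, constraints, helper functions, and sample rows are assumptions added for illustration; only the table and column names come from the outline:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for demonstration only
conn.executescript("""
CREATE TABLE file (
    f_id INTEGER PRIMARY KEY,
    f_location TEXT NOT NULL,
    f_title TEXT,
    f_description TEXT
);
CREATE TABLE attribute (a_id INTEGER PRIMARY KEY, a_label TEXT NOT NULL UNIQUE);
CREATE TABLE "value" (v_id INTEGER PRIMARY KEY, v_label TEXT NOT NULL UNIQUE);
CREATE TABLE metadata (
    md_file INTEGER NOT NULL REFERENCES file (f_id),
    md_attribute INTEGER NOT NULL REFERENCES attribute (a_id),
    md_value INTEGER NOT NULL REFERENCES "value" (v_id),
    PRIMARY KEY (md_file, md_attribute, md_value)
);
""")


def label_id(conn, table, id_col, label_col, label):
    """Return the id for a label, inserting the row first if it is new."""
    conn.execute(f'INSERT OR IGNORE INTO "{table}" ({label_col}) VALUES (?)', (label,))
    row = conn.execute(
        f'SELECT {id_col} FROM "{table}" WHERE {label_col} = ?', (label,)
    ).fetchone()
    return row[0]


def add_file(conn, location, title=None, description=None):
    cur = conn.execute(
        "INSERT INTO file (f_location, f_title, f_description) VALUES (?, ?, ?)",
        (location, title, description),
    )
    return cur.lastrowid


def add_metadata(conn, file_id, attribute, value):
    """Link a file to an (attribute, value) pair, reusing existing labels."""
    a_id = label_id(conn, "attribute", "a_id", "a_label", attribute)
    v_id = label_id(conn, "value", "v_id", "v_label", value)
    conn.execute("INSERT OR IGNORE INTO metadata VALUES (?, ?, ?)", (file_id, a_id, v_id))


def find_files(conn, attribute, value):
    """Return the locations of all files carrying the given attribute/value pair."""
    rows = conn.execute(
        """
        SELECT f.f_location
        FROM file f
        JOIN metadata m ON m.md_file = f.f_id
        JOIN attribute a ON a.a_id = m.md_attribute
        JOIN "value" v ON v.v_id = m.md_value
        WHERE a.a_label = ? AND v.v_label = ?
        ORDER BY f.f_location
        """,
        (attribute, value),
    ).fetchall()
    return [row[0] for row in rows]


# Sample rows, invented for the demonstration.
car = add_file(conn, "ferrari.gif", "My brand new car")
add_metadata(conn, car, "tag", "cars")
add_metadata(conn, car, "tag", "cool")
song = add_file(conn, "michaelknight.mp3")
add_metadata(conn, song, "tag", "cool")

print(find_files(conn, "tag", "cool"))
```

    Because attribute and value labels live in their own tables, a label shared by many files is stored once and referenced by id.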


    The advantages are that relational databases are very common and are very good at handling relationships and storing large amounts of data.

    However, for some uses a database server brings an overhead which might not be desirable. Also, the database server is distinct from the files - they do not sit together, and require different methods of interaction.

    Databases do not (easily) sit under version control - which may be a good or bad thing, depending on your point of view and specific needs.

  • 2020-12-31 16:35

    I think the "solution" depends greatly upon what you're going to be doing with the metadata.

    For example, almost all of the metadata we store (for multiple datasets of scientific data) is chopped up and stored in a database. This allows us to create datasets that preserve the metadata common to the files (as you say, categories and tags) while still having file-specific structures (title, start/stop time, min/max values, etc.). While we could keep these in hidden files, we do a lot of searching and open our interface to outside consumers via web services.

    If you're storing metadata that isn't going to be searched on, hidden files or a dedicated .xml file per "real" file isn't a bad route to take. It's readable by basically anything, can be converted to different formats easily, and won't be lost if you decide to change your storage mechanism.
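
    A small sketch of the "one .xml file per real file" approach, using Python's standard xml.etree.ElementTree. The .meta.xml suffix, the element and attribute names, and the sample fields are all assumptions made for the example:

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def sidecar_path(data_file):
    """Hypothetical naming convention: data.bin -> data.bin.meta.xml."""
    return data_file.with_name(data_file.name + ".meta.xml")


def write_sidecar(data_file, fields):
    """Write the fields as <field name="..."> elements next to the data file."""
    root = ET.Element("metadata", {"file": data_file.name})
    for key, value in fields.items():
        ET.SubElement(root, "field", {"name": key}).text = str(value)
    ET.ElementTree(root).write(
        str(sidecar_path(data_file)), encoding="utf-8", xml_declaration=True
    )


def read_sidecar(data_file):
    """Read the fields back; every value is returned as a string."""
    root = ET.parse(str(sidecar_path(data_file))).getroot()
    return {field.get("name"): field.text for field in root.findall("field")}


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "run_042.dat"
    data_file.write_bytes(b"\x00\x01")  # stand-in for the real data
    write_sidecar(data_file, {"title": "Run 42", "min_value": 0.5})
    loaded = read_sidecar(data_file)

print(loaded)
```

    Since the metadata file sits beside the data file and shares its name, the pair can be copied or archived together without any extra tooling.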

    Metadata should help you, not hinder you. I've seen (and been a part of) systems where metadata storage became more burdensome than storing the actual data, and turned into a liability. Just keep in mind what you are trying to do with it, and don't overextend yourself with "what ifs."

  • 2020-12-31 16:41

    Storing metadata in a database has some advantages, but the main problem is that the metadata is not directly connected to your data. It is more robust if the metadata stays with the data, for example as a special file in the same directory.

    Some filesystems offer special functionality that can be used for metadata, such as NTFS alternate data streams. Unfortunately, this is suitable for metadata storage in special cases only, because those streams are easily lost when the data is copied to a storage system that does not support them. I believe Linux filesystems have a similar mechanism (extended attributes).

    Anyway, the most common solutions are:

    • separate hidden file(s), per directory, that hold the metadata
    • a special hidden directory that holds the metadata (as used by Subversion, CVS, etc.)
    • a database (of whatever kind) for all application-specific metadata; in most cases this database can also be used for caching

    IMO there is no general-purpose solution. I would store the metadata in a hidden file (for robustness) and use a database on top of it for fast access and caching.
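
    A rough sketch of that combination: hidden per-directory files are the source of truth, and a database is rebuilt from them as a disposable index. The .metadata.json file name, its JSON layout, and the single tags table are assumptions for the example:

```python
import json
import sqlite3
import tempfile
from pathlib import Path

META_NAME = ".metadata.json"  # hypothetical name for the hidden per-directory file


def rebuild_cache(root):
    """Scan every hidden metadata file under root into a throwaway SQLite index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tags (path TEXT NOT NULL, tag TEXT NOT NULL)")
    for meta_file in sorted(root.rglob(META_NAME)):
        entries = json.loads(meta_file.read_text(encoding="utf-8"))
        for file_name, info in entries.items():
            # Store paths relative to root so the index survives moving the tree.
            rel_path = (meta_file.parent / file_name).relative_to(root).as_posix()
            conn.executemany(
                "INSERT INTO tags VALUES (?, ?)",
                [(rel_path, tag) for tag in info.get("tags", [])],
            )
    return conn


def files_with_tag(conn, tag):
    rows = conn.execute("SELECT path FROM tags WHERE tag = ? ORDER BY path", (tag,))
    return [row[0] for row in rows]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "cars").mkdir()
    (root / "cars" / META_NAME).write_text(
        json.dumps({"ferrari.gif": {"tags": ["cars", "cool"]}}), encoding="utf-8"
    )
    cache = rebuild_cache(root)
    cool_files = files_with_tag(cache, "cool")

print(cool_files)
```

    If the index is lost or goes stale it can simply be rebuilt, because nothing is stored only in the database.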

  • 2020-12-31 16:43

    I would basically make a metadata DB which held this information:

    RESOURCE_TABLE
    RESOURCE_ID
    RESOURCE_TYPE (folder, doctype, web link, other)
    RESOURCE_URL (any URL)

    NOTES_TABLE
    NOTE_ID
    RESOURCE_NO
    RESOURCE_NOTE (long text)

    TAGS_TABLE
    TAG_ID
    RESOURCE_NO
    TAG_TEXT

    Then I would use the note field to attach textual notes to the file, folder, or resource. Decide whether you want a 1:1 or a 1:N relationship here.

    I would use the tags field to store any number of searchable parameters, such as YEAR and PROJECT, along with other values that describe and group your content.

    You could then add tables for owners, stakeholders, and other organisational information.
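
    A possible SQLite rendering of these tables, with a search that returns only the resources carrying every requested tag. The column types, the KEY:value style of the tag text, and the helper functions are assumptions; the outline above only names the tables and columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for demonstration only
conn.executescript("""
CREATE TABLE resource_table (
    resource_id INTEGER PRIMARY KEY,
    resource_type TEXT NOT NULL,
    resource_url TEXT NOT NULL
);
CREATE TABLE notes_table (
    note_id INTEGER PRIMARY KEY,
    resource_no INTEGER NOT NULL REFERENCES resource_table (resource_id),
    resource_note TEXT
);
CREATE TABLE tags_table (
    tag_id INTEGER PRIMARY KEY,
    resource_no INTEGER NOT NULL REFERENCES resource_table (resource_id),
    tag_text TEXT NOT NULL
);
""")


def add_resource(conn, resource_type, url, tags=(), note=None):
    """Insert a resource with its tags and, optionally, one note."""
    cur = conn.execute(
        "INSERT INTO resource_table (resource_type, resource_url) VALUES (?, ?)",
        (resource_type, url),
    )
    resource_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO tags_table (resource_no, tag_text) VALUES (?, ?)",
        [(resource_id, tag) for tag in tags],
    )
    if note is not None:
        conn.execute(
            "INSERT INTO notes_table (resource_no, resource_note) VALUES (?, ?)",
            (resource_id, note),
        )
    return resource_id


def find_by_tags(conn, tags):
    """Return URLs of resources that carry all of the given tags."""
    wanted = sorted(set(tags))
    placeholders = ", ".join("?" * len(wanted))
    rows = conn.execute(
        f"""
        SELECT r.resource_url
        FROM resource_table r
        JOIN tags_table t ON t.resource_no = r.resource_id
        WHERE t.tag_text IN ({placeholders})
        GROUP BY r.resource_id
        HAVING COUNT(DISTINCT t.tag_text) = ?
        ORDER BY r.resource_url
        """,
        (*wanted, len(wanted)),
    ).fetchall()
    return [row[0] for row in rows]


# Sample rows, invented for the demonstration.
add_resource(conn, "doctype", "file:///reports/q4.pdf",
             tags=["YEAR:2020", "PROJECT:alpha"], note="Final version")
add_resource(conn, "folder", "file:///reports/", tags=["YEAR:2020"])

print(find_by_tags(conn, ["YEAR:2020", "PROJECT:alpha"]))
```

    Keeping notes in their own table leaves the 1:1 versus 1:N decision open: a unique constraint on resource_no would enforce one note per resource.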
