I am building a mobile application (iPhone/Android) and want to store the application data in Amazon's SimpleDB, because we do not want to host our own server to provide these services. I've been going through the documentation, and the maximum size of an attribute value is 1024 bytes.
In my case we need to store anywhere from 1 KB up to 10 KB of text data.
I was hoping to find out how other projects are using SimpleDB when they have larger storage needs like our project. I read that one could store pointers to files that are then stored in S3 (file system). Not sure if that is a good solution.
I am not sure SimpleDB is the correct solution. Could anyone comment on what they have done, or suggest a different way to think about this problem?
There are ways to store your 10k text data but whether it will be acceptable will depend on what else you need to store and how you plan to use it.
If you need to store arbitrarily large data (especially binary data) then the S3 file pointer can be attractive. The value that SimpleDB adds in this scenario is the ability to run queries against the file metadata that you store in SimpleDB.
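To make the pointer idea concrete, here is a minimal sketch, not a definitive implementation. It assumes boto3-style clients are passed in (`put_object` on the S3 client, `put_attributes` on the SimpleDB client); the function name `store_large_text`, the `texts/` key layout, and the attribute names `s3_bucket`, `s3_key`, and `size_bytes` are all made up for illustration.

```python
def store_large_text(s3, sdb, bucket, domain, item_name, text, metadata):
    """Write the text to S3 and a queryable pointer record to SimpleDB.

    `s3` and `sdb` are assumed to behave like boto3 clients for "s3" and "sdb".
    """
    body = text.encode("utf-8")
    key = f"texts/{item_name}.txt"  # hypothetical key layout
    s3.put_object(Bucket=bucket, Key=key, Body=body)

    attributes = [
        {"Name": "s3_bucket", "Value": bucket, "Replace": True},
        {"Name": "s3_key", "Value": key, "Replace": True},
        # SimpleDB compares values as strings, so zero-pad numbers
        # to keep range queries on size meaningful.
        {"Name": "size_bytes", "Value": f"{len(body):010d}", "Replace": True},
    ]
    # Caller-supplied metadata (title, author, ...) is what you query on.
    attributes += [
        {"Name": name, "Value": value, "Replace": True}
        for name, value in metadata.items()
    ]
    sdb.put_attributes(DomainName=domain, ItemName=item_name, Attributes=attributes)
    return key
```

Reading the text back is then two steps: query SimpleDB for the item, then fetch the object named by `s3_bucket` and `s3_key`.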
For text data limited to 10k I would recommend storing it directly in SimpleDB. It will easily fit in a single item, but you'll have to spread it across multiple attributes. There are basically two ways to do this, each with some drawbacks.
One way is more flexible and search friendly but requires you to touch your data. You split your data into chunks of about 1000 bytes and store each chunk as an attribute value in a multi-valued attribute. There is no ordering imposed on multi-valued attributes, so you have to prepend each chunk with a number for ordering (e.g. `01`).
Having all the text stored in one attribute makes queries easy, since the predicate needs only a single attribute name. Each item can hold a different amount of text, anywhere from 1k to 200+k, and it gets handled appropriately. But you do have to be aware that your prepended ordering numbers can produce false positives in your queries (e.g. if you search for `01`, every item will match).
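A rough sketch of this first approach, under a few assumptions of my own: the function names are hypothetical, chunks are measured in UTF-8 bytes, and the prefix is a three-digit number plus a colon (instead of the two-digit `01` style) so that it still sorts correctly with more than 99 chunks. With 1000-byte chunks and a 4-byte prefix, each value stays under the 1024-byte limit.

```python
def split_into_chunks(text: str, max_bytes: int = 1000) -> list[str]:
    """Split text into pieces whose UTF-8 encoding fits in max_bytes."""
    chunks, current, size = [], [], 0
    for ch in text:
        ch_len = len(ch.encode("utf-8"))
        if size + ch_len > max_bytes and current:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += ch_len
    if current:
        chunks.append("".join(current))
    return chunks


def to_multi_valued(text: str, max_bytes: int = 1000) -> list[str]:
    """Prefix each chunk with a zero-padded index so order can be restored."""
    chunks = split_into_chunks(text, max_bytes)
    return [f"{i:03d}:{chunk}" for i, chunk in enumerate(chunks, start=1)]


def from_multi_valued(values: list[str]) -> str:
    """Rebuild the text from values that came back in arbitrary order."""
    ordered = sorted(values, key=lambda v: int(v.split(":", 1)[0]))
    return "".join(v.split(":", 1)[1] for v in ordered)
```

Each returned string becomes one value of the same attribute name when you write the item. Note that a phrase straddling a chunk boundary will not be found by a substring search on any single value.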
The second way to store the text within SimpleDB does not require you to place arbitrary ordering data within your text chunks. You do the ordering by placing each text chunk in a differently named attribute. For example, you could use the attribute names `desc01`, `desc02`, ... `desc10`. Then you place each chunk in the appropriate attribute. You can still do full-text search with both methods, but the searches will be slower with this method because you will need to specify many predicates, and SimpleDB will end up searching through a separate index for each attribute.
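Here is a small sketch of the named-attribute approach, including the multi-predicate query it forces on you. The helper names and the `notes` domain are hypothetical, the sketch assumes ASCII text so that one character is one byte, and the `like '%term%'` form reflects my understanding of the SimpleDB Select syntax, so check it against the docs before relying on it.

```python
def to_named_attributes(text: str, prefix: str = "desc", chunk_size: int = 1000) -> dict[str, str]:
    """Map each chunk to its own attribute name: desc01, desc02, ..."""
    if not text.isascii():
        raise ValueError("this sketch assumes ASCII so that 1 char == 1 byte")
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return {f"{prefix}{n:02d}": chunk for n, chunk in enumerate(chunks, start=1)}


def from_named_attributes(attrs: dict[str, str], prefix: str = "desc") -> str:
    """Reassemble the text; zero-padded names sort into the right order."""
    names = sorted(name for name in attrs if name.startswith(prefix))
    return "".join(attrs[name] for name in names)


def build_search_expression(domain: str, term: str, prefix: str = "desc", n_attrs: int = 10) -> str:
    """Build one `like` predicate per chunk attribute, joined with `or`."""
    escaped = term.replace("'", "''")  # quotes are escaped by doubling
    predicates = [
        f"`{prefix}{n:02d}` like '%{escaped}%'" for n in range(1, n_attrs + 1)
    ]
    return f"select * from `{domain}` where " + " or ".join(predicates)
```

For ten chunk attributes the expression carries ten predicates, which is where the extra query cost comes from.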
It is easy to think of this type of workaround as a hack, because with databases we are used to having this kind of low-level detail handled for us within the database. SimpleDB is specifically designed to push this sort of thing out of the database and into the client as a means of providing availability as a first-class feature.
If you found out that a relational database was splitting your text into 1k chunks to store on disk as an implementation detail it wouldn't seem like a hack. The problem is that the current state of SimpleDB clients is such that you have to implement a lot of this type of data formatting yourself. This is the type of thing that ideally will be handled for you in a smart client. There just aren't any smart clients freely available yet.
If you are concerned about cost, you might find that it is cheaper to put the text in S3 and metadata with pointers in SimpleDB.
You could put the 10k text on S3, then create an attribute that has all the unique words of the 10k of text as multiple values. Then searches would be fast. No phrase searching, though.
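A sketch of the unique-words idea, with several assumptions of mine: the attribute name `word`, the helper names, and the tokenizer (lowercase ASCII letters and digits only) are all illustrative. The `max_values` cap is there because SimpleDB limits the number of attribute pairs per item (256, as far as I recall from the documented limits); 200 is an arbitrary choice that leaves room for other metadata, and any words past the cap are simply not indexed.

```python
import re


def unique_words(text: str, min_length: int = 3, max_values: int = 200) -> list[str]:
    """Return distinct lowercase words in first-seen order, capped at max_values."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    distinct = dict.fromkeys(w for w in words if len(w) >= min_length)
    return list(distinct)[:max_values]


def word_attributes(text: str, attribute_name: str = "word") -> list[dict]:
    """One value per word, all under the same multi-valued attribute."""
    return [
        {"Name": attribute_name, "Value": w, "Replace": False}
        for w in unique_words(text)
    ]


def build_word_query(domain: str, words: list[str], attribute_name: str = "word") -> str:
    """Find items containing every word.

    `intersection` is used because `and` on a multi-valued attribute
    would require a single value to satisfy both predicates.
    """
    predicates = [
        f"`{attribute_name}` = '" + w.lower().replace("'", "''") + "'"
        for w in words
    ]
    return f"select itemName() from `{domain}` where " + " intersection ".join(predicates)
```

Because these are exact-match lookups on individual words, this gives fast keyword search but, as noted, no phrase search.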
How many values can you store in one attribute in one 'row' (name)? I looked in the docs, no answer popped out at me.
--Tom
The upcoming release of Simple Savant (a C# persistence library for SimpleDB which I created) will support both attribute spanning as described by Mocky and full-text searches of SimpleDB data using Lucene.NET.
I realize you are probably not building your app in C#, but since your question is a top result when searching for SimpleDB and full-text indexing it seemed worth mentioning.
UPDATE: The Simple Savant release I mentioned above is now available.
SimpleDB is, well, simple. Everything in it is a string. The documentation is very straightforward. And there are lots of usage restrictions, such as:
- You can only do a `SELECT * FROM ___ WHERE ItemName() IN (...)` with 20 `ItemName`s in the `IN`.
- You can only `PUT` (update) to 25 records at a time.
- All reads are based on computation time. So if you do a `SELECT` with a `LIMIT` of `1000`, it may return something like `800` (or even nothing) along with a `nextToken`, with which you need to make an additional request. The next `SELECT` may then return the full limit count, so the sum of rows returned by the two `SELECT`s may be greater than your original limit. This is a concern if you are selecting a lot. You will hit a similar problem with `SELECT COUNT(*)`: it returns a count along with a `nextToken`, and you need to keep iterating over those `nextToken`s and sum the returned counts to get the true (total) count.
- All of these computation times will be largely affected by larger data in the store.
- If you end up having a large number of records, you will likely have to shard your records across multiple domains.
- Amazon will throttle your requests if you make too many on a single domain.
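To illustrate the `nextToken` and batching points, here is a sketch that stays independent of any particular client library. `run_select` is a hypothetical callable you would write around your client: it takes an expression and an optional token and returns `(rows, next_token)`. The `Count` key in count rows is likewise an assumption about how you unpack the response.

```python
from typing import Callable, Iterator, Optional

SelectFn = Callable[[str, Optional[str]], tuple[list[dict], Optional[str]]]


def in_batches(items: list, size: int) -> Iterator[list]:
    """Yield slices of at most `size` items (20 for IN lists, 25 for puts)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def select_all(run_select: SelectFn, expression: str, limit: Optional[int] = None) -> list[dict]:
    """Follow next tokens; trim the overshoot if a row limit was requested."""
    rows, token = [], None
    while True:
        page, token = run_select(expression, token)
        rows.extend(page)
        if limit is not None and len(rows) >= limit:
            return rows[:limit]
        if token is None:
            return rows


def count_all(run_select: SelectFn, expression: str) -> int:
    """Sum the partial counts returned across all pages of a count(*) query."""
    total, token = 0, None
    while True:
        page, token = run_select(expression, token)
        total += sum(int(row["Count"]) for row in page)
        if token is None:
            return total


def build_in_queries(domain: str, names: list[str], max_in: int = 20) -> list[str]:
    """Split a long list of item names into several IN queries."""
    queries = []
    for group in in_batches(names, max_in):
        quoted = ", ".join("'" + n.replace("'", "''") + "'" for n in group)
        queries.append(f"select * from `{domain}` where itemName() in ({quoted})")
    return queries
```

An empty page with a token is treated as "keep going", which matches the case where a request returns nothing but is not yet finished.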
So, if you plan to use large amounts of string data, or have a lot of records, then you may want to look elsewhere. SimpleDB is very, very reliable and works as documented, but it can cause lots of headaches.
In your case I'd recommend something like MongoDB. It has its own share of problems as well, but may be better for this case. Though if you have lots of records (millions and upward) and then try to add indexes over too many of them, you may break it if it's on spindles and not SSDs.
Source: https://stackoverflow.com/questions/980767/maximum-size-of-attributes-on-aws-simpledb