I thought that I could use SimpleDB to take care of the most challenging area of my application (as far as scaling goes) - twitter-like comments, but with location on top - till
Amazon is trying to get you to implement a simple object database, primarily for speed reasons. Think of the SimpleDB records as a pointer/key to an element in S3. This way you can run queries against SimpleDB (slow) to get result lists, or hit S3 directly with a key (fast) to pull the object when you need to retrieve or modify records one at a time.
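To make that pattern concrete, here is a minimal sketch using the legacy Python boto library; the bucket, domain, and attribute names are placeholders, not anything prescribed by SimpleDB. The idea is simply that SimpleDB holds the queryable attributes plus the S3 key, while S3 holds the payload.

    import boto

    s3 = boto.connect_s3()     # credentials picked up from the environment
    sdb = boto.connect_sdb()
    bucket = s3.get_bucket('my-objects')     # placeholder bucket name
    domain = sdb.get_domain('my-index')      # placeholder SimpleDB domain

    def save_comment(comment_id, body, lat, lon):
        # The full object body lives in S3 under a predictable key.
        bucket.new_key('comments/%s' % comment_id).set_contents_from_string(body)
        # Only small, queryable attributes plus the S3 key go into SimpleDB.
        domain.put_attributes(comment_id, {
            's3_key': 'comments/%s' % comment_id,
            'lat': str(lat),
            'lon': str(lon),
        })

    def load_comment(comment_id):
        item = domain.get_item(comment_id)       # index lookup (slower path)
        key = bucket.get_key(item['s3_key'])     # direct keyed fetch (fast path)
        return key.get_contents_as_string()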
The limits seem to apply to the current Beta release. I assume they will allow larger databases in the future, after they figure out how they can serve the demand economically. Even with the limits, a database of 10GB that supports high scalability and reliability is a useful and cost-effective resource.
Note that scalability refers to the ability to keep a steady, shallow performance curve as the volume of data or requests grows. It does not necessarily mean optimal performance, nor does it mean very high-capacity data storage.
Amazon SimpleDB also offers a free service tier: you can store up to 1GB, transfer up to 1GB/month, and use up to 25 hours of machine time. While this limit sounds very low, the fact that it's free lets some low-scale customers use the technology without investing in a big server farm.
If the storage size per attribute is the problem, you can use S3 to store larger data and keep the links to the S3 objects in SDB. S3 is not just for files; it's a generic storage solution.
I have about 50GB in SimpleDB, sharded across 30 domains. I use this to allow multiple keys on objects stored in S3, and also to reduce my S3 costs. I haven't played with using SimpleDB for full-text search, but I would not attempt it.
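For what it's worth, sharding across a fixed set of domains like that usually boils down to a deterministic mapping from item key to domain. Here is a minimal sketch; the hash choice and domain-name prefix are my own illustration, not the scheme actually used above:

    import hashlib

    NUM_DOMAINS = 30
    DOMAIN_PREFIX = 'objects_'

    def domain_for(item_key):
        # The same key always hashes to the same domain, so readers and
        # writers agree on the shard without any coordination.
        digest = hashlib.md5(item_key.encode('utf-8')).hexdigest()
        return '%s%02d' % (DOMAIN_PREFIX, int(digest, 16) % NUM_DOMAINS)

    # domain_for('photos/1234.jpg') always returns the same domain name,
    # one of objects_00 through objects_29.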
SimpleDB works, it's easy, and so on, but it isn't the right set of features for every situation. In your case, if you need aggregation, SimpleDB is not the right solution. It is built around the school of thought that the DB is just a key-value store, and that aggregation should be handled by a separate aggregation process that writes its results back to the key-value store. This is exactly what some applications need.
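A rough sketch of that write-back pattern, again using the legacy boto library and assuming a hypothetical comments domain with post_id and score attributes (the names are invented for illustration):

    import boto

    sdb = boto.connect_sdb()
    domain = sdb.get_domain('comments')

    def refresh_score_total(post_id):
        # The aggregation runs in this worker process, not inside SimpleDB.
        total = 0
        query = "select score from `comments` where post_id = '%s'" % post_id
        for item in domain.select(query):
            total += int(item['score'])
        # Write the result back as an ordinary item so reads stay a single get.
        domain.put_attributes('post_stats_%s' % post_id,
                              {'score_total': str(total)})
        return total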
Here is a description of how I pinch pennies using SimpleDB.
It's worth adding that while having to write your own sharding logic across domains is not ideal, it does pay off in terms of performance. If, for example, you need to search across 100GB of data, it's better to ask 20 machines holding 5GB each to run the same search on the portion they're responsible for than to have one machine perform the entire task. If your goal is a sorted list, you can take the best results returned from the 20 simultaneous queries and collate them on the machine that initiated the request.
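As a rough sketch of that fan-out-and-collate idea (the shard count, domain names, and 'score' attribute are assumptions for illustration; in practice SimpleDB stores numbers as strings, so they would need zero-padding to sort correctly):

    import boto
    from concurrent.futures import ThreadPoolExecutor

    DOMAINS = ['comments_%02d' % i for i in range(20)]   # 20 shards of ~5GB each

    def query_shard(domain_name):
        sdb = boto.connect_sdb()   # one connection per worker thread
        # Each shard runs the same query over only the data it holds.
        query = ("select * from `%s` where score is not null "
                 "order by score desc limit 25" % domain_name)
        return list(sdb.get_domain(domain_name).select(query))

    with ThreadPoolExecutor(max_workers=len(DOMAINS)) as pool:
        per_shard = list(pool.map(query_shard, DOMAINS))

    # Collate on the requesting machine: merge the per-shard "best 25" lists
    # and keep the overall top 25.
    best = sorted((item for shard in per_shard for item in shard),
                  key=lambda item: item['score'], reverse=True)[:25]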
That said, I would rather see this abstracted away from normal use, with something like "hints" in the API if you want to get lower-level. So if you happen to store 100GB of data, let Amazon decide whether it's partitioned across 10, 20, or 40 machines, and let it distribute the work. For example, in Google's BigTable design, a table is continually partitioned into roughly 400MB tablets as it grows. Asking for a row from a table is as simple as that, and BigTable does the job of figuring out where it lives, whether in one tablet or millions of tablets.
Then again, BigTable requires you to write MapReduce calls to perform a query, while SimpleDB indexes itself dynamically for you, so you win some, you lose some.
I'm building a commercial .NET application which will use SimpleDB as its primary data store. I'm not yet in production, but I've also been building out an open-source library that addresses some of the issues with using SimpleDB vs an RDBMS. Some of the features on my roadmap are related to the issues you've mentioned:
SimpleDB is still under active development and will certainly end up with many features it doesn't have today (some added to the core system and some in the code libraries).
The .NET library is Simple Savant.