I want to keep a large ordered list (millions of elements) in the Google App Engine datastore. Fast insertion is required.
The simplest way would be adding an indexed proper
Alternatively, could you use decimals or a string?
content   order
--------------------
A         'a'
B         'b'
C         'c'
Then, to insert D between A and B, give it the value 'aa', which sorts after 'a' and before 'b'.
An algorithm for generating the strings is best shown for a binary string. If you want to insert something between "1011" and "1100", read each string as a binary number with the point after the first digit (1.011 and 1.100) and take the average:
average = 1 + 0*(1/2) + 1*(1/4) + 1*(1/8) + 1*(1/16), so the new string is "10111"
content   order
--------------------
A         '1011'
new!      '10111'
B         '1100'
C         '1101'
Since you always average two values, the average will always have a finite binary expansion, and therefore a finite string. This effectively defines a binary tree.
As you know, binary trees don't always turn out well balanced; in other words, some strings will be much longer than others after enough insertions. To keep them short, you could use any even number base. It has to be even because then the expansion of any average of two values is finite.
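The averaging step can be sketched in a few lines of Python. This is only an illustration of the idea, not datastore code: the function name `midpoint` is made up for this example, and it assumes keys are non-empty strings of '0' and '1' read as binary fractions, so that right-padding with zeros does not change their value.

```python
def midpoint(low, high):
    """Return a binary string that sorts strictly between low and high."""
    width = max(len(low), len(high))
    # Right-padding with zeros leaves the value of a binary fraction unchanged.
    a = int(low.ljust(width, "0"), 2)
    b = int(high.ljust(width, "0"), 2)
    if a >= b:
        raise ValueError("low must sort before high")
    # The sum a + b, written with one extra digit, is exactly the average.
    mid = format(a + b, "b").zfill(width + 1)
    # Trailing zeros carry no information, so drop them to keep keys short.
    return mid.rstrip("0")


print(midpoint("1011", "1100"))  # 10111
```

The result compares correctly as a plain string, so an indexed string property sorted in ascending order would return the entities in list order.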
But whatever you do, strings will probably become long, and you'll have to do some housekeeping at some point, cleaning up the values so that the string space is used efficiently. What this algorithm gives you is the certainty that between cleanups, the system will keep ticking along.
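One possible form of that housekeeping, sketched under assumptions: take all keys in their current order and hand out fresh, equal-length binary strings. The helper name `rebalance` is hypothetical, and in a real application each reassigned key would cost a datastore write, so this would have to run as a batch job.

```python
def rebalance(sorted_keys):
    """Map each old key to a new fixed-width binary key, preserving order."""
    n = len(sorted_keys)
    # n.bit_length() digits are enough to write the numbers 1..n.
    width = max(1, n.bit_length())
    # Start at 1 rather than 0 so there is still room to insert before the first item.
    return {old: format(i + 1, "b").zfill(width)
            for i, old in enumerate(sorted_keys)}


old_keys = ["1011", "10111", "1100", "1101"]
new_keys = rebalance(old_keys)
print([new_keys[k] for k in old_keys])  # ['001', '010', '011', '100']
```

Because all new keys have the same length, string order and numeric order coincide, and later insertions can again use averaging between neighbours.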
You probably want to consider using app-engine-ranklist, which uses a tree-based structure to maintain a rank order in the datastore.
Or, if you can describe your requirements in more detail, maybe we can suggest an alternative that involves less overhead.
You could make a giant linked list, with each entity pointing to the next one in the list.
It would be extremely slow to traverse the list later, but that might be acceptable depending on how you are using the data, and inserting into the list would only ever be two datastore writes (one to update the insertion point and one for your new entity).
In the database, your linked list can be done like this:
value (PK)   predecessor
------------------------
A            null
B            A
C            B
Then, when you insert new data (here D between B and C), change the predecessor of the entity that follows it:
value (PK)   predecessor
------------------------
A            null
B            A
C            D
D            B
Inserting is quick, but traversing will be slow indeed!
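A minimal in-memory sketch of the table above, using a plain Python dict in place of the datastore. The function names are made up for illustration; in the datastore, the lookup for "the entity whose predecessor is B" would presumably be a query on an indexed predecessor property rather than the linear scan used here.

```python
def insert_after(pred, new, after):
    """Insert `new` right after `after`; pred maps value -> predecessor."""
    # Find the entity that currently follows `after` and repoint it (write 1).
    for value, p in pred.items():
        if p == after:
            pred[value] = new
            break
    # Store the new entity with its predecessor (write 2).
    pred[new] = after


def traverse(pred):
    """Return all values in list order by following the links from the head."""
    succ = {p: v for v, p in pred.items()}  # invert: predecessor -> value
    order = []
    cur = succ.get(None)  # the head is the entity with no predecessor
    while cur is not None:
        order.append(cur)
        cur = succ.get(cur)
    return order


pred = {"A": None, "B": "A", "C": "B"}
insert_after(pred, "D", "B")
print(traverse(pred))  # ['A', 'B', 'D', 'C']
```

The sketch loads the whole table to traverse it; against the datastore, walking the list link by link would cost one read per element, which is why traversal is the slow part.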