I am trying to design a data model that can hold a very large amount of data. Does anyone with experience in large volumes of data have any feedback on this, i.e.:
The arrangement you describe should work fine. If your entity group grows excessively big (we're talking hundreds of megabytes of transactions before this becomes an issue), you could write a procedure to 'roll up' old transactions: transactionally replace a set of old transaction records with a single one for the sum of those transactions, in order to maintain the invariant that the balance is equal to the sum of all transactions. If you still need to store a record of these old, 'rolled up' transactions, you can make a copy of them in a separate entity group before you perform the roll-up.
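To make the roll-up concrete, here is a minimal in-memory sketch in plain Python. It does not use the datastore API at all: the `Txn` class, the `roll_up` function, and the sample amounts are hypothetical names and values I made up for illustration. In a real datastore, the delete of the old records and the insert of the summary record would have to happen inside a single transaction on the entity group.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction record; amounts are integer cents."""
    day: date
    amount: int
    memo: str = ""


def roll_up(txns: List[Txn], cutoff: date) -> Tuple[List[Txn], List[Txn]]:
    """Replace all transactions dated before `cutoff` with one summary record.

    Returns (new_ledger, archived). The archived list is what you would copy
    to a separate entity group before performing the roll-up.
    """
    old = [t for t in txns if t.day < cutoff]
    recent = [t for t in txns if t.day >= cutoff]
    if not old:
        return list(txns), []
    summary = Txn(
        day=max(t.day for t in old),
        amount=sum(t.amount for t in old),
        memo="roll-up",
    )
    return [summary] + recent, old


ledger = [
    Txn(date(2023, 1, 5), 10_000),
    Txn(date(2023, 1, 20), -2_500),
    Txn(date(2023, 2, 3), 4_000),
]
balance_before = sum(t.amount for t in ledger)

ledger, archived = roll_up(ledger, cutoff=date(2023, 2, 1))

# The invariant: the balance still equals the sum of all transactions.
assert sum(t.amount for t in ledger) == balance_before
```

The two January records collapse into one summary record, and the balance computed from the ledger is unchanged.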
You are correct that Transaction and TransactionAccount must be in the same entity group in order to do the transactional insert and update operation.
The reason to shard is to reduce write contention, but you say this will be a low-write entity, so sharding is not needed here.
To keep the size of your entity groups down, you can devise some type of archiving process. For example, if this is for a bank account, then when the monthly statement is generated you could archive that month's worth of transactions.
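As a rough illustration of the monthly approach, the sketch below groups ledger rows by statement month in plain Python. The `Row` tuple, the `split_by_statement_month` function, and the sample data are assumptions of mine, not anything from the datastore API. Where the archived rows actually go (another entity group, another store) is left open.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical ledger row: (date, amount in integer cents).
Row = Tuple[date, int]


def split_by_statement_month(rows: List[Row]) -> Dict[Tuple[int, int], List[Row]]:
    """Group rows by (year, month) so each group can be archived at statement time."""
    groups: Dict[Tuple[int, int], List[Row]] = defaultdict(list)
    for day, amount in rows:
        groups[(day.year, day.month)].append((day, amount))
    return dict(groups)


rows = [
    (date(2023, 1, 5), 10_000),
    (date(2023, 1, 20), -2_500),
    (date(2023, 2, 3), 4_000),
]

by_month = split_by_statement_month(rows)

# Once the January statement is generated, move that month out of the live set
# and carry its total forward as the opening balance for February.
january = by_month.pop((2023, 1))
opening_balance = sum(amount for _, amount in january)
```

Carrying the archived month's total forward as an opening balance keeps the live entity group small while the account balance can still be derived from what remains.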