Strategies for Java ORM with Unreliable Network and Low Bandwidth

Submitted by 梦想与她 on 2019-12-10 12:57:18

Question


I am looking at Hibernate for a system which needs to work in an unreliable network. There is a single central database that we need read-write access to, but it is available over a pretty patchy wi-fi network. In addition, there may be power losses which do not shut down the application cleanly, so any solution must have a persistent cache which can survive power-cycles. Lastly, this is an embedded system with only modest memory and disk space, so, for example, doing full-blown replication of the database is not a feasible strategy.

I have a basic understanding of Hibernate second-level caching, and I am wondering if it is possible to configure this with something like Ehcache to solve this problem, but the main thrust of that seems to be performance rather than availability, so I am not sure what the pitfalls might be.
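For reference, a minimal sketch of the kind of configuration I have in mind, assuming Hibernate 4/5 with the hibernate-ehcache module on the classpath and an ehcache.xml that defines disk-backed cache regions. As far as I can tell this only helps cached reads; it does nothing to queue writes through an outage.

    import java.util.Properties;
    import org.hibernate.cfg.Configuration;

    public class CacheConfigSketch {
        // Build a Hibernate Configuration with the second-level cache enabled.
        public static Configuration withSecondLevelCache() {
            Properties props = new Properties();
            props.setProperty("hibernate.cache.use_second_level_cache", "true");
            props.setProperty("hibernate.cache.use_query_cache", "true");
            // Region factory provided by the hibernate-ehcache integration module.
            props.setProperty("hibernate.cache.region.factory_class",
                    "org.hibernate.cache.ehcache.EhCacheRegionFactory");
            // Ehcache configuration file defining the (disk-backed) cache regions.
            props.setProperty("net.sf.ehcache.configurationResourceName", "/ehcache.xml");
            return new Configuration().addProperties(props);
        }
    }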

I am also quite willing to consider other strategies which involve replication to a local database. I would rather not have to do too much of the heavy lifting myself to implement this.

Looking for some experience or possible alternatives.


Answer 1:


"In addition, there may be power losses which do not shutdown the application cleanly, so any solution must have a persistent cache which can survive power-cycles."

You already have a solution in mind with the Hibernate level-2 cache, but you didn't say what the real requirements are. You have an unreliable network; that's OK. You have an unreliable power supply; that's OK too. Now, what level of service do you want to achieve? What is acceptable and what is not?

Is data loss acceptable? How much loss could you accept? What level of risk are you willing to take?

To be more explicit, let's say you have a local replica of the database, or at least part of it. Let's say you know how to queue/save the modifications made locally. Let's say you store these modifications on a hard drive so they are safe in case of a power failure. Let's say you are able to merge the changes back into the main database when the connection becomes available again.
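To make the "queue/save modifications on a hard drive" assumption concrete, here is a minimal sketch of an append-only journal that forces every entry to disk before acknowledging it (the class and method names are made up for illustration):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class LocalChangeJournal implements AutoCloseable {
        private final FileChannel channel;

        public LocalChangeJournal(Path file) throws IOException {
            this.channel = FileChannel.open(file,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        }

        // Append one serialized change and force it to the physical disk.
        public synchronized void append(String serializedChange) throws IOException {
            byte[] payload = (serializedChange + "\n").getBytes(StandardCharsets.UTF_8);
            channel.write(ByteBuffer.wrap(payload));
            channel.force(true); // the entry survives a power cut once this returns
        }

        @Override
        public void close() throws IOException {
            channel.close();
        }
    }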

That's already a lot of assumptions. OK, but what happens if a hard drive fails after a power failure? You know that hard drives don't like power failures and tend to get corrupted, or even physically damaged, when the power cuts out.

So you add a RAID array and an uninterruptible power supply. That's nice. You detect the power-failure event from the OS, finish your current transaction, and shut down correctly. Your RAID protects you from a disk failure.
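For the "finish your current transaction and shut down correctly" part, one possible sketch is a JVM shutdown hook; this assumes the UPS monitoring software terminates the process normally (e.g. SIGTERM) rather than just letting the power drop:

    import org.hibernate.SessionFactory;

    public class CleanShutdown {
        // Register a hook that flushes local state before the JVM exits.
        // LocalChangeJournal is the journal sketch shown earlier.
        public static void register(SessionFactory sessionFactory, LocalChangeJournal journal) {
            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                try {
                    journal.close();        // make sure queued changes are on disk
                    sessionFactory.close(); // let Hibernate release connections cleanly
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }));
        }
    }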

OK, but what happens if the whole computer stops functioning? What happens in case of fire? Or water damage? All the disks will be damaged, the data will be unrecoverable, and whatever was not synchronized with the central database is lost. Is that acceptable or not?

And even when the wi-fi is up and the power supply works perfectly... what is the reliability of the central database anyway? Do you have regular backups? Or a clustering solution? Are you sure the central database itself is reliable?

From a database point of view, it is easy to use a cluster or backups, and to use transactions to ensure data consistency. You can still lose data (in particular if you are not using a cluster), but you should be able to recover up to the last backup, for example.

But if you want to work offline (with the database unavailable), and you are not the only one who can modify the database, conflicts WILL occur. This is no longer a cache, Hibernate, or any other technical problem.

This is a functional problem. What do you do when several modifications happen offline and you have to merge them? What is acceptable, and what is not? It might be that on reconnect the most recent change wins and older changes are discarded. Or potential conflicts are detected and the user is prompted to resolve them. Or you simply try to apply all the queued changes in order...
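As an illustration of the "most recent change wins" option, a sketch with a made-up Change type carrying a modification timestamp:

    import java.time.Instant;

    public class Change {
        final String entityId;
        final String payload;
        final Instant modifiedAt;

        Change(String entityId, String payload, Instant modifiedAt) {
            this.entityId = entityId;
            this.payload = payload;
            this.modifiedAt = modifiedAt;
        }

        // Last-write-wins: keep whichever side touched the entity more recently;
        // the older offline edit is silently discarded.
        static Change resolve(Change local, Change central) {
            return local.modifiedAt.isAfter(central.modifiedAt) ? local : central;
        }
    }

Whether silently discarding the older edit is acceptable is exactly the functional question you have to answer first.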

I would tend to offer an "offline mode", but your users must be aware that they are offline, and they should get a notification when their changes are made permanent in the central database, with conflict resolution where needed. But that's just my point of view.




Answer 2:


You can't expect to succeed with a network like that between Hibernate and the database.

I recommend that you define a set of high-level atomic operations, and then define a set of (e.g.) RESTful services for them. Or, if you prefer, you can use SOAP and look into the WS-* options for reliable messaging to take care of retries and all the other messy details.
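As a sketch, one such high-level atomic operation exposed as a RESTful call with a hand-rolled retry loop (the URL, payload and retry policy are made up; WS-* reliable messaging would replace the loop):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;

    public class OrderClient {
        private final HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();

        // Submits one complete business operation; retries the whole thing on failure.
        public boolean submitOrder(String orderJson) throws InterruptedException {
            HttpRequest request = HttpRequest.newBuilder(URI.create("http://central-host/api/orders"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(orderJson))
                    .build();
            for (int attempt = 1; attempt <= 5; attempt++) {
                try {
                    HttpResponse<String> response =
                            client.send(request, HttpResponse.BodyHandlers.ofString());
                    if (response.statusCode() / 100 == 2) {
                        return true;
                    }
                } catch (java.io.IOException e) {
                    // patchy wi-fi: fall through and retry
                }
                Thread.sleep(1000L * attempt); // crude backoff
            }
            return false;
        }
    }

The server has to treat the operation as idempotent, because a retry may re-send a request whose response was simply lost.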

Or, you could investigate whether something like Cassandra across the link would work better than SQL, or some other system with strong replication support.




Answer 3:


How about queuing up the DB operations on a durable/persistent message queue, and letting some messaging middleware handle the network problem?
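As a sketch of the producing side, using JMS with a broker such as ActiveMQ (the broker URL, queue name and payload format are made up; on the device you would want an embedded broker or a local store-and-forward setup so that queued operations also survive power cycles):

    import javax.jms.Connection;
    import javax.jms.DeliveryMode;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class DbOperationSender {
        public static void send(String operationJson) throws Exception {
            ActiveMQConnectionFactory factory =
                    new ActiveMQConnectionFactory("failover:(tcp://central-host:61616)");
            Connection connection = factory.createConnection();
            try {
                connection.start();
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Queue queue = session.createQueue("db.operations");
                MessageProducer producer = session.createProducer(queue);
                producer.setDeliveryMode(DeliveryMode.PERSISTENT); // survives broker restarts
                producer.send(session.createTextMessage(operationJson));
            } finally {
                connection.close();
            }
        }
    }

A consumer on the database side would then apply the queued operations in order and deal with any conflicts.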

Depending on how you do it, consistency problems (well, "anomaly" is probably the right word) can arise, but if you have an unreliable network and still want decent performance, then settling for relaxed consistency could be the way to go.

I would be hesitant to use Ehcache and the like. They were not designed for this, so you might have to "stretch" the framework. Message queues, on the other hand, have solutions that were designed for exactly such scenarios.




Answer 4:


If it were just a case of a sporadic connection between the two machines, I would recommend keeping a transaction log that can be played back, with each entry marked as processed once applied. The limited memory may make that difficult, though.

Maybe you can store the transaction log compressed, though.
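A sketch of such a log: each entry is deflated, written length-prefixed, and forced to disk so it can be replayed later; a separate offset file (not shown) can record which entries have already been processed. Names are illustrative:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.zip.DeflaterOutputStream;

    public class CompressedTxnLog implements AutoCloseable {
        private final FileChannel log;

        public CompressedTxnLog(Path file) throws IOException {
            this.log = FileChannel.open(file,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        }

        // Compress one entry and append it as [4-byte length][deflated bytes].
        public synchronized void append(String entry) throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DeflaterOutputStream out = new DeflaterOutputStream(buffer)) {
                out.write(entry.getBytes(StandardCharsets.UTF_8));
            }
            byte[] compressed = buffer.toByteArray();
            ByteBuffer record = ByteBuffer.allocate(4 + compressed.length);
            record.putInt(compressed.length).put(compressed).flip();
            log.write(record);
            log.force(true); // the entry survives a power cycle once this returns
        }

        @Override
        public void close() throws IOException {
            log.close();
        }
    }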




Answer 5:


Hibernate (and the second-level cache) is really not designed for this. My guess is that you would probably be best off using a small-scale embedded Java RDBMS (e.g. H2 or HSQLDB) as your local temporary queue (in the most durable mode available) and then doing the sync with a background thread. You could then provide a sync-spinner UI hooked up to that background thread to give the user some degree of feedback.
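A sketch of that approach, with an embedded file-based H2 database as the local queue and a background thread that drains it whenever the central database is reachable (the table layout, JDBC URL and push step are made-up assumptions):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class LocalQueueSync implements Runnable {
        private static final String LOCAL_URL = "jdbc:h2:./pending-ops"; // embedded, file-based

        // Called by the application thread: store the operation durably before returning.
        public void enqueue(String operationJson) throws Exception {
            try (Connection c = DriverManager.getConnection(LOCAL_URL);
                 Statement s = c.createStatement()) {
                s.execute("SET WRITE_DELAY 0"); // ask H2 to flush to disk on commit
                s.execute("CREATE TABLE IF NOT EXISTS pending_ops("
                        + "id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, "
                        + "payload CLOB, done BOOLEAN DEFAULT FALSE)");
                try (PreparedStatement p =
                             c.prepareStatement("INSERT INTO pending_ops(payload) VALUES (?)")) {
                    p.setString(1, operationJson);
                    p.executeUpdate();
                }
            }
        }

        // Background sync thread: e.g. new Thread(new LocalQueueSync()).start()
        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                try (Connection c = DriverManager.getConnection(LOCAL_URL);
                     Statement s = c.createStatement();
                     ResultSet rs = s.executeQuery(
                             "SELECT id, payload FROM pending_ops WHERE done = FALSE ORDER BY id")) {
                    while (rs.next()) {
                        pushToCentralDb(rs.getString("payload")); // hypothetical remote write
                        try (PreparedStatement p = c.prepareStatement(
                                "UPDATE pending_ops SET done = TRUE WHERE id = ?")) {
                            p.setLong(1, rs.getLong("id"));
                            p.executeUpdate();
                        }
                    }
                } catch (Exception e) {
                    // network or central database unavailable: wait and retry
                }
                try { Thread.sleep(10_000); } catch (InterruptedException e) { return; }
            }
        }

        private void pushToCentralDb(String payload) {
            // placeholder for the actual write to the central database or service
        }
    }

The sync spinner in the UI can simply observe whether the pending_ops table is empty.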

Incidentally, Hibernate is a bit fat to drop into an embedded environment. You might want to consider MyBatis instead.




Answer 6:


The Daffodil Replicator (http://enterprise.replicator.daffodilsw.com/index.html) allows replication between JDBC sources. It supports bidirectional updates, merging and conflict resolution, and partial replicas.

This can be used to synchronize the main database with a local (partial) replica. You can use Hibernate to talk to the local replica database and have everything else handled outside of that process.



Source: https://stackoverflow.com/questions/5845163/strategies-for-java-orm-with-unreliable-network-and-low-bandwidth
