Monday, June 11, 2007
Two-Phase-Commit Activation Protocol
Continuing on the previous post on "Guaranteed db activation in all JVM's before table unlock in HA-JDBC ?"...
Actually, I think just doing a synchronous notification is not sufficient. What is required is an atomic/transactional activation of the specified db in all nodes - either all commit or rollback.
This naturally brings up the thought of using 2PC (Two Phase Commit) protocol. Here is a proposal. Say we have a synchronizing node S(ync), and two other members F(oo) and B(ar):
From S's perspective:
Wouldn't that result in data inconsistency (since some JVM which are succesfully notified would start to write to all the active db's in the cluster, whereas some JVM which are not succesfully notified would continue to write to only some but not all active db's in the cluster) ?
Some interesting comments from Bela Ban, the JGroups Lead/ JBoss Clustering team:
Actually, I think just doing a synchronous notification is not sufficient. What is required is an atomic/transactional activation of the specified db in all nodes - either all commit or rollback.
This naturally brings up the thought of using 2PC (Two Phase Commit) protocol. Here is a proposal. Say we have a synchronizing node S(ync), and two other members F(oo) and B(ar):
2PC Activation Protocol
(Assuming tables in the active db are locked, and synchronization has just completed and S is now trying to atomically/transactionally activate the inactive db in all JVM's before unlocking the tables.)From S's perspective:
- S synchronously sends an "ActivationReady" message to F and B. If all such sending failed, simply aborts this 2PC activation procedure (ie inactive db stays inactive). If some but not all such sending failed, S sends an "ActivationAbort" message to others that it has previously successfully sent the ActivationReady message, and then simply aborts this 2PC activation procedure (ie inactive db stays inactive). Else proceed to (2).
- S synchronously sends an "ActivationCommit" message to F and B and then always activates the inactive db locally (regardless of whether any or all of the sending was successful.)
(Note that S always unlocks the tables in the active db afterwards, regardless of whether the 2PC activation has been aborted or not. However, if S detects any possible failure during commit or rollback, it must wait for the other nodes to timeout before unlocking the tables.)
- If it receives an ActivationReady message followed by an ActivationAbort message, do nothing.
- If it receives an ActivationReady message followed by an ActivationCommit message, activate the specified db locally.
- In the extreme case that after an ActivationReady message is received but failed to received a subsequent ActivationCommit or ActivationAbort message received within, say, 5 secs, deactivate all db's locally. (This is to prevent any possible compromise to data consistency caused by not knowing whether the db in concern has been activated in other JVM's or not.)
"I think you are over-engineering this a bit...But then what should we do in the case when the synchronous activation notification (presuming it has been implemented) to other JVM's resulted in partial success ? In other words, some get notified (and therefore have the inactive db activated) and some not ?
Database activation does not benefit from two-phase commit (2pc). The purpose of 2pc is to provide distributed consistency to a fallible operation. The operation in this case, activating a database, is not fallible - it is merely the insertion of a element into a collection, i.e. Balancer.add(...).
If you are curious about an instance where 2pc is warranted, check out the DistributableLockManager in HA-JDBC 2.0, which utilizes the JGroups TwoPhaseVotingAdapter."
Wouldn't that result in data inconsistency (since some JVM which are succesfully notified would start to write to all the active db's in the cluster, whereas some JVM which are not succesfully notified would continue to write to only some but not all active db's in the cluster) ?
Some interesting comments from Bela Ban, the JGroups Lead/ JBoss Clustering team:
"Yes, if you want atomicity guarantees, you either have to use a uniform protocol (a message is only delivered if all receivers acked it, otherwise it will not get delivered at anyone) or 2PC as you outlined below. Uniform delivery is not part of JGroups, although it is on the todo list (http://jira.jboss.com/jira/browse/JGRP-138). I once had an impl, but it was too slow so I trashed it.More tips from Bela Ban:
2PC is probably the way to go. I've done this in JBossCache: if you look at the code that is run when you have (a) a transaction and (b)synchronous method calls, then I pretty much run the classic PREPARE --> success? COMMIT : ROLLBACK pattern.
What you describe is such an implementation, but I guess that's something to be discussed in the scope of HA-JDBC. JGroups just provides the reliable transport with retransmission (lossless transport) and ordering."
"The RequestCorrelator is used by both MessageDispatcher and RpcDispatcher, it is at a lower abstraction level. I would suggest RpcDispatcher (for sync or async cluster-wide method calls) or MessageDispatcher (for sync or async cluster-wide message sending). Note that RpcDispatcher uses MessageDispatcher which uses RequestCorrelator.What do you think ?
In both RpcDispatcher.callRemoteMethods() and MessageDispatcher.castMessage() you get back an RspList which contains the result per member:Paul, I'd suggest you use an {Rpc/Message}Dispatcher directly rather than NotificationBus. NB was designed to be a simple asynchronous notification mechanism."
- The result (null if method was void)
- Whether the member was suspected, e.g. if you invoke foo() on {A,B,C,D}, and C crashes while you wait for acks, then C will be flagged as suspected in the response list
- Whether we received an ack (if the call is bounded by a timeout)
Comments:
<< Home
We have a distributed X.25 data switch system where we use 2PC via Jini transactions to sync all changes between systems into their postgres databases. At startup, the clients which provide editability look up via jini lookup, all machines that are visible to them, and then use the "status" of each to create a voting mechanism of which machines are part of the collection and which aren't. A machine can never vote itself active, and at least 2 machines have to vote something active for it to be active.
The state of participants is transactionally recorded into each machines postgres database so that as changes occur, all active machines know what is active. When a failed machine is resurrected, it can't be a participant unless at least two machines believe it is.
We copy databases between machines, on activation using ssh to pipe a pg_dump across the network to the activated machine.
Post a Comment
The state of participants is transactionally recorded into each machines postgres database so that as changes occur, all active machines know what is active. When a failed machine is resurrected, it can't be a participant unless at least two machines believe it is.
We copy databases between machines, on activation using ssh to pipe a pg_dump across the network to the activated machine.
<< Home