Google
 
Web unafbapune.blogspot.com

Thursday, April 05, 2007

 

Oracle Table Differential

Say you have two Oracle databases, db1 and db2, that contain the same schema. Every table T in db1 should contain the same records as that in db2 . Also, every table only allows read and insert operations (ie no updates or deletes), and every record has a creation date.

Now for some reasons db2 went down. Meanwhile there are continuously new records being inserted into T in db1. When db2 is back up, how do we figure out the set of records that need to be inserted back to T in db2 ?

Note you cannot assume db1 and db2 has the same clock time, but you can make the hypothetical assumption that the creation time of all the records in a specific table are in total chronological order. (In reality multiple records can be created with the same creation date/time, but let's get to that later.)

Here is one way of doing it. Let's call it the Differential Procedure:
db2Count = select count(*) from db2.T;
db1Count = select count(*) from db1.T;
diff = db1Count - db2Count;

select * from (
select * from (
select * from (select * from db1.T order by creation_date)
where rownum <= db1Count
order by creation_date desc))
where rownum <= diff
order by creation_date;
Can you think of a better/faster one ?

Updated on 23Oct - Actually the above SQL can be simpler:
select * from (
select * from (select * from db1.T order by creation_date)
where rownum <= db1Count
order by creation_date desc)
where rownum <= diff
order by creation_date;
Now back to reality. How do we handle those rows that got inserted into db1 as at the time when db2 went down ? They all have the same creation date in db1 but some of them may exist in db2 and some may not! Oh well, how about this:
  1. Figure out all those rows that were created as at the time when db2 went down:
  2. select * from db1.T where creation_date = (
    select max(creation_date) from (
    select creation_date from db1.T order by creation_date) where rownum <= db2Count)
  3. For each of these rows, insert it to db2 only if it doesn't already exist;

  4. Now apply the Differential Procedure to db1 and db2. This time, however, we got a set of records that must all exist in db1 but not in db2. Yum! Simply insert them all to db2.
Finally, if you are like me thinking about using HA-JDBC to cluster db1 and db2, how can we activate/synchronize db1 and db2 on the fly with minimal impact on the clients ? In other words, can synchronization be done correctly without the tables being locked ?

For example, take the hypothetical case above and say table T in db2 had 10 rows when it was deactivated. When db2 becomes ready for activation, db1 may have already got 100 rows. So to sync up, the extra 90 needs to be inserted in db2. Yet once that's done, db1 may have gone up to 110. The problem is db1 may just keep growing!

Is there a clever way to get around this dynamic changing synchronization problem yet without resorting to the use of table locking ? I can't really think of one :(

But how about this:
  1. Repeatedly apply a non-locking synchronization procedure, such as the Differential Procedure, until the delta of table T between the active and inactive db falls below a certain threshold;
  2. Lock table T of all active databases
  3. Apply the synchronization procedure to T for the last time
  4. Active the inactive db
  5. Release the table locks. (Just like what LocalDatabaseCluster#activate does.)
There will still be impact on the clients, but the lock duration is minimized.

What do you think ?

This page is powered by Blogger. Isn't yours?