Thursday, April 05, 2007
Oracle Table Differential
Say you have two Oracle databases, db1 and db2, that contain the same schema. Every table T in db1 should contain the same records as that in db2 . Also, every table only allows read and insert operations (ie no updates or deletes), and every record has a creation date.
Now for some reasons db2 went down. Meanwhile there are continuously new records being inserted into T in db1. When db2 is back up, how do we figure out the set of records that need to be inserted back to T in db2 ?
Note you cannot assume db1 and db2 has the same clock time, but you can make the hypothetical assumption that the creation time of all the records in a specific table are in total chronological order. (In reality multiple records can be created with the same creation date/time, but let's get to that later.)
Here is one way of doing it. Let's call it the Differential Procedure:
Updated on 23Oct - Actually the above SQL can be simpler:
For example, take the hypothetical case above and say table T in db2 had 10 rows when it was deactivated. When db2 becomes ready for activation, db1 may have already got 100 rows. So to sync up, the extra 90 needs to be inserted in db2. Yet once that's done, db1 may have gone up to 110. The problem is db1 may just keep growing!
Is there a clever way to get around this dynamic changing synchronization problem yet without resorting to the use of table locking ? I can't really think of one :(
But how about this:
What do you think ?
Now for some reasons db2 went down. Meanwhile there are continuously new records being inserted into T in db1. When db2 is back up, how do we figure out the set of records that need to be inserted back to T in db2 ?
Note you cannot assume db1 and db2 has the same clock time, but you can make the hypothetical assumption that the creation time of all the records in a specific table are in total chronological order. (In reality multiple records can be created with the same creation date/time, but let's get to that later.)
Here is one way of doing it. Let's call it the Differential Procedure:
Can you think of a better/faster one ?db2Count = select count(*) from db2.T;
db1Count = select count(*) from db1.T;
diff = db1Count - db2Count;
select * from (
select * from (
select * from (select * from db1.T order by creation_date)
where rownum <= db1Count
order by creation_date desc))
where rownum <= diff
order by creation_date;
Updated on 23Oct - Actually the above SQL can be simpler:
Now back to reality. How do we handle those rows that got inserted into db1 as at the time when db2 went down ? They all have the same creation date in db1 but some of them may exist in db2 and some may not! Oh well, how about this:select * from (
select * from (select * from db1.T order by creation_date)
where rownum <= db1Count
order by creation_date desc)
where rownum <= diff
order by creation_date;
- Figure out all those rows that were created as at the time when db2 went down:
- For each of these rows, insert it to db2 only if it doesn't already exist;
- Now apply the Differential Procedure to db1 and db2. This time, however, we got a set of records that must all exist in db1 but not in db2. Yum! Simply insert them all to db2.
select * from db1.T where creation_date = (
select max(creation_date) from (
select creation_date from db1.T order by creation_date) where rownum <= db2Count)
For example, take the hypothetical case above and say table T in db2 had 10 rows when it was deactivated. When db2 becomes ready for activation, db1 may have already got 100 rows. So to sync up, the extra 90 needs to be inserted in db2. Yet once that's done, db1 may have gone up to 110. The problem is db1 may just keep growing!
Is there a clever way to get around this dynamic changing synchronization problem yet without resorting to the use of table locking ? I can't really think of one :(
But how about this:
- Repeatedly apply a non-locking synchronization procedure, such as the Differential Procedure, until the delta of table T between the active and inactive db falls below a certain threshold;
- Lock table T of all active databases
- Apply the synchronization procedure to T for the last time
- Active the inactive db
- Release the table locks. (Just like what LocalDatabaseCluster#activate does.)
What do you think ?
Comments:
<< Home
I think you'll get blown by the fact that the clients accessing the wrong database (with no record client's looking for) will receive an accidental error. Or you need to lock both databases for the duration.
See "every table only allows read and insert operations (ie no updates or deletes)". If there were, this procedure wouldn't work.
Well its a nice question that made me think also to find what to do to solve this problem. I have not heard about this fact and is feeling so nice to learn about this new concept of differential procedure. I am convinced with the solution that you have suggested. Thanks.
Post a Comment
<< Home