

D
atabases, Replication, Locusts And Floods
Barry Nance
Database replication can be both a blessing and a curse. It can help you distribute your system's workload across multiple database servers. It also can eat your network alive like a horde of locusts. The design of your replication scheme is a key factor, and that design depends more on your business requirements (What data needs to be at which location? How volatile is it?) than on technological or relational considerations.
At best, a poorly designed replication scheme can cause torrential floods of network traffic. At worst, it can produce out-of-sync databases that offer different answers when different clients issue the same query. (Actually, the worst design is a recursive, circular replication scheme that points back within itself. We assume that you know your schema well enough to avoid recursion). But a well-designed r
eplication scheme updates distributed data on a timely basis and uses the network efficiently.
Before LAN-based relational database management systems (RDBMS) products gained the ability to replicate automatically, administrators manually replicated database contents. The database administrator checked the system, made backup copies of the files and gave those copies to each remote site's administrator. Today's replication technology allows the distribution and exchange of data among active databases without the need for an administrator at every site.
A replication scheme can send the entire contents of a table to remote servers (a complete refresh), or it can send only changed data--what Oracle Corp. terms a quick refresh. Keep in mind, however, that replication distributes transactions, not data. Even for a complete refresh, the replication mechanism copies entries from the transaction log of the source database server to the distributed servers, which apply th
em in the same order as they were appli
ed at the source database.
Replicated data is read-only and flows from a source location to remote locations. The act of replication copies data from its owner (the group that updates the data) to other groups that make use of that data. Because the process of copying transaction log entries to a remote site and applying those changes at that remote site takes a finite amount of time, replication is an inherently asynchronous operation. Even when the replication mechanism offers a synchronous option, be aware that the data can have only one owner at a time.
To find out how replication can affect a network, we put Oracle7.3 and SQL Server 6.5 to work. For each database, depending on the replication scheme tested, we configured up to five concurrently running database servers. Four of these were Pentium-class machines and one, to see the effect of CPU speed on replication, was a slower 486. The four schemes we exercised were: one master and three slaves; three masters and one slave (the "corporate roll-up
" scenario); a master server, intermediate server and three slaves; and for variety, the use of Oracle as a replication slave to a SQL Server master via Open Database Connectivity (ODBC). For all tests, we used Network General Corp.'s Sniffer software, running on a Dolch Computer Systems PAC63C computer, to monitor results. TCP/IP, IPX and NetBEUI, running on a 10-Mbps Ethernet segment, carried the replication traffic.
Our client driver software merely issued SQL insert, update and delete statements with data based on generated random numbers. A command-line argument specified the interval between database accesses, such as 10 seconds, five seconds, one second or (most rapid) 1/10 of a second. The test database contained three tables, with each table holding five columns--a numeric primary key and four character-type fields. Note that Oracle's replication feature can't handle some data types, such as LONG and LONG RAW. SQL Server 6.5 can replicate large binary (image) da
ta but imposes a configurable limit
on the size of data within each column. For character-mode synchronization, be aware that SQL Server delimits fields and rows with tab and line-feed characters. Columns containing tabs or line feeds can cause SQL Server to become confused; avoid the confusion by explicitly specifying unique delimiters.
Replicated Network Traffic
With the Sniffer, we monitored station-to-station traffic for each of the four replication design models. In the one-to-many model (one master, three slaves), as we increased the rate of database activity and shortened the refresh interval, we noticed that replication caused pending SQL statements to pile up in a queue at every slave database server. This behavior indicated the bottleneck is actually the CPU, not the network.

For the Side Bar on
Databas
e Replication Terms
Multihoming & NT: A WINning Combination
By Alan Hasling and Jay Milne
Remote Management of NT Servers
By Kiran Movva
Updated September 24, 1997
 |