Top Down or Bottom Up?
Classically, the meat and potatoes of protocol analysis has been decoding and displaying file-service packets--file open, read and write operations. Most protocol analyzers can display data-access packets, but only in undecoded form, as byte-by-byte hexadecimal values. Several products decode routing information, address resolution, HTTP and other network protocols, but the decoding of data-access packets hasn't gotten the attention it deserves. EcoSCOPE and Sniffer are notable exceptions. EcoSCOPE takes a top-down approach to the analysis of data-access traffic; Sniffer, a bottom-up approach.
The first step in network problem solving, establishing a baseline, happens before the problem even occurs. To baseline our network, we picked a week during which the network appeared to behave itself. We gathered statistics showing minimum, maximum and average access times for the various database clients and documented the network configuration in effect during the baseline period. The few problems that cropped up during the collection period didn't invalidate the baseline because they were sporadic "can't connect" situations from which we quickly recovered. More serious issues, such as those requiring changes to database server profiles or settings, would have caused us to throw the partial baseline figures into the trash and start over.
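A baseline summary of this kind can be sketched in a few lines of Python. This is only an illustration: the client names and access times below are hypothetical, not measurements from the tests described here, and `summarize_baseline` is a made-up helper rather than a feature of either analyzer.

```python
from statistics import mean


def summarize_baseline(samples):
    """Return the minimum, maximum and average access time per client.

    `samples` maps a client name to a list of access times in seconds.
    """
    summary = {}
    for client, times in samples.items():
        summary[client] = {
            "min": min(times),
            "max": max(times),
            "avg": mean(times),
        }
    return summary


# Hypothetical access times (seconds) collected during a baseline week.
samples = {
    "client-a": [0.25, 0.5, 0.75],
    "client-b": [1.0, 2.0, 3.0],
}

baseline = summarize_baseline(samples)
for client, stats in baseline.items():
    print(client, stats)
```

Keeping the summary per client, rather than as one network-wide figure, makes it easier to spot later which clients have drifted from their own normal behavior.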
As we experimented with the Oracle, SQL Server and DB2 database managers in our testing, we noticed the volume of network traffic to and from the database server depended greatly on the data-access protocol. Oracle's SQL*NET generated the least traffic while SQL Server's TDS caused the most. The network protocols we used made quite a difference, too. NetBEUI was the chattiest protocol, IPX/SPX considerably less chatty, and TCP/IP the most frugal user of the network.
Based on EcoSCOPE's and Sniffer's assessments of how database-connected applications performed in a given network environment, we were able to establish baseline performance parameters. We could use the baseline figures to define service levels and perform capacity planning. Note, however, that a useful baseline takes from several days to a few weeks to collect. It's a time-consuming, but necessary, step.
Once armed with baseline data, we proceeded to cause typical network problems for the database clients and see how the analysis tools helped identify the causes of the problems.
Traffic Jam on the Bridge
In the first test, we moved each database server in turn to a different segment, forcing its traffic to pass through a bridge. To simulate what happens when a bridge becomes overloaded with high frame rates, we made the bridge a slow one by using an IBM PS/2 Model 70 running PC Bridge software. We also wrote a program that bombarded the bridge with packets destined for a node on the server's segment. In this high-traffic environment, application clients often had to retransmit their requests to get them onto the server's network segment.
EcoSCOPE and Sniffer quickly alerted us to the traffic jam, but each presented the news in its own way. EcoSCOPE's response-time monitoring feature generated an alarm through an SNMP trap when response times rose above an acceptable level--a user-definable threshold value we were able to set. We had used response-time monitoring during the baselining process to produce a Microsoft Access report showing minimum, maximum and average response times for a group of client PCs communicating with a specific Oracle database server. Prompted by the alarm, we looked at the new data after the segment reconfiguration and saw that response times had quadrupled.
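The idea behind a response-time threshold is simple enough to sketch in Python. This is not how EcoSCOPE implements its alarm; the function names, the threshold and the sample figures are all assumptions chosen for illustration, with the numbers picked so that the slowdown works out to a factor of four.

```python
def find_slow_clients(response_times, threshold_seconds):
    """Return the clients whose response time exceeds the threshold."""
    return sorted(
        client
        for client, seconds in response_times.items()
        if seconds > threshold_seconds
    )


def slowdown_factor(baseline_seconds, current_seconds):
    """Return how many times slower the current figure is than the baseline."""
    return current_seconds / baseline_seconds


# Hypothetical average response times (seconds) before and after the change.
baseline_times = {"pc-1": 0.5, "pc-2": 0.25}
current_times = {"pc-1": 2.0, "pc-2": 0.25}

THRESHOLD_SECONDS = 1.0  # assumed user-defined alarm threshold

alarms = find_slow_clients(current_times, THRESHOLD_SECONDS)
for client in alarms:
    factor = slowdown_factor(baseline_times[client], current_times[client])
    print(f"ALARM: {client} is {factor:.1f}x slower than baseline")
```

A fixed threshold catches the gross failures; comparing against the per-client baseline, as `slowdown_factor` does, also shows how far a client has moved from its normal behavior.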
Sniffer reported the problem via two alarms. The first indicated high network utilization (40 percent) on the client segment that was programmatically crushing the bridge with traffic. The second alarm identified the slow database server responses that resulted from the lost packets that had to be retransmitted. Sniffer also highlighted the retransmissions in its packet-by-packet display.
To zero in on the problem, we filtered traffic between specific client and database network nodes, examined the delta times for the packets and looked at Sniffer's summary display of traffic statistics. Following this same procedure during baselining had also let us gauge the amount of bandwidth applications would require under varying conditions. Both analyzers reported the problem solved when we replaced the slow bridge with a switching hub.
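The filtering and delta-time steps can be approximated on any packet trace exported as simple records. The Python sketch below assumes a hypothetical record layout (`time`, `src`, `dst`, `seq`) and treats a repeated sequence number in the same direction as a retransmission; real analyzers such as Sniffer apply protocol-specific rules that are more involved.

```python
def conversation(packets, node_a, node_b):
    """Keep only the packets exchanged between the two given nodes."""
    pair = {node_a, node_b}
    return [p for p in packets if {p["src"], p["dst"]} == pair]


def delta_times(packets):
    """Return the time gap between each packet and the one before it."""
    times = [p["time"] for p in packets]
    return [later - earlier for earlier, later in zip(times, times[1:])]


def retransmissions(packets):
    """Return packets that repeat an already seen (src, dst, seq) triple."""
    seen = set()
    repeats = []
    for p in packets:
        key = (p["src"], p["dst"], p["seq"])
        if key in seen:
            repeats.append(p)
        else:
            seen.add(key)
    return repeats


# Hypothetical trace: the client's request is lost and sent a second time.
trace = [
    {"time": 0.0, "src": "client", "dst": "server", "seq": 1},
    {"time": 0.5, "src": "other", "dst": "server", "seq": 9},
    {"time": 1.5, "src": "client", "dst": "server", "seq": 1},
    {"time": 1.75, "src": "server", "dst": "client", "seq": 1},
]

talk = conversation(trace, "client", "server")
print(delta_times(talk))
print(retransmissions(talk))
```

A long gap followed by a repeated request, as in this sample trace, is the pattern that points at lost packets rather than a slow server.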
Complex SQL
In the second test, we changed the database client's programming to use significantly more complex SQL than it really needed. By the time we'd finished obfuscating the SQL, we had managed to add a redundant join and retrieve data items we didn't intend to use. We wanted to see whether the analyzer tools would detect a problem and, if so, at which component they'd point a finger.
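To make the kind of obfuscation concrete, here is a small Python sketch using an in-memory SQLite database as a stand-in for the servers under test. The schema, the data and both queries are hypothetical and are not the SQL used in the test; the character count is only a rough proxy for the extra data a bloated query puts on the wire.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Acme', '1 Main Street');
    INSERT INTO orders VALUES (10, 1, 250), (11, 1, 75);
""")

# The query the application actually needs: two columns from one table.
LEAN_SQL = "SELECT id, amount FROM orders WHERE customer_id = ? ORDER BY id"

# The obfuscated version: a redundant second join to customers (c2) and
# several columns that the application never reads.
BLOATED_SQL = """
    SELECT o.id, o.amount, o.customer_id, c.name, c.address, c2.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN customers c2 ON c2.id = c.id
    WHERE o.customer_id = ?
    ORDER BY o.id
"""


def payload_chars(rows):
    """Rough size of a result set: total characters across all values."""
    return sum(len(str(value)) for row in rows for value in row)


lean_rows = conn.execute(LEAN_SQL, (1,)).fetchall()
bloated_rows = conn.execute(BLOATED_SQL, (1,)).fetchall()

# The application uses only the first two columns of the bloated result.
needed = [(row[0], row[1]) for row in bloated_rows]

print(needed == lean_rows)
print(payload_chars(lean_rows), payload_chars(bloated_rows))
```

Both queries give the application the same usable data, which is why this sort of problem tends to go unnoticed until someone looks at server load or the traffic itself.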