ON THE WIRE
The Case Of The Mysterious File Delay
by Bill Alderson and J. Scott Haugdahl
Q: Our corporate network consists of several regional campus networks interconnected via leased lines nationwide. The campus networks consist of multiple Ethernet and Token-Ring local area networks interconnected with bridges and routers. A second tier of routers is used to connect our campus networks to the wide area network. While most of our day-to-day traffic is constrained to a campus area, we need to access remote sites for e-mail transfer, remote host access and remote file transfer.
Remote file transfer is giving us consternation. Whenever a user copies a file from a remote file server, it always takes a very long time, considering that the files are typically only 10 KB to 60 KB. For example, across a 256-Kbps WAN link, it usually takes 30 to 45 seconds to transfer one of these small files.
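A quick back-of-the-envelope check shows why 30 to 45 seconds is suspicious. This minimal sketch computes only the raw time needed to clock each file onto the link, assuming the full 256 Kbps is available and ignoring protocol headers, acknowledgments and round-trip latency (all simplifications of ours). The function name is hypothetical.

```python
LINK_BPS = 256_000  # 256-Kbps leased line, as described in the question


def ideal_transfer_seconds(size_kb: int, link_bps: int = LINK_BPS) -> float:
    """Serialization time for size_kb kilobytes, ignoring all protocol overhead."""
    bits = size_kb * 1024 * 8
    return bits / link_bps


# Smallest and largest typical file sizes from the question.
for kb in (10, 60):
    print(f"{kb} KB: {ideal_transfer_seconds(kb):.2f} s")
```

Even the 60-KB file needs only about 2 seconds of raw link time, so overhead alone is unlikely to stretch a transfer to half a minute or more.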
We can attach to servers, do directory listings, perform drive mappings and so on, with good response time. Our remote file transfers are automated via a user-run batch file, which attaches to the remote server, transfers the file, then logs out. We investigated further and discovered another interesting quirk: If we transfer the same file a second time before we log out, the second transfer only takes a few seconds! To add insult to injury, our Ethernet users do not experience the same mysterious delay when running the same batch file. The more we investigate, the more confused we become. Help!
Scott:
Hey Bill, I've got the solution already!
Bill:
What's that?
Scott:
Since Ethernet works, just rip out the thousand or so Token-Ring nodes and replace 'em with Ethernet!
Bill:
Yeah, right! Come to think of it, I never saw a networking problem money couldn't solve!
Scott:
Seriously, we grabbed our analysis bag of tricks, hopped a plane and paid the customer a visit.
Bill:
We picked a site that was having this delay problem and started by examining the network documentation.
Scott:
Nothing looked unusual, so we attached our analyzer to the local ring, set a filter on traffic to and from the workstation, and captured packets while the user ran the batch file. Sure enough, the procedure took more than half a minute to complete.
Bill:
Our analysis of the captured traffic indicated a normal NetWare connection sequence, with the workstation connecting to a NetWare 3.12 server and successfully negotiating a packet size of 4172 bytes.
Scott:
So far, so good. The login script also executed just fine.
Bill:
Now for the interesting part. We'd reached the point where the batch file at the workstation was about to invoke a DOS file copy from the remote server.
Scott:
The file server returned an "OK" response to the file open request, and then the workstation asked to read the first 4110 bytes of the file.
Bill:
But it received no reply from the server.
Scott:
At 2.75 seconds, we had a transport retransmission as the NetWare Core Protocol (NCP) asked for the same 4110 bytes a second time.
Bill:
Then there was another 2.75-second wait and a third attempt.
At this point, NCP gave up, assumed that the route to the remote network had been lost or changed, and issued an IPX RIP request to find the network associated with the remote server.
Scott:
A local router responded immediately, and NCP tried the read again, but this time with a request for 4096 bytes.
Bill:
This failed, and we then saw successive attempts at 3584, 3072, 2560, 2048, 1536 and 1024 bytes, at which point the remote server finally responded (see figure).
Scott:
The retry packet sizes may seem a bit strange at first, but if you start with 4096, you'll see that the NCP at the workstation was dropping the size by 512 bytes for each attempt.
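The shrinking read sizes follow a simple schedule. This minimal sketch reproduces it; it is our illustration, not Novell's NETX code. The function name and the 512-byte floor are assumptions; in the trace the sequence stopped at 1024 only because that read succeeded.

```python
def netx_retry_sizes(start: int = 4096, step: int = 512, floor: int = 512) -> list[int]:
    """Read sizes tried after the RIP request, shrinking by `step` bytes each time.

    `floor` is a hypothetical lower bound; the trace never went below 1024.
    """
    return list(range(start, floor - 1, -step))


print(netx_retry_sizes())
```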
Bill:
Not to mention that it waited 2.75 seconds between each attempt.
Scott:
Thus, the total delay from the first attempt to the successful one was 24 seconds. From that point on, the workstation was smart enough to use the 1-KB maximum packet size for all subsequent remote file operations.
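The delay follows directly from counting timeouts. This sketch models the behavior described above: three tries at the original 4110-byte read, then reads shrinking by 512 bytes from 4096, with a 2.75-second wait charged for every read that gets no reply. The path limit passed in is hypothetical; any value from 1024 up to 1535 reproduces the trace, since 1024 succeeded and 1536 failed. The RIP exchange is treated as instantaneous because the local router answered immediately.

```python
TIMEOUT_S = 2.75  # observed wait before each retry


def netx_delay(largest_read_that_fits: int) -> tuple[int, float]:
    """Return (timed-out reads, total delay in seconds) under the NETX retry pattern.

    A read "fits" if its size is <= largest_read_that_fits (a hypothetical path limit).
    """
    timeouts = 0
    for size in [4110] * 3 + list(range(4096, 511, -512)):
        if size <= largest_read_that_fits:
            break  # the server's reply got through; stop counting
        timeouts += 1
    return timeouts, timeouts * TIMEOUT_S


print(netx_delay(1400))
```

Nine timeouts at 2.75 seconds each come to 24.75 seconds, in line with the roughly 24-second delay seen in the trace.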
Bill:
Based on the evidence collected with the failed packet sizes, it looked as if the maximum transmission unit (MTU) was set somewhere between 1024 and 1536 bytes at one of the router ports in the path between the workstation and the remote server.
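The reasoning here is a bracketing argument: the largest read that succeeded and the smallest read that failed bound the limit. This sketch expresses that with a hypothetical helper fed the last few attempts from the trace. Note that it bounds the read size; the actual frame limit at the router is somewhat larger once IPX and NCP headers are added to the data.

```python
def bracket_limit(outcomes: list[tuple[int, bool]]) -> tuple[int, int]:
    """From (size, succeeded) observations, return (largest success, smallest failure)."""
    ok = [size for size, succeeded in outcomes if succeeded]
    failed = [size for size, succeeded in outcomes if not succeeded]
    return max(ok), min(failed)


# Last few read attempts from the trace.
observed = [(2048, False), (1536, False), (1024, True)]
print(bracket_limit(observed))
```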
Scott:
Recently in this column, we saw how a router and a workstation handled an MTU that was too large in an IP network.
Bill:
But this time we were dealing with IPX, which has no fragmentation option, so we had to come up with an alternative solution.
Scott:
Increasing the MTU at the tier-two WAN router port was ruled out, since we really didn't want to send 4-KB packets over a 256-Kbps WAN link.
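One plausible way to see the cost of large packets on a slow link, offered as our illustration rather than a rationale stated in the column, is serialization delay: every 4-KB packet occupies the 256-Kbps line four times as long as a 1-KB packet, and any interactive traffic queued behind it must wait. The sketch ignores framing overhead.

```python
LINK_BPS = 256_000  # 256-Kbps WAN link


def serialization_ms(packet_bytes: int, link_bps: int = LINK_BPS) -> float:
    """Milliseconds needed to clock one packet onto the link (no framing overhead)."""
    return packet_bytes * 8 / link_bps * 1000


for size in (4096, 1024):
    print(f"{size} bytes: {serialization_ms(size):.1f} ms")
```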
Bill:
Another possibility was to decrease the maximum packet size when the workstation loads the NetWare driver. This solution was undesirable, however, since it would keep Token-Ring users from continuing to use 4-KB frames locally.
Scott:
We recognized the packet retry behavior as characteristic of the NETX shell, so we upgraded the workstation to VLMs and reanalyzed.
Bill:
This time, the NCP at the workstation used an entirely different algorithm to determine the internet packet size.
Scott:
Immediately after negotiating a packet size of 4174 bytes with the remote server, NCP followed up with a create connection command that carried Large Internet Packet (LIP) echo data, padded out to 4174 bytes of IPX data.
Bill:
When that failed, the echo test tried the old IPX minimum internetwork MTU of 576 bytes.
Scott:
This worked, as evidenced by the server echoing the command back to the workstation.
Bill:
The workstation then went through a succession of LIP echo tests and settled on an IPX MTU of 1478 bytes, a process that took less than eight seconds.
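The first two probes follow the column: try the full negotiated size, then fall back to the 576-byte minimum. The column does not say how the VLM chose the succession of probe sizes after that, so this sketch swaps in a plain binary search between the known-good and known-bad sizes; it is an assumption, not Novell's documented algorithm. The `echo_ok` callback is a hypothetical stand-in for sending a padded LIP echo and waiting for the server to return it.

```python
from typing import Callable


def probe_path_size(echo_ok: Callable[[int], bool], high: int = 4174, low: int = 576) -> int:
    """Return the largest IPX packet size that survives an echo round trip."""
    if echo_ok(high):
        return high  # the negotiated size already works end to end
    if not echo_ok(low):
        raise RuntimeError("even minimum-size packets fail")
    good, bad = low, high  # invariant: `good` passes, `bad` fails
    while bad - good > 1:
        mid = (good + bad) // 2
        if echo_ok(mid):
            good = mid
        else:
            bad = mid
    return good


# Hypothetical path that drops anything larger than 1478 bytes of IPX data.
print(probe_path_size(lambda size: size <= 1478))
```

Because each probe halves the remaining range, the search needs only about a dozen echoes to pin down the exact limit, in contrast to NETX's fixed 512-byte steps and 2.75-second timeouts.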
Scott:
This made the workstation's Token-Ring frame size a total of 1515 bytes, including the CRC.
Bill:
Since this MTU worked, and our Ethernet counterparts were using the maximum Ethernet frame size of 1518 bytes with no problem, we concluded that a router port somewhere on the WAN side was likely set to an Ethernet-sized MTU.
Scott:
That explained why the small request packet, which asked to read 4 KB, got to the server just fine, while the server's 4 KB of return data was dropped.
Bill:
So we needed the workstation to adjust its MTU, and the VLM solution not only determined the MTU more quickly than NETX did, but also arrived at a size closer to the optimum.
Scott:
Quite a reasonable solution and another mystery solved.
Bill and Scott are principals of Pine Mountain Group. They can be reached at otw@pmg.com. Portions of the actual trace files from selected columns are available via Pine Mountain Group's Home Page (http://www.pmg.com).
October 1, 1995