
Data Comes To An Abrupt FINish
Bill:
Analysis of the failure clearly showed that the gateway was sending a TCP FIN (setting the "no more data from sender" bit) to disconnect the user's TCP session, even though the user never requested a disconnect from the host.
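For readers following along with their own traces, here is a minimal sketch of how the FIN bit can be read from a raw TCP header. The flag positions follow the standard TCP header layout; the helper names `tcp_flags` and `build_header` are hypothetical, chosen for illustration, and are not taken from any analyzer we used.

```python
import struct

# Flag bits in byte 13 of the TCP header (standard layout).
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def tcp_flags(header: bytes) -> set:
    """Return the names of the flags set in a raw TCP header."""
    if len(header) < 20:
        raise ValueError("TCP header must be at least 20 bytes")
    flags_byte = header[13]
    return {name for name, bit in TCP_FLAGS.items() if flags_byte & bit}


def build_header(src_port: int, dst_port: int, flags: int) -> bytes:
    """Build a bare 20-byte TCP header for testing (hypothetical helper).

    Sequence number, acknowledgment number, checksum, and urgent pointer
    are zeroed; the window size is an arbitrary placeholder.
    """
    data_offset = 5 << 4  # header length of 5 32-bit words, no options
    return struct.pack("!HHIIBBHHH", src_port, dst_port,
                       0, 0, data_offset, flags, 8192, 0, 0)


if __name__ == "__main__":
    # A segment with FIN and ACK set, as a closing side would typically send.
    segment = build_header(23, 1025, TCP_FLAGS["FIN"] | TCP_FLAGS["ACK"])
    print(sorted(tcp_flags(segment)))
```

A segment with FIN set means only that the sender has no more data to send; the other side can keep sending until it sends its own FIN.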
Scott:
Upon receiving the TCP FIN, the application would also send a TCP FIN back to the host, completing the close of both sides of the TCP connection. The application would then immediately close itself under Windows, since it was set up to do this automatically.
Bill:
Users would see a flash as the application disappeared, which is why they were characterizing the problem as being kicked out of their host session.
Scott:
During one of the subsequent failures, more than one user reported being disconnected. This was evident from the multiple TCP FINs sent from the router to many, but not all, of the terminal emulation users.
Bill:
Having captured the failed session with both the Ethernet and FDDI analyzers, we were able to examine the TCP/IP and LAT traffic side-by-side, enabling us to pinpoint the problem.
Scott:
The Alpha was sending a LAT reject packet to the gateway, citing "illegal message or slot format received" inside the packet. The router/gateway would subsequently turn around and disconnect more than one user's TCP session, since multiple slots--user sessions, for example--are carried over one LAT virtual terminal circuit.
Bill:
Analysis of the LAT traffic showed multiple virtual circuits in use; thus, only users associated with the virtual circuit of the "illegal" message format were dropped.
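The relationship between circuits and dropped users can be pictured with a toy model. This is only a sketch of the bookkeeping, assuming each virtual circuit carries a set of user sessions; `GatewayModel` and its methods are hypothetical names and do not reflect how the router vendor actually implements LAT translation.

```python
from collections import defaultdict


class GatewayModel:
    """Toy model: each LAT virtual circuit carries several user sessions."""

    def __init__(self):
        # circuit id -> set of users whose sessions ride on that circuit
        self.circuits = defaultdict(set)

    def open_session(self, circuit_id, user):
        self.circuits[circuit_id].add(user)

    def circuit_rejected(self, circuit_id):
        """Tear down one circuit and return the users who would be dropped."""
        return sorted(self.circuits.pop(circuit_id, set()))

    def active_users(self):
        return sorted(u for users in self.circuits.values() for u in users)


if __name__ == "__main__":
    gw = GatewayModel()
    gw.open_session(1, "alice")
    gw.open_session(1, "bob")
    gw.open_session(2, "carol")
    print("dropped:", gw.circuit_rejected(1))
    print("still connected:", gw.active_users())
```

In this model, rejecting one circuit drops every session on it and leaves sessions on other circuits alone, which matches the "many, but not all" pattern in the trace.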
Scott:
It then dawned on our customer that dropped sessions were often reported by more than one user at around the same time.
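A pattern like this can be checked against a help-desk log or a list of FIN timestamps. The sketch below groups disconnect events that fall within a short window of each other; the function name `find_bursts`, the two-second window, and the sample data are all assumptions made for illustration, not values from the customer's network.

```python
def find_bursts(events, window=2.0):
    """Group (timestamp, user) events into bursts of near-simultaneous drops.

    A burst starts at its first event and includes every later event within
    `window` seconds of that first event. Only bursts with more than one
    event are returned.
    """
    bursts = []
    current = []
    for ts, user in sorted(events):
        if current and ts - current[0][0] > window:
            if len(current) > 1:
                bursts.append(current)
            current = []
        current.append((ts, user))
    if len(current) > 1:
        bursts.append(current)
    return bursts


if __name__ == "__main__":
    # Made-up timestamps in seconds, for illustration only.
    sample = [(100.0, "a"), (100.4, "b"), (250.0, "c"),
              (400.0, "d"), (401.0, "e")]
    for burst in find_bursts(sample):
        print([user for _, user in burst])
```

Isolated disconnects are filtered out, so what remains are the moments when several users were dropped together.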
Bill:
Just prior to the disconnect LAT packet from the Alpha, the gateway had sent a LAT packet containing a reject slot. The Alpha was rejecting the message format or slot of this packet.
Scott:
A couple of possible solutions to this problem include sending a trace to the router vendor and getting the company to figure out the "silver bullet" (perhaps with Digital's help) and issue a fix, or bypassing the gateway altogether by using VT 320 emulation software over telnet directly to the Alpha.
Bill:
Luckily, our customer was in the process of setting up users to use telnet directly to the Alpha via a new application that would provide the same functionality as the legacy LAT application.
Bill:
The upgrade project was delayed, however, as problems with the new application were being reported.
Scott:
Since the problem we diagnosed was solely related to TCP/IP to LAT translation, a prescription was written to upgrade the majority of the users to the new application with a follow-up to watch their connections very carefully.
Bill:
The upgrade was performed overnight...
Scott:
...and not a single disconnect was observed or reported the following day.
Bill:
Patient cured.
Scott:
Meanwhile, about that silver bullet. Oh well, maybe later.
Bill and Scott can be reached at otw@pmg.com. Portions of trace files from selected columns are available via Pine Mountain Group's Home Page (www.pmg.com).