One of our Token-Rings is giving us a great deal of grief.
We're experiencing a much higher level of burst and line errors than we
see in our other rings, and even occasional beaconing. The moment we think
we've solved the problem, it pops up again somewhere else on the ring. It
reminds us of that arcade game called "Whack-a-Mole," where you
wham one mole only to have another pop up in a different hole.
The problem in our ring seems to be isolated to a group of network interface
cards (NICs) that persistently report problems. Having our outsourcer replace
the cards hasn't helped at all.
Bill:
Come to think of it, troubleshooting a Token-Ring is a lot
like playing "Whack-a-Mole."
Scott:
Gosh, to think that there's a mole lurking in every Token-Ring.
Bill:
Actually, Token-Ring moles are quite useful: they represent
the stations that report ring problems and include the address of their
upstream neighbor in the report.
Scott:
These soft error reports help to identify physical and access
problems in a ring. The reports are part of the two dozen or so Media Access
Control (MAC) frames that help to keep the Token-Ring operational.
Bill:
In other words, the MAC frames are responsible for the housekeeping
of a Token-Ring.
Scott:
Precisely. Other important MAC frames include the Active Monitor
Present (AMP) frame that is followed by Standby Monitor Present (SMP) frames,
a process that occurs every seven seconds on a normally functioning ring.
Bill:
These frames are part of the ring poll process that allows
every station to learn who their upstream neighbors are. The ring poll also
shows the insertion order of all the stations in the ring, allowing for
the drawing of those fancy "ring maps" on your management console.
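As a rough illustration of how a ring map falls out of the ring poll, the sketch below rebuilds the insertion order from each station's reported upstream neighbor. The station names, the dictionary layout, and the `ring_order` helper are all hypothetical, not something an analyzer or management console exports in this form.

```python
def ring_order(upstream_of, active_monitor):
    """Return stations in downstream order, starting at the Active Monitor.

    upstream_of maps each station to the upstream neighbor it learned
    during the ring poll (hypothetical data layout).
    """
    # Invert the mapping: upstream neighbor -> station just downstream of it.
    downstream_of = {up: station for station, up in upstream_of.items()}

    order = [active_monitor]
    current = active_monitor
    while True:
        nxt = downstream_of.get(current)
        # Stop when the ring closes, the chain breaks, or a loop appears.
        if nxt is None or nxt in order:
            break
        order.append(nxt)
        current = nxt
    return order


# Hypothetical four-station ring: A's upstream neighbor is D, B's is A, etc.
naun = {"A": "D", "B": "A", "C": "B", "D": "C"}
print(ring_order(naun, "A"))
```

An order that comes back shorter than the station list would suggest that the captured ring poll was incomplete.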
Scott:
In the event that an Active Monitor has to purge the ring
and start a new Active Monitor present/ring poll sequence, the seven-second
timing is disrupted.
Bill:
This knowledge of ring poll timing was helpful in performing
a white-glove test on the Token-Ring by capturing MAC frames on our analyzer
and checking to make sure that the polling process occurs every seven seconds.
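The white-glove test can be sketched as a simple check on AMP frame timestamps taken from a capture. The seven-second interval comes from the discussion above; the half-second tolerance, the sample timestamps, and the `disrupted_polls` helper are assumptions made for illustration.

```python
RING_POLL_INTERVAL = 7.0  # seconds between AMP frames on a healthy ring
TOLERANCE = 0.5           # assumed slack for capture timing jitter


def disrupted_polls(amp_times, interval=RING_POLL_INTERVAL, tolerance=TOLERANCE):
    """Return (previous, current, gap) for every AMP gap that is off-interval."""
    disruptions = []
    for prev, cur in zip(amp_times, amp_times[1:]):
        gap = cur - prev
        if abs(gap - interval) > tolerance:
            disruptions.append((prev, cur, gap))
    return disruptions


# Hypothetical AMP arrival times in seconds from the start of the trace.
amp_times = [0.0, 7.0, 14.0, 16.5, 23.5, 30.5]
for prev, cur, gap in disrupted_polls(amp_times):
    print(f"ring poll restarted at {cur:.1f}s (gap of {gap:.1f}s after {prev:.1f}s)")
```

Each flagged gap marks a spot in the trace worth inspecting for a ring purge or a change of Active Monitor.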
Scott:
If the process doesn't occur every seven seconds, we know
that the Active Monitor has purged the ring and has restarted the ring poll
process (or, for that matter, there could be a new Active Monitor altogether).
Two seconds later, a MAC frame comes from the Active Monitor, containing
more detail about the error that caused the Active Monitor to issue a ring
purge.
Bill:
For example, a ring purge will occur if the Active Monitor
doesn't see the token passing around the ring (due to a lost token or a
lost frame) or if a sender has failed to strip a frame (the frame circulates
the ring more than once).
Scott:
Token passing usually stops due to errors on the ring. The
most common errors reported in Token-Ring are burst errors, a brief absence
of signaling usually caused by a station inserting into the ring or de-inserting
from the ring.
Bill:
This brief signal loss is due to a mechanical relay in the
hub that either brings a station's wiring into the ring path or removes
a station's wiring from the ring path.
Scott:
Other common errors that are detected in a ring station include
line errors (a frame does not pass the CRC check as the frame is repeated
by that station), or a longer-term signal loss (this leads to a beaconing
process).
Bill:
Two seconds after the error is detected, the station will send
out a soft error report MAC frame which includes the type of error and the
upstream DLC address.
Scott:
The theory behind these soft error reports is that you will
focus your attention on components in the "fault domain" between
the station that reports the error and its upstream neighbor.
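One way to put the fault domain idea to work is to count soft error reports per upstream/reporter pair and see which domain keeps coming up. The record fields (`reporter`, `upstream`, `error`), the station names, and the `tally_fault_domains` helper below are hypothetical; a real analyzer export would need to be mapped into this shape first.

```python
from collections import Counter


def tally_fault_domains(reports):
    """Count soft error reports per fault domain (upstream, reporter)."""
    counts = Counter()
    for report in reports:
        counts[(report["upstream"], report["reporter"])] += 1
    return counts


# Hypothetical soft error reports decoded from a capture.
reports = [
    {"reporter": "STN-12", "upstream": "STN-11", "error": "line"},
    {"reporter": "STN-12", "upstream": "STN-11", "error": "burst"},
    {"reporter": "STN-30", "upstream": "STN-29", "error": "burst"},
    {"reporter": "STN-12", "upstream": "STN-11", "error": "line"},
]

# Most frequently reported fault domains first.
for (upstream, reporter), count in tally_fault_domains(reports).most_common():
    print(f"{upstream} -> {reporter}: {count} reports")
```

The domain at the top of the list is where the cabling, hub ports, and adapters between the two stations deserve the first look.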
Bill:
If we see constant interruptions in the seven-second ring poll
process, we know that there are intermittent problems that need to be examined
more closely.
Scott:
Meanwhile, on our customer's network, we were seeing two nodes
that were indicating persistent line and burst errors, but only one node
at a time.
Bill:
Several approaches could help lead us to the source of the
problem. We could...
Scott:
Use an analyzer to track the fault domains reported in the error
reports or...
Bill:
Take a look at the manageable hub error statistics or...
Scott:
Analyze ring order for upstream sources of problems or...
Bill:
Compare port locations to the errors reported or...
Scott:
Compare the last active station in the hub to the first station
in the next hub or...
Bill:
Punt: buy everything new and start over.
Scott:
In this particular case, the ring with the problems consisted
of several concentrators spanning several floors in a high-rise building.
Bill:
Ah, it's the ol' "spanning ring in the skyscraper"
problem. Now we knew what we were up against.
Scott:
Locating the ports in which the errors were habitual revealed
that they were the first active nodes on a certain concentrator.
Bill:
We suspected some component between the Ring In port on that
station's concentrator and the Ring Out port of the upstream concentrator.
The ports were connected via fiber. By removing the fiber at one end, we
activated the backup path of the Token-Ring and...
Scott:
All errors disappeared immediately and, subsequently, all
users were able to operate normally.
Bill:
Watching errors on the concentrators from an SNMP-based console
on a regular basis will help determine if more granular analysis with a
portable analyzer is required.
Scott:
Of course the cause of the defective fiber link should be
determined and fixed as soon as possible, since the backup path should only
be used until a solution is found.
Bill:
Otherwise any other serious problems occurring during backup
operation will result in an inoperable ring.
Scott:
Do you think we should recommend that our Token-Ring readers
install "Whack-a-Mole" to better their troubleshooting skills?
Bill:
It would certainly help network troubleshooters vent their
frustrations, but I'd do a ring order inventory as shown by the ring poll
process, and compare it to the physical hub layout. This way, the visualization
causes the problem to pop up and "get whacked."
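A ring order inventory can be compared to the hub layout with very little code. The sketch below walks the ring order and lists every place where the ring crosses from one concentrator to the next, which is where a Ring Out to Ring In link sits. The station names, hub names, and the `hub_crossings` helper are hypothetical.

```python
def hub_crossings(ring_order, hub_of):
    """Return (upstream_station, upstream_hub, station, hub) at each hub boundary.

    ring_order lists stations in ring poll order; hub_of maps each station
    to the concentrator it is plugged into (hypothetical inventory data).
    """
    crossings = []
    for i, station in enumerate(ring_order):
        # Index -1 wraps to the last station, which closes the ring.
        upstream = ring_order[i - 1]
        if hub_of[upstream] != hub_of[station]:
            crossings.append((upstream, hub_of[upstream], station, hub_of[station]))
    return crossings


# Hypothetical inventory: five stations spread over three concentrators.
order = ["A", "B", "C", "D", "E"]
hub_of = {"A": "hub1", "B": "hub1", "C": "hub2", "D": "hub2", "E": "hub3"}

for upstream, up_hub, station, hub in hub_crossings(order, hub_of):
    print(f"{station} is the first active station on {hub}; "
          f"its upstream neighbor {upstream} is on {up_hub}")
```

If the stations that keep reporting errors all show up as the first active station on a concentrator, the link into that concentrator's Ring In port is a natural suspect.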
Bill and Scott are principals of Pine Mountain Group. They can be reached
at otw@pmg.com. Portions of the actual trace files from selected columns
are available via Pine Mountain Group's Home Page (http://www.pmg.com).