Q: The network in our building consists of several Ethernet segments connected
via a number of NetWare servers that are also routing. Many of the segments
are also connected directly to a router for WAN access. Our Ethernet probes
are reporting an unusually high amount of traffic on some segments. We are
also receiving reports from our end users that response time varies from
day to day, yet we can see from our trend analysis that traffic levels remain
fairly constant in any given daily period. We know that our server-based
routing has reached its limit in terms of scalability, but before we do
something drastic--like totally rearchitecting our network--we need to get
a grip on our current situation. What do you suggest we do?
Scott
: Ah, what the heck, why not just throw a bunch of Ethernet
switches at the problem?
Bill
: Yeah, that's kind of like breaking out a box of Band-Aids.
Scott
: Nevertheless, the customer wisely chose to do an in-depth
protocol analysis study before rearchitecting the network based solely on
traffic statistics given to them by their RMON agents.
Bill
: A trip to our customer's site confirmed our worst nightmare--it
was an ad-hoc, grassroots network that grew out of the acquisition of PCs.
Scott
: Originally, an adapter was added to each server in the building
for each Ethernet segment, so that users could communicate directly to any
server from any segment.
Bill
: They eventually needed more than four Ethernet segments, which
was the maximum any one server could handle at the time.
Scott
: Thus, some servers actually had to route traffic on behalf
of a user communicating to a server not attached to their local segment.
Bill
: To make the situation even more interesting, a standalone router
was then added to provide access to off-site networks.
Scott
: Eventually, they hit the wall with their predominantly server/router-based
architecture, and the segments appeared to be taking on a lot of traffic.
Bill
: With no baselining or monitoring prior to adding or changing
components to the system, it was difficult to determine at what point, if
any, traffic on the segments began to increase dramatically.
Scott
: So, we began a detailed protocol analysis of the existing
system and one particular problem stood out from the others.
Bill
: We noticed this problem soon after collecting and analyzing
data from several segments and comparing the packet flow with the customer's
documentation of the network.
Scott
: In one instance, a server named Corporate5 was connected to
the same physical Ethernet segment as the users who were logged in. We noticed
that while some users were communicating directly to Corporate5, others
were communicating indirectly to Corporate5 via the router (see the figure,
"Taking the Extra Route," below).
Bill
: So the 64-bit question ends up being, "Why were the users
going through the router?"
Scott
: To answer this question, it is necessary that we understand
how Novell uses the Routing Information Protocol (RIP) in conveying IPX
network information between routers and workstations.
Bill
: In the simplest terms, during the connection sequence between
a workstation and server, the workstation asks for the "fastest"
route to the "nearest" server.
Scott
: All routers respond to the "fastest" route request,
which is a RIP broadcast from a workstation asking all routers if they know
the fastest route to a network to which it wishes to send packets.
Bill
: Since all NetWare 3.x and higher servers are always routers
(because of their use of the internal IPX network number), both servers and
routers respond to the request.
Scott
: In this case, both Corporate5 and the router responded to
the workstation's request for a route to Corporate5.
Bill
: Sometimes the router would respond before the server, causing
the workstation to pick the router for the duration of the connection to
that particular server.
Scott
: So, the workstation treats the route in the first packet returned as
the "fastest," even if subsequent packets have a lower hop count.
Bill
: Right. Since it picks the router, all of the transactions to
the server get routed through another segment, effectively doubling the
amount of traffic needed to communicate--to send traffic to and receive
traffic from the server, while adding latency and overhead to the router.
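As a rough illustration of the doubling Bill describes, the sketch below counts how many times frames appear on an Ethernet segment. The frame count and the two-segment path through the router are assumptions made for the example, not figures from the trace.

```python
def segment_crossings(frames, via_router):
    """Count how many times frames appear on an Ethernet segment.

    Direct: each frame crosses the shared segment once.
    Via router: each frame crosses the workstation's segment to reach the
    router, then a second segment to reach the server (assumed topology).
    """
    crossings_per_frame = 2 if via_router else 1
    return frames * crossings_per_frame


frames = 1000  # hypothetical number of request and reply frames in a session
direct = segment_crossings(frames, via_router=False)
routed = segment_crossings(frames, via_router=True)
print(direct, routed, routed / direct)  # 1000 2000 2.0
```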
Scott
: All in all, that's not a pretty sight. Rather than wait momentarily
for other replies and make a choice based on lowest hop count, the workstation
simply picks the first reply and goes with it.
Bill
: In this case, the first reply simply came from whichever device got
its reply out on the Ethernet first.
Scott
: Exactly. This is somewhat different from the textbook view
of RIP, in which routing decisions are based on hop count.
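The difference between the two selection rules can be sketched in a few lines. This is only an illustration: the `RipReply` type, the arrival times, and the hop counts are hypothetical values chosen to mirror the Corporate5 case, not data from the trace files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RipReply:
    responder: str     # who answered the route request
    hops: int          # hop count advertised for the requested network (assumed)
    arrival_ms: float  # hypothetical arrival time at the workstation


def pick_first_reply(replies):
    """Behavior described in the column: take whichever reply arrives first."""
    return min(replies, key=lambda r: r.arrival_ms)


def pick_lowest_hops(replies):
    """Textbook alternative: consider all replies and prefer the lowest hop count."""
    return min(replies, key=lambda r: (r.hops, r.arrival_ms))


replies = [
    RipReply("router", hops=2, arrival_ms=0.4),
    RipReply("Corporate5", hops=1, arrival_ms=0.7),
]
print(pick_first_reply(replies).responder)  # router
print(pick_lowest_hops(replies).responder)  # Corporate5
```

With these assumed numbers, the first-reply rule sends the session through the router even though the server advertises the shorter path.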
Bill
: To go a little deeper, we might ask why the router responded
to the workstation's request when the router knew that Corporate5 was on
the local segment.
Scott
: That's a very good question. We saw that the router was sending
out its RIP advertising packets every 60 seconds, with the internal IPX
number of Corporate5. Nothing wrong there.
Bill
: Now let's look at the server. The internal IPX number of Corporate5
was 700001--the IPX network number the station asked for when it wanted
to attach to the server.
Scott
: The server's internal segment was really connected to the
two external physical Ethernet segments via the internal router, as shown
in the figure. Therefore, the internal IPX number (700001) was advertised
to both segments.
Bill
: Since the router was also attached to both segments, it advertised
the IPX network 700001 it had heard from the server on segments away from
where it was heard.
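The router behavior Bill describes, advertising a network only on segments other than the one where it was learned, can be sketched as follows. The segment names `seg_a` and `seg_b` and the one-hop increment per router are simplifying assumptions for the example; they are not taken from the customer's configuration.

```python
def advertisements(routing_table, interfaces):
    """Build per-interface RIP advertisements.

    routing_table maps network -> (interface it was learned on, hop count).
    A network is advertised on every interface except the one it was
    learned on, with the hop count increased by one (assumed convention).
    """
    out = {iface: [] for iface in interfaces}
    for network, (learned_on, hops) in routing_table.items():
        for iface in interfaces:
            if iface != learned_on:
                out[iface].append((network, hops + 1))
    return out


# The router hears Corporate5's internal network 700001 on one segment...
table = {"700001": ("seg_a", 1)}
ads = advertisements(table, ["seg_a", "seg_b"])
print(ads)  # {'seg_a': [], 'seg_b': [('700001', 2)]}
```

Under these assumptions, the router becomes a second source of a route to 700001 on the far segment, which is what lets it answer a workstation's route request there.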
Scott
: We confirmed this by analyzing the router's RIP broadcasts,
and knew that the router was now capable of responding to a "find fastest
route RIP" request from a workstation.
Bill
: Which leads us back to our original problem of the workstation
picking the first reply, which may come from the router rather than the server.
Scott
: We should also note that workstations attached to physical
segment 71 will also have the same problem.
Bill
: So, ultimately, the easiest solution was to have the router
delay the response to a RIP request. If your router can't do that, then
RIP filters may be in order.
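The effect of the two remedies on a first-reply workstation can be sketched as below. The reply times, the delay value, and the function and parameter names are hypothetical; real routers expose these settings in vendor-specific ways that this example does not try to model.

```python
def first_responder(base_times_ms, router_delay_ms=0.0, router_filtered=False):
    """Return the name of the device whose reply reaches the workstation first.

    base_times_ms: hypothetical reply times in ms, keyed by responder name.
    router_delay_ms: extra delay the router waits before answering.
    router_filtered: True if a RIP filter stops the router from answering.
    """
    times = dict(base_times_ms)
    if router_filtered:
        times.pop("router", None)
    elif "router" in times:
        times["router"] += router_delay_ms
    return min(times, key=times.get)


base = {"router": 0.4, "Corporate5": 0.7}
print(first_responder(base))                        # router
print(first_responder(base, router_delay_ms=5.0))   # Corporate5
print(first_responder(base, router_filtered=True))  # Corporate5
```

Either change leaves the workstation's first-reply behavior untouched and instead makes sure the server's reply is the first one it sees.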
Bill and Scott are principals of Pine Mountain Group. They can be reached
at otw@pmg.com. Portions of the actual trace files from selected columns
are available via Pine Mountain Group's Home Page (http://www.pmg.com).