Network Computing


WORKSHOPS: OPERATING SYSTEMS

Consolidating Your File Servers: A Look Back

by Shane R. Yamkowy

Reliable. Manageable. Economical. Our goals at AMOCO Canada aren't any different from yours. As our budgets dwindled, we turned to one of the proven ways to realize our networking goals: centralize hardware resources and enforce standards.

The majority of our applications were already on NetWare file servers, so upgrading and consolidating was a logical first step. We took this step 18 months ago (see "Consolidate Your NetWare File Servers," July 1, 1994). Did it work? What did we learn? Have we created a single point of failure?

That Was Then

Before consolidation, we had eight file servers. Data and resource sharing across groups was virtually nonexistent. We consolidated those eight servers running NetWare 3.11 onto one superserver running NetWare 4.01.

Going into the consolidation, we knew we would have to upgrade our network backbone to provide enough bandwidth for more than 300 users on one server, so we installed two Kalpana EPS-1500 EtherSwitches. We initially wired between 48 and 72 connections per 10BASE-T EtherSwitch port, but we found that spreading out the users improved performance. As a result, we cut the load to between 24 and 48 connections per port, targeting 24. While many switch vendors suggest 12 connections per switch port as "optimal," our utilization figures show that 24 connections per port works fine and even 36 can be acceptable, depending on usage patterns.
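To put the per-port targets in perspective, the arithmetic below estimates how many switch ports a given connection count needs at each target. It is a rough sketch that assumes connections spread evenly across ports; it borrows the 335-connection daily average reported later in this article and ignores how many ports each EPS-1500 actually provides.

```python
def ports_needed(connections: int, per_port: int) -> int:
    """Switch ports needed so that no port carries more than per_port connections."""
    if per_port <= 0:
        raise ValueError("per_port must be positive")
    return -(-connections // per_port)  # integer ceiling division


# 335 is the average daily connection count quoted later in the article.
for target in (12, 24, 36):
    print(f"{target} connections per port -> {ports_needed(335, target)} ports")
```

At the vendors' 12-per-port figure the same user base needs twice the ports of the 24-per-port target, which is the cost trade-off behind settling on 24.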

Did It Work?

Yes. With one superserver, we can concentrate our efforts on bulletproofing one configuration. The data we have captured since consolidating indicates that during prime-time hours (6 a.m. to 6 p.m.), uptime was 99.6 percent for the first year, and it has held at that level through the first six months of the second year. That is an increase of approximately 10 percent to 15 percent over our pre-consolidation record. In addition, we have not had a hardware failure since the superserver was installed, approximately 18 months ago, or a server abend in the last 14 months. I believe this is because we can focus on fine-tuning one set of hardware, primarily Compaq equipment; the only other vendor in our server is Seagate, which supplies our disk drives.
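An uptime percentage is easier to judge when converted into hours of outage. The sketch below does that for a 12-hour prime-time window; counting that window on all 365 days of the year is an assumption (a weekday-only calendar would shrink the totals), and the 99.9 percent figure is the goal mentioned near the end of this article.

```python
def downtime_hours(availability_pct: float, hours_per_day: float, days: int) -> float:
    """Hours of unavailability implied by an availability percentage over a window."""
    window_hours = hours_per_day * days
    return window_hours * (100.0 - availability_pct) / 100.0


# Prime time is 6 a.m. to 6 p.m. (12 hours); the 365-day calendar is an assumption.
for pct in (99.6, 99.9):
    hours = downtime_hours(pct, 12, 365)
    print(f"{pct}% uptime -> {hours:.1f} prime-time hours down per year")
```

Under these assumptions, 99.6 percent works out to roughly 17.5 prime-time hours of outage a year, against roughly 4.4 hours at 99.9 percent.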

For spare parts, we pooled our resources with other AMOCO Canada departments that also run Compaq ProLiants, and we all shared in the purchase of a spare parts/test server.

New Beginnings

With the much-improved graphical interface of the NetWare 4.01 administration tools, particularly NWADMIN, we found standards much easier to administer. Since we were making such big changes across our whole network, we also took the opportunity to tighten our PC, application, drive mapping and printing standards.

We average 335 connections per day on our server and believe that could be increased to more than 500 connections with hardware improvements, such as adding memory and cached SCSI controllers to increase hard drive performance. For increasing overall network throughput, multiple 10BASE-T cards or a couple of 100BASE-TX cards with a switch behind them certainly helps. Standard EISA-based servers can handle more than 200 connections, but adding SCSI controllers and high-speed NICs can put an unnecessary strain on your CPU and EISA bus resources. Smart Ethernet cards and SCSI adapters can help relieve CPU utilization, and PCI-based adapters allow you to add more NICs and SCSI adapters in your server because of their higher throughput and intelligence.

The high-speed EISA bus in our superserver has more capacity than we currently use. The last time we checked EISA bus utilization with Compaq's Insight Manager, we were only at 15 percent to 20 percent utilization at peak times (see figure "Current File Server Configuration").

Lessons Learned

We had some abends in the first six weeks, but since they were resolved, there has not been a prime-time abend. The abends were the result of NetWare operating system bugs we could not have found easily by testing, since they were load related. (The bugs have all since been corrected by patches.)

After moving our users onto our superserver, we began to install server-based management and data collection tools. However, the load of more than 300 users and 28 GB of files, combined with an operating system bug, drove CPU utilization so high that some users were taking 30 minutes to log in. Novell has since released patches that correct the bug, but they were not available at that time.

We promptly removed all server-based tools and considered alternative strategies. We could add a small NLM-server, use PC-based, rather than network-based, tools, or use some of our network-based tools on smaller, more controllable, pieces of our network instead of executing them on the entire server at the same time.

Virus protection was an obvious tool to off-load to the PCs. We altered our system login script to check that antivirus software has run on the PC before allowing the user to log into the server. In addition, we now perform software distribution and inventory in four sweeps on separate days: users A-J, K-N, O-S and T-Z. That left BOOTP and TCP/IP Unix-to-NetWare printing services; for those services, we built a dedicated NLM server.
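The four alphabetical sweeps amount to a simple partition of the user list. The sketch below shows one way to assign users to sweeps; keying on the first letter of the login name is an assumption (the article does not say which name is used), and the sample names are hypothetical.

```python
from collections import defaultdict

# Letter ranges for the four sweeps described in the article.
SWEEPS = [("A", "J"), ("K", "N"), ("O", "S"), ("T", "Z")]


def sweep_for(login_name: str) -> int:
    """Return the 1-based sweep number for a login name, keyed on its first letter."""
    first = login_name.strip()[:1].upper()
    for number, (low, high) in enumerate(SWEEPS, start=1):
        if first and low <= first <= high:
            return number
    raise ValueError(f"no sweep covers {login_name!r}")


def plan_sweeps(login_names):
    """Group login names by sweep number."""
    plan = defaultdict(list)
    for name in login_names:
        plan[sweep_for(name)].append(name)
    return dict(plan)


# Hypothetical login names, for illustration only.
print(plan_sweeps(["adams", "kowalski", "smith", "young"]))
```

Names that start with a digit or a character outside A-Z raise an error rather than silently landing in no sweep, so gaps in the plan surface early.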

Today, we run only the base NetWare 4.01 operating system with patches on our superserver, and most NLMs run on an NLM server. Since we off-loaded the NLMs from the main server, CPU utilization has been well within acceptable limits, and the server has been stable while providing consistent and acceptable response time to users. Our average throughput from PC to file server is 400 to 450 Kbps (without using VLMs and Novell's Packet Burst technology).

We are not using VLMs because, when we implemented our NetWare 4.01 server, some of our existing critical applications would not run with the VLMs and the new Windows network drivers. There were some conflicts, and our routers did not support 802.2 frame types, so we could not take full advantage of the new VLM features. Since then, we have not had time to test our core set of critical applications, and we understand the 32-bit client software for Windows will be released soon, so we decided to wait for it while we upgrade our Cisco routers' microcode.

After the first few abends, we decided we needed a dedicated test server and a small test lab. We also purchased a small hub and wired a router port into our test lab, so we can easily connect the lab to, or disconnect it from, the WAN. Now, we test all new *.LAN and *.DSK drivers, operating system patches and upgrades, NLMs, and hardware and microcode upgrades before introducing them to our production system.

We especially watch for NetWare 3.x NLMs certified for 4.x. A few of these NLMs have abended our test server, and we suspect them of causing one of our superserver abends. But as NetWare 4.x becomes more widely used, we are noticing that most NLMs are being rewritten specifically for 4.x.

Focus on Backups

To manage our backups and restores better, we try to separate our NetWare volumes so that critical applications are on a volume of their own, in their own subdirectory, or cataloged as critical. Then we put group data and users' directories in separate volumes or directories. This method not only cuts the time for full daily backups, but in case of a server failure, it also lets us easily match up the critical applications with their data and restore those first.
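The "critical first" rule can be written down as a sort order for the restore queue. This is a sketch only: the volume names and sizes are hypothetical, and breaking ties by restoring smaller volumes first is an assumed policy meant to bring more services back sooner, not something the article specifies.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str        # hypothetical volume or directory name
    critical: bool   # holds a critical application or its data
    size_gb: float


def restore_order(volumes):
    """Critical volumes first; within each group, smallest first (assumed tie-break)."""
    return sorted(volumes, key=lambda v: (not v.critical, v.size_gb))


# Hypothetical layout for illustration.
layout = [
    Volume("USERS", False, 8.0),
    Volume("CRITAPPS", True, 2.0),
    Volume("GROUPS", False, 10.0),
    Volume("CRITDATA", True, 4.0),
]
print([v.name for v in restore_order(layout)])
```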

Our incremental backups run fine, but on weekends we do two full backups, one for on-site storage and one for off-site storage, to satisfy disaster recovery requirements. The full backups total approximately 60 GB, and that volume caused our two 4-mm DAT DDS-2 tape changers to fail, because we could not interrupt the backup jobs every eight hours to clean the drives. We decided to replace them with faster, more resilient tape drives: two Digital Linear Tape (DLT) changers. We also upgraded the network link between our superserver and the server running our backup software, which has the two DLTs attached to it; the two servers are now connected by 100BASE-TX. With this configuration, we run both DLTs simultaneously and achieve a throughput of 90 MB to 110 MB per minute. We hope eventually to move to a backup software package that will allow us to attach the DLTs directly to our superserver, managing the backups from our NLM server.
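Combining the 60-GB weekend load with the observed 90 to 110 MB per minute gives a feel for how long the full backups run. A rough sketch, assuming 1 GB = 1,000 MB and that the quoted rate is the combined throughput of both DLTs, sustained for the whole job:

```python
def transfer_hours(size_gb: float, mb_per_min: float, mb_per_gb: int = 1000) -> float:
    """Hours to move size_gb of data at a sustained aggregate rate of mb_per_min."""
    if mb_per_min <= 0:
        raise ValueError("rate must be positive")
    return size_gb * mb_per_gb / mb_per_min / 60.0


# Weekend full backups (~60 GB) at the observed range of DLT throughput.
for rate in (110, 90):
    print(f"60 GB at {rate} MB/min: about {transfer_hours(60, rate):.1f} hours")
```

Under those assumptions, the weekend jobs occupy roughly 9 to 11 hours of the two drives' time.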

Managing the Single Point of Failure

Centralizing resources conceptually creates a single point of failure. However, you can add redundancy solutions, reducing the single point of failure to a "minor inconvenience."

As our disk usage climbed to 28 GB, the long backup times and tape drive failures prompted our management to request a maximum five-hour recovery from a server failure. Although we are using RAID 5 and can cannibalize spare parts from our test server, we would still need to perform a complete restore from tape if we experienced a facilities disaster.

Under perfect conditions, with our two DLT changers, it would take approximately five hours to restore our superserver. Our testing has shown that the restore itself goes smoothly, but it can take up to two hours of administrative "fine-tuning" to return the server to its original state. Finding someone on call to set up, start and monitor the restoration also adds to the total restoration time.

This puts a complete recovery at around six to seven hours, beyond the five-hour maximum. Another problem is that any data written after the last backup before the failure is lost. If you have time-sensitive data such as daily transactions, this can make a lot of people unhappy.
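The recovery estimate is a sum of phases measured against management's five-hour ceiling. The sketch below recomputes it from the 28 GB of data and the observed DLT throughput, assuming (an assumption, not a measured fact) that restores run at the same 90 to 110 MB per minute as backups and that 1 GB = 1,000 MB; time spent reaching someone on call is left at zero because the article gives no figure for it.

```python
def recovery_hours(data_gb: float, mb_per_min: float, fine_tune_h: float,
                   callout_h: float = 0.0, mb_per_gb: int = 1000) -> float:
    """Restore transfer time plus manual fine-tuning plus time to reach someone on call."""
    restore_h = data_gb * mb_per_gb / mb_per_min / 60.0
    return restore_h + fine_tune_h + callout_h


TARGET_H = 5.0  # management's maximum recovery time
for rate in (110, 90):  # best and worst observed throughput, MB/min
    total = recovery_hours(28, rate, fine_tune_h=2.0)
    print(f"{rate} MB/min: {total:.1f} h total, {total - TARGET_H:+.1f} h vs target")
```

The transfer alone comes to roughly 4.2 to 5.2 hours, in line with the "approximately five hours" above; adding the fine-tuning pushes the total past the target even before anyone is called in.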

So, we decided to supplement our tape backup disaster recovery measures by investigating various disk and server mirroring solutions. This solves the problem of losing any data, removes 95 percent to 100 percent of the manual labor required to restore a server and, since most mirrored server solutions cut in automatically, you don't have to track down any support people.

We are currently considering four server/data fault tolerance alternatives: a shared hard drive farm, data synchronization via Horizon Technologies LAN Shadow, Vinca's StandBy Server32 and Novell's SFT III.

Solutions using the mainframe strategy of a shared bank of hard drives are just coming up to speed, as the manufacturers are still fine-tuning their implementations of SCSI-3 and Fibre Channel. These are still quite expensive alternatives and are slow to come to market, so check with your vendor about product availability.

Alternative Solutions to Common Problems: Printing

As mentioned earlier, we installed an NLM-only server for the BOOTP and TCP/IP (Unix)-to-NetWare printing services we removed from the superserver after the first abends. We set up Novell's LAN Workgroup for BOOTP services and its Flex/IP product to provide printing from our MVS and Unix systems to NetWare print queues. Both products were installed on a 486DX desktop PC running NetWare 3.12.

We configured all of our HP JetDirect cards in our printers with two print queues from each of the two servers to which they were attached. For TCP/IP printing, we use one set of naming standards that is consistent with our previous mainframe naming standards. For NetWare printing, we use a combination of room number and printer type.

Flex/IP is configured so that users do not need to log into the server to access the Flex/IP (or BOOTP) services. For TCP/IP printing, an LPD print request is issued and the NLM server gets the print job. For NetWare printing, the normal NPRINT or CAPTURE commands route the print job to the NetWare print queue. The naming conventions and locations of the print queues are stored in a spreadsheet on the superserver so everyone can find them easily.
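A shared lookup table like the spreadsheet described above can be generated rather than maintained by hand. In this sketch, the NetWare queue name format (room number and printer type joined by an underscore) is a hypothetical rendering of the "room number plus printer type" convention, and TCP/IP names are passed through unchanged because the mainframe-derived standard is not spelled out here; the sample printer is invented.

```python
import csv
import io


def netware_queue_name(room: str, printer_type: str) -> str:
    """Hypothetical format: room number and printer type joined by an underscore."""
    return f"{room}_{printer_type}".upper()


def printer_directory(printers) -> str:
    """Render (room, printer_type, tcpip_name) rows as CSV text for a shared spreadsheet."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["NetWare queue", "TCP/IP name", "Room", "Type"])
    for room, printer_type, tcpip_name in printers:
        writer.writerow([netware_queue_name(room, printer_type), tcpip_name, room, printer_type])
    return out.getvalue()


# Hypothetical printer record for illustration.
print(printer_directory([("1204", "LJ4", "PRT0042")]), end="")
```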

Abends

When you have a server abend, one of the best ways to find what caused it is to get a memory dump and call in experts, like Novell's Technical Support, to examine it. However, with 160 MB of RAM, dumping to floppies is not an option. So we installed a 300-MB SCSI boot drive. This provides a place to dump the memory, keep versions and copies of *.LAN and *.DSK drivers, and hold diagnostics such as VREPAIR.
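A quick calculation shows why floppies are out of the question. This assumes 1 MB = 1,024 KB for RAM and the standard 1,440-KB formatted capacity of a 1.44-MB 3.5-inch disk, and it ignores any dump-format overhead.

```python
FLOPPY_KB = 1440  # formatted capacity of a 1.44-MB 3.5-inch disk


def floppies_for(ram_mb: int) -> int:
    """Number of 1.44-MB floppies needed to hold a full dump of ram_mb of memory."""
    ram_kb = ram_mb * 1024
    return -(-ram_kb // FLOPPY_KB)  # integer ceiling division


print(floppies_for(160))
```

For 160 MB of RAM, that is 114 disks per dump, while the 300-MB boot drive holds the whole image with room to spare.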

This also addresses another concern with RAID 5: the RAID controller will not place a DOS partition in the RAID array, so you need somewhere else to start the server from, and booting a NetWare 4.x server from a floppy is pretty tough.

Server Change Procedures

After making numerous server changes for maintenance and troubleshooting, we began to develop a set of standard change management procedures.

Each time we take down our server, we try to do the following. Before the change, we try to do a full server backup, then we run the EISA configuration utility, Compaq INSPECT and Compaq Diagnostics, and save the results to floppy. Next, we save copies of the AUTOEXEC and STARTUP.NCF files to floppy, along with a list of the NLMs running on the server (gathered with the MODULES command at the server console). Finally, we run VREPAIR on all the volumes. Together, these steps give us a snapshot of the server's configuration and health. Only after this do we make our changes. Depending on the change, we repeat some of these steps afterward.
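The pre-change routine works best as a checklist that blocks the change until every item is ticked off. The sketch below is a minimal illustration, not a tool the article describes; the step wording is paraphrased from the procedure above, and treating the full backup as mandatory is stricter than the article's "we try to".

```python
# Pre-change steps, in the order described in the article.
PRE_CHANGE_STEPS = (
    "full server backup",
    "EISA configuration utility output saved to floppy",
    "Compaq INSPECT output saved to floppy",
    "Compaq Diagnostics output saved to floppy",
    "AUTOEXEC and STARTUP.NCF copied to floppy",
    "MODULES listing saved to floppy",
    "VREPAIR run on all volumes",
)


def ready_for_change(completed):
    """Return (ready, missing_steps) given the set of steps already done."""
    missing = [step for step in PRE_CHANGE_STEPS if step not in completed]
    return (len(missing) == 0, missing)


ok, missing = ready_for_change({"full server backup", "VREPAIR run on all volumes"})
print("ready" if ok else "still to do: " + "; ".join(missing))
```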

CD-ROMs

Users started coming to us asking for a CD-ROM server. Together, some of the requested applications would have required around 14 concurrent CD-ROM drives! The economics quickly told us that the solution easiest to support, and cheapest in hardware, was simply to copy the CDs to the hard drive on the server. Although prices have declined since we made our decision last year, we believe the administration, security and throughput performance have proven this to be the best solution for us.

So we added another 12-GB RAID Level 5 volume with a dedicated SCSI RAID 5 controller and started copying the CDs to the server. Some CDs forced us to re-create the install procedure or modify the install and configuration files, but so far, with some assistance from the vendors, we have gotten every CD-ROM-based application to work. With hard drive prices at approximately 40 cents per MB, it was certainly a cost-effective solution.
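At 40 cents per MB, the cost of the CD volume is simple arithmetic, with one wrinkle: RAID 5 gives up one drive's worth of capacity to parity, so the raw disk purchased exceeds the 12 GB usable. The sketch assumes 1 GB = 1,024 MB and a hypothetical seven-drive array, since the article does not give the drive count.

```python
def disk_cost_dollars(capacity_gb: float, cents_per_mb: float, mb_per_gb: int = 1024) -> float:
    """Disk purchase cost at a flat price per megabyte."""
    return capacity_gb * mb_per_gb * cents_per_mb / 100.0


def raid5_raw_gb(usable_gb: float, drives: int) -> float:
    """Raw capacity needed for a RAID 5 set: one drive's worth of space goes to parity."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return usable_gb * drives / (drives - 1)


usable = 12.0
raw = raid5_raw_gb(usable, drives=7)  # drive count is hypothetical
print(f"usable only: ${disk_cost_dollars(usable, 40):,.2f}")
print(f"with parity ({raw:.1f} GB raw): ${disk_cost_dollars(raw, 40):,.2f}")
```

Fewer, larger drives raise the parity share; more drives lower it, at the cost of more spindles that can fail.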

This solution also gave us the familiar NetWare security system for audit and security control. We can also use our existing software metering tools on the CD-ROM applications to comply with the CD license restrictions without any major changes. Best of all, the response time from the hard drive is faster than most CD-ROM server solutions. At worst, the users do not notice a difference between what used to be CD-ROM-based applications and other applications on the server.

What Next?

In the near future, we will be adding remote PC control facilities, software distribution, software metering and file server monitoring tools to our network. Again, we will review whether to put them on the superserver, run them from the PCs or put them on our NLM server.

We still have some changes pending to make our server 99.9 percent bulletproof. We will be upgrading to NetWare 4.1 to take advantage of all the patches integrated into the operating system and to use the latest NetWare Directory Services NLM (DS.NLM). The next release of DS.NLM is supposed to solve the problem of unexpectedly high CPU utilization.

Most important, if we are to increase the number of users on our superserver, we will need to increase the throughput from our server to our clients. Our preferred long-term answer is switched, full-duplex 100BASE-TX connections feeding additional 10BASE-T switching.

All I Want for Christmas

There are two things I would like to request from server hardware vendors and Novell to make network managers' lives easier. From the hardware vendors, I would like a version control and change management database facility that tracks microcode and hardware driver changes. Compaq's Insight Manager does track microcode updates, and that is a very progressive start. Well done! From Novell, I would like to see a certification and version control facility for NLMs, as well as a database that records and stores changes to NLMs.
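Until such a facility exists, even a small home-grown change log covers the basics of this wish list. The sketch below is hypothetical throughout (no vendor tool is implied): it records each NLM, driver or microcode change, answers "what version is running now?", and lists changes that skipped the test lab. The component name and versions in the usage lines are invented.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ChangeRecord:
    component: str      # NLM, *.LAN/*.DSK driver or microcode image; names are hypothetical
    old_version: str
    new_version: str
    changed_on: date
    tested_in_lab: bool
    note: str = ""


@dataclass
class ChangeLog:
    records: list = field(default_factory=list)

    def record(self, rec: ChangeRecord) -> None:
        self.records.append(rec)

    def history(self, component: str) -> list:
        """All changes to one component, oldest first."""
        return sorted((r for r in self.records if r.component == component),
                      key=lambda r: r.changed_on)

    def current_version(self, component: str):
        """Latest recorded version, or None if the component was never logged."""
        hist = self.history(component)
        return hist[-1].new_version if hist else None

    def untested(self) -> list:
        """Changes that went into production without a test-lab pass."""
        return [r for r in self.records if not r.tested_in_lab]


log = ChangeLog()
log.record(ChangeRecord("EXAMPLE.LAN", "1.0", "1.1", date(1995, 3, 1), True))
log.record(ChangeRecord("EXAMPLE.LAN", "1.1", "1.2", date(1995, 6, 1), False, "vendor hot fix"))
print(log.current_version("EXAMPLE.LAN"), [r.new_version for r in log.untested()])
```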

Shane R. Yamkowy, Bcomm, ECNE, is Finance Network Systems Engineer at AMOCO Canada. The author's opinions and views do not necessarily represent the views of AMOCO Canada. He can be reached at 75210.1563@compuserve.com.


Tips N Tricks For Troubleshooting

Baseline your system. Begin with the bare operating system and patches. Next add your users, applications and data. When you are satisfied that your base system works, begin adding the NLMs.

Proactively monitor technical support sources for updates and patches. Monitor NETWIRE on CompuServe. Look for the Top 10 Support Issues and Patches. The support forums on NETWIRE are also an excellent low-cost technical support source.

Track server statistics. Watch your server carefully and familiarize yourself with statistics during both peak and off-peak times.

Buy most of your hardware from one vendor. This will greatly reduce the chance of incompatibilities and make technical support easier. You also usually get the benefit of proprietary management tools to monitor closely the performance of the hardware.

Update your hardware drivers. A low-risk, proactive strategy is to flash the microcode and use your vendor's latest hardware drivers as soon as they are released. Instead of waiting for a problem to occur, just test and implement them with regularly scheduled maintenance.
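The "track server statistics" tip boils down to keeping separate baselines for peak and off-peak periods and flagging readings that stray from them. The sketch below assumes CPU utilization samples have already been exported as (hour of day, percent) pairs; how you collect them is outside its scope, and the 15-point tolerance and sample readings are hypothetical. Prime time uses the article's 6 a.m. to 6 p.m. window.

```python
from statistics import mean

PRIME_START, PRIME_END = 6, 18  # 6 a.m. to 6 p.m.


def is_prime(hour: int) -> bool:
    return PRIME_START <= hour < PRIME_END


def split_baseline(samples):
    """samples: iterable of (hour_of_day, cpu_percent). Returns (peak_mean, off_peak_mean)."""
    peak = [cpu for hour, cpu in samples if is_prime(hour)]
    off = [cpu for hour, cpu in samples if not is_prime(hour)]
    return (mean(peak) if peak else None, mean(off) if off else None)


def flag_outliers(samples, peak_baseline, off_baseline, tolerance=15.0):
    """Samples whose CPU exceeds the matching baseline by more than tolerance points."""
    return [(hour, cpu) for hour, cpu in samples
            if cpu - (peak_baseline if is_prime(hour) else off_baseline) > tolerance]


# Hypothetical readings for illustration.
history = [(9, 40), (14, 60), (2, 10), (22, 20)]
peak, off = split_baseline(history)
print(peak, off, flag_outliers([(10, 80), (3, 20)], peak, off))
```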

October 15, 1995






