The ability to drill down to the fine details of a Web infrastructure can make all the difference between shelling out a service credit for goodwill's sake and pinpointing the problem. AT&T ENS has spent about $1 million on its monitoring architecture, with Empirix's OneSight tool at the core. The system helps AT&T ENS find the root of all kinds of hosting problems. Technicians use OneSight to check network devices, databases, URLs and Web servers.
"That extra level of service lets us expand what we do for our customers. We needed a bigger picture of every download and of all the pages in a site," Schulze says by way of example.
Aside from the routine duties of tracking HTTP/HTTPS response times, database transaction performance and Web server memory usage, the tool also lets AT&T ENS isolate less obvious problem areas, such as Web site design flaws: a missing image or a misplaced icon. Even a poorly integrated banner ad, Schulze says, can be a silent performance killer. When a banner ad takes, say, 15 seconds to download because it's coming from a third-party site, it's the AT&T ENS customer who suffers. The ENS client services team can generate a detailed report on a problem like this so the customer can fix it on its own or take it to its developer. "I can see how long it takes to connect to or download from a site, and whether the Web developer missed something," Schulze says.
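The kind of component report Schulze describes can be sketched in a few lines. The Python below is a minimal illustration, not OneSight's implementation. It assumes a monitor has already recorded connect and download times for each element of a page. The `Component` record, the `slow_components` function, the 5-second threshold and the example hostnames are all hypothetical. The code flags slow elements and notes whether each one was served from a third-party host, as in the banner-ad case.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Component:
    """One page element as a URL monitor might record it (hypothetical model)."""
    url: str
    connect_s: float   # time to open the connection, in seconds
    download_s: float  # time to transfer the body, in seconds

    @property
    def total_s(self) -> float:
        return self.connect_s + self.download_s


def slow_components(page_url, components, threshold_s=5.0):
    """Return report lines for components slower than threshold_s,
    marking those served from a host other than the page's own."""
    page_host = urlparse(page_url).hostname
    report = []
    # Slowest first, so the worst offender tops the report.
    for c in sorted(components, key=lambda c: c.total_s, reverse=True):
        if c.total_s <= threshold_s:
            continue
        origin = "third-party" if urlparse(c.url).hostname != page_host else "local"
        report.append(
            f"{c.url}: {c.total_s:.1f}s "
            f"(connect {c.connect_s:.1f}s, download {c.download_s:.1f}s, {origin})"
        )
    return report


# Example: a fast local logo and a slow banner ad from an outside ad server.
page = "http://www.example.com/index.html"
parts = [
    Component("http://www.example.com/logo.gif", 0.1, 0.3),
    Component("http://ads.example.net/banner.gif", 2.0, 13.0),
]
report = slow_components(page, parts)
for line in report:
    print(line)
```

Sorting slowest-first and tagging the origin host lets a customer see at a glance whether a delay sits on its own servers or with an outside provider.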
Schulze's team is now integrating some of its niche monitoring and management tools with OneSight for a more unified view of its customers' Web hosting systems as well as its own network devices. ENS just finished integrating OneSight with some environmental-monitoring tools that keep tabs on the air conditioning and power in its data centers, so technicians can now monitor data center conditions using the same tool.
The AT&T ENS client services team also is working to generate more detailed information about its data center routers using OneSight. It already uses OneSight to track the throughput, port status and CPU conditions on the devices and soon will integrate its now separate, homegrown router monitor that tracks the BGP (Border Gateway Protocol) status of the routers. "We want to correlate our standard port, CPU and BGP status under one view for our technicians," Schulze says. "If BGP goes down, then we can troubleshoot it more quickly with less downtime."
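A single view of port, CPU and BGP status amounts to joining three separate feeds on the router name. The sketch below assumes each feed arrives as a simple Python dictionary. The `RouterView` record, the `correlate` function, the 80 percent CPU limit and the router and port names are hypothetical and do not describe AT&T's homegrown monitor or OneSight's interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RouterView:
    """One router's port, CPU and BGP status side by side (hypothetical model)."""
    name: str
    ports_down: List[str] = field(default_factory=list)
    cpu_pct: Optional[float] = None  # None = no CPU sample collected
    bgp_up: Optional[bool] = None    # None = BGP monitor has no data

    def alerts(self, cpu_limit: float = 80.0) -> List[str]:
        out = []
        # BGP first: a dropped session is the most urgent condition here.
        if self.bgp_up is False:
            out.append("BGP down")
        if self.ports_down:
            out.append("ports down: " + ", ".join(sorted(self.ports_down)))
        if self.cpu_pct is not None and self.cpu_pct > cpu_limit:
            out.append(f"CPU {self.cpu_pct:.0f}%")
        return out


def correlate(port_status: Dict[str, Dict[str, str]],
              cpu: Dict[str, float],
              bgp: Dict[str, bool]) -> Dict[str, RouterView]:
    """Merge three separate feeds, keyed by router name, into one view per router."""
    views: Dict[str, RouterView] = {}
    for name in set(port_status) | set(cpu) | set(bgp):
        ports = port_status.get(name, {})
        views[name] = RouterView(
            name=name,
            ports_down=[p for p, state in ports.items() if state != "up"],
            cpu_pct=cpu.get(name),
            bgp_up=bgp.get(name),
        )
    return views


# Example: one router with a BGP outage, a down port and high CPU; one healthy.
views = correlate(
    {"dc1-rtr1": {"ge-0/0/0": "up", "ge-0/0/1": "down"}, "dc1-rtr2": {"ge-0/0/0": "up"}},
    {"dc1-rtr1": 91.0, "dc1-rtr2": 12.5},
    {"dc1-rtr1": False, "dc1-rtr2": True},
)
for name in sorted(views):
    print(name, views[name].alerts())
```

Keeping `None` distinct from `False` lets the view tell a router whose BGP session is down apart from one the BGP monitor simply has no data for.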
Some of the technical details AT&T ENS gleans from the monitoring tools will be made available to customers through interfaces AT&T is writing between OneSight and AT&T's internally developed Integrated Global Enterprise Management System (iGEMS). AT&T offers a portal to iGEMS customers who want to keep tabs on their managed networks (see "The Hard Sell: Home-Field Disadvantage"). Through that portal, Web hosting customers will be able to view detailed performance characteristics of servers and network devices, such as CPU utilization.
Schulze says ENS hasn't decided whether customers will see all the OneSight information AT&T engineers can see.
Growing Pains
Today there are more than 1,200 OneSight dedicated monitors spread around AT&T ENS' 16 data centers, and AT&T says it plans to add 2,500 more during the next year. The Web monitoring system runs on redundant Microsoft Windows-based Compaq DL280 application and database servers. Schulze says that's plenty of capacity for now.
AT&T found out the hard way what happens when you don't have enough capacity: When it first rolled out OneSight over a year ago, the tool would slow to a crawl and crash. "We hit the wall on the hardware," Schulze says. "We were just looking at URLs at first, so we weren't planning to do what we are doing now."
Luckily, no data was lost. Schulze and his team moved the application from a single-processor server to a dual-processor server with 2 GB of RAM and added a separate database server with dual processors and 1 GB of RAM.
Meanwhile, the monitors have reduced the labor demands on AT&T ENS: It takes only about 30 seconds to set up a new URL, a job that used to take a tier-one technician nearly an hour with AT&T's previous URL monitoring tool, Freshwater Software's SiteScope. With the older tools, AT&T technicians could only react to problems rather than predict or prevent them. It was, Schulze says, kind of like hunting at night with your sunglasses on. "We had to sit and wait for something to break."
Still to come for AT&T ENS is a data-replication feature from Empirix that will replicate all configuration changes between the company's primary and secondary monitoring servers in real time, so there's no lag while one catches up with the other.
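The difference between real-time replication and catch-up replication can be illustrated with a write-through design, in which a configuration change is applied to both servers before it is acknowledged. This is a minimal sketch under that assumption only. Empirix's actual mechanism isn't described here, and the `MonitorServer` and `ReplicatedConfig` classes are hypothetical. A production version would also have to handle a secondary that is unreachable when a change is made.

```python
from typing import Dict


class MonitorServer:
    """Hypothetical stand-in for one monitoring server's configuration store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.config: Dict[str, dict] = {}

    def apply(self, monitor_id: str, settings: dict) -> None:
        # Store a copy so later edits by the caller don't leak in.
        self.config[monitor_id] = dict(settings)


class ReplicatedConfig:
    """Write-through replication: each change is applied to the primary and
    then the secondary before set_monitor returns, so the secondary never
    accumulates a backlog of changes to catch up on."""

    def __init__(self, primary: MonitorServer, secondary: MonitorServer) -> None:
        self.primary = primary
        self.secondary = secondary

    def set_monitor(self, monitor_id: str, settings: dict) -> None:
        self.primary.apply(monitor_id, settings)
        self.secondary.apply(monitor_id, settings)


# Example: a new URL monitor is visible on both servers immediately.
primary, secondary = MonitorServer("primary"), MonitorServer("secondary")
store = ReplicatedConfig(primary, secondary)
store.set_monitor("url-42", {"url": "http://www.example.com/", "interval_s": 60})
```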
The trick with the rollout of the OneSight-iGEMS portal will be to provide enough but not too much information so that the data doesn't raise more questions for customers than it answers. "We'll be fighting against giving too much data to our customers," Schulze says. "Some may not have enough experience to sort through it, which could drive more questions and calls into our operations center."
Tell us about your network and we may profile it in a future issue. Send e-mail to centerfold@nwc.com or call (516) 562-5914.
The Hard Sell: Home-Field Disadvantage
When the AT&T ENS group first installed the Empirix OneSight site-monitoring tool, it was as a temporary fix for checking URLs. But the group quickly became hooked on the tool as an SLA manager for its hosting customers.
Then came the hard part: winning over colleagues at AT&T who had developed AT&T's own management software, Integrated Global Enterprise Management System (iGEMS). AT&T sank more than $200 million into developing iGEMS, which it uses to manage its customers' voice and data networks.
But Schulze's group had an insider's view of what AT&T's hosting customers experience day to day, and that's what eventually sold AT&T's product and development people on OneSight. "We are on the front lines, with customers saying 'fix my problem,' " Schulze says. "Our job is to serve them better, and this tool helps us do our job."
Now AT&T is working on integrating OneSight into the iGEMS-based AT&T Managed Services Portal, where AT&T customers can look at their network performance and behavior online and order new services. With the portal, AT&T Web hosting customers can view their application and transaction performance, for instance, as well as use online tools for billing and contacting AT&T. Next, they'll get ENS' technical data from OneSight, Schulze says. "The way OneSight presents data, customers can immediately look at how their URL is responding and make a direct comparison with how their servers respond," he says.
Schulze says the two tools are complementary, so it only makes sense to integrate them. "AT&T has developed some fantastic tools that we rely on heavily. We're able to augment them with OneSight," he says.
Popularity does have its price, though. When OneSight was first installed at AT&T ENS as a URL monitor only, Schulze was the sole administrator of the tool. Now that the tool is so widespread in the organization, Schulze says he can't just make modifications the way he used to without first getting approvals. "The product is so big with us, I have to call several people before I make major changes," he says.
Douglas K. Schulze: AT&T Enhanced Network Services Client Services Engineering Manager
Doug Schulze, 32, and his client services group are responsible for ensuring Web hosting customers' servers and Web infrastructures operate at peak performance. Their duties include monitoring and supporting network devices, server hardware and operating systems, and AT&T managed applications. Schulze, who has worked at AT&T for four years and in the IT industry for six years, led the upgrade of AT&T ENS' monitoring infrastructure.
Education: Associate's degree in space systems operations/dynamics, Community College of the Air Force
Schulze's Record for Configuring a Monitor: 3 minutes 16 seconds. With the older tools, it took 30 minutes.
Most Ill-Timed Alarm: Fridays at 3 p.m. That's when there is always a critical alert that needs my immediate attention. When I don't have to be anywhere on a Friday evening, the system is quiet. But the moment I need to be out the door by 4 p.m. to start my weekend, my pager goes off. I must not be sacrificing enough to the Internet gods.
What I've Learned: You can never monitor enough points. With each successive round of features and additions to our monitoring elements, another need always turns up where you didn't expect one.
Next Time I'll: Plan ahead more for our rollout schedules. Once we realized that OneSight could be applied to so many aspects of our business, I shouldn't have pushed it out so fast. Next time I would think through the sheer number of monitoring points we planned on and size the hardware and application accordingly.
Best Advice I've Ever Received: Never regret what you did yesterday, or how can you trust yourself today or tomorrow?
Biggest Bet I've Ever Made: I bet a friend that the San Diego Chargers would be the first team ever to play in the Super Bowl in their home stadium. Now I'll be washing his car until my next birthday, which is several months away.
For Fun: The desert, river, mountains, the beach and Baja Mexico. There are plenty of opportunities to get away from the Internet, and my son and I take advantage of them.