Pacific Connection (English)

Lawyers and Missing Laptops are Driving Data Backup to Disk

Dressed in a white lab coat and staring out from a streaming Web video, John Cleese introduces himself as Dr. Harold Twain Weck, director of the Institute for Backup Trauma. "The trauma of data backup can be a lifelong debilitating condition," says Dr. Weck. "Its victims come from all walks of life but each of them have one thing in common. They all relied on tape-based backup. Disk backup is better." Dr. Twain Weck ("train wreck" as it might be pronounced by Tweety Bird or Elmer Fudd) then gives a tour of the "institute," whose patients include a man completely wrapped in tape.

This clever piece of "viral" marketing conveys the single most common theme in how data backup and disaster planning are handled these days: while tape is hardly dead, many companies are replacing it with hard drives. "People have been doing tape backup for 30 years," says Bob Cramer, CEO of LiveVault, who hired Cleese last February. "Yet when you talk to the analysts, they estimate a 10 to 20 percent failure rate in the best data centers, 20-50 percent failure rate outside of that." The reason, he says, lies in the entire process of tape backup, everything from installing and using the backup software to labeling the cartridges improperly.

Meanwhile, "the cost of disk space is dropping at three times the rate of Moore's Law," says Ted Theocheung, vice president of marketing and business development for the network-attached storage company Anthology Solutions. "Just a few years ago, a terabyte used to cost $10,000. About 12 months ago, it slipped past the $1,500 mark, and now the price is within sight of $1,000."

Disk backup is quicker than tape and leaves fewer opportunities for error in mounting and managing media. But its biggest advantage may be its ability to locate and restore very specific information when government regulators request it. Katrina-class hurricanes, earthquakes and 9/11 have caused massive data loss, but when it comes to backup, lawyers and government bureaucrats are the more likely threat. "If the one e-mail that may keep the CEO out of court is the last file written to tape, it's going to take a very long time to find that file," said Pete Gerr, senior analyst at Enterprise Strategy Group, in an interview with the American publication Network World.

"Data backup is being driven by the need to maintain records in an automated world," says Dorian Cougias, author of The Backup Book and founder and CEO of Network Frontiers, a consulting company that focuses on disaster recovery and security. He says the regulations are nearly worldwide, encompassing Japan, Europe and the United States, and cover an ever-widening sphere of transactions. "Let's say you are the franchise that owns one of those vending machines in the train station. When a machine runs low, it's one computer talking to another. It's no longer one person talking to another; it's a computer-to-computer ERP transaction. Three years down the road you're being audited by the local auditor. They want to see all the records and transactions. Well, where do you get those from? Not from people. It's not on the books any more. The books are now electronic."

These regulations can bite. "When was the last time somebody sued you for a failed disk?" Cougias asks rhetorically. By contrast, "when was the last time somebody sued somebody over a transaction that they believed was bogus?" In the United States, companies are now required to have stated policies on how long they retain records, including e-mail. That can easily mean a terabyte of data even for a relatively small organization. The same thing is happening in other countries, including Japan. "It actually started in the 1990s with the European Union, who had a thing called the OECD [Organisation for Economic Co-operation and Development] Privacy Framework," Cougias says. "That talked about data that needed to be made available but kept private."

Off-site backup online

With LiveVault and comparable services, companies send data over a broadband connection to an off-site server farm. There's no tape to deal with, and you don't need an IT manager to make it work. Nor do you have to physically carry tapes off-site by car or plane, a routine practice at some companies. The services charge by the gigabyte of storage. Bob Cramer says that LiveVault serves the so-called SMB market (small and medium-sized businesses), including insurance agencies, law firms, and small health care facilities. The company does not try to back up client machines. "We do servers: Microsoft Exchange, SQL, the machines that make businesses work."

In the United States, the SMB market is red hot. Companies like Oracle, which traditionally catered to larger companies, are setting their sights downward, while Intuit, maker of the best-selling accounting package for small businesses in the U.S., is moving in the other direction by supporting larger companies. Microsoft, not surprisingly, has a huge SMB marketing effort underway. There are SMB magazines, conferences and blogs. So while you might not want to hire John Cleese to pitch a backup service to staid IT professionals, the Monty Python alumnus makes perfect sense in the SMB market, where IT is often a part-time job and companies are not so entrenched in a backup method that they won't consider something new.

"We ourselves are a relatively small company and we didn't have a huge advertising budget," Cramer says. "Our technology, which automates the entire backup process, didn't need selling, but the concept did." The answer: create a "viral" marketing campaign that relies heavily on word of mouth. Cleese, who has done other business-oriented sales pitches, was recruited. His first effort was heavy on concept, mentioning LiveVault only once. The company has now produced a follow-up 30-minute webinar that is more brand-specific. "Most webinars are dry and boring, but this one is highly educational, not technical, and the 30 minutes feels like five," Cramer says.

LiveVault has been providing its services for five years. Its main competitor, EVault, has been transporting data over the Internet since 1997, initially moving smaller amounts of data over 56K modems. With faster connections and falling hard drive prices, EVault's business has taken off. EVault serves both small companies storing just 5 to 15 GB of data and large companies whose storage runs to a couple of terabytes. The company also offers a managed service in which customer-owned hardware is co-located at an EVault data center. EVault has a strong following among health care and financial services companies, which must follow stringent government regulations on record keeping. These customers typically backed up to tape, then transported the tapes off-site. Now they let the Internet do the heavy lifting.

EVault operates seven data centers in the U.S., storing 3 to 6 petabytes of customer data, which after compression consume about 85 terabytes on EVault's server farms. Data "vaults" are housed in Tier-1 data centers equipped with high-end Hewlett-Packard servers and EMC disk storage subsystems. The company is now eyeing the Asian market, especially Japan. "We are just in the planning stages, but don't see a lot of established competition there," says Tony Barbagallo, the company's senior vice president of marketing.

A typical larger EVault customer is St. Vincent Hospital & Health Services, which took 36 to 48 hours to back up 6 TB of data to tape. Each full backup required 90 tapes and cost about $7,500. Now the hospital uses EVault InfoStage software, paying less than one cent per megabyte to back up to ATA drives. "The cost of a gigabyte [on disk] used to be a whole lot more expensive than on tape," says Barbagallo. "But the cost of disk-based backup is declining, which is the primary reason it is growing so popular. And the online cost of transferring data has come down significantly, as well."
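The St. Vincent figures imply a demanding sustained transfer rate. The short Python calculation below works only from the numbers quoted above (6 TB, a 36- to 48-hour window, 90 tapes, about $7,500 per full backup). It assumes decimal units and that the whole window was spent writing data; the helper name is illustrative.

```python
# Back-of-the-envelope figures for the tape backup described above.
# Assumptions: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes)
# and data being written for the entire backup window.

TB = 10**12
MB = 10**6

def throughput_mb_per_s(data_bytes: float, hours: float) -> float:
    """Average sustained rate (MB/s) needed to move data_bytes in `hours`."""
    return data_bytes / (hours * 3600) / MB

data = 6 * TB
for hours in (36, 48):
    print(f"{hours} h window: {throughput_mb_per_s(data, hours):.1f} MB/s")

tapes = 90
cost = 7500
print(f"Data per tape: {data / tapes / 10**9:.1f} GB")
print(f"Cost per tape: ${cost / tapes:.2f}")
```

Under those assumptions, the tape system had to sustain roughly 35 to 46 MB/s for the entire window, and each of the 90 tapes carried about 67 GB at a cost of roughly $83.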

Where EVault does not yet compete is in very large data centers that use Veritas or Legato software to back up massive databases, accumulating huge tape libraries. It is here that tape backup will probably live on even after most other companies have switched to hard disk backup. But even where tape is the final destination, some companies have deployed "disk-to-disk-to-tape" solutions in which hard disk storage serves as an intermediate stage.

One obvious concern about off-site backup via the Internet is privacy. "Customers definitely ask," Barbagallo says. "Our answer was to get ourselves SAS 70 Type II certified." SAS 70 is the American Institute of Certified Public Accountants' Statement on Auditing Standards No. 70, a standard for auditing the internal controls of a service organization; a Type II report also tests whether those controls operated effectively over a period of time. Customers can rely on it when demonstrating compliance with U.S. regulations such as the Sarbanes-Oxley Act and the Gramm-Leach-Bliley Act. "For our customers, that means they don't have to conduct their own audits. Data is encrypted and stays encrypted at our facility. We also employ various other security measures."

Another obvious problem with online backup is recovery time for large datasets. EVault's answer: load the data onto a server equipped with its own transfer software, then courier the server to the customer. The customer transfers the data and returns the hardware to EVault. And what about disk failures at EVault's end? "All of our back-end is redundant RAID 5 hardware with advanced snapshot technology," he says. "And if customers want even more security, we can arrange to have multiple copies of their data stored in different parts of the country."
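RAID 5 survives a single disk failure by storing parity: the bytewise XOR of the data blocks in each stripe. A toy Python sketch of that idea for one stripe follows. Real arrays rotate the parity block across all disks and operate on fixed-size sectors, details omitted here, and the function name is illustrative.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe across three data disks (toy 4-byte blocks).
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)  # stored on a fourth disk

# Disk 1 fails: rebuild its block from the surviving data blocks plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'EFGH'
```

With one disk missing there is one unknown and one parity equation, so the block can be rebuilt. Lose a second disk in the same stripe before the rebuild finishes and the data is gone, which is why drives that fail together, like those in the 82-drive array Cougias describes in the sidebar, are so dangerous.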

A terabyte in a box

Disk hardware is also encroaching on tape through a new class of network-attached storage (NAS) devices aimed at smaller businesses. At first glance, Anthology Solutions' Yellow Machine looks like a miniature computer the color of an American taxicab. Inside is a RAID 5 array: four hard disks with a combined raw capacity of one terabyte. After RAID 5 redundancy, the unit holds about 650 GB of data, enough to automatically back up a small office full of computers. "If you figure the average PC stores 40 GB worth of data, you can easily back up 20 or 30 PCs," says Anthology's Ted Theocheung.
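RAID 5 gives up one disk's worth of capacity to parity, so usable space is (number of disks − 1) × disk size. The Python sketch below applies that rule, assuming (the article does not say) that the Yellow Machine's terabyte is made up of four equal 250 GB disks; the function name is illustrative.

```python
def raid5_usable_gb(disks: int, disk_gb: float) -> float:
    """Nominal RAID 5 capacity: one disk's worth of space goes to parity,
    which is spread across all members of the array."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disks - 1) * disk_gb

# Assumption: the unit's terabyte is four equal 250 GB disks.
print(raid5_usable_gb(4, 250))   # 750

# Using the article's ~650 GB usable figure and 40 GB per PC:
print(650 // 40)                 # 16 completely full PCs
```

The nominal 750 GB is above the article's 650 GB, presumably because of formatting and file-system overhead and the gap between decimal and binary gigabytes (750 × 10^9 bytes is about 698 GiB). Likewise, 650 GB holds about 16 completely full 40 GB PCs, so the "20 or 30" figure presumably assumes that most PCs are not full or that backups are compressed.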

A single NAS machine provides fast on-site data protection, but not the off-site disaster recovery available through EVault or LiveVault. Theocheung concedes that Internet backup can make sense for disaster recovery at larger organizations, but argues that it is too expensive for small businesses. "A terabyte for offsite storage can cost $1000-$5000 per month," he says. (He's right, if anything: at LiveVault's rate of $2,763 per month for just 250 GB, a terabyte would cost more than $11,000 per month.) A more economical alternative is to install two or more Yellow Machines in different locations and do the off-site backup yourself. Or you can occasionally back up the NAS system to tape and take the tapes off-site.

But can an SMB owner actually set up one of these machines without calling in high-priced help? Theocheung says that the Yellow Machine is a complete plug-and-play solution. "You power it up, follow the simple setup wizard, tell it whether you are using it just for storage or also as an Internet router, and that's it." Backup is managed by EMC/Dantz Retrospect Professional software. If a hard disk fails within the unit, you can replace it yourself. The Yellow Machine differs from its competitors by including a built-in router/firewall. Users plug up to eight PCs directly into the back, add a broadband modem and perhaps a wireless access point, and the entire office is both online and ready for backup.

Sidebar: A Conversation with Dorian Cougias

Dorian Cougias is principal author of The Backup Book: Disaster Recovery from Desktop to Data Center (published in the United States by Network Frontiers) and the founder and CEO of Network Frontiers, a consulting firm specializing in disaster recovery and security, with clients as far away as Dubai and Kazakhstan. I spoke with him from his office in Oakland, across the bay from San Francisco.

You've said that lawyers are the new force driving data backup. What about the more traditional reason: crashed disks?
Absolutely. Japan came up with the new IDE drives, which were far less costly than SCSI drives. But SCSI drives last at least five years, while IDE drives, if you look at the warranties, are guaranteed for as little as 30 days. IDE drives are built of cheaper materials and have a life expectancy of a little over a year.
The biggest thing about backup is that a lot of us just don't think about what we're doing. Case in point: one of our client companies in China put in a RAID array with 82 drives. Their problem was that they bought all 82 drives from the same vendor on the same day, and they were all from the same lot. Guess how many of those failed on one day: 30. That's at least one more than it would take to bring down the entire array.
And then there's the executive who doesn't back up his computer because his company only backs up servers. So he gets on a puddle jumper to fly from one end of an island to the other, hits turbulence while he's writing his presentation on his laptop, and loses his drive. That happens. IBM came out with a hard drive technology that includes an accelerometer to detect shaking.
Where do tapes still fit in?
Conventional wisdom now has it that tapes are for long-term storage. They're great for archiving, because you can easily remove them and transport them to another place. They provide the best security: you can put them offline in a locked box. Another advantage is that they can accommodate large datasets that would span multiple hard drives. If I'm backing up to a disk drive and the space runs out, I have to start another backup drive. You can't span disk drive media, while tape media was designed to be spanned.
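Spanning means that one logical backup continues across several pieces of media in sequence, and a restore reads them back in the same order. The Python sketch below shows only that core idea, with in-memory byte strings standing in for tape cartridges. `span_volumes` is a hypothetical helper; the volume labels, catalogs and end-of-media handling that real backup software manages are left out.

```python
from typing import Iterator

def span_volumes(stream: bytes, volume_size: int) -> Iterator[tuple[int, bytes]]:
    """Split one backup stream across as many fixed-size volumes as needed,
    yielding (volume_number, chunk) pairs in order, starting at volume 1."""
    if volume_size <= 0:
        raise ValueError("volume_size must be positive")
    for number, offset in enumerate(range(0, len(stream), volume_size), start=1):
        yield number, stream[offset:offset + volume_size]

backup = b"x" * 2500                      # a 2,500-byte "backup"
volumes = list(span_volumes(backup, 1000))
print([(n, len(chunk)) for n, chunk in volumes])  # [(1, 1000), (2, 1000), (3, 500)]

# Restoring means reading the volumes back in order and concatenating them.
restored = b"".join(chunk for _, chunk in volumes)
print(restored == backup)                 # True
```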
What about the off-site Internet storage services? How are they changing the way companies back up?
I call these "replication services," and they are becoming more accepted for small organizations or organizations that don't have a lot of data. But they can be slow. We tested one with the new book we're working on. It's not the world's largest book, but with the 100 GB of research we had, it took us two days to get the initial data over there.
There are things to consider when looking at replication to a provider or planning replication yourself. How much do you have to move? How big is the pipe you have to move it through? What is the cost of moving it? What's the cost of storage?
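The first two of those questions reduce to simple arithmetic. This Python sketch, assuming decimal gigabytes and link speeds in megabits per second, works out the average rate implied by Cougias's 100 GB, two-day test and the time a given link would need. The function names and the 10 Mbit/s, 80 percent-efficiency example are hypothetical.

```python
def transfer_days(data_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Days needed to push data_gb (decimal gigabytes) through a link of
    link_mbps megabits per second. `efficiency` is the assumed fraction of
    the nominal link speed actually achieved (protocol overhead, contention)."""
    bits = data_gb * 8 * 10**9
    seconds = bits / (link_mbps * 10**6 * efficiency)
    return seconds / 86_400

def implied_mbps(data_gb: float, days: float) -> float:
    """Average rate, in megabits per second, implied by moving data_gb in `days`."""
    return data_gb * 8 * 10**9 / (days * 86_400) / 10**6

# Cougias's test: 100 GB took about two days.
print(f"{implied_mbps(100, 2):.2f} Mbit/s")                     # 4.63 Mbit/s
# The same 100 GB over a hypothetical 10 Mbit/s uplink at 80% efficiency:
print(f"{transfer_days(100, 10, efficiency=0.8):.2f} days")     # 1.16 days
```

Two days for 100 GB works out to an average of about 4.6 Mbit/s. Because transfer time grows linearly with the amount of data, an initial upload of a terabyte over the same connection would take roughly 20 days.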
Do RAID arrays have an impact on how people back up?
They are having an impact on local storage. They're great for getting data off the machines and off the working network. I recommend you put them on a secondary backup network. You put a second network card in your server, with a second switch that goes between the servers and the backup system, so when you ask the server to do a backup, the data doesn't go across your production network.
Besides Internet services and network-attached storage, what other tactics are companies taking?
Those are the two big ones. Some companies are moving to Internet storage area networks, or IP SANs. This is basically a big drive array that has a TCP/IP address. It doesn't really matter where that big drive array is located; your backup just sends the data to a TCP/IP address. That works particularly well for what we call 'metropolitan area networks.' When they dug up downtown Tokyo, they laid a lot of 'dark fiber,' extra pairs of fiber reserved for later expansion. So an organization can lease that dark fiber and put its big storage array in another building on the other side of town. It's just as fast as if you sent the data over your own local network, but it's off-site and a little bit more expensive because you are paying for that fiber connection.
Does that mean that Tokyo is ahead of the United States in terms of on-line, off-site backup?
In a lot of areas, yes, because they wired later. Shanghai is really ahead of everybody because they wired last.
What about recovery? What about the need to recover specific files from a specific date, rather than an entire month's worth of data?
Normally I'm very vendor-agnostic, but this one actually works. It's called Symantec LiveState Recovery Desktop 6.0, and it solves a big problem particular to the Windows world. Your computer dies, you replace it, but the new machine doesn't have the same drivers or cards as the old one. The way Microsoft has arranged things, you can't take the backup of one computer and restore it to another, because the cards and drivers are different. With every backup system out there before, if you wanted to restore to a different computer you had to do it in layers: first install an operating system, secure it, patch it, and so on, then layer the restored applications and restored data on top of that. A lot of companies just said screw it, we're not going to back up local users' computers because it's not worth the effort to restore them.
Symantec spent a couple of years figuring this out, and the first couple of versions they sent me didn't work. LiveState Recovery, version 6, works like a champ. It takes a picture of the drive, and somehow they've collected every driver that anybody has ever made. We tested this. We cobbled together a computer with hard drives from Dubai, cards from Korea, and a monitor driver from a little company called Radius that isn't even in business anymore. It was the Frankenstein box. Then we said: restore to this. And by God, it did. I asked Symantec what they were doing in there, and they said they don't back up files. They take a picture of the hard drive, the blocks on the hard drive.
What do you do at your office?
For our writing office we do two things. We use LiveState Recovery on the laptops, backing them up once or twice a day. It reassembles the full and incremental backups into a "virtual" full backup that can be restored in a single step. If someone breaks a laptop, you swap in a new one, say "restore my computer," and the software says, "Here you are," and it's restored. So I know that if they break something, we can restore it in 20 minutes. That includes all their local data.
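The "virtual" full backup can be pictured as layering the blocks changed in each incremental backup over the last full image, with the newest copy of each block winning. A minimal Python sketch follows, with dictionaries (block number to contents) standing in for disk images. It illustrates the general technique only; it is not a description of how LiveState Recovery stores its files, and it ignores blocks that are freed or deleted.

```python
def synthesize_full(full: dict[int, bytes],
                    incrementals: list[dict[int, bytes]]) -> dict[int, bytes]:
    """Overlay incremental block maps, oldest first, on a copy of the full
    image. Each map goes from block number to block contents; later wins."""
    image = dict(full)          # leave the original full image untouched
    for inc in incrementals:
        image.update(inc)
    return image

full = {0: b"boot", 1: b"docs-v1", 2: b"mail-v1"}
monday = {1: b"docs-v2"}
tuesday = {2: b"mail-v2", 3: b"new-file"}

virtual_full = synthesize_full(full, [monday, tuesday])
print(virtual_full)
# {0: b'boot', 1: b'docs-v2', 2: b'mail-v2', 3: b'new-file'}
```

Because the merged image already reflects every change, a restore is a single step rather than restoring the full backup and then replaying each incremental in turn.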
For things we want kept long term, we've put in something that Microsoft gives away, called SharePoint Server. It's a Web-based file server that's included with Microsoft Small Business Server. It tracks all of your documents and even gives you version control: every time you hit save, it keeps a version up there on the server. You can go back and say "I want the version from Tuesday at 2:00," and it retrieves that version. When you're done with the document, we tell the backup software to keep backing up this directory, archive it and put it on tape. We move the tape off-site for however many years we contracted with the editor to keep it.
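Retrieving "the version from Tuesday at 2:00" amounts to finding the latest save made at or before that moment, which a binary search over the sorted save times handles efficiently. Here is a Python sketch using the standard `bisect` module; the save history is made up, and this is not SharePoint's actual implementation or API.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional

def version_at(saves: list[tuple[datetime, str]], when: datetime) -> Optional[str]:
    """Return the version that was current at `when`, i.e. the latest save
    made at or before that moment. `saves` must be sorted by timestamp."""
    times = [saved_at for saved_at, _ in saves]
    i = bisect_right(times, when)
    return saves[i - 1][1] if i > 0 else None

# Hypothetical save history for one document (Monday 6 and Tuesday 7 March 2006).
saves = [
    (datetime(2006, 3, 6, 10, 0), "draft 1"),
    (datetime(2006, 3, 7, 11, 30), "draft 2"),
    (datetime(2006, 3, 7, 15, 0), "draft 3"),
]

print(version_at(saves, datetime(2006, 3, 7, 14, 0)))  # draft 2
```

`bisect_right` treats a save made exactly at the requested time as already current; asking for a moment before the first save returns `None`.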
Zooming out to Fortune 500 companies, what do you see on the horizon?
The biggest trend is the "truth trend": the industry is getting more realistic. For example, many vendors have claimed you need to save everything forever. But if you look at every national records authority (in Japan, the United States, Canada, and the UK), they all say you don't have to do that. Some things you can throw away immediately, some you need to save for up to 20 years, and some, on average, you need to save for only three years. Now people are saying: let's not do it just because we have the technology. Let's do it only if it makes good business sense. We should make good IT business decisions the way we make good business decisions.
Which are not based on fear.
Right. Fear, uncertainty and doubt. The smoke is finally lifting, and people are separating the good ideas from the bad ones. If the regulations permit us to destroy some records after three years, we're going to have a policy that says we destroy those records on time, because we're no longer legally required to keep them.
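A records policy of the kind Cougias describes can be expressed as a retention schedule that maps each record category to a retention period, plus a routine that lists what has expired. The Python sketch below is hypothetical throughout: the categories and periods merely echo the examples in the interview (discard immediately, about three years, up to 20 years) and are not taken from any regulation.

```python
from datetime import date

# Hypothetical retention schedule: record category -> years to keep.
# Illustrative values only; real periods come from the applicable regulations.
RETENTION_YEARS = {
    "transient": 0,      # may be discarded immediately
    "transactions": 3,
    "contracts": 20,
}

def add_years(d: date, years: int) -> date:
    """Same calendar date `years` later; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def due_for_destruction(records, today: date):
    """Yield IDs of (record_id, category, created) records whose retention
    period has expired by `today`."""
    for record_id, category, created in records:
        if add_years(created, RETENTION_YEARS[category]) <= today:
            yield record_id

records = [
    ("r1", "transactions", date(2002, 5, 1)),
    ("r2", "transactions", date(2004, 5, 1)),
    ("r3", "contracts", date(1990, 1, 1)),
    ("r4", "transient", date(2006, 1, 10)),
]
print(list(due_for_destruction(records, date(2006, 1, 15))))  # ['r1', 'r4']
```

A record whose category is missing from the schedule raises a `KeyError` instead of being skipped or destroyed, which halts the purge until someone assigns it a retention period.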