[Dshield] Information

Daniel G. Kluge dkluge at acm.org
Thu Jun 12 20:21:05 GMT 2003


On Thursday, 12.06.03, at 16:19 (Europe/Zurich), Deb Hale wrote:

> Have any of you on the list had experience with design and development
> of Hot Sites for Disaster Recovery?  I am considering proposing this to
> our local community and am trying to get information.  Any ideas?  Deb
>

I do have some advice here, but since your question is pretty 
open-ended, I'll just give some more general pointers.

The first thing is to figure out what you want to do: which systems 
have to be replicated, how far away the recovery site has to be, and 
what recovery time you need.

The next thing is to make sure that everybody working on the system 
knows the disaster-recovery requirements. If you don't have change 
management in production, don't even think about replicating that 
system; it will never work! There is nothing more interesting than 
firing up a cold standby and discovering that neither the OS nor the 
application version matches the current production system...
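
To make the version-mismatch point concrete, here is a minimal Python 
sketch of a drift check you could run as part of change management. 
The component names and version strings are made up for illustration; 
how you actually collect them from each host is up to you.

```python
def version_drift(production, standby):
    """Return {component: (prod_version, standby_version)} for each mismatch.

    A component missing on one side is reported with None for that side.
    """
    drift = {}
    for component in sorted(set(production) | set(standby)):
        prod = production.get(component)
        stby = standby.get(component)
        if prod != stby:
            drift[component] = (prod, stby)
    return drift


# Hypothetical inventories, e.g. gathered by a nightly job on each site.
production = {"os": "5.8", "database": "9.2.0.3", "app": "4.1"}
standby = {"os": "5.8", "database": "9.2.0.1"}

for component, (prod, stby) in version_drift(production, standby).items():
    print(f"{component}: production={prod} standby={stby}")
```

An empty result means the standby matches production for everything 
you track; anything else is worth fixing before you need the standby.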

If you're replicating complete sites with everything, the next point 
isn't much of an issue. But make sure everything is able to talk to the 
disaster recovery site. There's nothing more stressful than hunting for 
the config file entry in some obscure application where it specifies 
its TCP peers, or having to reconfigure the firewall so your new 
system is actually visible.

Now the hard part, of course, is replicating data in near real time, or 
even doing a transparent fail-over. Here you will be constrained by 
money and distance.

For most relational databases there are multiple variants of 
replication. The cheapest is a shadow database, where you just reapply 
the transaction logs (redo logs, in Oracle terms) to the database on 
the disaster recovery site whenever a log rollover occurs. More 
expensive and complex are replicated databases, using the db-vendor's 
tools or 3rd-party ones.
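
The shadow-database idea boils down to applying shipped log files 
strictly in order. Here is a minimal Python sketch of just that 
ordering logic. The function name and the use of plain sequence 
numbers are my assumptions for illustration; a real database has its 
own tooling for this, and you should use that rather than roll your 
own.

```python
def logs_to_apply(last_applied, available):
    """Return the log sequence numbers that can be applied next, in order.

    Stops at the first gap, because the shadow copy is only consistent
    if every log is applied exactly once and in sequence.
    """
    batch = []
    expected = last_applied + 1
    for seq in sorted(set(available)):
        if seq < expected:
            continue  # already applied on the shadow, ignore
        if seq != expected:
            break  # a log is missing; wait until it has been shipped
        batch.append(seq)
        expected += 1
    return batch


# Shadow has applied up to log 41; logs 42, 43 and 45 have arrived.
print(logs_to_apply(41, [40, 42, 43, 45]))
```

In this example log 45 is held back until 44 shows up, which is 
exactly the situation you want to be alerted about.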

The more expensive solutions, which guarantee anything from failover 
within an hour to real-time failover, mostly involve private fibers 
between the sites. One method is to replicate the data storage, i.e. 
have the SAN with your data replicate itself. Similarly, you can extend 
a cluster configuration to have the second half of the cluster at the 
disaster recovery site a short distance away.

The most expensive solution is, of course, having two live sites: 
everything runs replicated, and the last element in front of the user 
decides which site to use. Such a solution has virtually no fail-over 
time.
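
As a rough illustration of that "last element before the user", here 
is a small Python sketch that picks the first healthy site from a 
preference list. The site names and the health check are hypothetical; 
in practice this decision usually lives in a load balancer or in DNS, 
not in a script.

```python
def choose_site(sites, is_healthy):
    """Return the first site in preference order that passes the health check.

    Returns None if no site is healthy, so the caller can raise an alarm.
    """
    for site in sites:
        if is_healthy(site):
            return site
    return None


# Hypothetical example: the primary site is down, the second one is up.
status = {"site-a": False, "site-b": True}
print(choose_site(["site-a", "site-b"], lambda site: status[site]))
```

Since both sites are live all the time, switching is just a matter of 
the next request going to the other site.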

A final recommendation: for any setup, be sure that your 
consultants/vendors have done such a setup before and have enough 
know-how available to support you. This check should include everybody, 
even the industry's largest names; depending on your location they 
might not have the experience or manpower.

Cheers,
-daniel



