Tuesday, May 27, 2014

Crisis Management!


To read the whole story: http://www.thespec.com/news-story/4540411-why-did-the-duck-cross-the-road-/

For more information about Business Continuity, IT Disaster Recovery and Audit Training and Certification, visit www.sentryx.com or contact info@sentryx.com or call 1-800-869-8460

Thursday, May 8, 2014

Article: The Top Five Ways To Fail Business Continuity

by Ryan Hutton and Jacque Rupert

Experienced business continuity professionals often advocate a series of accepted practices to increase the effectiveness and quality of a business continuity program. Common activities include conducting a business impact analysis (BIA), documenting plans, exercising response and recovery capabilities, and training key personnel. However, despite the close attention paid to the details of methodologies and best practices, business continuity professionals often find their programs are not as successful as they should be.

There are many factors that can contribute to a “less-than-perfect” business continuity program – or a program that truly fails to meet management expectations. Below are five of the most common reasons why business continuity planning initiatives fail, their consequences and what can be done to avoid them.

1. Failing to Understand the Organization
Too often, business continuity professionals attempt to enhance their program by hastily layering in tools and software applications. However, this often becomes a waste of resources because a key underlying issue is a failure to understand the organization and its key products and services.

2. Executing Methodology Instead of Managing a Program
There is a wide variety of business continuity methodologies and standards, all of which are designed to improve how organizations create and continually develop their business continuity programs and practices. Although building a program based on best practices is a great starting point, without an overall strategic goal linking the activities together, it can quickly become a “check-the-box” exercise that does not provide the intended value – or result in an appropriate level of readiness.

3. Unnecessarily Using Business Continuity Jargon
As expected, business continuity jargon can be confusing to non-business continuity professionals. Jargon includes acronyms such as EOC, RTO, RPO, BIA and COOP, or common terms with different meanings such as emergency response or disaster recovery. Using these types of terms can create frustration and unnecessary barriers when trying to communicate with business and technology stakeholders.

4. Unrealistic Recovery Objectives
Many organizations request that each business unit or business process define their own recovery objectives during the analysis phase of a business continuity planning effort. However, managers often struggle to define the appropriate recovery time frame.

5. Failing to Create a Culture of Business Continuity
A business continuity program can have the best people, systems, analytic conclusions, strategies and plans, but that same program will fail if it does not have the support of the business or if the business fails to think about risk mitigation and recoverability when making day-to-day decisions.


About the Author
Ryan Hutton and Jacque Rupert are consultants with Avalution Consulting. They focus on business continuity, including program definition, risk assessment, BIAs, strategy, plan development, testing and training. They have extensive experience working with government, utilities, manufacturing and distribution. They are frequent authors, and can be reached at ryan.hutton@avalution.com and jacque.rupert@avalution.com.



Wednesday, May 7, 2014

Article: Business Continuity And Disaster Recovery: Big Tent Or Separate Umbrellas?

by Jim Mitchell

Perhaps I’m just a curmudgeon (a crusty, ill-tempered old man), but it irks me when someone uses the term “Business Continuity” exclusively to refer to IT planning.  Perhaps I’ve been in this industry too long.  I remember when IT planning was referred to as “Disaster Recovery”, and only business operations used the term “Business Continuity”.  Suddenly (or at least it seems sudden to me) IT specialists are throwing around the term Business Continuity as though they invented it – and as though everyone should understand what they mean.
Is Business Continuity an appropriate term for everything to do with recovery from, or response to, a business disruption, including both technology and operations?


Tuesday, May 6, 2014

Article: Business Continuity Beyond Company Walls: When A Crisis Hits, Will Your Vendors’ Resiliency Match Your Own?

At a glance

Reliance on third parties is substantial and continues to gain momentum. Companies are increasingly migrating core and strategic functions to external providers with the objectives of improving efficiency, accelerating growth, and enabling operational transformation. This whitepaper highlights the journey to an integrated, responsive, and proactive business continuity management program that extends beyond your company's walls.

Do strategy execution discussions include the need to gain insight into your critical vendors’ resiliency and recovery capabilities? If not, are strategic goals at risk of being derailed by an unfortunate combination of unprepared vendors and insufficient internal resiliency and contingency planning?

To some degree, organizations with global supply and service chains and outsourced business processes live constantly in the cross hairs, with a near guarantee of major impacts from a natural or man-made disaster — if not today, then soon.

Read more at http://www.pwc.com/us/en/risk-assurance-services/publications/bcm-vendor-transparency-resiliency.jhtml



Article: What Does It Mean To Be A Crisis Ready Organization?

by Andrew Griffin

There are six principles for ensuring that your organization is truly crisis ready.
Most of the work done in the name of crisis management is in fact crisis preparedness. “Are you ready to face the worst?” is a question that boards ask, regulators ask, governments ask and investors ask. They want to know that an organization and its senior management are in an advanced state of crisis preparedness. This article looks at how an organization can become ‘crisis ready’.

1. Preparing policy
Principle: Crisis management is a distinct component of an organization’s wider resilience framework.
Crisis management policy should explain how the organization thinks about and prepares for crises as a distinct component of a wider resilience framework.

2. Preparing leaders
Principle: Crisis management requires strong, effective leadership in both preparation and execution.
Crisis management requires creative decision-making, not blind rule following. Leadership therefore makes a huge difference to a crisis response, and leaders must be prepared to fulfil their role.

3. Preparing structure
Principle: Crisis management requires a clearly defined structure delineating powers between different teams.
Crisis management requires structure that empowers the right people and teams at the right levels to make, implement and communicate decisions.

4. Preparing procedures
Principle: Crisis management requires procedures that guide an organization’s crisis response.

The structure is the framework in which people and teams manage crises. Procedures are there to provide them with some rules and guidance.

Crisis procedures are not procedures in the sense familiar to those in business continuity or incident response. Crisis procedures – or a ‘crisis manual’ which I think is a more helpful term – should be a handful of pages long. It is not a step by step guide as to what to do next in any given situation, but is a set of rules within a working framework in which good decisions can be made, implemented and communicated.

5. Preparing people
Principle: Crisis management requires trained, skilled professionals to fulfil specific responsibilities.
Process is a necessary but not sufficient factor in good crisis preparedness. The rest is about people. Crisis management requires trained, skilled professionals to fulfil specific responsibilities. 

6. Preparing culture and relationships
Principle: Crisis management requires a culture that values reputation and the importance of external goodwill and relationships.
Companies that have a positive internal culture where reputation is genuinely understood and valued as a strategic asset will have a good backdrop for successful crisis management. It makes people want to exhibit the right behaviours, do their best, do the right thing and work hard for a company under pressure and scrutiny.
Culture is the internal context; goodwill and relationships provide the external context.

For the complete version of this article, see the book at http://www.koganpage.com/editions/crisis-issues-and-reputation-management/9780749469924


Article: Final Report, National Institute Of Standards And Technology (NIST) Technical Investigation Of The May 22, 2011, Tornado In Joplin, Missouri

Abstract by Erica D. Kuligowski; Franklin T. Lombardo; Long T. Phan; Marc L. Levitan

This is the final report of the National Institute of Standards and Technology (NIST) investigation of the May 22, 2011 tornado in Joplin, Missouri, conducted under the National Construction Safety Team Act. This report describes the wind field of the tornado and how the wind pressures and windborne debris damaged and destroyed thousands of buildings; the emergency communications before and during the tornado and how the public responded; the influence of tornado hazards and public response and building and designated shelter area performance on survival and injury; and areas of current building and emergency communications codes, standards and practices that warrant revision. Also described in this report is the means by which NIST reached its conclusions. NIST collected large numbers of documents, photographs, videos, and building plans; developed a computer model of the wind field of the tornado as it crossed the City of Joplin; analyzed the performance of a range of building types for life safety and functionality; interviewed many survivors of the tornado; developed an evidence-based explanation for decisions made and actions taken by the public in response to the tornado; and analyzed the factors affecting life safety outcomes. The report outlines 47 findings related to the May 22, 2011, Joplin tornado and concludes with a list of 16 recommendations for action in areas of improved measurement and characterization of tornado hazards, new methods for tornado resistant design of buildings, enhanced guidance for community tornado sheltering, and improved and standardized emergency communications.



News: Study Finds CISO Appointment, Business Continuity Shrinks Breach Costs

by Danielle Walker, Reporter
By appointing a CISO, breached organizations stand to fare better in their response efforts, lessening their costs by $10 per compromised record, an annual study found.
On Monday, the “2014 Cost of Data Breach Study: United States” was released, offering insight on management efforts which can improve incident response at companies. The ninth annual study, which was sponsored by IBM and conducted by the Ponemon Institute, polled 61 U.S. companies across 16 industries, after firms experienced “the loss or theft of protected personal data and then had to notify breach victims as required by various laws,” the report said.
The study found that the average number of breached records at organizations was around 29,000 records last year. Additionally, the cost of each lost or stolen record, on average, increased from $188 to $201 per record between 2012 and 2013.
The report also noted that the appointment of a CISO, and even the involvement of business continuity management (BCM) in the response process, noticeably shrank the costs of breaches per record. For instance, having business continuity staff involved in remediation reduced costs by $13 per compromised record (as opposed to the $10 per record saved under CISOs), the report said.
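
To put the per-record figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above (about 29,000 records, $201 per record, and savings of $10 and $13 per record). The function name and the simple "records times cost" model are illustrative assumptions, not the study's own methodology, and the two savings are shown separately because the excerpt does not say whether they can be combined.

```python
# Figures quoted in the article above; the record count is an approximate average.
AVG_BREACHED_RECORDS = 29_000
COST_PER_RECORD = 201          # USD, 2013 average
CISO_SAVING_PER_RECORD = 10    # USD saved per record with a CISO appointed
BCM_SAVING_PER_RECORD = 13     # USD saved per record with BCM staff involved


def breach_cost(records: int, cost_per_record: int, saving_per_record: int = 0) -> int:
    """Hypothetical helper: total cost in USD after a per-record saving."""
    return records * (cost_per_record - saving_per_record)


baseline = breach_cost(AVG_BREACHED_RECORDS, COST_PER_RECORD)
with_ciso = breach_cost(AVG_BREACHED_RECORDS, COST_PER_RECORD, CISO_SAVING_PER_RECORD)
with_bcm = breach_cost(AVG_BREACHED_RECORDS, COST_PER_RECORD, BCM_SAVING_PER_RECORD)

print(f"Baseline:  ${baseline:,}")
print(f"With CISO: ${with_ciso:,} (saves ${baseline - with_ciso:,})")
print(f"With BCM:  ${with_bcm:,} (saves ${baseline - with_bcm:,})")
```

Under these assumptions an average-sized breach costs roughly $5.8 million, and the per-record savings translate into a few hundred thousand dollars per incident.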

Wednesday, April 30, 2014

ARTICLE: Pearson Grounds Flights During Ice Storm

In any BC Plan, it is critical to define when and how a disaster should be declared.

“Was the GTAA correct in making the decision to impose ground stop during frigid temperature of -25 to -45 Celsius?”
Pearson right to ground flights during ice storm

These are the facts from those of us who were working on the ground when this decision was made:

Simply put, there is a very good chance that the GTAA's decision saved people's lives. In the 30 hours preceding the ground stop, there were two airplane crashes in similar conditions at New York's JFK Airport and Aspen, Colorado. Two days later, an aircraft slid off the runway shortly after landing in Saskatoon.

Years of two-tiered wages and contracting out have forced thousands of our co-workers into precarious, near-minimum-wage jobs. This is creating a high turnover rate and a lost opportunity to retain the experience needed to work in irregular operations. Many airports around the world, particularly in the U.S., are implementing Living Wage Ordinances in recognition that skilled, properly paid people on the ground are necessary for your safety.

Most importantly we need to remember that we are all people first. None of us can control bad weather in an industry with zero room for error. Nothing is achieved when we are abusive to each other — worker or passenger. After all, these decisions are made for both of our groups' safety. 

Sheri Cameron, Martyn Smith and Sean Smith are airline workers and representatives on the Toronto Airport Council of Unions encompassing over 20,000 airport ground handlers and flight attendants in both Terminals 1 and 3 at Pearson Airport.

Source Article (Toronto Star News)



GTAA criticized for “Ground Hold” at Pearson International Airport
The Greater Toronto Airports Authority is being harshly criticized for their decision to stop all arriving North American flights for more than eight hours at Pearson International Airport, which literally stranded thousands of frustrated passengers and caused serious delays since that day.

As a result, roughly half of all 774 arriving flights, i.e. 381, had to be cancelled as of Tuesday evening. Consequently, hundreds of weary travelers slept on seats or trudged forward in hours-long lines to rebook their cancelled or missed flights.

Toby Lennox, Vice President of strategy development for the GTAA, revealed that the decision to impose a ‘Ground Stop’ at the airport is the CEO’s first in his 15-year career. He said that stops are usually imposed only for snowstorms or lightning and last only a few hours. He also admitted that “it’s just never been this extreme,” and “no matter how much you prepare, you’re not going to be able to make the event go away. I can’t prepare to make the weather go away.”

Source Article (Oye! Times):

http://www.oyetimes.com/news/canada/57358-gtaa-criticized-for-ground-hold-at-pearson-international-airport


ARTICLE: Alternate Communications During Times Of Disaster

by Dr. Jim Kennedy, NCE, MRP, MBCI, CBRM

We have witnessed many disasters over the last three to five years, both in the United States and abroad. Based on what we are hearing from NOAA and the National Weather Service, the US is likely to see the same number of tropical storms this year, if not more, including storms of the size and ferocity that were so devastating to the southern portion of the US in 2005. So, whether it is tropical storms in the US, earthquakes in South America and Asia, or volcanoes anywhere else on the globe, we, humanity, face another year of potential emergencies that will need to be responded to.

One thing that all of these natural disasters have in common, besides the tremendous loss of life and disruption to the everyday lives of the populace, is that they are immediately followed by an almost total loss of the ability to communicate with the outside world. Power is lost, telephone services are discontinued, and cell phone service is either non-existent or so congested that it takes hours to get a call through.

So, every year, companies and emergency planners face the problem of providing continued communication before, during, and after a disaster strikes their areas. This year, more than at any other time, business continuity planners at small, medium and large companies in the southern part of America are looking for alternatives to standard communications so that they can keep their business and critical operations running in the aftermath of a devastating event.

I thought that I would present some alternatives for the spectrum of business types so that those business continuity planners can make informed decisions about backup communications. Before we discuss backup communications solutions, let’s first discuss the failure mechanisms for the communications used during normal times.

Failure modes
Most companies continue to rely upon the standard telephone system for their communications needs. In order to provide this service the telecommunications carrier, regardless of where you are located in the world, relies upon either copper wire or fiber optic cables from its central offices to its customers' premises. This ‘last mile’ can be either above ground, as it is in the majority of cases, or underground. We have all seen those graphic pictures of poles and trees uprooted and thrown to the ground after a hurricane or tornado has devastated an area. When this happens, that last mile of connectivity between the business and its telephone provider, Internet provider, or application service provider is abruptly disconnected and utility power is lost. Underground cables are not entirely safe from disruption of service either. Many times, due to flooding and/or power loss, these underground services are disrupted as well. In the case of cell phone providers, the cell towers receive your cell phone’s call and then route it to a local central office. These towers, or the equipment inside them, can also be damaged or destroyed, as can the last mile circuits which connect those cell towers to the local telephone network. So cell phone service is as tenuous as the regular telephone service when a disaster strikes. I should also mention that the southeast US is not the only area where loss of communications services takes place, and hurricanes and tornadoes are not the only natural disasters that disrupt communications and power. In the northeast US over the last several years, for example, ice storms and blizzards have also taken their toll on communications and power utilities.
Usually following an event like a tornado, hurricane, blizzard or the like, the communications and power service providers work very hard to restore service. However, in most cases we are talking several days, if not a week, for the restoration of power and phone service. This restoration time varies depending on the size and intensity of the disaster. If it is localized, as it could be for a tornado, then service could be restored more quickly.
These copper and fiber optic cables also interconnect the local telephone company’s central offices to other central offices in the region and to long distance providers, cell phone carriers, Internet and data communications service providers anywhere in the world. These inter-exchange or ‘long haul’ circuits provide inter-connectivity and communication beyond the local area. So if your business communicates between offices in Baton Rouge LA and St. Louis MO there are probably several service providers and miles of cables involved in carrying the information from one point to the other. These cables travel above and underground and suffer the same fate as the local last mile circuits do. However, because of the number of calls and subscribers and the importance of these circuits, the carriers or the businesses that use them generally employ circuit ‘diversity’. What this means is that there are multiple paths for the voice or data to travel. If one path fails there is another which can be used to take the call to its intended destination. This works well for such things as car vs. pole accidents and isolated incidents like localized fires and floods, but with mass devastation like we experienced with Hurricane Katrina or the tornadoes in the midwest US, even the diverse routes are consumed in the overall damage toll.
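The point about circuit diversity can be illustrated with a small probability sketch in Python. Everything here is a simplifying assumption for illustration only: the function names, the idea that isolated failures are independent, and the idea that a regional disaster takes out every path at once. The probabilities in the usage lines are made-up placeholders, not measured failure rates.

```python
def all_paths_fail_isolated(p_fail: float, paths: int) -> float:
    """Chance that every diverse path is down at once, assuming each path
    fails independently (e.g. a car hitting a pole, a localized fire)."""
    return p_fail ** paths


def all_paths_fail_with_regional_event(p_disaster: float, p_fail: float, paths: int) -> float:
    """Chance that every path is down when a regional disaster (probability
    p_disaster) destroys all paths together; otherwise paths fail independently."""
    return p_disaster + (1 - p_disaster) * p_fail ** paths


# Hypothetical placeholder probabilities, chosen only to show the effect.
isolated = all_paths_fail_isolated(p_fail=0.01, paths=2)
regional = all_paths_fail_with_regional_event(p_disaster=0.005, p_fail=0.01, paths=2)

print(f"Two diverse paths, isolated incidents only: {isolated:.6f}")
print(f"Two diverse paths, with a regional event:   {regional:.6f}")
```

In this toy model, adding diverse paths sharply reduces the chance of a total outage from isolated incidents, but it cannot push that chance below the probability of a single event large enough to damage every route.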
Power is another failure mode. The central offices and cell phone sites have their own power sources in the form of batteries and emergency generators. If the event is limited to a few hours or a few days they will be fully operational. However, it was found that in the case of the hurricanes and earthquakes of the last few years power has been interrupted for several days even up to several weeks and the power plants, central offices, or cell towers in the areas of devastation were inaccessible for most of that time. This meant that the fuel trucks needed to refuel the generators were unable to get to their destinations and subsequently the central offices and cell sites went off-line.
So now we understand that the power and communications utilities have planned for adverse events, but that the intensity and massive area of devastation often make these plans fail. It is left to the individual business owner or operator to determine the criticality of their services and to properly plan for potential communication and power failures that might impact them.
In the next part of this article, I will endeavor to present the alternatives that exist in case you experience a disastrous event with a communication failure.

Alternatives
Before I discuss the alternatives I feel that it is important to note that power is a main component of any recovery or mitigation strategy. That is, without power to run these technologies they will not operate. So, it is important to have reliable and sustainable power for the duration of the resumption and/or recovery effort. If you cannot verify that this is the case then alternate site recovery is the only viable alternative.

Infrared
One such alternative to commercial communication systems is infrared. This alternative is used if a company needs to interconnect two buildings. Infrared provides an optical data, voice and video transmission system. Like fiber optic cable, infrared communications systems use laser light to transmit a digital signal between two transceivers. However, unlike fiber, the laser light is transmitted through the air. In order for the digital signal to be transmitted and received, there must be a clear line of sight between the units. In other words, there should be no obstructions such as trees or buildings between the transceiver units. So, if your wire-line or wireless communications fail, you can still provide communications between two points. The only drawbacks are the distance and line-of-sight requirements.
This solution provides low-cost, high-speed wireless connectivity for a variety of last-mile applications. It provides narrow-band voice and broadband data connectivity and the various products provide scalable, wireless alternatives to leased lines. These infrared systems operate at data rates of 1 Megabit to Multi Gigabit speeds and they are deployable in one day, without requiring right-of-way or government permits for installation. They can provide an alternative communication link in hours instead of weeks or months. This is probably not an option for a small business, but for a medium or large business owner the cost is affordable. Cost can range from $10K to $25K per installation capable of distances of up to 1000 meters.

Microwave
Another alternative to commercial communication systems is microwave (wireless). This alternative is used if a company needs to interconnect two buildings that are spaced farther apart than conventional infrared can operate (i.e., in excess of 1000m). Microwave also provides a data, voice and video transmission system. Unlike infrared communications systems, which use laser light to transmit a digital signal between two transceivers, microwave uses ultra-high frequency radio (wireless) transmission. In order for the digital signal to be transmitted and received, there again must be a clear line of sight between the units. However, the distance that this alternative can span is up to 60 miles as long as no obstructions such as trees or buildings are located between the two locations. If wire-line or wireless communications fail, communications between two points can still take place. There are several drawbacks to this solution:

  • Distance limited to up to 60 miles
  • Requires an FCC license to operate
  • Right of Way Permits may be required
  • Needs highly trained technicians to install equipment
  • Cost can be prohibitive for small businesses
The cost of a microwave system can be between $50K and $100K, with installation and license preparation charges in the area of another $15K. It still provides a viable alternative for medium and large businesses.
Small businesses also have an alternative of smaller wireless systems which utilize non-licensed frequencies and which can be installed by an IT person in the business operation. Cost is about $1000 to $2000, but I must warn you that this is not as reliable a solution as the microwave wireless option and reliable speeds may be slower.
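
The distance limits quoted above for the two point-to-point options can be summarized in a short Python sketch. The function name and the decision to return `None` when neither option fits are illustrative assumptions. The thresholds (1000 meters for infrared, 60 miles for microwave) and the line-of-sight requirement come from the text, and real link budgets depend on equipment and conditions not modeled here.

```python
from typing import Optional

METERS_PER_MILE = 1609.344
INFRARED_MAX_M = 1000                    # "distances of up to 1000 meters"
MICROWAVE_MAX_M = 60 * METERS_PER_MILE   # "up to 60 miles"


def recommend_link(distance_m: float, clear_line_of_sight: bool) -> Optional[str]:
    """Hypothetical helper: pick a building-to-building link by distance.

    Returns "infrared", "microwave", or None when neither fits and another
    strategy (such as satellite) would have to be considered.
    """
    if not clear_line_of_sight or distance_m <= 0:
        return None  # both technologies need an unobstructed path
    if distance_m <= INFRARED_MAX_M:
        return "infrared"
    if distance_m <= MICROWAVE_MAX_M:
        return "microwave"
    return None


print(recommend_link(500, clear_line_of_sight=True))      # short hop between buildings
print(recommend_link(5_000, clear_line_of_sight=True))    # beyond infrared range
print(recommend_link(100_000, clear_line_of_sight=True))  # beyond 60 miles
```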

Satellite
So far I have provided solutions that have been better suited for the medium and large business operations. Satellite provides alternatives for small, medium and large enterprises and there are various speed and pricing options, which make it a very attractive alternative or mitigation strategy.

Satellite phones
There are several types of satellite alternatives. If a company is only interested in providing a short term telephone back-up alternative, then satellite phone services like INMARSAT, AT&T, Iridium, Satcom, Skytel, Worldcell, or Globalstar, to name only a few, offer basic voice, fax and basic v and e-mail services. They offer mobile phone services and are not usually capable of providing sustained data communication or Internet types of services. However, this communications strategy is good for keeping your senior executives and critical operations personnel in contact during disasters. You can rent phones for about $40/week and then pay about $1.00/minute for basic service, or you can buy the phones for $700 to $2000 each and negotiate rates in the area of $0.85/minute. So as you can see this is not an inexpensive option, but it is usable depending on the need for communications.
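
A quick way to compare renting and buying satellite phones is to work out the break-even point. The Python sketch below uses the approximate prices quoted above ($40/week plus $1.00/minute to rent, $700 to $2000 plus about $0.85/minute to buy). The function names and the 100 minutes per week usage figure are assumptions made for illustration, and actual contracts may include fees not mentioned in the article.

```python
RENT_PER_WEEK = 40.00      # USD per phone per week (approximate, from the article)
RENT_PER_MINUTE = 1.00     # USD per minute when renting
BUY_PER_MINUTE = 0.85      # USD per minute on a negotiated purchase plan


def rental_cost(weeks: float, minutes_per_week: float) -> float:
    """Total cost of renting one phone for the given number of weeks."""
    return weeks * (RENT_PER_WEEK + RENT_PER_MINUTE * minutes_per_week)


def purchase_cost(phone_price: float, weeks: float, minutes_per_week: float) -> float:
    """Total cost of buying one phone and using it for the given number of weeks."""
    return phone_price + BUY_PER_MINUTE * minutes_per_week * weeks


def break_even_weeks(phone_price: float, minutes_per_week: float) -> float:
    """Weeks of use after which buying becomes cheaper than renting."""
    weekly_saving = RENT_PER_WEEK + (RENT_PER_MINUTE - BUY_PER_MINUTE) * minutes_per_week
    return phone_price / weekly_saving


# Assumed usage: 100 minutes per week, at both ends of the quoted price range.
for price in (700, 2000):
    weeks = break_even_weeks(price, minutes_per_week=100)
    print(f"${price} phone pays for itself after about {weeks:.1f} weeks")
```

For an outage measured in days, renting is cheaper under these assumptions; buying only makes sense if the phones will be in use for several months in total.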

VSAT
VSAT is an acronym for Very Small Aperture Terminal, an earthbound station used in satellite communications of data, voice and video signals. A VSAT consists of two parts: a transceiver that is placed outdoors in direct line of sight to the satellite, and a device that is placed indoors to interface the transceiver with the end user's communications device, such as a PC. It is very much like a satellite TV setup. VSAT service can be placed into two categories: those that provide basic Internet access services and those that are enterprise grade. For the small and medium sized business the Internet access type of service is often what is selected. Offerings such as DirectWay, WildBlue, and Connexstar all provide low cost, small business types of backup solutions which use equipment much like the in-home satellite television services. The data rates are in the area of 200 kbps uplink and 1.5 Mbps downlink, which is very much like residential DSL service. The cost is about $300 for the equipment and around $100 or less each month. This would provide a small business the ability to utilize VoIP, VPN and connect to the Internet. For medium and large size businesses there are more sophisticated satellite services. They require satellite antennas which are 3 to 5 meters in diameter, and much more sophisticated and expensive equipment. Installation of these more sophisticated satellite services can cost in the range of $100K to $250K, with monthly operational service charges from $1000 to $5000/month. They provide quality of service and committed information rates as part of the service. They can provide up to 150 toll-quality phone lines, broadband Internet, and high speed data communications, and can also provide secure (encrypted) communication if required. Satellite services can also be rented as part of a contract or call up service. But rental services are on a first-come, first-served basis.
As we witnessed during the tropical storms of last year these portable rental satellite service providers were inundated with requests and try as they would there were only so many units to go around. Those who did not plan or contract ahead were left without service.
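
The VSAT price points quoted above are easier to compare as a total cost over a planning period. The following Python sketch adds the up-front cost to the monthly charges. The three-year horizon, the tier labels, and the function name are assumptions for illustration; the dollar figures are the approximate ones given in the text.

```python
def total_cost(upfront: int, monthly: int, months: int) -> int:
    """Hypothetical helper: up-front cost plus monthly charges over a period."""
    return upfront + monthly * months


MONTHS = 36  # assumed three-year planning horizon

# (label, up-front USD, monthly USD) using the approximate figures from the article
tiers = [
    ("Small-business Internet VSAT", 300, 100),
    ("Enterprise VSAT (low end)", 100_000, 1_000),
    ("Enterprise VSAT (high end)", 250_000, 5_000),
]

for label, upfront, monthly in tiers:
    print(f"{label}: ${total_cost(upfront, monthly, MONTHS):,} over {MONTHS} months")
```

Even at the low end, the enterprise-grade service costs well over thirty times as much as the small-business offering across three years, which is why the choice depends on how many phone lines and how much committed bandwidth the business actually needs during a disaster.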

Last Thoughts
I hope that I have given business continuity planners some food for thought in developing alternative communication mitigation strategies. Each strategy has its benefits and drawbacks. You need to look at each potential possibility and determine what is right for you. If you are overwhelmed, there are many consulting organizations, and even your own telecommunications services provider, who can help you to identify and select the best options. However, you need to get started today for the next hurricane, tornado, flood, or catastrophe season in your geographic region. It will be too late to plan after an event occurs.
Dr. Jim Kennedy is the Business Continuity Services Practice Lead and a Consulting Member of Technical Staff for Lucent Technologies. Dr. Kennedy has over 25 years experience in the business continuity and disaster recovery fields and holds numerous Master level certifications in network engineering, information security and business continuity.
He has developed more than 30 recovery plans, planned or participated in more than 100 business continuity and disaster recovery tests, helped to coordinate three actual recovery operations, authored many technical articles on business continuity and disaster recovery and is a contributing author for two books, the "Blackbook of Corporate Security" and "Disaster Recovery Planning: An Introduction."
jtkennedy@lucent.com


ARTICLE: Critical Infrastructure Protection Is All About Operational Resilience And Continuity

By Dr. Jim Kennedy, MRP, MBCI, CBRM
It has always been the policy of the United States to ensure the continuity and security of the critical infrastructures that are essential to the minimum operations of our economy and government. This critical infrastructure includes essential government services, public health, law enforcement, emergency services, information and communications, banking and finance, energy, transportation, and water supply.
So even before the events of 9/11, the President, through Presidential Decision Directive 63 (PDD 63), issued May 22, 1998, ordered the strengthening of the nation's defenses against emerging unconventional threats to the United States, including those involving terrorist acts, weapons of mass destruction, assaults on critical infrastructures, and cyber-based attacks.
But how many of us really understand what an immense undertaking that was? What is the critical infrastructure in the United States?
  • More than 3,000 government facilities
  • 7,569 Hospitals
  • Telecommunications: 2 billion miles of cable; 1000s of telephone switching central offices
  • Energy: 2800 Electric power plants; 300,000 oil and natural gas producing sites; 104 nuclear power plants
  • Transportation
    - 2 million miles of pipelines
    - 300 coastal ports
    - 500 major urban public transit operators
    - 500,000 highway bridges
    - 5000 public airports
  • 4,893 banks or savings institutions have more than $100 billion in assets
  • 66,000 chemical and hazardous material producing plants
  • 75,000 dams
  • 51,450 fire stations responding to 22,616,500 calls for assistance each year.
US business and every individual rely in some manner on the above every day. We depend on their operational resiliency and continuity of operations.
Initially, critical infrastructure assurance was essentially a state and local concern. With the massive use of information technologies and their significant interdependencies it has become a national concern, with major implications for the defense of our homeland and the economic security of the United States.
However, despite all of the focus on critical infrastructure, one in three critical infrastructure operations still goes without a business continuity or continuity of operations plan, and three out of five of the operations that do have plans have never tested them as ‘fit for purpose.’
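The two ratios just quoted can be combined to see how few operations have a plan that has actually been tested. The sketch below only does that arithmetic; the variable names are illustrative, and it assumes the "three out of five" applies to the operations that do have a plan.

```python
from fractions import Fraction

# Figures quoted in the paragraph above.
share_without_plan = Fraction(1, 3)         # one in three has no plan
share_untested_given_plan = Fraction(3, 5)  # three in five plans never tested

share_with_plan = 1 - share_without_plan
share_untested_plan = share_with_plan * share_untested_given_plan
share_tested_plan = share_with_plan * (1 - share_untested_given_plan)

print(f"No plan:       {float(share_without_plan):.0%}")   # 33%
print(f"Untested plan: {float(share_untested_plan):.0%}")  # 40%
print(f"Tested plan:   {float(share_tested_plan):.0%}")    # 27%
```

Taken at face value, only about one operation in four (4 out of 15) has a plan that has been exercised.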
Up until this year the electrical energy sector had no single body setting security and availability standards and practices for its operation. In 2006 the Federal Energy Regulatory Commission (FERC) selected the North American Electric Reliability Council (NERC) as the Electric Reliability Organization (ERO) and standard-setting body in the US for electric utilities. Contingency and continuity of operations planning in this segment of the critical infrastructure is minimal at best, as is typical across the entire energy sector (e.g. transmission, generation, oil and gas distribution, etc.).
In the financial sector many institutions, despite regular audits and increased governmental regulations, still do not have adequate continuity plans in place and information security is marginal.
Although the deadline for HIPAA compliance has officially passed, a significant percentage of covered health care organizations still have not achieved basic HIPAA compliance, according to a recent industry survey. They lack emergency operations plans and even in some cases proper disaster recovery plans for patient care systems, which contain critical patient healthcare information.
So even though there are laws and regulations and a very clear focus on the protection and resilience of critical infrastructure operations, this does not seem to have translated into practice for the actual critical infrastructure operations across the US.
Critical infrastructure protection is all about operational resilience. In the GAO’s ‘Critical Infrastructure Protection – Significant Challenges in Safeguarding Government and Privately Controlled Systems from Computer-Based Attacks’ the report refers to service continuity controls as: “controls that ensure that when unexpected events occur, critical operations will continue without undue interruption and that crucial, sensitive data are protected.” It (the report) goes on to say that: “Service continuity controls should address the entire range of potential disruptions including relatively minor interruptions, such as temporary power failures or accidental loss or erasure of files, as well as major disasters, such as fires or natural disasters, that would require reestablishing operations at a remote location.”
So how is this to be accomplished? The most effective way is to develop a thorough and comprehensive business continuity or business resiliency management program. That program can be based on the NIPP Risk Management Framework, which consists of:
  • Set Security Goals
  • Identify Assets, Systems, Networks, and Functions
  • Assess Risks
  • Prioritize Mitigation Efforts
  • Implement Mitigation Strategies and Protective Programs
  • Measure Effectiveness
  • Start back at the beginning
I have attempted to outline below a process to aid critical infrastructure operations, utilizing the above NIPP Risk Management Framework coupled with an effective governance model, in addressing business continuity and resiliency needs.
First, a certified business continuity planner needs to be selected and must obtain senior management agreement and sponsorship for the program to be developed. With this sponsorship, budgets and manpower can be allocated for the project.
Second, the planner must solicit aid from multiple areas of the operation or business. This can be accomplished by establishing a Business Continuity or Business Resiliency Steering Committee. This committee will be made up of middle management from across the operation (e.g. technical, operational, financial, HR, etc.). The function of this committee is to establish the direction and approve the program, identify tools to be used, establish metrics, and report to senior management on progress.
Next, if the amount of work to be done is substantial or if the business continuity or resiliency program is starting from scratch, comes the development of a Business Continuity or Resiliency Program Office. This may consist of one or more individuals who are responsible (using project management disciplines) for ensuring that the planning and mitigation tasks are implemented consistently throughout the organization. They must also track and report on progress.
With the governance in place, work can begin to implement the NIPP framework within the organization. The steering committee will work with senior management to establish the direction and communicate the goals within the organization.
Identifying the critical assets is the next step. In everyday business continuity planning this equates to performing a business impact analysis. Here business continuity planners will work to develop a clear picture of what components (people, process, and/or technology) of the operation are critical to carrying out its mission, and to identify how long it can do without or work around those components if they become unavailable.
The next step in the NIPP Risk Management Framework is the assessment of risk. This equates to the business continuity planner’s risk assessment. The risk assessment is the process of identifying the risks to an organization, assessing the critical functions necessary for an organization to continue business operations, defining the controls in place to reduce organization exposure and evaluating the cost for such controls. Risk analysis often involves an evaluation of the probabilities of a particular event.
Once the risk assessment is complete, it will be necessary to move to the next step in the NIPP Framework: prioritizing the risks and developing mitigation strategies based on the operation’s risk appetite. Here is where the organization determines how to address each risk: mitigate it, pass it on to another entity (insurance), or simply ignore it.
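One common way to make the prioritization step concrete is to rank risks by expected annual loss (probability times impact) and then pick a treatment against the risk appetite. The sketch below is an illustration only, not part of the NIPP framework itself: the Risk class, the choose_treatment rule, both thresholds, and all of the sample figures are hypothetical. "accept" stands for the option the text describes as ignoring the risk, and "transfer" for passing it to an insurer.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    annual_probability: float  # estimated chance of the event in a year
    impact_usd: float          # estimated loss if the event occurs

    @property
    def expected_loss(self) -> float:
        return self.annual_probability * self.impact_usd

def choose_treatment(risk, appetite_usd, insure_above_usd):
    """Hypothetical rule: accept small risks, insure very large ones."""
    if risk.expected_loss <= appetite_usd:
        return "accept"
    if risk.impact_usd >= insure_above_usd:
        return "transfer"
    return "mitigate"

# Illustrative register; every number here is made up.
register = [
    Risk("Short network outage", 0.50, 10_000),
    Risk("Regional power outage", 0.20, 500_000),
    Risk("Data center fire", 0.01, 20_000_000),
]

# Highest expected loss first.
ranked = sorted(register, key=lambda r: r.expected_loss, reverse=True)
for risk in ranked:
    treatment = choose_treatment(risk, appetite_usd=10_000,
                                 insure_above_usd=5_000_000)
    print(f"{risk.name}: expected loss ${risk.expected_loss:,.0f}, {treatment}")
```

In practice the thresholds would come from the steering committee and senior management rather than from the planner alone.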
Whatever makes the best business sense is then translated into a protective plan, which is then implemented under the direction of the program office. At this point, when the mitigation strategies are identified and are being implemented, the business continuity or resiliency plan can be developed. Again, business continuity subject matter experts are best utilized to accomplish this task, as they have developed plans for similar business operations. Once the mitigation efforts are in place and the plans completed, awareness training and exercising of the plan are appropriate.
Lastly, before starting the whole effort over again, comes measuring effectiveness. Are the plan and the mitigation strategies “fit for purpose?” Do they adequately protect the operation from adverse events? If not, then the plan and mitigation efforts will have to be reviewed and modified as appropriate.
What has been accomplished is the beginning of a continuing effort to maintain the operation of the critical infrastructure. It has no end. It needs to be reviewed for every change to the operation.
I have been fortunate to help many critical infrastructure organizations build business continuity and resiliency into their operations. It is not easy but, as Presidents past and present indicate, it is of the utmost importance to make sure that the United States’ critical infrastructure is adequately protected, as its citizens rely upon it every day for their safety, protection, and well-being. It is difficult but, as has been said, any important journey starts with a single step.
Dr. Jim Kennedy is the Business Continuity Services Practice Lead and a Consulting Member of Technical Staff for Lucent Technologies. Dr. Kennedy has over 25 years' experience in the business continuity and disaster recovery fields and holds numerous master-level certifications in network engineering, information security and business continuity. He has developed more than 30 recovery plans, planned or participated in more than 100 business continuity and disaster recovery tests, helped to coordinate three actual recovery operations, authored many technical articles on business continuity and disaster recovery and is a co-author of two books, ‘Blackbook of Corporate Security’ and ‘Disaster Recovery Planning: An Introduction,’ and author of the e-book entitled ‘Business Continuity & Disaster Recovery – Conquering the Catastrophic.’




ARTICLE: Developing Seamless Business Continuity And Disaster Recovery Plans

by Jim Kennedy
Introduction
The recovery times for both the business organization’s business continuity plan and the IT department’s disaster recovery plan need to be developed through the collaboration of both parties for either plan to provide the proper protection. However, in my thirty-five years in the business continuity and resiliency field I have found that in many situations they are not.
The reasons for this can be timing or a lack of knowledge of the overall business continuity and/or disaster recovery planning process coupled with a lack of understanding of each other’s real recovery timing needs.
The purpose of this article is to provide a framework in which the recovery time objectives (RTOs) for the business continuity and the disaster recovery plan can be developed together.
Reason for inconsistencies and failures
Generally the drivers for business continuity and disaster recovery planning are considered to be one and the same, but this is not always the case. Many times the very design process for IT infrastructure requires that the IT organization develop disaster recovery planning thoughts and plans early in the application and/or systems development process. So, early in the development of a new application or system, IT must have some understanding of what kind of recovery timing and recovery point timing will be needed to support the technology to be deployed. IT will try to obtain the RTO and RPO (recovery point objective) numbers, but the business is most often focused on ensuring that the new business process or function is rolled out on time and within budget. The business organization is not thinking about business continuity planning at this time. So IT will take it on itself to develop a best guess of the required recovery times, either based on conversations with the business organization or on its own if the business cannot or will not commit to a number.
In other cases that I have seen, there is a clear lack of knowledge about business continuity and disaster recovery planning. Each organization knows that they need either a business continuity or a disaster recovery plan but they are not trained in the overall steps in developing such plans. As such the business organization does not understand the risks, trade-offs, and costs involved in developing a proper business continuity plan. The business organization also often does not understand that it needs to properly analyze the operation to better understand the recovery requirements during the process/systems/application development phase of the systems/process development life cycle or, as ITIL defines it, the application life cycle (ALC). The business organization needs to quantify the impacts of loss of that process or system; and may not be sure of the right questions to ask - not only in terms of loss of productivity, but in terms of costs to process manually in case of a system loss or failure. Can the organization develop and use manual processes at all if the system or IT infrastructure fails? Does the organization have the human resources to perform the necessary manual processes or will they need to bring in contingent workers and for how long and for what cost? Every business organization needs to clearly understand and to articulate their operation’s maximum tolerable period of disruption (MTPD).
MTPD is the maximum time an activity or resource can be unavailable before irreparable harm is caused to the organization. This applies to both customer-facing and internal activities. Note that the recovery time objective specifies the time by which an organization intends to recover an activity or resource: the maximum tolerable period of disruption is the upper bound on this time.
The business needs to utilize the MTPD to develop its processes and contingency processes, and the IT organization needs to understand the MTPD to properly develop its technology and RTO, which, in turn, will enable the business to achieve its recovery objectives.
At the same time, IT needs to utilize the recovery time numbers developed by the business organization as a basis for its system and infrastructure RTO values.
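The relationship between these numbers can be written down as two simple checks: the business RTO must not exceed the MTPD, and the IT RTO must not exceed the business RTO. The following is a minimal sketch; the function name check_recovery_targets and the sample hours are assumptions for illustration.

```python
def check_recovery_targets(mtpd_hours, business_rto_hours, it_rto_hours):
    """Return a list of inconsistencies between the recovery targets."""
    problems = []
    # The MTPD is the upper bound on any recovery time objective.
    if business_rto_hours > mtpd_hours:
        problems.append("business RTO exceeds the MTPD")
    # IT must recover in time for the business to meet its own RTO.
    if it_rto_hours > business_rto_hours:
        problems.append("IT RTO exceeds the business RTO")
    return problems

# Hypothetical figures: IT guessed 36 hours, the business planned for 24.
print(check_recovery_targets(mtpd_hours=48, business_rto_hours=24,
                             it_rto_hours=36))
# A consistent set of targets produces no findings.
print(check_recovery_targets(mtpd_hours=48, business_rto_hours=24,
                             it_rto_hours=12))
```

An empty list means the targets are at least consistent with one another; it says nothing about whether they are achievable, which is what exercising the plans is for.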
Standards and planning process
There are so many business continuity and disaster recovery standards to choose from, as well as other related standards of practice, that this might be the reason for all of the confusion. The fact that none of these standards really talks of integrating the business recovery and the IT technology recovery plans into the overall process or application development life cycle complicates the matter even further.
There is also the issue that business continuity and/or disaster recovery planning classes are usually only electives in business administration or computer technology/information systems curricula. So we are not exactly preparing our next batch of business or technology leaders to properly understand the methods, or importance, of contingency planning.
All that being said, most of the standards that exist do have a pretty consistent set of predefined steps to be reasonably successful. So if we take all of the contingency planning steps and align them with the ITIL ALC phases, the planning cycle will integrate system development with continuity planning at the best possible time in the development process.
I will outline the steps below in developing business continuity and disaster recovery plans with their corresponding points within the ITIL application development life cycle:
Steps in business continuity and disaster recovery planning, each followed by the corresponding ITIL application life cycle phase:
1) Understand the Organization
a. Risk Assessment
b. Business Impact Assessment
    i. Determine MTPD for operation
    ii. Develop RTO for Critical Systems
    iii. Develop RPO for Critical Systems
ITIL phase: Requirements – requirements gathered based on business needs of the organization
2) Evaluate and Determine Strategy
a. BC strategy to meet RTO/RPO
b. DR strategy to meet RTO/RPO
ITIL phase: Design – requirements translated into specifications
3) Develop Plans
a. BCP – Business Organization
b. DRP – IT Organization
ITIL phase: Build – application and the operational model are made ready for deployment
4) Exercise Plan
ITIL phase: Operate – IT operates the application as part of the business service
5) Audit and Maintain Plan
ITIL phase: Optimize

Using the standards and good practices, during the requirements gathering phase of the ITIL ALC the business owner should also have conducted the risk assessment and business impact analysis (BIA). The results of these two activities allow the business owner to clearly see the impact on the business of a failure or discontinuation of operations in the business, in IT, or in both. They can then translate that knowledge into quantifiable RTO and RPO numbers to be used in the next phase of business continuity and disaster recovery planning (Evaluate and Determine Strategy) and the Design phase of the ITIL ALC.
The RTO and RPO numbers are used to develop alternative strategies that meet the recovery time and point needs. A cost for each alternative design is developed. The cost is the total of the IT cost to design, implement, build and operate; the business cost for any workarounds or special handling during the outage period; and the cost to load any transactions processed during that outage period into the system (processing re-synchronization) after the systems are brought back on-line and are processing again as before the incident.
The alternative strategies are then compared using a cost and benefit (time, reduced workaround complexity, etc.) analysis of each alternative. The best option will accomplish return to operation in a reasonable time with an acceptable cost to the business and IT. However, the alternative selected will require input from both IT and the business to properly address the risk of outage. The business will need to ensure that it can perform the workarounds and still meet all of the business, regulatory and audit needs of the operation for the period that, under the chosen alternative, the IT organization needs to restore the IT systems required to restart the application and its associated services.
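The cost roll-up and selection described above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: the Alternative class, the idea that workaround cost grows linearly with outage hours, the 24-hour tolerance, and every dollar figure are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Alternative:
    name: str
    rto_hours: int
    it_cost_usd: int              # design, implement, build and operate
    workaround_usd_per_hour: int  # business cost while systems are down
    resync_cost_usd: int          # loading outage-period transactions

    @property
    def total_cost_usd(self) -> int:
        # Assumes workaround cost scales linearly with the outage length.
        outage_cost = self.workaround_usd_per_hour * self.rto_hours
        return self.it_cost_usd + outage_cost + self.resync_cost_usd

def select_alternative(alternatives, max_outage_hours):
    """Cheapest alternative whose RTO the business can tolerate."""
    feasible = [a for a in alternatives if a.rto_hours <= max_outage_hours]
    return min(feasible, key=lambda a: a.total_cost_usd) if feasible else None

# Illustrative options; all figures are made up.
options = [
    Alternative("4-hour recovery", 4, 900_000, 2_000, 5_000),
    Alternative("24-hour recovery", 24, 300_000, 2_000, 20_000),
    Alternative("72-hour recovery", 72, 100_000, 2_000, 60_000),
]

best = select_alternative(options, max_outage_hours=24)
print(best.name, f"${best.total_cost_usd:,}")
```

With these numbers the 72-hour option is cheapest overall but is excluded because the business cannot tolerate the outage, and the 4-hour option meets the need at more than twice the cost of the 24-hour option.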
For the plans to be effective and ‘fit for purpose’ it is very important that the business and IT are on the ‘same sheet of music’ as to recovery times and points. It is no good if the business has planned its resources and workarounds expecting a system recovery time of 24 hours only to find that the system will be down for 48 hours. On the other side of the coin it is not fiscally responsible to pay the cost to expedite the recovery time of an IT system to less than four hours if the business can tolerate an outage period of 24 hours or more at much less cost for the final IT solution.
Once it has been concluded that both plans are consistent with each other, the actual plans can be developed. While the business prepares for implementation of the new application and/or service, IT will make ready the systems and infrastructure needed to also meet the business schedule for implementation.
Exercising the plans
There is one caveat, however. Even if both sides have planned together and developed their plans based on a single and consistent recovery time, the two planning activities still need to verify (via exercising the plans together) that the IT recovery timing (the disaster recovery plan, which includes hardware restoration, software restoration, synchronization of databases, etc.) actually comes in on time to meet the business’ needs as provided for in the business continuity plan.
Only in testing and timing the two recovery processes to ensure that they are coincident can an organization truly be confident that the overall plans will be successful.
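A joint exercise produces measured times that can be compared directly with the agreed RTO. The sketch below assumes the recovery steps run one after another, so their durations simply add up; the function name exercise_result and the measured hours are hypothetical.

```python
def exercise_result(step_hours, rto_hours):
    """Compare the measured recovery time with the agreed RTO."""
    total = sum(step_hours.values())  # assumes sequential steps
    return {
        "total_hours": total,
        "met_rto": total <= rto_hours,
        "margin_hours": rto_hours - total,
    }

# Illustrative timings recorded during a joint exercise.
measured = {
    "hardware restoration": 6,
    "software restoration": 8,
    "database synchronization": 12,
}

result = exercise_result(measured, rto_hours=24)
print(result)
```

Here the exercise shows a 26-hour recovery against a 24-hour RTO, a two-hour shortfall that neither plan would have revealed on paper.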
The Author
Dr. Jim Kennedy, MRP, MBCI, CBRM, CHS-IV, CRISC has a PhD in Technology and Operations Management and is the chief consulting officer for Recovery-Solutions. Dr. Kennedy has over 30 years' experience in the information security, business continuity and disaster recovery fields and has been published nationally and internationally on those topics. He is the co-author of three books, ‘Security in a Web 2.0 World – a standards based approach,’ ‘Blackbook of Corporate Security’ and ‘Disaster Recovery Planning: An Introduction’ and is author of the e-book, ‘Business Continuity & Disaster Recovery – Conquering the Catastrophic’. Dr. Kennedy can be reached at Recovery-Solutions@xcellnt.com
