Redundancy and Automated Alerts Ensure Business Continuity?

In the UK and Ireland, you are made redundant when you lose your job. When something is redundant, it is unnecessary, a duplicate of something that already exists. However, in networking and indeed business terms, having redundant options is a positive concept, as it refers to backup solutions that take over when the primary fails.

In a perfect world, where hardware often has a predetermined or estimated lifespan, companies will ensure that business continuity is possible for a wide range of ‘disasters’ whether these include loss of services, hardware failure, data loss or other unexpected events such as fire, flooding and severe weather conditions. These secondary solutions are known as redundant, backup or ‘failover’ solutions as their function is to assume control or allow the means to restore services when the primary goes down.

How important is redundancy for the average company? Is it feasible to guarantee 100 per cent uptime? What steps can companies take to minimise risk or downtime?

Obviously, due to budgetary constraints common to many companies, it is not possible to simply clone an entire IT infrastructure to ensure uptime in all areas. In any case, even if budgets are available, it does not make business or financial sense to do so. However, companies can take steps to protect themselves and reduce downtime risk.

Essential Services

In terms of business continuity, all companies are at the mercy of power companies and loss of power is a problem that faces everyone. In the short term, it is solved by the use of uninterruptible power supplies (UPS) for every network device. Unfortunately, they are expensive and are not a long-term solution if power loss lasts more than a few hours. Generators will cover longer outages and allow internal tasks to resume.

Given the likelihood that any blackout is not limited to your premises, you have also lost internet access, apart from internet-enabled mobile devices, of course.

It is for this reason that many companies use cloud services, with managed service providers hosting key customer-facing elements of the business such as e-commerce websites. The adoption of a hybrid IT infrastructure makes perfect sense and allows companies to continue working in the cloud until the on-premise network is back online.

In fact, according to a SolarWinds survey, 92 per cent of U.S. IT professionals claim that cloud adoption is important to their organisation. In addition, it is application, database and storage requirements that drive increasing adoption. When only 6 per cent of respondents have not migrated anything to the cloud, can you afford to ignore the benefits?

However, bear in mind that cloud migration does not eliminate on-premise network concerns as, in the same report, 60 per cent of respondents believe it’s unlikely that everything will be cloud-based, with security and compliance of the greatest concern. Therefore, downtime remains a tangible risk and automated network monitoring can certainly help.

Prompt Response is Key

How will you know if your network goes down? During the working day, it may well be blatantly obvious, as users will immediately contact IT when they can no longer access services. But what happens when IT are offsite or it’s after working hours?

Power loss is admittedly rare in developed countries, but loss of broadband or network access is more common and companies need immediate alerts if this happens, given that key business activities, both internal and external, rely on that access.

One option is a hardware SMS gateway, which alerts the parties responsible for network monitoring, whether these are on-premise or outsourced from a local IT company. Most importantly, as each gateway contains a SIM card, alerts are sent even when an internet connection is not present. With a 3G option to facilitate communication, automated email alerts (in addition to SMS) are also possible thanks to inbuilt modems and watchdog mechanisms.

With such an alert mechanism in place, response time is reduced and your chosen IT professionals can solve the root cause faster, reducing downtime and loss of productivity.
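
As a rough illustration of how such a watchdog might decide when to raise an alert, the sketch below probes a service over TCP and reports an outage only after several consecutive failures, so that a single dropped probe does not wake anyone up. The class name, the threshold and the event names are assumptions made for this example; they do not describe any particular gateway product.

```python
import socket
from typing import Optional


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class OutageDetector:
    """Hypothetical helper: report 'down' once after `threshold` failed probes."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0
        self.alerted = False

    def record(self, probe_ok: bool) -> Optional[str]:
        """Feed one probe result; return an event name when the state changes."""
        if probe_ok:
            recovered = self.alerted
            self.failures = 0
            self.alerted = False
            return "recovered" if recovered else None
        self.failures += 1
        if self.failures >= self.threshold and not self.alerted:
            self.alerted = True
            return "down"
        return None
```

In use, a scheduler would call something like detector.record(tcp_probe("192.0.2.10", 443)) every minute (the address is a placeholder) and hand any "down" or "recovered" event to the alert channel, for example the SMS gateway.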

How Much does Downtime Cost?

In many situations, reactive support is necessary, hence the requirement for an automated alert system. With power loss and internet connection issues solved, companies can take additional steps to maintain business continuity.

The big one is, of course, data loss due to hardware failure. Hard drives fail regularly and few companies operate without protecting their data by using real-time backups and regular offsite archiving. However, this is only a small part of the network redundancy options available and each company needs to evaluate its redundancy strategy. Ask yourself how much it will cost if your internal network goes down for an hour. How about an entire day?

In factory production, for example, an hour could be very costly. In a small office, perhaps not so much. Therefore, weigh the costs of employing network redundancy at all points in the data path against the cost and perceived risk of failure.

Increase Redundancy?

Reducing risk factors is a key objective in business but is generally considered in budgetary terms. If the risk is low and the cost for a redundant feature far exceeds the possible costs of failure then it is not worth implementing.
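
The comparison described above can be written down as simple arithmetic. The following is a minimal sketch, assuming that the expected yearly loss is the number of outages multiplied by their length and by the hourly cost of downtime; the function names and all figures in the usage note are invented for illustration and should be replaced with your own estimates.

```python
def expected_annual_downtime_cost(outages_per_year: float,
                                  hours_per_outage: float,
                                  cost_per_hour: float) -> float:
    """Expected yearly loss: outages x hours per outage x cost per hour."""
    return outages_per_year * hours_per_outage * cost_per_hour


def redundancy_is_worthwhile(annual_redundancy_cost: float,
                             outages_per_year: float,
                             hours_per_outage: float,
                             cost_per_hour: float) -> bool:
    """True if the expected loss exceeds what the redundant feature costs."""
    expected_loss = expected_annual_downtime_cost(
        outages_per_year, hours_per_outage, cost_per_hour)
    return expected_loss > annual_redundancy_cost
```

With two four-hour outages a year, a factory losing 10,000 per hour faces an expected loss of 80,000, so a redundant link costing 15,000 a year pays for itself. A small office losing 200 per hour faces only 1,600, and the same link would not be worth it.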

For example, redundant measures could include but are not limited to:

  • Network cabling setup that facilitates redundancy, such as ring protocols or redundant coupling.
  • Managed switches that reroute connections if one path fails.
  • Redundant dedicated broadband connections from another service provider.
  • Multiple backup plans for servers and desktops.
  • Use of colocation servers and failover technology.
  • Backups for cooling, power, fire and water detection.

In conclusion, 100 per cent network redundancy comes with a hefty price tag, requiring ongoing maintenance and management from professionals with a variety of skill sets. Even then, 100 per cent uptime is not guaranteed.

Large enterprises with dedicated data centres can handle these requirements but smaller companies simply do not have the budget or staff to support a fully redundant network. While it is, in theory, better to be proactive, it is more cost-effective to put a preventative maintenance process in place and react to hardware problems as they occur, in accordance with a defined disaster recovery plan. When alerts are automated, what more is needed to reduce downtime?

Monitoring Switches in Data Centers

Network availability and performance are critical to the proper operation of a LAN, MAN or WAN. Malfunctions in network switches adversely affect company productivity, so proactive monitoring of switches is an important part of an administrator's work. Here we provide a short overview of the methods currently used to monitor network switches.

SNMP Protocol

Virtually all managed switches can be monitored using SNMP. Monitoring can provide per-port information: port availability status and counts of transmitted packets. In addition, equipment performance metrics such as CPU usage and RAM usage can be monitored.
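
As an illustration of what an NMS typically does with per-port SNMP data, the sketch below turns two readings of a traffic counter into a utilization figure. It assumes the readings come from the standard IF-MIB objects ifInOctets (a 32-bit counter that wraps back to zero) and ifSpeed (link speed in bits per second); these objects are not named in the text above, and the function names are made up for this example. The SNMP polling itself is left out.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit SNMP counter wraps at this value


def counter_delta(previous: int, current: int,
                  modulus: int = COUNTER32_MODULUS) -> int:
    """Difference between two counter samples, allowing for one wrap."""
    return (current - previous) % modulus


def utilization_percent(prev_octets: int, curr_octets: int,
                        interval_seconds: float, if_speed_bps: int) -> float:
    """Average inbound utilization of a port between two polls."""
    if interval_seconds <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_seconds * if_speed_bps)
```

For example, a 100 Mbit/s port whose counter grows by 37,500,000 octets over a 60-second polling interval is running at 5 percent utilization. On fast links a 32-bit counter can wrap more than once between polls, which this sketch cannot detect; shorter intervals or 64-bit counters avoid that.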

NetFlow, sFlow, jFlow Protocols

NetFlow is a Cisco protocol that runs on Cisco switches; sFlow and jFlow are similar technologies developed by competitors. These protocols provide information about the streams of data flowing through network devices, giving detailed insight into network performance and bandwidth use. Because the data is pre-aggregated, these protocols are easier to work with than a packet sniffer.
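
To show why pre-aggregated flow data is convenient to work with, here is a minimal sketch that finds the "top talkers" in a batch of flow records. The FlowRecord type and its fields are a simplified, hypothetical stand-in for what a flow collector would hand over; real NetFlow, sFlow and jFlow records carry many more fields and have to be decoded first.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class FlowRecord(NamedTuple):
    """Simplified, hypothetical flow record (one conversation summary)."""
    src: str
    dst: str
    octets: int


def top_talkers(flows: Iterable[FlowRecord],
                n: int = 3) -> List[Tuple[str, int]]:
    """Return the n source addresses that sent the most octets."""
    totals: Counter = Counter()
    for flow in flows:
        totals[flow.src] += flow.octets
    return totals.most_common(n)
```

Each record already summarizes a whole conversation, so a few lines of counting answer the question "who is using the bandwidth?" without inspecting individual packets.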

Packet sniffing: monitoring using the monitoring port

An external packet sniffer (usually built into the NMS) examines all network packets sent through a dedicated monitoring port on the switch. This port sends out a copy of every packet that passes through one or more other ports on the switch. The NMS then analyzes these packets. Of the three switch monitoring technologies, this one places the highest load on the CPU and the network.

NMS system 

The central point of a monitored switch environment is the IT infrastructure and network monitoring system (NMS). The system aggregates data from the monitored points, provides powerful capabilities for analyzing and visualizing the information collected, and transmits alerts about incidents and failures.

SMS Alerts as an effective notification of failures 

A key element of automated incident or failure detection is notifying the right people as early and as reliably as possible. For this purpose, data center administrators often use SMS. It is a popular channel for alerts because recipients respond to it quickly (an incoming SMS tends to be treated as higher priority than other channels such as e-mail or instant messaging) and because it is universal (SMS does not require any dedicated application). To shorten the critical path, that is, to minimize the number of devices between the NMS server and the GSM/3G network, one can use a hardware SMS gateway with a built-in GSM/3G modem. Such a device sends SMS alerts directly from the NMS to the GSM network, bypassing any external Internet Service Provider.
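
How an NMS hands a message to a gateway depends on the product, and the text above does not specify an interface. As one possible illustration, the sketch below assumes a GSM modem that accepts the standard text-mode AT commands (AT+CMGF and AT+CMGS) and builds the byte sequences that would be written to it. The function name, the 160-character cut-off for a single message and the phone number in the usage note are assumptions for this example.

```python
import re
from typing import List

CTRL_Z = b"\x1a"             # terminates the message body in text mode
MAX_SINGLE_SMS_CHARS = 160   # assumed limit for one plain-text SMS


def build_sms_commands(number: str, text: str) -> List[bytes]:
    """Build the AT command sequence for sending one text-mode SMS.

    The caller is expected to write each item to the modem in order and
    wait for the modem's reply (the "> " prompt after AT+CMGS) in between.
    """
    if not re.fullmatch(r"\+?[0-9]{3,15}", number):
        raise ValueError("number must be 3-15 digits, optionally with +")
    body = text[:MAX_SINGLE_SMS_CHARS]
    return [
        b"AT+CMGF=1\r",                                   # select text mode
        f'AT+CMGS="{number}"\r'.encode("ascii"),          # set the recipient
        body.encode("ascii", errors="replace") + CTRL_Z,  # message, then Ctrl-Z
    ]
```

For example, build_sms_commands("+441234567890", "core-sw1 port 12 down") returns three byte strings ready to be written to the modem's serial port. Non-ASCII characters are replaced with "?" here to keep the sketch simple.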