What to Do If Your Server Crashes

When running a small business, you will most likely store your company’s data on multiple servers. Unfortunately, this means that you are susceptible to server crashes.
Many kinds of server failures can occur, and many things can cause them. To resolve these crashes or errors, you should understand the particular kinds of failures you could be facing.

Defining a Server Error

To understand what a server error is, let’s start with what a server is. In the most basic sense, a server is a computer, or a program running on one, that provides services to other computers (clients) over a network. In a business setting, servers typically host shared, multi-user services such as email, messaging, file storage, and printing.

Server Crash:

A crash occurs when a server halts and stops responding entirely, even though it was working correctly up to that point. It can be caused by hardware faults as well as software problems, such as faulty scripts or the server running out of memory (RAM). Once crashed, the server does nothing at all until it is restarted, and any critical data that had not yet been saved may be lost.

Omission Failures:

An omission failure occurs when a server fails to respond to a request, or fails to send or receive a message. There are many possible causes, but some of the main ones are overloaded routers and process crashes. Omission failures can be further divided into process omission failures and communication omission failures.

Arbitrary/Byzantine Failures:

An arbitrary, or Byzantine, failure is one in which a server may produce any kind of error at any time. For a process, this can take the form of incorrect data values or steps of the process not executing correctly. These failures are particularly difficult to detect, because a faulty server may keep running and return answers that look plausible, so the help of a professional may be needed.

Timing Failures:

Timing failures apply only to synchronous systems, where there are known limits on how long a process or message is allowed to take. A timing failure occurs when a server does not meet those timing parameters, for example when its response arrives outside the specified time interval. It can come in several forms, such as clock failures (a server’s clock drifting beyond its allowed bounds) and performance failures (a response that arrives too late).
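To make the idea of a timing bound concrete, here is a minimal Python sketch. It assumes a hypothetical limit, MAX_RESPONSE_SECONDS, on how long a request may take; the function name timed_call and the 0.5-second value are illustrative only, not part of any real server product. It simply measures how long a call took and flags a late answer as a timing failure.

```python
import time

# Hypothetical bound for a synchronous system: every response must arrive
# within this many seconds of the request being sent.
MAX_RESPONSE_SECONDS = 0.5


def timed_call(request_fn, *args, max_seconds=MAX_RESPONSE_SECONDS):
    """Run request_fn(*args) and report whether it met the timing bound.

    This only detects a late response after it arrives; it does not
    cancel or interrupt a slow call.
    """
    start = time.monotonic()  # monotonic clock is unaffected by clock resets
    result = request_fn(*args)
    elapsed = time.monotonic() - start
    if elapsed > max_seconds:
        # The answer may be correct, but it arrived outside the allowed
        # interval: a performance (timing) failure.
        return result, "timing failure"
    return result, "ok"
```

Note that a correct answer can still count as a failure here: in a synchronous system, arriving too late is itself the error.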

Response Failures:

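This is one of the simpler failures to describe: the server responds to a request or command, but the response is incorrect. Either the value it returns is wrong, or it takes the wrong action in response.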

Failure Masking – Taking Steps to Resolve an Issue 

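Failure masking refers to the techniques used to hide a failure from users and other processes, so that the system keeps working correctly even when one of its components fails.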
It has been said that “redundancy is the key for hiding failures” (De Florio & Blondia, 2015). One simple form of redundancy is time redundancy: repeating a failed action, such as resending a request, until it succeeds.
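A minimal Python sketch of time redundancy follows. It assumes the action is a function that raises an exception when the server does not respond, and that the failure is transient. The name retry and its attempts and delay_seconds parameters are hypothetical, chosen for illustration.

```python
import time


def retry(action, attempts=3, delay_seconds=0.1):
    """Call action() up to `attempts` times, pausing between failures.

    Returns the first successful result; re-raises the last error if
    every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as err:  # assumption: any exception is a transient failure
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay_seconds)  # wait before trying again
    raise last_error
```

Retrying only masks transient failures, such as a brief network hiccup. If the server has crashed, every attempt fails and the last error is passed on to the caller.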
Another method of failure masking is information redundancy: adding extra data so that errors in the software’s data can be detected or corrected. For example, when dealing with a Byzantine failure, checksums can reveal incorrect data values, and error-correcting codes can repair some of them. Algorithms can also be added to the software to handle problems such as timing failures.
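As a rough illustration of information redundancy, the Python sketch below attaches a SHA-256 digest (from the standard hashlib module) to a piece of data and later recomputes it to check the data. The helper names with_checksum and is_intact are hypothetical. A checksum like this only detects that data has changed; repairing the data would require an error-correcting code or a second good copy.

```python
import hashlib
from typing import Tuple


def with_checksum(data: bytes) -> Tuple[bytes, str]:
    """Attach a SHA-256 digest to the data before storing or sending it."""
    return data, hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, checksum: str) -> bool:
    """Recompute the digest; a mismatch means the data was altered or corrupted."""
    return hashlib.sha256(data).hexdigest() == checksum
```

For example, a record stored with its checksum and later read back with one altered digit will fail the is_intact check, so the corrupted value can be rejected instead of silently used.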
A final solution is physical redundancy: adding extra hardware or software, such as backup servers or replicated processes, that is designed to take over or compensate when a component fails.
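One common form of physical redundancy is to send the same request to several replicated servers and take a majority vote on their answers, often with three replicas (known as triple modular redundancy). The Python sketch below is an illustration of that technique, not a description of any particular product; it assumes the replies have already been collected into a list, and the function name majority_vote is hypothetical.

```python
from collections import Counter


def majority_vote(replies):
    """Return the value reported by a strict majority of replicas.

    Returns None if no value has a majority, meaning the failure cannot
    be masked. Assumes each reply is a hashable value.
    """
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]  # most frequent reply
    return value if count > len(replies) // 2 else None
```

With three replicas, one server returning a wrong value, as in a Byzantine failure, is outvoted by the other two: majority_vote([42, 42, 17]) returns 42.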
Ensuring your servers are well-protected and running efficiently can be a daunting task. At Triella, with our Advantage Total service, any and all server-related issues are managed by our staff, who will monitor, update, maintain, and fix all servers in the network. For more information, call us at 647.426.1004.
Tess Kern is a Student Intern at Triella, a technology consulting firm specializing in providing technology audits, planning advice, project management and other CIO-related services to small and medium-sized firms. Tess can be reached at 647.426.1004 x 232. For additional articles, go to www.triella.com/publications.
Triella is a VMware Professional Partner, Microsoft Certified Partner, Citrix Solution Advisor – Silver, Dell Preferred Partner, Authorized Worldox Reseller and a Kaspersky Reseller.
© 2017 by Triella Corp. All rights reserved. Reproduction with credit is permitted.
