Always check your references

Navision bug, the server keeps crashing.

Background

We were going live in a couple of weeks, and had just moved the production server into the server room. This was not really a "server room" in the true sense, more an IT office with servers and other equipment. There were no racks in those days, just towers next to one another.

The server was a 386 OS/2 machine running Navision 3.04.

The problem

The basic network (Novell-based) had been working reliably for some time, but once we installed the Navision server, the Novell server began failing regularly every second night. Basically the Novell server would shut down around 9:30 pm, and come up again 20-30 minutes later. Unfortunately the interruption was affecting the backup system, and we were not getting a backup on these nights.

Really this was critical. Although Navision was a key need, and we had to go live, we could not afford to lose the entire network and backups, and more importantly we just HAD to know what was going on.

Analysis of the issue

As any Navision professional will know, since Navision is (well, was in those days) an unknown quantity, it was generally the norm to blame Navision first. In this case, though, the problem had clearly started 2 days after we installed Navision, and it kept recurring regularly. So we looked at what could cause it. The only connections we could see were the network, which was an ARCnet star configuration, and the power. The UPS on the Novell server was a big one and had one spare power outlet, which we used for the Navision server. We checked the batteries etc. but found no problem. We tried everything, but could not resolve the issue.

We made the decision to remove the Navision server, and took it out of the server room so we could check it for problems. Sure enough, that night the system worked fine, with no crash. It was now clear that the Navision server was causing the Novell server to shut down, but how and why?

We had now worked out that the server was failing on Monday, Wednesday and Friday nights. It was now Thursday, so we decided to come back on Friday and just monitor everything to see if we could catch the failure. We knew it happened between 9 and 10, so at worst it should take one hour to see something.

Getting Ready.

We set up early. The Navision server was plugged in and back online. We set up some monitoring tools. We had a Navision client running a repeating report to screen, so we could see activity. The Novell guys had monitoring software on everything. The Navision server console was open, so we could see if the server stopped.

Testing Time.

It was getting to be a boring, uneventful time, but we were pretty sure the system would soon fail, and we were all a bit on edge. It was about 9:30, so it should be happening any moment now.

At this point, the door to the office opens. A cleaner sticks her head in, and asks if she can clean the office. We discuss this, and since it's Friday, we decide to let her do her job, otherwise the office will be a mess on Monday.

So the cleaner comes in with her trolley, bucket etc. and a vacuum cleaner. She casually walks over to the Novell server, unplugs the server from the UPS, plugs her vacuum into the UPS and starts vacuuming.

She did look at us all extremely oddly, given the way we were all laughing, and couldn't understand why it was so important for us to know whether she vacuumed this room every Monday, Wednesday and Friday at 9:30.

 

Copyright © 2005 David Singleton and Go Live International
Last modified: 31 Jan 2007