May 9th, 2000
 
Dear Avance Users,
 
This morning I was able to bring our DRX-500 back online. The limited testing I was able to do suggests that it is functioning properly, so please consider the spectrometer open for use. If you experience any problems, please document them thoroughly and let me know.
 

What Happened

As far as I can tell, the disk drive in the SGI O2 computer failed catastrophically, and without warning, sometime during the night of Monday, May 1st. In all my years of working with UNIX systems I have never experienced such a failure. Because it is so rare, I initially misdiagnosed the problem as a faulty computer board, but a second day of diagnosis, including swapping components with the DPX-400 computer, showed that the disk was at fault.
 
If any of you has information that might be relevant to the drive failure (e.g., that a power glitch occurred), please let me know, as it is important to document such occurrences.
 

How It Was Fixed

Since an identical disk is no longer available new, a replacement disk of the same model, together with its mount, was ordered from a third-party vendor. We chose an identical disk because Don Rippon and I suspected that the fault actually lay in the circuit board that is part of the disk assembly. By obtaining another disk of the same type, we hoped to swap the circuit boards between the two disk assemblies and thus "repair" the original drive. That would have let me get the system back up more quickly, since no operating system installation or configuration would have been needed. It would also have preserved all of the NMR data on the original disk.
 
Unfortunately, swapping the circuit boards did not fix the problem, so the "new" disk drive had to be installed in the 500 computer. This meant that the operating system and NMR software also had to be installed and the system reconfigured.
 

Ramifications

I will attempt to have the data on the original disk recovered. Unfortunately, many of the parameter sets on the 500 were lost, and I will try to recreate as many of them as possible. In the meantime, if you have run a specific experiment on the 500 in the past and want to run it again, simply read the old dataset, copy it to a new dataset (wrpa command), and then start your experiment using the new dataset.
 

What Each of You Needs to Keep in Mind

As it happened, no user data was lost, since user data resides on the 6 GB external disk. In addition, I backed up the external disk while the 500 was down. Unfortunately, I lost a lot of data that I had acquired for testing purposes on the 500 over the last few months, and I do not know whether it can be recovered. This just goes to show the importance of backing up your data.
 
I want to stress to every one of you that it is ultimately your responsibility to keep your data secure. I strongly recommend one of the following:
 
1. copying data to computers in your own labs, using e.g. Fetch, and then to a tape or ZIP disk, or,
2. copying your data to an account in our computer facility, and then to a Jaz disk, or,
3. copying your data directly to a Jaz disk in the NMR lab (for now I have to help you with this, but that will change once we obtain a third SGI computer in the NMR lab for data processing and archiving).
 
In addition, it has become apparent that some of you have far too much data in your data directory. A couple of users have 200 or even 400 MB; the requested limit is 100 MB. I ask all of you to please delete (after archiving, if necessary) any data that is no longer needed.
 
Finally, I would like to thank Don Rippon for his help with this problem, and you NMR users for your patience.
 
- John Harwood