Saturday, August 11, 2007

Sometimes It's Just Hard(ware)


Yesterday I posted over at TalkBMC about a new-to-me Dell D620 I received at BMC. That post was mostly about installing Mint 3.0 on the D620, making it dual boot with XP, and the fact that, like my personal Acer 5610, Linux installed dead easy. Everything worked right out of the box. The only fiddling around was to install things that did not by default come on the Mint 3.0 LiveCD. Things like Evolution and the Exchange connector, LMSensors, HDDTemp, gsynaptic, and gkrellm.

In the few short hours since that post, I have installed VMware Server via Synaptic (really: having an APT tool and a touchpad tool with nearly the same name can be confusing), and brought up SimplyMEPIS from my VMware machines inventory as a test. All still goes extremely well on the D620. Another data point in support of my article about Linux laptops in the current (3/2007) Linux+ magazine.

At the end of the TalkBMC post, I moved on to a point about hardware, and that there are common bits that all vendors use. Things like hard drives or Wifi cards all come from a small set of suppliers. Both my MacBook and my IBM T41 have Wifi chips from Atheros. My Dell D620 and my IBM T41 both have hard drives from Hitachi. The difference in using these common parts comes with the device drivers used and other hardware support bits in the OS, or the heat and shock isolation of the deployment environment. Quick example mentioned there: Both my eMachine 5312 and my wife's Toshiba M45 ran *way* hotter when running XP Home than when running Fedora. Fedora was better at telling the CPU to go to sleep and therefore use less power when the CPU was not needed for things.

Even more confusingly, sometimes the computer being hot means that everything is OK. My Macbook case is aluminium, and it is warm to the touch all the time. That is because aluminium conducts heat extremely well, and the case actually is part of the heat sink: the internal CPU temperatures are more or less the same as they are in my Acer 5610 or my Dell D620, hovering in the 48 to 52C ranges, depending on what I am doing. I have punched it up to 72C when I was doing movie rendering on the Mac, but I stopped doing that when I got the iMac. That is now what it is for. I wanted the Macbook to last, and it already gets carried all over the place. It does not need to be inferno-hot all the time too.
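For anyone who wants to watch those numbers on a Linux laptop without a GUI tool, here is a minimal sketch. It assumes a kernel that exposes the sysfs thermal interface under /sys/class/thermal, where each thermal_zone*/temp file holds millidegrees Celsius. That is not what LMSensors or gkrellm were doing for me, and the function name read_thermal_zones is made up for illustration.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for each readable thermal zone.

    Assumes the Linux sysfs layout where <base>/thermal_zoneN/temp
    contains an integer number of millidegrees Celsius.
    """
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone missing, unreadable, or not reporting a number
        readings[zone.name] = millidegrees / 1000.0
    return readings


# Example use on a machine that has the interface:
# for name, celsius in read_thermal_zones().items():
#     print(name, celsius)
```

Which zone maps to the CPU varies by machine, so the zone names are only a starting point.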

Even Good Parts Go Bad

The idea that parts can fail, even in the best deployments, is probably no real surprise to anyone who reads technical blogs. I bring this all up because there does seem to be an attitude among some about Apples and other high end gear that they should never fail. "I paid more for them. They are supposed to be good." Or Linux: "Hey: I thought Linux was all better than everything else, but my hard drive crashed and I lost stuff!"

Everything gets dropped. Everything has cosmic rays beating down on it from outer space. No really. My wife used to work at a computer company called Amdahl, and they had a series of mainframes that kept failing mysteriously back in the 1980s. Turned out to be cosmic rays disrupting the chips. And, as a co-worker pointed out to me while I was relating the recent Mac rebuild experience, there is always the "Loose Nut At The Keyboard" factor. Yeah. That would be me. I admit it. I experiment. I trash and destroy and rebuild all the time. It is how I, as a manager for the last 20+ years, can stay technical.

The Tale of the Failed eMac

Winding back the clock a decade or less: My mom bought, on my recommendation, an HP Pavilion that ran MS Windows 98. It was a nice little system, but I rebuilt it more or less every six months. It was not hardware then: it was the OS. It internally self-destructed under daily usage. Mom net-surfed, and did word processing on the system. That was mostly it. But it just could not survive even that rigor. In talking to others at the office, six months was the generally accepted median lifetime for Win98 systems. Mom replaced the Win98 system with an eMac. I stopped having to fix things. Well... mostly.

A year before the eMac died... to the extent that it died... the lights at Mom's house started going wacky. Dim/Bright. Dim/Bright. Over and over and over. The Linksys WAP died, and was replaced. Then the eMac started to hang at odd times. We put a UPS on it, but too late. The damage was done. Even an Apple eMac cannot survive a year of flaky voltages. A new iMac came in (on the UPS this time) and there have been no further issues. It turned out that the power company finally found a floating ground in the house feed. As the end of Harry Potter and the Deathly Hallows says, "All was well".

Huh? How could it be well... all that data on the eMac? Did it survive? Oh yes.

Apples have a cool trick. Hold down 'T' when booting, and they turn into a glorified Firewire hard drive. You hook a Firewire cable to another Mac (or Linux, since it can read HFS+) and you can snarf all the data right off that hard drive. The eMac would run for days in Firewire "Target" mode (thus the "T" at boot). It was weird, but I was very happy that it did so. Mom lost nothing, and my total housecall on the work was a matter of a few hours.

I could have achieved the same result by removing the hard drive and putting it in an external Firewire or USB enclosure, but not every computer is easy to get the hard drive out of. Target mode is really handy. In fact, this next bit is a great case where taking out the hard drive would have been a whirling PITA:

The Death and Life of the Macbook

My 18-month-old Macbook is starting to have some problems. For one, the internal CD/DVD is stone dead. I have an external device that lets me get around that problem, but someday I'll have to take the MacBook apart and fix that.

I admit I do all sorts of things to the OS of any computer I own to try new things. OS.X is not exempt from my fiddling. I had removed Rosetta early on because I wanted to run only Universal or Intel-only apps. I had pulled out the various languages because, in the very worst sense of being an American, I only speak one language. That is not a point of pride, just a remnant of youthful stupidity. For 18 months I have installed and removed tons of things. Beta versions of things like VMware Fusion and Parallels Desktop. System monitoring apps that use undocumented APIs.

SMART never reported anything wrong with the hard drive (the Google disk study says that does not mean much), but the Apple Disk Utility said the hard drive was in serious need of repair. Pointers were whacked. Freespace was not being reclaimed. Permissions were out in left field. I booted the OS.X 10.4 install disk (on the external CD/DVD device), and ran Disk Utility from there. It has to be done this way to fix the boot disk: Disk Utility will not touch the boot disk while one is booted off it. The utility reported everything was fixed. And that was the end of the Macbook booting at all.

Sigh.

I knew OS.X 10.5 was coming soon. I had planned on a clean install then. I guess the time table was moved up.

The iMac has about a terabyte of disk space on it, so I held down the 'T' on the Macbook, hooked them together, and then sucked the "Users" directory off the Macbook to the iMac. Like the eMac, everything was recovered.
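I did the copy by hand, but the same move can be scripted with a check that every file arrived intact. This is only a sketch: the function names are made up for illustration, any paths you feed it are your own, and it compares file contents only, so it says nothing about permissions, resource forks, or other Mac metadata.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy a directory tree, then return the relative paths that did not verify.

    The destination must not exist yet (shutil.copytree creates it).
    An empty list means every regular file matched its copy.
    """
    source = Path(source)
    destination = Path(destination)
    shutil.copytree(source, destination)
    mismatches = []
    for original in source.rglob("*"):
        # Only regular files are compared; symlinks are skipped.
        if original.is_symlink() or not original.is_file():
            continue
        relative = original.relative_to(source)
        copy = destination / relative
        if not copy.is_file() or file_digest(original) != file_digest(copy):
            mismatches.append(str(relative))
    return sorted(mismatches)


# Hypothetical example, with the Target mode disk mounted on the other Mac:
# bad = copy_and_verify("/Volumes/Macbook HD/Users", "/Users/Shared/macbook-rescue")
# print(bad or "everything verified")
```

Reading every file twice roughly doubles the I/O, which matters when the copy itself already takes hours over Firewire.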

The install felt a bit MS Windowsy. I clean installed 10.4, logged in and installed the 10 updates it needed (about half a GB of updates too) and then rebooted. And it needed another 8 updates. And another reboot. Shades of Redmond! Reboot city. Then I pulled the stuff back from the iMac, one directory at a time, making sure that I had all of "Music" restored before I ever fired up iTunes, and all of "Pictures" back before I ever fired up iPhoto.

Total recovery time was something like 10 hours, but eight of those were doing the data copies to and from the Macbook's hard drive.

Now the Macbook is back, happy, clean, and very much faster. Ready for 10.5.

All is well.

What Was It?

Was it a loose nut at the keyboard? Did SMART just not report hardware problems? Does HFS need to be replaced with ZFS sooner rather than later? Maybe it was Cosmic Rays?

Of that list, the only one I can answer for sure is the ZFS one, and the answer is yes: OS.X needs ZFS. But that is a different post altogether.

The one thing I know for sure: Even good computers can fail, and even the ones normally considered bad can last for years.
