I bought a computer motherboard and it’s been crashing. Right now, I’m trying to figure out what’s wrong by recording how long the computer is up before it crashes. I’m guessing that it’s bad RAM or a bad RAM slot. It’s been a few days of crashes, and I finally have some results. I said I’d post after 8 RAM configurations, but I went a bit over. It took a while because, when the system is stable, it takes a lot longer to record a crash. So, thankfully, there have been stable configurations.
If you haven’t been reading this series, here are the past posts in this Crashing RAM drama:
- Crashing Computer from Ebay. Cleaning Out the Dust from the CPU Cooler.
- Crashing Computer. Underclocking to Improve Stability. Adding New Thermal Grease.
- Crashing Computer. Swapping RAM Slots.
- Crashing Computer. RAM Testing Coverage Matrix.
Here are the results so far. The notation is “[C] [Time-to-crash] [S]”. “C” means it’s a cold boot after being shut off for a while, at least an hour. “S” means we shut down manually because we didn’t have a crash. So “C 5:00+ S” means we had a cold start, ran for 5:00+ hours (the plus just means I am estimating because I forgot to start the stopwatch), and then shut down manually without a crash.
The notation along the side is the RAM module and the slot number. So A1B3 means module A in slot 1, and module B in slot 3.
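If you want to sort or count these entries later, the notation is regular enough to parse. Here is a minimal sketch in Python. The names (parse_run, parse_config, Run) are made up for this example, and it assumes the time is always written as H:MM and that an entry without an “S” ended in a crash.

```python
import re
from dataclasses import dataclass

# Assumed entry shape: optional "C", a time as H:MM with optional "+", optional "S".
ENTRY_RE = re.compile(r"^(?:(C)\s+)?(\d+):(\d{2})(\+)?(?:\s+(S))?$")
# Assumed label shape: pairs of module letter and slot digit, e.g. "A1B3".
PLACEMENT_RE = re.compile(r"([A-Z])(\d)")


@dataclass
class Run:
    cold_boot: bool   # "C" present: started after being off for a while
    minutes: int      # uptime before the crash or the manual shutdown
    estimated: bool   # "+" present: the stopwatch was started late
    crashed: bool     # assumption: no "S" means the run ended in a crash


def parse_run(text: str) -> Run:
    """Parse one entry such as 'C 5:00+ S'."""
    match = ENTRY_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized entry: {text!r}")
    cold, hours, mins, plus, shutdown = match.groups()
    return Run(
        cold_boot=cold is not None,
        minutes=int(hours) * 60 + int(mins),
        estimated=plus is not None,
        crashed=shutdown is None,
    )


def parse_config(label: str) -> dict:
    """Parse a side label such as 'A1B3' into {module: slot}."""
    pairs = PLACEMENT_RE.findall(label)
    if not pairs or "".join(m + s for m, s in pairs) != label:
        raise ValueError(f"unrecognized configuration: {label!r}")
    return {module: int(slot) for module, slot in pairs}
```

With this, parse_run("C 5:00+ S") describes a cold boot that ran an estimated 300 minutes and ended with a manual shutdown, and parse_config("A1B3") puts module A in slot 1 and module B in slot 3.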
I also added colors to indicate “awesome green we shut down” and “messed up red we crashed”.
So, we’re at 13 configurations tested, with at least three starts per configuration. It’s not a lot of samples, but you can clearly see a pattern here. It looks like the A RAM module doesn’t get along with this motherboard.
There’s probably a better visualization for this, with bar graphs, that will be more illuminating. Also, if this is a RAM problem (and it feels like it), it’s possible that having the larger 4GB module and smaller 2GB module swapped would lead to different results, because it could take longer to touch the RAM in the faulty module.
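A bar graph doesn’t need a plotting library. Here is a rough sketch of a text-only version in Python, with one bar per run scaled to the longest run. The function name bar_chart and the “X” and “#” marks are just choices for this example, and the runs shown in the usage note are placeholders, not my recorded results.

```python
def bar_chart(runs, width=40):
    """Return one text bar per run.

    runs: list of (label, minutes, crashed) tuples.
    Bars are scaled so the longest run fills `width` characters.
    """
    if not runs:
        return []
    # Guard against a zero-length longest run to avoid dividing by zero.
    longest = max(max(minutes for _, minutes, _ in runs), 1)
    lines = []
    for label, minutes, crashed in runs:
        bar_len = max(1, round(width * minutes / longest))
        mark = "X" if crashed else "#"  # X = crashed, # = manual shutdown
        lines.append(f"{label:<6} {mark * bar_len} {minutes // 60}:{minutes % 60:02d}")
    return lines
```

For example, print("\n".join(bar_chart([("A1", 30, True), ("B1", 300, False)], width=10))) draws a one-character “X” bar for the made-up 30-minute crash and a ten-character “#” bar for the made-up 5-hour run.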
Checking RAM Slots
I already tried the memory testing program. It didn’t detect anything. It could be that the slots are faulty; perhaps there are some bent pins, or some corrosion, or maybe even a bad capacitor (even though there shouldn’t be any old caps). I’ll need to figure out if we can make some guesses by looking at these crashes. It’ll probably involve rearranging the notation, and then sorting the list differently.
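Rearranging the notation could look something like the sketch below, in Python. It splits each configuration label into module and slot, then counts crashes for each module and for each slot separately, so a bad module and a bad slot show up in different tallies. The function name crash_counts is made up, and the records in the usage note are placeholders, not the results above.

```python
import re
from collections import defaultdict


def crash_counts(records):
    """Tally (crashes, runs) per module and per slot.

    records: iterable of (config_label, crashed) pairs, where a label
    like "A1B3" means module A in slot 1 and module B in slot 3.
    Returns (by_module, by_slot), each mapping a key to (crashes, runs).
    """
    by_module = defaultdict(lambda: [0, 0])
    by_slot = defaultdict(lambda: [0, 0])
    for label, crashed in records:
        for module, slot in re.findall(r"([A-Z])(\d)", label):
            for table, key in ((by_module, module), (by_slot, int(slot))):
                table[key][0] += int(crashed)  # crashes seen with this key
                table[key][1] += 1             # total runs with this key
    return (
        {key: tuple(counts) for key, counts in by_module.items()},
        {key: tuple(counts) for key, counts in by_slot.items()},
    )


def worst_first(table):
    """Sort keys by crash fraction, highest first."""
    return sorted(table, key=lambda k: table[k][0] / table[k][1], reverse=True)
```

With placeholder records such as [("A1", True), ("A1B3", True), ("B1", False), ("B3", False)], the module tally comes out as A: 2 crashes in 2 runs and B: 1 crash in 3 runs, and worst_first puts A ahead of B. A run with two modules installed counts against both, so mixed configurations blur the picture, and single-module runs are the cleaner signal.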
Also, I haven’t done any “cleaning” of the RAM module contacts. A lot of IT people use DeoxIT, a greasy protectant. I think it’s great stuff, but I don’t want to rely on that right now. It’s easier to troubleshoot when you’re able to “see” the trouble.
RAM slots aren’t like other card slots with flexible “fingers” that squeeze against the contacts. The slots have harder, less-flexible contacts, and you need to press the module into the slot, and lock it in. The contacts should scrape away oxidation. So I don’t think contact cleaners have much effect.
This isn’t rocket science, or any science, really. It’s just to show how an organized, methodical approach to testing can help uncover obvious patterns. If you don’t take the time to record multiple events, the crashing behavior is going to be hard to analyze.
There are 44 crashes recorded above, and I hope to record another 20 to 25 crashes.
I also received a couple of bigger RAM modules, and may swap those in as well. (That’s my present to myself. I may sell one of the modules, because I really don’t need 16 GB of RAM. 10 GB or 12 GB is more than enough.)