I bought a motherboard online, as a spare, and didn’t install it for months. When I finally did, I found that I had a crashing computer. The return period had passed, so I decided to try to stabilize the system. I suspect the RAM. This is a series of blog posts about the process, which has taken days.
If you haven’t been reading this series, here are the other posts in this Crashing Computer drama:
- Crashing Computer from Ebay. Cleaning Out the Dust from the CPU Cooler.
- Crashing Computer. Underclocking to Improve Stability. Adding New Thermal Grease.
- Crashing Computer. Swapping RAM Slots.
- Crashing Computer. RAM Testing Coverage Matrix.
First, Test the RAM with MemTest86+
Before I start seriously monkeying with the RAM, I run the MemTest86+ that is installed alongside Linux. This program has found some bad RAM for me in the past, so it works. What I’ve also found is that it usually doesn’t find flaky RAM or incompatibilities. Why? I don’t know. I just know that I’ve run the tester, found the RAM to be good, and then had problems that went away when the RAM was replaced.
Second, Test all the Possible Arrangements of RAM, and Record the Crashing
Just for “fun”, I’m going to test every single arrangement of RAM, and measure how long it takes for each configuration to reach three consecutive crashes. I’ll also note when the system is starting “cold”.
Let’s start with some notation. I’m labeling the RAM sticks A and B, and I’m using the motherboard’s labeling of the slots 1, 2, 3, 4. So, “A1” means stick A in slot 1.
Stick A is a G.Skill 2GB module. Stick B is a Corsair 4GB module.
Here are the possible configurations.
A1 A2 A3 A4 B1 B2 B3 B4 A1B2 A1B3 A1B4 A2B3 A2B4 A3B4 B1A2 B1A3 B1A4 B2A3 B2A4 B3A4
That’s 20 different configurations. At three crashes each, I’m going to be waiting for 60 crashes, so I’ll need to use the stopwatch feature on the phone to time how long each one runs. Since I can’t do this in a spreadsheet, I’ll do it in a paper notebook.
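As a sanity check on that count, here is a minimal Python sketch that generates the same list instead of writing it out by hand. The variable names are mine, and it assumes the notation above: sticks A and B, slots 1 to 4, and two-stick configurations written with the lower slot first.

```python
from itertools import permutations

SLOTS = [1, 2, 3, 4]

# One stick installed: A or B alone in each of the four slots.
singles = [f"{stick}{slot}" for stick in "AB" for slot in SLOTS]

# Both sticks installed: A and B in two different slots.
# Each name is written lower slot first, e.g. "B1A2" for B in slot 1, A in slot 2.
pairs = []
for slot_a, slot_b in permutations(SLOTS, 2):
    placed = sorted([(slot_a, "A"), (slot_b, "B")])
    pairs.append("".join(f"{stick}{slot}" for slot, stick in placed))

configs = singles + pairs
crashes_to_wait_for = len(configs) * 3  # three crashes per configuration

print(" ".join(configs))
print(len(configs), "configurations,", crashes_to_wait_for, "crashes")
```

It comes out to 8 single-stick configurations plus 12 two-stick ones, which matches the list.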
If I am booting the computer “cold”, meaning it’s been off for at least an hour, I’ll note it with a “C”. If I finish a session and have to shut down before a run has crashed, I’ll record it as “S”. While I could wait for the final crash, I’d rather just note that it didn’t crash, and move on. Since I’m getting 60 samples, I’ll just live with losing that last bit of crash data.
If there’s a run of many crashes, I’ll assume that there’s a problem with how the sticks were installed, and I’ll try reinstalling the sticks and starting over.
The goal of this is to find stable configurations, so this testing eliminates bad configurations, and selects potential good configurations, which will then need to be tested more extensively.
I’ve got some results already, but will post when I’ve completed the first eight configurations.