Crashing Computer, Troubleshooting RAM: Testing Coverage Matrix

I bought a motherboard online, as a spare, and didn’t install it for months. When I finally did, I found that I had a crashing computer.  The return period had passed, so I decided to try to stabilize the system.  I suspect the RAM. This is a series of blog posts about the process, which has taken days.

If you haven’t been reading this series, here are the other posts in this Crashing Computer drama:

First, Test the RAM with MemTest86+

Before I start seriously monkeying with the RAM, I run the MemTest86+ that is installed alongside Linux.  This program has found some bad RAM for me in the past, so it works. What I’ve also found is that it usually doesn’t find flaky RAM or incompatibilities.  Why? I don’t know. I just know that I’ve run the tester, had the RAM pass, and then seen problems that went away when the RAM was replaced.

Second, Test all the Possible Arrangements of RAM, and Record the Crashing

Just for “fun”, I’m going to test every single arrangement of RAM, and measure how long it takes for each configuration to reach three consecutive crashes. I’ll also note when the system is “cold”, and starting for the first time.

Let’s start with some notation. I’m labeling the RAM sticks A and B, and I’m using the motherboard’s labeling of the slots 1, 2, 3, 4. So, “A1” means stick A in slot 1.

Stick A is a G.Skill 2GB module. Stick B is a Corsair 4GB module.

Here are the possible configurations.

 A1
 A2
 A3
 A4

 B1
 B2
 B3
 B4

 A1B2
 A1B3
 A1B4

 A2B3
 A2B4

 A3B4

 B1A2
 B1A3
 B1A4

 B2A3
 B2A4

 B3A4

20 different configurations. I’m going to be waiting for 60 crashes, so I’ll need to use the stopwatch feature on the phone to time how long they run. Since I can’t do this in a spreadsheet, I’ll do it in a paper notebook.
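The count works out as 4 slots for stick A alone, 4 for stick B alone, and 4 × 3 = 12 ways to put both sticks in two different slots, for 20 in total. Here is a minimal Python sketch that generates the same list, in case anyone wants to check it or print a notebook template. The names (`STICKS`, `SLOTS`, `CRASHES_PER_CONFIG`, and the two functions) are my own for illustration, not part of any tool.

```python
from itertools import permutations

STICKS = ("A", "B")        # A = G.Skill 2GB, B = Corsair 4GB
SLOTS = (1, 2, 3, 4)       # motherboard slot labels
CRASHES_PER_CONFIG = 3     # crashes to wait for in each configuration


def single_stick_configs():
    """One stick installed: A1..A4, then B1..B4."""
    return [f"{stick}{slot}" for stick in STICKS for slot in SLOTS]


def two_stick_configs():
    """Both sticks installed in two different slots."""
    configs = []
    # Each ordered pair (slot for A, slot for B) is a distinct arrangement.
    for slot_a, slot_b in permutations(SLOTS, 2):
        # Write the lower-numbered slot first, as in the list above.
        parts = sorted([(slot_a, "A"), (slot_b, "B")])
        configs.append("".join(f"{stick}{slot}" for slot, stick in parts))
    return configs


all_configs = single_stick_configs() + two_stick_configs()

print(len(all_configs), "configurations")
print(len(all_configs) * CRASHES_PER_CONFIG, "crashes to wait for")
for name in all_configs:
    print(name)
```

The two-stick names come out in a different order than my hand-written list, but it is the same set of 12.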

If I am booting the computer “cold”, meaning it’s been off for at least an hour, I’ll note it with a “C”. If I finish a session and have to shut down before the next crash, I’ll record it as “S”. While I could wait for the final crash, I’d rather just note that it didn’t crash, and move on. Since I’m getting 60 samples, I’ll just live with losing that last bit of crash data.

If there’s a run of many crashes, I’ll assume that there’s a problem with how the sticks were installed, and I’ll try reinstalling the sticks and starting over.

The goal of this is to find stable configurations, so this testing eliminates bad configurations, and selects potential good configurations, which will then need to be tested more extensively.

I’ve got some results already, but will post when I’ve completed the first eight configurations.
