Using Heat Gun to Cause ECC Memory Errors

From: Dave Peterson
Subject: using heat gun to cause ecc memory errors 
Date: 2005-06-20 14:26
I thought I'd share my recent experiences using a heat gun to cause
ecc memory errors. Perhaps others will find this to be a convenient
means of testing their code. What follows is a detailed description
of the equipment and technique I am using. To make my results easy
to reproduce, I will attempt to be as detailed as possible.
The memory that I am using is Corsair PC-2700 double-sided
registered ECC (512 MB per stick). The heat gun is a "Master Heat
Gun" (model # HG 501) made by Master Appliance Corporation
(http://www.masterappliance.com/). According to the manufacturer's
web site, the product has the following specifications:
 Temperature: 500-750 degrees F (260-400 degrees C)
 Volts: 120 AC (60 Hz)
 Current: 14 amps
 Power: 1680 watts
On the side of the device, there is a dial that may be turned to
control the width of a number of ventilation slots that regulate the
amount of airflow into the device. Increasing the width of the slots
decreases the temperature of the air that blows from the nozzle. I
open the slots as wide as possible to produce the minimum temperature.
The procedure I use is as follows:
 1. Boot the machine and make sure the ecc or bluesmoke module for
 your chipset is loaded. I like to increase the ECC error
 polling frequency to 1 msec, although this is not strictly
 necessary.
 2. Execute the C program below. As a command-line argument, feed
 it a number that is close to the amount of physical memory in
 your machine. Running "top" should show that essentially all
 of physical memory is allocated, the C program is using roughly
 99% of a CPU, and little or no paging activity is occurring.
 3. Adjust the dial on the heat gun so that the ventilation slots
 are opened as wide as possible and the air temperature is
 minimized. Be sure not to forget this step! I haven't tried
 it with the slots partially or fully closed. Given the amount
 of heat that the gun can produce, I would be concerned that
 partially or fully closing the slots may be enough to melt
 something or start a fire.
 Be careful when using the heat gun. The air that blows from
 the nozzle is hot enough to cause pain if you hold your hand
 in front of it for a few seconds. The nozzle gets hot enough
 that you can burn yourself by touching it accidentally.
 4. Turn on the heat gun and start blowing hot air onto the surface
 of one of the DIMMs. I hold the nozzle roughly 2 1/2 inches
 from the surface of the DIMM and make a back and forth sweeping
 motion across the DIMM (approximately 2 seconds per sweep from
 one end of the DIMM to the other). After doing this for
 approximately a minute and a half, I start seeing printk()
 messages on the console indicating single-bit ECC errors at a
 rate somewhere between one error every several seconds and
 several errors per second. Once I start seeing ECC errors, I
 stop blowing hot air onto the DIMM, just to be safe and
 minimize the chances of damaging something.
I have found the above procedure to be a very reliable means of causing
single-bit ECC errors. However, I state the following disclaimer:
 Before you attempt to perform the above steps, please remember that
 you are doing so at your own risk. If you aren't careful with the
 heat gun, you may possibly burn yourself, damage your hardware, or
 perhaps even start a fire. As an additional warning, I have
 absolutely no idea how repeated use of this technique may affect
 the performance or reliability of your hardware. It seems
 reasonable to expect that repeated exposure to a heat gun may
 substantially shorten the lifetime of your DIMMs or your motherboard.
Dave
#include <stdio.h>
#include <stdlib.h>
void usage (void)
{
    fprintf(stderr,
            "Usage: mem SIZE\n\n"
            " SIZE: number of bytes to malloc()\n");
    exit(1);
}
int main (int argc, char **argv)
{
    long size;
    char *endptr;
    /* Unsigned arithmetic wraps on overflow instead of being undefined,
     * and size_t indices cover buffers with more than INT_MAX elements. */
    unsigned int *buf, sum;
    size_t i, n;

    if ((argc != 2) || (argv[1][0] == '\0'))
        usage();

    size = strtol(argv[1], &endptr, 0);

    if (*endptr || (size < 0))
        usage();

    if ((buf = (unsigned int *) malloc((size_t) size)) == NULL) {
        fprintf(stderr, "malloc() failed\n");
        return 1;
    }

    n = (size_t) size / sizeof(*buf);

    /* The code below is somewhat ad-hoc.  My goal is just to repeat
     * the following steps forever:
     *
     *     1. Go through roughly all of physical memory, doing reads.
     *     2. Go through roughly all of physical memory, doing writes.
     */

    /* Touch every element once so the pages are actually allocated. */
    for (i = 0; i < n; i++)
        buf[i] = (unsigned int) i;

    for (sum = 0; ; ) {
        for (i = 0; i < n; i++)
            sum += buf[i];

        for (i = 0; i < n; i++)
            buf[i] += (unsigned int) i + sum;
    }

    return 0;
}
