Sunday, March 15, 2015

windows vista - Hardware interrupts and system unresponsiveness


Very occasionally, about once a week, my Windows Vista Business machine will completely lock up for anything between a minute and several minutes. Once this happens, it recurs more frequently until I reboot.


Process Explorer reveals that during this event, the system is busy with "Hardware interrupts & DPCs". The HDD activity LED on my machine also remains lit until the system becomes responsive again, although I cannot hear any of the disks actually scratching.


[Image: Interrupts CPU usage]
In the image above, you can see a lockup event as a spike of the red (interrupt) line. It appears to be short, but this is due to Process Explorer not being able to update the graph while the machine is not responding.


Here's a screenshot of the overall CPU usage; there appear to be a large number of interrupts in general.


I get the impression that my machine is experiencing a higher-than-normal number of interrupts. This leads me to suspect that some piece of hardware or a driver is misbehaving. Or could it be an IRQ conflict?


How can I diagnose this?




Edit #1: A look at the system log reveals several warning messages such as:



An error was detected on device \Device\Harddisk1\DR1 during a paging operation.



And:



Reset to device, \Device\RaidPort0, was issued.



I do not, however, have a RAID configuration set up, and all disks are connected directly to my motherboard's SATA ports.




Edit #2: Following the advice given here, I've made some changes to my rig to try to resolve the problem. I haven't experienced any freezes yet, but will return to either accept an answer or keep diagnosing.



  1. I replaced the SATA cable for my system disk;

  2. I plugged the SATA cable into a different SATA port on my Asus M2N-SLI Deluxe motherboard;

  3. I updated my nForce 570 SLI AMD drivers to nVidia's latest version.


I'm making the assumption here that \Device\RaidPort0 is my system disk. If the problem persists, the next step is to detach my other three disks one by one until the problem disappears. If that doesn't resolve it, I'll get rid of nForce altogether. And after that, it seems it can only be the system disk or my motherboard itself.




Edit #3: After swapping the system disk's SATA port with a different disk's port, I found the following entries in the Event Log after several days:



Reset to device, \Device\RaidPort1, was issued.



And:



A request to this device has been cancelled.


Device: \Device\RaidPort1
Model: ST3160812AS
Firmware Version: 3.AA
Serial Number: 5LS34HQ1
Port: 1



It seems fairly clear to me that the problem is neither the disk nor the SATA cable, as the errors have shifted entirely to a different port. I will consider this SATA port to be broken and exclusively use the other five.
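To check whether the errors really follow the port rather than the disk, it can help to tally the logged messages per device. The following is only a rough sketch: it assumes the System log messages have been exported to plain text lines, and the function name `count_events_by_device` is made up for illustration.

```python
import re
from collections import Counter

# Matches NT device paths such as \Device\RaidPort0 or \Device\Harddisk1\DR1.
DEVICE_RE = re.compile(r"\\Device\\[A-Za-z0-9]+(?:\\[A-Za-z0-9]+)*")


def count_events_by_device(lines):
    """Return a Counter mapping each device path to the number of lines that mention it."""
    counts = Counter()
    for line in lines:
        match = DEVICE_RE.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts


if __name__ == "__main__":
    # Messages quoted in this post, standing in for an exported log (assumption).
    sample = [
        r"An error was detected on device \Device\Harddisk1\DR1 during a paging operation.",
        r"Reset to device, \Device\RaidPort0, was issued.",
        r"Reset to device, \Device\RaidPort1, was issued.",
        r"Device: \Device\RaidPort1",
    ]
    for device, n in count_events_by_device(sample).most_common():
        print(device, n)
```

Running this over the log before and after a port swap should show whether the counts move with the port or stay with the disk.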


Answer



The lit HDD LED is a sign of HDD data transfer. If your disk is set to "silent", you may not hear its activity. It could also be a communication error on the SATA (or IDE) cable.


The Windows Event logs might have something if there are disk errors.


Update:



An error was detected on device \Device\Harddisk1\DR1 during a paging operation.



SATA CRC error/timeout. Paging operations are unlikely to be preemptible, so the system hangs for a while.



Reset to device, \Device\RaidPort0, was issued.



The disk did not respond for a while, and Windows reset the SATA port. Since your system resumes operation, the error condition seems to be temporary.


Have you tried changing SATA cables (take a look at the contacts for corrosion)? If that does not help, I'd try changing the disk.
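One way to tell a link problem from a failing disk is the disk's SMART interface CRC counter (attribute 199, usually reported as UDMA_CRC_Error_Count). The sketch below assumes you have smartmontools installed and have saved the attribute table printed by `smartctl -A` to text; neither is mentioned in the question, and the function name `crc_error_count` is hypothetical. It only parses text and does not talk to the disk itself.

```python
from typing import Optional


def crc_error_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199, or None if it is not present.

    Assumes the usual ten-column attribute table of `smartctl -A`:
    ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "199":
            try:
                return int(fields[9])
            except ValueError:
                # Raw value is not a plain integer; give up rather than guess.
                return None
    return None
```

If the count keeps rising after a freeze, that points at the cable, connector, or port. If it stays flat while the resets continue, the disk or controller becomes the more likely suspect.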

