Saturday, November 29, 2014

Windows delays writing FAT table on small USB drive despite “Quick removal”


I am seeing delayed writes to the FAT on a small-capacity, FAT12-formatted USB flash drive, even though the drive's policy is set to "Quick Removal" (I believe this means the SurpriseRemovalOK flag is set). I've captured the SCSI commands sent to the drive over USB. The writes that truncate the file happen immediately, and the entire file (two 512-byte sectors) is written immediately after that. But then there's a 20-90 second delay before the FAT is updated to reflect the file write.


The size of the drive matters. I see the problem on FAT filesystems of 15MB and smaller; on 16MB and larger, the writes are not delayed. 16MB is where I see the switch from FAT12 to FAT16 when I format a drive in Windows. (Note added later: the FAT12/FAT16 breakpoint actually depends on the number of clusters, not on the absolute filesystem size.)
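To make the later note concrete: the Microsoft FAT specification decides the FAT type purely from the count of data clusters (fewer than 4085 clusters means FAT12, fewer than 65525 means FAT16, otherwise FAT32). The C sketch below computes that count from boot-sector (BPB) fields. The struct and function names are hypothetical, and the BPB values in the usage note are illustrative; they are not necessarily what the Windows format tool picks.

```c
#include <stdint.h>

typedef enum { FAT_TYPE_12, FAT_TYPE_16, FAT_TYPE_32 } fat_type;

/* Hypothetical subset of BPB fields needed for the cluster-count rule. */
typedef struct {
    uint16_t bytes_per_sector;    /* BPB_BytsPerSec */
    uint8_t  sectors_per_cluster; /* BPB_SecPerClus */
    uint16_t reserved_sectors;    /* BPB_RsvdSecCnt */
    uint8_t  num_fats;            /* BPB_NumFATs */
    uint16_t root_entries;        /* BPB_RootEntCnt (0 on FAT32) */
    uint32_t total_sectors;       /* BPB_TotSec16 or BPB_TotSec32 */
    uint32_t fat_size_sectors;    /* BPB_FATSz16 or BPB_FATSz32 */
} bpb_fields;

/* Number of data clusters: sectors left after reserved area, FATs and root dir. */
uint32_t fat_cluster_count(const bpb_fields *b) {
    /* Root directory entries are 32 bytes each, rounded up to whole sectors. */
    uint32_t root_dir_sectors =
        ((uint32_t)b->root_entries * 32u + (b->bytes_per_sector - 1u)) /
        b->bytes_per_sector;
    uint32_t metadata_sectors = b->reserved_sectors +
        (uint32_t)b->num_fats * b->fat_size_sectors + root_dir_sectors;
    uint32_t data_sectors = b->total_sectors - metadata_sectors;
    return data_sectors / b->sectors_per_cluster;
}

/* FAT type depends only on the cluster count, per the FAT specification. */
fat_type fat_type_from_bpb(const bpb_fields *b) {
    uint32_t clusters = fat_cluster_count(b);
    if (clusters < 4085u)  return FAT_TYPE_12;
    if (clusters < 65525u) return FAT_TYPE_16;
    return FAT_TYPE_32;
}
```

With 512-byte sectors, 4 KiB clusters (8 sectors per cluster), 1 reserved sector, two FATs and 512 root entries, a 15 MiB volume (30720 sectors, 12-sector FATs) has 3832 clusters and is FAT12, while a 16 MiB volume (32768 sectors, 16-sector FATs) has 4087 clusters and is FAT16. With 2 KiB clusters the same 15 MiB volume would already exceed 4085 clusters, which is why size alone does not decide the type.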


On 16MB and larger volumes, Windows sends SCSI Prevent/Allow Medium Removal commands before writes, asking that the medium not be removed. The USB stick actually returns failure for these requests (it can't guarantee that it won't be removed), but Windows proceeds anyway. The 15MB-and-smaller traces show no Prevent/Allow Medium Removal commands.
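For spotting these commands in a capture by hand: per the SCSI Primary Commands specification, PREVENT ALLOW MEDIUM REMOVAL is a 6-byte CDB with operation code 0x1E, and the PREVENT field sits in the low two bits of byte 4 (00b allows removal, 01b prevents it). A minimal C sketch; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define SCSI_OP_PREVENT_ALLOW_MEDIUM_REMOVAL 0x1Eu

/* Fill a 6-byte PREVENT ALLOW MEDIUM REMOVAL CDB.
 * prevent != 0 asks the device to prevent removal; 0 allows it again. */
void build_prevent_allow_cdb(uint8_t cdb[6], int prevent) {
    memset(cdb, 0, 6);                  /* bytes 1-3 reserved, byte 5 = CONTROL */
    cdb[0] = SCSI_OP_PREVENT_ALLOW_MEDIUM_REMOVAL;
    cdb[4] = prevent ? 0x01u : 0x00u;   /* PREVENT field: bits 1..0 of byte 4 */
}

/* Returns -1 if the CDB is not PREVENT ALLOW MEDIUM REMOVAL,
 * otherwise the 2-bit PREVENT field (0 = allow, 1 = prevent). */
int prevent_allow_field(const uint8_t *cdb, size_t len) {
    if (len < 6 || cdb[0] != SCSI_OP_PREVENT_ALLOW_MEDIUM_REMOVAL)
        return -1;
    return cdb[4] & 0x03;
}
```

Over USB Bulk-Only Transport, a device that refuses the request reports a failed status in the Command Status Wrapper that follows; in the traces described here, Windows goes ahead with the writes regardless.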


(I discovered this problem while using a microcontroller board that exposes a tiny FAT filesystem containing Python code. When the microcontroller detects a write to the filesystem, it waits a bit for the write to complete, then automatically restarts and runs the newly written Python code. Because of the delayed FAT write, however, it was seeing corrupted code or a corrupted filesystem.)
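The restart logic described above is essentially a quiet-period (debounce) timer. The sketch below is hypothetical and is not the board's actual firmware: every block write resets the timer, and the board restarts once no writes have arrived for `quiet_ms` milliseconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for "restart once the host stops writing". */
typedef struct {
    uint32_t quiet_ms;       /* idle time required before restarting */
    uint32_t last_write_ms;  /* timestamp of the most recent block write */
    bool     pending;        /* true once any write has been seen */
} restart_timer;

/* Call from the mass-storage write handler for every block write. */
void on_block_write(restart_timer *t, uint32_t now_ms) {
    t->last_write_ms = now_ms;
    t->pending = true;
}

/* Poll from the main loop; true when it is time to restart.
 * Unsigned subtraction keeps the check correct across timer wraparound. */
bool should_restart(const restart_timer *t, uint32_t now_ms) {
    return t->pending && (uint32_t)(now_ms - t->last_write_ms) >= t->quiet_ms;
}
```

Fed with the timestamps from the trace later in this post and an assumed 2-second quiet period, the board would restart at about 7.26 s, roughly 19 seconds before Windows writes the FAT sectors at 26.56 s, so it would run against a FAT that does not yet record the clusters holding the new file's data.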


Why is the write to the FAT delayed so long despite "Quick Removal" being set? I can force the writes by doing an "Eject" on the drive, but that defeats the promise of "Quick Removal": if I pulled the drive early, it would have an incorrect FAT. This contradicts the statement in the screenshot below about not having to use "Safely Remove Hardware". Is this a bug, or am I missing something? Is there any way to force all the writes to happen immediately without a manual "Eject"?


[Screenshot: USB drive set to Quick Removal]


Here's a pruned extract from a Wireshark/USBPcap trace showing the issue. I truncate an existing file and then write a new copy of it. I've added comments with ###. Most of the writes to the USB drive take place around 5 seconds into the trace, but the final FAT write isn't until 26 seconds.


No.    Time  Source       Destination  Protocol  Length  Info
### write directory entry to truncate file
13 5.225586 host 1.2.2 USBMS 58 SCSI: Write(10) LUN: 0x00 (LBA: 0x00000041, Len: 8)
14 5.225838 host 1.2.2 USB 4123 URB_BULK out
### write FAT entries to truncate file
16 5.230488 host 1.2.2 USBMS 58 SCSI: Write(10) LUN: 0x00 (LBA: 0x0000003b, Len: 1)
17 5.230707 host 1.2.2 USB 539 URB_BULK out
19 5.235110 host 1.2.2 USBMS 58 SCSI: Write(10) LUN: 0x00 (LBA: 0x0000003e, Len: 1)
20 5.235329 host 1.2.2 USB 539 URB_BULK out
### write directory entry for the new copy of the file
22 5.252672 host 1.2.2 USBMS 58 SCSI: Write(10) LUN: 0x00 (LBA: 0x00000041, Len: 8)
23 5.252825 host 1.2.2 USB 4123 URB_BULK out
### write out file data (2 sectors of 512 bytes)
25 5.257416 host 1.2.2 USBMS 58 SCSI: Write(10) LUN: 0x00 (LBA: 0x000000c1, Len: 2)
26 5.257572 host 1.2.2 USB 1051 URB_BULK out
### ~21-second delay
### finally, write FAT entries to indicate used sectors
79 26.559964 host 1.2.2 USBMS 58 SCSI: Write(10) LUN: 0x00 (LBA: 0x0000003b, Len: 1)
80 26.560191 host 1.2.2 USB 539 URB_BULK out
82 26.560834 host 1.2.2 USBMS 58 SCSI: Write(10) LUN: 0x00 (LBA: 0x0000003e, Len: 1)
83 26.560936 host 1.2.2 USB 539 URB_BULK out
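Each `SCSI: Write(10)` line is a 31-byte USB Mass Storage Bulk-Only Command Block Wrapper (CBW) carrying a 10-byte WRITE(10) CDB. The 58-byte frame length appears consistent with USBPcap's 27-byte packet header plus the CBW, and the `URB_BULK out` lengths with that header plus the sector data (27 + 512 = 539, 27 + 4096 = 4123). A C sketch of decoding the LBA and block count that Wireshark displays; the function and struct names are hypothetical, and the field offsets follow the Bulk-Only Transport and SBC specifications.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CBW_SIGNATURE   0x43425355u  /* "USBC", stored little-endian */
#define CBW_LENGTH      31u
#define SCSI_OP_WRITE10 0x2Au

typedef struct {
    uint32_t data_transfer_length; /* bytes the host sends in the data stage */
    uint8_t  lun;
    uint32_t lba;                  /* first logical block written */
    uint16_t blocks;               /* number of logical blocks written */
} write10_info;

static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint32_t be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode a CBW; returns true only if it carries a WRITE(10) CDB. */
bool decode_write10_cbw(const uint8_t *cbw, size_t len, write10_info *out) {
    if (len < CBW_LENGTH || le32(cbw) != CBW_SIGNATURE)
        return false;
    const uint8_t *cdb = cbw + 15;       /* CBWCB starts at offset 15 */
    uint8_t cdb_len = cbw[14] & 0x1Fu;   /* bCBWCBLength, low 5 bits */
    if (cdb_len < 10 || cdb[0] != SCSI_OP_WRITE10)
        return false;
    out->data_transfer_length = le32(cbw + 8);        /* dCBWDataTransferLength */
    out->lun = cbw[13] & 0x0Fu;                       /* bCBWLUN */
    out->lba = be32(cdb + 2);                         /* CDB bytes 2-5, big-endian */
    out->blocks = (uint16_t)((cdb[7] << 8) | cdb[8]); /* CDB bytes 7-8, big-endian */
    return true;
}
```

For frame 16 above, this would report LBA 0x3b, 1 block and a 512-byte data stage, matching the 539-byte data frame that follows it.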

I've generated traces like this using a regular flash drive and also with a microcontroller board that emulates a tiny USB MSC drive, on both Windows 7 and Windows 10.


Just to be clear, this is a FAT12-formatted drive, just called "FAT" in the Windows formatting tool.
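For readers less familiar with FAT12: each FAT entry is 12 bits, so two entries share three bytes, and the entry for cluster n starts at byte offset n + n/2 of the FAT. A C sketch of reading and writing entries, following the layout in the Microsoft FAT specification (the function names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Read the 12-bit FAT entry for cluster n from an in-memory FAT12 table. */
uint16_t fat12_get(const uint8_t *fat, uint32_t n) {
    size_t off = n + n / 2;                        /* 1.5 bytes per entry */
    uint16_t pair = (uint16_t)(fat[off] | (fat[off + 1] << 8));
    return (n & 1u) ? (uint16_t)(pair >> 4)        /* odd n: high 12 bits */
                    : (uint16_t)(pair & 0x0FFFu);  /* even n: low 12 bits */
}

/* Write a 12-bit value for cluster n, preserving the neighbouring entry's nibble. */
void fat12_set(uint8_t *fat, uint32_t n, uint16_t value) {
    size_t off = n + n / 2;
    value &= 0x0FFFu;
    if (n & 1u) {
        /* Low 4 bits go into the high nibble of the shared byte. */
        fat[off]     = (uint8_t)((fat[off] & 0x0Fu) | ((value << 4) & 0xF0u));
        fat[off + 1] = (uint8_t)(value >> 4);
    } else {
        /* High 4 bits go into the low nibble of the shared byte. */
        fat[off]     = (uint8_t)(value & 0xFFu);
        fat[off + 1] = (uint8_t)((fat[off + 1] & 0xF0u) | (value >> 8));
    }
}
```

Linking cluster 2 to cluster 3 and marking cluster 3 as end of chain (any value of 0xFF8 or above; 0xFFF here) is the kind of update the delayed "write FAT entries to indicate used sectors" step performs for a two-cluster file; truncation does the reverse by setting entries back to 0 (free).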


Answer



I may have found the actual Windows driver code that's causing the issue.


Microsoft happens to include the source of its FAT filesystem driver (fastfat) in its package of Windows driver samples. There are several places in that driver where, if the filesystem is FAT12, the driver skips a step such as setting the dirty bit (maybe FAT12 has none) or flushing the FAT data:


https://github.com/Microsoft/Windows-driver-samples/blob/master/filesys/fastfat/verfysup.c#L774
https://github.com/Microsoft/Windows-driver-samples/blob/master/filesys/fastfat/cachesup.c#L1212
and maybe most critically:
https://github.com/Microsoft/Windows-driver-samples/blob/master/filesys/fastfat/cleanup.c#L1101
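On the dirty-bit question: as far as the Microsoft FAT specification goes, the "clean shutdown" flag lives in the high bits of the FAT[1] entry and is defined only for FAT16 (mask 0x8000) and FAT32 (mask 0x08000000), which would be consistent with the driver skipping it on FAT12. A small C sketch of that check; the enum and function names are hypothetical and are not fastfat's.

```c
#include <stdint.h>

typedef enum { FAT_KIND_12, FAT_KIND_16, FAT_KIND_32 } fat_kind;

/* Clean-shutdown bits in the FAT[1] entry, per the FAT specification. */
#define FAT16_CLN_SHUT_BIT 0x8000u
#define FAT32_CLN_SHUT_BIT 0x08000000u

/* Returns 1 if FAT[1] marks the volume clean, 0 if dirty,
 * and -1 for FAT12, where the specification defines no such flag. */
int fat_volume_clean(fat_kind kind, uint32_t fat1_entry) {
    switch (kind) {
    case FAT_KIND_16: return (fat1_entry & FAT16_CLN_SHUT_BIT) ? 1 : 0;
    case FAT_KIND_32: return (fat1_entry & FAT32_CLN_SHUT_BIT) ? 1 : 0;
    default:          return -1;  /* FAT12: no dirty flag in FAT[1] */
    }
}
```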


In the last link, in cleanup.c, the FAT is not flushed if the filesystem is FAT12. I think this may be causing exactly the behavior I see:


    //
    // If that worked ok, then see if we should flush the FAT as well.
    //
    if (NT_SUCCESS(Status) && Fcb && !FatIsFat12( Vcb) &&
        FlagOn( Fcb->FcbState, FCB_STATE_FLUSH_FAT)) {

        Status = FatFlushFat( IrpContext, Vcb);
    }

Reported to Microsoft in Windows Feedback Hub at https://aka.ms/btvdog (special URL that opens in Feedback Hub).

