Wednesday, May 31, 2017

linux - Cygwin 2.9.0 cat/tac Commands Fail on Large Files when Piping to grep -q -m1




I am seeing some odd behavior using Cygwin x64 2.9.0 on Windows 10 Pro x64. The command I am attempting to run is the following:



tac  | grep -q -m1 -F "literal string"


The above command succeeds on all small files I throw at it (small meaning 15 kB or less). It also succeeds if the final occurrence of "literal string" is near the start of the file (e.g., "literal string" appears near the top of the file and nowhere else). Finally, it also succeeds when neither of the {-q, -m1} flags is passed to the grep command.



However, when the file is around 680 kB and "literal string" appears near the end of the file, the tac command prints "tac: write error" to STDERR. Despite this error, the command appears to succeed: the matching line is printed to STDOUT (when the -q flag is omitted), and grep returns the appropriate exit status.



Further testing has revealed that this same error occurs when using cat, except the literal string must appear near the start of the file to generate the error, and the generated error is "cat: write error: No space left on device".




Note that this only occurs if at least one of the {-m1, -q} options is passed to the grep command, the match is near the first processed line of the file (for cat it is near the beginning, for tac it is near the end), and the file is large.
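A minimal reproduction sketch, assuming bash with GNU coreutils and GNU grep. The seq-generated filler, the temporary files and the variable names are hypothetical stand-ins for the original data. It builds a file of just under 700 kB with the string on the last line (the first line tac emits) and a second file with it on the first line (the first line cat emits), then runs both pipelines:

```shell
#!/usr/bin/env bash
# Sketch: reproduce the two failing pipelines with synthetic data.
# Assumption: seq filler stands in for the original ~680 kB file.
end_file=$(mktemp)
start_file=$(mktemp)
trap 'rm -f "$end_file" "$start_file"' EXIT

# Target string on the LAST line, so tac emits it first.
seq 1 115000 > "$end_file"
echo "literal string" >> "$end_file"

# Target string on the FIRST line, so cat emits it first.
{ echo "literal string"; seq 1 115000; } > "$start_file"

# On Cygwin 2.9.0 these reportedly print "tac: write error" and
# "cat: write error: No space left on device" to STDERR.
tac "$end_file" | grep -q -m1 -F "literal string"
tac_status=$?    # without pipefail, this is grep's exit status

# cat is kept on purpose to mirror the command from the question.
cat "$start_file" | grep -q -m1 -F "literal string"
cat_status=$?

echo "tac pipeline: $tac_status, cat pipeline: $cat_status"
```

On Linux the same script normally prints no write error, because tac and cat are killed silently by SIGPIPE when grep exits.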



I have run the df command, and it reports 14 MB available on the Cygwin drive, with 60 GiB free on the actual disk. I know I could simply redirect STDERR to the NUL device, but that seems like a hacky work-around. Does anyone know how to fix this properly?
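Inside a Cygwin bash session, the STDERR redirect mentioned above would target /dev/null instead of NUL. The sketch below, using a hypothetical nonexistent path, illustrates why it feels hacky. Every tac error is hidden, so a missing file becomes indistinguishable from "string not found":

```shell
#!/usr/bin/env bash
# Sketch: suppress tac's STDERR and rely on grep's exit status alone.
# Assumption: /no/such/file is a hypothetical path that does not exist.
missing=/no/such/file

# tac cannot open the file, but its complaint is discarded.
tac "$missing" 2>/dev/null | grep -q -m1 -F "literal string"
status=$?   # grep saw empty input, so this is 1 ("no match")

if [ "$status" -eq 0 ]; then
  echo "found"
else
  echo "not found (or tac failed; cannot tell which)"
fi
```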



BEGIN EDIT



I found another report of the same error from May 2017, but no solution was presented. The OP of that post suspects a pipe buffer size limitation (perhaps in Windows, perhaps in Cygwin).


Answer



I have found two workarounds. Simply change the command:




tac  | grep -q -m1 -F "literal string"


to one of:



bash -c "tac  | grep -q -m1 -F 'literal string'"
stdbuf -o L tac | grep -q -m1 -F "literal string"



I think the first works because the pipe is then created by Cygwin's bash (a POSIX-style "Linux" pipe), and the second because it forces tac's output to be line-buffered. Both forms make the error go away.
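When the bash -c form is used from a script, it is safer to pass the file name and search string as positional parameters than to splice them into the quoted command string, so that spaces or quotes in either cannot break the command. A sketch with a hypothetical function name and a throwaway file; the `_` argument fills $0 of the inner shell:

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the bash -c workaround.
# Returns 0 if the file contains the fixed string, 1 otherwise.
contains_string() {
  local file=$1 needle=$2
  # "$1" and "$2" are expanded by the inner bash, not spliced into the string.
  bash -c 'tac "$1" | grep -q -m1 -F -- "$2"' _ "$file" "$needle"
}

# Usage with a throwaway file whose name contains a space.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'first line\nit has a literal string in it\nlast line\n' > "$dir/my log.txt"

contains_string "$dir/my log.txt" "literal string"; hit=$?
contains_string "$dir/my log.txt" "absent text";    miss=$?
echo "hit=$hit miss=$miss"
```

The `--` stops grep from treating a search string that starts with `-` as an option.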



Since this works, I'm guessing the issue is that grep stops reading its input once it finds the first match (which is what -m1 and -q do), but tac keeps writing. Once the pipe buffer is full (probably 64 KiB), tac's next write fails and tac exits with the error above. However, since tac had already written the line I care about before failing, everything works as intended.
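This guess can be checked in bash with PIPESTATUS, which records the exit status of every stage of the last pipeline. A sketch on a synthetic file (seq filler in a hypothetical temp file): with -q, grep exits 0 after its first match while tac's status is non-zero because its later writes hit a closed pipe (on Linux usually 141, i.e. 128 + SIGPIPE). Without -q and -m1, grep reads everything and tac exits 0:

```shell
#!/usr/bin/env bash
# Sketch: compare per-stage exit statuses with and without early exit.
file=$(mktemp)
out=$(mktemp)
trap 'rm -f "$file" "$out"' EXIT
seq 1 115000 > "$file"
echo "literal string" >> "$file"   # emitted first by tac

# grep exits as soon as it matches while tac is still writing.
tac "$file" | grep -q -m1 -F "literal string"
early=("${PIPESTATUS[@]}")         # copy immediately; the next command resets it

# grep reads to EOF, so tac can finish writing. Output goes to a regular
# file, not /dev/null: GNU grep may stop at the first match when its
# output is /dev/null.
tac "$file" | grep -F "literal string" > "$out"
full=("${PIPESTATUS[@]}")

echo "early exit: tac=${early[0]} grep=${early[1]}"
echo "full read:  tac=${full[0]} grep=${full[1]}"
```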



Timing these options shows that the bash -c form is faster. This is probably because, with the Linux pipe, tac can exit as soon as grep finds the first match instead of continuing to produce output.
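A rough way to repeat the timing comparison uses bash's time keyword, with TIMEFORMAT set to print only elapsed wall-clock seconds. The synthetic file is a hypothetical stand-in, and absolute numbers will differ between machines (and between Cygwin and Linux):

```shell
#!/usr/bin/env bash
# Sketch: time both workarounds on the same synthetic file.
file=$(mktemp)
trap 'rm -f "$file"' EXIT
seq 1 115000 > "$file"
echo "literal string" >> "$file"

TIMEFORMAT='%3R s elapsed'   # time keyword: real time, 3 decimal places

echo "bash -c form:"
time bash -c 'tac "$1" | grep -q -m1 -F -- "$2"' _ "$file" "literal string"
bash_status=$?

echo "stdbuf form:"
time stdbuf -o L tac "$file" | grep -q -m1 -F "literal string"
stdbuf_status=$?             # status of the timed pipeline (grep)

echo "statuses: bash=$bash_status stdbuf=$stdbuf_status"
```

A single run on a file this small is noisy; repeating each command several times gives a fairer comparison.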

