Friday, January 13, 2017

exim - does email pipe to program cause problems with unicode characters?

I'm piping incoming mail into a PHP script, immediately storing the RAW email in a MySQL db. It works very well, except ~0.7% of emails arrive with a truncated message body.



I found someone whose emails were failing, and had them send an email to my Gmail account AND to the server. Gmail had no problems: I saw the whole message. But my server cropped the raw message. Here is the full message as Gmail received it:



Delivered-To: asdasd@gmail.com
Received: by 10.152.1.193 with SMTP id 1csp3490lao;

Mon, 20 Oct 2014 05:33:31 -0700 (PDT)
Return-Path:
Received: from vps123.blahblah.com (vps123.blahblah.com. [74.124.111.111])
by mx.google.com with ESMTPS id fb7si7786786pab.30.2014.10.20.05.33.30
for
(version=TLSv1 cipher=RC4-SHA bits=128/128);
Mon, 20 Oct 2014 05:33:30 -0700 (PDT)
Message-ID: <14FBD481E1074C79AF3D@acerDator>
From: =?utf-8?Q?sende=C3=A4r?=
To: "test"

References:
Subject: Message body will contain only Det h
Date: Mon, 20 Oct 2014 14:33:24 +0200
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="----=_NextPart_000_0018_01CFEC72.CE424470"
X-Priority: 3
X-MSMail-Priority: Normal
Importance: Normal
X-Mailer: Microsoft Windows Live Mail 14.0.8117.416

X-MimeOLE: Produced By Microsoft MimeOLE V14.0.8117.416
X-Source:
X-Source-Args:
X-Source-Dir:

Det här är ett flerdelat meddelande i MIME-format.

------=_NextPart_000_0018_01CFEC72.CE424470
Content-Type: text/plain;
charset="utf-8"

Content-Transfer-Encoding: quoted-printable

This email will not be received correctly. EXIM may not handle =
some poorly formed emails. For example ...

Det h=E4r =E4r ett flerdelat meddelande i MIME-format.

... is directly above this quoted-printable wrapper, thanks to the =
Swedish email client Microsoft Windows Live (circa 2009), adding UTF-8 =
chars where there should only be ascii. At least, that's what I think =

the problem is.

------=_NextPart_000_0018_01CFEC72.CE424470--


My server crops the message immediately before the first non-ASCII character. The stored raw data contains the headers, a blank line, "Det h", and nothing else.
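To check that a stored copy really is the original cut off just before the first byte outside the ASCII range (and not cut somewhere else), a small comparison helper could be used. This is a minimal sketch; the function names `first_non_ascii_offset` and `truncated_at_non_ascii` are hypothetical and not part of the PHP script described here.

```python
from typing import Optional


def first_non_ascii_offset(raw: bytes) -> Optional[int]:
    """Return the index of the first byte >= 0x80, or None if every byte is ASCII."""
    for index, byte in enumerate(raw):
        if byte >= 0x80:
            return index
    return None


def truncated_at_non_ascii(original: bytes, stored: bytes) -> bool:
    """True if `stored` equals `original` cut off right before its first non-ASCII byte."""
    offset = first_non_ascii_offset(original)
    return offset is not None and stored == original[:offset]
```

Both arguments are raw bytes, for example the contents of the saved test file read in binary mode and the value read back from the database.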



When I pipe the above email into the PHP script from the shell (/blah/email_in.php < bademail.txt), it stores the message perfectly. So I don't think my script is at fault: it stores the raw STDIN correctly.



I used cPanel to "Set Default Address" to "Pipe to a program". I don't know whether this setting bypasses EXIM entirely, but I read somewhere that EXIM handles the pipe transport. So my first guess is that EXIM is mangling a poorly formatted message and choking the stream at the first non-ASCII character, ä.




To confirm this, I need a way to pipe email INTO EXIM, basically tricking EXIM into thinking it just received an email when actually it just received a txt file. I've found several tutorials on how to telnet to port 25, etc., but none that would preserve the headers and multipart boundaries, and none that made sense to a unix n00b like me who relies on cPanel.
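One possible way to do this without typing into telnet by hand is to have a small script submit the saved file over SMTP, so the headers and MIME boundaries go out exactly as they appear in the file. The sketch below uses Python's standard `smtplib`. The function name `inject_raw_message`, the `smtp_factory` parameter, and the `localhost:25` default are assumptions for illustration. Whether the server accepts the message, and whether it then takes the same route as real incoming mail, depends on its configuration.

```python
import re
import smtplib


def inject_raw_message(path, sender, recipient,
                       host="localhost", port=25,
                       smtp_factory=smtplib.SMTP):
    """Submit the raw bytes of a saved message file to an SMTP server.

    Only the line endings are changed (to CRLF, as SMTP expects);
    8-bit bytes such as a Latin-1 'ä' are left untouched.
    Returns the dict of refused recipients (empty if all were accepted).
    """
    with open(path, "rb") as handle:
        raw = handle.read()
    # Normalise bare LF to CRLF without doubling existing CRLF pairs.
    raw = re.sub(rb"\r?\n", b"\r\n", raw)
    with smtp_factory(host, port) as server:
        # Passing bytes keeps smtplib from trying to ASCII-encode the message.
        return server.sendmail(sender, [recipient], raw)
```

A call might look like `inject_raw_message("bademail.txt", "sender@example.com", "someone@yourdomain.example")`, where both addresses are placeholders. If the copy stored by the piped script is truncated again, that points at the mail server or the pipe rather than at the original sender's route.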



Am I correct about EXIM being the likely culprit?
Can anyone suggest a way to test this, or an alternative approach?



My server runs EXIM + Dovecot on CentOS 6.5.



P.S. My only other thought is to let the server store mail normally and, if these messages turn out to be stored correctly, to use IMAP to retrieve and delete them rather than going directly through the pipe. Adding the IMAP middleman seems less efficient, though this approach is probably more robust.
