MIME/CORE: Simple Text Body Lost?

May 23, 2011 at 5:55 PM

I'm using the latest code downloaded from CodePlex, and I'm finding that a simple email message (no MIME, no content type specified) is not being parsed correctly.  Specifically, although the Entity.Text property appears to be set on the Entity, Text is never used when the Entity is converted into a Message.  Also, the Headers collection of the Message is not populated.  The result is a Message with the Subject, Date, and To/From/Cc/Bcc fields set, but with the body of the email and all other headers lost.

Coordinator
May 23, 2011 at 9:27 PM

Thank you for your report,

I was able to reproduce the issue and will fix it soon ;)

Regards

Alexander

Coordinator
May 23, 2011 at 10:20 PM

Headers are now parsed in simple mails.

However, I was not able to create a situation where the text was not parsed. Although an Entity has no content type when it is allocated, the parser falls back to text/plain when no content type header exists, which should trigger the text parsing.

Regards

Alexander

May 24, 2011 at 12:54 PM

Alexander,

I'm looking at the EntityExtensions class.  The only places the entity's Text property is referenced are in blocks guarded by IsView and IsMessage checks, both of which return false if entity.HasContentType is false.  As far as I can tell, if the content type is not explicitly included in the message, HasContentType is always false.  Thus the entity's Text (which is the message body) is never used.

Thanks,

Cory

Coordinator
May 24, 2011 at 1:00 PM
Hi Cory,

entity.HasContentType should fall back to "text/plain" if no content type header field is detected while parsing.
If this is not the case, I will track down the defect and fix the fallback.

Thanks again

Alexander

May 24, 2011 at 1:07 PM

Hi Alexander,

I just took a look.  HasContentType returns false if the ContentTypeHeaderField property returns null.  In the getter for that property, you declare a new variable to hold the content type and pass it as an out parameter to a method that looks for the relevant header in the list of headers parsed from the message.  If no such header exists (as here, where the message has no explicit content type header), that method sets the out variable to null, the getter returns it, and nothing falls back to text/plain.  So ultimately, HasContentType is false and the entity's Text is never copied into the message.
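To make that failure mode concrete, here is a minimal C# sketch of the getter pattern described above. It is not the library's code: `TryFindHeader`, `ContentTypeHeaderValue`, and the tuple-based header list are hypothetical stand-ins, assuming the getter does nothing beyond the out-parameter lookup.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical parsed headers of a simple message: no Content-Type header.
var headers = new List<(string Name, string Value)>
{
    ("Subject", "Hello"),
    ("From", "sender@example.com"),
};

// Hypothetical lookup: assigns null to the out value when nothing matches.
static bool TryFindHeader(List<(string Name, string Value)> list, string name, out string? value)
{
    foreach (var header in list)
    {
        // Header field names are case-insensitive.
        if (string.Equals(header.Name, name, StringComparison.OrdinalIgnoreCase))
        {
            value = header.Value;
            return true;
        }
    }
    value = null;
    return false;
}

// Mirrors the getter pattern described above: there is no fallback,
// so a missing header simply yields null.
static string? ContentTypeHeaderValue(List<(string Name, string Value)> list)
{
    TryFindHeader(list, "Content-Type", out var contentType);
    return contentType;
}

bool hasContentType = ContentTypeHeaderValue(headers) != null;
Console.WriteLine(hasContentType); // False: the body is never treated as a view
```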

I hope this helps,

Cory

Coordinator
May 24, 2011 at 1:10 PM

Thanks for looking it up so quickly.

Since the RFCs specify that a missing (or syntactically invalid) content type should be treated as "text/plain", I will adjust the parser to mimic this behaviour.
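For reference, RFC 2045 (section 5.2) takes a message without a Content-Type header to be `text/plain; charset=us-ascii` and recommends the same default for a syntactically invalid header. A minimal C# sketch of such a fallback, assuming the headers are available as a case-insensitive dictionary; `EffectiveContentType` and `LooksLikeMediaType` are hypothetical names, not part of the library:

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// RFC 2045, section 5.2: no Content-Type header field means
// "text/plain; charset=us-ascii"; the same default is recommended
// when the header field is syntactically invalid.
const string DefaultContentType = "text/plain; charset=us-ascii";

// Hypothetical well-formedness check: "type/subtype" with both parts non-empty.
bool LooksLikeMediaType(string value)
{
    string mediaType = value.Split(';')[0].Trim();
    int slash = mediaType.IndexOf('/');
    return slash > 0 && slash < mediaType.Length - 1;
}

// Hypothetical helper: the explicit header value if usable, else the default.
string EffectiveContentType(IReadOnlyDictionary<string, string> headers) =>
    headers.TryGetValue("Content-Type", out var raw) && LooksLikeMediaType(raw)
        ? raw.Trim()
        : DefaultContentType;

var simple = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["Subject"] = "Hello",
};
var html = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["content-type"] = "text/html; charset=utf-8",
};

Console.WriteLine(EffectiveContentType(simple)); // text/plain; charset=us-ascii
Console.WriteLine(EffectiveContentType(html));   // text/html; charset=utf-8
```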

Regards 

Alexander

Jun 1, 2011 at 6:51 PM

Alexander,

I'm not sure whether you're still looking at this issue, but it looks like solving the text body problem is as simple as changing IsView (in EntityExtensions) to the following:

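public static bool IsView(this Entity entity)
{
    // No Content-Type header: assume the entity is a text view.
    // Otherwise it is a view if its media type is text (MIME types are case-insensitive).
    // The || short-circuits, so HasContentType is already true on the right-hand side.
    return !entity.HasContentType ||
           entity.ContentTypeHeaderField.MediaType.TrimQuotes().Trim()
                 .StartsWith("text", System.StringComparison.OrdinalIgnoreCase);
}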

That way if the entity has no content type, it's assumed to be a text view.

As an aside, it appears that the MIME parser chokes on the date-time component of the "Received:" trace header.  It writes "Param skipped: <date-time text>", with the actual date-time from the message, to the debug output.  This isn't really a significant issue for what I'm trying to use it for, but I thought you might want to know, since RFC 5322 specifies that this header ends with a date-time value.

Regards,

Cory

Coordinator
Jun 1, 2011 at 8:12 PM

Hi Cory,

although the code you posted is behaviorally correct, in that it will provide the expected results, it is not semantically correct.
An entity that has no content type is not a view; it should just be treated as one, since we don't know any better. This might seem like splitting hairs, but if someone used the IsView extension method in another context to check whether the content type is actually text, it would give false positives. Therefore I'd like to add the fallback to the ToMessage() extension instead of the IsView() extension.
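The split described here, a strict IsView() plus a fallback at the call site, can be sketched as follows. This is an illustration, not the library's code: the entity is reduced to a nullable media-type string, and `Route` with its "view"/"other" results is a hypothetical stand-in for the branching inside ToMessage().

```csharp
#nullable enable
using System;

// Strict predicate: true only when a media type is declared and it is text.
// A null mediaType stands for "no Content-Type header".
bool IsView(string? mediaType) =>
    mediaType != null &&
    mediaType.Trim().Trim('"').StartsWith("text", StringComparison.OrdinalIgnoreCase);

// Hypothetical routing in the spirit of ToMessage(): an entity without a
// content type is treated as a view, while IsView() keeps reporting the
// truth about the header. "other" is a placeholder for the remaining cases.
string Route(string? mediaType) =>
    (IsView(mediaType) || mediaType == null) ? "view" : "other";

Console.WriteLine(IsView(null));       // False
Console.WriteLine(Route(null));        // view
Console.WriteLine(Route("text/html")); // view
Console.WriteLine(Route("image/png")); // other
```

Keeping the fallback at the call site means other callers of IsView() never get a false positive for a header that is simply absent.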

A param is not the header itself but a parameter attached to the header. Here is an example from RFC 2231, where title*0, title*1, and title*2 are three numbered sections of a single parameter, "title", attached to the "Content-Type" header with the value "application/x-stuff":

Content-Type: application/x-stuff;
    title*0*=us-ascii'en'This%20is%20even%20more%20;
    title*1*=%2A%2A%2Afun%2A%2A%2A%20;
    title*2="isn't it!"
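That example uses RFC 2231 parameter value continuations: the sections are numbered, sections written as `name*N*=` are percent-encoded, and the first encoded section carries a charset'language' prefix. A minimal C# sketch of reassembling them, assuming an already unfolded header and values without embedded semicolons; `DecodeContinuedParameter` is a hypothetical helper, not part of the library:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: reassembles an RFC 2231 parameter split into numbered
// sections (name*0, name*1, ...). Simplifications: values contain no ';',
// and percent-escapes are decoded as UTF-8 (a superset of us-ascii).
string DecodeContinuedParameter(string headerValue, string name)
{
    var sections = new SortedDictionary<int, string>();
    var pattern = new Regex("^" + Regex.Escape(name) + @"\*(\d+)(\*?)=(.*)$");
    foreach (string part in headerValue.Split(';'))
    {
        Match m = pattern.Match(part.Trim());
        if (!m.Success) continue; // the media type or an unrelated parameter

        int index = int.Parse(m.Groups[1].Value);
        bool encoded = m.Groups[2].Value == "*";
        string value = m.Groups[3].Value;

        if (encoded)
        {
            if (index == 0)
            {
                // Drop the charset'language' prefix, e.g. "us-ascii'en'".
                int first = value.IndexOf('\'');
                int second = first < 0 ? -1 : value.IndexOf('\'', first + 1);
                if (second >= 0) value = value.Substring(second + 1);
            }
            value = Uri.UnescapeDataString(value);
        }
        else
        {
            value = value.Trim('"'); // plain quoted-string section
        }
        sections[index] = value;
    }
    // SortedDictionary enumerates sections in numeric order.
    return string.Concat(sections.Values);
}

// The RFC 2231 example, unfolded onto one line.
string header = "application/x-stuff; " +
    "title*0*=us-ascii'en'This%20is%20even%20more%20; " +
    "title*1*=%2A%2A%2Afun%2A%2A%2A%20; " +
    "title*2=\"isn't it!\"";

Console.WriteLine(DecodeContinuedParameter(header, "title"));
// This is even more ***fun*** isn't it!
```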

Would you be so kind as to send me a copy of the mail that produces the param skip, so I can check whether it's a bug or a valid skip due to malformed input? If it is a sensitive mail, you can omit all content; I really just need the surrounding MIME harness.

Thanks for your help

Alexander

Jun 1, 2011 at 8:27 PM

Alexander,

Your point is well taken on the semantic versus behavioral correctness of the change I suggested.

I'd prefer not to post a whole email, but the relevant header looks something like this:

Received: (from ultracomp@localhost)
	by host.domain.net (8.11.6/8.11.6) id p4JHkq318859
	for user@otherdomain.net; Thu, 19 May 2011 17:46:52 GMT

The debug output would be:

Param skipped:  Thu, 19 May 2011 17:46:52 GMT

Occasionally the date-time in the header is formatted as follows instead (with the same results):

Thu, 19 May 2011 17:46:52 +0000 (GMT)

I hope this is sufficient to help.

Cory

Coordinator
Jun 1, 2011 at 8:43 PM

Hello Cory,

this is not much of an issue. The trace information adds the time separated by a semicolon, which is also the parameter separator for headers; the actual value is everything, not just the time.

As suspected, the parser tries to parse the date as a param but fails, since it does not have the form Foo=Bar.

A semicolon should not be used unescaped in a header value outside of a comment.
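A minimal C# sketch reproducing that behaviour, assuming a generic header parser that treats every segment after a semicolon as a Foo=Bar parameter. The parser here is hypothetical, not the library's; the header value is the one from the thread, unfolded onto one line.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

string received =
    "(from ultracomp@localhost) by host.domain.net (8.11.6/8.11.6) " +
    "id p4JHkq318859 for user@otherdomain.net; Thu, 19 May 2011 17:46:52 GMT";

// Hypothetical generic parser: "value; name=value; name=value".
var parameters = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
var skipped = new List<string>();
string[] parts = received.Split(';');
string value = parts[0].Trim();

for (int i = 1; i < parts.Length; i++)
{
    string segment = parts[i].Trim();
    int eq = segment.IndexOf('=');
    if (eq <= 0)
    {
        // Not of the form Foo=Bar, so the date-time is skipped as a param.
        skipped.Add(segment);
        Console.WriteLine("Param skipped: " + segment);
        continue;
    }
    parameters[segment.Substring(0, eq).Trim()] = segment.Substring(eq + 1).Trim();
}

Console.WriteLine(value); // the received-tokens, without the date-time
```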

Still, thank you for posting ;)

Coordinator
Jun 3, 2011 at 9:17 PM

The issue has been fixed in the latest change set.

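    if (entity.IsView() || !entity.HasContentType) {
        // Entities without a Content-Type header are treated as text views
        // here, so IsView() itself can stay strict.
        var view = entity.ToView();
        view.IsRelated = isRelated;
        message.Views.Add(view);
        return;
    }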

Jun 8, 2011 at 1:28 PM

Awesome!  Thanks for the fix!

Also, while I agree with you that the trace header issue isn't a significant one, the semicolon is correct per the RFC.  See section 3.6.7 of RFC 5322 (http://tools.ietf.org/html/rfc5322#section-3.6.7).  Specifically, check out the definition of the "received" field.  It specifies a literal semicolon between the received-tokens and the date-time.  Again, I'm not particularly worried about it, but I just want to be clear about the correct format per the RFC.
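Following that grammar, a consumer can separate the date-time by splitting at the last semicolon instead of treating it as a parameter separator. A minimal C# sketch; `SplitReceived` is a hypothetical helper, and it assumes no semicolon appears inside a comment trailing the date-time:

```csharp
#nullable enable
using System;

// RFC 5322, section 3.6.7: a Received field holds received-tokens, then a
// literal ";", then a date-time. Splitting at the last ";" separates them.
// Assumption: no ";" occurs inside a comment that trails the date-time.
(string Tokens, string? Date) SplitReceived(string value)
{
    int semicolon = value.LastIndexOf(';');
    if (semicolon < 0) return (value.Trim(), null); // malformed: no date-time

    string tokens = value.Substring(0, semicolon).Trim();
    string date = value.Substring(semicolon + 1).Trim();

    // Drop a trailing comment such as "(GMT)" after a numeric zone.
    int paren = date.IndexOf('(');
    if (paren >= 0) date = date.Substring(0, paren).TrimEnd();
    return (tokens, date);
}

string prefix = "(from ultracomp@localhost) by host.domain.net (8.11.6/8.11.6) " +
                "id p4JHkq318859 for user@otherdomain.net";
var named = SplitReceived(prefix + "; Thu, 19 May 2011 17:46:52 GMT");
var numeric = SplitReceived(prefix + "; Thu, 19 May 2011 17:46:52 +0000 (GMT)");

Console.WriteLine(named.Date);   // Thu, 19 May 2011 17:46:52 GMT
Console.WriteLine(numeric.Date); // Thu, 19 May 2011 17:46:52 +0000
```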

Thanks for a great library!

Cory

Coordinator
Jun 8, 2011 at 1:42 PM

Hey Cory,

thanks for the link, this one was news to me ;)

I will add this as a minor defect to the issue tracker.

Alex