PostFix: character with tilde not sent properly


felipe.dominguez@gmail.co
09-24-2007, 12:32 PM
Hello.

I am using Postfix to send emails and I am having problems with special characters such as “Número”, “Jamón” or “Ibérico”. All characters with a tilde are replaced by a ‘?’.

I get “N?mero”, “Jam?n” or “Ib?rico”.



I use the same code at work with a different email server (I think it is a Microsoft server) and the emails are sent fine; I get the right characters.


I have been looking on the web and it seems to be a problem with the MUA, although I don’t really know what that is.

http://groups.google.com/group/list.postfix.users/browse_thread/thread/0d1791d356d9c57d/47319cb145d2a498


Could anyone give me a hand with this, or just tell me what to look at?

Would I avoid the problem by using Sendmail instead of Postfix?



I have also noticed that the emails I send are treated as spam by default.

Is there any way to avoid this?



Thanks in advance.

Cheers

Felipe

timharig
09-25-2007, 05:52 AM
First, questions should be posted in the forum below named Get Help (http://forums.rimuhosting.com/forums/forumdisplay.php?f=6). This forum is for sharing knowledge.

To my knowledge, non-ASCII characters are not allowed in the local part of e-mail addresses. According to RFC 2822, Internet Message Format (http://www.ietf.org/rfc/rfc2822.txt), e-mail addresses are defined as follows:


addr-spec = local-part "@" domain
local-part = dot-atom / quoted-string / obs-local-part


where dot-atom is defined as:


dot-atom = [CFWS] dot-atom-text [CFWS]
dot-atom-text = 1*atext *("." 1*atext)


which eventually leads to:


atext = ALPHA / DIGIT / ; Any character except controls,
        "!" / "#" /     ;  SP, and specials.
        "$" / "%" /     ;  Used for atoms
        "&" / "'" /
        "*" / "+" /
        "-" / "/" /
        "=" / "?" /
        "^" / "_" /
        "`" / "{" /
        "|" / "}" /
        "~"
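
As a rough illustration of the grammar above, the dot-atom-text rule can be written as a regular expression. This is only a minimal Python sketch, not a full RFC 2822 address parser: it ignores CFWS, quoted-string and obs-local-part, and the helper name is_dot_atom_text is made up for this example.

```python
import re

# atext from RFC 2822: ASCII letters, digits, and the listed punctuation.
_ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# dot-atom-text = 1*atext *("." 1*atext)
_DOT_ATOM_TEXT = re.compile(rf"{_ATEXT}+(?:\.{_ATEXT}+)*")


def is_dot_atom_text(local_part: str) -> bool:
    """Return True if local_part matches dot-atom-text (hypothetical helper)."""
    return _DOT_ATOM_TEXT.fullmatch(local_part) is not None


if __name__ == "__main__":
    for candidate in ("felipe.dominguez", "jamón", "a..b"):
        print(candidate, is_dot_atom_text(candidate))
```

An accented letter such as "ó" is not in atext, so a local part containing it is rejected by this check.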


The only thing that I can see which might allow non-ASCII characters is that it does allow quoted strings, but RFC 2821, Simple Mail Transfer Protocol (http://www.ietf.org/rfc/rfc2821), discourages the quoted-string form and prohibits non-ASCII characters in mailbox names. It states:


While the above definition for Local-part is relatively permissive,
for maximum interoperability, a host that expects to receive mail
SHOULD avoid defining mailboxes where the Local-part requires (or
uses) the Quoted-string form or where the Local-part is case-
sensitive. For any purposes that require generating or comparing
Local-parts (e.g., to specific mailbox names), all quoted forms MUST
be treated as equivalent and the sending system SHOULD transmit the
form that uses the minimum quoting possible.

Systems MUST NOT define mailboxes in such a way as to require the use
in SMTP of non-ASCII characters (octets with the high order bit set
to one) or ASCII "control characters" (decimal value 0-31 and 127).
These characters MUST NOT be used in MAIL or RCPT commands or other
commands that require mailbox names.


I don't know of any other RFC which might permit such characters. That a Microsoft product would allow things which are not provided for by the standards would not surprise me.

It is possible that some MUAs might support some kind of escape codes (such as the &-prefixed entities in HTML or the %20-style escapes in URLs) but I am not aware of any official standard that all clients should support. If that is the case then this problem has nothing to do with any MTA.

If you can figure out how such characters are being sent or encoded (either by looking at the logs of incoming messages that use them or by using a sniffer like Wireshark) then it might be possible to "hack" Postfix into understanding them through creative tuning of the trivial-rewrite daemon, canonical maps, and alias maps. I don't recommend this but it might be possible.

I don't know that Sendmail, or any other MTA, will be of help to you, as this seems to be either a violation of standards or an MUA compatibility problem. Sendmail does have a more sophisticated rule-based system for address transformations, but its rules are tedious to use. Postfix's mapping system was explicitly designed to avoid the problems that they tend to cause.

felipe.dominguez@gmail.co
09-27-2007, 02:11 PM
Hello.

Sorry for misplacing the message, and thanks a lot for your answer.
I will post it again in the Get Help forum.

Just to clarify, the problem with the special characters is not in the email address, but in the text content.


cheers

Felipe

timharig
09-27-2007, 05:22 PM
Sorry about the confusion. The MTA has nothing to do with the content of the message. It just passes on whatever the sending MUA gives it. This is definitely a problem with the receiving MUA or its setup.

The sending MUA must use an extended encoding (e.g., UTF-8, ISO 8859-1, etc.) to encode non-ASCII characters such as characters with tildes. The encoding used will be given in the message headers, or in the headers of the individual MIME parts for multipart messages. They will look something like this:


Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
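
For illustration, here is a minimal sketch of a sender that declares these headers explicitly. It uses Python's standard email package rather than any particular MUA; the function name build_message and the subject line are assumptions made up for this example.

```python
from email.message import EmailMessage


def build_message(body: str) -> EmailMessage:
    """Build a text/plain message that declares UTF-8 explicitly (sketch)."""
    msg = EmailMessage()
    msg["Subject"] = "Test"  # placeholder subject, not from the thread
    # Declare the charset and the transfer encoding instead of relying
    # on whatever the sending environment defaults to.
    msg.set_content(body, charset="utf-8", cte="8bit")
    return msg


if __name__ == "__main__":
    message = build_message("Número, Jamón, Ibérico")
    print(message["Content-Type"])
    print(message["Content-Transfer-Encoding"])
```

A receiver that honours the declared charset can then decode the body back to the original text.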


In your situation the receiving MUA (or the terminal on which it runs) is not set up to display these extended characters, so it is replacing them with question marks in the output. Setting up an MUA to display extended character sets varies with each MUA (and its environment). We would have to know what MUA you are using (and what environment it is running in) before we can give you any advice on how to configure it.

felipe.dominguez@gmail.co
10-01-2007, 09:16 PM
Hello.

I am actually using JavaMail to send the email.

The strange thing is that when I run the application at work it runs fine and all characters are sent and displayed correctly, but when I upload the application to the server it does not work. This is why I thought it could be a problem with Postfix.

If I find a solution I will post it.

cheers


Felipe

timharig
10-02-2007, 05:16 AM
JavaMail is an API used to process mail messages, but that doesn't tell us how the message is being displayed. Are you using a web form from a CGI servlet, a text terminal, javax.swing on an X terminal, or VNC on Windows? This is not a problem of how the mail is being processed (unless the character set is being declared incorrectly in the message headers) -- it is a problem where whatever is being used to display it is not capable of, or not set up for, displaying extended character sets. This could be because your terminal is not set up to handle them or because the required fonts are not available. If it is on a web page, and you are feeding the text directly to the web page, then it could be that the web page is declaring a different character set than the one used for the body of the message (say the web page is declared as ISO 8859 but the e-mail message was sent as UTF-8). The same may be true for the different APIs that you are using to display the body of the message. A text display widget that expects ASCII by default will not work properly if it is fed extended characters without the proper character set being declared. There are just too many possibilities without knowing more about your setup.
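
The charset mismatch described above can be reproduced in a few lines. This is a small Python sketch of the effect only, not of any particular web page or display API; the helper name decode_as is made up for the example.

```python
def decode_as(text: str, sent_as: str, read_as: str) -> str:
    """Encode text with one charset and decode it with another (sketch)."""
    raw = text.encode(sent_as)
    # errors="replace" mimics a viewer that substitutes a marker for
    # bytes it cannot interpret instead of failing outright.
    return raw.decode(read_as, errors="replace")


if __name__ == "__main__":
    # Sent as UTF-8 but displayed as ISO 8859-1.
    print(decode_as("Jamón", "utf-8", "iso-8859-1"))
    # Sent as ISO 8859-1 but displayed as UTF-8.
    print(decode_as("Jamón", "iso-8859-1", "utf-8"))
```

When both sides agree on the charset the text comes back unchanged; when they disagree, the accented character turns into garbage or a replacement marker.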

felipe.dominguez@gmail.co
10-02-2007, 10:18 PM
Hello.

I still have not found the problem, but I know it has nothing to do with Postfix or with sending the email.

What the application does is fetch an HTML page from my own web server (Tomcat) and send it as an email.

The content type of the html page is set to "text/html;charset=ISO-8859-1"

When I get the HTML page from my computer at work or at home (XP OS) I see the right characters, "Jamón". When I look at the HTML using the browser I can see the right characters too, but when I get the page programmatically from the Linux server I get "jam?on".

Sorry about the confusion. The first thing I did was to check the characters on the extracted page. Since it was UTF-8 encoded there were some strange characters, so I changed the encoding from UTF-8 to ISO-8859-1. After that I could see the right characters, both in the browser and when I extracted them programmatically. When I saw the error in the email I thought it could be because of Postfix, since I was convinced that the characters received from the web server were the right ones.

Tomorrow I will try to find out the reason for the different behaviour.

Thanks for your interest and for the support you have given.

Cheers

Felipe

timharig
10-03-2007, 06:00 AM
You're still not giving us what we need to know, so I will give you the following scenario:

Server OS: Linux
Client OS: Windows XP
Terminal and server connection: PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/)
Application: less

In this situation you are viewing your messages using less, or an MUA that calls less to view its messages, from a Windows XP computer through a PuTTY connection with PuTTY's built-in terminal. First, I would recommend that you check your locale settings. Use the env command to see what locales are defined for your user. Look for LANG or any variable names that are prefixed with LC_. LANG or LC_ALL set all of the locale data and LC_CTYPE sets the character classification and conversion for extended character sets. LANG on my system resolves to "en_US". Available locales should be somewhere like /usr/share/locale/. Make sure that it is set to the appropriate value for you.
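
The same check can be scripted. The following Python sketch assumes the usual POSIX precedence (LC_ALL overrides LC_CTYPE, which overrides LANG) and falls back to "C" when none of them is set; the function name effective_ctype is hypothetical.

```python
import os


def effective_ctype(env=None):
    """Return (variable, value) that decides the character-type locale.

    Assumes POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
    An unset or empty variable is skipped.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return name, value
    # Nothing set: assume the plain "C" locale, which is typically ASCII-only.
    return None, "C"


if __name__ == "__main__":
    print(effective_ctype())
```

If this reports "C" or "POSIX" for the account that runs the application, extended characters are unlikely to survive unless the application sets its encoding explicitly.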

Next, check the application or library that you are using to display it. You are using less in my example. Less defaults to the user's locale settings, but that may be overridden with command options, environment variables, etc. See the man page or documentation for the program or library you are using.

If you have properly configured your locale and application, and you still have problems, then the problem may be with your terminal. You are using PuTTY in the example above. The terminal, like the application, usually defaults to the operating system and user defaults but may also be set through other means. Once again, see the documentation for the terminal that you are using.

Terminal multiplexers such as screen may also need to be taken into account if you are using them, and even the Windows operating system settings may influence the behaviour of any terminals running on top of it. Fonts may not be available or properly registered for the terminal that you are using.

Internationalization is a complex topic because it involves every piece of software that may have an influence on the display process. Mis-configurations in any of these places could be causing your problems and they may all be configured differently. It is impossible to give you any more specific information without knowing more about all of these various levels.

felipe.dominguez@gmail.co
10-18-2007, 08:12 PM
Hello.

Sorry I did not reply earlier.

In fact, what you have just suggested was the problem.

The encoding in my account was not set.

The scenario was that a web application running on my account was at some point calling itself to extract some data and send it via email.

When I tested on Windows it did work, but not on my Linux account. The default encoding was different in the two scenarios.
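
For anyone who finds this later, the effect can be imitated in a few lines. This is only a Python sketch of the idea (the real application was Java): it passes text through a given default encoding, replacing anything that encoding cannot represent. The name through_default_encoding is made up, and "ascii" stands in for an account with no locale set.

```python
def through_default_encoding(text: str, default_encoding: str) -> str:
    """Round-trip text through an encoding, replacing what it cannot hold."""
    raw = text.encode(default_encoding, errors="replace")
    return raw.decode(default_encoding)


if __name__ == "__main__":
    for word in ("Número", "Jamón", "Ibérico"):
        # An ASCII-only default loses the accented letters ...
        print(through_default_encoding(word, "ascii"))
        # ... while an ISO-8859-1 default keeps them.
        print(through_default_encoding(word, "iso-8859-1"))
```

Naming the encoding explicitly wherever text is read or written avoids depending on the account's default.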

Thanks a lot for your help.

cheers

Felipe