[Israel.pm] multipart/alternative added

Amit Aronovitch aronovitch at gmail.com
Sun May 30 14:01:01 PDT 2010


On 05/30/2010 08:46 PM, Gaal Yahas wrote:
>
>     > The other is technical. It is simply impossible to get all email
>     > clients to work correctly in bidi languages using only plain text.
>
>     Not impossible. Just "not simple at the moment", as we can see even
>     in Oron's message, which does not mix LTR and RTL text in the same
>     line (in Thunderbird, for example, the semicolons/colons display on
>     the wrong side of the code/Hebrew when you use the keyboard shortcut
>     to switch to RTL/LTR mode, respectively; you can never see them both
>     in the same window without any garbling).
>
> I take this mostly back. I misremembered the spec's treatment of 
> paragraphs: they reset bidi context, which is fine (UAX #9); the 
> problem lies in 5.8 which doesn't make the definition of a paragraph 
> separator bulletproof. I suppose you can start every line with either 
> RLE or LRE and always emit a PDF before linebreaks, to be safe. This 
> is very cumbersome.

In the context I was talking about (a script run automatically when
"rich text" email is submitted as plaintext), this is still OK.
Scripts do not complain about cumbersome keyboard mappings.
The script could also make sure that MUAs (compliant ones, at least)
don't mess up the paragraph structure, by using LS (U+2028) and PS
(U+2029) instead of CR/LF or whatever.
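
A minimal sketch of what such a script might do, in Python. All names
here are hypothetical, and the "first strong character" rule used to pick
RLE or LRE is my assumption, not something the spec or any MUA mandates:

```python
import unicodedata

LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"   # explicit embeddings + pop
LS, PS = "\u2028", "\u2029"                    # line / paragraph separator

def first_strong_is_rtl(line):
    """Guess direction from the first strongly directional character."""
    for ch in line:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return True
        if bidi == "L":
            return False
    return False  # no strong character at all: assume LTR

def wrap_line(line):
    """Open every non-empty line with RLE or LRE and close it with PDF."""
    if not line.strip():
        return line
    opener = RLE if first_strong_is_rtl(line) else LRE
    return opener + line + PDF

def encode_plaintext(text):
    """Blank-line-separated paragraphs are joined with PS, lines with LS."""
    paragraphs = text.split("\n\n")
    return PS.join(
        LS.join(wrap_line(line) for line in para.split("\n"))
        for para in paragraphs
    )
```

Whether receiving MUAs actually honour LS and PS is exactly the open
question, so this would need testing against real viewers.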

>     To really solve the garbling, one has to use unicode control
>     characters.
>     Samples:
>
>     הנה קטעי הקוד של גבור--->:
>
> In my viewer at least, the comment opener here seems wrong (mirrored). 
> Either that, or the closing part of the tag is wrong. I didn't bother 
> inspecting your source (because, here's another problem, bidi marks 
> are invisible and difficult to debug). You're probably aware that in 
> certain cases bidi contexts do not fully reset after PDFs and need a 
> RLM or LRM back in the document directionality for things to work out.

As intended. This was meant to be "example starts here + arrow + colon"
(simple ASCII graphics, not conforming to any kind of SGML-ish syntax).
I also did not bless the indicator line itself with any kind of magic
Unicode (I probably should have, because in the quoted line of your
reply it did get reversed).
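
On the debugging problem: since the marks are invisible, a small filter
that spells them out makes it possible to inspect a message source. This
is only a sketch (the function name and the <NAME> notation are made up
for illustration):

```python
# Bidi-related control characters and their usual abbreviations.
BIDI_NAMES = {
    "\u200e": "LRM", "\u200f": "RLM",
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2028": "LS",  "\u2029": "PS",
}

def reveal_bidi(text):
    """Replace each invisible bidi control with a visible <NAME> tag."""
    return "".join(
        f"<{BIDI_NAMES[ch]}>" if ch in BIDI_NAMES else ch
        for ch in text
    )

print(reveal_bidi("\u202amy $x = 42;\u202c"))  # <LRE>my $x = 42;<PDF>
```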

>     ‫הגדרת משתנה סקלרי:‬
>     ‪my $x = 42;‬
>     ‫הגדרת מערך:‬
>     ‪my @x = qw(4 2);‬
>
> This part is fine.
>
>     ושיהיה קצת יותר מעניין, בשורה אחת--->:
>
> Comment trouble.
>
>     ‫נגדיר משתנה סקלרי ע"י ‪my $x = 42;‬ ואח"כ עוד משהו.‬
>
> Looks good.
>
>     ‪Look mom, no HTML!‬
>
>     Of course, I "cheated" by using characters which are not available in
>     common keyboard layouts.  The point is that one could write simple
>     scripts to do that automatically in the MUA (e.g. as some plugin
>     activated when submitting "rich text" as plaintext).
>     Once such a solution is out there, it should be easier to spread it to
>     other agents (maybe even to gmail).
>
> My point, apart from the obvious fact that directionality marks are 
> hard to author correctly, was that some of their interpretation is 
> underspecified so receiving MUAs may still behave differently.

Directionality should be well specified and consistent as long as the
partition into paragraphs and the setting of each paragraph's
directionality are fixed. Whatever heuristics MUAs apply to guess these,
one can at least avoid the garbling within each line by using explicit
embeddings as you suggested (indeed, this is how I did it).
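
For the mixed-line case, the nesting can be sketched as follows. I am
assuming the script already knows which spans are code (a rich-text
editor would), so the input format and the function name are
hypothetical:

```python
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"

def rtl_line_with_code(parts):
    """Build one RTL line from (text, is_ltr) pairs.

    Each LTR span gets its own LRE...PDF embedding, and the whole line
    is wrapped in RLE...PDF so it does not depend on the paragraph
    direction guessed by the viewer.
    """
    body = "".join(
        LRE + text + PDF if is_ltr else text
        for text, is_ltr in parts
    )
    return RLE + body + PDF

line = rtl_line_with_code([
    ("נגדיר משתנה סקלרי ע\"י ", False),
    ("my $x = 42;", True),
    (" ואח\"כ עוד משהו.", False),
])
```

This keeps the semicolon attached to the code rather than letting it
drift to the other side of the Hebrew text.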

>     > Alignment is the least of your problems.
>
>     But alignment is the only part of the problem that *cannot* be
>     solved in plaintext, simply because plaintext does not provide a way
>     to encode that information (so user agents use their own algorithms
>     to decide, if at all, and you cannot rely on having it displayed the
>     same way everywhere).
>
> This insufficient determination is compounded by heuristic solutions. 
> HTML-capable viewers may try to do the right thing with completely 
> unmarked text, but that would be a guess and will occasionally be 
> wrong (and wrong differently among viewers). It also means that they 
> have to scan the entire document (or a reasonable portion of it at 
> least) to establish that it indeed contains RTL characters but no bidi 
> marks.
>
> Tightening the specs is the right technical solution, but doing that + 
> getting MUAs to comply is difficult.
>
>     > If you mix Hebrew and English in the same paragraph, it is almost
>     > certain that garbling will occur. In prose this is just very
>     > annoying. In technical discussion it can render text completely
>     > unreadable.
>     >
>     > Examples of garbling include reversed parentheses, misplaced
>     > punctuation, reversed number segments. These have potential to do
>     > real damage to coherence of the text. Unicode offers some
>     > technology to help with this, but it is just not sufficient for
>     > email when used in plain text. There are underspecified features
>     > that are interpreted differently by clients, and regardless, these
>     > mechanisms are hard to use, even for a technical user.
>
>     Well, I still have to see if my examples above work or not
>     (Thunderbird/Icedove is known to do some garbling of its own if you
>     choose the wrong setup option).
>     Unicode does have enough support to prevent all the garbling you
>     mention (excluding alignment). The problem is that user agents do
>     not insert the proper Unicode. The community could help by writing
>     plugins, but we are too lazy and prefer to revert to an "evil" but
>     working solution such as HTML (at least until someone else writes
>     the script).
>
> The proper Unicode is not as straightforward to pick as you make it.

Not straightforward to pick manually, maybe. But a script could be
fine-tuned to the point where its output displays correctly in all
relevant viewers. However, I do not see any way to make *alignment* work
consistently with plain text alone.

     AA

p.s. I hope that if nothing else, this conversation provides some answer 
to Mickael's question about the specific problems that HTML emails were 
supposed to solve.
