[Israel.pm] Displaying bidi text in re (e.g. in the editor).

Amit Aronovitch aronovitch at gmail.com
Sat Jan 31 19:08:22 PST 2009


Following a discussion I took part in about standardization of the
display of Hebrew text in structured expressions and source code, I
would be happy to hear some opinions about how we would like regular
expressions containing bidi chars to be displayed (in an "ideal
editor" that is fully syntax aware).

In the examples below, caps represent RTL characters and lowercase LTR chars.

The basic principle that was proposed (for structured expressions) is
that text should be split into "separators" and "tokens" according to
the relevant syntax, the general-purpose Bidi rules be applied within
each token only, and then tokens and separators should be concatenated
left to right always.

Applied to regular expressions, my first thought was that since each
pattern character in an RE is an atom, this effectively means forcing
LTR everywhere, except perhaps inside things like named captures
(?<NAME>...) etc. (1).
However, it was suggested instead that any sequence of pattern
characters (not containing "special" characters) should be treated as
a token (2).
This would make simple searches easier to read.

For example, /SHALOM/ would be displayed as /SHALOM/ by (1),
but as /MOLAHS/ (much more readable if it were actual Hebrew) by (2).
On the other hand, /YADAII?M/ would show as /IIADAY?M/ under (2),
which is very confusing, so I thought the simplification was not worth it.
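
To make the two options concrete, here is a rough Python sketch that
mimics them using the caps-for-RTL convention of this post. It is only
an illustration under stated assumptions: the set of "special"
characters and the function names are my own choices, and reversing
runs of capitals is a crude stand-in for the real Unicode Bidi
algorithm, not an implementation of it.

```python
import re

# Assumption: these regex metacharacters act as "separators".
# The exact set is a guess for illustration, not a definitive list.
SPECIAL = set(r"\.^$*+?()[]{}|")


def bidi_token(token):
    """Crude stand-in for the Bidi rules inside one token:
    reverse each maximal run of capitals (our fake RTL letters)."""
    return re.sub(r"[A-Z]+", lambda m: m.group(0)[::-1], token)


def display_scheme1(pattern):
    """Option (1): every pattern character is its own token,
    so the display is simply the logical order, left to right."""
    return pattern


def display_scheme2(pattern):
    """Option (2): runs of non-special characters form one token;
    tokens and separators are then concatenated left to right."""
    out = []
    token = []
    for ch in pattern:
        if ch in SPECIAL:
            # Flush the pending token, then emit the separator as-is.
            out.append(bidi_token("".join(token)))
            token = []
            out.append(ch)
        else:
            token.append(ch)
    out.append(bidi_token("".join(token)))
    return "".join(out)


if __name__ == "__main__":
    for pat in ("SHALOM", "YADAII?M"):
        print("/%s/  (1): /%s/  (2): /%s/"
              % (pat, display_scheme1(pat), display_scheme2(pat)))
```

With these definitions, display_scheme2("YADAII?M") splits the pattern
into the token YADAII, the separator ? and the token M, which is where
the confusing /IIADAY?M/ display comes from.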

However, I am used to languages where simple searches are commonly
done by other means, whereas in Perl using RE for simple text search
might be more common because of the specialized syntax. What do you

