[Israel.pm] about utf8

Yuval Kogman nothingmuch at woobling.org
Sun Jan 18 12:52:27 PST 2009

2009/1/18 Shmuel Fomberg <semuelf at 012.net.il>:
> I'm planning to add encoding support to Data::ParseBinary. Because the
> text is part of a binary stream, I can't rely on Perl to take the
> correct number of bytes for me. Only after the text is separated from
> the binary stream can I give it to Perl for decoding.
> So I'm trying to define a character, and debating with myself whether a
> surrogate pair counts as one character or two, and how to detect such a
> pair, whether it's little or big endian, in the most elegant way.

you can use unpack("W$x") to extract $x utf8 chars.

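w is a BER compressed integer (every byte except the last has the high
bit set). W is an unsigned character value that may exceed 255, and U is
the template for a Unicode code point, stored as UTF-8 in byte mode.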
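
A rough sketch of the byte patterns involved, plus one way to find how
many bytes the first N UTF-8 characters occupy before handing them to
Encode. The helper names high_bits and utf8_prefix_length are made up
for this example (they are not part of Data::ParseBinary or of pack),
the sample data is invented, and the lead-byte scan does no validation
beyond rejecting impossible lead bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# BER compressed integer: base-128 digits, most significant first.
my $ber  = pack 'w', 300;                   # bytes 0x82 0x2C
# UTF-8 encoding of U+20AC (euro sign).
my $euro = encode('UTF-8', "\x{20AC}");     # bytes 0xE2 0x82 0xAC

# Hypothetical helper: one digit per byte, 1 if the high bit is set.
sub high_bits {
    return join '', map { ($_ & 0x80) ? 1 : 0 } unpack 'C*', $_[0];
}

print high_bits($ber),  "\n";   # 10  (only the last byte has it clear)
print high_bits($euro), "\n";   # 111 (every byte of a multi-byte char)

# Hypothetical helper: byte length of the first $count UTF-8 characters
# in $buf, judged by lead bytes only.
sub utf8_prefix_length {
    my ($buf, $count) = @_;
    my $pos = 0;
    for (1 .. $count) {
        die "buffer too short\n" if $pos >= length $buf;
        my $lead = ord substr($buf, $pos, 1);
        my $len;
        if    ($lead < 0x80)           { $len = 1 }
        elsif (($lead & 0xE0) == 0xC0) { $len = 2 }
        elsif (($lead & 0xF0) == 0xE0) { $len = 3 }
        elsif (($lead & 0xF8) == 0xF0) { $len = 4 }
        else  { die "invalid UTF-8 lead byte\n" }
        $pos += $len;
    }
    die "truncated character\n" if $pos > length $buf;
    return $pos;
}

# "a", U+00E9, U+20AC as UTF-8, followed by two unrelated binary bytes.
my $stream = "a\xC3\xA9\xE2\x82\xAC\x00\x01";
my $n      = utf8_prefix_length($stream, 3);
my $raw    = substr($stream, 0, $n);
my $text   = decode('UTF-8', $raw);

print "$n bytes, ", length($text), " characters\n";   # 6 bytes, 3 characters
```

The scan only separates the text from the rest of the stream; the
actual decoding (and any stricter validation) is left to Encode.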

