[Israel.pm] Detecting html form charset

Shmuel Fomberg semuelf at 012.net.il
Fri Apr 4 05:21:48 PDT 2008

ik wrote:

> I have an HTML form, and while my page is set to UTF-8, I had a
> problem where someone submitted non-UTF-8 text, making it lose the data
> completely.
> Is there a way to know which charset each form field is in?

I didn't understand whether the non-UTF-8 text you are talking about was 
sent by the user, or whether you have strings to display whose encoding 
you do not know.

If you have strings that you want to display but don't know their 
encoding, good luck with that.

If you are talking about users submitting forms in other encodings, I 
took an idea from this page:
that is, to add a hidden input with special characters and see what the 
user submits. I wrote the following code snippet (untested):

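use Encode qw(decode);

# Returns a code ref that converts a submitted field value, chosen by
# looking at the bytes of the hidden "charset_check" field.
sub check_encoding {
     my $self = shift;
     my $unicode_check = $self->query->param("charset_check");
     $unicode_check = '' unless defined $unicode_check;
     my $check_hexed = unpack "H*", $unicode_check;
     if ($check_hexed eq 'c3a4e284a2c2ae') {
         # The check field came back as UTF-8 bytes, so change nothing.
         return sub { return $_[0] };
     } elsif ($check_hexed eq 'e499ae') {
         # e4 99 ae is the check field as Windows-1252 bytes.
         return sub { return decode("cp1252", $_[0]) };
         #$question = "Windows-1252 " . $question;
     } else {
         #$question = "unknown($check_hexed) " . $question;
         warn "Do not know this encoding: $check_hexed";
         return sub { return $_[0] };
     }
}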

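The two hex strings compared in check_encoding look like the characters 
ä, ™ and ® encoded as UTF-8 and as Windows-1252. Here is a small sketch 
of where they come from, assuming the hidden field holds exactly those 
three characters. The helper names signature and detect_encoding are made 
up for illustration, and like the snippet above this is only a sketch:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumed probe string for the hidden "charset_check" field:
# a-umlaut (U+00E4), trade mark sign (U+2122), registered sign (U+00AE).
my $probe = "\x{e4}\x{2122}\x{ae}";

# Hex dump of the probe as it would be submitted in a given encoding.
sub signature {
    my ($encoding) = @_;
    return unpack "H*", encode($encoding, $probe);
}

# Hypothetical helper: map the raw submitted probe bytes to an encoding
# name, or return undef when the bytes match none of the candidates.
sub detect_encoding {
    my ($raw_bytes) = @_;
    for my $encoding (qw(UTF-8 cp1252)) {
        return $encoding if $raw_bytes eq encode($encoding, $probe);
    }
    return;
}

# UTF-8 gives c3a4e284a2c2ae, cp1252 gives e499ae.
print "$_: ", signature($_), "\n" for qw(UTF-8 cp1252);

# Decode another field from the same form using the detected encoding.
my $encoding = detect_encoding("\xe4\x99\xae");
my $text = decode($encoding, "caf\xe9");   # "caf" followed by U+00E9
```

Adding another candidate encoding then only means adding its name to the 
list in detect_encoding, as long as it can represent all three probe 
characters.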

