How to encode Cyrillic characters for a URL and then decode them?

情深已故 2021-01-18 17:17

I have a form on one page:

One of the input fi

3 Answers
  • 2021-01-18 17:42

    Try this in your script (index.cgi):

    use Encode;
    

    Then...

    $search_string = decode_utf8( $search_string );
    

    Another idea (if you want to create a UTF-8-friendly hash of your CGI input):

    require Encode;
    require CGI;
    my $query = CGI->new;
    my $form_input = {};
    foreach my $name ( $query->param ) {
      # list-context param(); newer CGI.pm versions also offer multi_param()
      my @val = $query->param( $name );
      foreach ( @val ) {
        $_ = Encode::decode_utf8( $_ );   # octets -> character string
      }
      $name = Encode::decode_utf8( $name );
      if ( scalar @val == 1 ) {
        $form_input->{$name} = $val[0];   # single value: save as a scalar
      } else {
        $form_input->{$name} = \@val;     # several values: save as an array ref
      }
    }
    

    Taken from: http://ahinea.com/en/tech/perl-unicode-struggle.html

  • 2021-01-18 17:55

    A solution that preserves the + and any other character in the original string:

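    binmode STDOUT, ':encoding(UTF-8)';  # avoids "Wide character in print" warnings
    my $s = '%41F%2F%424+%41F%41E%414%416%410%420%41A%410+%418%417+%421%412%418%41D';
    $s =~ s/%([[:xdigit:]]+)/chr(hex($1))/eg;  # each %HEX run becomes one character
    print $s;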
    

    Result:

    П/Ф+ПОДЖАРКА+ИЗ+СВИН
    
  • 2021-01-18 18:02

    Correct solution, including spaces:

    use open ':std', ':encoding(UTF-8)';
    use Encode;
    
    my $escaped = '%41F%2F%424+%41F%41E%414%416%410%420%41A%410+%418%417+%421%412%418%41D';
    (my $unescaped = $escaped) =~ s/\+/ /g;
    $unescaped =~ s/%([[:xdigit:]]+)/chr hex $1/eg;
    print $unescaped;
    # П/Ф ПОДЖАРКА ИЗ СВИН
    

    Credit goes to Renaud Bompuis, who was the first to recognise that these are Unicode code points prefixed with %.

    I wish to add that the encoding scheme from the question is very unusual; I haven't seen it before. Normally one would expect the character string П/Ф ПОДЖАРКА ИЗ СВИН to be encoded as %D0%9F%2F%D0%A4+%D0%9F%D0%9E%D0%94%D0%96%D0%90%D0%A0%D0%9A%D0%90+%D0%98%D0%97+%D0%A1%D0%92%D0%98%D0%9D; that is to say, first the characters are encoded into UTF-8, then the octets are percent-escaped. This standard scheme works with the answer from Dr.Kameleon.
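
For comparison, here is a minimal sketch of the usual scheme (UTF-8 first, then percent-escaping of the octets) using only core modules. The helper names form_escape and form_unescape are hypothetical and do not come from any of the answers; the set of characters left unescaped is an assumption based on the RFC 3986 unreserved set, with spaces turned into + as in HTML form submissions.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Hypothetical helper: encode the characters as UTF-8, then percent-escape
# every octet that is not an unreserved character; spaces become "+".
sub form_escape {
    my ($chars) = @_;
    my $octets = encode_utf8($chars);
    $octets =~ s/([^A-Za-z0-9\-._~ ])/sprintf('%%%02X', ord $1)/ge;
    $octets =~ tr/ /+/;
    return $octets;
}

# Hypothetical helper: the reverse; "+" back to a space, each %XX back to
# one octet, then decode the octets as UTF-8 into a character string.
sub form_unescape {
    my ($escaped) = @_;
    (my $octets = $escaped) =~ tr/+/ /;
    $octets =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return decode_utf8($octets);
}

# The sample string is built from code points so this file stays ASCII.
my $text = "\x{41F}/\x{424} "
         . "\x{41F}\x{41E}\x{414}\x{416}\x{410}\x{420}\x{41A}\x{410} "
         . "\x{418}\x{417} \x{421}\x{412}\x{418}\x{41D}";

my $escaped = form_escape($text);
print "$escaped\n";

my $decoded = form_unescape($escaped);
print $decoded eq $text ? "round trip ok\n" : "round trip failed\n";
```

Because every escape here is exactly two hex digits, the decoder does not need to guess where an escape ends, unlike the variable-length %41F form from the question. If installing a CPAN module is an option, URI::Escape offers uri_escape_utf8 and uri_unescape for the same job, though it escapes a space as %20 rather than +.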
