UTF-8 encoded j_security_check username incorrectly decoded as Latin-1 in Tomcat realm

感情迁移 submitted on 2019-11-28 11:37:39

Question


I'm investigating an issue where a username containing a Latin-1 character is entered in a login form. The username contains the character á. On the server side I have:

    public class MyRealm extends RealmBase implements Realm {
        public Principal authenticate(String username, String password) {
            // ... actual authentication implemented here
        }
    }

If I print the bytes with username.getBytes(), the character á comes out as C3 83 C2 A1. In UTF-8, á should be C3 A1. If I treat those two bytes as separate characters and encode them as UTF-8 again, I get C3 83 C2 A1, which is exactly what my software prints.

I checked in a network capture that the username is sent correctly as C3 A1. The source of the login page form is:

        <form name="loginForm" action="j_security_check" method="post" enctype="application/x-www-form-urlencoded">
        <table>
            <tr>
                <td colspan="2" align="right">Secure connection:
                    <input type="checkbox" name="checkbox" class="style5" onclick="javascript:httpHttps();"></td>
            </tr>
            <tr>
                <td class="style5">Login:</td>
                <td><input type="text" name="j_username" autocomplete="off" style="width:150px" /></td>
            </tr>

So I think nothing is wrong on the client side (no double UTF-8 conversion happens there). If I reverse the extra UTF-8 encoding of the username in the authenticate() function, authentication works fine, but I'm reluctant to apply this as the solution to my problem.
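For reference, the "decode back" workaround mentioned above could look roughly like the sketch below. The class name UsernameRepair and the method repair are hypothetical and not part of the original code. The sketch assumes the string was mis-decoded as ISO-8859-1 exactly once.

```java
import java.nio.charset.StandardCharsets;

public class UsernameRepair {
    // Hypothetical helper: recovers the original wire bytes by encoding the
    // mis-decoded string as ISO-8859-1, then decodes those bytes as UTF-8.
    static String repair(String misdecoded) {
        byte[] wireBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wireBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u00C3\u00A1" is the two-character string that a Latin-1 decode
        // produces from the UTF-8 bytes C3 A1.
        String repaired = repair("\u00C3\u00A1");
        System.out.println(repaired.equals("\u00E1")); // true: back to a single á
    }
}
```

This only helps when the input really was mis-decoded. A username that arrives correctly decoded would be damaged by the same call, which is a good reason to fix the decoding at its source instead.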

Where should I look for the place where the username gets encoded this way before it reaches the Realm's authenticate(String username, String password) function? The server runs on Linux (Red Hat) with httpd-2.2.15 and tomcat6-6.0.24.


Answer 1:


In your example, the form sends the UTF-8 bytes for 'á' to Tomcat using percent-encoding (so over the wire it is %C3%A1). However, Tomcat decodes it as Latin-1 (ISO-8859-1), which is its default encoding for POST data.

So Tomcat stores C3 A1 internally as the two characters 'Ã¡', since C3 is 'Ã' and A1 is '¡' in Latin-1.

When you call username.getBytes(), it encodes the string with the platform default charset, which is evidently UTF-8 on your server. Each of the two characters 'Ã¡' is therefore encoded separately in UTF-8, which gives C3 83 and C2 A1.
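The chain of steps above can be reproduced outside Tomcat. The following is a minimal sketch, not Tomcat's actual code path: the class MojibakeDemo and its helper methods are hypothetical names, and the Latin-1 decode simply simulates what the container does to the POST parameter.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Formats bytes as space-separated upper-case hex, e.g. "C3 A1".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Simulates the container decoding the wire bytes as ISO-8859-1.
    static String decodeAsLatin1(byte[] wireBytes) {
        return new String(wireBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // What the browser sends for 'á' (U+00E1) once percent-decoded.
        byte[] wire = "\u00E1".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(wire)); // C3 A1

        // Each byte becomes one character: 'Ã' (U+00C3) and '¡' (U+00A1).
        String misdecoded = decodeAsLatin1(wire);
        System.out.println(misdecoded.length()); // 2

        // Encoding those two characters as UTF-8 gives the bytes from the question.
        System.out.println(hex(misdecoded.getBytes(StandardCharsets.UTF_8))); // C3 83 C2 A1
    }
}
```

The sketch passes the charset explicitly to getBytes so the result does not depend on the platform default, unlike the no-argument username.getBytes() in the question.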

The Tomcat FAQ describes this in detail and proposes a solution: http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q3

Change the Valve for the FormAuthenticator in server.xml to specify characterEncoding="UTF-8":

    <Context path="/YourSecureApp">
        <Valve
            className="org.apache.catalina.authenticator.FormAuthenticator"
            disableProxyCaching="false"
            characterEncoding="UTF-8" />
    </Context>


Source: https://stackoverflow.com/questions/27484135/utf-8-encoded-j-security-check-username-incorrectly-decoded-as-latin-1-in-tomcat
