Java Reading Undecoded URL from Servlet


误落风尘 2021-02-10 02:32

Let's presume that I have a string like '=&?/;#+%' as part of my URL, for example:

example.com/servletPath/someOtherPath/myString/something.ht


      
      
        
5 Answers

执念已碎 (original poster) 2021-02-10 02:51

Update: this answer originally stated, wrongly, that '/' and '%2F' in a path should always be treated the same. They are in fact different, because a path is a list of '/'-separated segments.

The original, now retracted, text read: You should not have to distinguish between an encoded and an unencoded character in the path part of the URL. There is no character inside the path that can have a special meaning in a URL. E.g. '%2F' must be interpreted the same as '/', and a browser accessing such a URL is free to replace one with the other as it sees fit. Treating them differently breaks the standard for how URLs are encoded.

In the complete URL, you must distinguish between escaped and unescaped characters for several reasons, including:


To find where the path ends: a '?' encoded as '%3F' inside the path must not be taken as the start of the query string.
Inside the query string: part of a parameter value may contain an encoded '&' or '='.
Inside the path: a '/' separates two segments, while '%2F' can appear within a single segment.
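
The three cases above can be seen with the standard java.net.URI class, which exposes both the raw and the decoded form of each URL part. This is only a sketch: the URL and the class name RawVersusDecoded are made up for illustration and do not come from the question.

```java
import java.net.URI;

final class RawVersusDecoded {
    // Hypothetical URL, made up for this illustration.
    static final String SAMPLE =
        "http://example.com/servletPath/a%3Fb/c%2Fd?name=x%26y%3Dz&other=1";

    public static void main(String[] args) {
        URI uri = URI.create(SAMPLE);
        // Raw forms keep the percent-escapes, so real delimiters stay unambiguous.
        System.out.println("raw path:      " + uri.getRawPath());
        System.out.println("raw query:     " + uri.getRawQuery());
        // Decoded forms: escaped and literal delimiters can no longer be told apart.
        System.out.println("decoded path:  " + uri.getPath());
        System.out.println("decoded query: " + uri.getQuery());
    }
}
```

The decoded path comes out as /servletPath/a?b/c/d and the decoded query as name=x&y=z&other=1, so the '?', '/', '&' and '=' that were data now look like delimiters. Only the raw forms still show which characters were escaped.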


Java deals fine with the first two cases:


getPathInfo() returns only the extra path information after the servlet path, already decoded
getParameter(String) gives access to the individual, already decoded, parameters of the query part


It doesn't deal so well with the third case. If you want to distinguish between a '/' that separates two path segments and a '/' inside a path segment ('%2F'), then you cannot consistently represent the path as one decoded string. You can represent it either as one encoded string (e.g. "foo/bar%2Fbaz") or as a list of decoded segments (e.g. "foo", "bar/baz").
But because the getPathInfo() API promises exactly that (one decoded string), it has no choice but to treat '/' and '%2F' as the same.

For typical web applications, this is just fine. In the rare case where you really need to tell them apart, you can parse the URL yourself, getting the raw version from getRequestURI(). If that method returns the URL already decoded, as you claim, then there is a bug in the servlet implementation you are using.
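
A minimal sketch of such do-it-yourself parsing follows. It assumes the input is the raw path as getRequestURI() would return it (no query string); the servlet call itself is left out so that the sketch runs without a container. The class and method names are hypothetical, and path parameters such as ';jsessionid=...' are not handled.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

final class PathSegments {
    // Split on literal '/' first, then decode each segment on its own,
    // so an escaped '%2F' stays inside its segment.
    static List<String> decodeSegments(String rawPath) {
        List<String> segments = new ArrayList<>();
        for (String raw : rawPath.split("/", -1)) {
            if (raw.isEmpty()) {
                continue; // skip the empty pieces around leading, trailing or doubled slashes
            }
            segments.add(decodeSegment(raw));
        }
        return segments;
    }

    static String decodeSegment(String raw) {
        try {
            // URLDecoder follows form encoding and would turn '+' into a space,
            // so a literal '+' is protected before decoding.
            return URLDecoder.decode(raw.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeSegments("/foo/bar%2Fbaz")); // [foo, bar/baz]
        System.out.println(decodeSegments("/foo/bar/baz"));   // [foo, bar, baz]
    }
}
```

Inside a servlet, the argument would be request.getRequestURI(), possibly with the context and servlet paths stripped first. Returning a list of decoded segments rather than one string is what keeps "bar/baz" distinct from the two segments "bar" and "baz".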
    
             
                                                        
            
            
              
                