HTTP Response filter can't decode the response bytes the second time

谁说胖子不能爱 submitted on 2019-12-11 05:15:27

Question


I developed an IIS 7 HttpModule. My goal is to check the response content for a particular tag. If the tag is found, something gets logged.

To achieve this, I wrote a custom ASP.NET response filter. The filter extends the .NET Stream class.

The filter is registered in the OnPreRequestHandlerExecute(Object source, EventArgs e) event handler.

The HTTP module is registered correctly and the filter works. The issue is that when I refresh the page, the Write(byte[] buffer, int offset, int count) method gets called as expected, but the bytes decode to gobbledygook.

I'm puzzled as to why the response bytes are decoded properly the first time but not after a second request (i.e. a page refresh). Below is the code where the filter is set, followed by the filter's Write method. Any help would be appreciated, since I have already spent 3 days debugging and researching on Google, and still no joy.

public void OnPreRequestHandlerExecute(Object source, EventArgs e)
{

    HttpResponse response = HttpContext.Current.Response;
    if (response.ContentType == "text/html")
    {
        response.ContentEncoding = Encoding.UTF8; //forcing encoding UTF8
        response.Charset = "charset=utf-8";
        Encoding encoding = response.ContentEncoding;
        string encodingName = encoding.EncodingName;
        response.Filter = new MyFilter(response.Filter, response.ContentEncoding);
    }
}

    public override void Write(byte[] buffer, int offset, int count)
    {
        string strBuffer = string.Empty;

        try
        {
            strBuffer = Encoding.UTF8.GetString(buffer);
        }
        catch (EncoderFallbackException ex)
        {
            log(ex.Message);
        }

        // The buffer doesn't contain the HTML end tag, so we keep storing the
        // incoming chunk of data.
        if (!strBuffer.Contains("</html>"))
        {
            log(strBuffer.ToString());
            _responseHtml.Append(strBuffer);
        }
        // The buffer contains the HTML end tag; we wrap it up now.
        else
        {
            _responseHtml.Append(strBuffer); // append last chunk of data
            string finalHtml = _responseHtml.ToString();

            byte[] bytesBuffer = Encoding.UTF8.GetBytes(finalHtml);
            outputStream.Write(bytesBuffer, 0, bytesBuffer.Length);
        }
    }

}

This is what I get after decoding the response bytes the second time the HTML page is requested (i.e. after a browser refresh):

?\b\0\0\0\0\0\0?yw??/????Og??V.\ak?t:JhY??xP,u?I?Y? \"?\0???w?|?W???\0R?M?Y??I7E{?]??_}???z??8K??!?5O?8??????k?^?~k\?u????f?lE?????s=i??gqY%??O????<9x???BKuZg?a???4?Fq???KJ?t??8??????????$e\?E?,?

UPDATE.

First-timer here, so I am not sure how to update this. I am adding what I have done to narrow down and fix the issue.

First of all, still no joy. :-(

This is what I did:

  1. Since the Write method can be called more than once by ASP.NET, I store the bytes in a collection, adding to it every time ASP.NET calls Write:

    public override void Write(byte[] buffer, int offset, int count)
    {
        for (int i = 0; i < count; i++)
        {
            bytesList.Add(buffer[i]);
        }
        log("Write was called " + "number of bytes: " + bytesList.Count + " - " + count);
    }

  2. In the Flush method, I call a method that does some work on all the bytes collected:

    public override void Flush()
    {
        byte[] bytesContent = ProcessResponseContent(bytesList);
        outputStream.Write(bytesContent, 0, bytesContent.Length);
        outputStream.Flush();
    }


    private byte[] ProcessResponseContent(List<byte> bytesList)
    {
        byte[] bytesArray = bytesList.ToArray();
        string html = string.Empty;
        byte[] encodedBytes = null;

        try
        {
            FilterEncoder encoder = new FilterEncoder();
            html = encoder.DecodeBytes(bytesArray.Length, bytesArray);
            encodedBytes = encoder.EncodeString(html);
            log("after encoding - encodedBytes" + encodedBytes.Length);
            log("after encoding - bytesArray" + bytesArray.Length);
        }
        catch (Exception ex)
        {
            log("exception ocurred " + ex.Message);

        .... .....
    }

ProcessResponseContent is a dumb method. It just converts the list of bytes into a byte array, and this array gets decoded into a string. There shouldn't be any issue now, because bytesList (a List<byte>) holds all the bytes sent in the response.

The byte array is returned untouched, as the purpose of the code is only to log the decoded string to a file.

        log("after decoding  " + html);

Since I created a UTF8Encoding, I am catching an exception, which gets logged to a file.

The first time the HTML page is retrieved, the content gets logged to the file.

When I refresh the page (Ctrl + F5), an exception gets logged:

"exception ocurred Unable to translate bytes [8B] at index 0 from specified code page to Unicode"

Please bear in mind that my HTML page content is very small. All the response content gets processed in one chunk.

The first time the page is visited, the number of bytes received is 2805. This is right before the bytes are decoded into a string.

The second time the page is requested (Ctrl + F5), the number of bytes received, before they are even decoded, is 1436.

Why the response has fewer bytes, I am not sure. Is this affecting the decoding operation? Probably.

I hope this all makes sense. Please let me know if something is not clear. I have been looking at this code for a long time.
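One hypothesis worth ruling out, which nothing in this thread confirms: the 8B byte in the exception and the \b\0\0\0\0\0\0 run in the dump above resemble the start of a gzip header (1F 8B 08 ...), and a compressed body would also be smaller than the plain HTML. The sketch below only checks for the gzip magic bytes defined in RFC 1952, so the filter could log whether the bytes it receives look compressed. The class and method names are made up for illustration.

```csharp
using System;

// Hypothetical diagnostic helper; not part of the original filter.
public static class CompressionSniffer
{
    // RFC 1952: every gzip member starts with the two bytes 0x1F 0x8B.
    public static bool LooksLikeGzip(byte[] data, int offset, int count)
    {
        return data != null
            && offset >= 0
            && count >= 2
            && offset + count <= data.Length
            && data[offset] == 0x1F
            && data[offset + 1] == 0x8B;
    }
}
```

Calling CompressionSniffer.LooksLikeGzip(buffer, offset, count) at the top of Write and logging the result would show whether the refreshed response is arriving as text or as compressed data.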

Thanks,


Answer 1:


It's hard to tell whether this is all of the problem, but you're ignoring the offset and count parameters in Write, instead assuming that the whole of the buffer is valid:

strBuffer = Encoding.UTF8.GetString(buffer);
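As a minimal sketch of the difference, the helper below decodes only the window that the caller declared valid, using the three-argument overload of Encoding.GetString. The class and method names are illustrative, not taken from the question.

```csharp
using System;
using System.Text;

// Hypothetical helper showing how to honour offset and count.
public static class BufferSliceDemo
{
    // Decodes only buffer[offset .. offset + count), ignoring any stale
    // bytes that sit before or after that window in a reused buffer.
    public static string DecodeSlice(byte[] buffer, int offset, int count)
    {
        return Encoding.UTF8.GetString(buffer, offset, count);
    }
}
```

For a buffer holding the bytes of "XXhelloYYYY", DecodeSlice(buffer, 2, 5) yields "hello", whereas Encoding.UTF8.GetString(buffer) would also return the surrounding bytes that do not belong to the current call.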

You're also assuming that this will be a complete set of characters - it may contain (say) just two bytes out of a three-byte character. You need to make your stream stateful, with a Decoder created from Encoding.UTF8 (via GetDecoder()) to maintain the state of partially-read characters between calls.
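A rough sketch of such stateful decoding, assuming the filter keeps one object per response: a System.Text.Decoder obtained from Encoding.UTF8.GetDecoder() remembers the trailing bytes of an incomplete character and completes it on the next call. The Utf8ChunkDecoder class is a hypothetical helper, not code from the question.

```csharp
using System;
using System.Text;

// Hypothetical helper: accumulates decoded text across Write-style calls.
public sealed class Utf8ChunkDecoder
{
    // The Decoder keeps the bytes of a partially received character
    // between calls, which a plain Encoding.GetString call cannot do.
    private readonly Decoder _decoder = Encoding.UTF8.GetDecoder();
    private readonly StringBuilder _text = new StringBuilder();

    public void Append(byte[] buffer, int offset, int count)
    {
        // GetCharCount takes the decoder's pending bytes into account.
        char[] chars = new char[_decoder.GetCharCount(buffer, offset, count)];
        int written = _decoder.GetChars(buffer, offset, count, chars, 0);
        _text.Append(chars, 0, written);
    }

    public string Text
    {
        get { return _text.ToString(); }
    }
}
```

If the five UTF-8 bytes of "a\u20ACb" arrive as two bytes followed by three, the first Append yields only "a" and the second completes the euro sign, so Text ends up as the original string.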

Also note that you're assuming you'll get the whole of </html> in one call - whereas you could get </ in one call, and html> in the next. It's possible that ASP.NET really only calls you once, at the very end, but you probably shouldn't assume that's the case.
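One way to cope with a tag that straddles two calls, sketched under the assumption that the text has already been decoded correctly: keep the last few characters of each chunk and prepend them to the next one before searching. SplitMarkerScanner is a hypothetical name, and the case-insensitive comparison is a choice made for this example.

```csharp
using System;

// Hypothetical helper: reports whether a marker has appeared in text fed in pieces.
public sealed class SplitMarkerScanner
{
    private readonly string _marker;
    private string _tail = string.Empty;

    public SplitMarkerScanner(string marker)
    {
        if (string.IsNullOrEmpty(marker))
        {
            throw new ArgumentException("marker must be non-empty", "marker");
        }
        _marker = marker;
    }

    public bool Found { get; private set; }

    public void Feed(string chunk)
    {
        if (Found || chunk == null)
        {
            return;
        }

        // Search the carried-over tail together with the new chunk.
        string window = _tail + chunk;
        if (window.IndexOf(_marker, StringComparison.OrdinalIgnoreCase) >= 0)
        {
            Found = true;
            return;
        }

        // Keep only as many characters as could start a marker that the
        // next chunk completes (marker length minus one).
        int keep = Math.Min(_marker.Length - 1, window.Length);
        _tail = window.Substring(window.Length - keep);
    }
}
```

With a scanner created for "</html>", feeding "<body></" and then "html>" sets Found to true even though neither chunk contains the whole tag.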



Source: https://stackoverflow.com/questions/10860732/http-response-filter-cant-decode-the-response-bytes-the-second-time
