Why do URI-encoded ('#') anchors cause a 404, and how can I deal with it in JS?


prettyPhoto utilizes hashtags, but if they get encoded (to %23), most browsers will bring up a 404 error. This has been discussed before:

You get a 40

2 Answers

予麋鹿

2021-01-12 05:58
To answer #1)

Once it's encoded, it becomes part of the URL path, because it's no longer a delimiter that the browser, server, etc. know how to parse out.

What I mean is that "?" plays a significant role in URLs -- the server knows to separate what's before from what's after. The browser doesn't need to care about what is or isn't dynamic in the URI - it's all significant (though JavaScript separates the values in the location object).

The browser won't send "#......" to the server, because the hash marks the start of a fragment identifier, which the browser handles itself.

However, if you escape that hash in JavaScript, the browser won't hesitate to send that escaped string to the server as a literal value.

Why wouldn't it? If it refused, you'd be screwed any time a request legitimately needed a hash character in a value (you make a POST request to a Facebook wall, and you're submitting a phone number). Or you're doing a GET-based search of some number on 411.com or whatever, and they haven't really thought their application through.

The problem is that the server isn't going to understand that the escaped value is to be held separately from the url, if it's occurring in the actual path.

It has to accept escaped characters, otherwise spaces (%20) and other everyday characters, which are perfectly valid in filenames/paths/queries/values, would pose problems.

So if you're looking for:
```
//mysite.gov.on.ca/path/to/file.extension%23action%3Dfullscreen
```
verily, you shall surely 404.
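As a rough illustration of how the hash usually ends up escaped in the first place, here is a minimal sketch comparing the two built-in encoding functions. The URL is the example path from above with the hash restored; whether prettyPhoto or the calling code actually goes through `encodeURIComponent` is an assumption, not something confirmed in the question.

```javascript
// Hypothetical URL: the example path above, with a real "#" fragment.
const url = "//mysite.gov.on.ca/path/to/file.extension#action=fullscreen";

// encodeURI is meant for whole URLs: it leaves delimiters such as "#" and "=" alone.
const wholeUrl = encodeURI(url);

// encodeURIComponent is meant for a single value: it escapes "#" and "=" too.
const asComponent = encodeURIComponent("#action=fullscreen");

console.log(wholeUrl);    // unchanged, the fragment is still a fragment
console.log(asComponent); // %23action%3Dfullscreen
```

If a whole URL gets passed through `encodeURIComponent` (or its fragment is encoded and then glued back on without the "#"), the result is exactly the kind of path that 404s.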

There are a few things that you could do, I'm certain. The first would be to write a regex, in Apache or whatever you're serving from, that matches any URL up to the first "%23", assuming that there is no "?" before it.
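The matching logic could look something like the sketch below. It is written in JavaScript rather than as Apache configuration, purely to show the pattern; the function name `restoreEncodedHash` is made up for illustration, and decoding the remainder of the URL is an assumption about what you'd want.

```javascript
// Matches everything up to the first "%23", provided no "?" comes before it.
const ENCODED_HASH = /^([^?]*?)%23(.*)$/;

// Hypothetical helper: turns the first encoded hash back into a real fragment.
function restoreEncodedHash(url) {
  const match = ENCODED_HASH.exec(url);
  if (!match) return url; // no "%23" in the path portion, leave untouched

  const [, base, fragment] = match;
  let decoded;
  try {
    decoded = decodeURIComponent(fragment);
  } catch {
    decoded = fragment; // malformed escape sequence, keep it as-is
  }
  return `${base}#${decoded}`;
}

console.log(restoreEncodedHash("/path/to/file.extension%23action%3Dfullscreen"));
console.log(restoreEncodedHash("/search?q=%23tag")); // "%23" is in the query, so unchanged
```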

Less soul-rending implementations might involve figuring out if there's a way to escape the "#" that is plug-in friendly.

Google, for instance, uses a "hash-bang" strategy ("#!") where it asks that URLs be submitted that way, to know whether or not to encode.

Another option might be to check for a "#" character using url.indexOf("#"), split the URL at the hash, and submit only the valid portion.
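A minimal sketch of that split, assuming the URL is available as a plain string. The function name `splitAtHash` and the sample URL are hypothetical.

```javascript
// Hypothetical helper: separates a URL into the part the server should see
// and the fragment that should stay in the browser.
function splitAtHash(url) {
  const index = url.indexOf("#");
  if (index === -1) {
    return { base: url, fragment: "" };
  }
  return { base: url.slice(0, index), fragment: url.slice(index + 1) };
}

const parts = splitAtHash("/gallery.html#prettyPhoto");
console.log(parts.base);     // the portion that is safe to encode and submit
console.log(parts.fragment); // kept aside, to be re-attached with a literal "#"
```

Encoding only `base` and re-attaching the fragment with a literal "#" afterwards keeps the hash from ever being escaped.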

It really all comes down to what you're trying to accomplish. I can point at why it's an issue, but how best to make it a non-issue depends on what you're trying to do, how you're trying to do it, and what's allowed in the context you're working in.

挽巷

2021-01-12 05:59
1. Why would a hash become part of the file just because it's URI-encoded? Isn't it a bug?
If you point your browser to http://example.com/index.html#title, the browser interprets this to make a request for the file index.html from the server example.com. Once the request is complete, the browser looks for an anchor element in the document with the name of 'title' (i.e. <a name="title">My title</a>).

If you instead point to http://example.com/index.html%23title, the browser makes a request for the file index.html%23title from example.com, which probably doesn't exist on the server, giving you a 404. See the difference?
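To see the difference concretely, here is a small sketch using the standard `URL` class (available in browsers and in Node.js) to parse both example URLs.

```javascript
const withFragment = new URL("http://example.com/index.html#title");
const withEscapedHash = new URL("http://example.com/index.html%23title");

// The fragment is split off and never becomes part of the requested path.
console.log(withFragment.pathname, withFragment.hash);

// The escaped hash is just more path; there is no fragment at all.
console.log(withEscapedHash.pathname, withEscapedHash.hash);
```

In the first case the server is asked for `/index.html`; in the second it is asked for `/index.html%23title`.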

And it's not a bug. It's part of an internet standard: RFC 2396, published in 1998 and since obsoleted by RFC 3986 (2005), which still treats "#" as the fragment delimiter. Quoting RFC 2396:

The character "#" is excluded because it is used to delimit a URI from a fragment identifier in URI references (Section 4).

As for 2 and 3, there's not enough context in your example code to tell what you're trying to do. How are you calling your code? What are you trying to do with prettyPhoto that isn't working? Are you trying to redirect to a specific photo or gallery from a user click or other JavaScript event? Are you trying to open the gallery when someone visits a particular page?

I checked the linked question with Twitter/OAuth, but I don't see how that ties into the code you provided. I started poking at prettyPhoto as well, but I don't see how your code relates to that either.

Instead of changing your 404 page, maybe what you need is an in-code handler or server rewrite rule that takes not-found requests with a %23 in them and redirects the user to the decoded URL. That could have some drawbacks, but it would be fairly elegant if you're taking incoming requests from other sources you can't control. What is your server environment? (language, server tech, who owns the machine, etc.)
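Since the server environment isn't known, the following is only a sketch of such an in-code handler, assuming a plain Node.js `http` server. The function name `decodedRedirectTarget`, the 302 status, and the port are all assumptions for illustration.

```javascript
const http = require("http");

// Hypothetical helper: returns the URL to redirect to, or null if the request
// has no encoded hash in its path portion.
function decodedRedirectTarget(requestUrl) {
  // Refuse protocol-relative paths so the redirect can't point off-site.
  if (requestUrl.startsWith("//")) return null;

  const queryStart = requestUrl.indexOf("?");
  const path = queryStart === -1 ? requestUrl : requestUrl.slice(0, queryStart);
  if (!path.includes("%23")) return null;

  // Only the first "%23" is turned back into a fragment delimiter.
  return requestUrl.replace("%23", "#");
}

const server = http.createServer((req, res) => {
  // A real application would try its normal routes first and only
  // fall back to this logic when nothing matched.
  const target = decodedRedirectTarget(req.url);
  if (target) {
    res.writeHead(302, { Location: target });
    res.end();
    return;
  }
  res.writeHead(404, { "Content-Type": "text/plain" });
  res.end("Not found");
});

// server.listen(8080); // port is an assumption; uncomment to try it locally
```

Only the "%23" itself is decoded here; the rest of the fragment is passed along untouched, so the page's own script would still need to decode values such as `%3D` if it relies on them.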

I'd be happy to update my answer with a solution or a workaround for you.