I use a URL entered by the user as text to initialize a QUrl object. Later I want to convert the QUrl back into a string to display it and to check it against a regular expression. This works fine as long as the user does not enter any percent-encoded URLs.
Why doesn't the following example code work?
qDebug() << QUrl("http://test.com/query?q=%2B%2Be%3Axyz%2Fen").toDisplayString(QUrl::FullyDecoded);
It simply doesn't decode any of the percent-encoded characters. It should print "http://test.com/query?q=++e:xyz/en" but it actually prints "http://test.com/query?q=%2B%2Be%3Axyz%2Fen".
I also tried a lot of other methods like fromUserInput() but I could not make the code work correctly in Qt 5.3.
Can someone explain to me how to do this, and why the above code doesn't work (i.e. doesn't show the decoded URL) even when using QUrl::FullyDecoded?
UPDATE
After getting the fromPercentEncoding() hint, I tried the following code:
QUrl UrlFromUserInput(const QString& input)
{
    QByteArray latin = input.toLatin1();
    QByteArray utf8 = input.toUtf8();
    if (latin != utf8)
    {
        // URL string containing unicode characters (no percent encoding expected)
        return QUrl::fromUserInput(input);
    }
    else
    {
        // URL string containing ASCII characters only (assume possible %-encoding)
        return QUrl::fromUserInput(QUrl::fromPercentEncoding(input.toLatin1()));
    }
}
This allows the user to input unicode URLs and percent-encoded URLs and it is possible to decode both kinds of URLs for displaying/matching. However the percent-encoded URLs did not work in QWebView... the web-server responded differently (it returned a different page). So obviously QUrl::fromPercentEncoding() is not a clean solution since it effectively changes the URL. I could create two QUrl objects in the above function... one constructed directly, one constructed using fromPercentEncoding(), using the first for QWebView and the latter for displaying/matching only... but this seems absurd.
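Why decoding the whole URL string "effectively changes the URL" can be shown without Qt. The following is a minimal plain C++ sketch, not QUrl's implementation: `hexValue`, `percentDecode` and `countQueryParams` are hypothetical helper names introduced only for this illustration, and the example URL with `%26` and `%3D` is made up.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: numeric value of a hex digit, or -1 if c is not one.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decode every valid %XX sequence in the input.
// Malformed sequences (e.g. a lone '%') are copied through unchanged.
std::string percentDecode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}

// Hypothetical helper: count '&'-separated parameters after the first '?'.
std::size_t countQueryParams(const std::string& url) {
    std::size_t q = url.find('?');
    if (q == std::string::npos || q + 1 == url.size()) return 0;
    std::size_t count = 1;
    for (std::size_t i = q + 1; i < url.size(); ++i) {
        if (url[i] == '&') ++count;
    }
    return count;
}
```

With "http://test.com/query?q=a%26b%3Dc", `countQueryParams` sees one parameter before decoding and two after `percentDecode`, because the decoded `&` and `=` now act as delimiters. Something similar may be behind the different server response: many servers read a literal `+` in a query as a space, so turning `%2B` into `+` can change what the server receives.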
Conclusion
I've done some research; the conclusion so far is: absurd.
QUrl::fromPercentEncoding() is the way to go, and what the OP has done in the UPDATE section should have been the accepted answer to the question in the title.
I think Qt's documentation of QUrl::toDisplayString is a little bit misleading:
"Returns a human-displayable string representation of the URL. The output can be customized by passing flags with options. The option RemovePassword is always enabled, since passwords should never be shown back to users."
Actually, it doesn't claim any decoding ability; the documentation here is unclear about its behavior. But at least the password part is true. I've found some clues on Gitorious:
"Add QUrl::toDisplayString(), which is toString() without password. And fix documentation of toString() which said this was the method to use for displaying to humans, while this has never been true."
Test Code
To discern the decoding ability of the different functions, the following code was tested on Qt 5.2.1 (not tested on Qt 5.3 yet!):
QString target(/*path*/);
QUrl url_path(target);
qDebug() << "[Original String]:" << target;
qDebug() << "--------------------------------------------------------------------";
qDebug() << "(QUrl::toEncoded) :" << url_path.toEncoded(QUrl::FullyEncoded);
qDebug() << "(QUrl::url) :" << url_path.url();
qDebug() << "(QUrl::toString) :" << url_path.toString();
qDebug() << "(QUrl::toDisplayString) :" << url_path.toDisplayString(QUrl::FullyDecoded);
qDebug() << "(QUrl::fromPercentEncoding):" << url_path.fromPercentEncoding(target.toUtf8());
P.S. QUrl::url is just a synonym for QUrl::toString.
Output
[Case 1]: When target path = "%_%" (test the functionality of encoding):
[Original String]: "%_%"
--------------------------------------------------------------------
(QUrl::toEncoded) : "%25_%25"
(QUrl::url) : "%25_%25"
(QUrl::toString) : "%25_%25"
(QUrl::toDisplayString) : "%25_%25"
(QUrl::fromPercentEncoding): "%_%"
[Case 2]: When target path = "Meow !" (test the functionality of encoding):
[Original String]: "Meow !"
--------------------------------------------------------------------
(QUrl::toEncoded) : "Meow%20!"
(QUrl::url) : "Meow !"
(QUrl::toString) : "Meow !"
(QUrl::toDisplayString) : "Meow%20!" // "Meow !" when using QUrl::PrettyDecoded mode
(QUrl::fromPercentEncoding): "Meow !"
[Case 3]: When target path = "Meow|!" (test the functionality of encoding):
[Original String]: "Meow|!"
--------------------------------------------------------------------
(QUrl::toEncoded) : "Meow%7C!"
(QUrl::url) : "Meow%7C!"
(QUrl::toString) : "Meow%7C!"
(QUrl::toDisplayString) : "Meow|!" // "Meow%7C!" when using QUrl::PrettyDecoded mode
(QUrl::fromPercentEncoding): "Meow|!"
[Case 4]: When target path = "http://test.com/query?q=++e:xyz/en" (not %-encoded):
[Original String]: "http://test.com/query?q=++e:xyz/en"
--------------------------------------------------------------------
(QUrl::toEncoded) : "http://test.com/query?q=++e:xyz/en"
(QUrl::url) : "http://test.com/query?q=++e:xyz/en"
(QUrl::toString) : "http://test.com/query?q=++e:xyz/en"
(QUrl::toDisplayString) : "http://test.com/query?q=++e:xyz/en"
(QUrl::fromPercentEncoding): "http://test.com/query?q=++e:xyz/en"
[Case 5]: When target path = "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" (%-encoded):
[Original String]: "http://test.com/query?q=%2B%2Be%3Axyz%2Fen"
--------------------------------------------------------------------
(QUrl::toEncoded) : "http://test.com/query?q=%2B%2Be%3Axyz%2Fen"
(QUrl::url) : "http://test.com/query?q=%2B%2Be%3Axyz%2Fen"
(QUrl::toString) : "http://test.com/query?q=%2B%2Be%3Axyz%2Fen"
(QUrl::toDisplayString) : "http://test.com/query?q=%2B%2Be%3Axyz%2Fen"
(QUrl::fromPercentEncoding): "http://test.com/query?q=++e:xyz/en"
P.S. I also encountered the bug that Ilya mentioned in the comments: Percent Encoding doesn't seem to be working for '+' in QUrl
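The encoded forms in Cases 1 to 4 above are what a simple RFC 3986 style encoder would produce. The sketch below is a plain C++ approximation, not Qt's algorithm: `isUrlChar` and `percentEncode` are hypothetical names, and the rule "keep unreserved and reserved characters, escape everything else including every '%'" is a simplifying assumption (it would double-encode an input that is already percent-encoded, such as Case 5).

```cpp
#include <string>

// Hypothetical helper: true for characters that RFC 3986 lists as
// unreserved or reserved, i.e. those this sketch leaves unescaped.
bool isUrlChar(unsigned char c) {
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        (c >= '0' && c <= '9')) {
        return true;
    }
    const std::string allowed = "-._~:/?#[]@!$&'()*+,;=";
    return allowed.find(static_cast<char>(c)) != std::string::npos;
}

// Hypothetical helper: escape every other byte as %XX (uppercase hex).
// Note that '%' itself is escaped too, so this is not idempotent.
std::string percentEncode(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (isUrlChar(c)) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back('%');
            out.push_back(hex[c >> 4]);
            out.push_back(hex[c & 0x0F]);
        }
    }
    return out;
}
```

Under these assumptions, `percentEncode("Meow|!")` gives "Meow%7C!" and `percentEncode("%_%")` gives "%25_%25", matching the QUrl::toEncoded lines, while the URL from Case 4 passes through unchanged because `+`, `:` and `/` are reserved characters.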
Summary
The result of QUrl::toDisplayString is ambiguous. As the documentation says, the QUrl::FullyDecoded mode must be used with care. No matter what type of URL you have, encode it with QUrl::toEncoded and display it with QUrl::fromPercentEncoding when necessary.
As for the malfunction of percent-encoded URLs in QWebView mentioned by the OP, more details are needed to debug it. Using a different function or a different mode could be the reason.
Helpful Resources
- RFC 3986 (which QUrl conforms to)
- Encode table
- Source of qurl.cpp on Gitorious
You can use QUrlQuery::toString(QUrl::FullyEncoded) or QUrl::fromPercentEncoding() for this conversion.
I am not sure why toDisplayString(QUrl::FullyDecoded) does not work.
After trying several versions I have found that copy.query(QUrl::FullyDecoded) does decode the query part. The documentation has an example, and the following code does return the decoded URL:
QUrl url("http://test.com/query?q=%2B%2Be%3Axyz%2Fen");
url.setQuery(url.query(QUrl::FullyDecoded), QUrl::DecodedMode);
qDebug() << url.toString();
Solving the problem this way is not optimal because the query part is copied unnecessarily.