We've noticed that from time to time we get an HTTP request without a valid User-Agent string. Is there any valid real-world case for accepting this type of HTTP request?
In my experience, there is no legitimate use case, or at least no common one.
From my server logs, all requests without a user agent are malicious. All come from bots.
In theory, someone who "really cares about privacy" could do this. It depends on what service you're providing. I run a couple of websites. People not using browsers are not my clients.
While it's true that a developer of malicious bots can trivially add a fake user agent (Chrome's, for example), that doesn't make a request without one any more legitimate in intent.
I guess most people who send HTTP requests without a User-Agent are making the request through an API rather than a browser.
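To illustrate: a request assembled by hand or by a bare-bones script only carries the headers the code puts in, so User-Agent is easy to leave out without any bad intent. Here is a minimal Python sketch; `build_request` is a hypothetical helper made up for this example, not part of any library.

```python
def build_request(host, path="/", user_agent=None):
    """Build a minimal HTTP/1.1 GET request as bytes.

    Host is the one header field an HTTP/1.1 client must send;
    User-Agent is added only when the caller supplies a value.
    """
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if user_agent is not None:
        lines.append(f"User-Agent: {user_agent}")
    lines.append("Connection: close")
    # A blank line (CRLF CRLF) terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

Calling `build_request("example.com")` produces a well-formed request with no User-Agent line at all, which is roughly what a quick script talking to an API might send.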
As stated in RFC 7231 (nearly the same paragraph can be found in RFC 2616):
5.5.3 User-Agent
The "User-Agent" header field contains information about the user agent originating the request, which is often used by servers to help identify the scope of reported interoperability problems, to work around or tailor responses to avoid particular user agent limitations, and for analytics regarding browser or operating system use. A user agent SHOULD send a User-Agent field in each request unless specifically configured not to do so.
The keyword here is SHOULD. And yes, there's an RFC that defines what that word is supposed to mean, RFC 2119:
- SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
So, although agents that do not send a User-Agent are not following what can be considered best practice, they do not violate any rule of the RFC. In my opinion, there's not really a valid technical reason to block them.
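If you still want to keep an eye on these requests, one option is to record whether the header was present and let other signals decide what to do, rather than rejecting outright. A rough Python sketch, assuming the request headers arrive as a plain dict of strings; `classify_user_agent` is a hypothetical name chosen for this example.

```python
def classify_user_agent(headers):
    """Return 'missing', 'empty', or 'present' for a dict of request headers."""
    # HTTP header field names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    if "user-agent" not in normalized:
        return "missing"
    if not normalized["user-agent"].strip():
        return "empty"
    return "present"
```

The result can be written to your logs or fed into rate limiting, so a request without a User-Agent is treated as a hint and not as proof of abuse.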