I don't actually know the answer but I suspect that it has something to do with the aspect of invoking methods on string literals directly.
If I recall correctly (I didn't actually verify this because I don't have an old IDE handy), early versions of the C# IDE had trouble detecting method calls against string literals in IntelliSense, and that has a big impact on the discoverability of the API. If that was the case, typing the following wouldn't give you any help:
"{0}".Format(12);
If you were forced to type
new String("{0}").Format(12);
It would be clear that there was no advantage to making the Format method an instance method rather than a static method.
The .NET libraries were designed by a lot of the same people that gave us MFC, and the String class in particular bears a strong resemblance to the CString class in MFC. MFC does have an instance Format method (that uses printf style formatting codes rather than the curly-brace style of .NET) which is painful because there's no such thing as a CString literal. So in a MFC codebase that I worked on I see a lot of this:
CString csTemp = "";
csTemp.Format("Some string: %s", szFoo);
which is painful. (I'm not saying that the code above is a great way to do things even in MFC, but that does seem to be the way that most of the developers on the project learned how to use CString::Format). Coming from that heritage, I can imagine that the API designers were trying to avoid that sort of situation again.