Question
[Note to the wise: jump to last EDIT]
I have a very simple txt sitemap (named sitemap.txt) that looks like this:
http://myDomain.com
http://myDomain.com/about.html
http://myDomain.com/faq.html
http://myDomain.com/careers.html
When I load it up on webmaster tools I get:
Sitemap is HTML - Your Sitemap appears to be an HTML page. Please use a supported sitemap format instead
I tried a few alternatives (such as with or without www) but no luck.
Does anyone have a clue?
Any help appreciated!
EDIT:
I tried with an XML sitemap and I am getting the same error, so it looks like the server is serving everything as HTML (as ceejayoz correctly suggests). Now the question is ... how do I get the appspot server to serve text as plain text?
EDIT:
Ok - I got fed up and implemented a servlet to serve my sitemaps (I am now trying with both XML and TXT) explicitly as text/plain. Everything works fine if I invoke the servlet manually, but I am still getting Sitemap is HTML. I don't know where to bang my head!
EDIT: I tried to verify content-type with a firefox plugin - everything seems to be coming up as expected (I am putting the actual URL so that people can have a look):
http://wokheisandbox.appspot.com/sitemaps/sitemap.txt --> Content-type: text/plain
http://wokheisandbox.appspot.com/sitemaps/sitemap.xml --> Content-type: application/xml
With my servlet (setting text/plain explicitly):
http://wokheisandbox.appspot.com/wokhei/serveSitemap?fileType=TXT --> Content-type: text/plain
http://wokheisandbox.appspot.com/wokhei/serveSitemap?fileType=XML --> Content-type: text/plain
All I still get from Webmaster Tools is --> Sitemap is HTML.
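A browser plugin shows what the browser receives, which may differ from what a crawler gets after redirects or on a different hostname. The check described above can be sketched in Python as follows. The function names and the classification rule are assumptions made for illustration; this is a rough heuristic, not the logic Webmaster Tools actually applies.

```python
import urllib.request


def classify_sitemap_response(content_type, body):
    """Heuristically label a response as 'html', 'xml', or 'text'.

    Assumption: a response counts as HTML if either the Content-Type
    header says so or the body starts like an HTML document.
    """
    media_type = content_type.split(";")[0].strip().lower()
    head = body.lstrip()[:200].lower()
    if media_type == "text/html" or head.startswith(("<!doctype html", "<html")):
        return "html"
    if media_type in ("application/xml", "text/xml") or head.startswith("<?xml"):
        return "xml"
    return "text"


def fetch_and_classify(url):
    """Fetch url (urllib follows redirects) and classify the final response.

    Returns (final_url, content_type_header, label).
    """
    with urllib.request.urlopen(url, timeout=10) as resp:
        content_type = resp.headers.get("Content-Type", "")
        body = resp.read(2048).decode("utf-8", errors="replace")
        return resp.geturl(), content_type, classify_sitemap_response(content_type, body)
```

Calling `fetch_and_classify` once with the custom-domain URL and once with the appspot URL, then comparing the final URL and label of each, would show whether the two hostnames really return the same thing.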
EDIT:
I think I found out the reason --> I registered my site on Google Webmaster Tools as http://mydomain.com, but the app is hosted on appspot at http://myapp.appspot.com, which is mapped to mydomain.com. If I register http://myapp.appspot.com everything works fine (sitemap validated).
This is good news but it's not ideal because I want mydomain.com to be indexed ... any idea how to overcome this?
Answer 1:
Sounds like your webserver is serving .txt files as text/html instead of text/plain.
For Apache, the following in a .htaccess file should fix it:
AddType text/plain .txt
Answer 2:
I found this thread discussing duplicate entries causing recent sitemap grief. I don't see this issue in your sitemap but you don't want any duplicates between entries. For example, make sure your sitemap doesn't contain BOTH of the following:
http://mydomain.com/ or http://www.mydomain.com/
AND
http://mydomain.com/index.html or http://www.mydomain.com/index.html
I think you posted your entire sitemap so, again, I don't think this is your problem exactly. You did mention you have tried various URLs (with and without www). If you are validating the sitemap via Google Webmaster Tools, it may take up to 20 minutes for a correction to take effect. I hope it helps.
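To check a longer sitemap for this kind of duplicate, a small script along these lines could help. It is only a sketch: `canonical_key` and `find_duplicates` are hypothetical names, and treating the www and non-www hosts as the same page is an assumption that holds only if both serve the same site.

```python
from urllib.parse import urlsplit


def canonical_key(url):
    """Reduce a URL to a comparison key.

    Assumptions: host case, a leading 'www.', and a trailing
    'index.html' do not distinguish pages.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    return host + path


def find_duplicates(urls):
    """Return (first_seen, duplicate) pairs for URLs sharing a key."""
    seen = {}
    duplicates = []
    for url in urls:
        if not url.strip():
            continue  # skip blank lines in a text sitemap
        key = canonical_key(url)
        if key in seen:
            duplicates.append((seen[key], url))
        else:
            seen[key] = url
    return duplicates
```

Passing the lines of sitemap.txt to `find_duplicates` returns an empty list when no two entries collapse to the same key.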
Answer 3:
<?xml version='1.0' encoding='utf-8' ?>
<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>
<url>
<loc>http://myDomain.com</loc>
</url>
<url>
<loc>http://myDomain.com/about.html</loc>
</url>
<url>
<loc>http://myDomain.com/faq.html</loc>
</url>
<url>
<loc>http://myDomain.com/careers.html</loc>
</url>
</urlset>
This way always works for me.
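If the URL list already exists as a text sitemap, an XML file of this shape can be generated rather than written by hand. The following is a minimal sketch using Python's standard library; `text_sitemap_to_xml` is a hypothetical helper name, and it emits only the `<loc>` element for each URL.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def text_sitemap_to_xml(text):
    """Convert a one-URL-per-line text sitemap into sitemap XML."""
    # The namespace is written as a plain xmlns attribute on the root.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue  # ignore blank lines
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url  # special characters are escaped on output
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The returned string can be written to sitemap.xml with UTF-8 encoding so that it matches the declaration on the first line.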
Answer 4:
In case you change your mind about non-XML sitemaps:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.test.com/</loc>
<lastmod>2009-08-03T23:40:40+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>http://test/</loc>
<lastmod>2009-08-03T23:59:08+00:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.6</priority>
</url>
</urlset>
Answer 5:
I'm fairly certain that you need to provide an XML formatted sitemap file (sitemap.xml). See here for a format example: http://en.wikipedia.org/wiki/Sitemaps.
Source: https://stackoverflow.com/questions/1223955/getting-sitemap-is-html-from-google-webmaster-tool