How to scrape URL data from an intranet site using Python?

Question

I need a Python warrior to help me (I'm a noob)! I'm trying to scrape certain data from an intranet site using the urllib module. The site belongs to my company and is only visible to employees, not to the public, which I think is why I get this error:

IOError: ('http error', 401, 'Unauthorized', )

How do I get around this? It won't even read the site using htmlfile.read().

Sample code that works for a public site:

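import urllib  # Python 2: urlopen lives directly in urllib
import re

# Fetch the page; this works because the site is public.
htmlfile = urllib.urlopen("http://finance.yahoo.com/q?s=AAPL")
htmltext = htmlfile.read()

# Capture the text inside the price <span>.
regex = '<span id="yfs_l84_aapl">(.+?)</span>'
pattern = re.compile(regex)
price = re.findall(pattern, htmltext)

print price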

Answer 1:

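A 401 means the server wants credentials. If the intranet site uses NTLM (Windows domain) authentication, try requests with requests_ntlm: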

import requests
from requests_ntlm import HttpNtlmAuth

# Pass your Windows domain credentials; the backslash must be escaped.
r = requests.get("http://ntlm_protected_site.com", auth=HttpNtlmAuth('domain\\username', 'password'))

print(r.text)
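As a rough sketch of how the pieces could fit together (this is not part of the original answer), here is the question's regex applied to a page fetched through an authenticated session. fetch_html and extract_prices are hypothetical helper names, the span id is the Yahoo one from the question and only stands in for whatever markup your intranet page has, and the block is written for Python 3. The session is passed in by the caller, so the helpers themselves only need the standard library.

```python
import re

# Pattern copied from the question; replace it with one that matches
# the markup of your own intranet page.
PRICE_PATTERN = re.compile(r'<span id="yfs_l84_aapl">(.+?)</span>')


def fetch_html(session, url, timeout=10):
    """Fetch url with an already-authenticated requests.Session.

    Returns the response body as text.
    """
    response = session.get(url, timeout=timeout)
    # Raises an exception on 401 (or any other 4xx/5xx) instead of
    # silently handing back an error page.
    response.raise_for_status()
    return response.text


def extract_prices(html):
    """Return every string captured by PRICE_PATTERN in html."""
    return PRICE_PATTERN.findall(html)


# Usage (assumes `pip install requests requests_ntlm`):
#   import requests
#   from requests_ntlm import HttpNtlmAuth
#   session = requests.Session()
#   session.auth = HttpNtlmAuth('domain\\username', 'password')
#   html = fetch_html(session, "http://ntlm_protected_site.com")
#   print(extract_prices(html))
```

Reusing one Session is probably worth it here, since as far as I understand NTLM authenticates the connection rather than each individual request.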

If you need help with any specifics of this library and can't find it in the docs, leave a comment.
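If you are not sure the site actually uses NTLM, the 401 response itself says which schemes the server accepts, in its WWW-Authenticate header(s). Below is a minimal Python 3 sketch using only the standard library. scheme_names and offered_auth_schemes are made-up helper names, and taking the first token of each header is a simplification (a single header can in principle carry several challenges).

```python
import urllib.error
import urllib.request


def scheme_names(challenges):
    """Return the first token of each WWW-Authenticate value.

    For example, 'Basic realm="intranet"' yields 'Basic'.
    """
    return [value.split(None, 1)[0] for value in challenges if value.strip()]


def offered_auth_schemes(url, timeout=10):
    """Request url without credentials and report what the server asks for.

    Returns an empty list when the server did not answer with 401.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return []  # the page loaded, so no authentication was demanded
    except urllib.error.HTTPError as err:
        if err.code != 401:
            raise  # some other HTTP error; let the caller see it
        # A server may send one WWW-Authenticate header per scheme.
        return scheme_names(err.headers.get_all("WWW-Authenticate") or [])


# Usage (the URL is a placeholder for your intranet address):
#   print(offered_auth_schemes("http://ntlm_protected_site.com"))
```

If the list contains NTLM, the requests_ntlm approach above applies. If it only lists Basic, passing auth=('username', 'password') to requests.get should be enough.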

Source: https://stackoverflow.com/questions/24805432/how-to-scrape-url-data-from-intranet-site-using-python

Tags

python

web-scraping

urllib

intranet
