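Automating web page monitoring with Python and HttpWatch

When monitoring the quality of web page access, HttpWatch is a tool you can hardly do without. It records every detail of loading a page, covering every element on the page, from DNS lookup and network connection to the time the first data packet is sent (as shown in the figure below), which gives us a visual way to track down problems. We usually reach for it only after a problem has already appeared. But if it is used to track a page's access over the long term and the results are stored in a database, that data becomes a baseline for analysing problems, which is valuable in its own right. Can HttpWatch meet this need? Yes, and Python makes it easy. The code below uses Python to read the pages to monitor from an external file and prints some of the timing figures; you can of course build more powerful features on top of it.

Format of the external file: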

http://www.cites.com/

http://www.cites2.com/page1.html

http://www.cites3.com/page2.html
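
A file in this format can be parsed with a few lines of Python. The sketch below is only an illustration and is not part of the original script: the helper name `parse_url_lines` is an assumption, and it skips blank lines and surrounding whitespace so that an empty line or a trailing newline is never handed to the browser.

```python
def parse_url_lines(lines):
    """Return the non-empty, whitespace-stripped entries of an iterable of lines."""
    urls = []
    for line in lines:
        url = line.strip()
        if url:  # ignore blank lines between URLs
            urls.append(url)
    return urls


# Sample input mimicking the file format shown above.
sample = [
    "http://www.cites.com/\n",
    "\n",
    "http://www.cites2.com/page1.html\n",
    "http://www.cites3.com/page2.html",
]
print(parse_url_lines(sample))
```

Since an open file object yields lines when iterated, it can be passed to the same helper directly, for example `parse_url_lines(open_file)`.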

HttpWatch supports C# and Ruby out of the box. To call it from Python you need the win32com module, which is provided by pywin32 and can be downloaded from:

http://sourceforge.net/projects/pywin32/files/pywin32/
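
Before running the monitoring script it can help to confirm that win32com is importable. The following is a minimal sketch, not part of the original post: the function name `has_win32com` is an assumption, and the check only looks for the package without touching HttpWatch itself. On current Python installations pywin32 can usually also be installed with `pip install pywin32`.

```python
import importlib.util


def has_win32com():
    """Return True if the win32com package can be found in this environment."""
    return importlib.util.find_spec("win32com") is not None


if has_win32com():
    print("pywin32 is available")
else:
    print("pywin32 is missing; install it before running the script")
```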

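The implementation is as follows:

#coding=UTF-8
import win32com.client

### Read the external file and return the URLs to check as a list
def getCiteToCheck(filepath):
    # 'with' closes the file when done; the name 'input' is avoided because it shadows a builtin
    with open(filepath, 'r') as f:
        cites = f.readlines()
    return cites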

def checkCite(cites):
    # Create an HttpWatch controller and open a new IE process
    control = win32com.client.Dispatch('HttpWatch.Controller')
    plugin = control.IE.New()
    plugin.Log.EnableFilter(False)  # HttpWatch can filter out certain entries; filtering is turned off here
    plugin.Record()  # start HttpWatch recording
    i = 1
    for domain in cites:
        # Lines read from the file end with '\n', so strip it first
        # (in testing, pages also opened correctly without stripping)
        url = domain.strip()
        if not url:
            continue  # skip blank lines in the URL file
        plugin.GotoURL(url)
        control.Wait(plugin, -1)
        # The log can be exported to an XML file
        logFileName = 'd:\\log' + str(i) + '.xml'
        plugin.Log.ExportXML(logFileName)
        # The log contents can also be read directly
        print(plugin.Log.Entries.Count)
        # plugin.Log.Entries is a collection; each element is an object
        # corresponding to one URL element loaded as part of the page
        for s in plugin.Log.Entries:
            print(s.URL)
            print(s.time)
            # s.Timings.Blocked returns a Timing object with three properties: Duration, Started and Valid
            # Duration is the time spent downloading one URL element; Started is its start time
            # Timings contains Blocked, CacheRead, Connect, DNSLookup, Network, Receive, Send, TTFB and Wait
            print('Blocked:' + str(s.Timings.Blocked.Duration))
            print('CacheRead:' + str(s.Timings.CacheRead.Duration))
            print('Connect:' + str(s.Timings.Connect.Duration))
            print('DNSLookup:' + str(s.Timings.DNSLookup.Duration))
            print('Network:' + str(s.Timings.Network.Duration))
            print('Receive:' + str(s.Timings.Receive.Duration))
            print('Send:' + str(s.Timings.Send.Duration))
            print('TTFB:' + str(s.Timings.TTFB.Duration))
            print('Wait:' + str(s.Timings.Wait.Duration))
        i = i + 1
    plugin.Stop()
    plugin.CloseBrowser()
###########

cite_file = "cite.txt"
cites = getCiteToCheck(cite_file)
########
print(cites)
# Run the whole check four times
for i in [1, 2, 3, 4]:
    checkCite(cites)
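
To keep the measurements for long-term analysis instead of only printing them, the durations could be written to a database. The sketch below is one possible approach under stated assumptions, not the original author's implementation: the table layout, the helper names (`timings_to_dict`, `save_entry`, `open_db`) and the use of SQLite are all choices made for this illustration. A stand-in object replaces the real HttpWatch entry so the example runs without HttpWatch installed; it only assumes the `URL` and `Timings.<name>.Duration` attributes used in the script above.

```python
import sqlite3
from types import SimpleNamespace

# Timing names as listed in the script above.
TIMING_NAMES = ("Blocked", "CacheRead", "Connect", "DNSLookup",
                "Network", "Receive", "Send", "TTFB", "Wait")


def timings_to_dict(entry):
    """Collect the Duration of each timing of one log entry into a dict."""
    return {name: getattr(entry.Timings, name).Duration for name in TIMING_NAMES}


def open_db(path):
    """Open the database and create the (hypothetical) timings table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS timings ("
        "page_url TEXT, entry_url TEXT, name TEXT, duration REAL, "
        "recorded_at TEXT DEFAULT CURRENT_TIMESTAMP)")
    return conn


def save_entry(conn, page_url, entry):
    """Insert one row per timing for a single log entry."""
    rows = [(page_url, entry.URL, name, duration)
            for name, duration in timings_to_dict(entry).items()]
    conn.executemany(
        "INSERT INTO timings (page_url, entry_url, name, duration) "
        "VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# Stand-in for a HttpWatch entry (assumption: same attribute names as above).
fake_timings = SimpleNamespace(
    **{name: SimpleNamespace(Duration=0.1) for name in TIMING_NAMES})
fake_entry = SimpleNamespace(URL="http://www.cites.com/logo.png",
                             Timings=fake_timings)

conn = open_db(":memory:")
save_entry(conn, "http://www.cites.com/", fake_entry)
print(conn.execute("SELECT COUNT(*) FROM timings").fetchone()[0])
```

In the monitoring script, `save_entry(conn, url, s)` would be called inside the loop over `plugin.Log.Entries`, with a file path passed to `open_db` in place of `":memory:"`.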

Source: oschina

Link: https://my.oschina.net/u/1590519/blog/342604
