Screen scraping using ColdFusion

Submitted by 那年仲夏 on 2019-12-10 23:56:53

Question


I am trying to screen-scrape another application using the code below in ColdFusion:

<cfhttp url="https://intra.att.com/itscmetrics/EM2/LTMR.cfm" method="get" username="uvwxyz" password="abcdef">
    <cfhttpparam type="url" name="LTMX" value="Andre Fuetsch / Shelly K Lazzaro">
</cfhttp>

<cfset myDocument = cfhttp.fileContent>

<cfoutput>
    #myDocument#
</cfoutput>

When I run my .cfm page with the code above, I can reach the destination page. Here is part of its source code:

<table border="1" width=99% style="border-collapse:collapse;">
    <thead>
    <td colspan="12" class="drpmainheader1_2">LTM Detail Report for Andre Fuetsch / Shelly K Lazzaro</td>
    <tr align="center">
      <th class="ptitles">Liaison Name</th>
      <th class="ptitles">Application Acronym</th>
      <th class="ptitles">MOTS ID</th>
      <th class="ptitles">Priority</th> 
      <th class="ptitles">MC</th>
      <th class="ptitles">DR Exercise</th>
      <th class="ptitles">ARM/SRM Maintenance</th>
      <th class="ptitles">ARM/SRM Creation</th>             
      <th class="ptitles">Backup & Recovery Certification</th>
      <th class="ptitles">Interface Certification</th>
      <th class="ptitles">AIA Compliance</th>   
    </tr>
    </thead>

    <tbody>
    <tr>
    <td class="drpdetailtablerowdetailleft">Lynette M Acosta</td>
    <td class="drpdetailtablerowdetailleft">AABA</td>
    <td class="drpdetailtablerowdetail"><a href="http://ebiz.sbc.com/mots/detail.cfm?appl_id=9710" target="_blank" style="color:blue;">9710</a></td>
    <td class="drpdetailtablerowdetail">5</td>
    <td class="drpdetailtablerowdetail">NMC</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    </tr>
    </tbody>

    <tbody>
    <tr>
    <td class="drpdetailtablerowdetailleft">Lynette M Acosta</td>
    <td class="drpdetailtablerowdetailleft">ABS RECON+</td>
    <td class="drpdetailtablerowdetail"><a href="http://ebiz.sbc.com/mots/detail.cfm?appl_id=13999" target="_blank" style="color:blue;">13999</a></td>
    <td class="drpdetailtablerowdetail">3</td>
    <td class="drpdetailtablerowdetail">NMC</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    </tr>
    </tbody>

I am not good with regular expressions in ColdFusion. Can anyone guide me or give me a starting point for extracting the data from this HTML table using ColdFusion? I do not have access to the underlying database. Hope this is clear.


Answer 1:


Parsing HTML with regex? You'll have more options if you use the jsoup HTML parser with ColdFusion. jsoup uses jQuery-like CSS selectors to walk the DOM, so getting the HTML table data into arrays is quick.

http://jsoup.org/
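jsoup is a Java library, so ColdFusion has to be able to see its JAR before `createObject("java", "org.jsoup.Jsoup")` will work. One option is to copy the JAR into the server's lib folder and restart ColdFusion. Another, assuming Adobe ColdFusion 10 or later (other engines may differ), is to point `this.javaSettings` in `Application.cfc` at a folder holding the JAR. The application name `ltmScraper` and the `lib` folder below are placeholders, not anything from the original question:

```cfml
component {
    // Placeholder application name
    this.name = "ltmScraper";

    // Load every JAR found in the "lib" folder next to this Application.cfc
    // (e.g. a downloaded jsoup-x.y.z.jar) so createObject("java", "org.jsoup.Jsoup")
    // can resolve the class.
    this.javaSettings = {
        loadPaths = [ getDirectoryFromPath(getCurrentTemplatePath()) & "lib/" ],
        reloadOnChange = false
    };
}
```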

Here are some related articles & sample code:

  • http://www.raymondcamden.com/index.cfm/2012/4/6/jsoup-adds-jQuerylike-parsing-in-Java
  • http://www.bennadel.com/blog/2358-Parsing-Traversing-And-Mutating-HTML-With-ColdFusion-And-jSoup.htm
  • http://pastebin.com/U6A86mSi
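As a rough starting point, here is a sketch of the jsoup approach applied to the table in the question. It assumes jsoup is already loadable from ColdFusion, and `parseLtmTable` is a hypothetical helper name, not part of jsoup or ColdFusion. It takes the column names from the `th.ptitles` header cells and returns one struct per data row:

```cfml
<cfscript>
    // Hypothetical helper: parses the "LTM Detail Report" HTML with jsoup and
    // returns an array of structs, one per data row, keyed by column title.
    // Assumes the jsoup JAR is on ColdFusion's Java classpath.
    function parseLtmTable(required string html) {
        var doc = createObject("java", "org.jsoup.Jsoup").parse(arguments.html);
        var titles = [];
        var result = [];
        var headerCells = doc.select("th.ptitles");
        // Only rows that contain the report's detail cells, so other tables are ignored
        var dataRows = doc.select("tbody tr:has(td.drpdetailtablerowdetailleft)");
        var cells = "";
        var record = "";
        var i = 0;
        var c = 0;

        // Column titles, e.g. "Liaison Name", "MOTS ID", "AIA Compliance"
        for (i = 0; i < headerCells.size(); i++) {
            arrayAppend(titles, headerCells.get(javaCast("int", i)).text());
        }

        for (i = 0; i < dataRows.size(); i++) {
            cells = dataRows.get(javaCast("int", i)).select("td");
            record = {};
            // text() returns only the visible text, so the MOTS ID link becomes "9710"
            for (c = 0; c < cells.size() && c < arrayLen(titles); c++) {
                record[titles[c + 1]] = cells.get(javaCast("int", c)).text();
            }
            arrayAppend(result, record);
        }
        return result;
    }
</cfscript>
```

After the `<cfhttp>` call from the question, `rows = parseLtmTable(cfhttp.fileContent);` followed by `<cfdump var="#rows#">` should show the scraped rows. Filtering rows with `:has(td.drpdetailtablerowdetailleft)` keeps rows from other tables on the page out of the result, but it depends on the report's current CSS class names, so it will need adjusting if that markup changes.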


Source: https://stackoverflow.com/questions/22668870/screen-scraping-using-coldfusion
