phpquery | 易学教程

phpQuery: a PHP implementation based on jQuery


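The power of jQuery's selectors is plain for everyone to see, and phpQuery gives PHP the same capability; it is essentially a server-side jQuery. First, the official description: phpQuery is a server-side, chainable, CSS3 selector driven Document Object Model (DOM) API based on jQuery JavaScript Library. Library is written in PHP5 and provides additional Command Line Interface (CLI).

Why it exists

Sometimes we need to scrape a web page but only want one specific part of it. The usual tool is a regular expression, and that certainly works. Regex is a general-purpose solution, but in specific situations there is often a simpler and faster way. For example, if you want to look up a programming question you can of course use Google, but stackoverflow, as a dedicated programming Q&A community, will give you more, and more reliable, answers. There are three main reasons not to use regex on HTML pages:

1. The patterns are tedious to write. For newcomers especially, the sight of a pile of seemingly meaningless characters strung together is enough to make your head explode. If the target you want to extract has no distinctive features, the regex is even harder to write.
2. It is inefficient. In PHP, regex should be the last resort; anything that can be solved with string functions should not be handed to regex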
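To make the regex-versus-DOM argument above concrete, here is a minimal sketch that extracts the same links both ways. It uses PHP's built-in DOMDocument rather than phpQuery itself so that it runs without any extra library; the HTML fragment and the variable names are made up for illustration.

```php
<?php
// Hypothetical HTML fragment standing in for a fetched page.
$html = '<div id="nav"><a href="/a">First</a><a href="/b" class="x">Second</a></div>';

// Regex approach: the pattern has to anticipate attribute order,
// quoting style, and whitespace, and it breaks easily if the markup changes.
preg_match_all('/<a\s[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/is', $html, $m);
$regexLinks = array_combine($m[1], $m[2]);

// DOM approach: let the parser deal with the markup, then walk the nodes.
$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$domLinks = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // Map each link's href to its visible text.
    $domLinks[$a->getAttribute('href')] = $a->textContent;
}

print_r($domLinks);
```

On this small fragment both approaches find the same two links; the difference is that the DOM version does not need to change when attributes are reordered or quoted differently.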

The most complete roundup of PHP crawlers, part 2: an introduction to the phpQuery, PHPCrawl, and snoopy frameworks


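The first article showed how to build a crawler with native PHP and PHP extension libraries. This article tries PHP crawler frameworks instead: it first compares three crawling tools (phpQuery, PHPCrawl, and snoopy), then looks at ways of simulating browser behavior, with a focus on snoopy. All of the code is on my github.

1. Comparison of several common PHP crawler frameworks

1.1 phpQuery

Strength: a powerful, jQuery-like ability to search the DOM. pq() is a powerful DOM search method that works just like jQuery's $(). Almost all jQuery selectors can be used in phpQuery; you only need to change "." to "->". Demo (Demo5 in my github):

<?php
require('phpQuery/phpQuery.php');
phpQuery::newDocumentFile('http://www.baidu.com/');
$menu_a = pq("a");
foreach($menu_a as $a){ echo pq($a)->html()."<br>"; }
foreach($menu_a as $a){ echo pq($a)->attr("href")."<br>"; }
?>

1.2 PHPCrawl

Strength: fairly strong filtering. The official demo (demo4 in my github):

<?php include("PHPCrawl/libs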

Fix incorrectly displayed encoding on an html document with php


Is there a way to fix the characters that display improperly after running this html markup through phpQuery::newDocument? There are slanted double quotes around -Classics with modern Woman- in the original document that end up displaying improperly after creating the new doc with phpQuery. //Original document is UTF-8 encoded $raw_html = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head><body><p>Mr. Smith of Bangkok celebrated the “Classics with modern Woman”.</p></body></html>'; print($raw_html); $aNew_document = phpQuery::newDocument($raw_html); print(

Using phpQuery


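Preface: why use phpQuery

phpQuery is built on DOMDocument, which was added in PHP 5. DOMDocument is designed specifically for handling HTML/XML. It provides powerful XPath selectors and many other HTML/XML functions, which makes working with HTML/XML very convenient. Regex is a different story: for newcomers especially, the sight of a pile of seemingly meaningless characters strung together is enough to make your head explode, and if the target you want to extract has no distinctive features, the regex is even harder to write. The learning cost is low: jQuery is standard equipment for PHP programmers, so anyone who knows jQuery can pick up phpQuery seamlessly, at almost zero cost. Selectors, nodes, node information, and that is all.

Getting the names of all tags on SF

Open https://segmentfault.com/tags and inspect an element to see some of the tag's attributes: <a class="tag" data-original-title="负载均衡">负载均衡</a>

Demo

<?php
require("phpQuery.php"); // load the phpQuery library
$html = phpQuery::newDocumentFile("https://segmentfault.com/tags");
$hrefList = pq(".tag"); // select every element with class "tag", like $(".tag") in jQuery
foreach ($hrefList as $href) { echo $href->getAttribute("data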
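Since the excerpt notes that phpQuery sits on top of DOMDocument and its XPath support, the following is a rough sketch of what a class selector such as ".tag" looks like when written directly against those built-in classes. It is not phpQuery's internal implementation; the HTML fragment, the XPath expression, and the variable names are assumptions chosen for illustration.

```php
<?php
// Hypothetical fragment modeled on the <a class="tag"> element shown above.
$html = '<ul><li><a class="tag" data-original-title="负载均衡">负载均衡</a></li>'
      . '<li><a class="tag hot" data-original-title="php">php</a></li>'
      . '<li><a class="other">skip</a></li></ul>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
// The XML prolog prefix is a common workaround that tells libxml
// to read the fragment as UTF-8.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

// XPath equivalent of the CSS selector "a.tag": match the class token "tag"
// even when the element carries several classes.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query(
    '//a[contains(concat(" ", normalize-space(@class), " "), " tag ")]'
);

$titles = [];
foreach ($nodes as $node) {
    $titles[] = $node->getAttribute('data-original-title');
}

print_r($titles);
```

The length of the XPath expression compared with pq(".tag") is a fair illustration of why a CSS-selector layer is convenient for this kind of scraping.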

Recommended PHP class: QueryList | an extremely powerful PHP scraping tool based on phpQuery


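QueryList makes scraping in PHP simpler than it has ever been. Because it is built on phpQuery, QueryList has almost no learning cost: if you know CSS3 selectors you can use it right away, and its selectors work the same way as jQuery selectors, so scraping in PHP becomes as simple as selecting elements in jQuery.

A first look

See how concise scraping with QueryList can be:

<?php
use QL\QueryList;
// scrape every image on a page
$data = QueryList::Query('http://cms.querylist.cc/bizhi/453.html',array(
    // rule set
    // 'rule name' => array('jQuery selector','attribute to scrape'),
    'image' => array('img','src')
))->data;
// print the result
print_r($data);
// scrape every hyperlink on a page
// you can fetch the page source yourself first
$html = file_get_contents('http://cms.querylist.cc/google/list_1.html');
// then pass the page source or an HTML fragment to QueryList
$data = QueryList::Query($html,array(
    'link' => array('a','href')
))->data;
// print the result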