phpquery | 易学教程

Why are spaces in URL parameters sometimes %20 and sometimes a plus sign

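If enc_type is PHP_QUERY_RFC1738, then encoding is performed per RFC 1738 and the application/x-www-form-urlencoded media type, which implies that spaces are encoded as plus (+) signs. If enc_type is PHP_QUERY_RFC3986, then encoding is performed according to RFC 3986, and spaces will be percent encoded (%20).
Source: https://www.php.net/manual/en/function.http-build-query.php

As the manual text above shows, the plus sign and %20 are both parameter formats defined by RFC standards. When a form uses the default application/x-www-form-urlencoded encoding type, spaces are replaced with +. RFC 3986 style encoders (rawurlencode() in PHP; note that PHP's own urlencode() emits +) and browsers opening a URL that contains spaces replace them with %20. A parameter value may itself contain spaces or a literal +, which is why values should be URL-encoded before being passed; only then do the two standards not get mixed up. Some frameworks automatically urldecode() the parameters that pass through them, which deforms the parameter in later steps of the flow, so watch out for that as well.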
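The difference between the two standards can be seen with a minimal sketch. It uses only built-in PHP functions (http_build_query(), urlencode(), rawurlencode(), urldecode()); the parameter names and values are made up for illustration.

```php
<?php
// Hypothetical parameters: one value with a space, one with a literal plus sign.
$params = ['q' => 'hello world', 'tag' => 'a+b'];

// Form-style encoding (RFC 1738): the space becomes "+".
$form = http_build_query($params, '', '&', PHP_QUERY_RFC1738);
// URI-style encoding (RFC 3986): the space becomes "%20".
$uri = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

echo $form, PHP_EOL; // q=hello+world&tag=a%2Bb
echo $uri, PHP_EOL;  // q=hello%20world&tag=a%2Bb

// The same split exists between the two single-value encoders.
$plusStyle    = urlencode('hello world');    // hello+world
$percentStyle = rawurlencode('hello world'); // hello%20world

// Decoding twice corrupts a literal plus sign:
// the first pass restores "a+b", the second pass turns "+" into a space.
$once  = urldecode('a%2Bb'); // a+b
$twice = urldecode($once);   // "a b"

echo $plusStyle, PHP_EOL, $percentStyle, PHP_EOL, $once, PHP_EOL, $twice, PHP_EOL;
```

The last two lines illustrate the framework pitfall mentioned above: if a framework has already decoded a parameter, calling urldecode() on it again can change the value.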

A simple php+phpquery crawler for scraping JD.com product categories

This is a simple crawler that uses PHP and phpquery to scrape the content of JD.com (京东) product category pages. phpquery makes it very easy to extract the HTML you want. It is very similar to jquery, almost identical, so if you already know jquery you can pick it up quickly.

1. Download phpquery and place it in the phpQuery folder under the web root
   phpquery download: https://code.google.com/p/phpquery/downloads/list
   phpquery tutorial: https://code.google.com/p/phpquery/
2. The scraping program

<?php
/*
 * Created on 2015-1-29
 *
 * To change the template for this generated file go to
 * Window - Preferences - PHPeclipse - PHP - Code Templates
 */
header("Content-type:text/html; charset=utf-8");
function getPage( $url ) {
    $cnt = file_get_contents($url);
    return mb_convert_encoding($cnt ,"UTF-8","GBK");
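The excerpt above breaks off inside getPage(), but its encoding step can be tried in isolation. The following is a minimal sketch, not the article's remaining code: toUtf8() is a hypothetical helper name, the sample markup is invented, and a GBK body is simulated locally instead of being fetched from a real page. It assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper mirroring the conversion done in getPage():
// normalise a page body from its source encoding (GBK by default) to UTF-8.
function toUtf8(string $raw, string $from = 'GBK'): string
{
    return mb_convert_encoding($raw, 'UTF-8', $from);
}

// Simulate a GBK-encoded response body rather than calling file_get_contents()
// on a live URL (the sample markup is made up for illustration).
$utf8Original = '<title>京东商品分类</title>';
$gbkBody = mb_convert_encoding($utf8Original, 'GBK', 'UTF-8');

// Converting the simulated body back should reproduce the UTF-8 text.
$converted = toUtf8($gbkBody);

var_dump($converted === $utf8Original);
```

Converting before handing the markup to a parser matters because the script declares charset=utf-8 in its Content-type header; without the conversion the GBK bytes would display as garbled text.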

Using phpQuery to gather articles or other data...

//1. First, create a table to act as the gather (collection) table. If you only have a few links you can enter them by hand; if there are many, write a method with a for loop that generates the URLs from matching rules and stores them in the table.
CREATE TABLE `cp_lottery_articles_gather_list` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `site_id` int(11) NOT NULL COMMENT 'site id',
  `lottery_id` int(11) DEFAULT NULL COMMENT 'lottery id',
  `type_id` int(11) DEFAULT NULL COMMENT 'article column type id',
  `category_id` int(11) DEFAULT NULL COMMENT 'news category ID',
  `lottery_name` varchar(255) DEFAULT NULL COMMENT 'associated lottery name',
  `type_name` varchar(255) DEFAULT NULL COMMENT 'type name',
  `link` varchar(255) NOT NULL COMMENT 'gather link',
  `total_page` int(11) DEFAULT '1'

PHPQuery WebBrowser plugin - using cookies

Question: I'm trying to log in to a website using PHPQuery's WebBrowser plugin. I'm able to log in successfully, but I'm not sure how to reuse the cookies from one call in the next.

$client = phpQuery::browserGet('https://website.com/login', 'success1');
function success1($browser) {
    $handle = $browser
        ->WebBrowser('success2');
    $handle
        ->find('input[name=name]')
        ->val('username');
    $handle
        ->find('input[name=pass]')
        ->val('password')
        ->parents('form')
        ->submit();
}
function success2($browser) {
    print

Fix incorrectly displayed encoding on an html document with php

Question: Is there a way to fix the characters that display improperly after running this HTML markup through phpQuery::newDocument? There are slanted double quotes around -Classics with modern Woman- in the original document that end up displaying improperly after creating the new doc with phpquery.

//Original document is UTF-8 encoded
$raw_html = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head><body>Mr. Smith of Bangkok celebrated the “Classics with modern

Having issue in extracting data from 100 web pages in a single loop

Question: I am sort of stuck. My goal is to extract data from a website that has several hundred pages. It's a sports website and I have to extract the team names and other relevant data. So far I have been successful doing it. I ran the loop for 6-7 pages and it works perfectly well, but when I change the loop to about a month (25), it retrieves incomplete data. For instance, if the destination date is 25 October, it may stop randomly at 10-12 October. I am using phpQuery and my internet connection is 1MB

phpQuery ignores part of an imported file

Question: The following code:

<?
require_once "phpQuery.php";
$dom = phpQuery::newDocument( "<head></head><body>this is ignored</body>" );
echo nl2br( htmlentities( $dom ) );
?>

should give "this is ignored", but the entire body seems to be ignored. I stripped the code down to the point where the problem was still there. I want to read links ($dom->find('a')) from the body, but found that nothing was returned even though there were links in the body. What am I doing wrong?

Answer 1: Does phpquery require valid xml bodies?

Iterate over every element and get only the content that is directly in the node

Question: Let's assume I have this code: FirstLevelP SecondLevelSpan FirstLevelP SecondLevelSpan ThirdLevelP. Is it possible to iterate through every element that I have right now, but only get the content that is directly in that node, modify the text, and then put it back into the original content? For example, if I go through every $('p').each and extract the text, I would also get the text inside the span. Basically this: FirstelElement:

How to find tag name using phpquery?

Question: I am using phpquery to extract some data from a webpage. I need to identify the menu of the page. My implementation is to find each element that has siblings > 0 and whose last-child is an "a". My code is:

foreach($this->doc['*'] as $tagObj){
    $tag = pq($tagObj);
    if(count($tag->siblings()) > 0){
        if($tag->find(":last-child")->tagName === "a")
            echo trim(strip_tags($tag->html())) . " ";
    }
}

However, I am not getting any output because of $tag->find(":last-child")->tagName which isn't returning

[PHP scraping/crawler library] How to use phpQuery

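Here is a simple example:

include 'phpQuery.php';
phpQuery::newDocumentFile('http://www.phper.org.cn');
echo pq("title")->text(); // get the page title
echo pq("div#header")->html(); // get the HTML content of the div whose id is header

In the example above, the first line includes the phpQuery.php file, the second line loads a file via newDocumentFile, the third line uses the pq() function to get the text content of the title tag, and the fourth line gets the HTML contained in the div tag whose id is header. It performs two main actions: loading the file and reading its content.

II. Loading documents

Documents are loaded mainly through phpQuery::newDocument, which lets phpQuery read the specified file or text content on the server in advance. The main methods include:

phpQuery::newDocument(html,contentType = null)
phpQuery::newDocumentFile(file,contentType = null)
phpQuery::newDocumentHTML(html,charset = 'utf-8')
phpQuery::newDocumentXHTML(html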
