phpquery

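Why spaces in URL parameters are sometimes %20 and sometimes a plus sign

 ̄綄美尐妖づ submitted on 2020-08-05 02:24:31
If enc_type is PHP_QUERY_RFC1738, then encoding is performed per RFC 1738 and the application/x-www-form-urlencoded media type, which implies that spaces are encoded as plus (+) signs. If enc_type is PHP_QUERY_RFC3986, then encoding is performed according to RFC 3986, and spaces will be percent encoded (%20). Source: https://www.php.net/manual/en/function.http-build-query.php

As the quoted passage shows, both the plus sign and %20 are RFC-defined encodings of a space. When a form uses the default application/x-www-form-urlencoded type, spaces are replaced with +, and PHP's urlencode() follows the same convention; rawurlencode(), and a browser opening a URL that contains spaces, produce %20 instead. Because a parameter value can itself contain spaces or plus signs, encode every value before passing it and decode it with the matching function, so that the two conventions do not get mixed up. Also note that some frameworks automatically urldecode() incoming parameters; decoding the value again later in the pipeline will corrupt it.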
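The difference is easy to see by running the encoders side by side. The following is a minimal sketch that uses only PHP built-ins; the parameter name `q` and the sample values are made up for illustration.

```php
<?php
// Sample data: the key "q" and its value are illustrative only.
$params = ['q' => 'hello world'];

// The default enc_type is PHP_QUERY_RFC1738: spaces become "+".
$rfc1738 = http_build_query($params);

// PHP_QUERY_RFC3986: spaces become "%20".
$rfc3986 = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

// urlencode() follows the form convention, rawurlencode() follows RFC 3986.
$form = urlencode('hello world');
$raw  = rawurlencode('hello world');

// urldecode() turns "+" back into a space; rawurldecode() leaves "+" alone.
$decodedForm = urldecode('a+b%20c');
$decodedRaw  = rawurldecode('a+b%20c');

// Decoding twice corrupts data: an encoded literal plus sign ("%2B")
// survives the first pass as "+" and becomes a space on the second.
$once  = urldecode('1%2B1');
$twice = urldecode($once);

echo $rfc1738, "\n", $rfc3986, "\n", $form, "\n", $raw, "\n";
```

The last two variables illustrate the framework pitfall: if a framework has already decoded a parameter, calling urldecode() on it again silently changes any literal plus sign into a space.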

A simple crawler in PHP + phpquery that scrapes JD.com (京东) product categories

拜拜、爱过 submitted on 2020-01-30 00:52:36
This is a simple crawler that uses PHP and phpquery to scrape the content of JD.com's product category page. phpquery makes it very easy to extract the HTML you want. It is so similar to jQuery that, if you already know jQuery, you can pick it up quickly.

1. Download phpquery and place it in the phpQuery folder under the web root.
   phpquery download: https://code.google.com/p/phpquery/downloads/list
   phpquery tutorial: https://code.google.com/p/phpquery/

2. The scraping program:

    <?php
    /*
     * Created on 2015-1-29
     *
     * To change the template for this generated file go to
     * Window - Preferences - PHPeclipse - PHP - Code Templates
     */
    header("Content-type:text/html; charset=utf-8");

    function getPage( $url ) {
        $cnt = file_get_contents($url);
        return mb_convert_encoding($cnt ,"UTF-8","GBK");
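The encoding conversion inside getPage can be checked in isolation. The sketch below assumes the mbstring extension is loaded; the helper name `gbkToUtf8` is hypothetical, and the GBK input is simulated from a UTF-8 string instead of being fetched from a real page.

```php
<?php
// Hypothetical helper: convert a GBK-encoded byte string to UTF-8,
// the same conversion the excerpt applies to the downloaded page.
function gbkToUtf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'GBK');
}

// Simulate a GBK response body instead of fetching a real page.
$original = '京东商品分类';
$gbkBytes = mb_convert_encoding($original, 'GBK', 'UTF-8');

// Convert back to UTF-8, as the crawler would before parsing.
$utf8 = gbkToUtf8($gbkBytes);

echo strlen($gbkBytes), " bytes in GBK, ", strlen($utf8), " bytes in UTF-8\n";
```

In a real crawler it may also be worth checking the return value of file_get_contents() before converting, since it returns false when the request fails.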

Using phpQuery to collect articles or other data...

风格不统一 submitted on 2020-01-07 18:44:02
//1. First, create a table, the "gather" table. If you only have a handful of links you can enter them by hand; if you have many, write a for loop that generates the URLs from some rules and store them in the table.

    CREATE TABLE `cp_lottery_articles_gather_list` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `site_id` int(11) NOT NULL COMMENT 'site id',
      `lottery_id` int(11) DEFAULT NULL COMMENT 'lottery id',
      `type_id` int(11) DEFAULT NULL COMMENT 'article column type id',
      `category_id` int(11) DEFAULT NULL COMMENT 'news column id',
      `lottery_name` varchar(255) DEFAULT NULL COMMENT 'associated lottery name',
      `type_name` varchar(255) DEFAULT NULL COMMENT 'type name',
      `link` varchar(255) NOT NULL COMMENT 'link to collect',
      `total_page` int(11) DEFAULT '1'
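The "for loop that generates URLs from a rule" can be sketched as follows. This is only an illustration: the helper name `buildPageUrls`, the `%d` placeholder convention, and the example.com URL are assumptions, not part of the original post.

```php
<?php
// Hypothetical helper: expand a URL template into one URL per page.
// "%d" in the template is replaced by the page number.
function buildPageUrls(string $template, int $totalPage): array
{
    $urls = [];
    for ($page = 1; $page <= $totalPage; $page++) {
        $urls[] = sprintf($template, $page);
    }
    return $urls;
}

// The domain and path below are placeholders, not a real source site.
$urls = buildPageUrls('https://example.com/news/list_%d.html', 3);

foreach ($urls as $url) {
    echo $url, "\n";
}
```

Each generated URL could then be stored in the `link` column of the gather table, with the page count taken from `total_page`.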

PHPQuery WebBrowser plugin - using cookies

爷,独闯天下 submitted on 2020-01-02 03:49:08
Question: I'm trying to log in to a website using PHPQuery's WebBrowser plugin. I'm able to log in successfully, but I'm not sure how to reuse the cookies from one call in the next.

    $client = phpQuery::browserGet('https://website.com/login', 'success1');

    function success1($browser) {
        $handle = $browser
            ->WebBrowser('success2');
        $handle
            ->find('input[name=name]')
            ->val('username');
        $handle
            ->find('input[name=pass]')
            ->val('password')
            ->parents('form')
            ->submit();
    }

    function success2($browser) {
        print

Fix incorrectly displayed encoding on an html document with php

僤鯓⒐⒋嵵緔 submitted on 2019-12-28 03:00:10
Question: Is there a way to fix the characters that display improperly after running this HTML markup through phpquery::newDocument? The original document has slanted (curly) double quotes around -Classics with modern Woman-, and they display improperly after the new document is created with phpquery.

    //Original document is UTF-8 encoded
    $raw_html = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head><body><p>Mr. Smith of Bangkok celebrated the “Classics with modern

Having issue in extracting data from 100 web pages in a single loop

痞子三分冷 submitted on 2019-12-25 05:57:44
Question: I am sort of stuck. My goal is to extract data from a website that has several hundred pages. It's a sports website, and I have to extract the team names and other relevant data. So far I have been successful: I ran the loop for 6-7 pages and it works perfectly well. But when I change the loop to cover about a month (25 pages), it retrieves incomplete data. For instance, if the destination date is 25 October, it may stop randomly at 10-12 October. I am using phpQuery and my internet connection is 1MB

phpQuery ignores part of an imported file

馋奶兔 submitted on 2019-12-25 01:25:48
Question: The following code:

    <?php
    require_once "phpQuery.php";
    $dom = phpQuery::newDocument( "<head></head><body>this is ignored</body>" );
    echo nl2br( htmlentities( $dom ) );
    ?>

should output "this is ignored", but the entire body seems to be ignored. I stripped the code down to the smallest version that still shows the problem. I want to read links ($dom->find('a')) from the body, but nothing is found even though the body contains links. What am I doing wrong?

Answer 1: Does phpquery require valid xml bodies?

Iterate over every element and get only the content that is directly in the node

让人想犯罪 __ submitted on 2019-12-24 13:33:26
Question: Let's assume I have this code:

    <p>FirstLevelP
        <span>SecondLevelSpan</span>
    </p>
    <p>FirstLevelP
        <span>SecondLevelSpan
            <p>ThirdLevelP</p>
        </span>
    </p>

Is it possible to iterate through every element I have, but get only the content that sits directly in that node, modify the text, and then put it back into the original content? For example, if I go through every $('p').each and extract the text, I also get the text inside the span. Basically this: FirstelElement:
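One way to touch only the text that belongs directly to each node is to walk the child nodes and pick out the text nodes. The sketch below uses PHP's built-in DOM extension, not phpQuery, and is not the answer from the original thread. It assumes the markup is wrapped in a made-up `<root>` element and parsed as XML, because an HTML parser may restructure a `<p>` nested inside a `<span>`. The uppercase transformation is only a stand-in for "modify the text".

```php
<?php
// The question's markup, wrapped in a hypothetical <root> so it parses as XML.
$markup = '<root>'
        . '<p>FirstLevelP <span>SecondLevelSpan</span></p>'
        . '<p>FirstLevelP <span>SecondLevelSpan <p>ThirdLevelP</p></span></p>'
        . '</root>';

$doc = new DOMDocument();
$doc->loadXML($markup);
$xpath = new DOMXPath($doc);

$direct = [];
// Visit every <p> and <span> in document order.
foreach ($xpath->query('//p | //span') as $element) {
    // Look only at direct children; nested elements are handled on their own turn.
    foreach ($element->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE && trim($child->nodeValue) !== '') {
            $direct[] = $element->nodeName . ': ' . trim($child->nodeValue);
            // Modify the text in place; the surrounding structure is untouched.
            $child->nodeValue = strtoupper($child->nodeValue);
        }
    }
}

print_r($direct);
$result = $doc->saveXML($doc->documentElement);
echo $result, "\n";
```

Because each text node is changed in place, the nested `<span>` and `<p>` elements keep their position in the serialized result.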

How to find tag name using phpquery?

北城以北 submitted on 2019-12-23 07:08:09
Question: I am using phpquery to extract some data from a webpage. I need to identify the menu of the page. My approach is to find each element that has more than zero siblings and whose last child is an "a". My code is:

    foreach($this->doc['*'] as $tagObj){
        $tag = pq($tagObj);
        if(count($tag->siblings()) > 0){
            if($tag->find(":last-child")->tagName === "a")
                echo trim(strip_tags($tag->html())) . "<br/>";
        }
    }

However, I am not getting any output because of $tag->find(":last-child")->tagName which isn't returning
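For comparison, here is the same idea written against PHP's built-in DOM extension, where `tagName` is a property of each `DOMElement` and not of a result set. This is a sketch under assumptions, not the answer from the original thread: the sample markup is invented, and "last child" is read as the last element child of the current element.

```php
<?php
// Sample markup, made up for illustration.
$html = '<ul><li><a href="/a">Home</a></li><li><a href="/b">About</a></li></ul><p>text</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$matches = [];
// Select every element that has at least one sibling element.
foreach ($xpath->query('//*[count(../*) > 1]') as $element) {
    // Last element child of the current element, or null if it has none.
    $last = $xpath->query('*[last()]', $element)->item(0);
    if ($last !== null && $last->tagName === 'a') {
        $matches[] = trim($element->textContent);
    }
}

print_r($matches);
```

The point of the sketch is that the tag name is read from a single DOM element; a collection of matched nodes has to be narrowed down to one element first.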

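[PHP scraping/crawler library] How to use phpQuery

≯℡__Kan透↙ submitted on 2019-12-13 16:39:33
A simple example:

    include 'phpQuery.php';
    phpQuery::newDocumentFile('http://www.phper.org.cn');
    echo pq("title")->text(); // get the page title
    echo pq("div#header")->html(); // get the HTML content of the div whose id is header

In this example, the first line includes the phpQuery.php file, the second line loads a document with newDocumentFile, the third line uses pq() to get the text of the title tag, and the fourth line gets the HTML contained in the div whose id is header. It performs two main actions: loading a document and reading its content.

2. Loading documents

Documents are loaded mainly through phpQuery::newDocument, which lets phpQuery read the specified file or text content on the server in advance. The main methods include:

    phpQuery::newDocument($html, $contentType = null)
    phpQuery::newDocumentFile($file, $contentType = null)
    phpQuery::newDocumentHTML($html, $charset = 'utf-8')
    phpQuery::newDocumentXHTML($html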