1. Spoof the client IP address and the Referer header. (In most cases this alone is enough to get at the data.)
curl_setopt($curl, CURLOPT_HTTPHEADER, ['X-FORWARDED-FOR:110.85.108.185', 'CLIENT-IP:110.85.108.185']);
curl_setopt($curl, CURLOPT_REFERER, 'http://www.demo.com/test.php');
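Why this works, and when it stops working, is easier to see from the server's side. The snippet below is only a sketch under assumptions: the function names naive_client_ip and strict_client_ip are made up for illustration, and the 203.0.113.7 address is a placeholder for the real connection address. A site that trusts the request headers sees the spoofed value, while a site that reads REMOTE_ADDR sees the address the connection actually came from.

```php
<?php
// Hypothetical server-side helpers, not taken from any real site.

// Trusts client-supplied headers first, so a spoofed header wins.
function naive_client_ip(array $server): string
{
    foreach (['HTTP_CLIENT_IP', 'HTTP_X_FORWARDED_FOR'] as $key) {
        if (!empty($server[$key])) {
            // X-Forwarded-For may hold a comma-separated chain; take the first entry.
            $parts = explode(',', $server[$key]);
            return trim($parts[0]);
        }
    }
    // Fall back to the address of the TCP peer.
    return $server['REMOTE_ADDR'] ?? '';
}

// Ignores headers entirely; a header cannot change this value.
function strict_client_ip(array $server): string
{
    return $server['REMOTE_ADDR'] ?? '';
}

// Stand-in for $_SERVER as PHP would populate it for the request above.
$server = [
    'REMOTE_ADDR'          => '203.0.113.7', // placeholder for the real address
    'HTTP_X_FORWARDED_FOR' => '110.85.108.185',
    'HTTP_CLIENT_IP'       => '110.85.108.185',
];

echo naive_client_ip($server), "\n";  // 110.85.108.185
echo strict_client_ip($server), "\n"; // 203.0.113.7
```

If the target behaves like strict_client_ip, header spoofing has no effect and only changing the connection's source address helps.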
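2. If that still fails, the server has probably picked up your real IP (the source address of the connection, which no header can change). In that case, send the request through a proxy.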
# Verbose form
curl_setopt($curl, CURLOPT_PROXY, 'x.x.x.x'); // proxy server address
curl_setopt($curl, CURLOPT_PROXYPORT, 80); // proxy server port
//curl_setopt($curl, CURLOPT_PROXYUSERPWD, ':'); // HTTP proxy credentials, in username:password format
curl_setopt($curl, CURLOPT_PROXYTYPE, CURLPROXY_HTTP); // use HTTP proxy mode
# Short form
curl_setopt($curl, CURLOPT_PROXY, 'http://x.x.x.x:80');
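The short form packs into one string what the verbose form sets option by option. As a rough illustration of that mapping, the sketch below splits a proxy URL into its parts with PHP's parse_url. The helper name split_proxy_url and the sample address and credentials are assumptions made for this example; defaulting to http:// when no scheme is given is also a choice of this sketch.

```php
<?php
// Hypothetical helper: break a proxy URL into the pieces that the verbose
// form passes via CURLOPT_PROXY, CURLOPT_PROXYPORT and CURLOPT_PROXYUSERPWD.
function split_proxy_url(string $proxy): array
{
    // Assumption of this sketch: treat a bare host:port as an HTTP proxy.
    if (strpos($proxy, '://') === false) {
        $proxy = 'http://' . $proxy;
    }

    $parts = parse_url($proxy);
    if ($parts === false || empty($parts['host'])) {
        throw new InvalidArgumentException("Unparseable proxy URL: $proxy");
    }

    return [
        'scheme'  => $parts['scheme'],
        'host'    => $parts['host'],
        'port'    => $parts['port'] ?? null,
        // username:password, or null when the URL carries no credentials
        'userpwd' => isset($parts['user'])
            ? $parts['user'] . ':' . ($parts['pass'] ?? '')
            : null,
    ];
}

// Sample values only; replace with a real proxy.
print_r(split_proxy_url('http://user:secret@127.0.0.1:8080'));
```

Either form can be used; the short one is less to type, while the verbose one makes it easier to swap a single setting such as the port.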
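3. Sometimes the page loads in a browser but not through curl. (The server checks the User-Agent and treats a request without one as coming from an illegitimate source.)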
$useragent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 ';
$useragent.= '(KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36';
curl_setopt($curl, CURLOPT_USERAGENT, $useragent);
Complete PHP function for fetching data with curl:
<?php
/**
 * Request a URL with curl; sends a POST when $postData is non-empty.
 * @param string $url
 * @param array $postData
 * @return string|false response body, or false on failure
 */
function request_post($url = '', $postData = [])
{
    if (empty($url)) {
        return false;
    }

    // Spoofed client IP headers
    $header = array(
        'CLIENT-IP:1.1.1.1',
        'X-FORWARDED-FOR:1.1.1.1',
    );

    // Initialize curl before setting any option on the handle
    $ch = curl_init();
    // URL to fetch
    curl_setopt($ch, CURLOPT_URL, $url);
    // Request headers
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    // Do not include the response headers in the result
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Return the response as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // Skip SSL certificate verification
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    // Skip host name verification
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);

    if (!empty($postData)) {
        // Send the data as a URL-encoded POST body
        $vars = http_build_query($postData, '', '&');
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $vars);
    }

    // Run the request
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

/**
 * Test
 */
function testAction()
{
    $url = 'https://www.baidu.com/';
    $res = request_post($url);
    print_r($res);
}

testAction();