Goutte is a PHP library for scraping data from websites. It provides an elegant API that makes it easy to select specific elements from remote pages.
Example code:
<?php
require_once '/path/to/goutte.phar';

use Goutte\Client;

// Send a request
$client = new Client();
$crawler = $client->request('GET', 'https://www.oschina.net/');

// Click a link
$link = $crawler->selectLink('Plugins')->link();
$crawler = $client->click($link);

// Submit a form
$form = $crawler->selectButton('sign in')->form();
$crawler = $client->submit($form, array(
    'signin[username]' => 'fabien',
    'signin[password]' => 'xxxxxx',
));

// Extract data: stop if the page shows an authentication error
$nodes = $crawler->filter('.error_list');
if ($nodes->count()) {
    die(sprintf("Authentication error: %s\n", $nodes->text()));
}

printf("Nb tasks: %d\n", $crawler->filter('#nb_tasks')->text());