An extensible PHP web spider. Example code:
use VDB\Spider\Spider;
use VDB\Spider\Discoverer\XPathExpressionDiscoverer;

$spider = new Spider('https://www.oschina.net');

Features:
supports two traversal algorithms: breadth-first and depth-first
supports depth limiting and queue size limiting
supports adding custom URI discovery logic, based on XPath, CSS selectors, or plain old PHP
comes with a useful set of URI filters, such as domain limiting
supports custom URI filters, both prefetch (URI) and postfetch (Resource content)
supports custom request handling logic
comes with a useful set of persistence handlers (memory, file; Redis soon to follow)
supports custom persistence handlers
collects statistics about the crawl for reporting
dispatches useful events, allowing developers to add even more custom behavior
supports a politeness policy
will soon come with many default discoverers: RSS, Atom, RDF, etc.
will soon support multiple queueing mechanisms (file, memcache, redis)
will eventually support distributed spidering with a central queue
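
To make the first two features concrete, here is a minimal, library-independent sketch of how breadth-first and depth-first traversal differ, and how a depth limit and a queue size limit cut a crawl short. It does not use the VDB\Spider API. The `crawl()` function, its parameters, and the in-memory `$links` graph are hypothetical names introduced only for illustration, and the queue limit is assumed here to cap the total number of URIs ever enqueued.

```php
<?php
// Hypothetical link graph standing in for pages that a real crawler would fetch.
$links = [
    '/'    => ['/a', '/b'],
    '/a'   => ['/a/1', '/a/2'],
    '/b'   => ['/b/1'],
    '/a/1' => [],
    '/a/2' => [],
    '/b/1' => [],
];

/**
 * Walk the graph from $seed and return the URIs in visit order.
 *
 * $breadthFirst  true: take the oldest pending URI (FIFO queue);
 *                false: take the newest pending URI (LIFO stack).
 * $maxDepth      links found on pages at this depth are not followed.
 * $maxQueueSize  assumed cap on the total number of URIs ever enqueued,
 *                the seed included.
 */
function crawl(array $links, string $seed, bool $breadthFirst, int $maxDepth, int $maxQueueSize): array
{
    $pending = [[$seed, 0]];   // entries are [uri, depth]
    $seen    = [$seed => true]; // every URI that has been enqueued so far
    $visited = [];

    while ($pending) {
        [$uri, $depth] = $breadthFirst ? array_shift($pending) : array_pop($pending);
        $visited[] = $uri;

        if ($depth >= $maxDepth) {
            continue; // depth limit reached: do not follow this page's links
        }
        foreach ($links[$uri] ?? [] as $next) {
            if (isset($seen[$next]) || count($seen) >= $maxQueueSize) {
                continue; // already enqueued, or the queue limit is exhausted
            }
            $seen[$next] = true;
            $pending[] = [$next, $depth + 1];
        }
    }

    return $visited;
}

// Same graph, two visit orders.
print_r(crawl($links, '/', true, 2, 100));  // breadth-first
print_r(crawl($links, '/', false, 2, 100)); // depth-first
```

With this graph, the breadth-first call visits both top-level pages before any page below them, while the depth-first call finishes one branch before starting the next. Lowering `$maxDepth` to 1 stops the crawl after the top-level pages, and lowering `$maxQueueSize` stops new URIs from being enqueued once the cap is hit.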