htmlcxx is an HTML parser and CSS1 parser for C++. Its parsing policy attempts to mimic the behavior of Mozilla Firefox, so you should expect parse trees similar to those created by Firefox. However, it does not insert nonexistent elements into your HTML. Therefore, serializing the DOM tree gives exactly the same output as the original HTML document. Another key feature is an STL-like tree navigation API provided by the tree.hh template library.
Sample code:
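#include <htmlcxx/html/ParserDom.h>
...

// Parse some HTML code
string html = "<html><body>hey</body></html>";
HTML::ParserDom parser;
tree<HTML::Node> dom = parser.parseTree(html);

// Print the whole DOM tree
cout << dom << endl;

// Dump all links in the tree
tree<HTML::Node>::iterator it = dom.begin();
tree<HTML::Node>::iterator end = dom.end();
for (; it != end; ++it)
{
    if (it->tagName() == "A")
    {
        // Attributes are parsed lazily, so parse them before the lookup
        it->parseAttributes();
        // attribute() returns a (found, value) pair; print the value
        cout << it->attribute("href").second;
    }
}

// Dump all text of the document
it = dom.begin();
end = dom.end();
for (; it != end; ++it)
{
    // Text nodes are those that are neither tags nor comments
    if ((!it->isTag()) && (!it->isComment()))
    {
        cout << it->text();
    }
}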
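The round-trip property described above (serializing the parsed result reproduces the original document) can be illustrated with a small standalone sketch. This is not htmlcxx's implementation and does not use its API: the `Piece` struct and the `splitPieces` and `serialize` functions are hypothetical names invented for this illustration. The toy splitter only separates `<...>` markup from text, and it keeps the exact source characters of every piece, which is what makes the output identical to the input.

```cpp
#include <string>
#include <vector>

// Hypothetical piece type for this sketch; it is not htmlcxx's HTML::Node.
struct Piece {
    bool isTag;        // true for "<...>" markup, false for plain text
    std::string text;  // exact source characters of this piece
};

// Split the input into markup and text pieces without altering any character.
// Nothing is inserted, dropped, or normalized.
std::vector<Piece> splitPieces(const std::string& html) {
    std::vector<Piece> pieces;
    std::size_t pos = 0;
    while (pos < html.size()) {
        std::size_t end = html.size();
        bool isTag = false;
        if (html[pos] == '<') {
            std::size_t close = html.find('>', pos);
            if (close != std::string::npos) {
                isTag = true;
                end = close + 1;
            }
            // Otherwise the '<' is unterminated: keep the rest as text.
        } else {
            std::size_t next = html.find('<', pos);
            if (next != std::string::npos) end = next;
        }
        pieces.push_back({isTag, html.substr(pos, end - pos)});
        pos = end;
    }
    return pieces;
}

// Concatenate the pieces in order; because each piece stores its original
// characters, the result equals the original input.
std::string serialize(const std::vector<Piece>& pieces) {
    std::string out;
    for (const Piece& p : pieces) out += p.text;
    return out;
}
```

Calling `serialize(splitPieces(html))` on the sample string from the code above returns that same string, even for malformed input such as an unterminated tag, because the sketch never repairs or completes the markup.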