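Jericho HTML Parser is a Java library for analyzing and manipulating parts of an HTML document, including server-side tags, while reproducing any unrecognized or invalid HTML verbatim. It also provides high-level functions for manipulating HTML forms.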
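Here is a minimal sketch of the "analysis" and "manipulation" halves of that description, assuming Jericho 3.x (package `net.htmlparser.jericho`) is on the classpath. `Source`, `getAllElements`, `getFirstElement`, `getAttributeValue` and `OutputDocument.replace` are Jericho calls. The class name `JerichoSketch`, the sample HTML and the helpers `linkTargets` and `retitle` are hypothetical, made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.HTMLElementName;
import net.htmlparser.jericho.OutputDocument;
import net.htmlparser.jericho.Source;

public class JerichoSketch {
    // Hypothetical input: contains a JSP-style server tag and an unknown <bogus> element
    static final String HTML =
        "<html><head><title>Demo</title></head><body>"
        + "<p>Hello <%= user %></p>"
        + "<a href=\"https://example.com/a\">A</a>"
        + "<a href=\"https://example.com/b\">B</a>"
        + "<bogus>kept as-is</bogus>"
        + "</body></html>";

    // Analysis: collect the href attribute of every <a> element, in document order
    static List<String> linkTargets(String html) {
        Source source = new Source(html);
        List<String> hrefs = new ArrayList<>();
        for (Element a : source.getAllElements(HTMLElementName.A)) {
            hrefs.add(a.getAttributeValue("href"));
        }
        return hrefs;
    }

    // Manipulation: replace only the text inside <title>; OutputDocument copies
    // every other character of the source through unchanged
    static String retitle(String html, String newTitle) {
        Source source = new Source(html);
        OutputDocument out = new OutputDocument(source);
        Element title = source.getFirstElement(HTMLElementName.TITLE);
        if (title != null) {
            out.replace(title.getContent(), newTitle);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkTargets(HTML));
        System.out.println(retitle(HTML, "Edited"));
    }
}
```

`OutputDocument` only rewrites the segments you hand to it. The server tag and the unknown `<bogus>` element therefore come out exactly as they went in, which is the "reproduced verbatim" behavior described above.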
Example code (prints a document's title and the character encoding Jericho detected for it):
import net.htmlparser.jericho.*;
import java.util.*;
import java.io.*;
import java.net.*;

public class Encoding {
    public static void main(String[] args) throws Exception {
        // Use data/test.html unless a file path or URL is given on the command line
        String sourceUrlString = "data/test.html";
        if (args.length == 0)
            System.err.println("Using default argument of \"" + sourceUrlString + '"');
        else
            sourceUrlString = args[0];
        // A plain path (no scheme) is turned into a file: URL
        if (sourceUrlString.indexOf(':') == -1)
            sourceUrlString = "file:" + sourceUrlString;
        System.out.println("\nSource URL:");
        System.out.println(sourceUrlString);

        // Load and parse the document; Jericho determines the character encoding while reading it
        URL url = new URL(sourceUrlString);
        Source source = new Source(url);

        // Print the document title, if there is one
        System.out.println("\nDocument Title:");
        Element titleElement = source.getFirstElement(HTMLElementName.TITLE);
        System.out.println(titleElement != null ? titleElement.getContent().toString() : "(none)");

        // Report the encoding Jericho settled on and how it arrived at it
        System.out.println("\nSource.getEncoding():");
        System.out.println(source.getEncoding());
        System.out.println("\nSource.getEncodingSpecificationInfo():");
        System.out.println(source.getEncodingSpecificationInfo());
        System.out.println("\nSource.getPreliminaryEncodingInfo():");
        System.out.println(source.getPreliminaryEncodingInfo());
    }
}
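The sample above needs a file or URL to read. To try the same encoding calls without one, the sketch below feeds Jericho an in-memory byte stream through the `Source(InputStream)` constructor. That constructor also leaves it to Jericho to work out the encoding from the bytes. It assumes Jericho 3.x on the classpath. The class name `EncodingFromBytes`, the sample document and the `parse` helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

import net.htmlparser.jericho.Source;

public class EncodingFromBytes {
    // Hypothetical test document that declares its charset in a meta tag
    static final String HTML =
        "<html><head>"
        + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=ISO-8859-1\">"
        + "<title>Demo</title></head><body>caf\u00e9</body></html>";

    // Parse raw bytes; Source reads the whole stream in its constructor,
    // so closing the stream afterwards is safe
    static Source parse(byte[] bytes) {
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            return new Source(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Source source = parse(HTML.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println("Source.getEncoding(): " + source.getEncoding());
        System.out.println("Source.getEncodingSpecificationInfo(): " + source.getEncodingSpecificationInfo());
        System.out.println("Source.getPreliminaryEncodingInfo(): " + source.getPreliminaryEncodingInfo());
    }
}
```

Changing the `charset=` value in the meta tag, or the charset passed to `getBytes`, shows how the reported encoding information changes.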