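pdfsandwich is a tool that adds text to PDF files whose text exists only as images, such as scanned books. It uses optical character recognition (OCR) to create an additional layer containing the text recognized on each original page. This makes the text easy to copy and process.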
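pdfsandwich is a command-line tool. Compared with similar software, it preprocesses the scanned images, with steps such as page deskewing and removal of dark edges.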
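Because the tool is driven from the command line, a small wrapper can illustrate a typical call. The sketch below is not taken from the project. The helper names `ocr_output_name` and `ocr_pdf` are hypothetical, and it assumes that pdfsandwich writes its result next to the input with `_ocr` appended to the base name and that `-lang` selects the recognition language. Check both against the man page of your installed version.

```shell
#!/bin/sh
# Hypothetical helper: derive the output name pdfsandwich is assumed to use
# by default (input name without ".pdf", plus "_ocr.pdf").
ocr_output_name() {
    printf '%s_ocr.pdf\n' "${1%.pdf}"
}

# Hypothetical wrapper: OCR a single PDF and print the expected output path.
ocr_pdf() {
    input=$1
    # Fail early if the tool is not installed.
    if ! command -v pdfsandwich >/dev/null 2>&1; then
        echo "pdfsandwich not found; skipping $input" >&2
        return 1
    fi
    # Assumption: -lang passes the OCR language (here English) to the OCR engine.
    pdfsandwich -lang eng "$input" && ocr_output_name "$input"
}
```

Calling `ocr_pdf scan.pdf` would then run the OCR and, on success, print `scan_ocr.pdf`.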
Results in practice
Final recognition output:
Visionaries II7 and silver ligree ornaments; gold and silver ower-stands, etc.; elaborate coloured patterns of carpets in brilliant tints are not uncommon. Another peculiarity resides in the extreme restlessness of my visual objects. It is often very difficult to keep them still, as well as from changing in character. They will rapidly oscil-late or else rotate to a most perplexing degree, and when the characters change at the same time a critical examination is almost impossible. When the process is in full activity, l feel as if I were a mere spectator at a diorama of a very eccentric kind, and was in no way concerned with the getting up of the performance. When a. succession of images has been passing, I sometimes alezermz'ne to introduce an object, say a watch. Very often it is next to impossible to succeed. There is an evident struggle. The watch, pure and simple, will not come; but some hybrid structure appears something round, perhaps but it lapses into a warming-pan or other unexpected object. This practice has brought to my mind very clearly the dis-tinction between at least one form of automatism of the brain and volition; but the strength of the former is enormous, for the visual objects, when in full career of the change, are impera-tive in their refusal to be interfered with. [...]
Getting the code
SVN checkout:
svn checkout svn://svn.code.sf.net/p/pdfsandwich/code/trunk/src pdfsandwich