PDF Clown
0.0.8

it.stefanochizzolini.clown.tools
Class TextExtractor

java.lang.Object
  extended by it.stefanochizzolini.clown.tools.TextExtractor

public class TextExtractor
extends Object

Tool for extracting text from content contexts.

Since:
0.0.8
Version:
0.0.8
Author:
Stefano Chizzolini (http://www.stefanochizzolini.it)

Nested Class Summary
static class TextExtractor.AreaModeEnum
          Text-to-area matching mode.
 
Constructor Summary
TextExtractor()
           
TextExtractor(boolean sorted)
           
TextExtractor(List<Rectangle2D> areas, boolean sorted)
           
 
Method Summary
 Map<Rectangle2D,List<ITextString>> extract(Contents contents)
          Extracts text strings from the given contents.
 Map<Rectangle2D,List<ITextString>> extract(IContentContext contentContext)
          Extracts text strings from the given content context.
 String extractPlain(Contents contents)
          Extracts plain text from the given contents.
 String extractPlain(IContentContext contentContext)
          Extracts plain text from the given content context.
 Map<Rectangle2D,List<ITextString>> filter(List<? extends ITextString> textStrings, Rectangle2D... areas)
          Gets the text strings matching the given areas.
 List<ITextString> filter(List<? extends ITextString> textStrings, Rectangle2D area)
          Gets the text strings matching the given area.
 Map<Rectangle2D,List<ITextString>> filter(Map<Rectangle2D,List<ITextString>> textStrings, Rectangle2D... areas)
          Gets the text strings matching the given areas.
 List<ITextString> filter(Map<Rectangle2D,List<ITextString>> textStrings, Rectangle2D area)
          Gets the text strings matching the given area.
 TextExtractor.AreaModeEnum getAreaMode()
          Gets the text-to-area matching mode.
 List<Rectangle2D> getAreas()
          Gets the graphic areas whose text has to be extracted.
 double getAreaTolerance()
          Gets the tolerance (in points) admitted beyond the area bounds for containment matching.
 boolean isSorted()
          Gets whether the text strings have to be sorted.
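 void setAreaMode(TextExtractor.AreaModeEnum value)
          Sets the text-to-area matching mode.
 void setAreas(List<Rectangle2D> value)
          Sets the graphic areas whose text has to be extracted.
 void setAreaTolerance(double value)
          Sets the tolerance (in points) admitted beyond the area bounds for containment matching.
 void setSorted(boolean value)
          Sets whether the text strings have to be sorted.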
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TextExtractor

public TextExtractor()

TextExtractor

public TextExtractor(boolean sorted)

TextExtractor

public TextExtractor(List<Rectangle2D> areas,
                     boolean sorted)

Method Detail

extract

public Map<Rectangle2D,List<ITextString>> extract(IContentContext contentContext)
Extracts text strings from the given content context.

Parameters:
contentContext - Source content context.

extract

public Map<Rectangle2D,List<ITextString>> extract(Contents contents)
Extracts text strings from the given contents.

Parameters:
contents - Source contents.

extractPlain

public String extractPlain(IContentContext contentContext)
Extracts plain text from the given content context.

Parameters:
contentContext - Source content context.

extractPlain

public String extractPlain(Contents contents)
Extracts plain text from the given contents.

Parameters:
contents - Source contents.

filter

public List<ITextString> filter(Map<Rectangle2D,List<ITextString>> textStrings,
                                Rectangle2D area)
Gets the text strings matching the given area.

Parameters:
textStrings - Text strings to filter, grouped by source area.
area - Graphic area against which the text strings are matched.

filter

public Map<Rectangle2D,List<ITextString>> filter(Map<Rectangle2D,List<ITextString>> textStrings,
                                                 Rectangle2D... areas)
Gets the text strings matching the given areas.

Parameters:
textStrings - Text strings to filter, grouped by source area.
areas - Graphic areas against which the text strings are matched.

filter

public List<ITextString> filter(List<? extends ITextString> textStrings,
                                Rectangle2D area)
Gets the text strings matching the given area.

Parameters:
textStrings - Text strings to filter.
area - Graphic area against which the text strings are matched.

filter

public Map<Rectangle2D,List<ITextString>> filter(List<? extends ITextString> textStrings,
                                                 Rectangle2D... areas)
Gets the text strings matching the given areas.

Parameters:
textStrings - Text strings to filter.
areas - Graphic areas against which the text strings are matched.
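
The grouping performed by the multi-area filter overloads can be pictured with the following minimal sketch. It is not PDF Clown's implementation: `TextBox` is a hypothetical stand-in for ITextString, and matching by plain containment with no tolerance is an assumption made for illustration. Only java.awt.geom and java.util classes are used.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AreaFilterSketch {
    /** Hypothetical stand-in for ITextString: a piece of text and its bounding box. */
    static final class TextBox {
        final String text;
        final Rectangle2D box;

        TextBox(String text, double x, double y, double width, double height) {
            this.text = text;
            this.box = new Rectangle2D.Double(x, y, width, height);
        }
    }

    /** Groups the strings under each area that fully contains their box (assumed rule). */
    static Map<Rectangle2D, List<TextBox>> filter(List<TextBox> strings, Rectangle2D... areas) {
        // LinkedHashMap keeps the areas in the order they were passed in.
        Map<Rectangle2D, List<TextBox>> result = new LinkedHashMap<>();
        for (Rectangle2D area : areas) {
            List<TextBox> matched = new ArrayList<>();
            for (TextBox s : strings) {
                if (area.contains(s.box)) {
                    matched.add(s);
                }
            }
            result.put(area, matched);
        }
        return result;
    }

    public static void main(String[] args) {
        List<TextBox> strings = Arrays.asList(
            new TextBox("Invoice", 50, 40, 60, 12),
            new TextBox("Total: 42.00", 400, 700, 90, 12));
        Rectangle2D header = new Rectangle2D.Double(0, 0, 595, 100);
        Rectangle2D footer = new Rectangle2D.Double(0, 650, 595, 192);

        // Print each matched string next to the top edge of its area.
        filter(strings, header, footer).forEach((area, matched) ->
            matched.forEach(s -> System.out.println(area.getY() + ": " + s.text)));
    }
}
```

Every requested area gets an entry in the result, even when nothing matches it. Whether the library does the same is not stated on this page.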

getAreaMode

public TextExtractor.AreaModeEnum getAreaMode()
Gets the text-to-area matching mode.
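
This page does not list the constants of TextExtractor.AreaModeEnum. As a rough illustration of what a text-to-area matching mode could select between, the sketch below assumes two modes, containment and intersection. The `Mode` enum and the `matches` helper are hypothetical names, not part of the library.

```java
import java.awt.geom.Rectangle2D;

class AreaModeSketch {
    /** Hypothetical modes; the real constants are defined by TextExtractor.AreaModeEnum. */
    enum Mode { CONTAINMENT, INTERSECTION }

    /** Tests a text box against an area according to the chosen mode. */
    static boolean matches(Mode mode, Rectangle2D area, Rectangle2D textBox) {
        switch (mode) {
            case CONTAINMENT:
                // The whole text box must lie inside the area.
                return area.contains(textBox);
            case INTERSECTION:
                // Any overlap between text box and area is enough.
                return area.intersects(textBox);
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        Rectangle2D area = new Rectangle2D.Double(0, 0, 100, 100);
        // This box crosses the right edge of the area.
        Rectangle2D textBox = new Rectangle2D.Double(90, 10, 30, 12);

        System.out.println(matches(Mode.CONTAINMENT, area, textBox));  // false
        System.out.println(matches(Mode.INTERSECTION, area, textBox)); // true
    }
}
```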


getAreas

public List<Rectangle2D> getAreas()
Gets the graphic areas whose text has to be extracted.


getAreaTolerance

public double getAreaTolerance()
Gets the tolerance (in points) admitted beyond the area bounds for containment matching.

Remarks

This tolerance ensures that text whose boxes overlap the area bounds is not excluded from the match.
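
One way to picture the effect of the tolerance is to grow the area by that many points on every side before testing containment. The sketch below does this with java.awt.geom.Rectangle2D only. The `expand` and `matches` helpers are hypothetical, and the library may apply the tolerance differently.

```java
import java.awt.geom.Rectangle2D;

class AreaToleranceSketch {
    /** Returns a copy of the area grown by the tolerance on every side. */
    static Rectangle2D expand(Rectangle2D area, double tolerance) {
        return new Rectangle2D.Double(
            area.getX() - tolerance,
            area.getY() - tolerance,
            area.getWidth() + 2 * tolerance,
            area.getHeight() + 2 * tolerance);
    }

    /** Containment test that admits boxes protruding by at most the tolerance. */
    static boolean matches(Rectangle2D area, Rectangle2D textBox, double tolerance) {
        return expand(area, tolerance).contains(textBox);
    }

    public static void main(String[] args) {
        Rectangle2D area = new Rectangle2D.Double(100, 100, 200, 50);
        // Text box sticking out 1.5 points past the left edge of the area.
        Rectangle2D textBox = new Rectangle2D.Double(98.5, 110, 80, 12);

        System.out.println(matches(area, textBox, 0)); // false: strict containment fails
        System.out.println(matches(area, textBox, 2)); // true: within a 2-point tolerance
    }
}
```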


isSorted

public boolean isSorted()
Gets whether the text strings have to be sorted.


setAreaMode

public void setAreaMode(TextExtractor.AreaModeEnum value)
Sets the text-to-area matching mode.

See Also:
getAreaMode()

setAreas

public void setAreas(List<Rectangle2D> value)
Sets the graphic areas whose text has to be extracted.

See Also:
getAreas()

setAreaTolerance

public void setAreaTolerance(double value)
Sets the tolerance (in points) admitted beyond the area bounds for containment matching.

See Also:
getAreaTolerance()

setSorted

public void setSorted(boolean value)
Sets whether the text strings have to be sorted.

See Also:
isSorted()

Copyright © 2006-2010 Stefano Chizzolini. Some Rights Reserved.
This documentation is available under the terms of the GNU Free Documentation License.