Package: org.apache.pdfbox.searchengine.lucene (pdfbox-1.1.0-src)
public final class LucenePDFDocument
java.lang.Object
   org.apache.pdfbox.searchengine.lucene.LucenePDFDocument
This class creates a document for the Lucene search engine. It should plug easily into the IndexHTML or IndexFiles demo programs that come with the Lucene project. This class populates the following fields:
Lucene Field Name   Description
-----------------   -----------
path                File system path, if loaded from a file
url                 URL of the PDF document
contents            Entire contents of the PDF document; indexed but not stored
summary             First 500 characters of the content
modified            Modified date/time according to the URL or path
uid                 A unique identifier for the Lucene document
CreationDate        From PDF metadata, if available
Creator             From PDF metadata, if available
Keywords            From PDF metadata, if available
ModificationDate    From PDF metadata, if available
Producer            From PDF metadata, if available
Subject             From PDF metadata, if available
Trapped             From PDF metadata, if available
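The table implies a simple population rule: the core fields are always set, while the metadata fields appear only when the PDF supplies a value. The sketch below illustrates that rule, assuming a plain `Map<String, String>` as a stand-in for a Lucene `Document`. The class `FieldSketch`, its methods, and the "null means absent" convention are hypothetical and are not the PDFBox implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the field table; a LinkedHashMap stands in for
// org.apache.lucene.document.Document. This is NOT the PDFBox implementation.
public class FieldSketch {
    static final int SUMMARY_LENGTH = 500; // "First 500 characters of the content"

    // Returns at most the first 500 characters of the extracted text.
    static String summarize(String contents) {
        return contents.length() <= SUMMARY_LENGTH
                ? contents
                : contents.substring(0, SUMMARY_LENGTH);
    }

    // Builds the field map. Metadata entries whose value is null are skipped,
    // mirroring "From PDF metadata, if available". The map does not model the
    // indexed-but-not-stored distinction used for the real "contents" field.
    static Map<String, String> populate(String path, String contents,
                                        Map<String, String> metadata) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("path", path);
        fields.put("contents", contents);
        fields.put("summary", summarize(contents));
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            if (e.getValue() != null) {
                fields.put(e.getKey(), e.getValue());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("Creator", "Writer");
        meta.put("Keywords", null); // assumed absent from this PDF
        Map<String, String> fields = populate("/docs/a.pdf", "x".repeat(800), meta);
        System.out.println(fields.get("summary").length()); // 500
        System.out.println(fields.containsKey("Keywords"));  // false
    }
}
```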
Constructor:
 public LucenePDFDocument() 
Method Summary:
convertDocument (3 overloads), getDateTimeResolution, getDocument (3 overloads), main, setDateTimeResolution, setTextStripper
Methods inherited from java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait (3 overloads)
Method Detail:
 public Document convertDocument(InputStream is) throws IOException 
    Converts the PDF stream to a Lucene document.
 public Document convertDocument(File file) throws IOException 
    Creates a Lucene document from the given PDF file.
 public Document convertDocument(URL url) throws IOException 
    Converts the PDF document at the given URL to a Lucene document.
 public Resolution getDateTimeResolution() 
    Gets the Lucene date/time resolution.
 public static Document getDocument(InputStream is) throws IOException 
    Static convenience method: gets a Lucene document from a PDF stream.
 public static Document getDocument(File file) throws IOException 
    Static convenience method: gets a Lucene document from a PDF file.
 public static Document getDocument(URL url) throws IOException 
    Static convenience method: gets a Lucene document from the PDF at the given URL.
 public static  void main(String[] args) throws IOException 
    Tests creating a document. Usage: java org.apache.pdfbox.searchengine.lucene.LucenePDFDocument <pdf-document>
 public  void setDateTimeResolution(Resolution resolution) 
    Sets the Lucene date/time resolution.
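The resolution controls how finely dates such as `modified` are rounded before being stored as strings. The sketch below mimics that rounding using only `java.time`, assuming the convention of Lucene's `DateTools`: timestamps formatted in GMT/UTC and truncated to a prefix of `yyyyMMddHHmmssSSS` (for example `yyyyMMdd` for `DAY`). `ResolutionSketch` and its `Resolution` enum are hypothetical stand-ins, not Lucene classes.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical mimic of date rounding by resolution, assuming a DateTools-style
// convention (UTC, prefix of yyyyMMddHHmmssSSS). This is NOT Lucene code.
public class ResolutionSketch {
    enum Resolution {
        YEAR("yyyy"), MONTH("yyyyMM"), DAY("yyyyMMdd"), HOUR("yyyyMMddHH"),
        MINUTE("yyyyMMddHHmm"), SECOND("yyyyMMddHHmmss"),
        MILLISECOND("yyyyMMddHHmmssSSS");

        final DateTimeFormatter formatter;

        Resolution(String pattern) {
            this.formatter = DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
        }
    }

    // Formats epoch milliseconds at the given resolution. Coarser resolutions
    // drop the trailing fields, so strings at one resolution sort in time order.
    static String timeToString(long epochMillis, Resolution resolution) {
        return resolution.formatter.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long t = Instant.parse("2010-03-15T13:45:30.250Z").toEpochMilli();
        System.out.println(timeToString(t, Resolution.DAY));    // 20100315
        System.out.println(timeToString(t, Resolution.MINUTE)); // 201003151345
    }
}
```

A coarser resolution yields fewer distinct terms in the index, which suits range queries over dates; a finer one preserves more detail.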
 public  void setTextStripper(PDFTextStripper aStripper) 
    Sets the text stripper that will be used during text extraction.
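The static `getDocument` methods take no configuration arguments, so they presumably run with default settings. To change the text stripper or the date/time resolution, create an instance, call the setters, and then call `convertDocument`. The sketch below shows that split with a hypothetical `ConverterSketch` class. Here a `Function<String, String>` plays the role of `PDFTextStripper`, and plain strings stand in for PDF input. None of these names are PDFBox API.

```java
import java.util.function.Function;

// Hypothetical sketch of the static-convenience vs configurable-instance split.
// Function<String, String> stands in for PDFTextStripper; NOT PDFBox code.
public class ConverterSketch {
    // Default "stripper": passes the text through unchanged.
    private Function<String, String> stripper = Function.identity();

    // Counterpart of setTextStripper: swaps the extraction strategy.
    public void setTextStripper(Function<String, String> stripper) {
        this.stripper = stripper;
    }

    // Counterpart of convertDocument: uses whichever stripper is configured.
    public String convertDocument(String rawText) {
        return stripper.apply(rawText);
    }

    // Counterpart of the static getDocument: a fresh instance with defaults.
    public static String getDocument(String rawText) {
        return new ConverterSketch().convertDocument(rawText);
    }

    public static void main(String[] args) {
        ConverterSketch custom = new ConverterSketch();
        custom.setTextStripper(text -> text.toLowerCase());
        System.out.println(getDocument("Hello PDF"));            // Hello PDF
        System.out.println(custom.convertDocument("Hello PDF")); // hello pdf
    }
}
```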