pdfbox-1.1.0-src » org.apache.pdfbox.pdfparser
org.apache.pdfbox.pdfparser
public class: PDFParser
java.lang.Object
   org.apache.pdfbox.pdfparser.BaseParser
      org.apache.pdfbox.pdfparser.PDFParser
This class will handle the parsing of the PDF document.
Fields inherited from org.apache.pdfbox.pdfparser.BaseParser:
ENDSTREAM,  ENDOBJ,  DEF,  pdfSource,  document
Constructors:
 public PDFParser(InputStream input) throws IOException 
    Constructs a parser that reads the PDF document from the given input stream.
    Parameters:
    input - The input stream that contains the PDF document.
    Throws:
    IOException - If there is an error initializing the stream.
 public PDFParser(InputStream input,
    RandomAccess rafi) throws IOException 
    Constructor to allow control over RandomAccessFile.
    Parameters:
    input - The input stream that contains the PDF document.
    rafi - The RandomAccessFile to be used by the internal COSDocument.
    Throws:
    IOException - If there is an error initializing the stream.
 public PDFParser(InputStream input,
    RandomAccess rafi,
    boolean force) throws IOException 
    Constructor to allow control over RandomAccessFile. Also enables the parser to skip corrupt objects in an attempt to force parsing.
    Parameters:
    input - The input stream that contains the PDF document.
    rafi - The RandomAccessFile to be used by the internal COSDocument.
    force - When true, the parser will skip corrupt PDF objects and continue parsing at the next object in the file.
    Throws:
    IOException - If there is an error initializing the stream.
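
The effect of the force flag can be illustrated with a small, self-contained sketch. This is not PDFBox code: LenientParseSketch, parseObject, and parseAll are hypothetical names, and a "corrupt object" is simulated by a string that is not a decimal integer. It only shows the general idea of skipping a bad entry and resuming at the next one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not how PDFBox is implemented.
class LenientParseSketch {
    /** Parses one toy "object"; an object is valid only if it is a decimal integer. */
    static int parseObject(String raw) {
        return Integer.parseInt(raw.trim()); // throws NumberFormatException on corrupt input
    }

    /**
     * Parses every entry. With force == false the first corrupt entry aborts parsing;
     * with force == true corrupt entries are skipped and parsing resumes at the next one.
     */
    static List<Integer> parseAll(List<String> rawObjects, boolean force) {
        List<Integer> parsed = new ArrayList<Integer>();
        for (String raw : rawObjects) {
            try {
                parsed.add(parseObject(raw));
            } catch (NumberFormatException e) {
                if (!force) {
                    throw e; // strict mode: surface the error to the caller
                }
                // force mode: drop this object and continue with the next one
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1", "oops", "3");
        System.out.println(parseAll(input, true)); // prints [1, 3]
    }
}
```

With force set to false, the same input would fail on the second entry, which mirrors the trade-off described above: forced parsing may recover part of a damaged file, at the cost of silently dropping the objects that could not be read.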
Method Summary for org.apache.pdfbox.pdfparser.PDFParser:
getDocument,   getFDFDocument,   getPDDocument,   parse,   setTempDirectory
Methods from org.apache.pdfbox.pdfparser.BaseParser:
isClosing,   isClosing,   isEOL,   isEOL,   isEndOfName,   isWhitespace,   isWhitespace,   parseBoolean,   parseCOSArray,   parseCOSDictionary,   parseCOSName,   parseCOSStream,   parseCOSString,   parseDirObject,   readExpectedString,   readInt,   readLine,   readString,   readString,   setDocument,   skipSpaces
Methods from java.lang.Object:
clone,   equals,   finalize,   getClass,   hashCode,   notify,   notifyAll,   toString,   wait,   wait,   wait
Method Detail for org.apache.pdfbox.pdfparser.PDFParser:
 public COSDocument getDocument() throws IOException 
    This will get the document that was parsed. parse() must be called before this is called. When you are done with this document you must call close() on it to release resources.
 public FDFDocument getFDFDocument() throws IOException 
    This will get the FDF document that was parsed. When you are done with this document you must call close() on it to release resources.
 public PDDocument getPDDocument() throws IOException 
    This will get the PD document that was parsed. When you are done with this document you must call close() on it to release resources.
 public void parse() throws IOException 
    This will parse the stream and populate the COSDocument object. This will close the stream when it is done parsing.
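
The call order implied by these methods (parse first, then fetch the document, then close it) can be sketched with stand-in classes. StubParser and StubDocument below are hypothetical and are not the PDFBox classes; in particular, throwing an IOException when getDocument() is called too early is an assumption made for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

class ParserLifecycleSketch {
    /** Hypothetical stand-in for a parsed document that holds resources until closed. */
    static class StubDocument implements Closeable {
        final int byteCount;
        boolean closed = false;

        StubDocument(int byteCount) {
            this.byteCount = byteCount;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for the parser; it is NOT the PDFBox class. */
    static class StubParser {
        private final InputStream input;
        private StubDocument document; // stays null until parse() has run

        StubParser(InputStream input) {
            this.input = input;
        }

        /** Consumes the stream, then closes it, then stores the resulting document. */
        void parse() throws IOException {
            int count = 0;
            try {
                while (input.read() != -1) {
                    count++;
                }
            } finally {
                input.close();
            }
            document = new StubDocument(count);
        }

        /** Assumed behavior: fails if parse() has not been called yet. */
        StubDocument getDocument() throws IOException {
            if (document == null) {
                throw new IOException("parse() must be called before getDocument()");
            }
            return document;
        }
    }

    public static void main(String[] args) throws IOException {
        StubParser parser = new StubParser(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        parser.parse();                          // step 1: parse first
        StubDocument doc = parser.getDocument(); // step 2: fetch the result
        try {
            System.out.println("Parsed " + doc.byteCount + " bytes"); // prints "Parsed 3 bytes"
        } finally {
            doc.close();                         // step 3: always release resources
        }
    }
}
```

The try/finally block is the part worth carrying over to real code: it ensures close() runs on the returned document even if processing it throws.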
 public void setTempDirectory(File tmpDir) 
    Sets the directory where PDFBox will create a temporary file for storing the PDF document stream. By default, this directory is the value of the system property java.io.tmpdir.
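
The default described here can be reproduced with the standard library alone. The following is a minimal sketch, assuming the temporary file is created with File.createTempFile; TempDirSketch, resolveTempDir, and createScratchFile are hypothetical names, and the "pdfbox" prefix and ".tmp" suffix are placeholders rather than the names PDFBox actually uses.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical illustration of the fallback to java.io.tmpdir; not PDFBox code.
class TempDirSketch {
    /** Returns the configured directory, or the JVM default temp directory when none was set. */
    static File resolveTempDir(File configured) {
        if (configured != null) {
            return configured;
        }
        return new File(System.getProperty("java.io.tmpdir"));
    }

    /** Creates an empty scratch file inside the resolved directory. */
    static File createScratchFile(File configured) throws IOException {
        // Prefix and suffix are placeholders chosen for this sketch.
        File scratch = File.createTempFile("pdfbox", ".tmp", resolveTempDir(configured));
        scratch.deleteOnExit(); // best-effort cleanup when the JVM exits
        return scratch;
    }

    public static void main(String[] args) throws IOException {
        File scratch = createScratchFile(null); // null: fall back to java.io.tmpdir
        System.out.println("Scratch file: " + scratch.getAbsolutePath());
        scratch.delete();
    }
}
```

Pointing the parser at a different directory is mainly useful when the default temp location is small, read-only, or shared; calling setTempDirectory before parse() is the natural place to do so, although this page does not state the required ordering.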