How To Convert PDF Files to HTML - pdftohtml

pdftohtml converts Portable Document Format (PDF) files to HTML. This release converts text and links. Bold and italic faces are preserved, but high-level HTML structures (like lists or tables) are not yet generated. Images are ignored in the current version (but you can extract them from the PDF file using pdfimages, distributed with xpdf).

openSUSE users can install pdftohtml using the "1-click" installer.

Using pdftohtml:
pdftohtml runs from the command line with various options. The basic form of the command is:
pdftohtml [pdf file name]
This command gives you a simple HTML file suitable for reading or copying the textual content of the PDF file. You can actually grab the text from your browser and paste it into other applications.
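
If you run the basic command from a script, a small wrapper can save you from confusing errors. The following is only a minimal sketch: the function name convert_pdf and the file name report.pdf are made up for illustration, and the only pdftohtml feature it relies on is the basic form shown above.

```shell
# Hypothetical input file name; replace it with your own PDF.
pdf="report.pdf"

# convert_pdf is an illustrative helper, not part of pdftohtml itself.
convert_pdf() {
    # Return 1 if the input file does not exist.
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    # Return 2 if pdftohtml is not installed or not on the PATH.
    command -v pdftohtml >/dev/null 2>&1 || { echo "pdftohtml not found" >&2; return 2; }
    # Basic form of the command: pdftohtml [pdf file name]
    pdftohtml "$1"
}

convert_pdf "$pdf" || echo "conversion skipped"
```

The two checks run before pdftohtml is called, so a typo in the file name or a missing package shows up as a short message instead of a failed conversion.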

If you want to see graphics, you'll need to use the -c (as in "complex") option:
pdftohtml -c [pdf file name]
This option produces individual HTML files, one for each page of the PDF file, with references to the generated PNG images mixed in. The graphics in the original PDF file show up in a browser, and the text can be cut and pasted. The total size of the HTML and PNG files generated with the -c option tends to be roughly equivalent to that of the original PDF.
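
If you have a whole directory of PDF files, you can wrap the -c form in a loop. This is a rough sketch under a few assumptions: the function name convert_all_complex is invented for this example, the files are assumed to end in .pdf, and it does not control where pdftohtml writes its output.

```shell
# convert_all_complex is an illustrative helper, not part of pdftohtml.
# Its first argument is the directory to scan (defaults to the current one).
convert_all_complex() {
    dir="${1:-.}"
    count=0
    for pdf in "$dir"/*.pdf; do
        # When nothing matches, the pattern itself is left in $pdf; skip it.
        [ -f "$pdf" ] || continue
        # Complex form of the command: pdftohtml -c [pdf file name]
        pdftohtml -c "$pdf" || return 1
        count=$((count + 1))
    done
    echo "converted $count file(s)"
}
```

Calling convert_all_complex with a directory name runs pdftohtml -c once per PDF, stops at the first failure, and prints how many files were converted.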

