
HTML and XML Manipulation Utilities - HTML-XML-utils

HTML-XML-utils is a set of small C programs (filters) that read HTML and XML files and can add a table of contents, an alphabetical index, a bibliography, cross-references, or numbered headings, as well as remove, count, or pretty-print elements. When reading HTML, the tools assume the markup is correct HTML 4.0 or close to it.

The package includes the following utilities:
 asc2xml      -  convert from UTF-8 to &#nnn; entities
 xml2asc      -  convert from &#nnn; entities to UTF-8
 hxaddid      -  add IDs to selected elements
 hxcite       -  replace bibliographic references by hyperlinks
 hxcite-mkbib -  expand references and create bibliography
 hxclean      -  apply heuristics to correct an HTML file
 hxcopy       -  copy an HTML file while preserving relative links
 hxcount      -  count elements and attributes in HTML or XML files
 hxextract    -  extract selected elements
 hxincl       -  expand included HTML or XML files
 hxindex      -  create an alphabetically sorted index
 hxmkbib      -  create bibliography from a template
 hxmultitoc   -  create a table of contents for a set of HTML files
 hxname2id    -  move some ID= or NAME= from A elements to their parents
 hxnormalize  -  pretty-print an HTML file
 hxnsxml      -  convert output of hxxmlns back to normal XML
 hxnum        -  number section headings in an HTML file
 hxpipe       -  convert XML to a format easier to parse with Perl or AWK
 hxprintlinks -  number links & add table of URLs at end of an HTML file
 hxprune      -  remove marked elements from an HTML file
 hxref        -  generate cross-references
 hxselect     -  extract elements that match a (CSS) selector
 hxtoc        -  insert a table of contents in an HTML file
 hxuncdata    -  replace CDATA sections by character entities
 hxunent      -  replace HTML predefined character entities by UTF-8
 hxunpipe     -  convert output of hxpipe back to XML format
 hxunxmlns    -  replace "global names" by XML Namespace prefixes
 hxwls        -  list links in an HTML file
 hxxmlns      -  replace XML Namespace prefixes by "global names"
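
Because the tools are filters, they are usually chained with pipes. Here is a minimal sketch of that idea, assuming the package is already installed. The file name sample.html and the helper functions page_title and page_links are made up for this illustration and are not part of the package.

```shell
# Hypothetical sample page, created only for this demonstration.
cat > sample.html <<'EOF'
<html><head><title>Demo page</title></head>
<body>
<h1>Demo</h1>
<p>See <a href="https://example.org/a">A</a> and <a href="https://example.org/b">B</a>.</p>
</body></html>
EOF

# page_title (hypothetical helper): print the text of the <title> element.
# hxnormalize -x rewrites the input as well-formed XML, which hxselect expects;
# hxselect -c prints only the content of elements matching the CSS selector.
page_title() {
    hxnormalize -x "$1" | hxselect -c 'title'
}

# page_links (hypothetical helper): list the link targets found in the file.
page_links() {
    hxwls "$1"
}
```

Calling page_title sample.html should print the page title, and page_links sample.html should list the two example.org URLs. The same pattern applies to the other filters: normalize first, then pass the result to the tool that does the actual work.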

HTML-XML-utils Installation:
Open a terminal and type the following command:
sudo apt-get install html-xml-utils

