Xidel is a command-line tool to download web pages or JSON APIs and extract data from them. It can download files over HTTP/S connections, follow redirects, links, (partially) filled-in forms, and extracted values, and it can also process local files. The data can be extracted using XPath 2.0, XQuery 1.0, XPath/XQuery 3.0 and JSONiq expressions, CSS 3 selectors, and custom pattern-matching templates that resemble an annotated version of the processed page. The extracted values can then be exported as plain
Ansifilter converts ANSI terminal escape sequences to HTML, RTF, BBCode, Pango Markup, LaTeX and Plain TeX. It also converts ANSI art files (CP437, BIN, XBIN, TND) to HTML or RTF.
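To illustrate the kind of conversion Ansifilter performs, here is a minimal Python sketch that turns ANSI SGR ("Select Graphic Rendition") escape sequences into HTML `<span>` tags. It is not Ansifilter's code. The function name `ansi_to_html` and the inline-style output are assumptions, and only reset (0), bold (1) and the eight standard foreground colours (30-37) are handled; all other codes are ignored.

```python
import html
import re

# Matches SGR sequences such as "\x1b[1;31m" (ESC [ params m).
SGR_RE = re.compile(r"\x1b\[([0-9;]*)m")

# Standard 8-colour ANSI foreground palette, indexed by code - 30.
FG_COLOURS = ["black", "red", "green", "yellow", "blue", "magenta", "cyan", "white"]


def ansi_to_html(text: str) -> str:
    """Convert a subset of ANSI SGR codes (reset, bold, 30-37) into <span> tags."""
    out = []
    open_spans = 0  # number of <span> tags currently open
    pos = 0
    for match in SGR_RE.finditer(text):
        # Escape the plain text between escape sequences.
        out.append(html.escape(text[pos:match.start()]))
        pos = match.end()
        # An empty parameter list ("\x1b[m") means reset.
        codes = [int(c) if c else 0 for c in match.group(1).split(";")]
        for code in codes:
            if code == 0:
                out.append("</span>" * open_spans)
                open_spans = 0
            elif code == 1:
                out.append('<span style="font-weight:bold">')
                open_spans += 1
            elif 30 <= code <= 37:
                out.append(f'<span style="color:{FG_COLOURS[code - 30]}">')
                open_spans += 1
            # Other codes are ignored in this sketch.
    out.append(html.escape(text[pos:]))
    out.append("</span>" * open_spans)  # close anything still open
    return "".join(out)
```

For example, `ansi_to_html("\x1b[31mred\x1b[0m <ok>")` yields `<span style="color:red">red</span> &lt;ok&gt;`. Nesting one `<span>` per attribute keeps the sketch short; a fuller converter would track the current state and emit one span per state change.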
Highlight converts source code to HTML, RTF, LaTeX, TeX, SVG, Pango, BBCode and terminal escape sequences with coloured syntax highlighting. Language definitions and colour themes are customizable Lua scripts. It provides a plug-in interface to tweak syntax parsing and coloring.
TEA is a powerful text editor that provides hundreds of text processing functions. It supports QML plugins and external scripts. TEA can open plain text files, FB2, ODT, RTF, DOCX, AbiWord, KWord KWD, SWX, PDF, and DJVU. Other features: Built-in MC-like file manager. Spell checker (using Aspell and/or Hunspell). Tabbed layout engine. Syntax highlighting for C, C++, Bash script, BASIC, C#, D, Fortran, Java, LilyPond, Lout, Lua, NASM, NSIS, Pascal, Perl, PHP, PO (gettext), Python, Seed7, TeX/LaTeX,
rssgen is a command-line utility that builds an RSS feed file from multiple .html files. It reads the title, description, and publication date from meta tags; if meta tags aren't available, it takes the title from the h1 tag, the description from the h2 tag, and the publication date from the file's mtime.
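The fallback logic described for rssgen can be sketched in Python with the standard library. This is not rssgen's implementation: the meta tag names `title`, `description` and `date` are assumptions (a `date` meta value is assumed to already be in RFC 822 format), as are the helper names `PageInfo` and `build_rss` and the way item links are built from the file name.

```python
import os
import xml.etree.ElementTree as ET
from email.utils import formatdate
from html.parser import HTMLParser


class PageInfo(HTMLParser):
    """Collects <meta name=... content=...> pairs and the text of the first <h1>/<h2>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.meta = {}
        self.headings = {}
        self._current = None  # heading tag whose text is being collected

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name") and a.get("content"):
            self.meta[a["name"].lower()] = a["content"]
        elif tag in ("h1", "h2") and tag not in self.headings:
            self.headings[tag] = ""
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.headings[self._current] += data


def build_rss(paths, channel_title, channel_link):
    """Return an RSS 2.0 document (as a string) with one <item> per HTML file."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_title
    for path in paths:
        page = PageInfo()
        with open(path, encoding="utf-8") as f:
            page.feed(f.read())
        page.close()
        # Prefer meta tags; otherwise fall back to <h1>, <h2> and the file mtime.
        title = page.meta.get("title") or page.headings.get("h1", "").strip()
        description = page.meta.get("description") or page.headings.get("h2", "").strip()
        pub_date = page.meta.get("date") or formatdate(os.path.getmtime(path), usegmt=True)
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = channel_link.rstrip("/") + "/" + os.path.basename(path)
        ET.SubElement(item, "description").text = description
        ET.SubElement(item, "pubDate").text = pub_date
    return ET.tostring(rss, encoding="unicode")
```

A call such as `build_rss(["a.html", "b.html"], "My Site", "https://example.com/")` returns the feed as a string, ready to be written to e.g. `feed.xml`.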
mmu2html is a tool to convert text files with mixed markup and HTML code into HTML files. It can be used for static website generation. It has been designed with AsciiDoc in mind, but with additional support for menu creation, file linking, and other website-specific features.
Dillo is a small and speed-oriented web browser. It is implemented in C and C++ using the FLTK GUI toolkit. It implements a custom HTML and CSS rendering engine for low resource usage. In terms of functionality, it targets end users and web developers.
htmLawed is a PHP script that processes text with HTML markup to make it more compliant with HTML standards and administrative policies. It works by making HTML well-formed with balanced and properly nested tags, neutralizing code that may be used for cross-site scripting (XSS) attacks, and allowing only specified HTML tags, attributes, and URL protocols through blacklists or whitelists. It can also tidy/pretty-print HTML, make relative URLs absolute, check for spam, etc. It is small (single file of ~50
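The whitelist approach described for htmLawed can be illustrated with a short Python sketch. htmLawed itself is PHP and does far more; this only shows the idea of keeping allowed tags, attributes and URL schemes, dropping everything else, and closing tags so the output stays balanced. The sets `ALLOWED_TAGS`, `ALLOWED_ATTRS` and `ALLOWED_SCHEMES` and the function `sanitize` are assumptions for illustration, and the sketch is not meant as a production XSS filter.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urlparse

ALLOWED_TAGS = {"a", "b", "em", "i", "p", "strong", "ul", "ol", "li"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
ALLOWED_SCHEMES = {"http", "https", "mailto", ""}  # "" = relative URL
DROP_CONTENT = {"script", "style"}                 # removed together with their content


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []      # allowed tags currently open, for balancing
        self.skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # drop the tag but keep its text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and urlparse(value.strip()).scheme.lower() not in ALLOWED_SCHEMES:
                continue  # e.g. javascript: URLs
            kept.append(f' {name}="{html.escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if tag in self.stack:
            # Close any tags left open inside this one so nesting stays valid.
            while self.stack:
                open_tag = self.stack.pop()
                self.out.append(f"</{open_tag}>")
                if open_tag == tag:
                    break
        # End tags that were never opened are dropped.

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(html.escape(data, quote=False))


def sanitize(fragment: str) -> str:
    s = WhitelistSanitizer()
    s.feed(fragment)
    s.close()
    s.out.extend(f"</{t}>" for t in reversed(s.stack))  # close unclosed tags
    return "".join(s.out)
```

For instance, `sanitize('<p onclick="x()">Hi <b>there</p><script>alert(1)</script>')` returns `<p>Hi <b>there</b></p>`: the event handler and script are removed and the missing `</b>` is supplied.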
A small Unix command-line tool that extracts data from tables in an HTML file. It outputs the table data, stripped of other HTML tags and possibly whitespace, as CSV to a file or to stdout. It should handle nested tables and the most common HTML errors (missing </td>, </th> or </tr> tags).
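The table-to-CSV extraction described above can be sketched in Python with `html.parser` and `csv`. This is not the tool's own code; the class and function names, the choice to emit nested tables as separate tables in the order they start, and the whitespace collapsing are assumptions. Missing `</td>`, `</th>` and `</tr>` tags are tolerated because each new `<tr>` or cell implicitly ends the previous one.

```python
import csv
import io
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Table:
    rows: list = field(default_factory=list)  # each row is a list of cell strings
    in_cell: bool = False                     # True while inside a <td>/<th>


class TableExtractor(HTMLParser):
    """Collects <td>/<th> text per row and per table (nested tables kept separate)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []  # every table seen, in document order
        self.stack = []   # currently open tables, innermost last

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            table = Table()
            self.tables.append(table)
            self.stack.append(table)
        elif self.stack and tag == "tr":
            self.stack[-1].rows.append([])  # implicitly ends the previous row
            self.stack[-1].in_cell = False
        elif self.stack and tag in ("td", "th"):
            table = self.stack[-1]
            if not table.rows:  # cell outside any <tr>
                table.rows.append([])
            table.rows[-1].append("")  # implicitly ends the previous cell
            table.in_cell = True

    def handle_endtag(self, tag):
        if not self.stack:
            return
        if tag == "table":
            self.stack.pop()
        elif tag in ("td", "th", "tr"):
            self.stack[-1].in_cell = False

    def handle_data(self, data):
        # Text goes to the current cell of the innermost open table.
        if self.stack and self.stack[-1].in_cell:
            self.stack[-1].rows[-1][-1] += data


def tables_to_csv(markup: str) -> str:
    """Return the cell text of all tables as CSV, with whitespace collapsed."""
    parser = TableExtractor()
    parser.feed(markup)
    parser.close()
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for table in parser.tables:
        for row in table.rows:
            if row:
                writer.writerow([" ".join(cell.split()) for cell in row])
    return buf.getvalue()
```

With sloppy input such as `<table><tr><th>Name<th>Qty<tr><td>apple, red<td> 3 </table>`, this produces the two CSV lines `Name,Qty` and `"apple, red",3`, quoting the cell that contains a comma.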
WebGrid is a Python library to generate an interactive DHTML datagrid table with sorting, filtering and paging. It is designed to work on top of SQLAlchemy ORM entities. It also allows exporting to Excel files.
Markdown Taglib is a JSP tag library that renders Markdown text to HTML. It depends on pegdown, a pure Java library for clean and lightweight Markdown processing.
SubLime is a tool to overlay subtitles loaded from a subtitles file (in SRT format) over an HTML5 video element in your browser. It is available as a bookmarklet and as an extension for the Chrome browser.
Hasciicam makes it possible to have live ASCII video on the Web. It captures video from a TV card and renders it as ASCII, writing the output to an HTML page with a refresh tag, to a live ASCII window, or to a plain text file. This gives anyone with a bttv card, a Linux box, and a cheap modem line the ability to show a live ASCII video feed that can be viewed in a browser without any need for a plugin, Java, etc.
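The basic idea of rendering video as ASCII and serving it as a self-refreshing HTML page can be sketched in Python. This is a simple brightness-ramp mapping, not Hasciicam's own renderer; the frame is assumed to be a 2-D list of 0-255 grey values, and the character ramp, the function names and the default refresh interval are all assumptions.

```python
import html

# Characters ordered from low to high brightness (an assumed ramp; reverse it
# for dark text on a light page).
RAMP = " .:-=+*#%@"


def frame_to_ascii(frame):
    """Map each 0-255 grey value in a 2-D list to a character from RAMP."""
    last = len(RAMP) - 1
    return "\n".join(
        "".join(RAMP[pixel * last // 255] for pixel in row) for row in frame
    )


def ascii_page(frame, refresh_seconds=2):
    """Wrap an ASCII frame in an HTML page that reloads itself via a refresh tag."""
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{refresh_seconds}">'
        "</head><body><pre>"
        f"{html.escape(frame_to_ascii(frame))}"
        "</pre></body></html>"
    )
```

Rewriting this page for every captured frame lets any browser follow the feed by reloading it, with no plugin required.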
HtmlToText is a small PHP utility class that converts HTML into plain text while preserving links in Markdown format.
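HtmlToText itself is PHP; the following Python sketch only illustrates the idea of stripping markup while rewriting `<a href>` elements as Markdown `[text](url)` links. The names `MarkdownLinkText`, `BLOCK_TAGS` and `html_to_text` are hypothetical, the set of tags treated as line breaks is an assumption, and the sketch does not skip `<script>`/`<style>` content or escape brackets inside link text.

```python
from html.parser import HTMLParser

# Tags that start a new line in the text output (an assumed, minimal set).
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "tr"}


class MarkdownLinkText(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.href = None      # href of the <a> currently open, if any
        self.link_text = []   # text collected inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
            self.link_text = []
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            text = " ".join("".join(self.link_text).split())
            self.parts.append(f"[{text or self.href}]({self.href})")
            self.href = None

    def handle_data(self, data):
        if self.href is not None:
            self.link_text.append(data)
        else:
            self.parts.append(data)


def html_to_text(markup: str) -> str:
    parser = MarkdownLinkText()
    parser.feed(markup)
    parser.close()
    # Collapse whitespace on each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

For example, `html_to_text('<p>See <a href="https://example.com">the docs</a> now.</p><p>Bye</p>')` returns two lines: `See [the docs](https://example.com) now.` and `Bye`.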