Convert HTML into Plain Text

The HTML2TEXT package is not included in the Docxpresso API Core distribution; it should instead be downloaded from:

HTML2TEXT is a spinoff of Docxpresso that may be downloaded and used independently.

Although several PHP libraries already convert HTML into plain text, we believe that our HTML2TEXT package presents several advantages:

  • Its license is MIT, so you can use it with practically no restrictions.
  • Multiple options are available "out of the box" to customize the output.
  • It only uses DOM methods, which makes it easy to customize further.

This package has only one public method:

Signature

public plainText ([$options])

Parameters

  • $options (type: array).
    This array has the following available keys and values:
    • bold (type: string). A string of chars that will wrap text in <b> or <strong>. The default value is an empty string.
    • cellSeparator (type: string, default: ' || '). A string of chars used to separate the content of contiguous cells in a row (\t may also be a sensible choice).
    • images (type: boolean, default: true). If set to true, the alt value associated with each image will be printed as [img: alt value].
    • italics (type: string). A string of chars that will wrap text in <i> or <em>. The default value is an empty string.
    • newLine (type: string). If set, it replaces the default value (\n\r) used for titles and paragraphs.
    • tab (type: string, default: ' '). A string of chars that will be used as a "tab" (\t may be another standard option).
    • titles (type: string, default: "underline"). Accepted values are "underline", "uppercase" and "none".

An example of use reads:

<?php
/**
 * This sample assumes that the library has been installed via Composer;
 * otherwise you may simply load the library as you usually would
 */
require __DIR__ . '/../vendor/autoload.php';

use Docxpresso\HTML2TEXT as Parser;

// HTML fragment to convert
$html = '<p>A simple paragraph.</p>';

// Parse the HTML and print its plain-text version using the default options
$parser = new Parser\HTML2TEXT($html);
echo $parser->plainText();
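
The options described in the Parameters list can be combined in a single array and passed to plainText. The following is only a minimal sketch: it assumes the same Composer setup as the sample above, the input HTML and the chosen option values are illustrative, and the exact text produced depends on the library version, so no output is shown.

```php
<?php
// Assumes the library was installed via Composer, as in the previous sample.
require __DIR__ . '/../vendor/autoload.php';

use Docxpresso\HTML2TEXT as Parser;

// Hypothetical input, chosen only to exercise several of the documented options.
$html = '<h1>Report</h1>'
      . '<p>Some <b>bold</b> and <i>italic</i> text.</p>'
      . '<table><tr><td>A</td><td>B</td></tr></table>';

// Keys are taken from the Parameters list; the values are illustrative choices.
$options = array(
    'bold'          => '**',        // chars wrapped around <b>/<strong> text
    'italics'       => '_',         // chars wrapped around <i>/<em> text
    'cellSeparator' => "\t",        // separator between contiguous cells in a row
    'titles'        => 'uppercase', // one of "underline", "uppercase" or "none"
);

$parser = new Parser\HTML2TEXT($html);
$text = $parser->plainText($options);
echo $text;
```

Keys left out of the array (here images, newLine and tab) should fall back to the defaults listed in the Parameters section.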