Hi there

This extends an idea I posted earlier, though it is useful on its own: it lists every distinct HTML entity used in a bunch of files.

Code:
pabloa$ grep -ohe "&[^;]*;" *.xml | sort | uniq
&amp;
&apos;
&gt;
&lt;
&quot;
In this case we've been lucky: the five entities that turn up are exactly the ones XML predefines, so they are XML compliant. Any other entity would need to be converted to get a valid XML file.
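
When the list is longer, it can help to show only the entities that still need attention. Here is a rough sketch of that, assuming a grep with -o and -E (GNU grep has both). The function name list_nonxml_entities is made up for this example, and treating numeric references like &#233; as acceptable is my own assumption.

```shell
# list_nonxml_entities is a hypothetical helper, not a standard command.
# It prints each distinct entity in the given files that XML does not predefine.
list_nonxml_entities() {
    # Same extraction as above, then drop the five predefined entities
    # and numeric character references such as &#233; or &#xE9;.
    grep -ohe "&[^;]*;" "$@" | sort -u |
        grep -vxE '&(amp|apos|gt|lt|quot|#[0-9]+|#x[0-9A-Fa-f]+);'
}
```

Call it as "list_nonxml_entities *.xml". It prints nothing (and returns a non-zero status) when every entity found is already fine for XML.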

If anyone knows a Bash equivalent of PHP's "html_entity_decode" (short of listing every entity in a "sed" script), please let us know.

Cheers.
P.