An extension of an idea I posted earlier, though it is useful on its own: it lists every distinct HTML entity that appears in a bunch of files.
In this case we were lucky: the five remaining entities are all XML compliant. Any others would have to be converted to get a valid XML file.
pabloa$ grep -ohe "&[^;]*;" *.xml | sort | uniq
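A slightly stricter variant, as a rough sketch rather than a drop-in replacement: the pattern above also matches things like "& D;" (a bare ampersand followed by any later semicolon), so this version only accepts named, decimal and hexadecimal entities, and it adds a count per entity. The function name list_entities is made up for illustration, and it assumes a grep that supports -o and -h (GNU and BSD grep both do).

```shell
#!/bin/sh
# list_entities is a hypothetical helper name, not part of the original one-liner.
# It prints each distinct entity once, preceded by how many times it occurs,
# most frequent first.
list_entities() {
    # -o: print only the matched text; -h: omit file names; -E: extended regex.
    # The pattern accepts named (&nbsp;), decimal (&#160;) and hex (&#xA0;) forms
    # and, unlike "&[^;]*;", does not match a bare "&" followed by a space.
    grep -ohE '&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);' "$@" |
        sort | uniq -c | sort -rn
}
```

Called as `list_entities *.xml`, it should make any stray non-XML entity stand out, along with how often it appears.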
If anyone knows a bash equivalent of PHP's "html_entity_decode" (short of listing every entity in a "sed" script), please let us know.