This will be easy in any language. Here it is in pseudo-Python. I've omitted the lxml bits because I don't have access to lxml right now and can't quite remember the exact syntax. They're not difficult, though.
```python
import lxml.etree

with open(...) as htmls, open(...) as slugs, open(...) as output:
    for html, slug in zip(htmls, slugs):
        root = lxml.etree.fromstring(html)
        # do some fiddling with lxml to get the name
        # drop as many leading slug words as there are words in the name
        slug = slug.strip().split("-")[len(name.split()):]
        # add in the extra child in lxml
        output.write(lxml.etree.tostring(root, encoding="unicode") + "\n")
```
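To make the loop concrete, here is a runnable sketch of the same idea. It is only an illustration under stated assumptions. It uses the standard library's `xml.etree.ElementTree` instead of lxml, so it runs without extra installs. It also assumes a hypothetical markup where the name lives in `<span class="name">` and the new child is `<span class="slug">`. The helper names `trim_slug` and `process` are made up for this example, and re-joining the trimmed words with hyphens is a guess about the desired output.

```python
import io
import xml.etree.ElementTree as ET


def trim_slug(slug, name):
    """Drop the leading slug words that repeat the name.

    "John Smith" has two words, so the first two hyphen-separated
    parts of "john-smith-my-first-post" are removed.
    """
    return slug.strip().split("-")[len(name.split()):]


def process(htmls, slugs, output):
    # zip pairs the i-th HTML line with the i-th slug line
    for html, slug in zip(htmls, slugs):
        root = ET.fromstring(html)
        # Hypothetical markup: the name is the text of <span class="name">
        name = root.find(".//span[@class='name']").text
        # Hypothetical new child: <span class="slug">...</span>
        child = ET.SubElement(root, "span", {"class": "slug"})
        # Assumption: the remaining slug words are re-joined with hyphens
        child.text = "-".join(trim_slug(slug, name))
        output.write(ET.tostring(root, encoding="unicode") + "\n")


# In-memory stand-ins for the three files
htmls = io.StringIO('<div><span class="name">John Smith</span></div>\n')
slugs = io.StringIO("john-smith-my-first-post\n")
out = io.StringIO()
process(htmls, slugs, out)
print(out.getvalue())
```

With real files you would pass the objects returned by `open(...)` instead of the `StringIO` stand-ins. The lxml version should look almost the same, since `lxml.etree` mirrors most of the ElementTree API.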
Interesting features:
This doesn't read the entire file into memory at once. It works line by line, and Python buffers the underlying reads. That's useful if the files are huge, but probably irrelevant here.
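One way to check the line-by-line claim is to wrap each input in a generator that logs every line as it is consumed. In this sketch, plain lists stand in for the two files, and `logged` is a hypothetical helper. File objects are iterated the same way, one line per `next()` call.

```python
def logged(lines, log):
    """Yield lines one at a time, recording each one as it is consumed."""
    for line in lines:
        log.append(line)
        yield line


html_log, slug_log = [], []
pairs = zip(logged(["<a/>\n", "<b/>\n", "<c/>\n"], html_log),
            logged(["x\n", "y\n", "z\n"], slug_log))

first = next(pairs)
# zip is lazy in Python 3: only one line from each source has been read
print(len(html_log), len(slug_log))  # 1 1
```

This holds for Python 3. In Python 2, `zip` builds the whole list up front, so you would need `itertools.izip` to get the same streaming behaviour.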
lxml may be overkill, depending on how rigid the format of the HTML strings is. If they're guaranteed to have the same structure and to be well-formed, simple string operations might be easier. On the other hand, lxml is pretty fast and offers a lot more flexibility.
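For comparison, here is a sketch of the plain-string approach. It assumes a rigid, hypothetical format: the name always sits between `<span class="name">` and the next `</span>`, and every line ends with a closing `</div>`. The function `add_slug` is made up for this example. It applies the same slug-trimming rule as the pseudo-Python above and splices the new child in just before the final `</div>`.

```python
def add_slug(html, slug, start='<span class="name">', end="</span>"):
    # Hypothetical fixed format: the name sits between `start` and `end`
    _, _, rest = html.partition(start)
    name, _, _ = rest.partition(end)
    # Drop one leading slug word per word in the name, re-join with hyphens
    trimmed = "-".join(slug.strip().split("-")[len(name.split()):])
    # Insert the new child just before the last closing </div> (assumed root)
    head, sep, tail = html.rstrip("\n").rpartition("</div>")
    return head + '<span class="slug">' + trimmed + "</span>" + sep + tail


print(add_slug('<div><span class="name">John Smith</span></div>\n',
               "john-smith-my-first-post\n"))
```

This is shorter and has no dependencies, but it breaks as soon as the markup varies (different attribute order, extra whitespace, a nested `</div>`). That fragility is the trade-off against lxml's flexibility.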