Only once, and then it’ll work for 10000 documents. Hopefully. And “the source HTML” is all you have, anyway. If you’re thinking about converting the PDF to HTML: don’t. That probably won’t yield better-structured HTML than the original.
In any case, a sample PDF and original HTML would help to figure out if/how to get at the data. As it stands, it’s all a bit foggy.
But there is no HTML/XML source available for this record. You’re kind of reading it backwards. `source` is meant to access the HTML/XML of an HTML/XML record. Not to add stuff to arbitrary records. And in the case of an HTML record, one can actually modify `source`, creating all kinds of havoc.
Since it’s possible to convert a PDF to HTML (at least in the one case I tried it with), the documentation seems in fact to imply that this conversion might happen on the spot, creating an HTML document in the `source` property. Which doesn’t happen, though. @cgrunenberg would have to comment on that.