Inline Read More Link in Python Using lxml
When displaying large pieces of HTML content, we often need to generate a summary along with a “read more” or “continue” link to the full article.
Generating the summary is simple: truncate the HTML content at a defined maximum word count and append an ellipsis (…).
For instance:
Full HTML:
<p>paragraph 1</p><p>paragraph 2</p><p>paragraph 3</p><p>paragraph 4</p>
Truncated HTML:
<p>paragraph 1</p><p>paragraph 2...</p>
The problem is that, most of the time, naively adding a “read more” link to <p>paragraph 1</p><p>paragraph 2...</p>
puts the link outside of the HTML content. The result is a frustrating line break, since the link lands after the last closing </p> tag instead of inside it.
Instead of:
<p>paragraph 1</p><p>paragraph 2...</p><a href="/read-more/">read more</a>
What we really want:
<p>paragraph 1</p><p>paragraph 2...<a href="/read-more/">read more</a></p>
I ran into this pet peeve a while back while trying to add an inline “answer” link to my question-and-answer joke. Here is the code I wrote to address it:
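def insert_into_last_element(html, element):
    """Insert the HTML snippet `element` as the last child of the last
    top-level element of `html`, so that a "read more" link stays inline."""
    try:
        from lxml.html import fragment_fromstring, fragments_fromstring, tostring
        from lxml.etree import ParserError
    except ImportError:
        raise ImportError("Unable to find lxml")
    from html import escape  # standard library

    try:
        item = fragment_fromstring(element)
    except (ParserError, TypeError):  # a tuple is needed to catch both types
        # `element` is not exactly one valid element: fall back to an empty span.
        item = fragment_fromstring('<span></span>')
    try:
        doc = fragments_fromstring(html)
    except (ParserError, TypeError):
        return ''
    # Leading text comes back as a plain string, which cannot take children.
    if not doc or isinstance(doc[-1], str):
        return html
    doc[-1].append(item)
    # encoding='unicode' makes tostring() return str rather than bytes.
    return ''.join(escape(e, quote=False) if isinstance(e, str)
                   else tostring(e, encoding='unicode')
                   for e in doc)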
Seeing that the same need exists in Pelican, I added the functionality to my fork and submitted a pull request here.
I hope this is useful to you, and if anyone has suggestions for improvement, please do not hesitate to let me know.
UPDATE: At the recommendation of Alexis Metaireau, I reimplemented this feature as a plugin and submitted a new pull request to the pelican-plugins project here.