Vuong Nguyen

Inline Read More Link in Python Using lxml

· Technology

When displaying long HTML content, we often need a way to generate a summary along with a “read more” or “continue” link to the full article.

Generating the summary is a simple matter of truncating the HTML content at a defined maximum word count and appending an ellipsis (…).

For instance:

Full HTML:

<p>paragraph 1</p><p>paragraph 2</p><p>paragraph 3</p><p>paragraph 4</p>

Truncated HTML:

<p>paragraph 1</p><p>paragraph 2...</p>
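As a rough illustration of the truncation step, here is a minimal sketch that uses only the standard library's html.parser. The names truncate_html_words and _WordTruncator are hypothetical, made up for this example, and the sketch assumes reasonably well-formed input; it makes no attempt to treat comments or script content specially.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "wbr"}


class _WordTruncator(HTMLParser):
    """Hypothetical helper: re-emits HTML until max_words words are seen."""

    def __init__(self, max_words, ellipsis):
        super().__init__(convert_charrefs=True)
        self.max_words = max_words
        self.ellipsis = ellipsis
        self.words = 0        # words emitted so far
        self.out = []         # pieces of the output HTML
        self.open_tags = []   # stack of tags still waiting for a close
        self.done = False     # set once the word limit has been reached

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return
        # Close everything down to and including the matching tag.
        while True:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        words = data.split()
        remaining = self.max_words - self.words
        if not words or len(words) < remaining:
            self.out.append(escape(data, quote=False))
            self.words += len(words)
            return
        # Limit reached: keep the words that fit, add the ellipsis,
        # then close every tag that is still open.
        lead = data[: len(data) - len(data.lstrip())]
        kept = lead + " ".join(words[:remaining])
        self.out.append(escape(kept, quote=False) + self.ellipsis)
        self.out.extend(f"</{tag}>" for tag in reversed(self.open_tags))
        self.done = True


def truncate_html_words(html, max_words, ellipsis="..."):
    """Hypothetical helper: cut html after max_words words, tags balanced."""
    counter = _WordTruncator(float("inf"), ellipsis)
    counter.feed(html)
    counter.close()
    if counter.words <= max_words:
        return html  # short enough already, nothing to truncate
    truncator = _WordTruncator(max_words, ellipsis)
    truncator.feed(html)
    truncator.close()
    return "".join(truncator.out)


full = "<p>paragraph 1</p><p>paragraph 2</p><p>paragraph 3</p><p>paragraph 4</p>"
print(truncate_html_words(full, 4))
# <p>paragraph 1</p><p>paragraph 2...</p>
```

The first pass only counts words, so content that already fits is returned untouched and never gets a stray ellipsis.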

The problem is that, most of the time, adding a “read more” link to <p>paragraph 1</p><p>paragraph 2...</p> puts the link outside the last paragraph. This results in a really frustrating line break, since <p> is a block element. In my opinion, the line break is disruptive and not aesthetically pleasing.

Instead of:

<p>paragraph 1</p><p>paragraph 2...</p><a href="/read-more/">read more</a>

What we really want:

<p>paragraph 1</p><p>paragraph 2...<a href="/read-more/">read more</a></p>
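To make the difference between the two concrete, here is a small check with lxml that lists the top-level elements of each fragment. The helper name top_level_tags is hypothetical, chosen only for this illustration.

```python
from lxml.html import fragments_fromstring

link_outside = '<p>paragraph 1</p><p>paragraph 2...</p><a href="/read-more/">read more</a>'
link_inside = '<p>paragraph 1</p><p>paragraph 2...<a href="/read-more/">read more</a></p>'


def top_level_tags(html):
    """Hypothetical helper: tag names of the top-level elements in a fragment."""
    return [node.tag for node in fragments_fromstring(html)]


print(top_level_tags(link_outside))  # ['p', 'p', 'a']
print(top_level_tags(link_inside))   # ['p', 'p']

# In the second form the link is a child of the last paragraph,
# so it stays on the same line as the truncated text.
last_paragraph = fragments_fromstring(link_inside)[-1]
print([child.tag for child in last_paragraph])  # ['a']
```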

I ran into this pet peeve a while back when I was trying to add an inline “answer” link to my question-and-answer joke. Here is the code I wrote to address this:

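def insert_into_last_element(html, element):
   """Append `element` to the last top-level element of the `html` fragment."""
   try:
      from lxml.html import fragment_fromstring, fragments_fromstring, tostring
      from lxml.etree import ParserError
   except ImportError:
      raise ImportError("Unable to find lxml")

   # Parse the element to insert; fall back to an empty span if it is invalid.
   try:
      item = fragment_fromstring(element)
   except (ParserError, TypeError):
      item = fragment_fromstring('<span></span>')

   try:
      doc = fragments_fromstring(html)
      doc[-1].append(item)

      # encoding='unicode' makes tostring() return text instead of bytes.
      return ''.join(tostring(e, encoding='unicode') for e in doc)
   except (ParserError, TypeError, IndexError, AttributeError):
      # Nothing to append to (empty or text-only input), or the HTML failed to parse.
      return ''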

Seeing that the same need exists in Pelican, I added the functionality to my fork, and submitted a pull request here.

I hope this is useful to you. If you have suggestions for improvement, please do not hesitate to let me know.

UPDATE: On the recommendation of Alexis Metaireau, I reimplemented this feature as a plugin and submitted another pull request to the pelican-plugins project here.