Inline Read More Link in Python Using lxml

on January 06, 2013 in technology

When displaying large HTML content, we need a way to generate a summary of it along with a “read more” or “continue” link to the full article.

Generating the summary is mostly a matter of truncating the HTML content at a defined maximum word count, closing any tags left open, and appending the convenient ellipsis (…).

For instance:

Full HTML:
<p>paragraph 1</p><p>paragraph 2</p><p>paragraph 3</p><p>paragraph 4</p>

Truncated HTML (at most four words):
<p>paragraph 1</p><p>paragraph 2...</p>

The problem is that simply appending a “read more” link to <p>paragraph 1</p><p>paragraph 2...</p> puts the link outside of the HTML content, after the closing </p> of the last paragraph. Because <p> is a block-level element, this results in a really frustrating line break before the link. In my opinion, the line break is disruptive and not as aesthetically pleasing as keeping the link inline with the ellipsis.

What we really want:

Instead of:
<p>paragraph 1</p><p>paragraph 2...</p><a href="/read-more/">read more</a>

We want:
<p>paragraph 1</p><p>paragraph 2...<a href="/read-more/">read more</a></p>

I ran into this pet peeve a while back when I was trying to add an inline “answer” link to my question-and-answer jokes for officecheese.com. Here is the code I wrote to address it:

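def insert_into_last_element(html, element):
    """Insert the HTML string `element` inside the last top-level element of `html`."""
    # Import lazily so callers without lxml get a clear error message.
    try:
        from lxml.html import fragment_fromstring, fragments_fromstring, tostring
        from lxml.etree import ParserError
    except ImportError:
        raise ImportError("Unable to find lxml")

    # Parse the element to insert; fall back to an empty <span> if it is invalid.
    try:
        item = fragment_fromstring(element)
    except (ParserError, TypeError):
        item = fragment_fromstring('<span></span>')

    # Parse html into its top-level elements. The first item may be a plain
    # string when the html starts with bare text.
    try:
        doc = fragments_fromstring(html)
    except (ParserError, TypeError):
        return ''
    if not doc or isinstance(doc[-1], str):
        return ''  # there is no element to insert into

    # Make the item the last child of the last element, so it appears inside
    # that element right after its existing content (e.g. after the ellipsis).
    doc[-1].append(item)

    return ''.join(e if isinstance(e, str) else tostring(e, encoding='unicode')
                   for e in doc)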

Seeing that the same need exists in Pelican, I added the functionality to my fork and submitted a pull request here.

I hope this is useful to you. If you have suggestions for improvement, please don’t hesitate to let me know.

UPDATE: Per Alexis Metaireau’s recommendation, I reimplemented this feature as a plugin and submitted a new pull request to the pelican-plugins project here.