TIL - Python's re.sub takes a replacement function

While I'm not a fan of regular expressions, they sometimes fit a problem quite well. However, when the problem evolves just a bit in the wrong direction, a previously simple regex solution can turn into a monstrosity or become completely unfeasible.

Today, I wanted to solve a simple problem where a regular expression seemed fine:

Given a string (containing HTML-ish elements), convert all <Calendar attr="value"> elements into {% component "Calendar" attr="value" %}.

This seemed simple enough for a regular-expression substitution, so I came up with the following code:

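import re

# Named "text" rather than "input" to avoid shadowing the built-in.
text = '<Calendar attr="value">'
pattern = r"<Calendar([^>]*)>"
# Raw string: in a normal string literal, "\1" is the character chr(1), not a backreference.
replacement = r'{% component "Calendar"\1 %}'

output = re.sub(pattern, replacement, text)
assert output == '{% component "Calendar" attr="value" %}'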

After I got this first version to work, I wanted to test it on real input. And sure enough, it failed as soon as newlines were introduced. To be more precise, the regular expression substitution still worked fine, but it turns out that the output must not contain newlines, since that would break the Django template language.
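
To make the failure concrete, here is a minimal sketch with a made-up multi-line tag (the attribute names are hypothetical, not taken from my real input). The character class [^>] also matches newlines, so the substitution succeeds but copies them straight into the output:

```python
import re

# Hypothetical multi-line input, invented for illustration.
source = '<Calendar\n  attr="value"\n  other="x">'
pattern = r"<Calendar([^>]*)>"
replacement = r'{% component "Calendar"\1 %}'

output = re.sub(pattern, replacement, source)

# The match works, but the captured newlines end up inside the template tag.
assert "\n" in output
print(repr(output))
```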

So I needed a way to remove newlines from the matched group \1. After almost giving up on supporting this use case, I found out that the replacement argument to re.sub can be a function, which is called with the match object for each match and returns the replacement string. So, how can we remove newlines from the first matched group? Easy:

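import re

# The input now contains a newline, so the example exercises the fix.
text = '<Calendar\nattr="value">'
pattern = r"<Calendar([^>]*)>"

def replacement(m):
    # m is the re.Match object; group(1) holds the attributes captured by ([^>]*).
    args = m.group(1).replace("\n", " ")
    # Doubled braces produce literal { and } in an f-string.
    return f'{{% component "Calendar"{args} %}}'

output = re.sub(pattern, replacement, text)
assert output == '{% component "Calendar" attr="value" %}'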
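
A possible variation, sketched under my own assumptions rather than something I tested on the real templates: instead of replacing only "\n", collapse every run of whitespace in the captured attributes with split() and join(). The names to_component and convert are hypothetical helpers, and the sample input is made up. Note that this also collapses whitespace inside quoted attribute values, which may or may not be acceptable.

```python
import re

PATTERN = re.compile(r"<Calendar([^>]*)>")

def to_component(match):
    # Collapse newlines, tabs and repeated spaces into single spaces.
    args = " ".join(match.group(1).split())
    if not args:
        return '{% component "Calendar" %}'
    return f'{{% component "Calendar" {args} %}}'

def convert(source):
    # re.sub calls to_component once per match and inserts its return value.
    return PATTERN.sub(to_component, source)

# Hypothetical multi-line input, invented for illustration.
example = '<p>before</p>\n<Calendar\n    attr="value"\n    other="x">'
print(convert(example))
```

Because the function is called once per match, the same approach extends to any per-match cleanup that a plain replacement string cannot express.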

So, maybe this helps somebody. I definitely did not know about this before. Though, after having written these paragraphs, it occurred to me to ask ChatGPT about this particular problem, and sure enough, it gave the exact same answer. So, at least my solution is not wrong. ;)
