PEP 502 – String Interpolation - Extended Discussion
- Author: Mike G. Miller
- Status: Rejected
- Type: Informational
- Created: 10-Aug-2015
- Python-Version: 3.6
Table of Contents
- Abstract
- PEP Status
- Motivation
- Rationale
- Additional Topics
- Acknowledgements
- References
- Copyright
Abstract
PEP 498: Literal String Interpolation, which proposed “formatted strings”, was accepted on September 9th, 2015. Additional background and rationale given during its design phase are detailed below.
To recap that PEP, a string prefix was introduced that marks the string as a template to be rendered. These formatted strings may contain one or more expressions built on the existing syntax of str.format(). The formatted string expands at compile-time into a conventional string format operation, with the given expressions from its text extracted and passed instead as positional arguments.
At runtime, the resulting expressions are evaluated to render a string to given specifications:
>>> location = 'World'
>>> f'Hello, {location} !' # new prefix: f''
'Hello, World !' # interpolated result
Format-strings may be thought of as merely syntactic sugar to simplify traditional calls to str.format().
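As a rough illustration of the equivalence described above, the following sketch compares the formatted string with a hand-written str.format() call that receives the expression as a positional argument. It only shows that both spellings render the same text; it makes no claim about how the interpreter performs the expansion internally.

```python
location = 'World'

# The formatted string from the example above.
interpolated = f'Hello, {location} !'

# Hand-written counterpart: the expression is pulled out of the text
# and passed to str.format() as a positional argument.
expanded = 'Hello, {} !'.format(location)

print(interpolated)
print(interpolated == expanded)
```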
PEP Status
This PEP was rejected based on its using an opinion-based tone rather than a factual one. This PEP was also deemed not critical as PEP 498 was already written and should be the place to house design decision details.
Motivation
Though string formatting and manipulation features are plentiful in Python, one area where it falls short is the lack of a convenient string interpolation syntax. In comparison to other dynamic scripting languages with similar use cases, the amount of code necessary to build similar strings is substantially higher, while at times offering lower readability due to verbosity, dense syntax, or identifier duplication.
These difficulties are described at moderate length in the original post to python-ideas that started the snowball (that became PEP 498) rolling. [1]
Furthermore, replacement of the print statement with the more consistent print function of Python 3 (PEP 3105) has added one additional minor burden, an additional set of parentheses to type and read. Combined with the verbosity of current string formatting solutions, this puts an otherwise simple language at an unfortunate disadvantage to its peers:
echo "Hello, user: $user, id: $id, on host: $hostname" # bash
say "Hello, user: $user, id: $id, on host: $hostname"; # perl
puts "Hello, user: #{user}, id: #{id}, on host: #{hostname}\n" # ruby
# 80 ch -->|
# Python 3, str.format with named parameters
print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(**locals()))
# Python 3, worst case
print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(user=user,
                                                                  id=id,
                                                                  hostname=
                                                                    hostname))
In Python, the formatting and printing of a string with multiple variables in a single line of code of standard width is noticeably harder and more verbose, with indentation exacerbating the issue.
For use cases such as smaller projects, systems programming, shell script replacements, and even one-liners, where message formatting complexity has yet to be encapsulated, this verbosity has likely led a significant number of developers and administrators to choose other languages over the years.
Rationale
Goals
The design goals of format strings are as follows:
- Eliminate need to pass variables manually.
- Eliminate repetition of identifiers and redundant parentheses.
- Reduce awkward syntax, punctuation characters, and visual noise.
- Improve readability and eliminate mismatch errors, by preferring named parameters to positional arguments.
- Avoid need for locals() and globals() usage, instead parsing the given string for named parameters, then passing them automatically. [2] [3]
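A short sketch of the last goal, using two hypothetical helper functions and the sample values from elsewhere in this document. Both render the same text, but the format string names the variables it needs directly instead of unpacking the whole local namespace.

```python
def greet_with_locals(user, id, hostname):
    # Older idiom: hand every local name to str.format().
    return 'Hello, user: {user}, id: {id}, on host: {hostname}'.format(**locals())


def greet_with_prefix(user, id, hostname):
    # Format string: only the names that appear in the text are used.
    return f'Hello, user: {user}, id: {id}, on host: {hostname}'


print(greet_with_locals('nobody', 9, 'darkstar'))
print(greet_with_prefix('nobody', 9, 'darkstar'))
```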
Limitations
In contrast to other languages that take design cues from Unix and its shells, and in common with Javascript, Python specified both single (') and double (") ASCII quote characters to enclose strings. It is not reasonable to choose one of them now to enable interpolation, while leaving the other for uninterpolated strings. Other characters, such as the “Backtick” (or grave accent `) are also constrained by history as a shortcut for repr().
This leaves a few remaining options for the design of such a feature:
- An operator, as in printf-style string formatting via %.
- A class, such as string.Template().
- A method or function, such as str.format().
- New syntax, or
- A new string prefix marker, such as the well-known r'' or u''.
The first three options above are mature. Each has specific use cases and drawbacks, yet each also suffers from the verbosity and visual noise mentioned previously. All options are discussed in the next sections.
Background
Formatted strings build on several existing techniques and proposals and what we’ve collectively learned from them. In keeping with the design goals of readability and error-prevention, the following examples therefore use named, not positional arguments.
Let’s assume we have the following dictionary, and would like to print out its items as an informative string for end users:
>>> params = {'user': 'nobody', 'id': 9, 'hostname': 'darkstar'}
Printf-style formatting, via operator
This venerable technique continues to have its uses, such as with byte-based protocols, simplicity in simple cases, and familiarity to many programmers:
>>> 'Hello, user: %(user)s, id: %(id)s, on host: %(hostname)s' % params
'Hello, user: nobody, id: 9, on host: darkstar'
In this form, considering the prerequisite dictionary creation, the technique is verbose, a tad noisy, yet relatively readable. Additional issues are that an operator can only take one argument besides the original string, meaning multiple parameters must be passed in a tuple or dictionary. Also, it is relatively easy to make an error in the number of arguments passed, the expected type, have a missing key, or forget the trailing type, e.g. (s or d).
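The error cases listed above can be reproduced with a small sketch. The helper try_format() is hypothetical and exists only to report which exception each mistake raises.

```python
params = {'user': 'nobody', 'id': 9, 'hostname': 'darkstar'}


def try_format(template, args):
    """Return the rendered string, or the name of the exception raised."""
    try:
        return template % args
    except (KeyError, TypeError, ValueError) as exc:
        return type(exc).__name__


print(try_format('user: %(user)s', params))       # renders normally
print(try_format('user: %(usr)s', params))        # missing key
print(try_format('%s is on %s', ('nobody',)))     # too few arguments
print(try_format('id: %(id)d', {'id': 'nine'}))   # wrong type for %d
print(try_format('user: %(user)', params))        # trailing type forgotten
```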
string.Template Class
The string.Template class from PEP 292 (Simpler String Substitutions) is a purposely simplified design, using familiar shell interpolation syntax, with a safe-substitution feature, that finds its main use cases in shell and internationalization tools:
Template('Hello, user: $user, id: ${id}, on host: $hostname').substitute(params)
While also verbose, the string itself is readable. Though functionality is limited, it meets its requirements well. It isn’t powerful enough for many cases, and that helps keep inexperienced users out of trouble, as well as avoiding issues with moderately-trusted input (i18n) from third-parties. It unfortunately takes enough code to discourage its use for ad-hoc string interpolation, unless encapsulated in a convenience library such as flufl.i18n.
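The safe-substitution feature mentioned above can be sketched as follows, reusing the sample values from the Background section. substitute() insists on every placeholder being supplied, while safe_substitute() leaves unknown placeholders untouched.

```python
from string import Template

params = {'user': 'nobody', 'id': 9, 'hostname': 'darkstar'}
greeting = Template('Hello, user: $user, id: ${id}, on host: $hostname')

# All placeholders supplied: substitute() renders the full string.
full = greeting.substitute(params)

# One placeholder missing: substitute() raises KeyError ...
try:
    greeting.substitute(user='nobody', id=9)
    missing = None
except KeyError as exc:
    missing = exc.args[0]

# ... while safe_substitute() leaves the unknown placeholder in place.
partial = greeting.safe_substitute(user='nobody', id=9)

print(full)
print(missing)
print(partial)
```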
PEP 215 - String Interpolation
PEP 215 was a former proposal with which this one shares a lot in common. Apparently, the world was not ready for it at the time, but considering recent support in a number of other languages, its day may have come.
The large number of dollar sign ($) characters it included may have led it to resemble Python’s arch-nemesis Perl, and likely contributed to the PEP’s lack of acceptance. It was superseded by the following proposal.
str.format() Method
The str.format() syntax of PEP 3101 is the most recent and modern of the existing options. It is also more powerful and usually easier to read than the others. It avoids many of the drawbacks and limits of the previous techniques.
However, due to its necessary function call and parameter passing, it runs from verbose to very verbose in various situations with string literals:
>>> 'Hello, user: {user}, id: {id}, on host: {hostname}'.format(**params)
'Hello, user: nobody, id: 9, on host: darkstar'
# when using keyword args, var name shortening sometimes needed to fit :/
>>> 'Hello, user: {user}, id: {id}, on host: {host}'.format(user=user,
                                                            id=id,
                                                            host=hostname)
'Hello, user: nobody, id: 9, on host: darkstar'
The verbosity of the method-based approach is illustrated here.
PEP 498 – Literal String Formatting
PEP 498 defines and discusses format strings, as also described in the Abstract above.
It also, somewhat controversially to those first exposed, introduces the idea that format-strings shall be augmented with support for arbitrary expressions. This is discussed further in the Restricting Syntax section under Rejected Ideas.
PEP 501 – Translation ready string interpolation
The complementary PEP 501 brings internationalization into the discussion as a first-class concern, with its proposal of the i-prefix, string.Template syntax integration compatible with ES6 (Javascript), deferred rendering, and an object return value.
Implementations in Other Languages
String interpolation is now well supported by various programming languages used in multiple industries, and is converging into a standard of sorts. It is centered around str.format() style syntax in minor variations, with the addition of arbitrary expressions to expand utility.
In the Motivation section it was shown how convenient interpolation syntax existed in Bash, Perl, and Ruby. Let’s take a look at their expression support.
Bash
Bash supports a number of arbitrary, even recursive constructs inside strings:
> echo "user: $USER, id: $((id + 6)) on host: $(echo is $(hostname))"
user: nobody, id: 15 on host: is darkstar
Perl
Perl also has arbitrary expression constructs, perhaps not as well known:
say "I have @{[$id + 6]} guanacos."; # lists
say "I have ${\($id + 6)} guanacos."; # scalars
say "Hello { @names.join(', ') } how are you?"; # Perl 6 version
Ruby
Ruby allows arbitrary expressions in its interpolated strings:
puts "One plus one is two: #{1 + 1}\n"
Others
Let’s look at some less-similar modern languages recently implementing string interpolation.
Scala
Scala interpolation is directed through string prefixes. Each prefix has a different result:
s"Hello, $name ${1 + 1}" # arbitrary
f"$name%s is $height%2.2f meters tall" # printf-style
raw"a\nb" # raw, like r''
These prefixes may also be implemented by the user, by extending Scala’s StringContext class.
- Explicit interpolation within double quotes with literal prefix.
- User implemented prefixes supported.
- Arbitrary expressions are supported.
ES6 (Javascript)
Designers of Template strings faced the same issue as Python where single and double quotes were taken. Unlike Python however, “backticks” were not. Despite their issues, they were chosen as part of the ECMAScript 2015 (ES6) standard:
console.log(`Fifteen is ${a + b} and\nnot ${2 * a + b}.`);
Custom prefixes are also supported by implementing a function with the same name as the tag:
function tag(strings, ...values) {
    console.log(strings.raw[0]);    // raw string is also available
    return "Bazinga!";
}
tag`Hello ${ a + b } world ${ a * b}`;
- Explicit interpolation within backticks.
- User implemented prefixes supported.
- Arbitrary expressions are supported.
C#, Version 6
C# has a useful new interpolation feature as well, with some ability to customize interpolation via the IFormattable interface:
$"{person.Name, 20} is {person.Age:D3} year{(person.Age == 1 ? "" : "s")} old.";
- Explicit interpolation with double quotes and $ prefix.
- Custom interpolations are available.
- Arbitrary expressions are supported.
Apple’s Swift
Arbitrary interpolation under Swift is available on all strings:
let multiplier = 3
let message = "\(multiplier) times 2.5 is \(Double(multiplier) * 2.5)"
// message is "3 times 2.5 is 7.5"
- Implicit interpolation with double quotes.
- Arbitrary expressions are supported.
- Cannot contain CR/LF.
Additional examples
A number of additional examples of string interpolation may be found at Wikipedia.
Now that background and history have been covered, let’s continue on for a solution.
New Syntax
This should be an option of last resort, as every new syntax feature has a cost in terms of real-estate in a brain it inhabits. There is however one alternative left on our list of possibilities, which follows.
New String Prefix
Given the history of string formatting in Python and backwards-compatibility, implementations in other languages, and avoidance of new syntax unless necessary, an acceptable design is reached through elimination rather than unique insight. Therefore, marking interpolated string literals with a string prefix is chosen.
We also choose an expression syntax that reuses and builds on the strongest of the existing choices, str.format(), to avoid further duplication of functionality:
>>> location = 'World'
>>> f'Hello, {location} !' # new prefix: f''
'Hello, World !' # interpolated result
PEP 498 – Literal String Formatting, delves into the mechanics and implementation of this design.
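To illustrate what reusing the str.format() syntax means in practice, here is a small sketch with made-up values. The conversion after `!` and the format specification after `:` are written the same way in both spellings.

```python
price = 3.14159
name = 'World'

# Conversions ('!r') and format specifications (':8.2f') carry over
# from str.format() unchanged.
from_prefix = f'{name!r} costs {price:8.2f}'
from_method = '{!r} costs {:8.2f}'.format(name, price)

print(from_prefix)
print(from_prefix == from_method)
```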
Additional Topics
Safety
In this section we will describe the safety situation and precautions taken in support of format-strings.
- Only string literals have been considered for format-strings, not variables to be taken as input or passed around, making external attacks difficult to accomplish. str.format() and alternatives already handle this use-case.
- Neither locals() nor globals() are necessary nor used during the transformation, avoiding leakage of information.
- To eliminate complexity as well as RuntimeError(s) due to recursion depth, recursive interpolation is not supported.
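The first and last points can be sketched with a few lines. The variable names and values are hypothetical; the point is that text held in a variable is never treated as a format string, and that an interpolated value is not scanned for further expressions.

```python
secret = 'hunter2'

# Text arriving at runtime (simulated here with a plain variable) is an
# ordinary str. It carries no f prefix, so nothing in it is evaluated.
template_text = 'Hello, {secret}'

# Interpolating that value inserts it verbatim; the braces it contains
# are not interpolated a second time.
rendered = f'Message: {template_text}'

# Rendering such text remains an explicit, separate step via str.format().
explicit = template_text.format(secret=secret)

print(rendered)
print(explicit)
```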
However, mistakes or malicious code could be missed inside string literals. Though that can be said of code in general, that these expressions are inside strings means they are a bit more likely to be obscured.
Mitigation via Tools
The idea is that tools or linters such as pyflakes, pylint, or Pycharm, may check inside strings with expressions and mark them up appropriately. As this is a common task with programming languages today, multi-language tools won’t have to implement this feature solely for Python, significantly shortening time to implementation.
Farther in the future, strings might also be checked for constructs that exceed the safety policy of a project.
Style Guide/Precautions
As arbitrary expressions may accomplish anything a Python expression is able to, it is highly recommended to avoid constructs inside format-strings that could cause side effects.
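A minimal sketch of the kind of construct this recommendation is aimed at, using a hypothetical list named queue. The first form hides a mutation inside the string; the second performs it in a separate statement and interpolates a plain name.

```python
queue = ['first', 'second', 'third']

# Discouraged: the expression mutates `queue` whenever the string is built.
message = f'Processing {queue.pop(0)}'

# Preferred: perform the side effect in its own statement, then
# interpolate a plain name.
current = queue.pop(0)
clearer = f'Processing {current}'

print(message)
print(clearer)
print(queue)
```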
Further guidelines may be written once usage patterns and true problems are known.
Reference Implementation(s)
The say module on PyPI implements string interpolation as described here with the small burden of a callable interface:
> pip install say
from say import say
nums = list(range(4))
say("Nums has {len(nums)} items: {nums}")
A Python implementation of Ruby interpolation is also available. It uses the codecs module to do its work:
> pip install interpy
# coding: interpy
location = 'World'
print("Hello #{location}.")
Backwards Compatibility
By using existing syntax and avoiding current or historical features, format strings were designed so as to not interfere with existing code and are not expected to cause any issues.
Postponed Ideas
Internationalization
Though it was highly desired to integrate internationalization support (see PEP 501), the finer details diverge at almost every point, making a common solution unlikely: [15]
- Use-cases differ
- Compile vs. run-time tasks
- Interpolation syntax needs
- Intended audience
- Security policy
Rejected Ideas
Restricting Syntax to str.format() Only
The common arguments against support of arbitrary expressions were:
- YAGNI, “You aren’t gonna need it.”
- The feature is not congruent with historical Python conservatism.
- Postpone - can implement in a future version if need is demonstrated.
Support of only str.format() syntax however, was deemed not enough of a solution to the problem. Often a simple length or increment of an object, for example, is desired before printing.
It can be seen in the Implementations in Other Languages section that the developer community at large tends to agree. String interpolation with arbitrary expressions is becoming an industry standard in modern languages due to its utility.
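The length and increment cases mentioned above might look like this, with hypothetical names nums and user_id. With str.format() syntax alone, the computed values have to be passed in separately; with arbitrary expressions they sit where they are used.

```python
nums = list(range(4))
user_id = 9

# str.format() syntax alone: computed values are passed in separately.
with_method = 'Nums has {count} items, next id: {next_id}'.format(
    count=len(nums), next_id=user_id + 1)

# Arbitrary expressions: the length and increment appear inline.
with_prefix = f'Nums has {len(nums)} items, next id: {user_id + 1}'

print(with_method)
print(with_prefix)
```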
Additional/Custom String-Prefixes
As seen in the Implementations in Other Languages section, many modern languages have extensible string prefixes with a common interface. This could be a way to generalize and reduce lines of code in common situations. Examples are found in ES6 (Javascript), Scala, Nim, and C# (to a lesser extent). This was rejected by the BDFL. [14]
Automated Escaping of Input Variables
While helpful in some cases, this was thought to create too much uncertainty of when and where string expressions could be used safely or not. The concept was also difficult to describe to others. [12]
Always consider format string variables to be unescaped, unless the developer has explicitly escaped them.
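As a sketch of what explicit escaping looks like, assuming HTML output and the standard library's html.escape() as the escaping function (the right function depends on the output context):

```python
import html

comment = '<script>alert("hi")</script>'

# Interpolation inserts the value as-is; nothing is escaped automatically.
unescaped = f'<p>{comment}</p>'

# Escaping is an explicit step taken by the developer.
escaped = f'<p>{html.escape(comment)}</p>'

print(unescaped)
print(escaped)
```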
Environment Access and Command Substitution
For systems programming and shell-script replacements, it would be useful to handle environment variables and capture output of commands directly in an expression string. This was rejected as not important enough, and looking too much like bash/perl, which could encourage bad habits. [13]
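Without dedicated syntax, environment access remains possible as an ordinary expression. The sketch below assumes a hypothetical variable DEMO_HOST, which is set inside the example only so that the result is repeatable.

```python
import os

# DEMO_HOST is a hypothetical variable, set here for a repeatable example.
os.environ['DEMO_HOST'] = 'darkstar'

# No special syntax is involved; the lookup is a normal Python expression.
message = f"on host: {os.environ.get('DEMO_HOST', 'unknown')}"

print(message)
```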
Acknowledgements
- Eric V. Smith for the authoring and implementation of PEP 498.
- Everyone on the python-ideas mailing list for rejecting the various crazy ideas that came up, helping to keep the final design in focus.
References
Copyright
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/pep-0502.txt
Last modified: 2022-01-21 11:03:51 GMT