PEP 452 – API for Cryptographic Hash Functions v2.0
- Author:
- A.M. Kuchling <amk at amk.ca>, Christian Heimes <christian at python.org>
- Status:
- Final
- Type:
- Informational
- Created:
- 15-Aug-2013
- Post-History:
- Replaces:
- 247
Abstract
There are several different modules available that implement cryptographic hashing algorithms such as MD5 or SHA. This document specifies a standard API for such algorithms, to make it easier to switch between different implementations.
Specification
All hashing modules should present the same interface. Additional methods or variables can be added, but those described in this document should always be present.
Hash function modules define one function:
new([string]) (unkeyed hashes)
new(key, [string], [digestmod]) (keyed hashes)
- Create a new hashing object and return it. The first form is
for hashes that are unkeyed, such as MD5 or SHA. For keyed
hashes such as HMAC, ‘key’ is a required parameter containing
a string giving the key to use. In both cases, the optional
‘string’ parameter, if supplied, will be immediately hashed
into the object’s starting state, as if
obj.update(string)
was called.After creating a hashing object, arbitrary bytes can be fed into the object using its
update()
method, and the hash value can be obtained at any time by calling the object’sdigest()
method.Although the parameter is called ‘string’, hashing objects operate on 8-bit data only. Both ‘key’ and ‘string’ must be a bytes-like object (bytes, bytearray…). A hashing object may support one-dimensional, contiguous buffers as argument, too. Text (unicode) is no longer supported in Python 3.x. Python 2.x implementations may take ASCII-only unicode as argument, but portable code should not rely on the feature.
Arbitrary additional keyword arguments can be added to this function, but if they’re not supplied, sensible default values should be used. For example, ‘rounds’ and ‘digest_size’ keywords could be added for a hash function which supports a variable number of rounds and several different output sizes, and they should default to values believed to be secure.
Hash function modules define one variable:
digest_size
- An integer value; the size of the digest produced by the hashing objects created by this module, measured in bytes. You could also obtain this value by creating a sample object and accessing its ‘digest_size’ attribute, but it can be convenient to have this value available from the module. Hashes with a variable output size will set this variable to None.
Hashing objects require the following attribute:
digest_size
- This attribute is identical to the module-level digest_size
variable, measuring the size of the digest produced by the
hashing object, measured in bytes. If the hash has a variable
output size, this output size must be chosen when the hashing
object is created, and this attribute must contain the
selected size. Therefore,
None
is not a legal value for this attribute. block_size
- An integer value or
NotImplemented
; the internal block size of the hash algorithm in bytes. The block size is used by the HMAC module to pad the secret key todigest_size
or to hash the secret key if it is longer thandigest_size
. If no HMAC algorithm is standardized for the hash algorithm, returnNotImplemented
instead. name
- A text string value; the canonical, lowercase name of the hashing
algorithm. The name should be a suitable parameter for
hashlib.new
.
Hashing objects require the following methods:
copy()
- Return a separate copy of this hashing object. An update to this copy won’t affect the original object.
digest()
- Return the hash value of this hashing object as a bytes containing 8-bit data. The object is not altered in any way by this function; you can continue updating the object after calling this function.
hexdigest()
- Return the hash value of this hashing object as a string
containing hexadecimal digits. Lowercase letters should be used
for the digits ‘a’ through ‘f’. Like the
.digest()
method, this method mustn’t alter the object. update(string)
- Hash bytes-like ‘string’ into the current state of the hashing
object.
update()
can be called any number of times during a hashing object’s lifetime.
Hashing modules can define additional module-level functions or object methods and still be compliant with this specification.
Here’s an example, using a module named ‘MD5’:
>>> import hashlib
>>> from Crypto.Hash import MD5
>>> m = MD5.new()
>>> isinstance(m, hashlib.CryptoHash)
True
>>> m.name
'md5'
>>> m.digest_size
16
>>> m.block_size
64
>>> m.update(b'abc')
>>> m.digest()
b'\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr'
>>> m.hexdigest()
'900150983cd24fb0d6963f7d28e17f72'
>>> MD5.new(b'abc').digest()
b'\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr'
Rationale
The digest size is measured in bytes, not bits, even though hash algorithm sizes are usually quoted in bits; MD5 is a 128-bit algorithm and not a 16-byte one, for example. This is because, in the sample code I looked at, the length in bytes is often needed (to seek ahead or behind in a file; to compute the length of an output string) while the length in bits is rarely used. Therefore, the burden will fall on the few people actually needing the size in bits, who will have to multiply digest_size by 8.
It’s been suggested that the update()
method would be better named
append()
. However, that method is really causing the current
state of the hashing object to be updated, and update()
is already
used by the md5 and sha modules included with Python, so it seems
simplest to leave the name update()
alone.
The order of the constructor’s arguments for keyed hashes was a sticky issue. It wasn’t clear whether the key should come first or second. It’s a required parameter, and the usual convention is to place required parameters first, but that also means that the ‘string’ parameter moves from the first position to the second. It would be possible to get confused and pass a single argument to a keyed hash, thinking that you’re passing an initial string to an unkeyed hash, but it doesn’t seem worth making the interface for keyed hashes more obscure to avoid this potential error.
Changes from Version 1.0 to Version 2.0
Version 2.0 of API for Cryptographic Hash Functions clarifies some aspects of the API and brings it up-to-date. It also formalized aspects that were already de facto standards and provided by most implementations.
Version 2.0 introduces the following new attributes:
name
- The name property was made mandatory by issue 18532.
block_size
- The new version also specifies that the return value
NotImplemented
prevents HMAC support.
Version 2.0 takes the separation of binary and text data in Python
3.0 into account. The ‘string’ argument to new()
and update()
as
well as the ‘key’ argument must be bytes-like objects. On Python
2.x a hashing object may also support ASCII-only unicode. The actual
name of argument is not changed as it is part of the public API.
Code may depend on the fact that the argument is called ‘string’.
Recommended names for common hashing algorithms
algorithm | variant | recommended name |
---|---|---|
MD5 | md5 | |
RIPEMD-160 | ripemd160 | |
SHA-1 | sha1 | |
SHA-2 | SHA-224 | sha224 |
SHA-256 | sha256 | |
SHA-384 | sha384 | |
SHA-512 | sha512 | |
SHA-3 | SHA-3-224 | sha3_224 |
SHA-3-256 | sha3_256 | |
SHA-3-384 | sha3_384 | |
SHA-3-512 | sha3_512 | |
WHIRLPOOL | whirlpool |
Changes
- 2001-09-17: Renamed
clear()
toreset()
; addeddigest_size
attribute to objects; added.hexdigest()
method. - 2001-09-20: Removed
reset()
method completely. - 2001-09-28: Set
digest_size
toNone
for variable-size hashes. - 2013-08-15: Added
block_size
andname
attributes; clarified that ‘string’ actually refers to bytes-like objects.
Acknowledgements
Thanks to Aahz, Andrew Archibald, Rich Salz, Itamar Shtull-Trauring, and the readers of the python-crypto list for their comments on this PEP.
Copyright
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/pep-0452.txt
Last modified: 2021-09-17 18:18:24 GMT