Duck typing is quackery¶
Apologies for the pun, and to Phil Haack, who beat me to it.
What is duck typing?¶
Duck typing is a dynamic type system, popularized by Python and Ruby, that applies the duck test:
If it looks like a duck, swims like a duck, and quacks like a duck, then it is probably a duck.
In programming, the duck test becomes:
If it has no-arg functions
swim
andquack
, then it is probably a duck.
The claimed benefits are (a) not having to worry about types, and (b) decreased coupling because interfaces are implied. But the downsides are substantial. I want to focus on two downsides that seem seldom discussed.
- Listing types provide structured, easy-to-glean documentation, a practice that duck typing countermands.
- Because users can pass instances of any types, tests must cover arbitrary inputs.
I’ll discuss these through a real example.
A matrix algorithm¶
You devised some clever matrix algorithm that operates on two matrices and outputs a third. After putting your paper on arXiv, you implement it in three languages—Java, Scala, and Python.
import org.nd4j.linalg.api.ndarray.IndArray;
public class MyAlgorithm {
public Matrix calculate(IndArray matrixA, IndArray matrixB) {
// some fancy things...
return finalMatrix;
}
}
import breeze.linalg.Matrix;
class MyAlgorithm {
def calculate(matrixA: Matrix, matrixB: Matrix): Matrix = {
// some fancy things...
finalMatrix
}
class MyAlgorithm:
def calculate(self, matrix_a, matrix_b):
# some fancy things...
return final_matrix
In Java, the method takes two IndArray
instances and outputs a third. If people read your paper, it’s clear how to use it. In Scala, the function takes two Breeze Matrices and outputs a matrix.
For the Python code, you realize that matrix_a
isn’t clear to users. You don’t want them passing in lists of lists, so you add some documentation:
class MyAlgorithm:
def calculate(self, matrix_a, matrix_b):
"""
Calculates on matrices a and b.
Arguments:
matrix_a: A matrix, like Numpy’s (i.e., not a list of lists)
matrix_b: A second matrix
Returns:
A matrix
"""
# some fancy things...
That looks Pythonic. Better to ask for forgiveness than permission.
Then, you write some tests. In fact, you test your code with Numpy ndarray
, PySpark matrices, and Polars DataFrames. That’s three times the work of your Java and Scala implementations, but at least you’re safe and sound.
A year later, a PhD student in a dark room finds your lovely Python library. It’s got 450 GitHub stars, so it must be good. She incorporates it into her own code, which she’s already written tests for. She’s not testing your code—nor should she. She assumes it works. She gets an answer—a reasonable-looking matrix, and it gets published. Of course, it’s wrong.
What happened? Somewhere in calculate
is this code:
def calculate(self, matrix_a, matrix_b):
# some fancy things
final_matrix = matrix_a_prime * matrix_b_prime
return final_matrix
You assumed element-wise multiplication because thats what Numpy ndarray
does. The matrix library she used defines __mul__
as matrix multiplication. That’s not her fault. Maybe you can improve the documentation:
def calculate(matrix_a, matrix_b):
"""
Calculates on matrices a and b.
Arguments:
matrix_a: A matrix; must define the following methods:
`__mul__`: Matrix multiplication
`__add__`: Elementwise addition
`__len__`: Number of rows
`__item__`: Takes a tuple (row, column) and returns the value
...
matrix_b: A second matrix with the same methods
Returns:
A matrix
"""
# some fancy things...
#
return matrix_a * matrix_b
Ok, it got out of hand. We’ve made a lot of work for ourselves writing docs and tests, made an unreadable mess of documentation, and still failed to over all possible libraries that follow the documented semantics. But it really wasn’t your fault; you were writing Pythonic code and following duck typing.
In contrast, our Scala code was easy to test, and it’s easy to understand how to use at a glance.
import breeze.linalg.Matrix;
class MyAlgorithm {
def calculate(matrixA: Matrix, matrixB: Matrix): Matrix = {
// some fancy things...
matrixAPrime * matrixBPrime
}
}
When documentation is rendered, it shows that matrixA
and matrixB
are both breeze.linalg.Matrix
. Upon seeing the method signature, a user could read up about how Breeze matrices works. Alternatively, they could skip that – because you, as the library author, are certifying that your code works with Breeze matrices. If they’re using matrices of another library, they can write an adapter.
import breeze.linalg.Matrix;
class GooborToBreezeMatrixAdapter extends Matrix {
def *(other: GooborMatrix): GooborMatrix = this.matrixMultiply(other)
}
Scala structural typing
Scala supports true compile-time structural typing. You could easily rewrite calculate
like this:
type Matrix = {
def rows: Int
def cols: Int
def +(other: Matrix): Matrix
def *(other: Matrix): Matrix
}
class MyAlgorithm {
def calculate(matrixA: Matrix, matrixB: Matrix): Matrix = {
// some fancy things...
matrixAPrime * matrixBPrime
}
}
*
is ambiguous). Example 2: Lets break quack
¶
Let’s consider a much simpler example. A quack
function could behave in at least two ways:
from dataclasses import dataclass
@dataclass()
class Duck:
name: str
quacking: bool = False
def quack(self) -> None:
self.quacking = True # (1)!
- Make this duck quack.
from dataclasses import dataclass
from typing import Self
@dataclass(frozen=True, slots=True)
class Duck:
name: str
quacking: bool = False
def quack(self) -> Self:
return Duck(quacking=True) # (1)!
- Return a new duck that’s quacking.
When using this function, how do you call it? You have two options.
def do_something_with_ducks(duck):
duck.quack()
...
def do_something_with_ducks(duck):
duck = duck.quack()
...
In fact, what really matters is the behavior of the duck, not whether it’s a duck. Knowing (or suspecting) that it’s a duck is not enough; you need to know how the duck-like object behaves. As Alex Martelli (the author of Python in a Nutshell) wrote in an email thread:
Quote
Amen, hallelujah. You don’t really care for IS-A – you really only care for BEHAVES-LIKE-A- […], so, if you do test, this behaviour is what you should be testing for — Alex Martelli