Not so Fast#
Before starting your journey with pysmo, you should have a basic understanding of typing in the Python programming language. More precisely, you should know what type hinting is, and how it is used in conjunction with modern code editors (or other tools that check your code before it is executed).
Tip
Keep in mind that it is not only programming languages that evolve, but also the tools used for writing code. You therefore get the most benefit out of pysmo when you use it together with a modern editor/IDE such as VSCode, PyCharm, Neovim, etc.
Dynamic and Static Typing#
Python is a dynamically typed language. This means that the type (float, str,
etc.) of a variable isn't set until you run code and assign a value to it. This
is convenient, but can produce errors at runtime if you are not careful. This
can be demonstrated with this simple function:
def division(a, b):
    return a / b
We load this function into an interactive Python session and call it with the
arguments 5 and 2. Thus both variables a=5 and b=2 are numbers and we get the
expected result:
$ python -i division.py
>>> division(5, 2)
2.5 # (1)!
>>>
- In Python, dividing two integers with / always creates a float!
In a second run they are set to a="hello" and b="world". They are now strings,
and the code doesn't make much sense anymore...
>>> division("hello", "world")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in division
TypeError: unsupported operand type(s) for /: 'str' and 'str'
>>>
Evidently the function can only be used if the variables a and b are numbers.
To be clear, there is nothing wrong syntactically in this example, but certain
operations are only available to the correct types (which is why a TypeError
was raised at runtime). In order to detect these kinds of issues before running
a program, Python allows adding type annotations to variables. This helps keep
track of what types of input a function accepts, and what kind of output to
expect:
def division(a: float, b: float) -> float: # (1)!
    return a / b

division("a", "b") # <- produces editor warning
division(1, 2) # <- OK
- Besides specifying that a and b are expected to be floats, we are also making it clear that the object returned by the function is a float. This is important if the output of the division function itself is used elsewhere.
These annotations, also known as type hints, are not enforced (i.e. Python will still happily try running that function with strings as arguments). However, besides being a useful form of self-documentation, type hints become very powerful in combination with a modern source code editor, or third-party tools like mypy. Both scan your code and catch type errors before it is executed, making it sort of "quasi statically typed" (1).
- because you can run your code even with type errors!
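To make the "not enforced" point concrete, here is a minimal sketch that reuses the annotated division function from above. It suggests that the hints are simply stored as metadata on the function, and that a call with strings still only fails at runtime. The try/except wrapper and the error_message variable are additions made purely for illustration.

```python
def division(a: float, b: float) -> float:
    return a / b


# The type hints are stored on the function object; Python itself
# does not consult them when the function is called.
print(division.__annotations__)

# A type checker would flag this call before the program runs, but the
# interpreter only complains once the division is actually attempted.
try:
    division("hello", "world")
except TypeError as err:
    error_message = str(err)
    print(f"Runtime error: {error_message}")
```

Running a tool such as mypy on this file should report the incompatible arguments without executing any of the code.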
Duck Typing#
At this point one may ask why static typing is not enforced everywhere. Well,
sometimes it is more useful to consider how something behaves, rather than what
it actually is. This is often referred to as duck typing. The same way that
something can be considered a duck if it walks and talks like one, any object
that has all the right attributes and methods expected e.g. by a function, can
also be used as input for that function. The following example defines two
classes for ducks and humans, and a function which runs error free when its
argument is duck-like (it can quack and waddle, rather than strictly being of
type Duck):
class Duck: # (1)!
    def quack(self):
        return "quack, quack!"

    def waddle(self):
        return "waddle, waddle!"


class Human: # (2)!
    def quack(self):
        return "quack, quack!"

    def waddle(self):
        return "waddle, waddle!"


def is_a_duck(thing): # (3)!
    try:
        thing.quack()
        thing.waddle()
        print("I must be a duck!")
    except AttributeError:
        print("I'm unable to walk and talk like a duck.")
- The Duck class has two methods: quack and waddle.
- A human can walk (waddle) and talk (quack) like a duck.
- This function, designed to answer the question of whether or not a thing is a duck, doesn't actually care if the thing is indeed a duck (or not). It merely requires the thing to be able to talk and walk like one. It will determine that anything that is able to quack and waddle is a duck.
We then use these classes in an interactive session, where the is_a_duck
function tells us that donald (correctly) and joe (incorrectly) are both ducks:
$ python -i duck.py
>>> donald = Duck() # (1)!
>>> joe = Human() # (2)!
>>> is_a_duck(donald)
I must be a duck!
>>> is_a_duck(joe)
I must be a duck!
>>>
- Create an instance of Duck called donald.
- Create an instance of Human called joe.
The reason for this is simply that the is_a_duck function doesn't check at all
what it is given as input; as long as the thing object has the methods quack
and waddle it will happily tell us something is a duck. Note that in some
instances this is actually desired behavior.
Duck typing in the wild.
A real-world example where duck typing is used in Python is the built-in len()
function:
>>> my_string = "hello world"
>>> len(my_string) # the len() function works with a string (1)!
11
>>> my_list = [1, 2, 3]
>>> len(my_list) # and with a list (2)!
3
>>> my_int = 42
>>> len(my_int) # but not with an integer (3)!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'int' has no len()
- The len() function works with a string, where it returns the number of characters in the string ...
- ... and with a list, where it returns the number of items in the list.
- But not with an integer.
Behind the scenes, len() doesn't look for valid input types, but rather checks
whether the object it is given as input possesses the __len__() method:
>>> hasattr(my_string,'__len__')
True
>>> hasattr(my_list,'__len__')
True
>>> hasattr(my_int,'__len__')
False
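Because len() only relies on the __len__() method, any class that implements it can take part. The short sketch below illustrates this with a hypothetical Playlist class, which is not part of any library and exists only for this example.

```python
class Playlist:
    """Hypothetical container used only for illustration."""

    def __init__(self, songs: list[str]) -> None:
        self.songs = songs

    def __len__(self) -> int:
        # len() calls this method behind the scenes.
        return len(self.songs)


my_playlist = Playlist(["intro", "verse", "outro"])
print(len(my_playlist))  # 3
print(hasattr(my_playlist, "__len__"))  # True
```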
Note that we haven't annotated is_a_duck() with a type signature, making it
quite fragile. If we were to change Duck or Human in ways that make them
incompatible with the function we wouldn't find out until runtime. To fix this,
we could annotate is_a_duck() like this:
def is_a_duck(thing: Duck | Human) -> None: ...
As far as a type checker is concerned, is_a_duck() now accepts only objects of
type Duck or Human. This is much safer to use, but it is also strongly coupled
to both Duck and Human. If we were to change either of those classes, we might
have to also change the function (or even the other class!). If we wanted to
use a third class with this function some time in the future, we might find
ourselves similarly forced to edit code all over the place to make things work.
Using type hints like this to define a function that works with multiple
different input types quickly reaches its limits. Fortunately, Python has a
solution to this problem: Protocol classes.
Structural subtyping (static duck typing)#
The two strategies (duck vs static typing) may appear somewhat orthogonal. In
cases similar to the len() function they probably are, but what if we want duck
typing with a bit more control? This is indeed possible with a strategy called
structural subtyping.
Revisiting the duck example from before, this time with a new Robot class and
structural subtyping:
from typing import Protocol # (1)!


class Ducklike(Protocol): # (2)!
    def quack(self) -> str: ... # (3)!

    def waddle(self) -> str: ...


class Duck: # (4)!
    def quack(self) -> str:
        return "quack, quack!"

    def waddle(self) -> str:
        return "waddle, waddle!"


class Human: # (5)!
    def quack(self) -> str:
        return "quack, quack!"

    def waddle(self) -> str:
        return "waddle, waddle!"

    def dance(self) -> str:
        return "shaking those hips!"


class Robot: # (6)!
    def quack(self) -> bytes:
        return bytes("beep, quack!", "UTF-8")

    def waddle(self) -> str:
        return "waddle, waddle!"


def is_a_duck(thing: Ducklike) -> None: # (7)!
    try:
        thing.quack()
        thing.waddle()
        print("I must be a duck!")
    except AttributeError:
        print("I'm unable to walk and talk like a duck.")
- We import the Protocol class ...
- ... and use it to define our Ducklike class. This protocol class defines a structure (attributes and methods with their respective types) that can be compared with the structure present in any other class. If those classes have a matching structure, they are considered subclasses (in terms of typing) of the protocol class.
- Ellipses (...) are preferred over pass statements here.
- We add type hints to the otherwise unchanged Duck class. Because it has the same structure as the Ducklike protocol class, it is implicitly considered a subclass of Ducklike.
- The Human class is also a subclass of Ducklike, even though we added a new dance method.
- An advanced robot can also walk and talk like a duck. However, it talks in bytes instead of strings. This means the Robot class is not Ducklike.
- Unlike before, we only add annotations for one class (the Protocol class) to the function. It is no longer coupled to any specific classes. All we are saying is that the function works with things that are Ducklike (i.e. the subclasses of Ducklike: Duck and Human, but not Robot).
Loading this new version into an interactive Python session we get the following:
$ python -i duck_protocol.py
>>> donald = Duck()
>>> joe = Human()
>>> robert = Robot()
>>> is_a_duck(donald)
I must be a duck!
>>> is_a_duck(joe)
I must be a duck! # (1)!
>>> is_a_duck(robert)
I must be a duck! # (2)!
>>>
- As before, donald and joe appear to be ducks.
- Even this prints "I must be a duck!", but mypy or your IDE will mark it as incompatible.
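If a check at runtime is wanted as well, the standard library provides the typing.runtime_checkable decorator, which allows isinstance() to be used with a protocol class. The sketch below is an addition to the example above (the original Ducklike class is not decorated), and Rock is a hypothetical extra class. Note that such a check only verifies that the required methods exist, not that their signatures or return types match, so Robot still passes at runtime.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable  # enables isinstance() checks against the protocol
class Ducklike(Protocol):
    def quack(self) -> str: ...
    def waddle(self) -> str: ...


class Robot:
    def quack(self) -> bytes:  # wrong return type for Ducklike
        return bytes("beep, quack!", "UTF-8")

    def waddle(self) -> str:
        return "waddle, waddle!"


class Rock:  # hypothetical class without any duck-like methods
    pass


# Only the presence of the methods is checked at runtime, so Robot
# passes even though a static type checker would reject it.
print(isinstance(Robot(), Ducklike))  # True
print(isinstance(Rock(), Ducklike))  # False
```

This is why static checking with mypy or an editor remains the more thorough of the two approaches for protocol classes.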
The above example illustrates how Protocol classes are used, but doesn't
explain why they are useful. With regard to pysmo, there are two important
lessons to be learned here:
- The type annotations for the is_a_duck() function tell us it is written with the protocol class Ducklike in mind instead of a particular implementation of a duck class. This decoupling means we can write code using a well defined and consistent interface.
- All attributes and methods in the Protocol class need to be matched by the "real" classes, but not the other way around. The Duck or Human classes may well contain methods like fly, run, eat, sleep, etc. However, they can safely be ignored by is_a_duck().
In isolation the above two points may not appear that significant, but when we put them together the implications are quite substantial. After all, the goal when writing code should always be to make it easy to understand and as reusable as possible. Protocol classes help with exactly that. They are almost always going to be far less complex than a generic class(1). As such they allow us to break up a problem into smaller pieces, and to write, for example, a function that works with a certain protocol rather than one particular class. The protocols define an interface a function can work with. It's as if a contract exists between a class and a function, whereby the class guarantees that the part covered by the contract is never going to change, regardless of what might happen elsewhere in the class. In pysmo, these contracts are the types we will discuss in greater detail later on.
- A generic class is a proper class, which holds data, has methods and attributes, etc. (unlike a Protocol class, which only contains the structure of a class).
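As a rough illustration of this contract idea, the sketch below defines a small protocol and a function written against it. All names used here (HasName, Student, Pet and greet) are hypothetical and chosen only for this example; they are not pysmo types.

```python
from dataclasses import dataclass
from typing import Protocol


class HasName(Protocol):
    """The 'contract': anything with a ``name`` attribute of type str."""

    name: str


@dataclass
class Student:
    name: str
    grade: int  # extra attribute, irrelevant to the contract

    def study(self) -> str:  # extra method, also irrelevant
        return f"{self.name} is studying."


@dataclass
class Pet:
    name: str
    species: str


def greet(thing: HasName) -> str:
    # Only the part covered by the protocol is used here.
    return f"Hello, {thing.name}!"


print(greet(Student(name="Ada", grade=9)))
print(greet(Pet(name="Rex", species="dog")))
```

Either class can gain new attributes or methods without affecting greet(), as long as name remains a str.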
Next steps#
- Learn more about type hinting and how to check your code for type errors using mypy.
- If you aren't already, consider switching to using a code editor that checks your code (not just for typing errors) as you write it.
- Continue onwards to the next chapter and install pysmo!