Tutorial
This tutorial uses a simplified ambient noise scenario as a vehicle for showing how pysmo fits into real code — not as a guide to ambient noise processing itself. Along the way it covers:
- Defining a custom seismogram class for a specific use case.
- Writing functions that operate on it.
- Using pysmo types to make those functions reusable.
Custom seismogram class
Our scenario involves ambient noise data where we want to track whether earthquake signals are present, but have no need for event information. A dataclass fits this well:
```python
from dataclasses import dataclass, field

import numpy as np
import pandas as pd


@dataclass  # (1)!
class NoiseSeismogram:
    begin_time: pd.Timestamp  # (2)!
    delta: pd.Timedelta = pd.Timedelta(seconds=0.01)  # (3)!
    data: np.ndarray = field(default_factory=lambda: np.array([]))  # (4)!

    @property
    def end_time(self) -> pd.Timestamp:  # (5)!
        if len(self.data) == 0:
            return self.begin_time
        return self.begin_time + self.delta * (len(self.data) - 1)

    contains_earthquake: bool = False  # (6)!
```
1. dataclass is a decorator that automatically generates special methods for the class, such as __init__, __repr__, and __eq__, based on the class attributes. This makes it easier to create classes that are primarily used to store data.
2. Instance attributes are defined simply by declaring them in the class body with type annotations. Note the use of pandas.Timestamp here: it is used throughout pysmo as the standard type for time information.
3. Attributes can have default values too.
4. Care must be taken if default values are mutable types (like lists, dictionaries, or NumPy arrays). In such cases, use field(default_factory=...) to ensure that each instance of the class gets its own separate copy of the mutable object.
5. We add a read-only property end_time that computes the end time of the seismogram from the begin time, number of samples, and sampling interval.
6. Finally, we add an attribute that records whether our seismogram contains earthquake signals.
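The advice about mutable defaults is enforced by the dataclasses module itself, which is easy to check. The sketch below is illustrative only: it uses a plain list instead of a NumPy array, and the class names SharedDefault and PerInstanceDefault are made up for the example. From Python 3.11 on, the same check rejects any unhashable default, which includes NumPy arrays.

```python
from dataclasses import dataclass, field


def make_bad_class() -> type:
    # Defining this class fails: dataclasses rejects a bare mutable default.
    @dataclass
    class SharedDefault:
        samples: list = []  # not allowed, raises ValueError at class creation

    return SharedDefault


try:
    make_bad_class()
    rejected = False
except ValueError:
    rejected = True


@dataclass
class PerInstanceDefault:
    # default_factory is called once per instance, so no list is shared.
    samples: list = field(default_factory=list)


a = PerInstanceDefault()
b = PerInstanceDefault()
a.samples.append(1.0)  # only changes a's own list

print(rejected)   # True
print(b.samples)  # []
```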
A real project would have more attributes, but this is enough to demonstrate the pattern. Creating an instance:
```console
$ python -i noise_seismogram.py
>>> begin_time = pd.Timestamp("2023-01-01", tz="UTC")
>>> data = np.random.randn(1000)  # Simulated noise data
>>> noise_seis = NoiseSeismogram(begin_time=begin_time, data=data)
>>>
```
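To check what end_time computes, the sketch below repeats the NoiseSeismogram definition so that it runs on its own, and uses np.zeros instead of random data so the result is deterministic. With 1000 samples there are 999 sampling intervals, so the seismogram ends 999 × 0.01 s = 9.99 s after it begins.

```python
from dataclasses import dataclass, field

import numpy as np
import pandas as pd


@dataclass
class NoiseSeismogram:
    # Same fields and property as the class defined above.
    begin_time: pd.Timestamp
    delta: pd.Timedelta = pd.Timedelta(seconds=0.01)
    data: np.ndarray = field(default_factory=lambda: np.array([]))

    @property
    def end_time(self) -> pd.Timestamp:
        if len(self.data) == 0:
            return self.begin_time
        return self.begin_time + self.delta * (len(self.data) - 1)

    contains_earthquake: bool = False


begin_time = pd.Timestamp("2023-01-01", tz="UTC")
noise_seis = NoiseSeismogram(begin_time=begin_time, data=np.zeros(1000))

# 999 intervals of 0.01 s after the begin time.
print(noise_seis.end_time == pd.Timestamp("2023-01-01 00:00:09.99", tz="UTC"))  # True
print(noise_seis.contains_earthquake)  # False (the default)

# A seismogram without samples ends where it begins.
empty = NoiseSeismogram(begin_time=begin_time)
print(empty.end_time == empty.begin_time)  # True
```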
Key observations
- The dataclass decorator generates __init__, __repr__, and __eq__ automatically, keeping the class focused on what it stores.
- Keeping methods out of the class and writing separate functions instead maintains a clear separation between data storage and processing.
- All attributes are non-optional (for example, contains_earthquake is a bool, not bool | None). Functions that use this class can therefore assume all fields are present and skip defensive None checks.
Functions that operate on the new class
Two functions handle the processing:
- check_for_earthquakes(): checks whether earthquake signals are present.
- detrend(): removes a linear trend from the seismogram data.
A first version, saved as functions_v1.py:
```python
import scipy
from noise_seismogram import NoiseSeismogram


def check_for_earthquakes(seismogram: NoiseSeismogram) -> None:
    if seismogram.contains_earthquake is True:
        print("Seismogram contains an earthquake.")
    elif seismogram.contains_earthquake is False:
        print("Seismogram does not contain an earthquake.")
    else:
        print("Seismogram earthquake status is unknown.")


def detrend(seismogram: NoiseSeismogram) -> None:
    seismogram.data = scipy.signal.detrend(seismogram.data)
```
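With its default type="linear", scipy.signal.detrend subtracts the least-squares straight line from the data. The sketch below reproduces that idea with NumPy alone (np.polyfit) so it runs without SciPy installed; linear_detrend is a hypothetical helper written for illustration, not part of SciPy or pysmo.

```python
import numpy as np


def linear_detrend(data: np.ndarray) -> np.ndarray:
    """Return `data` minus its least-squares straight line (hypothetical helper)."""
    x = np.arange(len(data))
    # Fit data ≈ slope * x + intercept, then subtract the fitted line.
    slope, intercept = np.polyfit(x, data, deg=1)
    return data - (slope * x + intercept)


ramp = 2.0 * np.arange(100) + 5.0  # nothing but a linear trend
residual = linear_detrend(ramp)
print(np.allclose(residual, 0.0))  # True: removing the trend leaves (almost) nothing
```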
The type hints are correct, and mypy confirms it by reporting no issues.
With type checking in place, mypy can also identify unreachable code.
Running it with --warn-unreachable:
```console
$ python -m mypy --warn-unreachable functions_v1.py
functions_v1.py:11: error: Statement is unreachable [unreachable]
Found 1 error in 1 file (checked 1 source file)
```
The else branch is unreachable because contains_earthquake is
non-optional: it can only be True or False, and both cases are already handled.
Dropping the dead branch and turning the elif into a plain else gives:
```python
import scipy
from noise_seismogram import NoiseSeismogram


def check_for_earthquakes(seismogram: NoiseSeismogram) -> None:
    if seismogram.contains_earthquake is True:
        print("Seismogram contains an earthquake.")
    else:  # (1)!
        print("Seismogram does not contain an earthquake.")


def detrend(seismogram: NoiseSeismogram) -> None:
    seismogram.data = scipy.signal.detrend(seismogram.data)
```
1. At this point we know that seismogram.contains_earthquake can only be False, so we don't need an elif check anymore.
Mypy is happy with this version too and reports no issues.
Key observations
- Type hints on both the class and the functions let mypy verify their interaction statically.
- Non-optional attributes remove the need for defensive None checks in functions, which also gives mypy enough information to spot dead code.
- Type checking catches errors before runtime, but Python does not enforce type hints while the code runs. For validation at runtime, consider a library like pydantic.
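The remark about runtime validation can be illustrated without extra dependencies. As a stdlib stand-in for a validation library such as pydantic (this is not how pydantic works internally), a dataclass can check its own fields in __post_init__. The class name CheckedNoiseSeismogram and the particular checks are illustrative assumptions.

```python
from dataclasses import dataclass, field

import numpy as np
import pandas as pd


@dataclass
class CheckedNoiseSeismogram:  # hypothetical variant of NoiseSeismogram
    begin_time: pd.Timestamp
    delta: pd.Timedelta = pd.Timedelta(seconds=0.01)
    data: np.ndarray = field(default_factory=lambda: np.array([]))
    contains_earthquake: bool = False

    def __post_init__(self) -> None:
        # Type hints are not enforced at runtime, so check the values explicitly.
        if not isinstance(self.begin_time, pd.Timestamp):
            raise TypeError("begin_time must be a pandas.Timestamp")
        if not isinstance(self.contains_earthquake, bool):
            raise TypeError("contains_earthquake must be a bool")
        if self.delta <= pd.Timedelta(0):
            raise ValueError("delta must be positive")


ok = CheckedNoiseSeismogram(begin_time=pd.Timestamp("2023-01-01", tz="UTC"))

try:
    # mypy flags this call; without __post_init__ it would succeed at runtime.
    CheckedNoiseSeismogram(
        begin_time=pd.Timestamp("2023-01-01"), contains_earthquake="yes"
    )
except TypeError as err:
    print(err)  # contains_earthquake must be a bool
```

Note that __post_init__ only runs when an instance is created; assigning a wrong value to an attribute later is not checked.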
Reusing functions in other contexts
Comparing the two functions, only check_for_earthquakes() relies on
contains_earthquake — the one attribute specific to this project. The
remaining attributes form a common baseline, suggesting detrend() should
work with other seismogram classes too. To test this, consider a second
project that stores the season alongside seismogram data:
```python
from dataclasses import dataclass, field
from enum import StrEnum

import numpy as np
import pandas as pd


class Season(StrEnum):  # (1)!
    SPRING = "spring"
    SUMMER = "summer"
    AUTUMN = "autumn"
    WINTER = "winter"


@dataclass
class SeasonSeismogram:
    begin_time: pd.Timestamp
    delta: pd.Timedelta = pd.Timedelta(seconds=0.01)
    data: np.ndarray = field(default_factory=lambda: np.array([]))

    @property
    def end_time(self) -> pd.Timestamp:
        if len(self.data) == 0:
            return self.begin_time
        return self.begin_time + self.delta * (len(self.data) - 1)

    season: Season = Season.SUMMER  # (2)!
```
1. StrEnum (available since Python 3.11) limits the values a string attribute can take.
2. Much like with NoiseSeismogram, we have just one project-specific attribute (season).
Mixin classes
Both example classes implement the end_time property in exactly the same
way. With many such classes we would be constantly repeating ourselves. To
avoid this, we can write a mixin class and include it in our class
definitions:
```python
import pandas as pd
from pysmo import Seismogram


class SeismogramEndtimeMixin:
    """Mixin class to add `end_time` property to a `Seismogram` object."""

    @property
    def end_time(self: Seismogram) -> pd.Timestamp:
        """Seismogram end time."""
        if len(self.data) == 0:
            return self.begin_time
        return self.begin_time + self.delta * (len(self.data) - 1)
```
This tiny class can be inherited by both NoiseSeismogram and
SeasonSeismogram, and we can skip writing the end_time property:
```python
@dataclass
class SeasonSeismogram(SeismogramEndtimeMixin):  # (1)!
    begin_time: pd.Timestamp
    delta: pd.Timedelta = pd.Timedelta(seconds=0.01)
    data: np.ndarray = field(default_factory=lambda: np.array([]))
    season: Season = Season.SUMMER
```
1. end_time is inherited from SeismogramEndtimeMixin, so we don't need to write an implementation anymore.
Class inheritance brings complications of its own (for example, the method resolution order decides which base class wins when several define the same attribute), so it is best to keep mixin classes simple, ideally focused on a single task. You can always add several mixins to a class if you need to.
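A self-contained sketch of the pattern: one mixin is shared by two dataclasses, and a second mixin shows how several small ones combine. SeismogramDurationMixin is hypothetical (not part of pysmo), the local _Sampled protocol stands in for pysmo's Seismogram type so the example runs without pysmo installed, and season is a plain string here to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Protocol

import numpy as np
import pandas as pd


class _Sampled(Protocol):
    # Stand-in for pysmo's Seismogram: only the attributes the mixins read.
    begin_time: pd.Timestamp
    delta: pd.Timedelta
    data: np.ndarray


class SeismogramEndtimeMixin:
    @property
    def end_time(self: _Sampled) -> pd.Timestamp:
        if len(self.data) == 0:
            return self.begin_time
        return self.begin_time + self.delta * (len(self.data) - 1)


class SeismogramDurationMixin:  # hypothetical second mixin
    @property
    def duration(self: _Sampled) -> pd.Timedelta:
        return self.delta * max(len(self.data) - 1, 0)


@dataclass
class NoiseSeismogram(SeismogramEndtimeMixin):
    begin_time: pd.Timestamp
    delta: pd.Timedelta = pd.Timedelta(seconds=0.01)
    data: np.ndarray = field(default_factory=lambda: np.array([]))
    contains_earthquake: bool = False


@dataclass
class SeasonSeismogram(SeismogramEndtimeMixin, SeismogramDurationMixin):
    begin_time: pd.Timestamp
    delta: pd.Timedelta = pd.Timedelta(seconds=0.01)
    data: np.ndarray = field(default_factory=lambda: np.array([]))
    season: str = "summer"


t0 = pd.Timestamp("2023-01-01", tz="UTC")
noise = NoiseSeismogram(begin_time=t0, data=np.zeros(101))
season = SeasonSeismogram(begin_time=t0, data=np.zeros(101))

print(noise.end_time == season.end_time)           # True: shared implementation
print(season.duration == pd.Timedelta(seconds=1))  # True: 100 intervals of 0.01 s
```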
Next we write a script, season_detrend_v1.py, that uses this new class
together with the detrend() function from earlier:
```python
import numpy as np
import pandas as pd
from functions_v2 import detrend
from season_seismogram import Season, SeasonSeismogram


def main() -> None:
    # Create a sample SeasonSeismogram instance with random data
    begin_time = pd.Timestamp(2023, 1, 1, 0, 0, 0)
    data = np.random.randn(1000)
    season_seismogram = SeasonSeismogram(
        begin_time=begin_time, data=data, season=Season.WINTER
    )

    # Use the season_seismogram with the detrend function
    detrend(season_seismogram)


if __name__ == "__main__":
    main()
```
This script runs without errors.
But mypy flags a type mismatch — detrend() expects a NoiseSeismogram and
is being passed a SeasonSeismogram:
```console
$ python -m mypy season_detrend_v1.py
season_detrend_v1.py:16: error: Argument 1 to "detrend" has incompatible type "SeasonSeismogram"; expected "NoiseSeismogram" [arg-type]
```
Type annotations stop a function from using attributes that its parameter
type does not declare, but they don't require it to use all of the declared
ones. detrend() only touches data, which both classes happen to share, so the
script ran correctly. We just got lucky this time.
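A function that reaches for a project-specific attribute shows what the type checker is guarding against. The sketch below uses minimal, hypothetical stand-in classes (NoiseLike and SeasonLike, not the full tutorial classes) and a hypothetical report() function modelled on check_for_earthquakes().

```python
from dataclasses import dataclass


@dataclass
class NoiseLike:
    data: list[float]
    contains_earthquake: bool = False


@dataclass
class SeasonLike:
    data: list[float]
    season: str = "summer"


def report(seismogram: NoiseLike) -> str:
    # Uses an attribute that only NoiseLike has.
    return "earthquake" if seismogram.contains_earthquake else "quiet"


print(report(NoiseLike(data=[0.0])))  # quiet
try:
    report(SeasonLike(data=[0.0]))  # mypy reports an [arg-type] error here
except AttributeError as err:
    print(type(err).__name__)  # AttributeError
```

Here the mismatch that mypy reports would also fail at runtime, whereas with detrend() it happened not to.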
To fix this, we need to amend the type annotations of the detrend() function:
```python
import scipy
from noise_seismogram import NoiseSeismogram
from season_seismogram import SeasonSeismogram  # (1)!


def check_for_earthquakes(seismogram: NoiseSeismogram) -> None:
    if seismogram.contains_earthquake is True:
        print("Seismogram contains an earthquake.")
    else:
        print("Seismogram does not contain an earthquake.")


def detrend(seismogram: NoiseSeismogram | SeasonSeismogram) -> None:
    seismogram.data = scipy.signal.detrend(seismogram.data)
```
1. We need to import SeasonSeismogram to be able to use it in our type annotations.
With these changes in place, mypy no longer reports any errors.
Key observations
- We have successfully reused the detrend() function in a different context.
- However, it did require changing the type annotations of the function.
- While the changes were small, making them every time we want to reuse the function is cumbersome.
- The check_for_earthquakes() function is not reusable at all, as it relies on the contains_earthquake attribute that only exists in NoiseSeismogram. Thus we can identify two types of functions: those that are reusable and those that are not. This is also reflected in their respective type annotations.
Introducing pysmo
Writing custom classes for each project and updating shared functions every time a new class is introduced is difficult to maintain. Each new class requires touching function annotations, and changes to any class risk breaking the functions that depend on it. The standard solution is to define an interface between functions and classes: functions target the interface, and classes conform to it.
Pysmo provides such an interface for seismogram (and other) classes. These
interfaces make use of Python's Protocol class. Below is
the actual implementation of pysmo's Seismogram interface:
````python
from typing import Protocol, runtime_checkable

import numpy as np
import pandas as pd


@runtime_checkable
class Seismogram(Protocol):
    """Protocol class to define the `Seismogram` type.

    Examples:
        Usage for a function that takes a Seismogram compatible class instance as
        argument and returns the begin time in isoformat:

        ```python
        >>> from pysmo import Seismogram
        >>> from pysmo.classes import SAC  # SAC is a class that "speaks" Seismogram
        >>>
        >>> def example_function(seis_in: Seismogram) -> str:
        ...     return seis_in.begin_time.isoformat()
        ...
        >>> sac = SAC.from_file("example.sac")
        >>> seismogram = sac.seismogram
        >>> example_function(seismogram)
        '2005-03-01T07:23:02.160000+00:00'
        >>>
        ```
    """

    begin_time: pd.Timestamp
    """Seismogram begin time."""

    data: np.ndarray
    """Seismogram data."""

    delta: pd.Timedelta
    """The sampling interval.

    Should be a positive `pd.Timedelta` instance.
    """

    @property
    def end_time(self) -> pd.Timestamp:
        """Seismogram end time."""
        ...
````
Strip away the docstrings and this looks much like the common structure of
NoiseSeismogram and SeasonSeismogram. The key difference is that end_time
is declared but not implemented — Protocol classes provide
type information only and cannot be instantiated directly.
Note
Python Protocol classes are used almost exclusively in
type annotations. We will therefore refer to the ones shipped with pysmo as
types rather than protocols or interfaces.
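Type checkers treat any class whose structure matches a protocol class as a
subtype of that protocol, without explicit inheritance (structural subtyping).
For our particular case, this means that instances of NoiseSeismogram and
SeasonSeismogram are accepted wherever a Seismogram is expected. Because
Seismogram is decorated with @runtime_checkable, isinstance() also returns True
for them at runtime, although that check only verifies that the attributes
exist, not their types.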
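The sketch below shows these behaviours in isolation. SeismogramLike is a hypothetical stand-in that mirrors the attributes of pysmo's Seismogram type so the example runs without pysmo installed; the SeasonSeismogram here does not inherit from it.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable

import numpy as np
import pandas as pd


@runtime_checkable
class SeismogramLike(Protocol):
    # Hypothetical stand-in mirroring pysmo's Seismogram attributes.
    begin_time: pd.Timestamp
    data: np.ndarray
    delta: pd.Timedelta

    @property
    def end_time(self) -> pd.Timestamp: ...


@dataclass
class SeasonSeismogram:  # note: no base class
    begin_time: pd.Timestamp
    delta: pd.Timedelta = pd.Timedelta(seconds=0.01)
    data: np.ndarray = field(default_factory=lambda: np.array([]))
    season: str = "summer"

    @property
    def end_time(self) -> pd.Timestamp:
        if len(self.data) == 0:
            return self.begin_time
        return self.begin_time + self.delta * (len(self.data) - 1)


seis = SeasonSeismogram(begin_time=pd.Timestamp("2023-01-01", tz="UTC"))

print(isinstance(seis, SeismogramLike))      # True: structure matches
print(isinstance(object(), SeismogramLike))  # False: attributes missing

# Protocol classes cannot be instantiated directly.
try:
    SeismogramLike()
except TypeError:
    print("cannot instantiate a protocol")

# issubclass() is not supported for protocols with data members.
try:
    issubclass(SeasonSeismogram, SeismogramLike)
except TypeError:
    print("issubclass() raises TypeError")
```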
Annotating detrend() with the Seismogram type rather than listing every
class:
```python
import scipy
from noise_seismogram import NoiseSeismogram
from pysmo import Seismogram  # (1)!


def check_for_earthquakes(seismogram: NoiseSeismogram) -> None:
    if seismogram.contains_earthquake is True:
        print("Seismogram contains an earthquake.")
    else:
        print("Seismogram does not contain an earthquake.")


def detrend(seismogram: Seismogram) -> None:  # (2)!
    seismogram.data = scipy.signal.detrend(seismogram.data)
```
1. Instead of importing SeasonSeismogram, we now import Seismogram.
2. Any class that satisfies the Seismogram structure is now accepted, so no further changes to detrend() are needed.
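The same idea can be tried without pysmo or SciPy installed. In the sketch below, HasData is a hypothetical minimal protocol containing only the attribute detrend() touches (pysmo's Seismogram declares more), a NumPy least-squares fit replaces scipy.signal.detrend, and the two classes are trimmed-down versions of the tutorial's classes.

```python
from dataclasses import dataclass, field
from typing import Protocol

import numpy as np


class HasData(Protocol):
    # Hypothetical minimal protocol: only the attribute detrend() uses.
    data: np.ndarray


def detrend(seismogram: HasData) -> None:
    """Remove a least-squares linear trend in place (NumPy stand-in for SciPy)."""
    x = np.arange(len(seismogram.data))
    slope, intercept = np.polyfit(x, seismogram.data, deg=1)
    seismogram.data = seismogram.data - (slope * x + intercept)


@dataclass
class NoiseSeismogram:
    data: np.ndarray = field(default_factory=lambda: np.array([]))
    contains_earthquake: bool = False


@dataclass
class SeasonSeismogram:
    data: np.ndarray = field(default_factory=lambda: np.array([]))
    season: str = "summer"


trend = 0.5 * np.arange(50)
noise = NoiseSeismogram(data=trend.copy())
season = SeasonSeismogram(data=trend.copy())

detrend(noise)   # accepted: NoiseSeismogram has a `data` attribute
detrend(season)  # also accepted, with no change to detrend()
print(np.allclose(noise.data, 0.0), np.allclose(season.data, 0.0))  # True True
```

Neither class mentions HasData; matching its structure is enough, which is the same mechanism that lets both tutorial classes satisfy pysmo's Seismogram type.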
Key observations
- The detrend() function now uses pysmo types in its annotations.
- Because NoiseSeismogram and SeasonSeismogram structurally match Seismogram, type checkers accept instances of either class as valid inputs for detrend().
- Future custom seismogram classes will also be accepted as inputs without any changes to the detrend() function, provided they adhere to the structure prescribed by the Seismogram type.
- check_for_earthquakes() is still annotated with NoiseSeismogram, because it uses the contains_earthquake attribute that doesn't exist in the Seismogram type.
Conclusion
This tutorial introduced the core ideas behind pysmo rather than its API:
- Pysmo is not centred around a single seismogram class. Monolithic classes tend to reflect the use cases their authors had in mind, not the ones users actually have.
- Custom seismogram classes fit specific use cases well, but create friction when writing reusable code.
- Pysmo addresses this by defining interfaces — pysmo types — that capture what different classes have in common. Functions target the interface; any conforming class works.
- Pysmo types are intentionally narrow: few attributes, almost no methods.
The same principles apply to the processing modules pysmo ships with, which is why they work just as well in your own code as in pysmo's.