Functions
Writing functions is the easiest and most common way of using pysmo. Regardless of what the functions do, they will most likely follow one of the patterns discussed below.
Pysmo types as input
The simplest way of using pysmo types is in functions that only use them to annotate inputs. In these instances we don't need to be concerned about how differences between compatible classes affect other parts of our code, as their "journeys" end here.
For example, the following function takes any Seismogram-compatible
object as input, and returns a timedelta:
from pysmo import Seismogram
from datetime import timedelta


def double_delta_td(seismogram: Seismogram) -> timedelta:
    """Return double the sampling interval of a seismogram.

    Parameters:
        seismogram: Seismogram object.

    Returns:
        sampling interval multiplied by 2.
    """

    return seismogram.delta * 2
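To illustrate what "any Seismogram-compatible object" means in practice, here is a minimal sketch that uses only the standard library. It assumes that pysmo types behave like `typing.Protocol` classes; `HasDelta`, `FileBackedSeis` and `InMemorySeis` are hypothetical stand-ins invented for this example, not pysmo names.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Protocol


class HasDelta(Protocol):
    """Hypothetical, reduced stand-in for the Seismogram type."""

    delta: timedelta


@dataclass
class FileBackedSeis:
    """Hypothetical class with an extra, class-specific attribute."""

    delta: timedelta
    filename: str


@dataclass
class InMemorySeis:
    """Hypothetical class sharing no base class with FileBackedSeis."""

    delta: timedelta


def double_delta_td(seismogram: HasDelta) -> timedelta:
    # Only the attribute named in the protocol is used, so both
    # classes above are acceptable inputs.
    return seismogram.delta * 2


a = double_delta_td(FileBackedSeis(timedelta(milliseconds=20), "example.sac"))
b = double_delta_td(InMemorySeis(timedelta(milliseconds=25)))
```

Because the function returns a plain `timedelta`, nothing about the input class leaks into the rest of the program.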
Warning
Be careful when changing attributes of the input class inside a function.
Sometimes the attributes are objects that contain other objects (e.g. an
ndarray containing float objects). In our
example above, the seismogram we use as input shares the nested objects
in the data attribute with the seismogram inside the function. Changing
seismogram.data inside the function will therefore also change it outside.
This behavior is often desired, but you must be aware of when it
occurs and when it does not.
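The sharing described in this warning can be reproduced with a small sketch. `ToySeismogram` is a hypothetical stand-in, and a plain list replaces the ndarray so that the example only needs the standard library.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class ToySeismogram:
    """Hypothetical stand-in for a Seismogram-compatible class."""

    delta: timedelta = timedelta(milliseconds=10)
    data: list[float] = field(default_factory=list)


def scale_in_place(seismogram: ToySeismogram, factor: float) -> None:
    # The parameter refers to the caller's object, so writing to its
    # data is visible outside the function.
    for i, value in enumerate(seismogram.data):
        seismogram.data[i] = value * factor


def rebind_locally(seismogram: ToySeismogram) -> None:
    # Rebinding the local name creates a new object and leaves the
    # caller's object untouched.
    seismogram = ToySeismogram(data=[0.0])


seis = ToySeismogram(data=[1.0, 2.0, 3.0])
scale_in_place(seis, 2.0)
after_scale = list(seis.data)
rebind_locally(seis)
after_rebind = list(seis.data)
```

Here `scale_in_place` changes the caller's data, whereas `rebind_locally` has no effect outside the function.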
Pysmo types as output
It gets more complicated when a function returns the data it accepted as input
(annotated with a pysmo type). While using Seismogram to
annotate the input allows us to accept any number of different input types,
that same flexibility means we cannot be certain of what exactly the output
type is. We can explore this with the following snippet:
from pysmo import Seismogram
from pysmo.classes import SAC
from pathlib import Path
from copy import deepcopy
from typing import reveal_type  # (1)!


def double_delta(seismogram: Seismogram) -> Seismogram:
    """Double the sampling interval of a seismogram.

    Parameters:
        seismogram: Seismogram object.

    Returns:
        Seismogram with double the sampling interval of input seismogram.
    """

    clone = deepcopy(seismogram)  # (2)!
    clone.delta *= 2
    return clone


sacfile = Path("example.sac")
my_seis_in = SAC.from_file(sacfile).seismogram
my_seis_out = double_delta(my_seis_in)
reveal_type(my_seis_in)
reveal_type(my_seis_out)
- reveal_type allows us to inspect the actual type of an object. It prints type information at runtime (what it actually is) or when using mypy (what can be inferred from type annotations).
- Deep copying objects can be expensive if they contain large nested items.
Here, we create a SacSeismogram instance from
a SAC file and pass it to the double_delta function. Inside the function it
is deep copied, modified, and returned as the same type. We
can verify this by executing the script and reading what the two reveal_type
calls print at runtime.
As suspected, my_seis_in and my_seis_out are both of type
SacSeismogram at runtime. Running mypy on the
code, however, yields a different type for my_seis_out:
$ mypy double_delta.py
docs/snippets/double_delta.py:26: note: Revealed type is "SacSeismogram"
docs/snippets/double_delta.py:27: note: Revealed type is "Seismogram"
Success: no issues found in 1 source file
This discrepancy is due to the fact that our function is annotated in a way that
tells us any Seismogram is acceptable as input, and that
while a Seismogram is returned, we cannot know which type
exactly that is going to be. While this loss of typing information may be
acceptable for your use case, it is certainly far from ideal.
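One practical consequence of this loss of information is that class-specific attributes are no longer visible to the type checker on the returned object. The sketch below shows this, and one way to recover the concrete type with an `isinstance` check. `SeismogramLike` and `ToySacSeismogram` are hypothetical stand-ins, and the sketch assumes pysmo types behave like `typing.Protocol` classes.

```python
from copy import deepcopy
from dataclasses import dataclass
from datetime import timedelta
from typing import Protocol


class SeismogramLike(Protocol):
    """Hypothetical, reduced stand-in for the Seismogram type."""

    delta: timedelta


@dataclass
class ToySacSeismogram:
    """Hypothetical class with one attribute beyond the protocol."""

    delta: timedelta
    source_file: str


def double_delta(seismogram: SeismogramLike) -> SeismogramLike:
    clone = deepcopy(seismogram)
    clone.delta *= 2
    return clone


seis_out = double_delta(ToySacSeismogram(timedelta(milliseconds=10), "example.sac"))

# A type checker only knows `seis_out` as SeismogramLike, so it would
# reject `seis_out.source_file` at this point. The isinstance check
# narrows the type back to the concrete class.
if isinstance(seis_out, ToySacSeismogram):
    recovered_file = seis_out.source_file
else:
    recovered_file = None
```

Narrowing like this works, but it has to be repeated wherever the concrete type is needed.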
Mini types as output
Return types need to be specified because the output of one
function can also be used as input for other functions. If you are chaining
together multiple functions using pysmo types, you may want to consider using
"Mini" types as output. These minimal implementations of pysmo type compatible
classes are simple and efficient. The double_delta function would look like
this:
from pysmo import Seismogram, MiniSeismogram
from pysmo.functions import clone_to_mini


def double_delta_mini(seismogram: Seismogram) -> MiniSeismogram:
    """Double the sampling interval of a seismogram.

    Parameters:
        seismogram: Seismogram object.

    Returns:
        MiniSeismogram with double the sampling interval of input seismogram.
    """

    clone = clone_to_mini(MiniSeismogram, seismogram)  # (1)!
    clone.delta *= 2
    return clone
- Here we use the clone_to_mini function to create MiniSeismogram instances from other Seismogram instances. It is typically faster than deep copying.
With this approach you could copy your data to a Mini instance early on in your processing, perform multiple processing steps on the efficient Mini instance, and in a last step copy the processed data back to your original data source.
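The workflow described above (copy to a Mini instance early, process, copy back last) could be sketched as follows. Everything here is a hypothetical stand-in written with the standard library only: `ToyMini`, `ToyFileSeismogram`, `to_mini`, `write_back` and the two processing steps are not pysmo names, and plain lists replace arrays.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class ToyMini:
    """Hypothetical minimal class holding only the shared attributes."""

    delta: timedelta
    data: list[float] = field(default_factory=list)


@dataclass
class ToyFileSeismogram:
    """Hypothetical heavier class, e.g. one backed by a file."""

    delta: timedelta
    data: list[float]
    filename: str


def to_mini(seismogram: ToyFileSeismogram) -> ToyMini:
    # Copy only the shared attributes; copying the list ensures the
    # mini instance does not alias the original data.
    return ToyMini(delta=seismogram.delta, data=list(seismogram.data))


def remove_mean(mini: ToyMini) -> ToyMini:
    mean = sum(mini.data) / len(mini.data)
    return ToyMini(delta=mini.delta, data=[x - mean for x in mini.data])


def scale(mini: ToyMini, factor: float) -> ToyMini:
    return ToyMini(delta=mini.delta, data=[x * factor for x in mini.data])


def write_back(mini: ToyMini, target: ToyFileSeismogram) -> None:
    # Last step: copy the processed attributes back to the original.
    target.delta = mini.delta
    target.data = list(mini.data)


original = ToyFileSeismogram(timedelta(milliseconds=10), [1.0, 2.0, 3.0], "example.sac")
processed = scale(remove_mean(to_mini(original)), 2.0)
untouched = list(original.data)  # original is unchanged until write_back
write_back(processed, original)
```

Every intermediate step has a precise, known return type, and the original object is only touched once at the end.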
Same input and output type
Another way to be explicit about the output type of a function is to declare that the input and output types must be the same. For pysmo types this requires two things:
- We need to save the input type as a variable, which we can reference for the output type.
- We need to place bounds on this variable so that it is limited to the desired pysmo type(s).
This typing strategy involves generics, and changes our function to the following:
from pysmo import Seismogram
from pysmo.classes import SAC
from pathlib import Path
from copy import deepcopy
from typing import reveal_type


def double_delta_generic[T: Seismogram](seismogram: T) -> T:  # (1)!
    """Double the sampling interval of a seismogram.

    Parameters:
        seismogram: Seismogram object.

    Returns:
        Seismogram with double the sampling interval of input seismogram.
    """

    clone = deepcopy(seismogram)
    clone.delta *= 2
    return clone


sacfile = Path("example.sac")
my_seis_in = SAC.from_file(sacfile).seismogram
my_seis_out = double_delta_generic(my_seis_in)
reveal_type(my_seis_in)
reveal_type(my_seis_out)
- This syntax is only valid for Python versions 3.12 and above.
In our example [T: Seismogram] defines a type variable T that has to be a
Seismogram. We then use T to annotate both the input and the output of the
function. This means that if we use it with e.g. an instance of
MiniSeismogram as input for seismogram, T is set
to MiniSeismogram and the function effectively takes a MiniSeismogram
and returns a MiniSeismogram.
If we use a SacSeismogram instance, it effectively takes a SacSeismogram and
returns a SacSeismogram,
which is also what we used for our example. Therefore, running mypy on
double_delta_generic.py gives:
$ mypy double_delta_generic.py
double_delta_generic.py:25: note: Revealed type is "SacSeismogram"
double_delta_generic.py:26: note: Revealed type is "SacSeismogram"
Success: no issues found in 1 source file
Crucially, because T has an
upper bound
(in this case Seismogram),
we get all the usual benefits from type hints while coding (autocompletion,
error checking, etc.).
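For readers who cannot use the `[T: Seismogram]` syntax, the same bound can be written with an explicit `TypeVar`, which is the spelling used before Python 3.12. The sketch below is not taken from pysmo: `SeismogramLike` and `ToySeismogram` are hypothetical stand-ins so that the example runs with the standard library alone.

```python
from copy import deepcopy
from dataclasses import dataclass
from datetime import timedelta
from typing import Protocol, TypeVar


class SeismogramLike(Protocol):
    """Hypothetical, reduced stand-in for the Seismogram type."""

    delta: timedelta


# Explicit equivalent of the `[T: SeismogramLike]` type parameter.
T = TypeVar("T", bound=SeismogramLike)


def double_delta_generic(seismogram: T) -> T:
    clone = deepcopy(seismogram)
    clone.delta *= 2
    return clone


@dataclass
class ToySeismogram:
    """Hypothetical concrete class satisfying the protocol."""

    delta: timedelta


seis_in = ToySeismogram(timedelta(milliseconds=10))
seis_out = double_delta_generic(seis_in)  # inferred as ToySeismogram
```

In both spellings the bound is what lets the function body access `delta` while the return type stays as specific as the input.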
Output type depends on input parameter
Admittedly, things can get rather complex when input and output types of a
function depend on each other. Properly annotating such a function requires
using the overload decorator to declare all possible type
combinations that may occur. The pysmo detrend
function makes use of this feature:
# Imports needed by this excerpt.
from copy import deepcopy
from typing import Literal, overload

import scipy.signal

from pysmo import Seismogram


@overload
def detrend(seismogram: Seismogram, clone: Literal[False] = ...) -> None: ...


@overload
def detrend[T: Seismogram](seismogram: T, clone: Literal[True]) -> T: ...


def detrend[T: Seismogram](seismogram: T, clone: bool = False) -> None | T:
    """Remove linear and/or constant trends from a seismogram.

    Parameters:
        seismogram: Seismogram object.
        clone: Operate on a clone of the input seismogram.

    Returns:
        Detrended [`Seismogram`][pysmo.Seismogram] object if called with `clone=True`.

    Examples:
        ```python
        >>> import numpy as np
        >>> import pytest
        >>> from pysmo.functions import detrend
        >>> from pysmo.classes import SAC
        >>> sac_seis = SAC.from_file("example.sac").seismogram
        >>> 0 == pytest.approx(np.mean(sac_seis.data), abs=1e-11)
        np.False_
        >>> detrend(sac_seis)
        >>> 0 == pytest.approx(np.mean(sac_seis.data), abs=1e-11)
        np.True_
        >>>
        ```
    """

    # Work on a deep copy so the caller's seismogram stays untouched.
    if clone is True:
        seismogram = deepcopy(seismogram)

    seismogram.data = scipy.signal.detrend(seismogram.data)

    if clone is True:
        return seismogram

    return None
Here it looks as if the detrend function is declared multiple times. However,
the declarations decorated with @overload are only meant to be used by type
checkers: at runtime they are superseded by the final, undecorated definition,
which is the one that actually runs. Looking
at the declarations from bottom to top we read them as follows:
- The function detrend takes two arguments, seismogram and clone, whereby the type of seismogram is stored in the variable T (bound by Seismogram), and clone is a bool with a default value of False. The function returns either a None or T type.
- If clone is True, then an object of type T is returned.
- If clone is False (the default value), then None is returned. Note that we don't need to use T here; as we don't reuse T elsewhere in this function declaration, it doesn't make much sense to use a type variable in the first place.
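The same overload pattern can be tried on a small, dependency-free function. The sketch below is an illustration, not pysmo code: `HasData`, `ToySeismogram` and `remove_mean` are hypothetical, plain lists replace arrays, and an explicit `TypeVar` is used instead of the `[T: ...]` syntax so that it also runs on Python versions before 3.12.

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Literal, Optional, Protocol, TypeVar, overload


class HasData(Protocol):
    """Hypothetical, reduced stand-in for the Seismogram type."""

    data: list[float]


T = TypeVar("T", bound=HasData)


@overload
def remove_mean(seismogram: HasData, clone: Literal[False] = ...) -> None: ...
@overload
def remove_mean(seismogram: T, clone: Literal[True]) -> T: ...
def remove_mean(seismogram: T, clone: bool = False) -> Optional[T]:
    # Only this final definition exists at runtime.
    if clone:
        seismogram = deepcopy(seismogram)
    mean = sum(seismogram.data) / len(seismogram.data)
    seismogram.data = [x - mean for x in seismogram.data]
    return seismogram if clone else None


@dataclass
class ToySeismogram:
    """Hypothetical concrete class satisfying the protocol."""

    data: list[float]


seis = ToySeismogram([1.0, 2.0, 3.0])
result_in_place = remove_mean(seis)  # checker infers None

seis2 = ToySeismogram([1.0, 2.0, 3.0])
result_clone = remove_mean(seis2, clone=True)  # checker infers ToySeismogram
```

With these declarations a type checker can flag code that tries to use the return value of the in-place call, while the `clone=True` call keeps the concrete input type.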
Tip
This may seem a bit overwhelming at first, but you will quickly find that the patterns frequently repeat themselves, and that you can simply copy and paste a lot of the overloaded function declarations. Remember also that the time invested here is likely more than offset by the time saved hunting down avoidable bugs in your code.