Functions

Writing functions is the easiest and most common way of using pysmo. Regardless of what the functions do, they will most likely follow one of the patterns discussed below.

Pysmo types as input

The simplest way of using pysmo types is in functions that use them only to annotate inputs. In these cases we don't need to worry about how differences between compatible classes affect other parts of our code, as their "journeys" end here.

For example, the following function takes any Seismogram compatible object as input, and returns a timedelta:

double_delta_td.py

from pysmo import Seismogram
from datetime import timedelta


def double_delta_td(seismogram: Seismogram) -> timedelta:
    """Return double the sampling interval of a seismogram.

    Parameters:
        seismogram: Seismogram object.

    Returns:
        sampling interval multiplied by 2.
    """

    return seismogram.delta * 2

Warning

Be careful when changing attributes of the input object inside a function. The seismogram parameter refers to the same object that was passed in, and its attributes are often objects that contain other objects (e.g. an ndarray containing float objects). In our example above, the seismogram used as input shares the nested objects in its data attribute with the seismogram inside the function. Changing seismogram.data inside the function will therefore also change it outside the function. This behavior is often desired, but you must be aware of when it occurs and when it does not.
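The difference between modifying the input and modifying a copy can be sketched as follows. ToySeismogram, scale_in_place and scale_copy are hypothetical names made up for this illustration (they are not part of pysmo), and a plain list stands in for the ndarray so that the snippet has no dependencies:

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class ToySeismogram:
    """Hypothetical stand-in for a Seismogram compatible class (not part of pysmo)."""

    data: list[float] = field(default_factory=list)


def scale_in_place(seismogram: ToySeismogram, factor: float) -> None:
    # `seismogram` is the very object the caller passed in, so these
    # assignments are visible outside the function too.
    for i, value in enumerate(seismogram.data):
        seismogram.data[i] = value * factor


def scale_copy(seismogram: ToySeismogram, factor: float) -> ToySeismogram:
    # Work on an independent deep copy; the caller's data is left untouched.
    clone = deepcopy(seismogram)
    clone.data = [value * factor for value in clone.data]
    return clone


original = ToySeismogram(data=[1.0, 2.0, 3.0])
scaled = scale_copy(original, 2.0)
print(original.data)  # still the original values
scale_in_place(original, 2.0)
print(original.data)  # now modified in place
```

Which of the two variants is appropriate depends on whether the caller expects its data to change.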

Pysmo types as output

It gets more complicated when a function returns the data it accepted as input (annotated with a pysmo type). While using Seismogram to annotate the input allows us to accept any number of different input types, that same flexibility means we cannot be certain of what exactly the output type is. We can explore this with the following snippet:

double_delta.py

from pysmo import Seismogram
from pysmo.classes import SAC
from pathlib import Path
from copy import deepcopy
from typing import reveal_type  # (1)!


def double_delta(seismogram: Seismogram) -> Seismogram:
    """Double the sampling interval of a seismogram.

    Parameters:
        seismogram: Seismogram object.

    Returns:
        Seismogram with double the sampling interval of input seismogram.
    """

    clone = deepcopy(seismogram)  # (2)!
    clone.delta *= 2
    return clone


sacfile = Path("example.sac")
my_seis_in = SAC.from_file(sacfile).seismogram
my_seis_out = double_delta(my_seis_in)

reveal_type(my_seis_in)
reveal_type(my_seis_out)

(1) reveal_type allows us to inspect the type of an object. At runtime it prints the actual type of the object; when the code is checked with mypy, it reports the type that can be inferred from the type annotations.
(2) Deep copying objects can be expensive if they contain large nested items.

Here, we create a SacSeismogram instance from a SAC file and pass it to the double_delta function. Inside the function it is deep-copied, modified, and returned as the same type. We can verify this by executing the script, whereby the two reveal_type calls produce the following output:

$ python double_delta.py
Runtime type is 'SacSeismogram'
Runtime type is 'SacSeismogram'

As suspected, my_seis_in and my_seis_out are both of type SacSeismogram at runtime. Running mypy on the code, however, yields a different type for my_seis_out:

$ mypy double_delta.py
docs/snippets/double_delta.py:26: note: Revealed type is "SacSeismogram"
docs/snippets/double_delta.py:27: note: Revealed type is "Seismogram"
Success: no issues found in 1 source file

This discrepancy arises because our function is annotated to accept any Seismogram as input and to return a Seismogram, without saying which exact type that will be. While this loss of typing information may be acceptable for your use case, it is far from ideal.
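To see what this loss of information means in practice, consider a class that has an attribute beyond those required by the type used in the annotation. The sketch below uses SeismogramLike and StationSeismogram, both hypothetical stand-ins invented for this illustration rather than pysmo classes:

```python
from copy import deepcopy
from dataclasses import dataclass
from datetime import timedelta
from typing import Protocol


class SeismogramLike(Protocol):
    """Hypothetical, stripped-down stand-in for pysmo's Seismogram type."""

    delta: timedelta


@dataclass
class StationSeismogram:
    """Hypothetical class with an extra attribute beyond the protocol."""

    delta: timedelta
    station: str


def double_delta(seismogram: SeismogramLike) -> SeismogramLike:
    clone = deepcopy(seismogram)
    clone.delta *= 2
    return clone


seis_in = StationSeismogram(delta=timedelta(milliseconds=10), station="ABC")
seis_out = double_delta(seis_in)

# At runtime seis_out is still a StationSeismogram, so this line works.
# A type checker only knows about SeismogramLike, however, and without
# the ignore comment it would be expected to report a missing attribute.
print(type(seis_out).__name__, seis_out.station)  # type: ignore[attr-defined]
```

The code runs fine, but every later use of class-specific attributes on the output has to be silenced or cast, which defeats much of the purpose of annotating the function.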

Mini types as output

Return types need to be specified because the output of one function can also be used as input for other functions. If you are chaining together multiple functions using pysmo types, you may want to consider using "Mini" types as output. These minimal implementations of pysmo type compatible classes are simple and efficient. The double_delta function would look like this:

double_delta_mini.py

from pysmo import Seismogram, MiniSeismogram
from pysmo.functions import clone_to_mini


def double_delta_mini(seismogram: Seismogram) -> MiniSeismogram:
    """Double the sampling interval of a seismogram.

    Parameters:
        seismogram: Seismogram object.

    Returns:
        MiniSeismogram with double the sampling interval of input seismogram.
    """

    clone = clone_to_mini(MiniSeismogram, seismogram)  # (1)!
    clone.delta *= 2
    return clone

(1) Here we use the clone_to_mini function to create MiniSeismogram instances from other Seismogram instances. It is typically faster than deep copying.

With this approach you could copy your data to a Mini instance early on in your processing, perform multiple processing steps on the efficient Mini instance, and in a last step copy the processed data back to your original data source.
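The workflow described above might look roughly like the following sketch. To keep it runnable without pysmo or any data files, it uses hypothetical stand-ins throughout: ToySource plays the role of the original data source, ToyMini the role of a Mini class, and to_mini, remove_mean and scale are made-up helpers. Copying back by plain attribute assignment is likewise an assumption for this illustration, not pysmo's API:

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class ToySource:
    """Hypothetical 'heavy' class, e.g. one backed by a file with extra metadata."""

    delta: timedelta
    data: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class ToyMini:
    """Hypothetical minimal class holding only what the processing steps need."""

    delta: timedelta
    data: list[float]


def to_mini(source: ToySource) -> ToyMini:
    # Hypothetical helper: copy only the attributes the minimal class needs.
    return ToyMini(delta=source.delta, data=list(source.data))


def remove_mean(seis: ToyMini) -> ToyMini:
    mean = sum(seis.data) / len(seis.data)
    return ToyMini(delta=seis.delta, data=[x - mean for x in seis.data])


def scale(seis: ToyMini, factor: float) -> ToyMini:
    return ToyMini(delta=seis.delta, data=[x * factor for x in seis.data])


source = ToySource(timedelta(milliseconds=10), [1.0, 2.0, 3.0], {"station": "ABC"})

mini = to_mini(source)                # 1. copy to the minimal class early on
mini = scale(remove_mean(mini), 2.0)  # 2. chain processing steps on the copy
source.data = mini.data               # 3. write the results back to the source
source.delta = mini.delta

print(source)
```

Because every intermediate step takes and returns the same concrete minimal type, the type checker knows the exact type at each point in the chain.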

Same input and output type

Another way to be explicit about the output type of a function is to declare that the input and output types have to be the same. For pysmo types this requires two things:

1. We need to save the input type as a variable, which we can then reference for the output type.
2. We need to place bounds on this variable so that it is limited to the desired pysmo type(s).

This typing strategy involves generics, and changes our function to the following:

double_delta_generic.py

from pysmo import Seismogram
from pysmo.classes import SAC
from pathlib import Path
from copy import deepcopy
from typing import reveal_type


def double_delta_generic[T: Seismogram](seismogram: T) -> T:  # (1)!
    """Double the sampling interval of a seismogram.

    Parameters:
        seismogram: Seismogram object.

    Returns:
        Seismogram with double the sampling interval of input seismogram.
    """

    clone = deepcopy(seismogram)
    clone.delta *= 2
    return clone


sacfile = Path("example.sac")
my_seis_in = SAC.from_file(sacfile).seismogram
my_seis_out = double_delta_generic(my_seis_in)

reveal_type(my_seis_in)
reveal_type(my_seis_out)

(1) This type parameter syntax is only valid for Python versions 3.12 and above.

In our example [T: Seismogram] defines a type variable T that has to be a Seismogram. We then use T as before to annotate the function. This means that if we use it with e.g. an instance of MiniSeismogram as input for seismogram, T is set to MiniSeismogram and the function signature effectively becomes:

def double_delta_generic(seismogram: MiniSeismogram) -> MiniSeismogram:
  ...

Or if we use a SacSeismogram instance:

def double_delta_generic(seismogram: SacSeismogram) -> SacSeismogram:
  ...

which is also what we used for our example. Therefore, running mypy on double_delta_generic.py gives:

$ mypy double_delta_generic.py
double_delta_generic.py:25: note: Revealed type is "SacSeismogram"
double_delta_generic.py:26: note: Revealed type is "SacSeismogram"
Success: no issues found in 1 source file

Crucially, because T has an upper bound (in this case Seismogram), we get all the usual benefits from type hints while coding (autocompletion, error checking, etc.).
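Since the [T: Seismogram] syntax requires Python 3.12, it may be useful to know that the same bound can be written with an explicit TypeVar on older versions. The sketch below shows that spelling; SeismogramLike and StationSeismogram are hypothetical stand-ins used so the snippet runs without pysmo:

```python
from copy import deepcopy
from dataclasses import dataclass
from datetime import timedelta
from typing import Protocol, TypeVar


class SeismogramLike(Protocol):
    """Hypothetical, stripped-down stand-in for pysmo's Seismogram type."""

    delta: timedelta


# Pre-3.12 spelling of a type variable with an upper bound.
T = TypeVar("T", bound=SeismogramLike)


def double_delta_generic(seismogram: T) -> T:
    clone = deepcopy(seismogram)
    clone.delta *= 2
    return clone


@dataclass
class StationSeismogram:
    """Hypothetical class with an extra attribute beyond the protocol."""

    delta: timedelta
    station: str


seis_out = double_delta_generic(StationSeismogram(timedelta(milliseconds=10), "ABC"))

# A type checker infers StationSeismogram here, so accessing `.station`
# is accepted in addition to the attributes required by the bound.
print(seis_out.station, seis_out.delta)
```

Both spellings express the same constraint; the 3.12 syntax simply avoids the separate module-level TypeVar declaration.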

Output type depends on input parameter

Admittedly, things can get rather complex when input and output types of a function depend on each other. Properly annotating such a function requires using the overload decorator to declare all possible type combinations that may occur. The pysmo detrend function makes use of this feature:

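from copy import deepcopy
from typing import Literal, overload

import scipy.signal

from pysmo import Seismogram


@overload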
def detrend(seismogram: Seismogram, clone: Literal[False] = ...) -> None: ...


@overload
def detrend[T: Seismogram](seismogram: T, clone: Literal[True]) -> T: ...


def detrend[T: Seismogram](seismogram: T, clone: bool = False) -> None | T:
    """Remove linear and/or constant trends from a seismogram.

    Parameters:
        seismogram: Seismogram object.
        clone: Operate on a clone of the input seismogram.

    Returns:
        Detrended [`Seismogram`][pysmo.Seismogram] object if called with `clone=True`.

    Examples:
        ```python
        >>> import numpy as np
        >>> import pytest
        >>> from pysmo.functions import detrend
        >>> from pysmo.classes import SAC
        >>> sac_seis = SAC.from_file("example.sac").seismogram
        >>> 0 == pytest.approx(np.mean(sac_seis.data), abs=1e-11)
        np.False_
        >>> detrend(sac_seis)
        >>> 0 == pytest.approx(np.mean(sac_seis.data), abs=1e-11)
        np.True_
        >>>
        ```
    """
    if clone is True:
        seismogram = deepcopy(seismogram)

    seismogram.data = scipy.signal.detrend(seismogram.data)

    if clone is True:
        return seismogram
    return None

Here it looks as if the detrend function is declared multiple times. However, the declarations decorated with @overload are only meant to be used by type checkers; at runtime they are simply replaced by the final, undecorated definition, which is the one that actually gets called. Looking at the declarations from bottom to top we read them as follows:

1. The function detrend takes two arguments, seismogram and clone, whereby the type of seismogram is stored in the type variable T (bound by Seismogram), and clone is a bool with a default value of False. The function returns either None or an object of type T.
2. If clone is True, then an object of type T is returned.
3. If clone is False (the default value), then None is returned. Note that we don't use T here: since the type would not be referenced anywhere else in this declaration, a type variable would serve no purpose.
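The same pattern can be tried out in isolation with the sketch below. It is not pysmo code: demean, HasData and ToySeismogram are hypothetical names chosen for this illustration, and the pre-3.12 TypeVar spelling is used so that it also runs on older Python versions:

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Literal, Optional, Protocol, TypeVar, overload


class HasData(Protocol):
    """Hypothetical stand-in for a type with a mutable data attribute."""

    data: list[float]


T = TypeVar("T", bound=HasData)


@overload
def demean(seismogram: HasData, clone: Literal[False] = ...) -> None: ...
@overload
def demean(seismogram: T, clone: Literal[True]) -> T: ...


def demean(seismogram: T, clone: bool = False) -> Optional[T]:
    """Subtract the mean from the data, optionally working on a clone."""
    if clone:
        seismogram = deepcopy(seismogram)
    mean = sum(seismogram.data) / len(seismogram.data)
    seismogram.data = [x - mean for x in seismogram.data]
    return seismogram if clone else None


@dataclass
class ToySeismogram:
    """Hypothetical class that satisfies the HasData protocol."""

    data: list[float]


seis = ToySeismogram([1.0, 2.0, 3.0])
cloned = demean(seis, clone=True)  # type checker infers ToySeismogram
result = demean(seis)              # type checker infers None; seis is modified
print(cloned.data, seis.data, result)
```

Only the last definition of demean exists at runtime; the two decorated signatures tell the type checker which return type belongs to which value of clone.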

Tip

This may seem a bit overwhelming at first, but you will quickly find that the patterns repeat themselves frequently, and that you can simply copy and paste many of the overloaded function declarations. Remember also that the time invested here is likely more than repaid by the time you would otherwise spend hunting down avoidable bugs in your code.