
pysmo

Documentation: https://docs.pysmo.org

Source Code: https://github.com/pysmo/pysmo


Most seismology libraries hand you a single large object — waveform samples, station coordinates, and event parameters all bundled together. This is a common pattern, but it sidesteps a question worth asking: what is a seismogram, actually? Not a file format. A seismogram is a time series: samples, a sampling interval, and a start time. A station is a named geographic position. These are narrow, precise concepts, and modern Python gives us the vocabulary to express them exactly — Protocol classes, Annotated types, structural subtyping. Pysmo follows that structure: each type maps to a real scientific concept, concrete classes enforce correctness at construction, and file-format adapters expose only what a concept actually needs.
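To make the idea concrete, here is a minimal sketch of how such narrow concepts could be written as Protocol classes. The names `TimeSeries`, `NamedPosition`, `Trace`, `Station`, `duration`, and `label` are hypothetical and chosen for illustration only; they are not pysmo's actual definitions, which use their own types and validation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Protocol


class TimeSeries(Protocol):
    """Hypothetical protocol: samples, a sampling interval, and a start time."""

    data: list[float]
    delta: timedelta
    begin_time: datetime


class NamedPosition(Protocol):
    """Hypothetical protocol: a station as a named geographic position."""

    name: str
    latitude: float
    longitude: float


@dataclass
class Trace:
    """Conforms to TimeSeries by structure alone; no inheritance involved."""

    data: list[float]
    delta: timedelta
    begin_time: datetime


@dataclass
class Station:
    """Conforms to NamedPosition by structure alone."""

    name: str
    latitude: float
    longitude: float


def duration(series: TimeSeries) -> timedelta:
    # Time between the first and the last sample.
    return series.delta * (len(series.data) - 1)


def label(position: NamedPosition) -> str:
    # Needs a name and coordinates, and nothing else.
    return f"{position.name} ({position.latitude:.2f}, {position.longitude:.2f})"


trace = Trace(
    data=[0.0, 1.0, 0.0, -1.0, 0.0],
    delta=timedelta(seconds=0.5),
    begin_time=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
station = Station(name="ABC", latitude=64.1, longitude=-21.9)

trace_duration = duration(trace)
station_label = label(station)
```

Neither `Trace` nor `Station` inherits from a protocol. A static type checker accepts them because their attributes match, which is what structural subtyping means in practice.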

When a function declares exactly which concept it needs, your editor knows too — autocomplete is precise, type errors surface before runtime, and a function signature is enough to understand what it consumes. Narrower types also dissolve a question that haunts any large bundled object: what counts as data, and what is metadata? The answer depends on what you are doing, not on the file format that happens to store it. The conventional response is to make fields optional: bundle everything into one object, set unused attributes to None, and let callers decide what matters. This sidesteps the design question but compounds the practical one — a station coordinate of None is no more meaningful than a float with the value "abc", and the error surfaces far from where the bad assumption was made. When types reflect scientific concepts directly, the boundaries emerge from the science instead. Code written against narrow interfaces is reusable for the same reason: a function that accepts a protocol works with any conforming object — a file parser, a hand-written dataclass, or a lightweight instance created in a notebook — without modification.
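The contrast between optional fields and narrow types can be sketched in a few lines. Everything here (`BundledRecord`, `Location`, `Site`, and both functions) is a hypothetical illustration, not part of pysmo or any other library.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class BundledRecord:
    """Hypothetical all-in-one record: anything may be missing."""

    samples: list[float]
    station_latitude: Optional[float] = None
    station_longitude: Optional[float] = None


class Location(Protocol):
    """Hypothetical narrow protocol: just a geographic position."""

    @property
    def latitude(self) -> float: ...

    @property
    def longitude(self) -> float: ...


@dataclass
class Site:
    """Any object with float coordinates satisfies Location."""

    latitude: float
    longitude: float


def is_northern_bundled(record: BundledRecord) -> bool:
    # Every consumer of the bundled record must repeat this guard,
    # because the type permits a coordinate that is not there.
    if record.station_latitude is None:
        raise ValueError("record has no station latitude")
    return record.station_latitude > 0


def is_northern(location: Location) -> bool:
    # The signature already guarantees a latitude; no guard is needed.
    return location.latitude > 0


record = BundledRecord(samples=[0.0, 1.0])  # constructs without coordinates
try:
    is_northern_bundled(record)
    bundled_error = None
except ValueError as exc:
    bundled_error = str(exc)

northern = is_northern(Site(latitude=64.1, longitude=-21.9))
```

The bundled record is accepted at construction and only fails later, inside whichever function first needs the coordinate. With the narrow protocol, an object lacking coordinates is rejected by the type checker at the call site.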

The same logic also narrows the gap between user code and library. Pysmo ships with a collection of processing tools, although they are not what the library is fundamentally about. They exist because the same design applies: any function written against pysmo's protocols is compatible with every conforming object, and is therefore useful beyond its original context. The tools can be used directly or as building blocks for something larger. Any well-written pysmo-compatible code is a reasonable candidate for inclusion in the library. Contributions are always welcome, though more often the consequence is simpler: code written for one project finds itself useful in the next.

Quick Start

Pysmo includes concrete classes and processing functions that put everything above into practice. Below, two of those classes are used alongside functions that ship with pysmo and a simple user-defined one. The user-defined function works with both classes without any modification, which is the point.

from pysmo import Seismogram, MiniSeismogram
from pysmo.classes import SAC
from pysmo.functions import detrend, normalize, resample

# Read a SAC file — access seismogram data via protocol-typed views
sac = SAC.from_file("myfile.sac")
seis = sac.seismogram  # satisfies the Seismogram protocol

# Process using built-in functions
detrend(seis)
normalize(seis)
resample(seis, seis.delta * 2)

# Write a function that works with ANY Seismogram implementation
def print_info(seismogram: Seismogram) -> None:
    print(f"Start: {seismogram.begin_time}")
    print(f"dt: {seismogram.delta}")

print_info(seis)  # works with SAC

# ...or create a lightweight seismogram from scratch
mini = MiniSeismogram(data=seis.data, delta=seis.delta, begin_time=seis.begin_time)
print_info(mini)  # works with MiniSeismogram too — same protocol

The design shifts the question from what a class provides to what a function needs. Rather than being bound by what a library class exposes, you are free to define bespoke classes for a particular project, and they will work with any function whose protocol they satisfy.

from dataclasses import dataclass
import numpy as np
import pandas as pd

@dataclass
class MySeismogram:
    data: np.ndarray
    delta: pd.Timedelta
    begin_time: pd.Timestamp
    my_attribute: str

    @property
    def end_time(self) -> pd.Timestamp:
        # read-only: derived from begin_time, delta, and data
        return self.begin_time + self.delta * (len(self.data) - 1)

my_seis = MySeismogram(
    data=np.zeros(1000),
    delta=pd.Timedelta(seconds=0.01),
    begin_time=pd.Timestamp("2024-01-01", tz="UTC"),
    my_attribute="hello world",
)

print_info(my_seis)   # same function as above — no changes needed
detrend(my_seis)      # built-in functions work too