Tutorial¶

This short tutorial demonstrates how the Python features we outlined earlier are used in pysmo. This should be kept in mind particularly with regards to "type"; in most cases we are referring to the literal type data exist as inside Python.

Plotting a Seismogram¶

In this first example we will download a sac file from IRIS and then write a function that generates a plot of it. First in a more conventional way, then using pysmo's protocol classes.

Pysmo has a class called SAC, which lets us use sac files directly in Python. We first import the SAC class from pysmo:

In [1]:

Copied!

from pysmo.classes import SAC
from pysmo.classes import SAC

This let's us directly download data from IRIS and create a sac-like object.

In [2]:

Copied!





mysac = SAC.from_iris(
    net="IU",
    sta="ANMO",
    cha="LHZ",
    loc="00",
    start="2005-03-28T15:00:00",
    duration=3600 * 5,
    scale="AUTO",
    demean="true",
    force_single_result=True,
)
mysac = SAC.from_iris(
    net="IU",
    sta="ANMO",
    cha="LHZ",
    loc="00",
    start="2005-03-28T15:00:00",
    duration=3600 * 5,
    scale="AUTO",
    demean="true",
    force_single_result=True,
)

Anyone familiar with the sac fileformat will know a sac file consists of a timeseries and there plenty of headers to store metadata in. For example:

In [3]:

Copied!

print(mysac.delta, mysac.b, mysac.kzdate, ...)
print(mysac.delta, mysac.b, mysac.kzdate, ...)

1.0 9.999999747378752e-05 2005-03-28 Ellipsis

The traditional approach to using data stored in such a Python object would be to write a function that explicitly uses that exact object type is its input. So let's do that first, and then see how it compares to a solution using protocol classes. We begin by importing some modules we need:

In [4]:

Copied!

from matplotlib import pyplot as plt
from datetime import datetime, time, date, timedelta

%matplotlib inline
from matplotlib import pyplot as plt
from datetime import datetime, time, date, timedelta

%matplotlib inline

We can now get an idea of what it might look like by simply plotting the raw data:

In [5]:

Copied!

plt.plot(mysac.data)
plt.plot(mysac.data)

Out[5]:

[<matplotlib.lines.Line2D at 0x7f1fe22665d0>]

No description has been provided for this image

However, a seismogram plot without a properly formatted X-axis is not very useful. Let's write a function for that (with type hints!):

In [6]:

Copied!





def plotsac(sac: SAC) -> None:
    # First let's construct a datetime object of the begin time (and date):
    begin_time = time.fromisoformat(sac.kztime)
    begin_date = date.fromisoformat(sac.kzdate)
    begin_timedate = datetime.combine(begin_date, begin_time)

    # Because sac uses relative times, we must add sac.b
    begin_timedate += timedelta(sac.b)

    # We use a list comprehension to generate our times.
    # This could be solved more elegantly with pandas (if installed)
    t = [
        begin_timedate + timedelta(seconds=(sac.delta * i))
        for i, _ in enumerate(sac.data)
    ]

    # Finally we can plot our sac object and format the x-axis:
    plt.plot(t, sac.data, "r")
    plt.gcf().autofmt_xdate()
def plotsac(sac: SAC) -> None:
    # First let's construct a datetime object of the begin time (and date):
    begin_time = time.fromisoformat(sac.kztime)
    begin_date = date.fromisoformat(sac.kzdate)
    begin_timedate = datetime.combine(begin_date, begin_time)

    # Because sac uses relative times, we must add sac.b
    begin_timedate += timedelta(sac.b)

    # We use a list comprehension to generate our times.
    # This could be solved more elegantly with pandas (if installed)
    t = [
        begin_timedate + timedelta(seconds=(sac.delta * i))
        for i, _ in enumerate(sac.data)
    ]

    # Finally we can plot our sac object and format the x-axis:
    plt.plot(t, sac.data, "r")
    plt.gcf().autofmt_xdate()

And we can call our function to generate the plot:

In [7]:

Copied!

plotsac(mysac)
plotsac(mysac)

This wasn't too bad. Perhaps the time calculations were a bit tedious. Those calculations were also highly sac-specific, as the sac file format has different ways of specifying time in a seismogram. Let's see how to solve this with the Seismogram protocol class in pysmo:

In [8]:

Copied!

from pysmo import Seismogram
from pysmo import Seismogram

We can rewrite the plotsac function to use a Seismogram instead of a SAC object:

In [9]:

Copied!





def plotseismo(seismo: Seismogram) -> None:
    # We can skip the time calculations for begin and end times.
    t = [seismo.begin_time + seismo.delta * i for i, _ in enumerate(seismo.data)]
    plt.plot(t, seismo.data, "g")
    plt.gcf().autofmt_xdate()
def plotseismo(seismo: Seismogram) -> None:
    # We can skip the time calculations for begin and end times.
    t = [seismo.begin_time + seismo.delta * i for i, _ in enumerate(seismo.data)]
    plt.plot(t, seismo.data, "g")
    plt.gcf().autofmt_xdate()

It looks pretty much the same, except for not having to calculate the begin time. Before we use it on our SAC object, we first verify that it is an instance of a Seismogram:

In [10]:

Copied!

isinstance(mysac, SAC)  # obviously true...
isinstance(mysac, SAC)  # obviously true...

Out[10]:

True

In [11]:

Copied!

isinstance(mysac, Seismogram)
isinstance(mysac, Seismogram)

Out[11]:

False

Uh-oh, this is not what we wanted see! The SAC class pretty much mirrors the sac file format, which doesn't match the Seismogram type. However, we added a nested class within the SAC class which does match the Seismogram type:

In [12]:

Copied!

myseismogram = mysac.seismogram
isinstance(myseismogram, Seismogram)
myseismogram = mysac.seismogram
isinstance(myseismogram, Seismogram)

Out[12]:

True

Cool, so we can go ahead and use our sac object with the plotseismo function:

In [13]:

Copied!

plotseismo(myseismogram)
plotseismo(myseismogram)

So what is the significance of this example? At a first glance it would appear we did the same thing twice, just with a differently typed input. In a way that is true, except that there is a difference between the two types that does matter here. The plotseismo function explicitly requires a SAC object as input, whereas the plotseismo function only requires an object to be a subtype of Seismogram. It doesn't need to be a Seismogram, it merely needs to speak Seismogram.

Imagine a scenario where you write hundreds of functions that use a SAC object as input, and then suddenly the SAC format changes. You now have to edit all your functions to adapt to that change. Similarly, if you want to support another (potentially wildly different) input format you may well need to start from scratch.

Calculating Great Circle Distance¶

The initial idea for pysmo stems from the fact that the different file formats used in seismology appear more suitable to archiving data rather than writing code to process these data. What we mean by that is the huge discrepancy between the amount of (meta)data stored in the files and what is actually needed to solve a particular problem. The sac file format specifies about 130 header variables, and in the example above we wrote a function that only uses four of them!

Key for pysmo is that a type of data should not refer to an abstract construct of how data are organised inside a file, but rather to something that exists in the physical world. Pysmo's type definitions are thus quite specific (and narrowly defined).

In this second example we calculate the distance between a station and an event. We skip the "traditional" approach here and go straight to a function that uses pysmo's types. The type we use here is Location.

In [14]:

Copied!

from pysmo import Location
from pyproj import Geod
from pysmo import Location
from pyproj import Geod

The Location type specifies an object with the attributes latitude and longitude. We write the function gcdist which takes two points as input, and calulates the distance using the pyproj module.

In [15]:

Copied!





def gcdist(point1: Location, point2: Location) -> float:
    """Return the distance between two points using WGS84"""
    g = Geod(ellps="WGS84")
    # The g.inv method returns azimuth, back-azimuth and distance (in metres)
    # We only need the distance here:
    _, _, dist = g.inv(
        point1.longitude, point1.latitude, point2.longitude, point2.latitude
    )
    return dist / 1000
def gcdist(point1: Location, point2: Location) -> float:
    """Return the distance between two points using WGS84"""
    g = Geod(ellps="WGS84")
    # The g.inv method returns azimuth, back-azimuth and distance (in metres)
    # We only need the distance here:
    _, _, dist = g.inv(
        point1.longitude, point1.latitude, point2.longitude, point2.latitude
    )
    return dist / 1000

The sac file format usesstla, stlo for the station coordinates, and evla, evlo for the event coordinates. Because event and station coordinates are stored alongside each other, they cannot both use latitude and longitude. Like above, we solve this in pysmo by adding nested classes to the SAC class:

In [16]:

Copied!

mystation, myevent = mysac.station, mysac.event
mystation, myevent = mysac.station, mysac.event

Before proceeding, we must add the actual coordinates either to mysac or mystation and myevent (the are not included in the downloaded file):

In [17]:

Copied!

mystation.latitude, mystation.longitude = 34.9462, 106.4567
myevent.latitude, myevent.longitude = 2.09, -97.15
mystation.latitude, mystation.longitude = 34.9462, 106.4567
myevent.latitude, myevent.longitude = 2.09, -97.15

Behind the scences, the above lines change the values as stored in the parent mysac object. This can be easily verified by printing out the attributes directly:

In [18]:

Copied!

print(mysac.stla, mysac.stlo, mysac.evla, mysac.evlo)
print(mysac.stla, mysac.stlo, mysac.evla, mysac.evlo)

34.9462 106.4567 2.09 -97.15

Before passing mystation and myevent to our function, let's check if they are of the correct type:

In [19]:

Copied!

print(isinstance(mystation, Location), isinstance(myevent, Location))
print(isinstance(mystation, Location), isinstance(myevent, Location))

True True

Excellent! We can now calculate the event-station distance:

In [20]:

Copied!

gcdist(mystation, myevent)
gcdist(mystation, myevent)

Out[20]:

15228.93484528352

This example demonstrates how useful it can be to differentiate between how data are stored in an object, and how they are consumed by a function. Here we grouped event or station related information in SAC.event and SAC.station classes (which do not necessarily need to have a matching pysmo type). This allowed us to use a gcdist function that uses pysmo types to calculate the distance between any two objects that match the Location type. Besides calculating an event-station distance, the function can also be used to calculate the distance between e.g. two stations or two events without any modifications. Had it been written for the sac specific attributes (evla, evlo,...) that would not be the case.

In [ ]: