Parsing fitness tracker data with Python | by Alan Bunbury | Jan, 2021


In each file, you’ll see that the root element is a gpx element, with a number of attributes describing the creator of the GPX file and the XML namespaces used in it. Within the gpx element there’s a metadata element, with metadata about the file itself, and a trk element representing a “track”, which is “an ordered list of points describing a path”. This loosely corresponds to what we’d usually think of as a single activity (a run, cycle, walk, etc.).

The trk element contains some metadata about the activity, such as its name and its activity type (more on that later), as well as one or more trkseg elements, each representing a “track segment”, which is “a list of Track Points which are logically connected in order”. In other words, a trkseg should contain contiguous GPS data. If your activity simply involves turning on your GPS, running for 10km and then turning off your GPS when you’re done, that whole activity will usually be a single track segment. However, if, for whatever reason, you’ve turned your GPS off and on again (or lost and then regained GPS functionality) during the activity, the trk may consist of multiple trkseg elements. (At least, that’s the theory, according to the documentation; when I pause and restart my VA3 during a run, it still seems to represent the whole run as a single track segment.)

Each trkseg element should contain one or more (probably many) trkpt or “track point” elements, each representing a single (geographical) point detected by your GPS device. These points are usually a few seconds apart.

At a minimum, a trkpt must contain latitude and longitude data (as attributes lat and lon of the element) and may optionally include time and elevation (ele) data, as child elements (data generated by a fitness tracker is very likely to at least include time). A trkpt may also contain an extensions element, which can hold additional information. In the example above, extension elements (in Garmin’s TrackPointExtension (TPE) format) are used to store heart rate and cadence data provided by the FR30.
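
To make the gpx → trk → trkseg → trkpt nesting concrete, here is a minimal sketch that walks the hierarchy using only the standard library. The sample document and the `read_tracks` helper are hypothetical and written for illustration; the sketch assumes a GPX 1.1 file using the usual `http://www.topografix.com/GPX/1/1` namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample; the values are illustrative only.
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <metadata><time>2021-01-01T10:00:00Z</time></metadata>
  <trk>
    <name>Example Run</name>
    <type>running</type>
    <trkseg>
      <trkpt lat="53.3498" lon="-6.2603">
        <ele>12.0</ele>
        <time>2021-01-01T10:00:00Z</time>
      </trkpt>
      <trkpt lat="53.3499" lon="-6.2601">
        <time>2021-01-01T10:00:05Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>
"""

# Assumption: the file declares the standard GPX 1.1 namespace.
NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def read_tracks(gpx_text):
    """Return a nested structure mirroring trk -> trkseg -> trkpt."""
    root = ET.fromstring(gpx_text)
    tracks = []
    for trk in root.findall("gpx:trk", NS):
        segments = []
        for seg in trk.findall("gpx:trkseg", NS):
            points = []
            for pt in seg.findall("gpx:trkpt", NS):
                points.append({
                    # lat and lon are mandatory attributes of trkpt
                    "lat": float(pt.get("lat")),
                    "lon": float(pt.get("lon")),
                    # ele and time are optional child elements
                    "ele": pt.findtext("gpx:ele", default=None, namespaces=NS),
                    "time": pt.findtext("gpx:time", default=None, namespaces=NS),
                })
            segments.append(points)
        tracks.append({
            "name": trk.findtext("gpx:name", default=None, namespaces=NS),
            "type": trk.findtext("gpx:type", default=None, namespaces=NS),
            "segments": segments,
        })
    return tracks


tracks = read_tracks(SAMPLE_GPX)
for track in tracks:
    print(track["name"], track["type"], [len(s) for s in track["segments"]])
```

The second sample point deliberately omits ele, so its `"ele"` value comes back as `None`, which is the case you need to handle for trackers that record no elevation.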

There are three main differences between the two GPX files shown above that I want to point out. First, the type of the trk element: the Garmin file describes this as “running”, whereas the Strava file simply describes it as “9”. There is no standardised way to represent the type of a track. Garmin uses words such as “running”, “walking”, “hiking”, etc., whereas Strava uses numeric codes, such as “4” for hiking, “9” for running, “10” for walking, etc. I couldn’t find a comprehensive mapping of Strava numeric codes to activity types. If you want to find the code for a particular activity type, you can edit the activity type of an existing activity on Strava (click the pencil icon on the left-hand side of the activity page) and then export it to GPX to check the value in the type element.
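
If you are combining files from both sources, one way to cope with the two conventions is to normalise the type value yourself. The sketch below is an assumption-laden illustration: the `STRAVA_TYPE_CODES` table and the `normalise_activity_type` helper are hypothetical, and the table contains only the three codes quoted above, so it is far from complete.

```python
# Only the three Strava codes mentioned in the text; NOT a complete mapping.
STRAVA_TYPE_CODES = {
    "4": "hiking",
    "9": "running",
    "10": "walking",
}


def normalise_activity_type(raw_type):
    """Map a trk type value to a lowercase word where we know how to.

    Known Strava numeric codes are translated; anything else (including
    unknown numeric codes) is returned stripped and lowercased.
    """
    if raw_type is None:
        return None
    value = raw_type.strip()
    return STRAVA_TYPE_CODES.get(value, value.lower())


print(normalise_activity_type("9"))        # Strava-style code
print(normalise_activity_type("Running"))  # Garmin-style word
```

Unknown codes are passed through unchanged rather than guessed at, so you can spot them and extend the table using the export trick described above.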

Secondly, the reported elevations of the track points are different, which might seem surprising given that they’re based on the same underlying data. Some fitness trackers (including, it seems, the FR30) either don’t record elevation data or take highly inaccurate readings based on the GPS signal. In these cases, apps like Strava and Garmin use their own internal elevation databases and algorithms either to generate their own elevation data or to adjust the data recorded by the device in order to give a more realistic reading (Strava’s support documentation has more information on this). Each app’s method for generating or adjusting elevation data will be slightly different, and you are seeing that difference here.

Finally, you will notice that the latitude and longitude data reported by the Garmin file is far more precise, sometimes giving the value to about 30 decimal places, whereas the Strava file gives the value to seven decimal places. The Garmin file appears to reflect the precision of the raw data reported by the FR30, whereas Strava seems to round the data. It is important to note that precision is not the same as accuracy. Reporting latitude and longitude to thirty decimal places suggests a truly microscopic level of precision, whereas the GPS in your fitness tracker is probably accurate to a few metres at best. All that extra precision reported by your fitness tracker is therefore not particularly useful. It can, however, have a small but noticeable effect on the total recorded distance of the activity (calculated by adding up the distances between all of the points), so the total distance may differ slightly depending on where the data comes from.
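
To get a feel for how little seven decimal places gives away, the following sketch rounds a few coordinates and measures how far each point moves. It uses the haversine formula with a mean Earth radius, which is a common approximation and not necessarily the distance method that Garmin, Strava or any library uses; the coordinates and helper names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, a common approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def path_length_m(points):
    """Sum the distances between consecutive (lat, lon) points."""
    return sum(haversine_m(*a, *b) for a, b in zip(points, points[1:]))


def round_points(points, places=7):
    """Round every coordinate to the given number of decimal places."""
    return [(round(lat, places), round(lon, places)) for lat, lon in points]


# Hypothetical "raw" points with more digits than the GPS can justify.
raw = [
    (40.642868123456789, 14.593911987654321),
    (40.642901555555555, 14.593950444444444),
    (40.642940111111111, 14.593990999999999),
]
rounded = round_points(raw)

# How far does rounding move each individual point?
shifts = [haversine_m(*p, *q) for p, q in zip(raw, rounded)]
print("largest shift (m):", max(shifts))
print("raw length (m):    ", path_length_m(raw))
print("rounded length (m):", path_length_m(rounded))
```

Each point moves by well under a centimetre, far below the few metres of GPS error, yet over thousands of points these tiny shifts are one reason two apps can report slightly different totals for the same activity.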

So, let’s take a look at the gpxpy library. First, make sure it’s installed:

pip install gpxpy

Now let’s fire up a Python interpreter in the same directory as our GPX file (for the rest of the article I’m using data for a different activity to the one we saw above). Parsing the file is as simple as:

>>> import gpxpy
>>> with open('activity_strava.gpx') as f:
...     gpx = gpxpy.parse(f)
...
>>> gpx
GPX(tracks=[GPXTrack(name='Morning Walk', segments=[GPXTrackSegment(points=[...])])])

You can see that calling gpxpy.parse on the GPX file object gives you a GPX object. This is a data structure that reflects the structure of the GPX file itself. Among other things, it contains a list of GPXTrack objects, each representing a track. Each GPXTrack object contains some metadata about the track and a list of segments.

>>> len(gpx.tracks)
>>> track = gpx.tracks[0]
>>> track
GPXTrack(name='Morning Walk', segments=[GPXTrackSegment(points=[...])])
>>> track.type
>>> track.name
'Morning Walk'
>>> track.segments

Each GPXTrackSegment, in turn, contains a list of GPXTrackPoint objects, each reflecting a single track point.

>>> segment = track.segments[0]
>>> len(segment.points)
>>> random_point = segment.points[44]
>>> random_point
GPXTrackPoint(40.642868, 14.593911, elevation=147.2, time=datetime.datetime(2020, 10, 13, 7, 44, 13, tzinfo=SimpleTZ("Z")))
>>> random_point.latitude
>>> random_point.longitude
>>> random_point.elevation
>>> random_point.time
datetime.datetime(2020, 10, 13, 7, 44, 13, tzinfo=SimpleTZ("Z"))

Information that is stored as an extension in the GPX file can be accessed too. In that case the relevant XML elements (i.e., the children of the extensions element in the GPX file) are stored in a list.

>>> random_point.extensions
[<Element {}TrackPointExtension at 0x7f32bcc93540>]
>>> tpe = random_point.extensions[0]
>>> for child in tpe:
...     print(child.tag, child.text)
...
{}hr 134
{}cad 43
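
Because the tags come back with a `{...}` namespace prefix, it can be convenient to flatten an extension element into a plain dictionary. The helper below is a hypothetical sketch that relies only on the `tag`, `text` and `iter()` members shared by standard library and lxml elements; the sample element is hand-written, and its namespace URI is assumed to be the one Garmin uses for TrackPointExtension v1.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a leading '{namespace}' from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def extension_values(element):
    """Collect the text of every descendant element, keyed by local tag name."""
    values = {}
    for child in element.iter():
        # Skip the element itself and non-element nodes such as comments.
        if child is element or not isinstance(child.tag, str):
            continue
        text = (child.text or "").strip()
        if text:
            values[local_name(child.tag)] = text
    return values


# Hypothetical stand-in for random_point.extensions[0].
sample = ET.fromstring(
    '<gpxtpx:TrackPointExtension '
    'xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1">'
    '<gpxtpx:hr>134</gpxtpx:hr><gpxtpx:cad>43</gpxtpx:cad>'
    '</gpxtpx:TrackPointExtension>'
)
print(extension_values(sample))
```

The values are left as strings; convert them with `int()` or `float()` once you know which fields your device actually writes.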

In addition to holding the data parsed from the underlying GPX file, GPXTrack and GPXTrackSegment objects have some useful methods for calculating things we might want to know based on the data. For example, you can calculate the total length of a track or segment:

>>> segment.length_2d()  # ignoring elevation
>>> segment.length_3d()  # including elevation

Or data about moving time, or speed (in metres/second):

>>> segment.get_moving_data()
MovingData(moving_time=7829.0, stopped_time=971.0, moving_distance=8096.192269756624, stopped_distance=160.6149258847903, max_speed=1.7427574692488983)
>>> segment.get_speed(44)  # The index of the point at which you want to measure speed
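
Times in seconds and speeds in metres/second are not the units most of us think in, so a small conversion helps when reading these numbers. The sketch below takes the moving_time and moving_distance values shown above and derives an average moving speed and a pace per kilometre; the helper functions are hypothetical and are not part of gpxpy.

```python
def average_speed_mps(distance_m, time_s):
    """Average speed in metres/second over a distance and duration."""
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    return distance_m / time_s


def pace_per_km(speed_mps):
    """Pace as (minutes, seconds) per kilometre, rounded to the second."""
    if speed_mps <= 0:
        raise ValueError("speed_mps must be positive")
    total_seconds = round(1000.0 / speed_mps)
    return divmod(total_seconds, 60)


# Values copied from the MovingData output above.
moving_time_s = 7829.0
moving_distance_m = 8096.192269756624

moving_speed = average_speed_mps(moving_distance_m, moving_time_s)
minutes, seconds = pace_per_km(moving_speed)
print(f"average moving speed: {moving_speed * 3.6:.2f} km/h")
print(f"average moving pace:  {minutes}:{seconds:02d} per km")
```

For this walk the average moving speed works out at a little under 4 km/h, comfortably below the max_speed of roughly 1.74 metres/second reported above.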

There are various other methods available to calculate other metrics, as well as ways to adjust or modify the data, such as by adding or removing points, splitting segments, smoothing values, etc. You can explore these by calling help on the relevant object.

Finally, here is a Python script to parse a GPX file and place some of the key data into a pandas DataFrame. Calling this script on our GPX file:

python3 activity_strava.gpx

… will output something like the following:

A pandas DataFrame with track point data.
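
The script itself is not reproduced here, so what follows is only a rough sketch of the idea, not the script referred to above. It uses the standard library instead of gpxpy so that it runs without extra dependencies, assumes the GPX 1.1 namespace, and produces one dictionary per track point; the sample document, column names and `gpx_to_rows` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumption: the file uses the standard GPX 1.1 namespace.
GPX_NS = "{http://www.topografix.com/GPX/1/1}"

# Hypothetical sample with two track points.
SAMPLE = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <trk><name>Morning Walk</name><trkseg>
    <trkpt lat="40.642868" lon="14.593911">
      <ele>147.2</ele><time>2020-10-13T07:44:13Z</time>
    </trkpt>
    <trkpt lat="40.642901" lon="14.593950">
      <ele>147.6</ele><time>2020-10-13T07:44:17Z</time>
    </trkpt>
  </trkseg></trk>
</gpx>
"""


def parse_time(text):
    """Parse a GPX timestamp such as 2020-10-13T07:44:13Z, or return None."""
    if text is None:
        return None
    # Older Python versions do not accept a trailing 'Z' in fromisoformat.
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def gpx_to_rows(gpx_text):
    """Flatten every track point into a dict, one row per point."""
    root = ET.fromstring(gpx_text)
    rows = []
    for seg_index, seg in enumerate(root.iter(GPX_NS + "trkseg")):
        for pt in seg.iter(GPX_NS + "trkpt"):
            ele = pt.findtext(GPX_NS + "ele")
            rows.append({
                "segment": seg_index,
                "latitude": float(pt.get("lat")),
                "longitude": float(pt.get("lon")),
                "elevation": float(ele) if ele is not None else None,
                "time": parse_time(pt.findtext(GPX_NS + "time")),
            })
    return rows


rows = gpx_to_rows(SAMPLE)
for row in rows:
    print(row)

# With pandas installed, the rows convert directly:
#   import pandas as pd
#   df = pd.DataFrame(rows)
```

Keeping the rows as plain dictionaries makes the flattening step easy to test on its own; the DataFrame conversion is then a single call.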

Parsing TCX files with lxml

The Training Center XML (TCX) format is another common format for storing activity data, and was created by Garmin. The easiest way to understand the difference between GPX and TCX is to look at the two files side by side:

