Travel Sustainability Analysis
This is the how-to companion to my more business-focused post on LinkedIn. It explains how I calculated the carbon emissions from my travel using Garmin and Strava data.
The Process is essentially:
The data I am using for this V1 of the analysis comes from my Garmin watch (previously my iPhone), but essentially all the data is centralised on Strava.
Note that the Strava export does not readily provide the first GPS location of each event, so it needs to be extracted from the individual event file associated with each activity.
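As an illustration of pulling the first GPS location out of an individual event file, here is a minimal Python sketch for the GPX case. It is not the script used in this post: the function name first_track_point and the sample data are hypothetical, and it assumes a GPX 1.1 file whose points are stored as trkpt elements.

```python
import xml.etree.ElementTree as ET

# GPX 1.1 namespace; files written against another GPX version would need a different URI.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def first_track_point(gpx_text):
    """Return (lat, lon, time) of the first <trkpt>, or None if there are no track points."""
    root = ET.fromstring(gpx_text)
    point = root.find(".//gpx:trkpt", GPX_NS)  # first match in document order
    if point is None:
        return None
    time_el = point.find("gpx:time", GPX_NS)
    timestamp = time_el.text if time_el is not None else None
    return float(point.attrib["lat"]), float(point.attrib["lon"]), timestamp


# Hypothetical two-point track, used only to show the call.
SAMPLE_GPX = """<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="example">
  <trk><trkseg>
    <trkpt lat="-33.8568" lon="151.2153"><time>2022-01-01T06:00:00Z</time></trkpt>
    <trkpt lat="-33.8600" lon="151.2100"><time>2022-01-01T06:05:00Z</time></trkpt>
  </trkseg></trk>
</gpx>"""

print(first_track_point(SAMPLE_GPX))
```

TCX and FIT files would need their own readers, since TCX uses different element names and FIT is a binary format.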
Data Preparation (Ideally refactored in SAP Data Intelligence)
- Using the Strava Bulk Data Export feature, I am able to get a file export_<userid>.zip containing all my activities
- I copied this file to a Linux server, performed some data preparation, and extracted the data into 4 areas:
- Activities.csv – Strava enriched with each event
- GPX Files – Details of event in GPX format
- TCX Files – Details of event in TCX format
- FIT Files – Garmin Binary format
- Using some open source code, modernised for an SAP HANA Cloud target, each of the files was imported into HANA Cloud
- Note that the activity records do not include a location, so the activities table needed to be linked to the extracted events dataset
- To match each of the source files, the following was received:
- Activities – Created simple table to match
- The initial location of each activity is linked via a view. This was initially displayed in DBeaver.
- The location of the previous event is linked to the next event based on time; the row_number function was used to group on the Activity Date.
- The distance is then calculated using a relatively basic geospatial predicate. Make sure you are using the Spatial Reference SRID 4326, so that a trip from Australia over the date line to the Cook Islands is not routed via Europe and the Americas.
CASE
  WHEN a.geo.st_distance(b.geo) / 1000 <= 10 THEN 0
  WHEN a.geo.st_distance(b.geo) / 1000 BETWEEN 10 AND 200 THEN (a.geo.st_distance(b.geo) / 1000) * 0.00003
  ELSE (a.geo.st_distance(b.geo) / 1000) * 0.00015
END AS emission,
CASE
  WHEN a.geo.st_distance(b.geo) / 1000 <= 10 THEN 'Run/Bike'
  WHEN a.geo.st_distance(b.geo) / 1000 BETWEEN 10 AND 200 THEN 'Car'
  ELSE 'Flight'
END AS ModeT
The emission multipliers were worked out from a range of sites. In the next version, aircraft emissions APIs could be used.
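To make the thresholds easier to experiment with outside the database, the same rule can be sketched in Python. This is an approximation, not the HANA implementation: it uses the haversine formula on a sphere with a mean Earth radius (an assumption; HANA computes the distance for SRID 4326 itself), and the function names and sample coordinates are hypothetical. The multipliers and distance bands are the ones from the SQL above, and the emission value is in whatever unit those multipliers imply.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km; handles date line crossings because only sin^2 of the longitude difference is used."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def classify_trip(distance_km):
    """Return (mode, emission) using the distance bands and multipliers from the SQL CASE expression."""
    if distance_km <= 10:
        return "Run/Bike", 0.0
    if distance_km <= 200:
        return "Car", distance_km * 0.00003
    return "Flight", distance_km * 0.00015


# Approximate coordinates for Sydney and Rarotonga (Cook Islands), across the date line.
distance = great_circle_km(-33.87, 151.21, -21.23, -159.78)
print(round(distance), classify_trip(distance))
```

The Sydney to Rarotonga leg comes out at roughly 5,000 km and is classified as a flight, which is the date line case mentioned above.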
- With the Activity Date, Event Details, Location and Emission available in a technical view (called FINAL), this view is used as a source for SAP Data Warehouse Cloud so that different models can easily be created.
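The linking step described above (numbering activities by date and joining each one to its predecessor) can be sketched in Python as a sort followed by pairing neighbours. This is only an illustration of the idea behind the row_number approach; the field names name, start_time and location, and the sample activities, are hypothetical and not the columns of the actual tables.

```python
from datetime import datetime


def link_consecutive(activities):
    """Pair each activity with the one before it, ordered by start time."""
    ordered = sorted(activities, key=lambda a: a["start_time"])
    return [
        {
            "from": prev["name"],
            "to": nxt["name"],
            "from_location": prev["location"],
            "to_location": nxt["location"],
        }
        for prev, nxt in zip(ordered, ordered[1:])  # equivalent to joining row n to row n + 1
    ]


# Hypothetical activities, deliberately out of order.
activities = [
    {"name": "Rarotonga run", "start_time": datetime(2022, 7, 3, 7, 0), "location": (-21.23, -159.78)},
    {"name": "Sydney run", "start_time": datetime(2022, 7, 1, 6, 0), "location": (-33.87, 151.21)},
    {"name": "Sydney ride", "start_time": datetime(2022, 7, 2, 6, 0), "location": (-33.88, 151.20)},
]

for leg in link_consecutive(activities):
    print(leg["from"], "->", leg["to"])
```

Each resulting pair is one leg of travel between two activities, which is what the distance and emission rule is then applied to.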
SAP Data Warehouse Cloud
- SAP Data Warehouse Cloud normally is showcased with SAP and non-SAP data. This use case is completely non-SAP data.
- Creating a connection to the HANA Cloud instance
- Access the remote table objects from the HANA Cloud connection, including the FINAL view and ACTIVITIES table. The remote tables have been created as a virtualised dataset.
- Build the model for the emissions scenario, also utilising standard time objects in the model.
- The view also has both the start location and end location linked to the activity. These locations are views that store the st_point (so it can easily be put on a map) using the Spatial Reference ID of 3857.
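For readers wondering how the SRID 4326 coordinates relate to the SRID 3857 points used for mapping, the following Python sketch shows the standard spherical Web Mercator formulas. It is only meant to illustrate the relationship between the two reference systems; in the scenario above the conversion is done on the database side, and the function name to_web_mercator is hypothetical.

```python
import math

WEB_MERCATOR_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as the sphere radius by EPSG:3857


def to_web_mercator(lat, lon):
    """Convert latitude/longitude in degrees (SRID 4326) to x/y in metres (SRID 3857)."""
    x = WEB_MERCATOR_RADIUS_M * math.radians(lon)
    y = WEB_MERCATOR_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# Approximate coordinates for Sydney, used only as an example input.
print(to_web_mercator(-33.87, 151.21))
```

Web Mercator is convenient for drawing points on a map, but distances measured in it are distorted away from the equator, which is one reason to keep the distance calculation itself in SRID 4326.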
SAP Analytics Cloud
- Using the view from SAP Data Warehouse Cloud, the models can easily be displayed over a Live connection. Some sample pages (without styling applied):
- All Years Stats:
- Using the SAC Flow Layer Feature:
- Specifically for 2022 (filtered and highlighted):
Both HANA Cloud and SAP Data Warehouse Cloud were used to allow a variety of options of using this data down the track. It could have all been done in SAP Data Warehouse Cloud for simplicity.
Thanks to Kunnal Khanna for building out the HANA Cloud views for me.
Also please note: the above process has been scripted, and will be available for other users when ready. It’s not quite ready to be shared publicly as yet – as I want to test with a few other user data sets first.
And yes, there are planned updates:
- Using Activities of type “Commute” (on bike or walk / running) to be negative on my Charts
- Using Aircraft API CO2 emission calculators to more accurately calculate the CO2.
Happy to take more feedback and, hopefully soon, publish the process scripts so others can do this.
Thanks for sharing this, Jon!
You mentioned using PHP on XML extracts to import GPX Files into SAP HANA Cloud. Recently I have posted how to use GDAL with the SAP HANA OGR driver to do the same in Python: https://blogs.sap.com/2023/01/18/sap-hana-spatial-and-gdal-in-python-on-windows-os/
Good to know. I'll look into this!