Product Lifecycle Management Blogs by Members
JimSpath
Previously, my detective work into measuring ambient environmental data led me to the clues presented in the winter-time post "The ABAP Detective Takes The-Heat ... Cleanse/", and then in the spring time-change post "The ABAP Detective Gets Their Clock Cleaned."

As sunrise came earlier and the days grew longer, the solstice arrived. When I checked a few gauges, I thought the reported temperature values were too high: I could glance at a wall thermometer or the HVAC (heating/ventilating/air conditioning) controls and see a discrepancy. Any time the numbers don't look right, my spider sense kicks in. And then the hypotheses begin, even before enough data has been captured to draw logical conclusions.

My first take was to pull out the old spiral notebook and record data manually with pencil and paper, as Columbo might. A little of that goes a long way, since any hacker will try to automate a repetitive task after the 3rd or 4th manual effort. When I looked around at the available reference gauges, none nearby were automated, especially not the house thermostat. Ripping that device out for a smarter unit would be an expense, and possibly a commitment to a third-party big tech host, so I decided that could wait.

Digging into the morgue (old clippings, not bodies), I found references to federal sites that publish environmental data, particularly weather data; links are in the References section below. To cut to the chase, I set up a recurring job to pull in current weather conditions and load them into the same monitoring system as the in-house CPU and environment gauges.
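A minimal Python sketch of such a recurring job follows; it is not the actual job used here. It assumes the decoded per-station reports are served as plain text under tgftp.nws.noaa.gov at the path in `BASE_URL` (that path is an assumption, so verify it against the NOAA directory listings in the References). The function names and the archive file name are hypothetical, and loading the results into a monitoring system is not shown.

```python
import time
import urllib.request
from pathlib import Path

# ASSUMPTION: per-station decoded reports are served at this path; verify it
# against the directory listings under https://tgftp.nws.noaa.gov/ first.
BASE_URL = "https://tgftp.nws.noaa.gov/data/observations/metar/decoded"


def station_url(station_id: str) -> str:
    """Build the (assumed) URL of the decoded report for one ICAO station ID."""
    return f"{BASE_URL}/{station_id.strip().upper()}.TXT"


def fetch_report(station_id: str, timeout: float = 30.0) -> str:
    """Download the current decoded report as text (needs network access)."""
    with urllib.request.urlopen(station_url(station_id), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def archive_report(text: str, archive: Path) -> None:
    """Append one report to a local text archive, prefixed with the UTC fetch time."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    with archive.open("a", encoding="utf-8") as fh:
        fh.write(f"=== fetched {stamp}\n{text.rstrip()}\n")


def run_once(station_id: str = "KMTN", archive: Path = Path("weather_archive.txt")) -> None:
    """One scheduled run (hypothetical): fetch the latest report and archive it."""
    archive_report(fetch_report(station_id), archive)
```

Nothing runs on import; a scheduler such as cron would call `run_once()` once an hour, matching the hourly update rate of the feed.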

Here's an example of the supplied parameters and values:
Baltimore / Martin, MD, United States (KMTN) 39-20N 076-25W
Jun 27, 2022 - 04:55 PM EDT / 2022.06.27 2055 UTC
Wind: from the WNW (300 degrees) at 9 MPH (8 KT):0
Visibility: 10 mile(s):0
Sky conditions: clear
Temperature: 80 F (27 C)
Dew Point: 66 F (19 C)
Relative Humidity: 61%
Pressure (altimeter): 29.96 in. Hg (1014 hPa)
ob: KMTN 272055Z 30008KT 10SM CLR 27/19 A2996
cycle: 21
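Pulling numbers out of a report like this takes only a few regular expressions. A minimal sketch, assuming every report keeps the line labels shown in the sample (`Temperature:`, `Dew Point:`, `Relative Humidity:`); the keys of the returned dictionary are hypothetical names, and since the report already carries both Fahrenheit and Celsius, both are kept.

```python
import re

SAMPLE = """\
Baltimore / Martin, MD, United States (KMTN) 39-20N 076-25W
Jun 27, 2022 - 04:55 PM EDT / 2022.06.27 2055 UTC
Wind: from the WNW (300 degrees) at 9 MPH (8 KT):0
Visibility: 10 mile(s):0
Sky conditions: clear
Temperature: 80 F (27 C)
Dew Point: 66 F (19 C)
Relative Humidity: 61%
Pressure (altimeter): 29.96 in. Hg (1014 hPa)
ob: KMTN 272055Z 30008KT 10SM CLR 27/19 A2996
cycle: 21
"""

# Matches lines such as "Temperature: 80 F (27 C)" and "Dew Point: 66 F (19 C)".
TEMP_RE = re.compile(
    r"^(Temperature|Dew Point):\s*(-?\d+(?:\.\d+)?) F \((-?\d+(?:\.\d+)?) C\)", re.M
)
HUMIDITY_RE = re.compile(r"^Relative Humidity:\s*(\d+)%", re.M)
UTC_RE = re.compile(r"/ (\d{4}\.\d{2}\.\d{2} \d{4}) UTC")


def parse_report(text: str) -> dict:
    """Extract a few numeric fields from a decoded observation report."""
    record = {}
    m = UTC_RE.search(text)
    if m:
        record["utc"] = m.group(1)
    for name, fahr, cels in TEMP_RE.findall(text):
        key = name.lower().replace(" ", "_")  # "temperature" or "dew_point"
        record[f"{key}_f"] = float(fahr)
        record[f"{key}_c"] = float(cels)
    m = HUMIDITY_RE.search(text)
    if m:
        record["relative_humidity_pct"] = int(m.group(1))
    return record
```

Missing lines simply leave their keys out, which is one cheap way to start on the "verify anything being captured" refinement.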

In a future refinement, I'd probably clean up the logic to validate whatever is captured, and experiment with more frequent updates. The feed site appears to be updated hourly, which was also the easiest interval to schedule. There are most likely XML or JSON feeds as well. I maximized the capture by storing both Fahrenheit and Celsius values in the local archives, since the data volume is minimal at 24 samples per day. The charts below show degrees F, since that's the custom here. Hopefully this won't ruin the plot for non-Americans (not Un-Americans).

Once I had enough records to compare a nearby ambient environmental record with the home temperature sensors, I could determine whether the correction factor I had previously calculated needed adjusting. That factor masks out the heat emitted by the monitoring system's CPU onto the too-close sensor device (see the curve-fitting link in the References below).
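The Adafruit guide in the References covers multi-point curve fitting. As a simpler stand-in, a two-parameter least-squares line (reference ≈ slope × sensor + intercept) can be fitted from paired readings. This sketch assumes you have raw sensor values and trusted reference values taken at the same times; the readings used in the test are hypothetical, not taken from these charts.

```python
def fit_linear(sensor: list[float], reference: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of reference ~= slope * sensor + intercept."""
    n = len(sensor)
    if n != len(reference) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(sensor) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in sensor)
    if sxx == 0:
        raise ValueError("sensor readings are constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sensor, reference))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def correct(reading: float, slope: float, intercept: float) -> float:
    """Apply the fitted correction to one raw sensor reading."""
    return slope * reading + intercept
```

With hypothetical pairs such as sensor [88, 90, 93, 95] against reference [80, 82, 85, 87], the fit is slope 1.0 and intercept -8.0, a pure 8-degree offset. A slope well away from 1 would suggest the CPU heat does not act as a constant offset.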



External and Internal Temperature Readings


You can tell that the ambient reading (in green, below the other two values) comes from the government data because its values are integers; the average is calculated by the monitoring system. To make the diagnosis more challenging, the delta between inside and outside is not fixed, even with the air conditioning known to be off, because the house lags behind the outdoors when warming up during the day, just as it does when cooling down at night. However, I can glance at the other inside thermometers for a sanity check (for example, it's 85 outside but 83 inside). The chart above shows a 5-10 degree F spread (roughly 3-6 degrees C). So I came up with a working hypothesis for how much to adjust the previously set offset.
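The Celsius equivalent of that spread is easy to check. Converting a temperature difference applies only the 5/9 scale factor, because the 32-degree offset cancels when one reading is subtracted from another. A small sketch with hypothetical helper names:

```python
def delta_f_to_c(delta_f: float) -> float:
    """Convert a temperature *difference* from Fahrenheit to Celsius degrees.

    Only the 5/9 scale applies; the 32-degree offset used for absolute
    temperatures cancels out when two readings are subtracted.
    """
    return delta_f * 5.0 / 9.0


def f_to_c(temp_f: float) -> float:
    """Convert an absolute Fahrenheit temperature to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0
```

A 5-10 degree F spread works out to about 2.8-5.6 degrees C. Passing 5 and 10 through `f_to_c()` by mistake would give -15.0 and about -12.2, which is the slip the separate delta function avoids.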

Good thing I waited and looked further, though. More than once, I had spotted a runaway system process on the sensor node that looked like this:

  PID USER  PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
12244 user  39  19  480524  34460    672 R  99.7   7.9  20644:16 logrotate
26807 user  20   0   11352   2900   2488 R   1.0   0.7   0:00.09 top
   11 user  20   0       0      0      0 S   0.3   0.0  13:33.69 process
22895 user  20   0       0      0      0 I   0.3   0.0   0:01.44 process

Why is logrotate still running? That is still an open case, but spotting it allowed me to make progress in this investigation. High CPU usage, even on just one core, could plausibly generate some heat near the sensor.
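Spotting a runaway like this could also be automated. The sketch below scans `top`-style text, such as the batch-mode output of `top -b -n 1`, and reports processes at or above a CPU threshold. The 90% default is an arbitrary assumption, and the column lookup assumes the header names shown above, with COMMAND as the last column.

```python
SAMPLE_TOP = """\
  PID USER  PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
12244 user  39  19  480524  34460    672 R  99.7   7.9  20644:16 logrotate
26807 user  20   0   11352   2900   2488 R   1.0   0.7   0:00.09 top
   11 user  20   0       0      0      0 S   0.3   0.0  13:33.69 process
22895 user  20   0       0      0      0 I   0.3   0.0   0:01.44 process
"""


def runaway_processes(top_text: str, cpu_threshold: float = 90.0) -> list[tuple[int, str, float]]:
    """Return (pid, command, %CPU) for each process at or above cpu_threshold."""
    lines = top_text.splitlines()
    # Skip top's summary lines: the process table starts at the "PID ..." header.
    header_idx = next((i for i, line in enumerate(lines) if line.split()[:1] == ["PID"]), None)
    if header_idx is None:
        return []
    columns = lines[header_idx].split()
    pid_col = columns.index("PID")
    cpu_col = columns.index("%CPU")
    cmd_col = columns.index("COMMAND")  # assumed to be the last column
    hits = []
    for line in lines[header_idx + 1:]:
        # Split at most cmd_col times so a command containing spaces stays whole.
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue  # blank or truncated row
        cpu = float(fields[cpu_col])
        if cpu >= cpu_threshold:
            hits.append((int(fields[pid_col]), fields[cmd_col].strip(), cpu))
    return hits
```

Run periodically with an alert on any hit, a check like this could raise the alarm before the extra heat shows up in the temperature charts.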


Zoomed-in temperature changes


Looks highly suspicious. Timing just after midnight is rookie scheduler work. Always randomize starting times; use prime numbers. Distribute for niceness.
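One way to follow that advice without keeping a central schedule is to derive each job's start minute from a hash of the host and job names, restricted to prime minutes. This is a sketch of one interpretation, not a standard tool; the host, job, and command names in the test are hypothetical. A hash gives a stable pseudo-random spread rather than a fresh random pick on every run, so a job's start time doesn't wander.

```python
import hashlib

# The 17 prime minutes from 2 to 59.
PRIMES_UNDER_60 = [p for p in range(2, 60) if all(p % d for d in range(2, int(p ** 0.5) + 1))]


def job_minute(host: str, job: str) -> int:
    """Pick a stable, prime-numbered minute for a job on a given host.

    Hashing host and job name spreads jobs across nodes without a central
    registry, while the result stays the same on every call.
    """
    digest = hashlib.sha256(f"{host}:{job}".encode()).digest()
    return PRIMES_UNDER_60[int.from_bytes(digest[:4], "big") % len(PRIMES_UNDER_60)]


def cron_line(host: str, job: str, command: str, hour: str = "*") -> str:
    """Build a crontab line: minute hour day-of-month month day-of-week command."""
    return f"{job_minute(host, job)} {hour} * * * {command}"
```

With the default `hour="*"` the job runs hourly at its chosen minute; passing a specific hour other than 0 also keeps it away from the midnight pile-up.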

I looked at other systems and parameters to see whether the issue was widespread, and found no conspiracy. One node showed a spike in one CPU metric.


CPU system time - reference

Zooming in:


CPU system time - zoomed in

This looked like the normal impact of a "housekeeping" job, where file systems are scanned, data is digested and indexed, and maybe some audit bits are uploaded or downloaded. It was not a concern, since the background level returned to the "before" range.

MORE DATA

When I killed the runaway process, I could see the system CPU return to the baseline (just before 2:40 PM local time):


System CPU with a twist

I thought this case was wrapped up when I viewed the temperature readings after the process cleanup (by 2:56 PM):


Temperatures begin to coalesce

Then I looked at the internal and external values once more.


Temperature values, with a spike

Oh dear, invalid testimony. Time to clean that data spike out of the archives. That's harder in the summer, since 89 degrees is possible; in the winter it would have been a dead (heat) giveaway.
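A fixed upper limit can't catch an 89-degree reading in summer, but a jump relative to neighbouring samples can. A minimal sketch, assuming hourly samples in time order; the 2-sample window and 6-degree threshold are hypothetical values to tune, and flagged samples are set to None (a gap) rather than interpolated.

```python
from statistics import median
from typing import Optional


def flag_spikes(values: list[float], window: int = 2, max_jump: float = 6.0) -> list[int]:
    """Return indexes of samples that differ from the median of their
    neighbours (up to `window` samples on each side) by more than `max_jump`."""
    flagged = []
    for i, value in enumerate(values):
        lo, hi = max(0, i - window), min(len(values), i + window + 1)
        neighbours = values[lo:i] + values[i + 1:hi]
        if neighbours and abs(value - median(neighbours)) > max_jump:
            flagged.append(i)
    return flagged


def drop_spikes(values: list[float], window: int = 2, max_jump: float = 6.0) -> list[Optional[float]]:
    """Copy the series, replacing flagged spikes with None."""
    bad = set(flag_spikes(values, window, max_jump))
    return [None if i in bad else v for i, v in enumerate(values)]
```

For the hypothetical hourly series [78, 79, 80, 89, 80, 79, 78], only index 3 (the 89) is flagged; its neighbours have a median of 79.5, while every other sample sits within 1.5 degrees of its own neighbourhood median.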

Fixed it:


Temperature ranges, no spike

Both sources stayed in sync from the process cleanup until the following midnight. As expected, on warm days the interior temperature drops more slowly than the exterior. For my purposes, the recorded values had stabilized. Or so I thought: the problem recurred!


Process-caused heat increase


As mentioned earlier, I don't know yet why this process keeps running, nor what the best solution is, given that killing it only fixes things for less than 24 hours.

Lessons learned


Calibrate. Check your reference sources. Eat your Wheaties.

Calibrate again.

Next steps


Clean this puppy up. Look for freely available external data feeds that update more often than the current hourly one.
 

References


These are U.S. sites, except for wmo.int, Adafruit, and pkgsrc. I assume you can find local weather stations with little effort in most places.

noaa.gov

weather.gov

https://www.aviationweather.gov/metar

https://weather.gov/xml/current_obs/

https://www.ncei.noaa.gov/products/world-weather-records

public.wmo.int/en

https://learn.adafruit.com/calibrating-sensors/multi-point-curve-fitting

https://pkgsrc.se/sysutils/logrotate

https://tgftp.nws.noaa.gov/weather/current/

https://tgftp.nws.noaa.gov/ftpmail.txt
Example:
To obtain the Atlantic high seas Forecast, WMO header FZNT01 KWBC,
AWIPS header HSFAT1

Send an e-mail to: NWS.FTPMail.OPS@noaa.gov
Subject Line: Put anything you like
Body:
    open
    cd data
    cd raw
    cd fz
    get fznt01.kwbc.hsf.at1.txt
    quit