In the prior chapter, we left the ABAP Detective ...
After that, I sat back to drink a strong cup of java, with a hint of cough syrup, pondering what to look at next.
Missing Process Flyers
I kicked around ideas on what would be the most profitable leads in this case, while sitting in my cubicle waiting for the IP phone to ring, and half-expecting another case to get dropped in my lap before I could solve this one. It happens all the time, and shutting the office door has no effect; the bugs creep over the cubicle walls, try to infect email, and show up under my windshield wipers like so many outdated parking tickets. I figured one place to get some low-down was in the community forums, but I had to be careful. They don't like strangers nosing around, especially ones with a hard luck story they've heard a thousand times. Those Coffee Corner bar moderators will tear the wallpaper right off your screen saver.
After doing the preliminaries of searching the wanted posters, checking out which forum spaces might give a hoot about the lost network child, and crafting telegram-speak, I went down to the Western Union office just off Walldorf Square, and gave the friendly robotic clerk my message. "Are you sure this is a question, gumshoe?" he asked, and I verified my sincerity by stabbing a bit-stained finger at the check-box. "That's what it says there, right?" "Just checking," he says, "We get a lot of foolish posts here and there's no sense wasting everyone's time with a bunch of notions you could have resolved without getting me out of my comfortable cubicle."
The message went out over the wires, and the wireless:
"WSAEADDRINUSE" from custom app - how to debug? (blog post)
Maybe not the best way to flush out a hardened process sleeper, but it's just part of the business. Within 24 hours, leads started coming in. When I read them over, I knew I hadn't given enough background on the wanted poster. Had I checked over the client settings? I know which end of an SAP note to point at a problem, so it was my fault for not stating the obvious: yes, we checked the book. It says you might need to widen your net to make sure processes don't get choked.
To fill in the case notes from the client, "WSAEADDRINUSE error during connection setup" has the rap sheet closest to our incident. My client shared this dirt with us early in the investigation, and thinks they're clean. Just to make sure I wasn't overlooking the obvious, I asked again if the number of ports had been bumped up (not off), and the retry time was set right. They said yes, "We upped the MaxUserPort to 65000 and rebooted, and there was no impact."
So, I knew I had a dilemma. The maximum number of ports that can be used is very large, yet we're seeing "port in use" errors from what is supposed to be a single-threaded application, and no other symptoms are visible on the server side. If we were running out of SAP connections, other commuters wouldn't be getting on the street car, and we would have heard about it by now. It had to be something at the client operating system level. But what?
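One way to peek at the client operating system level is to tally TCP connection states from `netstat -an` output and see whether a crowd of sockets is loitering in TIME_WAIT. What follows is a minimal sketch, not the client's tooling: the function name `count_tcp_states` is my own invention, the sample rows are made up (including the addresses and port 3300), and the column layout is assumed to match the usual Windows `netstat -an` format.

```python
from collections import Counter


def count_tcp_states(netstat_text):
    """Tally TCP connection states from `netstat -an` style text."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed row layout: TCP  local-address  remote-address  STATE
        # Header rows and UDP rows (which carry no state) are skipped.
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states


# Made-up sample rows for illustration, NOT output from the client's machine.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.5:1045          10.0.0.9:3300          TIME_WAIT
  TCP    10.0.0.5:1046          10.0.0.9:3300          TIME_WAIT
  TCP    10.0.0.5:1047          10.0.0.9:3300          ESTABLISHED
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
  UDP    0.0.0.0:500            *:*
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(state, count)
```

On a live box the text would come from running `netstat -an` and feeding the captured output to the function. A TIME_WAIT count that climbs toward the size of the port range would point the finger at connection churn rather than at the server.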
I brought down the case law book from the SAP help library. As usual, not a lot of detail, nor applicable examples.
I doodled a copy of the logic that is supposed to happen. One, open a connection; two, get the data.
Sum total of helpful hints from the online help:
Nothing there about what's in a debug file, how to tell if your pool is empty, overflowing, or full of Baby Ruth bars.
Another clue came in from a peer, who reported:
The error is coming from an in-house application that has worked for years and we just started seeing these errors. It is using the SAP Connector to make the connections.
"We're using SAP connector library 2.0.0.23," said my client.
Other correspondents on the ABAP Detective network also said to check the same notes, and that "[a] high frequency of opening and closing RFC connections in the given function in the process step is causing problems." That's what the error message seems to say. I just need to catch the scofflaw red-handed to put this one in the win column.
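The open-and-close theory can be checked on the back of an envelope. The sketch below assumes a simplified model: every closed outbound connection parks its local port for the TIME_WAIT delay, and ports are drawn from one pool that starts just above port 1024. The 65000 figure is the client's own setting; the 240-second delay and the 5000-port ceiling are defaults commonly documented for older Windows releases and are assumptions here, not readings from the client's registry. The function names are hypothetical.

```python
def sustainable_connects_per_second(max_user_port, time_wait_seconds, first_port=1024):
    """Rough ceiling on new outbound connections per second before ports run dry.

    Assumes each closed connection holds its local port for the whole
    TIME_WAIT delay and that usable ports run from first_port + 1 up to
    max_user_port.
    """
    usable_ports = max_user_port - first_port
    return usable_ports / time_wait_seconds


def ports_parked(connects_per_second, time_wait_seconds):
    """Ports sitting in TIME_WAIT at a steady open-and-close rate."""
    return connects_per_second * time_wait_seconds


if __name__ == "__main__":
    # Client's stated setting, with an assumed 240-second TIME_WAIT delay.
    print(sustainable_connects_per_second(65000, 240))
    # Assumed old default ceiling of 5000 ports, same delay.
    print(sustainable_connects_per_second(5000, 240))
    # A hypothetical application opening and closing 20 connections a second.
    print(ports_parked(20, 240))
```

If the arithmetic says the raised port ceiling should comfortably cover the application's connection rate, then simple exhaustion is a weaker suspect, and the code that opens the connections deserves a closer look.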
Is it a woman?
Another intelligence contact suggested the missing piece was "the woman with the smoky voice." Definitely an angle to the case I had not considered.
What was the right approach here? Ask the wrong question and I'd get slapped with a restraining order that would set my case load back to square one. The right person to ask would probably be Dawn, Dawn Haymond, of the Rook Agency. But she never answers my calls because her cases are always more important. No, it is a silly place, so we won't go there.
HQ
Of course, I needed to see if the SAP Police Headquarters squads could help me. Most of the time, they're too busy for an ABAP Detective. They just want to help on open and shut cases. Anything else takes away from their success rate, so the only way to enlist their help is to deliver them the case already solved and gift-wrapped.
Mug Shot Files
Before I could go downtown, I needed to walk the beat and find out what this case was all about. Are we talking about a skip trace, a bunko, or maybe a case of mistaken identity? The best way I know is to go on a stakeout, or better yet, infiltrate the mob with a good disguise. To get prepared though, I needed to go through the mug shots and fingerprint files to understand the nature of this case. If the perp is known to the authorities, I'd save myself a lot of investigative work ("Bradshaw?").
One of the first questions I asked was "when does the error occur?" Is it any time, day or night, or is it in a crowded theater, or is it in a still nighttime diner? So I got access to the case files, and ran it into an ex-Cell block to figure out which way the cards would play.
First, error counts by time of day. No pattern I could see, other than that everything cuts out around 23:00 each night.
Second, error counts by the day. Stickups are lower on the weekend, but nothing that would show this is caused by the phases of the moon or anything else on the business calendar.
Third, another daily chart, with a little twist of the cell bar chart type. It almost, but not quite, looks like problems get worse later in the week, right before pay day. But there's nothing here that could get me a warrant, much less a conviction. So much for the scientific method. It is now time to go old school, shaking up the process table and seeing what drops out. But with kid gloves, since this is a production factory.
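For anyone who wants to run the same lineup without a spreadsheet, the counts by hour and by weekday can be sketched in a few lines. This assumes the error log yields one timestamp string per error in the format shown; the real log layout is not given in the case file, so the format string, the function name `error_counts`, and the sample timestamps are all made up for illustration.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def error_counts(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Bucket error timestamps by hour of day and by day of the week."""
    by_hour = Counter()
    by_weekday = Counter()
    for stamp in timestamps:
        when = datetime.strptime(stamp, fmt)
        by_hour[when.hour] += 1
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        by_weekday[DAYS[when.weekday()]] += 1
    return by_hour, by_weekday


# Made-up timestamps for illustration, NOT the client's error log.
SAMPLE = [
    "2024-01-01 09:15:00",
    "2024-01-01 09:45:00",
    "2024-01-04 22:59:10",
    "2024-01-06 03:00:00",
]

if __name__ == "__main__":
    hours, days = error_counts(SAMPLE)
    for hour in sorted(hours):
        print(f"{hour:02d}:00  {hours[hour]}")
    for day in DAYS:
        print(day, days[day])
```

Counts like these only show when the errors happen, not why, which is about as far as the charts got me too.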
Leads
After peering at the evidence, and considering the advice of community counsel, I'm left with a lot of loose ends. None of these are strong leads, though I'll need to look at all possibilities. My gut still tells me it's a code problem. Whether it's ours or theirs remains to be seen. Here's what I've got in my case book:
Errata
In the above-referenced SAP note, which, alas, dates from 2004, the two Microsoft TechNet links lead to dead ends:
Better ones would be (partly courtesy of SCN Neighborhood Watch):
Late Tip
Another tipster suggested late last night to look at smgw logs on the server, with the idea that we've hit the maximum number of connections on the server. I've looked before, and didn't see anything. It's worth putting it in the case folder for due process anyway.