We had an interesting problem crop up this weekend. A number of
jobs for a particular instance failed to run, and their scheduled date
stayed the same (Saturday night, for example). On Monday morning I
logged in and ran agentctl status, which reported that the agent was up.
In the OEM console I checked the status of the node, and it showed the
agent had last been pinged 4 seconds earlier. I should clarify: when I say
the jobs failed to run, I mean the job history didn't show anything. The
jobs simply didn't run, and their scheduled date stayed the same.
Shortly after, while looking into the problem, I ran status again and
the agent was down. I restarted the agent and went back into the OEM
console. The jobs for that node had been rescheduled for the next
interval, and the job history showed that the agent was down for the jobs
that were supposed to run over the weekend. Before the agent died and was
restarted, the job history had shown nothing at all.
So basically, the dbsnmp process was running, agentctl status showed
the agent was up, and the OEM console wasn't complaining that the agent
was down or out of sync, yet for some reason there was no communication
between the two and the jobs failed to run.
Has anyone experienced this problem before? It's the first time I've
ever seen this happen, and I've been using OEM off and on for years.
Thanks