Batch jobs are critical to the organization. While many organizations monitor their critical batch jobs, there are five failure types that can go unnoticed and cause a lot of damage.
Batch jobs are background tasks that are allocated more memory than tasks run in the foreground. They are used to process high volumes of data that would otherwise tie up memory for long periods if run in the foreground, and to run programs that require little user interaction.
The problem with batch jobs is that if they don’t run as expected, they can cause major issues with system performance or, even worse, fail to produce the desired results. In other words, they can lead to incorrect reports, manufacturing delays, misrepresented financial data… the possibilities are nearly endless.
And here’s the scary part - 99% of organizations ARE monitoring batch jobs, but only for whether the jobs failed or not. While reporting and notifying on batch job failures is crucial to your SAP operations, monitoring these five scenarios will dramatically reduce the chances of a batch job issue going unnoticed and creating these awful results:
Simultaneous Running Jobs:
Ever have the same job run right alongside itself? It happens, and imagine the fun that can be: anything from corrupted business data, such as duplicated orders, to doubling down on system performance degradation.
There are many reasons why this may happen: periodic jobs are scheduled too close to each other and eventually overlap, jobs are doubled up by human error, or third-party systems release jobs incorrectly. Even stopgaps specifically designed to prevent jobs from running simultaneously are never foolproof. Being able to immediately alert, notify and report on these scenarios can save a significant amount of time otherwise spent manually overriding entries and troubleshooting.
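As a rough illustration of what a simultaneous-run check involves, the sketch below looks for runs of the same job whose time windows overlap. It assumes the job log has already been exported into plain Python records; the `JobRun` type, its field names and `find_overlapping_runs` are hypothetical and not part of SAP or any monitoring product.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple, Optional


class JobRun(NamedTuple):
    # Hypothetical record of one execution of a batch job.
    name: str
    start: datetime
    end: Optional[datetime]  # None means the run is still active


def find_overlapping_runs(runs, now):
    """Return (earlier, later) pairs of runs of the same job that overlap in time."""
    by_name = defaultdict(list)
    for run in runs:
        by_name[run.name].append(run)

    overlaps = []
    for job_runs in by_name.values():
        job_runs.sort(key=lambda r: r.start)
        for i, earlier in enumerate(job_runs):
            # A run that has not finished is treated as lasting until `now`.
            earlier_end = earlier.end if earlier.end is not None else now
            for later in job_runs[i + 1:]:
                if later.start >= earlier_end:
                    break  # sorted by start, so no later run can overlap `earlier`
                overlaps.append((earlier, later))
    return overlaps
```

Each returned pair is a candidate for an alert; runs of different jobs are never compared with each other.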
Job Start Delay:
The timing of when a batch job runs can be crucial to the business. It can directly affect subsequent jobs and impact when data is available to be extracted or reported. Delayed starts usually happen when prior batch jobs are running longer than normal and there aren’t enough resources (work processes) available on the system. Your most critical jobs, those identified as needing to run at specific times, should be monitored for delayed starts.
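A delayed-start check boils down to comparing the scheduled start with the actual start, or with the current time if the job has not started yet. The sketch below shows one way to express that; the function names and the idea of a per-job `tolerance` are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def start_delay(scheduled: datetime, actual_start: Optional[datetime],
                now: datetime) -> timedelta:
    """How late the job is: measured to its actual start, or to `now` if it
    has not started yet. Early starts count as zero delay."""
    reference = actual_start if actual_start is not None else now
    return max(reference - scheduled, timedelta(0))


def is_start_delayed(scheduled: datetime, actual_start: Optional[datetime],
                     now: datetime, tolerance: timedelta) -> bool:
    """True when the delay exceeds the tolerance chosen for this job."""
    return start_delay(scheduled, actual_start, now) > tolerance
```

Because a job that is still waiting is measured against `now`, the check can fire while the job is stuck in the queue rather than only after it finally starts.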
Maximum Job Run Time:
Recurring batch jobs typically run within a similar time range, give or take a few minutes. However, from time to time they may run much longer than normal and can cause some big issues. These long-running jobs directly contribute to the delayed starts and simultaneous running jobs we just identified. A long run could be due to a multitude of factors, but typically there is additional data to process or the system is low on resources. The latter is especially scary, as adding a heavy, long-running job to a system that is already low on resources is the start of a downward spiral for system performance.
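A maximum run time check can be applied while a job is still running, so the alert arrives before the job ends. The sketch below assumes a hand-maintained table of per-job limits; the job names, the limits and the `overrunning_jobs` helper are all hypothetical.

```python
from datetime import datetime, timedelta


def overrunning_jobs(active, now, limits):
    """Return the names of running jobs that have exceeded their limit.

    active: dict mapping job name -> start time of the run still in progress
    limits: dict mapping job name -> maximum allowed run time (timedelta)
    Jobs without a configured limit are ignored.
    """
    return sorted(
        name
        for name, started in active.items()
        if name in limits and now - started > limits[name]
    )


# Example with made-up jobs: one is 60 minutes into a 45-minute limit.
now = datetime(2024, 1, 1, 3, 0)
active = {
    "NIGHTLY_BILLING": datetime(2024, 1, 1, 2, 0),
    "STOCK_SYNC": datetime(2024, 1, 1, 2, 55),
}
limits = {
    "NIGHTLY_BILLING": timedelta(minutes=45),
    "STOCK_SYNC": timedelta(minutes=10),
}
late = overrunning_jobs(active, now, limits)
```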
Minimum Job Run Time:
Here is one that gets overlooked in almost every organization. Monitoring long-running batch jobs makes sense, but why monitor a short-running batch job? Typically, recurring jobs process roughly the same amount of data over and over again. So if one day a job that normally runs for 30 minutes runs for 2 minutes, this may be a red flag that not all the necessary data was available to process.
These situations often get overlooked because the job itself does not appear as a failure in SAP, and in fact probably even shows as a successfully run batch job. Being able to alert, notify and report on a job that ran abnormally short can make a basis engineer a business hero overnight.
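One simple way to define "abnormally short" is relative to the job's own history. The sketch below flags a finished run whose duration falls below a fraction of the median of previous runs; the 25% default and the function name are assumptions, and a real threshold would need tuning per job.

```python
from statistics import median


def ran_suspiciously_short(runtime_minutes, history_minutes, min_fraction=0.25):
    """True if a finished run was far shorter than this job's usual run time.

    history_minutes: durations of earlier successful runs of the same job.
    With no history there is no baseline, so nothing is flagged.
    """
    if not history_minutes:
        return False
    return runtime_minutes < median(history_minutes) * min_fraction


# A job that normally takes about 30 minutes finishes in 2 minutes.
flagged = ran_suspiciously_short(2, [29, 30, 31, 30, 32])
```

The median is used instead of the mean so that one earlier outlier does not shift the baseline much.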
Time Since Last Run:
Most batch jobs run on a recurring schedule: hourly, daily, weekly, monthly, etc. But what happens if a job gets taken out of the recurring schedule by human error? SAP itself isn’t going to know it’s human error. It will just continue on and no longer run that particular recurring job. If we’re only monitoring failed batch jobs, an unscheduled job can’t fail - meaning the system will never notify anyone of this change. This happens all the time and often leads to a TON of manual backtracking and entry. A third-party tool that can monitor ‘maximum job time since last run’ can identify when a job that is supposed to be running periodically hasn’t run as it should.
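The ‘maximum job time since last run’ idea can be sketched as a comparison between each job's last known run and the interval at which it is expected to recur. The data layout, the `grace` allowance and the `jobs_overdue` name below are hypothetical choices for illustration only.

```python
from datetime import datetime, timedelta


def jobs_overdue(last_run, expected_interval, now, grace=timedelta(minutes=15)):
    """Return names of jobs that have not run within their expected interval.

    last_run: dict mapping job name -> time of the most recent run
    expected_interval: dict mapping job name -> how often the job should run
    A job with an expected interval but no recorded run is reported as well.
    """
    overdue = []
    for name, interval in expected_interval.items():
        last = last_run.get(name)
        if last is None or now - last > interval + grace:
            overdue.append(name)
    return sorted(overdue)
```

The check is driven by the list of expected jobs rather than by the job log, which is what lets it notice a job that silently dropped out of the schedule.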
For each of your organization’s most critical batch jobs, you need monitors tracking each one of the above-mentioned scenarios. You can do it manually every day or, better yet, use a tool such as Syslink Xandria to automate it all.
With Syslink Xandria, these enhanced batch job monitoring techniques can all be configured in one simple-to-use stage. Configure one or multiple jobs with little effort, and easily implement these checks across multiple systems in seconds.
You can take it a step further and include/exclude (whitelist/blacklist) jobs run by specific users or on specific clients.
To ensure proper operation of your organization’s batch jobs, you have to set up monitors for each of these scenarios for every critical batch job. Or, more easily, with Xandria you can set all of this up for all your batch jobs in one step.