Batch server tuning

"One batch job is in executing status and taking an eternity", "Batch job remains in waiting status", "Batch does not run at its scheduled time": these are some common complaints I have heard from client AX administrators. In most cases, the setup can be optimized for better batch job performance.

To fine-tune the setup of batch jobs, you need to understand the software and hardware involved in the process. The diagram below shows the components involved.


Let's start with the batch job. It can have one or more tasks. Each task can be treated as a thread that executes independently. Any dependency among tasks is declared explicitly when the tasks are created. Developers are encouraged to develop multi-threaded batch jobs. Second, a batch job has a schedule and, optionally, a recurrence attached to it. At the scheduled date and time, the batch job becomes ready to execute. When execution actually starts depends on the availability of a thread on the batch server. The batch job is also attached to a batch group, which helps decide which batch server it will execute on.
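To make the idea of independent tasks with explicit dependencies concrete, here is a minimal Python sketch. It is only an illustration, not AX code: the class BatchTask, the function runnable_tasks and the task names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BatchTask:
    """Hypothetical model of one task inside a batch job."""
    name: str
    depends_on: list = field(default_factory=list)  # names of tasks that must finish first


def runnable_tasks(tasks, finished):
    """Return the names of unfinished tasks whose dependencies have all finished."""
    return [
        t.name
        for t in tasks
        if t.name not in finished and all(d in finished for d in t.depends_on)
    ]


# Two independent tasks and one task that depends on both of them.
tasks = [
    BatchTask("load_orders"),
    BatchTask("load_customers"),
    BatchTask("post_invoices", depends_on=["load_orders", "load_customers"]),
]

first_wave = runnable_tasks(tasks, finished=set())
second_wave = runnable_tasks(tasks, finished={"load_orders", "load_customers"})
```

The two load tasks have no dependencies, so they can run in parallel on separate threads. The posting task only becomes runnable once both have finished. A job written as a single task gives the batch server nothing to parallelize.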

On the server side, any AOS can be marked as a batch server. The batch server has a list of batch groups that it serves. It also has a schedule that dictates the times during which the AOS acts as a batch server and how many threads it uses for batch tasks during those times. This is quite useful when client-serving AOS instances are used as batch servers during off-peak times.

Apart from these software settings, factors such as CPU cores, memory availability and hard disk usage on both the AOS and the database server affect the performance of batch jobs and of the system in general.

The system takes on the task of starting batch job execution. It checks whether the batch server AOS has a schedule for the current date and time to act as a batch server. If so, it takes into account the number of threads allocated to batch activities in that schedule. If any of these threads is available, the system looks up the batch groups assigned to the batch server. If a batch task is in waiting status and belongs to a batch job in one of those batch groups, the system assigns a thread to the task. The status of the task becomes ‘Executing’. The same process repeats, and batch jobs keep getting executed.
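The selection logic described above can be sketched roughly as follows. This is a simplified Python model and not the actual AX implementation: all class and function names are hypothetical, and schedule windows that cross midnight are not handled.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class ServerSchedule:
    start: time
    end: time
    max_threads: int  # threads the AOS may use for batch tasks in this window


@dataclass
class BatchServer:
    name: str
    groups: set       # batch groups this server serves
    schedules: list   # list of ServerSchedule
    busy_threads: int = 0

    def free_threads(self, now):
        """Threads still available at 'now'; 0 if no schedule covers 'now'."""
        for s in self.schedules:
            if s.start <= now < s.end:
                return max(s.max_threads - self.busy_threads, 0)
        return 0


@dataclass
class Task:
    name: str
    group: str        # batch group of the job the task belongs to
    status: str = "Waiting"


def assign_tasks(server, tasks, now):
    """Give free threads to waiting tasks whose batch group matches the server."""
    started = []
    free = server.free_threads(now)
    for task in tasks:
        if free == 0:
            break
        if task.status == "Waiting" and task.group in server.groups:
            task.status = "Executing"
            server.busy_threads += 1
            free -= 1
            started.append(task.name)
    return started


server = BatchServer("AOS1", {"NIGHTLY"}, [ServerSchedule(time(20, 0), time(23, 59), 2)])
tasks = [Task("t1", "NIGHTLY"), Task("t2", "DAYTIME"),
         Task("t3", "NIGHTLY"), Task("t4", "NIGHTLY")]

started_by_day = assign_tasks(server, tasks, time(9, 0))     # outside the schedule
started_by_night = assign_tasks(server, tasks, time(21, 0))  # inside the schedule
```

In this model, t2 never starts on AOS1 because its batch group is not assigned to that server, and t4 stays in waiting status because both scheduled threads are already busy. These correspond to the two usual reasons a job "remains in waiting status".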

Tuning starts with the developer writing multi-threaded batch jobs. If that is not done, there is not much an administrator can do. The second step is to understand whether the batch job is a compute-intensive or a database-intensive operation. This determines whether tuning should focus on the AOS hardware or the database server hardware.

To start with, create multiple batch groups to divide the load of all your batches evenly. Depending on the availability of batch servers, assign batch groups to batch servers. Try to avoid running multiple batch groups on the same server at the same time (given that the load is already evenly distributed among the groups).
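One possible way to arrive at an even split is a simple greedy heuristic: place the longest job first, always into the group with the least load so far. The Python sketch below assumes you already know a typical run time per job. The job names and minutes are made up for illustration, and this heuristic is my own suggestion, not something AX does for you.

```python
import heapq


def balance_groups(job_minutes, group_count):
    """Greedy split: longest job first, into the currently lightest group."""
    heap = [(0, i) for i in range(group_count)]  # (total minutes, group index)
    groups = [[] for _ in range(group_count)]
    # Sort by run time descending, then by name so the result is deterministic.
    for job, minutes in sorted(job_minutes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(heap)
        groups[i].append(job)
        heapq.heappush(heap, (load + minutes, i))
    loads = [sum(job_minutes[j] for j in g) for g in groups]
    return groups, loads


# Hypothetical jobs with their typical run time in minutes.
jobs = {"MRP": 90, "Invoicing": 60, "Reports": 45, "Cleanup": 30, "Sync": 15}
groups, loads = balance_groups(jobs, group_count=2)
```

With these numbers both groups end up with 120 minutes of work. Real jobs rarely split this neatly, so treat the result as a starting point and adjust after watching actual run times.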

Look at the performance indicators of the AOS instances, e.g. CPU usage, memory usage and disk usage, over a 24-hour period (take the mean of a couple of months of data). Aim for 80% usage of any of these resources to get optimal utilization. If an AOS stays below that level on all three indicators for a period of one hour or more, it is a candidate to become a batch server during that period. How many threads you should assign to such a batch schedule is a matter of trial and error. Start by assigning 4 threads and increase the schedule in steps of 4 threads until any indicator reaches 80% under load.
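The trial-and-error procedure can be written down as a small loop. In the Python sketch below, measure_peak_usage stands for whatever you do to observe the AOS under load with a given thread count (in practice, changing the schedule and watching performance counters for a while). That function, the max_threads safety cap and the fake usage numbers are all assumptions made for illustration.

```python
def tune_thread_count(measure_peak_usage, start=4, step=4, target=0.80, max_threads=64):
    """Raise the thread count in steps until any indicator reaches the target.

    measure_peak_usage(threads) must return a dict of usage fractions (0.0 to 1.0),
    e.g. {"cpu": 0.55, "memory": 0.40, "disk": 0.20}.
    max_threads is a safety cap so the loop always ends.
    """
    threads = start
    while True:
        usage = measure_peak_usage(threads)
        if max(usage.values()) >= target or threads + step > max_threads:
            return threads
        threads += step


def fake_measure(threads):
    """Made-up usage model, for demonstration only."""
    return {"cpu": 0.10 + 0.06 * threads, "memory": 0.30 + 0.02 * threads, "disk": 0.20}


chosen = tune_thread_count(fake_measure)
```

With the made-up model, CPU is the first indicator to cross 80%, which happens at 12 threads. On a real system each "measurement" means running with the new setting for a representative load period, so expect the tuning to take several days rather than minutes.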

If your batch jobs are still stuck after all of the above, it is time to upgrade your hardware or scale out the deployment with another AOS batch server.

For some data-intensive operations, increasing threads on the AOS will not help. In that case, look at the performance indicators at the database server level. A bottleneck in the DB server impacts much more than just batch jobs, because the database is used by many other services, so the system as a whole slows down.

Database tuning is a separate topic, outside the current scope. But CPU cores, memory utilization and disk utilization can be seen as primary indicators here as well. They can hint at a hardware bottleneck.
