Batch server tuning
"One batch job is in executing status and taking an eternity", "Batch job
remains in waiting status", "Batch does not run at its scheduled time". These
are some common complaints I have heard from client AX administrators. In most
cases, the setup can be optimized for better batch job performance.
To fine-tune the setup of batch jobs, you need to understand the software and
hardware involved in the process. The diagram below shows the different
components involved.
Let's start with the batch job. It can have one or more tasks. Each task can
be treated as a thread that can be executed independently. Any dependency
among tasks is explicitly declared when the tasks are created. Developers are
encouraged to write multi-threaded batch jobs. Secondly, a batch job has a
schedule and, optionally, a recurrence attached to it. At the scheduled
date/time, the batch job becomes ready to execute. The actual start of
execution depends on the availability of a thread on a batch server. The batch
job is also attached to a batch group, which helps decide which batch server
it will execute on.
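The idea of independent tasks with explicit dependencies can be sketched in a few lines of Python. This is only an illustration of the concept, not AX code: the `Task` class, the `runnable_tasks` helper and the task names are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A hypothetical batch task; not an AX class."""
    name: str
    # Names of tasks that must finish before this one may start.
    depends_on: set = field(default_factory=set)


def runnable_tasks(tasks, finished):
    """Return the names of unfinished tasks whose dependencies are all finished."""
    return [
        t.name
        for t in tasks
        if t.name not in finished and t.depends_on <= finished
    ]


tasks = [
    Task("load_a"),
    Task("load_b"),
    Task("summarise", depends_on={"load_a", "load_b"}),
]

# Both loads have no dependencies, so they can run in parallel threads.
first_wave = runnable_tasks(tasks, finished=set())
# The summary can only start once both loads are done.
second_wave = runnable_tasks(tasks, finished={"load_a", "load_b"})
```

The more tasks a job has without dependencies on each other, the more threads it can use at once, which is why multi-threaded design matters for everything that follows.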
On the server side, any AOS can be marked as a batch server. The batch server
has a list of batch groups that it caters to. It also has a schedule, which
dictates during which times the AOS acts as a batch server and how many
threads it uses for batch tasks during those times. This is quite useful when
client-serving AOSs are used as batch servers during off-peak times.
Apart from these software setups, factors like CPU cores, memory availability
and hard disk usage on both the AOS and the database server affect the
performance of batch jobs and of the whole system in general.
The system takes up the task of starting batch job execution. It checks
whether the batch server AOS has a schedule for the current date/time to act
as a batch server. If yes, it takes into account the number of threads
allocated to batch activities in that schedule. If any of these threads is
available, the system looks at the batch groups assigned to the batch server.
If a batch task in waiting status belongs to a batch job in one of those batch
groups, the system assigns a thread to the task. The status of the task
becomes ‘Executing’. The same process keeps repeating, and batch jobs keep
getting executed.
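The selection logic described above can be modelled roughly as follows. This is a simplified sketch of the description in this post, not the actual AX implementation: the class names, fields and the `dispatch` function are assumptions made for illustration, and schedule windows that cross midnight are not handled.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class ScheduleWindow:
    """Hypothetical batch schedule entry for one AOS."""
    start: time
    end: time
    max_threads: int


@dataclass
class BatchServer:
    name: str
    groups: set          # batch groups this server caters to
    windows: list        # list of ScheduleWindow
    busy_threads: int = 0

    def free_threads(self, now):
        """Threads still available if 'now' falls inside a schedule window."""
        for w in self.windows:
            if w.start <= now < w.end:
                return max(w.max_threads - self.busy_threads, 0)
        return 0  # outside every window: not acting as a batch server


@dataclass
class BatchTask:
    name: str
    group: str
    status: str = "Waiting"


def dispatch(servers, tasks, now):
    """Assign waiting tasks to servers with a free thread and a matching group."""
    assignments = {}
    for server in servers:
        for task in tasks:
            if server.free_threads(now) == 0:
                break  # no schedule right now, or all threads busy
            if task.status == "Waiting" and task.group in server.groups:
                task.status = "Executing"
                server.busy_threads += 1
                assignments[task.name] = server.name
    return assignments


server = BatchServer("AOS1", {"INVOICE"}, [ScheduleWindow(time(1), time(5), 2)])
tasks = [
    BatchTask("t1", "INVOICE"),
    BatchTask("t2", "INVOICE"),
    BatchTask("t3", "INVOICE"),  # stays waiting: only 2 threads scheduled
    BatchTask("t4", "REPORT"),   # stays waiting: group not served by AOS1
]
assignments = dispatch([server], tasks, now=time(2, 0))
```

The sketch shows the two usual reasons a task stays in waiting status: no free thread in the current schedule, or no batch server that serves the task's batch group.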
Tuning starts with the developer writing multi-threaded batch jobs. If that is
not done, there is not much an administrator can do. The second step is to
understand whether the batch job is a compute-intensive or a
database-intensive operation. This determines whether tuning should focus on
the AOS hardware or the database server hardware.
To start with, create multiple batch groups to evenly divide the load of all
of your batches. Depending on the availability of batch servers, assign batch
groups to batch servers. Try to avoid running multiple batch groups on the
same server at the same time (given that the load among groups is already
evenly distributed).
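One simple way to get a roughly even split is to place the longest jobs first, always into the group with the least load so far. The sketch below assumes you have an estimated runtime per job (for example from batch history); the job names, group names and durations are invented, and this greedy approach is just one heuristic, not something AX does for you.

```python
import heapq


def split_into_groups(job_minutes, group_names):
    """Greedy split: longest job first, into the least-loaded group so far."""
    heap = [(0, name) for name in group_names]  # (total minutes, group)
    heapq.heapify(heap)
    plan = {name: [] for name in group_names}
    for job, minutes in sorted(job_minutes.items(), key=lambda kv: -kv[1]):
        load, name = heapq.heappop(heap)        # least-loaded group
        plan[name].append(job)
        heapq.heappush(heap, (load + minutes, name))
    return plan


# Hypothetical estimated runtimes in minutes.
job_minutes = {
    "MRP": 90,
    "Invoicing": 60,
    "Posting": 45,
    "Reports": 30,
    "Cleanup": 15,
}
plan = split_into_groups(job_minutes, ["GRP_A", "GRP_B"])
# plan == {"GRP_A": ["MRP", "Reports"],
#          "GRP_B": ["Invoicing", "Posting", "Cleanup"]}
```

In this example both groups end up with 120 minutes of estimated work, so each can be given to its own batch server.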
Have a look at the performance indicators of the AOSs, e.g. CPU usage, memory
usage and disk usage, over a 24-hour period (take the mean of a couple of
months of data). Target about 80% usage of any of these resources to get
optimal utilization. If an AOS stays below that level on all three indicators
for a period of one hour or more, it is a candidate to act as a batch server
during that period. How many threads you should assign to such a batch
schedule is a matter of trial and error. Start by assigning 4 threads and
increase the schedule in steps of 4 threads until any indicator reaches 80%
under load.
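The trial-and-error loop can be written down as a small procedure. In the sketch below, `measure_peak_usage` is a placeholder for whatever you use to observe the AOS under load with a given thread count (in practice this means changing the schedule and watching the counters for a while); the `max_threads` cap and the fake measurement function are assumptions added for the example.

```python
def tune_thread_count(measure_peak_usage, start=4, step=4, target=80, max_threads=64):
    """Raise the thread count in steps until any indicator reaches the target (%).

    measure_peak_usage(threads) must return a dict of usage percentages,
    e.g. {"cpu": 40, "memory": 38, "disk": 20}.
    """
    threads = start
    while threads + step <= max_threads:
        usage = measure_peak_usage(threads)
        if max(usage.values()) >= target:
            return threads  # one indicator hit the target: stop here
        threads += step
    return threads  # safety cap reached without hitting the target


def fake_measurement(threads):
    """Stand-in for real monitoring: CPU grows 5% per thread in this toy model."""
    return {"cpu": 5 * threads, "memory": 30 + threads, "disk": 20}


recommended = tune_thread_count(fake_measurement)
# With the toy model above, CPU reaches 80% at 16 threads.
```

On a real system the relationship between threads and usage is rarely this linear, so treat each step as an observation over a full load period rather than a single reading.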
If your batch jobs are still stuck after all of the above, it is time to
upgrade your hardware or scale out the deployment with another AOS batch
server.
For some data-intensive operations, increasing threads on the AOS will not
help. In that case, look at the performance indicators at the database server
level. If the bottleneck is in the DB server, it will impact much more than
just batch jobs, as the database is used by many other services, so the system
as a whole will slow down.
Database tuning is a separate topic and out of scope here, but CPU cores,
memory utilization and disk utilization are the primary indicators there as
well. They can hint at a hardware bottleneck.