
Monitoring on Azure HDInsight part 4: Workload metrics and logs

September 10, 2019 by cbn

This is the fourth blog post in a four-part series on monitoring on Azure HDInsight. Monitoring on Azure HDInsight Part 1: An Overview discusses the three main monitoring categories: cluster health and availability, resource utilization and performance, and job status and logs. Part 2 covers the first topic, monitoring cluster health and availability. Part 3 covers the second, monitoring performance and resource utilization. This post covers the third topic, workload metrics and logs, in more depth.


During normal operations, when your Azure HDInsight clusters are healthy and performing optimally, you will likely focus your attention on monitoring the workloads running on your clusters and viewing relevant logs to assist with debugging. Azure HDInsight offers two tools that can be used to monitor cluster workloads: Apache Ambari and integration with Azure Monitor logs. Apache Ambari is included with all Azure HDInsight clusters and provides an easy-to-use web user interface that can be used to monitor the cluster and perform configuration changes. Azure Monitor collects metrics and logs from multiple resources, such as HDInsight clusters, into an Azure Monitor Log Analytics workspace. An Azure Monitor Log Analytics workspace presents your metrics and logs as structured, queryable tables that can be used to configure custom alerts. Azure Monitor logs provide an excellent overall experience for monitoring workloads and interacting with logs, especially if you have multiple clusters.

Azure Monitor logs

Azure Monitor logs enable data generated by multiple resources such as HDInsight clusters to be collected and aggregated in one place to achieve a unified monitoring experience. As a prerequisite, you will need a Log Analytics workspace to store the collected data. If you have not already created one, you can follow these instructions for creating an Azure Monitor Log Analytics workspace. You can then easily configure an HDInsight cluster to send a host of logs and metrics to Azure Monitor Log Analytics.

HDInsight monitoring solutions

Azure HDInsight offers pre-made monitoring dashboards in the form of solutions that can be used to monitor the workloads running on your clusters. There are solutions for Apache Spark, Hadoop, Apache Kafka, Live Long and Process (LLAP), Apache HBase, and Apache Storm available in the Azure Marketplace. Please see our documentation to learn how to install a monitoring solution. These solutions are workload-specific, allowing you to monitor metrics like central processing unit (CPU) time, available YARN memory, and logical disk writes across multiple clusters of a given type. Selecting a graph takes you to the query used to generate it, shown in the logs view.

The HDInsight Spark monitoring solutions provide a simple pre-made dashboard where you can monitor workload-specific metrics for multiple clusters on a single pane of glass.

The HDInsight Kafka monitoring solution enables you to monitor all of your Kafka clusters on a single pane of glass.

Query using the logs blade

You can also use the logs view in your Log Analytics workspace to query the metrics and tables directly.

HDInsight clusters emit several workload-specific tables of logs, such as log_resourcemanager_CL, log_spark_CL, log_kafkaserver_CL, log_jupyter_CL, log_regionserver_CL, and log_hmaster_CL.

On the metrics side, clusters emit several metrics tables, including metrics_sparkapps_CL, metrics_resourcemanager_queue_root_CL, metrics_kafka_CL, and metrics_hmaster_CL. For more information, please see our documentation, Query Azure Monitor logs to monitor HDInsight clusters.

The Logs blade in a Log Analytics workspace lets you query collected metrics and logs across many clusters.
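As a rough illustration, the tables listed above can be queried with the Kusto Query Language (KQL) in the Logs blade. The Python sketch below only assembles a query string. The helper name build_recent_rows_query is hypothetical, and the only column it references is TimeGenerated, the standard timestamp column in Log Analytics tables. Table-specific columns vary by workload and are not assumed here.

```python
def build_recent_rows_query(table: str, hours: int = 1, limit: int = 50) -> str:
    """Return a KQL query for the newest rows of one HDInsight table.

    Hypothetical helper for illustration; paste the result into the Logs blade.
    """
    if hours <= 0 or limit <= 0:
        raise ValueError("hours and limit must be positive")
    return (
        f"{table}\n"
        f"| where TimeGenerated > ago({hours}h)\n"
        f"| sort by TimeGenerated desc\n"
        f"| take {limit}"
    )


if __name__ == "__main__":
    # Table name taken from the list of log tables above.
    print(build_recent_rows_query("log_spark_CL", hours=6, limit=20))
```

Swapping in another table name, such as metrics_kafka_CL, gives the equivalent starting query for that workload.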

Azure Monitor alerts

You can also set up Azure Monitor alerts that trigger when the value of a metric or the results of a query meet certain conditions. You can condition on a query returning a record with a value that is greater than or less than a certain threshold, or on the number of results returned by a query. For example, you could create an alert that sends an email if a Spark job fails or if disk usage on a Kafka cluster exceeds 90 percent.
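The two kinds of conditions described above can be sketched in plain Python. This is not the Azure Monitor API. It only mimics the decision logic, and the names value_breaches and count_breaches, as well as the sample numbers, are made up for illustration.

```python
import operator
from typing import Callable, Dict, Iterable

# Comparison operators an alert rule might use (illustrative subset).
_OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "GreaterThan": operator.gt,
    "LessThan": operator.lt,
}


def value_breaches(values: Iterable[float], op: str, threshold: float) -> bool:
    """True if any returned record has a value beyond the threshold."""
    compare = _OPERATORS[op]
    return any(compare(v, threshold) for v in values)


def count_breaches(row_count: int, op: str, threshold: int) -> bool:
    """True if the number of rows returned by a query crosses the threshold."""
    return _OPERATORS[op](row_count, threshold)


if __name__ == "__main__":
    # Hypothetical disk-usage percentages returned by a metrics query.
    disk_usage = [71.5, 93.2, 64.0]
    print(value_breaches(disk_usage, "GreaterThan", 90))

    # Hypothetical count of failed Spark jobs returned by a log query.
    print(count_breaches(0, "GreaterThan", 0))
```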

There are several types of actions you can trigger when your alert fires, such as an email, SMS, push notification, voice call, Azure Function, Azure Logic App, webhook, IT service management (ITSM) ticket, or automation runbook. You can set multiple actions for a single alert. For more information about these action types, see our documentation, Create and manage action groups in the Azure portal.

Finally, you can specify a severity for the alert in addition to the name. The ability to specify severity is a powerful tool that can be used when creating multiple alerts. For example, you could create an alert to raise a Sev 1 warning alert if a single head node becomes unavailable and another alert that raises a Sev 0 critical alert in the unlikely event that both head nodes go down. Alerts can be grouped by severity when viewed later.
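The head-node example can be written down as a small rule. The sketch below is plain Python, not an Azure API. The function name head_node_alert_severity is hypothetical, and it assumes a cluster with two head nodes, as in the example above.

```python
from typing import Optional


def head_node_alert_severity(unavailable_head_nodes: int) -> Optional[int]:
    """Map the number of unavailable head nodes to an alert severity.

    Returns 0 (critical) when both head nodes are down, 1 when exactly one
    is down, and None when no alert should fire.
    """
    if not 0 <= unavailable_head_nodes <= 2:
        raise ValueError("expected a count between 0 and 2")
    if unavailable_head_nodes == 2:
        return 0
    if unavailable_head_nodes == 1:
        return 1
    return None
```

In practice each severity would be its own alert rule with its own action group; the function only makes the intended escalation explicit.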

Apache Ambari

The Apache Ambari dashboard provides links to several different views for monitoring workloads on your cluster.

ResourceManager user interface

The ResourceManager user interface provides several views to monitor jobs on a YARN-based cluster. Here, you can see multiple views, including an overview of finished or running apps and their resource usage, a view of scheduled jobs by queue, and a list of job execution history and the status of each. You can click on an individual application ID to view more details about that job.

The Applications tab in YARN UI, which shows a list of application execution history for a cluster.
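The same application list is also exposed by the YARN ResourceManager REST API (the Cluster Applications endpoint, /ws/v1/cluster/apps), which returns JSON containing an apps.app array. The sketch below summarizes such a payload by application state. The sample payload is made up, count_apps_by_state is a hypothetical helper, and how you reach the endpoint on your cluster (host, gateway path, credentials) is deliberately left out.

```python
import json
from collections import Counter
from typing import Dict

# Made-up sample shaped like a ResourceManager "apps" response.
SAMPLE_RESPONSE = """
{"apps": {"app": [
  {"id": "application_1568000000000_0001", "name": "etl-job",
   "state": "FINISHED", "finalStatus": "SUCCEEDED"},
  {"id": "application_1568000000000_0002", "name": "stream-job",
   "state": "RUNNING", "finalStatus": "UNDEFINED"},
  {"id": "application_1568000000000_0003", "name": "report-job",
   "state": "FINISHED", "finalStatus": "FAILED"}
]}}
"""


def count_apps_by_state(payload: str) -> Dict[str, int]:
    """Count applications per YARN state in a Cluster Applications response."""
    # "apps" is null when the cluster has no applications to report.
    apps = (json.loads(payload).get("apps") or {}).get("app", [])
    return dict(Counter(app["state"] for app in apps))


if __name__ == "__main__":
    print(count_apps_by_state(SAMPLE_RESPONSE))
```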

Spark History Server

The Apache Spark History Server shows detailed information for completed Spark jobs, allowing for easy monitoring and debugging. In addition to the traditional tabs across the top (jobs, stages, executors, and so on), you will find additional data, graph, and diagnostic tabs to help with further debugging.

The pre-made dashboard for Spark we offer as part of HDInsight for monitoring Spark clusters.

Cluster logs

YARN log files are available on HDInsight clusters and can be accessed through the ResourceManager logs link in Apache Ambari. For more information about cluster logs, please see our documentation, Manage logs for an HDInsight cluster.

Next steps

If you haven’t read the other blogs in this series, you can check them out below:

  • Monitoring on Azure HDInsight Part 1: An Overview
  • Monitoring on Azure HDInsight Part 2: Cluster health and availability
  • Monitoring on Azure HDInsight Part 3: Performance and resource utilization

About Azure HDInsight

Azure HDInsight is an easy, cost-effective, enterprise-grade service for open source analytics that enables customers to easily run popular open source frameworks including Apache Hadoop, Spark, Kafka, and others. The service is available in 36 regions and Azure Government and national clouds. Azure HDInsight powers mission-critical applications in a wide variety of sectors and enables a wide range of use cases including extract, transform, and load (ETL), streaming, and interactive querying.
