Always-on, real-time threat protection with Azure Cosmos DB – part one

July 23, 2019 by cbn

This two-part blog post is part of a series about how organizations are using Azure Cosmos DB to meet real-world needs, and the difference it’s making to them. In part one, we explore the challenges that led the Microsoft Azure Advanced Threat Protection team to adopt Azure Cosmos DB and how they’re using it. In part two, we’ll examine the outcomes resulting from the team’s efforts.

Transformation of a real-time security solution to cloud scale

Microsoft Azure Advanced Threat Protection is a cloud-based security service that uses customers’ on-premises Active Directory signals to identify, detect, and investigate advanced threats, compromised identities, and malicious insider actions. Launched in 2018, it represents the evolution of Microsoft Advanced Threat Analytics, an on-premises solution, into Azure. Both offerings are composed of two main components:

  1. An agent, or sensor, which is installed on each of an organization’s domain controllers. The sensor inspects traffic sent from users to the domain controller along with Event Tracing for Windows (ETW) events generated by the domain controller, sending that information to a centralized back-end.
  2. A centralized back-end, or center, which aggregates the information from all the sensors, learns the behavior of the organization’s users and computers, and looks for anomalies that may indicate malicious activity.

Advanced Threat Analytics’ center used an on-premises instance of MongoDB as its main database—and still does today for on-premises installations. However, in developing the Azure Advanced Threat Protection center, a managed service in the cloud, Microsoft needed something more performant and scalable. “The back-end of Azure Advanced Threat Protection needs to massively scale, be upgraded on a weekly basis, and run continuously-evolving, advanced detection algorithms—essentially taking full advantage of all the power and intelligence that Azure offers,” explains Yaron Hagai, Principal Group Engineering Manager for Advanced Threat Analytics at Microsoft.

In searching for the best database for Azure Advanced Threat Protection to store its entities and profiles—the data learned in real time from all the sensors about each organization’s users and computers—Hagai’s team mapped out the following key requirements:

  • Elastic, per-customer scalability: Each organization that adopts Azure Advanced Threat Protection can install hundreds of sensors, generating potentially tens of thousands of events per second. To learn each organization’s baseline and apply its anomaly detection algorithms in real-time, Azure Advanced Threat Protection needed a database that could efficiently and cost-effectively scale.
  • Ease of migration: The Azure Advanced Threat Protection data model is constantly evolving to support changes in detection logic. Hagai’s team didn’t want to worry about constantly maintaining backwards compatibility between the service’s code and its ever-changing data model, which meant they needed a database that could support quick and easy data migration with almost every new update to Azure Advanced Threat Protection they deployed.
  • Geo-replication: Like all Azure services, Advanced Threat Protection must support customers’ critical disaster recovery and business continuity needs, including in the highly unlikely event of a datacenter failure. Through the use of geo-replication, customers’ data can be replicated from a primary datacenter to a backup datacenter, and the Azure Advanced Threat Protection workload can be switched to the backup datacenter in the event of a primary datacenter failure.

A managed, scalable, schema-less database in the cloud

The team chose Azure Cosmos DB as the back-end database for Azure Advanced Threat Protection. “As the only managed, scalable, schema-less database in Azure, Azure Cosmos DB was the obvious choice,” says Hagai. “It offered the scalability needed to support our growing customer base and the load that growth would put on our back-end service. It also provided the flexibility needed in terms of the data we store on each organization and its computers and users. And it offered the flexibility needed to continually add new detections and modify existing ones, which in turn requires the ability to constantly change the data stored in our Azure Cosmos DB collections.”

Azure Advanced Threat Protection diagram

Collections and partitioning

Of the many APIs that Azure Cosmos DB supports, the development team considered both the SQL API and the Azure Cosmos DB API for MongoDB for Azure Advanced Threat Protection. Eventually, they chose the SQL API because it gave them access to a rich, Microsoft-authored client SDK with support for multi-homing across global regions, and direct connectivity mode for low latency.

Developers chose to allocate one Azure Cosmos DB database per tenant, or customer. Each database has five collections, each of which starts with a single partition. “This allows us to easily delete the data for a customer if they stop using Azure Advanced Threat Protection,” explains Hagai. “More importantly, however, it lets us scale each customer’s collections independently based on the throughput generated by their on-premises sensors.”

Of the set of collections per customer, two usually grow to more than one partition:

  • UniqueEntity, which contains all the metadata about the computers and users in the organization, as synchronized from Active Directory.
  • UniqueEntityProfile, which contains the behavioral baseline for each entity in the UniqueEntity collection and is used by detection logic to identify behavioral anomalies that imply a compromised user or computer, or a malicious insider.

“Both collections have very high read/write throughput with large Request Units per second (RU/s) consumption,” explains Hagai. “Azure Cosmos DB seamlessly scales out storage of collections as they grow, and some of our large customers have scaled up to terabytes in size per collection, which would not have been possible with MongoDB on VMs.”

The other three collections for each customer typically contain fewer than 1,000 documents and do not grow past a single partition. They include:

  • SystemProfile, which contains data learned for the tenant and applied to behavior-based detections.
  • SystemEntity, which contains configuration information and data about tenants.
  • Alert, which contains alerts that are generated and updated by Azure Advanced Threat Protection.
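
The per-tenant layout described above can be modeled in a few lines. The following is a minimal in-memory sketch, not the team's code and not the Azure Cosmos DB SDK: the five collection names come from the article, while the `Account`, `TenantDatabase`, and `Collection` classes, the tenant names, and the starting throughput of 400 RU/s are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Collection names are taken from the article; everything else is hypothetical.
COLLECTIONS = (
    "UniqueEntity",
    "UniqueEntityProfile",
    "SystemProfile",
    "SystemEntity",
    "Alert",
)


@dataclass
class Collection:
    name: str
    partitions: int = 1         # each collection starts with a single partition
    provisioned_rus: int = 400  # assumed starting throughput, not from the article


@dataclass
class TenantDatabase:
    tenant_id: str
    collections: dict = field(default_factory=dict)

    @classmethod
    def create(cls, tenant_id):
        """Build one database holding the five collections for a tenant."""
        db = cls(tenant_id)
        for name in COLLECTIONS:
            db.collections[name] = Collection(name)
        return db


class Account:
    """Toy stand-in for the database account: one database per tenant."""

    def __init__(self):
        self.databases = {}

    def onboard(self, tenant_id):
        self.databases[tenant_id] = TenantDatabase.create(tenant_id)
        return self.databases[tenant_id]

    def offboard(self, tenant_id):
        # Dropping the tenant's database removes all of that tenant's data at once.
        self.databases.pop(tenant_id, None)

    def scale(self, tenant_id, collection, rus):
        # Throughput is adjusted per tenant and per collection, independently.
        self.databases[tenant_id].collections[collection].provisioned_rus = rus


account = Account()
account.onboard("contoso")
account.onboard("fabrikam")
account.scale("contoso", "UniqueEntityProfile", 10_000)
```

In this model, scaling one tenant's UniqueEntityProfile collection leaves every other tenant untouched, and offboarding a tenant is a single delete, which mirrors the two benefits Hagai describes.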

Migration

As the Azure Advanced Threat Protection detection logic constantly evolves and improves, so does the behavioral data stored in each customer’s UniqueEntityProfile collection. To avoid the need for backwards compatibility with outdated schemas, Azure Advanced Threat Protection maintains two migration mechanisms, which run with each upgrade to the service that includes changes to its data models:

  • On-the-fly: As Azure Advanced Threat Protection reads documents from Azure Cosmos DB, it checks their version field. If the version is outdated, Azure Advanced Threat Protection migrates the document to the current version using explicit transformation logic written by Hagai’s team of developers.
  • Batch: After a successful upgrade, Azure Advanced Threat Protection spins up a scheduled task to migrate all documents for all customers to the newest version, excluding those that have already been migrated by the on-the-fly mechanism.

Together, these two migration mechanisms ensure that after the service is upgraded and the data access layer code is changed, no errors occur due to parsing outdated documents. No backwards compatibility code is needed besides the explicit migration code, which is always removed in the subsequent version.
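
The two mechanisms can be sketched as follows. This is an illustrative in-memory model under stated assumptions, not the team's implementation: the `ProfileStore` class, the `logons` to `logonCount` field rename, and the version numbers are hypothetical, and writing the migrated document back during a read is an assumption the article does not confirm.

```python
CURRENT_VERSION = 2


def _v1_to_v2(doc):
    """Hypothetical transformation: rename 'logons' to 'logonCount'."""
    doc = dict(doc)  # copy so the caller's document is not mutated
    doc["logonCount"] = doc.pop("logons", 0)
    doc["version"] = 2
    return doc


# Explicit migration code, keyed by the version it upgrades from.
# Per the article, entries like this are removed in the following release.
MIGRATIONS = {1: _v1_to_v2}


def migrate(doc):
    """Apply explicit transformations until the document is current."""
    while doc["version"] < CURRENT_VERSION:
        doc = MIGRATIONS[doc["version"]](doc)
    return doc


class ProfileStore:
    """Toy stand-in for a collection of versioned documents."""

    def __init__(self, docs):
        self._docs = {d["id"]: d for d in docs}

    def read(self, doc_id):
        # On-the-fly: check the version field on every read.
        doc = self._docs[doc_id]
        if doc["version"] < CURRENT_VERSION:
            doc = migrate(doc)
            self._docs[doc_id] = doc  # assumed write-back of the migrated form
        return doc

    def batch_migrate(self):
        # Batch: migrate everything that on-the-fly reads have not touched yet.
        migrated = 0
        for doc_id, doc in list(self._docs.items()):
            if doc["version"] >= CURRENT_VERSION:
                continue  # already current, skip
            self._docs[doc_id] = migrate(doc)
            migrated += 1
        return migrated


store = ProfileStore([
    {"id": "u1", "version": 1, "logons": 7},
    {"id": "u2", "version": 1, "logons": 2},
    {"id": "u3", "version": 2, "logonCount": 5},
])
store.read("u1")        # migrated on the fly
store.batch_migrate()   # migrates only u2; u1 and u3 are already current
```

Because the batch pass brings every remaining document to the current version, the next release only ever has to handle a single previous version, which is why the migration code can be deleted one release later.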

Automatic scaling and backups

Collections with very high read/write throughput are often rate-limited as they reach their provisioned RU/s limits. When one of the service’s nodes (each node is a virtual machine) tries to perform an operation against a collection and gets a “429 Too Many Requests” rate-limiting exception, it uses Azure Service Fabric remoting to send a request for increased throughput to a centralized auto-scale service. The centralized service aggregates such requests from multiple nodes to avoid increasing throughput more than once within a short window of time, because several requests may stem from a single burst of throughput that affects multiple nodes. To minimize overall RU/s costs, a similar, periodic scale-down process reduces provisioned throughput when appropriate, such as during each customer’s non-working hours.
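
The aggregation logic can be illustrated with a small sketch. It is a simplified model under stated assumptions rather than the actual service: the `AutoScaleService` class, the `RateLimited` exception standing in for a 429 response, the fixed 1,000 RU/s step, the 60-second window, and the throughput ceiling are all hypothetical, and a direct method call takes the place of Service Fabric remoting.

```python
import time


class RateLimited(Exception):
    """Stands in for a "429 Too Many Requests" response."""


class AutoScaleService:
    """Hypothetical centralized service that coalesces scale-up requests."""

    def __init__(self, step_rus=1000, max_rus=50_000, window_seconds=60.0,
                 clock=time.monotonic):
        self._step = step_rus
        self._max = max_rus
        self._window = window_seconds
        self._clock = clock
        self._provisioned = {}    # (tenant, collection) -> RU/s
        self._last_scale_up = {}  # (tenant, collection) -> time of last increase

    def set_throughput(self, key, rus):
        self._provisioned[key] = rus

    def throughput(self, key):
        return self._provisioned[key]

    def request_scale_up(self, key):
        """Return True if throughput was raised, False if the request was absorbed."""
        now = self._clock()
        last = self._last_scale_up.get(key)
        if last is not None and now - last < self._window:
            return False  # same burst already handled for another node
        current = self._provisioned[key]
        if current >= self._max:
            return False  # already at the assumed ceiling
        self._provisioned[key] = min(current + self._step, self._max)
        self._last_scale_up[key] = now
        return True

    def scale_down(self, key, floor_rus):
        """Periodic step down, for example during a customer's non-working hours."""
        self._provisioned[key] = max(floor_rus, self._provisioned[key] - self._step)


def run_with_autoscale(operation, key, service):
    """Run a database operation on a node; on a 429, ask for more throughput."""
    try:
        return operation()
    except RateLimited:
        service.request_scale_up(key)
        raise  # retry policy is left to the caller in this sketch
```

If three nodes hit the limit during the same burst, only the first call to `request_scale_up` raises throughput; the other two return `False` until the window has elapsed.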

Azure Advanced Threat Protection takes advantage of the auto-backup feature of Azure Cosmos DB to help protect each of the collections. The backups reside in Azure Blob storage and are replicated to another region through the use of geo-redundant storage (GRS). Azure Advanced Threat Protection also replicates customer configuration data to another region, which allows for quick recovery in the case of a disaster. “We do this primarily to safeguard the sensor configuration data—preventing the need for an IT admin to reconfigure hundreds of sensors if the original database is lost,” explains Hagai.

Azure Advanced Threat Protection recently began onboarding full geo-replication. “We’ve started to enable geo-replication and multi-region writes for seamless and effortless replication of our production data to another region,” says Hagai. “This will allow us to further improve and guarantee service availability and will simplify service delivery versus having to maintain our own high-availability mechanisms.”

Continue on to part two, which covers the outcomes resulting from the Azure Advanced Threat Protection team’s implementation of Azure Cosmos DB.
