Bookkeeping Service Providers

  • Accounting
  • Bookkeeping
  • US Taxation
  • Financial Planning
  • Accounting Software
  • Small Business Finance
You are here: Home / CLOUD / Feathr: LinkedIn’s feature store is now available on Azure

Feathr: LinkedIn’s feature store is now available on Azure

April 12, 2022 by cbn Leave a Comment

This blog post is co-authored by David Stein, Senior Staff Software Engineer, Jinghui Mo, Staff Software Engineer, and Hangfei Lin, Staff Software Engineer, all from Feathr team.

Feature store motivation

With the advance of AI and machine learning, companies start to use complex machine learning pipelines in various applications, such as recommendation systems, fraud detection, and more. These complex systems usually require hundreds to thousands of features to support time-sensitive business applications, and the feature pipelines are maintained by different team members across various business groups.

In these machine learning systems, we see many problems that consume lots of energy of machine learning engineers and data scientists, in particular duplicated feature engineering, online-offline skew, and feature serving with low latency.

Figure 1: Illustration on problems that feature store solves

Figure 1: Illustration on problems that feature store solves.

Duplicated feature engineering

  • In an organization, thousands of features are buried in different scripts and in different formats; they are not captured, organized, or preserved, and thus cannot be reused and leveraged by teams other than those who generated them.
  • Because feature engineering is so important for machine learning models and features cannot be shared, data scientists must duplicate their feature engineering efforts across teams.

Online-offline skew

  • For features, offline training and online inference usually require different data serving pipelines—ensuring consistent features across different environments is expensive.
  • Teams are deterred from using real-time data for inference due to the difficulty of serving the right data.
  • Providing a convenient way to ensure data point-in-time correctness is key to avoid label leakage.

Serving features with low latency

  • For real-time applications, getting feature lookups from database for real-time inference without compromising response latency and with high throughput can be challenging.
  • Easily accessing features with very low latency is key in many machine learning scenarios, and optimizations needs to be done to combine different REST API calls to features.

To solve those problems, a concept called feature store was developed, so that:

  • Features are centralized in an organization and can be reused
  • Features can be served in a synchronous way between offline and online environment
  • Features can be served in real-time with low latency

Introducing Feathr, a battle-tested feature store

Developing a feature store from scratch takes time, and it takes much more time to make it stable, scalable, and user-friendly. Feathr is the feature store that has been used in production and battle-tested in LinkedIn for over 6 years, serving all the LinkedIn machine learning feature platform with thousands of features in production.

At Microsoft, the LinkedIn team and the Azure team have worked very closely to open source Feathr, make it extensible, and build native integration with Azure. It’s available in this GitHub repository and you can read more about Feathr on the LinkedIn Engineering Blog.

Some of the highlights for Feathr include:

  • Scalable with built-in optimizations. For example, based on some internal use case, Feathr can process billions of rows and PB scale data with built-in optimizations such as bloom filters and salted joins.
  • Rich support for point-in-time joins and aggregations: Feathr has high performant built-in operators designed for Feature Store, including time-based aggregation, sliding window joins, look-up features, all with point-in-time correctness.
  • Highly customizable user-defined functions (UDFs) with native PySpark and Spark SQL support to lower the learning curve for data scientists.
  • Pythonic APIs to access everything with low learning curve; Integrated with model building so data scientists can be productive from day one.
  • Rich type system including support for embeddings for advanced machine learning/deep learning scenarios. One of the common use cases is to build embeddings for customer profiles, and those embeddings can be reused across an organization in all the machine learning applications.
  • Native cloud integration with simplified and scalable architecture, which is illustrated in the next section.
  • Feature sharing and reuse made easy: Feathr has built-in feature registry so that features can be easily shared across different teams and boost team productivity.

Feathr on Azure architecture

The high-level architecture diagram below articulates how would a user interacts with Feathr on Azure:

Feathr on Azure architecture.

Figure 2: Feathr on Azure architecture.

  1. A data or machine learning engineer creates features using their preferred tools (like pandas, Azure Machine Learning, Azure Databricks, and more). These features are ingested into offline stores, which can be either:
    • Azure SQL Database (including serverless), Azure Synapse Dedicated SQL Pool (formerly SQL DW).
    • Object storage, such as Azure BLOB storage, Azure Data Lake Store, and more. The format can be Parquet, Avro, or Delta Lake.
  2. The data or machine learning engineer can persist the feature definitions into a central registry, which is built with Azure Purview.
  3. The data or machine learning engineer can join on all the feature dataset in a point-in-time correct way, with Feathr Python SDK and with Spark engines such as Azure Synapse or Databricks.
  4. The data or machine learning engineer can materialize features into an online store such as Azure Cache for Redis with Active-Active, enabling multi-primary, multi-write architecture that ensures eventual consistency between clusters.
  5. Data scientists or machine learning engineers consume offline features with their favorite machine learning libraries, for example scikit-learn, PyTorch, or TensorFlow to train a model in their favorite machine learning platform such as Azure Machine Learning, then deploy the models in their favorite environment with services such as Azure Machine Learning endpoint.
  6. The backend system makes a request to the deployed model, which makes a request to the Azure Cache for Redis to get the online features with Feathr Python SDK.

A sample notebook containing all the above flow is located in the Feathr repository for more reference.

Feathr has native integration with Azure and other cloud services. The table below shows these integrations:

Feathr component

Cloud Integrations

Offline store – Object Store

Azure Blob Storage
Azure ADLS Gen2
AWS S3

 

Offline store – SQL

Azure SQL DB
Azure Synapse Dedicated SQL Pools (formerly SQL DW)
Azure SQL in VM
Snowflake

Online store

Azure Cache for Redis

Feature Registry

Azure Purview

Compute Engine

Azure Synapse Spark Pools
Databricks

Machine Learning Platform

Azure Machine Learning
Jupyter Notebook

File Format

Parquet
ORC
Avro
Delta Lake

Table 1: Feathr on Azure Integration with Azure Services.

Installation and getting started

Feathr has a pythonic interface to access all Feathr components, including feature definition and cloud interactions, and is open sourced here. The Feathr python client can be easily installed with pip:

pip install -U feathr

For more details on getting started, please refer to the Feathr Quickstart Guide. The Feathr team can also be reached in the Feathr community.

Going forward

In this blog, we’ve introduced a battle-tested feature store, called Feathr, which is scalable and enterprise ready, with native Azure integrations. We are dedicated to bringing more functionalities into Feathr and Feathr on Azure integrations, and feel free to give any feedback by raising issues in Feathr GitHub repository.

  • Checkout the GitHub repository for Feathr.
  • Reach out to Feathr community.
  • Read Feathr open-source blog post from our LinkedIn colleagues.
Share on FacebookShare on TwitterShare on Google+Share on LinkedinShare on Pinterest

Filed Under: CLOUD

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Archives

  • September 2025
  • August 2025
  • July 2025
  • June 2025
  • May 2025
  • April 2025
  • March 2025
  • February 2025
  • January 2025
  • December 2024
  • November 2024
  • October 2024
  • July 2024
  • June 2024
  • May 2024
  • April 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • October 2023
  • September 2023
  • August 2023
  • July 2023
  • June 2023
  • May 2023
  • April 2023
  • March 2023
  • February 2023
  • January 2023
  • December 2022
  • November 2022
  • October 2022
  • September 2022
  • August 2022
  • July 2022
  • June 2022
  • May 2022
  • April 2022
  • March 2022
  • February 2022
  • January 2022
  • December 2021
  • November 2021
  • October 2021
  • September 2021
  • August 2021
  • May 2021
  • April 2021
  • September 2020
  • August 2020
  • July 2020
  • June 2020
  • May 2020
  • April 2020
  • March 2020
  • February 2020
  • January 2020
  • December 2019
  • November 2019
  • October 2019
  • September 2019
  • August 2019
  • July 2019
  • June 2019
  • May 2019
  • April 2019
  • March 2019
  • February 2019
  • January 2019
  • December 2018
  • November 2018
  • October 2018
  • September 2018
  • August 2018
  • July 2018
  • June 2018
  • May 2018
  • April 2018
  • March 2018
  • February 2018
  • January 2018
  • December 2017
  • November 2017
  • October 2017
  • September 2017
  • August 2017
  • July 2017
  • May 2017
  • April 2017
  • March 2017
  • February 2017
  • January 2017
  • March 2016

Recent Posts

  • How Azure Cobalt 100 VMs are powering real-world solutions, delivering performance and efficiency results
  • FabCon Vienna: Build data-rich agents on an enterprise-ready foundation
  • Agent Factory: Connecting agents, apps, and data with new open standards like MCP and A2A
  • Azure mandatory multifactor authentication: Phase 2 starting in October 2025
  • Microsoft Cost Management updates—July & August 2025

Recent Comments

    Categories

    • Accounting
    • Accounting Software
    • BlockChain
    • Bookkeeping
    • CLOUD
    • Data Center
    • Financial Planning
    • IOT
    • Machine Learning & AI
    • SECURITY
    • Uncategorized
    • US Taxation

    Categories

    • Accounting (145)
    • Accounting Software (27)
    • BlockChain (18)
    • Bookkeeping (205)
    • CLOUD (1,322)
    • Data Center (214)
    • Financial Planning (345)
    • IOT (260)
    • Machine Learning & AI (41)
    • SECURITY (620)
    • Uncategorized (1,284)
    • US Taxation (17)

    Subscribe Our Newsletter

     Subscribing I accept the privacy rules of this site

    Copyright © 2025 · News Pro Theme on Genesis Framework · WordPress · Log in