
Top Trends in Data Lakes

August 24, 2020 by cbn

Does it seem too early for data lakes to have trends? The reality is data lakes are on the very edge of business transformation efforts and dramatic change.

Data lake platforms load, store, and analyze large volumes of data at scale, providing timely insights into the business. Data-driven organizations leverage this data in many ways: advanced analysis to market new promotions, operational analytics to drive efficiency, predictive analytics to evaluate credit risk and detect fraud, and many other uses.

Image: Stuart Miles - stock.adobe.com


While it may seem like early days for the data lake idea to have trends, the reality is that data lakes are on the very edge of business transformation efforts, and so they are undergoing some dramatic changes right now. Some lakes have even failed, but most of the organizations behind them have retrenched and are coming back for the data lake's value proposition.

These trends will be tied not only to the data lake but also to data maturity and company maturity.

The rise of the lakehouse

The most glaring trend is the merger of the data lake and the data warehouse. An effective “lakehouse” combines a cloud-storage-based data lake with a data warehouse running on an analytic database that meets enterprise SLAs for performance at scale. What joins the two is primarily the ability of the data warehouse to reach into the cloud storage as necessary. These structures also live on a pipeline, with the cloud storage serving both as staging for the data warehouse, which will contain a subset of the data (though as much as is needed for high-fidelity analysis), and as the data lake, which data scientists will primarily use.

Explosion in sensor-based time-series data and edge AI

Data volumes are expanding for many organizations as they begin to leverage 5G and IoT data. The number of sensor-driven sources has grown tremendously, and the data they generate is largely time-series data: a value is recorded at each small interval of time, and together the values show how a system, process, or behavior changes over time.

Embedded databases are built into software, transparent to the application’s end user and require little or no ongoing maintenance. Embedded databases are growing in ubiquity with the rise of mobile applications and internet of things (IoT), giving innumerable devices robust capabilities via their own local database management system (DBMS). Developers can create sophisticated applications right on the remote device. Today, to fully harness data to gain a competitive advantage, embedded databases and the corresponding data lake intake need a high level of performance to provide real-time processing at scale.

Organizations using IoT can run embedded databases at the edge to process data immediately, even with artificial intelligence, and then copy the aggregated sensor data to a data lake, where data from all the IoT devices is combined to develop analytics.
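As a rough illustration of aggregating at the edge before sending data to the lake, the sketch below uses SQLite (an embedded database that ships with Python) to roll raw sensor readings up into per-minute summaries. The table layout, the `aggregate_at_edge` name, and the tuple shape of the readings are all assumptions made for this example; the upload to the data lake is left out.

```python
import sqlite3

def aggregate_at_edge(readings):
    """Roll raw readings up into per-minute summaries on the device.

    readings: iterable of (sensor_id, epoch_seconds, value) tuples.
    This shape is hypothetical, chosen only for the sketch.
    """
    # An in-memory SQLite database stands in for the device's embedded DBMS.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor_id TEXT, ts INTEGER, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", readings)
    # Integer division buckets each timestamp into the minute it falls in.
    rows = conn.execute(
        """
        SELECT sensor_id,
               (ts / 60) * 60 AS minute_start,
               COUNT(*), AVG(value), MIN(value), MAX(value)
        FROM readings
        GROUP BY sensor_id, minute_start
        ORDER BY sensor_id, minute_start
        """
    ).fetchall()
    conn.close()
    return rows
```

Only the summary rows would then need to be copied to the data lake, which is far less data than the raw readings.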

All these web, mobile, and IoT applications have generated a new set of technology requirements. Embedded database architecture needs to be far more agile than ever before, and requires an approach to real-time data management that can accommodate unprecedented levels of scale, speed, and data flexibility. 

Leveraging cloud storage for data lakes

Data lakes have almost become synonymous with cloud storage in the industry vernacular. Early data lakes used Hadoop (HDFS storage), but many organizations made the jump when cloud storage presented a better option. Cloud storage makes it more achievable to separate compute from storage, so that compute resources (MapReduce, Hive, Spark, etc.) can be taken down, scaled up or out, or interchanged without data movement. Storage can be centralized while compute is distributed.

Some cloud storage offerings even have mechanisms to ensure consistency, achieving ACID-like compliance for remote data changes, along with remote data replication to ensure redundancy and recovery.

Data integration automation

This trend is more general than data lakes. Most enterprise data integration does not target the data lake today, but much of it will.

Data integration constitutes upwards of 75% of the work effort in any data lake initiative. However, the absolute time spent will go down as AI anticipates what is needed once the source and target are identified: “common” data integration rules will be suggested or automatically applied. As enterprises grow more comfortable with the automated process, the automation of data integration will grow and efforts around the data lake will shift to management and access.
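To make the idea of suggested integration rules concrete, here is a deliberately simple sketch that proposes source-to-target column mappings by name similarity. It uses plain string matching from Python's standard library rather than AI, and the function names, the normalization rule, and the 0.6 cutoff are assumptions for illustration only.

```python
from difflib import get_close_matches

def normalize(name):
    """Lower-case a column name and drop underscores and spaces."""
    return name.lower().replace("_", "").replace(" ", "")

def suggest_mappings(source_cols, target_cols, cutoff=0.6):
    """Suggest a target column for each source column, or None if no
    target name is similar enough. A human would review the result."""
    normalized_targets = {normalize(t): t for t in target_cols}
    suggestions = {}
    for col in source_cols:
        match = get_close_matches(
            normalize(col), list(normalized_targets), n=1, cutoff=cutoff
        )
        suggestions[col] = normalized_targets[match[0]] if match else None
    return suggestions
```

A real tool would also look at data types and sample values, but even this level of suggestion shows how the manual mapping effort could shrink once source and target are known.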

Retaining structure in structured data

Though you can do schema-less data loading in a data lake, it is important to know when and when not to build a schema for data. As a general rule of thumb, retain structure for already structured data and take the time to build a schema for data that has high business or analytic value or is often queried by users. For less important or less-accessed data, or where a schema adds little value, create the schema on an ad-hoc or as-needed basis. You can also add data to the lake first and create the schema when the data needs to be used.
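The last option, loading data as-is and creating the schema only when the data is used, can be sketched as follows. The raw records are assumed to be JSON lines, and `ORDER_SCHEMA` and `read_with_schema` are hypothetical names invented for this example.

```python
import json

# Hypothetical schema, defined only when the data is first needed:
# field name -> type used to cast the raw value on read.
ORDER_SCHEMA = {"order_id": int, "amount": float, "region": str}

def read_with_schema(raw_lines, schema):
    """Apply a schema at read time to records that were loaded untouched."""
    records = []
    for line in raw_lines:
        raw = json.loads(line)
        record = {}
        for field, cast in schema.items():
            value = raw.get(field)
            # Missing fields become None; fields outside the schema are ignored.
            record[field] = cast(value) if value is not None else None
        records.append(record)
    return records
```

The raw lines stay in the lake unchanged, so a different schema can be applied later if the analytic need changes.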

Data quality additions

Another trend in managing a data lake is to build it so that it can handle data quality issues, such as de-duplication. This requires additional planning so that the information in the data lake stays up to organizational standards for accuracy, consistency, and completeness. Data lakes will be brought into your data management and governance processes, just like any other information asset. That governance needs to be light and agile, not heavy-handed and dictatorial. Taking the time to ensure that data quality improvements propagate throughout the lake will keep it delivering consistent value and make it a trusted resource for your data consumers.
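De-duplication, the data quality issue named above, might look like the following minimal sketch. It assumes each record is a dictionary with one or more key fields and a version field (such as an update timestamp), and it keeps the newest record per key; the function and parameter names are hypothetical.

```python
def deduplicate(records, key_fields, version_field):
    """Keep only the record with the highest version_field for each key."""
    latest = {}
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        # Replace the stored record only when this one is strictly newer.
        if key not in latest or rec[version_field] > latest[key][version_field]:
            latest[key] = rec
    return list(latest.values())
```

Choosing which record wins (newest, most complete, or from the most trusted source) is a governance decision, so the rule used here is only one possibility.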

Building a data lake is certainly the right response to alleviate the exponentially growing data needs of the modern enterprise. However, getting value out of a data lake over the long haul requires good information management discipline and tools and the uptake of trends like these that save time and money and add value.

William McKnight is the President of McKnight Consulting Group and has advised many of the world’s best-known organizations. His strategies form the information management plan for leading companies in various industries. He is a prolific author and a popular keynote speaker and trainer. He has performed dozens of benchmarks on leading database, data lake, streaming, and data integration products. William is a global influencer in data warehousing and master data management, and he leads McKnight Consulting Group, which placed on the Inc. 5000 list in 2017 and 2018.

The InformationWeek community brings together IT practitioners and industry experts with IT advice, education, and opinions. We strive to highlight technology executives and subject matter experts and use their knowledge and experiences to help our audience of IT …

