When working on batch ingestion use cases on a data lake, we mainly deal with file-based updates. When designing cloud-native data platforms and leveraging available frameworks, there is always a desire to reduce the laborious effort of implementing file-handling frameworks and notification mechanisms. Implementing your own custom framework for the file life cycle would be 'reinventing the wheel', with a high risk of failure, when native integration already exists between cloud-native storage services – S3, ADLS – and PaaS data services such as Databricks to provide a complete solution.
I went through a similar decision-making process and found that file-based ingestion was streamlined: the analytics Delta Lake layer was kept hot with incoming data updates using Databricks Autoloader. Autoloader integrates seamlessly with Azure ADLS and AWS S3, with Spark and the Delta Lake format, and with the native cloud file-notification mechanisms on both the Azure and AWS platforms.
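To make this concrete, a minimal Autoloader pipeline reads newly arriving files from cloud storage as a stream and writes them into a Delta table. The sketch below uses the documented `cloudFiles` source; the paths, checkpoint location, and table name are hypothetical placeholders, and the code assumes it runs inside a Databricks environment where `spark` is already available.

```python
# Minimal Autoloader sketch (assumes a Databricks runtime with `spark` in scope).
# Paths and table names below are illustrative placeholders.
source_path = "abfss://landing@mystorageaccount.dfs.core.windows.net/orders/"
checkpoint_path = "abfss://landing@mystorageaccount.dfs.core.windows.net/_checkpoints/orders/"

df = (
    spark.readStream
    .format("cloudFiles")                       # Autoloader source
    .option("cloudFiles.format", "json")        # format of the incoming files
    .option("cloudFiles.useNotifications", "true")  # file-notification mode instead of directory listing
    .option("cloudFiles.schemaLocation", checkpoint_path)  # where inferred schema is tracked
    .load(source_path)
)

(
    df.writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)  # exactly-once progress tracking
    .trigger(availableNow=True)                 # process pending files, then stop (batch-style)
    .toTable("analytics.orders_bronze")
)
```

With `cloudFiles.useNotifications` enabled, Autoloader provisions the cloud-native notification services (Event Grid queues on Azure, SNS/SQS on AWS) so new files are discovered without repeatedly listing the directory, which is what removes the need for a hand-built notification framework.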