System security usually includes two core topics: authentication and authorization. One solves the problem of “Who is s/he?” and the other solves the problem of “Does s/he have permission to perform an operation?” In the big data area, Apache Ranger is one of the most popular choices for authorization, it supports all mainstream big data components, including HDFS, Hive, HBase, and so on. As Amazon EMR rolls out native ranger (plugins) features, users can manage the authorization of EMRFS(S3), Spark, Hive, and Trino all together. For authentication, an organization usually has its own centralized authentication infrastructure, i.e., Windows AD or OpenLDAP; however, for most big data components, Kerberos is only supported authentication mechanism, so users usually need to integrate Windows AD/OpenLDAP and Kerberos together to unify authentication.
We will focus on how to implement automated installation and integration for Amazon EMR and Apache Ranger. This series is composed of four articles. Each article will introduce a completed solution against different technology stacks.