System security usually covers two core topics: authentication and authorization. Authentication answers the question “Who is this user?”; authorization answers the question “Does this user have permission to perform this operation?” In the big data space, Apache Ranger is the most popular choice for authorization; it supports all mainstream big data components, including HDFS, Hive, and HBase. Now that Amazon EMR has rolled out its native Ranger plugin feature, users can also manage authorization for EMRFS (S3) and Spark. For authentication, an organization usually has its own centralized infrastructure, such as Windows AD or OpenLDAP. However, for most big data components, Kerberos is the only supported authentication mechanism, so users usually need to integrate Windows AD or OpenLDAP with Kerberos to unify authentication.

This is a series of articles that focuses on how to automate the installation and integration of Amazon EMR and Apache Ranger. The series consists of four articles; each one introduces a complete solution for a different technology stack.
