If you have used AWS Glue lately, you have probably run into the complexity of setting up the infrastructure for building, testing, and running a Glue job through a Glue dev endpoint. Setting up a dev endpoint is no easy task, and much of that effort falls on your local machine. With interactive sessions, you can author a job faster and make the whole process much simpler.
Drawbacks of Using Glue Dev Endpoint
Cost: A dev endpoint can be a great help when you want to author a lot of jobs, but if you only need to build and run a few, it turns out to be a costly investment. Because a dev endpoint is an EC2 machine backed by the Glue libraries, cost becomes a major factor when you use it for just a handful of jobs. Moreover, the minimum billing duration for each provisioned dev endpoint is 10 minutes, which makes it a poor choice for running a single job that takes only 2-3 minutes to complete.
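To make the effect of the 10-minute minimum concrete, here is a minimal sketch of the billing arithmetic. The 10-minute minimum comes from the point above. The hourly rate and the number of DPUs are illustrative assumptions only, so check the AWS Glue pricing page for your region before relying on the figures.

```python
# Rough estimate of what a short job costs on a dev endpoint.
# ASSUMED_RATE_PER_DPU_HOUR and ASSUMED_DPUS are illustrative assumptions,
# not official figures; only the 10-minute minimum comes from the article.

MINIMUM_BILLED_SECONDS = 10 * 60  # 10-minute minimum billing duration


def billed_seconds(runtime_seconds, minimum_seconds=MINIMUM_BILLED_SECONDS):
    """Return the duration you are charged for, given a billing minimum."""
    return max(runtime_seconds, minimum_seconds)


def estimated_cost(runtime_seconds, dpus, rate_per_dpu_hour,
                   minimum_seconds=MINIMUM_BILLED_SECONDS):
    """Estimate the charge: DPUs x billed hours x hourly rate per DPU."""
    hours = billed_seconds(runtime_seconds, minimum_seconds) / 3600
    return dpus * hours * rate_per_dpu_hour


if __name__ == "__main__":
    ASSUMED_RATE_PER_DPU_HOUR = 0.44  # assumption: varies by region
    ASSUMED_DPUS = 5                  # assumption: hypothetical endpoint size

    runtime = 3 * 60  # a job that runs for 3 minutes
    charged = estimated_cost(runtime, ASSUMED_DPUS, ASSUMED_RATE_PER_DPU_HOUR)
    used = estimated_cost(runtime, ASSUMED_DPUS, ASSUMED_RATE_PER_DPU_HOUR,
                          minimum_seconds=0)
    print(f"Charged for {billed_seconds(runtime) / 60:.0f} minutes: ${charged:.4f}")
    print(f"Cost of the 3 minutes actually used: ${used:.4f}")
```

Whatever the real rate is, a 3-minute job is billed as if it ran for 10 minutes, so you pay for more than three times the compute you actually used.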
Complexity: Setting up a dev endpoint is a complex task. It requires software to be downloaded and installed on your local machine, which is difficult on systems protected by a firewall or on systems without admin rights.
Time: Time is another drawback of using a dev endpoint for a small number of jobs. Suppose you want to author 2 PySpark ETL jobs that take a minute each to run. Provisioning the dev endpoint, connecting to it, and transferring files to it will take far longer than running the jobs themselves.
Flexibility: Once a dev endpoint has been provisioned, billing continues until you manually delete it. Note that AWS keeps charging you for as long as the dev endpoint is in the READY state.
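Because a forgotten endpoint keeps billing, it can help to check for leftovers from time to time. The sketch below shows one way to do that with boto3's Glue client, using its `get_dev_endpoints` and `delete_dev_endpoint` calls. The helper names (`find_ready_dev_endpoints`, `delete_dev_endpoints`, `main`) are hypothetical, and the code assumes your AWS credentials and region are already configured.

```python
# Sketch: list dev endpoints that are still READY (and therefore billing),
# then delete the ones you no longer need. Helper names are hypothetical.


def find_ready_dev_endpoints(glue_client):
    """Return the names of all dev endpoints whose status is READY."""
    names = []
    kwargs = {}
    while True:
        page = glue_client.get_dev_endpoints(**kwargs)
        for endpoint in page.get("DevEndpoints", []):
            if endpoint.get("Status") == "READY":
                names.append(endpoint["EndpointName"])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token  # fetch the next page of results
    return names


def delete_dev_endpoints(glue_client, names):
    """Delete each named dev endpoint. This cannot be undone."""
    for name in names:
        glue_client.delete_dev_endpoint(EndpointName=name)
    return names


def main():
    import boto3  # imported here so the helpers above work without boto3

    glue = boto3.client("glue")
    ready = find_ready_dev_endpoints(glue)
    print("Dev endpoints still billing:", ready)
    # Uncomment only after reviewing the list above:
    # delete_dev_endpoints(glue, ready)
```

Calling `main()` only prints the endpoints that are still in the READY state. The delete call is left commented out on purpose, so nothing is removed until you have reviewed the list.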
Solution – Interactive Session
An interactive session lets you leverage the simplicity of Jupyter notebooks while authoring complex Glue jobs interactively. So, let us dive into setting up our own interactive session.