How to Import Pandas in AWS Lambda

AWS Lambda is one of the most popular serverless compute services and it has many benefits. However, one drawback is that the list of supported packages is limited: most third-party Python modules are not available by default. Some third-party packages are already embedded in the AWS Lambda Python runtimes, but other external dependencies can be difficult to install and configure for use in your Lambda functions.

In this blog post, we show you how to successfully import one of the most commonly used Python libraries for data analysis into AWS Lambda: ‘pandas’.

What is AWS Lambda?

 

AWS Lambda is a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers. This is sometimes referred to as function-as-a-service (FaaS).

AWS Documentation

The code which runs on AWS Lambda is uploaded as a “Lambda function” and it is stateless.

AWS Lambda natively supports different languages like Java, Node.js, C#, and many more. For this blog post, we are using Python as our runtime language.

Lambda functions can be invoked manually or by an event source trigger.
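
To give an idea of what such a function looks like in Python, here is a minimal sketch. It assumes the default names generated by the console (a file called lambda_function.py with a lambda_handler function); the 'name' key read from the event is made up for this example.

```python
import json


def lambda_handler(event, context):
    """Entry point that Lambda calls on every invocation."""
    # 'event' carries the payload of the trigger; 'name' is a hypothetical key.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Whether the function is invoked manually with a test event or by an event source, the same handler is called with the event payload as its first argument.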

What is a Lambda Layer?

A Lambda layer is an archive containing additional code, such as libraries, dependencies, or even custom runtimes.

AWS Lambda already includes pre-compiled layers that you can easily add and use within your Lambda function. Otherwise, the simplest way to install third-party packages is to create a new custom layer.

A Lambda layer is simply a .zip file that contains a collection of packages; it can be customized to include all packages you need and then can be easily uploaded and attached to one or more Lambda functions of interest:

Difference between using and not using a Lambda layer (from AWS Documentation)

When you invoke a Lambda, it loads the attached Layer together with the function, and the contents of the layer are extracted to the /opt directory in the execution environment.

Using layers helps reduce complexity and makes it easier to manage dependency versions across all of your functions.
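
If you want to check what a layer actually adds to the execution environment, a small helper like the one below can be called from inside a function. This is only a sketch: the helper name is ours, and it relies on the layer contents being extracted under /opt as described above. On Python runtimes we would expect the 'python' subfolder to show up on the import path.

```python
import os
import sys


def describe_layers(root="/opt"):
    """Return what is found under the layer root and whether its
    'python' subfolder is on the import path."""
    # Outside Lambda the root may not exist, so fall back to an empty list.
    entries = sorted(os.listdir(root)) if os.path.isdir(root) else []
    python_dir = os.path.join(root, "python")
    return {
        "entries": entries,
        "python_dir_on_sys_path": python_dir in sys.path,
    }
```

Returning this dictionary from a test invocation is a quick way to confirm that a layer has been attached and unpacked where you expect it.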

How to add external dependencies

Imagine you need to run a Python script in AWS Lambda where you need to import the ‘pandas’ library.

pandas is a fast, powerful, flexible open source library that provides easy-to-use data structures and manipulation and data analysis tools for the Python programming language.

When you run the code, unfortunately, you will get an error message that says:

Response
{
  "errorMessage": "Unable to import module 'lambda_function': No module named 'pandas'",
  "errorType": "Runtime.ImportModuleError",
  ...
}

Why does this error occur?

This is because the ‘pandas’ library is not available in AWS Lambda Python environments by default.

Depending on the runtime language selected, some packages may be pre-installed within your Lambda container, but if you need additional requirements you have to add them manually. For example, with the Python 3.9 runtime, the standard-library ‘json’ module is included but ‘pandas’ is not.
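
One way to find out which modules a runtime provides, without triggering the import error, is to ask the import system whether it can locate them. The sketch below uses the standard importlib.util.find_spec function; the helper name and the 'modules' event key are our own choices for illustration.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def lambda_handler(event, context):
    # The default list is illustrative; pass your own under the 'modules' key.
    modules = event.get("modules", ["json", "pandas"])
    return {name: is_importable(name) for name in modules}
```

Run in a function without any layer attached, we would expect this to report ‘json’ as available and ‘pandas’ as missing.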

Let’s see then how to import Pandas in AWS Lambda.

How to import Pandas in AWS Lambda

We will now cover how to successfully import the pandas package into an AWS Lambda function in a few steps.

Note: This example procedure shows how to install pandas and create a custom pandas layer through the AWS console, for a Lambda function that uses the Python 3.9 runtime and runs on the x86_64 architecture.

Create Layer Contents (locally)

Step 1: Download files from Python Package Index (PyPI)

The Python Package Index, abbreviated as PyPI, is the primary software repository for Python. We used it to download the Linux builds of the necessary dependencies.

Since we created the Lambda function using the Python 3.9 runtime and the default architecture, we need to make sure we download the wheel files with the right built distribution type; in particular, we need to look at:

  • Python version –> cp39
  • architecture –> x86_64

To work, ‘pandas’ needs two more required dependencies, ‘NumPy’ and ‘pytz’, so for our purpose we downloaded these 3 files from PyPI:

  1. pandas: pandas-1.4.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  2. NumPy: numpy-1.23.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  3. pytz: pytz-2022.2.1-py2.py3-none-any.whl
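
The Python version and architecture are encoded in the wheel filename itself, so the check can be automated. The following is a simplified sketch with helper names of our own; it only looks at the interpreter tag and the architecture suffix, and ignores other details of the manylinux tag (such as the minimum glibc version), so treat it as a first sanity check rather than a full compatibility test.

```python
def parse_wheel_tags(filename):
    """Split a wheel filename into its name, version, and tag fields."""
    stem = filename[: -len(".whl")]
    parts = stem.split("-")
    # The last three fields are the Python tag, ABI tag, and platform tag.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag.split("."),
        "abi": abi_tag.split("."),
        "platform": platform_tag.split("."),
    }


def fits_lambda(filename, python_tag="cp39", arch="x86_64"):
    """Check a wheel against the target interpreter and architecture."""
    tags = parse_wheel_tags(filename)
    # Pure-Python wheels ('py3' with platform 'any') are not tied to one build.
    python_ok = python_tag in tags["python"] or "py3" in tags["python"]
    platform_ok = "any" in tags["platform"] or any(
        p.startswith("manylinux") and p.endswith(arch) for p in tags["platform"]
    )
    return python_ok and platform_ok
```

With the defaults, the three files listed above pass the check, while a Windows build or a wheel for another Python version does not.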

Step 2: Unzip the wheel files

Once you have downloaded the .whl files to your computer, you need to unzip them.

A wheel file (.whl) is a ZIP-format archive that holds a package saved in the Wheel format, which is the default built-package format used for Python distributions.

You can use the wheel library to unzip the .whl files. Remember to run the commands from the folder where you downloaded the files, e.g. the ‘downloads’ folder:

cd downloads
pip install wheel
wheel unpack full-name-of-the-file-you-want-to-unpack.whl

In our case, as mentioned above, we downloaded and unpacked 3 wheel files.

Note: Since a wheel is a ZIP file, unzip works too (unzip full-name-of-the-file-you-want-to-unpack.whl)
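
Because a wheel is a ZIP archive, the same step can also be done from Python with the standard zipfile module. This is a minimal sketch; the function name and the destination folder are our own choices.

```python
import zipfile
from pathlib import Path


def unpack_wheel(wheel_path, dest_dir):
    """Extract a .whl archive into dest_dir and return its top-level names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(wheel_path) as archive:
        archive.extractall(dest)
        # Collect the first path component of every archive member.
        top_level = sorted({name.split("/")[0] for name in archive.namelist()})
    return top_level
```

The returned list shows the folders each wheel contributes, which is handy when you later copy them into the layer directory.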

Step 3: Create a new directory named 'python'

mkdir python

IMPORTANT: the folder name must be exactly ‘python’, because that is where the Lambda Python runtime looks for packages provided by a layer; otherwise the import fails!

Step 4: Copy the contents of the wheel files and paste them into the 'python' directory

The pandas and NumPy wheel files each contained 3 folders, while pytz contained just 2. We simply copied all of them into the ‘python’ folder.
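
If you prefer to script the copy instead of doing it by hand, something along these lines should work. It is a sketch that assumes each wheel was unpacked into its own folder; the function name is ours and the folder names in the usage note are only examples.

```python
import shutil
from pathlib import Path


def copy_into_layer(unpacked_dirs, layer_dir="python"):
    """Copy everything inside each unpacked wheel folder into layer_dir."""
    target = Path(layer_dir)
    target.mkdir(parents=True, exist_ok=True)
    for unpacked in unpacked_dirs:
        for item in Path(unpacked).iterdir():
            if item.is_dir():
                # dirs_exist_ok needs Python 3.8+ and merges into existing folders.
                shutil.copytree(item, target / item.name, dirs_exist_ok=True)
            else:
                shutil.copy2(item, target / item.name)
    # Return what the layer folder now contains, for a quick visual check.
    return sorted(p.name for p in target.iterdir())
```

For example, calling it with the list of folders produced in Step 2 (such as 'pandas-1.4.3', 'numpy-1.23.2' and 'pytz-2022.2.1', if that is how they are named on your machine) fills the ‘python’ folder in one go.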

Step 5: Zip 'python' directory

Finally, compress the ‘python’ folder so that you can upload it as a Lambda layer and use it in your Lambda function.
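
The archive has to contain the ‘python’ folder itself at its top level, not just the folder's contents. The sketch below builds the archive with the standard shutil.make_archive function and then verifies the layout; the function names and the 'pandas_layer' archive name are assumptions made for this example.

```python
import shutil
import zipfile
from pathlib import Path


def zip_layer(layer_dir="python", archive_stem="pandas_layer"):
    """Zip layer_dir so that the folder itself is the archive's top level."""
    layer_path = Path(layer_dir).resolve()
    return shutil.make_archive(
        archive_stem,  # output path without the .zip suffix
        "zip",
        root_dir=str(layer_path.parent),
        base_dir=layer_path.name,
    )


def has_python_root(zip_path):
    """Return True if every archive member sits under 'python/'."""
    with zipfile.ZipFile(zip_path) as archive:
        return all(name.startswith("python/") for name in archive.namelist())
```

Running has_python_root on the resulting file before uploading is a cheap way to catch a wrongly structured archive.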

Create a Custom Lambda Layer (Lambda console)

You are now ready to create your custom Lambda layer. Using the AWS console, you just need to:

  • enter the required and optional information (i.e. Name and Description)
  • upload the previously created ‘python’ .zip file from your computer
  • choose the compatible instruction set architecture and runtimes (i.e. the language versions the layer can be used with), matching those chosen when creating the Lambda function
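
We used the console, but the same three pieces of information can also be sent programmatically. As an alternative sketch, the function below calls publish_layer_version on a boto3 Lambda client. It assumes boto3 is installed and AWS credentials are configured; the function name, the layer name and the description are placeholders.

```python
def publish_layer(client, zip_path, layer_name="pandas-layer"):
    """Publish zip_path as a new layer version and return its ARN."""
    with open(zip_path, "rb") as handle:
        zip_bytes = handle.read()
    response = client.publish_layer_version(
        LayerName=layer_name,  # placeholder name
        Description="pandas, NumPy and pytz for Python 3.9",
        Content={"ZipFile": zip_bytes},
        # Match the runtime and architecture of the target function.
        CompatibleRuntimes=["python3.9"],
        CompatibleArchitectures=["x86_64"],
    )
    return response["LayerVersionArn"]
```

A typical call would pass boto3.client("lambda") as the client and the path of the .zip file created in Step 5.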

Add the Custom Layer to your Lambda function (Lambda console)

Select the Lambda function of interest and add the custom layer version you created earlier. You can do this by clicking on Layers in the ‘Function Overview’ pane or by scrolling down to the ‘Layers’ section.
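
For completeness, here is how the same step might look with boto3 instead of the console. This is a sketch under the same assumptions as before (boto3 installed, credentials configured); the function name is ours. Since updating the configuration replaces the whole list of layers, the sketch reads the current list first and appends to it.

```python
def attach_layer(client, function_name, layer_version_arn):
    """Add a layer version to a function, keeping layers already attached."""
    config = client.get_function_configuration(FunctionName=function_name)
    current = [layer["Arn"] for layer in config.get("Layers", [])]
    if layer_version_arn not in current:
        current.append(layer_version_arn)
    # The Layers parameter replaces the existing list, so send all ARNs.
    client.update_function_configuration(
        FunctionName=function_name,
        Layers=current,
    )
    return current
```

The ARN to pass is the layer version ARN shown in the console after the layer has been created.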

Deploy and Run your Python Code

Now that we have successfully uploaded our .zip file as a Lambda layer and linked this layer to our Lambda function, we can deploy our Python code and try to import pandas. We are finally able to run a test event without any “module not found” error.
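
As a simple smoke test, a handler like the following can be deployed and run with any test event. It is only an illustrative sketch: the small DataFrame is made up, and the import is placed inside the handler so that a missing layer produces a readable response instead of an import failure at load time.

```python
import json


def lambda_handler(event, context):
    try:
        import pandas as pd  # expected to come from the custom layer
    except ImportError as error:
        # Without the layer, report the problem instead of crashing.
        return {"statusCode": 500, "body": json.dumps({"error": str(error)})}
    frame = pd.DataFrame({"value": [1, 2, 3]})
    body = {
        "pandas_version": pd.__version__,
        "total": int(frame["value"].sum()),
    }
    return {"statusCode": 200, "body": json.dumps(body)}
```

A 200 response that includes the pandas version confirms that the layer is attached and the library can be imported.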

Hope you found this blog post interesting and useful.

Thanks for reading and see you soon for other tips and tricks!

Author

Ilaria Petreti

Ilaria is a Data Scientist passionate about the world of Artificial Intelligence. She loves applying Data Mining and Machine Learning techniques, strongly believing in the power of Big Data and Digital Transformation.
