Data Science, Machine Learning, and AI
Contact
Content Hub
Blog Post

How to Scan Your Code and Dependencies in Python

  • Expert Thomas Alcock
  • Date 21. July 2022
  • Topic Data EngineeringData SciencePython
  • Format Blog
  • Category Technology
How to Scan Your Code and Dependencies in Python

Be Safe!

In the age of open-source software projects, attacks on vulnerable software are ever present. Python is the most popular language for Data Science and Engineering and is thus increasingly becoming a target for attacks through malicious libraries. Additionally, public facing applications can be exploited by attacking vulnerabilities in the source code.

For this reason it’s crucial that your code does not contain any CVEs (common vulnerabilities and exposures) or uses other libraries that might be malicious. This is especially true if it’s public facing software, e.g. a web application. At statworx we look for ways to increase the quality of our code by using automated scanning tools. Hence, we’ll discuss the value of two code and package scanners for Python.

Automatic screening

There are numerous tools for scanning code and its dependencies, here I will provide an overview of the most popular tools designed with Python in mind. Such tools fall into one of two categories:

  • Static Application Security Testing (SAST): look for weaknesses in code and vulnerable packages
  • Dynamic Application Security Testing (DAST): look for vulnerabilities that occur at runtime

In what follows I will compare bandit and safety using a small streamlit application I’ve developed. Both tools fall into the category of SAST, since they don’t need the application to run in order to perform their checks. Dynamic application testing is more involved and may be the subject of a future post.

The application

For the sake of context, here’s a brief description of the application: it was designed to visualize the convergence (or lack thereof) in the sampling distributions of random variables drawn from different theoretical probability distributions. Users can choose the distribution (e.g. Log-Normal), set the maximum number of samples and pick different sampling statistics (e.g. mean, standard deviation, etc.).

Bandit

Bandit is an open-source python code scanner that checks for vulnerabilities in code and only in your code. It decomposes the code into its abstract syntax tree and runs plugins against it to check for known weaknesses. Among other tests it performs checks on plain SQL code which could provide an opening for SQL injections, passwords stored in code and hints about common openings for attacks such as use of the pickle library. Bandit is designed for use with CI/CD and throws an exit status of 1 whenever it encounters any issues, thus terminating the pipeline. A report is generated, which includes information about the number of issues separated by confidence and severity according to three levels: low, medium, and high. In this case, bandit finds no obvious security flaws in our code.

Run started:2022-06-10 07:07:25.344619

Test results:
        No issues identified.

Code scanned:
        Total lines of code: 0
        Total lines skipped (#nosec): 0

Run metrics:
        Total issues (by severity):
                Undefined: 0
                Low: 0
                Medium: 0
                High: 0
        Total issues (by confidence):
                Undefined: 0
                Low: 0
                Medium: 0
                High: 0
Files skipped (0):

All the more reason to carefully configure Bandit to use in your project. Sometimes it may raise a flag even though you already know that this would not be a problem at runtime. If, for example, you have a series of unit tests that use pytest and run as part of your CI/CD pipeline Bandit will normally throw an error, since this code uses the assert statement, which is not recommended for code that does not run without the -O flag.

To avoid this behaviour you could:

  1. run scans against all files but exclude the test using the command line interface
  2. create a yaml configuration file to exclude the test

Here’s an example:

# bandit_cfg.yml
skips: ["B101"] # skips the assert check

Then we can run bandit as follows: bandit -c bandit_yml.cfg /path/to/python/files and the unnecessary warnings will not crop up.

Safety

Developed by the team at pyup.io, this package scanner runs against a curated database which consists of manually reviewed records based on publicly available CVEs and changelogs. The package is available for Python >= 3.5 and can be installed for free. By default it uses Safety DB which is freely accessible. Pyup.io also offers paid access to a more frequently updated database.

Running safety check --full-report -r requirements.txt on the package root directory gives us the following output (truncated the sake of readability):

+==============================================================================+
|                                                                              |
|                               /$$$$$$            /$$                         |
|                              /$$__  $$          | $$                         |
|           /$$$$$$$  /$$$$$$ | $$  \__//$$$$$$  /$$$$$$   /$$   /$$           |
|          /$$_____/ |____  $$| $$$$   /$$__  $$|_  $$_/  | $$  | $$           |
|         |  $$$$$$   /$$$$$$$| $$_/  | $$$$$$$$  | $$    | $$  | $$           |
|          \____  $$ /$$__  $$| $$    | $$_____/  | $$ /$$| $$  | $$           |
|          /$$$$$$$/|  $$$$$$$| $$    |  $$$$$$$  |  $$$$/|  $$$$$$$           |
|         |_______/  \_______/|__/     \_______/   \___/   \____  $$           |
|                                                          /$$  | $$           |
|                                                         |  $$$$$$/           |
|  by pyup.io                                              \______/            |
|                                                                              |
+==============================================================================+
| REPORT                                                                       |
| checked 110 packages, using free DB (updated once a month)                   |
+============================+===========+==========================+==========+
| package                    | installed | affected                 | ID       |
+============================+===========+==========================+==========+
| urllib3                    | 1.26.4    | <1.26.5                  | 43975    |
+==============================================================================+
| Urllib3 1.26.5 includes a fix for CVE-2021-33503: An issue was discovered in |
| urllib3 before 1.26.5. When provided with a URL containing many @ characters |
| in the authority component, the authority regular expression exhibits        |
| catastrophic backtracking, causing a denial of service if a URL were passed  |
| as a parameter or redirected to via an HTTP redirect.                        |
| https://github.com/advisories/GHSA-q2q7-5pp4-w6pg                            |
+==============================================================================+

The report includes the number of packages that were checked, the type of database used for reference and information on each vulnerability that was found. In this example an older version of the package urllib3 is affected by a vulnerability which technically could be used by an to perform a denial-of-service attack.

Integration into your workflow

Both bandit and safety are available as GitHub Actions. The stable release of safety also provides integrations for TravisCI and GitLab CI/CD.

Of course, you can always manually install both packages from PyPI on your runner if no ready-made integration like a GitHub action is available. Since both programs can be used from the command line, you could also integrate them into a pre-commit hook locally if using them on your CI/CD platform is not an option.

The CI/CD pipeline for the application above was built with GitHub Actions. After installing the application’s required packages, it runs bandit first and then safety to scan all packages. With all the packages updated, the vulnerability scans pass and the docker image is built.

Package check Code Check

Conclusion

I would strongly recommend using both bandit and safety in your CI/CD pipeline, as they provide security checks for your code and your dependencies. For modern applications manually reviewing every single package your application depends on is simply not feasible, not to mention all of the dependencies these packages have! Thus, automated scanning is inevitable if you want to have some level of awareness about how unsafe your code is.

While bandit scans your code for known exploits, it does not check any of the libraries used in your project. For this, you need safety, as it informs you about known security flaws in the libraries your application depends on. While neither frameworks are completely foolproof, it’s still better to be notified about some CVEs than none at all. This way, you’ll be able to either fix your vulnerable code or upgrade a vulnerable package dependency to a more secure version.

Keeping your code safe and your dependencies trustworthy can ward off potentially devastating attacks on your application. Thomas Alcock Thomas Alcock Thomas Alcock Thomas Alcock Thomas Alcock Thomas Alcock Thomas Alcock Thomas Alcock Thomas Alcock Thomas Alcock

Learn more!

As one of the leading companies in the field of data science, machine learning, and AI, we guide you towards a data-driven future. Learn more about statworx and our motivation.
About us