Built-in Pipelines

Pipelines are Python scripts that perform code analysis by executing a sequence of steps. ScanCode.io offers the following built-in pipelines:

Pipeline Base Class

class scanpipe.pipelines.Pipeline

Base class for all pipelines.

__init__(run)

Load the Run and Project instances.

classmethod get_steps()

Raises a deprecation warning when the steps are defined as a tuple instead of a classmethod.
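The tuple-versus-classmethod distinction can be illustrated with a minimal sketch. SketchPipeline, LegacyPipeline, and ModernPipeline are hypothetical stand-ins written for this example; they are not the scanpipe classes, and the warning text is an assumption.

```python
import warnings


class SketchPipeline:
    """Hypothetical stand-in for a pipeline base class (not scanpipe's)."""

    @classmethod
    def get_steps(cls):
        steps = cls.steps
        if isinstance(steps, tuple):
            # Legacy style: ``steps`` is a plain tuple class attribute.
            warnings.warn(
                "Defining steps as a tuple is deprecated; use a classmethod.",
                DeprecationWarning,
                stacklevel=2,
            )
            return steps
        # Current style: ``steps`` is a classmethod returning the steps.
        return tuple(steps())


class LegacyPipeline(SketchPipeline):
    def extract(self):
        return "extract"

    def scan(self):
        return "scan"

    steps = (extract, scan)


class ModernPipeline(SketchPipeline):
    def extract(self):
        return "extract"

    def scan(self):
        return "scan"

    @classmethod
    def steps(cls):
        yield cls.extract
        yield cls.scan
```

Under these assumptions, both classes expose the same ordered steps through get_steps(), but only LegacyPipeline triggers a DeprecationWarning.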

classmethod get_doc()

Returns a docstring.

classmethod get_graph()

Returns a graph of steps.

classmethod get_info()

Returns a dictionary of combined data about the current pipeline.
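As a rough illustration of how get_doc(), get_graph(), and get_info() could fit together, the sketch below builds the combined data from docstrings. DemoPipeline is a hypothetical class, and the dictionary keys ("description", "steps", "name", "doc") are assumptions made for this example rather than a documented format.

```python
from inspect import getdoc


class DemoPipeline:
    """A demo pipeline with two steps."""

    def extract(self):
        """Extract the input archives."""

    def scan(self):
        """Scan the extracted files."""

    @classmethod
    def get_steps(cls):
        return (cls.extract, cls.scan)

    @classmethod
    def get_doc(cls):
        # The pipeline docstring, cleaned up by inspect.getdoc.
        return getdoc(cls)

    @classmethod
    def get_graph(cls):
        # One entry per step, in execution order.
        return [
            {"name": step.__name__, "doc": getdoc(step)}
            for step in cls.get_steps()
        ]

    @classmethod
    def get_info(cls):
        # Combine the pipeline description with its graph of steps.
        return {"description": cls.get_doc(), "steps": cls.get_graph()}


info = DemoPipeline.get_info()
```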

log(message)

Logs the given message to the current module logger and Run instance.
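A minimal sketch of this dual logging, assuming a hypothetical SketchRun object that keeps its messages in a list (the real Run instance stores its log differently):

```python
import logging

logger = logging.getLogger(__name__)


class SketchRun:
    """Hypothetical stand-in for a Run instance."""

    def __init__(self):
        self.messages = []

    def append_to_log(self, message):
        self.messages.append(message)


class SketchPipeline:
    """Hypothetical stand-in for a pipeline holding a run."""

    def __init__(self, run):
        self.run = run

    def log(self, message):
        # Send the message to the module logger and keep a copy on the run.
        logger.info(message)
        self.run.append_to_log(message)


run = SketchRun()
SketchPipeline(run).log("Step [extract] starting")
```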

save_errors(*exceptions)

Context manager to save specified exceptions as ProjectError in the database.

Example in a Pipeline step:

    with self.save_errors(rootfs.DistroNotFound):
        rootfs.scan_rootfs_for_system_packages(self.project, rfs)
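The general technique behind such a context manager can be sketched with contextlib. This is an illustration under stated assumptions, not the scanpipe implementation: SketchPipeline and its recorded_errors list are hypothetical, the list stands in for ProjectError records in the database, and DistroNotFound is redefined locally so the example is self-contained.

```python
from contextlib import contextmanager


class DistroNotFound(Exception):
    """Local stand-in for rootfs.DistroNotFound."""


class SketchPipeline:
    def __init__(self):
        # Stand-in for ProjectError records saved in the database.
        self.recorded_errors = []

    @contextmanager
    def save_errors(self, *exceptions):
        try:
            yield
        except exceptions as error:
            # Only the listed exception types are caught and recorded;
            # any other exception propagates to the caller.
            self.recorded_errors.append((type(error).__name__, str(error)))


pipeline = SketchPipeline()
with pipeline.save_errors(DistroNotFound):
    raise DistroNotFound("no os-release file found")
```

After the with block, execution continues normally and the error is available in pipeline.recorded_errors.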

Docker Image Analysis

class scanpipe.pipelines.docker.Docker

A pipeline to analyze Docker images.

extract_images()

Extracts images from input tarballs.

extract_layers()

Extracts layers from input images.

find_images_os_and_distro()

Finds the operating system and distro of input images.

collect_images_information()

Collects and stores image information in a project.

collect_and_create_codebase_resources()

Collects and labels all image files as CodebaseResources.

collect_and_create_system_packages()

Collects installed system packages for each layer based on the distro.

tag_uninteresting_codebase_resources()

Flags files that don’t belong to any system package.

Docker Windows Image Analysis

class scanpipe.pipelines.docker_windows.DockerWindows

A pipeline to analyze Windows Docker images.

tag_known_software_packages()

Flags files from known software packages by checking common install paths.

tag_uninteresting_codebase_resources()

Flags files that are known/labelled as uninteresting.

tag_program_files_dirs_as_packages()

Reports the immediate subdirectories of Program Files and Program Files (x86) as packages.
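The idea of reporting immediate subdirectories can be sketched as follows. The function name, the relative POSIX-style paths, and the "Files/" prefix in the sample data are assumptions made for this example; the actual step works on the project's codebase resources.

```python
from pathlib import PurePosixPath

PROGRAM_FILES_DIRS = ("Program Files", "Program Files (x86)")


def program_files_packages(paths):
    """Return the immediate subdirectories of the Program Files directories."""
    packages = set()
    for path in paths:
        parts = PurePosixPath(path).parts
        # Skip the last two parts so that a match always has a subdirectory
        # and at least one file below it.
        for index, part in enumerate(parts[:-2]):
            if part in PROGRAM_FILES_DIRS:
                packages.add("/".join(parts[: index + 2]))
                break
    return sorted(packages)


packages = program_files_packages(
    [
        "Files/Program Files/Git/bin/git.exe",
        "Files/Program Files/Git/LICENSE.txt",
        "Files/Program Files (x86)/Steam/steam.exe",
        "Files/Program Files/readme.txt",
        "Files/Windows/System32/cmd.exe",
    ]
)
```

In this sketch, a file sitting directly under Program Files (such as readme.txt) is not reported, because only subdirectories count as candidate packages.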

tag_data_files_with_no_clues()

Flags data files that have no clues on their origin as uninteresting.

Load Inventory From Scan

class scanpipe.pipelines.load_inventory.LoadInventory

A pipeline to load one or more inventories of files and packages from ScanCode JSON scan results. These results presumably contain resource information and package scan data.

get_scan_json_inputs()

Locates all the ScanCode JSON scan results from the project’s input/ directory. This includes all files with a .json extension.
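Selecting inputs by extension can be sketched with pathlib. The function name is hypothetical, and looking only at the top level of the directory (no recursion) is an assumption of this example.

```python
from pathlib import Path


def find_scan_json_inputs(input_dir):
    """Return the files in ``input_dir`` that have a .json extension."""
    return sorted(
        path
        for path in Path(input_dir).iterdir()
        if path.is_file() and path.suffix == ".json"
    )
```

Note that a check on the extension alone does not confirm that a file is a ScanCode scan result; any validation of the content would have to happen when the file is processed.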

build_inventory_from_scans()

Processes JSON scan results files to populate codebase resources and packages.

Root Filesystem Analysis

class scanpipe.pipelines.root_filesystems.RootFS

A pipeline to analyze a Linux root filesystem, aka rootfs.

extract_input_files_to_codebase_directory()

Extracts root filesystem input archives with extractcode.

find_root_filesystems()

Finds root filesystems in the project’s codebase/ directory.

collect_rootfs_information()

Collects and stores rootfs information in the project.

collect_and_create_codebase_resources()

Collects and labels all image files as CodebaseResources.

collect_and_create_system_packages()

Collects installed system packages for each rootfs based on the distro. The collection of system packages is only available for known distros.

tag_uninteresting_codebase_resources()

Flags files that don’t belong to any system package as not worth tracking.

tag_empty_files()

Flags empty files.

scan_for_application_packages()

Scans unknown resources for package information.

match_not_analyzed_to_system_packages()

Matches “not-yet-analyzed” files to files that already belong to system packages.

match_not_analyzed_to_application_packages()

Matches “not-yet-analyzed” files to files that already belong to application packages.

scan_for_files()

Scans unknown resources for copyrights, licenses, emails, and URLs.

analyze_scanned_files()

Analyzes single file scan results for completeness.

tag_not_analyzed_codebase_resources()

Checks for any leftover files for sanity; there should be none.

Scan Codebase

class scanpipe.pipelines.scan_codebase.ScanCodebase

A pipeline to scan a codebase resource with ScanCode-toolkit.

Input files are copied to the project’s codebase/ directory and are extracted in place before running the scan. Alternatively, the code can be manually copied to the project codebase/ directory.

copy_inputs_to_codebase_directory()

Copies input files to the project’s codebase/ directory. The code can also be copied there prior to running the Pipeline.

extract_archives()

Extracts archives with extractcode.
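The copy-then-extract-in-place flow of these two steps can be sketched with the standard library. This sketch swaps in shutil.unpack_archive for extractcode, so it only handles the formats shutil knows about; the function names and the "-extract" directory suffix are assumptions made for this example.

```python
import shutil
from pathlib import Path


def copy_inputs(input_dir, codebase_dir):
    """Copy every regular file from ``input_dir`` into ``codebase_dir``."""
    codebase = Path(codebase_dir)
    codebase.mkdir(parents=True, exist_ok=True)
    return [
        Path(shutil.copy(source, codebase))
        for source in sorted(Path(input_dir).iterdir())
        if source.is_file()
    ]


def extract_in_place(codebase_dir):
    """Unpack each recognized archive into a sibling directory."""
    extracted = []
    for path in sorted(Path(codebase_dir).iterdir()):
        if not path.is_file():
            continue
        target = path.with_name(path.name + "-extract")
        try:
            shutil.unpack_archive(str(path), str(target))
        except shutil.ReadError:
            # Not an archive format that shutil recognizes; leave it as-is.
            continue
        extracted.append(target)
    return extracted
```

Keeping the extracted content next to the original archive means later steps can walk a single codebase/ tree and see both the archive and its content.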

collect_and_create_codebase_resources()

Collects and creates codebase resources.

tag_empty_files()

Flags empty files.

scan_for_application_packages()

Scans unknown resources for package information.

scan_for_files()

Scans unknown resources for copyrights, licenses, emails, and URLs.

Scan Package

class scanpipe.pipelines.scan_package.ScanPackage

A pipeline to scan a single package archive with ScanCode-toolkit. The output is a summary of the scan results in JSON format.

get_package_archive_input()

Locates the input package archive in the project’s input/ directory.

collect_archive_information()

Collects and stores information about the input archive in the project.

extract_archive_to_codebase_directory()

Extracts package archive with extractcode.

run_scancode()

Scans extracted codebase/ content.

build_inventory_from_scan()

Processes a JSON scan results file to populate codebase resources and packages.

make_summary_from_scan_results()

Builds a summary in JSON format from the generated scan results.