Built-in Pipelines
As you may already know, pipelines are Python scripts that perform code analysis by executing a sequence of steps. ScanCode.io offers the following built-in pipelines:
Pipeline Base Class
- class scanpipe.pipelines.Pipeline
Base class for all pipelines.
- __init__(run)
Load the Run and Project instances.
- classmethod get_steps()
Raises a deprecation warning when the steps are defined as a tuple instead of a classmethod.
- classmethod get_doc()
Returns the pipeline’s docstring.
- classmethod get_graph()
Returns a graph of steps.
- classmethod get_info()
Returns a dictionary of combined data about the current pipeline.
- log(message)
Logs the given message to the current module logger and Run instance.
- save_errors(*exceptions)
Context manager to save specified exceptions as ProjectError in the database.
Example in a Pipeline step:
    with self.save_errors(rootfs.DistroNotFound):
        rootfs.scan_rootfs_for_system_packages(self.project, rfs)
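The step mechanism and the save_errors() behavior described above can be pictured with a self-contained sketch. Everything in it is hypothetical: `MiniPipeline`, `execute()`, and the in-memory `log_lines` and `errors` lists stand in for ScanCode.io's Pipeline, Run, and ProjectError. The only assumption taken from the text is that steps are declared through a `steps()` classmethod that returns methods in execution order.

```python
from contextlib import contextmanager


class MiniPipeline:
    """Hypothetical stand-in for a pipeline base class; not ScanCode.io's Pipeline."""

    def __init__(self):
        self.log_lines = []  # stand-in for the Run instance log
        self.errors = []  # stand-in for ProjectError records

    @classmethod
    def steps(cls):
        # Subclasses return their step methods in execution order.
        return ()

    def log(self, message):
        self.log_lines.append(message)

    @contextmanager
    def save_errors(self, *exceptions):
        # Record the listed exception types and let the step continue;
        # any other exception propagates and stops the run.
        try:
            yield
        except exceptions as error:
            self.errors.append(f"{type(error).__name__}: {error}")

    def execute(self):
        for step in self.steps():
            self.log(f"Step [{step.__name__}] starting")
            step(self)  # steps are functions looked up on the class, so pass self
        self.log("Pipeline completed")


class DistroNotFound(Exception):
    pass


class ScanTwoRoots(MiniPipeline):
    @classmethod
    def steps(cls):
        return (cls.find_roots, cls.collect_packages)

    def find_roots(self):
        self.roots = ["rootfs-a", "rootfs-b"]

    def collect_packages(self):
        self.scanned = []
        for root in self.roots:
            with self.save_errors(DistroNotFound):
                if root == "rootfs-b":
                    raise DistroNotFound(f"no distro detected in {root}")
                self.scanned.append(root)


pipeline = ScanTwoRoots()
pipeline.execute()
print(pipeline.scanned)  # ['rootfs-a']
print(pipeline.errors)  # ['DistroNotFound: no distro detected in rootfs-b']
```

Catching only the listed exception types keeps one bad rootfs from aborting the whole run, while unexpected errors still surface.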
Docker Image Analysis
- class scanpipe.pipelines.docker.Docker
A pipeline to analyze Docker images.
- extract_images()
Extracts images from input tarballs.
- extract_layers()
Extracts layers from input images.
- find_images_os_and_distro()
Finds the operating system and distro of input images.
- collect_images_information()
Collects and stores image information in the project.
- collect_and_create_codebase_resources()
Collects and labels all image files as CodebaseResources.
- collect_and_create_system_packages()
Collects installed system packages for each layer based on the distro.
- tag_uninteresting_codebase_resources()
Flags files that don’t belong to any system package.
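An image tarball produced by `docker save` contains a top-level manifest.json listing, for each image, its RepoTags and the paths of its layer tarballs. The sketch below reads that manifest with the standard library as a rough picture of what precedes layer extraction. It is not ScanCode.io's implementation, and the `<untagged>` placeholder and the function name are hypothetical.

```python
import io
import json
import tarfile


def list_image_layers(image_tarball):
    """Map each image's first repo tag to its layer paths, read from manifest.json."""
    with tarfile.open(fileobj=io.BytesIO(image_tarball)) as tar:
        manifest = json.loads(tar.extractfile("manifest.json").read())
    layers_by_tag = {}
    for entry in manifest:
        # RepoTags can be missing or null for untagged images.
        tags = entry.get("RepoTags") or ["<untagged>"]
        # Layers are kept in the order the manifest lists them.
        layers_by_tag[tags[0]] = list(entry["Layers"])
    return layers_by_tag
```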
Docker Windows Image Analysis
- class scanpipe.pipelines.docker_windows.DockerWindows
A pipeline to analyze Windows Docker images.
- tag_known_software_packages()
Flags files from known software packages by checking common install paths.
- tag_uninteresting_codebase_resources()
Flags files that are known/labelled as uninteresting.
- tag_program_files_dirs_as_packages()
Reports the immediate subdirectories of Program Files and Program Files (x86) as packages.
- tag_data_files_with_no_clues()
Flags data files that have no clues on their origin as uninteresting.
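The tag_program_files_dirs_as_packages() rule can be sketched as a pure path computation: for every file located under Program Files or Program Files (x86), keep the first directory below it. This is a hypothetical helper working on plain path strings. It matches those directory names at any depth so that a layer-specific path prefix does not matter, and it ignores files sitting directly in Program Files because they have no package subdirectory.

```python
from pathlib import PurePosixPath

PROGRAM_FILES_DIRS = {"Program Files", "Program Files (x86)"}


def program_files_packages(paths):
    """Return sorted (program_files_dir, subdirectory) pairs found in file paths."""
    found = set()
    for path in paths:
        parts = PurePosixPath(path).parts
        # Stop two parts early: the subdirectory must itself contain something.
        for index, part in enumerate(parts[:-2]):
            if part in PROGRAM_FILES_DIRS:
                found.add((part, parts[index + 1]))
                break
    return sorted(found)


paths = [
    "Program Files/Git/bin/git.exe",
    "Program Files/Git/etc/gitconfig",
    "Program Files/readme.txt",
    "Program Files (x86)/Foo/foo.dll",
]
print(program_files_packages(paths))
```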
Find Vulnerabilities
- class scanpipe.pipelines.find_vulnerabilities.FindVulnerabilities
A pipeline to find vulnerabilities for discovered packages in the VulnerableCode database.
Vulnerability data is stored in the extra_data field of each package.
- check_vulnerablecode_service_availability()
Checks whether the VulnerableCode service is configured and available.
- lookup_vulnerabilities()
Checks for vulnerabilities in each of the project’s discovered packages.
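The lookup step can be pictured as a join between the project's packages and a vulnerability source keyed by Package URL, with matches stored in each package's extra_data. A minimal sketch, assuming a plain dict stands in for the VulnerableCode service and packages are dicts with `purl` and `extra_data` keys. The `discovered_vulnerabilities` key, the function name, and the sample data are hypothetical, and the availability check is not modeled.

```python
def lookup_vulnerabilities(packages, vulnerability_index):
    """Store matching vulnerability IDs in each package's extra_data.

    `vulnerability_index` is a hypothetical stand-in for the VulnerableCode
    service: a dict mapping a Package URL to a list of vulnerability IDs.
    Returns the number of packages with at least one match.
    """
    matched = 0
    for package in packages:
        vulnerabilities = vulnerability_index.get(package["purl"], [])
        if not vulnerabilities:
            continue
        extra_data = package.setdefault("extra_data", {})
        extra_data["discovered_vulnerabilities"] = list(vulnerabilities)  # hypothetical key
        matched += 1
    return matched


packages = [
    {"purl": "pkg:pypi/example-lib@1.0.0", "extra_data": {}},
    {"purl": "pkg:pypi/other-lib@2.0.0", "extra_data": {}},
]
index = {"pkg:pypi/example-lib@1.0.0": ["VULN-0001"]}  # made-up sample data
print(lookup_vulnerabilities(packages, index))  # 1
```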
Inspect Manifest
- class scanpipe.pipelines.inspect_manifest.InspectManifest
A pipeline to inspect one or more manifest files and resolve their packages.
Supports:
  - PyPI “requirements.txt” files
  - SPDX documents as JSON “.spdx.json” files
  - AboutCode “.ABOUT” files
- get_manifest_inputs()
Locates all the manifest files from the project’s input/ directory.
- create_packages_from_manifest()
Resolves manifest files into packages.
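For the requirements.txt case, resolving a manifest into packages boils down to turning pinned requirement lines into Package URLs. The sketch below is deliberately simplified and hypothetical: it handles only `name==version` pins, drops comments, extras, and environment markers, and skips pip options and unpinned lines. Real requirement parsing (pip's file format and PEP 508) covers much more, and the SPDX and .ABOUT inputs are not modeled. The name normalization follows the Package URL rules for the pypi type (lowercase, "_" replaced by "-").

```python
def parse_pinned_requirements(text):
    """Return Package URLs for pinned `name==version` lines in requirements.txt text."""
    purls = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # blank lines and pip options such as -r or --index-url
        name, separator, version = line.partition("==")
        if not separator:
            continue  # unpinned requirements are skipped in this sketch
        version = version.split(";", 1)[0].strip()  # drop environment markers
        name = name.split("[", 1)[0].strip()  # drop extras such as [socks]
        # Package URL rules for the pypi type: lowercase name, "_" replaced by "-"
        purls.append(f"pkg:pypi/{name.lower().replace('_', '-')}@{version}")
    return purls


requirements = """\
# deps
Django==4.2.1
requests[socks] == 2.31.0 ; python_version >= '3.8'
flask>=2.0
-r other.txt

zope_interface==6.0  # pinned
"""
print(parse_pinned_requirements(requirements))
```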
Load Inventory From Scan
- class scanpipe.pipelines.load_inventory.LoadInventory
A pipeline to load one or more inventories of files and packages from ScanCode JSON scan results (presumably containing resource information and package scan data).
- get_scan_json_inputs()
Locates all the ScanCode JSON scan results from the project’s input/ directory. This includes all files with a .json extension.
- build_inventory_from_scans()
Processes JSON scan results files to populate codebase resources and packages.
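The two steps above (locate every .json file in input/, then split each scan into resources and packages) can be sketched with the standard library. This assumes the ScanCode-toolkit JSON layout with a top-level `files` list whose entries carry `path` and `type`, plus a top-level `packages` list as found in newer toolkit output. The function name and the shape of the returned records are hypothetical; the sketch does not create database objects.

```python
import json
from pathlib import Path


def load_scan_inventories(input_dir):
    """Read every *.json file in input_dir and split its entries into resources and packages."""
    resources, packages = [], []
    for scan_path in sorted(Path(input_dir).glob("*.json")):
        scan = json.loads(scan_path.read_text(encoding="utf-8"))
        for entry in scan.get("files", []):
            resources.append({"path": entry["path"], "type": entry.get("type", "file")})
        # Assumption: newer ScanCode-toolkit output lists packages at the top level.
        packages.extend(scan.get("packages", []))
    return resources, packages
```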
Root Filesystem Analysis
- class scanpipe.pipelines.root_filesystems.RootFS
A pipeline to analyze a Linux root filesystem, aka rootfs.
- extract_input_files_to_codebase_directory()
Extracts root filesystem input archives with extractcode.
- find_root_filesystems()
Finds root filesystems in the project’s codebase/.
- collect_rootfs_information()
Collects and stores rootfs information in the project.
- collect_and_create_codebase_resources()
Collects and labels all image files as CodebaseResource.
- collect_and_create_system_packages()
Collects installed system packages for each rootfs based on the distro. The collection of system packages is only available for known distros.
- tag_uninteresting_codebase_resources()
Flags files that don’t belong to any system package as not worth tracking.
- tag_empty_files()
Flags empty files.
- scan_for_application_packages()
Scans unknown resources for package information.
- match_not_analyzed_to_system_packages()
Matches “not-yet-analyzed” files to files that already belong to system packages.
- match_not_analyzed_to_application_packages()
Matches “not-yet-analyzed” files to files that already belong to application packages.
- scan_for_files()
Scans unknown resources for copyrights, licenses, emails, and URLs.
- analyze_scanned_files()
Analyzes single file scan results for completeness.
- tag_not_analyzed_codebase_resources()
Checks for any leftover files for sanity; there should be none.
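Several late steps in this pipeline move files out of a “not-yet-analyzed” state: empty files are flagged, files matching already-tagged package files inherit that tag, and a final check reports anything left over. The sketch below models that flow as a small status workflow. It assumes SHA1 equality as the matching criterion and uses hypothetical status strings and names, so it should not be read as ScanCode.io's actual matching logic.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Resource:
    path: str
    content: bytes
    status: str = "not-analyzed"  # hypothetical status label

    @property
    def sha1(self):
        return hashlib.sha1(self.content).hexdigest()


def tag_empty_files(resources):
    for resource in resources:
        if resource.status == "not-analyzed" and not resource.content:
            resource.status = "ignored-empty-file"  # hypothetical status label


def match_not_analyzed(resources, reference_status):
    """Copy reference_status onto not-analyzed files whose SHA1 matches a file that already has it."""
    known_sha1s = {r.sha1 for r in resources if r.status == reference_status}
    for resource in resources:
        if resource.status == "not-analyzed" and resource.sha1 in known_sha1s:
            resource.status = reference_status


def leftover_not_analyzed(resources):
    """Sanity check: paths of files that no step has handled."""
    return [r.path for r in resources if r.status == "not-analyzed"]


resources = [
    Resource("/usr/bin/tool", b"\x7fELF tool", status="system-package"),
    Resource("/opt/copy/tool", b"\x7fELF tool"),
    Resource("/var/empty.log", b""),
    Resource("/srv/app.py", b"print('hi')\n"),
]
tag_empty_files(resources)
match_not_analyzed(resources, reference_status="system-package")
print(leftover_not_analyzed(resources))  # ['/srv/app.py']
```

In the real pipeline a leftover such as /srv/app.py would normally have been handled by the scanning steps that run before the final check.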
Scan Codebase
- class scanpipe.pipelines.scan_codebase.ScanCodebase
A pipeline to scan a codebase with ScanCode-toolkit.
Input files are copied to the project’s codebase/ directory and are extracted in place before running the scan. Alternatively, the code can be manually copied to the project codebase/ directory.
- copy_inputs_to_codebase_directory()
Copies input files to the project’s codebase/ directory. The code can also be copied there prior to running the Pipeline.
- extract_archives()
Extracts archives with extractcode.
- collect_and_create_codebase_resources()
Collects and creates codebase resources.
- tag_empty_files()
Flags empty files.
- scan_for_application_packages()
Scans unknown resources for package information.
- scan_for_files()
Scans unknown resources for copyrights, licenses, emails, and URLs.
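The first two steps (copy inputs into codebase/, then extract archives in place) can be sketched with the standard library. This swaps in `shutil.unpack_archive`, which only understands zip and tar variants, as a stand-in for extractcode, which handles many more formats. The `-extract` directory suffix and the function name are assumptions made for this sketch.

```python
import shutil
from pathlib import Path

ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.xz")


def prepare_codebase(input_dir, codebase_dir):
    """Copy input files into codebase_dir and extract archives next to their copies."""
    input_dir, codebase_dir = Path(input_dir), Path(codebase_dir)
    codebase_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for source in sorted(input_dir.iterdir()):
        if not source.is_file():
            continue
        target = codebase_dir / source.name
        shutil.copy2(source, target)  # copy the input, keeping its metadata
        if source.name.endswith(ARCHIVE_SUFFIXES):
            # Hypothetical naming: extract into "<archive name>-extract".
            extract_dir = codebase_dir / f"{source.name}-extract"
            shutil.unpack_archive(target, extract_dir)
            extracted.append(extract_dir.name)
    return extracted
```

Keeping the original archive next to its extracted directory leaves both available to the later resource-collection step.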
Scan Package
- class scanpipe.pipelines.scan_package.ScanPackage
A pipeline to scan a single package archive with ScanCode-toolkit. The output is a summary of the scan results in JSON format.
- get_package_archive_input()
Locates the input package archive in the project’s input/ directory.
- collect_archive_information()
Collects and stores information about the input archive in the project.
- extract_archive_to_codebase_directory()
Extracts the package archive with extractcode.
- run_scancode()
Scans extracted codebase/ content.
- build_inventory_from_scan()
Processes a JSON scan results file to populate codebase resources and packages.
- make_summary_from_scan_results()
Builds a summary in JSON format from the generated scan results.
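What goes into the summary is not specified here. As one hypothetical shape, the sketch below counts scanned files and tallies license expressions. It assumes the ScanCode-toolkit layout of a top-level `files` list with a per-file `type` and a `detected_license_expression` field (the name used in recent toolkit versions). The summary keys are made up for illustration.

```python
import json
from collections import Counter


def summarize_scan(scan):
    """Return a JSON summary with file counts and license expressions by frequency."""
    files = [entry for entry in scan.get("files", []) if entry.get("type") == "file"]
    licenses = Counter(
        entry["detected_license_expression"]
        for entry in files
        if entry.get("detected_license_expression")
    )
    summary = {
        "total_files": len(files),
        "files_with_license": sum(licenses.values()),
        "license_expressions": licenses.most_common(),  # (expression, count) pairs
    }
    return json.dumps(summary, indent=2)


scan = {
    "files": [
        {"path": "pkg", "type": "directory"},
        {"path": "pkg/a.py", "type": "file", "detected_license_expression": "mit"},
        {"path": "pkg/b.py", "type": "file", "detected_license_expression": "mit"},
        {"path": "pkg/c.py", "type": "file", "detected_license_expression": "apache-2.0"},
        {"path": "pkg/d.txt", "type": "file", "detected_license_expression": None},
    ]
}
print(summarize_scan(scan))
```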