Built-in Pipelines
Pipelines are Python scripts that perform code analysis by executing a sequence of steps. ScanCode.io offers the following built-in pipelines:
Pipeline Base Class
- class scanpipe.pipelines.Pipeline
Base class for all pipelines.
- __init__(run)
Load the Run and Project instances.
- classmethod get_steps()
Raise a deprecation warning when the steps are defined as a tuple instead of a classmethod.
- classmethod get_doc()
Get the doc string of this pipeline.
- classmethod get_graph()
Return a graph of steps.
- classmethod get_info()
Get a dictionary of combined information data about this pipeline.
- classmethod get_summary()
Get the doc string summary.
- log(message)
Log the given message to the current module logger and Run instance.
- execute()
Execute each step in the order defined on this pipeline class.
- add_error(error)
Create a ProjectError record on the current project.
- save_errors(*exceptions)
Context manager to save specified exceptions as ProjectError in the database.
Example in a Pipeline step:
    with self.save_errors(rootfs.DistroNotFound):
        rootfs.scan_rootfs_for_system_packages(self.project, rfs)
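The base class behavior described above (ordered steps, logging, and saving selected exceptions as errors) can be sketched without ScanCode.io installed. This is a minimal stand-in under stated assumptions, not the scanpipe implementation: MiniPipeline, its steps() classmethod, the in-memory messages and errors lists, and the DistroNotFound class defined here are hypothetical names used only for illustration.

```python
from contextlib import contextmanager


class MiniPipeline:
    """Hypothetical stand-in for a pipeline base class (not scanpipe's code)."""

    def __init__(self):
        self.messages = []  # stands in for the Run log
        self.errors = []    # stands in for ProjectError records

    @classmethod
    def steps(cls):
        # Subclasses return their step methods in execution order.
        raise NotImplementedError

    def log(self, message):
        self.messages.append(message)

    def add_error(self, error):
        self.errors.append(repr(error))

    @contextmanager
    def save_errors(self, *exceptions):
        # Record only the listed exception types; anything else propagates.
        try:
            yield
        except exceptions as error:
            self.add_error(error)

    def execute(self):
        for step in self.steps():
            self.log(f"Step [{step.__name__}] starting")
            step(self)
            self.log(f"Step [{step.__name__}] completed")


class DistroNotFound(Exception):
    """Hypothetical exception used for the demonstration."""


class Demo(MiniPipeline):
    @classmethod
    def steps(cls):
        return (cls.find_distro, cls.report)

    def find_distro(self):
        # The exception is recorded and the pipeline keeps running.
        with self.save_errors(DistroNotFound):
            raise DistroNotFound("no os-release file")

    def report(self):
        self.log(f"{len(self.errors)} error(s) recorded")


demo = Demo()
demo.execute()
print(demo.messages)
print(demo.errors)
```

In this sketch, an exception type that is not passed to save_errors() would still interrupt execute(), which mirrors the "specified exceptions" wording of the description.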
Deploy To Develop
- class scanpipe.pipelines.deploy_to_develop.DeployToDevelop
Relate deploy and develop code trees.
This pipeline expects 2 archive files with “from-” and “to-” filename prefixes as inputs:
    - “from-[FILENAME]” archive containing the development source code
    - “to-[FILENAME]” archive containing the deployment compiled code
- get_inputs()
Locate the from and to archives.
- extract_inputs_to_codebase_directory()
Extract input files to the project’s codebase/ directory.
- extract_archives_in_place()
Recursively extract the from* and to* archives in place with extractcode.
- collect_and_create_codebase_resources()
Collect and create codebase resources.
- flag_empty_and_ignored_files()
Flag empty and ignored files using names and extensions.
- map_checksum()
Map using SHA1 checksum.
- find_java_packages()
Find the Java package of the .java source files.
- map_java_to_class()
Map a .class compiled file to its .java source.
- flag_to_meta_inf_files()
Flag all META-INF/* files of the to/ directory as ignored.
- map_jar_to_source()
Map .jar files to their related source directory.
- map_javascript()
Map packed or minified JavaScript, TypeScript, CSS, and SCSS files to their source.
- match_purldb()
Match selected files by extension in PurlDB.
- map_path()
Map using path similarities.
- flag_mapped_resources_and_ignored_directories()
Flag all codebase resources that were mapped during the pipeline.
- scan_mapped_from_for_files()
Scan mapped from/ files for copyrights, licenses, emails, and urls.
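The checksum mapping step listed above pairs a deployed file with a development file when their contents are byte-for-byte identical. A minimal sketch of that idea follows. It works on in-memory dictionaries of path to content instead of the project database, and map_by_checksum and sha1_of are hypothetical helper names, not functions from scanpipe.

```python
import hashlib
from collections import defaultdict


def sha1_of(content: bytes) -> str:
    return hashlib.sha1(content).hexdigest()


def map_by_checksum(from_files, to_files):
    """Return {to_path: [from_paths]} for files with identical SHA1 digests."""
    # Index the development (from/) files by digest.
    from_index = defaultdict(list)
    for path, content in from_files.items():
        from_index[sha1_of(content)].append(path)

    # Look up each deployed (to/) file in that index.
    mapping = {}
    for path, content in to_files.items():
        matches = from_index.get(sha1_of(content))
        if matches:
            mapping[path] = sorted(matches)
    return mapping


from_files = {
    "from/src/app.js": b"console.log('hi');\n",
    "from/LICENSE": b"Apache-2.0\n",
}
to_files = {
    "to/static/app.js": b"console.log('hi');\n",
    "to/static/app.min.js": b"console.log('hi')",
}
result = map_by_checksum(from_files, to_files)
print(result)
```

The minified file stays unmapped here because its bytes differ from the source, which is why the pipeline lists separate steps for JavaScript and path-based mapping.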
Docker Image Analysis
- class scanpipe.pipelines.docker.Docker
Analyze Docker images.
- extract_images()
Extract images from input tarballs.
- extract_layers()
Extract layers from input images.
- find_images_os_and_distro()
Find the operating system and distro of input images.
- collect_images_information()
Collect and store image information in a project.
- collect_and_create_codebase_resources()
Collect and label all image files as CodebaseResources.
- collect_and_create_system_packages()
Collect installed system packages for each layer based on the distro.
- tag_uninteresting_codebase_resources()
Flag files that don’t belong to any system package.
Docker Windows Image Analysis
- class scanpipe.pipelines.docker_windows.DockerWindows
Analyze Windows Docker images.
- tag_known_software_packages()
Flag files from known software packages by checking common install paths.
- tag_uninteresting_codebase_resources()
Flag files that are known/labelled as uninteresting.
- tag_program_files_dirs_as_packages()
Report the immediate subdirectories of Program Files and Program Files (x86) as packages.
- tag_data_files_with_no_clues()
Flag data files that have no clues on their origin as uninteresting.
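The Program Files step above treats each immediate subdirectory of Program Files and Program Files (x86) as a package. A rough sketch of that grouping, assuming a plain list of relative paths as input: program_files_packages is a hypothetical helper, the sample paths are invented, and the exact-case match on directory names is a simplification of this sketch.

```python
from pathlib import PureWindowsPath

PROGRAM_DIRS = ("Program Files", "Program Files (x86)")


def program_files_packages(paths):
    """Return the sorted immediate subdirectories of the program directories."""
    found = set()
    for raw in paths:
        parts = PureWindowsPath(raw).parts
        for index, part in enumerate(parts):
            # The next segment is a subdirectory only if something follows it.
            if part in PROGRAM_DIRS and index + 2 < len(parts):
                found.add("/".join(parts[: index + 2]))
                break
    return sorted(found)


paths = [
    "Files/Program Files/Git/bin/git.exe",
    "Files/Program Files/Git/LICENSE.txt",
    "Files/Program Files (x86)/Common Files/lib.dll",
    "Files/Program Files/readme.txt",
    "Files/Windows/System32/kernel32.dll",
]
packages = program_files_packages(paths)
print(packages)
```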
Find Vulnerabilities
- class scanpipe.pipelines.find_vulnerabilities.FindVulnerabilities
Find vulnerabilities for discovered packages in the VulnerableCode database.
Vulnerability data is stored in the extra_data field of each package.
- check_vulnerablecode_service_availability()
Check if the VulnerableCode service is configured and available.
- lookup_vulnerabilities()
Check for vulnerabilities on each of the project’s discovered packages.
Inspect Manifest
- class scanpipe.pipelines.inspect_manifest.InspectManifest
Inspect one or more manifest files and resolve their packages.
Supports:
    - BOM: SPDX document, CycloneDX BOM, AboutCode ABOUT file
    - Python: requirements.txt, setup.py, setup.cfg, Pipfile.lock
    - JavaScript: yarn.lock lockfile, npm package-lock.json lockfile
    - Java: Java JAR MANIFEST.MF, Gradle build script
    - Ruby: RubyGems gemspec manifest, RubyGems Bundler Gemfile.lock
    - Rust: Rust Cargo.lock dependencies lockfile, Rust Cargo.toml package manifest
    - PHP: PHP composer lockfile, PHP composer manifest
    - NuGet: nuspec package manifest
    - Dart: pubspec manifest, pubspec lockfile
    - OS: FreeBSD compact package manifest, Debian installed packages database
Full list available at https://scancode-toolkit.readthedocs.io/en/doc-update-licenses/reference/available_package_parsers.html
- get_manifest_inputs()
Locate all the manifest files from the project’s input/ directory.
- get_packages_from_manifest()
Get packages data from manifest files.
- create_resolved_packages()
Create the resolved packages and their dependencies in the database.
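To give a feel for what getting packages data from a manifest involves, here is a deliberately small sketch for one of the supported formats, requirements.txt. It is not the parser the pipeline uses: packages_from_requirements is a hypothetical function, it only understands fully pinned name==version lines, and the name normalization follows PEP 503 as an assumption of this sketch.

```python
import re

# Matches only fully pinned requirements such as "Django==4.2.1".
PINNED = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([A-Za-z0-9.!+_-]+)$")


def packages_from_requirements(text):
    """Return a list of package dicts for the pinned lines of a requirements file."""
    packages = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        match = PINNED.match(line)
        if not match:
            continue  # unpinned, option, and empty lines are skipped here
        name, version = match.groups()
        normalized = re.sub(r"[-_.]+", "-", name).lower()
        packages.append(
            {
                "type": "pypi",
                "name": normalized,
                "version": version,
                "purl": f"pkg:pypi/{normalized}@{version}",
            }
        )
    return packages


requirements = """
# runtime dependencies
Django==4.2.1
typing_extensions == 4.5.0  # pinned
requests>=2.0
-r other.txt
"""
packages = packages_from_requirements(requirements)
for package in packages:
    print(package["purl"])
```

Version ranges such as requests>=2.0 are skipped in this sketch because it has no resolver to pick a concrete version for them.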
Load Inventory From Scan
- class scanpipe.pipelines.load_inventory.LoadInventory
Load JSON/XLSX inventory files generated with ScanCode-toolkit or ScanCode.io.
Supported formats are ScanCode-toolkit JSON scan results, ScanCode.io JSON output, and ScanCode.io XLSX output.
An inventory is composed of packages, dependencies, resources, and relations.
- get_inputs()
Locate all the supported input files from the project’s input/ directory.
- build_inventory_from_scans()
Process JSON scan results files to populate packages, dependencies, and resources.
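As a rough illustration of building an inventory from a JSON scan, the sketch below reads a small, hand-written document and groups its entries into packages, dependencies, and resources. The top-level keys and per-entry fields used here are a simplified assumption about the scan layout, and build_inventory is a hypothetical helper, not the function the pipeline calls.

```python
import json

# Hand-written sample; real scan files carry many more fields.
scan_json = """
{
  "headers": [{"tool_name": "scancode-toolkit"}],
  "packages": [{"purl": "pkg:pypi/django@4.2.1"}],
  "dependencies": [{"purl": "pkg:pypi/asgiref"}],
  "files": [
    {"path": "django", "type": "directory"},
    {"path": "django/__init__.py", "type": "file"}
  ]
}
"""


def build_inventory(text):
    """Group a scan document into packages, dependencies, and resources."""
    data = json.loads(text)
    return {
        "packages": [entry["purl"] for entry in data.get("packages", [])],
        "dependencies": [entry["purl"] for entry in data.get("dependencies", [])],
        "resources": [
            (entry["path"], entry["type"]) for entry in data.get("files", [])
        ],
    }


inventory = build_inventory(scan_json)
print(inventory)
```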
Root Filesystem Analysis
- class scanpipe.pipelines.root_filesystems.RootFS
Analyze a Linux root filesystem, aka rootfs.
- extract_input_files_to_codebase_directory()
Extract root filesystem input archives with extractcode.
- find_root_filesystems()
Find root filesystems in the project’s codebase/.
- collect_rootfs_information()
Collect and store rootfs information in the project.
- collect_and_create_codebase_resources()
Collect and label all image files as CodebaseResource.
- collect_and_create_system_packages()
Collect installed system packages for each rootfs based on the distro. The collection of system packages is only available for known distros.
- tag_uninteresting_codebase_resources()
Flag files that don’t belong to any system package and are not worth tracking.
- tag_empty_files()
Flag empty files.
- scan_for_application_packages()
Scan unknown resources for package information.
- match_not_analyzed_to_system_packages()
Match files with “not-yet-analyzed” status to files already belonging to system packages.
- match_not_analyzed_to_application_packages()
Match files with “not-yet-analyzed” status to files already belonging to application packages.
- scan_for_files()
Scan unknown resources for copyrights, licenses, emails, and urls.
- analyze_scanned_files()
Analyze single file scan results for completeness.
- tag_not_analyzed_codebase_resources()
Check for any leftover files for sanity; there should be none.
Scan Codebase
- class scanpipe.pipelines.scan_codebase.ScanCodebase
Scan a codebase with ScanCode-toolkit.
If the codebase consists of several packages and dependencies, it will try to resolve and scan those too.
Input files are copied to the project’s codebase/ directory and are extracted in place before running the scan. Alternatively, the code can be manually copied to the project codebase/ directory.
- copy_inputs_to_codebase_directory()
Copy input files to the project’s codebase/ directory. The code can also be copied there prior to running the Pipeline.
- extract_archives()
Extract archives with extractcode.
- collect_and_create_codebase_resources()
Collect and create codebase resources.
- tag_empty_files()
Flag empty files.
- scan_for_application_packages()
Scan unknown resources for package information.
- scan_for_files()
Scan unknown resources for copyrights, licenses, emails, and urls.
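Among the steps above, flagging empty files is the simplest to picture in isolation. The following sketch walks a temporary directory that stands in for the project’s codebase/ directory and lists the zero-byte files. find_empty_files is a hypothetical helper, and the pipeline itself records a status on codebase resources instead of returning paths.

```python
import tempfile
from pathlib import Path


def find_empty_files(codebase_dir):
    """Return the sorted relative paths of zero-byte files under codebase_dir."""
    root = Path(codebase_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_size == 0
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "pkg" / "__init__.py").write_bytes(b"")           # empty
    (root / "pkg" / "main.py").write_text("print('hi')\n")    # not empty
    (root / "EMPTY").touch()                                   # empty
    empty = find_empty_files(root)

print(empty)
```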
Scan Package
- class scanpipe.pipelines.scan_package.ScanPackage
Scan a single package archive with ScanCode-toolkit.
The output is a summary of the scan results in JSON format.
- get_package_archive_input()
Locate the input package archive in the project’s input/ directory.
- collect_archive_information()
Collect and store information about the input archive in the project.
- extract_archive_to_codebase_directory()
Extract package archive with extractcode.
- run_scancode()
Scan extracted codebase/ content.
- load_inventory_from_toolkit_scan()
Process JSON scan results to populate codebase resources and packages.
- make_summary_from_scan_results()
Build a summary in JSON format from the generated scan results.