Built-in Pipelines
Pipelines in ScanCode.io are Python scripts that facilitate code analysis by executing a sequence of steps. The platform provides the following built-in pipelines:
Tip
If you are unsure which pipeline suits your requirements best, check out the Which pipeline should I use? section for guidance.
Pipeline Base Class
- class scanpipe.pipelines.Pipeline
Alias for the ProjectPipeline class.
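All built-in pipelines subclass this base class and declare their sequence of steps in a steps() classmethod; each step is a plain method that receives the pipeline instance. A minimal custom pipeline sketch following that documented pattern (the class, step names, and log messages below are illustrative, not part of any built-in pipeline):

    from scanpipe.pipelines import Pipeline

    class HelloCodebase(Pipeline):
        """Log a greeting for every codebase resource."""

        @classmethod
        def steps(cls):
            # Steps run in the order listed here.
            return (
                cls.collect_resources,
                cls.log_resources,
            )

        def collect_resources(self):
            """Gather the project's codebase resources for the next step."""
            self.resources = self.project.codebaseresources.all()

        def log_resources(self):
            """Write one log line per resource using the pipeline logger."""
            for resource in self.resources:
                self.log(f"Hello {resource.path}")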
Analyze Docker Image
- class scanpipe.pipelines.docker.Docker
Analyze Docker images.
- extract_images()
Extract images from input tarballs.
- extract_layers()
Extract layers from input images.
- find_images_os_and_distro()
Find the operating system and distro of input images.
- collect_images_information()
Collect and store image information in a project.
- collect_and_create_codebase_resources()
Collect and label all image files as CodebaseResources.
- collect_and_create_system_packages()
Collect installed system packages for each layer based on the distro.
- flag_uninteresting_codebase_resources()
Flag files that don’t belong to any system package.
Analyze Root Filesystem or VM Image
- class scanpipe.pipelines.root_filesystem.RootFS
Analyze a Linux root filesystem, also known as rootfs.
- extract_input_files_to_codebase_directory()
Extract root filesystem input archives with extractcode.
- find_root_filesystems()
Find root filesystems in the project’s codebase/.
- collect_rootfs_information()
Collect and store rootfs information on the project.
- collect_and_create_codebase_resources()
Collect and label all image files as CodebaseResource.
- collect_and_create_system_packages()
Collect installed system packages for each rootfs based on the distro. The collection of system packages is only available for known distros.
- flag_uninteresting_codebase_resources()
Flag files that are not worth tracking and don’t belong to any system package.
- scan_for_application_packages()
Scan unknown resources for package information.
- match_not_analyzed_to_system_packages()
Match files with “not-yet-analyzed” status to files already belonging to system packages.
- match_not_analyzed_to_application_packages()
Match files with “not-yet-analyzed” status to files already belonging to application packages.
- scan_for_files()
Scan unknown resources for copyrights, licenses, emails, and urls.
- analyze_scanned_files()
Analyze single file scan results for completeness.
- flag_not_analyzed_codebase_resources()
Check for any leftover files for sanity; there should be none.
Analyze Docker Windows Image
- class scanpipe.pipelines.docker_windows.DockerWindows
Analyze Windows Docker images.
- flag_known_software_packages()
Flag files from known software packages by checking common install paths.
- flag_uninteresting_codebase_resources()
Flag files that are known/labelled as uninteresting.
- flag_program_files_dirs_as_packages()
Report the immediate subdirectories of Program Files and Program Files (x86) as packages.
- flag_data_files_with_no_clues()
Flag data files that have no clues on their origin as uninteresting.
Collect strings with Xgettext (addon)
Collect symbols, strings and comments with Pygments (addon)
- class scanpipe.pipelines.collect_symbols_pygments.CollectSymbolsPygments
Collect source symbols, string literals and comments with Pygments.
- collect_and_store_pygments_symbols_and_strings()
Collect symbols, strings and comments from codebase files using pygments and store them in the extra data field.
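The underlying idea can be sketched with plain Pygments, independent of ScanCode.io: lex each file and bucket the tokens by type. The helper below is illustrative only and is not the pipeline’s actual implementation:

    from pygments.lexers import guess_lexer_for_filename
    from pygments.token import Token

    def collect_symbols_strings_comments(path):
        """Return symbols, string literals, and comments found in one source file."""
        with open(path, encoding="utf-8", errors="replace") as f:
            code = f.read()
        # May raise pygments.util.ClassNotFound for unsupported file types.
        lexer = guess_lexer_for_filename(path, code)

        symbols, strings, comments = [], [], []
        for token_type, value in lexer.get_tokens(code):
            if token_type in Token.Name:        # identifiers, i.e. source symbols
                symbols.append(value)
            elif token_type in Token.String:    # string literals
                strings.append(value)
            elif token_type in Token.Comment:   # comments
                comments.append(value)
        return symbols, strings, comments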
Collect symbols and strings with Tree-Sitter (addon)
Enrich With PurlDB (addon)
Warning
This pipeline requires access to a PurlDB service. Refer to PURLDB to configure access to PurlDB in your ScanCode.io instance.
Find Vulnerabilities (addon)
Warning
This pipeline requires access to a VulnerableCode database. Refer to VULNERABLECODE to configure access to VulnerableCode in your ScanCode.io instance.
- class scanpipe.pipelines.find_vulnerabilities.FindVulnerabilities
Find vulnerabilities for packages and dependencies in the VulnerableCode database.
Vulnerability data is stored on each package and dependency instance.
- check_vulnerablecode_service_availability()
Check if the VulnerableCode service is configured and available.
- lookup_packages_vulnerabilities()
Check for vulnerabilities for each of the project’s discovered packages.
- lookup_dependencies_vulnerabilities()
Check for vulnerabilities for each of the project’s discovered dependencies.
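The lookups are keyed on each package’s Package URL (purl). As a rough sketch of the idea, a purl can be queried against a VulnerableCode instance over its REST API; the endpoint path and purl query parameter below are assumptions based on the public VulnerableCode API, so verify them against your own instance’s API documentation:

    import requests

    VULNERABLECODE_URL = "https://public.vulnerablecode.io"  # or your own instance

    def lookup_vulnerabilities(purl):
        """Return the raw VulnerableCode results for a single Package URL."""
        response = requests.get(
            f"{VULNERABLECODE_URL}/api/packages/",
            params={"purl": purl},  # assumed filter parameter; check your API schema
            timeout=30,
        )
        response.raise_for_status()
        return response.json()

    results = lookup_vulnerabilities("pkg:pypi/django@3.2")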
Inspect ELF Binaries (addon)
Inspect Packages
- class scanpipe.pipelines.inspect_packages.InspectPackages
Inspect a codebase for packages and pre-resolved dependencies.
This pipeline inspects a codebase for application packages and their dependencies using package manifests and dependency lockfiles. It does not resolve dependencies; instead, it collects already pre-resolved dependencies from lockfiles, as well as direct dependencies (possibly unresolved) as found in the dependency sections of package manifests.
See documentation for the list of supported package manifests and dependency lockfiles: https://scancode-toolkit.readthedocs.io/en/stable/reference/available_package_parsers.html
- scan_for_application_packages()
Scan resources for package information to add DiscoveredPackage and DiscoveredDependency objects from detected package data.
- resolve_dependencies()
Create packages and dependency relationships from lockfiles or manifests containing pre-resolved dependencies.
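For illustration, a fully pinned Python requirements file is one common example of dependency data that is already resolved to concrete versions and can be collected as pre-resolved; the file path below is a placeholder for a file placed in the project’s codebase:

    from pathlib import Path

    # Every requirement is pinned to an exact version, so no resolution is needed.
    pinned = [
        "django==4.2.11",
        "requests==2.31.0",
        "urllib3==2.2.1",
    ]
    Path("requirements.txt").write_text("\n".join(pinned) + "\n")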
Load Inventory
- class scanpipe.pipelines.load_inventory.LoadInventory
Load JSON/XLSX inventory files generated with ScanCode-toolkit or ScanCode.io.
Supported formats are ScanCode-toolkit JSON scan results, ScanCode.io JSON output, and ScanCode.io XLSX output.
An inventory is composed of packages, dependencies, resources, and relations.
- get_inputs()
Locate all the supported input files from the project’s input/ directory.
- build_inventory_from_scans()
Process JSON scan results files to populate packages, dependencies, and resources.
Load SBOM
- class scanpipe.pipelines.load_sbom.LoadSBOM
Load package data from one or more SBOMs.
Supported SBOMs:
- SPDX document
- CycloneDX BOM
Other formats:
- AboutCode .ABOUT files for package curations
- get_sbom_inputs()
Locate all the SBOMs among the codebase resources.
- get_packages_from_sboms()
Get packages data from SBOMs.
- create_packages_from_sboms()
Create the packages declared in the SBOMs.
- create_dependencies_from_sboms()
Create the dependency relationship declared in the SBOMs.
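For illustration, a minimal CycloneDX JSON BOM that this pipeline can load might be produced as follows; the component and output file name are placeholders, and real SBOMs would normally come from your build tooling:

    import json
    from pathlib import Path

    bom = {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "version": 1,
        "components": [
            {
                "type": "library",
                "name": "lodash",
                "version": "4.17.21",
                "purl": "pkg:npm/lodash@4.17.21",
            }
        ],
    }

    # Place the BOM in the project's input/ directory so the pipeline picks it up.
    Path("bom.cdx.json").write_text(json.dumps(bom, indent=2))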
Resolve Dependencies
- class scanpipe.pipelines.resolve_dependencies.ResolveDependencies
Resolve dependencies from package manifests and lockfiles.
This pipeline collects lockfiles and manifest files that contain dependency requirements, and resolves these to a concrete set of package versions.
Supports resolving packages for:
- Python: using python-inspector, with requirements.txt and setup.py manifests as inputs
- get_manifest_inputs()
Locate package manifest files with a supported package resolver.
- scan_for_application_packages()
Scan and assemble application packages from package manifests and lockfiles.
- create_packages_and_dependencies()
Create the statically resolved packages and their dependencies in the database.
- get_packages_from_manifest()
Resolve package data from lockfiles/requirements files containing package requirements/dependencies.
- create_resolved_packages()
Create the dynamically resolved packages and their dependencies in the database.
Map Deploy To Develop
Warning
This pipeline requires input files to be tagged with the following:
“from”: For files related to the source code (also known as “develop”).
“to”: For files related to the build/binaries (also known as “deploy”).
Tagging your input files varies based on whether you are using the REST API, UI, or CLI. Refer to the How to tag input files? section for guidance.
- class scanpipe.pipelines.deploy_to_develop.DeployToDevelop
Establish relationships between two code trees: deployment and development.
This pipeline requires a minimum of two archive files, each properly tagged with:
from for archives containing the development source code.
to for archives containing the deployment compiled code.
When using download URLs as inputs, the “from” and “to” tags can be provided by adding a “#from” or “#to” fragment at the end of the download URLs.
When uploading local files:
User Interface: Use the “Edit flag” link in the “Inputs” panel of the Project details view.
REST API: Utilize the “upload_file_tag” field in addition to the “upload_file” (see the example after this list).
Command Line Interface: Tag uploaded files using the “filename:tag” syntax, for example, --input-file path/filename:tag.
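As an example of the REST API approach mentioned above, the tag is sent alongside the uploaded archive. The instance URL, project UUID, add_input action path, and file name below are placeholders and should be checked against your ScanCode.io REST API documentation:

    import requests

    api_url = "http://localhost/api/projects/<project-uuid>/add_input/"

    with open("develop-sources.tar.gz", "rb") as archive:
        response = requests.post(
            api_url,
            files={"upload_file": archive},
            data={"upload_file_tag": "from"},  # tag the archive as the develop ("from") side
        )
    response.raise_for_status()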
- get_inputs()
Locate the from and to input files.
- extract_inputs_to_codebase_directory()
Extract input files to the project’s codebase/ directory.
- collect_and_create_codebase_resources()
Collect and create codebase resources.
- fingerprint_codebase_directories()
Compute directory fingerprints for matching.
- flag_whitespace_files()
Flag whitespace files with a size of 100 bytes or less as ignored.
- map_about_files()
Map from/ .ABOUT files to their related to/ resources.
- map_checksum()
Map using SHA1 checksum (see the sketch after this method list).
- match_archives_to_purldb()
Match selected package archives by extension to PurlDB.
- find_java_packages()
Find the java package of the .java source files.
- map_java_to_class()
Map a .class compiled file to its .java source.
- map_jar_to_source()
Map .jar files to their related source directory.
- map_javascript()
Map a packed or minified JavaScript, TypeScript, CSS and SCSS to its source.
- map_elf()
Map ELF binaries to their sources.
- map_go()
Map Go binaries to their sources.
- match_directories_to_purldb()
Match selected directories in PurlDB.
- match_resources_to_purldb()
Match selected files by extension in PurlDB.
- map_javascript_post_purldb_match()
Map minified JavaScript files based on an existing PurlDB match.
- map_javascript_path()
Map JavaScript files based on path.
- map_javascript_colocation()
Map JavaScript files based on neighborhood file mapping.
- map_thirdparty_npm_packages()
Map thirdparty package using package.json metadata.
- map_path()
Map using path similarities.
- flag_mapped_resources_archives_and_ignored_directories()
Flag all codebase resources that were mapped during the pipeline.
- perform_house_keeping_tasks()
On the deployed side:
- PurlDB match files with no-java-source and empty status; if no match is found, update the status to requires-review.
- Update status for uninteresting files.
- Flag the dangling legal files for review.
On the devel side:
- Update status for not deployed files.
- match_purldb_resources_post_process()
Choose the best package for PurlDB matched resources.
- remove_packages_without_resources()
Remove packages without any resources.
- scan_unmapped_to_files()
Scan unmapped/matched to/ files for copyrights, licenses, emails, and urls, and update the status to requires-review.
- scan_mapped_from_for_files()
Scan mapped from/ files for copyrights, licenses, emails, and urls.
- create_local_files_packages()
Create local-files packages for codebase resources not part of a package.
- flag_deployed_from_resources_with_missing_license()
Update the status for deployed from files with missing license.
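To make the mapping approach concrete, the map_checksum() step listed above pairs deployment files with development files that have identical contents. A minimal, self-contained sketch of that idea (the directory layout and helper names are illustrative, not the pipeline’s code):

    import hashlib
    from pathlib import Path

    def sha1_index(root):
        """Map SHA1 digest -> list of file paths found under ``root``."""
        index = {}
        for path in Path(root).rglob("*"):
            if path.is_file():
                digest = hashlib.sha1(path.read_bytes()).hexdigest()
                index.setdefault(digest, []).append(path)
        return index

    def map_by_checksum(from_root, to_root):
        """Yield (to_path, matching_from_paths) pairs for identical file contents."""
        from_index = sha1_index(from_root)
        for digest, to_paths in sha1_index(to_root).items():
            if digest in from_index:
                for to_path in to_paths:
                    yield to_path, from_index[digest]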
Match to MatchCode (addon)
Warning
This pipeline requires access to a MatchCode.io service. Refer to MATCHCODE.IO to configure access to MatchCode.io in your ScanCode.io instance.
- class scanpipe.pipelines.match_to_matchcode.MatchToMatchCode
Match the codebase resources of a project against MatchCode.io to identify packages.
This process involves:
Generating a JSON scan of the project codebase
Transmitting it to MatchCode.io and awaiting match results
Creating discovered packages from the package data obtained
Associating the codebase resources with those discovered packages
Currently, MatchCode.io can only match for archives, directories, and files from Maven and npm Packages.
This pipeline requires a MatchCode.io instance to be configured and available. There is currently no public instance of MatchCode.io. Reach out to nexB, Inc. for other arrangements.
- check_matchcode_service_availability()
Check if the MatchCode.io service is configured and available.
- send_project_json_to_matchcode()
Create a JSON scan of the project Codebase and send it to MatchCode.io.
- poll_matching_results()
Wait until the match results are ready by polling the match run status.
- create_packages_from_match_results()
Create DiscoveredPackages from match results.
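The waiting step follows a simple poll-and-sleep pattern. A generic sketch of that pattern (the URL, status values, and timing below are hypothetical, not the MatchCode.io API):

    import time
    import requests

    def wait_for_match_run(run_url, sleep_seconds=10, timeout_seconds=3600):
        """Poll ``run_url`` until it reports a terminal status, then return it."""
        deadline = time.monotonic() + timeout_seconds
        while time.monotonic() < deadline:
            status = requests.get(run_url, timeout=30).json().get("status")
            if status in ("success", "failure"):  # hypothetical terminal statuses
                return status
            time.sleep(sleep_seconds)
        raise TimeoutError(f"Match run not finished after {timeout_seconds} seconds")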
Populate PurlDB (addon)
Warning
This pipeline requires access to a PurlDB service. Refer to PURLDB to configure access to PurlDB in your ScanCode.io instance.
Scan Codebase
- class scanpipe.pipelines.scan_codebase.ScanCodebase
Scan a codebase for application packages, licenses, and copyrights.
This pipeline does not further scan the files contained in a package for licenses and copyrights; it only considers the declared license of a package. It does not scan for system (Linux distro) packages.
- copy_inputs_to_codebase_directory()
Copy input files to the project’s codebase/ directory. The code can also be copied there prior to running the Pipeline.
- collect_and_create_codebase_resources()
Collect and create codebase resources.
- scan_for_application_packages()
Scan unknown resources for package information.
- scan_for_files()
Scan unknown resources for copyrights, licenses, emails, and urls.
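A sketch of setting this pipeline up from a Django shell on a ScanCode.io instance (for example one opened with the scanpipe shell command); the project name and input path are placeholders, and the copy_input_from() and add_pipeline() calls assume the current scanpipe Project model API:

    from scanpipe.models import Project

    project = Project.objects.create(name="scan-my-codebase")
    # Copy the code to scan into the project's input/ directory.
    project.copy_input_from("/path/to/codebase-archive.zip")
    # Queue the pipeline; execute_now=True runs it immediately.
    project.add_pipeline("scan_codebase", execute_now=True)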
Scan For Virus
Scan Single Package
- class scanpipe.pipelines.scan_single_package.ScanSinglePackage
Scan a single package archive (or package manifest file).
This pipeline scans a single package for package metadata, declared dependencies, licenses, license clarity score and copyrights.
The output is a summary of the scan results in JSON format.
- get_package_input()
Locate the package input in the project’s input/ directory.
- collect_input_information()
Collect and store information about the project input.
- extract_input_to_codebase_directory()
Copy or extract input to project codebase/ directory.
- run_scan()
Scan extracted codebase/ content.
- load_inventory_from_toolkit_scan()
Process JSON scan results to populate codebase resources and packages.
- make_summary_from_scan_results()
Build a summary in JSON format from the generated scan results.