Built-in Pipelines
Pipelines in ScanCode.io are Python scripts that facilitate code analysis by executing a sequence of steps. The platform provides the following built-in pipelines:
Note
Some pipelines have optional steps which are enabled only when they are selected explicitly.
Tip
If you are unsure which pipeline suits your requirements best, check out the Which pipeline should I use? section for guidance.
Pipeline Base Class
- class scanpipe.pipelines.Pipeline
Alias for the ProjectPipeline class.
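The idea of a pipeline as an ordered sequence of steps, some of them optional, can be illustrated with a small self-contained sketch. This is not the ScanCode.io implementation: `ToyPipeline`, `toy_optional_step`, and the step names are hypothetical and only mimic the behavior described above, where optional steps run only when explicitly selected.

```python
def toy_optional_step(*groups):
    """Hypothetical decorator: mark a step as optional for the given groups."""
    def decorator(func):
        func.groups = set(groups)
        return func
    return decorator


class ToyPipeline:
    """Hypothetical stand-in for a pipeline: an ordered sequence of steps."""

    def __init__(self, selected_groups=()):
        self.selected_groups = set(selected_groups)
        self.log = []

    @classmethod
    def steps(cls):
        # The order of this tuple is the order of execution.
        return (cls.collect_files, cls.map_java, cls.summarize)

    def collect_files(self):
        self.log.append("collect_files")

    @toy_optional_step("Java")
    def map_java(self):
        self.log.append("map_java")

    def summarize(self):
        self.log.append("summarize")

    def execute(self):
        for step in self.steps():
            groups = getattr(step, "groups", set())
            # Skip optional steps whose group was not explicitly selected.
            if groups and not (groups & self.selected_groups):
                continue
            step(self)
        return self.log
```

In this sketch, `ToyPipeline().execute()` skips `map_java`, while `ToyPipeline(["Java"]).execute()` runs all three steps in order.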
Analyze Docker Image
- class scanpipe.pipelines.analyze_docker.Docker
Analyze Docker images.
- extract_images()
Extract images from input tarballs.
- extract_layers()
Extract layers from input images.
- find_images_os_and_distro()
Find the operating system and distro of input images.
- collect_images_information()
Collect and store image information in a project.
- collect_and_create_codebase_resources()
Collect and label all image files as CodebaseResources.
- collect_and_create_system_packages()
Collect installed system packages for each layer based on the distro.
- flag_uninteresting_codebase_resources()
Flag files that don’t belong to any system package.
Analyze Root Filesystem or VM Image
- class scanpipe.pipelines.analyze_root_filesystem.RootFS
Analyze a Linux root filesystem, also known as rootfs.
- extract_input_files_to_codebase_directory()
Extract root filesystem input archives with extractcode.
- find_root_filesystems()
Find root filesystems in the project’s codebase/.
- collect_rootfs_information()
Collect and store rootfs information in the project.
- collect_and_create_codebase_resources()
Collect and label all image files as CodebaseResource.
- collect_and_create_system_packages()
Collect installed system packages for each rootfs based on the distro. The collection of system packages is only available for known distros.
- flag_uninteresting_codebase_resources()
Flag files that don’t belong to any system package as not worth tracking.
- scan_for_application_packages()
Scan unknown resources for package information.
- match_not_analyzed_to_system_packages()
Match files with “not-yet-analyzed” status to files already belonging to system packages.
- match_not_analyzed_to_application_packages()
Match files with “not-yet-analyzed” status to files already belonging to application packages.
- scan_for_files()
Scan unknown resources for copyrights, licenses, emails, and urls.
- collect_and_create_license_detections()
Collect and create unique license detections from resources and package data.
- analyze_scanned_files()
Analyze single file scan results for completeness.
- flag_not_analyzed_codebase_resources()
Check for any leftover files for sanity; there should be none.
Analyze Docker Windows Image
- class scanpipe.pipelines.analyze_docker_windows.DockerWindows
Analyze Windows Docker images.
- flag_known_software_packages()
Flag files from known software packages by checking common install paths.
- flag_uninteresting_codebase_resources()
Flag files that are known/labelled as uninteresting.
- flag_program_files_dirs_as_packages()
Report the immediate subdirectories of Program Files and Program Files (x86) as packages.
- flag_data_files_with_no_clues()
Flag data files that have no clues on their origin as uninteresting.
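As a rough illustration of the flag_program_files_dirs_as_packages step, the sketch below derives the immediate subdirectories of Program Files and Program Files (x86) from a list of file paths. The function name, the forward-slash path layout, and the sample paths are assumptions made for this example and are not taken from the pipeline's code.

```python
from pathlib import PurePosixPath

PROGRAM_FILES_ROOTS = ("Program Files", "Program Files (x86)")


def program_files_package_dirs(file_paths):
    """Return the immediate subdirectories of the Program Files roots."""
    found = set()
    for path in file_paths:
        parts = PurePosixPath(path).parts
        for index, part in enumerate(parts):
            # The segment after the root counts as a directory only when
            # at least one more segment follows it in a file path.
            if part in PROGRAM_FILES_ROOTS and len(parts) > index + 2:
                found.add("/".join(parts[: index + 2]))
                break
    return found
```

Files that sit directly under a Program Files root are ignored here, since they do not belong to any subdirectory.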
Benchmark PURLs (addon)
To check an SBOM against a list of expected Package URLs (PURLs):
1. Create a new project and provide two inputs:
   - The SBOM file you want to check.
   - A list of expected PURLs in a *-purls.txt file with one PURL per line.
   Tip
   You may also flag any filename using the purls input tag.
2. Run the pipelines:
   - Select and run the load_sbom pipeline to load the SBOM.
   - Run the benchmark_purls pipeline to validate against the expected PURLs.
3. Download the results from the “output” section of the project.
The output file contains only the differences between the discovered PURLs and the expected PURLs:
   - Lines starting with - are missing from the project.
   - Lines starting with + are unexpected in the project.
Note
The load_sbom pipeline is provided as an example to benchmark external
tools using SBOMs as inputs. You can also run benchmark_purls directly
after any ScanCode.io pipeline to validate the discovered PURLs.
Tip
You can provide multiple expected PURLs files.
- class scanpipe.pipelines.benchmark_purls.BenchmarkPurls
Validate discovered project packages against a reference list of expected PURLs.
The expected PURLs must be provided as a .txt file with one PURL per line. Input files are recognized if:
They are tagged with “purls”, or
Their filename ends with “purls.txt” (e.g., “expected_purls.txt”).
- get_expected_purls()
Load the expected PURLs defined in the project inputs.
- compare_purls()
Run the PURLs diff and write the results to a project output file.
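The input recognition rule and the diff format described for this pipeline can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline's actual code: the helper names are hypothetical, and the real comparison may normalize PURLs differently than the exact string match used here.

```python
def is_expected_purls_input(filename, tag=""):
    """An input qualifies if tagged "purls" or named like *purls.txt."""
    return tag == "purls" or filename.endswith("purls.txt")


def load_purls(text):
    """Read one PURL per line, ignoring blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def diff_purls(discovered, expected):
    """Return only the differences, using the -/+ prefixes described above."""
    missing = [f"-{purl}" for purl in sorted(expected - discovered)]
    unexpected = [f"+{purl}" for purl in sorted(discovered - expected)]
    return missing + unexpected
```

PURLs present on both sides produce no output line, which matches the description of an output file that contains only the differences.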
Collect string with Xgettext (addon)
Collect symbols, string and comments with Pygments (addon)
- class scanpipe.pipelines.collect_symbols_pygments.CollectSymbolsPygments
Collect source symbols, string literals and comments with Pygments.
- collect_and_store_pygments_symbols_and_strings()
Collect symbols, strings, and comments from codebase files using Pygments and store them in the extra data field.
Collect symbols and string with Tree-Sitter (addon)
Enrich With PurlDB (addon)
Warning
This pipeline requires access to a PurlDB service. Refer to PURLDB to configure access to PurlDB in your ScanCode.io instance.
Find Vulnerabilities (addon)
Warning
This pipeline requires access to a VulnerableCode database. Refer to VULNERABLECODE to configure access to VulnerableCode in your ScanCode.io instance.
- class scanpipe.pipelines.find_vulnerabilities.FindVulnerabilities
Find vulnerabilities for packages and dependencies in the VulnerableCode database.
Vulnerability data is stored on each package and dependency instance.
- check_vulnerablecode_service_availability()
Check if the VulnerableCode service is configured and available.
- lookup_packages_vulnerabilities()
Check for vulnerabilities for each of the project’s discovered packages.
- lookup_dependencies_vulnerabilities()
Check for vulnerabilities for each of the project’s discovered dependencies.
Inspect ELF Binaries (addon)
Inspect Packages
- class scanpipe.pipelines.inspect_packages.InspectPackages
Inspect a codebase for packages and pre-resolved dependencies.
This pipeline inspects a codebase for application packages and their dependencies using package manifests and dependency lockfiles. It does not resolve dependencies. Instead, it collects pre-resolved dependencies from lockfiles, and direct dependencies (possibly not resolved) as found in the dependency sections of package manifests.
See documentation for the list of supported package manifests and dependency lockfiles: https://scancode-toolkit.readthedocs.io/en/stable/reference/available_package_parsers.html
- scan_binaries_for_package()
Optional step: Compiled
Scan compiled binaries for package and dependency related data. Currently supported compiled binaries: Go, Rust.
- scan_for_application_packages()
Scan resources for package information to add DiscoveredPackage and DiscoveredDependency objects from detected package data.
- resolve_dependencies()
Optional step: StaticResolver
Create packages and dependency relationships from lockfiles or manifests containing pre-resolved dependencies.
Load Inventory
- class scanpipe.pipelines.load_inventory.LoadInventory
Load JSON/XLSX inventory files generated with ScanCode-toolkit or ScanCode.io.
Supported formats are ScanCode-toolkit JSON scan results, ScanCode.io JSON output, and ScanCode.io XLSX output.
An inventory is composed of packages, dependencies, resources, and relations.
- get_inputs()
Locate all the supported input files from the project’s input/ directory.
- build_inventory_from_scans()
Process JSON scan results files to populate packages, dependencies, and resources.
Load SBOM
- class scanpipe.pipelines.load_sbom.LoadSBOM
Load package data from one or more SBOMs.
Supported SBOMs:
   - SPDX document
   - CycloneDX BOM
Other formats:
   - AboutCode .ABOUT files for package curations.
- get_sbom_inputs()
Locate all the SBOMs among the codebase resources.
- get_data_from_sboms()
Get data from SBOMs.
- create_packages_from_sboms()
Create the packages declared in the SBOMs.
- create_dependencies_from_sboms()
Create the dependency relationships declared in the SBOMs.
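To give a feel for what getting package data from an SBOM involves, here is a minimal sketch that collects Package URLs from a CycloneDX JSON document. It relies only on the `components` array and the per-component `purl` field of the CycloneDX JSON format. The function name is hypothetical, and the pipeline itself extracts far more than PURLs.

```python
import json


def purls_from_cyclonedx(document):
    """Collect the purl of every component, including nested components."""
    purls = []
    stack = list(document.get("components", []))
    while stack:
        component = stack.pop()
        if component.get("purl"):
            purls.append(component["purl"])
        # CycloneDX components may themselves contain components.
        stack.extend(component.get("components", []))
    return sorted(purls)


sample_bom = json.loads("""
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "django", "purl": "pkg:pypi/django@5.0"},
    {"type": "application", "name": "app", "components": [
      {"type": "library", "name": "lodash", "purl": "pkg:npm/lodash@4.17.21"}
    ]}
  ]
}
""")
sample_purls = purls_from_cyclonedx(sample_bom)
```

Components without a `purl`, such as the `app` entry in the sample, are simply skipped in this sketch.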
Resolve Dependencies
- class scanpipe.pipelines.resolve_dependencies.ResolveDependencies
Resolve dependencies from package manifests and lockfiles.
This pipeline collects lockfiles and manifest files that contain dependency requirements, and resolves these to a concrete set of package versions.
Supports resolving packages for:
   - Python: using python-inspector, with requirements.txt and setup.py manifests as inputs.
- get_manifest_inputs()
Locate package manifest files with a supported package resolver.
- scan_for_application_packages()
Optional step: StaticResolver
Scan and assemble application packages from package manifests and lockfiles.
- create_packages_and_dependencies()
Optional step: StaticResolver
Create the statically resolved packages and their dependencies in the database.
- get_packages_from_manifest()
Optional step: DynamicResolver
Resolve package data from lockfiles/requirement files with package requirements/dependencies.
- create_resolved_packages()
Optional step: DynamicResolver
Create the dynamically resolved packages and their dependencies in the database.
Map Deploy To Develop
Warning
This pipeline requires input files to be tagged with the following:
“from”: For files related to the source code (also known as “develop”).
“to”: For files related to the build/binaries (also known as “deploy”).
Tagging your input files varies based on whether you are using the REST API, UI, or CLI. Refer to the How to tag input files? section for guidance.
- class scanpipe.pipelines.deploy_to_develop.DeployToDevelop
Establish relationships between two code trees: deployment and development.
This pipeline requires a minimum of two archive files, each properly tagged with:
from for archives containing the development source code.
to for archives containing the deployment compiled code.
When using download URLs as inputs, the “from” and “to” tags can be provided by adding a “#from” or “#to” fragment at the end of the download URLs.
When uploading local files:
User Interface: Use the “Edit flag” link in the “Inputs” panel of the Project details view.
REST API: Utilize the “upload_file_tag” field in addition to the “upload_file”.
Command Line Interface: Tag uploaded files using the “filename:tag” syntax, for example,
--input-file path/filename:tag.
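The two textual tagging conventions described here, a “#from” or “#to” fragment on a download URL and the “filename:tag” syntax on the command line, can be sketched as simple string handling. The helper names are hypothetical and this is not how ScanCode.io parses its inputs internally. The CLI helper also assumes paths that contain no other colon.

```python
from urllib.parse import urldefrag


def split_url_tag(url):
    """Return (url_without_fragment, tag) for a download URL."""
    base, fragment = urldefrag(url)
    return base, fragment


def split_cli_input(value):
    """Return (path, tag) for a "path/filename:tag" command line value."""
    path, separator, tag = value.rpartition(":")
    if not separator:
        # No colon at all: the whole value is the path and there is no tag.
        return value, ""
    return path, tag
```

For example, `split_url_tag("https://example.com/src.tar.gz#from")` yields the URL without its fragment together with the tag `from`.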
- get_inputs()
Locate the from and to input files.
- extract_inputs_to_codebase_directory()
Extract input files to the project’s codebase/ directory.
- collect_and_create_codebase_resources()
Collect and create codebase resources.
- fingerprint_codebase_directories()
Compute directory fingerprints for matching.
- flag_whitespace_files()
Flag whitespace files with a size less than or equal to 100 bytes as ignored.
- load_ecosystem_config()
Load ecosystem specific configurations for d2d steps for selected options.
- map_ruby()
Optional step: Ruby
Load Ruby specific configurations for d2d steps.
- map_about_files()
Map from/ .ABOUT files to their related to/ resources.
- map_checksum()
Map using SHA1 checksum.
- match_archives_to_purldb()
Match selected package archives by extension to PurlDB.
- find_java_packages()
Optional step: Java
Find the Java package of the .java source files.
- map_java_to_class()
Optional step: Java
Map a .class compiled file to its .java source.
- map_jar_to_java_source()
Optional step: Java
Map .jar files to their related source directory.
- find_scala_packages()
Optional step: Scala
Find the Java package of the .scala source files.
- map_scala_to_class()
Optional step: Scala
Map a .class compiled file to its .scala source.
- map_jar_to_scala_source()
Optional step: Scala
Map .jar files to their related source directory.
- find_kotlin_packages()
Optional step: Kotlin
Find the Java package of the Kotlin source files.
- map_kotlin_to_class()
Optional step: Kotlin
Map a .class compiled file to its Kotlin source.
- map_jar_to_kotlin_source()
Optional step: Kotlin
Map .jar files to their related source directory.
- find_grammar_packages()
Optional step: Antlr
Find the Java package of the .g/.g4 source files.
- map_grammar_to_class()
Optional step: Antlr
Map a .class compiled file to its .g/.g4 source.
- map_jar_to_grammar_source()
Optional step: Antlr
Map .jar files to their related source directory.
- find_groovy_packages()
Optional step: Groovy
Find the package of the .groovy source files.
- map_groovy_to_class()
Optional step: Groovy
Map a .class compiled file to its .groovy source.
- map_jar_to_groovy_source()
Optional step: Groovy
Map .jar files to their related source directory.
- find_aspectj_packages()
Optional step: AspectJ
Find the package of the .aj source files.
- map_aspectj_to_class()
Optional step: AspectJ
Map a .class compiled file to its .aj source.
- map_jar_to_aspectj_source()
Optional step: AspectJ
Map .jar files to their related source directory.
- find_clojure_packages()
Optional step: Clojure
Find the package of the .clj source files.
- map_clojure_to_class()
Optional step: Clojure
Map a .class compiled file to its .clj source.
- map_jar_to_clojure_source()
Optional step: Clojure
Map .jar files to their related source directory.
- find_xtend_packages()
Optional step: Xtend
Find the Java package of the Xtend source files.
- map_xtend_to_class()
Optional step: Xtend
Map a .class compiled file to its Xtend source.
- map_javascript()
Optional step: JavaScript
Map packed or minified JavaScript, TypeScript, CSS, and SCSS files to their sources.
- map_javascript_symbols()
Optional step: JavaScript
Map deployed JavaScript and TypeScript files to their sources using symbols.
- map_javascript_strings()
Optional step: JavaScript
Map deployed JavaScript and TypeScript files to their sources using string literals.
- get_symbols_from_binaries()
Extract symbols from ELF, Mach-O, and Windows binaries for mapping.
- map_elf()
Optional step: Elf
Map ELF binaries to their sources using DWARF paths and symbols.
- map_macho()
Optional step: MacOS
Map Mach-O binaries to their sources using symbols.
- map_winpe()
Optional step: Windows
Map WinPE binaries to their sources using symbols.
- map_go()
Optional step: Go
Map Go binaries to their sources using paths and symbols.
- map_rust()
Optional step: Rust
Map Rust binaries to their sources using symbols.
- map_python()
Optional step: Python
Map binaries from Python packages to their sources using DWARF paths and symbols.
- match_directories_to_purldb()
Match selected directories in PurlDB.
- match_resources_to_purldb()
Match selected files by extension in PurlDB.
- map_javascript_post_purldb_match()
Optional step: JavaScript
Map minified JavaScript files based on existing PurlDB matches.
- map_javascript_path()
Optional step: JavaScript
Map JavaScript files based on path.
- map_javascript_colocation()
Optional step: JavaScript
Map JavaScript files based on neighborhood file mapping.
- map_thirdparty_npm_packages()
Optional step: JavaScript
Map third-party packages using package.json metadata.
- map_path()
Map using path similarities.
- flag_mapped_resources_archives_and_ignored_directories()
Flag all codebase resources that were mapped during the pipeline.
- perform_house_keeping_tasks()
On deployed side:
   - Ignore specific files based on ecosystem based configurations.
   - PurlDB match files with no-java-source and empty status; if no match is found, update status to requires-review.
   - Update status for uninteresting files.
   - Flag the dangling legal files for review.
On devel side:
   - Update status for not deployed files.
- match_purldb_resources_post_process()
Choose the best package for PurlDB matched resources.
- remove_packages_without_resources()
Remove packages without any resources.
- scan_ignored_to_files()
Scan status="ignored-from-config" to/ files for copyrights, licenses, emails, and urls. These files are ignored based on ecosystem specific configurations. These files are not used for the D2D purpose, but scanning them may provide useful information about the deployed codebase.
- scan_unmapped_to_files()
Scan unmapped/matched to/ files for copyrights, licenses, emails, and urls and update the status to requires-review.
- scan_mapped_from_for_files()
Scan mapped from/ files for copyrights, licenses, emails, and urls.
- collect_and_create_license_detections()
Collect and create unique license detections from resources and package data.
- create_local_files_packages()
Create local-files packages for codebase resources not part of a package.
- flag_deployed_from_resources_with_missing_license()
Update the status of deployed from/ files with a missing license.
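Among the steps listed for this pipeline, map_checksum is the simplest to picture: a deployed file maps to a development file when both have the same SHA1. The sketch below illustrates that idea on in-memory content. The function names and the dict-based inputs are assumptions for the example. The real step works on the project's codebase resources rather than raw bytes.

```python
import hashlib
from collections import defaultdict


def sha1_of(content):
    """Return the hexadecimal SHA1 digest of some bytes."""
    return hashlib.sha1(content).hexdigest()


def map_by_checksum(from_files, to_files):
    """Map each to/ path to the from/ paths that share its SHA1."""
    from_by_sha1 = defaultdict(list)
    for path, content in from_files.items():
        from_by_sha1[sha1_of(content)].append(path)

    mapping = {}
    for path, content in to_files.items():
        matches = from_by_sha1.get(sha1_of(content))
        if matches:
            mapping[path] = sorted(matches)
    return mapping
```

Deployed files without an identical counterpart stay unmapped here, which is where the other mapping steps (path, symbols, PurlDB matching) would take over.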
Match to MatchCode (addon)
Warning
This pipeline requires access to a MatchCode.io service. Refer to MATCHCODE.IO to configure access to MatchCode.io in your ScanCode.io instance.
- class scanpipe.pipelines.match_to_matchcode.MatchToMatchCode
Match the codebase resources of a project against MatchCode.io to identify packages.
This process involves:
Generating a JSON scan of the project codebase
Transmitting it to MatchCode.io and awaiting match results
Creating discovered packages from the package data obtained
Associating the codebase resources with those discovered packages
Currently, MatchCode.io can only match archives, directories, and files from Maven and npm packages.
This pipeline requires a MatchCode.io instance to be configured and available. There is currently no public instance of MatchCode.io. Reach out to nexB, Inc. for other arrangements.
- check_matchcode_service_availability()
Check if the MatchCode.io service is configured and available.
- send_project_json_to_matchcode()
Create a JSON scan of the project Codebase and send it to MatchCode.io.
- poll_matching_results()
Wait until the match results are ready by polling the match run status.
- create_packages_from_match_results()
Create DiscoveredPackages from match results.
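The poll_matching_results step waits for a remote run to finish by checking its status repeatedly. A generic polling loop along those lines might look like the sketch below. The status values, the function name, and the injected `get_status` callable are all assumptions for illustration and do not describe the MatchCode.io API.

```python
import time

# Hypothetical terminal statuses; the real service may use other names.
TERMINAL_STATUSES = {"success", "failure"}


def poll_until_ready(get_status, interval=1.0, max_attempts=10, sleep=time.sleep):
    """Call get_status() until it returns a terminal status or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status, attempt
        # Wait before asking again so the service is not flooded.
        sleep(interval)
    raise TimeoutError(f"No terminal status after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the loop testable without real delays, for example with `sleep=lambda seconds: None`.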
Populate PurlDB (addon)
Warning
This pipeline requires access to a PurlDB service. Refer to PURLDB to configure access to PurlDB in your ScanCode.io instance.
Publish To FederatedCode (addon)
Warning
This pipeline requires access to a FederatedCode service. Refer to FEDERATEDCODE to configure access to FederatedCode in your ScanCode.io instance.
- class scanpipe.pipelines.publish_to_federatedcode.PublishToFederatedCode
Publish package scan to FederatedCode.
This pipeline commits the project scan result to the FederatedCode Git repository. It uses the Project PURL to determine the Git repository and the exact directory path where the scan should be stored.
- check_federatedcode_eligibility()
Check if the project fulfills the criteria for pushing the project result to FederatedCode.
- get_package_repository()
Get the Git repository URL and scan path for a given package.
- clone_repository()
Clone repository to local_path.
- add_scan_result()
Add package scan result to the local Git repository.
- commit_and_push_changes()
Commit and push changes to remote repository.
- delete_working_dir()
Remove temporary working dir.
Scan Codebase
- class scanpipe.pipelines.scan_codebase.ScanCodebase
Scan a codebase for application packages, licenses, and copyrights.
This pipeline does not further scan the files contained in a package for licenses and copyrights and only considers the declared license of a package. It does not scan for system (Linux distro) packages.
- copy_inputs_to_codebase_directory()
Copy input files to the project’s codebase/ directory. The code can also be copied there prior to running the Pipeline.
- collect_and_create_codebase_resources()
Collect and create codebase resources.
- scan_for_application_packages()
Scan unknown resources for package information.
- scan_for_files()
Scan unknown resources for copyrights, licenses, emails, and urls.
- collect_and_create_license_detections()
Collect and create unique license detections from resources and package data.
Scan For Virus
Scan Single Package
- class scanpipe.pipelines.scan_single_package.ScanSinglePackage
Scan a single package archive (or package manifest file).
This pipeline scans a single package for package metadata, declared dependencies, licenses, license clarity score and copyrights.
The output is a summary of the scan results in JSON format.
- get_package_input()
Locate the package input in the project’s input/ directory.
- collect_input_information()
Collect and store information about the project input.
- extract_input_to_codebase_directory()
Copy or extract input to project codebase/ directory.
- run_scan()
Scan extracted codebase/ content.
- load_inventory_from_toolkit_scan()
Process JSON scan results to populate codebase resources and packages.
- make_summary_from_scan_results()
Build a summary in JSON format from the generated scan results.
Fetch Scores (addon)
Warning
This pipeline is preconfigured to access the “OpenSSF Scorecard API” available at https://api.securityscorecards.dev/
- class scanpipe.pipelines.fetch_scores.FetchScores
Fetch ScoreCode information for packages.
This pipeline retrieves ScoreCode data for each package in the project and stores it in the corresponding package instances.
ScoreCode data refers to metadata retrieved from the OpenSSF Scorecard tool, which evaluates open source packages based on security and quality checks. This data includes an overall score, individual check results (such as use of branch protection, fuzzing, dependency updates, etc.), the version of the scoring tool used, and the date of evaluation.
- check_scorecode_service_availability()
Check if the ScoreCode service is configured and available.
- fetch_packages_scorecode_info()
Fetch ScoreCode information for each of the project’s discovered packages.
- evaluate_compliance_alerts()
Evaluate scorecard compliance alerts for the project.
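The ScoreCode data described for this pipeline (overall score, individual check results, tool version, evaluation date) could be held in a small structure like the one sketched below. The `ScoreSummary` class and `summarize_scorecard` function are hypothetical, and the payload keys (`score`, `date`, `scorecard.version`, `checks`) are an assumption about the shape of a Scorecard-style JSON result, not a description of how ScanCode.io stores it.

```python
from dataclasses import dataclass, field


@dataclass
class ScoreSummary:
    """Hypothetical container for the ScoreCode fields listed above."""
    overall_score: float
    tool_version: str
    evaluated_on: str
    checks: dict = field(default_factory=dict)


def summarize_scorecard(payload):
    """Build a ScoreSummary from a Scorecard-style payload (assumed shape)."""
    return ScoreSummary(
        overall_score=float(payload.get("score", 0.0)),
        tool_version=payload.get("scorecard", {}).get("version", ""),
        evaluated_on=payload.get("date", ""),
        # Map each check name to its individual score.
        checks={c["name"]: c.get("score") for c in payload.get("checks", [])},
    )
```

Missing keys fall back to empty defaults in this sketch, so a partial payload still yields a usable summary.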