Built-in Pipelines
Pipelines in ScanCode.io are Python scripts that facilitate code analysis by executing a sequence of steps. The platform provides the following built-in pipelines:
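The idea of a pipeline as an ordered sequence of steps can be sketched in plain Python. The MiniPipeline and ExampleScan classes below are hypothetical illustrations, not the scanpipe base class or its API. They only assume that a pipeline declares its steps as an ordered tuple of methods and runs them one after another.

```python
from typing import Callable, List, Tuple


class MiniPipeline:
    """Hypothetical stand-in for a pipeline: an ordered tuple of step methods."""

    @classmethod
    def steps(cls) -> Tuple[Callable, ...]:
        # Subclasses return their step methods in execution order.
        return ()

    def __init__(self) -> None:
        self.log: List[str] = []

    def execute(self) -> List[str]:
        # Run each step in the declared order, recording its name.
        for step in self.steps():
            self.log.append(step.__name__)
            step(self)
        return self.log


class ExampleScan(MiniPipeline):
    @classmethod
    def steps(cls):
        return (cls.collect_resources, cls.scan_resources)

    def collect_resources(self):
        self.resources = ["a.py", "b.py"]

    def scan_resources(self):
        # A later step can rely on state produced by an earlier one.
        self.scanned = [path.upper() for path in self.resources]


pipeline = ExampleScan()
print(pipeline.execute())  # ['collect_resources', 'scan_resources']
```

Keeping steps as separate methods makes the execution order explicit and lets each step be logged or inspected on its own.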
Tip
If you are unsure which pipeline suits your requirements best, check out the “Which pipeline should I use?” section for guidance.
Pipeline Base Class
Deploy To Develop
- class scanpipe.pipelines.deploy_to_develop.DeployToDevelop
Establish relationships between two code trees: deployment and development.
This pipeline expects two archive files with “from-” and “to-” filename prefixes as inputs:
- “from-[FILENAME]” archive containing the development source code
- “to-[FILENAME]” archive containing the deployment compiled code
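The get_inputs() step has to tell the two kinds of input apart by their filename prefix. A minimal sketch of that sorting follows, assuming inputs are plain filename strings. split_from_to_inputs is a hypothetical helper, and rejecting a project that lacks either side is an illustrative assumption rather than documented pipeline behavior.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def split_from_to_inputs(filenames: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Sort input filenames into 'from-' (development) and 'to-' (deployment) groups."""
    from_files, to_files = [], []
    for name in filenames:
        # Only the file name matters, not any leading directory.
        base = Path(name).name
        if base.startswith("from-"):
            from_files.append(name)
        elif base.startswith("to-"):
            to_files.append(name)
    # Assumption for this sketch: both sides are required.
    if not from_files or not to_files:
        raise ValueError("Expected at least one 'from-' and one 'to-' input file.")
    return from_files, to_files


inputs = ["from-app-src.zip", "to-app-dist.jar", "notes.txt"]
print(split_from_to_inputs(inputs))
# (['from-app-src.zip'], ['to-app-dist.jar'])
```

Files with neither prefix are simply ignored here.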
- get_inputs()
Locate the “from” and “to” input files.
- extract_inputs_to_codebase_directory()
Extract input files to the project’s codebase/ directory.
- extract_archives_in_place()
Recursively extract from* and to* archives in place with extractcode.
- collect_and_create_codebase_resources()
Collect and create codebase resources.
- fingerprint_codebase_directories()
Compute directory fingerprints for matching.
- map_about_files()
Map from/ .ABOUT files to their related to/ resources.
- map_checksum()
Map using SHA1 checksum.
- match_archives_to_purldb()
Match selected package archives by extension to PurlDB.
- find_java_packages()
Find the Java package of the .java source files.
- map_java_to_class()
Map a .class compiled file to its .java source.
- map_jar_to_source()
Map .jar files to their related source directory.
- map_javascript()
Map packed or minified JavaScript, TypeScript, CSS, and SCSS files to their source.
- match_directories_to_purldb()
Match selected directories in PurlDB.
- match_resources_to_purldb()
Match selected files by extension in PurlDB.
- map_javascript_post_purldb_match()
Map minified JavaScript files based on existing PurlDB matches.
- map_javascript_path()
Map JavaScript files based on path.
- map_javascript_colocation()
Map JavaScript files based on neighborhood file mapping.
- map_thirdparty_npm_packages()
Map third-party packages using package.json metadata.
- map_path()
Map using path similarities.
- flag_mapped_resources_archives_and_ignored_directories()
Flag all codebase resources that were mapped during the pipeline.
- perform_house_keeping_tasks()
- On deployed side
  - PurlDB match files with no-java-source and empty status; if no match is found, update the status to requires-review.
  - Update status for uninteresting files.
- On devel side
  - Update status for not deployed files.
- scan_unmapped_to_files()
Scan unmapped/matched to/ files for copyrights, licenses, emails, and URLs, and update the status to requires-review.
- scan_mapped_from_for_files()
Scan mapped from/ files for copyrights, licenses, emails, and URLs.
- create_local_files_packages()
Create local-files packages for codebase resources not part of a package.
- flag_deployed_from_resources_with_missing_license()
Update the status of deployed from/ files that have a missing license.
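The map_checksum() step pairs development and deployment files whose content is identical. The sketch below shows the idea with in-memory dictionaries mapping paths to file contents. map_by_sha1 and the dictionary inputs are hypothetical; the pipeline itself works on codebase resources stored in the project.

```python
import hashlib
from typing import Dict, List, Tuple


def sha1_of(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def map_by_sha1(from_files: Dict[str, bytes], to_files: Dict[str, bytes]) -> List[Tuple[str, str]]:
    """Pair each to/ file with every from/ file that has an identical SHA1."""
    # Index the development side by checksum; identical content may appear more than once.
    index: Dict[str, List[str]] = {}
    for path, content in from_files.items():
        index.setdefault(sha1_of(content), []).append(path)
    relations = []
    for to_path, content in to_files.items():
        for from_path in index.get(sha1_of(content), []):
            relations.append((from_path, to_path))
    return relations


from_side = {"from/src/util.js": b"let x = 1;\n", "from/src/app.js": b"main();\n"}
to_side = {"to/static/util.js": b"let x = 1;\n", "to/static/bundle.js": b"packed"}
print(map_by_sha1(from_side, to_side))
# [('from/src/util.js', 'to/static/util.js')]
```

Checksum matching only catches byte-identical files, which is why the pipeline also has path-based, Java, and JavaScript mapping steps.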
Docker Image Analysis
- class scanpipe.pipelines.docker.Docker
Analyze Docker images.
- extract_images()
Extract images from input tarballs.
- extract_layers()
Extract layers from input images.
- find_images_os_and_distro()
Find the operating system and distro of input images.
- collect_images_information()
Collect and store image information in a project.
- collect_and_create_codebase_resources()
Collect and label all image files as CodebaseResources.
- collect_and_create_system_packages()
Collect installed system packages for each layer based on the distro.
- flag_uninteresting_codebase_resources()
Flag files that don’t belong to any system package.
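Flagging files that no system package claims comes down to a set difference between all image files and the files owned by installed packages. In this sketch, flag_unowned_files and the hard-coded package-to-files mapping are hypothetical; the pipeline collects installed system packages for each layer based on the distro.

```python
from typing import Dict, Iterable, List, Set


def flag_unowned_files(all_files: Iterable[str], package_files: Dict[str, Set[str]]) -> List[str]:
    """Return files that no installed system package claims, in sorted order."""
    owned: Set[str] = set()
    for files in package_files.values():
        owned.update(files)
    return sorted(path for path in all_files if path not in owned)


image_files = ["/bin/sh", "/etc/os-release", "/app/server.py"]
installed = {"busybox": {"/bin/sh"}, "alpine-base": {"/etc/os-release"}}
print(flag_unowned_files(image_files, installed))  # ['/app/server.py']
```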
Docker Windows Image Analysis
- class scanpipe.pipelines.docker_windows.DockerWindows
Analyze Windows Docker images.
- flag_known_software_packages()
Flag files from known software packages by checking common install paths.
- flag_uninteresting_codebase_resources()
Flag files that are known/labelled as uninteresting.
- flag_program_files_dirs_as_packages()
Report the immediate subdirectories of Program Files and Program Files (x86) as packages.
- flag_data_files_with_no_clues()
Flag data files that have no clues on their origin as uninteresting.
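The flag_program_files_dirs_as_packages() step treats each immediate subdirectory of Program Files and Program Files (x86) as a package. A minimal sketch of how such subdirectories could be picked out of file paths is shown below, assuming Windows-style paths and case-sensitive folder names. program_files_packages is a hypothetical helper.

```python
from pathlib import PureWindowsPath
from typing import Iterable, List, Tuple

PROGRAM_FILES_DIRS = ("Program Files", "Program Files (x86)")


def program_files_packages(paths: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (Program Files folder, immediate subdirectory) pairs, sorted and unique."""
    found = set()
    for raw in paths:
        parts = PureWindowsPath(raw).parts
        for i, part in enumerate(parts):
            # parts[i + 1] is only known to be a directory if something sits inside it.
            if part in PROGRAM_FILES_DIRS and i + 2 < len(parts):
                found.add((part, parts[i + 1]))
                break
    return sorted(found)


files = [
    "C:/Program Files/Git/bin/git.exe",
    "C:/Program Files/Git/cmd/git.exe",
    "C:/Program Files (x86)/Notepad++/notepad++.exe",
    "C:/Program Files/desktop.ini",
]
print(program_files_packages(files))
# [('Program Files', 'Git'), ('Program Files (x86)', 'Notepad++')]
```

A file sitting directly in Program Files, such as desktop.ini above, is not reported because it is not a subdirectory.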
Find Vulnerabilities
- class scanpipe.pipelines.find_vulnerabilities.FindVulnerabilities
Find vulnerabilities for packages and dependencies in the VulnerableCode database.
Vulnerability data is stored on each package and dependency instance.
- check_vulnerablecode_service_availability()
Check if the VulnerableCode service is configured and available.
- lookup_packages_vulnerabilities()
Check for vulnerabilities in each of the project’s discovered packages.
- lookup_dependencies_vulnerabilities()
Check for vulnerabilities in each of the project’s discovered dependencies.
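The FindVulnerabilities steps amount to: verify the service, then look up each package and dependency and store the result on it. This sketch keeps the VulnerableCode client abstract by taking is_available and lookup as callables. The Package class, its vulnerabilities attribute, and the placeholder data are hypothetical and do not reflect the VulnerableCode API or the ScanCode.io models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Package:
    """Hypothetical stand-in for a discovered package or dependency."""
    purl: str
    vulnerabilities: List[Dict] = field(default_factory=list)


def attach_vulnerabilities(
    items: List[Package],
    is_available: Callable[[], bool],
    lookup: Callable[[str], List[Dict]],
) -> int:
    """Store lookup results on each item and return how many are vulnerable."""
    # Fail fast, like the availability check that precedes the lookup steps.
    if not is_available():
        raise RuntimeError("VulnerableCode is not configured or not available.")
    vulnerable = 0
    for item in items:
        item.vulnerabilities = lookup(item.purl)
        if item.vulnerabilities:
            vulnerable += 1
    return vulnerable


# Placeholder data standing in for VulnerableCode responses.
fake_db = {"pkg:pypi/django@3.0": [{"id": "EXAMPLE-1"}]}
packages = [Package("pkg:pypi/django@3.0"), Package("pkg:pypi/attrs@23.1.0")]
print(attach_vulnerabilities(packages, lambda: True, lambda purl: fake_db.get(purl, [])))  # 1
```

The same function applies unchanged to dependencies, since both kinds of item are looked up by PURL in this sketch.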
Inspect Manifest
- class scanpipe.pipelines.inspect_manifest.InspectManifest
Inspect one or more manifest files and resolve their associated packages.
Supports:
- BOM: SPDX document, CycloneDX BOM, AboutCode ABOUT file
- Python: requirements.txt, setup.py, setup.cfg, Pipfile.lock
- JavaScript: yarn.lock lockfile, npm package-lock.json lockfile
- Java: Java JAR MANIFEST.MF, Gradle build script
- Ruby: RubyGems gemspec manifest, RubyGems Bundler Gemfile.lock
- Rust: Rust Cargo.lock dependencies lockfile, Rust Cargo.toml package manifest
- PHP: PHP composer lockfile, PHP composer manifest
- NuGet: nuspec package manifest
- Dart: pubspec manifest, pubspec lockfile
- OS: FreeBSD compact package manifest, Debian installed packages database
The full list is available at https://scancode-toolkit.readthedocs.io/en/doc-update-licenses/reference/available_package_parsers.html
- get_manifest_inputs()
Locate all the manifest files from the project’s input/ directory.
- get_packages_from_manifest()
Get packages data from manifest files.
- create_resolved_packages()
Create the resolved packages and their dependencies in the database.
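InspectManifest relies on ScanCode-toolkit's package parsers for the formats listed above. To illustrate what getting package data from a manifest means, here is a deliberately simplified, hypothetical parser for pinned name==version lines in a requirements.txt file. It is not the toolkit's parser and does not handle extras, environment markers, hashes, or line continuations.

```python
from typing import List, Tuple


def parse_pinned_requirements(text: str) -> List[Tuple[str, str]]:
    """Extract (name, version) pairs from 'name==version' lines of a requirements file."""
    pins = []
    for line in text.splitlines():
        # Drop inline comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-") or "==" not in line:
            continue  # skip blanks, pip options such as -r/-e, and unpinned entries
        name, version = line.split("==", 1)
        pins.append((name.strip(), version.strip()))
    return pins


manifest = """
# web stack
django==4.2.7
requests>=2.0   # unpinned, skipped here
-r dev-requirements.txt
attrs == 23.1.0
"""
print(parse_pinned_requirements(manifest))
# [('django', '4.2.7'), ('attrs', '23.1.0')]
```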
Load Inventory From Scan
- class scanpipe.pipelines.load_inventory.LoadInventory
Load JSON/XLSX inventory files generated with ScanCode-toolkit or ScanCode.io.
Supported formats are ScanCode-toolkit JSON scan results, ScanCode.io JSON output, and ScanCode.io XLSX output.
An inventory is composed of packages, dependencies, resources, and relations.
- get_inputs()
Locate all the supported input files from the project’s input/ directory.
- build_inventory_from_scans()
Process JSON scan results files to populate packages, dependencies, and resources.
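A ScanCode-toolkit JSON scan result bundles packages, dependencies, and file resources in one document. The sketch below counts each kind, assuming top-level packages, dependencies, and files lists and a type field on each file entry. summarize_inventory is hypothetical, and it ignores relations as well as the ScanCode.io JSON and XLSX formats that the pipeline also supports.

```python
import json
from typing import Dict


def summarize_inventory(scan_json: str) -> Dict[str, int]:
    """Count packages, dependencies, and file/directory resources in a scan result."""
    data = json.loads(scan_json)
    # Assumed layout: top-level "packages", "dependencies", and "files" lists.
    files = data.get("files", [])
    return {
        "packages": len(data.get("packages", [])),
        "dependencies": len(data.get("dependencies", [])),
        "files": sum(1 for f in files if f.get("type") == "file"),
        "directories": sum(1 for f in files if f.get("type") == "directory"),
    }


sample = """{
  "packages": [{"purl": "pkg:npm/left-pad@1.3.0"}],
  "dependencies": [],
  "files": [
    {"path": "project", "type": "directory"},
    {"path": "project/package.json", "type": "file"}
  ]
}"""
print(summarize_inventory(sample))
# {'packages': 1, 'dependencies': 0, 'files': 1, 'directories': 1}
```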
Populate PurlDB
- class scanpipe.pipelines.populate_purldb.PopulatePurlDB
Populate PurlDB with discovered project packages and their dependencies.
- populate_purldb_with_discovered_packages()
Add DiscoveredPackage to PurlDB.
- populate_purldb_with_discovered_dependencies()
Add DiscoveredDependency to PurlDB.
- populate_purldb_with_detected_purls()
Add the detected PURLs to PurlDB.
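All three PopulatePurlDB steps push package URLs to PurlDB, so the same PURL can come from several places. The sketch below merges PURL strings from several sources into one de-duplicated, order-preserving list. build_purl_payload and the {"purl": ...} entry shape are hypothetical and are not the PurlDB API.

```python
from typing import Dict, Iterable, List


def build_purl_payload(*purl_sources: Iterable[str]) -> List[Dict[str, str]]:
    """Merge PURLs from several sources into a de-duplicated, order-preserving payload."""
    seen = set()
    payload = []
    for source in purl_sources:
        for purl in source:
            if purl and purl not in seen:  # skip empty values and repeats
                seen.add(purl)
                payload.append({"purl": purl})
    return payload


packages = ["pkg:pypi/django@4.2.7", "pkg:npm/lodash@4.17.21"]
dependencies = ["pkg:npm/lodash@4.17.21", "pkg:pypi/sqlparse@0.4.4"]
print(build_purl_payload(packages, dependencies))
# [{'purl': 'pkg:pypi/django@4.2.7'}, {'purl': 'pkg:npm/lodash@4.17.21'}, {'purl': 'pkg:pypi/sqlparse@0.4.4'}]
```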
Root Filesystem Analysis
- class scanpipe.pipelines.root_filesystems.RootFS
Analyze a Linux root filesystem, also known as rootfs.
- extract_input_files_to_codebase_directory()
Extract root filesystem input archives with extractcode.
- find_root_filesystems()
Find root filesystems in the project’s codebase/.
- collect_rootfs_information()
Collect and store rootfs information on the project.
- collect_and_create_codebase_resources()
Collect and label all image files as CodebaseResources.
- collect_and_create_system_packages()
Collect installed system packages for each rootfs based on the distro. The collection of system packages is only available for known distros.
- flag_uninteresting_codebase_resources()
Flag files that are not worth tracking because they don’t belong to any system package.
- scan_for_application_packages()
Scan unknown resources for package information.
- match_not_analyzed_to_system_packages()
Match files with “not-yet-analyzed” status to files already belonging to system packages.
- match_not_analyzed_to_application_packages()
Match files with “not-yet-analyzed” status to files already belonging to application packages.
- scan_for_files()
Scan unknown resources for copyrights, licenses, emails, and URLs.
- analyze_scanned_files()
Analyze single file scan results for completeness.
- flag_not_analyzed_codebase_resources()
Check for any leftover files for sanity; there should be none.
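The match_not_analyzed_to_* steps give files with the “not-yet-analyzed” status the same package as files already assigned to one. This sketch matches on SHA1 checksum, which is an assumption made for illustration; the pipeline's actual matching criteria may differ. Resource, match_not_analyzed, and the “system-package” status value are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    path: str
    sha1: str
    status: str = ""
    package: Optional[str] = None


def match_not_analyzed(resources: List[Resource], reference_status: str) -> int:
    """Give not-yet-analyzed files the package of a reference file with the same SHA1."""
    # Index files that already belong to a package of the requested kind.
    owners = {r.sha1: r.package for r in resources if r.status == reference_status and r.package}
    matched = 0
    for resource in resources:
        if resource.status == "not-yet-analyzed" and resource.sha1 in owners:
            resource.package = owners[resource.sha1]
            resource.status = reference_status
            matched += 1
    return matched


# Placeholder checksums keep the example short; real SHA1 values are 40 hex digits.
files = [
    Resource("/usr/lib/libz.so.1", "aaa", "system-package", "zlib"),
    Resource("/opt/app/libz.so.1", "aaa", "not-yet-analyzed"),
    Resource("/opt/app/run.sh", "bbb", "not-yet-analyzed"),
]
print(match_not_analyzed(files, "system-package"))  # 1
print(files[1].package)  # zlib
```

Files left unmatched keep their “not-yet-analyzed” status, which is what a final sanity check like flag_not_analyzed_codebase_resources() would look for.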
Scan Codebase
- class scanpipe.pipelines.scan_codebase.ScanCodebase
Scan a codebase with ScanCode-toolkit.
If the codebase consists of several packages and dependencies, it will try to resolve and scan those too.
Input files are copied to the project’s codebase/ directory and are extracted in place before running the scan. Alternatively, the code can be manually copied to the project codebase/ directory.
- copy_inputs_to_codebase_directory()
Copy input files to the project’s codebase/ directory. The code can also be copied there before running the pipeline.
- extract_archives()
Extract archives with extractcode.
- collect_and_create_codebase_resources()
Collect and create codebase resources.
- scan_for_application_packages()
Scan unknown resources for package information.
- scan_for_files()
Scan unknown resources for copyrights, licenses, emails, and URLs.
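ScanCodebase copies inputs into codebase/ and extracts archives in place with extractcode before scanning. The sketch below reproduces that preparation with the standard library instead: shutil.unpack_archive is swapped in for extractcode, so it recognizes fewer formats and does not extract nested archives. copy_and_extract and the "<name>-extract" directory naming are illustrative choices.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path
from typing import List


def copy_and_extract(input_dir: Path, codebase_dir: Path) -> List[str]:
    """Copy input files into codebase_dir, then unpack recognized archives in place."""
    codebase_dir.mkdir(parents=True, exist_ok=True)
    for item in input_dir.iterdir():
        if item.is_file():
            shutil.copy2(item, codebase_dir / item.name)
    # Snapshot the listing first so newly created extraction directories are not revisited.
    for copied in list(codebase_dir.iterdir()):
        if not copied.is_file():
            continue
        try:
            # Unpack next to the archive, into a "<archive name>-extract" directory.
            shutil.unpack_archive(str(copied), str(codebase_dir / f"{copied.name}-extract"))
        except shutil.ReadError:
            pass  # not an archive format shutil recognizes; keep the file as-is
    # Report every file now present, relative to the codebase directory.
    return sorted(
        p.relative_to(codebase_dir).as_posix() for p in codebase_dir.rglob("*") if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    input_dir = Path(tmp, "input")
    input_dir.mkdir()
    with zipfile.ZipFile(input_dir / "src.zip", "w") as archive:
        archive.writestr("pkg/main.py", "print('hi')\n")
    (input_dir / "README.txt").write_text("readme\n")
    print(copy_and_extract(input_dir, Path(tmp, "codebase")))
    # ['README.txt', 'src.zip', 'src.zip-extract/pkg/main.py']
```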
Scan Codebase Package
- class scanpipe.pipelines.scan_codebase_packages.ScanCodebasePackages
Scan a codebase for PURLs without assembling full packages/dependencies.
This pipeline is intended for gathering PURL information from a codebase without the overhead of full package assembly.
- scan_for_application_packages()
Scan unknown resources for package information.
Scan Package
- class scanpipe.pipelines.scan_package.ScanPackage
Scan a single package archive with ScanCode-toolkit.
The output is a summary of the scan results in JSON format.
- get_package_archive_input()
Locate the input package archive in the project’s input/ directory.
- collect_archive_information()
Collect and store information about the input archive in the project.
- extract_archive_to_codebase_directory()
Extract package archive with extractcode.
- run_scancode()
Scan extracted codebase/ content.
- load_inventory_from_toolkit_scan()
Process JSON scan results to populate codebase resources and packages.
- make_summary_from_scan_results()
Build a summary in JSON format from the generated scan results.
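The collect_archive_information() step records information about the input archive. Which fields it stores is not listed here; the sketch below assumes file name, size, and MD5/SHA1 checksums as examples of such information, computed in chunks with hashlib. archive_information is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Union


def archive_information(path: Union[str, Path]) -> Dict[str, Union[str, int]]:
    """Return the file name, size, and MD5/SHA1 checksums of an archive."""
    path = Path(path)
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with path.open("rb") as stream:
        # Read in 64 KiB chunks so large archives are never fully loaded into memory.
        for chunk in iter(lambda: stream.read(65536), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {
        "filename": path.name,
        "size": path.stat().st_size,
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
    }


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp, "demo-1.0.tar.gz")
    sample.write_bytes(b"abc")  # placeholder bytes, not a real archive
    print(archive_information(sample))
```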