A project encapsulates the analysis of software code:
It has a project workspace, which is a directory that contains the software code files under analysis.
It makes use of one or more code analysis pipeline scripts to automate the code analysis process.
It tracks Codebase Resources, i.e. its code files and directories.
It tracks Discovered Packages, i.e. the system and application packages discovered in the codebase, along with their origin and license.
In the database, a project is identified by its unique name.
Multiple analysis pipelines can be run on a single project.
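To make the two rules above concrete (a unique project name, and several pipelines run on one project), here is a minimal in-memory sketch. The Project and ProjectRegistry classes and the pipeline_runs field are hypothetical illustrations, not the ScanCode.io database model; in ScanCode.io the uniqueness is enforced by the database.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical stand-in for a project record (not the ScanCode.io model)."""

    name: str
    # Names of the pipelines run on this project, in execution order.
    pipeline_runs: list[str] = field(default_factory=list)


class ProjectRegistry:
    """Keeps projects keyed by name, mimicking a unique-name constraint."""

    def __init__(self) -> None:
        self._projects: dict[str, Project] = {}

    def create(self, name: str) -> Project:
        # Reject a second project with the same name, as a unique index would.
        if name in self._projects:
            raise ValueError(f"A project named {name!r} already exists.")
        project = Project(name=name)
        self._projects[name] = project
        return project

    def get(self, name: str) -> Project:
        return self._projects[name]
```

For example, registry.create("my-project") returns a new project, calling it again with the same name raises ValueError, and appending to pipeline_runs records several pipeline runs on that single project.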
A project workspace is the root directory where a project’s files are stored.
The following directories exist under the workspace directory:
input/ contains all uploaded files used as the input of a project, such as a codebase archive.
codebase/ contains files and directories - i.e. resources - tracked as CodebaseResource records in the database.
output/ contains any output files created by the pipelines, including reports, scan results, etc.
tmp/ is a scratch pad for temporary files generated during pipeline runs.
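The workspace layout described above can be reproduced with the standard library. The sketch below is a hypothetical helper (create_workspace is not a ScanCode.io function) that creates the four subdirectories under a given root and returns their paths.

```python
from pathlib import Path

# The four workspace subdirectories described above.
WORKSPACE_SUBDIRS = ("input", "codebase", "output", "tmp")


def create_workspace(root: Path) -> dict[str, Path]:
    """Create a project workspace layout under root (hypothetical helper)."""
    paths = {}
    for name in WORKSPACE_SUBDIRS:
        path = root / name
        # parents=True also creates root; exist_ok=True makes reruns harmless.
        path.mkdir(parents=True, exist_ok=True)
        paths[name] = path
    return paths
```

Calling it twice on the same root is safe, since existing directories are left untouched.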
A pipeline is a Python script that contains a series of steps, which are executed sequentially to perform a code analysis.
It usually starts with the uploaded input files, which might need to be extracted first. Then, it generates CodebaseResource records in the database.
Those resources can then be analyzed, scanned, and matched as needed. Analysis results and reports are eventually posted at the end of a pipeline run.
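The first two stages of that flow (extract the inputs, then enumerate the resulting files and directories) can be sketched with the standard library. This is an illustration under stated assumptions, not ScanCode.io's extraction code: extract_inputs and collect_resource_paths are hypothetical names, archives are unpacked with shutil.unpack_archive into a directory named after the archive plus an assumed "-extract" suffix, and files that are not recognized archives are copied as-is.

```python
import shutil
from pathlib import Path


def extract_inputs(input_dir: Path, codebase_dir: Path) -> None:
    """Extract every archive found in input/ into codebase/ (hypothetical)."""
    for archive in sorted(input_dir.iterdir()):
        if not archive.is_file():
            continue
        # Each archive gets its own target directory to avoid collisions.
        target = codebase_dir / (archive.name + "-extract")
        try:
            shutil.unpack_archive(str(archive), str(target))
        except shutil.ReadError:
            # Not a recognized archive format: copy the file as-is instead.
            shutil.copy2(archive, codebase_dir / archive.name)


def collect_resource_paths(codebase_dir: Path) -> list[str]:
    """Return the path of every file and directory, relative to codebase/."""
    return sorted(
        path.relative_to(codebase_dir).as_posix()
        for path in codebase_dir.rglob("*")
    )
```

Each returned path is the kind of value a CodebaseResource record would be created for in the database.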
All Built-in Pipelines are located in the
Each pipeline consists of a Python script and includes one subclass of the
Each step is a method of the
The execution order of the steps is declared through the steps class attribute.
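The relationship between step methods and the steps class attribute can be illustrated with a self-contained sketch. The Pipeline base class below is a hypothetical minimal stand-in, not the scanpipe Pipeline API: it simply calls each function listed in steps, in order, and records which ones ran. The project is modeled as a plain dict for illustration only.

```python
class Pipeline:
    """Hypothetical minimal base class; not the scanpipe Pipeline API."""

    # Subclasses list their step methods here, in execution order.
    steps: tuple = ()

    def __init__(self, project):
        self.project = project
        self.executed = []  # names of the steps that ran, for inspection

    def execute(self):
        for step in self.steps:
            # Steps are stored as plain functions, so pass self explicitly.
            step(self)
            self.executed.append(step.__name__)


class ExampleScan(Pipeline):
    def extract_inputs(self):
        self.project["extracted"] = True

    def collect_resources(self):
        # This step depends on the previous one having run first.
        if not self.project.get("extracted"):
            raise RuntimeError("inputs must be extracted first")
        self.project["resources"] = ["a.py", "b.py"]

    # Declared after the methods so the names are defined at this point.
    steps = (extract_inputs, collect_resources)
```

Running ExampleScan({}).execute() calls extract_inputs and then collect_resources; reordering the steps tuple would make the second step fail, which is why the declared order matters.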
Refer to Custom Pipelines for details about adding custom pipelines to ScanCode.io.
You can assign one or more pipelines to a project as a sequence.
As mentioned above, a pipeline is made of steps. Those steps rely on pipes, reusable operations that are chained together and executed in an orderly manner. Pipes are simply the building blocks of a given pipeline.
For example, the following steps are included in the RootFS pipeline, and each of them leverages pipes to accomplish a predefined task:
from scanpipe.pipelines import Pipeline
from scanpipe.pipes import flag
from scanpipe.pipes import rootfs
from scanpipe.pipes import scancode


class RootFS(Pipeline):
    [...]

    def flag_empty_files(self):
        """
        Flags empty files.
        """
        flag.flag_empty_files(self.project)

    def scan_for_application_packages(self):
        """
        Scans unknown resources for packages information.
        """
        scancode.scan_for_application_packages(self.project)
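The split between a step and a pipe can be illustrated without ScanCode.io installed: the step is a thin method, while the pipe is a reusable function that does the work on the project's data. The sketch below is hypothetical; the Resource class, the flag_empty_files logic, and the "ignored-empty-file" status value are assumptions chosen for illustration and do not reproduce the actual scanpipe.pipes.flag implementation.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical, in-memory stand-in for a CodebaseResource record."""

    path: str
    type: str  # "file", "directory" or "symlink"
    size: int = 0
    status: str = ""


def flag_empty_files(resources):
    """Pipe-like helper: mark zero-byte files that have no status yet."""
    flagged = 0
    for resource in resources:
        if resource.type == "file" and resource.size == 0 and not resource.status:
            # Hypothetical status value, chosen for illustration.
            resource.status = "ignored-empty-file"
            flagged += 1
    return flagged
```

Because the pipe only takes data as input, the same function could be reused by any pipeline whose steps need it.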
All built-in pipes are located in the
Pipes are grouped by type in modules, e.g.
Refer to our Pipes section for information about available pipes and their usage.
Codebase Resources are records of a project's code files and directories.
CodebaseResource is a database model and each record is identified by its path under the project workspace.
The following are some of the
A status, which is used to track the analysis status for this resource.
A type, such as a file, a directory, or a symlink.
Various attributes to track detected copyrights, license expressions, copyright holders, and related packages.
Please note that ScanCode-toolkit uses the same attributes and attribute names for files.
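A few of the attributes listed above can be mirrored in a plain dataclass to show how a record relates to a path on disk. This is a sketch under assumptions: CodebaseResourceRecord and resource_from_path are hypothetical, and the field names license_expressions, copyrights, and holders are illustrative rather than a guaranteed match for the ScanCode.io model. The type is derived from the filesystem, checking for symlinks first because is_dir() and is_file() follow links.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class CodebaseResourceRecord:
    """Hypothetical record mirroring a few CodebaseResource attributes."""

    path: str  # relative to the codebase/ directory
    type: str  # "file", "directory" or "symlink"
    status: str = ""  # analysis status, empty until a pipeline sets it
    license_expressions: list[str] = field(default_factory=list)
    copyrights: list[str] = field(default_factory=list)
    holders: list[str] = field(default_factory=list)


def resource_from_path(codebase_dir: Path, location: Path) -> CodebaseResourceRecord:
    """Build a record for location, identified by its path under codebase_dir."""
    # Check symlinks first: is_dir() and is_file() follow symbolic links.
    if location.is_symlink():
        kind = "symlink"
    elif location.is_dir():
        kind = "directory"
    else:
        kind = "file"
    relative = location.relative_to(codebase_dir).as_posix()
    return CodebaseResourceRecord(path=relative, type=kind)
```

Detection results would later be stored in the list fields by scanning steps.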
Discovered Packages are records of the system and application packages discovered in the code under analysis.
DiscoveredPackage is a database model and each record is identified by its
Package URL is a fundamental effort to create informative identifiers for software packages, such as Debian, RPM, npm, Maven, or PyPI packages.
See https://github.com/package-url for more details.
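A Package URL has the general shape pkg:type/namespace/name@version, optionally followed by qualifiers and a subpath. The function below is a simplified, hypothetical builder for the core part only: it lowercases the type and percent-encodes each component, but it does not support qualifiers or subpaths and does not apply type-specific normalization rules. The packageurl-python library provides a complete implementation of the specification.

```python
from urllib.parse import quote


def build_purl(pkg_type: str, name: str, version: str = "", namespace: str = "") -> str:
    """Build a simplified Package URL: pkg:type/namespace/name@version.

    Qualifiers and subpath are not supported, and type-specific rules
    (such as name normalization for PyPI) are not applied.
    """
    parts = [pkg_type.lower()]
    if namespace:
        # Each namespace segment is percent-encoded on its own.
        parts.extend(quote(segment, safe="") for segment in namespace.split("/"))
    parts.append(quote(name, safe=""))
    purl = "pkg:" + "/".join(parts)
    if version:
        purl += "@" + quote(version, safe="")
    return purl
```

For instance, build_purl("deb", "curl", "7.50.3-1", "debian") yields pkg:deb/debian/curl@7.50.3-1, and an npm scope such as @angular is encoded as %40angular.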
The following are some of the
A type, name, version (all Package URL attributes)
A homepage_url, download_url, and other URLs
Checksums, such as SHA1, MD5
Copyright, license_expression, and declared_license
Please note that ScanCode-toolkit uses the same attributes and attribute names for packages.
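The package attributes listed above can likewise be mirrored in a dataclass. The sketch below is hypothetical (DiscoveredPackageRecord and set_checksums are illustrative names, not the ScanCode.io model); it shows how the SHA1 and MD5 checksums of a package archive could be computed from its bytes with hashlib.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class DiscoveredPackageRecord:
    """Hypothetical record mirroring a few DiscoveredPackage attributes."""

    # Package URL components.
    type: str
    name: str
    version: str = ""
    # URLs.
    homepage_url: str = ""
    download_url: str = ""
    # Checksums of the package archive, as hex digests.
    sha1: str = ""
    md5: str = ""
    # Licensing information.
    copyright: str = ""
    license_expression: str = ""
    declared_license: str = ""

    def set_checksums(self, content: bytes) -> None:
        """Compute the SHA1 and MD5 checksums of the package archive bytes."""
        self.sha1 = hashlib.sha1(content).hexdigest()
        self.md5 = hashlib.md5(content).hexdigest()
```

In practice the content would be read from the downloaded archive, for example with Path(archive).read_bytes().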