Data Models

This section describes the structure of the ScanCode.io data model and details every field included in the output files.

Project

class scanpipe.models.Project

The Project encapsulates all analysis processing. Multiple analysis pipelines can be run on the same project.

Parameters:
  • uuid (UUIDField) – Primary key: UUID

  • extra_data (JSONField) – Extra data. Optional mapping of extra data key/values.

  • created_date (DateTimeField) – Created date. Creation date for this project.

  • name (CharField) – Name. Name for this project.

  • slug (SlugField) – Slug

  • work_directory (CharField) – Work directory. Project work directory location.

  • is_archived (BooleanField) – Is archived. Archived projects cannot be modified anymore and are not displayed by default in project lists. Multiple levels of data cleanup may have happened during the archive operation.

  • notes (TextField) – Notes

  • settings (JSONField) – Settings

Relationship fields:

Parameters:
  • labels (TaggableManager to Tag) – Tags. A comma-separated list of tags. (related name: project)

  • tagged_items (GenericRelation to UUIDTaggedItem) – Tagged items (related name: +)

Reverse relationships:

Parameters:
  • projectmessages (Reverse ForeignKey from ProjectMessage) – All projectmessages of this project (related name of project)

  • inputsources (Reverse ForeignKey from InputSource) – All inputsources of this project (related name of project)

  • runs (Reverse ForeignKey from Run) – All runs of this project (related name of project)

  • codebaseresources (Reverse ForeignKey from CodebaseResource) – All codebaseresources of this project (related name of project)

  • codebaserelations (Reverse ForeignKey from CodebaseRelation) – All codebaserelations of this project (related name of project)

  • discoveredpackages (Reverse ForeignKey from DiscoveredPackage) – All discoveredpackages of this project (related name of project)

  • discovereddependencies (Reverse ForeignKey from DiscoveredDependency) – All discovereddependencies of this project (related name of project)

  • webhooksubscriptions (Reverse ForeignKey from WebhookSubscription) – All webhooksubscriptions of this project (related name of project)

  • webhookdeliveries (Reverse ForeignKey from WebhookDelivery) – All webhookdeliveries of this project (related name of project)

add_downloads(downloads)

Move the given downloads to the current project’s input/ directory and add the input_source for each entry.

add_error(description='', model='', details=None, exception=None, object_instance=None)

Create an ERROR ProjectMessage record for this project.

add_info(description='', model='', details=None, exception=None, object_instance=None)

Create an INFO ProjectMessage record for this project.

add_input_source(download_url='', filename='', is_uploaded=False, tag='')

Create an InputFile entry for the current project, given a download_url or a filename.

add_message(severity, description='', model='', details=None, exception=None, object_instance=None)

Create a ProjectMessage record for this Project.

The model attribute can be provided as a string or as a Model class. A resource can be provided to keep track of the codebase resource that was analyzed when the error occurred.

add_pipeline(pipeline_name, execute_now=False, selected_groups=None)

Create a new Run instance with the provided pipeline on the current project.

If execute_now is True, the pipeline task is created. on_commit() is used to postpone the task creation until after the transaction is successfully committed. If there are no active transactions, the callback is executed immediately.

add_upload(uploaded_file, tag='')

Write the given upload to the current project’s input/ directory and add the input_source.

add_uploads(uploads)

Write the given uploads to the current project’s input/ directory and add the input_source for each entry.

add_warning(description='', model='', details=None, exception=None, object_instance=None)

Create a WARNING ProjectMessage record for this project.

add_webhook_subscription(**kwargs)

Create a new WebhookSubscription instance with the provided target_url for the current project.

archive(remove_input=False, remove_codebase=False, remove_output=False)

Set the project is_archived field to True.

The remove_input, remove_codebase, and remove_output options can be provided during the archive operation to delete the related work directories.

The project cannot be archived if one of its related runs is queued or already running.

clear_tmp_directory()

Delete the whole content of the tmp/ directory. This is called at the end of each pipeline Run; the tmp/ directory does not store any content that might be needed for further processing in a following pipeline Run.

clone(clone_name, copy_inputs=False, copy_pipelines=False, copy_settings=False, copy_subscriptions=False, execute_now=False)

Clone this project using the provided clone_name as new project name.

copy_input_from(input_location)

Copy the file at input_location to the current project’s input/ directory.

delete(*args, **kwargs)

Delete the work_directory along with the project-related data in the database.

Delete all related object instances using the private _raw_delete model API. This bypasses the objects collection, cascade deletions, and signals. It results in a much faster deletion of objects, but it needs to be applied in the correct model order as the cascading event will not be triggered. Note that this approach is used in Django’s fast_deletes, but the scanpipe models cannot be fast-deleted as they have cascades and relations.

get_codebase_config_directory()

Return the .scancode config directory if available in the codebase directory.

get_enabled_settings()

Return the enabled settings with non-empty values.

get_env(field_name=None)

Return the project environment loaded from the scancode-config.yml config file, when available, and overridden by the settings model field.

field_name can be provided to get a single entry from the env.
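The override order described above can be pictured with plain dictionaries. This is a minimal sketch, not the ScanCode.io implementation: merge_env, config_env, and settings are hypothetical names, the keys are illustrative only, and the assumption is that only settings with non-empty values (see get_enabled_settings()) take part in the override.

```python
def merge_env(config_env, settings, field_name=None):
    """Combine config-file values with settings; settings take precedence."""
    # Assumption: only settings with non-empty values are considered enabled.
    enabled = {key: value for key, value in settings.items() if value}
    env = {**config_env, **enabled}
    if field_name:
        # Return a single entry instead of the whole mapping.
        return env.get(field_name)
    return env


# Illustrative values only; these are not taken from a real project.
config_env = {"ignored_patterns": ["*.tmp"], "product_name": "from-file"}
settings = {"product_name": "from-settings", "product_version": ""}

env = merge_env(config_env, settings)
single = merge_env(config_env, settings, field_name="product_name")
```

In this sketch the empty product_version setting is dropped, and product_name from the settings replaces the value loaded from the config file.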

get_env_from_config_file()

Return env dict loaded from the scancode-config.yml config file.

get_ignored_dependency_scopes_index()

Return a dictionary index of the ignored_dependency_scopes setting values defined in this Project env.

get_ignored_vulnerabilities_set()

Return a set of ignored_vulnerabilities setting values defined in this Project env.

get_input_config_file()

Return the scancode-config.yml file from the input/ directory or from the codebase/ immediate subdirectories.

Priority order:

  1. If a config file exists directly in the input/ directory, return it.

  2. If exactly one config file exists in a codebase/ immediate subdirectory, return it.

  3. If multiple config files are found in subdirectories, report an error.
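The priority order above can be sketched with pathlib alone. The find_config_file function below is a hypothetical stand-in and not the method itself; in particular, raising ValueError for the ambiguous case is an assumption, since the method is only described as reporting an error.

```python
import tempfile
from pathlib import Path

CONFIG_NAME = "scancode-config.yml"


def find_config_file(input_path, codebase_path):
    """Return the config file Path, or None; raise if the choice is ambiguous."""
    # 1. A config file directly in input/ wins.
    direct = input_path / CONFIG_NAME
    if direct.exists():
        return direct

    # 2. Otherwise look exactly one level down in codebase/ subdirectories.
    candidates = sorted(codebase_path.glob(f"*/{CONFIG_NAME}"))
    if len(candidates) == 1:
        return candidates[0]

    # 3. Several candidates: this sketch raises (assumed error handling).
    if len(candidates) > 1:
        raise ValueError(f"Multiple {CONFIG_NAME} files found: {candidates}")
    return None


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    input_dir = work / "input"
    codebase_dir = work / "codebase"
    input_dir.mkdir()
    (codebase_dir / "app").mkdir(parents=True)
    (codebase_dir / "app" / CONFIG_NAME).write_text("product_name: demo\n")

    found = find_config_file(input_dir, codebase_dir)
    found_relative = found.relative_to(work).as_posix()
```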

get_inputs_with_source()

Return an input list including the filename, download_url, and size data.

get_latest_output(filename)

Return the latest output file with the “filename” prefix, for example “scancode-<timestamp>.json”.

get_next_run()

Return the next non-executed Run instance assigned to current project.

get_output_file_path(name, extension)

Return a crafted file path in the project output/ directory using given name and extension. The current date and time strings are added to the filename.

This method ensures the proper setup of the work_directory in case of a manual wipe and re-creates the missing pieces of the directory structure.
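As a rough illustration of the behavior described above, the sketch below builds a timestamped file path and re-creates the output directory when it is missing. The build_output_file_path helper and the timestamp layout are assumptions made for this example; the actual format used by ScanCode.io may differ.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def build_output_file_path(output_dir, name, extension, now=None):
    """Return output_dir/<name>-<timestamp>.<extension>, creating output_dir if needed."""
    now = now or datetime.now()
    # Assumed timestamp layout; the real format may differ.
    timestamp = now.strftime("%Y-%m-%d-%H-%M-%S")
    # Re-create the directory if it was wiped manually.
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir / f"{name}-{timestamp}.{extension}"


with tempfile.TemporaryDirectory() as tmp:
    output_dir = Path(tmp) / "output"
    path = build_output_file_path(
        output_dir, "scancode", "json", now=datetime(2024, 1, 31, 9, 5, 0)
    )
    file_name = path.name
    directory_was_created = output_dir.is_dir()
```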

get_output_files_info()

Return files from the output work directory including the name and size.

get_resource(path)

Return the codebase resource present for a given path, or None if the resource with that path does not exist. This path is relative to the scan location. This is the same as the Codebase.get_resource() function.

static get_root_content(directory)

Return a list of all files and directories of a given directory. Only the first level children will be listed.

get_settings_as_yml()

Return the settings file content as yml, suitable for a config file.

inputs(pattern='**/*', extensions=None)

Return the paths of all files and directories in the input/ directory matching a given pattern. The default **/* pattern means “this directory and all subdirectories, recursively”. Use the * pattern to only list the root content. The returned paths can be limited to the provided list of extensions.
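The difference between the two patterns can be tried out with pathlib. The list_inputs function below is a hypothetical stand-in that assumes the pattern is handed to Path.glob() and that extensions are matched as filename suffixes; it is a sketch, not the method's implementation.

```python
import tempfile
from pathlib import Path


def list_inputs(input_path, pattern="**/*", extensions=None):
    """Return sorted relative POSIX paths under input_path matching the filters."""
    paths = input_path.glob(pattern)
    if extensions:
        # str.endswith accepts a tuple of suffixes.
        paths = (path for path in paths if path.name.endswith(tuple(extensions)))
    return sorted(path.relative_to(input_path).as_posix() for path in paths)


with tempfile.TemporaryDirectory() as tmp:
    input_dir = Path(tmp)
    (input_dir / "nested").mkdir()
    (input_dir / "a.zip").write_text("")
    (input_dir / "nested" / "b.tar.gz").write_text("")

    everything = list_inputs(input_dir)
    root_only = list_inputs(input_dir, pattern="*")
    archives = list_inputs(input_dir, extensions=[".tar.gz"])
```

Here everything contains the nested file as well as the root entries, while root_only stops at the first level.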

move_input_from(input_location)

Move the file at input_location to the current project’s input/ directory.

reset(keep_input=True)

Reset the project by deleting all related database objects and all work directories, except the input directory when the keep_input option is True.

save(*args, **kwargs)

Save this project instance. The workspace directories are set up during project creation.

setup_work_directory()

Create the whole work_directory structure, skipping the directories that already exist.

start_pipelines()

Start the next “not started” pipeline execution.

walk_codebase_path()

Return the paths of the files and directories in the codebase/ directory, recursively.

write_input_file(file_object)

Write the provided file_object to the project’s input/ directory.

WORK_DIRECTORIES = ['input', 'output', 'codebase', 'tmp']

can_change_inputs

Return True until one pipeline run has started its execution on the project. Always return False when the project is archived.

can_start_pipelines

Return True if at least one “not started” pipeline is assigned to this project and no pipeline run is currently “queued” or “running”. Always return False when the project is archived.
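The rule can be written down as a small predicate. This is only a sketch of the logic as described: the function takes a plain list of status strings, and the "not_started", "queued", and "running" values are assumed labels for this example, not the actual status constants.

```python
def can_start_pipelines(run_statuses, is_archived=False):
    """Return True when a pipeline is waiting and none is queued or running."""
    if is_archived:
        # Archived projects never start pipelines.
        return False
    has_not_started = "not_started" in run_statuses
    is_busy = any(status in ("queued", "running") for status in run_statuses)
    return has_not_started and not is_busy


ready = can_start_pipelines(["success", "not_started"])
busy = can_start_pipelines(["running", "not_started"])
nothing_to_do = can_start_pipelines(["success"])
archived = can_start_pipelines(["not_started"], is_archived=True)
```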

property codebase_path

Return the codebase directory as a Path instance.

codebaserelations

Type: Reverse ForeignKey from CodebaseRelation

All codebaserelations of this project (related name of project)

codebaseresources

Type: Reverse ForeignKey from CodebaseResource

All codebaseresources of this project (related name of project)

created_date

Type: DateTimeField

Created date. Creation date for this project.

dependency_count

Return the number of dependencies related to this project.

discovereddependencies

Type: Reverse ForeignKey from DiscoveredDependency

All discovereddependencies of this project (related name of project)

discoveredpackages

Type: Reverse ForeignKey from DiscoveredPackage

All discoveredpackages of this project (related name of project)

extra_data

Type: JSONField

Extra data. Optional mapping of extra data key/values.

file_count

Return the number of file resources related to this project.

file_in_package_count

Return the number of file resources in a package related to this project.

file_not_in_package_count

Return the number of file resources not in a package related to this project.

has_single_resource

Return True if we only have a single CodebaseResource associated to this project, False otherwise.

ignored_dependency_scopes_index

Return the computed value of get_ignored_dependency_scopes_index. The value is only generated once and cached for further calls.

ignored_vulnerabilities_set

Return the computed value of get_ignored_vulnerabilities_set. The value is only generated once and cached for further calls.
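The compute-once behavior of these two attributes can be illustrated with functools.cached_property from the standard library. ProjectSketch is a hypothetical class written for this example, and whether the model relies on this exact decorator is an assumption; the identifier used is a placeholder.

```python
from functools import cached_property


class ProjectSketch:
    """Hypothetical stand-in for a model exposing a cached, computed value."""

    def __init__(self, ignored_vulnerabilities):
        self.ignored_vulnerabilities = ignored_vulnerabilities
        self.compute_calls = 0

    def get_ignored_vulnerabilities_set(self):
        # Count the calls to show that the computation happens only once.
        self.compute_calls += 1
        return set(self.ignored_vulnerabilities)

    @cached_property
    def ignored_vulnerabilities_set(self):
        # Computed on first access, then stored on the instance.
        return self.get_ignored_vulnerabilities_set()


project = ProjectSketch(["EXAMPLE-1", "EXAMPLE-1"])
first = project.ignored_vulnerabilities_set
second = project.ignored_vulnerabilities_set
```

A consequence of this kind of caching is that later changes to the underlying setting are not reflected on an instance that has already computed the value.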

property input_files

Return list of files’ relative paths in the input/ directory recursively.

property input_path

Return the input directory as a Path instance.

property input_root

Return a list of all files and directories of the input/ directory. Only the first level children will be listed.

property input_sources

inputsources

Type: Reverse ForeignKey from InputSource

All inputsources of this project (related name of project)

is_archived

Type: BooleanField

Is archived. Archived projects cannot be modified anymore and are not displayed by default in project lists. Multiple levels of data cleanup may have happened during the archive operation.

labels = <taggit.managers._TaggableManager object>

message_count

Return the number of messages related to this project.

name

Type: CharField

Name. Name for this project.

notes

Type: TextField

Notes

property output_path

Return the output directory as a Path instance.

property output_root

Return a list of all files and directories of the output/ directory. Only first level children will be listed.

package_count

Return the number of packages related to this project.

projectmessages

Type: Reverse ForeignKey from ProjectMessage

All projectmessages of this project (related name of project)

relation_count

Return the number of relations related to this project.

resource_count

Return the number of resources related to this project.

runs

Type: Reverse ForeignKey from Run

All runs of this project (related name of project)

settings

Type: JSONField

Settings

slug

Type: SlugField

Slug

tagged_items

Type: Reverse GenericRelation from Project

All + of this Label (related name of tagged_items)

property tmp_path

Return the tmp directory as a Path instance.

uuid

Type: UUIDField

Primary key: UUID

vulnerable_dependency_count

Return the number of vulnerable dependencies related to this project.

vulnerable_package_count

Return the number of vulnerable packages related to this project.

webhookdeliveries

Type: Reverse ForeignKey from WebhookDelivery

All webhookdeliveries of this project (related name of project)

webhooksubscriptions

Type: Reverse ForeignKey from WebhookSubscription

All webhooksubscriptions of this project (related name of project)

work_directory

Type: CharField

Work directory. Project work directory location.

property work_path

Return the work_directory as a Path instance.

CodebaseResource

class scanpipe.models.CodebaseResource

A project’s Codebase Resources are records of its code files and directories. Each record is identified by its path under the project workspace.

These model fields should be kept in line with commoncode.resource.Resource.

Parameters:
  • id (AutoField) – Primary key: ID

  • md5 (CharField) – MD5. MD5 checksum hex-encoded, as in md5sum.

  • sha1 (CharField) – SHA1. SHA1 checksum hex-encoded, as in sha1sum.

  • sha256 (CharField) – SHA256. SHA256 checksum hex-encoded, as in sha256sum.

  • sha512 (CharField) – SHA512. SHA512 checksum hex-encoded, as in sha512sum.

  • extra_data (JSONField) – Extra data. Optional mapping of extra data key/values.

  • detected_license_expression (TextField) – Detected license expression. The license expression summarizing the license info for this resource, combined from all the license detections

  • detected_license_expression_spdx (TextField) – Detected license expression spdx. The detected license expression for this file, with SPDX license keys

  • license_detections (JSONField) – License detections. List of license detection details.

  • license_clues (JSONField) – License clues. List of license matches that are not proper detections and potentially just clues to licenses or likely false positives. Those are not included in computing the detected license expression for the resource.

  • percentage_of_license_text (FloatField) – Percentage of license text. Percentage of file words detected as license text or notice.

  • copyrights (JSONField) – Copyrights. List of detected copyright statements (and related detection details).

  • holders (JSONField) – Holders. List of detected copyright holders (and related detection details).

  • authors (JSONField) – Authors. List of detected authors (and related detection details).

  • emails (JSONField) – Emails. List of detected emails (and related detection details).

  • urls (JSONField) – Urls. List of detected URLs (and related detection details).

  • compliance_alert (CharField) – Compliance alert. Indicates how the license expression complies with provided policies.

  • is_legal (BooleanField) – Is legal. True if this file is likely a legal, license-related file such as a COPYING or LICENSE file.

  • is_manifest (BooleanField) – Is manifest. True if this file is likely a package manifest file such as a Maven pom.xml or an npm package.json

  • is_readme (BooleanField) – Is readme. True if this file is likely a README file.

  • is_top_level (BooleanField) – Is top level. True if this file is a top-level file located either at the root of a package or in a well-known common location.

  • is_key_file (BooleanField) – Is key file. True if this file is a top-level file and either a legal, readme or manifest file.

  • path (CharField) – Path. The full path value of a resource (file or directory) in the archive it is from.

  • rootfs_path (CharField) – Rootfs path. Path relative to some root filesystem root directory. Useful when working on disk images, docker images, and VM images. E.g.: “/usr/bin/bash” for a path of “tarball-extract/rootfs/usr/bin/bash”.

  • status (CharField) – Status. Analysis status for this resource.

  • size (BigIntegerField) – Size. Size in bytes.

  • tag (CharField) – Tag

  • type (CharField) – Type. Type of this resource as one of: file, directory, symlink

  • name (CharField) – Name. File or directory name of this resource with its extension.

  • extension (CharField) – Extension. File extension for this resource (directories do not have an extension).

  • programming_language (CharField) – Programming language. Programming language of this resource if this is a code file.

  • mime_type (CharField) – Mime type. MIME type (aka. media type) for this resource. See https://en.wikipedia.org/wiki/Media_type

  • file_type (CharField) – File type. Descriptive file type for this resource.

  • is_binary (BooleanField) – Is binary

  • is_text (BooleanField) – Is text

  • is_archive (BooleanField) – Is archive

  • is_media (BooleanField) – Is media

  • package_data (JSONField) – Package data. List of Package data detected from this CodebaseResource

Relationship fields:

Parameters:

project (ForeignKey to Project) – Project (related name: codebaseresources)

Reverse relationships:

Parameters:

class Type(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

List of CodebaseResource types.

DIRECTORY = 'directory'
FILE = 'file'

add_package(discovered_package)

Assign the discovered_package to this codebase_resource instance.

as_spdx()

Return this CodebaseResource as an SPDX Package entry.

children(codebase=None)

Return a QuerySet of direct children CodebaseResource objects using a database query on the current CodebaseResource path.

Paths are returned in lower-cased sorted path order to reflect the behavior of the commoncode.resource.Resource.children() https://github.com/aboutcode-org/commoncode/blob/main/src/commoncode/resource.py

codebase is not used in this context but required for compatibility with the commoncode.resource.VirtualCodebase class API.
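The ordering rule can be illustrated on a plain list of paths. The direct_children function is a hypothetical helper that filters strings in memory, whereas the method itself is described as running a database query; only the selection of direct children and the lower-cased sort order are mirrored here.

```python
def direct_children(parent_path, all_paths):
    """Return paths exactly one level below parent_path, sorted case-insensitively."""
    prefix = parent_path.rstrip("/") + "/"
    children = [
        path
        for path in all_paths
        # A direct child starts with the prefix and has no further "/" after it.
        if path.startswith(prefix) and "/" not in path[len(prefix):]
    ]
    return sorted(children, key=str.lower)


paths = [
    "root",
    "root/Zeta.txt",
    "root/alpha.txt",
    "root/src",
    "root/src/main.c",
]
children = direct_children("root", paths)
```

With a plain sort, "root/Zeta.txt" would come first because uppercase letters sort before lowercase ones; the lower-cased key places it last.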

create_and_add_package(package_data)

Create a DiscoveredPackage instance using the package_data and assigns it to the current CodebaseResource instance.

Errors that may happen during the DiscoveredPackage creation are captured at this level, rather than at the DiscoveredPackage.create_from_data level, so resource data can be injected in the ProjectMessage record.

classmethod create_from_data(project, resource_data)

Create and return a CodebaseResource for a project from the resource_data. If one of the values of the required fields is not available, a “ProjectMessage” is created instead of a new CodebaseResource instance.

descendants()

Return a QuerySet of descendant CodebaseResource objects using a database query on the current CodebaseResource path. The current CodebaseResource is not included.

get_compliance_alert_display(*, field=<django.db.models.CharField: compliance_alert>)

Shows the label of the compliance_alert. See get_FOO_display() for more information.

get_path_segments_with_subpath()

Return a list of path segment names along with their subpaths for this resource.

Such as:

[
    ('root', 'root'),
    ('subpath', 'root/subpath'),
    ('file.txt', 'root/subpath/file.txt'),
]
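One way to produce the structure shown above from a path string is sketched here. The path_segments_with_subpath function is a hypothetical standalone version that assumes "/"-separated paths; it is not the method's actual code.

```python
def path_segments_with_subpath(path):
    """Return (segment, subpath) tuples for each segment of a "/"-separated path."""
    segments = [segment for segment in path.split("/") if segment]
    return [
        # Join every segment up to and including the current one.
        (segment, "/".join(segments[: index + 1]))
        for index, segment in enumerate(segments)
    ]


result = path_segments_with_subpath("root/subpath/file.txt")
```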
get_raw_url()

Return the URL to access the RAW content of the resource.

get_spdx_types()

get_type_display(*, field=<django.db.models.CharField: type>)

Shows the label of the type. See get_FOO_display() for more information.

has_parent()

Return True if this CodebaseResource has a parent CodebaseResource or False otherwise.

parent(codebase=None)

Return the parent CodebaseResource object for this CodebaseResource or None.

codebase is not used in this context but required for compatibility with the commoncode.resource.Codebase class API.

parent_path()

Return the parent path for this CodebaseResource or None.

siblings(codebase=None)

Return a sequence of sibling Resource objects for this Resource or an empty sequence.

codebase is not used in this context but required for compatibility with the commoncode.resource.Codebase class API.

walk(topdown=True)

Return all descendant Resources of the current Resource; does not include self.

Traverses the tree top-down, depth-first if topdown is True; otherwise traverses the tree bottom-up.
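The two traversal orders can be compared on a small in-memory tree. Node and this walk function are hypothetical stand-ins written for the example; the model method works on database-backed resources, and only the ordering is illustrated here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory resource used only for this illustration."""

    name: str
    children: list = field(default_factory=list)


def walk(node, topdown=True):
    """Yield all descendants of node depth-first, excluding node itself."""
    for child in node.children:
        if topdown:
            # Parent is yielded before its own descendants.
            yield child
        yield from walk(child, topdown=topdown)
        if not topdown:
            # Parent is yielded after its own descendants.
            yield child


tree = Node("root", [Node("src", [Node("main.c")]), Node("README")])
top_down = [node.name for node in walk(tree)]
bottom_up = [node.name for node in walk(tree, topdown=False)]
```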

authors

Type: JSONField

Authors. List of detected authors (and related detection details).

compliance_alert

Type: CharField

Compliance alert. Indicates how the license expression complies with provided policies.

Choices:

  • ok

  • warning

  • error

  • missing

copyrights

Type: JSONField

Copyrights. List of detected copyright statements (and related detection details).

declared_dependencies

Type: Reverse ForeignKey from DiscoveredDependency

All declared dependencies of this codebase resource (related name of datafile_resource)

detected_license_expression

Type: TextField

Detected license expression. The license expression summarizing the license info for this resource, combined from all the license detections

detected_license_expression_spdx

Type: TextField

Detected license expression spdx. The detected license expression for this file, with SPDX license keys

discovered_packages

Type: Reverse ManyToManyField from DiscoveredPackage

All discovered packages of this codebase resource (related name of codebase_resources)

emails

Type: JSONField

Emails. List of detected emails (and related detection details).

extension

Type: CharField

Extension. File extension for this resource (directories do not have an extension).

extra_data

Type: JSONField

Extra data. Optional mapping of extra data key/values.

property file_content

Return the content of the current Resource file using TextCode utilities for optimal compatibility.

file_type

Type: CharField

File type. Descriptive file type for this resource.

property for_packages

Return the list of all discovered packages associated to this resource.

holders

Type: JSONField

Holders. List of detected copyright holders (and related detection details).

id

Type: AutoField

Primary key: ID

is_archive

Type: BooleanField

Is archive

is_binary

Type: BooleanField

Is binary

property is_dir

Return True, if the resource is a directory.

property is_file

Return True, if the resource is a file.

is_key_file

Type: BooleanField

Is key file. True if this file is a top-level file and either a legal, readme or manifest file.

is_legal

Type: BooleanField

Is legal. True if this file is likely a legal, license-related file such as a COPYING or LICENSE file.

is_manifest

Type: BooleanField

Is manifest. True if this file is likely a package manifest file such as a Maven pom.xml or an npm package.json

is_media

Type: BooleanField

Is media

is_readme

Type: BooleanField

Is readme. True if this file is likely a README file.

property is_symlink

Return True, if the resource is a symlink.

is_text

Type: BooleanField

Is text

is_top_level

Type: BooleanField

Is top level. True if this file is a top-level file located either at the root of a package or in a well-known common location.

license_clues

Type: JSONField

License clues. List of license matches that are not proper detections and potentially just clues to licenses or likely false positives. Those are not included in computing the detected license expression for the resource.

license_detections

Type: JSONField

License detections. List of license detection details.

license_expression_field = 'detected_license_expression'

property location

Return the location of the resource as a string.

property location_path

Return the location of the resource as a Path instance.

md5

Type: CharField

MD5. MD5 checksum hex-encoded, as in md5sum.

mime_type

Type: CharField

Mime type. MIME type (aka. media type) for this resource. See https://en.wikipedia.org/wiki/Media_type

name

Type: CharField

Name. File or directory name of this resource with its extension.

property name_without_extension

Return the name of the resource without its extension.

package_data

Type: JSONField

Package data. List of Package data detected from this CodebaseResource

path

Type: CharField

Path. The full path value of a resource (file or directory) in the archive it is from.

percentage_of_license_text

Type: FloatField

Percentage of license text. Percentage of file words detected as license text or notice.

programming_language

Type: CharField

Programming language. Programming language of this resource if this is a code file.

project

Type: ForeignKey to Project

Project (related name: codebaseresources)

project_id

Internal field, use project instead.

related_from

Type: Reverse ForeignKey from CodebaseRelation

All related from of this codebase resource (related name of to_resource)

related_to

Type: Reverse ForeignKey from CodebaseRelation

All related to of this codebase resource (related name of from_resource)

rootfs_path

Type: CharField

Rootfs path. Path relative to some root filesystem root directory. Useful when working on disk images, docker images, and VM images. E.g.: “/usr/bin/bash” for a path of “tarball-extract/rootfs/usr/bin/bash”.

sha1

Type: CharField

SHA1. SHA1 checksum hex-encoded, as in sha1sum.

sha256

Type: CharField

SHA256. SHA256 checksum hex-encoded, as in sha256sum.

sha512

Type: CharField

SHA512. SHA512 checksum hex-encoded, as in sha512sum.
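For reference, hex-encoded values in the form used by the md5, sha1, sha256, and sha512 fields can be computed with the standard hashlib module. The compute_checksums helper is a hypothetical name for this sketch and says nothing about how ScanCode.io computes the values internally.

```python
import hashlib


def compute_checksums(content):
    """Return hex-encoded MD5, SHA1, SHA256 and SHA512 digests of the given bytes."""
    return {
        name: hashlib.new(name, content).hexdigest()
        for name in ("md5", "sha1", "sha256", "sha512")
    }


# Digests of empty content, comparable to running md5sum or sha1sum on an empty file.
checksums = compute_checksums(b"")
```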

size

Type: BigIntegerField

Size. Size in bytes.

property spdx_id

status

Type: CharField

Status. Analysis status for this resource.

tag

Type: CharField

Tag

type

Type: CharField

Type. Type of this resource as one of: file, directory, symlink

Choices:

  • file

  • directory

  • symlink

urls

Type: JSONField

Urls. List of detected URLs (and related detection details).

DiscoveredPackage

class scanpipe.models.DiscoveredPackage

A project’s Discovered Packages are records of the system and application packages discovered in the code under analysis. Each record is identified by its Package URL. Package URL is a fundamental effort to create informative identifiers for software packages, such as Debian, RPM, npm, Maven, or PyPI packages. See https://github.com/package-url for more details.
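To give an idea of how the type, namespace, name, version, qualifiers, and subpath fields fit together in a Package URL, here is a deliberately simplified sketch. The build_purl function is hypothetical and skips the percent-encoding and normalization rules of the Package URL specification; a dedicated library such as packageurl-python is the safer choice for real data.

```python
def build_purl(package_type, name, namespace="", version="", qualifiers="", subpath=""):
    """Assemble a Package URL string from its components (no percent-encoding)."""
    purl = f"pkg:{package_type}/"
    if namespace:
        purl += f"{namespace}/"
    purl += name
    if version:
        purl += f"@{version}"
    if qualifiers:
        # Qualifiers are passed here as an already formatted "key=value" string.
        purl += f"?{qualifiers}"
    if subpath:
        purl += f"#{subpath}"
    return purl


maven_purl = build_purl(
    "maven", "commons-io", namespace="commons-io", version="2.11.0"
)
deb_purl = build_purl(
    "deb", "curl", namespace="debian", version="7.50.3-1", qualifiers="arch=i386"
)
```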

Parameters:
  • id (AutoField) – Primary key: ID

  • type (CharField) – Type. A short code to identify the type of this package. For example: gem for a Rubygem, docker for a container, pypi for a Python Wheel or Egg, maven for a Maven Jar, deb for a Debian package, etc.

  • namespace (CharField) – Namespace. Package name prefix, such as Maven groupid, Docker image owner, GitHub user or organization, etc.

  • name (CharField) – Name. Name of the package.

  • version (CharField) – Version. Version of the package.

  • qualifiers (CharField) – Qualifiers. Extra qualifying data for a package such as the name of an OS, architecture, distro, etc.

  • subpath (CharField) – Subpath. Extra subpath within a package, relative to the package root.

  • md5 (CharField) – MD5. MD5 checksum hex-encoded, as in md5sum.

  • sha1 (CharField) – SHA1. SHA1 checksum hex-encoded, as in sha1sum.

  • sha256 (CharField) – SHA256. SHA256 checksum hex-encoded, as in sha256sum.

  • sha512 (CharField) – SHA512. SHA512 checksum hex-encoded, as in sha512sum.

  • extra_data (JSONField) – Extra data. Optional mapping of extra data key/values.

  • compliance_alert (CharField) – Compliance alert. Indicates how the license expression complies with provided policies.

  • affected_by_vulnerabilities (JSONField) – Affected by vulnerabilities

  • filename (CharField) – Filename. File name of a Resource, sometimes part of the URI proper and sometimes only available through an HTTP header.

  • primary_language (CharField) – Primary language. Primary programming language.

  • description (TextField) – Description. Description for this package. By convention the first line should be a summary when available.

  • release_date (DateField) – Release date. The date that the package file was created, or when it was posted to its original download source.

  • homepage_url (CharField) – Homepage URL. URL to the homepage for this package.

  • download_url (CharField) – Download URL. A direct download URL.

  • size (BigIntegerField) – Size. Size in bytes.

  • bug_tracking_url (CharField) – Bug tracking URL. URL to the issue or bug tracker for this package.

  • code_view_url (CharField) – Code view URL. A URL where the code can be browsed online.

  • vcs_url (CharField) – VCS URL. A URL to the VCS repository in the SPDX form of: “git”, “svn”, “hg”, “bzr”, “cvs”, https://github.com/nexb/scancode-toolkit.git@405aaa4b3 See SPDX specification “Package Download Location” at https://spdx.org/spdx-specification-21-web-version#h.49x2ik5

  • repository_homepage_url (CharField) – Repository homepage URL. URL to the page for this package in its package repository. This is typically different from the package homepage URL proper.

  • repository_download_url (CharField) – Repository download URL. Download URL to download the actual archive of code of this package in its package repository. This may be different from the actual download URL.

  • api_data_url (CharField) – API data URL. API URL to obtain structured data for this package, such as the URL to a JSON or XML API in its package repository.

  • copyright (TextField) – Copyright. Copyright statements for this package. Typically one per line.

  • holder (TextField) – Holder. Holders for this package. Typically one per line.

  • declared_license_expression (TextField) – Declared license expression. The license expression for this package typically derived from its extracted_license_statement or from some other type-specific routine or convention.

  • declared_license_expression_spdx (TextField) – Declared license expression spdx. The SPDX license expression for this package converted from its declared_license_expression.

  • license_detections (JSONField) – License detections. A list of LicenseDetection mappings typically derived from its extracted_license_statement or from some other type-specific routine or convention.

  • other_license_expression (TextField) – Other license expression. The license expression for this package that is different from the declared_license_expression (i.e. not the primary license).

  • other_license_expression_spdx (TextField) – Other license expression spdx. The other SPDX license expression for this package converted from its other_license_expression.

  • other_license_detections (JSONField) – Other license detections. A list of LicenseDetection mappings that are different from the declared_license_expression (i.e. not the primary license). These are the detections for the license expressions in other_license_expression.

  • extracted_license_statement (TextField) – Extracted license statement. The license statement mention, tag or text as found in a package manifest and extracted. This can be a string, a list or dict of strings possibly nested, as found originally in the manifest.

  • notice_text (TextField) – Notice text. A notice text for this package.

  • is_private (BooleanField) – Is private. True if this is a private package, either not meant to be published on a repository, and/or a local package without a name and version used primarily to track dependencies and other information.

  • is_virtual (BooleanField) – Is virtual. True if this package is created only from a manifest or lockfile, and not from its actual packaged code. The files of this package are not present in the codebase.

  • datasource_ids (JSONField) – Datasource ids. The identifiers for the datafile handlers used to obtain this package.

  • datafile_paths (JSONField) – Datafile paths. A list of Resource paths for package datafiles which were used to assemble this package.

  • file_references (JSONField) – File references. List of file paths and details for files referenced in a package manifest. These may not actually exist on the filesystem. The exact semantics and base of these paths is specific to a package type or datafile format.

  • parties (JSONField) – Parties. A list of parties such as a person, project or organization.

  • uuid (UUIDField) – UUID

  • missing_resources (JSONField) – Missing resources

  • modified_resources (JSONField) – Modified resources

  • package_uid (CharField) – Package uid. Unique identifier for this package.

  • keywords (JSONField) – Keywords

  • notes (TextField) – Notes

  • source_packages (JSONField) – Source packages

  • tag (CharField) – Tag

Relationship fields:

Parameters:

Reverse relationships:

Parameters:
add_resources(codebase_resources)

Assign the codebase_resources to this discovered_package instance.

as_cyclonedx()

Return this DiscoveredPackage as a CycloneDX Component entry.

as_spdx()

Return this DiscoveredPackage as an SPDX Package entry.

classmethod clean_data(data)

Return the data dict keeping only entries for fields available in the model.
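The filtering described for clean_data can be pictured with a minimal, self-contained sketch. The MODEL_FIELDS set below is a hypothetical subset chosen for illustration; the real method presumably derives the field names from the model itself.

```python
# Hypothetical subset of DiscoveredPackage field names, for illustration only.
MODEL_FIELDS = {"type", "namespace", "name", "version", "description"}


def clean_data(data, model_fields=MODEL_FIELDS):
    """Return a new dict keeping only the entries whose key is a known field."""
    return {key: value for key, value in data.items() if key in model_fields}


package_data = {"name": "requests", "version": "2.31.0", "unsupported_key": 1}
cleaned = clean_data(package_data)  # "unsupported_key" is dropped
```

Returning a new dict rather than mutating the input keeps the caller's original package_data intact.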

classmethod create_from_data(project, package_data)

Create and return a DiscoveredPackage for a given project based on package_data.

If the required name field is missing in package_data, a ProjectMessage is created instead of a DiscoveredPackage instance.

If the type field is missing in package_data, it defaults to “unknown” before creating the DiscoveredPackage.
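The two rules above can be sketched without the database layer. The helper name prepare_package_data and the error text are assumptions made for this illustration; in ScanCode.io the missing name case results in a ProjectMessage rather than a returned string.

```python
def prepare_package_data(package_data):
    """
    Apply the two documented rules to a package_data mapping.

    Return a (data, error) tuple where exactly one of the two is None.
    """
    data = dict(package_data)  # work on a copy, leave the input untouched

    # A missing or empty type defaults to "unknown".
    if not data.get("type"):
        data["type"] = "unknown"

    # A missing name is reported instead of producing a package.
    if not data.get("name"):
        return None, "No value provided for the required field: name"

    return data, None


data, error = prepare_package_data({"name": "foo"})
missing_name_data, missing_name_error = prepare_package_data({"type": "pypi"})
```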

classmethod extract_purl_data(package_data)
get_compliance_alert_display(*, field=<django.db.models.CharField: compliance_alert>)

Shows the label of the compliance_alert. See get_FOO_display() for more information.

get_declared_license_expression()

Return this package license expression.

Use declared_license_expression when available or compute the expression from declared_license_expression_spdx.

get_declared_license_expression_spdx()

Return this package license expression using SPDX keys.

Use declared_license_expression_spdx when available or compute the expression from declared_license_expression.
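The "use the stored value, else compute it from the other field" behaviour of both getters can be sketched as follows. The three-entry key mapping and the regex-based convert_keys helper are stand-ins invented for this example; the actual conversion relies on a full license index.

```python
import re

# Tiny, hypothetical key mapping used only for this sketch.
SCANCODE_TO_SPDX = {"mit": "MIT", "apache-2.0": "Apache-2.0", "gpl-2.0": "GPL-2.0-only"}
SPDX_TO_SCANCODE = {spdx: key for key, spdx in SCANCODE_TO_SPDX.items()}


def convert_keys(expression, mapping):
    """Replace each known license key, leaving operators and parentheses as-is."""
    return re.sub(
        r"[A-Za-z0-9.+-]+",
        lambda match: mapping.get(match.group(0), match.group(0)),
        expression,
    )


def get_declared_license_expression(declared, declared_spdx):
    """Prefer the stored expression, else compute it from the SPDX one."""
    if declared:
        return declared
    return convert_keys(declared_spdx, SPDX_TO_SCANCODE) if declared_spdx else ""


def get_declared_license_expression_spdx(declared, declared_spdx):
    """Prefer the stored SPDX expression, else compute it from the other one."""
    if declared_spdx:
        return declared_spdx
    return convert_keys(declared, SCANCODE_TO_SPDX) if declared else ""
```

Both functions take the two field values as plain arguments so the sketch stays independent of the model class.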

affected_by_vulnerabilities

Type: JSONField

Affected by vulnerabilities

api_data_url

Type: CharField

API data URL. API URL to obtain structured data for this package such as the URL to a JSON or XML API in its package repository.

bug_tracking_url

Type: CharField

Bug tracking URL. URL to the issue or bug tracker for this package.

children_packages

Type: ManyToManyField to DiscoveredPackage

Children packages (related name: parent_packages)

code_view_url

Type: CharField

Code view URL. A URL where the code can be browsed online.

codebase_resources

Type: ManyToManyField to CodebaseResource

Codebase resources (related name: discovered_packages)

compliance_alert

Type: CharField

Compliance alert. Indicates how the license expression complies with provided policies.

Choices:

  • ok

  • warning

  • error

  • missing

copyright

Type: TextField

Copyright. Copyright statements for this package. Typically one per line.

property cyclonedx_bom_ref

Use the package_uid when available to ensure having unique bom_ref in the SBOM when several instances of the same DiscoveredPackage (i.e. same purl) are present in the project.
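A minimal sketch of this preference follows. Falling back to the purl when package_uid is empty is an assumption of this example, and the package_uid values are made up.

```python
def cyclonedx_bom_ref(package_uid, purl):
    """Prefer the unique package_uid; fall back to the purl (assumed fallback)."""
    return package_uid or purl


# Two instances sharing the same purl remain distinguishable in the SBOM.
shared_purl = "pkg:pypi/django@4.2"
first = cyclonedx_bom_ref("pkg:pypi/django@4.2?uuid=aaa", shared_purl)
second = cyclonedx_bom_ref("pkg:pypi/django@4.2?uuid=bbb", shared_purl)
without_uid = cyclonedx_bom_ref("", shared_purl)
```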

datafile_paths

Type: JSONField

Datafile paths. A list of Resource paths for package datafiles which were used to assemble this package.

datasource_ids

Type: JSONField

Datasource ids. The identifiers for the datafile handlers used to obtain this package.

declared_dependencies

Type: Reverse ForeignKey from DiscoveredDependency

All declared dependencies of this discovered package (related name of for_package)

declared_license_expression

Type: TextField

Declared license expression. The license expression for this package typically derived from its extracted_license_statement or from some other type-specific routine or convention.

declared_license_expression_spdx

Type: TextField

Declared license expression spdx. The SPDX license expression for this package converted from its declared_license_expression.

description

Type: TextField

Description. Description for this package. By convention the first line should be a summary when available.

download_url

Type: CharField

Download URL. A direct download URL.

extra_data

Type: JSONField

Extra data. Optional mapping of extra data key/values.

extracted_license_statement

Type: TextField

Extracted license statement. The license statement mention, tag or text as found in a package manifest and extracted. This can be a string, a list or dict of strings possibly nested, as found originally in the manifest.

file_references

Type: JSONField

File references. List of file paths and details for files referenced in a package manifest. These may not actually exist on the filesystem. The exact semantics and base of these paths is specific to a package type or datafile format.

filename

Type: CharField

Filename. File name of a Resource, sometimes part of the URI proper and sometimes only available through an HTTP header.

holder

Type: TextField

Holder. Holders for this package. Typically one per line.

homepage_url

Type: CharField

Homepage URL. URL to the homepage for this package.

id

Type: AutoField

Primary key: ID

is_private

Type: BooleanField

Is private. True if this is a private package, either not meant to be published on a repository, and/or a local package without a name and version used primarily to track dependencies and other information.

is_virtual

Type: BooleanField

Is virtual. True if this package is created only from a manifest or lockfile, and not from its actual packaged code. The files of this package are not present in the codebase.

keywords

Type: JSONField

Keywords

license_detections

Type: JSONField

License detections. A list of LicenseDetection mappings typically derived from its extracted_license_statement or from some other type-specific routine or convention.

license_expression_field = 'declared_license_expression'
md5

Type: CharField

MD5. MD5 checksum hex-encoded, as in md5sum.

missing_resources

Type: JSONField

Missing resources

modified_resources

Type: JSONField

Modified resources

name

Type: CharField

Name. Name of the package.

namespace

Type: CharField

Namespace. Package name prefix, such as Maven groupid, Docker image owner, GitHub user or organization, etc.

notes

Type: TextField

Notes

notice_text

Type: TextField

Notice text. A notice text for this package.

other_license_detections

Type: JSONField

Other license detections. A list of LicenseDetection mappings that differ from those of the declared_license_expression (i.e. not the primary license). These are the detections for the license expressions in other_license_expression.

other_license_expression

Type: TextField

Other license expression. The license expression for this package that differs from the declared_license_expression (i.e. not the primary license).

other_license_expression_spdx

Type: TextField

Other license expression spdx. The other SPDX license expression for this package converted from its other_license_expression.

package_uid

Type: CharField

Package uid. Unique identifier for this package.

parent_packages

Type: Reverse ManyToManyField from DiscoveredPackage

All parent packages of this discovered package (related name of children_packages)

parties

Type: JSONField

Parties. A list of parties such as a person, project or organization.

primary_language

Type: CharField

Primary language. Primary programming language.

project

Type: ForeignKey to Project

Project (related name: discoveredpackages)

project_id

Internal field, use project instead.

property purl

Return the Package URL.
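A Package URL combines the type, namespace, name, version, qualifiers, and subpath fields in the form pkg:type/namespace/name@version?qualifiers#subpath. The function below is a simplified sketch of that assembly and not the code used by ScanCode.io; it assumes qualifiers is already an encoded "key=value&key=value" string and skips the type-specific normalization rules of the purl specification.

```python
from urllib.parse import quote


def build_purl(package_type, name, namespace="", version="", qualifiers="", subpath=""):
    """Assemble a simplified Package URL string from its components."""
    if not (package_type and name):
        return ""  # type and name are the minimum needed for a purl

    parts = ["pkg:", package_type.lower(), "/"]
    if namespace:
        # Namespace segments are separated by "/" and encoded individually.
        parts += [quote(namespace.strip("/"), safe="/"), "/"]
    parts.append(quote(name, safe=""))
    if version:
        parts += ["@", quote(version, safe="")]
    if qualifiers:
        parts += ["?", qualifiers]
    if subpath:
        parts += ["#", subpath.strip("/")]
    return "".join(parts)


maven_purl = build_purl(
    "maven", "commons-io", namespace="org.apache.commons", version="1.3.4"
)
pypi_purl = build_purl("pypi", "django", version="4.2")
```

For production use, a dedicated library such as packageurl-python is the safer choice, since it handles the per-type rules this sketch leaves out.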

qualifiers

Type: CharField

Qualifiers. Extra qualifying data for a package such as the name of an OS, architecture, distro, etc.

release_date

Type: DateField

Release date. The date that the package file was created, or when it was posted to its original download source.

repository_download_url

Type: CharField

Repository download URL. Download URL to download the actual archive of code of this package in its package repository. This may be different from the actual download URL.

repository_homepage_url

Type: CharField

Repository homepage URL. URL to the page for this package in its package repository. This is typically different from the package homepage URL proper.

resolved_from_dependencies

Type: Reverse ForeignKey from DiscoveredDependency

All resolved from dependencies of this discovered package (related name of resolved_to_package)

resources

Return the assigned codebase_resources QuerySet as a list.

sha1

Type: CharField

SHA1. SHA1 checksum hex-encoded, as in sha1sum.

sha256

Type: CharField

SHA256. SHA256 checksum hex-encoded, as in sha256sum.

sha512

Type: CharField

SHA512. SHA512 checksum hex-encoded, as in sha512sum.

size

Type: BigIntegerField

Size. Size in bytes.

source_packages

Type: JSONField

Source packages

property spdx_id
subpath

Type: CharField

Subpath. Extra subpath within a package, relative to the package root.

tag

Type: CharField

Tag

type

Type: CharField

Type. A short code to identify the type of this package. For example: gem for a Rubygem, docker for a container, pypi for a Python Wheel or Egg, maven for a Maven Jar, deb for a Debian package, etc.

uuid

Type: UUIDField

UUID

vcs_url

Type: CharField

VCS URL. A URL to the VCS repository in the SPDX form for “git”, “svn”, “hg”, “bzr”, or “cvs”, such as https://github.com/nexb/scancode-toolkit.git@405aaa4b3. See the SPDX specification section “Package Download Location” at https://spdx.org/spdx-specification-21-web-version#h.49x2ik5

version

Type: CharField

Version. Version of the package.

DiscoveredDependency

class scanpipe.models.DiscoveredDependency

A project’s Discovered Dependencies are records of the dependencies used by system and application packages discovered in the code under analysis. Dependencies are usually collected from parsed package data such as a package manifest or lockfile.

Parameters:
  • id (AutoField) – Primary key: ID

  • type (CharField) – Type. A short code to identify the type of this package. For example: gem for a Rubygem, docker for a container, pypi for a Python Wheel or Egg, maven for a Maven Jar, deb for a Debian package, etc.

  • namespace (CharField) – Namespace. Package name prefix, such as Maven groupid, Docker image owner, GitHub user or organization, etc.

  • name (CharField) – Name. Name of the package.

  • version (CharField) – Version. Version of the package.

  • qualifiers (CharField) – Qualifiers. Extra qualifying data for a package such as the name of an OS, architecture, distro, etc.

  • subpath (CharField) – Subpath. Extra subpath within a package, relative to the package root.

  • affected_by_vulnerabilities (JSONField) – Affected by vulnerabilities

  • dependency_uid (CharField) – Dependency uid. The unique identifier of this dependency.

  • extracted_requirement (CharField) – Extracted requirement. The version requirements of this dependency.

  • scope (CharField) – Scope. The scope of this dependency, how it is used in a project.

  • datasource_id (CharField) – Datasource id. The identifier for the datafile handler used to obtain this dependency.

  • is_runtime (BooleanField) – Is runtime. True if this dependency is a runtime dependency.

  • is_optional (BooleanField) – Is optional. True if this dependency is an optional dependency.

  • is_resolved (BooleanField) – Is resolved. True if this dependency version requirement has been pinned and this dependency points to an exact version.

  • is_direct (BooleanField) – Is direct. True if this is a direct, first-level dependency relationship for a package.

Relationship fields:

Parameters:
as_spdx()

Return this Dependency as an SPDX Package entry.

classmethod create_from_data(project, dependency_data, for_package=None, resolved_to_package=None, datafile_resource=None, datasource_id=None, strip_datafile_path_root=False)

Create and return a DiscoveredDependency for a project from the dependency_data.

If strip_datafile_path_root is True, then create_from_data() will strip the root path segment from the datafile_path of dependency_data before looking up the corresponding CodebaseResource for datafile_path. This is used in the case where Dependency data is imported from a scancode-toolkit scan, where the root path segments are not stripped for datafile_path.
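The effect of strip_datafile_path_root on a path can be sketched as below. The helper name strip_root and the sample paths are assumptions for this example; the point is only that the first path segment is removed before the CodebaseResource lookup.

```python
def strip_root(path):
    """Remove the first segment of a path: "root/dir/file" becomes "dir/file"."""
    segments = path.strip("/").split("/")
    return "/".join(segments[1:])


# A scancode-toolkit scan typically reports paths with the scanned root included.
toolkit_path = "codebase/src/requirements.txt"
lookup_path = strip_root(toolkit_path)
```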

classmethod extract_purl_data(dependency_data, ignore_nulls=False)
classmethod populate_dependency_uuid(dependency_data)
affected_by_vulnerabilities

Type: JSONField

Affected by vulnerabilities

property base_purl
datafile_path
datafile_resource

Type: ForeignKey to CodebaseResource

Datafile resource. The codebase resource (e.g., manifest or lockfile) that declares this dependency. (related name: declared_dependencies)

datafile_resource_id

Internal field, use datafile_resource instead.

datasource_id

Type: CharField

Datasource id. The identifier for the datafile handler used to obtain this dependency.

dependency_uid

Type: CharField

Dependency uid. The unique identifier of this dependency.

extracted_requirement

Type: CharField

Extracted requirement. The version requirements of this dependency.

for_package

Type: ForeignKey to DiscoveredPackage

For package. The package that declares this dependency. (related name: declared_dependencies)

for_package_id

Internal field, use for_package instead.

for_package_uid
id

Type: AutoField

Primary key: ID

is_direct

Type: BooleanField

Is direct. True if this is a direct, first-level dependency relationship for a package.

is_optional

Type: BooleanField

Is optional. True if this dependency is an optional dependency.

is_resolved

Type: BooleanField

Is resolved. True if this dependency version requirement has been pinned and this dependency points to an exact version.

is_runtime

Type: BooleanField

Is runtime. True if this dependency is a runtime dependency.

name

Type: CharField

Name. Name of the package.

namespace

Type: CharField

Namespace. Package name prefix, such as Maven groupid, Docker image owner, GitHub user or organization, etc.

property package_type
project

Type: ForeignKey to Project

Project (related name: discovereddependencies)

project_id

Internal field, use project instead.

property purl
qualifiers

Type: CharField

Qualifiers. Extra qualifying data for a package such as the name of an OS, architecture, distro, etc.

resolved_to_package

Type: ForeignKey to DiscoveredPackage

Resolved to package. The resolved package for this dependency. If empty, it indicates the dependency is unresolved. (related name: resolved_from_dependencies)

resolved_to_package_id

Internal field, use resolved_to_package instead.

resolved_to_package_uid
scope

Type: CharField

Scope. The scope of this dependency, how it is used in a project.

property spdx_id
subpath

Type: CharField

Subpath. Extra subpath within a package, relative to the package root.

type

Type: CharField

Type. A short code to identify the type of this package. For example: gem for a Rubygem, docker for a container, pypi for a Python Wheel or Egg, maven for a Maven Jar, deb for a Debian package, etc.

version

Type: CharField

Version. Version of the package.

CodebaseRelation

class scanpipe.models.CodebaseRelation

Relation between two CodebaseResource instances.

Parameters:
  • uuid (UUIDField) – Primary key: UUID

  • extra_data (JSONField) – Extra data. Optional mapping of extra data key/values.

  • map_type (CharField) – Map type

Relationship fields:

Parameters:
extra_data

Type: JSONField

Extra data. Optional mapping of extra data key/values.

from_resource

Type: ForeignKey to CodebaseResource

From resource (related name: related_to)

from_resource_id

Internal field, use from_resource instead.

map_type

Type: CharField

Map type

project

Type: ForeignKey to Project

Project (related name: codebaserelations)

project_id

Internal field, use project instead.

property score
property status
to_resource

Type: ForeignKey to CodebaseResource

To resource (related name: related_from)

to_resource_id

Internal field, use to_resource instead.

uuid

Type: UUIDField

Primary key: UUID

ProjectMessage

class scanpipe.models.ProjectMessage

Stores messages such as errors and exceptions raised during a pipeline run.

Parameters:
  • uuid (UUIDField) – Primary key: UUID

  • severity (CharField) – Severity. Severity level of the message.

  • description (TextField) – Description. Description.

  • model (CharField) – Model. Name of the model class.

  • details (JSONField) – Details. Data that caused the error.

  • traceback (TextField) – Traceback. Exception traceback.

  • created_date (DateTimeField) – Created date

Relationship fields:

Parameters:

project (ForeignKey to Project) – Project (related name: projectmessages)

class Severity(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
ERROR = 'error'
INFO = 'info'
WARNING = 'warning'
get_severity_display(*, field=<django.db.models.CharField: severity>)

Shows the label of the severity. See get_FOO_display() for more information.

created_date

Type: DateTimeField

Created date

description

Type: TextField

Description. Description.

details

Type: JSONField

Details. Data that caused the error.

model

Type: CharField

Model. Name of the model class.

project

Type: ForeignKey to Project

Project (related name: projectmessages)

project_id

Internal field, use project instead.

severity

Type: CharField

Severity. Severity level of the message.

Choices:

  • info

  • warning

  • error

traceback

Type: TextField

Traceback. Exception traceback.

uuid

Type: UUIDField

Primary key: UUID

Run

class scanpipe.models.Run

The database representation of a pipeline execution.

Parameters:

Relationship fields:

Parameters:

project (ForeignKey to Project) – Project (related name: runs)

Reverse relationships:

Parameters:

webhook_deliveries (Reverse ForeignKey from WebhookDelivery) – All webhook deliveries of this run (related name of run)

deliver_project_subscriptions(has_next_run=False)

Triggers related project Webhook subscriptions.

execute_task_async()

Enqueues the pipeline execution task for an asynchronous execution.

get_diff_url()

Return a GitHub diff URL between this Run commit at the time of execution and the current commit of the ScanCode.io app instance. The URL is only returned if both commits are available and if they differ.
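The condition can be sketched as a small standalone function. The repository URL is a placeholder, and the use of GitHub's /compare/ path with a ".." separator is an assumption about the URL layout, not a statement of how ScanCode.io builds it.

```python
# Placeholder repository URL, for illustration only.
REPO_URL = "https://github.com/example-org/example-repo"


def get_diff_url(run_commit, current_commit, repo_url=REPO_URL):
    """Return a compare URL only when both commits are known and differ."""
    if run_commit and current_commit and run_commit != current_commit:
        return f"{repo_url}/compare/{run_commit}..{current_commit}"
    return None


diff_url = get_diff_url("405aaa4b3", "9f2c1d0")
same_commit = get_diff_url("405aaa4b3", "405aaa4b3")
unknown_commit = get_diff_url("", "9f2c1d0")
```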

get_previous_runs()

Return all the previous Run instances regardless of their status.

make_pipeline_instance()

Return a pipeline instance using this Run pipeline_class.

profile(print_results=False)

Return computed execution times for each step in the current Run.

If print_results is provided, the results are printed to stdout.

set_current_step(message)

Set the message value on the current_step field. Truncate the value at 256 characters.

set_scancodeio_version()

Set the current ScanCode.io version on the scancodeio_version field.

start()

Start the pipeline execution when allowed or raise an exception.

sync_with_job()

Synchronise this Run instance with its related RQ Job. This is required when a Run gets out of sync with its Job, which can happen when the worker or one of its processes is killed: the Run status is then not properly updated and may stay in a Queued or Running state forever. When the Run is out of sync with its related Job, the Run status is updated accordingly. When the Run was in the queue, it is enqueued again.

property can_start

Return True if this Run is allowed to start its execution.

A Run is not allowed to start when any of its previous Run instances within the pipeline has not completed (not started, queued, or running). This is enforced to ensure the pipelines are run in a sequential order.
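The sequential rule can be illustrated with a small standalone check. The Status enum below is hypothetical: the three incomplete states come from the description above, while SUCCESS and FAILURE are names assumed for the completed states.

```python
from enum import Enum


class Status(Enum):
    NOT_STARTED = "not_started"
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"  # assumed name for a completed state
    FAILURE = "failure"  # assumed name for a completed state


# States in which a previous Run still blocks the next one.
INCOMPLETE = {Status.NOT_STARTED, Status.QUEUED, Status.RUNNING}


def can_start(previous_statuses):
    """Return True when every previous Run has completed."""
    return not any(status in INCOMPLETE for status in previous_statuses)


first_run = can_start([])
after_completed = can_start([Status.SUCCESS, Status.FAILURE])
blocked = can_start([Status.SUCCESS, Status.RUNNING])
```

Note that in this sketch a failed previous Run counts as completed and does not block the next one, which follows from the list of incomplete states given above.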

created_date

Type: DateTimeField

Created date

current_step

Type: CharField

Current step

description

Type: TextField

Description

log

Type: TextField

Log

property pipeline_class

Return this Run pipeline_class.

pipeline_name

Type: CharField

Pipeline name. Identify a registered Pipeline class.

project

Type: ForeignKey to Project

Project (related name: runs)

project_id

Internal field, use project instead.

property results_url

Return the rendered results_url if defined on the Pipeline class.

scancodeio_version

Type: CharField

Scancodeio version

selected_groups

Type: JSONField

Selected groups

selected_steps

Type: JSONField

Selected steps

task_end_date

Type: DateTimeField

Task end date

task_exitcode

Type: IntegerField

Task exitcode

task_id

Type: UUIDField

Task id

task_output

Type: TextField

Task output

task_start_date

Type: DateTimeField

Task start date

uuid

Type: UUIDField

Primary key: UUID

webhook_deliveries

Type: Reverse ForeignKey from WebhookDelivery

All webhook deliveries of this run (related name of run)