Data Models

This section describes the structure of the ScanCode.io Data Model and provides details about all fields included in the output files.

Project

class scanpipe.models.Project

The Project encapsulates all analysis processing. Multiple analysis pipelines can be run on the same project.

Parameters
  • uuid (UUIDField) – Primary key: UUID

  • extra_data (JSONField) – Extra data. Optional mapping of extra data key/values.

  • created_date (DateTimeField) – Created date. Creation date for this project.

  • name (CharField) – Name. Name for this project.

  • work_directory (CharField) – Work directory. Project work directory location.

  • input_sources (JSONField) – Input sources

  • is_archived (BooleanField) – Is archived. Archived projects cannot be modified anymore and are not displayed by default in project lists. Multiple levels of data cleanup may have happened during the archive operation.

add_downloads(downloads)

Move the given downloads to the current project’s input/ directory and add the input_source for each entry.

add_error(error, model, details=None)

Create a “ProjectError” record from the provided error Exception for this project. The model attribute can be provided as a string or as a Model class.

add_input_source(filename, source, save=False)

Add given filename and source to the current project’s input_sources field.

add_pipeline(pipeline_name, execute_now=False)

Create a new Run instance with the provided pipeline on the current project.

If execute_now is True, the pipeline task is created. on_commit() is used to postpone the task creation until after the transaction is successfully committed. If there are no active transactions, the callback is executed immediately.

add_uploads(uploads)

Write the given uploads to the current project’s input/ directory and add the input_source for each entry.

add_webhook_subscription(target_url)

Create a new WebhookSubscription instance with the provided target_url for the current project.

archive(remove_input=False, remove_codebase=False, remove_output=False)

Set the project is_archived field to True.

The remove_input, remove_codebase, and remove_output options can be provided during the archive operation to delete the related work directories.

The project cannot be archived if one of its related runs is queued or already running.

clear_tmp_directory()

Delete the whole content of the tmp/ directory. This is called at the end of each pipeline Run, and the tmp/ directory doesn’t store any content that might be needed for further processing in following pipeline Runs.

copy_input_from(input_location)

Copy the file at input_location to the current project’s input/ directory.

delete(*args, **kwargs)

Delete the work_directory along with the project-related data in the database.

Delete all related object instances using the private _raw_delete model API. This bypasses the objects collection, cascade deletions, and signals. It results in much faster object deletion, but it needs to be applied in the correct model order as the cascading event will not be triggered. Note that this approach is used in Django’s fast_deletes, but the scanpipe models cannot be fast-deleted as they have cascades and relations.

get_codebase_config_directory()

Return the .scancode directory if available in the codebase directory.

get_latest_failed_run()

Return the latest failed Run instance of the current project.

get_latest_output(filename)

Return the latest output file with the “filename” prefix, for example “scancode-<timestamp>.json”.
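The lookup described above can be sketched with the standard library alone. This is a minimal illustration, not the ScanCode.io implementation: the latest_output helper and the timestamp layout used in the file names are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def latest_output(output_dir, filename_prefix):
    """Return the newest file whose name starts with the prefix, or None."""
    # Hypothetical helper: assumes names look like "<prefix>-<timestamp>.<ext>".
    candidates = sorted(Path(output_dir).glob(f"{filename_prefix}-*"))
    # Zero-padded timestamps sort chronologically as plain text,
    # so the last candidate is the most recent one.
    return candidates[-1] if candidates else None


with tempfile.TemporaryDirectory() as tmp:
    for stamp in ("2023-01-05-10-00-00", "2023-03-01-09-30-00"):
        (Path(tmp) / f"scancode-{stamp}.json").touch()
    latest_name = latest_output(tmp, "scancode").name
    missing = latest_output(tmp, "summary")
```

Here latest_name holds the file with the most recent timestamp, and missing is None because no file starts with the "summary" prefix.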

get_next_run()

Return the next non-executed Run instance assigned to current project.

get_output_file_path(name, extension)

Return a crafted file path in the project output/ directory using given name and extension. The current date and time strings are added to the filename.

This method ensures the proper setup of the work_directory in case of a manual wipe and re-creates the missing pieces of the directory structure.
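As a rough sketch of how such a timestamped path could be built, assuming a "name-timestamp.extension" layout (the exact format used by ScanCode.io is not stated here, and output_file_path is a hypothetical stand-in for the method):

```python
from datetime import datetime
from pathlib import Path


def output_file_path(output_dir, name, extension, now=None):
    """Build "<output_dir>/<name>-<timestamp>.<extension>" (assumed layout)."""
    now = now or datetime.now()
    # Assumed timestamp format, chosen so that file names sort chronologically.
    timestamp = now.strftime("%Y-%m-%d-%H-%M-%S")
    return Path(output_dir) / f"{name}-{timestamp}.{extension}"


example_path = output_file_path(
    "output", "results", "json", now=datetime(2023, 3, 1, 9, 30, 0)
)
```

Passing now explicitly keeps the example deterministic; a caller would normally omit it.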

static get_root_content(directory)

Return a list of all files and directories of a given directory. Only the first level children will be listed.

inputs(pattern='**/*', extensions=None)

Return the paths of all files and directories of the input/ directory matching a given pattern. The default **/* pattern means “this directory and all subdirectories, recursively”. Use the * pattern to only list the root content. The returned paths can be limited to the provided list of extensions.
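The difference between the two patterns can be tried out with pathlib. The list_inputs function below is a hypothetical stand-in that works on any directory; it only illustrates the glob semantics and the extension filter, under the assumption that extensions are matched as file name suffixes.

```python
import tempfile
from pathlib import Path


def list_inputs(input_dir, pattern="**/*", extensions=None):
    """Return sorted relative paths under input_dir matching the pattern."""
    paths = Path(input_dir).glob(pattern)
    if extensions:
        # Keep only the paths whose name ends with one of the extensions.
        paths = (p for p in paths if p.name.endswith(tuple(extensions)))
    return sorted(p.relative_to(input_dir).as_posix() for p in paths)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "archive.tar.gz").touch()
    (root / "nested" / "notes.txt").touch()
    everything = list_inputs(root)               # recursive listing
    top_level = list_inputs(root, pattern="*")   # root content only
    only_txt = list_inputs(root, extensions=[".txt"])
```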

move_input_from(input_location)

Move the file at input_location to the current project’s input/ directory.

reset(keep_input=True)

Reset the project by deleting all related database objects and all work directories. The input directory is kept when the keep_input option is True.

save(*args, **kwargs)

Save this project instance. The workspace directories are set up during project creation.

setup_work_directory()

Create the whole work_directory structure, skipping any part that already exists.
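A minimal sketch of this idempotent setup, using the WORK_DIRECTORIES names documented for the Project class. The standalone function and its work_path argument are assumptions for illustration; the real method operates on the project instance.

```python
import tempfile
from pathlib import Path

WORK_DIRECTORIES = ["input", "output", "codebase", "tmp"]


def setup_work_directory(work_path):
    """Create each work sub-directory, leaving existing ones untouched."""
    for name in WORK_DIRECTORIES:
        # exist_ok=True makes repeated calls safe.
        (Path(work_path) / name).mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    setup_work_directory(tmp)
    setup_work_directory(tmp)  # second call finds everything in place
    created = sorted(p.name for p in Path(tmp).iterdir())
```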

walk_codebase_path()

Return the paths of the files and directories of the codebase/ directory, recursively.

write_input_file(file_object)

Write the provided file_object to the project’s input/ directory.

WORK_DIRECTORIES = ['input', 'output', 'codebase', 'tmp']
can_add_input

Return True until one pipeline run has started to execute on the project.

property codebase_path

Return the codebase directory as a Path instance.

codebaserelations

Type: Reverse ForeignKey from CodebaseRelation

All codebaserelations of this project (related name of project)

codebaseresources

Type: Reverse ForeignKey from CodebaseResource

All codebaseresources of this project (related name of project)

created_date

Type: DateTimeField

Created date. Creation date for this project.

dependency_count

Return the number of dependencies related to this project.

discovereddependencies

Type: Reverse ForeignKey from DiscoveredDependency

All discovereddependencies of this project (related name of project)

discoveredpackages

Type: Reverse ForeignKey from DiscoveredPackage

All discoveredpackages of this project (related name of project)

error_count

Return the number of errors related to this project.

extra_data

Type: JSONField

Extra data. Optional mapping of extra data key/values.

file_count

Return the number of file resources related to this project.

file_in_package_count

Return the number of file resources in a package related to this project.

file_not_in_package_count

Return the number of file resources not in a package related to this project.

has_single_resource

Return True if only a single CodebaseResource is associated with this project, False otherwise.

property input_files

Return a list of the relative paths of the files in the input/ directory, recursively.

property input_path

Return the input directory as a Path instance.

property input_root

Return a list of all files and directories of the input/ directory. Only the first level children will be listed.

input_sources

Type: JSONField

Input sources

property input_sources_list
property inputs_with_source

Return a list of inputs including the source, type, sha256, and size data. Return the missing_inputs defined in the input_sources field but not available in the input/ directory. Only first level children will be listed.

is_archived

Type: BooleanField

Is archived. Archived projects cannot be modified anymore and are not displayed by default in project lists. Multiple levels of data cleanup may have happened during the archive operation.

name

Type: CharField

Name. Name for this project.

property output_path

Return the output directory as a Path instance.

property output_root

Return a list of all files and directories of the output/ directory. Only first level children will be listed.

package_count

Return the number of packages related to this project.

projecterrors

Type: Reverse ForeignKey from ProjectError

All projecterrors of this project (related name of project)

relation_count

Return the number of relations related to this project.

resource_count

Return the number of resources related to this project.

runs

Type: Reverse ForeignKey from Run

All runs of this project (related name of project)

property tmp_path

Return the tmp directory as a Path instance.

uuid

Type: UUIDField

Primary key: UUID

webhooksubscriptions

Type: Reverse ForeignKey from WebhookSubscription

All webhooksubscriptions of this project (related name of project)

work_directory

Type: CharField

Work directory. Project work directory location.

property work_path

Return the work_directory as a Path instance.

CodebaseResource

class scanpipe.models.CodebaseResource

A project’s Codebase Resources are records of its code files and directories. Each record is identified by its path under the project workspace.

These model fields should be kept in line with commoncode.resource.Resource.

Parameters
  • id (AutoField) – Primary key: ID

  • md5 (CharField) – MD5. MD5 checksum hex-encoded, as in md5sum.

  • sha1 (CharField) – SHA1. SHA1 checksum hex-encoded, as in sha1sum.

  • sha256 (CharField) – SHA256. SHA256 checksum hex-encoded, as in sha256sum.

  • sha512 (CharField) – SHA512. SHA512 checksum hex-encoded, as in sha512sum.

  • extra_data (JSONField) – Extra data. Optional mapping of extra data key/values.

  • detected_license_expression (TextField) – Detected license expression. The license expression summarizing the license info for this resource, combined from all the license detections

  • detected_license_expression_spdx (TextField) – Detected license expression spdx. The detected license expression for this file, with SPDX license keys

  • license_detections (JSONField) – License detections. List of license detection details.

  • license_clues (JSONField) – License clues. List of license matches that are not proper detections and potentially just clues to licenses or likely false positives. Those are not included in computing the detected license expression for the resource.

  • percentage_of_license_text (FloatField) – Percentage of license text. Percentage of file words detected as license text or notice.

  • copyrights (JSONField) – Copyrights. List of detected copyright statements (and related detection details).

  • holders (JSONField) – Holders. List of detected copyright holders (and related detection details).

  • authors (JSONField) – Authors. List of detected authors (and related detection details).

  • emails (JSONField) – Emails. List of detected emails (and related detection details).

  • urls (JSONField) – Urls. List of detected URLs (and related detection details).

  • path (CharField) – Path. The full path value of a resource (file or directory) in the archive it is from.

  • rootfs_path (CharField) – Rootfs path. Path relative to some root filesystem root directory. Useful when working on disk images, docker images, and VM images. For example: “/usr/bin/bash” for a path of “tarball-extract/rootfs/usr/bin/bash”.

  • status (CharField) – Status. Analysis status for this resource.

  • size (BigIntegerField) – Size. Size in bytes.

  • tag (CharField) – Tag

  • type (CharField) – Type. Type of this resource as one of: file, directory, symlink

  • name (CharField) – Name. File or directory name of this resource with its extension.

  • extension (CharField) – Extension. File extension for this resource (directories do not have an extension).

  • programming_language (CharField) – Programming language. Programming language of this resource if this is a code file.

  • mime_type (CharField) – Mime type. MIME type (aka. media type) for this resource. See https://en.wikipedia.org/wiki/Media_type

  • file_type (CharField) – File type. Descriptive file type for this resource.

  • is_binary (BooleanField) – Is binary

  • is_text (BooleanField) – Is text

  • is_archive (BooleanField) – Is archive

  • is_key_file (BooleanField) – Is key file

  • is_media (BooleanField) – Is media

  • compliance_alert (CharField) – Compliance alert. Indicates how the detected licenses in a codebase resource comply with provided policies.

  • package_data (JSONField) – Package data. List of Package data detected from this CodebaseResource

Relationship fields:

Parameters

project (ForeignKey to Project) – Project (related name: codebaseresources)

class Compliance(value)

List of compliance alert values.

ERROR = 'error'
MISSING = 'missing'
OK = 'ok'
WARNING = 'warning'
class Type(value)

List of CodebaseResource types.

DIRECTORY = 'directory'
FILE = 'file'
add_package(discovered_package)

Assign the discovered_package to this codebase_resource instance.

as_spdx()

Return this CodebaseResource as an SPDX Package entry.

children(codebase=None)

Return a QuerySet of direct children CodebaseResource objects using a database query on the current CodebaseResource path.

Paths are returned in lower-cased sorted path order to reflect the behavior of the commoncode.resource.Resource.children() https://github.com/nexB/commoncode/blob/main/src/commoncode/resource.py

codebase is not used in this context but required for compatibility with the commoncode.resource.VirtualCodebase class API.
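The ordering rule can be illustrated on a plain list of paths. This sketch does not use the database; direct_children is a hypothetical helper that mimics the "direct children, sorted by lower-cased path" behavior described above.

```python
def direct_children(all_paths, parent_path):
    """Return the direct children of parent_path, sorted by lower-cased path."""
    prefix = parent_path.rstrip("/") + "/"
    children = [
        path
        for path in all_paths
        # A direct child starts with "<parent>/" and has no further "/" after it.
        if path.startswith(prefix) and "/" not in path[len(prefix):]
    ]
    return sorted(children, key=str.lower)


paths = ["root", "root/Zeta.txt", "root/alpha", "root/alpha/deep.c", "root/Beta.md"]
children_of_root = direct_children(paths, "root")
```

Note that "root/alpha/deep.c" is excluded because it is a grandchild, and that "root/Beta.md" sorts after "root/alpha" only because the comparison ignores case.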

compute_compliance_alert()

Compute and return the compliance_alert value from the licenses policies.

create_and_add_package(package_data)

Create a DiscoveredPackage instance using the package_data and assigns it to the current CodebaseResource instance.

Errors that may happen during the DiscoveredPackage creation are captured at this level, rather than at the DiscoveredPackage.create_from_data level, so resource data can be injected in the ProjectError record.

classmethod create_from_data(project, resource_data)

Create and return a CodebaseResource for a project from the resource_data. If one of the values of the required fields is not available, a “ProjectError” is created instead of a new CodebaseResource instance.

descendants()

Return a QuerySet of descendant CodebaseResource objects using a database query on the current CodebaseResource path. The current CodebaseResource is not included.

get_compliance_alert_display(*, field=<django.db.models.CharField: compliance_alert>)

Shows the label of the compliance_alert. See get_FOO_display() for more information.

get_path_segments_with_subpath()

Return a list of path segment names along with their subpaths for this resource.

Such as:

    [
        ('root', 'root'),
        ('subpath', 'root/subpath'),
        ('file.txt', 'root/subpath/file.txt'),
    ]
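The same structure can be produced from a plain path string. The function below is a standalone sketch that takes the path as an argument, which is an assumption for the example; the documented method reads the path from the resource itself.

```python
def path_segments_with_subpath(path):
    """Return (segment, subpath) tuples for each segment of the path."""
    segments = []
    subpath_parts = []
    for segment in path.strip("/").split("/"):
        subpath_parts.append(segment)
        # The subpath is every segment seen so far, joined back together.
        segments.append((segment, "/".join(subpath_parts)))
    return segments


example_segments = path_segments_with_subpath("root/subpath/file.txt")
```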

get_raw_url()

Return the URL to access the RAW content of the resource.

get_spdx_types()
get_type_display(*, field=<django.db.models.CharField: type>)

Shows the label of the type. See get_FOO_display() for more information.

has_parent()

Return True if this CodebaseResource has a parent CodebaseResource or False otherwise.

parent(codebase=None)

Return the parent CodebaseResource object for this CodebaseResource or None.

codebase is not used in this context but required for compatibility with the commoncode.resource.Codebase class API.

parent_path()

Return the parent path for this CodebaseResource or None.

save(codebase=None, *args, **kwargs)

Save the current resource instance. Injects policies, if the feature is enabled, when the detected_license_expression field value is changed.

codebase is not used in this context but required for compatibility with the commoncode.resource.Codebase class API.

siblings(codebase=None)

Return a sequence of sibling Resource objects for this Resource or an empty sequence.

codebase is not used in this context but required for compatibility with the commoncode.resource.Codebase class API.

walk(topdown=True)

Return all descendant Resources of the current Resource; does not include self.

Traverses the tree top-down, depth-first if topdown is True; otherwise traverses the tree bottom-up.
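The two traversal orders can be compared on a small in-memory tree. This sketch uses nested dictionaries instead of database records, and the lower-cased ordering of siblings is an assumption borrowed from the children() description.

```python
def walk(tree, topdown=True, prefix=""):
    """Yield the paths of all descendants of a nested-dict tree, depth-first."""
    for name in sorted(tree, key=str.lower):
        path = f"{prefix}/{name}" if prefix else name
        children = tree[name]
        if topdown:
            yield path  # parent before its descendants
        if children:
            yield from walk(children, topdown, path)
        if not topdown:
            yield path  # parent after its descendants


# Directories map to dicts, files map to None.
codebase = {"src": {"main.py": None, "lib": {"util.py": None}}, "README": None}
top_down = list(walk(codebase))
bottom_up = list(walk(codebase, topdown=False))
```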

authors

Type: JSONField

Authors. List of detected authors (and related detection details).

compliance_alert

Type: CharField

Compliance alert. Indicates how the detected licenses in a codebase resource comply with provided policies.

Choices:

  • ok

  • warning

  • error

  • missing

copyrights

Type: JSONField

Copyrights. List of detected copyright statements (and related detection details).

dependencies

Type: Reverse ForeignKey from DiscoveredDependency

All dependencies of this codebase resource (related name of datafile_resource)

detected_license_expression

Type: TextField

Detected license expression. The license expression summarizing the license info for this resource, combined from all the license detections

detected_license_expression_spdx

Type: TextField

Detected license expression spdx. The detected license expression for this file, with SPDX license keys

discovered_packages

Type: Reverse ManyToManyField from DiscoveredPackage

All discovered packages of this codebase resource (related name of codebase_resources)

emails

Type: JSONField

Emails. List of detected emails (and related detection details).

extension

Type: CharField

Extension. File extension for this resource (directories do not have an extension).

extra_data

Type: JSONField

Extra data. Optional mapping of extra data key/values.

property file_content

Return the content of the current Resource file using TextCode utilities for optimal compatibility.

file_type

Type: CharField

File type. Descriptive file type for this resource.

property for_packages

Return the list of all discovered packages associated to this resource.

holders

Type: JSONField

Holders. List of detected copyright holders (and related detection details).

id

Type: AutoField

Primary key: ID

is_archive

Type: BooleanField

Is archive

is_binary

Type: BooleanField

Is binary

property is_dir

Return True, if the resource is a directory.

property is_file

Return True, if the resource is a file.

is_key_file

Type: BooleanField

Is key file

is_media

Type: BooleanField

Is media

property is_symlink

Return True, if the resource is a symlink.

is_text

Type: BooleanField

Is text

license_clues

Type: JSONField

License clues. List of license matches that are not proper detections and potentially just clues to licenses or likely false positives. Those are not included in computing the detected license expression for the resource.

license_detections

Type: JSONField

License detections. List of license detection details.

property location

Return the location of the resource as a string.

property location_path

Return the location of the resource as a Path instance.

md5

Type: CharField

MD5. MD5 checksum hex-encoded, as in md5sum.

mime_type

Type: CharField

Mime type. MIME type (aka. media type) for this resource. See https://en.wikipedia.org/wiki/Media_type

name

Type: CharField

Name. File or directory name of this resource with its extension.

package_data

Type: JSONField

Package data. List of Package data detected from this CodebaseResource

path

Type: CharField

Path. The full path value of a resource (file or directory) in the archive it is from.

percentage_of_license_text

Type: FloatField

Percentage of license text. Percentage of file words detected as license text or notice.

programming_language

Type: CharField

Programming language. Programming language of this resource if this is a code file.

project

Type: ForeignKey to Project

Project (related name: codebaseresources)

project_id

Internal field, use project instead.

related_from

Type: Reverse ForeignKey from CodebaseRelation

All related from of this codebase resource (related name of to_resource)

related_to

Type: Reverse ForeignKey from CodebaseRelation

All related to of this codebase resource (related name of from_resource)

rootfs_path

Type: CharField

Rootfs path. Path relative to some root filesystem root directory. Useful when working on disk images, docker images, and VM images. For example: “/usr/bin/bash” for a path of “tarball-extract/rootfs/usr/bin/bash”.

sha1

Type: CharField

SHA1. SHA1 checksum hex-encoded, as in sha1sum.

sha256

Type: CharField

SHA256. SHA256 checksum hex-encoded, as in sha256sum.

sha512

Type: CharField

SHA512. SHA512 checksum hex-encoded, as in sha512sum.
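For reference, hex-encoded values in the same form as the md5, sha1, sha256, and sha512 fields can be computed with hashlib. The checksums helper is a hypothetical name for this example and hashes in-memory bytes; hashing a file would read its content first.

```python
import hashlib


def checksums(data):
    """Return hex-encoded digests of the given bytes, keyed by algorithm name."""
    return {
        name: hashlib.new(name, data).hexdigest()
        for name in ("md5", "sha1", "sha256", "sha512")
    }


digests = checksums(b"")
```

The digest lengths are fixed: 32 hex characters for MD5, 40 for SHA1, 64 for SHA256, and 128 for SHA512.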

size

Type: BigIntegerField

Size. Size in bytes.

property spdx_id
status

Type: CharField

Status. Analysis status for this resource.

tag

Type: CharField

Tag

type

Type: CharField

Type. Type of this resource as one of: file, directory, symlink

Choices:

  • file

  • directory

  • symlink

urls

Type: JSONField

Urls. List of detected URLs (and related detection details).

DiscoveredPackage

class scanpipe.models.DiscoveredPackage

A project’s Discovered Packages are records of the system and application packages discovered in the code under analysis. Each record is identified by its Package URL. Package URL is a fundamental effort to create informative identifiers for software packages, such as Debian, RPM, npm, Maven, or PyPI packages. See https://github.com/package-url for more details.

Parameters
  • id (AutoField) – Primary key: ID

  • type (CharField) – Type. A short code to identify the type of this package. For example: gem for a Rubygem, docker for a container, pypi for a Python Wheel or Egg, maven for a Maven Jar, deb for a Debian package, etc.

  • namespace (CharField) – Namespace. Package name prefix, such as Maven groupid, Docker image owner, GitHub user or organization, etc.

  • name (CharField) – Name. Name of the package.

  • version (CharField) – Version. Version of the package.

  • qualifiers (CharField) – Qualifiers. Extra qualifying data for a package such as the name of an OS, architecture, distro, etc.

  • subpath (CharField) – Subpath. Extra subpath within a package, relative to the package root.

  • md5 (CharField) – MD5. MD5 checksum hex-encoded, as in md5sum.

  • sha1 (CharField) – SHA1. SHA1 checksum hex-encoded, as in sha1sum.

  • sha256 (CharField) – SHA256. SHA256 checksum hex-encoded, as in sha256sum.

  • sha512 (CharField) – SHA512. SHA512 checksum hex-encoded, as in sha512sum.

  • extra_data (JSONField) – Extra data. Optional mapping of extra data key/values.

  • filename (CharField) – Filename. File name of a Resource sometimes part of the URI proper and sometimes only available through an HTTP header.

  • primary_language (CharField) – Primary language. Primary programming language.

  • description (TextField) – Description. Description for this package. By convention the first line should be a summary when available.

  • release_date (DateField) – Release date. The date that the package file was created, or when it was posted to its original download source.

  • homepage_url (CharField) – Homepage URL. URL to the homepage for this package.

  • download_url (CharField) – Download URL. A direct download URL.

  • size (BigIntegerField) – Size. Size in bytes.

  • bug_tracking_url (CharField) – Bug tracking URL. URL to the issue or bug tracker for this package.

  • code_view_url (CharField) – Code view URL. A URL where the code can be browsed online.

  • vcs_url (CharField) – VCS URL. A URL to the VCS repository in the SPDX form of: “git”, “svn”, “hg”, “bzr”, “cvs”, https://github.com/nexb/scancode-toolkit.git@405aaa4b3 See SPDX specification “Package Download Location” at https://spdx.org/spdx-specification-21-web-version#h.49x2ik5

  • repository_homepage_url (CharField) – Repository homepage URL. URL to the page for this package in its package repository. This is typically different from the package homepage URL proper.

  • repository_download_url (CharField) – Repository download URL. Download URL to download the actual archive of code of this package in its package repository. This may be different from the actual download URL.

  • api_data_url (CharField) – API data URL. API URL to obtain structured data for this package such as the URL to a JSON or XML API in its package repository.

  • copyright (TextField) – Copyright. Copyright statements for this package. Typically one per line.

  • holder (TextField) – Holder. Holders for this package. Typically one per line.

  • declared_license_expression (TextField) – Declared license expression. The license expression for this package typically derived from its extracted_license_statement or from some other type-specific routine or convention.

  • declared_license_expression_spdx (TextField) – Declared license expression spdx. The SPDX license expression for this package converted from its declared_license_expression.

  • license_detections (JSONField) – License detections. A list of LicenseDetection mappings typically derived from its extracted_license_statement or from some other type-specific routine or convention.

  • other_license_expression (TextField) – Other license expression. The license expression for this package which is different from the declared_license_expression (i.e. not the primary license).

  • other_license_expression_spdx (TextField) – Other license expression spdx. The other SPDX license expression for this package converted from its other_license_expression.

  • other_license_detections (JSONField) – Other license detections. A list of LicenseDetection mappings which are different from the declared_license_expression (i.e. not the primary license). These are the detections for the license expressions in other_license_expression.

  • extracted_license_statement (TextField) – Extracted license statement. The license statement mention, tag or text as found in a package manifest and extracted. This can be a string, a list or dict of strings possibly nested, as found originally in the manifest.

  • notice_text (TextField) – Notice text. A notice text for this package.

  • datasource_id (CharField) – Datasource id. The identifier for the datafile handler used to obtain this package.

  • file_references (JSONField) – File references. List of file paths and details for files referenced in a package manifest. These may not actually exist on the filesystem. The exact semantics and base of these paths are specific to a package type or datafile format.

  • parties (JSONField) – Parties. A list of parties such as a person, project or organization.

  • uuid (UUIDField) – UUID

  • missing_resources (JSONField) – Missing resources

  • modified_resources (JSONField) – Modified resources

  • package_uid (CharField) – Package uid. Unique identifier for this package.

  • keywords (JSONField) – Keywords

  • source_packages (JSONField) – Source packages

Reverse relationships:

Parameters

dependencies (Reverse ForeignKey from DiscoveredDependency) – All dependencies of this discovered package (related name of for_package)

add_resources(codebase_resources)

Assign the codebase_resources to this discovered_package instance.

as_cyclonedx()

Return this DiscoveredPackage as a CycloneDX Component entry.

as_spdx()

Return this DiscoveredPackage as an SPDX Package entry.

classmethod clean_data(data)

Return the data dict keeping only entries for fields available in the model.
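The filtering can be sketched as a dictionary comprehension. MODEL_FIELDS below is a hypothetical, shortened set of field names chosen for the example; the real method would derive the names from the model definition.

```python
# Hypothetical subset of the DiscoveredPackage field names.
MODEL_FIELDS = {"type", "namespace", "name", "version"}


def clean_data(data, allowed_fields=MODEL_FIELDS):
    """Return a copy of data keeping only the keys that are model fields."""
    return {key: value for key, value in data.items() if key in allowed_fields}


raw = {"type": "pypi", "name": "django", "version": "4.2", "unknown_key": 1}
cleaned = clean_data(raw)
```

The input mapping is left unchanged, and keys that do not match a field, such as unknown_key here, are silently dropped.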

classmethod create_from_data(project, package_data)

Create and return a DiscoveredPackage for a project from the package_data. If one of the values of the required fields is not available, a “ProjectError” is created instead of a new DiscoveredPackage instance.

classmethod extract_purl_data(package_data)
get_declared_license_expression()

Return this package license expression.

Use declared_license_expression when available or compute the expression from declared_license_expression_spdx.

get_declared_license_expression_spdx()

Return this package license expression using SPDX keys.

Use declared_license_expression_spdx when available or compute the expression from declared_license_expression.

api_data_url

Type: CharField

API data URL. API URL to obtain structured data for this package such as the URL to a JSON or XML API in its package repository.

bug_tracking_url

Type: CharField

Bug tracking URL. URL to the issue or bug tracker for this package.

code_view_url

Type: CharField

Code view URL. A URL where the code can be browsed online.

codebase_resources

Type: ManyToManyField to CodebaseResource

Codebase resources (related name: discovered_packages)

copyright

Type: TextField

Copyright. Copyright statements for this package. Typically one per line.

datasource_id

Type: CharField

Datasource id. The identifier for the datafile handler used to obtain this package.

declared_license_expression

Type: TextField

Declared license expression. The license expression for this package typically derived from its extracted_license_statement or from some other type-specific routine or convention.

declared_license_expression_spdx

Type: TextField

Declared license expression spdx. The SPDX license expression for this package converted from its declared_license_expression.

dependencies

Type: Reverse ForeignKey from DiscoveredDependency

All dependencies of this discovered package (related name of for_package)

description

Type: TextField

Description. Description for this package. By convention the first line should be a summary when available.

download_url

Type: CharField

Download URL. A direct download URL.

extra_data

Type: JSONField

Extra data. Optional mapping of extra data key/values.

extracted_license_statement

Type: TextField

Extracted license statement. The license statement mention, tag or text as found in a package manifest and extracted. This can be a string, a list or dict of strings possibly nested, as found originally in the manifest.

file_references

Type: JSONField

File references. List of file paths and details for files referenced in a package manifest. These may not actually exist on the filesystem. The exact semantics and base of these paths are specific to a package type or datafile format.

filename

Type: CharField

Filename. File name of a Resource sometimes part of the URI proper and sometimes only available through an HTTP header.

holder

Type: TextField

Holder. Holders for this package. Typically one per line.

homepage_url

Type: CharField

Homepage URL. URL to the homepage for this package.

id

Type: AutoField

Primary key: ID

keywords

Type: JSONField

Keywords

license_detections

Type: JSONField

License detections. A list of LicenseDetection mappings typically derived from its extracted_license_statement or from some other type-specific routine or convention.

md5

Type: CharField

MD5. MD5 checksum hex-encoded, as in md5sum.

missing_resources

Type: JSONField

Missing resources

modified_resources

Type: JSONField

Modified resources

name

Type: CharField

Name. Name of the package.

namespace

Type: CharField

Namespace. Package name prefix, such as Maven groupid, Docker image owner, GitHub user or organization, etc.

notice_text

Type: TextField

Notice text. A notice text for this package.

other_license_detections

Type: JSONField

Other license detections. A list of LicenseDetection mappings which are different from the declared_license_expression (i.e. not the primary license). These are the detections for the license expressions in other_license_expression.

other_license_expression

Type: TextField

Other license expression. The license expression for this package which is different from the declared_license_expression (i.e. not the primary license).

other_license_expression_spdx

Type: TextField

Other license expression spdx. The other SPDX license expression for this package converted from its other_license_expression.

package_uid

Type: CharField

Package uid. Unique identifier for this package.

parties

Type: JSONField

Parties. A list of parties such as a person, project or organization.

primary_language

Type: CharField

Primary language. Primary programming language.

project

Type: ForeignKey to Project

Project (related name: discoveredpackages)

project_id

Internal field, use project instead.

property purl

Return the Package URL.
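
As a rough illustration of how the type, namespace, name, version, qualifiers, and subpath fields fit together in a Package URL, here is a minimal sketch. The build_purl helper is hypothetical and is not ScanCode.io's implementation; it assumes qualifiers is already a "key=value" string and skips the percent-encoding and normalization rules of the purl specification.

```python
def build_purl(package_type, name, namespace="", version="", qualifiers="", subpath=""):
    """Assemble pkg:type/namespace/name@version?qualifiers#subpath.

    Hypothetical helper for illustration only: no percent-encoding
    or per-type normalization is applied.
    """
    purl = f"pkg:{package_type}/"
    if namespace:
        purl += f"{namespace}/"
    purl += name
    if version:
        purl += f"@{version}"
    if qualifiers:
        purl += f"?{qualifiers}"
    if subpath:
        purl += f"#{subpath}"
    return purl


print(build_purl("maven", "commons-io", namespace="commons-io", version="2.11.0"))
print(build_purl("deb", "curl", namespace="debian", version="7.50.3-1", qualifiers="arch=i386"))
```

Only the type and name are required in this sketch; every other component is appended when it is non-empty.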

qualifiers

Type: CharField

Qualifiers. Extra qualifying data for a package such as the name of an OS, architecture, distro, etc.

release_date

Type: DateField

Release date. The date that the package file was created, or when it was posted to its original download source.

repository_download_url

Type: CharField

Repository download URL. URL to download the actual archive of code for this package from its package repository. This may be different from the actual download URL.

repository_homepage_url

Type: CharField

Repository homepage URL. URL to the page for this package in its package repository. This is typically different from the package homepage URL proper.

resources

Return the assigned codebase_resources QuerySet as a list.

sha1

Type: CharField

SHA1. SHA1 checksum hex-encoded, as in sha1sum.

sha256

Type: CharField

SHA256. SHA256 checksum hex-encoded, as in sha256sum.

sha512

Type: CharField

SHA512. SHA512 checksum hex-encoded, as in sha512sum.
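
The md5, sha1, sha256, and sha512 fields hold hex-encoded digests in the same form as the matching command-line tools print. A minimal sketch of producing such values with Python's standard hashlib module follows; the compute_checksums helper is hypothetical and only illustrates the encoding, not how ScanCode.io collects these values.

```python
import hashlib


def compute_checksums(data: bytes) -> dict:
    """Return hex-encoded digests keyed by the checksum field names."""
    return {
        algorithm: hashlib.new(algorithm, data).hexdigest()
        for algorithm in ("md5", "sha1", "sha256", "sha512")
    }


checksums = compute_checksums(b"")
for field_name, value in checksums.items():
    print(field_name, value)
```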

size

Type: BigIntegerField

Size. Size in bytes.

source_packages

Type: JSONField

Source packages

property spdx_id

subpath

Type: CharField

Subpath. Extra subpath within a package, relative to the package root.

type

Type: CharField

Type. A short code to identify the type of this package. For example: gem for a Rubygem, docker for a container, pypi for a Python Wheel or Egg, maven for a Maven Jar, deb for a Debian package, etc.

uuid

Type: UUIDField

UUID

vcs_url

Type: CharField

VCS URL. A URL to the VCS repository in the SPDX form, which supports the “git”, “svn”, “hg”, “bzr”, and “cvs” VCS types. Example: https://github.com/nexb/scancode-toolkit.git@405aaa4b3 See the SPDX specification “Package Download Location” at https://spdx.org/spdx-specification-21-web-version#h.49x2ik5
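
To show how a value in this form can be read, the sketch below separates the repository location from the trailing revision. The split_vcs_url helper is hypothetical; it assumes the revision contains no "/" and ignores the optional sub-path component that the SPDX form also allows.

```python
def split_vcs_url(vcs_url: str):
    """Split 'location@revision' into (location, revision).

    Only an "@" in the last path segment is treated as the revision
    separator, so credentials such as user@host are left untouched.
    """
    head, slash, tail = vcs_url.rpartition("/")
    name, at, revision = tail.partition("@")
    if not at:
        return vcs_url, ""
    return f"{head}{slash}{name}", revision


location, revision = split_vcs_url(
    "https://github.com/nexb/scancode-toolkit.git@405aaa4b3"
)
print(location)
print(revision)
```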

version

Type: CharField

Version. Version of the package.

DiscoveredDependency

class scanpipe.models.DiscoveredDependency

A project’s Discovered Dependencies are records of the dependencies used by system and application packages discovered in the code under analysis.

Parameters
  • id (AutoField) – Primary key: ID

  • type (CharField) – Type. A short code to identify the type of this package. For example: gem for a Rubygem, docker for a container, pypi for a Python Wheel or Egg, maven for a Maven Jar, deb for a Debian package, etc.

  • namespace (CharField) – Namespace. Package name prefix, such as Maven groupid, Docker image owner, GitHub user or organization, etc.

  • name (CharField) – Name. Name of the package.

  • version (CharField) – Version. Version of the package.

  • qualifiers (CharField) – Qualifiers. Extra qualifying data for a package such as the name of an OS, architecture, distro, etc.

  • subpath (CharField) – Subpath. Extra subpath within a package, relative to the package root.

  • dependency_uid (CharField) – Dependency uid. The unique identifier of this dependency.

  • extracted_requirement (CharField) – Extracted requirement. The version requirements of this dependency.

  • scope (CharField) – Scope. The scope of this dependency, how it is used in a project.

  • datasource_id (CharField) – Datasource id. The identifier for the datafile handler used to obtain this dependency.

  • is_runtime (BooleanField) – Is runtime

  • is_optional (BooleanField) – Is optional

  • is_resolved (BooleanField) – Is resolved

Relationship fields:

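Parameters
  • project (ForeignKey to Project) – Project (related name: discovereddependencies)

  • for_package (ForeignKey to DiscoveredPackage) – For package (related name: dependencies)

  • datafile_resource (ForeignKey to CodebaseResource) – Datafile resource (related name: dependencies)

as_spdx()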

Return this Dependency as an SPDX Package entry.

classmethod create_from_data(project, dependency_data, for_package=None, datafile_resource=None, strip_datafile_path_root=False)

Create and return a DiscoveredDependency for a project from the dependency_data.

If strip_datafile_path_root is True, then create_from_data() will strip the root path segment from the datafile_path of dependency_data before looking up the corresponding CodebaseResource for datafile_path. This is used in the case where Dependency data is imported from a scancode-toolkit scan, where the root path segments are not stripped for datafile_path.
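
The effect of stripping the root path segment can be sketched as follows. The strip_root helper is hypothetical and is not the code used by create_from_data(); it assumes POSIX-style paths and leaves a single-segment path unchanged.

```python
def strip_root(path: str) -> str:
    """Drop the first segment of a POSIX-style path (illustrative only)."""
    root, separator, remainder = path.lstrip("/").partition("/")
    # Without a separator there is no root segment to remove.
    return remainder if separator else path


print(strip_root("my-project/src/requirements.txt"))
print(strip_root("requirements.txt"))
```

With a scancode-toolkit scan of a directory named my-project, a datafile_path of my-project/src/requirements.txt would then be looked up as src/requirements.txt.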

datafile_path

datafile_resource

Type: ForeignKey to CodebaseResource

Datafile resource (related name: dependencies)

datafile_resource_id

Internal field, use datafile_resource instead.

datasource_id

Type: CharField

Datasource id. The identifier for the datafile handler used to obtain this dependency.

dependency_uid

Type: CharField

Dependency uid. The unique identifier of this dependency.

extracted_requirement

Type: CharField

Extracted requirement. The version requirements of this dependency.

for_package

Type: ForeignKey to DiscoveredPackage

For package (related name: dependencies)

for_package_id

Internal field, use for_package instead.

for_package_uid

id

Type: AutoField

Primary key: ID

is_optional

Type: BooleanField

Is optional

is_resolved

Type: BooleanField

Is resolved

is_runtime

Type: BooleanField

Is runtime

name

Type: CharField

Name. Name of the package.

namespace

Type: CharField

Namespace. Package name prefix, such as Maven groupid, Docker image owner, GitHub user or organization, etc.

property package_type

project

Type: ForeignKey to Project

Project (related name: discovereddependencies)

project_id

Internal field, use project instead.

property purl

qualifiers

Type: CharField

Qualifiers. Extra qualifying data for a package such as the name of an OS, architecture, distro, etc.

scope

Type: CharField

Scope. The scope of this dependency, how it is used in a project.

property spdx_id

subpath

Type: CharField

Subpath. Extra subpath within a package, relative to the package root.

type

Type: CharField

Type. A short code to identify the type of this package. For example: gem for a Rubygem, docker for a container, pypi for a Python Wheel or Egg, maven for a Maven Jar, deb for a Debian package, etc.

version

Type: CharField

Version. Version of the package.

CodebaseRelation

class scanpipe.models.CodebaseRelation

Relation between two CodebaseResource instances.

Parameters
  • uuid (UUIDField) – Primary key: UUID

  • extra_data (JSONField) – Extra data. Optional mapping of extra data key/values.

  • map_type (CharField) – Map type

Relationship fields:

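Parameters
  • project (ForeignKey to Project) – Project (related name: codebaserelations)

  • from_resource (ForeignKey to CodebaseResource) – From resource (related name: related_to)

  • to_resource (ForeignKey to CodebaseResource) – To resource (related name: related_from)

extra_data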

Type: JSONField

Extra data. Optional mapping of extra data key/values.

from_resource

Type: ForeignKey to CodebaseResource

From resource (related name: related_to)

from_resource_id

Internal field, use from_resource instead.

map_type

Type: CharField

Map type

project

Type: ForeignKey to Project

Project (related name: codebaserelations)

project_id

Internal field, use project instead.

to_resource

Type: ForeignKey to CodebaseResource

To resource (related name: related_from)

to_resource_id

Internal field, use to_resource instead.

uuid

Type: UUIDField

Primary key: UUID

ProjectError

class scanpipe.models.ProjectError

Stores errors and exceptions raised during a pipeline run.

Parameters
  • uuid (UUIDField) – Primary key: UUID

  • created_date (DateTimeField) – Created date

  • model (CharField) – Model. Name of the model class.

  • details (JSONField) – Details. Data that caused the error.

  • message (TextField) – Message. Error message.

  • traceback (TextField) – Traceback. Exception traceback.

Relationship fields:

Parameters

project (ForeignKey to Project) – Project (related name: projecterrors)
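
To make the roles of the model, details, message, and traceback fields concrete, the sketch below fills a plain-Python stand-in from a caught exception. ErrorRecord and record_from_exception are hypothetical names used for illustration; they are not part of scanpipe and do not touch the database.

```python
from dataclasses import dataclass, field
from traceback import format_exception


@dataclass
class ErrorRecord:
    """Hypothetical stand-in mirroring the ProjectError fields."""

    model: str
    message: str
    traceback: str
    details: dict = field(default_factory=dict)


def record_from_exception(error, model, details=None):
    # Accept the model either as a string or as a class.
    model_name = model if isinstance(model, str) else model.__name__
    trace = "".join(format_exception(type(error), error, error.__traceback__))
    return ErrorRecord(
        model=model_name,
        message=str(error),
        traceback=trace,
        details=details or {},
    )


try:
    int("not a number")
except ValueError as error:
    record = record_from_exception(
        error, "DiscoveredPackage", details={"size": "not a number"}
    )

print(record.model)
print(record.message)
```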

created_date

Type: DateTimeField

Created date

details

Type: JSONField

Details. Data that caused the error.

message

Type: TextField

Message. Error message.

model

Type: CharField

Model. Name of the model class.

project

Type: ForeignKey to Project

Project (related name: projecterrors)

project_id

Internal field, use project instead.

traceback

Type: TextField

Traceback. Exception traceback.

uuid

Type: UUIDField

Primary key: UUID

Run

class scanpipe.models.Run

The Database representation of a pipeline execution.

Parameters

Relationship fields:

Parameters

project (ForeignKey to Project) – Project (related name: runs)

deliver_project_subscriptions()

Triggers related project webhook subscriptions.

execute_task_async()

Enqueues the pipeline execution task for an asynchronous execution.

make_pipeline_instance()

Return a pipeline instance using this Run’s pipeline_class.

profile(print_results=False)

Return computed execution times for each step in the current Run.

If print_results is provided, the results are printed to stdout.

set_current_step(message)

Set the message value on the current_step field. Truncate the value at 256 characters.
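
The truncation described above amounts to a simple slice. The helper below is a hypothetical sketch of that behaviour only; it does not save anything to the current_step field.

```python
MAX_STEP_LENGTH = 256  # limit stated in the description above


def truncate_step_message(message: str, max_length: int = MAX_STEP_LENGTH) -> str:
    """Keep at most max_length characters of the message."""
    return message[:max_length]


long_message = "x" * 300
print(len(truncate_step_message(long_message)))
```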

set_scancodeio_version()

Set the current ScanCode.io version on the scancodeio_version field.

sync_with_job()

Synchronise this Run instance with its related RQ Job. This is required when a Run gets out of sync with its Job, which can happen when the worker or one of its processes is killed: the Run status is then not properly updated and may stay in a Queued or Running state forever. If the Run is out of sync with its related Job, the Run status is updated accordingly. If the Run was in the queue, it is enqueued again.

created_date

Type: DateTimeField

Created date

current_step

Type: CharField

Current step

description

Type: TextField

Description

log

Type: TextField

Log

property pipeline_class

Return this Run pipeline_class.

pipeline_name

Type: CharField

Pipeline name. Identify a registered Pipeline class.

project

Type: ForeignKey to Project

Project (related name: runs)

project_id

Internal field, use project instead.

scancodeio_version

Type: CharField

Scancodeio version

task_end_date

Type: DateTimeField

Task end date

task_exitcode

Type: IntegerField

Task exitcode

task_id

Type: UUIDField

Task id

task_output

Type: TextField

Task output

task_start_date

Type: DateTimeField

Task start date

uuid

Type: UUIDField

Primary key: UUID