Basic Options
All “Basic” Scan Options
- -c, --copyright
  Scan <input> for copyrights.
  Sub-Options:
    --consolidate
- -l, --license
  Scan <input> for licenses.
  Sub-Options:
    --license-references
    --license-text
    --license-text-diagnostics
    --license-diagnostics
    --license-url-template TEXT
    --license-score INT
    --license-clarity-score
    --consolidate
    --unknown-licenses
- -p, --package
  Scan <input> for packages.
  Sub-Options:
    --consolidate
- --system-package
  Scan <input> for installed system package databases.
- -e, --email
  Scan <input> for emails.
  Sub-Options:
    --max-email INT
- -u, --url
  Scan <input> for urls.
  Sub-Options:
    --max-url INT
- -i, --info
  Scan for and include information such as:
    Size,
    Type,
    Date,
    Programming language,
    sha1 and md5 hashes,
    binary/text/archive/media/source/script flags
  Additional options through more CLI options
  Sub-Options:
    --mark-source
Note
Unlike previous 2.x versions, -c, -l, and -p are not enabled by default. If any combination of these options is used, ScanCode performs only those tasks and not the others. For example, scancode -l scans only for licenses and does not scan for copyrights, packages, general information, emails, or urls. The one notable exception: a --package scan also includes license information for package manifests and top-level packages, which is derived regardless of whether the --license option is used.
Note
The options -c, -l, -p, -e, -u, and -i can be combined. For example, instead of scancode -c -i -p, you can write scancode -cip with the same effect.
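The equivalence between a combined flag such as -cip and the separate flags -c -i -p can be illustrated with a small Python sketch. The helper `expand_short_flags` is a hypothetical name used only for this illustration; it does not describe how ScanCode parses its command line.

```python
def expand_short_flags(arg: str) -> list[str]:
    """Expand a cluster such as '-cip' into ['-c', '-i', '-p'].

    Long options ('--license') and single short flags are returned unchanged.
    """
    if arg.startswith("--") or not arg.startswith("-") or len(arg) <= 2:
        return [arg]
    # Every letter after the leading dash becomes its own short flag.
    return [f"-{letter}" for letter in arg[1:]]


combined = expand_short_flags("-cip")
separate = ["-c", "-i", "-p"]
print(combined == separate)
```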
- --generated
  Classify automatically generated code files with a flag.
- --max-email INT
  Report only up to INT emails found in a file. Use 0 for no limit. [Default: 50]
  Sub-Option of:
    --email
- --max-url INT
  Report only up to INT urls found in a file. Use 0 for no limit. [Default: 50]
  Sub-Option of:
    --url
- --license-score INTEGER
  Do not return license matches with scores lower than this score. A number between 0 and 100. [Default: 0] A bigger number means a better match, so setting a higher license score sets a higher threshold and yields an equal or smaller number of matches.
  Sub-Option of:
    --license
- --license-text
  Include the matched text for the detected licenses in the output report.
  Sub-Option of:
    --license
  Sub-Options:
    --license-text-diagnostics
- --license-url-template TEXT
  Set the template URL used for the license reference URLs. In a template URL, curly braces ({}) are replaced by the license key. [Default: https://scancode-licensedb.aboutcode.org/{}]
  Sub-Option of:
    --license
- --license-text-diagnostics
  In the matched license text, include diagnostic highlights that surround unmatched words with square brackets [].
  Sub-Option of:
    --license and --license-text
- --license-diagnostics
  In license detections, include diagnostic details that show which license detection post-processing steps were applied.
  Sub-Option of:
    --license
- --unknown-licenses
  [EXPERIMENTAL] Detect unknown licenses.
  Sub-Option of:
    --license
--copyright Option
The --copyright option detects copyright statements in files.
It adds the following resource-level attributes:
copyrights: A list of mappings, each with a copyright attribute containing the whole copyright statement, plus start_line and end_line giving the line numbers in the file where that statement was detected.
holders: A list of mappings, each with a holder attribute containing the whole copyright holder value, plus start_line and end_line giving the line numbers in the file where that holder was detected.
authors: A list of mappings, each with an author attribute containing the whole author value, plus start_line and end_line giving the line numbers in the file where that author was detected.
Example:
    #
    # Copyright (c) 2010 Patrick McHardy All rights reserved.
    # Authors: Patrick McHardy <kaber@trash.net>
The above lines, when scanned for copyrights, generate the following results for the discussed attributes:
    {
      "copyrights": [
        {
          "copyright": "Copyright (c) 2010 Patrick McHardy",
          "start_line": 2,
          "end_line": 2
        }
      ],
      "holders": [
        {
          "holder": "Patrick McHardy",
          "start_line": 2,
          "end_line": 2
        }
      ],
      "authors": [
        {
          "author": "Patrick McHardy <kaber@trash.net>",
          "start_line": 3,
          "end_line": 3
        }
      ]
    }
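As a rough sketch of how these attributes might be consumed, the following Python snippet loads the example result above and prints one line per detection. The helper name `summarize` and the output format are assumptions made for illustration; they are not part of ScanCode.

```python
import json

# File-level attributes copied from the example result above.
resource = json.loads("""
{
  "copyrights": [{"copyright": "Copyright (c) 2010 Patrick McHardy", "start_line": 2, "end_line": 2}],
  "holders": [{"holder": "Patrick McHardy", "start_line": 2, "end_line": 2}],
  "authors": [{"author": "Patrick McHardy <kaber@trash.net>", "start_line": 3, "end_line": 3}]
}
""")


def summarize(resource: dict) -> list[str]:
    """Return one 'kind: value (lines a-b)' string per detection."""
    lines = []
    pairs = (("copyrights", "copyright"), ("holders", "holder"), ("authors", "author"))
    for list_key, value_key in pairs:
        for entry in resource.get(list_key, []):
            span = f"lines {entry['start_line']}-{entry['end_line']}"
            lines.append(f"{value_key}: {entry[value_key]} ({span})")
    return lines


summary = summarize(resource)
for line in summary:
    print(line)
```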
--license Option
The --license option detects various kinds of license texts, notices, tags, references, and other specialized license declarations, such as SPDX license identifiers, in files.
It adds the following attributes to the file data:
license_detections: A list of license detections, each with the license expression, detection log, and license matches. Each license match contains its license expression, score, further details about the detected license and the matched rule, and optionally the matched text.
license_clues: A list of license matches, with the same structure as matches in license_detections. These are mere license clues and not perfect detections.
detected_license_expression: A ScanCode license expression string.
detected_license_expression_spdx: The SPDX version of detected_license_expression.
percentage_of_license_text: The percentage of the scanned resource that consists of legalese words.
Example:
    License: Apache-2.0
If we run license detection (with --license-text) on the above text, we get the following result for the resource attributes added by the license detection:
    {
      "path": "apache-2.0.txt",
      "type": "file",
      "detected_license_expression": "apache-2.0",
      "detected_license_expression_spdx": "Apache-2.0",
      "license_detections": [
        {
          "license_expression": "apache-2.0",
          "matches": [
            {
              "score": 100.0,
              "start_line": 1,
              "end_line": 1,
              "matched_length": 4,
              "match_coverage": 100.0,
              "matcher": "1-hash",
              "license_expression": "apache-2.0",
              "rule_identifier": "apache-2.0_65.RULE",
              "rule_relevance": 100,
              "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/apache-2.0_65.RULE",
              "matched_text": "License: Apache-2.0"
            }
          ],
          "identifier": "apache_2_0-ec759ae0-ea5a-f138-793e-388520e080c0"
        }
      ],
      "license_clues": [],
      "percentage_of_license_text": 100.0,
      "scan_errors": []
    }
The top level of the results also lists the unique license detections. Each entry carries the same identifier, which references all occurrences of that license detection, together with a count:
    {
      "license_detections": [
        {
          "identifier": "apache_2_0-ec759ae0-ea5a-f138-793e-388520e080c0",
          "license_expression": "apache-2.0",
          "detection_count": 1
        }
      ]
    }
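To make the nesting of detections and matches concrete, here is a minimal Python sketch that walks a trimmed copy of the file-level result above and flattens it into rows. The helper name `list_matches` and the choice of columns are assumptions for illustration only.

```python
import json

# Trimmed copy of the file-level example result above.
resource = json.loads("""
{
  "path": "apache-2.0.txt",
  "detected_license_expression": "apache-2.0",
  "license_detections": [
    {
      "license_expression": "apache-2.0",
      "matches": [
        {
          "score": 100.0,
          "matcher": "1-hash",
          "license_expression": "apache-2.0",
          "rule_identifier": "apache-2.0_65.RULE",
          "matched_text": "License: Apache-2.0"
        }
      ],
      "identifier": "apache_2_0-ec759ae0-ea5a-f138-793e-388520e080c0"
    }
  ],
  "license_clues": []
}
""")


def list_matches(resource: dict) -> list[tuple]:
    """Flatten detections into (identifier, expression, score, rule) rows."""
    rows = []
    for detection in resource.get("license_detections", []):
        for match in detection.get("matches", []):
            rows.append((
                detection["identifier"],
                match["license_expression"],
                match["score"],
                match["rule_identifier"],
            ))
    return rows


rows = list_matches(resource)
for row in rows:
    print(row)
```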
--package Option
The --package option detects various package manifests, lockfiles, and package-like data, then assembles codebase-level packages and dependencies from the package data detected in files. It also tags files that are part of those packages.
It adds the following attributes to the file data:
package_data: A list of package data mappings parsed and retrieved from the file, with fields for the package URL, license detections, copyrights, dependencies, and the various URLs.
for_packages: A list of strings pointing to the packages that the file is a part of. Each string is a package URL with a UUID as a qualifier.
It adds the following attributes to the top level of the results:
packages: A list of package data mappings with all the attributes present in the file-level package_data, plus the following extra attributes: package_uid, datafile_paths and datasource_ids.
dependencies: A list of dependency data mappings from all the lockfiles or package manifests in the scan.
Example:
The following scan result was generated from scanning a package manifest:
{ "dependencies": [ { "purl": "pkg:bower/get-size", "extracted_requirement": "~1.2.2", "scope": "dependencies", "is_runtime": true, "is_optional": false, "is_resolved": false, "resolved_package": {}, "extra_data": {}, "dependency_uid": "pkg:bower/get-size?uuid=fixed-uid-done-for-testing-5642512d1758", "for_package_uid": "pkg:bower/blue-leaf?uuid=fixed-uid-done-for-testing-5642512d1758", "datafile_path": "bower.json", "datasource_id": "bower_json" } ], "packages": [ { "type": "bower", "namespace": null, "name": "blue-leaf", "version": null, "qualifiers": {}, "subpath": null, "primary_language": null, "description": "Physics-like animations for pretty particles", "release_date": null, "parties": [ { "type": null, "role": "author", "name": "Betty Beta <bbeta@example.com>", "email": null, "url": null } ], "keywords": [ "motion", "physics", "particles" ], "homepage_url": null, "download_url": null, "size": null, "sha1": null, "md5": null, "sha256": null, "sha512": null, "bug_tracking_url": null, "code_view_url": null, "vcs_url": null, "copyright": null, "declared_license_expression": "mit", "declared_license_expression_spdx": "MIT", "license_detections": [ { "license_expression": "mit", "matches": [ { "score": 100.0, "start_line": 1, "end_line": 1, "matched_length": 1, "match_coverage": 100.0, "matcher": "1-spdx-id", "license_expression": "mit", "rule_identifier": "spdx-license-identifier: mit", "rule_url": null, "rule_relevance": 100, "matched_text": "MIT" } ], "identifier": "apache_2_0-ec759abc-ea5a-2a38-793e-312340e080c0" } ], "other_license_expression": null, "other_license_expression_spdx": null, "other_license_detections": [], "extracted_license_statement": "MIT", "notice_text": null, "source_packages": [], "extra_data": {}, "repository_homepage_url": null, "repository_download_url": null, "api_data_url": null, "package_uid": "pkg:bower/blue-leaf?uuid=fixed-uid-done-for-testing-5642512d1758", "datafile_paths": [ "bower.json" ], "datasource_ids": [ "bower_json" ], 
"purl": "pkg:bower/blue-leaf" } ], "files": [ { "path": "bower.json", "type": "file", "package_data": [ { "type": "bower", "namespace": null, "name": "blue-leaf", "version": null, "qualifiers": {}, "subpath": null, "primary_language": null, "description": "Physics-like animations for pretty particles", "release_date": null, "parties": [ { "type": null, "role": "author", "name": "Betty Beta <bbeta@example.com>", "email": null, "url": null } ], "keywords": [ "motion", "physics", "particles" ], "homepage_url": null, "download_url": null, "size": null, "sha1": null, "md5": null, "sha256": null, "sha512": null, "bug_tracking_url": null, "code_view_url": null, "vcs_url": null, "copyright": null, "declared_license_expression": "mit", "declared_license_expression_spdx": "MIT", "license_detections": [ { "license_expression": "mit", "matches": [ { "score": 100.0, "start_line": 1, "end_line": 1, "matched_length": 1, "match_coverage": 100.0, "matcher": "1-spdx-id", "license_expression": "mit", "rule_identifier": "spdx-license-identifier: mit", "rule_url": null, "rule_relevance": 100, "matched_text": "MIT" } ], "identifier": "apache_2_0-ec759abc-ea5a-2a38-793e-312340e080c0" } ], "other_license_expression": null, "other_license_expression_spdx": null, "other_license_detections": [], "extracted_license_statement": "MIT", "notice_text": null, "source_packages": [], "file_references": [], "extra_data": {}, "dependencies": [ { "purl": "pkg:bower/get-size", "extracted_requirement": "~1.2.2", "scope": "dependencies", "is_runtime": true, "is_optional": false, "is_resolved": false, "resolved_package": {}, "extra_data": {} } ], "repository_homepage_url": null, "repository_download_url": null, "api_data_url": null, "datasource_id": "bower_json", "purl": "pkg:bower/blue-leaf" } ], "for_packages": [ "pkg:bower/blue-leaf?uuid=fixed-uid-done-for-testing-5642512d1758" ], "scan_errors": [] } ] }
--info Option
The --info option obtains miscellaneous information about the file being scanned, such as mime/filetype, checksums, programming language, and various boolean flags.
It adds the following attributes to the file data:
date: last modified date of the file.
sha1, md5 and sha256: file checksums of various algorithms.
mime_type and file_type: basic file type and mime type/subtype information obtained from libmagic.
programming_language: programming language based on extensions.
is_binary, is_text, is_archive, is_media, is_source, and is_script: various boolean flags with miscellaneous information about the file.
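For a feel of what some of these values look like, the sketch below computes a size, a last-modified date, and the three checksums for a temporary file using only the Python standard library. This is not ScanCode's implementation; the helper name `basic_info` and the ISO date format are assumptions, and the libmagic-based attributes and boolean flags are left out.

```python
import hashlib
import os
import tempfile
from datetime import date


def basic_info(path: str) -> dict:
    """Compute a few --info style attributes for one file."""
    with open(path, "rb") as handle:
        data = handle.read()
    return {
        "size": len(data),
        # Last modified date; the ISO format here is an assumption.
        "date": date.fromtimestamp(os.path.getmtime(path)).isoformat(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write(b"hello\n")

info = basic_info(tmp.name)
os.unlink(tmp.name)
print(info)
```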
--email Option
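The --email option detects and reports emails present in scanned files.
It adds the emails attribute to the file data with the following attributes: email with the actual email that was present in the file, and start_line and end_line to locate where the email was detected in the file.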
--url Option
The --url option detects and reports URLs present in scanned files.
It adds the urls attribute to the file data with the following attributes: url with the actual URL that was present in the file, and start_line and end_line to locate where the URL was detected in the file.
--generated Option
The --generated option classifies automatically generated code files with a flag.
An example of using --generated in a scan:
    scancode -clpieu --json-pp output.json samples --generated
In the results, the following attribute is added for each file with its corresponding true/false value:
    "is_generated": true
A file is classified as generated or not based on whether its first few lines contain keywords commonly found in generated files.
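The keyword-based idea can be sketched in a few lines of Python. The keyword tuple, the 10-line window, and the function name `looks_generated` are all hypothetical choices made for this illustration; ScanCode's actual keyword list and line limit are not given in this section.

```python
# Hypothetical keyword list, for illustration only.
GENERATED_HINTS = ("generated by", "auto-generated", "autogenerated", "do not edit")


def looks_generated(text: str, max_lines: int = 10) -> bool:
    """Return True if any of the first `max_lines` lines contains a hint."""
    head = text.splitlines()[:max_lines]
    return any(hint in line.lower() for line in head for hint in GENERATED_HINTS)


sample = "// Code generated by protoc-gen-go. DO NOT EDIT.\npackage main\n"
is_generated = looks_generated(sample)
print({"is_generated": is_generated})
```

Only the head of the file is inspected, so a matching phrase that appears further down does not set the flag in this sketch.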
--max-email Option
Dependency
The option --max-email is a sub-option of and requires the option --email.
If individual scanned files contain many emails (for example, in long lists) that are unnecessary and clutter the scan results, the --max-email option can be used to cap the number of emails reported per file.
Some important INTEGER values of the --max-email INTEGER option:
0 - No limit, include all emails.
50 - Default.
An example usage:
    scancode -clpieu --json-pp output.json samples --max-email 5
This reports at most 5 email addresses per file and ignores the rest.
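The limit semantics (a positive cap, with 0 meaning no limit and 50 as the default) can be sketched as follows. The helper `limit_results` is a hypothetical name, and keeping the first entries in order of appearance is an assumption; this section only states that the rest are ignored. The same rule is documented for --max-url.

```python
def limit_results(items: list, maximum: int = 50) -> list:
    """Keep at most `maximum` items; a value of 0 means no limit."""
    if maximum == 0:
        return list(items)
    return list(items[:maximum])


# Eight made-up detections for one file.
emails = [
    {"email": f"user{i}@example.com", "start_line": i, "end_line": i}
    for i in range(1, 9)
]

limited = limit_results(emails, maximum=5)
unlimited = limit_results(emails, maximum=0)
print(len(limited), len(unlimited))
```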
--max-url Option
Dependency
The option --max-url is a sub-option of and requires the option --url.
If individual scanned files contain many links to other websites (for example, in long url lists) that are unnecessary and clutter the scan results, the --max-url option can be used to cap the number of urls reported per file.
Some important INTEGER values of the --max-url INTEGER option:
0 - No limit, include all urls.
50 - Default.
An example usage:
    scancode -clpieu --json-pp output.json samples --max-url 10
This reports at most 10 urls per file and ignores the rest.
--license-score Option
Dependency
The option --license-score is a sub-option of and requires the option --license.
License matching strictness, i.e. how closely a text must match a license to be reported in a scan, can be adjusted with the --license-score option.
Some important INTEGER values of the --license-score INTEGER option:
0 - Default and lowest value. All matches are reported.
100 - Highest value. Only matches with a score of 100 are reported.
A bigger number means a better match, so setting a higher license score sets a higher threshold for matching licenses and yields an equal or smaller number of license matches.
An example usage:
    scancode -clpieu --json-pp output.json samples --license-score 70
The license results with the value set to 100, compared with the default value 0, are shown below as visualized with ScanCode Workbench in the License Info Dashboard.
[Figure: License scan results of the samples directory, License Score 0 (default).]
[Figure: License scan results of the samples directory, License Score 100.]
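The effect of the threshold can be sketched as a simple filter over match scores. The function `filter_matches` and the sample matches are illustrative assumptions; ScanCode applies the threshold during detection, not as a post-processing step on the output.

```python
def filter_matches(matches: list[dict], min_score: int = 0) -> list[dict]:
    """Drop matches whose score is lower than `min_score`."""
    return [match for match in matches if match["score"] >= min_score]


# Made-up matches with scores between 0 and 100.
matches = [
    {"license_expression": "wtfpl-2.0", "score": 50.0},
    {"license_expression": "mit", "score": 100.0},
    {"license_expression": "zlib", "score": 85.5},
]

for threshold in (0, 70, 100):
    kept = filter_matches(matches, threshold)
    print(threshold, [match["license_expression"] for match in kept])
```

Raising the threshold can only keep the same matches or remove some, which is why a higher score never produces more matches.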
--license-text Option
Dependency
The option --license-text is a sub-option of and requires the option --license.
Sub-Option
The option --license-text-diagnostics is a sub-option of --license-text.
With the --license-text option, the matched_text attribute in the scan results includes the matched text for the detected license.
An example scan:
    scancode -cplieu --json-pp output.json samples --license-text
An example matched text included in the results is as follows:
    "matched_text": " This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software. Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions: 1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required. 2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software. 3. This notice may not be removed or altered from any source distribution. Jean-loup Gailly Mark Adler jloup@gzip.org madler@alumni.caltech.edu"
The file in which this license was detected: samples/arch/zlib.tar.gz-extract/zlib-1.2.8/zlib.h
License name: “ZLIB License”
--license-url-template Option
Dependency
The option --license-url-template is a sub-option of and requires the option --license.
The --license-url-template option sets the template URL used for the license reference URLs.
The default template URL is: https://enterprise.dejacode.com/urn/urn:dje:license:{}
In a template URL, curly braces ({}) are replaced by the license key.
So, by default the license reference URL points to the dejacode page for that license.
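The substitution itself can be sketched in Python. The function name `license_reference_url` is hypothetical, and the use of a plain string replacement is an assumption about the mechanism; the two templates are the ones that appear in this section.

```python
def license_reference_url(template: str, license_key: str) -> str:
    """Fill the {} placeholder of a template URL with a license key."""
    return template.replace("{}", license_key)


default_template = "https://enterprise.dejacode.com/urn/urn:dje:license:{}"
custom_template = (
    "https://github.com/nexB/scancode-toolkit/tree/develop/"
    "src/licensedcode/data/licenses/{}.yml"
)

default_url = license_reference_url(default_template, "zlib")
custom_url = license_reference_url(custom_template, "zlib")
print(default_url)
print(custom_url)
```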
A scan example using the --license-url-template TEXT option:
    scancode -clpieu --json-pp output.json samples --license-url-template https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/{}.yml
In a normal scan, the reference URL for “ZLIB License” is as follows:
    "reference_url": "https://enterprise.dejacode.com/urn/urn:dje:license:zlib",
After using the option in the following manner:
    --license-url-template https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/{}.yml
the reference URL changes to this zlib.yml file:
    "reference_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/zlib.yml",
The reference URL changes for all detected licenses throughout the scan result file.
--license-text-diagnostics Option
Dependency
The option --license-text-diagnostics is a sub-option of and requires the options --license and --license-text.
With this option, the matched license text includes diagnostic highlights: words that are not matched are surrounded with square brackets [].
In a normal scan, whole lines of text are included in the matched license text, including parts that are possibly unmatched.
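As a rough sketch of how the bracketed output might be post-processed, the snippet below pulls the unmatched words out of a diagnostic string and rebuilds the plain text. The sample is a shortened form of the diagnostic output for bouncycastle.txt shown in this section. The regular expression is an assumption that works for this sample; license text that itself contains square brackets would be misread.

```python
import re

# Shortened diagnostic matched text.
matched_text = (
    "License [Copyright] ([c]) [2000] - [2006] [The] [Legion] [Of] [The] "
    "[Bouncy] [Castle] Permission is hereby granted"
)

# A bracketed run without nested brackets marks one unmatched word.
BRACKETED = re.compile(r"\[([^\[\]]+)\]")

unmatched_words = BRACKETED.findall(matched_text)
plain_text = BRACKETED.sub(r"\1", matched_text)

print(unmatched_words)
print(plain_text)
```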
An example scan:
    scancode -cplieu --json-pp output.json samples --license-text --license-text-diagnostics
Running a scan on the samples directory with the --license-text --license-text-diagnostics options causes the following difference in the scan result of the file samples/JGroups/licenses/bouncycastle.txt.
Without Diagnostics:
    "matched_text": "License Copyright (c) 2000 - 2006 The Legion Of The Bouncy Castle (http://www.bouncycastle.org) Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction
With Diagnostics on:
    "matched_text": "License [Copyright] ([c]) [2000] - [2006] [The] [Legion] [Of] [The] [Bouncy] [Castle] ([http]://[www].[bouncycastle].[org]) Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction,
--license-diagnostics Option
Dependency
The option --license-diagnostics is a sub-option of and requires the option --license.
When the --license-diagnostics option is used on a license scan, a detection_log attribute is added to license detections. It holds diagnostics information about the license detection post-processing steps that are used to create license detections from license matches.
Consider the following text:
    ## License
    All code, unless stated otherwise, is dual-licensed under [`WTFPL`](http://www.wtfpl.net/txt/copying/) and [`MIT`](https://opensource.org/licenses/MIT).
If we run a license scan with the --license-diagnostics option enabled, we have the following license detection results:
    {
      "path": "README.md",
      "type": "file",
      "detected_license_expression": "wtfpl-2.0 AND mit",
      "detected_license_expression_spdx": "WTFPL AND MIT",
      "license_detections": [
        {
          "license_expression": "wtfpl-2.0 AND mit",
          "matches": [
            {
              "score": 100.0,
              "start_line": 43,
              "end_line": 43,
              "matched_length": 3,
              "match_coverage": 100.0,
              "matcher": "2-aho",
              "license_expression": "unknown-license-reference",
              "rule_identifier": "lead-in_unknown_30.RULE",
              "rule_relevance": 100,
              "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/lead-in_unknown_30.RULE",
              "matched_text": "dual-licensed under [`"
            },
            {
              "score": 50.0,
              "start_line": 43,
              "end_line": 43,
              "matched_length": 1,
              "match_coverage": 100.0,
              "matcher": "2-aho",
              "license_expression": "wtfpl-2.0",
              "rule_identifier": "spdx_license_id_wtfpl_for_wtfpl-2.0.RULE",
              "rule_relevance": 50,
              "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/spdx_license_id_wtfpl_for_wtfpl-2.0.RULE",
              "matched_text": "WTFPL"
            },
            {
              "score": 100.0,
              "start_line": 43,
              "end_line": 43,
              "matched_length": 3,
              "match_coverage": 100.0,
              "matcher": "2-aho",
              "license_expression": "wtfpl-2.0",
              "rule_identifier": "wtfpl-2.0_27.RULE",
              "rule_relevance": 100,
              "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/wtfpl-2.0_27.RULE",
              "matched_text": "www.wtfpl.net/"
            },
            {
              "score": 100.0,
              "start_line": 43,
              "end_line": 43,
              "matched_length": 6,
              "match_coverage": 100.0,
              "matcher": "2-aho",
              "license_expression": "mit",
              "rule_identifier": "mit_64.RULE",
              "rule_relevance": 100,
              "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/mit_64.RULE",
              "matched_text": "MIT`](https://opensource.org/licenses/MIT)."
            }
          ],
          "detection_log": [
            "unknown-intro-followed-by-match"
          ],
          "identifier": "wtfpl_2_0_and_mit-e5642b07-705c-9730-80ab-f5ed0565be28"
        }
      ],
      "license_clues": [],
      "percentage_of_license_text": 8.18,
      "scan_errors": []
    }
Here, the added diagnostics information "detection_log": ["unknown-intro-followed-by-match"] shows that an unknown license intro match was followed by proper detections. The unknown intro is therefore treated as an introduction to the licenses that follow, and the license expression is derived from the license matches after the unknown intro.