Compare sentences from an input document with all sentences from a set of reference documents and find the ones that are very similar.
This is a command-line tool for checking the similarity between a given text and a set of reference documents. The tool uses the Jaccard similarity algorithm to compare the input text with the reference documents.
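As an illustration of the measure named above, here is a minimal sketch of Jaccard similarity between two sentences: the size of the intersection of their word sets divided by the size of the union. The function name `jaccard_similarity` and the tokenization (lowercasing, then splitting on runs of word characters) are assumptions made for this example, not necessarily what the tool does internally.

```python
import re


def jaccard_similarity(sentence_a: str, sentence_b: str) -> float:
    """Return |A & B| / |A | B| for the word sets A and B of two sentences."""
    # Assumption: lowercase the text and treat each run of word characters as a token.
    words_a = set(re.findall(r"\w+", sentence_a.lower()))
    words_b = set(re.findall(r"\w+", sentence_b.lower()))
    union = words_a | words_b
    if not union:
        # Two empty sentences: define the similarity as 0.0 to avoid dividing by zero.
        return 0.0
    return len(words_a & words_b) / len(union)


# 3 shared words (the, quick, fox) out of 5 distinct words overall
print(jaccard_similarity("the quick brown fox", "the quick red fox"))  # 0.6
```

Because the measure works on sets, word order and repeated words do not affect the score.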
Install in an isolated environment using pipx (or normal pip):
pipx install sentence-plagiarism
To run the plagiarism checker, use the following command:
sentence-plagiarism <path-to-input-file> <path-to-reference-file-1> <path-to-reference-file-2> ... [--threshold <threshold-value>] [--output-file <path-to-output-file>] [--quiet]
<path-to-input-file>: Path to the input file to be checked for plagiarism.
<path-to-reference-file-1> ...: Paths to the reference files to compare against.
--threshold (optional): The minimum similarity score required to consider a sentence plagiarized. The value should be between 0 and 1.
--output-file (optional): Path to the output file where the results are saved in JSON format.
--quiet (optional): Flag to suppress the display of similar sentences in the console.
The following command:
sentence-plagiarism input.txt --reference-files ref1.txt ref2.txt --similarity-threshold 0.8 --output-file results.json
can produce the following output on stdout:
Input Sentence: The retriever and seq2seq modules commence their operations as pretrained models, and through a joint fine-tuning process, they adapt collaboratively, thus enhancing both retrieval and generation for specific downstream tasks.
Reference Sentence: foobar The retriever and seq2seq modules commence their operations as pretrained models, and through a joint fine-tuning process, they adapt collaboratively, thus enhancing both retrieval and generation for specific downstream tasks.
Reference Document: ref1.txt
Similarity Score: 0.9667
Input Sentence: Closing thoughts For a comprehensive understanding of the RAG technique, we offer an in-depth exploration, commencing with a simplified overview and progressively delving into more intricate technical facets.
Reference Sentence: barfoo For a comprehensive understanding of the RAG technique, we offer an in-depth exploration, commencing with a simplified overview and progressively delving into more intricate technical facets.
Reference Document: ref2.txt
Similarity Score: 0.8966
Results saved to results.json
and save the results to results.json.
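The same check can be run from Python:
from sentence_plagiarism import check

check(
    examined_file="txt/txt1.txt",  # document to check
    reference_files=["txt/txt2.txt", "txt/txt3.txt"],  # documents to compare against
    similarity_threshold=0.8,  # minimum score for a sentence pair to be reported
    output_file=None,  # optional path for the JSON results
    quiet=False,  # False: similar sentences are shown in the console
)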
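To make the comparison itself more concrete, the following is a rough sketch of what a sentence-level check of this kind might look like: every input sentence is scored against every reference sentence, and pairs at or above the threshold are kept. It is not the package's implementation. The helpers `split_sentences`, `jaccard` and `find_similar`, the sentence splitter, the tokenization, and the result field names (modeled on the console labels) are all assumptions made for illustration.

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the word sets of two sentences (assumed tokenization)."""
    set_a = set(re.findall(r"\w+", a.lower()))
    set_b = set(re.findall(r"\w+", b.lower()))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def find_similar(input_text: str, references: Dict[str, str], threshold: float = 0.8) -> List[dict]:
    """Compare each input sentence with each reference sentence; keep pairs scoring >= threshold."""
    results = []
    for sentence in split_sentences(input_text):
        for doc_name, ref_text in references.items():
            for ref_sentence in split_sentences(ref_text):
                score = jaccard(sentence, ref_sentence)
                if score >= threshold:
                    # Hypothetical record layout, mirroring the console labels.
                    results.append({
                        "input_sentence": sentence,
                        "reference_sentence": ref_sentence,
                        "reference_document": doc_name,
                        "similarity_score": round(score, 4),
                    })
    return results


# In-memory stand-ins for the input file and one reference file.
input_text = "The cat sat on the mat. Dogs bark loudly at night."
references = {"ref1.txt": "Yesterday the cat sat on the mat. Birds sing."}
results = find_similar(input_text, references, threshold=0.8)
for match in results:
    print(match)
```

Only the first input sentence is reported here: it shares 5 of 6 distinct words with the first reference sentence (score 0.8333), while every other pair shares no words. With this tokenization, the first pair in the sample console output shown earlier scores 29/30, about 0.9667, which matches the reported value, although the tool's own tokenizer may still differ.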
Distributed under the MIT License. See LICENSE for more information.
Krystian Safjan - ksafjan@gmail.com
Project Link: https://github.com/izikeros/sentence-plagiarism