job-offer-classifier
Classify job candidate emails
A sentiment classifier for emails from job candidates. It labels each email response according to whether the candidate expresses interest in the job position.
The sentiment classifier is available on PyPI, so you can install it with:
pip install job-offer-classifier
For an editable install, clone the GitHub repository, cd to the cloned repo directory, then run:
pip install -e job_offer_classifier
To load the data and run the data science pipeline, first import the Pipeline class:
from job_offer_classifier.pipeline_classifier import Pipeline
Instantiate the class Pipeline and call the pipeline method. This method loads the dataset, then trains and evaluates the model. The source file is the dataset of payloads annotated with 'positive' and 'negative' labels.
pl = Pipeline(src_file='../data/interim/payloads.csv', random_state=931696214)
pl.pipeline()
The parameter random_state is the pandas seed used in the dataframe split. Fixing it makes the results deterministic. The value used here was chosen from the results of the k-fold validation.
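To illustrate why a fixed seed makes a split reproducible, here is a minimal sketch using only the standard library. It is not the package's actual split, which according to the text relies on a pandas seed. The helper split_rows and the 80/20 ratio are assumptions made for illustration.

```python
import random

def split_rows(rows, test_fraction=0.2, seed=None):
    """Shuffle a copy of rows with a seeded generator and split it in two."""
    rng = random.Random(seed)  # independent generator, global state untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

rows = list(range(10))
train_a, test_a = split_rows(rows, seed=931696214)
train_b, test_b = split_rows(rows, seed=931696214)
print(train_a == train_b and test_a == test_b)  # same seed, same split
```

Calling the helper twice with the same seed yields identical train and test parts, which is the property the random_state parameter is meant to provide.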
To make a prediction, use the sentiment method:
pl.sentiment(''' Thank you for offering me the position of Merchandiser with Thomas Ltd.
I am thankful to accept this job offer and look ahead to starting my career with your company
on June 27, 2000.''')
'positive'
One can take an example from the test set, contained in the dfs attribute. This attribute is a dictionary of pandas dataframes.
example = pl.dfs['test'].sample(random_state=1213702178).payload.iloc[0]
print(example.strip())
thank you for offering me the position of financial analyst at Lozano-Carlson.
i was delighted to meet
you and learn more about the company.
although i verbally agreed to accept the position, i have given it a lot of thought and decided to turn
down the post.
i believe it is in my, and your company’s, best interests.
ultimately, i elected to take on a
position at a firm where i believe my skills and experience are a better fit. i truly apologise for any
inconvenience i have caused.
i was impressed with Lozano-Carlson during the interview, and continue to be at this time.
wishing you
all the best in the future and hope to still see you in attendance at the snow terrace financial conference
in june.
pl.sentiment(example)
'negative'
We use two tools to assess the performance of the model: the confusion matrix and k-fold validation.
To plot the confusion matrix, the Pipeline has the method plot_confusion_matrix.
pl.plot_confusion_matrix('train')
pl.plot_confusion_matrix('test')
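For readers unfamiliar with what such a plot summarizes, the sketch below counts the cells of a confusion matrix in plain Python. The labels are hypothetical and the helper confusion_counts is not part of the package. It only shows the quantity that a confusion matrix tabulates.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))

# Hypothetical labels, for illustration only.
y_true = ['positive', 'positive', 'negative', 'positive', 'negative']
y_pred = ['positive', 'negative', 'negative', 'positive', 'positive']

counts = confusion_counts(y_true, y_pred)
labels = sorted(set(y_true) | set(y_pred))
for true_label in labels:
    # One row per true label, one column per predicted label.
    print(true_label, [counts[(true_label, pred)] for pred in labels])
```

Rows correspond to true labels and columns to predicted labels, so the diagonal holds the correctly classified examples.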
To assess the performance of the model with k-fold validation, import the class KFoldPipe:
from job_offer_classifier.validations import KFoldPipe
Run the k_fold_validation method:
kfp = KFoldPipe(src_file='../data/interim/payloads.csv', n_splits=4)
kfp.k_fold_validation()
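As a rough picture of what splitting into n_splits folds means, the following sketch partitions row indices into contiguous folds, each serving once as the test set. The function k_fold_indices is hypothetical. The package may shuffle or stratify the data differently, which the source does not specify.

```python
def k_fold_indices(n_rows, n_splits):
    """Yield (train_indices, test_indices) for each of n_splits folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // n_splits + (1 if i < n_rows % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(k_fold_indices(n_rows=10, n_splits=4))
for train, test in folds:
    print(len(train), test)
```

Every row appears in exactly one test fold, so each example is used for evaluation once and for training in the remaining folds.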
The averaged scores are stored in the averages attribute:
kfp.averages['train']
{'accuracy': 0.9954212456941605,
'accuracy_baseline': 0.7985348105430603,
'auc': 0.9987489432096481,
'auc_precision_recall': 0.9996496587991714,
'average_loss': 0.02481173211708665,
'label/mean': 0.7985348105430603,
'loss': 0.03453406784683466,
'precision': 0.9954595416784286,
'prediction/mean': 0.7989358454942703,
'recall': 0.9988532066345215,
'global_step': 12500.0,
'f1_score': 0.9971447710408015}
kfp.averages['test']
{'accuracy': 0.980555534362793,
'accuracy_baseline': 0.800000011920929,
'auc': 0.995563268661499,
'auc_precision_recall': 0.9989252239465714,
'average_loss': 0.060208675917238,
'label/mean': 0.800000011920929,
'loss': 0.060208675917238,
'precision': 0.986666664481163,
'prediction/mean': 0.8020820915699005,
'recall': 0.9895833283662796,
'global_step': 12500.0,
'f1_score': 0.9880000766313914}
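The sketch below shows one way per-fold scores could be averaged and how an F1 score follows from precision and recall. It assumes the reported averages are simple means over folds, which the source does not confirm. The per-fold values and both helper functions are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def average_metrics(fold_metrics):
    """Average each metric over a list of per-fold dictionaries."""
    keys = fold_metrics[0].keys()
    return {k: sum(m[k] for m in fold_metrics) / len(fold_metrics) for k in keys}

# Hypothetical per-fold scores, not taken from the package.
fold_metrics = [
    {'precision': 1.0, 'recall': 0.5},
    {'precision': 0.5, 'recall': 1.0},
]
for m in fold_metrics:
    m['f1_score'] = f1_score(m['precision'], m['recall'])

averages = average_metrics(fold_metrics)
print(averages)
print(f1_score(averages['precision'], averages['recall']))
```

In this toy case the mean of the per-fold F1 scores differs from the F1 computed from the mean precision and recall. That is one possible reason an averaged f1_score need not match exactly what the averaged precision and recall would give.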
The seed that produced the best F1 score is stored in best_seed:
kfp.best_seed
427851256
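A minimal sketch of how such a seed might be selected, assuming each fold is run with its own seed and the seed with the highest test F1 is kept. The source only states that best_seed stores the seed of the best F1 score, so both the selection rule and the values below are illustrative assumptions.

```python
# Hypothetical (seed, test F1) pairs, one per fold; the values are made up.
fold_results = [
    (101, 0.97),
    (202, 0.99),
    (303, 0.98),
]

# Keep the pair whose F1 score is largest.
best_seed, best_f1 = max(fold_results, key=lambda pair: pair[1])
print(best_seed, best_f1)
```

A seed chosen this way can then be passed as random_state to reproduce the corresponding split.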
The library also supports more than two label classes. The following statement imports the multiclass classifier:
from job_offer_classifier.multiclass import Multiclass
The sibatel_web_intekglobal_payloads.csv file contains three types of sentiment: 'positive', 'negative' and 'neutral'. Instantiate the Multiclass class by specifying the number of classes:
mc = Multiclass(
    src_file='../data/raw/sibatel_web_intekglobal_payloads.csv',
    random_state=931696214,
    n_classes=3
)
mc.pipeline()
mc.plot_confusion_matrix('train')
mc.plot_confusion_matrix('test')
For more on the training parameters and on how to store and load the trained models, please refer to the pipeline docs and multiclass docs. The validation method is described in the validations docs.
https://www.tensorflow.org/hub/tutorials/text_classification_with_tf_hub
FAQs
Classification of Job Offer Responses
We found that job-offer-classifier demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.