df | (:class:pandas.DataFrame , optional , defaults to None) | DataFrame containing text and one-hot encoded features. |
text_column | (:obj:string , optional , defaults to "text") | Column in df containing text. |
features | (:obj:string , optional , defaults to None) | Comma-separated string of features for which data may be augmented. |
device | (:class:torch.device , optional , 'cuda' or 'cpu') | Torch device to run on: 'cuda' if available, otherwise 'cpu'. |
model | (:class:~transformers.T5ForConditionalGeneration , optional , defaults to T5ForConditionalGeneration.from_pretrained('t5-small')) | Model used for abstractive summarization. |
tokenizer | (:class:~transformers.T5Tokenizer , optional , defaults to T5Tokenizer.from_pretrained('t5-small')) | Tokenizer used for abstractive summarization. |
return_tensors | (:obj:str, optional , defaults to "pt") | Can be set to 'tf', 'pt' or 'np' to return, respectively, a TensorFlow tf.constant, a PyTorch torch.Tensor or a NumPy :obj:np.ndarray instead of a list of Python integers. |
num_beams | (:obj:int , optional , defaults to 4) | Number of beams for beam search. Must be at least 1; 1 means no beam search. |
no_repeat_ngram_size | (:obj:int , optional , defaults to 4) | If set to int > 0, all ngrams of size no_repeat_ngram_size can only occur once. |
min_length | (:obj:int , optional , defaults to 10) | The minimum length of the sequence to be generated. Must be at least 0. |
max_length | (:obj:int , optional , defaults to 50) | The maximum length of the sequence to be generated. Must be at least min_length. |
early_stopping | (:obj:bool , optional , defaults to True) | If True, beam search stops when at least num_beams sentences are finished per batch. |
skip_special_tokens | (:obj:bool , optional , defaults to True) | If True, special tokens (self.all_special_tokens) are not decoded. |
num_samples | (:obj:int , optional , defaults to 100) | Number of samples to pull from the dataframe for a specific feature when generating a new sample with abstractive summarization. |
threshold | (:obj:int , optional , defaults to 3500) | Ceiling for each feature, normally the under-sample max. |
multiproc | (:obj:bool , optional , defaults to True) | If set, calls to abstractive summarization are stored in an array, which is then passed to run_cpu_tasks_in_parallel to improve performance through multiprocessing. |
debug | (:obj:bool , optional , defaults to True) | If set, prints generated summarizations. |
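
To make the interplay between `features` and `threshold` more concrete, the following is a minimal sketch, not the library's actual implementation. It assumes that each listed feature is topped up until it reaches `threshold`. The helper name `append_counts` is hypothetical, and a plain list of dicts stands in for the one-hot encoded dataframe so the example runs without extra dependencies.

```python
def append_counts(rows, features, threshold=3500):
    """Hypothetical helper: how many new samples each feature would need.

    rows      -- list of dicts standing in for the one-hot encoded dataframe
    features  -- comma-separated string of feature names, as in the table above
    threshold -- ceiling for each feature
    """
    # Split the comma-separated string and drop surrounding whitespace.
    names = [name.strip() for name in features.split(",") if name.strip()]
    counts = {}
    for name in names:
        # Count the rows in which the one-hot column for this feature is set.
        current = sum(1 for row in rows if row.get(name) == 1)
        # Features already at or above the ceiling need no new samples.
        counts[name] = max(threshold - current, 0)
    return counts


rows = [
    {"text": "great battery life", "battery": 1, "screen": 0},
    {"text": "screen cracked quickly", "battery": 0, "screen": 1},
    {"text": "battery drains fast", "battery": 1, "screen": 0},
]
result = append_counts(rows, "battery, screen", threshold=3)
print(result)  # {'battery': 1, 'screen': 2}
```

Under this assumption, a feature that already meets the ceiling is left untouched, which is why `threshold` is normally set to the under-sample max.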