Wide-Analysis is a suite of tools for analyzing deletion discussions on MediaWiki platforms. It is designed to help researchers and practitioners understand the dynamics of deletion discussions and to develop tools that support the decision-making process on Wikipedia. The suite includes tools for collecting, processing, and analyzing deletion discussions. The following sections give an overview of its functionalities.
You can install the package from PyPI using the following command:
pip install wide-analysis
After installation, you can import the package and start using its functionalities.
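For example, the two entry points used throughout this documentation, collect (dataset creation) and analyze (model-based analysis), can be imported together:

from wide_analysis import collect, analyze  # dataset creation and model-based analysis entry points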
The dataset creation functionalities return a dataframe (or, for the existing dataset, a HuggingFace dataset). The data collection command takes a dataset name of either wide_2023 or wiki_stance (with _stance or _policy variants). If wide_2023 is selected, the data is collected from the existing Wide-Analysis dataset available on Hugging Face and returned as a HuggingFace dataset. If wiki_stance is selected, the English portion of the Wiki-Stance dataset is returned (see Kaffee et al., 2023). We show all of the following examples in the context of Wikipedia deletion discussions.
A dataset can be created in four ways:
from wide_analysis import collect
data = collect(mode = 'existing',
start_date=None,
end_date=None,
url=None,
title=None,
output_path=None,
platform='wikipedia',
lang='en',
dataset_name='wide_2023')
This will return the existing dataset available on Hugging Face, with output like the following:
Dataset loaded successfully as huggingface dataset
The dataset has the following columns: {'train': ['text', 'label'], 'validation': ['text', 'label'], 'test': ['text', 'label']}
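Assuming the returned object behaves like a standard Hugging Face datasets.DatasetDict (consistent with the split and column listing above), it can be inspected with the usual datasets accessors:

# assuming `data` is the HuggingFace dataset returned by the mode='existing' call above
print(data)                             # shows the train/validation/test splits
print(data['train'].column_names)       # ['text', 'label']
print(data['train'][0]['text'][:200])   # first 200 characters of the first training discussion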
from wide_analysis import collect
data = collect(mode = 'title',
start_date='YYYY-MM-DD',
end_date=None,
url='URL for the title',
title='article title',
output_path='save_path' or None)
Example: To collect the deletion discussions for the article 'Raisul Islam Ador' for the date '2024-07-18', the following command can be used:
from wide_analysis import collect
data = collect(mode = 'title',
start_date='2024-07-18',
end_date=None,
url='https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2024_July_15#Raisul_Islam_Ador',
title='Raisul Islam Ador',
output_path= None)
This will return a dataframe with the data for the title 'Raisul Islam Ador' for the date '2024-07-18'. If output_path is provided, the dataframe is also saved as a CSV file at that path. The output looks like the following:
| Date | Title | URL | Discussion | Label | Confirmation |
|---|---|---|---|---|---|
| 2024-07-18 | Raisul Islam Ador | URL to article text | Deletion discussion here | speedy delete | Please do not modify it. |
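Assuming the returned object is an ordinary pandas DataFrame with the columns shown above, it can also be inspected and saved manually (the filename below is just an illustration; passing output_path achieves the same thing):

# assuming `data` is the pandas DataFrame returned by the mode='title' call above
print(data[['Title', 'Label']])                      # outcome label for the collected discussion
data.to_csv('raisul_islam_ador.csv', index=False)    # equivalent to passing output_path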
from wide_analysis import collect
data = collect(mode = 'date_range',
start_date='YYYY-MM-DD',
end_date='YYYY-MM-DD',
url=None,
title=None,
output_path='save_path' or None)
Example: To collect the deletion discussions for the articles discussed between '2024-07-18' and '2024-07-20', the following command can be used:
from wide_analysis import collect
data = collect(mode = 'date_range',
start_date='2024-07-18',
end_date='2024-07-20',
url=None,
title=None,
output_path= None)
This will return a dataframe with the data for the articles discussed between '2024-07-18' and '2024-07-20'. The output has the same format as the article-level data collection, just with additional rows for each date within the range.
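Assuming the same DataFrame format as above (including the Date column), the collection can be summarized per day with standard pandas operations, for example:

# assuming `data` is the DataFrame returned by the mode='date_range' call above
print(data['Date'].value_counts().sort_index())      # number of discussions collected per day
print(data.groupby('Date')['Label'].value_counts())  # outcome distribution within each day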
from wide_analysis import collect
data = collect(mode = 'date',
start_date='YYYY-MM-DD',
end_date=None,
url=None,
title=None,
output_path= None)
Example: To collect the deletion discussions for the articles discussed on '2024-07-18', the following command can be used:
from wide_analysis import collect
data = collect(mode = 'date',
start_date='2024-07-18',
end_date=None,
url=None,
title=None,
output_path= None)
This will return a dataframe with the data for the articles discussed on '2024-07-18'. The output has the same format as the article-level data collection, just with one row for each article discussed on that date.
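The single-day result can be post-processed the same way; for example, to look at the distribution of outcomes (label values other than 'speedy delete' are not documented above and are only an illustration):

# assuming `data` is the DataFrame returned by the mode='date' call above
print(data['Label'].value_counts())                  # e.g. counts of 'speedy delete' and other outcomes
speedy = data[data['Label'] == 'speedy delete']      # subset of discussions closed as speedy delete
print(len(speedy), 'speedy deletions on this date')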
We train a set of models and leverage pretrained task-specific models from Hugging Face for the following tasks: Outcome Prediction, Stance Detection, Policy Prediction, Sentiment Prediction, and Offensive Language Detection. These functionalities return a dictionary with the prediction for each task and its individual probability score. The parameters each task accepts are shown in the templates below.
Note that the model-based functionalities are only available for article-level data. We also provide an explanation feature for the outcome prediction task, which returns an explanation of the prediction, generated by an OpenAI GPT model of the user's choice (GPT-4o-mini by default). You will need your own OpenAI API key for this feature to work.
In addition to the input parameters, the outcome prediction function also takes the following explanation-related parameters:
from wide_analysis import analyze
predictions = analyze(inp='URL/text of the article',
mode='url or text',
task='outcome',
openai_access_token=None,
explanation=False,
platform = 'wikipedia',
lang='en',
explainer_model='gpt4o-mini',
model ='')
Example: To predict the outcome of the deletion discussion for the article 'Raisul Islam Ador' using the discussion URL, the following command can be used:
from wide_analysis import analyze
predictions = analyze(inp='https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2024_July_15#Raisul_Islam_Ador',
mode= 'url',
task='outcome',
openai_access_token=None,
explanation=False,
platform = 'wikipedia',
lang='en',
explainer_model='gpt4o-mini',
model ='')
OR if using text:
from wide_analysis import analyze
text_input = 'Raisul Islam Ador: None establish his Wikipedia:Notability. The first reference is almost identical in wording to his official web site.CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.' #sample input text
predictions = analyze(inp=text_input,
mode= 'text',
task='outcome',
openai_access_token=None,
explanation=False,
platform = 'wikipedia',
lang='en',
explainer_model='gpt4o-mini',
model ='')
Both will return the following output:
{'prediction': 'speedy delete', 'probability': 0.99}
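Since the result is a plain dictionary, it can be consumed directly; for instance, keeping only confident predictions (the 0.9 threshold is an arbitrary illustration):

# `predictions` is the dictionary returned by analyze(..., task='outcome') above
if predictions['probability'] >= 0.9:
    print(f"Predicted outcome: {predictions['prediction']}")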
To predict the outcome of the deletion discussion for the article 'Raisul Islam Ador' with explanation, the following command can be used:
from wide_analysis import analyze
predictions = analyze(inp='https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2024_July_15#Raisul_Islam_Ador',
mode='url',
task='outcome',
openai_access_token='<OPENAI KEY>',
explanation=True,
platform = 'wikipedia',
lang='en',
explainer_model='gpt4o-mini',
model ='')
Returns:
{'prediction': 'speedy delete',
'probability': 0.99,
'explanation': 'The article does not establish the notability of the subject. The references are not reliable and the article is not well written. '}
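If you prefer not to hard-code the key, it can be read from an environment variable before being passed in (a sketch; OPENAI_API_KEY is a conventional variable name, not something the package requires):

import os
from wide_analysis import analyze

api_key = os.environ.get('OPENAI_API_KEY')  # export OPENAI_API_KEY=... before running
predictions = analyze(inp='https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2024_July_15#Raisul_Islam_Ador',
                      mode='url',
                      task='outcome',
                      openai_access_token=api_key,
                      explanation=True,
                      platform='wikipedia',
                      lang='en',
                      explainer_model='gpt4o-mini',
                      model='')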
from wide_analysis import analyze
predictions = analyze(inp='URL/text of the article',
mode='url or text',
task='stance',
platform = 'platform name',
lang='en/es/gr',
model ='model name')
Example: To predict the stance of the participants in the deletion discussion for the article 'Raisul Islam Ador', the following command can be used:
from wide_analysis import analyze
predictions = analyze(inp='https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2024_July_15#Raisul_Islam_Ador',
mode = 'url',
task='stance',
platform = 'wikipedia',
lang='en',
model ='')
OR if using text:
from wide_analysis import analyze
text_input = 'Raisul Islam Ador: None establish his Wikipedia:Notability. The first reference is almost identical in wording to his official web site.CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.' #sample input text
predictions = analyze(inp=text_input, mode= 'text', task='stance')
Both will return the following output:
[{'sentence': 'None establish his Wikipedia:Notability . ', 'stance': 'delete', 'score': 0.9950249791145325},
{'sentence': 'The first reference is almost identical in wording to his official web site. ', 'stance': 'delete', 'score': 0.7702090740203857},
{'sentence': 'CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11. ', 'stance': 'delete', 'score': 0.9993199110031128}]
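Assuming the list-of-dicts output shown above, the sentence-level stances can be tallied into a rough per-discussion summary:

from collections import Counter

# assuming `predictions` is the list of sentence-level stance dicts shown above
stance_counts = Counter(p['stance'] for p in predictions)
print(stance_counts)  # e.g. Counter({'delete': 3}) for the sample discussion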
from wide_analysis import analyze
predictions = analyze(inp='URL/text of the article',
mode='url or text',
task='policy',
platform = 'platform name',
lang='en/es/gr',
model ='model name')
Example: To predict the policy that is most relevant to the comments of the participants in the deletion discussion for the article 'Raisul Islam Ador', the following command can be used:
from wide_analysis import analyze
predictions = analyze(inp='https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2024_July_15#Raisul_Islam_Ador',
mode = 'url',
task='policy',
platform = 'wikipedia',
lang='en',
model ='')
OR if using text:
from wide_analysis import analyze
text_input = 'Raisul Islam Ador: None establish his Wikipedia:Notability. The first reference is almost identical in wording to his official web site.CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.' #sample input text
predictions = analyze(inp=text_input, mode= 'text', task='policy')
Both will return the following output:
[{'sentence': 'None establish his Wikipedia:Notability . ', 'policy': 'Wikipedia:Notability', 'score': 0.8100407719612122},
{'sentence': 'The first reference is almost identical in wording to his official web site. ', 'policy': 'Wikipedia:Notability', 'score': 0.6429345607757568},
{'sentence': 'CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11. ', 'policy': 'Wikipedia:Criteria for speedy deletion', 'score': 0.9400111436843872}]
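Similarly, the per-sentence policy labels can be filtered by confidence to see which policies are invoked most strongly (the 0.7 threshold is an arbitrary illustration):

# assuming `predictions` is the list of sentence-level policy dicts shown above
confident = [(p['policy'], round(p['score'], 2)) for p in predictions if p['score'] >= 0.7]
print(confident)  # e.g. [('Wikipedia:Notability', 0.81), ('Wikipedia:Criteria for speedy deletion', 0.94)]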
from wide_analysis import analyze
predictions = analyze(inp='URL/text of the article',
mode='url or text',
task='sentiment')
Example: To predict the sentiment of the participants in the deletion discussion for the article 'Raisul Islam Ador' using the discussion URL, the following command can be used:
from wide_analysis import analyze
predictions = analyze(inp='https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2024_July_15#Raisul_Islam_Ador', mode='url', task='sentiment')
OR if using text:
from wide_analysis import analyze
text_input = 'Raisul Islam Ador: None establish his Wikipedia:Notability. The first reference is almost identical in wording to his official web site.CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.' #sample input text
predictions = analyze(inp=text_input, mode= 'text', task='sentiment')
Both will return the following output:
[{'sentence': 'None establish his Wikipedia:Notability . ', 'sentiment': 'negative', 'score': 0.515991747379303},
{'sentence': 'The first reference is almost identical in wording to his official web site. ', 'sentiment': 'neutral', 'score': 0.9082792401313782},
{'sentence': 'CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11. ', 'sentiment': 'neutral', 'score': 0.8958092927932739}]
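Assuming the output format above, a simple downstream use is flagging discussions with confidently negative sentences (the 0.5 threshold is an arbitrary illustration):

# assuming `predictions` is the list of sentence-level sentiment dicts shown above
negative = [p['sentence'] for p in predictions if p['sentiment'] == 'negative' and p['score'] >= 0.5]
print(f"{len(negative)} of {len(predictions)} sentences are confidently negative")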
from wide_analysis import analyze
predictions = analyze(inp='URL/text of the article',mode='url or text', task='offensive')
Example: To detect offensive language in the comments of the participants in the deletion discussion for the article 'Raisul Islam Ador', the following command can be used:
from wide_analysis import analyze
predictions = analyze(inp='https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2024_July_15#Raisul_Islam_Ador',mode='url', task='offensive')
OR if using text:
from wide_analysis import analyze
text_input = 'Raisul Islam Ador: None establish his Wikipedia:Notability. The first reference is almost identical in wording to his official web site.CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.' #sample input text
predictions = analyze(inp=text_input, mode= 'text', task='offensive')
Both will return the following output:
[{'sentence': 'None establish his Wikipedia:Notability . ', 'offensive_label': 'non-offensive', 'score': 0.8752073645591736},
{'sentence': 'The first reference is almost identical in wording to his official web site. ', 'offensive_label': 'non-offensive', 'score': 0.9004920721054077},
{'sentence': 'CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11. ', 'offensive_label': 'non-offensive', 'score': 0.9054554104804993}]
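Assuming the output format above, sentences flagged as offensive can be pulled out for moderation in one pass:

# assuming `predictions` is the list of sentence-level offensiveness dicts shown above
flagged = [p['sentence'] for p in predictions if p['offensive_label'] != 'non-offensive']
print(flagged if flagged else 'No offensive sentences detected in this discussion.')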