A simple but fast Python script that reads the XML dump of a wiki and outputs the processed data in a CSV file.
The full revision history of a MediaWiki wiki can be backed up as an XML file, known as an `XML dump <https://www.mediawiki.org/wiki/Manual:Backing_up_a_wiki#Backup_the_content_of_the_wiki_(XML_dump)>`__.
This file is a record of every edit made in the wiki, with the corresponding date, page, author and the full content of each edit.
Very often we only want the metadata of an edit (date, author and page) and do not need its content, which is by far the longest piece of data.
This script converts this very long XML dump into much smaller CSV files that are easier to read and work with.
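To make the idea concrete, here is a minimal sketch, assuming the standard MediaWiki export schema, of how one could stream through a dump and keep only the metadata of each revision. It is an illustration of the concept, not the package's own implementation, and the namespace URI is an assumption that depends on the dump's schema version:

.. code:: python

   # Rough illustration of the idea: stream through the XML dump and keep
   # only revision metadata, discarding the long <text> payload of each edit.
   import xml.etree.ElementTree as ET

   # Assumption: namespace depends on the dump's export schema version.
   NS = '{http://www.mediawiki.org/xml/export-0.10/}'

   def iter_revision_metadata(dump_path):
       page_title = None
       for _, elem in ET.iterparse(dump_path, events=('end',)):
           if elem.tag == NS + 'title':
               page_title = elem.text
           elif elem.tag == NS + 'revision':
               timestamp = elem.findtext(NS + 'timestamp')
               author = elem.findtext(NS + 'contributor/' + NS + 'username')
               yield page_title, timestamp, author
               elem.clear()  # free the revision text we never needed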
Install the package using pip::

   pip install wiki_dump_parser

Then, use it directly from the command line::

   python -m wiki_dump_parser <dump.xml>
Or from Python code:

.. code:: python

   import wiki_dump_parser as parser
   parser.xml_to_csv('dump.xml')
The output CSV files should be loaded using '|' as the quote character for strings. For example, to load the output file "dump.csv" generated by this script using pandas:

.. code:: python

   import pandas as pd

   df = pd.read_csv('dump.csv', quotechar='|', index_col=False)
   df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%dT%H:%M:%SZ')
Yes, nothing more.
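Once loaded, the parsed ``timestamp`` column can be used directly. As a quick sketch, using only the columns shown above, this counts edits per month:

.. code:: python

   # How many edits did the wiki receive each month?
   edits_per_month = df['timestamp'].dt.to_period('M').value_counts().sort_index()
   print(edits_per_month.head())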
There are several ways to get the wiki dump:

- There are `instructions in the mediawiki docs <https://www.mediawiki.org/wiki/Manual:Backing_up_a_wiki#Backup_the_content_of_the_wiki_(XML_dump)>`__. For `many other domains <https://github.com/Grasia/wiki-scripts/tree/master/wiki_dump_downloader#domains-tested>`__, you can use our in-house developed script made to accomplish this task. It is straightforward to use and very fast.
- For Wikimedia wikis, select your target wiki from `the list <https://dumps.wikimedia.org/backup-index-bydb.html>`__, download the complete edit history dump and uncompress it.
- Use the WikiTeam script; the instructions are in `their wiki <https://github.com/WikiTeam/wikiteam/wiki/Tutorial#I_have_no_shell_access_to_server>`__. Its usage is very straightforward and the script is well maintained. Remember to use the ``--xml`` option to download the full history dump.
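Full edit-history dumps are normally distributed compressed (for example as ``.bz2`` files). A minimal sketch, with placeholder file names, for decompressing one with the standard library before handing it to the parser:

.. code:: python

   # Stream-decompress a bz2-compressed dump (file names are placeholders),
   # then convert it to CSV with wiki_dump_parser.
   import bz2
   import shutil

   import wiki_dump_parser as parser

   with bz2.open('dump.xml.bz2', 'rb') as compressed, open('dump.xml', 'wb') as plain:
       shutil.copyfileobj(compressed, plain)

   parser.xml_to_csv('dump.xml')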