de.unistuttgart.ims:de.unistuttgart.ims.drama


Several components for processing dramatic texts (theatre plays) with Apache UIMA.


DramaNLP

This repository contains a number of UIMA components for processing dramatic texts, as well as an executable pipeline. We follow the general design ideas implemented in DKPro Core. The full pipeline reads files in several TEI/XML dialects (see below) and applies the most important NLP tools to them, while keeping the structural annotation of the plays intact (and, where necessary, processing different text layers separately).
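Keeping the structure intact while adding NLP layers works because UIMA stores annotations stand-off: the text is kept once, and every annotation points into it by character offsets. The sketch below illustrates that idea in plain Java. It is hypothetical: the class, the type names (Speaker, Utterance, Token) and the whitespace tokenizer are invented for this example and are not DramaNLP's type system or components. It assumes Java 16 or newer (for records).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of stand-off annotation (not DramaNLP's API):
 * the play's text is stored once, and each layer refers to it by offsets.
 */
public class StandoffSketch {

    /** One annotation: a type label plus a [begin, end) character span. */
    record Span(String type, int begin, int end) {
        String coveredText(String text) {
            return text.substring(begin, end);
        }

        boolean within(Span outer) {
            return begin >= outer.begin() && end <= outer.end();
        }
    }

    /** Naive whitespace tokenizer standing in for a real NLP component. */
    static List<Span> tokenize(String text, Span region) {
        List<Span> tokens = new ArrayList<>();
        int i = region.begin();
        while (i < region.end()) {
            // Skip whitespace between tokens.
            while (i < region.end() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < region.end() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) tokens.add(new Span("Token", start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "FAUST. Habe nun, ach! Philosophie";
        // Structural layer, e.g. derived from TEI <speaker> and <sp> elements.
        Span speaker = new Span("Speaker", 0, 6);
        Span utterance = new Span("Utterance", 7, text.length());
        // Linguistic layer: only the utterance is tokenized, so the speaker name
        // is handled as a separate text layer and the structural spans stay untouched.
        for (Span token : tokenize(text, utterance)) {
            System.out.println(token.coveredText(text) + " in utterance: " + token.within(utterance));
        }
        System.out.println("Speaker: " + speaker.coveredText(text));
    }
}
```

Because tokens are new spans over the same text, adding them never modifies the Speaker or Utterance annotations.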

Compiling from source

  1. Clone the repository: git clone https://github.com/quadrama/DramaNLP.git
  2. Enter the directory: cd DramaNLP
    • If necessary, switch to a branch: git checkout develop/1.0
  3. Download the dependencies, compile everything and install it locally with mvn compile install. This produces a lot of output, but at the end you should see a line like BUILD SUCCESS.
  4. To build a runnable JAR, enter the directory de.unistuttgart.ims.drama.main (cd de.unistuttgart.ims.drama.main) and run mvn package. This creates a file called drama.Main.jar in the directory target/assembly/, which contains the code and all its dependencies.
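If you want to script these steps (for example in a CI job), they can be expressed as ProcessBuilder invocations. This is a minimal sketch under stated assumptions: git and mvn are on the PATH of a Unix-like system, the optional branch checkout is left out, and the class name BuildDramaNlp is hypothetical. The cd of step 2 becomes the working directory of the later steps.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

/** Hypothetical helper: the README's compile steps as ProcessBuilder calls. */
public class BuildDramaNlp {

    static List<ProcessBuilder> buildSteps(File workDir) {
        File repo = new File(workDir, "DramaNLP");
        File mainModule = new File(repo, "de.unistuttgart.ims.drama.main");
        return List.of(
            // Step 1: clone the repository into workDir.
            new ProcessBuilder("git", "clone", "https://github.com/quadrama/DramaNLP.git").directory(workDir),
            // Steps 2 and 3: inside the clone, compile and install into the local Maven repository.
            new ProcessBuilder("mvn", "compile", "install").directory(repo),
            // Step 4: package the runnable jar (target/assembly/drama.Main.jar).
            new ProcessBuilder("mvn", "package").directory(mainModule));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        File workDir = new File(args.length > 0 ? args[0] : ".");
        for (ProcessBuilder step : buildSteps(workDir)) {
            step.inheritIO(); // show the tools' output, including BUILD SUCCESS
            int exit = step.start().waitFor();
            if (exit != 0) {
                throw new IllegalStateException("Step failed: " + step.command());
            }
        }
    }
}
```

Stopping at the first non-zero exit code mirrors doing the steps by hand: there is no point running mvn package if the install failed.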

Running entire pipeline

As an example, we'll work with the data from the GerDraCor collection (which is based on TextGrid). Download the files from GitHub and store the XML files in a directory, which we call $TEIDIR in the following examples. The directory $OUTDIR stores the output of the pipeline. You'll also need the file drama.Main.jar.

Run the following command on the command line. The path target/assembly/drama.Main.jar is relative, so either run it from the directory de.unistuttgart.ims.drama.main or adjust the path: java -cp target/assembly/drama.Main.jar de.unistuttgart.ims.drama.main.TEI2XMI --input $TEIDIR --output $OUTDIR/xmi --csvOutput $OUTDIR/csv --conllOutput $OUTDIR/conll --skipSpeakerIdentifier --corpus GERDRACOR --collectionId "gdc" --doCleanup
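The same invocation can be assembled from Java, which avoids shell quoting issues when directory names contain spaces. This is a sketch: the class RunTei2Xmi is hypothetical, and it uses only the options shown in the command above, with the same values.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical wrapper that builds the README's TEI2XMI command line. */
public class RunTei2Xmi {

    static List<String> command(Path jar, Path teiDir, Path outDir) {
        return List.of(
            "java", "-cp", jar.toString(),
            "de.unistuttgart.ims.drama.main.TEI2XMI",
            "--input", teiDir.toString(),
            "--output", outDir.resolve("xmi").toString(),
            "--csvOutput", outDir.resolve("csv").toString(),
            "--conllOutput", outDir.resolve("conll").toString(),
            "--skipSpeakerIdentifier",
            "--corpus", "GERDRACOR",
            "--collectionId", "gdc", // passed directly, so no shell quotes are needed
            "--doCleanup");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Usage: java RunTei2Xmi <teiDir> <outDir>  (run from de.unistuttgart.ims.drama.main)
        Path jar = Path.of("target", "assembly", "drama.Main.jar");
        Process process = new ProcessBuilder(command(jar, Path.of(args[0]), Path.of(args[1])))
            .inheritIO()
            .start();
        System.exit(process.waitFor());
    }
}
```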

After the run, $OUTDIR contains three subdirectories, xmi, csv and conll, each holding the plays in the corresponding file format.
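A quick way to check a run is to count the files in each of the three output directories. This sketch assumes only the layout described above (xmi, csv and conll under $OUTDIR); it counts top-level regular files only, since the file layout inside each directory is not documented here. The class name CheckOutput is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical post-run check: file counts per output format. */
public class CheckOutput {

    static final List<String> FORMATS = List.of("xmi", "csv", "conll");

    /** Returns the number of regular files per format, or -1 if the directory is missing. */
    static Map<String, Long> countFiles(Path outDir) throws IOException {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String format : FORMATS) {
            Path dir = outDir.resolve(format);
            if (!Files.isDirectory(dir)) {
                counts.put(format, -1L);
                continue;
            }
            try (Stream<Path> files = Files.list(dir)) {
                counts.put(format, files.filter(Files::isRegularFile).count());
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        countFiles(Path.of(args[0])).forEach((format, n) ->
            System.out.println(format + ": " + (n < 0 ? "missing" : n + " files")));
    }
}
```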

TEI/XML dialects

This package supports the following drama corpora:

  • TextGrid (German)
  • GerDraCor (German)
  • theatre classique (French)

Package last updated on 26 Mar 2018