tcf2nif

tcf2nif is an NLP data converter from the TCF format (used by WebLicht) to the RDF-based NIF format. At the moment, its functionality is limited.
License
tcf2nif is released under the GNU Lesser General Public License (version 3). See LICENSE.txt for details.
Version History
0.2.1 - Small fix to make CI succeed again (30 Oct 2015)
- Added support for a new test output directory, so CI succeeds again
0.2.0 - First working exe scripts (30 Oct 2015)
0.1.0 - Initial prototype (22 Sep 2015)
- This is the proof of concept version with minimal functionality
Installation
Add this line to your application's Gemfile:
gem 'tcf2nif'
And then execute:
$ bundle
Or install it yourself as:
$ gem install tcf2nif
Usage
See the Jupyter notebook for a hands-on experience and code examples.
This gem provides a proof of concept for reading TCF data (the file format used by the WebLicht services).
Reading TCF documents
In a nutshell, a TcfDocument instance can be created from any IO object that contains a TCF XML stream:
@tcf_document = Tcf2Nif::TcfDocument.new(@io)
where @io is a Ruby IO object (a file, a stream, etc.). This TCF document can then be queried in different ways:
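# Print the text of the document.
puts @tcf_document.text
# Iterate over the tokens and print their annotations.
@tcf_document.tokens.each do |token|
  puts token.form
  puts token.pos?
  puts token.lemma?
  puts token.pos
  puts token.lemma
end
# Access further annotation layers.
@tcf_document.geo_annotations
@tcf_document.dependency_map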
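The pos? and lemma? calls above presumably report whether a token carries the corresponding annotation; that reading is an assumption, not something this README states. Under that assumption, the following sketch shows how the predicates could guard access to optional annotations. DemoToken and describe_token are hypothetical stand-ins written for this example and are not part of the gem.

```ruby
# Hypothetical stand-in for a token object; the real class in tcf2nif may differ.
DemoToken = Struct.new(:form, :pos, :lemma) do
  # Assumed semantics: true when the annotation is present.
  def pos?
    !pos.nil?
  end

  def lemma?
    !lemma.nil?
  end
end

# Builds a tab-separated description, skipping annotations that are missing.
def describe_token(token)
  parts = [token.form]
  parts << "pos=#{token.pos}" if token.pos?
  parts << "lemma=#{token.lemma}" if token.lemma?
  parts.join("\t")
end

demo_tokens = [
  DemoToken.new('fliegt', 'VVFIN', 'fliegen'),
  DemoToken.new('York', 'NE', nil) # no lemma available
]
demo_tokens.each { |token| puts describe_token(token) }
```

Checking the predicate first avoids printing empty lines or nil values for tokens that lack a layer.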
Conversion to NIF
Instances of the Tcf2Nif::Transformer class convert a TCF document into an RDF graph containing NIF data.
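# Build a transformer for the TCF document; the second argument is an empty hash here.
@transformer = Tcf2Nif::Transformer.new(@tcf_document, {})
graph = @transformer.transform
puts graph.size
# Serialize the graph as N-Triples.
RDF::Writer.open('path/to/outputfile', :format => :ntriples) do |writer|
  writer << RDF::Repository.new do |repo|
    repo << graph
  end
end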
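To give an idea of what NIF output for tokens can look like, here is a minimal sketch in plain Ruby. It assumes the NIF 2.0 offset-based URI scheme (#char=begin,end) and the nif-core properties beginIndex, endIndex and anchorOf. It is not the gem's implementation: nif_token_triples, escape_literal and the example.org base URI are hypothetical names chosen for illustration, and the triples the Transformer actually emits may differ.

```ruby
# Namespaces used for the illustration.
NIF_NS = 'http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#'
XSD_NS = 'http://www.w3.org/2001/XMLSchema#'

# Escapes backslashes, double quotes and newlines for an N-Triples literal.
def escape_literal(str)
  str.gsub('\\') { '\\\\' }.gsub('"') { '\\"' }.gsub("\n") { '\\n' }
end

# Hypothetical helper: locates each token form in the text (left to right)
# and returns N-Triples lines describing its character offsets.
def nif_token_triples(base_uri, text, forms)
  cursor = 0
  forms.flat_map do |form|
    start = text.index(form, cursor)
    raise ArgumentError, "token #{form.inspect} not found" if start.nil?

    stop = start + form.length
    cursor = stop
    subject = "<#{base_uri}#char=#{start},#{stop}>"
    [
      "#{subject} <#{NIF_NS}beginIndex> \"#{start}\"^^<#{XSD_NS}nonNegativeInteger> .",
      "#{subject} <#{NIF_NS}endIndex> \"#{stop}\"^^<#{XSD_NS}nonNegativeInteger> .",
      "#{subject} <#{NIF_NS}anchorOf> \"#{escape_literal(form)}\" ."
    ]
  end
end

puts nif_token_triples('http://example.org/doc', 'Karin fliegt nach New York.',
                       %w[Karin fliegt nach New York .])
```

Each token becomes a resource whose URI encodes its start and end offsets in the text, so annotations from different tools can refer to the same span.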
Development
After checking out the repo, run bin/setup to install dependencies. Then, run bin/console for an interactive prompt that will allow you to experiment. Run bundle exec tcf2nif to use the code located in this directory, ignoring other installed copies of this gem.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release to create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Contributing
- Fork it ( https://github.com/[my-github-username]/tcf2nif/fork )
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request