Natural language processing support for Pandas DataFrames.
![Documentation Status](https://readthedocs.org/projects/text-extensions-for-pandas/badge/?version=latest)
Text Extensions for Pandas adds extension types to Pandas DataFrames for representing natural
language data, plus a library of functions for working with these extension
types.
## Features
### SpanArray: A Pandas extension type for spans of text
- Connect features with regions of a document
- Visualize the internal data of your NLP application
- Analyze the accuracy of your models
- Combine the results of multiple models
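
A span is a region of a document identified by character offsets. To illustrate the concept behind the SpanArray features listed above, the stdlib-only sketch below assumes a span is the half-open range `[begin, end)` over a document's text. It attaches a feature to each token's region and aligns tokens with the output of a second model. The `Span` class, its `overlaps` method, and the `tokenize` helper are hypothetical names chosen for this illustration. They are not the SpanArray API, which is described in the API documentation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span: the half-open character range [begin, end) of a document."""

    text: str   # full text of the document the span points into
    begin: int  # offset of the first character covered by the span
    end: int    # offset one past the last character covered by the span

    @property
    def covered_text(self) -> str:
        # The substring of the document that this span covers.
        return self.text[self.begin:self.end]

    def overlaps(self, other: "Span") -> bool:
        # Spans over the same document overlap if each begins before the other ends.
        return (
            self.text == other.text
            and self.begin < other.end
            and other.begin < self.end
        )


def tokenize(text: str) -> list:
    # Hypothetical tokenizer: one span per run of word characters.
    return [Span(text, m.start(), m.end()) for m in re.finditer(r"\w+", text)]


doc = "Pandas makes NLP data easy to inspect."
tokens = tokenize(doc)

# Connect a feature (capitalization) with each token's region of the document.
features = [(t.begin, t.end, t.covered_text, t.covered_text[0].isupper()) for t in tokens]

# Combine with a second, hypothetical model that flagged the range [13, 21) as an entity.
entity = Span(doc, 13, 21)
tokens_in_entity = [t.covered_text for t in tokens if t.overlaps(entity)]
print(tokens_in_entity)  # ['NLP', 'data']
```

Comparing offsets rather than strings is what lets results from different models be aligned. Two spans refer to the same region only if their offsets agree, even when the same word occurs several times in a document.
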
### TensorArray: A Pandas extension type for tensors
- Represent BERT embeddings in a Pandas series
- Store logits and other feature vectors in a Pandas series
- Store an entire time series in each cell of a Pandas series
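
TensorArray places a tensor, such as a BERT embedding or a time series, in each cell of a Pandas series. The stdlib-only sketch below is a rough model of that idea. It assumes every cell has the same shape and stores all cells back to back in one flat buffer. `TensorColumn` and `_reshape` are hypothetical names assumed for this example and are not part of the library. The real TensorArray is a Pandas extension type, and its interface is described in the API documentation.

```python
from array import array
from math import prod


def _reshape(flat, shape):
    # Rebuild nested lists of the given shape from a flat list (row-major order).
    if len(shape) == 1:
        return flat
    step = len(flat) // shape[0]
    return [_reshape(flat[k * step:(k + 1) * step], shape[1:]) for k in range(shape[0])]


class TensorColumn:
    """Hypothetical column in which every cell holds a tensor of one fixed shape.

    All cells share one contiguous float buffer; row i occupies
    buffer[i * cell_size : (i + 1) * cell_size].
    """

    def __init__(self, cell_shape):
        if not cell_shape or any(d <= 0 for d in cell_shape):
            raise ValueError("cell_shape must be a non-empty tuple of positive sizes")
        self.cell_shape = tuple(cell_shape)
        self.cell_size = prod(self.cell_shape)
        self.buffer = array("d")  # contiguous storage for every cell

    def append(self, flat_values):
        # Reject cells whose element count does not match the declared shape.
        values = [float(v) for v in flat_values]
        if len(values) != self.cell_size:
            raise ValueError(f"expected {self.cell_size} values, got {len(values)}")
        self.buffer.extend(values)

    def __len__(self):
        return len(self.buffer) // self.cell_size

    def __getitem__(self, i):
        # Slice out row i and return it as nested lists matching cell_shape.
        if not 0 <= i < len(self):
            raise IndexError(i)
        start = i * self.cell_size
        return _reshape(list(self.buffer[start:start + self.cell_size]), self.cell_shape)


# Each row holds a 2 x 3 tensor: 2 channels, each a time series of 3 steps.
col = TensorColumn((2, 3))
col.append([1, 2, 3, 4, 5, 6])
col.append([7, 8, 9, 10, 11, 12])
print(len(col))  # 2
print(col[1])    # [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
```

A single flat buffer works only because every cell has the same shape. The offset of row `i` is simply `i` times the cell size, so lookups need no per-row bookkeeping.
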
### Pandas front-ends for popular NLP toolkits
## Documentation
For examples of how to use the library, take a look at the notebooks in
this directory.
API documentation can be found at https://text-extensions-for-pandas.readthedocs.io/en/latest/
## Source Code
The source code for Text Extensions for Pandas is available at https://github.com/CODAIT/text-extensions-for-pandas.
We welcome code and documentation contributions! See the README file
for more information on contributing.