Delab Trees

A library to analyze conversation (reply) trees from forums and social media.

Installation

pip install delab_trees

Get started

Example datasets for Reddit and Twitter are available at https://github.com/juliandehne/delab-trees/raw/main/delab_trees/data/dataset_[reddit|twitter]_no_text.pkl. The datasets contain structure only. IDs, text, links, and any other information that would break the confidentiality of the academic access have been omitted.

The trees are loaded from tables like this:

   tree_id  post_id  parent_id  author_id  text         created_at
0  1        1        nan        james      I am James   2017-01-01 01:00:00
1  1        2        1          mark       I am Mark    2017-01-01 02:00:00
2  1        3        2          steven     I am Steven  2017-01-01 03:00:00
3  1        4        1          john       I am John    2017-01-01 04:00:00
4  2        1        nan        james      I am James   2017-01-01 01:00:00
5  2        2        1          mark       I am Mark    2017-01-01 02:00:00
6  2        3        2          steven     I am Steven  2017-01-01 03:00:00
7  2        4        3          john       I am John    2017-01-01 04:00:00

This dataset contains two conversational trees with four posts each.
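
To make the layout concrete, here is a minimal sketch in plain Python (no delab_trees or pandas involved) that groups the rows of the table by tree_id and counts the posts per tree. The helper name group_posts_by_tree is hypothetical and only serves the illustration. Note that post_id values repeat across trees in this table, so a post is only identified by the pair of tree_id and post_id.

```python
from collections import defaultdict

# Rows mirror the table above: (tree_id, post_id, parent_id, author_id).
# None stands in for the missing parent (shown as "nan") of a root post.
rows = [
    (1, 1, None, "james"),
    (1, 2, 1, "mark"),
    (1, 3, 2, "steven"),
    (1, 4, 1, "john"),
    (2, 1, None, "james"),
    (2, 2, 1, "mark"),
    (2, 3, 2, "steven"),
    (2, 4, 3, "john"),
]


def group_posts_by_tree(rows):
    """Return {tree_id: [post_id, ...]} for the given rows (hypothetical helper)."""
    trees = defaultdict(list)
    for tree_id, post_id, _parent_id, _author_id in rows:
        trees[tree_id].append(post_id)
    return dict(trees)


posts_per_tree = group_posts_by_tree(rows)
print({tree_id: len(posts) for tree_id, posts in posts_per_tree.items()})
# {1: 4, 2: 4}
```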

Currently, you need to import conversational tables as a pandas dataframe like this:

import pandas as pd
from delab_trees import TreeManager

d = {'tree_id': [1] * 4,
     'post_id': [1, 2, 3, 4],
     'parent_id': [None, 1, 2, 1],
     'author_id': ["james", "mark", "steven", "john"],
     'text': ["I am James", "I am Mark", "I am Steven", "I am John"],
     "created_at": [pd.Timestamp('2017-01-01T01'),
                    pd.Timestamp('2017-01-01T02'),
                    pd.Timestamp('2017-01-01T03'),
                    pd.Timestamp('2017-01-01T04')]}
df = pd.DataFrame(data=d)
manager = TreeManager(df)  # creates one tree
test_tree = manager.random()

Note that the tree structure is derived from each row's parent_id matching another row's post_id. A row without a parent_id is the root of its tree.
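
The parent_id matching can be sketched without the library. The following is an illustration only, not the delab_trees implementation: it uses the (post_id, parent_id) pairs from the example above and a hypothetical helper build_children to link each post to its replies.

```python
from collections import defaultdict

# (post_id, parent_id) pairs taken from the single-tree example above.
posts = [(1, None), (2, 1), (3, 2), (4, 1)]


def build_children(posts):
    """Return (roots, children), where children maps a post_id to its replies."""
    children = defaultdict(list)
    roots = []
    for post_id, parent_id in posts:
        if parent_id is None:
            # No parent: this post starts the conversation.
            roots.append(post_id)
        else:
            # The parent_id points at another row's post_id.
            children[parent_id].append(post_id)
    return roots, dict(children)


roots, children = build_children(posts)
print(roots)     # [1]
print(children)  # {1: [2, 4], 2: [3]}
```

Post 1 is the root, posts 2 and 4 reply to it, and post 3 replies to post 2.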

You can now analyze the reply tree's basic metrics:

from delab_trees.main import get_test_tree
from delab_trees.delab_tree import DelabTree

test_tree: DelabTree = get_test_tree()
assert test_tree.total_number_of_posts() == 4
assert test_tree.average_branching_factor() > 0
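
For intuition about what such metrics measure, the sketch below computes a post count and a mean branching factor by hand for the four-post example. It assumes one common definition of branching factor (average number of direct replies over posts that have at least one reply). Whether delab_trees uses the same definition is not stated here, and the function names are hypothetical.

```python
# Reply structure of the four-post example: parent post_id -> replying post_ids.
children = {1: [2, 4], 2: [3]}
root = 1


def count_posts(root, children):
    """Count all posts reachable from the root, including the root itself."""
    stack, seen = [root], 0
    while stack:
        node = stack.pop()
        seen += 1
        stack.extend(children.get(node, []))
    return seen


def mean_branching_factor(children):
    """Average number of direct replies over posts with at least one reply.

    This definition is an assumption made for the sketch.
    """
    if not children:
        return 0.0
    return sum(len(replies) for replies in children.values()) / len(children)


print(count_posts(root, children))      # 4
print(mean_branching_factor(children))  # 1.5
```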

A summary of basic per-author metrics can be obtained by calling:

from delab_trees.main import get_test_tree
from delab_trees.delab_tree import DelabTree

test_tree: DelabTree = get_test_tree()
print(test_tree.get_author_metrics())

# >>> removed [] and changed {} (merging subsequent posts of the same author)
# >>>{'james': <delab_trees.delab_author_metric.AuthorMetric object at 0x7fa9c5496110>, 'steven': <delab_trees.delab_author_metric.AuthorMetric object at 0x7fa9c5497dc0>, 'john': <delab_trees.delab_author_metric.AuthorMetric object at 0x7fa9c5497a00>, 'mark': <delab_trees.delab_author_metric.AuthorMetric object at 0x7fa9c5497bb0>}

More complex metrics that use the full dataset for training can be obtained from the manager:

import pandas as pd
from delab_trees import TreeManager

d = {'tree_id': [1] * 4,
     'post_id': [1, 2, 3, 4],
     'parent_id': [None, 1, 2, 1],
     'author_id': ["james", "mark", "steven", "john"],
     'text': ["I am James", "I am Mark", "I am Steven", "I am John"],
     "created_at": [pd.Timestamp('2017-01-01T01'),
                    pd.Timestamp('2017-01-01T02'),
                    pd.Timestamp('2017-01-01T03'),
                    pd.Timestamp('2017-01-01T04')]}
df = pd.DataFrame(data=d)
manager = TreeManager(df) # creates one tree
# the result maps tree_id -> {author_id -> vision_metric}
rb_vision_dictionary: dict = manager.get_rb_vision()

The following two complex metrics are implemented:

from delab_trees.main import get_test_manager

manager = get_test_manager()
rb_vision_dictionary = manager.get_rb_vision()  # predicts whether an author has seen a post
pb_vision_dictionary = manager.get_pb_vision()  # predicts whether an author will write the next post
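
The earlier example describes the result of get_rb_vision() as a nested dictionary keyed by tree_id and then author_id. Assuming that shape, and assuming the metric is a plain number (which is not confirmed here), a result could be post-processed as in the sketch below. The data is invented placeholder data, not output of delab_trees, and top_author_per_tree is a hypothetical helper.

```python
# Placeholder data in the assumed shape {tree_id: {author_id: score}}.
# The scores are invented for illustration, not produced by delab_trees.
example_vision = {
    1: {"james": 0.9, "mark": 0.4, "steven": 0.1},
    2: {"james": 0.2, "john": 0.7},
}


def top_author_per_tree(vision):
    """Return {tree_id: author_id with the highest score in that tree}."""
    return {
        tree_id: max(scores, key=scores.get)
        for tree_id, scores in vision.items()
        if scores  # skip trees without any scored author
    }


print(top_author_per_tree(example_vision))  # {1: 'james', 2: 'john'}
```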

How to cite

    @article{dehne_dtrees_23,
      author    = {Dehne, Julian},
      title     = {Delab-Trees: measuring deliberation in online conversations},
      url       = {https://github.com/juliandehne/delab-trees},
      year      = {2023},
    }
