prosodic

Prosodic 2: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.

  • 2.1.2
Prosodic

Prosodic is a metrical-phonological parser written in Python. Currently, it can parse English and Finnish text, but adding additional languages is easy with a pronunciation dictionary or a custom Python function. Prosodic was built by Ryan Heuser, Josh Falk, and Arto Anttila. Josh also maintains another repository, in which he has rewritten the part of this project that does phonetic transcription for English and Finnish. Sam Bowman has contributed to the codebase as well, adding several new metrical constraints.

This version, Prosodic 2.x, is a near-total rewrite of the original Prosodic.

Supports Python>=3.9.

Demo

You can view and use a web app demo of the current Prosodic app at prosodic.dev.

Install

1. Install python package

Install from pypi:

pip install prosodic

2. Install espeak

Install espeak, free text-to-speech (TTS) software, to ‘sound out’ unknown words.

  • Mac: brew install espeak. (First install homebrew if not already installed.)

  • Linux: apt-get install espeak libespeak1 libespeak-dev

  • Windows: Download and install from GitHub
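
To confirm that espeak is reachable before parsing, a small check along these lines may help. This is a sketch and not part of Prosodic: the helper name `find_espeak` is hypothetical, and it assumes the executable is called `espeak` or `espeak-ng` on your system.

```python
import shutil


def find_espeak(candidates=("espeak", "espeak-ng")):
    """Return the path of the first candidate executable found on PATH, else None.

    The candidate names are an assumption; adjust them for your platform.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    path = find_espeak()
    print(path if path else "espeak not found on PATH")
```

If this reports that espeak was not found, revisit the platform-specific install step above before parsing text that contains unknown words.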

Usage

Web app

Prosodic has a new GUI (graphical user interface) in a web app. After installing, run:

prosodic web

Then navigate to http://127.0.0.1:8181/. It should look like this:

[Screenshot: Prosodic web app preview]

Python

Read texts
# import prosodic
import prosodic

# load a text
sonnet = prosodic.Text("""
Those hours, that with gentle work did frame
The lovely gaze where every eye doth dwell,
Will play the tyrants to the very same
And that unfair which fairly doth excel;
For never-resting time leads summer on
To hideous winter, and confounds him there;
Sap checked with frost, and lusty leaves quite gone,
Beauty o’er-snowed and bareness every where:
Then were not summer’s distillation left,
A liquid prisoner pent in walls of glass,
Beauty’s effect with beauty were bereft,
Nor it, nor no remembrance what it was:
But flowers distill’d, though they with winter meet,
Leese but their show; their substance still lives sweet.
""")

# can also load by filename
shaksonnets = prosodic.Text(fn='corpora/corppoetry_en/en.shakespeare.txt')
Stanzas, lines, words, syllables, phonemes

Texts in Prosodic are organized into a tree structure. The .children of a Text object is a list of Stanza objects, whose .parent attributes point back to the Text. In turn, each stanza's .children is a list of Line objects, whose .parent attributes point back to the stanza, and so on down the tree (word tokens, word types, word forms, syllables, phonemes).
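
The parent/children linkage described above can be illustrated with a generic tree in plain Python. This is only a sketch of the idea: the `Node` class and its `descendants` method are hypothetical stand-ins and are not Prosodic's own classes or API.

```python
class Node:
    """Hypothetical stand-in for a tree entity (Text, Stanza, Line, ...)."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        # Each child points back to the node that contains it.
        for child in self.children:
            child.parent = self

    def __iter__(self):
        # Iterating over a node yields its direct children.
        return iter(self.children)

    def descendants(self):
        """Yield every node below this one, depth first."""
        for child in self.children:
            yield child
            yield from child.descendants()


line1 = Node("Line 1", [Node("Those"), Node("hours")])
line2 = Node("Line 2", [Node("The"), Node("lovely")])
stanza = Node("Stanza 1", [line1, line2])
text = Node("Text", [stanza])

assert stanza.parent is text
assert line1.parent is stanza
print([node.label for node in text.descendants()])
# ['Stanza 1', 'Line 1', 'Those', 'hours', 'Line 2', 'The', 'lovely']
```

Because every node is iterable over its children, nested for-loops can walk such a tree level by level, which is the pattern the looping example further down relies on.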

# Take a peek at this tree structure 
# and the features particular entities have
sonnet.show(maxlines=30, incl_phons=True)
Text()
|   Stanza(num=1)
|       Line(num=1, txt='Those hours, that with gentle work did frame')
|           WordToken(num=1, txt='Those', sent_num=1, sentpart_num=1)
|               WordType(num=1, txt='Those', lang='en', num_forms=1)
|                   WordForm(num=1, txt='Those')
|                       Syllable(ipa='ðoʊz', num=1, txt='Those', is_stressed=False, is_heavy=True)
|                           Phoneme(num=1, txt='ð', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=3, txt='o', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=-1, back=1, round=1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|           WordToken(num=2, txt=' hours', sent_num=1, sentpart_num=1)
|               WordType(num=1, txt='hours', lang='en', num_forms=2)
|                   WordForm(num=1, txt='hours')
|                       Syllable(ipa="'aʊ", num=1, txt='ho', is_stressed=True, is_heavy=True, is_strong=True, is_weak=False)
|                           Phoneme(num=2, txt='a', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=1, back=-1, round=-1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)
|                       Syllable(ipa='ɛːz', num=2, txt='urs', is_stressed=False, is_heavy=True, is_strong=False, is_weak=True)
|                           Phoneme(num=2, txt='ɛː', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=-1, long=1, hitone=0, hireg=0)
|                           Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|                   WordForm(num=2, txt='hours')
|                       Syllable(ipa="'aʊrz", num=1, txt='hours', is_stressed=True, is_heavy=True)
|                           Phoneme(num=2, txt='a', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=1, back=-1, round=-1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=4, txt='r', syl=-1, son=1, cons=1, cont=1, delrel=0, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=0, lo=0, back=0, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|           WordToken(num=3, txt=',', sent_num=1, sentpart_num=1)
|               WordType(num=1, txt=',', lang='en', num_forms=0, is_punc=True)
|           WordToken(num=4, txt=' that', sent_num=1, sentpart_num=1)
|               WordType(num=1, txt='that', lang='en', num_forms=3)
# take a peek at it in dataframe form
sonnet.df   # by-syllable dataframe representation
sonnet      # ...which will also be shown when text object displayed (in a notebook)
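stanza_num | line_num | line_txt | sent_num | sentpart_num | wordtoken_num | wordtoken_txt | word_lang | wordform_num | syll_num | syll_txt | syll_ipa | word_num_forms | syll_is_stressed | syll_is_heavy | syll_is_strong | syll_is_weak | word_is_punc
1 | 1 | Those hours, that with gentle work did frame | 1 | 1 | 1 | Those | en | 1 | 1 | Those | ðoʊz | 1 | 0 | 1 |  |  |
1 | 1 | Those hours, that with gentle work did frame | 1 | 1 | 2 | hours | en | 1 | 1 | ho | 'aʊ | 2 | 1 | 1 | 1 | 0 |
1 | 1 | Those hours, that with gentle work did frame | 1 | 1 | 2 | hours | en | 1 | 2 | urs | ɛːz | 2 | 0 | 1 | 0 | 1 |
1 | 1 | Those hours, that with gentle work did frame | 1 | 1 | 2 | hours | en | 2 | 1 | hours | 'aʊrz | 2 | 1 | 1 |  |  |
1 | 1 | Those hours, that with gentle work did frame | 1 | 1 | 3 | , | en | 0 | 0 |  |  | 0 |  |  |  |  | 1
...
1 | 14 | Leese but their show; their substance still lives sweet. | 1 | 1 | 7 | substance | en | 1 | 2 | tance | stəns | 1 | 0 | 1 | 0 | 1 |
1 | 14 | Leese but their show; their substance still lives sweet. | 1 | 1 | 8 | still | en | 1 | 1 | still | 'stɪl | 1 | 1 | 1 |  |  |
1 | 14 | Leese but their show; their substance still lives sweet. | 1 | 1 | 9 | lives | en | 1 | 1 | lives | 'lɪvz | 1 | 1 | 1 |  |  |
1 | 14 | Leese but their show; their substance still lives sweet. | 1 | 1 | 10 | sweet | en | 1 | 1 | sweet | 'swiːt | 1 | 1 | 1 |  |  |
1 | 14 | Leese but their show; their substance still lives sweet. | 1 | 1 | 11 | . | en | 0 | 0 |  |  | 0 |  |  |  |  | 1

195 rows × 6 columns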

# you can loop over this directly if you want
for stanza in shaksonnets.stanzas:
    for line in stanza:
        for wordtoken in line:
            for wordtype in wordtoken:
                for wordform in wordtype:
                    for syllable in wordform:
                        for phoneme in syllable:
                            # ...
                            pass
# or directly access components
print(f'''
Shakespeare's sonnets have:
  * {len(shaksonnets.stanzas):,} "stanzas"        (in this text, each one a sonnet)
  * {len(shaksonnets.lines):,} lines
  * {len(shaksonnets.wordtokens):,} wordtokens    (including punctuation)
  * {len(shaksonnets.wordtypes):,} wordtypes     (each token has one wordtype object)
  * {len(shaksonnets.wordforms):,} wordforms     (a word + IPA pronunciation; no punctuation)
  * {len(shaksonnets.syllables):,} syllables
  * {len(shaksonnets.phonemes):,} phonemes
''')
Shakespeare's sonnets have:
  * 154 "stanzas"        (in this text, each one a sonnet)
  * 2,155 lines
  * 20,317 wordtokens    (including punctuation)
  * 20,317 wordtypes     (each token has one wordtype object)
  * 17,601 wordforms     (a word + IPA pronunciation; no punctuation)
  * 21,915 syllables
  * 63,614 phonemes
# access lines

# text.line{num} will return text.lines[num-1]
assert sonnet.line1 is sonnet.lines[0]
assert sonnet.line10 is sonnet.lines[9]

# show the line
sonnet.line1
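line_num | line_txt | sent_num | sentpart_num | wordtoken_num | wordtoken_txt | word_lang | wordform_num | syll_num | syll_txt | syll_ipa | word_num_forms | syll_is_stressed | syll_is_heavy | syll_is_strong | syll_is_weak | word_is_punc
1 | Those hours, that with gentle work did frame | 1 | 1 | 1 | Those | en | 1 | 1 | Those | ðoʊz | 1 | 0 | 1 |  |  |
1 | Those hours, that with gentle work did frame | 1 | 1 | 2 | hours | en | 1 | 1 | ho | 'aʊ | 2 | 1 | 1 | 1 | 0 |
1 | Those hours, that with gentle work did frame | 1 | 1 | 2 | hours | en | 1 | 2 | urs | ɛːz | 2 | 0 | 1 | 0 | 1 |
1 | Those hours, that with gentle work did frame | 1 | 1 | 2 | hours | en | 2 | 1 | hours | 'aʊrz | 2 | 1 | 1 |  |  |
1 | Those hours, that with gentle work did frame | 1 | 1 | 3 | , | en | 0 | 0 |  |  | 0 |  |  |  |  | 1
...
1 | Those hours, that with gentle work did frame | 1 | 1 | 6 | gentle | en | 1 | 2 | tle | təl | 1 | 0 | 1 | 0 | 1 |
1 | Those hours, that with gentle work did frame | 1 | 1 | 7 | work | en | 1 | 1 | work | 'wɛːk | 1 | 1 | 1 |  |  |
1 | Those hours, that with gentle work did frame | 1 | 1 | 8 | did | en | 1 | 1 | did | dɪd | 2 | 0 | 1 |  |  |
1 | Those hours, that with gentle work did frame | 1 | 1 | 8 | did | en | 2 | 1 | did | 'dɪd | 2 | 1 | 1 |  |  |
1 | Those hours, that with gentle work did frame | 1 | 1 | 9 | frame | en | 1 | 1 | frame | 'freɪm | 1 | 1 | 1 |  |  |

15 rows × 6 columns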

# build lines directly
line_from_richardIII = prosodic.Line('A horse, a horse, my kingdom for a horse!')
line_from_richardIII
tokenizing @ 2023-12-15 14:14:17,991
⎿ 0 seconds @ 2023-12-15 14:14:17,992
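line_txt | sent_num | sentpart_num | wordtoken_num | wordtoken_txt | word_lang | wordform_num | syll_num | syll_txt | syll_ipa | word_num_forms | syll_is_stressed | syll_is_heavy | word_is_punc | syll_is_strong | syll_is_weak
A horse, a horse, my kingdom for a horse! | 1 | 1 | 1 | A | en | 1 | 1 | A |  | 1 | 0 | 1 |  |  |
A horse, a horse, my kingdom for a horse! | 1 | 1 | 2 | horse | en | 1 | 1 | horse | 'hɔːrs | 1 | 1 | 1 |  |  |
A horse, a horse, my kingdom for a horse! | 1 | 1 | 3 | , | en | 0 | 0 |  |  | 0 |  |  | 1 |  |
A horse, a horse, my kingdom for a horse! | 1 | 1 | 4 | a | en | 1 | 1 | a |  | 1 | 0 | 1 |  |  |
A horse, a horse, my kingdom for a horse! | 1 | 1 | 5 | horse | en | 1 | 1 | horse | 'hɔːrs | 1 | 1 | 1 |  |  |
...
A horse, a horse, my kingdom for a horse! | 1 | 1 | 8 | kingdom | en | 1 | 2 | dom | dəm | 1 | 0 | 1 |  | 0 | 1
A horse, a horse, my kingdom for a horse! | 1 | 1 | 9 | for | en | 1 | 1 | for | fɔːr | 1 | 0 | 1 |  |  |
A horse, a horse, my kingdom for a horse! | 1 | 1 | 10 | a | en | 1 | 1 | a |  | 1 | 0 | 1 |  |  |
A horse, a horse, my kingdom for a horse! | 1 | 1 | 11 | horse | en | 1 | 1 | horse | 'hɔːrs | 1 | 1 | 1 |  |  |
A horse, a horse, my kingdom for a horse! | 1 | 1 | 12 | ! | en | 0 | 0 |  |  | 0 |  |  | 1 |  |

13 rows × 6 columns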

Metrical parsing

Parsing lines

# parse with default options
plausible_parses = line_from_richardIII.parse()
plausible_parses
line_txt | parse_rank | parse_txt | parse_meter | parse_stress | parse_score | parse_is_bounded | meterpos_num_slots | *w_peak | *w_stress | *s_unstress | *unres_across | *unres_within
A horse, a horse, my kingdom for a horse! | 1 | a HORSE a HORSE my KING dom FOR a HORSE | -+-+-+-+-+ | -+-+-+---+ | 1.0 | 0.0 | 10 | 0 | 0 | 1 | 0 | 0
# see best parse
line_from_richardIII.best_parse
A horse a horse my kingdom for a horse
⎿ Parse(rank=1, meter='-+-+-+-+-+', stress='-+-+-+---+', score=1, is_bounded=0)
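
The meter and stress strings in the Parse(...) summary above can be compared position by position. The sketch below assumes a simplified reading of two constraint names: *w_stress counts stressed syllables ('+' in stress) in weak positions ('-' in meter), and *s_unstress counts unstressed syllables in strong positions. The function `count_violations` is hypothetical, and Prosodic's own constraint logic may apply further conditions.

```python
def count_violations(meter, stress):
    """Count simplified *w_stress and *s_unstress violations.

    meter and stress are equal-length strings of '+' and '-',
    one character per syllable.
    """
    if len(meter) != len(stress):
        raise ValueError("meter and stress must have the same length")
    pairs = list(zip(meter, stress))
    # Stressed syllable in a metrically weak position.
    w_stress = sum(1 for m, s in pairs if m == "-" and s == "+")
    # Unstressed syllable in a metrically strong position.
    s_unstress = sum(1 for m, s in pairs if m == "+" and s == "-")
    return {"w_stress": w_stress, "s_unstress": s_unstress}


print(count_violations("-+-+-+-+-+", "-+-+-+---+"))
# {'w_stress': 0, 's_unstress': 1}
```

Under this reading, the single violation falls on "FOR", an unstressed syllable in a strong position, which agrees with the *s_unstress value of 1 in the default parse output above.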
# parse with different options
diff_parses = line_from_richardIII.parse(constraints=('w_peak','s_unstress'))
diff_parses
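line_txt | parse_rank | parse_txt | parse_meter | parse_stress | parse_score | parse_is_bounded | meterpos_num_slots | *w_peak | *s_unstress
A horse, a horse, my kingdom for a horse! | 1 | a HORSE a HORSE my KING dom FOR a HORSE | -+-+-+-+-+ | -+-+-+---+ | 1.0 | 0.0 | 10 | 0 | 1
A horse, a horse, my kingdom for a horse! | 2 | a HORSE a HORSE my KING dom FOR a.horse | -+-+-+-+-- | -+-+-+---+ | 1.0 | 0.0 | 12 | 0 | 1
A horse, a horse, my kingdom for a horse! | 3 | a HORSE a HORSE my KING dom.for A horse | -+-+-+--+- | -+-+-+---+ | 1.0 | 0.0 | 12 | 0 | 1
A horse, a horse, my kingdom for a horse! | 4 | a HORSE a HORSE my KING dom.for A.HORSE | -+-+-+--++ | -+-+-+---+ | 1.0 | 0.0 | 14 | 0 | 1
A horse, a horse, my kingdom for a horse! | 5 | a HORSE a HORSE my KING.DOM for.a HORSE | -+-+-++--+ | -+-+-+---+ | 1.0 | 0.0 | 14 | 0 | 1
A horse, a horse, my kingdom for a horse! | 6 | a HORSE a HORSE my KING dom FOR.A horse | -+-+-+-++- | -+-+-+---+ | 2.0 | 0.0 | 12 | 0 | 2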
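
In the output above, each parse_score equals the sum of that row's constraint-violation columns. The sketch below reproduces that arithmetic from the six rows shown, assuming every constraint carries a weight of 1. The `parse_score` helper and its `weights` parameter are hypothetical and are not Prosodic's API.

```python
# Violation counts copied from the six parses shown above (rank -> counts).
violations = {
    1: {"w_peak": 0, "s_unstress": 1},
    2: {"w_peak": 0, "s_unstress": 1},
    3: {"w_peak": 0, "s_unstress": 1},
    4: {"w_peak": 0, "s_unstress": 1},
    5: {"w_peak": 0, "s_unstress": 1},
    6: {"w_peak": 0, "s_unstress": 2},
}


def parse_score(counts, weights=None):
    """Sum violation counts, each multiplied by its weight (default 1.0)."""
    weights = weights or {}
    return float(sum(n * weights.get(name, 1.0) for name, n in counts.items()))


scores = {rank: parse_score(counts) for rank, counts in violations.items()}
print(scores)
# {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 2.0}
```

The summed score alone does not separate ranks 1 to 5, so some further criterion must break those ties; the sketch does not attempt to model it.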
Parsing texts
# small texts
sonnet.parse()
parsing 14 lines [5x] @ 2023-12-15 14:17:43,563
│ stanza 01, line 14: LEESE but.their SHOW their SUBS tance STILL lives SWEET: 100%|██████████| 14/14 [00:00<00:00, 45.78it/s]
⎿ 0.3 seconds @ 2023-12-15 14:17:43,873
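stanza_num | line_num | line_txt | parse_rank | parse_txt | parse_meter | parse_stress | parse_score | parse_is_bounded | meterpos_num_slots | *w_peak | *w_stress | *s_unstress | *unres_across | *unres_within
1 | 1 | Those hours, that with gentle work did frame | 1 | those HO urs THAT with GEN tle WORK did FRAME | -+-+-+-+-+ | -+-+-+-+-+ | 0.0 | 0.0 | 10 | 0 | 0 | 0 | 0 | 0
1 | 1 | Those hours, that with gentle work did frame | 2 | those HOURS that.with GEN tle WORK did FRAME | -+--+-+-+ | -+--+-+-+ | 0.0 | 0.0 | 11 | 0 | 0 | 0 | 0 | 0
1 | 1 | Those hours, that with gentle work did frame | 3 | those HOURS that.with GEN tle WORK did FRAME | -+--+-+-+ | -+--+-+-+ | 0.0 | 0.0 | 11 | 0 | 0 | 0 | 0 | 0
1 | 2 | The lovely gaze where every eye doth dwell, | 1 | the LO vely GAZE where E very EYE doth DWELL | -+-+-+-+-+ | -+-+-+-+-+ | 0.0 | 0.0 | 10 | 0 | 0 | 0 | 0 | 0
1 | 2 | The lovely gaze where every eye doth dwell, | 2 | the LO vely GAZE where E ve.ry EYE doth DWELL | -+-+-+--+-+ | -+-+-+--+-+ | 1.0 | 0.0 | 13 | 0 | 0 | 0 | 0 | 1
...
1 | 13 | But flowers distill'd, though they with winter meet, | 1 | but FLO wers DIS.TILL'D though THEY with WIN ter MEET | -+-++-+-+-+ | -+--+-+-+-+ | 2.0 | 0.0 | 13 | 0 | 0 | 1 | 0 | 1
1 | 13 | But flowers distill'd, though they with winter meet, | 2 | but FLO wers.dis TILL'D though THEY with WIN ter MEET | -+--+-+-+-+ | -+--+-+-+-+ | 2.0 | 0.0 | 13 | 0 | 0 | 0 | 2 | 0
1 | 13 | But flowers distill'd, though they with winter meet, | 3 | but FLO.WERS dis TILL'D though THEY with WIN ter MEET | -++-+-+-+-+ | -+--+-+-+-+ | 2.0 | 0.0 | 13 | 0 | 0 | 1 | 0 | 1
1 | 13 | But flowers distill'd, though they with winter meet, | 4 | but FLO wers DIS till'd THOUGH they.with WIN ter MEET | -+-+-+--+-+ | -+--+---+-+ | 4.0 | 0.0 | 13 | 1 | 1 | 2 | 0 | 0
1 | 14 | Leese but their show; their substance still lives sweet. | 1 | LEESE but.their SHOW their SUBS tance STILL lives SWEET | +--+-+-+-+ | +--+-+-+++ | 1.0 | 0.0 | 12 | 0 | 1 | 0 | 0 | 0

37 rows × 8 columns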

# and big texts
shaksonnets.parse()
parsing 2155 lines [5x] @ 2023-12-15 14:17:52,124
│ stanza 154, line 14: love's FI re HEATS.WA ter WA ter COOLS not LOVE       : 100%|██████████| 2155/2155 [00:56<00:00, 38.03it/s]
⎿ 57.4 seconds @ 2023-12-15 14:18:49,496
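stanza_num | line_num | line_txt | parse_rank | parse_txt | parse_meter | parse_stress | parse_score | parse_is_bounded | meterpos_num_slots | *w_peak | *w_stress | *s_unstress | *unres_across | *unres_within
1 | 1 | FROM fairest creatures we desire increase, | 1 | from FAI rest CREA tures WE de SIRE in CREASE | -+-+-+-+-+ | -+-+-+-+-+ | 0.0 | 0.0 | 10 | 0 | 0 | 0 | 0 | 0
1 | 1 | FROM fairest creatures we desire increase, | 2 | from FAI rest CREA tures WE de SI re IN crease | -+-+-+-+-+- | -+-+-+-+-++ | 1.0 | 0.0 | 11 | 0 | 1 | 0 | 0 | 0
1 | 1 | FROM fairest creatures we desire increase, | 3 | from FAI rest CREA tures WE de SI re IN.CREASE | -+-+-+-+-++ | -+-+-+-+-++ | 1.0 | 0.0 | 13 | 0 | 0 | 0 | 0 | 1
1 | 1 | FROM fairest creatures we desire increase, | 4 | from FAI rest CREA tures WE de SI re.in CREASE | -+-+-+-+--+ | -+-+-+-+--+ | 2.0 | 0.0 | 13 | 0 | 0 | 0 | 2 | 0
1 | 2 | That thereby beauty's rose might never die, | 1 | that THE reby BEA uty's ROSE might NE ver DIE | -+-+-+-+-+ | -+++-+-+-+ | 1.0 | 0.0 | 10 | 0 | 1 | 0 | 0 | 0
...
154 | 14 | Love's fire heats water, water cools not love. | 2 | love's FI re HEATS wa.ter WA ter COOLS not LOVE | -+-+--+-+-+ | ++-++-+-+-+ | 4.0 | 0.0 | 13 | 1 | 2 | 0 | 0 | 1
154 | 14 | Love's fire heats water, water cools not love. | 3 | love's FI.RE heats WA ter WA ter COOLS not LOVE | -++-+-+-+-+ | ++-++-+-+-+ | 4.0 | 0.0 | 13 | 0 | 2 | 1 | 0 | 1
154 | 14 | Love's fire heats water, water cools not love. | 4 | LOVE'S fire HEATS wa.ter WA ter COOLS not LOVE | +-+--+-+-+ | ++++-+-+-+ | 4.0 | 0.0 | 12 | 1 | 2 | 0 | 0 | 1
154 | 14 | Love's fire heats water, water cools not love. | 5 | LOVE'S.FI re HEATS.WA ter WA ter COOLS not LOVE | ++-++-+-+-+ | ++-++-+-+-+ | 4.0 | 0.0 | 15 | 0 | 0 | 0 | 4 | 0
154 | 14 | Love's fire heats water, water cools not love. | 6 | love's FI re HEATS wa TER wa TER cools NOT love | -+-+-+-+-+- | ++-++-+-+++ | 9.0 | 0.0 | 11 | 2 | 5 | 2 | 0 | 0

7277 rows × 8 columns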
