russianCVparser

Parser for CVs written in Russian. Supported formats: pdf, txt, docx.

  • 1.1
  • PyPI

General information

The parser extracts as much information as possible from the CV text. It uses the natasha library for named-entity recognition and the yargy parser for rule-based parsing.

Information that can be extracted:

  • socdem:
    • name,
    • gender,
    • date_of_birth,
    • age,
    • location
  • career:
    • period,
    • org_name,
    • occupation
  • education:
    • year,
    • name,
    • specialization
  • hobby:
    • name
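The field list above can be turned into a small completeness check. The sketch below assumes the parsed result is a dict shaped like the sample output in this README (`socdem` as a single dict; `career`, `education` and `hobby` as lists of dicts). `EXPECTED_FIELDS` and `missing_fields` are hypothetical helper names and are not part of russianCVparser.

```python
from typing import Any, Dict, List

# Field names copied from the list above; the helper itself is illustrative only.
EXPECTED_FIELDS: Dict[str, List[str]] = {
    "socdem": ["name", "gender", "date_of_birth", "age", "location"],
    "career": ["period", "org_name", "occupation"],
    "education": ["year", "name", "specialization"],
    "hobby": ["name"],
}


def missing_fields(data: Dict[str, Any]) -> Dict[str, List[str]]:
    """Return, per section, the documented fields absent from every entry."""
    report: Dict[str, List[str]] = {}
    for section, fields in EXPECTED_FIELDS.items():
        value = data.get(section)
        # Assumption: a section is either one dict or a list of dicts.
        if value is None:
            entries = []
        elif isinstance(value, dict):
            entries = [value]
        else:
            entries = list(value)
        absent = [f for f in fields if not any(f in e for e in entries)]
        if absent:
            report[section] = absent
    return report
```

A CV rarely contains every field, so a report like this may help decide whether a parsed document is complete enough for downstream use.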

Installation

pip install russianCVparser

Usage

The parser supports documents in docx, pdf and txt formats.

from russianCVparser import CVparser, Document, show_json

parser = CVparser()
document = Document('path/to/doc.pdf')
data = parser.parse_text(document.text) # returns an OrderedDict instance
show_json(data)
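When processing a folder of CVs, it may help to skip files in unsupported formats before creating a `Document`. This is a minimal sketch using only the standard library; `SUPPORTED_SUFFIXES` and `supported_files` are hypothetical names, and matching on the file extension is an assumption about how to tell the formats apart.

```python
from pathlib import Path
from typing import Iterable, List

# Extensions taken from the formats named in this README.
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".docx"}


def supported_files(paths: Iterable[str]) -> List[Path]:
    """Keep only paths whose extension matches a supported format."""
    return [Path(p) for p in paths if Path(p).suffix.lower() in SUPPORTED_SUFFIXES]
```

Each remaining path could then be passed as a string to `Document`, as in the snippet above.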

Example

from russianCVparser import CVparser, Document, show_json

parser = CVparser()
document = Document('CV.pdf')
data = parser.parse_text(document.text)
show_json(data)

Output:

{
  "socdem": {
    "name": "Иванов Иван Иванович",
    "gender": "male",
    "date_of_birth": {
      "year": 1981,
      "month": 5,
      "day": 2
    },
    "age": "39 лет",
    "location": {
      "name": "Казань"
    }
  },
  "career": [
    {
      "period": {
        "from_date": {
          "month": 12,
          "year": 2017
        }
      },
      "org_name": "ООО \"Инвест-консалт\"",
      "occupation": "Ведущий специалист"
    },
    {
      "period": {
        "from_date": {
          "month": 2,
          "year": 2011
        },
        "to_date": {
          "month": 6,
          "year": 2017
        }
      },
      "org_name": "Казгорсеть",
      "occupation": "Ведущий специалист"
    },
    {
      "period": {
        "from_date": {
          "month": 2,
          "year": 2010
        },
        "to_date": {
          "month": 2,
          "year": 2011
        }
      },
      "org_name": "ООО Адванс",
      "occupation": "Аналитик"
    }
  ],
  "education": [
    {
      "year": 2015,
      "name": "Российский государственный аграрный университет"
    },
    {
      "year": 2016,
      "name": "Московский Государственный Технический Университет"
    }
  ],
  "hobby": [
    {
      "name": [
        "футбол",
        "рыбалка",
        "шахматы"
      ]
    }
  ]
}
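The nested `period` entries in the output lend themselves to simple post-processing. As a sketch, the hypothetical helper below (not part of russianCVparser) computes the length of a career period in months. It assumes `from_date` and `to_date` are dicts with `year` and optional `month` keys as in the sample, and that a period without `to_date` is still ongoing.

```python
from typing import Any, Dict, Optional


def period_months(period: Dict[str, Any], today_year: int, today_month: int) -> Optional[int]:
    """Length of a career period in months; open-ended periods run to 'today'."""
    start = period.get("from_date")
    if not start or "year" not in start:
        return None
    end = period.get("to_date") or {"year": today_year, "month": today_month}
    if "year" not in end:
        return None
    # Assumption: a missing month is treated as January.
    start_index = start["year"] * 12 + start.get("month", 1)
    end_index = end["year"] * 12 + end.get("month", 1)
    return end_index - start_index


# Second career entry from the sample output: February 2011 to June 2017.
period = {"from_date": {"month": 2, "year": 2011}, "to_date": {"month": 6, "year": 2017}}
print(period_months(period, 2020, 12))  # 76
```

Passing the current date explicitly keeps the function deterministic, which makes it easier to test.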
