korean-news-crawler

Python Library for Crawling Top 10 Korean News and Providing Synonym Dictionary

  • 1.0.5
  • PyPI

Maintainers: 1

Korean_News_Crawler

This is a Python library for crawling articles from the top 10 Korean daily newspaper sites and providing a synonym dictionary. It is still a beta version that has not yet been officially registered on PyPI.

The copyright of the articles belongs to the original media companies. We take no legal responsibility for how they are used, and we assume that you have agreed to this.

Contributors and collaborators are always welcome to join this open source project. Please feel free to contact us.

Supported News Sites

  • Chosun Ilbo
  • Dong-a Ilbo
  • Hankook Ilbo
  • Hankyoreh
  • Joongang Ilbo
  • Kukmin Ilbo
  • Kyunghyang Shinmun
  • Munhwa Ilbo
  • Naeil News
  • Segye Ilbo
  • Seoul Shinmun

Contributors

Indigo_Coder

Installation

pip install korean_news_crawler

BeautifulSoup, Selenium, and Requests are required.

Quick Usage

from korean_news_crawler import Chosun

chosun = Chosun()
print(chosun.dynamic_crawl("https://www.chosun.com/..."))

chosun_url_list = list()  # list of Chosun Ilbo article URLs to crawl
print(chosun.dynamic_crawl(chosun_url_list))
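
Both methods also accept a list of URLs and return a list of article texts; static_crawl parses pages with BeautifulSoup instead of driving a browser through Selenium. A minimal sketch of list-based crawling, assuming the article URLs below are placeholders you replace with real Chosun Ilbo links:

from korean_news_crawler import Chosun

chosun = Chosun()

# Placeholder article URLs; substitute real Chosun Ilbo article links.
urls = [
    "https://www.chosun.com/article-1",
    "https://www.chosun.com/article-2",
]

# static_crawl returns a list of article texts, one per URL.
articles = chosun.static_crawl(urls)
for text in articles:
    print(text[:100])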

API

  1. Chosun()
  2. Donga()
  3. Hankook()
  4. Hankyoreh()
  5. Joongang()
  6. Kukmin()
  7. Kyunghyang()
  8. Munhwa()
  9. Naeil()
  10. Segye()
  11. Seoul()

korean_news_crawler.Chosun(delay_time=None, saving_html=False)

Crawler for the Chosun Ilbo.

Parameters

  • delay_time (float or tuple) - Optional, defaults to None. When delay_time is a float, sites are crawled with a fixed delay; when it is a tuple, sites are crawled with a random delay.
  • saving_html (bool) - Optional, defaults to False. When saving_html=False, the URL is requested on every function call; when saving_html=True, the HTML fetched on the first call is saved and reused afterwards, which helps reduce server load.

Attributes

  • delay_time (float or tuple)
  • saving_html (bool)

Methods

  • dynamic_crawl(url) - Returns article text using Selenium.
  • static_crawl(url) - Returns article text using BeautifulSoup.

dynamic_crawl(url)

Returns article text using Selenium.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

static_crawl(url)

Returns article text using BeautifulSoup.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.
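
Both constructor options can be combined. A minimal sketch, assuming delay_time is measured in seconds and that a tuple is interpreted as a (min, max) range for the random delay (the README does not spell out the tuple's units or meaning):

from korean_news_crawler import Chosun

# Fixed 1.5-second delay between requests; cache the fetched HTML so
# repeated calls reuse the saved page instead of re-requesting it.
polite_chosun = Chosun(delay_time=1.5, saving_html=True)

# Random delay, assumed here to be drawn from the range 1-3 seconds.
random_chosun = Chosun(delay_time=(1, 3))

url = "https://www.chosun.com/article-1"  # placeholder article URL
print(polite_chosun.static_crawl(url))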

korean_news_crawler.Donga(delay_time=None, saving_html=False)

Crawler for the Dong-a Ilbo.

Parameters

  • delay_time (float or tuple) - Optional, defaults to None. When delay_time is a float, sites are crawled with a fixed delay; when it is a tuple, sites are crawled with a random delay.
  • saving_html (bool) - Optional, defaults to False. When saving_html=False, the URL is requested on every function call; when saving_html=True, the HTML fetched on the first call is saved and reused afterwards, which helps reduce server load.

Attributes

  • delay_time (float or tuple)
  • saving_html (bool)

Methods

  • dynamic_crawl(url) - Returns article text using Selenium.
  • static_crawl(url) - Returns article text using BeautifulSoup.

dynamic_crawl(url)

Returns article text using Selenium.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

static_crawl(url)

Returns article text using BeautifulSoup.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

korean_news_crawler.Hankook(delay_time=None, saving_html=False)

Crawler for the Hankook Ilbo.

Parameters

  • delay_time (float or tuple) - Optional, defaults to None. When delay_time is a float, sites are crawled with a fixed delay; when it is a tuple, sites are crawled with a random delay.
  • saving_html (bool) - Optional, defaults to False. When saving_html=False, the URL is requested on every function call; when saving_html=True, the HTML fetched on the first call is saved and reused afterwards, which helps reduce server load.

Attributes

  • delay_time (float or tuple)
  • saving_html (bool)

Methods

  • dynamic_crawl(url) - Returns article text using Selenium.
  • static_crawl(url) - Returns article text using BeautifulSoup.

dynamic_crawl(url)

Returns article text using Selenium.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

static_crawl(url)

Returns article text using BeautifulSoup.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

korean_news_crawler.Hankyoreh(delay_time=None, saving_html=False)

Crawler for the Hankyoreh.

Parameters

  • delay_time (float or tuple) - Optional, defaults to None. When delay_time is a float, sites are crawled with a fixed delay; when it is a tuple, sites are crawled with a random delay.
  • saving_html (bool) - Optional, defaults to False. When saving_html=False, the URL is requested on every function call; when saving_html=True, the HTML fetched on the first call is saved and reused afterwards, which helps reduce server load.

Attributes

  • delay_time (float or tuple)
  • saving_html (bool)

Methods

  • dynamic_crawl(url) - Returns article text using Selenium.
  • static_crawl(url) - Returns article text using BeautifulSoup.

dynamic_crawl(url)

Returns article text using Selenium.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

static_crawl(url)

Returns article text using BeautifulSoup.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

korean_news_crawler.Joongang(delay_time=None, saving_html=False)

Crawler for the Joongang Ilbo.

Parameters

  • delay_time (float or tuple) - Optional, defaults to None. When delay_time is a float, sites are crawled with a fixed delay; when it is a tuple, sites are crawled with a random delay.
  • saving_html (bool) - Optional, defaults to False. When saving_html=False, the URL is requested on every function call; when saving_html=True, the HTML fetched on the first call is saved and reused afterwards, which helps reduce server load.

Attributes

  • delay_time (float or tuple)
  • saving_html (bool)

Methods

  • dynamic_crawl(url) - Returns article text using Selenium.
  • static_crawl(url) - Returns article text using BeautifulSoup.

dynamic_crawl(url)

Returns article text using Selenium.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

static_crawl(url)

Returns article text using BeautifulSoup.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

korean_news_crawler.Kukmin(delay_time=None, saving_html=False)

Crawler for the Kukmin Ilbo.

Parameters

  • delay_time (float or tuple) - Optional, defaults to None. When delay_time is a float, sites are crawled with a fixed delay; when it is a tuple, sites are crawled with a random delay.
  • saving_html (bool) - Optional, defaults to False. When saving_html=False, the URL is requested on every function call; when saving_html=True, the HTML fetched on the first call is saved and reused afterwards, which helps reduce server load.

Attributes

  • delay_time (float or tuple)
  • saving_html (bool)

Methods

  • dynamic_crawl(url) - Returns article text using Selenium.
  • static_crawl(url) - Returns article text using BeautifulSoup.

dynamic_crawl(url)

Returns article text using Selenium.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

static_crawl(url)

Returns article text using BeautifulSoup.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

korean_news_crawler.Kyunghyang(delay_time=None, saving_html=False)

Crawler for the Kyunghyang Shinmun.

Parameters

  • delay_time (float or tuple) - Optional, defaults to None. When delay_time is a float, sites are crawled with a fixed delay; when it is a tuple, sites are crawled with a random delay.
  • saving_html (bool) - Optional, defaults to False. When saving_html=False, the URL is requested on every function call; when saving_html=True, the HTML fetched on the first call is saved and reused afterwards, which helps reduce server load.

Attributes

  • delay_time (float or tuple)
  • saving_html (bool)

Methods

  • dynamic_crawl(url) - Returns article text using Selenium.
  • static_crawl(url) - Returns article text using BeautifulSoup.

dynamic_crawl(url)

Returns article text using Selenium.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

static_crawl(url)

Returns article text using BeautifulSoup.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

korean_news_crawler.Munhwa(delay_time=None, saving_html=False)

Crawler for the Munhwa Ilbo.

Parameters

  • delay_time (float or tuple) - Optional, defaults to None. When delay_time is a float, sites are crawled with a fixed delay; when it is a tuple, sites are crawled with a random delay.
  • saving_html (bool) - Optional, defaults to False. When saving_html=False, the URL is requested on every function call; when saving_html=True, the HTML fetched on the first call is saved and reused afterwards, which helps reduce server load.

Attributes

  • delay_time (float or tuple)
  • saving_html (bool)

Methods

  • dynamic_crawl(url) - Returns article text using Selenium.
  • static_crawl(url) - Returns article text using BeautifulSoup.

dynamic_crawl(url)

Returns article text using Selenium.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

static_crawl(url)

Returns article text using BeautifulSoup.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

korean_news_crawler.Naeil(delay_time=None, saving_html=False)

Crawler for Naeil News.

Parameters

  • delay_time (float or tuple) - Optional, defaults to None. When delay_time is a float, sites are crawled with a fixed delay; when it is a tuple, sites are crawled with a random delay.
  • saving_html (bool) - Optional, defaults to False. When saving_html=False, the URL is requested on every function call; when saving_html=True, the HTML fetched on the first call is saved and reused afterwards, which helps reduce server load.

Attributes

  • delay_time (float or tuple)
  • saving_html (bool)

Methods

  • dynamic_crawl(url) - Returns article text using Selenium.
  • static_crawl(url) - Returns article text using BeautifulSoup.

dynamic_crawl(url)

Returns article text using Selenium.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

static_crawl(url)

Returns article text using BeautifulSoup.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

korean_news_crawler.Segye(delay_time=None, saving_html=False)

Crawler for the Segye Ilbo.

Parameters

  • delay_time (float or tuple) - Optional, defaults to None. When delay_time is a float, sites are crawled with a fixed delay; when it is a tuple, sites are crawled with a random delay.
  • saving_html (bool) - Optional, defaults to False. When saving_html=False, the URL is requested on every function call; when saving_html=True, the HTML fetched on the first call is saved and reused afterwards, which helps reduce server load.

Attributes

  • delay_time (float or tuple)
  • saving_html (bool)

Methods

  • dynamic_crawl(url) - Returns article text using Selenium.
  • static_crawl(url) - Returns article text using BeautifulSoup.

dynamic_crawl(url)

Returns article text using Selenium.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

static_crawl(url)

Returns article text using BeautifulSoup.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

korean_news_crawler.Seoul(delay_time=None, saving_html=False)

Crawler for the Seoul Shinmun.

Parameters

  • delay_time (float or tuple) - Optional, defaults to None. When delay_time is a float, sites are crawled with a fixed delay; when it is a tuple, sites are crawled with a random delay.
  • saving_html (bool) - Optional, defaults to False. When saving_html=False, the URL is requested on every function call; when saving_html=True, the HTML fetched on the first call is saved and reused afterwards, which helps reduce server load.

Attributes

  • delay_time (float or tuple)
  • saving_html (bool)

Methods

  • dynamic_crawl(url) - Returns article text using Selenium.
  • static_crawl(url) - Returns article text using BeautifulSoup.

dynamic_crawl(url)

Returns article text using Selenium.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.

static_crawl(url)

Returns article text using BeautifulSoup.

  • Parameters: url (str or list). When url is a str, only the given URL is crawled; when url is a list, each URL in the list is crawled in turn.
  • Returns: list. A list of article texts.
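
All eleven crawlers share the same constructor options and the same dynamic_crawl/static_crawl interface, so they can be used interchangeably. A minimal sketch that collects articles from several newspapers in one pass, assuming the article URLs are hypothetical placeholders:

from korean_news_crawler import Donga, Hankyoreh, Seoul

# Map each crawler to the article URLs to fetch from that newspaper.
# The URLs are placeholders; substitute real article links.
targets = [
    (Donga(delay_time=1.0), ["https://www.donga.com/article-1"]),
    (Hankyoreh(delay_time=1.0), ["https://www.hani.co.kr/article-1"]),
    (Seoul(delay_time=1.0), ["https://www.seoul.co.kr/article-1"]),
]

all_articles = []
for crawler, urls in targets:
    all_articles.extend(crawler.static_crawl(urls))

print(len(all_articles), "articles collected")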
