Python Library for Crawling Top 10 Korean News and Providing Synonym Dictionary
This is a Python library for crawling articles from the top 10 Korean daily newspapers and providing a synonym dictionary. It is a beta version that has not yet been officially registered on PyPI.
The copyright of the articles belongs to the original media companies. We take no legal responsibility for how they are used; by using this library, you are assumed to agree to this.
This is an open source project, and we welcome contributors and collaborators at any time. Please feel free to get in touch.
Indigo_Coder
```shell
pip install korean_news_crawler
```
Requires BeautifulSoup (`beautifulsoup4`), Selenium, and Requests.
```python
from korean_news_crawler import Chosun

chosun = Chosun()

# Crawl one article: url is a str
print(chosun.dynamic_crawl("https://www.chosun.com/..."))

# Crawl several articles: url is a list of Chosun Ilbo URLs
chosun_url_list = list()
print(chosun.dynamic_crawl(chosun_url_list))
```
korean_news_crawler.Chosun(delay_time=None, saving_html=False)
Crawls articles from Chosun Ilbo.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional, defaults to None. - If a float, crawls with that fixed delay between requests. - If a tuple, crawls with a random delay between requests. |
saving_html | bool | - Optional, defaults to False. - If False, the URL is requested again on every method call. - If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which helps reduce server load. |
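The `delay_time` parameter accepts either a fixed delay or a random one. The sketch below shows one plausible way to turn it into a pause before each request. `apply_delay` is a hypothetical helper, not part of `korean_news_crawler`; it assumes the value is in seconds and that a tuple means a `(min, max)` range, neither of which the table above states. The `sleep` argument is injectable so the logic can be checked without actually waiting.

```python
import random
import time
from typing import Callable, Optional, Tuple, Union

# delay_time as described in the table: None, a float, or a tuple.
Delay = Optional[Union[float, Tuple[float, float]]]


def apply_delay(delay_time: Delay,
                sleep: Callable[[float], None] = time.sleep) -> float:
    """Hypothetical helper: pause according to delay_time and return the delay.

    Assumes the unit is seconds and that a tuple is a (min, max) range.
    """
    if delay_time is None:
        return 0.0  # no delay configured: request immediately
    if isinstance(delay_time, tuple):
        low, high = delay_time
        seconds = random.uniform(low, high)  # random delay within the range
    else:
        seconds = float(delay_time)  # fixed delay
    sleep(seconds)
    return seconds
```

For example, `apply_delay((1.0, 3.0))` would pause somewhere between one and three seconds and return the delay it chose. A random delay makes request timing less regular than a fixed one, which is gentler on the target site.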
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | Value of the `delay_time` parameter. |
saving_html | bool | Value of the `saving_html` parameter. |
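To illustrate the `saving_html` behaviour, here is a minimal in-memory sketch. `HtmlCache` and its `fetch` callback are hypothetical names: the library may save HTML to disk rather than keep it in a dictionary, and `fetch` stands in for whatever actually downloads a page (for example, a wrapper around `requests.get`).

```python
from typing import Callable, Dict


class HtmlCache:
    """Hypothetical sketch of the saving_html option (in memory only)."""

    def __init__(self, fetch: Callable[[str], str], saving_html: bool = False):
        self.fetch = fetch  # maps a URL to its HTML
        self.saving_html = saving_html
        self._saved: Dict[str, str] = {}  # url -> HTML from the first request

    def get(self, url: str) -> str:
        if not self.saving_html:
            return self.fetch(url)  # always request the URL again
        if url not in self._saved:
            self._saved[url] = self.fetch(url)  # first call: request and save
        return self._saved[url]  # later calls: reuse the saved HTML
```

With `saving_html=True`, calling `get` twice on the same URL triggers only one request; with `saving_html=False`, every call goes back to the server.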
Methods | Description |
---|---|
dynamic_crawl(url) | Return article text using Selenium. |
static_crawl(url) | Return article text using BeautifulSoup. |
dynamic_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
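Both crawl methods accept either one URL or a list and always return a list. A minimal sketch of that dispatch, assuming a hypothetical per-page function `crawl_one` that returns the text of a single article (the library's internal structure may differ):

```python
from typing import Callable, List, Union


def crawl_urls(url: Union[str, List[str]],
               crawl_one: Callable[[str], str]) -> List[str]:
    """Hypothetical sketch: accept one URL or a list, always return a list."""
    if isinstance(url, str):
        urls = [url]  # a single URL is treated as a one-element list
    elif isinstance(url, list):
        urls = url
    else:
        raise TypeError("url must be a str or a list of str")
    return [crawl_one(u) for u in urls]  # crawl each URL in order
```

Normalizing a single string into a one-element list keeps a single code path and gives callers a consistent return type.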
static_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
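`static_crawl` is documented as using BeautifulSoup. The sketch below swaps in the standard library's `html.parser` so it runs without extra installs, and it assumes the article body lives in `<p>` elements, which real news sites may not follow (many wrap the body in a site-specific container). `ParagraphExtractor` and `extract_article_text` are hypothetical names. With BeautifulSoup, the rough equivalent is `soup.find_all("p")` followed by `get_text()` on each match.

```python
from html.parser import HTMLParser
from typing import List


class ParagraphExtractor(HTMLParser):
    """Collects the text inside <p> elements of an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_p = False  # True while inside a <p> element
        self._current: List[str] = []
        self.paragraphs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._current = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            text = "".join(self._current).strip()
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_p:
            self._current.append(data)  # includes text of nested tags like <b>


def extract_article_text(html: str) -> str:
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)
```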
korean_news_crawler.Donga(delay_time=None, saving_html=False)
Crawls articles from Dong-a Ilbo.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional, defaults to None. - If a float, crawls with that fixed delay between requests. - If a tuple, crawls with a random delay between requests. |
saving_html | bool | - Optional, defaults to False. - If False, the URL is requested again on every method call. - If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | Value of the `delay_time` parameter. |
saving_html | bool | Value of the `saving_html` parameter. |
Methods | Description |
---|---|
dynamic_crawl(url) | Return article text using Selenium. |
static_crawl(url) | Return article text using BeautifulSoup. |
dynamic_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
static_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
korean_news_crawler.Hankook(delay_time=None, saving_html=False)
Crawls articles from Hankook Ilbo.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional, defaults to None. - If a float, crawls with that fixed delay between requests. - If a tuple, crawls with a random delay between requests. |
saving_html | bool | - Optional, defaults to False. - If False, the URL is requested again on every method call. - If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | Value of the `delay_time` parameter. |
saving_html | bool | Value of the `saving_html` parameter. |
Methods | Description |
---|---|
dynamic_crawl(url) | Return article text using Selenium. |
static_crawl(url) | Return article text using BeautifulSoup. |
dynamic_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
static_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
korean_news_crawler.Hankyoreh(delay_time=None, saving_html=False)
Crawls articles from Hankyoreh.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional, defaults to None. - If a float, crawls with that fixed delay between requests. - If a tuple, crawls with a random delay between requests. |
saving_html | bool | - Optional, defaults to False. - If False, the URL is requested again on every method call. - If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | Value of the `delay_time` parameter. |
saving_html | bool | Value of the `saving_html` parameter. |
Methods | Description |
---|---|
dynamic_crawl(url) | Return article text using Selenium. |
static_crawl(url) | Return article text using BeautifulSoup. |
dynamic_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
static_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
korean_news_crawler.Joongang(delay_time=None, saving_html=False)
Crawls articles from Joongang Ilbo.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional, defaults to None. - If a float, crawls with that fixed delay between requests. - If a tuple, crawls with a random delay between requests. |
saving_html | bool | - Optional, defaults to False. - If False, the URL is requested again on every method call. - If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | Value of the `delay_time` parameter. |
saving_html | bool | Value of the `saving_html` parameter. |
Methods | Description |
---|---|
dynamic_crawl(url) | Return article text using Selenium. |
static_crawl(url) | Return article text using BeautifulSoup. |
dynamic_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
static_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
korean_news_crawler.Kukmin(delay_time=None, saving_html=False)
Crawls articles from Kukmin Ilbo.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional, defaults to None. - If a float, crawls with that fixed delay between requests. - If a tuple, crawls with a random delay between requests. |
saving_html | bool | - Optional, defaults to False. - If False, the URL is requested again on every method call. - If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | Value of the `delay_time` parameter. |
saving_html | bool | Value of the `saving_html` parameter. |
Methods | Description |
---|---|
dynamic_crawl(url) | Return article text using Selenium. |
static_crawl(url) | Return article text using BeautifulSoup. |
dynamic_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
static_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
korean_news_crawler.Kyunghyang(delay_time=None, saving_html=False)
Crawls articles from Kyunghyang Shinmun.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional, defaults to None. - If a float, crawls with that fixed delay between requests. - If a tuple, crawls with a random delay between requests. |
saving_html | bool | - Optional, defaults to False. - If False, the URL is requested again on every method call. - If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | Value of the `delay_time` parameter. |
saving_html | bool | Value of the `saving_html` parameter. |
Methods | Description |
---|---|
dynamic_crawl(url) | Return article text using Selenium. |
static_crawl(url) | Return article text using BeautifulSoup. |
dynamic_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
static_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
korean_news_crawler.Munhwa(delay_time=None, saving_html=False)
Crawls articles from Munhwa Ilbo.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional, defaults to None. - If a float, crawls with that fixed delay between requests. - If a tuple, crawls with a random delay between requests. |
saving_html | bool | - Optional, defaults to False. - If False, the URL is requested again on every method call. - If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | Value of the `delay_time` parameter. |
saving_html | bool | Value of the `saving_html` parameter. |
Methods | Description |
---|---|
dynamic_crawl(url) | Return article text using Selenium. |
static_crawl(url) | Return article text using BeautifulSoup. |
dynamic_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
static_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
korean_news_crawler.Naeil(delay_time=None, saving_html=False)
Crawls articles from Naeil News.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional, defaults to None. - If a float, crawls with that fixed delay between requests. - If a tuple, crawls with a random delay between requests. |
saving_html | bool | - Optional, defaults to False. - If False, the URL is requested again on every method call. - If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | Value of the `delay_time` parameter. |
saving_html | bool | Value of the `saving_html` parameter. |
Methods | Description |
---|---|
dynamic_crawl(url) | Return article text using Selenium. |
static_crawl(url) | Return article text using BeautifulSoup. |
dynamic_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
static_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
korean_news_crawler.Segye(delay_time=None, saving_html=False)
Crawls articles from Segye Ilbo.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional, defaults to None. - If a float, crawls with that fixed delay between requests. - If a tuple, crawls with a random delay between requests. |
saving_html | bool | - Optional, defaults to False. - If False, the URL is requested again on every method call. - If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | Value of the `delay_time` parameter. |
saving_html | bool | Value of the `saving_html` parameter. |
Methods | Description |
---|---|
dynamic_crawl(url) | Return article text using Selenium. |
static_crawl(url) | Return article text using BeautifulSoup. |
dynamic_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
static_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
korean_news_crawler.Seoul(delay_time=None, saving_html=False)
Crawls articles from Seoul Shinmun.
Parameters | Type | Description |
---|---|---|
delay_time | float or tuple | - Optional, defaults to None. - If a float, crawls with that fixed delay between requests. - If a tuple, crawls with a random delay between requests. |
saving_html | bool | - Optional, defaults to False. - If False, the URL is requested again on every method call. - If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which helps reduce server load. |
Attributes | Type | Description |
---|---|---|
delay_time | float or tuple | Value of the `delay_time` parameter. |
saving_html | bool | Value of the `saving_html` parameter. |
Methods | Description |
---|---|
dynamic_crawl(url) | Return article text using Selenium. |
static_crawl(url) | Return article text using BeautifulSoup. |
dynamic_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |
static_crawl(url)
Parameters | Type | Description |
---|---|---|
url | str or list | - If a str, crawls only that URL. - If a list, crawls each URL in the list in turn. |
Return type | Description |
---|---|
list | List of crawled articles. |