Package acting as a wrapper around the headless mode of existing web browsers to generate images from URLs and from HTML+CSS strings or files.
A lightweight Python package acting as wrapper around the headless mode of existing web browsers, allowing image generation from HTML/CSS strings, files and URLs.
This package has been tested on Windows, Ubuntu (desktop and server) and MacOS. If you encounter any problems or difficulties while using it, feel free to open an issue on the GitHub page of this project. Feedback is also welcome!
⚠️ Disclaimer: Use this package with trusted content only. Processing untrusted or unsanitized input can lead to malicious code execution. Always ensure content safety.
Most web browsers have a Headless Mode, which is a way to run them without displaying any graphical interface. Headless mode is mainly used for automated testing but also comes in handy if you want to take screenshots of web pages that are exact replicas of what you would see on your screen if you were using the browser yourself.
However, for the sake of taking screenshots, headless mode is not very convenient to use. HTML2Image aims to hide the inconveniences of the browsers' headless modes while adding useful features, such as allowing the creation of images from simple strings.
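To give a sense of what the wrapper hides, here is a minimal sketch of the kind of command line a headless screenshot requires. It is not html2image's actual implementation: the function name `build_headless_command` and the executable name `chrome` are assumptions made for illustration, while `--headless`, `--screenshot` and `--window-size` are standard Chrome / Chromium switches.

```python
def build_headless_command(executable, url, output_file, size=(1920, 1080)):
    """Return the argument list for a one-off headless screenshot.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    width, height = size
    return [
        executable,                         # assumed to be on the PATH
        "--headless",                       # run without a graphical interface
        f"--screenshot={output_file}",      # where the image is written
        f"--window-size={width},{height}",  # viewport size in pixels
        url,
    ]


command = build_headless_command("chrome", "https://www.python.org", "python_org.png")
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run`, but even then strings, CSS and temporary files would have to be handled by hand, which is the part the package takes care of.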
For more information about headless modes :
HTML2Image is published on PyPI and can be installed through pip:
pip install --upgrade html2image
In addition to this package, at least one of the following browsers must be installed on your machine :
from html2image import Html2Image
hti = Html2Image()
Multiple arguments can be passed to the constructor:
browser: Browser that will be used; can be set to 'chrome' (default) or 'edge'.
browser_executable: The path or the command that can be used to find the executable of a specific browser.
output_path: Path to the folder in which screenshots will be saved. Default is the current working directory of your Python program.
size: 2-tuple representing the size of the screenshots that will be taken. Default value is (1920, 1080).
temp_path: Path that will be used to put together different resources when screenshotting strings or files. Default value is %TEMP%/html2image on Windows, and /tmp/html2image on Linux and MacOS.
keep_temp_files: Pass True to this argument to not automatically remove temporary files created in temp_path. Default is False.
Example:
hti = Html2Image(size=(500, 200))
You can also change these values later:
hti.size = (500, 200)
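The documented temp_path defaults can be illustrated with a small sketch. This is not the library's own code: `default_temp_path` is a hypothetical helper that only mirrors the two defaults listed above, and its `platform` and `environ` parameters exist solely to make it easy to try out.

```python
import os
import sys


def default_temp_path(platform=sys.platform, environ=os.environ):
    """Mirror the documented defaults (hypothetical helper)."""
    if platform.startswith("win"):
        # %TEMP%/html2image on Windows
        return os.path.join(environ.get("TEMP", ""), "html2image")
    # /tmp/html2image on Linux and MacOS
    return "/tmp/html2image"


print(default_temp_path())
```

If the default location does not suit you, passing temp_path to the constructor overrides it.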
The screenshot method is the basis of this package. Most of the time, you won't need to use anything else. It can take screenshots of various things:
URLs, via the url parameter;
HTML and CSS files, via the html_file and css_file parameters;
HTML and CSS strings, via the html_str and css_str parameters;
other types of files, via the other_file parameter (try it with .svg files!).
And you can also (optional):
change the size of the screenshots, using the size parameter;
name the output files, using the save_as parameter.
N.B.: The screenshot method returns a list containing the path(s) of the screenshot(s) it took.
hti.screenshot(url='https://www.python.org', save_as='python_org.png')
html = """<h1> An interesting title </h1> This page will be red"""
css = "body {background: red;}"
hti.screenshot(html_str=html, css_str=css, save_as='red_page.png')
hti.screenshot(
html_file='blue_page.html', css_file='blue_background.css',
save_as='blue_page.png'
)
hti.screenshot(other_file='star.svg')
hti.screenshot(other_file='star.svg', size=(500, 500))
hti = Html2Image(output_path='my_screenshot_folder')
OR
hti.output_path = 'my_screenshot_folder'
N.B. : the output path will be changed for all future screenshots.
Lists can be passed in place of single values for the parameters of the screenshot method:
# create three files from one filename
hti.screenshot(html_str=['A', 'B', 'C'], save_as='ABC.png')
# outputs ABC_0.png, ABC_1.png, ABC_2.png
# create three files from different filenames
hti.screenshot(html_str=['A', 'B', 'C'], save_as=['A.png', 'B.png', 'C.png'])
# outputs A.png, B.png, C.png
# take four screenshots with a resolution of 100*50
hti.screenshot(
html_str=['A', 'B', 'C', 'D'],
size=(100, 50)
)
# take four screenshots with different resolutions from three given sizes
hti.screenshot(
html_str=['A', 'B', 'C', 'D'],
size=[(100, 50), (100, 100), (50, 50)]
)
# respectively 100*50, 100*100, 50*50, 50*50
# if not enough sizes are given, the last size in the list will be repeated
# screenshot two html strings and apply css strings on both
hti.screenshot(
html_str=['A', 'B'],
css_str='body {background: red;}'
)
# screenshot two html strings and apply multiple css strings on both
hti.screenshot(
html_str=['A', 'B'],
css_str=['body {background: red;}', 'body {font-size: 50px;}']
)
# screenshot one html string and apply multiple css strings on it
hti.screenshot(
html_str='A',
css_str=['body {background: red;}', 'body {font-size: 50px;}']
)
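The naming and size-repetition rules described in the comments above can be sketched in a few lines. This is an illustration of the documented behavior, not the package's source: `expand_names` and `pad_sizes` are hypothetical helpers, and returning a single name unchanged when only one screenshot is taken is an assumption based on the earlier save_as examples.

```python
import os


def expand_names(save_as, count):
    """Derive one filename per screenshot from a single name."""
    if count == 1:
        return [save_as]  # assumption: a lone screenshot keeps its name
    stem, ext = os.path.splitext(save_as)
    return [f"{stem}_{i}{ext}" for i in range(count)]


def pad_sizes(sizes, count):
    """Repeat the last size until there is one size per screenshot."""
    if count <= 0:
        return []
    if not sizes:
        raise ValueError("at least one size is required")
    padded = list(sizes[:count])
    padded += [padded[-1]] * (count - len(padded))
    return padded


print(expand_names("ABC.png", 3))
print(pad_sizes([(100, 50), (100, 100), (50, 50)], 4))
```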
The screenshot method returns a list containing the path(s) of the screenshot(s):
paths = hti.screenshot(
html_str=['A', 'B', 'C'],
save_as="letters.png",
)
print(paths)
# >>> ['D:\\myFiles\\letters_0.png', 'D:\\myFiles\\letters_1.png', 'D:\\myFiles\\letters_2.png']
In some cases, you may need to change the flags that are used to run the headless mode of a browser.
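Flags can be used, for example, to change the default background color of pages, hide the scrollbars, add a delay before taking a screenshot, or run the browser as root, which requires the --no-sandbox flag.
You can find the full list of Chrome / Chromium flags here.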
There are two ways to specify custom flags:
# At the object instantiation
hti = Html2Image(custom_flags=['--my_flag', '--my_other_flag=value'])
# Afterwards
hti.browser.flags = ['--my_flag', '--my_other_flag=value']
With Chrome / Chromium, screenshots are taken as soon as there are no more pending network fetches, but you may sometimes want to add a delay before taking a screenshot, for example to wait for animations to end.
There is a flag for this purpose, --virtual-time-budget=VALUE_IN_MILLISECONDS. You can use it like so:
hti = Html2Image(
custom_flags=['--virtual-time-budget=10000', '--hide-scrollbars']
)
hti.screenshot(url='http://example.org')
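Because the flag expects milliseconds, a tiny conversion helper can avoid unit mistakes when delays are expressed in seconds elsewhere in your code. The helper name `virtual_time_budget_flag` is hypothetical; only the flag itself comes from the text above.

```python
def virtual_time_budget_flag(seconds):
    """Build the --virtual-time-budget flag from a delay in seconds."""
    milliseconds = int(round(seconds * 1000))
    if milliseconds < 0:
        raise ValueError("the delay cannot be negative")
    return f"--virtual-time-budget={milliseconds}"


print(virtual_time_budget_flag(10))
```

The returned string can be placed in the custom_flags list shown above.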
For ease of use, some flags are set by default. However, default flags are not used if you specify custom_flags or change the value of browser.flags:
# Taking a look at the default flags
>>> hti = Html2Image()
>>> hti.browser.flags
['--default-background-color=000000', '--hide-scrollbars']
# Changing the value of browser.flags gets rid of the default flags.
>>> hti.browser.flags = ['--1', '--2']
>>> hti.browser.flags
['--1', '--2']
# Using the custom_flags parameter gets rid of the default flags.
>>> hti = Html2Image(custom_flags=['--a', '--b'])
>>> hti.browser.flags
['--a', '--b']
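If you want to add flags without losing the defaults, one option is to merge the two lists yourself before passing them on. The sketch below is an assumption-laden illustration: `DEFAULT_FLAGS` is copied from the session above and may differ between versions, and `flag_name` and `with_default_flags` are hypothetical helpers, not part of the package.

```python
# Copied from the session above; may not match every version of the package.
DEFAULT_FLAGS = ["--default-background-color=000000", "--hide-scrollbars"]


def flag_name(flag):
    """Return the part of a flag before any '=value' suffix."""
    return flag.split("=", 1)[0]


def with_default_flags(custom_flags, defaults=DEFAULT_FLAGS):
    """Keep the defaults, except those overridden by a custom flag of the same name."""
    custom_names = {flag_name(flag) for flag in custom_flags}
    kept = [flag for flag in defaults if flag_name(flag) not in custom_names]
    return kept + list(custom_flags)


flags = with_default_flags(["--virtual-time-budget=10000"])
print(flags)
```

The merged list can then be given to the custom_flags parameter. Reading hti.browser.flags before overwriting it is another way to obtain the defaults of the installed version.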
HTML2Image comes with a command-line interface (CLI) which you can use to generate screenshots from files and URLs on the go.
The CLI is a work in progress and may undergo changes.
You can call it by typing hti or html2image into a terminal.
argument | description | example |
---|---|---|
-h, --help | Shows the help message | hti -h |
-U, --urls | Screenshots a list of URLs | hti -U https://www.python.org |
-H, --html | Screenshots a list of HTML files | hti -H file.html |
-C, --css | Attaches CSS files to the HTML ones | hti -H file.html -C style.css |
-O, --other | Screenshots a list of files of type "other" | hti -O star.svg |
-S, --save-as | A list of the screenshot filename(s) | hti -O star.svg -S star.png |
-s, --size | A list of the screenshot size(s) | hti -O star.svg -s 50,50 |
-o, --output_path | Change the output path of the screenshots (default is current working directory) | hti star.svg -o screenshot_dir |
-q, --quiet | Disables all CLI output | hti --quiet |
-v, --verbose | More details, can help debugging | hti --verbose |
--chrome_path | Specify a different chrome path | |
--temp_path | Specify a different temp path (where the files are loaded) | |
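When the CLI is driven from a script, it can help to assemble the arguments programmatically. The sketch below only uses flags listed in the table; `build_hti_command` is a hypothetical helper, and passing several values after a single -U or -S is an assumption about how the CLI parses its lists.

```python
def build_hti_command(urls, save_as=None, size=None, output_path=None):
    """Assemble an argument list for the hti command (hypothetical helper)."""
    command = ["hti", "-U", *urls]
    if save_as:
        command += ["-S", *save_as]
    if size:
        width, height = size
        command += ["-s", f"{width},{height}"]  # same WIDTH,HEIGHT form as in the table
    if output_path:
        command += ["-o", output_path]
    return command


command = build_hti_command(
    ["https://www.python.org"],
    save_as=["python_org.png"],
    size=(800, 600),
    output_path="screenshot_dir",
)
print(" ".join(command))
```

The list can be handed to `subprocess.run(command, check=True)` once the package is installed.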
You can also test the package and the CLI without having to install everything on your local machine, via a Docker container.
git clone this repo
cd inside it
docker build -t html2image .
docker run -it html2image /bin/bash
Inside that container, the html2image package as well as chromium are installed.
You can load and execute a python script to use the package, or simply use the CLI.
On top of that, you can also use volumes to bind a container directory to your local machine directory, allowing you to retrieve the generated images, or even load some resources (HTML, CSS or Python files).
Only basic testing is available at the moment. To run tests, install the requirements (Pillow) and run pytest at the root of the project:
pip install -r requirements-test.txt
python -m pytest
Can I automatically take a full page screenshot?
Sadly no, it is not easily possible. Html2Image relies on the headless mode of Chrome/Chromium browsers to take screenshots and there is no way to "ask" for a full page screenshot at the moment. If you know a way (by estimating the page size for example), please open an issue or a discussion!
Can I add delay before taking a screenshot?
Yes you can, please take a look at the Change browser flags section of the readme.
Can I speed up the screenshot-taking process?
Yes, when you are taking a lot of screenshots, you can achieve better performance using Parallel Processing or Multiprocessing methods. You can find an example of it here.
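As a rough sketch of the idea, a thread pool from the standard library can run several screenshot jobs at once. Everything here is illustrative: `screenshot_all` is a hypothetical helper, and `fake_take_screenshot` is a stand-in so the example runs without a browser. In real use you would pass a function that wraps hti.screenshot; whether a single Html2Image instance can safely be shared between workers is not stated in this readme, so one instance per job may be the safer assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def screenshot_all(jobs, take_screenshot, max_workers=4):
    """Run take_screenshot(url, filename) for every job; results keep job order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(take_screenshot, url, name) for url, name in jobs]
        return [future.result() for future in futures]


def fake_take_screenshot(url, filename):
    """Stand-in for a real screenshot call; returns a list of paths like screenshot does."""
    return [filename]


jobs = [
    ("https://www.python.org", "python_org.png"),
    ("http://example.org", "example.png"),
]
results = screenshot_all(jobs, fake_take_screenshot)
print(results)
```

Threads are a reasonable starting point because most of the time is spent waiting on the browser process rather than running Python code.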
Can I make a cookie modal disappear?
Yes and no. No, because there is no option to do it magically and extensions are not supported in headless Chrome (the I don't care about cookies extension would have been useful in this case). Yes, because you can make any element of a page disappear by retrieving its source code, modifying it as you wish, and finally screenshotting the modified source code.
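One lightweight variant of this approach is to leave the downloaded HTML untouched and hide the unwanted elements with an extra CSS string instead. The sketch below shows only the CSS part: `hide_elements_css` is a hypothetical helper, and the selectors are made-up examples that must be replaced with the ones used by the page you are capturing.

```python
def hide_elements_css(selectors):
    """Build CSS rules that hide every element matching the given selectors."""
    return "\n".join(
        f"{selector} {{display: none !important;}}" for selector in selectors
    )


# Hypothetical selectors; inspect the target page to find the real ones.
css = hide_elements_css(["#cookie-modal", ".consent-overlay"])
print(css)
```

The resulting string could be passed as css_str together with the page source as html_str. Note that a page saved this way may reference images or stylesheets through relative URLs that no longer resolve.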
If you see any typos or notice things that are oddly said, feel free to create an issue or a pull request.
We found that html2image demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.