streamlit-pdf-viewer

Streamlit component for PDF visualisation and manipulation

  • 0.0.19
  • PyPI

Streamlit component that allows the visualisation and enrichment of PDF documents. You can see an application in action here.

Features

  • Show PDF files in a Streamlit application with a simple command
  • Based on the pdf.js library
  • Visualize annotations on top of the PDF documents
  • Render text on top of the PDF document, allowing copy-paste
  • Render only specific pages of the PDF document
  • Scroll to a specific page
  • Scroll to a specific annotation
  • Allow custom callbacks when an annotation is clicked
  • Optional "legacy" mode that shows PDF documents using the browser's native PDF viewer (with limitations: no annotations, no scrolling, etc.)

Limitations

  • Tested and developed to support Firefox and Chrome.
  • The legacy visualization works only on Firefox and does not support annotations
  • Our JavaScript skills are limited, so troubleshooting may take time
  • The component is still in development, so expect some bugs and limitations
  • Streamlit reloads the application at each action, so complex PDF documents may render slowly

Caveats

Here are some caveats to be aware of:

  • It is mandatory to specify a width to show PDF documents in tabs and expanders; otherwise, the viewer will not be displayed in tabs that are not immediately visible.
  • From version 0.0.16, the behavior for managing width and height has changed:
    • If only the height is specified, the width is computed proportionally from the PDF dimensions.
    • The possibility to show a large view of half the PDF is not available anymore (let's face it, it was not very useful).
    • If you need to use all the available space while limiting the height, you can wrap pdf_viewer() in a st.component(width:...) that sets the width.
  • The legacy rendering is not supported on Chrome, due to security reasons.
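
The width caveat for tabs can be sketched as follows. This is an illustrative example, not code from the project: the function name `render_pdf_in_tabs`, the constant `TAB_VIEWER_WIDTH`, the tab labels, and the 700-pixel value are all assumptions made for the sake of the example.

```python
# Hypothetical constant: an explicit width, in pixels, for viewers placed in
# tabs or expanders (700 is an arbitrary illustrative value).
TAB_VIEWER_WIDTH = 700


def render_pdf_in_tabs(pdf_source, height=None):
    """Show a PDF inside the second of two tabs, always passing a width.

    `pdf_source` is whatever pdf_viewer accepts as input (path, URL or bytes).
    """
    # Imported lazily so that this module can be imported (and tested)
    # without Streamlit installed.
    import streamlit as st
    from streamlit_pdf_viewer import pdf_viewer

    summary_tab, document_tab = st.tabs(["Summary", "Document"])

    with summary_tab:
        st.write("The PDF is shown in the 'Document' tab.")

    with document_tab:
        # This tab is not visible when the page first loads, so the width is
        # given explicitly, as the caveat above requires.
        pdf_viewer(pdf_source, width=TAB_VIEWER_WIDTH, height=height)
```

Calling `render_pdf_in_tabs("path/to/pdf")` from a Streamlit script should then display the document once the second tab is opened.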

Getting started

pip install streamlit-pdf-viewer

In your Streamlit application, you can use it as:

import streamlit as st
from streamlit_pdf_viewer import pdf_viewer

pdf_viewer("str, path or bytes")

Params

The following table lists the parameters that can be provided to the pdf_viewer function:

| name | description |
|------|-------------|
| input | The source of the PDF file. Accepts a file path, URL, or binary data. |
| width | Width of the PDF viewer in pixels. It defaults to 700 pixels. |
| height | Height of the PDF viewer in pixels. If not provided, the viewer shows the whole content. |
| annotations | A list of annotations to be overlaid on the PDF. The format is described in the "Annotation format" section. |
| pages_vertical_spacing | The vertical space (in pixels) between each page of the PDF. Defaults to 2 pixels. |
| annotation_outline_size | Size of the outline around each annotation in pixels. Defaults to 1 pixel. |
| rendering | Type of rendering: unwrap (default), legacy_iframe, or legacy_embed. The default value, unwrap, shows the PDF document using pdf.js and supports the visualisation of annotations. The other values, legacy_iframe and legacy_embed, use the legacy approach of injecting the document into an <iframe> or <embed>. They allow viewing the PDF with the browser's own viewer, which offers additional features we are still working to implement in this component. IMPORTANT: the "legacy" methods work only with Firefox and do not support annotations. |
| pages_to_render | Filter the rendering to a specific set of pages. By default, all pages are rendered. |
| render_text | Enable a layer of text on top of the PDF document. The text may be selected and copied. NOTE: to avoid breaking existing deployments, this was made optional at first, also considering that many annotations might interfere with copy-paste. |
| scroll_to_page | Scroll to a specific page when the component is rendered. Default is None. Requires an integer; values below zero are ignored. |
| scroll_to_annotation | Scroll to a specific annotation when the component is rendered. Default is None. Mutually exclusive with scroll_to_page: an exception is raised if both are used. |
| on_annotation_click | Callback function that is called when an annotation is clicked. The function receives the annotation as a parameter. |
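
As a rough illustration of how these parameters combine, the sketch below collects a few of them into a dictionary and checks the constraints stated in the table before anything reaches the component. The helper `build_viewer_options` and the constant `ALLOWED_RENDERING` are hypothetical names introduced here; they are not part of the package.

```python
# Rendering modes listed in the parameter table.
ALLOWED_RENDERING = ("unwrap", "legacy_iframe", "legacy_embed")


def build_viewer_options(width=700, height=None, rendering="unwrap",
                         pages_to_render=None, render_text=False,
                         scroll_to_page=None, scroll_to_annotation=None):
    """Collect keyword arguments for pdf_viewer, validating them first.

    Hypothetical helper: it only mirrors the constraints documented in the
    table and does not replace the checks done by the component itself.
    """
    if rendering not in ALLOWED_RENDERING:
        raise ValueError(f"Unknown rendering mode: {rendering!r}")
    if scroll_to_page is not None and scroll_to_annotation is not None:
        raise ValueError(
            "scroll_to_page and scroll_to_annotation are mutually exclusive"
        )

    options = {"width": width, "rendering": rendering, "render_text": render_text}
    # Optional parameters are included only when set, so the component's own
    # defaults apply otherwise.
    if height is not None:
        options["height"] = height
    if pages_to_render is not None:
        options["pages_to_render"] = list(pages_to_render)
    if scroll_to_page is not None:
        options["scroll_to_page"] = scroll_to_page
    if scroll_to_annotation is not None:
        options["scroll_to_annotation"] = scroll_to_annotation
    return options
```

In a Streamlit script the result could then be unpacked into the call, for example `pdf_viewer("path/to/pdf", **build_viewer_options(height=500, pages_to_render=[1, 2]))`.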

Annotation format

The annotation format is derived from Grobid's coordinate format, which describes annotations as a list of "bounding boxes". Each annotation is a dictionary of six elements: page, x, y, width, height, and color. The page, x, and y values indicate the top-left point of the box. The color can be expressed following the HTML/CSS convention.

Here is an example:

[
   {
      "page": 1,
      "x": 220,
      "y": 155,
      "height": 22,
      "width": 65,
      "color": "red"
   },
[...]

The example shown in our screenshot can be found here.
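
Since the format derives from Grobid coordinates, a small conversion helper may be useful. The sketch below is not part of the package: the function name `grobid_coords_to_annotations` is hypothetical, and it assumes that the input string separates boxes with `;` and lists each box as `page,x,y,width,height`. Check this ordering against the Grobid output you actually have.

```python
def grobid_coords_to_annotations(coords, color="red"):
    """Convert a Grobid-style coordinate string into annotation dictionaries.

    Assumption: boxes are separated by ';' and each box is written as
    'page,x,y,width,height'.
    """
    annotations = []
    for box in coords.split(";"):
        box = box.strip()
        if not box:
            continue  # tolerate empty segments such as a trailing ';'
        page, x, y, width, height = box.split(",")
        annotations.append({
            "page": int(page),
            "x": float(x),
            "y": float(y),
            "width": float(width),
            "height": float(height),
            "color": color,
        })
    return annotations
```

The returned list has the same shape as the example above, so it could be passed as the `annotations` parameter.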

Custom callback for clicking on annotations

from streamlit_pdf_viewer import pdf_viewer

annotations = [
    {
        "page": 1,
        "x": 220,
        "y": 155,
        "height": 22,
        "width": 65,
        "color": "red"
    },
    {
        "page": 1,
        "x": 220,
        "y": 155,
        "height": 22,
        "width": 65,
        "color": "red"
    }
]

def my_custom_annotation_handler(annotation):
    # Receives the clicked annotation; print() writes to the terminal running Streamlit
    print(f"Annotation {annotation} clicked.")

pdf_viewer(
    "path/to/pdf",
    on_annotation_click=my_custom_annotation_handler,
    annotations=annotations
)
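
A handler does not have to print. As a minimal sketch, the hypothetical factory below (the name `make_click_recorder` is introduced here for illustration) returns a callback that stores each clicked annotation in a list supplied by the caller.

```python
def make_click_recorder(clicked):
    """Return a handler that appends each clicked annotation to `clicked`."""
    def handler(annotation):
        # The component passes the clicked annotation as the only argument.
        clicked.append(annotation)
    return handler


# Possible usage inside a Streamlit script (not executed here):
#   if "clicked" not in st.session_state:
#       st.session_state["clicked"] = []
#   pdf_viewer(
#       "path/to/pdf",
#       annotations=annotations,
#       on_annotation_click=make_click_recorder(st.session_state["clicked"]),
#   )
```

Because Streamlit reloads the script at each action, keeping the list in `st.session_state`, as in the commented usage, is one way to preserve the clicks between reloads. A plain local list would be recreated every time.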

Developer notes

Environment

  • Python >= 3.8
  • Node.js >= 16
  • Streamlit >= 1.28.2

Configure environment for development

First, make sure that _RELEASE = False in streamlit_pdf_viewer/__init__.py. To run the component in development mode, use the following commands:

streamlit run streamlit_pdf_viewer/__init__.py

cd frontend
npm run serve

These commands will start the Streamlit application and serve the Node.js component. Please make sure you're in the correct directory before running these commands.

Integrate into a streamlit application

  1. Build the frontend part:

    cd frontend
    export NODE_OPTIONS=--openssl-legacy-provider
    npm run build 
    
  2. Make sure that _RELEASE = True in streamlit_pdf_viewer/__init__.py.

  3. Move to the directory of your Streamlit application and run:

    pip install -e {path of component}
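
The role of the `_RELEASE` flag in both workflows can be sketched as below. This follows the common pattern for Streamlit components and is not taken from this project's source: the helper `component_source` is hypothetical, and the development URL (port 3001) is an assumption that may not match the port used by `npm run serve` here.

```python
# Assumed development-server URL; the real port depends on the frontend setup.
DEV_SERVER_URL = "http://localhost:3001"


def component_source(release, build_dir, dev_url=DEV_SERVER_URL):
    """Return keyword arguments selecting where the frontend is loaded from.

    release=True  -> static files produced by `npm run build` in `build_dir`
    release=False -> the live server started by `npm run serve`
    """
    if release:
        return {"path": build_dir}
    return {"url": dev_url}
```

In a component's `__init__.py`, the result would typically be passed on as `streamlit.components.v1.declare_component(name, **component_source(_RELEASE, build_dir))`, which accepts either a `path` or a `url`.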
    

Release

# Choose one of: patch, minor, major
bump-my-version bump patch
git push
git push --tags

Acknowledgement

The project was initiated at the National Institute for Materials Science (NIMS) in Japan. Currently, the development is possible thanks to ScienciLAB. Main collaborator: Tomoya Mato, who has been very helpful in attenuating the pain of JavaScript.
