Use the solar-embedding-1-large model for embeddings. Do not add a suffix such as -query or -passage to the model name; UpstageEmbeddings appends the appropriate suffix automatically based on the method called.
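
As a minimal sketch of the point above, the snippet below passes the base model name only and relies on the two standard LangChain embedding methods, embed_documents for passages and embed_query for queries. The helper name embed_corpus_and_query and the sample texts are illustrative assumptions, and the real API call only runs when an UPSTAGE_API_KEY environment variable is set.

```python
import os


def embed_corpus_and_query(embeddings, passages, query):
    """Embed passages and a query with the same embeddings object.

    Hypothetical helper: `embeddings` is any object exposing the LangChain
    Embeddings interface (embed_documents / embed_query).
    """
    # Passage-side call: returns one vector per input text.
    passage_vectors = embeddings.embed_documents(passages)
    # Query-side call: returns a single vector.
    query_vector = embeddings.embed_query(query)
    return passage_vectors, query_vector


# Only contact the real service when an API key is configured.
if os.environ.get("UPSTAGE_API_KEY"):
    from langchain_upstage import UpstageEmbeddings

    # Base model name only; no -query / -passage suffix.
    embeddings = UpstageEmbeddings(model="solar-embedding-1-large")
    vectors, query_vec = embed_corpus_and_query(
        embeddings,
        ["Solar is a large language model.", "Document Parse extracts text from files."],
        "What is Solar?",
    )
    print(len(vectors), len(query_vec))
```

The same UpstageEmbeddings instance serves both sides of a retrieval setup, so the model name never needs to change between indexing and querying.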

Document Parse Loader

See a usage example

The use_ocr option controls whether OCR is used to extract text from documents. If it is not specified, the default policy of the Upstage Document Parse API service applies. When use_ocr is set to True, text is extracted with OCR; for PDF documents, the PDF is first converted into images and OCR is then run on those images. When use_ocr is set to False for a PDF document, the text embedded in the PDF is used directly. If the input is not a PDF (for example, an image), setting use_ocr to False results in an error.

from langchain_upstage import UpstageDocumentParseLoader

file_path = "/PATH/TO/YOUR/FILE.image"
loader = UpstageDocumentParseLoader(file_path, split="page")

# For improved memory efficiency, consider using the lazy_load method to load documents page by page.
docs = loader.load()  # or loader.lazy_load()

for doc in docs[:3]:
    print(doc)
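
The use_ocr rules described above can be captured in a small helper that fails early on the one invalid combination. This is only a sketch: ocr_kwargs is a hypothetical helper, detecting PDFs by file extension is an assumption, and passing use_ocr as a keyword argument to the loader is assumed from the description above rather than confirmed against the installed version.

```python
from pathlib import Path
from typing import Optional


def ocr_kwargs(file_path: str, use_ocr: Optional[bool] = None) -> dict:
    """Build keyword arguments for the use_ocr option (hypothetical helper)."""
    if use_ocr is None:
        # Leave the option unset so the service's default policy applies.
        return {}
    # Assumption: a ".pdf" extension is a good enough signal for a PDF input.
    is_pdf = Path(file_path).suffix.lower() == ".pdf"
    if use_ocr is False and not is_pdf:
        # Non-PDF inputs such as images have no embedded text to fall back on.
        raise ValueError("use_ocr=False is only valid for PDF documents")
    return {"use_ocr": use_ocr}


# Assumed usage, if the loader accepts use_ocr as a keyword argument:
# loader = UpstageDocumentParseLoader(
#     file_path, split="page", **ocr_kwargs(file_path, use_ocr=True)
# )
```

Validating on the client side avoids a round trip to the service for a request that the description above says will be rejected.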

If you are a Windows user, please ensure that the Visual C++ Redistributable is installed before using the loader.

