LlamaIndex Readers Integration: File
This is the default integration for the file loaders used within SimpleDirectoryReader.
It provides support for the following loaders:
- DocxReader
- HWPReader
- PDFReader
- EpubReader
- FlatReader
- HTMLTagReader
- ImageCaptionReader
- ImageReader
- ImageVisionLLMReader
- IPYNBReader
- MarkdownReader
- MboxReader
- PptxReader
- PandasCSVReader
- VideoAudioReader
- UnstructuredReader
- PyMuPDFReader
- ImageTabularChartReader
- XMLReader
- PagedCSVReader
- CSVReader
- RTFReader
Installation
pip install llama-index-readers-file
Usage
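Once installed, you can import any of the loaders. Each example below registers one loader with SimpleDirectoryReader through its file_extractor argument, which maps a file extension to a reader instance.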
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import (
    DocxReader,
    HWPReader,
    PDFReader,
    EpubReader,
    FlatReader,
    HTMLTagReader,
    ImageCaptionReader,
    ImageReader,
    ImageVisionLLMReader,
    IPYNBReader,
    MarkdownReader,
    MboxReader,
    PptxReader,
    PandasCSVReader,
    VideoAudioReader,
    UnstructuredReader,
    PyMuPDFReader,
    ImageTabularChartReader,
    XMLReader,
    PagedCSVReader,
    CSVReader,
    RTFReader,
)

# PDF files
parser = PDFReader()
file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# Word documents
parser = DocxReader()
file_extractor = {".docx": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# HWP (Hangul Word Processor) files
parser = HWPReader()
file_extractor = {".hwp": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# EPUB e-books
parser = EpubReader()
file_extractor = {".epub": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# Plain-text files
parser = FlatReader()
file_extractor = {".txt": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# HTML files
parser = HTMLTagReader()
file_extractor = {".html": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# Images: one reader instance registered for several extensions
parser = ImageReader()
file_extractor = {
    ".jpg": parser,
    ".jpeg": parser,
    ".png": parser,
}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# Jupyter notebooks
parser = IPYNBReader()
file_extractor = {".ipynb": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# Markdown files
parser = MarkdownReader()
file_extractor = {".md": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# Mbox email archives
parser = MboxReader()
file_extractor = {".mbox": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# PowerPoint presentations
parser = PptxReader()
file_extractor = {".pptx": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# CSV files (pandas-based reader)
parser = PandasCSVReader()
file_extractor = {".csv": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# PDF files (PyMuPDF-based alternative to PDFReader)
parser = PyMuPDFReader()
file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# XML files
parser = XMLReader()
file_extractor = {".xml": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# CSV files (paged alternative to PandasCSVReader)
parser = PagedCSVReader()
file_extractor = {".csv": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()

# CSV files (basic CSV reader)
parser = CSVReader()
file_extractor = {".csv": parser}
documents = SimpleDirectoryReader(
    "./data", file_extractor=file_extractor
).load_data()
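Each snippet above registers a reader under a file extension. As a rough mental model of that mapping, the following standard-library sketch looks up a file's extension in a file_extractor-style dictionary. It is not the LlamaIndex implementation: pick_reader is a hypothetical helper, the string values are placeholders standing in for reader instances such as PDFReader(), and the case-insensitive matching and the "no match" behaviour are assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, Optional


def pick_reader(path: str, file_extractor: Dict[str, object]) -> Optional[object]:
    """Hypothetical helper: return the reader registered for the file's extension.

    Returns None when no reader is registered for that extension.
    """
    # Assumption: extensions are compared in lowercase, with the leading dot.
    suffix = Path(path).suffix.lower()
    return file_extractor.get(suffix)


# Placeholder strings stand in for reader instances such as PDFReader().
file_extractor = {".pdf": "pdf-reader", ".md": "markdown-reader"}

for name in ["report.pdf", "notes.md", "photo.PNG"]:
    print(name, "->", pick_reader(name, file_extractor))
```

Because the dictionary is keyed by extension, one mapping can carry several readers at once, and registering two readers under the same key (for example two ".csv" entries) keeps only the last one.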
These loaders are designed to be used as a way to load data into LlamaIndex.