Note: This repository is automatically generated from the main parser monorepo. Please submit any issues or pull requests there.

docx-to-vfile


Reads a .docx file and stores its components in vfile format to be processed by other tools, like reoff-parse.

Currently extremely dumb: it just stores everything in memory, so no streams for you. Reading the input file itself does happen in streams.

Based on docxtract


What is this?

This package reads a .docx file and stores its components in vfile format to be processed by other tools, like reoff-parse. This is the first step in a pipeline to convert a .docx file to many other formats using the unified ecosystem.

A .docx document is just a zip file with a bunch of XML and other files (such as images) in it. This package unzips the .docx file, reads the XML files and images and stores them in a VFile object, which is a virtual file format that can be used by other tools in the unified ecosystem.
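A .docx is a ZIP archive, so its bytes normally start with the ZIP local file header signature PK\x03\x04. The sketch below uses that signature as a quick sanity check before you pass a buffer to docxToVFile. looksLikeZip is a hypothetical helper and is not part of this package. It only inspects the first four bytes, so it cannot distinguish a .docx from any other ZIP file.

```typescript
// Hypothetical helper, not part of docx-to-vfile.
// Checks whether a buffer starts with the ZIP local file header
// signature "PK\x03\x04", which a .docx archive normally begins with.
function looksLikeZip(bytes: Uint8Array): boolean {
  return (
    bytes.length >= 4 &&
    bytes[0] === 0x50 && // 'P'
    bytes[1] === 0x4b && // 'K'
    bytes[2] === 0x03 &&
    bytes[3] === 0x04
  )
}
```

In Node, you could call it as looksLikeZip(await readFile('path/to/file.docx')), because a Buffer is a Uint8Array.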

When should I use this?

Probably only to read a .docx file to feed into reoff-parse or something similar, or if you want to access the raw data of a .docx file for some reason.

Install

This package is ESM only. In Node.js (version 12.20+, 14.14+, 16.0+, 18.0+), install as

pnpm add docx-to-vfile
# or with yarn
# yarn add docx-to-vfile
# or with npm
# npm install docx-to-vfile

Use

In Node

import { docxToVFile } from 'docx-to-vfile'

Pass a path to a .docx file

const file = await docxToVFile('path/to/file.docx')

Pass a Blob

const blob = await fetch('https://path/to/file.docx').then((res) => res.blob())
const file = await docxToVFile(blob)

Pass a Buffer

import { readFile } from 'fs/promises'
const buffer = await readFile('path/to/file.docx')
const file = await docxToVFile(buffer)

Pass a ReadStream

import { createReadStream } from 'fs'

const file = await docxToVFile(createReadStream('path/to/file.docx'))

In the browser

import { docxToVFile } from 'docx-to-vfile/browser'

Pass a File

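<input type="file" accept=".docx" />

document.querySelector('input[type="file"]')?.addEventListener('change', async (e) => {
  // The first selected File; bail out if the selection was cleared
  const docx = e.target.files?.[0]
  if (!docx) return
  const file = await docxToVFile(docx)
})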

Output

Using the default settings, the value of the VFile will be the content of the main document (word/document.xml). The data will contain the contents of the other files in the .docx archive, and media files will be stored in the data.media property.

const output = {
  data: {
    'word/footnotes.xml': '<?xml version ...',
    '_rels/.rels': '<?xml version ...',
    // ...
    relations: {
      rId9: 'footnotes.xml',
      rId8: 'endnotes.xml',
      // ...
    },
    media: {
      'media/image1.png': new Blob(), // the image data, as a Blob
    },
  },
  value: '[the content of word/document.xml, the main document]',
  // other vfile stuff
  messages: [],
  history: [],
  cwd: './',
}

// For the VFile returned by docxToVFile (rather than the plain object above):
String(file) === file.value // true
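The sketch below shows how unzipped entries could be sorted into value, data, and data.media to produce the layout above. splitDocxEntries and SketchOutput are hypothetical names and do not reflect the library's implementation. The sketch makes three assumptions:

- Text parts are already decoded to strings.
- Media parts are Blobs.
- Media keys drop the leading word/, to match the 'media/image1.png' key shown above.

```typescript
// Hypothetical sketch, not docx-to-vfile's implementation.
// Assumes XML/.rels entries are decoded strings and media entries are Blobs.
type Entry = string | Blob

interface SketchOutput {
  value: string
  data: {
    media: Record<string, Blob>
    [path: string]: string | Record<string, Blob> | undefined
  }
}

function splitDocxEntries(entries: Record<string, Entry>): SketchOutput {
  const output: SketchOutput = { value: '', data: { media: {} } }
  for (const [path, content] of Object.entries(entries)) {
    if (path === 'word/document.xml' && typeof content === 'string') {
      // The main document becomes the VFile value, not a data entry
      output.value = content
    } else if (path.startsWith('word/media/') && content instanceof Blob) {
      // Assumption: media keys drop the leading "word/"
      output.data.media[path.slice('word/'.length)] = content
    } else if (typeof content === 'string' && /\.(xml|rels)$/.test(path)) {
      // Every other XML or .rels part is kept under its full path
      output.data[path] = content
    }
    // Anything else is skipped in this sketch
  }
  return output
}
```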

API


docxToVFile()

Takes a .docx file as a string (such as a file path), Blob, File, ArrayBuffer, or Buffer. It returns a VFile with the contents of the document.xml file as the root, and the contents of the other XML files as data.

Signature
docxToVFile(file: string | Blob | ArrayBuffer | File | Buffer, userOptions?: Options): Promise<VFile>;
Parameters
Name           Type                                           Description
file           string | Blob | ArrayBuffer | File | Buffer    The .docx file to read
userOptions?   Options                                        -
Returns

Promise<VFile>

A VFile with the contents of the document.xml file as the root, and the contents of the other xml files as data.

Defined in: lib/docx-to-vfile-unzipit.ts:91


DocxVFileData

The data attribute of the VFile will contain the following:

Indexable

[key: XMLOrRelsString]: string | undefined

Properties
media

object

The media files in the .docx file. Possibly undefined only to be compatible with the VFile interface.

Since

0.5.0 - Added media, removed images


Defined in: lib/docx-to-vfile-unzipit.ts:53

parsed?

object

The parsed .xml files in the .docx file

Usually added by reoff-parse.

Index signature

[key: XMLOrRelsString]: Root | undefined


Defined in: lib/docx-to-vfile-unzipit.ts:72

relations?

object

{
    document: {
    };
    endnotes?: {
    };
    footnotes?: {
    };
}

The relations between the .xml files in the .docx file. Possibly undefined only to be compatible with the VFile interface.

Since

0.7.0 - Added relations.footnotes and relations.endnotes. relations.document is now an alias for relations. This now gets added by reoff-parse.

Defined in: lib/docx-to-vfile-unzipit.ts:61
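A relations map like { rId9: 'footnotes.xml' } corresponds to the Relationship elements in a part's .rels file, such as word/_rels/document.xml.rels. Each of those elements carries an Id and a Target attribute. The sketch below builds such a map with regular expressions for brevity. parseRelations is a hypothetical helper and is not how reoff-parse works internally. A real implementation should use an XML parser.

```typescript
// Hypothetical sketch: build an Id -> Target map from .rels XML.
// Regex-based for brevity; use a proper XML parser in real code.
function parseRelations(relsXml: string): Record<string, string> {
  const relations: Record<string, string> = {}
  // Match each <Relationship ...> element (the \b excludes <Relationships>)
  for (const [element] of relsXml.matchAll(/<Relationship\b[^>]*>/g)) {
    // Read the Id and Target attributes from the matched element
    const id = /\bId="([^"]*)"/.exec(element)?.[1]
    const target = /\bTarget="([^"]*)"/.exec(element)?.[1]
    if (id && target) relations[id] = target
  }
  return relations
}
```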


Options

Properties
include?

string[] | RegExp[] | (key: string) => boolean | "all" | "allWithDocumentXML"

Include only the specified files in the data attribute of the VFile. This may be useful if you only want to work with a subset of the files in the .docx file and don't intend to use 'reoff-stringify' to turn the VFile back into a .docx file.

  • If an array of strings or regexps is passed, only files that match one of the values will be included.
  • If a function is passed, it will be called for each file and should return true to include the file.
  • If the value is 'all', all files except 'word/document.xml' will be included, since that is already the root of the VFile.
  • If the value is 'allWithDocumentXML', all files will be included, including 'word/document.xml', even though it is already the root of the VFile. This is useful if you really want to mimic the original .docx file.

You should keep it at the default value if you intend to use 'reoff-stringify' to turn the VFile back into a docx file.

Default

'all'

Defined in: lib/docx-to-vfile-unzipit.ts:29
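The rules for include can be summarized as a small predicate. isIncluded is a hypothetical re-creation of the documented behavior, not code exported by docx-to-vfile. It assumes that strings in an array must match a file path exactly, which the description above does not spell out.

```typescript
// Mirrors the documented type of the `include` option
type Include =
  | string[]
  | RegExp[]
  | ((key: string) => boolean)
  | 'all'
  | 'allWithDocumentXML'

// Hypothetical predicate, not part of docx-to-vfile
function isIncluded(path: string, include: Include = 'all'): boolean {
  if (include === 'allWithDocumentXML') return true
  // 'all' skips the main document, which is already the VFile value
  if (include === 'all') return path !== 'word/document.xml'
  if (typeof include === 'function') return include(path)
  // Assumption: strings match exactly, regexps are tested against the path
  return (include as (string | RegExp)[]).some((matcher) =>
    typeof matcher === 'string' ? matcher === path : matcher.test(path),
  )
}
```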

withoutMedia?

boolean

Whether to leave media files out of the VFile.

By default, media files (such as images) are included in the data.media attribute of the VFile as an object of Blobs. Blobs are available both in the browser and on the server.

Default

false

Defined in: lib/docx-to-vfile-unzipit.ts:15


OptionsWithFetchConfig

Properties
fetchConfig?

RequestInit

The config to pass to fetch, e.g. for authorization headers.

Defined in: lib/docx-to-vfile-unzipit.ts:36
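For example, a fetchConfig that carries a bearer token might look like the sketch below. makeFetchConfig and the token value are placeholders. Only RequestInit and its headers field come from the standard fetch API.

```typescript
// Hypothetical helper that builds a RequestInit for the fetchConfig option.
// The bearer-token scheme is just an example of an authorization header.
function makeFetchConfig(token: string): RequestInit {
  return {
    headers: { Authorization: `Bearer ${token}` },
  }
}
```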

include?

Inherited from: Options.include. It has the same type, behavior, and default ('all') as described under Options.

Defined in: lib/docx-to-vfile-unzipit.ts:29

withoutMedia?

Inherited from: Options.withoutMedia. It has the same type, behavior, and default (false) as described under Options.

Defined in: lib/docx-to-vfile-unzipit.ts:15


XMLOrRelsString

`${string}.xml` | `${string}.rels`

Defined in: lib/docx-to-vfile-unzipit.ts:82
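XMLOrRelsString is a template literal type, so TypeScript only accepts an arbitrary string as this type after the string has been narrowed. isXMLOrRelsString is a hypothetical type guard, not part of the package, that performs the same suffix check at runtime.

```typescript
// The template literal type, as documented above
type XMLOrRelsString = `${string}.xml` | `${string}.rels`

// Hypothetical runtime guard that narrows a string to XMLOrRelsString
function isXMLOrRelsString(key: string): key is XMLOrRelsString {
  return key.endsWith('.xml') || key.endsWith('.rels')
}
```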

Compatibility

Security

docx-to-vfile currently does not read macros, so it is not vulnerable to potential security issues with macros.

However, it does not perform any other security checks. Maliciously crafted .docx files could therefore cause problems, e.g. when parsed with rehype.

Related

  • reoff-parse: Parse the output of docx-to-vfile into a VFile with an ooxast tree.

Contribute

License

GPL-3.0-or-later © Thomas F. K. Jorna

Package last updated on 28 Jun 2024
