pat-tree - npm Package Compare versions

Comparing version 1.0.0 to 1.0.1

package.json

		{
		"name": "pat-tree",
		"version": "1.0.0",
		"version": "1.0.1",
		"description": "PAT tree construction for Chinese documents, keyword extraction and text segmentation",
		@@ -16,5 +16,10 @@ "main": "index.js",
		"pat-tree",
		"trie",
		"patricia tree",
		"pat tree",
		"PAT",
		"tree",
		"information retrieval",
		"Chinese",
		"ckip",
		"keyword extraction",
		@@ -21,0 +26,0 @@ "text segmentation"

README.md

		pat-tree
		========

		PAT tree construction for Chinese documents.
		In information retrieval, text segmentation of Chinese-like
		documents has been a difficult task, since Chinese words are
		written continuously with no white space between them. But finding the basic
		elements of a document is critical for all applications in information retrieval.

		A PAT tree is a Patricia tree, a kind of trie, used particularly for
		text segmentation and word retrieval. This module can be used for
		PAT tree construction for Chinese documents.
		It provides functionality to add documents and construct a PAT tree in memory,
		@@ -9,2 +16,6 @@ convert to JSON for storing to database,

		You can collect a corpus, add all of its documents to construct a PAT tree,
		then extract significant lexical patterns and perform text segmentation
		on other documents.
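To make the idea of segmenting a document with previously extracted patterns concrete, here is a minimal sketch. It is a toy greedy longest-match segmenter, not the algorithm this package uses, and the function name `segmentWithPatterns` and the sample patterns are made up for illustration.

```javascript
// Toy illustration only: greedy longest-match segmentation against a list of
// known patterns. This is NOT the pat-tree implementation; the function name
// and the sample data are hypothetical.
function segmentWithPatterns(doc, patterns) {
  const known = new Set(patterns);
  // Longest pattern length in characters (code points), at least 1.
  const maxLen = patterns.reduce(
    (max, p) => Math.max(max, Array.from(p).length),
    1
  );
  const chars = Array.from(doc); // split by code point, not UTF-16 unit
  const result = [];
  let i = 0;
  while (i < chars.length) {
    let matched = chars[i]; // fall back to a single character
    // Try the longest candidate first, then shorter ones.
    for (let len = Math.min(maxLen, chars.length - i); len > 1; len--) {
      const candidate = chars.slice(i, i + len).join("");
      if (known.has(candidate)) {
        matched = candidate;
        break;
      }
    }
    result.push(matched);
    i += Array.from(matched).length;
  }
  return result;
}

// Hypothetical patterns, standing in for patterns extracted from a corpus.
const samplePatterns = ["資訊檢索", "斷詞"];
console.log(segmentWithPatterns("資訊檢索與斷詞", samplePatterns));
```

Characters that match no pattern are emitted one at a time, so the output always covers the whole input.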

		example of result:
		@@ -39,2 +50,4 @@

		`doc` is the document you want to add to the tree. data type: string

		### Extract Significant Lexical Patterns
		@@ -142,7 +155,7 @@
		```javascript
		tree.printTreeContent(printExternalNodes, printDocuments);
		tree.printTreeContent(printExternalNode, printDocuments);
		```

		Print the content of the tree on console.
		If `printExternalNodes` is set to true, print out all external nodes for each internal node.
		If `printExternalNode` is set to true, print out one external node for each internal node.
		If `printDocuments` is set to true, print out the whole collection of the tree.
		@@ -181,3 +194,3 @@
		id: 3, // the id of this node, data type: integer, auto generated.
		parent: 1, // the parent id of this node, data type: integer
		parent: parentNode, // the parent of this node, data type: Node
		left: leftChildNode, // data type: Node
		@@ -251,4 +264,19 @@ right: rightChildNode, // data type: Node
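Since `parent`, `left` and `right` are documented as Node references, nodes built this way point at each other, and such a structure cannot be passed to `JSON.stringify` directly. The sketch below shows one possible way to flatten references into ids before storing. The node objects and the `flattenNodes` helper are hypothetical and only mirror the field names shown above; how `tree.toJSON()` does this internally is not shown here.

```javascript
// Hypothetical nodes using the field names from the README excerpt.
const root = { id: 1, parent: null, left: null, right: null };
const child = { id: 3, parent: root, left: null, right: null };
root.left = child;

// root -> child -> root is a cycle, so JSON.stringify(root) throws a TypeError.

// Assumed helper (not part of pat-tree): walk the tree and replace every
// Node reference with that node's id so each record is plain data.
function flattenNodes(rootNode) {
  const records = [];
  const stack = [rootNode];
  while (stack.length > 0) {
    const node = stack.pop();
    records.push({
      id: node.id,
      parent: node.parent ? node.parent.id : null,
      left: node.left ? node.left.id : null,
      right: node.right ? node.right.id : null,
    });
    if (node.right) stack.push(node.right);
    if (node.left) stack.push(node.left);
  }
  return records;
}

const records = flattenNodes(root);
console.log(JSON.stringify(records));
```

Storing ids instead of references keeps each record independent, which suits a one-record-per-node database layout.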

		# Performance

		All operations are fast, but they require substantial memory and disk space to complete successfully.
		Running on a MacBook Pro Retina connected to a local MongoDB, with 8 GB of memory allowed
		by the V8 option `--max_old_space_size=8000`, gives the following performance.

		* Adding 32,769 Facebook-like posts with `tree.addDocument()` takes about 5 minutes.
		* After that, extracting SLPs with `tree.extractSLP()` takes about 5 minutes.
		* After that, converting to JSON with `tree.toJSON()` and storing three collections in the database takes about 1 minute
		and 5 GB of disk space, and produces about 1,000,000 tree node records.
		* After that, loading all collections from the database and rebuilding the tree with `tree.reborn()` takes about 1 minute.
		* After that, segmenting the 32,769 posts with `tree.segmentDoc()`, given the SLPs extracted above,
		takes about 5 minutes.
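To reproduce measurements like the ones above, it helps to confirm the heap limit Node.js is actually running with and to time each step. The following is a minimal sketch using the built-in `v8` module and `process.hrtime.bigint()`. The helper names `heapLimitMB` and `timeStep` and the placeholder workload are made up for illustration and are not part of pat-tree.

```javascript
const v8 = require("v8");

// Report the V8 heap limit in MB. When Node.js is started with
// --max_old_space_size=8000, this should be roughly 8000 or a bit higher.
function heapLimitMB() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

// Hypothetical helper: run a synchronous step and report its duration.
function timeStep(label, fn) {
  const start = process.hrtime.bigint();
  const result = fn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${elapsedMs.toFixed(1)} ms`);
  return { result, elapsedMs };
}

console.log(`V8 heap limit: about ${heapLimitMB()} MB`);

// Placeholder workload; in practice this would wrap a call such as
// tree.addDocument() inside a loop over the corpus.
timeStep("placeholder step", () => {
  let sum = 0;
  for (let i = 0; i < 1e6; i++) sum += i;
  return sum;
});
```

The script would be run as `node --max_old_space_size=8000 script.js`, where `script.js` is whatever file holds the code.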

		# Release History

		* 1.0.1 Modify README file
		* 1.0.0 Stable release
		@@ -255,0 +283,0 @@ * 0.2.8 Improve algorithm of `segmentDoc()`
