Comparing version 0.2.0 to 0.2.1
{
"name": "pat-tree",
"version": "0.2.0",
"version": "0.2.1",
"description": "PAT tree construction for Chinese documents",
@@ -5,0 +5,0 @@ "main": "index.js",
README.md
@@ -18,3 +18,3 @@ pat-tree
This project is now in development and used for academic purposes,
**DO NOT** use this module until the **WARNING** statement is removed.
**DO NOT** use this module in production until the **WARNING** statement is removed.
//TODO: improve document splitting algorithm
@@ -25,76 +25,95 @@
npm install pat-tree --save
```bash
npm install pat-tree --save
```
# Usage
## Init
### Instantiate
var PATtree = require("pat-tree");
var tree = new PATtree();
```javascript
var PATtree = require("pat-tree");
var tree = new PATtree();
```
## Add document
### Add document
tree.addDocument(input);
```javascript
tree.addDocument(input);
```
## Extract Significant Lexical Patterns
### Extract Significant Lexical Patterns
var SLPs = tree.extractSLP(TFThreshold, SEThreshold); // SLPs: array of signifiant lexical patterns.
```javascript
var SLPs = tree.extractSLP(TFThreshold, SEThreshold);
// SLPs: array of strings, which are significant lexical patterns.
```
If the frequency of a pattern exceeds **THThreshold**,
and the SE value exceeds **SEThreshold**, it would appear in the result array.
If the frequency of a pattern exceeds `TFThreshold`,
and its SE value exceeds `SEThreshold`, the pattern appears in the result array.
**THTreshold** shold be integer, **SEThreshold** shold be between 0 and 1.
`TFThreshold` should be an integer, and `SEThreshold` should be a float between 0 and 1.
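The selection rule above can be sketched as a plain filter. This is a minimal sketch, not pat-tree's code: the `{ pattern, tf, se }` candidate objects and their values are hypothetical (the library computes frequency and SE internally and `extractSLP()` returns only strings), and "exceeds" is read as a strict comparison.

```javascript
// Keep a pattern only if its frequency exceeds TFThreshold AND its SE value
// exceeds SEThreshold. Candidate objects are hypothetical stand-ins for
// values pat-tree computes internally.
function filterPatterns(candidates, TFThreshold, SEThreshold) {
  if (!Number.isInteger(TFThreshold)) {
    throw new TypeError("TFThreshold should be an integer");
  }
  if (!(SEThreshold >= 0 && SEThreshold <= 1)) {
    throw new RangeError("SEThreshold should be a float between 0 and 1");
  }
  return candidates
    .filter(function (c) {
      // "exceeds" is read as a strict comparison (assumption)
      return c.tf > TFThreshold && c.se > SEThreshold;
    })
    .map(function (c) {
      return c.pattern;
    });
}

// Hypothetical candidates with made-up frequency and SE values
var candidates = [
  { pattern: "測試", tf: 12, se: 0.8 },
  { pattern: "文件", tf: 9, se: 0.4 },
  { pattern: "另外", tf: 2, se: 0.9 }
];

console.log(filterPatterns(candidates, 5, 0.5)); // [ '測試' ]
```

Both conditions must hold: "文件" is frequent enough but fails the SE test, and "另外" has a high SE value but is too rare.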
## Split document
### Split document
var result = tree.splitDoc(doc, SLPs);
```javascript
var result = tree.splitDoc(doc, SLPs);
```
**doc** is the document to be splitted, data type: string.
`doc` is the document to split, data type: string.
**SLPs** is array of SLP that extracted by **tree.extractSLP()**, or array of keywords retrieved any other way.
data type: array of strings.
`SLPs` is an array of SLPs extracted by `tree.extractSLP()`, or an array of keywords obtained in any other way, data type: array of strings.
**result** is the result of splitted document, data type: string.
`result` is the split document, data type: string.
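To make the `splitDoc` contract (string plus keyword array in, string out) concrete, here is one possible keyword-driven splitter. It is a hypothetical illustration using greedy longest match, **not** pat-tree's splitting algorithm (which the README marks as still being improved), and the space delimiter in the returned string is an assumption.

```javascript
// Greedy longest-match splitting: at each position, emit the longest keyword
// that starts there, otherwise emit a single character. NOT pat-tree's
// algorithm; the default " " delimiter is an assumption.
function splitWithKeywords(doc, keywords, delimiter) {
  delimiter = delimiter === undefined ? " " : delimiter;
  // Try longer keywords first so "測試文件" wins over "測試"
  var sorted = keywords.slice().sort(function (a, b) {
    return b.length - a.length;
  });
  var tokens = [];
  var i = 0;
  while (i < doc.length) {
    var match = null;
    for (var k = 0; k < sorted.length; k++) {
      if (sorted[k].length > 0 && doc.startsWith(sorted[k], i)) {
        match = sorted[k];
        break;
      }
    }
    if (match) {
      tokens.push(match);
      i += match.length;
    } else {
      tokens.push(doc[i]); // no keyword here: fall back to one character
      i += 1;
    }
  }
  return tokens.join(delimiter);
}

console.log(splitWithKeywords("這是測試文件", ["測試", "文件", "測試文件"]));
// 這 是 測試文件
```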
# Additional functions
## Print tree content
### Print tree content
tree.printTreeContent(printExternalNodes, printDocuments);
```javascript
tree.printTreeContent(printExternalNodes, printDocuments);
```
Print the content of the tree to the console.
If **printExternalNodes** is set to true, print out all external nodes for each internal node.
If **printDocuments** is set to true, print out the whole collection of the tree.
If `printExternalNodes` is set to true, print out all external nodes for each internal node.
If `printDocuments` is set to true, print out the whole collection of the tree.
## Traversal
### Traversal
tree.traverse(preCallback, inCallback, postCallback);
```javascript
tree.traverse(preCallback, inCallback, postCallback);
```
For convenient, there are functions for each order of traversal
For convenience, there is a function for each traversal order:
tree.preOrderTraverse(callback);
tree.inOrderTraverse(callback);
tree.postOrderTraverse(callback);
```javascript
tree.preOrderTraverse(callback);
tree.inOrderTraverse(callback);
tree.postOrderTraverse(callback);
```
For example:
tree.preOrderTraverse(function(node) {
  console.log("node id: " + node.id);
})
```javascript
tree.preOrderTraverse(function(node) {
  console.log("node id: " + node.id);
});
```
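The three callback slots of `traverse` correspond to the three classic visit points of a binary-tree walk. The sketch below shows that idea on plain `{ id, left, right }` objects that only mimic the documented Node fields; it is an illustration, not pat-tree's implementation.

```javascript
// Recursive walk that calls up to three callbacks per node:
// before the children (pre), between them (in), and after them (post).
function traverse(node, preCallback, inCallback, postCallback) {
  if (!node) return;
  if (preCallback) preCallback(node);   // before the children
  traverse(node.left, preCallback, inCallback, postCallback);
  if (inCallback) inCallback(node);     // between left and right subtrees
  traverse(node.right, preCallback, inCallback, postCallback);
  if (postCallback) postCallback(node); // after both children
}

// Hypothetical three-node tree: node 1 with children 2 (left) and 3 (right)
var root = { id: 1, left: { id: 2 }, right: { id: 3 } };

var pre = [], inOrder = [], post = [];
traverse(
  root,
  function (n) { pre.push(n.id); },
  function (n) { inOrder.push(n.id); },
  function (n) { post.push(n.id); }
);
console.log(pre, inOrder, post); // [ 1, 2, 3 ] [ 2, 1, 3 ] [ 2, 3, 1 ]
```

In this sketch, a pre-order-only walk is simply `traverse(root, callback)` with the other two callbacks omitted.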
# Data type
## Node
### Node
Every node has some common information; a node has the following structure:
```javascript
node = {
  id: 3, // the id of this node, data type: JSON, auto generated.
  parent: 1, // the parent id of this node, data type: integer
  left: leftChildNode, // data type: Node
  right: rightChildNode, // data type: Node
  data: {} // payload for this node, data type : JSON
  id: 3, // the id of this node, data type: JSON, auto-generated.
  parent: 1, // the parent id of this node, data type: integer
  left: leftChildNode, // data type: Node
  right: rightChildNode, // data type: Node
  data: {} // payload for this node, data type: JSON
}
```
@@ -104,22 +123,39 @@ Data is different for internal nodes and external nodes,
## Internal nodes
### Internal nodes
```javascript
internalNode.data = {
  type: "internal", // indicates this is an internal node
  position: 13, // the branch position of external nodes, data type: integer
  prefix: "00101", // the sharing prefix of external nodes, data type: string of 0s and 1s
  externalNodeNum: 87, // number of external nodes contained in subtree of this node, data type: integer
  totalFrequency: 89, // number of the total frequency of the external nodes in the collection, data type: integer
  sistringRepres: node // one of the external node in the subree of this internal node, data type: Node
  type: "internal",
  // indicates this is an internal node
  position: 13,
  // the branch position of external nodes, data type: integer
  prefix: "00101",
  // the shared prefix of external nodes, data type: string of 0s and 1s
  externalNodeNum: 87,
  // number of external nodes contained in the subtree of this node,
  // data type: integer
  totalFrequency: 89,
  // total frequency of the external nodes in the collection,
  // data type: integer
  sistringRepres: node
  // one of the external nodes in the subtree of this internal node,
  // data type: Node
}
```
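In a PAT tree, an internal node marks where the bit strings below it stop agreeing. The sketch below computes, for two sistrings, their shared bit prefix and the index of the first differing bit. Reading these as the `prefix` and `position` fields is an assumption: the sample values above (a 5-bit prefix with position 13) do not follow this simple relation, so pat-tree's exact semantics may differ.

```javascript
// Shared prefix and first differing bit index of two sistrings (bit strings).
// Mapping these to pat-tree's `prefix` / `position` fields is an assumption.
function branchInfo(a, b) {
  var i = 0;
  // Advance while both strings have a bit at i and the bits agree
  while (i < a.length && i < b.length && a[i] === b[i]) {
    i++;
  }
  // If one string is a prefix of the other, position equals the shorter length
  return { prefix: a.slice(0, i), position: i };
}

console.log(branchInfo("00101100", "00101011"));
// { prefix: '00101', position: 5 }
```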
## External nodes
### External nodes
External nodes have the following structure:
```javascript
externalNode.data = {
  type: "external", // indicates this is an external node,
  sistring: "00101100110101", // binary representation of the character, data type: string
  indexes: ["0.1,3", "1.2.5"] // the positions where the sistring appears in the collection, data type: array
  type: "external",
  // indicates this is an external node
  sistring: "00101100110101",
  // binary representation of the character, data type: string
  indexes: ["0.1.3", "1.2.5"]
  // the positions where the sistring appears in the collection,
  // data type: array
}
```
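The two counters on an internal node can be related to the external nodes in its subtree. The sketch below assumes, without confirmation from the library, that an external node's frequency is the length of its `indexes` array; under that assumption `totalFrequency` can exceed `externalNodeNum` whenever a sistring occurs more than once, as in the internal-node sample (87 and 89). The payloads are hypothetical.

```javascript
// Derive externalNodeNum and totalFrequency from a list of external-node
// payloads, ASSUMING frequency = number of entries in `indexes`.
function summarize(externalDataList) {
  var totalFrequency = 0;
  externalDataList.forEach(function (data) {
    totalFrequency += data.indexes.length;
  });
  return {
    externalNodeNum: externalDataList.length,
    totalFrequency: totalFrequency
  };
}

// Hypothetical external-node payloads
var leaves = [
  { type: "external", sistring: "0010110", indexes: ["0.1.3", "1.2.5"] },
  { type: "external", sistring: "0010111", indexes: ["0.0.1"] }
];
console.log(summarize(leaves)); // { externalNodeNum: 2, totalFrequency: 3 }
```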
@@ -131,2 +167,3 @@ # Collection
```javascript
[ [ '嗨你好',
@@ -136,2 +173,3 @@ '這是測試文件' ],
'這是另外一個測試文件' ] ]
```
@@ -142,6 +180,7 @@ An index is in following structure:
For example, **"0.1.2"** is the index of the character "測".
For example, `"0.1.2"` is the index of the character `"測"`.
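The index example can be checked with a small resolver. This sketch assumes, as inferred from the `"0.1.2"` to `"測"` example, that the collection is an array of documents, each an array of sentence strings, and that an index reads `document.sentence.character`. The collection below is a reduced, hypothetical one built only from strings visible in this diff, so an index such as `"1.2.5"` is out of range here.

```javascript
// Resolve "document.sentence.character" against a collection of
// documents (arrays of sentence strings). Index format is inferred, not confirmed.
function resolveIndex(collection, index) {
  if (!/^\d+\.\d+\.\d+$/.test(index)) {
    throw new Error("malformed index: " + index);
  }
  var parts = index.split(".").map(Number);
  var sentence = (collection[parts[0]] || [])[parts[1]];
  if (typeof sentence !== "string" || parts[2] >= sentence.length) {
    throw new RangeError("index out of range: " + index);
  }
  return sentence[parts[2]];
}

// Reduced, hypothetical collection using strings shown above
var collection = [
  ["嗨你好", "這是測試文件"],
  ["這是另外一個測試文件"]
];

console.log(resolveIndex(collection, "0.1.2")); // 測
```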
# Release History
* 0.2.1 Modify README file
* 0.2.0 Add document splitting functionality
@@ -156,2 +195,1 @@ * 0.1.8 Alter algorithm, improve simplicity
* 0.1.1 First release