
pat-tree - npm Package Compare versions

Comparing version 0.2.0 to 0.2.1


package.json
{
  "name": "pat-tree",
  "version": "0.2.1",
  "description": "PAT tree construction for Chinese documents",

@@ -5,0 +5,0 @@ "main": "index.js",

@@ -18,3 +18,3 @@ pat-tree

This project is currently in development and is used for academic purposes.
**DO NOT** use this module in production until the **WARNING** statement is removed.
//TODO: improve document splitting algorithm

@@ -25,76 +25,95 @@

```bash
npm install pat-tree --save
```
# Usage
## Init
### Instantiate
```javascript
var PATtree = require("pat-tree");
var tree = new PATtree();
```
## Add document
```javascript
tree.addDocument(input);
```
## Extract Significant Lexical Patterns
```javascript
var SLPs = tree.extractSLP(TFThreshold, SEThreshold);
// SLPs: array of strings, each a significant lexical pattern.
```
A pattern appears in the result array if its frequency exceeds `TFThreshold` and its SE value exceeds `SEThreshold`.
`TFThreshold` should be an integer; `SEThreshold` should be a float between 0 and 1.
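To make the two thresholds concrete, here is a minimal, self-contained sketch of threshold-based pattern selection. It does not call pat-tree: a naive substring count stands in for the PAT tree, and the SE value is computed as f(c) / (f(a) + f(b) - f(c)), where `a` is the pattern without its last character and `b` the pattern without its first character. That formula follows the significance-estimation measure from the PAT-tree keyword-extraction literature; whether this module computes SE exactly this way is an assumption. `countOccurrences`, `seValue`, and `extractSLPSketch` are hypothetical names.

```javascript
// Count (possibly overlapping) occurrences of `pattern` across all documents.
// A naive stand-in for the frequency a PAT tree would report.
function countOccurrences(docs, pattern) {
  let count = 0;
  for (const doc of docs) {
    let i = doc.indexOf(pattern);
    while (i !== -1) {
      count += 1;
      i = doc.indexOf(pattern, i + 1);
    }
  }
  return count;
}

// Assumed SE definition: f(c) / (f(a) + f(b) - f(c)),
// with a = c minus its last character and b = c minus its first character.
// Since f(a) >= f(c) and f(b) >= f(c), the result lies in (0, 1] when f(c) > 0.
function seValue(docs, pattern) {
  const fc = countOccurrences(docs, pattern);
  const fa = countOccurrences(docs, pattern.slice(0, -1));
  const fb = countOccurrences(docs, pattern.slice(1));
  return fc / (fa + fb - fc);
}

// Keep every candidate of length 2..maxLen whose frequency exceeds
// tfThreshold and whose SE value exceeds seThreshold.
function extractSLPSketch(docs, tfThreshold, seThreshold, maxLen = 4) {
  const candidates = new Set();
  for (const doc of docs) {
    for (let i = 0; i < doc.length; i++) {
      for (let len = 2; len <= maxLen && i + len <= doc.length; len++) {
        candidates.add(doc.slice(i, i + len));
      }
    }
  }
  const result = [];
  for (const c of candidates) {
    if (countOccurrences(docs, c) > tfThreshold && seValue(docs, c) > seThreshold) {
      result.push(c);
    }
  }
  return result;
}

const docs = ["測試文件", "測試文件很好", "這是測試"];
console.log(extractSLPSketch(docs, 1, 0.9)); // ["測試", "測試文件", "試文件", "文件"]
```

Note that `測試文` is dropped even though it occurs twice: `測試` also occurs on its own in the third document, so its SE is 2 / (3 + 2 - 2) = 2/3, below the 0.9 threshold.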
## Split document
```javascript
var result = tree.splitDoc(doc, SLPs);
```
`doc` is the document to split, data type: string.
`SLPs` is an array of SLPs extracted by `tree.extractSLP()`, or an array of keywords obtained any other way, data type: array of strings.
`result` is the split document, data type: string.
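The README does not describe how `splitDoc` uses the keywords internally. One common approach, shown here as a hypothetical sketch rather than pat-tree's algorithm, is greedy longest-match segmentation: at each position, take the longest keyword that starts there, and group any characters not covered by a keyword into their own segment. Joining segments with a single space is an assumption about the output format; `splitDocSketch` is a hypothetical name.

```javascript
// Greedy longest-match segmentation (hypothetical stand-in for splitDoc).
function splitDocSketch(doc, keywords, separator = " ") {
  // Try longer keywords first so that "測試文件" wins over "測試".
  const sorted = [...new Set(keywords)]
    .filter((k) => k.length > 0)
    .sort((a, b) => b.length - a.length);
  const segments = [];
  let pending = ""; // run of characters not covered by any keyword
  let i = 0;
  while (i < doc.length) {
    const match = sorted.find((k) => doc.startsWith(k, i));
    if (match) {
      if (pending) {
        segments.push(pending);
        pending = "";
      }
      segments.push(match);
      i += match.length;
    } else {
      pending += doc[i];
      i += 1;
    }
  }
  if (pending) segments.push(pending);
  return segments.join(separator);
}

console.log(splitDocSketch("這是測試文件", ["測試", "文件", "測試文件"])); // 這是 測試文件
console.log(splitDocSketch("文件測試很好", ["測試", "文件"])); // 文件 測試 很好
```

Greedy matching is simple but not always optimal; a dynamic-programming segmentation would be the usual next step if greedy choices split words badly.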
# Additional functions
## Print tree content
```javascript
tree.printTreeContent(printExternalNodes, printDocuments);
```
Prints the content of the tree to the console.
If `printExternalNodes` is set to true, all external nodes of each internal node are printed.
If `printDocuments` is set to true, the whole document collection of the tree is printed.
## Traversal
```javascript
tree.traverse(preCallback, inCallback, postCallback);
```
For convenience, there is also a function for each traversal order:
```javascript
tree.preOrderTraverse(callback);
tree.inOrderTraverse(callback);
tree.postOrderTraverse(callback);
```
For example:
```javascript
tree.preOrderTraverse(function (node) {
  console.log("node id: " + node.id);
});
```
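The three orders differ only in when the callback runs relative to the two children. A minimal standalone sketch of that logic on plain objects with `id`, `left`, and `right` fields; unlike pat-tree's methods, which are called on the tree itself, these hypothetical helpers take the root node explicitly, and leaves are assumed to have `null` children.

```javascript
// Hypothetical helper that builds a node with the documented fields.
function makeNode(id, left = null, right = null) {
  return { id, left, right, data: {} };
}

// Depth-first traversal: each callback (if provided) runs before,
// between, or after visiting the left and right subtrees.
function traverse(node, preCallback, inCallback, postCallback) {
  if (node == null) return;
  if (preCallback) preCallback(node);
  traverse(node.left, preCallback, inCallback, postCallback);
  if (inCallback) inCallback(node);
  traverse(node.right, preCallback, inCallback, postCallback);
  if (postCallback) postCallback(node);
}

function preOrderTraverse(root, callback) { traverse(root, callback, null, null); }
function inOrderTraverse(root, callback) { traverse(root, null, callback, null); }
function postOrderTraverse(root, callback) { traverse(root, null, null, callback); }

// Usage: root 1 with children 2 (left) and 3 (right).
const root = makeNode(1, makeNode(2), makeNode(3));
const ids = [];
preOrderTraverse(root, (node) => ids.push(node.id));
console.log(ids); // [1, 2, 3]
```

With the same tree, in-order visits 2, 1, 3 and post-order visits 2, 3, 1.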
# Data type
## Node
Every node shares some common fields and has the following structure:
```javascript
node = {
  id: 3,                 // the id of this node, data type: integer, auto generated
  parent: 1,             // the parent id of this node, data type: integer
  left: leftChildNode,   // data type: Node
  right: rightChildNode, // data type: Node
  data: {}               // payload for this node, data type: JSON
}
```

@@ -104,22 +123,39 @@ Data is different for internal nodes and external nodes,

## Internal nodes
```javascript
internalNode.data = {
  // indicates this is an internal node
  type: "internal",
  // the branch position of external nodes, data type: integer
  position: 13,
  // the shared prefix of external nodes, data type: string of 0s and 1s
  prefix: "00101",
  // number of external nodes in the subtree of this node, data type: integer
  externalNodeNum: 87,
  // total frequency of the external nodes in the collection, data type: integer
  totalFrequency: 89,
  // one of the external nodes in the subtree of this internal node, data type: Node
  sistringRepres: node
}
```
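In a PATRICIA-style tree, an internal node records where the bit strings of its external nodes first diverge. A hypothetical sketch, assuming `position` is the 0-based index of the first differing bit and `prefix` is the bits before it; the example values above (`position: 13`, `prefix: "00101"`) may follow a different convention. Encoding each character as its 16-bit UTF-16 code unit in `charToBits` is likewise an assumption, not necessarily how pat-tree builds sistrings.

```javascript
// Assumed encoding: one UTF-16 code unit per character, written as 16 bits.
function charToBits(ch) {
  return ch.charCodeAt(0).toString(2).padStart(16, "0");
}

// Returns the shared prefix of two bit strings and the index of the first
// bit where they differ (which equals the prefix length).
function branchPoint(a, b) {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return { prefix: a.slice(0, i), position: i };
}

console.log(charToBits("A")); // 0000000001000001
console.log(branchPoint(charToBits("A"), charToBits("B")));
// { prefix: '00000000010000', position: 14 }
```

An internal node created for these two keys would branch on bit 14: keys with a 0 there go left, keys with a 1 go right.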
## External nodes
External nodes have the following structure:
```javascript
externalNode.data = {
  // indicates this is an external node
  type: "external",
  // binary representation of the character, data type: string
  sistring: "00101100110101",
  // the positions where the sistring appears in the collection, data type: array of strings
  indexes: ["0.1.3", "1.2.5"]
}
```
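The `externalNodeNum` and `totalFrequency` fields of an internal node summarize its subtree. A minimal sketch of how they could be recomputed recursively, assuming an external node's frequency is the number of entries in its `indexes` array; the constructors `makeExternal` and `makeInternal` and the sample sistrings and indexes are hypothetical.

```javascript
// Hypothetical constructors mirroring the documented `data` payloads.
function makeExternal(sistring, indexes) {
  return { left: null, right: null, data: { type: "external", sistring, indexes } };
}
function makeInternal(left, right) {
  return { left, right, data: { type: "internal" } };
}

// Recomputes externalNodeNum and totalFrequency for a subtree,
// assuming an external node's frequency is the number of its indexes.
function summarize(node) {
  if (node == null) return { externalNodeNum: 0, totalFrequency: 0 };
  if (node.data.type === "external") {
    return { externalNodeNum: 1, totalFrequency: node.data.indexes.length };
  }
  const l = summarize(node.left);
  const r = summarize(node.right);
  return {
    externalNodeNum: l.externalNodeNum + r.externalNodeNum,
    totalFrequency: l.totalFrequency + r.totalFrequency,
  };
}

// Sample tree: three external nodes with 2, 1, and 3 occurrences.
const tree = makeInternal(
  makeExternal("0010", ["0.0.1", "1.0.2"]),
  makeInternal(
    makeExternal("0110", ["0.1.0"]),
    makeExternal("0111", ["0.1.3", "1.0.4", "1.0.5"])
  )
);
console.log(summarize(tree)); // { externalNodeNum: 3, totalFrequency: 6 }
```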

@@ -131,2 +167,3 @@ # Collection

```javascript
[ [ '嗨你好',

@@ -136,2 +173,3 @@ '這是測試文件' ],

'這是另外一個測試文件' ] ]
```

@@ -142,6 +180,7 @@ An index is in following structure:

For example, `"0.1.2"` is the index of the character `"測"`.
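Since the collection above is only partly shown, the sample below is hypothetical, chosen so that `"0.1.2"` resolves to `"測"` as stated. Reading an index as `document.sentence.character` (all 0-based) is inferred from that example rather than confirmed; `lookupIndex` is a hypothetical helper.

```javascript
// Hypothetical sample: document 0 holds two sentences, document 1 holds one.
const sampleCollection = [["嗨你好", "這是測試文件"], ["這是另外一個測試文件"]];

// Assumed index format: "document.sentence.character", all 0-based.
function lookupIndex(collection, index) {
  if (!/^\d+\.\d+\.\d+$/.test(index)) {
    throw new Error("Malformed index: " + index);
  }
  const [doc, sentence, char] = index.split(".").map(Number);
  const text = collection[doc]?.[sentence];
  return text === undefined ? undefined : text[char];
}

console.log(lookupIndex(sampleCollection, "0.1.2")); // 測
```

The strict format check rejects typos such as `"0.1,3"` instead of silently returning the wrong character.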
# Release History
* 0.2.1 Modify README file
* 0.2.0 Add document splitting functionality

@@ -156,2 +195,1 @@ * 0.1.8 Alter algorithm, improve simplicity

* 0.1.1 First release