@aicore/analytics-parser
Comparing version 1.0.0 to 1.0.1
{ | ||
"name": "@aicore/analytics-parser", | ||
"version": "1.0.0", | ||
"version": "1.0.1", | ||
"description": "parse analytics dumps to json/csv/other formats", | ||
@@ -5,0 +5,0 @@ "main": "src/index.js", |
README.md
@@ -1,7 +0,4 @@ | ||
# template-nodejs | ||
A template project for nodejs. Has integrated linting, testing, | ||
coverage, reporting, GitHub actions for publishing to npm repository, dependency updates and other goodies. | ||
# analyticsParser | ||
A module that can transform raw analytics dump files. | ||
Easily use this template to quick start a production ready nodejs project template. | ||
## Code Guardian | ||
@@ -22,68 +19,50 @@ [](https://github.com/aicore/analytics-parser/actions/workflows/build_verify.yml) | ||
## APIs | ||
```js | ||
// after npm install @aicore/analytics-parser | ||
import {parseJSON, parseGZIP} from '@aicore/analytics-parser'; | ||
## Getting Started | ||
### Install this library | ||
```bash | ||
npm install @aicore/analytics-parser | ||
``` | ||
## parseJSON | ||
Converts the given JSON analytics dump file int a processed JSON representation and returns it. | ||
Will optionally write the json file to disk if targetFilePath is specified below. | ||
#### The processed JSON format is an array of sample item below: | ||
### Import the library | ||
```js | ||
[{ | ||
"type": "usage", | ||
"category": "languageServerProtocol", | ||
"subCategory": "codeHintsphp", | ||
"count": 1, | ||
"value": 1, // value is optional, if present, the count specified the number of times the value happened. | ||
"geoLocation": { | ||
"city": "Gurugram (Sector 44)", | ||
"continent": "Asia", | ||
"country": "India", | ||
"isInEuropeanUnion": false | ||
}, | ||
"sessionID": "cmn92zuk0i", | ||
"clientTimeUTC": 1669799589768, // this is the time as communicated by the client, but client clock may be wrong | ||
// server time is approximated time based on servers time. client time should be preferred, and | ||
// serverTimeUTC used to validate that the client is not wrong/lying about its time. | ||
"serverTimeUTC": 1669799580000, | ||
"uuid": "208c5676-746f-4493-80ed-d919775a2f1d" | ||
}, ...] | ||
import {parseGZIP} from '@aicore/analytics-parser'; | ||
``` | ||
### Get the analytics logs | ||
The analytics logs will be structured in the storage bucket as follows: | ||
1. Each analytics app will have a root folder under which the analytics data is collected. (Eg. `brackets-prod`). | ||
2. Within each app folder, the raw analytics dump files can be located easily with the date. | ||
Eg. `brackets-prod/2022/10/11/*` will have all analytics data for that day. | ||
3. Download the analytics gzip files for the dates you need. https://cyberduck.io/ is a good utility | ||
for this on Windows and macOS. | ||
### Parse the extracted zip file | ||
Type: [function][1] | ||
To parse the GZipped analytics dump file using the `parseGZIP` API: | ||
### Parameters | ||
* `JSONFilePath` **[string][2]**  | ||
* `targetFilePath` **[string][2]?** Optional path, if specified will write to file as well. | ||
### Examples | ||
To parse the extracted json analytics dump file: | ||
```javascript | ||
// To extract the expanded analytics dump to a json file | ||
let expandedJSON = await parseJSON('path/to/someText.json', "target/path/to/expanded.json"); | ||
// if you do not want to expand to a json file and only want the parsed array, omit the second parameter. | ||
let expandedJSON = await parseJSON('path/to/someText.json'); | ||
// Give the gzip input file path. Note that the file name should be | ||
// exactly of the form `brackets-prod.2022-11-30-9-13-17-656.v1.json.tar.gz` containing a single file | ||
// `brackets-prod.2022-11-30-9-13-17-656.v1.json`. | ||
let expandedJSON = await parseGZIP('path/to/brackets-prod.2022-11-30-9-13-17-656.v1.json.tar.gz'); | ||
``` | ||
Returns **[Promise][3]<[Object][4]>** Promised that resolves to an object representing analytics data as described above. | ||
#### Understanding return data type of `parseGZIP` API | ||
The returned `expandedJSON` object is an array of event point objects as below. | ||
Each event point object has the following fields: | ||
## parseGZIP | ||
1. (`type`, `category`, `subCategory`): These three strings identify the **event** and are guaranteed to be present. | ||
E.g. `(type: platform, category: startup, subCategory: time)`, | ||
`(type: platform, category: language, subCategory: en-us)`, `(type: UI, category: click, subCategory: closeBtn)` | ||
2. `uuid`: Unique user ID that persists across sessions. | ||
3. `sessionID`: A session ID that gets reset on every session. E.g. closing the browser tab resets `sessionID`. | ||
4. `clientTimeUTC`: A unix timestamp giving the time (accurate to 3 seconds) at which the event occurred | ||
according to the client's clock. This is the preferred time to use for the event. Note that the client clock may be wrong | ||
or misleading, as this is client-specified data, so check that it is within 30 minutes of `serverTimeUTC`. | ||
5. `serverTimeUTC`: A unix timestamp giving the time (accurate to within 10 minutes) at which the event | ||
occurred according to the server's clock. Use this only to cross-reference with `clientTimeUTC`. | ||
6. `count`: The number of times the **event** occurred in the time. Guaranteed to be present. | ||
7. `value`: An optional string, usually representing a number. If present, `count` specifies the number | ||
of times the `value` happened. This is only present in certain events that track values. | ||
E.g. if we are tracking `JS file open` latencies, `(value: 250, count: 2)` means that we got 2 `JS file open` events, | ||
each with a latency of 250 units. | ||
8. `geoLocation`: The location of the user raising the event. | ||
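The cross-referencing advice for `clientTimeUTC` and `serverTimeUTC`, and the relationship between `value` and `count`, can be sketched as two small helpers. This is only an illustrative sketch and not part of the library: `hasPlausibleClientTime`, `totalValue` and `MAX_SKEW_MS` are hypothetical names, and it assumes both timestamps are in milliseconds, as the sample values suggest.

```javascript
// Hypothetical helpers; only the event field names come from the README.
const MAX_SKEW_MS = 30 * 60 * 1000; // 30 minutes, assuming millisecond timestamps

// True when the client-reported time is within maxSkewMs of the server time.
function hasPlausibleClientTime(event, maxSkewMs = MAX_SKEW_MS) {
    return Math.abs(event.clientTimeUTC - event.serverTimeUTC) <= maxSkewMs;
}

// Sums value * count over the events that carry a numeric `value` string.
// Events without a `value`, or with a non-numeric one, are skipped.
function totalValue(events) {
    let total = 0;
    for (const event of events) {
        if (event.value === undefined) {
            continue;
        }
        const numeric = Number(event.value);
        if (!Number.isNaN(numeric)) {
            total += numeric * event.count;
        }
    }
    return total;
}
```

For example, `expandedJSON.filter((e) => hasPlausibleClientTime(e))` would keep only the events whose client clock is within 30 minutes of the server clock.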
Converts the given Gzip analytics dump file int a processed JSON representation and returns it. | ||
Will optionally write the json file to disk if targetFilePath is specified below. Note that the file name should be | ||
exactly of the form `brackets-prod.2022-11-30-9-13-17-656.v1.json.tar.gz` containing a single file | ||
`brackets-prod.2022-11-30-9-13-17-656.v1.json`. If you want to parse arbitrary JSON, use the `parseJSON` | ||
method instead. | ||
#### The processed JSON format is an array of sample item below: | ||
```js | ||
@@ -95,3 +74,3 @@ [{ | ||
"count": 1, | ||
"value": 1, // value is optional, if present, the count specified the number of times the value happened. | ||
"value": "250", // value is optional, if present, the count specified the number of times the value happened. | ||
"geoLocation": { | ||
@@ -104,5 +83,3 @@ "city": "Gurugram (Sector 44)", | ||
"sessionID": "cmn92zuk0i", | ||
"clientTimeUTC": 1669799589768, // this is the time as communicated by the client, but client clock may be wrong | ||
// server time is approximated time based on servers time. client time should be preferred, and | ||
// serverTimeUTC used to validate that the client is not wrong/lying about its time. | ||
"clientTimeUTC": 1669799589768, | ||
"serverTimeUTC": 1669799580000, | ||
@@ -113,26 +90,16 @@ "uuid": "208c5676-746f-4493-80ed-d919775a2f1d" | ||
Type: [function][1] | ||
### Parameters | ||
## The Analytics Zip file | ||
The analytics zip file name is of the format `brackets-prod.YYYY-MM-DD-H-M-S-ms.v1.json.tar.gz`. It contains a single JSON | ||
file which, when extracted, has a name of the form `brackets-prod.YYYY-MM-DD-H-M-S-ms.v1.json` (referred to from here on as the extracted JSON). | ||
The first part of the name is the app name (e.g. `brackets-prod`) to which the dump corresponds, and the | ||
second part is the timestamp (accurate to milliseconds) at which the dump was collected at the server. | ||
* `gzipFilePath` **[string][2]**  | ||
* `targetFilePath` **[string][2]?** Optional path, if specified will write to file as well. | ||
To learn more about the raw extracted JSON format, see this [wiki](https://github.com/aicore/Core-Analytics-Server/blob/main/docs/architecture.md#client-schema). | ||
But knowing the raw format is not necessary for this library. The purpose of this library is to convert this raw JSON to | ||
a much more human-readable JSON format via the `parseGZIP` API outlined below. | ||
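The file name format described above can be split into its app name and timestamp components with a regular expression. The following is a minimal sketch under stated assumptions: `parseDumpFileName` and `DUMP_NAME_PATTERN` are hypothetical names that the library does not export, the input is a bare file name rather than a full path, and the timestamp parts are returned as plain numbers because the README does not state their time zone.

```javascript
// Hypothetical helper; not part of @aicore/analytics-parser.
// Matches `<app>.YYYY-MM-DD-H-M-S-ms.v1.json` with an optional `.tar.gz` suffix.
const DUMP_NAME_PATTERN =
    /^(.+)\.(\d{4})-(\d{1,2})-(\d{1,2})-(\d{1,2})-(\d{1,2})-(\d{1,2})-(\d{1,3})\.v1\.json(\.tar\.gz)?$/;

// Returns the parsed components, or null if the name does not match the format.
function parseDumpFileName(fileName) {
    const match = DUMP_NAME_PATTERN.exec(fileName);
    if (!match) {
        return null;
    }
    const [, appName, ...rest] = match;
    // The first seven remaining groups are the numeric timestamp parts.
    const [year, month, day, hour, minute, second, millisecond] =
        rest.slice(0, 7).map(Number);
    return {
        appName, year, month, day, hour, minute, second, millisecond,
        isArchive: match[9] !== undefined // true for the `.tar.gz` form
    };
}
```

A check like this could be run before calling `parseGZIP`, since that API expects the file name to follow this form exactly.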
### Examples | ||
## Detailed API docs | ||
[See this link for detailed API docs.](https://github.com/aicore/analytics-parser/blob/main/docs/generatedApiDocs/index-API.md) | ||
To parse the GZipped analytics dump file: | ||
```javascript | ||
// To extract to a json file, give the gzip file path. Note that the file name should be | ||
// exactly of the form `brackets-prod.2022-11-30-9-13-17-656.v1.json.tar.gz` containing a single file | ||
// `brackets-prod.2022-11-30-9-13-17-656.v1.json`. If you want to parse arbitrary JSON, use the `parseJSON` | ||
// method instead. | ||
let expandedJSON = await parseGZIP('path/to/brackets-prod.2022-11-30-9-13-17-656.v1.json.tar.gz', | ||
"target/path/to/expanded.json"); | ||
// if you do not want to expand to a json file and only want the parsed array, omit the second parameter. | ||
let expandedJSON = await parseGZIP('path/to/brackets-prod.2022-11-30-9-13-17-656.v1.json.tar.gz'); | ||
``` | ||
Returns **[Promise][3]<[Object][4]>** Promised that resolves to an object representing analytics data as described above. | ||
# Commands available | ||
@@ -139,0 +106,0 @@ |
@@ -30,3 +30,3 @@ /* | ||
const exec = util.promisify(child_process.exec); | ||
import * as fs from "fs"; | ||
import { readFile, writeFile,unlink } from 'node:fs/promises'; | ||
import * as path from "path"; | ||
@@ -36,8 +36,8 @@ | ||
return new Promise((resolve)=>{ | ||
fs.unlink(filePath, function(err){ | ||
if(err) { | ||
unlink(filePath) | ||
.then(resolve) | ||
.catch((err)=>{ | ||
console.log(err); | ||
} | ||
resolve(); | ||
}); | ||
resolve(); | ||
}); | ||
}); | ||
@@ -110,3 +110,3 @@ } | ||
/** | ||
* Converts the given JSON analytics dump file int a processed JSON representation and returns it. | ||
* Converts the given extracted JSON analytics dump file int a processed JSON representation and returns it. | ||
* Will optionally write the json file to disk if targetFilePath is specified below. | ||
@@ -121,3 +121,3 @@ * | ||
* "count": 1, | ||
* "value": 1, // value is optional, if present, the count specified the number of times the value happened. | ||
* "value": "45", // value is optional, if present, the count specified the number of times the value happened. | ||
* "geoLocation": { | ||
@@ -139,5 +139,5 @@ * "city": "Gurugram (Sector 44)", | ||
* // To extract the expanded analytics dump to a json file | ||
* let expandedJSON = await parseJSON('path/to/someText.json', "target/path/to/expanded.json"); | ||
* let expandedJSON = await parseExtractedFile('path/to/someText.json', "target/path/to/expanded.json"); | ||
* // if you do not want to expand to a json file and only want the parsed array, omit the second parameter. | ||
* let expandedJSON = await parseJSON('path/to/someText.json'); | ||
* let expandedJSON = await parseExtractedFile('path/to/someText.json'); | ||
* | ||
@@ -150,4 +150,4 @@ * @param {string} JSONFilePath | ||
export async function parseJSON(JSONFilePath, targetFilePath) { | ||
let json = JSON.parse(fs.readFileSync(JSONFilePath, {encoding: 'utf8', flag: 'r'})); | ||
export async function parseExtractedFile(JSONFilePath, targetFilePath) { | ||
let json = JSON.parse(await readFile(JSONFilePath, {encoding: 'utf8', flag: 'r'})); | ||
let expandedEvents = []; | ||
@@ -168,3 +168,3 @@ _validateSchemaVersion1(json); | ||
if(targetFilePath){ | ||
fs.writeFileSync(targetFilePath, JSON.stringify(expandedEvents)); | ||
await writeFile(targetFilePath, JSON.stringify(expandedEvents)); | ||
} | ||
@@ -178,3 +178,3 @@ return expandedEvents; | ||
* exactly of the form `brackets-prod.2022-11-30-9-13-17-656.v1.json.tar.gz` containing a single file | ||
* `brackets-prod.2022-11-30-9-13-17-656.v1.json`. If you want to parse arbitrary JSON, use the `parseJSON` | ||
* `brackets-prod.2022-11-30-9-13-17-656.v1.json`. If you want to parse the extracted JSON, use the `parseExtractedFile` | ||
* method instead. | ||
@@ -189,3 +189,3 @@ * | ||
* "count": 1, | ||
* "value": 1, // value is optional, if present, the count specified the number of times the value happened. | ||
* "value": "23", // value is optional, if present, the count specified the number of times the value happened. | ||
* "geoLocation": { | ||
@@ -208,3 +208,3 @@ * "city": "Gurugram (Sector 44)", | ||
* // exactly of the form `brackets-prod.2022-11-30-9-13-17-656.v1.json.tar.gz` containing a single file | ||
* // `brackets-prod.2022-11-30-9-13-17-656.v1.json`. If you want to parse arbitrary JSON, use the `parseJSON` | ||
* // `brackets-prod.2022-11-30-9-13-17-656.v1.json`. If you want to parse Extracted JSON, use the `parseExtractedFile` | ||
* // method instead. | ||
@@ -225,3 +225,3 @@ * let expandedJSON = await parseGZIP('path/to/brackets-prod.2022-11-30-9-13-17-656.v1.json.tar.gz', | ||
await exec(`tar -xvf ${gzipFilePath} -C ${path.dirname(gzipFilePath)}`); | ||
let json = await parseJSON(jsonPath, targetFilePath); | ||
let json = await parseExtractedFile(jsonPath, targetFilePath); | ||
await silentlyDelete(jsonPath); | ||
@@ -228,0 +228,0 @@ return json; |
Filesystem access
Supply chain risk: Accesses the file system, and could potentially read sensitive data.
Found 1 instance in 1 package