network-pipeline

Distributed Network Packet Analysis Pipeline for Layer 2, 3 and 4 Frames

  • 1.2.11
  • PyPI

AntiNex - Network Data Analysis Pipeline

This is a distributed python 3 framework for automating network traffic capture and converting it into a csv file. Once you have a csv file you can build, train and tune machine learning models to defend your own infrastructure by actively monitoring the network layer.

.. image:: https://raw.githubusercontent.com/jay-johnson/network-pipeline/master/docker/images/network-pipeline-workflow.png
   :align: center

.. image:: https://readthedocs.org/projects/antinex-network-pipeline/badge/?version=latest
   :target: http://antinex-network-pipeline.readthedocs.io/en/latest/?badge=latest

It supports auto-publishing captured network traffic to the `AntiNex REST API`_, which uses pre-trained Deep Neural Networks in the AntiNex Core to predict whether each record is an attack or not. Please refer to the `Making Live Predictions using Pre-trained Neural Networks`_ section for more details. Publishing to the REST API can run inside docker as well.

.. _AntiNex REST API: https://github.com/jay-johnson/train-ai-with-django-swagger-jwt
.. _Making Live Predictions using Pre-trained Neural Networks: https://github.com/jay-johnson/network-pipeline#making-live-predictions-using-pre-trained-neural-networks

There are many ways to build a machine learning or AI model, but for now I am using `Jupyter Hub`_ to build a pre-trained model for defending against `OWASP Dynamic Analysis tools for finding vulnerabilities`_ running in my owasp-jenkins_ repository.

  • `Django REST Framework + JWT + Swagger`_ - run prepare-dataset and train-keras-deep-neural-network using a multi-tenant Django 2.0+ REST API server supporting JWT and Swagger
  • `Simulations directory`_ - capturing simulated attacks using ZAP against Django, Flask, React, Vue, and Spring
  • `Prepare Dataset section`_ - preparing training csvs from captured recordings
  • `Train Models section`_ - training machine learning and AI models from prepared csvs; also check out the `AntiNex Core which has accuracies over 99.8%`_ and a `Jupyter notebook`_
  • `Datasets repository`_ - captured recordings if you want to see what some of the data will look like

.. _Jupyter Hub: https://github.com/jay-johnson/celery-connectors#running-jupyterhub-with-postgres-and-ssl
.. _OWASP Dynamic Analysis tools for finding vulnerabilities: https://www.owasp.org/index.php/Category:Vulnerability_Scanning_Tools
.. _owasp-jenkins: https://github.com/jay-johnson/owasp-jenkins
.. _Simulations directory: https://github.com/jay-johnson/network-pipeline/tree/master/simulations
.. _Django REST Framework + JWT + Swagger: https://github.com/jay-johnson/train-ai-with-django-swagger-jwt#django-rest-framework--jwt--swagger--keras--tensorflow
.. _Prepare Dataset section: https://github.com/jay-johnson/network-pipeline/#prepare-dataset
.. _Train Models section: https://github.com/jay-johnson/network-pipeline/#train-models
.. _Datasets repository: https://github.com/jay-johnson/network-pipeline-datasets
.. _AntiNex Core which has accuracies over 99.8%: https://github.com/jay-johnson/antinex-core#antinex-core
.. _Jupyter notebook: https://github.com/jay-johnson/antinex-core/blob/master/docker/notebooks/AntiNex-Protecting-Django.ipynb

Why?

After digging into how `Internet Chemotherapy`_ worked with a simple `Nerfball approach`_, I wanted to see if I could train machine learning and AI models to defend against this type of attack. Since the network is the first line of defense on the edge, on-premise, or in the cloud, I wanted to start building that first line of defense and open source it. I also do not know of any other free toolchains for building defensive models at the network layer.

This repository automates dataset creation for training models by capturing network traffic on layers 2, 3 and 4 of the `OSI model`_. Once a dataset has been Prepared_ it can be used to `Train a Deep Neural Network`_. Pre-trained Deep Neural Networks can make live predictions on good or bad network traffic with the `AntiNex Core`_.

.. _Internet Chemotherapy: https://0x00sec.org/t/internet-chemotherapy/4664
.. _Nerfball approach: https://github.com/jay-johnson/nerfball
.. _OSI model: https://en.wikipedia.org/wiki/OSI_model
.. _Prepared: https://github.com/jay-johnson/antinex-client#prepare-a-dataset
.. _Train a Deep Neural Network: https://github.com/jay-johnson/antinex-client#using-pre-trained-neural-networks-to-make-predictions
.. _AntiNex Core: https://github.com/jay-johnson/antinex-core#django---train-and-predict

How does it work?

This framework uses free open source tools to create the following publish-subscribe workflow:

#. Network traffic matches a capture tool filter
#. The capture tool converts the packet layers into JSON
#. The capture tool publishes the converted JSON dictionary to a message broker (Redis or RabbitMQ)
#. The packet processor consumes the dictionary from the message broker
#. The packet processor flattens the dictionary
#. The packet processor periodically writes a csv dataset from the collected, flattened dictionaries (configurable to snapshot the csv after every n packets consumed)
#. Flattened packets are published using JWT to a pre-trained Deep Neural Network to predict whether the network traffic is good or bad

Envisioned Deployment

  • For on-premise and cloud environments, this framework would deploy capture tools to load balancers and application servers. These capture tool agents would publish to a redis cluster outside of the load balancers and application servers for analysis. By doing this, models could also be tuned to defend on the load balancer tier or application server tier independently.

  • Remote edge machines would be running deployed, pre-trained, package-maintained models that are integrated with a prediction API. Periodic uploads of new, unexpected records would be sent encrypted back to the cloud for retraining models for helping defend an IoT fleet.

Detailed Version

The pipeline is a capture-and-forward system focused on redundancy and scalability. Component-wise, there are pre-configured capture tools that hook into the network devices on the operating system. If a capture tool finds traffic that matches its filter, it json-ifies the captured packet and forwards it as a nested dictionary to a redis server (rabbitmq works as well, but requires setting the environment variables for authentication). Once the packet dictionaries are in redis/rabbitmq, the packet processor consumes them and flattens them using pandas. The packet processors are set up to write csv datasets from the consumed, flattened dictionaries every 100 packets (you can set the SAVE_AFTER_NUM environment variable to a larger number too).
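For orientation, here is a minimal sketch of the consumer side of that flow, assuming a recent pandas and the redis pip; the queue name, payload layout, and output path are illustrative assumptions, not the packet processor's exact behavior:

::

    # Illustrative sketch: pop JSON-ified packets off redis, flatten them with
    # pandas, and snapshot a csv every SAVE_AFTER_NUM packets.
    # The queue name "netpipeline:packets" and the output path are assumptions.
    import json
    import os
    import pandas as pd
    import redis

    r = redis.Redis(host="localhost", port=6379)
    SAVE_AFTER_NUM = int(os.getenv("SAVE_AFTER_NUM", "100"))
    flattened_rows = []

    while True:
        _, raw = r.brpop("netpipeline:packets")
        # flatten nested keys like {"ip": {"src": ...}} into columns like "ip.src"
        flattened_rows.append(pd.json_normalize(json.loads(raw)))
        if len(flattened_rows) >= SAVE_AFTER_NUM:
            pd.concat(flattened_rows, ignore_index=True).to_csv(
                "/tmp/netdata-snapshot.csv", index=False)
            flattened_rows.clear()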

Here are the included, standalone capture tools (all of which require root privileges to work):

#. capture_arp.py_
#. capture_icmp.py_
#. capture_ssh.py_
#. capture_tcp.py_
#. capture_telnet.py_
#. capture_udp.py_

.. _capture_arp.py: https://github.com/jay-johnson/network-pipeline/blob/master/network_pipeline/scripts/capture_arp.py
.. _capture_icmp.py: https://github.com/jay-johnson/network-pipeline/blob/master/network_pipeline/scripts/capture_icmp.py
.. _capture_ssh.py: https://github.com/jay-johnson/network-pipeline/blob/master/network_pipeline/scripts/capture_ssh.py
.. _capture_tcp.py: https://github.com/jay-johnson/network-pipeline/blob/master/network_pipeline/scripts/capture_tcp.py
.. _capture_telnet.py: https://github.com/jay-johnson/network-pipeline/blob/master/network_pipeline/scripts/capture_telnet.py
.. _capture_udp.py: https://github.com/jay-johnson/network-pipeline/blob/master/network_pipeline/scripts/capture_udp.py

AntiNex Stack Status

AntiNex Network Pipeline is part of the AntiNex stack:

.. list-table::
   :header-rows: 1

What packets and layers are supported?

Layer 2

  • Ethernet_
  • ARP_

Layer 3

  • IPv4_
  • IPv6_
  • ICMP_

Layer 4

  • TCP_
  • UDP_
  • Raw - hex data from TCP or UDP packet body

Layer 5

  • DNS_

.. _Ethernet: https://en.wikipedia.org/wiki/Ethernet
.. _ARP: https://en.wikipedia.org/wiki/Address_Resolution_Protocol
.. _IPv4: https://en.wikipedia.org/wiki/IPv4
.. _IPv6: https://en.wikipedia.org/wiki/IPv6
.. _ICMP: https://en.wikipedia.org/wiki/Internet_Control_Message_Protocol
.. _TCP: https://en.wikipedia.org/wiki/Transmission_Control_Protocol
.. _UDP: https://en.wikipedia.org/wiki/User_Datagram_Protocol
.. _DNS: https://en.wikipedia.org/wiki/Domain_Name_System

How do I get started?

#. Install from pypi or build the development environment

::

    pip install network-pipeline

**Or you can set up the repository locally**

::

    mkdir -p -m 777 /opt/antinex
    git clone https://github.com/jay-johnson/network-pipeline.git /opt/antinex/pipeline
    cd /opt/antinex/pipeline
    virtualenv -p python3 /tmp/netpipevenv && source /tmp/netpipevenv/bin/activate && pip install -e .

#. Start Redis

This guide assumes redis is running in docker, but as long as there's an accessible redis server on port 6379 you can use that too. RabbitMQ works as well, but requires setting the environment variables for connectivity.

::

    # if you do not have docker-compose installed, you can try to install it with:
    # pip install docker-compose
    ./start.sh

#. Verify Redis is Working

::

    redis-cli

or

::

    telnet localhost 6379
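
You can also verify connectivity from python (optional; assumes the redis pip is installed in your virtual environment):

::

    # quick, optional connectivity check against the local redis server
    import redis

    r = redis.Redis(host="localhost", port=6379)
    print(r.ping())  # True means the server is reachable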

#. Start the `Packet Processor for Consuming Messages`_

Activate the virtual environment

::

    source /tmp/netpipevenv/bin/activate

Start it up

::

    ./network_pipeline/scripts/packets_redis.py

.. _Packet Processor for Consuming Messages: https://github.com/jay-johnson/network-pipeline/blob/master/network_pipeline/scripts/packets_redis.py

Making Live Predictions using Pre-trained Neural Networks

There are a few ways to make live predictions depending on how the pipeline and AntiNex assets are deployed:

#. Running the Full Django REST API stack using compose.yml (Co-located mode)

This will start the Packet Processor using the default compose.yml file:

https://github.com/jay-johnson/train-ai-with-django-swagger-jwt/blob/0d280216e3697f0d2cf7456095e37df64be73040/compose.yml#L105

Clone the repo:

::

    git clone https://github.com/jay-johnson/train-ai-with-django-swagger-jwt.git /opt/antinex/api
    cd /opt/antinex/api

Start the co-located container stack with the ``compose.yml`` file:

::

    docker-compose -f compose.yml up -d

#. Running Only the Network Pipeline compose.yml (Distributed mode)

This will just start the Network Pipeline container and assumes the REST API is running on another host.

https://github.com/jay-johnson/network-pipeline/blob/master/compose.yml

Use the command:

::

    docker-compose -f compose.yml up

#. Running the Packet Processor Manually Using Environment Variables (Development mode)

Make sure to source the correct environment file before running ``packets_redis.py`` (Packet Processor).

As an example the repository has a version that works with the `compose.yml`_ docker deployment:

::

    source envs/antinex-dev.env

.. _compose.yml: https://github.com/jay-johnson/network-pipeline/blob/master/compose.yml#L5

When building your own credentials and datasets, you may have special characters in the env file. Please use ``set -o allexport; source envs/antinex-dev.env; set +o allexport;`` to handle this case.

Right now the defaults do not have special characters, so the ``source`` command works just fine:

::

    export ANTINEX_PUBLISH_ENABLED=1
    export ANTINEX_URL=http://localhost:8010
    export ANTINEX_USER=root
    export ANTINEX_EMAIL=123321
    export ANTINEX_PASSWORD=123321
    export ANTINEX_PUBLISH_TO_CORE=1
    export ANTINEX_USE_MODEL_NAME=Full-Django-AntiNex-Simple-Scaler-DNN
    export ANTINEX_PUBLISH_REQUEST_FILE=/opt/antinex/client/examples/predict-rows-scaler-full-django.json
    export ANTINEX_FEATURES_TO_PROCESS=idx,arp_hwlen,arp_hwtype,arp_id,arp_op,arp_plen,arp_ptype,dns_default_aa,dns_default_ad,dns_default_an,dns_default_ancount,dns_default_ar,dns_default_arcount,dns_default_cd,dns_default_id,dns_default_length,dns_default_ns,dns_default_nscount,dns_default_opcode,dns_default_qd,dns_default_qdcount,dns_default_qr,dns_default_ra,dns_default_rcode,dns_default_rd,dns_default_tc,dns_default_z,dns_id,eth_id,eth_type,icmp_addr_mask,icmp_code,icmp_gw,icmp_id,icmp_ptr,icmp_seq,icmp_ts_ori,icmp_ts_rx,icmp_ts_tx,icmp_type,icmp_unused,ip_id,ip_ihl,ip_len,ip_tos,ip_version,ipv6_fl,ipv6_hlim,ipv6_nh,ipv6_plen,ipv6_tc,ipv6_version,ipvsix_id,pad_id,tcp_dport,tcp_fields_options.MSS,tcp_fields_options.NOP,tcp_fields_options.SAckOK,tcp_fields_options.Timestamp,tcp_fields_options.WScale,tcp_id,tcp_seq,tcp_sport,udp_dport,udp_id,udp_len,udp_sport
    export ANTINEX_IGNORE_FEATURES=
    export ANTINEX_SORT_VALUES=
    export ANTINEX_ML_TYPE=classification
    export ANTINEX_PREDICT_FEATURE=label_value
    export ANTINEX_SEED=42
    export ANTINEX_TEST_SIZE=0.2
    export ANTINEX_BATCH_SIZE=32
    export ANTINEX_EPOCHS=15
    export ANTINEX_NUM_SPLITS=2
    export ANTINEX_LOSS=binary_crossentropy
    export ANTINEX_OPTIMIZER=adam
    export ANTINEX_METRICS=accuracy
    export ANTINEX_HISTORIES=val_loss,val_acc,loss,acc
    export ANTINEX_VERSION=1
    export ANTINEX_CONVERT_DATA=1
    export ANTINEX_CONVERT_DATA_TYPE=float
    export ANTINEX_MISSING_VALUE=-1.0
    export ANTINEX_INCLUDE_FAILED_CONVERSIONS=false
    export ANTINEX_CLIENT_VERBOSE=1
    export ANTINEX_CLIENT_DEBUG=0

Load the Deep Neural Network into the AntiNex Core

Note: If you are running without the docker containers, please make sure to clone the client and datasets to disk:

::

    mkdir -p -m 777 /opt/antinex
    git clone https://github.com/jay-johnson/antinex-client.git /opt/antinex/client
    git clone https://github.com/jay-johnson/antinex-datasets.git /opt/antinex/antinex-datasets

Load the Django Model into the Core

Please note this can take a couple minutes...

::

    ai_train_dnn.py -u root -p 123321 -f deep-neural-networks/full-django.json

    ...

    30196    -1.0 -1.000000  -1.000000
    30197    -1.0 -1.000000  -1.000000
    30198    -1.0 -1.000000  -1.000000
    30199    -1.0 -1.000000  -1.000000

    [30200 rows x 72 columns]

Capture Network Traffic

These tools are installed with the pip and require root privileges to hook into the local network devices and capture traffic correctly.

Scapy_ currently provides the traffic capture tooling, but the code already has a semi-functional, scalable, multi-processing engine to replace it. That will be ideal for dropping onto a heavily utilized load balancer tier and running as an agent managed as a systemd service.

.. _Scapy: https://github.com/phaethon/kamene
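
For orientation, the capture tools follow roughly this pattern. The sketch below is simplified and is not one of the shipped scripts; the queue name, the payload layout, and the assumption that ``NETWORK_FILTER`` holds a BPF filter string are for illustration only:

::

    # Simplified sketch of a capture tool: sniff traffic matching a BPF filter
    # and publish a JSON-ified packet to redis. Requires root privileges.
    # The project uses the kamene fork of scapy; the sniff API shown below is
    # the scapy-style one, and the queue name is an assumption.
    import json
    import os
    import redis
    from scapy.all import sniff

    r = redis.Redis(host="localhost", port=6379)
    network_filter = os.getenv("NETWORK_FILTER", "tcp port 80 or tcp port 443")

    def publish_packet(packet):
        msg = {"summary": packet.summary(), "hex": bytes(packet).hex()}
        r.lpush("netpipeline:packets", json.dumps(msg))

    # store=0 prevents scapy from keeping every captured packet in memory
    sniff(filter=network_filter, prn=publish_packet, store=0)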

#. Login as root

::

    sudo su

#. Activate the Virtual Environment

::

    source /tmp/netpipevenv/bin/activate

#. Capture TCP Data

By default, TCP capture only captures traffic on ports 80, 443, 8010, and 8443. This can be modified with the ``NETWORK_FILTER`` environment variable. Please avoid capturing on the redis port (default 6379) and the rabbitmq port (default 5672) to prevent sniffing duplicates of already-captured data being forwarded to the message queues, which are ideally running on another virtual machine.

This guide assumes you are running all these tools from the base directory of the repository.

::

    ./network_pipeline/scripts/capture_tcp.py

Capture SSH Traffic

::

    ./network_pipeline/scripts/capture_ssh.py

Capture Telnet Traffic

::

    ./network_pipeline/scripts/capture_telnet.py

#. Capture UDP Data

With another terminal, you can capture UDP traffic at the same time

::

    sudo su

Start UDP capture tool

::

    source /tmp/netpipevenv/bin/activate && ./network_pipeline/scripts/capture_udp.py

#. Capture ARP Data

With another terminal, you can capture ARP traffic at the same time

::

    sudo su

Start ARP capture tool

::

    source /tmp/netpipevenv/bin/activate && ./network_pipeline/scripts/capture_arp.py

#. Capture ICMP Data

With another terminal, you can capture ICMP traffic at the same time

::

    sudo su

Start ICMP capture tool

::

    source /tmp/netpipevenv/bin/activate && ./network_pipeline/scripts/capture_icmp.py

Simulating Network Traffic

ZAP Testing with Web Applications

.. image:: https://www.owasp.org/images/1/11/Zap128x128.png
   :align: center

The repository includes ZAPv2 simulations targeting the following application servers:

  • `Django 2.0.1`_
  • `Flask RESTplus with Swagger`_
  • `React + Redux`_
  • Vue_
  • `Spring Pet Clinic`_

.. _Django 2.0.1: https://github.com/jay-johnson/network-pipeline/tree/master/simulations/django
.. _Flask RESTplus with Swagger: https://github.com/jay-johnson/network-pipeline/tree/master/simulations/flask
.. _React + Redux: https://github.com/jay-johnson/network-pipeline/tree/master/simulations/react-redux
.. _Spring Pet Clinic: https://github.com/jay-johnson/network-pipeline/tree/master/simulations/spring
.. _Vue: https://github.com/jay-johnson/network-pipeline/tree/master/simulations/vue

I will be updating this guide with more ZAP simulation tests in the future.

Please refer to the `Simulations README`_ for more details on running these to capture network traffic during an attack.

.. _Simulations README: https://github.com/jay-johnson/network-pipeline/tree/master/simulations#network-traffic-simulations

Quick Simulations

If you want to just get started, here are some commands and tools to start simulating network traffic for seeding your csv datasets.

#. Send a TCP message

::

    ./network_pipeline/scripts/tcp_send_msg.py

#. Send a UDP message

(Optional) Start a UDP server for echoing a response on port 17000

::

    sudo ./network_pipeline/scripts/listen_udp_port.py
    2018-01-27T17:39:47.725377 - Starting UDP Server address=127.0.0.1:17000 backlog=5 size=1024 sleep=0.5 shutdown=/tmp/udp-shutdown-listen-server-127.0.0.1-17000

Send the UDP message

::

    ./network_pipeline/scripts/udp_send_msg.py
    sending UDP: address=('0.0.0.0', 17000) msg=testing UDP msg time=2018-01-27 17:40:04 - cc9cdc1a-a900-48c5-acc9-b8ff5919087b

(Optional) Verify the UDP server received the message

::

    2018-01-27T17:40:04.915469 received UDP data=testing UDP msg time=2018-01-27 17:40:04 - cc9cdc1a-a900-48c5-acc9-b8ff5919087b 

#. Simulate traffic with common shell tools

::

    nslookup 127.0.0.1; nslookup 0.0.0.0; nslookup localhost

::

    dig www.google.com; dig www.cnn.com; dig amazon.com

::

    wget https://www.google.com; wget http://www.cnn.com; wget https://amazon.com

::

    ping google.com; ping amazon.com

#. Run all of them at once

::

    nslookup 127.0.0.1; nslookup 0.0.0.0; nslookup localhost; dig www.google.com; dig www.cnn.com; dig amazon.com; wget https://www.google.com; wget http://www.cnn.com; wget https://amazon.com; ping google.com; ping amazon.com

Capturing an API Simulation

Simulations that can automate and fuzz authenticated REST API service layers, like ZAP_, are available in the `AntiNex datasets repository`_ for training Deep Neural Networks. The included `Flask ZAP Simulation`_ logs in using OAuth 2.0 with ZAP for REST API validation, but there is a known issue with the swagger/openapi integration within ZAP that limits the functionality (for now):

https://github.com/zaproxy/zaproxy/issues/4072

.. _ZAP: https://github.com/zaproxy/zaproxy
.. _AntiNex datasets repository: https://github.com/jay-johnson/antinex-datasets
.. _Flask ZAP Simulation: https://github.com/jay-johnson/network-pipeline/blob/master/simulations/zap/tests/flask-zap.py#L26

#. Start a local server listening on TCP port 80

::

    sudo ./network_pipeline/scripts/listen_tcp_port.py 
    2018-01-27T23:59:22.344687 - Starting Server address=127.0.0.1:80 backlog=5 size=1024 sleep=0.5 shutdown=/tmp/shutdown-listen-server-127.0.0.1-80

#. Run a POST curl

::

    curl -i -vvvv -POST http://localhost:80/TESTURLENDPOINT -d '{"user_id", "1234", "api_key": "abcd", "api_secret": "xyz"}'
    *   Trying 127.0.0.1...
    * TCP_NODELAY set
    * Connected to localhost (127.0.0.1) port 80 (#0)
    > POST /TESTURLENDPOINT HTTP/1.1
    > Host: localhost
    > User-Agent: curl/7.55.1
    > Accept: */*
    > Content-Length: 59
    > Content-Type: application/x-www-form-urlencoded
    > 
    * upload completely sent off: 59 out of 59 bytes
    POST /TESTURLENDPOINT HTTP/1.1
    Host: localhost
    User-Agent: curl/7.55.1
    Accept: */*
    Content-Length: 59
    Content-Type: application/x-www-form-urlencoded

    * Connection #0 to host localhost left intact
    {"user_id", "1234", "api_key": "abcd", "api_secret": "xyz"}    

#. Verify local TCP server received the POST

::

    2018-01-28T00:00:54.445294 received msg=7 data=POST /TESTURLENDPOINT HTTP/1.1
    Host: localhost
    User-Agent: curl/7.55.1
    Accept: */*
    Content-Length: 59
    Content-Type: application/x-www-form-urlencoded

    {"user_id", "1234", "api_key": "abcd", "api_secret": "xyz"} replying

Larger Traffic Testing

#. Host a local server listening on TCP port 80 using nc

::

    sudo nc -l 80

#. Send a large TCP msg to the nc server

::

    ./network_pipeline/scripts/tcp_send_large_msg.py

Inspecting the CSV Datasets

By default, the dataset csv files are saved to /tmp/netdata-*.csv. You can set a custom path by exporting the environment variables DS_NAME, DS_DIR, or OUTPUT_CSV as needed.

::

    ls /tmp/netdata-*.csv
    /tmp/netdata-2018-01-27-13-13-58.csv  /tmp/netdata-2018-01-27-13-18-25.csv  /tmp/netdata-2018-01-27-16-44-08.csv
    /tmp/netdata-2018-01-27-13-16-38.csv  /tmp/netdata-2018-01-27-13-19-46.csv
    /tmp/netdata-2018-01-27-13-18-03.csv  /tmp/netdata-2018-01-27-13-26-34.csv
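
To take a quick look at one of the recordings from python (illustrative only):

::

    # load the most recent capture csv and inspect its shape and columns
    import glob
    import pandas as pd

    latest = sorted(glob.glob("/tmp/netdata-*.csv"))[-1]
    df = pd.read_csv(latest)
    print(latest, df.shape)
    print(df.columns.tolist())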

Prepare Dataset

This is a guide for building training datasets from the recorded csvs in the `network pipeline datasets`_ repository. Once a dataset is prepared locally, you can use the modelers_ to build and tune machine learning and AI models.

.. _network pipeline datasets: https://github.com/jay-johnson/network-pipeline-datasets
.. _modelers: https://github.com/jay-johnson/network-pipeline/network_pipeline/scripts/modelers

Install

This will make sure your virtual environment is using the latest pandas and will install the latest ML/AI pips. Please run it from the repository's base directory.

::

    source /tmp/netpipevenv/bin/activate
    pip install --upgrade -r ./network_pipeline/scripts/builders/requirements.txt

Overview

I have not uploaded a local recording from my development stacks, so for now this will prepare a training dataset by randomly applying non-attack (0) and attack (1) labels to flag records as attack or non-attack.
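
Conceptually, the random labeling is equivalent to the sketch below; the actual prepare_dataset.py builder handles this internally, so this is only to show the idea (the csv path is one of the dataset repository recordings):

::

    # Illustrative only: assign random attack (1) / non-attack (0) labels
    import numpy as np
    import pandas as pd

    df = pd.read_csv("/opt/antinex/datasets/django/netdata-2018-01-28-23-12-13.csv")
    df["label_value"] = np.random.randint(0, 2, size=len(df))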

Setup

Please export the path to the datasets repository on your host:

::

    export DS_DIR=<path_to_datasets_base_directory>

Or clone the repository to the default value for the environment variable (DS_DIR=/opt/antinex/datasets) with:

::

    mkdir -p -m 777 /opt/antinex
    git clone https://github.com/jay-johnson/network-pipeline-datasets.git /opt/antinex/datasets

Build Dataset

This will take a few moments to prepare the csv files.

::

    prepare_dataset.py
    2018-01-31 23:38:04,298 - builder - INFO - start - builder
    2018-01-31 23:38:04,298 - builder - INFO - finding pipeline csvs in dir=/opt/antinex/datasets/*/*.csv
    2018-01-31 23:38:04,299 - builder - INFO - adding file=/opt/antinex/datasets/react-redux/netdata-2018-01-29-13-36-35.csv
    2018-01-31 23:38:04,299 - builder - INFO - adding file=/opt/antinex/datasets/spring/netdata-2018-01-29-15-00-12.csv
    2018-01-31 23:38:04,299 - builder - INFO - adding file=/opt/antinex/datasets/vue/netdata-2018-01-29-14-12-44.csv
    2018-01-31 23:38:04,299 - builder - INFO - adding file=/opt/antinex/datasets/django/netdata-2018-01-28-23-12-13.csv
    2018-01-31 23:38:04,299 - builder - INFO - adding file=/opt/antinex/datasets/django/netdata-2018-01-28-23-06-05.csv
    2018-01-31 23:38:04,299 - builder - INFO - adding file=/opt/antinex/datasets/flask-restplus/netdata-2018-01-29-11-30-02.csv

Verify Dataset and Tracking Files

By default, the dataset csv files are written to /tmp (controlled by the OUTPUT_DIR environment variable):

::

    ls -lrth /tmp/*.csv
    -rw-rw-r-- 1 jay jay  26M Jan 31 23:38 /tmp/fulldata_attack_scans.csv
    -rw-rw-r-- 1 jay jay 143K Jan 31 23:38 /tmp/cleaned_attack_scans.csv

Additionally, there are data governance, metadata and tracking files created as well:

::

    ls -lrth /tmp/*.json
    -rw-rw-r-- 1 jay jay 2.7K Jan 31 23:38 /tmp/fulldata_metadata.json
    -rw-rw-r-- 1 jay jay 1.8K Jan 31 23:38 /tmp/cleaned_metadata.json
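
You can inspect the metadata and tracking files from python as well (a quick, illustrative check):

::

    # pretty-print one of the generated metadata/tracking files
    import json

    with open("/tmp/cleaned_metadata.json") as metadata_file:
        metadata = json.load(metadata_file)

    print(json.dumps(metadata, indent=2))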

Train Models

I am using Keras_ to train a Deep Neural Network to predict attack (1) and non-attack (0) records using a prepared dataset. Please check out the keras_dnn.py_ module if you are interested in learning more, and please let me know if there are better ways to set up the neural network layers or hyperparameters.

.. _Keras: https://github.com/keras-team/keras
.. _keras_dnn.py: https://github.com/jay-johnson/network-pipeline/blob/master/network_pipeline/scripts/modelers/keras_dnn.py

#. Source the virtual environment

::

    source /tmp/netpipevenv/bin/activate

#. (Optional) Train with a different dataset

By default the environment variable ``CSV_FILE=/tmp/cleaned_attack_scans.csv`` can be changed to train models with another prepared dataset.

To do so run:

::

    export CSV_FILE=<path_to_csv_dataset_file>

Train a Keras Deep Neural Network

Included in the pip is a keras_dnn.py_ script. Below is a sample log from a training run that scored an 83.33% accuracy predicting attack vs non-attack records.

Please note, this can take a few minutes if you are not using a GPU. Also the accuracy results will be different depending on how you set up the model.

::

    keras_dnn.py
    Using TensorFlow backend.
    2018-02-01 00:01:30,653 - keras-dnn - INFO - start - keras-dnn
    2018-02-01 00:01:30,653 - keras-dnn - INFO - Loading csv=/tmp/cleaned_attack_scans.csv
    2018-02-01 00:01:30,662 - keras-dnn - INFO - Predicting=label_value with features=['eth_type', 'idx', 'ip_ihl', 'ip_len', 'ip_tos', 'ip_version', 'label_value', 'tcp_dport', 'tcp_fields_options.MSS', 'tcp_fields_options.Timestamp', 'tcp_fields_options.WScale', 'tcp_seq', 'tcp_sport'] ignore_features=['label_name', 'ip_src', 'ip_dst', 'eth_src', 'eth_dst', 'src_file', 'raw_id', 'raw_load', 'raw_hex_load', 'raw_hex_field_load', 'pad_load', 'eth_dst', 'eth_src', 'ip_dst', 'ip_src'] records=2217
    2018-02-01 00:01:30,664 - keras-dnn - INFO - splitting rows=2217 into X_train=1773 X_test=444 Y_train=1773 Y_test=444
    2018-02-01 00:01:30,664 - keras-dnn - INFO - creating sequential model
    2018-02-01 00:01:30,705 - keras-dnn - INFO - compiling model
    2018-02-01 00:01:30,740 - keras-dnn - INFO - fitting model - please wait
    Train on 1773 samples, validate on 444 samples
    Epoch 1/50
    2018-02-01 00:01:30.947551: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2
    1773/1773 [==============================] - 1s 704us/step - loss: 2.5727 - acc: 0.8404 - val_loss: 2.6863 - val_acc: 0.8333
    Epoch 2/50
    1773/1773 [==============================] - 1s 626us/step - loss: 2.5727 - acc: 0.8404 - val_loss: 2.6863 - val_acc: 0.8333

    ...

    Epoch 50/50
    1773/1773 [==============================] - 1s 629us/step - loss: 2.5727 - acc: 0.8404 - val_loss: 2.6863 - val_acc: 0.8333
    444/444 [==============================] - 0s 17us/step
    2018-02-01 00:02:29,118 - keras-dnn - INFO - Accuracy: 83.33333333333334
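
For reference, here is a minimal sketch of the kind of binary-classification network described above; the layer sizes, activations, and hyperparameters are illustrative assumptions and not the exact values used by keras_dnn.py_:

::

    # Minimal sketch of a Keras binary classifier for a prepared dataset.
    # Layer sizes and hyperparameters here are illustrative assumptions.
    import os
    import pandas as pd
    from keras.models import Sequential
    from keras.layers import Dense
    from sklearn.model_selection import train_test_split

    csv_file = os.getenv("CSV_FILE", "/tmp/cleaned_attack_scans.csv")
    df = pd.read_csv(csv_file)

    # keep only numeric columns and split the features from the label
    numeric = df.select_dtypes(include=["number"]).fillna(-1.0)
    features = [c for c in numeric.columns if c != "label_value"]
    x_train, x_test, y_train, y_test = train_test_split(
        numeric[features].values, numeric["label_value"].values,
        test_size=0.2, random_state=42)

    model = Sequential()
    model.add(Dense(64, input_dim=len(features), activation="relu"))
    model.add(Dense(32, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])

    model.fit(x_train, y_train, validation_data=(x_test, y_test),
              epochs=50, batch_size=32)
    loss, accuracy = model.evaluate(x_test, y_test)
    print("Accuracy: {}".format(accuracy * 100.0))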

Optional Tweaks

#. Colorized Logging for Debugging

Export the path to the colorized logger config. This example assumes you are in the base directory of the repository.

::

    export LOG_CFG=$(pwd)/network_pipeline/log/colors-logging.json

Linting

::

    flake8 .

    pycodestyle --exclude=./simulations,.tox,.eggs

License

Apache 2.0 - Please refer to the License_ for more details

.. _License: https://github.com/jay-johnson/network-pipeline/blob/master/LICENSE
