
RNN and general weights, gradients, & activations visualization in Keras & TensorFlow
RNN weights, gradients, & activations visualization in Keras & TensorFlow (LSTM, GRU, SimpleRNN, CuDNN, & all others)
Introspection is a powerful tool for debugging, regularizing, and understanding neural networks; this repo's methods enable inspecting the weights, gradients, and activations of RNNs (and other layers), and answering questions about how a network learns. For further info on potential uses, see this SO.
Install via `pip install see-rnn`, or clone the repository.
Will possibly implement:
- `return_sequences=False` support
- `_id` and `layer` as lists? Need duplicates resolution scheme

```python
# for all examples
grads = get_gradients(model, 1, x, y)  # return_sequences=True,  layer index 1
grads = get_gradients(model, 2, x, y)  # return_sequences=False, layer index 2
outs  = get_outputs(model, 1, x)       # return_sequences=True,  layer index 1

# all examples use timesteps=100
# NOTE: `title_mode` kwarg below was omitted for simplicity; for Gradient visuals, would set to 'grads'
```
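Per that note, a gradient visual could pass the kwarg explicitly; a one-line sketch reusing `grads` from above ('grads' is the value the note itself names):

```python
features_1D(grads[:1], title_mode='grads')  # label the figure as gradient data
```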
EX 1: bi-LSTM, 32 units - activations, activation='relu', return_sequences=True

```python
features_1D(outs[:1], share_xy=False)
features_1D(outs[:1], share_xy=True, y_zero=True)
```

Note: share_xy=False better pronounces features' shape, whereas share_xy=True allows for an even comparison - but may greatly 'shrink' waveforms to appear flatlined (not shown here).

EX 2: one sample, uni-LSTM, 6 units - gradients, return_sequences=True, trained for 20 iterations

```python
features_1D(grads[:1], n_rows=2)
```
EX 3: all (16) samples, uni-LSTM, 6 units -- return_sequences=True, trained for 20 iterations

```python
features_1D(grads, n_rows=2)
features_2D(grads, n_rows=4, norm=(-.01, .01))
```
EX 4: all (16) samples, uni-LSTM, 6 units -- return_sequences=True, trained for 200 iterations

```python
features_1D(grads, n_rows=2)
features_2D(grads, n_rows=4, norm=(-.01, .01))
```
EX 5: 2D vs. 1D, uni-LSTM: 256 units, return_sequences=True, trained for 200 iterations

```python
features_1D(grads[0, :, :])
features_2D(grads[:, :, 0], norm=(-.0001, .0001))
```
EX 6: bi-GRU, 256 units (512 total) -- return_sequences=True, trained for 400 iterations

```python
features_2D(grads[0], norm=(-.0001, .0001), reflect_half=True)
```
Note: a smaller norm for more units is expected, as approx. the same loss-derived gradient is being distributed across more parameters (hence the squared numeric average is less).
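A quick numeric illustration of that reasoning; this is a standalone sketch, and the parameter count is a rough LSTM-sizing assumption, not anything see-rnn computes:

```python
# Same total gradient mass spread across more parameters
# -> smaller per-parameter average magnitude.
total_signal = 1.0                      # stand-in for the loss-derived gradient
for units in (6, 64, 256):
    n_params = 4 * units * (units + 1)  # rough LSTM parameter count (assumption)
    print(f"units={units:3d}  avg per-param ~ {total_signal / n_params:.1e}")
```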
EX 7: 0D, all (16) samples, uni-LSTM, 6 units -- return_sequences=False, trained for 200 iterations
```python
features_0D(grads)
```
Note: return_sequences=False utilizes only the last timestep's gradient (which is still derived from all timesteps, unless using truncated BPTT), requiring a new approach.
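A minimal shape sketch of that distinction, assuming standard Keras output shapes for the two layers (variable names here are illustrative):

```python
grads_seq  = get_gradients(model, 1, x, y)  # return_sequences=True  -> (16, 100, 6)
grads_last = get_gradients(model, 2, x, y)  # return_sequences=False -> (16, 6)

features_1D(grads_seq, n_rows=2)  # 3D data: per-timestep visuals
features_0D(grads_last)           # 2D data: one point per sample & unit
```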
EX 8: LSTM vs. GRU vs. SimpleRNN, unidir, 256 units -- return_sequences=True, trained for 250 iterations
```python
features_2D(grads, n_rows=8, norm=(-.0001, .0001), xy_ticks=[0, 0], title_mode=False)
```
EX 9: uni-LSTM, 256 units, weights -- batch_shape = (16, 100, 20) (input)

```python
rnn_histogram(model, 'lstm', equate_axes=False, bias=False)
rnn_histogram(model, 'lstm', equate_axes=True, bias=False)
rnn_heatmap(model, 'lstm')
```
Note: equate_axes=True yields an even comparison across kernels and gates, improving quality of comparison, but potentially degrading visual appeal.

EX 10: bi-CuDNNLSTM, 256 units, weights -- batch_shape = (16, 100, 16) (input)
```python
rnn_histogram(model, 'bidir', equate_axes=2)
rnn_heatmap(model, 'bidir', norm=(-.8, .8))
```
Note: CuDNNLSTM (and CuDNNGRU) biases are defined and initialized differently - something that can't be inferred from histograms.

EX 11: uni-CuDNNGRU, 64 units, weights gradients -- batch_shape = (16, 100, 16) (input)
```python
rnn_heatmap(model, 'gru', mode='grads', input_data=x, labels=y, cmap=None, absolute_value=True)
```
Note: the plot uses absolute_value=True and a greyscale colormap.
- New is the most active kernel gate (input-to-hidden), suggesting more error correction on permitting information flow
- Reset is the least active recurrent gate (hidden-to-hidden), suggesting least error correction on memory-keeping

EX 12: NaN detection: LSTM, 512 units, weights -- batch_shape = (16, 100, 16) (input)
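No invocation is shown for this example; a plausible one, assuming it reuses the EX 9 weight-visualization API (an assumption, not confirmed by this excerpt):

```python
# Hypothetical invocation, mirroring EX 9; how NaN values are
# flagged in the resulting plots is not documented here.
rnn_histogram(model, 'lstm')
rnn_heatmap(model, 'lstm')
```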
EX 13: Sparse Conv1D autoencoder weights -- w = layer.get_weights()[0]; w.shape == (16, 64, 128)

```python
title = "((Layer Channels vs. Kernels) vs. Weights) vs. Input Channels -- norm = (-0.1, 0.1)"
features_2D(w, n_rows=16, norm=(-.1, .1), tight=True, borderwidth=1, title_mode=title)
```

Note: weights are from Conv1D sparse autoencoder layers; the network was trained with Dropout(0.5, noise_shape=(batch_size, 1, channels)) (Spatial Dropout), encouraging sparse features which may benefit classification.
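A minimal sketch of that dropout setup, assuming a toy Conv1D encoder (the layer sizes are illustrative, not the trained network's):

```python
from keras.layers import Input, Conv1D, Dropout
from keras.models import Model

batch_size, steps, channels = 16, 100, 64  # illustrative dimensions (assumption)
ipt = Input(batch_shape=(batch_size, steps, channels))
x = Conv1D(channels, kernel_size=16, padding='same', activation='relu')(ipt)
# noise_shape=(batch_size, 1, channels): the dropout mask is shared across
# timesteps, so entire channels are dropped per sample (Spatial Dropout)
x = Dropout(0.5, noise_shape=(batch_size, 1, channels))(x)
model = Model(ipt, x)
```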
QUICKSTART: run sandbox.py, which includes all major examples and allows easy exploration of various plot configs.
Note: if using tensorflow.keras imports, set `import os; os.environ["TF_KERAS"] = '1'`. Minimal example below.
visuals_gen.py functions can also be used to visualize Conv1D activations, gradients, or any other meaningfully-compatible data formats. Likewise, inspect_gen.py also works for non-RNN layers.
```python
import numpy as np
from keras.layers import Input, LSTM
from keras.models import Model
from keras.optimizers import Adam
from see_rnn import get_gradients, features_0D, features_1D, features_2D


def make_model(rnn_layer, batch_shape, units):
    ipt = Input(batch_shape=batch_shape)
    x   = rnn_layer(units, activation='tanh', return_sequences=True)(ipt)
    out = rnn_layer(units, activation='tanh', return_sequences=False)(x)
    model = Model(ipt, out)
    model.compile(Adam(4e-3), 'mse')
    return model

def make_data(batch_shape):
    # uses the module-level `units` defined below
    return np.random.randn(*batch_shape), \
           np.random.uniform(-1, 1, (batch_shape[0], units))

def train_model(model, iterations, batch_shape):
    x, y = make_data(batch_shape)
    for i in range(iterations):
        model.train_on_batch(x, y)
        print(end='.')  # progbar
        if i % 40 == 0:
            x, y = make_data(batch_shape)  # refresh data every 40 iterations

units = 6
batch_shape = (16, 100, 2 * units)

model = make_model(LSTM, batch_shape, units)
train_model(model, 300, batch_shape)

x, y = make_data(batch_shape)
grads_all  = get_gradients(model, 1, x, y)  # return_sequences=True,  layer index 1
grads_last = get_gradients(model, 2, x, y)  # return_sequences=False, layer index 2

features_1D(grads_all, n_rows=2, xy_ticks=[1, 1])
features_2D(grads_all, n_rows=8, xy_ticks=[1, 1], norm=(-.01, .01))
features_0D(grads_last)
```
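The weight-visualization functions from EX 9 should also apply to this model; a follow-on sketch (the import path, and how the 'lstm' selector resolves when the model has two LSTM layers, are assumptions):

```python
from see_rnn import rnn_histogram, rnn_heatmap  # assumed exports, mirroring usage above

# Weight visuals, mirroring EX 9; with two LSTM layers present, duplicate-name
# resolution is assumed, not documented in this excerpt
rnn_histogram(model, 'lstm', equate_axes=True, bias=False)
rnn_heatmap(model, 'lstm')
```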