clockwork RNNs

Clockwork RNNs use multiple RNN blocks at different timescales to deal with the vanishing gradient problem. They were shown to work well for speech recognition and sequence generation, but I hypothesize that they are ill-suited for many other sequence modeling tasks. In this repository, I will not only implement traditional clockwork RNNs, but also experiment with related architectures.

Traditional CWRNNs

In a traditional CWRNN, each sub-block in the RNN has a different period p. At timesteps t where t%p = 0, we update the sub-block's state s using the traditional RNN update: s = tanh(W*[inS; in] + b). Here, inS is a concatenation of the old state s with the states of all the "slower" blocks. The in vector is the input at time t.
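As a concrete illustration, here is a minimal sketch of that update rule in Go. The block struct, its field names, and the toy weights are my own for illustration and are not part of this repository's code; the only behavior it tries to capture is "update s = tanh(W*[inS; in] + b) when t%p = 0, otherwise keep the old state."

```go
package main

import (
	"fmt"
	"math"
)

// block is a hypothetical CWRNN sub-block with period p.
// Its weight matrix W has one row per state unit and one column per
// element of [inS; in], where inS concatenates this block's state with
// the states of all slower blocks.
type block struct {
	p     int         // update period
	state []float64   // s
	w     [][]float64 // W
	b     []float64   // bias
}

// step applies s = tanh(W*[inS; in] + b), but only at timesteps t where
// t%p == 0; at all other timesteps the state is left unchanged.
func (bl *block) step(t int, slowerStates, in []float64) {
	if t%bl.p != 0 {
		return
	}
	// Build [inS; in]: this block's state, the slower blocks' states, then the input.
	joined := append(append(append([]float64{}, bl.state...), slowerStates...), in...)
	next := make([]float64, len(bl.state))
	for i := range next {
		sum := bl.b[i]
		for j, x := range joined {
			sum += bl.w[i][j] * x
		}
		next[i] = math.Tanh(sum)
	}
	bl.state = next
}

func main() {
	// A single-unit block with period 2 and a one-dimensional input,
	// so W has 1 (own state) + 1 (input) = 2 columns.
	bl := &block{p: 2, state: []float64{0}, w: [][]float64{{0.5, 1.0}}, b: []float64{0}}
	for t := 0; t < 4; t++ {
		bl.step(t, nil, []float64{1})
		fmt.Println(t, bl.state) // state only changes at t = 0 and t = 2
	}
}
```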

The instant I saw the CWRNN update formula, I perceived a flaw in its design: a block with a high period does not receive information about every input. For instance, if I trained a CWRNN on a text corpus and gave it the string hello, a sub-block with period 2 would only see hlo. Since slower blocks only receive information from even slower blocks, using exponential periods (2, 4, 8, ...) makes it impossible for a block with period p to receive information about inputs for times t where t%p != 0. For tasks like text modeling, I hypothesize that this limitation would prevent slower blocks from capturing any long-term contextual information.
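To make the point concrete, this toy snippet (illustrative only, not code from this repository) prints which characters of hello a block with a given period directly observes:

```go
package main

import "fmt"

func main() {
	input := "hello"
	// A block with period p only updates, and thus only directly sees the
	// input, at timesteps t where t%p == 0.
	for _, p := range []int{1, 2, 4} {
		seen := ""
		for t := 0; t < len(input); t++ {
			if t%p == 0 {
				seen += string(input[t])
			}
		}
		fmt.Printf("period %d sees %q\n", p, seen)
	}
}
// Output:
// period 1 sees "hello"
// period 2 sees "hlo"
// period 4 sees "ho"
```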

The tasks in the original CWRNN paper are almost "special" in that they do not suffer from the CWRNN's input-visibility limitation. The first task is sequence prediction with no inputs; obviously, since the RNN has no inputs, the input-visibility limitation cannot come into play. The second task, speech recognition, lends itself naturally to CWRNNs despite their limitations. The reason is that speech (or any sound) is a sum of periodic functions, so if you only sample a speech signal every p timesteps, you still retain some information about it. Essentially, the slower blocks in such an environment are seeing downsampled audio data.

Fully-Connected CWRNNs

To mitigate the input-visibility limitation of CWRNNs, I propose a minor change. In my new architecture, the fully-connected CWRNN, sub-blocks receive information from all sub-blocks, whether faster or slower. This way, a sub-block can learn about skipped timesteps by looking at the state of a faster sub-block which did not miss those inputs. For text modeling, it is easy to see how this architecture might make sense: a fast sub-block could process words, a slower sub-block could process sentences based on those words, and so on.
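A sketch of how the fully-connected update could look, again in hypothetical Go (my own fcBlock type and toy weights, not this repository's API): the only change from the traditional update is that every active block reads the concatenation of all blocks' states, fast and slow.

```go
package main

import (
	"fmt"
	"math"
)

// fcBlock is a hypothetical sub-block of a fully-connected CWRNN. Its weight
// matrix has one row per state unit and one column for every state unit
// across all blocks plus every input dimension.
type fcBlock struct {
	p     int         // update period
	state []float64   // s
	w     [][]float64 // W
	b     []float64   // bias
}

// stepAll advances all sub-blocks by one timestep. Unlike the traditional
// CWRNN update, every active block reads the concatenation of *all* blocks'
// states, so a slow block can recover information about the inputs it
// skipped from the state of a faster block that saw them.
func stepAll(blocks []*fcBlock, t int, in []float64) {
	// Snapshot every block's state first so all blocks read pre-update values.
	var allStates []float64
	for _, bl := range blocks {
		allStates = append(allStates, bl.state...)
	}
	joined := append(allStates, in...)
	for _, bl := range blocks {
		if t%bl.p != 0 {
			continue // this block does not update at timestep t
		}
		next := make([]float64, len(bl.state))
		for i := range next {
			sum := bl.b[i]
			for j, x := range joined {
				sum += bl.w[i][j] * x
			}
			next[i] = math.Tanh(sum)
		}
		bl.state = next
	}
}

func main() {
	// Two single-unit blocks (periods 1 and 2) and a one-dimensional input,
	// so each weight matrix has 2 (total state) + 1 (input) = 3 columns.
	fast := &fcBlock{p: 1, state: []float64{0}, w: [][]float64{{0.3, 0.2, 0.5}}, b: []float64{0}}
	slow := &fcBlock{p: 2, state: []float64{0}, w: [][]float64{{0.1, 0.4, 0.6}}, b: []float64{0}}
	blocks := []*fcBlock{fast, slow}
	for t := 0; t < 4; t++ {
		stepAll(blocks, t, []float64{1})
		fmt.Println(t, fast.state, slow.state) // slow only updates at even t, but reads fast's state
	}
}
```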
