github.com/unixpickle/clockwork
Clockwork RNNs use multiple RNN blocks at different timescales to deal with the vanishing gradient problem. They were shown to work well for speech recognition and sequence generation, but I hypothesize that they are ill-suited for many other sequence modeling tasks. In this repository, I will not only implement traditional clockwork RNNs, but also experiment with related architectures.
In a traditional CWRNN, each sub-block in the RNN has a different period `p`. At timesteps `t` where `t%p = 0`, we update the sub-block's state `s` using the traditional RNN update: `s = tanh(W*[inS; in] + b)`. Here, `inS` is a concatenation of the old state `s` with the states of all the "slower" blocks. The `in` vector is the input at time `t`.
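To make the update concrete, here is a minimal NumPy sketch of one CWRNN timestep. The block layout (a list of dicts with `period`, `state`, `W`, and `b`, sorted fastest to slowest) is a hypothetical structure chosen for illustration; it is not the repository's implementation.

```python
import numpy as np

def cwrnn_step(blocks, x, t):
    """Apply one CWRNN timestep.

    `blocks` is sorted from fastest to slowest; each entry is a dict with
    a `period`, a `state` vector, a weight matrix `W`, and a bias `b`
    (hypothetical layout, for illustration only).
    """
    for i, blk in enumerate(blocks):
        if t % blk['period'] != 0:
            continue  # this sub-block is frozen at this timestep
        # inS: the block's own old state plus the states of all slower blocks.
        # Because we iterate fastest-first and only read slower blocks,
        # every read sees a state that has not yet been updated this step.
        in_s = np.concatenate([b['state'] for b in blocks[i:]])
        # Traditional RNN update: s = tanh(W*[inS; in] + b)
        blk['state'] = np.tanh(blk['W'] @ np.concatenate([in_s, x]) + blk['b'])
    return blocks
```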
The instant I saw the CWRNN update formula, I perceived a flaw in its design: a block with a high period does not receive information about every input. For instance, if I trained a CWRNN on a text corpus and gave it the string `hello`, a sub-block with period 2 would only see `hlo`. Since slower blocks only receive information from even slower blocks, using exponential periods (2, 4, 8, ...) makes it impossible for a block with period `p` to receive information about inputs for times `t` where `t%p != 0`. For tasks like text modeling, I hypothesize that this limitation would prevent slower blocks from capturing any long-term contextual information.
The tasks in the original CWRNN paper are almost "special" in that they do not suffer from the CWRNN's input-visibility limitation. The first task is sequence prediction with no inputs. Obviously, since the RNN has no inputs, the input-visibility limitation would not be manifest. The second task, speech recognition, lends itself naturally to CWRNNs despite their limitations. The reason is that speech (or any sound) is a sum of periodic functions. Thus, if you only sample every `p` timesteps from a speech signal, you still get some information about the signal. Essentially, the slower blocks in such an environment are seeing downsampled audio data.
To mitigate the input-visibility limitation of CWRNNs, I propose a minor change. In my new architecture, the fully-connected CWRNN, sub-blocks receive information from all sub-blocks, whether faster or slower. This way, a sub-block can learn about skipped timesteps by looking at the state of a faster sub-block which did not miss those inputs. For text modeling, it is easy to see how this architecture might make sense: a rapid sub-block could process words, a slower sub-block could process sentences based on those words, etc.
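A sketch of the proposed change, reusing the hypothetical block layout from the earlier snippet: the only difference from the traditional update is that `inS` now concatenates the states of every sub-block, fast and slow, so a slow block can read what a fast block saw at the timesteps it skipped.

```python
import numpy as np

def fc_cwrnn_step(blocks, x, t):
    """One fully-connected CWRNN timestep (illustrative sketch).

    Unlike the traditional CWRNN, every updating sub-block reads the
    (pre-update) states of *all* sub-blocks, faster ones included.
    """
    # Snapshot all states first so every update sees the old values.
    all_states = np.concatenate([b['state'] for b in blocks])
    for blk in blocks:
        if t % blk['period'] != 0:
            continue  # skipped timestep: state unchanged
        blk['state'] = np.tanh(blk['W'] @ np.concatenate([all_states, x]) + blk['b'])
    return blocks
```

One consequence of this change is that each block's `W` must now have a column for every state unit in the network, not just for its own units and those of slower blocks.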