[NLPL Task Force (A)] Fwd: [MLS research seminar] Hopfield Networks is All You Need - November 27 at 14.00
Stephan Oepen
oe at ifi.uio.no
Thu Nov 26 15:12:55 UTC 2020
colleagues,
we have an internal EOSC-Nordic meeting scheduled for tomorrow, but
there is a conflicting seminar presentation that i would like to
attend (see below; feel free to zoom in if you are interested :-).
any chance we could postpone to e.g. 15:00 CET next friday (december
4), or sometime before 14:00 CET the following week (december 10)?
with apologies for the short notice! oe
---------- Forwarded message ---------
From: Milena Pavlovic <milenpa at student.matnat.uio.no>
Date: Fri, Nov 13, 2020 at 10:44 AM
Subject: [MLS research seminar] Hopfield Networks is All You Need -
November 27 at 14.00
To: mls-research-seminar at ifi.uio.no <mls-research-seminar at ifi.uio.no>
Cc: ramsauer at ml.jku.at <ramsauer at ml.jku.at>
Dear all,
The next MLS research seminar will be on Friday, November 27 at 14.00
on Zoom (meeting details below). Hubert Ramsauer from the Institute
for Machine Learning at the Johannes Kepler University Linz will give
a talk titled “Hopfield Networks is All You Need”.
Abstract: The transformer and BERT models pushed the performance on
NLP tasks to new levels via their attention mechanism. We show that
this attention mechanism is the update rule of a modern Hopfield
network with continuous states. This new Hopfield network can store
exponentially many patterns (exponential in the dimension), converges
with one update, and has exponentially small retrieval errors. The
number of
stored patterns must be traded off against convergence speed and
retrieval error. The new Hopfield network has three types of energy
minima (fixed points of the update): (1) global fixed point averaging
over all patterns, (2) metastable states averaging over a subset of
patterns, and (3) fixed points which store a single pattern.
Transformers learn an attention mechanism by constructing an embedding
of patterns and queries into an associative space. Transformer and
BERT models operate in their first layers preferably in the global
averaging regime, while they operate in higher layers in metastable
states. The gradient in transformers is maximal in the regime of
metastable states, is uniformly distributed when averaging globally,
and vanishes when a fixed point is near a stored pattern. Based on the
Hopfield network interpretation, we analyzed the learning of
transformer and BERT architectures. Learning starts with attention
heads that
average and then most of them switch to metastable states. However,
the majority of heads in the first layers still averages and can be
replaced by averaging operations like the Gaussian weighting that we
propose. In contrast, heads in the last layers steadily learn and seem
to use metastable states to collect information created in lower
layers. These heads seem to be a promising target for improving
transformers. Neural networks that integrate Hopfield networks, which
are equivalent to attention heads, outperform other methods on immune
repertoire classification, where the Hopfield net stores several
hundreds of thousands of patterns. We provide a new PyTorch layer
called “Hopfield” which allows deep learning architectures to be
equipped with modern Hopfield networks as a new powerful concept
comprising pooling, memory, and attention. The implementation is
available at: https://github.com/ml-jku/hopfield-layers.
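
To make the claimed correspondence concrete, here is a minimal PyTorch
sketch of the continuous Hopfield update rule described in the
abstract, xi_new = X softmax(beta * X^T xi), which has the same
functional form as transformer attention. All names (hopfield_update,
beta, X, xi) are illustrative, not taken from the paper or its code:

    import torch

    def hopfield_update(X, xi, beta=8.0):
        """One update of a continuous modern Hopfield network (sketch).

        X    : (d, N) matrix whose columns are the N stored patterns
        xi   : (d,)   current state / query vector
        beta : inverse temperature; large beta gives sharp retrieval,
               small beta gives averaging over many patterns
        """
        # Retrieval weights: the same functional form as transformer
        # attention weights softmax(Q K^T / sqrt(d_k)), with beta in
        # the role of 1 / sqrt(d_k).
        p = torch.softmax(beta * (X.T @ xi), dim=0)   # (N,) weights
        return X @ p                                  # (d,) new state

    # Illustrative retrieval: start from a noisy copy of a stored pattern.
    d, N = 64, 1000
    X = torch.randn(d, N)                  # N random stored patterns
    xi = X[:, 0] + 0.1 * torch.randn(d)    # noisy query near pattern 0
    xi_new = hopfield_update(X, xi)
    # As the abstract notes, retrieval typically converges in one update:
    print(torch.cosine_similarity(xi_new, X[:, 0], dim=0))

Varying beta moves the fixed points between the three regimes the
abstract lists: small beta yields the global average over all
patterns, intermediate beta yields metastable averages over subsets,
and large beta yields fixed points near individual stored patterns.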
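And a sketch of how the “Hopfield” layer itself might be used. The
import path (hflayers), the input_size argument, and the single-tensor
self-association call follow my reading of the repository's README and
may differ from the current API, so treat them as assumptions rather
than a guaranteed interface:

    import torch
    from hflayers import Hopfield  # package name per the repo README (assumed)

    # Hopfield layer as an associative memory over a set of patterns;
    # assumed input convention: (batch, number of patterns, features).
    hopfield = Hopfield(input_size=64)

    stored = torch.randn(1, 1000, 64)  # 1000 stored patterns, 64 features
    out = hopfield(stored)             # self-association over the patterns
    print(out.shape)

The repository also describes pooling and lookup variants
(HopfieldPooling and HopfieldLayer) covering the pooling and memory
use cases the abstract mentions.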
The full paper is available at https://arxiv.org/abs/2008.02217.
Looking forward to seeing you all at the seminar!
Kind regards,
Milena
Zoom details:
https://uio.zoom.us/j/67683473454?pwd=YUNyZWhRTDZMdjRvWVkxTWRWdHdmQT09
Meeting ID: 676 8347 3454
Passcode: 096069
Documentation on how to use Zoom can be found here:
https://www.uio.no/english/services/it/phone-chat-videoconf/zoom/