[NLPL Task Force (A)] concerns about LUMI
Stephan Oepen
oe at ifi.uio.no
Sat Oct 31 15:04:25 UTC 2020
hi joerg, thanks for this initiative!
i received word from Sigma2 that AMD has promised CUDA support and
that it was important to break free of the NVIDIA lock-in. but, still,
even if they manage to pull off a CUDA compatibility layer that might
allow compilation of something like PyTorch, will compatibility also
cover separate NVIDIA packages, e.g. cuDNN or NCCL? so, a coordinated
effort from our community, and possibly other deep learning folks, to
remind national providers in the various LUMI participant countries
will surely be worthwhile!
i have been meaning to refresh the NLPL steering group, in part to
have some semi-official body to invoke in case we need to speak on
behalf of our community. i think this is the opportunity to do so
(while we are compiling our prioritized list of tools that we would
want to run on LUMI next fall), so will send separate email about that
to all the original NLPL partner sites ...
regarding essential tools, i imagine you consider PyTorch and
TensorFlow as given? andrey has been compiling a survey of
large-scale pre-training frameworks, and i suggest we nominate at
least some of these:
http://wiki.nlpl.eu/index.php/Eosc/pretraining
oe
On Fri, Oct 30, 2020 at 10:13 AM Tiedemann, Jörg
<jorg.tiedemann at helsinki.fi> wrote:
>
> Dear infra people at NLPL,
>
>
> I just had a discussion with other ML researchers about concerns with the new LUMI infrastructure that will come next year. As you may have seen, the consortium decided to go for AMD hardware and many people are concerned about compatibility of their code. CSC promises dedicated support to help with porting everything to the new system but there are, of course, a lot of concerns that people are a bit too optimistic about that procedure.
>
> In order to push the LUMI team to be a bit more careful about the setup and the effort needed for setting everything up we would like to compile a list of essentials that we need in various fields to show what we are concerned about. We would like to push CSC and others in their support team to start earlier and with more resources to port, test and benchmark software we need to make our research possible.
>
> In connection with this, it would be great to have a stack of software and libraries that we consider to be essential for us, if possible with priorities. That list could then be (partially) included in a message going to CSC to make them aware of the situation.
>
> Could you help me to compile such a list, possibly with short comments about the items on the list? That can include general-purpose libraries and specialised software etc. Also, ask your colleagues. Thank you!
>
>
> All the best,
> Jörg
>
> *****************************************************************
> Jörg Tiedemann
> Language Technology https://blogs.helsinki.fi/language-technology/
> University of Helsinki
>