[NLPL Infrastructure] Fwd: [CSC #479570] lustre-related error in custom installation of binutils
Andrey Kutuzov
andreku at ifi.uio.no
Wed May 5 19:13:53 UTC 2021
OK, it seems that in fact all we have to do is add the following line to
the GCCcore easyconfig:
use_gold_linker = False
Then it will use the good old ld instead of ld.gold, and this eliminates
the fallocate problem (maybe at the cost of being marginally slower).
I have tested it in my own EasyBuild environment on Puhti. However, I
can't change our NLPL easyconfig at
/projappl/nlpl/software/20/packages/EasyBuild/4.3.4/easybuild/easyconfigs/g/GCCcore/GCCcore-8.3.0.eb
...because it is write-protected for the group.
I guess we would want to apply a patch to this easyconfig at the setup
stage once we detect that we are on Puhti.
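A minimal sketch of what such a setup-stage step could look like. The
function names fallocate_supported and disable_gold_linker are
hypothetical, and probing the target directory with posix_fallocate is
only an assumed way of detecting the affected filesystem, not something
we have settled on; only the use_gold_linker = False line comes from the
test described above.

```python
import os
import re
import tempfile


def fallocate_supported(directory):
    """Return True if posix_fallocate succeeds on a scratch file in `directory`.

    Assumption: on an affected Lustre mount the call fails with an OSError,
    as in the ENOTSUPP failure reported for ld.gold. Linux-only probe.
    """
    with tempfile.NamedTemporaryFile(dir=directory) as scratch:
        try:
            os.posix_fallocate(scratch.fileno(), 0, 4096)
        except OSError:
            return False
    return True


def disable_gold_linker(easyconfig_text):
    """Return the easyconfig text with `use_gold_linker = False` set.

    An existing use_gold_linker assignment is rewritten in place;
    otherwise the setting is appended as a new last line.
    """
    setting = "use_gold_linker = False"
    pattern = re.compile(r"^use_gold_linker\s*=.*$", re.MULTILINE)
    if pattern.search(easyconfig_text):
        return pattern.sub(setting, easyconfig_text)
    if not easyconfig_text.endswith("\n"):
        easyconfig_text += "\n"
    return easyconfig_text + setting + "\n"
```

The setup script could call fallocate_supported() on the installation
prefix and, if it returns False, write the output of
disable_gold_linker() to a writable copy of GCCcore-8.3.0.eb before
building.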
On 05.05.2021 17:33, Stephan Oepen wrote:
> if that in fact is a valid option to the configure script for binutils;
> could you check?
>
> oe
>
>
> On Wed, 5 May 2021 at 17:08 Andrey Kutuzov <andreku at ifi.uio.no> wrote:
>
> No, I have not received this.
>
> So, we probably should try to add "--no-posix-fallocate" to the binutils
> easyconfig, right?
>
> On 05.05.2021 16:59, Stephan Oepen wrote:
> > a very helpful reply from CSC; see below. did you get this from their
> > RT queue already, andrey?
> >
> > oe
> >
> >
> > ---------- Forwarded message ---------
> > From: Henrik Nortamo via RT <research-support at csc.fi>
> > Date: Wed, May 5, 2021 at 10:02 AM
> > Subject: [CSC #479570] lustre-related error in custom installation of binutils
> > To: <oe at ifi.uio.no>
> >
> >
> > Hi,
> >
> > Binutils is not provided as a module on Puhti, but the versions which
> > gcc 8.3.0 uses can be found under
> > /appl/spack/install-tree/gcc-4.8.5/binutils-2.31.1-xbslr3/lib/.
> >
> > That particular error comes from our version of lustre (2.12.6) not
> > supporting the "fallocate" system call (it has been implemented in
> > lustre 2.14.0, https://jira.whamcloud.com/browse/LU-3606). The
> > returned error code might also not be correct (see
> > https://jira.whamcloud.com/browse/LU-14301), which means that some
> > programs' fallbacks from fallocate do not work properly.
> >
> > So I'm guessing easybuild builds binutils on a non-lustre filesystem
> > (e.g. /tmp, /local_scratch, $TMPDIR or similar), checks that fallocate
> > is supported there, and then fails when you try to compile on lustre.
> > Not fully certain, but you might also be able to disable fallocate
> > support manually when building binutils (could require patching the
> > configure scripts).
> >
> > A temporary workaround is to add "--no-posix-fallocate" to LDFLAGS
> > when compiling.
> >
> >
> > Br.
> > Henrik Nortamo
> > CSC
> >
> >
> > On Tue May 04 23:19:05 2021, oe at ifi.uio.no wrote:
> >> dear colleagues:
> >>
> >> in the context of the NLPL use case in EOSC-Nordic, we are preparing
> >> an updated version of what we call the NLPL virtual laboratory,
> >> essentially a community-maintained collection of core software and
> >> data resources for natural language processing. on Puhti, our virtual
> >> laboratory resides in '/projappl/nlpl/' and is collectively maintained
> >> by EOSC-Nordic project members at helsinki and oslo universities.
> >>
> >> for perfect parallelism of the installed software, we have now fully
> >> automated the process of compiling and installing a collection of
> >> dozens of packages, using EasyBuild. in doing so, we use system-wide
> >> modules where they are available (in the exact same versions and
> >> configurations) and let EasyBuild fall back on building dependencies
> >> as needed. on Puhti, that means we end up compiling, among other
> >> things, our own versions of gcc and GNU binutils.
> >>
> >> it now appears that the version of binutils created by the stock
> >> EasyBuild recipe on Puhti ends up incompatible with the lustre
> >> filesystem below '/projappl/'. we have isolated the problem outside
> >> the EasyBuild environment, and it appears to boil down to ld.gold
> >> failing in fallocate(2) with
> >>
> >> [oe at puhti-login1 ~]$ module purge
> >> [oe at puhti-login1 ~]$ module use -a /projappl/nlpl/software/20/etc
> >> [oe at puhti-login1 ~]$ module load GCCcore/8.3.0 binutils/2.32
> >> [oe at puhti-login1 ~]$ module list
> >>
> >> Currently Loaded Modules:
> >> 1) GCCcore/8.3.0 2) binutils/2.32
> >>
> >>
> >>
> >> [oe at puhti-login1 ~]$ cat conftest.c
> >> int main (void) {
> >> ;
> >> return 0;
> >> }
> >> [oe at puhti-login1 ~]$ gcc conftest.c
> >> /projappl/nlpl/software/20/packages/binutils/2.32/bin/ld.gold: fatal error: a.out: Unknown error 524
> >> collect2: error: ld returned 1 exit status
> >> [oe at puhti-login1 ~]$ strace -f gcc conftest.c 2>&1 | grep fallocate
> >> [pid 110197] fallocate(21, 0, 0, 7840) = -1 ENOTSUPP (Unknown error 524)
> >>
> >> our current hypothesis is that the standard way of EasyBuild
> >> bootstrapping gcc and binutils on Puhti ends up with a configuration
> >> that is incompatible with creating binaries on the lustre filesystem.
> >> when moving the above file to '/tmp/', say, i can compile it without
> >> errors.
> >>
> >> i realize that we are off the beaten track here and well outside of
> >> what i would expect as regular support from your end. but our hope is
> >> that you might see the abstract appeal in fully parallel software
> >> installations across different systems, maybe even more so where this
> >> is managed within our researcher community, i.e. has the potential to
> >> shift some of the maintenance and support burden for
> >> discipline-specific software towards us (semi-expert) users :-).
> >>
> >> does the above error from ld.gold ring a bell for someone, by any chance?
> >>
> >> i see that the system-wide gcc 8.3.0 module on Puhti does not include
> >> binutils and was built using Spack; in fact, it appears there is no
> >> separate binutils module beyond the stock RHEL 7 binaries (binutils
> >> 2.27), is that correct? we could of course try dropping our custom
> >> binutils from the EasyBuild dependency tree, but that would (a) reduce
> >> the degree of 'full' parallelism across systems and (b) require us to
> >> modify core EasyBuild recipes. hence, if possible we would much
> >> rather understand the underlying nature of the above problem and
> >> resolve it. i imagine this may lead to a refinement of the EasyBuild
> >> recipe for binutils, which we could then submit upstream.
> >>
> >> any comments on the above or suggestions for how to debug further will
> >> be warmly appreciated.
> >>
> >> with thanks in advance, oe
> >>
> >>
>
>
> --
> Andrey
> Language Technology Group (LTG)
> University of Oslo
>
--
Andrey
Language Technology Group (LTG)
University of Oslo