[NLPL Infrastructure] lustre-related error in custom installation of binutils
Stephan Oepen
oe at ifi.uio.no
Tue May 4 20:18:38 UTC 2021
dear colleagues:
in the context of the NLPL use case in EOSC-Nordic, we are preparing
an updated version of what we call the NLPL virtual laboratory,
essentially a community-maintained collection of core software and
data resources for natural language processing. on Puhti, our virtual
laboratory resides in '/projappl/nlpl/' and is jointly maintained
by EOSC-Nordic project members at the universities of Helsinki and Oslo.
to keep the installed software perfectly parallel across systems, we
have now fully automated compiling and installing a collection of
several dozen packages using EasyBuild. in doing so, we use system-wide
modules where they are available (in exactly the same versions and
configurations) and otherwise let EasyBuild build the missing
dependencies itself. on Puhti, that means we end up compiling, among
other things, our own versions of gcc and GNU binutils.
it now appears that the binutils built by the stock EasyBuild recipe
on Puhti is incompatible with the lustre filesystem under '/projappl/'.
we have isolated the problem outside the EasyBuild environment, and it
appears to boil down to ld.gold failing in fallocate(2):
  [oe@puhti-login1 ~]$ module purge
  [oe@puhti-login1 ~]$ module use -a /projappl/nlpl/software/20/etc
  [oe@puhti-login1 ~]$ module load GCCcore/8.3.0 binutils/2.32
  [oe@puhti-login1 ~]$ module list
  Currently Loaded Modules:
    1) GCCcore/8.3.0   2) binutils/2.32
  [oe@puhti-login1 ~]$ cat conftest.c
  int main (void) {
    ;
    return 0;
  }
  [oe@puhti-login1 ~]$ gcc conftest.c
  /projappl/nlpl/software/20/packages/binutils/2.32/bin/ld.gold: fatal error: a.out: Unknown error 524
  collect2: error: ld returned 1 exit status
  [oe@puhti-login1 ~]$ strace -f gcc conftest.c 2>&1 | grep fallocate
  [pid 110197] fallocate(21, 0, 0, 7840) = -1 ENOTSUPP (Unknown error 524)
our current hypothesis is that the standard way EasyBuild bootstraps
gcc and binutils on Puhti yields a configuration that cannot create
binaries on the lustre filesystem: when i move the above file to
'/tmp/', for example, it compiles without errors.
i realize that we are off the beaten track here and well outside what
i would expect as regular support from your end. but we hope you might
see the appeal, at least in principle, of fully parallel software
installations across different systems, perhaps even more so when
these are managed within the research community, which has the
potential to shift some of the maintenance and support burden for
discipline-specific software towards us (semi-expert) users :-).
does the above error from ld.gold ring a bell for anyone, by any chance?
i see that the system-wide gcc 8.3.0 module on Puhti does not include
binutils and was built using Spack; in fact, there appears to be no
separate binutils module at all, only the stock RHEL 7 binaries
(binutils 2.27). is that correct? we could of course try dropping our
custom binutils from the EasyBuild dependency tree, but that would
(a) reduce the degree of 'full' parallelism across systems and
(b) require us to modify core EasyBuild recipes. hence, if possible,
we would much rather understand the underlying nature of the above
problem and resolve it. i imagine this may lead to a refinement of
the EasyBuild recipe for binutils, which we could then submit upstream.
any comments on the above or suggestions for how to debug further will
be warmly appreciated.
with thanks in advance, oe