Opened 2 years ago

Last modified 8 days ago

#7695 patch bug

Hang when locale-archive and gconv-modules are not there

Reported by: hpd Owned by:
Priority: highest Milestone: 7.10.2
Component: None Version: 7.8.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime crash Test Case:
Blocked By: Blocking:
Related Tickets: #8977, #10298 Differential Revisions: Phab:D898

Description

Running a (statically) compiled program in an environment where /usr/lib/locale or /usr/lib/gconv (or their /usr/lib64 counterparts) is missing causes it to hog the CPU and rapidly allocate memory. This happens, for example, in an empty chroot.

Steps to reproduce:

echo "main = return ()" > test.hs
ghc -static -optl-static -optl-pthread -o test test.hs
chroot `pwd` /test

Strace output:

$ /strace -e trace=file -e signal= /test
execve("/test", ["/test"], [/* 17 vars */]) = 0
open("/usr/lib64/locale/locale-archive", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib64/locale/de_DE.UTF-8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib64/locale/de_DE.utf8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib64/locale/de_DE/LC_CTYPE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib64/locale/de.UTF-8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib64/locale/de.utf8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib64/locale/de/LC_CTYPE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib64/gconv/gconv-modules", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

Strace output (with LC_ALL=C):

$ LC_ALL=C strace -e trace=file -e signal= /test
execve("/test", ["/test"], [/* 18 vars */]) = 0
open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib64/gconv/gconv-modules", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

In both cases, there are no further system calls (apart from timer ticks and mmap).

I was also able to reproduce this behavior without a chroot by replacing the paths from strace's output in the binary with nonexistent ones (manually, with a hex editor). I haven't tried dynamically linked binaries yet.

One problem may be that iconv_open always returns -1 if it can't find the files listed above.
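The suspected iconv_open failure can be checked in isolation. The following is a minimal sketch, assuming a POSIX iconv implementation; `probe_iconv` is a hypothetical helper name, and the encoding names passed to it are only illustrative, not necessarily the ones GHC's runtime requests.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try to open a converter from `from` to `to`.
 * Returns 0 on success, or the errno value set by iconv_open on failure. */
int probe_iconv(const char *to, const char *from) {
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1) {
        int err = errno; /* save errno before any other library call */
        fprintf(stderr, "iconv_open(\"%s\", \"%s\") failed: %s\n",
                to, from, strerror(err));
        return err;
    }
    iconv_close(cd);
    return 0;
}
```

Calling this from a statically linked program inside the empty chroot should show whether iconv_open fails there and with which errno.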

Change History (19)

comment:1 Changed 2 years ago by igloo

  • difficulty set to Unknown

It looks like you're right; iconv_open returns -1, so we throw an exception, but printing the exception means we try opening iconv again, and loop forever.
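One way to break a cycle of this kind is to give the error reporter a path that does not depend on iconv at all. The sketch below is written in C for illustration only and is not GHC's actual code; `report_error` is a hypothetical name, and the choice of UTF-8 as the source encoding is an assumption.

```c
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error reporter: writes `msg` (assumed UTF-8) to `out`,
 * converted to `codeset` when possible.
 * Returns 0 if the message went through iconv, 1 if the raw-byte
 * fallback was used. It never signals a new error itself, so a failing
 * iconv_open cannot trigger another round of reporting. */
int report_error(FILE *out, const char *codeset, const char *msg) {
    char buf[256];
    char *in = (char *)msg;          /* iconv takes non-const input */
    size_t inleft = strlen(msg);
    char *outp = buf;
    size_t outleft = sizeof buf;

    iconv_t cd = iconv_open(codeset, "UTF-8");
    if (cd != (iconv_t)-1) {
        size_t rc = iconv(cd, &in, &inleft, &outp, &outleft);
        iconv_close(cd);
        if (rc != (size_t)-1) {
            fwrite(buf, 1, sizeof buf - outleft, out);
            fputc('\n', out);
            return 0;
        }
    }
    /* Converter unavailable, or conversion failed (for example the
     * message did not fit in buf): emit the bytes unchanged. */
    fputs(msg, out);
    fputc('\n', out);
    return 1;
}
```

The design point is only that the fallback branch performs no conversion, so it cannot fail in the same way as the operation it is reporting on.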

comment:2 Changed 2 years ago by igloo

  • Milestone set to 7.8.1

comment:3 Changed 14 months ago by trommler

  • Milestone changed from 7.8.1 to 7.8.2
  • Operating System changed from Linux to Unknown/Multiple
  • Version changed from 7.6.2 to 7.8.1

This was also reported on Solaris. See #8977.

comment:4 Changed 14 months ago by thoughtpolice

  • Milestone changed from 7.8.2 to 7.8.3

comment:5 Changed 14 months ago by trommler

  • difficulty changed from Unknown to Moderate (less than a day)
  • Owner set to trommler
  • Type of failure changed from None/Unknown to Runtime crash

OK, I'll take a stab at that one.

comment:6 Changed 12 months ago by thoughtpolice

  • Milestone changed from 7.8.3 to 7.8.4

Moving to 7.8.4.

comment:7 Changed 8 months ago by thoughtpolice

  • Milestone changed from 7.8.4 to 7.10.1

Moving (in bulk) to 7.10.1

comment:8 follow-up: Changed 6 months ago by thomie

This bug seems to be fixed in HEAD (ghc-7.9.20141121), although I can not find the commit that might have fixed it.

We should probably add a regression test before closing.

comment:9 in reply to: ↑ 8 ; follow-up: Changed 6 months ago by trommler

  • difficulty changed from Moderate (less than a day) to Unknown
  • Owner trommler deleted

Replying to thomie:

This bug seems to be fixed in HEAD (ghc-7.9.20141121), although I can not find the commit that might have fixed it.

No, the bug is still present in HEAD @96d29b5. I just verified on openSUSE 13.2 i586.

Unfortunately, I don't have time to work on a fix right now.

comment:10 in reply to: ↑ 9 ; follow-up: Changed 6 months ago by thomie

No, the bug is still present in HEAD @96d29b5. I just verified on openSUSE 13.2 i586.

Unfortunately, I don't have time to work on a fix right now.

Strange. When I run the commands from the description with ghc-7.8.3, the CPU does indeed go to 100% and the process never finishes. When I do the same with HEAD, everything works fine. I'm on Ubuntu 14.04 x86_64.

trommler: do you maybe have a better test for this bug?

comment:11 in reply to: ↑ 10 ; follow-up: Changed 6 months ago by trommler

Replying to thomie:

Strange. When I run the commands from the description with ghc-7.8.3, the CPU does indeed go to 100% and the process never finishes. When I do the same with HEAD, everything works fine. I'm on Ubuntu 14.04 x86_64.

Strange. Perhaps I will try again with a freshly checked out tree. I am never sure if the submodules worked out right. But blame that on my lack of git skills.

trommler: do you maybe have a better test for this bug?

I use ghc --version.

comment:12 in reply to: ↑ 11 Changed 6 months ago by trommler

Replying to trommler:

Replying to thomie:

Strange. When I run the commands from the description with ghc-7.8.3, the CPU does indeed go to 100% and the process never finishes. When I do the same with HEAD, everything works fine. I'm on Ubuntu 14.04 x86_64.

Strange. Perhaps I will try again with a freshly checked out tree. I am never sure if the submodules worked out right. But blame that on my lack of git skills.

On a fresh clone of HEAD at 41c3545 I still see the issue on openSUSE 13.2 i586.

comment:13 Changed 5 months ago by thoughtpolice

  • Milestone changed from 7.10.1 to 7.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:14 Changed 5 weeks ago by simonmar

  • Milestone changed from 7.12.1 to 7.10.2
  • Priority changed from normal to highest

Let's get this one fixed. Michael Snoyman recently ran into an issue that sounds like the same thing when working with Docker.

comment:15 Changed 5 weeks ago by rwbarton

Oh, do you mean #10298? Indeed it looks the same.

comment:16 Changed 5 weeks ago by simonpj

See other dups (beyond #10298): #8977, #8928

comment:17 Changed 2 weeks ago by thoughtpolice

comment:18 Changed 10 days ago by hsyl20

Fixed with my patch for #10298 (see my comment there).

Can you check that you obtain the same result on your system with this code:

$> cat test.c
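#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

int main(void) {
   /* Adopt the locale named by the environment (LC_ALL, LC_CTYPE, LANG). */
   setlocale(LC_CTYPE, "");
   /* Print the character encoding of the locale now in effect. */
   printf("LC_CTYPE: %s\n", nl_langinfo(CODESET));
   return 0;
}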

$> gcc -static -L/path/to/static/glibc test.c -o test
$> sudo chroot `pwd` /test
LC_CTYPE: ANSI_X3.4-1968

I don't know if the default LC_CTYPE is the same everywhere...

(You should get the same result when calling ./test directly, without the chroot, if your glibc was compiled with --disable-shared.)
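To see whether the reported codeset comes from the environment's locale or from the startup "C" locale, the return value of setlocale can be checked as well. This is a small sketch along the lines of the test program above; `current_codeset` is a hypothetical helper name. It relies on the POSIX rule that a failed setlocale call returns a null pointer and leaves the program's locale unchanged.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: try to adopt the environment's LC_CTYPE locale
 * and return the codeset name of whichever locale is then in effect.
 * *used_env is set to 1 if setlocale succeeded, or to 0 if it failed
 * and the process stayed in its previous (startup "C") locale. */
const char *current_codeset(int *used_env) {
    const char *loc = setlocale(LC_CTYPE, "");
    if (used_env != NULL)
        *used_env = (loc != NULL);
    return nl_langinfo(CODESET);
}
```

A caller can print both values, which makes it visible when a result such as ANSI_X3.4-1968 is the fallback rather than the configured locale.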

comment:19 Changed 8 days ago by thoughtpolice

  • Differential Revisions set to Phab:D898
  • Status changed from new to patch