Opened 11 years ago

Closed 9 years ago

Last modified 44 years ago

#94 closed bug (Rejected)

Bad space behaviour with huge input file

Reported by: ajk          Owned by: simonpj
Priority: normal         Milestone:
Component: Compiler      Version: 5.04
Keywords:                Cc:
Operating System:        Architecture:
Type of failure:         Difficulty:
Test Case:               Blocked By:
Blocking:                Related Tickets:

Description

The attached files (actually, just UnicodeData.hs but
the other file is imported by it) trigger very bad
space and time behaviour in ghc during compilation. One
attempt went up to 500 MB of virtual memory (256 MB of
physical memory available) on my i386 machine. The
compilation ran for more than an hour until killed (stuck
in the rename phase).

I had another version (available on request) of this
that has all the data in a string, compiled into an
object file using gcc (in no time!), accessed using FFI
and then turned into a real data structure using read.
The program, looking up one entry in the resulting
FiniteMap, has a memory hit of approximately 130 MB and
runs in one minute (which, while still too much, is
bearable). So it seems there is lots to improve in the
compiler in this case (we are essentially talking about
the same process taking way too much time and memory
when done by the compiler, compared to the program
itself doing it - and even then the memory requirement
is outrageous).
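
For illustration, a minimal sketch of the run-time parsing approach described above. It makes several assumptions that are not in the report: the table is a plain Haskell String rather than a C string reached through the FFI, the record layout ("hex code point;name") and the names rawTable, parseRecord and table are hypothetical, readHex stands in for read, and Data.Map from the containers package stands in for FiniteMap.

```haskell
module Main where

import qualified Data.Map as Map
import Numeric (readHex)

-- Hypothetical stand-in for the real data: one "codepoint;name" record per line.
rawTable :: String
rawTable = unlines
  [ "0041;LATIN CAPITAL LETTER A"
  , "0042;LATIN CAPITAL LETTER B"
  , "000A;LINE FEED"
  ]

-- Parse a single record; malformed lines yield Nothing.
parseRecord :: String -> Maybe (Int, String)
parseRecord line =
  case break (== ';') line of
    (hex, ';' : name) ->
      case readHex hex of
        [(cp, "")] -> Just (cp, name)
        _          -> Nothing
    _ -> Nothing

-- Build the lookup table at run time instead of as a compile-time expression.
table :: Map.Map Int String
table = Map.fromList [ kv | Just kv <- map parseRecord (lines rawTable) ]

main :: IO ()
main = print (Map.lookup 0x41 table)
```

The compiler then only has to handle one string literal and a small parser; the cost of building the map moves to program start-up.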

Even some sort of special-casing pragma that allows me
to ask for lighter treatment of pure data would be good
(and a way to statically initialize a FiniteMap...)

I'm sorry but I do not have any simpler input files to
offer.

Attachments (1)

problem.tar.2.gz (231.3 KB) - added by ajk 11 years ago.

Change History (5)

Changed 11 years ago by ajk

comment:1 Changed 9 years ago by simonmar

  • Status changed from assigned to closed

This source file is simply huge (6M) and contains a single
large nested non-constant expression.  Also, it isn't
syntactically correct, and even when the syntax errors are
fixed it has type errors.

I'm going to close this bug.  Feel free to submit more
examples of code that GHC takes too long to compile, but
there's not much we can do with this one.

comment:2 Changed 9 years ago by ajk

Of course it is a huge file.  That's the whole point of this
bug report :)

And as to it being syntactically invalid, and having type
errors - I believe this is all the more reason to fix this.
How am I supposed to fix these issues if the compiler does
not give me a diagnostic?

I can appreciate that typechecking can have inherently bad
space/time behaviour for pathological cases, but if we are
talking about a syntactically invalid file, the compiler
should not even get to typechecking, should it?
Parsing, however, is a well-understood part of computer
science and has no such nasty surprises.

comment:3 Changed 9 years ago by simonmar

Ok, the syntax errors aren't really syntax errors.  You
missed the "0x" off the front of many hex constants, with
the result that something like 000A is two lexemes.  It
parsed ok, but the renamer would have caught the errors.
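
As a rough illustration of the lexing point, the sketch below uses the Prelude function lex, which reads one Haskell lexeme at a time, to split a string into lexemes. The helper name lexemes is hypothetical, and lex only approximates what GHC's own lexer does. It suggests why 000A is read as the number 000 followed by the identifier A, while 0x000A is a single hex literal.

```haskell
module Main where

-- Split a string into Haskell lexemes using the Prelude's 'lex'.
-- Stops at end of input or at the first character 'lex' cannot handle.
lexemes :: String -> [String]
lexemes s =
  case lex s of
    [("", _)]     -> []
    [(tok, rest)] -> tok : lexemes rest
    _             -> []

main :: IO ()
main = do
  print (lexemes "000A")    -- missing "0x" prefix
  print (lexemes "0x000A")  -- proper hex literal
```

Two adjacent lexemes such as 000 and A form a syntactically valid function application, which is consistent with the file getting past the parser.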

The point is I don't know whether we should expect GHC to be
able to compile this module in reasonable time/space.  The
requirements are likely to increase non-linearly with the
size of the program, because it is one huge non-constant
nested expression.

It is true that GHC doesn't have a good way to declare large
amounts of constant data.  This is a shortcoming, but not a
bug (please by all means submit a feature request).

comment:4 Changed 9 years ago by ajk

Ok.  I can relate to that :)

Some background: When I originally submitted this bug, I had
already tried many ways of achieving what I wanted; some of
them involved constant expressions, some didn't.  All of
them had bad space/time behaviour; what I submitted was what
I had when I gave up.  The best solution I found involved a
C string foreign-imported and parsed at Haskell side at
run-time, like I indicated in the original message.

I'm now satisfied with your response (in case it isn't
apparent:).