Opened 13 months ago

Closed 9 months ago

#9345 closed bug (fixed)

Data.List.inits is extremely slow

Reported by: dfeuer Owned by: ekmett
Priority: high Milestone: 7.8.4
Component: Core Libraries Version: 7.8.3
Keywords: Cc: hvr, ekmett, core-libraries-committee@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions: Phab:D329

Description

As discussed on libraries@…, Data.List.inits is extremely slow (try running print $ length $ inits [1..100000] if you don't believe me). As discussed, there are at least two reasonable fixes. One of them (named initsR in the attached) is a one-liner and gives very good performance in general, but poor performance in certain cases that may or may not appear in real code. The other (named initsQ in the attached) is slightly more complex and slightly slower in general, but its performance appears to be robust. I would personally lean toward initsQ for Data.List.

Attachments (7)

inits.hs (4.6 KB) - added by dfeuer 13 months ago.
qrscomp.html.gz (171.6 KB) - added by dfeuer 13 months ago.
Benchmark comparing reversing, stripped-down-queue, and sequence-based implementations.
report.html.gz (179.3 KB) - added by nomeata 13 months ago.
report with benchmarks for initsQ2
report.html.2.gz (179.3 KB) - added by dfeuer 13 months ago.
nomeata's report with benchmarks for initsQ2, compressed only once for less confusion.
inits.diff (1.2 KB) - added by dfeuer 11 months ago.
Patch implementing Bertram Felgenhauer's initsT'
inits.2.diff (1.6 KB) - added by dfeuer 11 months ago.
A plain version of the initsT' patch, without build or inlining, and using Word
Inits.hs (935 bytes) - added by dfeuer 11 months ago.
Test demonstrating problem with sum . concat . inits

Download all attachments as: .zip

Change History (43)

Changed 13 months ago by dfeuer

comment:1 follow-ups: Changed 13 months ago by nomeata

Interesting. Can you elaborate (just for the record here) when initsQ is faster?

I would find it strange to add such complexity “just for” inits. Now, if we had such a Queue type (which looks useful in general) in base anyways, using it would be a different story...

How is the performance when the existing Seq is used instead of Queue?

comment:2 in reply to: ↑ 1 Changed 13 months ago by dfeuer

Replying to nomeata:

Interesting. Can you elaborate (just for the record here) when initsQ is faster?

I would find it strange to add such complexity “just for” inits. Now, if we had such a Queue type (which looks useful in general) in base anyways, using it would be a different story...

How is the performance when the existing Seq is used instead of Queue?

I'm attaching the Criterion comparison of initsR, initsQ, and a new initsS that uses Data.Sequence. On most tests, initsS is much slower than initsQ, reflecting the fact that sequences are much heavier data structures than these snoc-builder almost-queues.

initsQ is faster when the heads (or more generally the first few elements) of several of the elements of the result list are inspected by traversing the result list. If you just calculate head $ initsR [1..n] !! n, then the time required to reverse the final list can be amortized over the steps in !!n. But if you were to calculate, for some inexplicable reason, sum $ map head $ initsR [1..n], you only have n steps over which to amortize n^2 work. The vague notion I've had in the back of my mind for a vaguely realistic example is some sort of breadth-first traversal of inits [1..n].

Changed 13 months ago by dfeuer

Benchmark comparing reversing, stripped-down-queue, and sequence-based implementations.

comment:3 Changed 13 months ago by nomeata

JFTR: Seeing that the rear list of Queue is only consed and reversed, I tried to change it to a difference list ([a] -> [a]). The difference in performance is small and mixed, sometimes slightly better, sometimes slightly worse.

comment:4 in reply to: ↑ 1 ; follow-up: Changed 13 months ago by dfeuer

Replying to nomeata:

I would find it strange to add such complexity “just for” inits. Now, if we had such a Queue type (which looks useful in general) in base anyways, using it would be a different story...

If it were added, it wouldn't be called Queue, because (unlike the real banker's queue) it's actually incapable of implementing uncons efficiently. It's really just a very limited, but very fast and light, list builder. If it's useful for things other than implementing inits, I would not be opposed to adding it in a separate module with some sufficiently clear name. I don't know enough about what sorts of things should and should not be provided by base to really comment much beyond that.

comment:5 Changed 13 months ago by dfeuer

  • Milestone set to 7.8.4

I'm taking the liberty of setting an early milestone for this one because the current situation is silly and the change is entirely local.

comment:6 in reply to: ↑ 4 Changed 13 months ago by nomeata

Replying to dfeuer:

Replying to nomeata:

I would find it strange to add such complexity “just for” inits. Now, if we had such a Queue type (which looks useful in general) in base anyways, using it would be a different story...

If it were added, it wouldn't be called Queue, because (unlike the real banker's queue) it's actually incapable of implementing uncons efficiently. It's really just a very limited, but very fast and light, list builder. If it's useful for things other than implementing inits, I would not be opposed to adding it in a separate module with some sufficiently clear name. I don't know enough about what sorts of things should and should not be provided by base to really comment much beyond that.

After reading more of the code I agree that this not too big “just for” inits, and I’m in favor of adding it.

I also thought about ways to make it even faster. In particular, I tried to make GHC get rid of Queue and simply pass the three values around as arguments to the inner loop, and also make this fuse. With this definition it works:

myScanl' :: (b -> a -> b) -> b -> [a] -> [b]
myScanl' f a bs = build $ \c n ->
    a `seq` a `c`
    foldr (\b g x -> (let b' = f x b in b' `seq` (b' `c` g b')))
          (\b -> b `seq` n)
          bs
          a

initsQ2 :: [a] -> [[a]]
initsQ2 = map toListQ . myScanl' snocQ emptyQ

and we get this nice inner loop:

Rec {
InitsQ.initsQ1
  :: forall a. [a] -> GHC.Prim.Word# -> [a] -> [a] -> [[a]]
InitsQ.initsQ1 =
  \ (@ a)
    (w :: [a])
    (ww :: GHC.Prim.Word#)
    (ww1 :: [a])
    (ww2 :: [a]) ->
    case w of _ {
      [] -> GHC.Types.[];
      : y ys ->
        let {
          a1 :: GHC.Prim.Word#
          a1 = GHC.Prim.plusWord# ww (__word 1) } in
        case GHC.Prim.word2Int# (GHC.Prim.popCnt# a1) of _ {
          __DEFAULT ->
            let {
              ipv :: [a]
              ipv = GHC.Types.: y ww2 } in
            GHC.Types.:
              (GHC.Base.++ ww1 (GHC.List.reverse1 ipv (GHC.Types.[])))
              (InitsQ.initsQ1 ys a1 ww1 ipv);
          1 ->
            let {
              ipv :: [a]
              ipv =
                GHC.Base.++
                  ww1 (GHC.List.reverse1 (GHC.Types.: y ww2) (GHC.Types.[])) } in
            GHC.Types.:
              (GHC.Base.++ ipv (InitsQ.initsQ4))
              (InitsQ.initsQ1 ys a1 ipv (GHC.Types.[]))
        }
    }
end Rec }

It (unsurprisingly) yields consistently equal or better results than initsQ. (See attached report, initsQ2. Ignore initsQDL, thats the same, but with the rear list implemented as a difference list, which makes little difference.)

This raises the question whether we want such a fusing strict scanl' in Data.List as well – it turned out to be useful here.

Last edited 13 months ago by nomeata (previous) (diff)

comment:7 Changed 13 months ago by dfeuer

That's a neat idea. If I'm not mistaken, seq in the argument to build leads to a proof obligation to justify the safety of the fusion. Do you have a proof?

comment:8 follow-up: Changed 13 months ago by nomeata

That's a neat idea. If I'm not mistaken, seq in the argument to build leads to a proof obligation to justify the safety of the fusion. Do you have a proof?

No. What exactly is the proof obligation in this case?

comment:9 in reply to: ↑ 8 Changed 13 months ago by dfeuer

Replying to nomeata:

That's a neat idea. If I'm not mistaken, seq in the argument to build leads to a proof obligation to justify the safety of the fusion. Do you have a proof?

No. What exactly is the proof obligation in this case?

According to the short cut fusion page on the Haskell Wiki, the foldr/build equivalence in general is only guaranteed when the argument to build does not use seq; otherwise there are ways to break it so that the fused program is less lazy than it should be. That page gives one way around that limitation, which involves restricting the other arguments to foldr, but that's not an option here—a "bare" build is exposed to the world. So if I understand things right (which is never guaranteed at all), you're left having to write your own proof, from scratch or based on other theorems not mentioned on that page, that the foldr/build rule is safe here.

comment:10 Changed 13 months ago by dfeuer

I've worked on it a bit, and if I'm not mistaken, myScanl' is safe if, whenever a /= _|_, and however I choose evil, wrong, and f, the "correct" expression

e1 = a `evil`
     foldr evil wrong $
       foldr (\b g x -> (let b' = f x b in b' `seq` (b' : g b')))
          (\b -> b `seq` [])
          bs
          a

gives the same result as the fused expression

e2 = a `evil`
     foldr (\b g x -> (let b' = f x b in b' `seq` (b' `evil` g b')))
           (\b -> b `seq` wrong)
           bs
           a

But I'm not sure how to prove this or find a counterexample. Any ideas? Note: I think it likely is valid; I just don't see how to prove it.

Last edited 13 months ago by dfeuer (previous) (diff)

comment:11 follow-up: Changed 13 months ago by nomeata

I forgot to attach the report mentioned in comment:6, doing that now.

Changed 13 months ago by nomeata

report with benchmarks for initsQ2

Changed 13 months ago by dfeuer

nomeata's report with benchmarks for initsQ2, compressed only once for less confusion.

comment:12 in reply to: ↑ 11 Changed 13 months ago by dfeuer

Replying to nomeata:

I forgot to attach the report mentioned in comment:6, doing that now.

I was finally able to figure out how to read that file, and wow. Those are some impressive results. If we can verify that the fusion on the left is safe, initsQ2 is clearly the way to go. If we can't, it looks like we probably want to use the same thing, but without the build.

comment:13 follow-up: Changed 13 months ago by nomeata

I was finally able to figure out how to read that file, and wow. Those are some impressive results.

Of course the allocation and destruction of the Queue is not for free. Just to make sure I didn’t do anything stupid: Can you reproduce my results?

If we are worried about seq we can simply inline my myScanl' in initsQ2 and replace the seq by case statements. Or leave the seq but argue that we are always applying them to values of type Queue, nothing breaks. And by that argument, as long as myScanl is only used for initsQ2, we should be safe.

I guess we only should worry about actually adding scanl' to the libraries, but that’s a different issue and should be discussed in #9356 I guess.

We could also write initsQ in the completely inlined form, i.e. the core above. Plus point: No need to define the Queue datatype (which would be dead code anyways). Minus point: Doesn’t look elegant, no fusion possible.

Last edited 13 months ago by nomeata (previous) (diff)

comment:14 in reply to: ↑ 13 Changed 13 months ago by dfeuer

Replying to nomeata:

If we are worried about seq we can simply inline my myScanl' in initsQ2 and replace the seq by case statements. Or leave the seq but argue that we are always applying them to values of type Queue, nothing breaks. And by that argument, as long as myScanl is only used for initsQ2, we should be safe.

I believe I have now pretty much completed the proof. It's very ugly, because I don't have a sufficiently firm understanding of the higher-order fold involved, but here's the outline (minus horrors of immature mathematics):

foldr evil wrong (scanl' f a bs)
= foldr evil wrong $
    a `seq`
    a : foldr (\b g x -> let b' = f x b in b' `seq` (b` : g b'))
              (\b -> b `seq` [])
              bs
              a -- Call this e1 a bs

=?= (\c n ->
    a `seq`
    a `c` foldr (\b g x -> let b' = f x b in b' `seq` (b' `c` g b'))
                (\b -> b `seq` n)
                bs
                a) evil wrong  -- Call this e2 a bs

There are two trivial base cases:

e1 [] = e2 [] = evil a wrong
e1 _|_ = e2 _|_ = evil a _|_

Then the part I found difficult was proving that

e1 a (q:bs) = evil a $ let b1' = f a q in b1' `seq` e1 b1' bs

to match the much simpler

e2 a (q:bs) = evil a $ let b1' = f a q in b1' `seq` e2 b1' bs

For the infinite case, I don't know the formalism involved, but the general idea is that evil can only inspect a finite amount of its second argument before producing something, so we can always (temporarily) cut off the list somewhere past that.

I guess we only should worry about actually adding scanl' to the libraries, but that’s a different issue and should be discussed in #9356 I guess.

Yes, but I now think we should.

We could also write initsQ in the completely inlined form, i.e. the core above. Plus point: No need to define the Queue datatype (which would be dead code anyways). Minus point: Doesn’t look elegant, no fusion possible.

It doesn't look bad at all, but I think it's pretty much write-only code—it's hard to see what it's doing and why, and hard to experiment with as we've been doing (swapping modular pieces around to try different things). Giving up possible fusion to attain less clarity does not look like a good plan to me.

I'm going to do some benchmarking and profiling myself; I'd like to toss a fusable non-strict scanl into the ring for comparison, and ramp up the scale on some of the benchmarks to improve resolution.

comment:15 Changed 12 months ago by dfeuer

  • Priority changed from normal to high

Changed 11 months ago by dfeuer

Patch implementing Bertram Felgenhauer's initsT'

comment:16 Changed 11 months ago by dfeuer

  • Status changed from new to patch

My testing indicates that for large lists (e.g., 10,000 elements), Bertram Felgenhauer's simple and mostly very clean initsT' implementation is much better than any variation on initsQ when concatenating the results. I wrapped it up in build to get a little fusion potential (not essential) and uploaded a patch. I would still like very much to understand why using take' gives slightly better performance than using take, but it very clearly does.

comment:17 Changed 11 months ago by dfeuer

If someone is able to do benchmark comparisons for lists of various sizes, that would really be great; I'm not currently able to use Criterion on GHC 7.9. It's possible that initsQ is better when the list is fairly small relative to the cache size and initsT' is better when it's large, in which case someone will have to make a judgement about which case is more important.

comment:18 follow-up: Changed 11 months ago by nomeata

I’m really worryied seeing INLINE on such a rather large function. Doesn’t that mean that we will never use the compiled inits from the library but rather re-compile it at every use site? That does not seem to be good. (Disclaimer: Have not actually tried it.)

I have run criterion on HEAD earlier, I might be able to set it up again. Do you have a ready-to-run benchmark that you would like me to run?

comment:19 in reply to: ↑ 18 Changed 11 months ago by dfeuer

Replying to nomeata:

I’m really worryied seeing INLINE on such a rather large function. Doesn’t that mean that we will never use the compiled inits from the library but rather re-compile it at every use site? That does not seem to be good. (Disclaimer: Have not actually tried it.)

I have run criterion on HEAD earlier, I might be able to set it up again. Do you have a ready-to-run benchmark that you would like me to run?

I think we can just skip the build and get rid of INLINE without any significant harm. The fusion potential with this definition is quite limited anyway. The most interesting-looking fusion opportunities (not enabled by that definition anyway) seem to be inits/enumFromTo and concat/inits. The former is great if things are, and remain, unboxed, and horrible otherwise, and we can't tell the difference, unfortunately. The latter might be worth thinking about, if anyone actually writes that. This looks very good in Core, if we can find a way to accomplish it:

concat (inits xs) = build (\cons nil ->
  let
    go _ _ []     = nil
    go n k (y:ys)
      | n == k    = go (n+1) 0 xs
      | otherwise = y `cons` go n (k+1) ys
  in go 1 0 xs)

Changed 11 months ago by dfeuer

A plain version of the initsT' patch, without build or inlining, and using Word

comment:20 Changed 11 months ago by dfeuer

I've attached a new patch without the build or inlining, and upgrading from Int to Word to double the number of lists we can pull from inits . cycle before the world explodes.

comment:21 Changed 11 months ago by dfeuer

  • Status changed from patch to new

I worked with Bertram Felgenhauer a bit tonight, and it looks like he's right; initsQ2 really is better. Unfortunately, it appears that the implementation of scanl' given here has the potential to blow things up for initsQ2 when combined with concat. In particular, if I do

main = print $ sum $ concat $ initsQ2 [1..10000::Int]

then it uses up all my RAM and takes forever. Interrupting fusion between initsQ2 and [1..1000] does not help at all, but interrupting it between concat and initsQ2 seems to work. Removing the sum also prevents it from using all my RAM. If I replace sum with foldl' (+) 0 then it runs in a small amount of RAM, but allocation is still very high and it's very slow. I'm still struggling to tease apart the interactions here. One thing that seems clear is that at its worst, it fails to recognize that it can treat the lazy foldl of sum as a strict foldl', which is disastrous. But that's not the only thing going wrong.

comment:22 follow-up: Changed 11 months ago by nomeata

Didn’t look at the core yet, but this might be a case of non-linear recursion where an arity analysis in insufficient to get good code for a left fold, see 5.1.1 in http://pp.ipd.kit.edu/uploads/publikationen/breitner14callarity.pdf.

I can have a closer look later. Can you, with these examples, please always include one complete self-contained file? Otherwise there is a risk that I might be testing with the wrong version of inits/scanl'/whatever. I assume you are on GHC HEAD?

Changed 11 months ago by dfeuer

Test demonstrating problem with sum . concat . inits

comment:23 in reply to: ↑ 22 Changed 11 months ago by dfeuer

Replying to nomeata:

Didn’t look at the core yet, but this might be a case of non-linear recursion where an arity analysis in insufficient to get good code for a left fold, see 5.1.1 in http://pp.ipd.kit.edu/uploads/publikationen/breitner14callarity.pdf.

I can have a closer look later. Can you, with these examples, please always include one complete self-contained file? Otherwise there is a risk that I might be testing with the wrong version of inits/scanl'/whatever. I assume you are on GHC HEAD?

I've attached a complete, self-contained test case. As I mentioned in an email, you should keep a close eye on RAM usage or run this program with heap limits set if you don't want to crash your computer. If this is an arity analysis problem that's too hard to fix, we can work around it by writing inits xs = don'tFuse . map toListQ . scanl' snocQ emptyQ to ensure that inits doesn't fuse on the problematic (and, in fact, rather unimportant) side.

comment:24 Changed 11 months ago by nomeata

Thanks. I changed it to foo = sum $ concat $ initsQ' $ [1..10000::Int] (so that I don’t get any print-related stuff), and -ddump-call-arity gives me this code:

[LclId,
 Arity=2,
 CallArity=2,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
         ConLike=True, WorkFree=True, Expandable=True,
         Guidance=IF_ARGS [0 0] 150 0}]
c =
  \ (x :: [Int]) (y [OS=OneShot] :: Int -> Int) ->
    letrec {
      go [Occ=LoopBreaker] :: [Int] -> Int -> Int
      [LclId,
       Arity=1,
       CallArity=1,
       Str=DmdType,
       Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=1, Value=True,
               ConLike=True, WorkFree=True, Expandable=True,
               Guidance=IF_ARGS [30] 110 60}]
      go =
        \ (ds :: [Int]) ->
          case ds of _ [Occ=Dead] {
            [] -> y;
            : y ys ->
              let {
                ds1 [OS=OneShot] :: Int -> Int
                [LclId,
                 Str=DmdType,
                 Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=0, Value=False,
                         ConLike=False, WorkFree=False, Expandable=False,
                         Guidance=IF_ARGS [] 20 0}]
                ds1 = go ys } in
              \ (ds2 :: Int) -> ds1 ($fNumInt_$c+ ds2 y)
          }; } in
    go x

foo :: Int
[LclIdX,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, WorkFree=False, Expandable=False,
         Guidance=IF_ARGS [] 634 0}]
foo =
  c (++ ([]) (reverse1 ([]) ([])))
    (letrec {
       go [Occ=LoopBreaker] :: Int# -> Queue Int -> Int -> Int
       [LclId,
        Arity=2,
        CallArity=2,
        Str=DmdType,
        Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=2, Value=True,
                ConLike=True, WorkFree=True, Expandable=True,
                Guidance=IF_ARGS [80 20] 464 0}]
       go =
         \ (x :: Int#) (eta :: Queue Int) ->
           let {
             b :: Int
             [LclId,
              Str=DmdType,
              Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=0, Value=True,
                      ConLike=True, WorkFree=True, Expandable=True,
                      Guidance=IF_ARGS [] 10 20}]
             b = I# x } in
           case eta of _ [Occ=Dead] { Queue dt f r ->
           let {
             a :: Word#
             [LclId,
              Str=DmdType,
              Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=0, Value=False,
                      ConLike=False, WorkFree=False, Expandable=True,
                      Guidance=IF_ARGS [] 1 0}]
             a = plusWord# dt (__word 1) } in
           let {
             r :: [Int]
             [LclId,
              Str=DmdType,
              Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=0, Value=True,
                      ConLike=True, WorkFree=True, Expandable=True,
                      Guidance=IF_ARGS [] 10 30}]
             r = : b r } in
           case word2Int# (popCnt# a) of _ [Occ=Dead] {
             __DEFAULT ->
               c (++ f (reverse1 r ([])))
                 (case x of wild {
                    __DEFAULT -> go (+# wild 1) (Queue a f r);
                    10000 -> \ (eta :: Int) -> eta
                  });
             1 ->
               let {
                 ipv :: [Int]
                 [LclId,
                  Str=DmdType,
                  Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=0, Value=False,
                          ConLike=False, WorkFree=False, Expandable=False,
                          Guidance=IF_ARGS [] 60 0}]
                 ipv = ++ f (reverse1 r ([])) } in
               c (++ ipv (reverse1 ([]) ([])))
                 (case x of wild {
                    __DEFAULT -> go (+# wild 1) (Queue a ipv ([]));
                    10000 -> \ (eta :: Int) -> eta
                  })
           }
           }; } in
     go 1 a)
    (I# 0)

The go inside foo is the interesting function. Its type has three arguments, but its outermost lambda only takes two. This is common when fusing a left-fold. Only that Call Arity is not sufficient to see that we’d want this to be expanded to thre arguments (CallArity=3).

Would CallArity=3 be correct? Yes, because the recursive calls to go call it with two arguments, pass that to c, which calls it at most once (so no sharing could be lost) with at least one argument.

Unfortunately, that information is currently not available to the call arity analysis (§5.1.1 in the paper linked above). Call Arity is a forward analysis, so it is hard to see how it could make use of that. And even if it did, it would still have to pass a function (in that case a PAP) to c... :-(

comment:25 Changed 11 months ago by dfeuer

Just to keep us on the same page, I rewrote scanl' like this so it can be written back to a simple form.

{-# NOINLINE [1] scanl' #-}
scanl'           :: (b -> a -> b) -> b -> [a] -> [b]
scanl' f q ls    = q `seq` q : (case ls of
                                []   -> []
                                x:xs -> scanl' f (f q x) xs)

{-# RULES
"scanl'"  [~1] forall f a bs . scanl' f a bs =
  build (\c n -> a `c` foldr (scanlFB' f c) (flipSeqScanl' n) bs a)
"scanl'List" [1] forall f a bs . foldr (scanlFB' f (:)) (flipSeqScanl' []) bs a =
  tailScanl' f a bs
 #-}

{-# INLINE [0] scanlFB' #-}
scanlFB' f c = \b g x -> let b' = f x b in b' `seq` b' `c` g b'

{-# INLINE [0] flipSeqScanl' #-}
flipSeqScanl' = flip seq

{-# NOINLINE [1] tailScanl' #-}
tailScanl' f a bs = a `seq` foldr (scanlFB' f (:)) (flip seq []) bs a

This doesn't seem to affect the problem, however. I am curious if there is a way to manually force whatever's getting the wrong arity to get the right one. Bertram pointed out that fusing inits on the right with an expensive producer is really bad, and I realized that fusing it on the right with something that leads to lots of reboxing is also really bad. So for now, probably the best thing is just to NOINLINE inits altogether.

Unrelatedly: I was successful in my quest to write rules to make concat and inits fuse into something that's actually good (in case anyone actually uses such a thing). Interestingly the arity/strictness analysis only works on it if it's inlined (maybe you can figure out a fix for that?):

-- We don't particularly like the idea of inlining this function, since it's
-- a bit large, but it appears to be necessary to get the analyses to work
-- right and avoid eating all available memory. Making this INLINE [0] to
-- be able to rewrite it to a simpler form unfortunately seems to make it miss some
-- analysis that speeds it up by a relatively small but still significant amount.
-- There may be some way around this problem, but I haven't found it yet.
{-# INLINE concatInitsFB #-}
concatInitsFB xs cons nil =
  let
    go _ _ []     = nil
    go n k (y:ys)
      | n == k    = go (n+1) 0 xs
      | otherwise = y `cons` go n (k+1) ys
  in go (1::Int) 0 xs

{-# RULES
-- We might catch concat before it's rewritten.
"concatInits1" forall xs . concat (inits xs) = build (concatInitsFB xs)
-- If not, we might be able to recognize its rewritten form.
"concatInits2" forall c n xs .
  foldr (\x y -> foldr c y x) n (inits xs) = concatInitsFB xs c n
 #-}

comment:26 follow-up: Changed 11 months ago by nomeata

This doesn't seem to affect the problem, however.

Yes, I don’t think the problem in this particular case lies with init per se. I think it’s more like concat, but only when part of the recursion (the c above) is not inlined. That might also be what you observe.

I am curious if there is a way to manually force whatever's getting the wrong arity to get the right one

Not with list fusion as we have it right now, I think. It should work with https://github.com/takano-akio/ww-fusion (which I should follow up on again).

I don’t think a rule for concat (inits) is worth it, but it may help us understand the problem better.

I wonder if maybe the rule for concat can be rewritten so that the resulting recursion is nicer, but I don’t see it right now (but I’m too jetlagged to think anyways).

comment:27 in reply to: ↑ 26 Changed 11 months ago by dfeuer

Replying to nomeata:

This doesn't seem to affect the problem, however.

Yes, I don’t think the problem in this particular case lies with init per se. I think it’s more like concat, but only when part of the recursion (the c above) is not inlined. That might also be what you observe.

I hope you'll be able to tease this apart; inits isn't very important in the grand scheme of things, but concat is.

I am curious if there is a way to manually force whatever's getting the wrong arity to get the right one

Not with list fusion as we have it right now, I think. It should work with https://github.com/takano-akio/ww-fusion (which I should follow up on again).

That looks very interesting, by which I mean I couldn't understand it at all the first time around. I'll have to try again. If it can really solve this class of problem in a fairly thorough fashion, it will certainly be worth the added complexity.

I wonder if maybe the rule for concat can be rewritten so that the resulting recursion is nicer, but I don’t see it right now (but I’m too jetlagged to think anyways).

Rest well!

comment:28 Changed 10 months ago by thoughtpolice

  • Component changed from libraries/base to Core Libraries
  • Owner set to ekmett

Moving over to new owning component 'Core Libraries'.

comment:29 Changed 10 months ago by dfeuer

  • Cc core-libraries-committee@… added
  • Differential Revisions set to Phab:D329
  • Status changed from new to patch

comment:30 Changed 10 months ago by thoughtpolice

  • Milestone changed from 7.8.4 to 7.10.1

Moving (in bulk) to 7.10.4

comment:31 Changed 10 months ago by dfeuer

  • Milestone changed from 7.10.1 to 7.8.4

The initsR version, at least, can slip into 7.8.4 without any ripples. Changing milestone back per thoughtpolice.

comment:32 Changed 10 months ago by Joachim Breitner <mail@…>

In cde3a77f9703966145cae481ee35f52dcca2cf7d/ghc:

Make Data.List.Inits fast

Fixes #9345. Use a modified banker's queue to achieve amortized optimal
performance for inits. The previous implementation was extremely slow.

Reviewed By: nomeata, ekmett, austin

Differential Revision: https://phabricator.haskell.org/D329

comment:33 Changed 10 months ago by nomeata

  • Resolution set to fixed
  • Status changed from patch to closed

comment:34 Changed 10 months ago by thoughtpolice

  • Milestone changed from 7.8.4 to 7.10.1

comment:35 Changed 9 months ago by dfeuer

  • Milestone changed from 7.10.1 to 7.8.4
  • Status changed from closed to merge

comment:36 Changed 9 months ago by thoughtpolice

  • Status changed from merge to closed

Merged.

Note: See TracTickets for help on using tickets.