Opened 2 years ago

Last modified 7 months ago

#10735 new task

Smooth out the differences between `compiler/utils/Pretty.hs` and `libraries/pretty`

Reported by: thomie Owned by:
Priority: normal Milestone: 8.0.1
Component: Compiler Version: 7.0.1
Keywords: Cc: michalt
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: #1062, #1176, #7666, #8809 Differential Rev(s): Phab:D1130, Phab:D1132
Wiki Page:

Description

GHC has an internal copy of pretty in compiler/utils/Pretty.hs.

Todo:

  • refactor GHC's copy to make it as similar as possible to pretty, without making any real code changes.
  • apply the bug fixes that pretty received to GHC's copy, making sure not to pick up any possible new bugs in the process.

According to (1):

"There is one situation where the laws for pretty are ambiguous and leave room for choice. GHC decided one way and pretty the other."

  • Find which law that is, and document the differences.

This hopefully allows us to close the following tickets for free: #1062, #1176 and #7666, maybe more.

Ideally we could remove GHC's copy altogether, but we're not there yet. GHC's copy uses FastString, which is supposedly needed for performance, whereas pretty uses String.

(1) https://github.com/haskell/pretty/issues/1

Change History (27)

comment:1 Changed 2 years ago by thomie

Owner: set to thomie

I have patches.

comment:2 in reply to:  1 Changed 2 years ago by bgamari

Replying to thomie:

I have patches.

Lovely. Will we be seeing them on Phab soon?

On a related note, I've recently been thinking about error reporting with an eye for making the messages produced by GHC more useful to external tools. I've described one possible approach in ticket:8809#comment:3 which would involve making Doc a sort of free functor, allowing the insertion of rich AST objects into the document. Monadic semantics would certain patterns of projecting out these annotations into vanilla documents quite convenient.

Unfortunately, the Hughes/Peyton-Jones pretty-printer isn't terribly well-suited for this as you need to know the width of a node's parent during construction. I've been meaning to explore adopting the Wadler/Leijen pretty-printer as Edward Kmett has already demonstrated that an implementation with annotations along the lines of my proposal is possible.

comment:3 Changed 2 years ago by thomie

Ok, maybe you want to comment on https://github.com/haskell/pretty/issues/1#issuecomment-78297997, where Trevor Elliott said:

"I was hoping to extend the GHC pretty printer with the annotations, to get functionality like in the idris repl. Maybe extending their forked version with the annotations is the right path forward for that work :)"

comment:4 Changed 2 years ago by thomie

Differential Rev(s): Phab:D1130

comment:5 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In d7b053a/ghc:

Pretty: reformat using style from libraries/pretty (#10735)

This commit copies the code structure (what goes where), whitespace layout
and comments from libraries/pretty/src/Text/PrettyPrint/HughesPJ.hs,
with the intention to be able to later more easily compare the two
files, and port bug fixes.

I'm sorry this messes up git blame history, but there's no other way.

comment:6 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In 9d24b060/ghc:

Pretty: rename variables to the ones used by libraries/pretty (#10735)

comment:7 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In 25bc406/ghc:

Pretty: improve error messages (#10735)

Again, following libraries/pretty.

comment:8 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In 53484d3d/ghc:

Pretty: remove superfluous parenthesis (#10735)

comment:9 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In 2d1eae26/ghc:

Pretty: kill code that has been dead since 1997 (#10735)

comment:10 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In 6f6d082/ghc:

Pretty: Args of NilAbove/TextBeside/Nest/Union are always RDocs (#10735)

Just following libraries/pretty.

comment:11 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In 926e428/ghc:

Pretty: use BangPatterns instead of manual unboxing Ints (#10735)

Follow same style as libraries/pretty, although some of it is pretty
archaic, and could be improved with BangPatterns:
   * `get w _ | w == 0 && False = undefined`
   * `mkNest k _ | k `seq` False = undefined`

comment:12 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In f951ffc/ghc:

Pretty: mimic pretty API more closely (#10735)

Refactoring only. Nothing much to see here.

comment:13 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In 85179b58/ghc:

Pretty: use replicate for spaces and multi_ch (#10735)

Similar changes were made to pretty in commit
7575ab16430c876eaa1451b02465b6b103b3a519.

comment:14 Changed 2 years ago by thomie

Differential Rev(s): Phab:D1130Phab:D1130, Phab:D1132
Status: newpatch

comment:15 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In 67576ddc/ghc:

Pretty: bugfix fillNB (#10735)

This is a backport of a bug fix by Benedikt Huber (#2393), from commit
1e50748beaa4bd2281d323b18ea51c786bba04a1 in the pretty library.

From https://mail.haskell.org/pipermail/libraries/2008-June/009991.html:

    Law <l1> states that

    > sep (ps++[empty]++qs)   = sep (ps ++ qs)
    >         ...ditto hsep, hcat, vcat, fill...

    In the current implementation, this fails for the paragraph fill
    variants.

    > render' $ fsep [ text "c", text "c",empty, text "c", text "b"]
    >   where render' = renderStyle (Style PageMode 7 1.4)
    >> c c c
    >>     b

comment:16 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In bcfae08c/ghc:

Pretty: fix potential bad formatting of error message (#10735)

This is a backport of a bug fix by Benedikt Huber for the same problem
in the pretty library (#1337), from commit
8d8866a8379c2fe8108ef034893c59e06d5e752f. The original explanation for
the fix is attached below.

Ticket #1776 originally reported an infinite loop when printing error
message. This promptly got fixed in:

  commit 2d52ee06786e5caf0c2d65a4b4bb7c45c6493190
  Author: simonpj@microsoft.com <unknown>
  Date:   Thu Mar 1 11:45:13 2007 +0000

      Do not go into an infinite loop when pretty-printer finds a
      negative indent (Trac #1176)

SPJ reports in the ticket: "So infinite loop is fixed, but the bad
formatting remains. I've added a test, tcfail177."

tcfail177 however hasn't triggered the formatting problem for years (as
Ian reported in c9e0e6067a47c574d9ff3721afe58e30ca1be3e4).

This patch updates the test to a version that at least still failed with
ghc-7.0 (from #1776#comment:7).

-------------------

From https://mail.haskell.org/pipermail/libraries/2008-June/010013.html,
by Benedikt Huber:

    Concerning ticket #1337, we have to change the formal specification of
    fill (it doesn't match the implementation):

    -- Current Specification:
    --   fill []  = empty
    --   fill [p] = p
    --   fill (p1:p2:ps) = oneLiner p1 <#> nest (length p1)
    --                                          (fill (oneLiner p2 : ps))
    --                     `union`
    --                      p1 $$ fill ps

    Problem 1: We want to `unnest' the second argument of (p1 $$ fill ps),
    but not the first one

    In the definition above we have e.g.

    > getSecondLayout $
    >   fillDef False [text "a", text "b", text "a"]

    >> text "ab"; nilabove; nest -1; text "a"; empty
    >> |ab|
    >> |.a|

    Problem 2: The overlapping $$ should only be used for those layouts of
    p1 which aren't one liners (otherwise violating the invariant "Left
    union arg has shorter first line").

    I suggest the following specification (i believe it almost matches the
    current implementation, modulo [fillNB: fix bug #1337] (see below):

    -- Revised Specification:
    --   fill g docs = fill' 0 docs
    --   gap g       = if g then 1 else 0
    --   fill' n []  = []
    --   fill' n [p] = [p]
    --   fill' n (p1:p2:ps) =
    --      oneLiner p1 <g> (fill' (n+length p1+gap g) (oneLiner p2 : ps))
    --        `union`
    --     (p1 $*$ nest (-n) (fill' g ps))
    --
    -- $*$ is defined for layouts (One-Layout Documents) as
    --
    -- layout1 $*$ layout2 | isOneLiner layout1 = layout1 $+$ layout2
    --                     | otherwise          = layout1 $$ layout2

    I've also implemented the specification in HughesPJQuickCheck.hs,
    and checked them against the patched pretty printer.

    Concerning Bug #1337:
    ~~~~~~~~~~~~~~~~~~~~~

    If the above formal specification is fine, it is easy to fix:
    elide the nests of (oneLiner p2) [see attached patch, record bug #1337].

    > PrettyPrint(0) $ ./Bug1337
    > ....ab
    > ...c

    The (long) explanation follows below.

    <snip/>

    ===========================================================
    Explanation of Bug #1337:

    Consider

    > fcat [ nest 1 $ text "a", nest 2 $ text "b", text "c"]

    --> expected: (nest 1; text "a"; text "b"; nest -3; "c")
    --> actual  : (nest 1; text "a"; text "b"; nest -5; "c")

    Reduction:
    === (nest 1; text a) <> (fill (-2) (p2:ps))
    ==>                     (nest 2 (text "b") $+$ text "c")
    ==>                     (nest 2 (text "b")) `nilabove`
                            (nest (-3) (text "c"))
    ==> (nest 1; text a; text b; nest -5 c)

    The problem is that if we decide to layout (p1:p2:ps) as

    | p1 p2
    | ps

    (call it layout A), then we want to have

    > (p1 <> p2) $+$ ps.

    But following law <n6> this means that

    > fcat_A [p1:nest k p2:ps]

    is equivalent to

    > fcat_A [p1,p2,ps]

    so the nest of p2 has to be removed.

    This is somewhat similar to bug #667, but easier to fix
    from a semantic point of view:
    p1,p2 and ps are distinct layouts - we only have to preserve the
    individual layouts, and no combinations of them.

comment:17 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In 5d57087/ghc:

Pretty: fix a broken invariant (#10735)

This is a backport of a bug fix from
6cfbd0444981c074bae10a3cf72733bcb8597bef in libraries/pretty:

    Fix a broken invariant
    Patch from #694,  for the problem "empty is an identity for <> and $$" is
    currently broken by eg. isEmpty (empty<>empty)"

comment:18 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In 85bf76a8/ghc:

Pretty: show rational as is (#10735)

Following libraries/pretty. I'm not sure why it converted to Double
before.

This function isn't used by GHC itself. It is exported from these two
places:
  * compiler/utils/Outputable
  * libraries/template-haskell/Language/Haskell/TH/PprLib.hs

comment:19 Changed 2 years ago by thomie

There is 1 commit left for parity with pretty-1.1.2.0. I have not landed it yet, because I haven't been able to measure the supposed space/time improvement. The other 2 commits described below seem to suggest this commit doesn't actually work:

  • "Pretty: improving the space/time performance of vcat, hsep, hcat"

There are 2 commits left for parity with pretty-1.1.2.1. I have not landed them yet, because they seem to cause more allocation in the compiler (see discussion in https://github.com/haskell/pretty/pull/9):

  • "Resolve foldr-strictness stack overflow bug"
  • "Special-case reduce for horiz/vert"

The commits are here: https://github.com/thomie/ghc/commits/pretty

Last edited 2 years ago by thomie (previous) (diff)

comment:20 Changed 2 years ago by Thomas Miedema <thomasmiedema@…>

In f903949/ghc:

Pretty: improving the space/time performance of vcat, hsep, hcat (#10735)

After 5d57087e314bd484dbe14958f9b422be3ac6641a ("Pretty: fix a broken
invariant"), T3294 showed 50% more max_bytes_used (#3294). After this
commit, max_bytes_used is back to what it was before, and the test
passes again.

This is a backport of a bug fix by Benedikt Huber (#2393), from commit
1e50748beaa4bd2281d323b18ea51c786bba04a1 in the pretty library.

From https://mail.haskell.org/pipermail/libraries/2008-June/009991.html:

    vcat (hsep,cat) is implemented in an unneccessarily strict way.
    We only get some output after all of vcat's arguments are evaluated
    and checked against being Empty.
    This can be improved by only checking the right argument of foldr
    against being Empty, and
    then applying an Empty-filter on the resulting Doc. Space improvement
    is obvious.
    The microbenchmark (code.haskell.org/~bhuber/Text/PrettyPrint/
    HughesPJPerfCheck.hs) suggests that
    the improvements in time are remarkable too.

comment:21 Changed 2 years ago by thoughtpolice

Milestone: 7.12.18.0.1

Milestone renamed

comment:22 Changed 2 years ago by bgamari

Milestone: 8.0.18.2.1

It seems unlikely that this is going to happen for 8.0. Punting to 8.2.

comment:23 Changed 23 months ago by thomie

Milestone: 8.2.18.0.1
Resolution: fixed
Status: patchclosed

This is pretty much done.

For parity with pretty-1.1.2.1 the following two commits would also be needed. I did not land them, because they seem to cause more allocation in the compiler (see discussion in ​https://github.com/haskell/pretty/pull/9):

  • "Resolve foldr-strictness stack overflow bug"
  • "Special-case reduce for horiz/vert"

The commits are here: ​https://github.com/thomie/ghc/commits/pretty

comment:24 Changed 13 months ago by simonpj

Owner: thomie deleted
Resolution: fixed
Status: closednew

Re-opening, because we'd really like to start using the real pretty library for GHC. It just needs someone to finish the work Thomas started!

comment:25 Changed 13 months ago by niteria

I've made some notes about this for my convenience: https://github.com/niteria/notes/blob/master/PrettyPrintingGHC.md#problems, it summarizes some issues around this.

comment:26 Changed 10 months ago by michalt

Cc: michalt added

comment:27 Changed 7 months ago by bgamari

Note that adinapoli has been doing some great work on this: https://github.com/haskell/pretty/pull/43.

Note: See TracTickets for help on using tickets.