IRC_Meetings: ghc-meeting-2008-08-20.log

File ghc-meeting-2008-08-20.log, 16.8 KB (added by nominolo, 7 years ago)

6th meeting

17:01 < JaffaCake> hi folks, welcome to the #ghc meeting
17:02 < JaffaCake> one thing we could talk about is this:
17:02 < lambdabot> Title: Design/BuildSystem - GHC - Trac
17:02 < JaffaCake> our plan for making the build system a bit easier to manage
17:02 <     tibbe> hi all
17:02 <     tibbe> what's the topic?
17:02 < JaffaCake> hi tibbe
17:03 < JaffaCake> the build system, if anyone is interested (link above)
17:03 < JaffaCake> the other suggestion was to talk about contributing to GHC
17:04 <     tibbe> ok
17:04              tibbe would like to fix GHC so it uses epoll but doesn't know where to start
17:04 <     tibbe> I'll read the build doc
17:04 <  malcolmw>  the plan for rejigging the build system looks good to me
17:04 < ndmitchel> JaffaCake: the build system ideas all sound very good, much more solid
17:05 < ndmitchel> i'd also suggest, as far as possible, freezing at a cabal version before a release starts
17:05 <  malcolmw> using the cabal file as the declarative meta-data is a good idea - as you know, it is one of the reasons nhc98 switched from pure Make to Make + cabal meta-data
17:06 < JaffaCake> ok, the reactions seem to be mostly positive so far
17:06 <  malcolmw> it is easier to keep multiple build-systems in sync if there is only one copy of the essential meta-data
17:07 < JaffaCake> right, that's the problem we had before the Cabal switchover, GHC had a separate copy of the metadata
17:07 < JaffaCake> I think we'll probably do this after the 6.10 fork
17:08 < JaffaCake> to try to do it for 6.10 would probably delay 6.10
17:08 <  malcolmw> seems sensible
17:09 <  gbeshers> Q: is there a consistent way to add libraries?  Ideally so that hugs and jhc can use the same mechanism?
17:09 < JaffaCake> tibbe: the place to start for epoll() would be in GHC.Conc
17:09 < JaffaCake> tibbe: that's where we call select() (in the -threaded RTS)
17:09 <     tibbe> JaffaCake: what does GHC currently do for filedescriptors it can't select on?
17:09 < ndmitchel> gbeshers: not sure what you mean? add libraries to what?
17:09 <     tibbe> JaffaCake: so select is not used in the non-threaded RTS?
17:10 <  gbeshers> ndmitchell: third party libraries (hackage).
17:10 <     tibbe> JaffaCake: btw, the build doc seems reasonable to me
17:10 < JaffaCake> tibbe: re your first question.. are there any?
17:10 <     tibbe> JaffaCake: I thought so, let me check
17:10 <     Igloo> gbeshers: Add third party libraries to what?
17:10 < JaffaCake> tibbe: for the second question, yes the non-threaded RTS uses select(), but the machinery is inside the RTS instead of in Haskell
17:11 <  gbeshers> ndmitchell: and have dependencies make sense as say debian packages would...
17:11 <     tibbe> JaffaCake: OK
17:11 <  gbeshers> Igloo: Suppose I want to add something (e.g., space usage monitor) which depends on hopengl.
17:12 <     Igloo> gbeshers: OK, so adding new core libraries?
17:12              >>> andyjgill!n=[email protected]
17:12 <     Igloo> "boot libraries", I should say
17:12 <  malcolmw> gbeshers: the referenced page about the ghc build system is _only_ talking about a small number of core libs necessary for building ghc itself.  other 3rd party libs remain on hackage, with cabal as a build system, just as before.  (and even the core libs have copies there too)
17:13 <  gbeshers> Yes: maybe this comes under "Improvements for later".
17:13 <     tibbe> JaffaCake: "On Windows, the underlying select() function is provided by the WinSock library, and does not handle file descriptors that don't originate from WinSock." says the Python documentation
17:13 < ndmitchel> gbeshers: i'm not sure these things will ever be done inside GHC, they are more hackage/cabal issues
17:13 < JaffaCake> tibbe: ah, on Windows we don't use select()
17:13 <     tibbe> JaffaCake: :) what do you use then?
17:13 < JaffaCake> separate OS threads
17:13 <     Igloo> gbeshers: For GHC you just add them to SUBDIRS in libraries/Makefile; I don't think there's really anything to be shared there
17:13 <     tibbe> JaffaCake: that's what I thought, that's how it's usually emulated
17:13 <  gbeshers> malcolmw, ndmitchell, Igloo: OK
17:14 <     tibbe> JaffaCake: a small thread pool?
17:14 < JaffaCake> tibbe: OS threads are created on demand, so there will be as many as are needed
17:14              >>> waern!i=53915df2@gateway/web/ajax/
17:15 < JaffaCake> we don't currently free them later, we probably should
17:15 < JaffaCake> but anyway on Windows we ought to be using completion ports or whatever they're called
17:15 <     tibbe> yes
17:15 <      BSP_> JaffaCake: i'm only vaguely familiar with what select() does, but does WaitForMultipleObjectsEx do what you need on windows?
17:15 <     tibbe> completion ports
17:15 <     tibbe> JaffaCake: do you use edge or leveled triggered notifications?
17:16              >>> tengvall!i=53915df2@gateway/web/ajax/
17:16 < JaffaCake> select() only gives you edge, doesn't it?
17:17 <     tibbe> let me check
17:17 < JaffaCake> BSP_: not really, you can't use all kinds of Handles with WFMO
17:17 <      BSP_> ah, ok
17:17 < JaffaCake> BSP_: it's a mess on Windows, but it's not much better on Linux
17:18 <     tibbe> JaffaCake: select is level-triggered
17:19 < JaffaCake> oh, perhaps I don't understand the difference then
17:19 < JaffaCake> select tells you when there's non-zero data available to read, or non-zero buffer space to write
17:19 <     tibbe> JaffaCake: if you call select on a file descriptor that has data it will return.
17:19 <     tibbe> JaffaCake: edge triggered only causes epoll() to return when the data becomes available
17:19 < JaffaCake> ah, I see - so edge only tells you once, level tells you every time
17:20 <     tibbe> yes
17:20 <     tibbe> apparently there's an efficiency difference
17:20 <     tibbe> we can use epoll as a level-triggered API with somewhat lower performance
17:20 <     tibbe> but it should still be better than select and poll
17:20 <     tibbe> on bsd we qould use kqueue
17:21 < JaffaCake> yep
17:21 <     tibbe> JaffaCake: then there is aio which is true async I/O (i.e. it uses callbacks)
17:22 <     Igloo> So if there's nothing available on fd n, then something becomes available, and then you call epoll in edge-only mode, does it return?
17:22 < quicksilv> I was wondering that.
17:22 < quicksilv> tibbe's description sounded like it had a race condition
17:22 <     tibbe> Igloo: yes presumably
17:23 < JaffaCake> edge does sound harder to use
17:23 <     tibbe> hmm
17:23 <     tibbe> JaffaCake: yes it is
17:23 <      BSP_> are you meant to time out the epoll then and recheck periodically?
17:23 <      pejo> tibbe, there probably needs to be a general fallback to select() since not all systems have kqeue/epoll/etc. Also, how robust is AIO implementations currently?
17:23 <     tibbe> BSP_: I guess your supposed to read all the data that become available
17:24 <     tibbe> pejo: yes, a fallback is needed, I'm not sure about AIO
17:24 <     tibbe> pejo: what we could do is to write a System.Event wrapper
17:24 <     tibbe> I started wrapping kqueue but its interface is much more annoying than epoll
17:26 < quicksilv> ideally epoll would check for 'edges since I last called epoll'
17:26 < quicksilv> that wouldn't have a race condition.
17:26 <     tibbe> then System.Poll could emulate using threads
17:26 <     tibbe> quicksilver: isn't that just level triggered?
17:26 < quicksilv> I don't know.
17:27 < JaffaCake> no, because you don't get told twice about a single transition
17:27              quicksilver reads epoll manual pages.
17:27 < JaffaCake> tibbe: so if you're interested in this, do you want to get a GHC build going and start poking around?
17:27 <     tibbe> quicksilver: I think so
17:28 < quicksilv> meta-question: why is this all so hard?
17:28 < quicksilv> everybody wants asynchronous IO and always has.
17:28 < JaffaCake> I wish I knew
17:28 <     tibbe> JaffaCake: yes, I have a few projects on my plate at the same time but I would like to get around to it
17:28 < quicksilv> why are ther 64 different incompatible APIs, and whichever one you choose someone will tell you it is poor performing.
17:28 < JaffaCake> there ought to be a general "wait for events" API, but neither Windows nor Linux seems to have one
17:29 <     tibbe> quicksilver: some of the top linux kernel guys expressed it as: you either get hard (AIO) and good performance or easy (threads) and not so good performance
17:29 <     tibbe> quicksilver: dunno if I agree though
17:29 < JaffaCake> in Haskell you should get both :)
17:29 <     tibbe> JaffaCake: libevent tries to unify a bunch, there are also similar things in c++
17:29 < JaffaCake> threads using AIO under the hood
17:29 <     tibbe> JaffaCake: yes that would be nice :)
17:30 <     tibbe> JaffaCake: yes, that's green threads
17:30 < JaffaCake> which we have, the only problem is they use select()
17:31              <   malcolmw!n=[email protected] ["got to go..."]
17:31 < JaffaCake> Igloo: btw, I think the problem is that the RTS is throwing BlockedOnDeadMVar without the appropriate wrapper
17:31 < JaffaCake> also BlockedIndefinitely
17:31 <     Igloo> Hmm, OK
17:32 <     Igloo> I'll do some grepping for all the constructors and fix any problematic ones
17:33 < JaffaCake> those are the only two I see, looking in rts/Prelude.h
17:33              >>> byorgey_!n=[email protected]
17:33              ~   byorgey_ is now byorgey
17:35 <     tibbe> JaffaCake: there's also the problem that in the cases you want to emulate async I/O you might want to tweak the thread pool parameters, I assume you can't do that now
17:35              >>> simonpj!n=simonpj@nat/microsoft/x-49d75a13ef026dc5
17:35 <     tibbe> JaffaCake: how big a problem this is in reality I don't know
17:35 < JaffaCake> why would you need to emulate async I/O?
17:38 <      SamB> hmm, sync-all doesn't seem able to play nice with tools like StGIT
17:38 < JaffaCake> StGIT?
17:39 <     tibbe> JaffaCake: like you do on windows for file system I/O
17:39 <      SamB> it's a tool for local patch management
17:39 <   LarstiQ> Stacked git, similar to quilt.
17:39 <     tibbe> JaffaCake: all APIs do not support all kinds of file descriptors
17:40 <      BSP_> SamB: if you know how to fix it please do. i've never heard of StGIT
17:40 < JaffaCake> tibbe: right, but we want to hide those details in the implementation of IO
17:40 < JaffaCake> SamB: isn't *git* a tool for local patch management?
17:40 <     tibbe> JaffaCake: sure, what I'm saying is that applications with special performance needs might need to tweak those parameters
17:40              <   dolio!n=[email protected] [Read error: 104 (Connection reset by peer)]
17:40 <      SamB> JaffaCake: no, git is the stupid content tracker
17:41 < JaffaCake> oh, ok :)
17:41 <      SamB> (lest anyone think I be insulting git, they should look at the top of the manpage and see if I didn't quote it!)
17:42 <     tibbe> it's Git design filosophy
17:42 <     tibbe> Git's*
17:42 < JaffaCake> tibbe: I'd take a wait and see approach to that - find an example that has bad performance first
17:42 <     tibbe> JaffaCake: well, it might also be the case that people look at GHC and says, this doesn't allow me to control X so I won't try
17:43 <     tibbe> JaffaCake: from experience with working with high performance systems I'd say that having some knobs to tweak is important
17:43 <     tibbe> but maybe we could make it an RTS flag
17:43 < JaffaCake> SamB: yes, but right under that it says " Git is a fast, scalable, distributed revision control system with an unusually rich command set..."
17:44 <      SamB> JaffaCake: yes, but git does NOT store patches ...
17:44              >>> dolio!n=[email protected]
17:44 < JaffaCake> tibbe: ok, but since the IO system is all in Haskell it'd be an API call
17:44 <      SamB> it stores directory trees
17:44 < JaffaCake> well, it stores patches implicitly as the difference between directory trees
17:45 <     tibbe> JaffaCake: sure, I haven't yet digested all the implications of having the RTS run the event system rather than the programmer
17:45 < JaffaCake> we don't *want* the RTS to run it, I'm sure
17:45 <     tibbe> JaffaCake: have you measured whether having an I/O manager thread hurts performance?
17:45 < JaffaCake> it improved performance, when I tried it
17:45 <     tibbe> JaffaCake: sorry, I meant built in in the threading library
17:46              >>> claus!i=d57a2b7a@gateway/web/ajax/
17:47 < JaffaCake> I'm sure you could do better by hand-coding everything and tweaking for a particular application, yes
17:47 <     tibbe> JaffaCake: Maybe a compomise would be to have a System.Event and have the threading library use that
17:48 <     tibbe> if you then don't want the threading library to do that for you you could use System.Event directly
17:48 < JaffaCake> tibbe: I like the sound of that, especially if it could be made platform-independent
17:48 <     tibbe> JaffaCake: that would be the idea, like libevent or whatever the name of the C++ equivalent
17:48 < JaffaCake> right
17:48 <     tibbe> The C++ library is called Proactor I think
17:49 <     tibbe> it uses callbacks
17:49 <     tibbe> so it has a thread internally that calls epoll or uses aio when available
17:50 < JaffaCake> our IO manager thread also handles dispatching signal handlers, btw
17:50 <     tibbe> I had an idea where you could call read with some kind of callback e.g. a left fold and have that fold be called every time new data is available thereby creating efficient enumerator style I/O
17:50 <     tibbe> JaffaCake: ok, I don't know much about that
17:51 <     tibbe> JaffaCake: does the I/O manager hand over execution to another OS thread in the threaded RTS e.g. if the handler performs extensive processing
17:51 < JaffaCake> each handler runs in a new thread
17:52 <     tibbe> hmm
17:53 <     tibbe> JaffaCake: so the IO manager requires NOINLINE global IORefs?
17:53 < JaffaCake> yep
17:53 <     tibbe> hmm
17:54 < JaffaCake> there has to be a way to communicate with the IO manager thread
17:55 < JaffaCake> ok, I have to go, bye folks
17:55 <     tibbe> I guess that's a pro with an explicit event service you have to register with
17:55 <     tibbe> it doesn't required global IORefs
17:55 <     tibbe> JaffaCake: bye
17:56 < quicksilv> I think it's an implementation detail that JaffaCake's IO manager uses global IO Refs
17:56 < quicksilv> morally, the RTS subsystem "owns" the IO monad
17:56 <     tibbe> quicksilver: what else could you do?
17:56 < quicksilv> so it woudl be allowed to add state directly to the IO monad
17:57 < quicksilv> for its own 'global variables'
17:57 < JaffaCake> quicksilver: that's my thinking, yes
17:57 <     tibbe> quicksilver: but it adds complexity if I want to add other functions that uses it
17:57 <    ghcbot> Build x86 Windows head fast #2593 finished: Failure (failed stage2)
17:57 <    ghcbot> Build details are at
17:57 <     tibbe> so if I want FFI to add some more functions that uses async I/O then I have to integrate them with that system
17:57 <     tibbe> instead of registering them with an object
17:58 <     tibbe> hmm
17:58 <     tibbe> I need to think more about this
18:00 <     tibbe> oh well, I'm off, bye
18:00 < quicksilv> at some stage you have to admit that the RTS isn't pure haskell.
18:00 < quicksilv> for convenience, it's nice to write as much of it as you can in haskell
18:01              <   jcpetruzza_!i=98510b2f@gateway/web/ajax/ []
18:01 < quicksilv> because haskell is the nicer language :)
18:01 < quicksilv> but there have to be primitives somewhere.
18:01 < ndmitchel> also, global IORef's aren't that bad, they are well accepted
18:01 < quicksilv> global IORef are extraordinarily bad.
18:01 < quicksilv> probably worse than the sack of Carthage.
18:01 < quicksilv> I definitely don't accept them :)
18:02 <   LarstiQ> quicksilver: what makes you feel that strongly?
18:02 <     tibbe> quicksilver: but it's more primitive than C on this point! C doesn
18:02 <     tibbe> C, well Linux, doesn't use a global variable for its event system
18:02 < quicksilv> LarstiQ: they are badly non-compositional.
18:03 <     tibbe> LarstiQ: they rely on not being inlined not to break!
18:03 <     tibbe> quicksilver: and what quicksilver said
18:03 <   LarstiQ> ok :)
18:03 < quicksilv> I forgive the RTS, tho', because the RTS is what defines the IO monad and its allowed to add stuff to it.
18:03 < quicksilv> however, using them in pure haskell libraries is allowing libraries to "patch things onto" the IO monad.
18:03 <     tibbe> quicksilver: sure, but there are better an worse ways to do it.
18:03 < quicksilv> which is non-compositional.
18:04 < quicksilv> because the two patches might not commute!
18:04 < quicksilv> (forgive the darcs pun :P)
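
Editor's note on the edge- versus level-triggered exchange (17:15 to 17:27): the question Igloo raises, whether an edge-triggered epoll call returns when the data arrived before the call, can be checked with a small experiment. The sketch below is not from the meeting and is not GHC code. It assumes Linux and uses Python's standard select.epoll wrapper only because it is short; the helper name ready_counts is made up for this note.

```python
import os
import select

def ready_counts(flags, polls=3):
    """Write once to a pipe, then poll `polls` times without reading.

    Returns how many of those polls reported the read end as ready.
    (Hypothetical helper for this note; Linux only, since it needs epoll.)
    """
    r, w = os.pipe()
    ep = select.epoll()
    try:
        ep.register(r, flags)
        # The data arrives *before* the first poll, as in Igloo's question.
        os.write(w, b"x")
        # Each poll waits at most 0.1 s; a non-empty result means "ready".
        return sum(1 for _ in range(polls) if ep.poll(0.1))
    finally:
        ep.close()
        os.close(r)
        os.close(w)

level = ready_counts(select.EPOLLIN)                   # level-triggered (default)
edge = ready_counts(select.EPOLLIN | select.EPOLLET)   # edge-triggered
print("level-triggered reports:", level)
print("edge-triggered reports:", edge)
```

Under these assumptions the level-triggered registration reports readiness on every poll while the unread byte sits in the pipe, and the edge-triggered one reports it once, on the first poll after the write. That matches JaffaCake's summary ("edge only tells you once, level tells you every time") and tibbe's remark that with edge triggering you are supposed to read all the data that became available.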