Opened 3 years ago

Closed 3 years ago

#9220 closed bug (fixed)

type roles for unboxed arrays

Reported by: rwbarton Owned by: goldfire
Priority: normal Milestone: 7.10.1
Component: Core Libraries Version: 7.8.1
Keywords: Cc: core-libraries-committee@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case: libraries/array/tests/T9220
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

rwbarton@morphism:~$ ghci
GHCi, version 7.8.1: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> :i Data.Array.Unboxed.UArray
type role Data.Array.Base.UArray representational phantom
data Data.Array.Base.UArray i e
  = Data.Array.Base.UArray !i
                           !i
                           {-# UNPACK #-} !Int
                           GHC.Prim.ByteArray#
  	-- Defined in ‘Data.Array.Base’
-- [instances trimmed]
Prelude> :i Data.Array.IO.IOUArray
type role Data.Array.IO.Internals.IOUArray representational phantom
newtype Data.Array.IO.Internals.IOUArray i e
  = Data.Array.IO.Internals.IOUArray (Data.Array.Base.STUArray
                                        GHC.Prim.RealWorld i e)
  	-- Defined in ‘Data.Array.IO.Internals’
Prelude> :i Data.Array.ST.STUArray
type role Data.Array.Base.STUArray nominal representational phantom
data Data.Array.Base.STUArray s i e
  = Data.Array.Base.STUArray !i
                             !i
                             {-# UNPACK #-} !Int
                             (GHC.Prim.MutableByteArray# s)
  	-- Defined in ‘Data.Array.Base’
Prelude> :i Data.Array.Storable.StorableArray
type role Data.Array.Storable.Internals.StorableArray representational phantom
data Data.Array.Storable.Internals.StorableArray i e
  = Data.Array.Storable.Internals.StorableArray !i
                                                !i
                                                Int
                                                !(GHC.ForeignPtr.ForeignPtr e)
  	-- Defined in ‘Data.Array.Storable.Internals’

These phantom roles for the element types let me create an unboxed array of one type (like Word8), cast it with coerce to another type (like Word64) and read outside the bounds of the array. I think they should all be nominal, since a newtype of an existing type could have a totally unrelated MArray or Storable instance.

Change History (29)

comment:1 Changed 3 years ago by goldfire

This is an area (arrays in Haskell) I know little about. How does this request compare to #9163, asking for Ptr to have a phantom role? Perhaps the difference is that Ptr refers to foreign objects but arrays store proper Haskell values? Sorry -- just not terribly familiar over here.

comment:2 Changed 3 years ago by simonpj

A difference is that you can index arrays, and that leads to the out-of-bounds possibilities that Reid highlights. But wouldn't reprsentational do?

Simon

comment:3 Changed 3 years ago by rwbarton

As I see it "representational" has to do with the representation of Haskell values as closures (usually on the heap). That's different from the representation of the "underlying value" in an unboxed or storable array. For example, if I needed to process packed arrays of 24-bit integers, I might create a newtype of Int and write a Storable instance for it with sizeOf _ = 3. (Admittedly this is a somewhat contrived scenario.)

The same considerations also apply to Ptr. (Indeed, a storable array is little more than a Ptr under the hood.) But Ptr is an inherently unsafe/unmanaged sort of thing. If you want to know how many items' worth of space is allocated in an area pointed to by a Ptr, you have to keep track of it yourself. Unboxed/storable arrays are supposed to be a safe abstraction on top of raw memory areas. That's why I think it's reasonable to have different defaults between unboxed/storable arrays and Ptrs. (And it is in some sense a matter of defaults, since there is still unsafeCoerce when you happen to know that the memory layouts match up properly.)

comment:4 Changed 3 years ago by rwbarton

Milestone: 7.10.1

Whatever we decide to do, we may as well decide by 7.10.1.

comment:5 Changed 3 years ago by simonpj

Owner: set to goldfire

I'm OK with Ptr being different. But if I have a UArray (Int,Int) Age, surely it's OK to cast it to UArray (Int,Int) Int, where

newtype Age = MkAge Int

After all, Int and Age are represented identically, and it might be useful to make such a bulk conversion.

Yes there might be different instances somewhere, but that's true of all uses of coerce. So I vote for the element parameter of these arrays being representational, rather than nominal. Unless someone has other arguments to make.

Richard (or Joachim), might you do this, in due course?

Simon

comment:6 Changed 3 years ago by goldfire

If I understand correctly, then, a representational role will allow users to get out-of-bounds errors, right? This would happen in the contrived scenario in comment:3.

comment:7 Changed 3 years ago by carter

@Richard, in Reid's example, that would be for Storable rather than Unboxed arrays (at least if I'm understanding this thread correctly, and thus because Storables are just wrapped up pointers, it falls under the same onus as Ptr's rather than Unboxed arrays?)

comment:8 in reply to:  6 Changed 3 years ago by rwbarton

Replying to goldfire:

If I understand correctly, then, a representational role will allow users to get out-of-bounds errors, right? This would happen in the contrived scenario in comment:3.

You won't actually get an out-of-bounds error, you'll simply read outside the memory area allocated to the array (which could cause a segfault, or leak other data, or just return nonsense).

Regarding UArray specifically, I believe it's true that all the different existing instances of IArray UArray are for non-representationally equal types (although under the hood many are actually represented identically, e.g. data Word8 = W8# Word#, data Word16 = W16# Word#; I believe these are still not inter-coerce-ible, though). And in order to write your own IArray instance you have to define functions with names like unsafeAt. So a representational role for UArray would be justifiable if ensuring that the in-array representation is identical is considered to be part of the contract when defining an IArray instance for a newtype. (One that would be satisfied automatically by GND.)

comment:9 Changed 3 years ago by goldfire

Ah -- that makes good sense then. Thanks, rwbarton. I'll take care of this in my next round of GHC hacking, sometime after ICFP.

comment:10 Changed 3 years ago by thoughtpolice

Component: libraries (other)Core Libraries

Moving over to new owning component 'Core Libraries'.

comment:11 Changed 3 years ago by goldfire

Cc: core-libraries-committee@… added

Working on this now... just clarifying that the fix is a breaking change, in that a previous use of GND will no longer work. It happened that this GND appears in the testsuite, test roles/should_compile/RolesIArray:

newtype N = MkN Word64
  deriving (IArray UArray)

With changing the roles on UArray, this no longer works.

comment:12 Changed 3 years ago by simonpj

Just to be clear, what exactly is "the fix" you mention?

  • What roles are you proposing for arrays of various sorts?
  • What goes wrong with the above instance? Is it rejecting a program that is bogus, or is it rejecting a program that "ought" to be ok?

Incidentally, I think it should definitely be part of the contract for Storable that sizeOf should return the same result for a newtype as for the type it wraps. Thus this would be BAD:

newtype N = MkN T

instance Storable T where
  sizeOf _ = 4

instance Storable N where
  sizeOf _ = 8

If N and T are representationally equaly, how could they take a different number of bytes to represent. That should be part of the documentation for Storable. I believe this is what Reid was saying in comment:8.

comment:13 Changed 3 years ago by goldfire

See my WIP commit here.

In the end, I have little idea what I'm doing here -- I was just implementing the fix I thought we had agreed on (but never concretely articulated), which is to make the element parameter to all the types mentioned in the initial report have a nominal role. Reid does seem to say in comment:8, upon closer inspection, that UArray should keep its representational role. But I'd love for him to chime in here.

I'll do what I'm told on this one!

comment:14 Changed 3 years ago by simonpj

Well I argued in comment:5 for a representational (not nominal!) role; and Reid agreed in comment:8.

Specifically, which types are "all the types mentioned in the initial report"? What roles do other array types have?

You didn't answer my second question in comment:12. Any thoughts on that?

Simon

comment:15 Changed 3 years ago by goldfire

Sorry for missing the second question.

If we change roles from representational to nominal, then some uses of GND will no longer type check. This is correct behavior. With UArray having its representational role back, the code in comment:11 will type check.

I think a fuller answer is that, according to Reid's analysis in comment:3, we don't want to be able to coerce a StorableArray i Int to a StorableArray i Age, because Age and Int might have different Storable instances. So, if GND fails, that's correct behavior.

In any case, I will wait for Reid (or someone else knowledgeable about array) to give explicit instructions on what to do.

comment:16 in reply to:  14 Changed 3 years ago by goldfire

Replying to simonpj:

Specifically, which types are "all the types mentioned in the initial report"? What roles do other array types have?

Data.Array.Unboxed.UArray
Data.Array.IO.IOUArray
Data.Array.ST.STUArray
Data.Array.Storable.StorableArray

I don't know anything in particular about the roles of other array types.

comment:17 Changed 3 years ago by carter

unboxed should DEFINITELY have representational (because the heap representation of a newtyped value is guaranteed to be the saem). I think thats pretty clearly what Reid was articulating earlier, and that Storable arrays (because of the possible non uniform storable sizes) *might* be a candidate for being NOT representation wrt off heap data, BUT that the argument might be weaker for that. Though i could be recalling this whole thing wrong.

edit: clarified language

Last edited 3 years ago by carter (previous) (diff)

comment:18 Changed 3 years ago by simonpj

Here's my summary

  • Plan A:
    • If an unboxed array of Age stores (unboxed) ages in a 16-bit field, but an unboxed array of Int stores (unboxed) ints in a 32-bit field, then it would be wrong to coerce from unboxed-array-of-Age to unboxed-array-of-Int. (The field width is described by sizeOf in the Storable class.)
    • This is not an unreasonable scenario. Perhaps you invented the newtype precisely because ages have a narrower range of values than ints.
    • In that case, unboxed arrays should have nominal role for their element type, and the GND example in comment:11 should rightfully fail.
  • Plan B:
    • An alternative is to insist that the sizeOf method should yield the same result for a newtype as for its underlying representation type.
    • Then a representational role for the element type would be justified.
    • Which would allow more expressive coercions; notably the GND in comment:11

I was originally voting for Plan B. But (a) it relies on an unenforced user convention about Storable instances, and (b) there are reasonable situations in which you might want a different width for a newtype.

So on reflection I think Plan A is probably right. If we adopt it, we should add a digest of this thread (e.g. the above Plan A/B notes) as a Note with the role declaration.

The core-libraries committee is already cc'd. Edward, we need a decision for 7.10. Thanks.

Simon

comment:19 Changed 3 years ago by carter

hrm, I see your point (now that i've caught up on sleep).

comment:20 Changed 3 years ago by ekmett

Simon,

I have code in production where the contract for a newtype and the wrapper you want to have enforced here doesn't hold -- in particular cases where the newtype _solely exists_ to create a Storable with different alignment and sizing properties. This comes up for all sorts of things, packed colors, bit arrays, etc, structures folks are marshaling for std140 compliance to/from OpenGL uniform buffers...

It seems dangerous to me to tie a contract about the behavior of a user defineable class to representability.

If we're letting users define Storable instances is seems strange to me to conflate the two notions of representation (bit representation of Storable) and the heap representation for GHC.

This puts me very much in the Plan A camp, personally.

Plan B it seems could just lead to segfaults for existing user instances out there.

With Ptr we're basically handing the user a live grenade, they know handling it wrong can lead to segfaults, and they do so with due care.

With an Array I don't think there is any such expectation of unsafety, and folks are expecting things to 'just work'.

comment:21 Changed 3 years ago by simonpj

OK I'll take that as a casting vote in favour of Plan A. Richard: would you like to execute on that (including the Note to explain)? Thanks

Simon

comment:22 Changed 3 years ago by ekmett

(Wrote the bulk of this before you replied, figured I'd send it anyways)

Hrmm, actually one could argue a stronger statement, it seems to me that Plan B is inconsistent.

How would you even construct a type for, say, endian swapping, or going to/from network order for a Word32 upon storage, you'd still want the underlying type to be a newtype, rather than just randomly throw in a bottom with an unnecessary 'data' for no reason, and I've seen both of these as instances in the wild as part of larger Storable types.

Also merely having sizeOf agree isn't sufficient to ensure correctness. Consider the above case of order swapping on storage, sizeOf gives the same answer in the above case, and if you coerced a boxed array nothing changes, but if you coerced an unboxed array you'd get all the characters spontaneously byte reversing due to effectively going through a "reinterpret_cast<>".

It strikes me that two very different things would be happening semantically there, so having UArray have a nominal element type, even though Array is able to (and should be) representational makes sense to me.

comment:23 Changed 3 years ago by simonpj

OK good.

So are boxed arrays representational? They certainly should be.

comment:24 Changed 3 years ago by ekmett

Boxed arrays are representational (just by the default roles) though both Array and UArray are currently also representational in the first argument. =/

That is dangerous for much the same reason as relying on the semantics of Storable not changing to a newtype, it is often the very point that the newtype would change the meaning of something like the Ix instance.

In fact UArray is currently even worse and gets inferred as

type role UArray representational phantom -- ack!

because the underlying constructor doesn't mention the element type at all.

We basically just ignored roles in the array package when shipping 7.8.x

--

It strikes me that we should have

type role Array nominal representational
type role UArray nominal nominal

The first argument is formed nominal by depending upon the chosen Ix instance, and in the case of UArray the second argument is forced nominal by depending upon the chosen Storable instance.

Last edited 3 years ago by ekmett (previous) (diff)

comment:25 Changed 3 years ago by goldfire

Apologies for being obtuse about this, but I can't proceed until I have a very concrete statement of what wants to be done here. Which types -- the full list -- should have what roles? And, I can attempt to write a Note, but I'm worried about getting the details wrong, and will want a double-check. I will post it here before pushing.

So, I'll stand by waiting for more direction.

comment:26 Changed 3 years ago by ekmett

Ok, lets go through array with a fine toothed comb:

type role Array nominal representational
type role IOArray nominal representational

That lets users cast the element type in the array. This is safe because e.g. instance IArray Array e is fully polymorphic in e with no constraints. The index has to be nominal as it is tied to the semantics of the internal bounds, the underlying array size, etc. Users can coerce boxed arrays to change the element types.

type role UArray nominal nominal
type role IOUArray nominal nominal
type role StorableArray nominal nominal

The index is nominal for the same reason as above, the element type is nominal because we'd otherwise silently be re-interpreting the bit representation of the underlying storables in the latter case, and the interpretation of the array size, etc. changes based on the same sort of information in UArray and IOUArray.

type role STArray nominal nominal representational
type role STUArray nominal nominal nominal

The region parameter is nominal lest users easily subvert ST safety, otherwise same as Array and UArray respectively.

It turns out there isn't a lot of representational material here, just the element type in Array, STArray and IOArray.

Every other role in the entire array package is necessarily nominal.

Does that give enough to proceed?

That should be every type used by the package, though, I think the bulk of them come from GHC.Arr.

Last edited 3 years ago by ekmett (previous) (diff)

comment:27 Changed 3 years ago by goldfire

Thanks, Edward; that's beautifully concrete. Will implement in due course.

comment:28 Changed 3 years ago by Richard Eisenberg <eir@…>

In e394e74df5ca8081c0ffd147d34e788d290fb21a/ghc:

Fix #9220 by adding role annotations.

This includes a submodule update for `array`.
There is also an added test in libraries/array/tests/T9220.

comment:29 Changed 3 years ago by goldfire

Resolution: fixed
Status: newclosed
Test Case: libraries/array/tests/T9220
Note: See TracTickets for help on using tickets.