Opened 3 years ago

Last modified 2 months ago

#11143 new feature request

Feature request: Add index/read/write primops with byte offset for ByteArray#

Reported by: vagarenko Owned by: sjakobi
Priority: normal Milestone:
Component: Compiler Version: 7.10.2
Keywords: newcomer Cc: Jaffacake
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Test Case:
Blocked By: Blocking:
Related Tickets: #4442 Differential Rev(s): Phab:D4433
Wiki Page:

Description

Currently, primops for indexing ByteArray# and reading/writing MutableByteArray# have the following form:

indexTYPEArray# :: ByteArray#          -> Int# -> TYPE#
readTYPEArray#  :: MutableByteArray# s -> Int# -> State# s -> (#State# s, TYPE##)
writeTYPEArray# :: MutableByteArray# s -> Int# -> TYPE# -> State# s -> State# s

where second argument of type Int# is an offset measured in terms of the size of TYPE#.

This is inconvinient if I want to store values of different types inside ByteArray#: I have to read values of type Int8 and then glue them together with some bitwise operations.

I suggest adding number of primops, similar to existing ones, which would accept offset in bytes from the start of the ByteArray#:

-- | Read 8-bit integer; offset in bytes.
indexByteInt8Array#  :: ByteArray# -> Int# -> Int#

-- | Read 16-bit integer; offset in bytes.
indexByteInt16Array# :: ByteArray# -> Int# -> Int#

-- | Read 32-bit integer; offset in bytes.
indexByteInt32Array# :: ByteArray# -> Int# -> Int#

-- | Read 8-bit integer; offset in bytes.
readInt8Array#  :: MutableByteArray# s -> Int# -> State# s -> (#State# s, Int##)

-- | Read 16-bit integer; offset in bytes.
readInt16Array# :: MutableByteArray# s -> Int# -> State# s -> (#State# s, Int##)

-- | Read 32-bit integer; offset in bytes.
readInt32Array# :: MutableByteArray# s -> Int# -> State# s -> (#State# s, Int##)

and so on...

Change History (17)

comment:1 Changed 3 years ago by thomie

Type of failure: None/UnknownRuntime performance bug

comment:2 Changed 2 years ago by Mathnerd314

Can we instead have primops which take both an offset measured in bytes and an offset measured in terms of the type?

indexTYPEArray# :: ByteArray#          -> Int# {-byte offset-} -> Int# {-type offset-} -> TYPE#
readTYPEArray#  :: MutableByteArray# s -> Int# {-byte offset-} -> Int# {-type offset-} -> State# s -> (#State# s, TYPE##)
writeTYPEArray# :: MutableByteArray# s -> Int# {-byte offset-} -> Int# {-type offset-} -> TYPE# -> State# s -> State# s

indexTYPEOffAddr# :: Addr# -> Int# {-byte offset-} -> Int# {-type offset-} -> TYPE
readTYPEOffAddr# :: Addr# -> Int# {-byte offset-} -> Int# {-type offset-} -> State# s -> (#State# s, TYPE ##) 
writeTYPEOffAddr# :: Addr# -> Int# {-byte offset-} -> Int# {-type offset-} -> TYPE -> State# s -> State# s 

All of these go through the mkBasicIndexed{Read,Write} functions, which take both a byte offset and a type offset, so it seems reasonable to expose that.

comment:3 Changed 22 months ago by bgamari

Milestone: 8.2.1

It seems unlikely that this will happen for 8.2.

comment:4 Changed 15 months ago by bgamari

Keywords: newcomers added

If anyone is interested in picking this up do ping me. It should be a relatively straightforward patch.

comment:5 in reply to:  2 Changed 9 months ago by sjakobi

Cc: sjakobi added

Replying to Mathnerd314:

Can we instead have primops which take both an offset measured in bytes and an offset measured in terms of the type?

indexTYPEArray# :: ByteArray#          -> Int# {-byte offset-} -> Int# {-type offset-} -> TYPE#
readTYPEArray#  :: MutableByteArray# s -> Int# {-byte offset-} -> Int# {-type offset-} -> State# s -> (#State# s, TYPE##)
writeTYPEArray# :: MutableByteArray# s -> Int# {-byte offset-} -> Int# {-type offset-} -> TYPE# -> State# s -> State# s

indexTYPEOffAddr# :: Addr# -> Int# {-byte offset-} -> Int# {-type offset-} -> TYPE
readTYPEOffAddr# :: Addr# -> Int# {-byte offset-} -> Int# {-type offset-} -> State# s -> (#State# s, TYPE ##) 
writeTYPEOffAddr# :: Addr# -> Int# {-byte offset-} -> Int# {-type offset-} -> TYPE -> State# s -> State# s 

I like these types.

All of these go through the mkBasicIndexed{Read,Write} functions, which take both a byte offset and a type offset, so it seems reasonable to expose that.

I currently don't see how this can be done. These functions require a byte offset with type ByteOff (Int) but we only have a CmmExpr. It seems to me that the new primops will require quite a bit of new plumbing down to CmmRegOff. Am I missing something?

comment:6 Changed 9 months ago by sjakobi

Cc: sjakobi removed
Differential Rev(s): Phab:D4433
Owner: set to sjakobi

I have implemented a first primop which was much less code than I had expected. Please review the attached Phab.

comment:7 in reply to:  2 ; Changed 9 months ago by vagarenko

Can we instead have primops which take both an offset measured in bytes and an offset measured in terms of the type?

indexTYPEArray# :: ByteArray#          -> Int# {-byte offset-} -> Int# {-type offset-} -> TYPE#
readTYPEArray#  :: MutableByteArray# s -> Int# {-byte offset-} -> Int# {-type offset-} -> State# s -> (#State# s, TYPE##)
writeTYPEArray# :: MutableByteArray# s -> Int# {-byte offset-} -> Int# {-type offset-} -> TYPE# -> State# s -> State# s

indexTYPEOffAddr# :: Addr# -> Int# {-byte offset-} -> Int# {-type offset-} -> TYPE
readTYPEOffAddr# :: Addr# -> Int# {-byte offset-} -> Int# {-type offset-} -> State# s -> (#State# s, TYPE ##) 
writeTYPEOffAddr# :: Addr# -> Int# {-byte offset-} -> Int# {-type offset-} -> TYPE -> State# s -> State# s 

All of these go through the mkBasicIndexed{Read,Write} functions, which take both a byte offset and a type offset, so it seems reasonable to expose that.

I'm confused. Why do you want this? Isn't byte offset here is just type offset * sizeof type?

comment:8 in reply to:  7 ; Changed 9 months ago by sjakobi

Replying to vagarenko:

I'm confused. Why do you want this? Isn't byte offset here is just type offset * sizeof type?

While I'm not the requester, I think the types that Mathnerd314 proposes have two advantages:

  • They relieve the user from doing the byte offset computation.
  • They are different from the preexisting types for the indexTYPEArray functions, thereby making it impossible to confuse the two.

comment:9 in reply to:  8 ; Changed 9 months ago by vagarenko

Replying to sjakobi:

Replying to vagarenko:

I'm confused. Why do you want this? Isn't byte offset here is just type offset * sizeof type?

While I'm not the requester, I think the types that Mathnerd314 proposes have two advantages:

  • They relieve the user from doing the byte offset computation.

Then what does that Int# {-byte offset-} parameter mean?

comment:10 in reply to:  9 ; Changed 9 months ago by sjakobi

Replying to vagarenko:

Replying to sjakobi:

  • They relieve the user from doing the byte offset computation.

Then what does that Int# {-byte offset-} parameter mean?

What I meant is that the user doesn't need to compute the total offset. While a user may use the API you proposed like

indexByteInt16Array# ba (byte_offset + type_offset * int16_size)

the other API offers

indexByteInt16Array# ba byte_offset type_offset

which IMO is simply more concise and convenient.

I hope this answers your question.

comment:11 in reply to:  10 ; Changed 9 months ago by vagarenko

Replying to sjakobi:

Replying to vagarenko:

Replying to sjakobi:

  • They relieve the user from doing the byte offset computation.

Then what does that Int# {-byte offset-} parameter mean?

What I meant is that the user doesn't need to compute the total offset. While a user may use the API you proposed like

indexByteInt16Array# ba (byte_offset + type_offset * int16_size)

I still don't understand. What is total offset? What are byte_offset and type_offset?

By byte_offset you mean number of bytes from the start of the array ba to the sought value of type Int#, correct?

Then what is type_offset? Number of Int16 elements before the sought element? But my motivation for this ticket was to be able to store values of different types in a ByteArray#.

comment:12 in reply to:  11 Changed 9 months ago by sjakobi

Replying to vagarenko:

I don't understand. What is total offset? What are byte_offset and type_offset?

By byte_offset you mean number of bytes from the start of the array ba to the sought value of type Int#, correct?

Then what is type_offset? Number of Int16 elements before the sought element? But my motivation for this ticket was to be able to store values of different types in a ByteArray#.

Yes, sorry for my lack of clarity and thanks for putting it in a nutshell!

What I assume is Mathnerd314's motivation is that even when a ByteArray# contains several types, it may contain a sequence of values of the same type.

For example you may want to serialize a vector of Int32s by first writing the length of the vector as a Word64 followed by the values.

comment:13 Changed 9 months ago by sjakobi

bgamari commented on D4433:

If I'm not mistaken this also needs to take care to avoid unaligned loads and stores on architectures that do not support such things.

I could need some guidance on how to do that. Is there an existing machinery for working around unaligned operations that I could use?

comment:14 Changed 9 months ago by bgamari

Cc: Jaffacake added

I don't believe so; this is one of the reasons the primops look like they do: they allow us to avoid dealing with alignment headaches. Frankly, I'm not even sure what sort of alignment guarantees our current C-- load and store nodes expect. Jaffacake, could you comment on this?

We do have a list of architectures for which we need to worry about alignment (essentially everything but amd64; see PprC.cLoad). To figure out how to lower these operations I would likely just use a C compiler. For instance, compile a test program like,

#include <stdint.h>

struct { uint32_t x; } __attribute__((packed)) *x;

void store() {
    x->x = 42;
}

with a cross-compiler and see what gets produced.

comment:15 Changed 8 months ago by sjakobi

comment:16 Changed 3 months ago by jberryman

I would also appreciate this. My use case is I'd like, for performance and coding simplicity, to be able to do potentially unaligned reads of Word64 when we're compiled for x86_64 arch.

Library writers need to think about alignment when using the Addr interface to pinned ByteArrays, so I don't think it's a big deal to expose this (though docs could use improvement, see https://ghc.haskell.org/trac/ghc/ticket/14731)

That said I would love it if the compiler didn't push this complexity down to library writers (both here and in the Addr load/store primops), and handled twiddling on architectures that don't support unaligned reads. I would like my code to be compatible but don't particularly care if there is a performance hit (99% of people are using x86). The alternative is I just have to write my own slow code and put it behind a CPP pragma.

But it's not clear to me if vagarenko actually wants to do any unaligned reads (and I don't want to hijack this request). Perhaps they just want to do aligned reads on their heterogeneous structure. In that case you can write your own implementation easily by dividing the byte offset provided by the width of the payload you're requesting.

comment:17 Changed 2 months ago by monoidal

Keywords: newcomer added; newcomers removed
Note: See TracTickets for help on using tickets.