Opened 9 years ago

Closed 3 years ago

#2917 closed bug (duplicate)

alloca and allocaArray do not respect alignment

Reported by: guest Owned by:
Priority: normal Milestone:
Component: Compiler (FFI) Version: 6.12.3
Keywords: Cc: lennart@…, danieldiaz@…, ghc@…, dterei
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Test Case:
Blocked By: Blocking:
Related Tickets: Differential Rev(s):
Wiki Page:

Description

When allocating memory with alloca or allocaArray the alignment of the Storable is not respected, alignment seems to be on 8 byte boundary. malloc and mallocArray seem to have the same problem. And because of this functions like withArray etc also break the alignment restriction of the array element.

Run this and look at the pointer.

import Foreign.Ptr
import Foreign.Storable
import Foreign.Marshal.Array
import Foreign.Marshal

data Foo = Foo

instance Storable Foo where
    sizeOf _ = 8
    alignment _ = 256

main =
    allocaArray 5 $ \ p -> do
    let _ = p :: Ptr Foo
    print p
    q <- mallocArray 5
    let _ = q :: Ptr Foo
    print q

Attachments (2)

alignment.patch (6.8 KB) - added by guest 9 years ago.
Patch to use alignment.
align.patch (21.3 KB) - added by guest 9 years ago.
New alignment patch.

Download all attachments as: .zip

Change History (29)

Changed 9 years ago by guest

Attachment: alignment.patch added

Patch to use alignment.

comment:1 Changed 9 years ago by guest

I've added a patch to use the alignment when allocating.

comment:2 Changed 9 years ago by simonmar

difficulty: Unknown

Thanks for the patch. As it stands, this will waste memory when the required alignment is <= 8 bytes, since allocaBytes and mallocBytes both always return 8-byte aligned memory. Could you modify the patch to take that into account?

comment:3 Changed 9 years ago by guest

Trying to fix my patch to not waste memory I realized that the patch is broken. It takes a pointer from some internal malloc and adjusts it for alignment. This will not work when that pointer is later given to free.

I'm not sure how to fix this. It seems we need to have the low level allocator return a pointer that is properly aligned, otherwise giving the block back to the low level free will never work.

In short, I don't know how to fix this.

Note that this is a real problem. Using SSE instructions on the x86 requires the vectors to be properly aligned, otherwise you get a fault. So I can't use any of the allocators for Storable in Foreign (nor Data.Array.Storable) when allocating vectors.

A hack would be to have the allocator return everything 16 byte aligned, but that really is a hack. The longer term solution must take alignment into account.

-- Lennart

comment:4 Changed 9 years ago by guest

I'm working on a new patch that I think will work, except it relies on posix_memalign() which seems to be a rare beast.

comment:5 Changed 9 years ago by simonmar

Yes, good point. The allocaBytes version is fine, btw (except that it should avoid wasting memory when alignement <= 8). posix_memalign() is the only way I know to allocate aligned memory from the OS.

Changed 9 years ago by guest

Attachment: align.patch added

New alignment patch.

comment:6 Changed 9 years ago by igloo

Milestone: 6.10.2
Priority: normalhigh

comment:7 Changed 9 years ago by igloo

I'm a little nervous about rushing this in, particularly as it changes (adds to) some APIs, and I'm not sure exactly what the behaviour should be.

You say

mallocAlignment :: Int
mallocAlignment = 16
allocaAlignment :: Int
allocaAlignment = 8

but my manpage says

GNU libc malloc() always returns 8-byte aligned  memory  addresses

Where did 16 come from? It also seems strange to me that allocaAlignment should be different to mallocAlignment.

According to my OS X manpages, OS X doesn't have posix_memalign or memalign, although it does have valloc ("The allocated memory is aligned on a page boundary."). However, for all the allocation functions on OS X, "The allocated memory is aligned such that it can be used for any data type, including AltiVec- and SSE-related types.". So if we're told the alignment is 32, and we get memory 16-byte aligned, should we fail? It's not clear to me.

comment:8 Changed 9 years ago by guest

The mallocAlignment is the alignment returned by C malloc(), the allocaAlignment is the alignment for ghc's heap based allocation. I should have spelled that out. Those two things should not be numbers, of course, but determined by the configure script.

yes, OS X malloc() works fine, but ghc's internal allocation fails since it allocates on 8 byte boundary.

I don't want to rush this in either. But the bug makes large parts of the FFI library unusable with LLVM, so I'd like some fix. How about a band aid that changes ghc's internal alloca to 16 byte boundary? That will make things work, and we have a chance to test my patch further.

-- Lennart

comment:9 Changed 9 years ago by simonmar

Owner: set to simonmar

I'll do as Lennart suggests and bump the default alignment to 16 bytes.

comment:10 Changed 9 years ago by simonmar

Architecture: x86Unknown/Multiple
Owner: changed from simonmar to igloo
Type: bugmerge

Done:

Thu Feb 19 02:32:45 PST 2009  Simon Marlow <marlowsd@gmail.com>
  * newPinnedByteArray#: align the result to 16-bytes (part of #2917)

comment:11 Changed 9 years ago by igloo

Type: mergebug

That patch was buggy (segfaults on 32-bit platforms, amongst other problems), so I've rolled it back.

comment:12 Changed 9 years ago by igloo

See also #2983.

comment:13 Changed 9 years ago by igloo

Owner: changed from igloo to simonmar

comment:14 Changed 9 years ago by simonmar

Owner: changed from simonmar to igloo
Type: bugmerge

alloca now aligns correctly according to the Storable instance, and allocaBytes now aligns to 16 bytes. malloc and mallocBytes haven't changed (yet). Perhaps we should use posix_memalign where possible.

Ian: please merge then bump to 6.12.

Thu Mar  5 07:41:53 PST 2009  Simon Marlow <marlowsd@gmail.com>
  * Partial fix for #2917
  Ignore-this: 3a06cd3ea09f1d6454d52031802a93fd
  
   - add newAlignedPinnedByteArray# for allocating pinned BAs with
     arbitrary alignment
  
   - the old newPinnedByteArray# now aligns to 16 bytes
  
  Foreign.alloca will use newAlignedPinnedByteArray#, and so might end
  up wasting less space than before (we used to align to 8 by default).
  Foreign.allocaBytes and Foreign.mallocForeignPtrBytes will get 16-byte
  aligned memory, which is enough to avoid problems with SSE
  instructions on x86, for example.
  
  There was a bug in the old newPinnedByteArray#: it aligned to 8 bytes,
  but would have failed if the header was not a multiple of 8
  (fortunately it always was, even with profiling).  Also we
  occasionally wasted some space unnecessarily due to alignment in
  allocatePinned().
  
  I haven't done anything about Foreign.malloc/mallocBytes, which will
  give you the same alignment guarantees as malloc() (8 bytes on
  Linux/x86 here).

comment:15 Changed 9 years ago by igloo

Resolution: fixed
Status: newclosed

Merged:

Wed Mar 11 13:26:17 PDT 2009  Ian Lynagh <igloo@earth.li>
  * Merge the #2917 fixes for the 6.10 branch

comment:16 Changed 9 years ago by igloo

Milestone: 6.10.26.12 branch
Resolution: fixed
Status: closedreopened

comment:17 Changed 9 years ago by igloo

Type: mergebug

comment:18 Changed 9 years ago by igloo

Owner: igloo deleted
Status: reopenednew

comment:19 Changed 9 years ago by igloo

See also #1605.

comment:20 Changed 7 years ago by igloo

Milestone: 6.12 branch6.14.1
Type of failure: None/Unknown

comment:21 Changed 7 years ago by simonmar

Also fixed allocaArray:

Thu Aug 12 03:55:24 PDT 2010  Simon Marlow <marlowsd@gmail.com>
  * export allocaBytesAligned; make allocaArray use the correct alignment (#2917)

comment:22 Changed 7 years ago by simonmar

Milestone: 6.14.1_|_
Priority: highnormal

Taking it off the milestone. The only remaining things to do are malloc and mallocBytes: these use C malloc, which satisfies the minimum alignment requirement for any native datatype, so there shouldn't be a problem in most cases.

comment:23 Changed 7 years ago by guest

Cc: danieldiaz@… added
Version: 6.10.16.12.3

I wasted several hours to find out that my LLVM related Haskell code is also affected by this problem (I found it in mallocArray). I already found out, that LLVM's malloc intrinsic is broken in this way, but I expected Haskell's malloc is correct in this respect. My malloc(C) manpage says: "For calloc() and malloc(), return a pointer to the allocated memory, which is suitably aligned for any kind of variable." But sometimes it returns pointers that are not divisible by 16, only by 8. Thus it seems, that malloc or its manpage is buggy.

My solution is to allocate a memory chunk that is 16+sizeof(ptr) bytes larger. Then I choose a start address within that chunk that is divisible by 16 and leaves space for a pointer to the full allocated chunk, that I store just before that 16-byte-aligned address.

#include <stdlib.h>

void *aligned_malloc(size_t size) {
  const int ptrsize = sizeof(void *);
  void *ptr = malloc(size+16+ptrsize);
  if (ptr) {
    void **alignedptr = (void **)((size_t)(ptr+16+ptrsize) & (-16));
    *(alignedptr-1) = ptr;
    return alignedptr;
  } else {
    return NULL;
  }
};

void aligned_free(void *alignedptr) {
  void **sptr = (void **) alignedptr;
  void *ptr = *(sptr - 1);
  free(ptr);
};

comment:24 in reply to:  22 Changed 7 years ago by Lemming

Cc: ghc@… added

Replying to simonmar:

Taking it off the milestone. The only remaining things to do are malloc and mallocBytes: these use C malloc, which satisfies the minimum alignment requirement for any native datatype, so there shouldn't be a problem in most cases.

On my machine the 'malloc' manpage says, that 'malloc' aligns correctly for any variable, but apparently it does not. Maybe the statement was true before the 16-byte vectors of SSE. We are not the first ones who use SSE, how could the problem be missed by the kernel developers? So I'm uncertain whether the behaviour of 'malloc' is intended or not and thus whether we should fix Haskell libraries or the Linux kernel.

comment:25 Changed 7 years ago by Lemming

For a wrapper around 'malloc' we could waste less memory, if we would require that 'free' has a Storable constraint and requires that its pointer type matches the one of the corresponding 'malloc'. With the size and alignment information, 'malloc' and 'free' could decide whether to pad at all. Even if they pad, they can save memory by putting the pointer to full allocated block after the block of aligned data. E.g. compare the layout, where 'p' is a byte of the pointer to the full allocated block, 'a' is a byte of 16-byte aligned data, and '*' is allocated but unused (=wasted) padding byte.

0           0   1               2              2
0           C   0               0              F
************ppppaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

0               1               2  2
0               0               0  3
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaapppp

comment:26 Changed 6 years ago by dterei

Cc: dterei added

comment:27 Changed 3 years ago by simonmar

Resolution: duplicate
Status: newclosed

reported as #9806

Note: See TracTickets for help on using tickets.