Hey, we started the Linuxdoc.org blog. You want to write a guest post about your favorite Linux related topic? Just send us a mail.The Linux community would appreciate it.

I am writing some code in C++ and need to implement some parts in assembly (X86 target). The code to be written in assembly performs some calculations where high accuracy is required - the calculations must be done using the extended double precision (80-bit, 64 bit mantissa) of the FPU, and many of the constants are full 80-bit extended double values, hence the requirement for hand-coded inline assembly. (Some calculations, in fact, are done to even higher than 64-bit mantissa precision by splitting values into parts.)

The code must compile in a few different environments, and with a few different compilers. I have the code compiling and running fine in MS VC, using MASM. I also have the code compiling and running fine using gcc in a Windows environment. However, I am having difficulty compiling in a Linux environment, gcc version 4.1.2 20070115 (SUSE Linux). I realize that this is an old version, but the version is dictated by external software we link our libraries to.

The difficulty, I am sure, stems from my less-than-complete knowledge and understanding of constraints. As noted, the assembler code uses 80-bit extended double constants. Whenever it needs one of these constants, my code loads it into the FPU stack, using fldt. The constants are stored as globals in the C++ code. Typical code I use is:

#define CONST
//#define CONST const

//Some regular double constants.
CONST double dHalf(0.5);
CONST double s_dLowerLimit(...);
CONST double s_dUpperLimit(...);
CONST double s_dSplit_2_3(...);
CONST double s_dSplit_1_2(...);
CONST double s_dSplit_3_4(...);

//some 80-bit constants
Double80 s_oneOverRootTwoPi = { 0x68, 0x84, 0xB2, 0xA1, 0x9E, 0x29, 0x42, 0xCC, 0xFD, 0x3F};
Double80 s_ab[9] =
{
    { 0xAE, 0x08, 0xA7, 0x44, 0x2A, 0x76, 0x84, 0x86, 0xFB, 0x3F},
    { 0x69, 0x9B, 0xBF, 0xA7, 0x8D, 0x5E, 0x0E, 0x8F, 0x00, 0x40},
    { 0xEC, 0xBA, 0x2D, 0xBC, 0x26, 0x3A, 0x07, 0xA1, 0x06, 0x40},
    { 0x6C, 0x8B, 0xEB, 0xCF, 0x43, 0x10, 0x76, 0x85, 0x09, 0x40},
    { 0x58, 0x5F, 0x97, 0xD6, 0x66, 0xF6, 0xD5, 0x8D, 0x0D, 0x40},
    { 0xDB, 0xEC, 0x96, 0x7D, 0xA1, 0x71, 0xCF, 0xBC, 0x04, 0x40},
    { 0x5A, 0xC0, 0xAD, 0xF2, 0xAB, 0x4E, 0x06, 0xF4, 0x08, 0x40},
    { 0x2C, 0x3E, 0x6E, 0xE5, 0x94, 0xBA, 0x53, 0xA0, 0x0C, 0x40},
    { 0x77, 0x64, 0x3D, 0xDC, 0x11, 0xCA, 0xC3, 0xB1, 0x0E, 0x40},
};

Double80 s_cd[19] = {...};
Double80 s_pq[29] = {...};

//A function I call within the asm code, also inline asm code...
double Y(double x);

//The problematic function containing the asm code
double calculate(double x)
{
    register double dResult;
    __asm__
    (
        "\n\t" "fldl %[x]"
        //... do some other stuff that leaves a value y at st(0),
                    // e.g. involving the constant oneOverRootTwoPi
        //    and all of the other constants in the input constraints below
        "\n\t" "fldt %[s_oneOverRootTwoPi]" // c, x
        "\n\t" "fmulp %%st(0), %%st(1)" //y=x*c
        //calculate rational polynomial R(y) - 
        //and example of using the global 80-bit constants s_ab[]
        //num
        "\n\t" "fldt 00+%[s_ab]" //a0, y,
        "\n\t" "fmul %%st(1), %%st(0)" //a0*y, y
        "\n\t" "fldt 10+%[s_ab]" //a1, a0*y, y
        "\n\t" "faddp %%st(0), %%st(1)" //a1+a0*y, y
        "\n\t" "fmul %%st(1), %%st(0)"
        "\n\t" "fldt 20+%[s_ab]" //a2
        "\n\t" "faddp %%st(0), %%st(1)"
        "\n\t" "fmul %%st(1), %%st(0)"
        "\n\t" "fldt 30+%[s_ab]" //a3
        "\n\t" "faddp %%st(0), %%st(1)"
        "\n\t" "fmul %%st(1), %%st(0)"
        "\n\t" "fldt 40+%[s_ab]" //a4
        "\n\t" "faddp %%st(0), %%st(1)" //a4+(a3+(a2+(a1+a0*y)*y)*y)*y=num, y
        //den
        "\n\t" "fldt 40+10+%[s_ab]" //b1, num, y
        "\n\t" "fadd %%st(2), %%st(0)" //b1+y, num, y
        "\n\t" "fmul %%st(2), %%st(0)"
        "\n\t" "fldt 40+20+%[s_ab]" //b2
        "\n\t" "faddp %%st(0), %%st(1)"
        "\n\t" "fmul %%st(2), %%st(0)"
        "\n\t" "fldt 40+30+%[s_ab]" //b3, (b2+(b1 + y)*y)*y, num, y
        "\n\t" "faddp %%st(0), %%st(1)" //b3+(b2+(b1 + y)*y)*y, num, y
        "\n\t" "fmulp %%st(0), %%st(2)" //num, (b3+(b2+(b1 + y)*y)*y)*y
        "\n\t" "fldt 40+40+%[s_ab]" //b4, num, (b3+(b2+(b1 + y)*y)*y)*y
        "\n\t" "faddp %%st(0), %%st(2)" //num, b4+(b3+(b2+(b1 + y)*y)*y)*y=den
        // num/den
        "\n\t" "fdivp %%st(0), %%st(1)" // st1:=st0/st1, //num/den
        : "=t" (dResult)
        : [x] "m" (x)
        , [dHalf] "m" (dHalf)
        , [s_dLowerLimit] "m" (s_dLowerLimit)
        , [s_dUpperLimit] "m" (s_dUpperLimit)
        , [s_dSplit_2_3] "m" (s_dSplit_2_3)
        , [s_dSplit_1_2] "m" (s_dSplit_1_2)
        , [s_dSplit_3_4] "m" (s_dSplit_3_4)
        , [s_oneOverRootTwoPi] "m" (*s_oneOverRootTwoPi)
        , [s_NORMAL_ab] "m" (**s_NORMAL_ab)
        , [s_NORMAL_cd] "m" (**s_NORMAL_cd)
        , [s_NORMAL_pq] "m" (**s_NORMAL_pq)
        , [Y] "r"(Y)
        : "st(7)", "st(6)", "st(5)", "st(4)", "ax"
    );
    return dResult;
}

The above code compiles and runs fine on the Windows platform using gcc version 4.5.2 (minGW). Indeed, the code gcc generates is perfect - optimizations take out the indexing, (address plus index is known at compile time), the double dResult is treated at st(0) for return purposes, etc. etc.

However, in Linux, compiling with

gcc -fpic -O3 rational.cpp -o rational

I get the error "can't find a register in class 'GENERAL_REGS' while reloading 'asm'". My understanding is that gcc and/or the assembler thinks it needs a separate register for each of the "m" input parameters, when in fact it needs no registers (or at most one with - fpic).

What is the correct constraint to use with the inputs? I have tried many things, but nothing seems to work.

Also, why does changing the definition of the double constants above from plain "double" to "const double" cause the compile to fail? How do I keep the compiler from trying to treat them as immediate double const's? Again, likely I am using the incorrect constraint?

asked 30 May '11, 10:53

David%20I%20McIntosh's gravatar image

David I McIn...
1111
accept rate: 0%

Be the first one to answer this question!
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or __italic__
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Tags:

×1
×1
×1

Asked: 30 May '11, 10:53

Seen: 2,324 times

Last updated: 30 May '11, 10:53

Related questions

we use OSQA