I am writing some code in C++ and need to implement some parts in
assembly (X86 target). The code to be written in assembly performs
some calculations where high accuracy is required - the calculations
must be done using the extended double precision (80-bit, 64 bit
mantissa) of the FPU, and many of the constants are full 80-bit
extended double values, hence the requirement for hand-coded inline
assembly. (Some calculations, in fact, are done to even higher than
64-bit mantissa precision by splitting values into parts.)
The code must compile in a few different environments, and with a few
different compilers. I have the code compiling and running fine in MS
VC, using MASM. I also have the code compiling and running fine using
gcc in a Windows environment. However, I am having difficulty
compiling in a Linux environment, gcc version 4.1.2 20070115 (SUSE
Linux). I realize that this is an old version, but the version is
dictated by external software we link our libraries to.
The difficulty, I am sure, stems from my less-than-complete knowledge
and understanding of constraints. As noted, the assembler code uses
80-bit extended double constants. Whenever it needs one of these
constants, my code loads it into the FPU stack, using fldt. The
constants are stored as globals in the C++ code. Typical code I use
is:
#define CONST
//#define CONST const
//Some regular double constants.
CONST double dHalf(0.5);
CONST double s_dLowerLimit(...);
CONST double s_dUpperLimit(...);
CONST double s_dSplit_2_3(...);
CONST double s_dSplit_1_2(...);
CONST double s_dSplit_3_4(...);
//some 80-bit constants
Double80 s_oneOverRootTwoPi = { 0x68, 0x84, 0xB2, 0xA1, 0x9E, 0x29, 0x42, 0xCC, 0xFD, 0x3F};
Double80 s_ab[9] =
{
{ 0xAE, 0x08, 0xA7, 0x44, 0x2A, 0x76, 0x84, 0x86, 0xFB, 0x3F},
{ 0x69, 0x9B, 0xBF, 0xA7, 0x8D, 0x5E, 0x0E, 0x8F, 0x00, 0x40},
{ 0xEC, 0xBA, 0x2D, 0xBC, 0x26, 0x3A, 0x07, 0xA1, 0x06, 0x40},
{ 0x6C, 0x8B, 0xEB, 0xCF, 0x43, 0x10, 0x76, 0x85, 0x09, 0x40},
{ 0x58, 0x5F, 0x97, 0xD6, 0x66, 0xF6, 0xD5, 0x8D, 0x0D, 0x40},
{ 0xDB, 0xEC, 0x96, 0x7D, 0xA1, 0x71, 0xCF, 0xBC, 0x04, 0x40},
{ 0x5A, 0xC0, 0xAD, 0xF2, 0xAB, 0x4E, 0x06, 0xF4, 0x08, 0x40},
{ 0x2C, 0x3E, 0x6E, 0xE5, 0x94, 0xBA, 0x53, 0xA0, 0x0C, 0x40},
{ 0x77, 0x64, 0x3D, 0xDC, 0x11, 0xCA, 0xC3, 0xB1, 0x0E, 0x40},
};
Double80 s_cd[19] = {...};
Double80 s_pq[29] = {...};
//A function I call within the asm code, also inline asm code...
double Y(double x);
//The problematic function containing the asm code
double calculate(double x)
{
register double dResult;
__asm__
(
"\n\t" "fldl %[x]"
//... do some other stuff that leaves a value y at st(0),
// e.g. involving the constant oneOverRootTwoPi
// and all of the other constants in the input constraints below
"\n\t" "fldt %[s_oneOverRootTwoPi]" // c, x
"\n\t" "fmulp %%st(0), %%st(1)" //y=x*c
//calculate rational polynomial R(y) -
//and example of using the global 80-bit constants s_ab[]
//num
"\n\t" "fldt 00+%[s_ab]" //a0, y,
"\n\t" "fmul %%st(1), %%st(0)" //a0*y, y
"\n\t" "fldt 10+%[s_ab]" //a1, a0*y, y
"\n\t" "faddp %%st(0), %%st(1)" //a1+a0*y, y
"\n\t" "fmul %%st(1), %%st(0)"
"\n\t" "fldt 20+%[s_ab]" //a2
"\n\t" "faddp %%st(0), %%st(1)"
"\n\t" "fmul %%st(1), %%st(0)"
"\n\t" "fldt 30+%[s_ab]" //a3
"\n\t" "faddp %%st(0), %%st(1)"
"\n\t" "fmul %%st(1), %%st(0)"
"\n\t" "fldt 40+%[s_ab]" //a4
"\n\t" "faddp %%st(0), %%st(1)" //a4+(a3+(a2+(a1+a0*y)*y)*y)*y=num, y
//den
"\n\t" "fldt 40+10+%[s_ab]" //b1, num, y
"\n\t" "fadd %%st(2), %%st(0)" //b1+y, num, y
"\n\t" "fmul %%st(2), %%st(0)"
"\n\t" "fldt 40+20+%[s_ab]" //b2
"\n\t" "faddp %%st(0), %%st(1)"
"\n\t" "fmul %%st(2), %%st(0)"
"\n\t" "fldt 40+30+%[s_ab]" //b3, (b2+(b1 + y)*y)*y, num, y
"\n\t" "faddp %%st(0), %%st(1)" //b3+(b2+(b1 + y)*y)*y, num, y
"\n\t" "fmulp %%st(0), %%st(2)" //num, (b3+(b2+(b1 + y)*y)*y)*y
"\n\t" "fldt 40+40+%[s_ab]" //b4, num, (b3+(b2+(b1 + y)*y)*y)*y
"\n\t" "faddp %%st(0), %%st(2)" //num, b4+(b3+(b2+(b1 + y)*y)*y)*y=den
// num/den
"\n\t" "fdivp %%st(0), %%st(1)" // st1:=st0/st1, //num/den
: "=t" (dResult)
: [x] "m" (x)
, [dHalf] "m" (dHalf)
, [s_dLowerLimit] "m" (s_dLowerLimit)
, [s_dUpperLimit] "m" (s_dUpperLimit)
, [s_dSplit_2_3] "m" (s_dSplit_2_3)
, [s_dSplit_1_2] "m" (s_dSplit_1_2)
, [s_dSplit_3_4] "m" (s_dSplit_3_4)
, [s_oneOverRootTwoPi] "m" (*s_oneOverRootTwoPi)
, [s_NORMAL_ab] "m" (**s_NORMAL_ab)
, [s_NORMAL_cd] "m" (**s_NORMAL_cd)
, [s_NORMAL_pq] "m" (**s_NORMAL_pq)
, [Y] "r"(Y)
: "st(7)", "st(6)", "st(5)", "st(4)", "ax"
);
return dResult;
}
The above code compiles and runs fine on the Windows platform using
gcc version 4.5.2 (minGW). Indeed, the code gcc generates is perfect -
optimizations take out the indexing, (address plus index is known at
compile time), the double dResult is treated at st(0) for return
purposes, etc. etc.
However, in Linux, compiling with
gcc -fpic -O3 rational.cpp -o rational
I get the error "can't find a register in class 'GENERAL_REGS' while
reloading 'asm'". My understanding is that gcc and/or the assembler
thinks it needs a separate register for each of the "m" input
parameters, when in fact it needs no registers (or at most one with -
fpic).
What is the correct constraint to use with the inputs? I have tried
many things, but nothing seems to work.
Also, why does changing the definition of the double constants above
from plain "double" to "const double" cause the compile to fail? How
do I keep the compiler from trying to treat them as immediate double
const's? Again, likely I am using the incorrect constraint?