WIP updated homebrew toolchain for PS2

A possible reason for printf %f giving wrong output: __extendsfdf2 (the function that converts a single-precision 32-bit float to a double-precision 64-bit double) is bugged.
 
Since the newlib port is now done, @fjtrujy and I have been working on upgrading the ee toolchain to:
- binutils 2.34
- gcc 9.2
- newlib 3.3
NOTE: ee only, for iop and dvp we're still using the old toolchain.

We ran into the __extendsfdf2 issue. Internally __extendsfdf2 uses __muldi3. Somehow __muldi3 gets called recursively, causing the ps2 to freeze and/or the stack to overflow. I've worked around this issue here:
https://gitlab.com/ps2max/toolchain/gcc/-/commit/6e5a7b0f802831ca3c332cd382918ab5681f1935
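For anyone who wants to check the float-to-double path on their own build, here is a minimal sketch. As far as I understand, the cast from float to double is what ends up calling __extendsfdf2 on the EE, since doubles are handled in software there. The helper names (format_widened, check_widening) are made up for this example and are not part of ps2sdk or newlib.

```c
#include <stdio.h>
#include <string.h>

/* Widen a float to double and format it with %f.
 * Assumption: on the EE this cast is lowered to a call to __extendsfdf2. */
static int format_widened(float f, char *buf, size_t len)
{
    volatile float vf = f;   /* volatile keeps the conversion at run time */
    double d = (double)vf;   /* the float -> double widening under test */
    return snprintf(buf, len, "%f", d);
}

/* Returns 1 when the formatted text matches the expected string, else 0. */
static int check_widening(float f, const char *expected)
{
    char buf[64];
    format_widened(f, buf, sizeof buf);
    return strcmp(buf, expected) == 0;
}
```

Values such as 1.5f or -0.25f are exactly representable, so check_widening(1.5f, "1.500000") should hold on any correct toolchain. A freeze or a mismatch would point at the conversion or at printf itself.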

Also today, for the first time, we got OPL running on the new toolchain:
Screenshot from 2020-03-20 11-35-49.png

There's something wrong with the background and I haven't tested if everything works. But the fact that it now boots up is already great.
 
OPL on the updated tools? Great!
Great work @Maximus32, @uyjulian and @fjtrujy!

Hm... The plasma is rendered on VU0 AFAIR, but I don't remember about that "cross"!

Retroarch is working on the new sdk/toolchain as well? Cool!

Once the SDK works, I think some kind of versioning and a way to install it in parallel with the old toolchain and SDK would be useful, along with a tutorial that tells people how to port old apps to the new SDK or how to convert other apps to PS2 apps!
 
Small update:

When calling a function with 5 or more parameters using the old ABI (eabi64), the parameters are passed as:
- r4 = $a0
- r5 = $a1
- r6 = $a2
- r7 = $a3
- r8 = $t0 <- fifth parameter

Now with the new ABI n32, the register numbers are the same, but they are named differently:
- r4 = $a0
- r5 = $a1
- r6 = $a2
- r7 = $a3
- r8 = $a4 <- fifth parameter

I'm not sure what register $t0 translates to when using the new compiler, but it isn't the correct one (r8). This is a problem for all C code calling assembly functions with 5 or more parameters.

One such example was the InvokeUserModeCallback function, used by the kernel patch for the SetAlarm function. It's fixed here. I'm sure there's more issues like these, in multiple projects.
 
Huge progress on gcc 9, but still lots of challenges ahead. Any help is appreciated!

Mostly the compiler already does a lot of things right. Autovectorization is working, and it's doing an amazing job at accelerating code using the mmi instructions. For instance:
Code:
void test_v16u8_padd (unsigned char * a, unsigned char * b, unsigned char * c) {
  int i; for(i=0;i<256;i++) a[i]=b[i]+c[i];
}
will make gcc generate an unrolled loop of MMI instructions like:
Code:
	lq	$8,0($6)
	lq	$2,0($5)
	paddb	$2,$2,$8
	sq	$2,0($4)
	lq	$8,16($6)
	lq	$2,16($5)
	paddb	$2,$2,$8
	sq	$2,16($4) etc,etc...

And something like this:
Code:
void test_v8u16_pext_h(unsigned int * a, unsigned short * b) {
  int i; for(i=0;i<256;i++) a[i]=b[i];
}
will make gcc generate an unrolled loop of MMI instructions like:
Code:
	lq	$2,0($5)
	pextlb	$6,$0,$2
	pextub	$2,$0,$2
	sq	$6,0($4)
	sq	$2,16($4)
	lq	$2,16($5)
	pextlb	$6,$0,$2
	pextub	$2,$0,$2
	sq	$6,32($4)
	sq	$2,48($4) etc,etc...
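Since the vectorized loops replace the scalar ones silently, it may be worth checking the results at run time too. Below is a small sketch that runs both functions from above against independently computed expected values. The check_* helpers and the 16-byte alignment of the buffers are my own assumptions (the alignment is only there to match quadword loads), not something taken from test_autovec.c.

```c
#include <stddef.h>

void test_v16u8_padd(unsigned char *a, unsigned char *b, unsigned char *c)
{
    int i;
    for (i = 0; i < 256; i++)
        a[i] = b[i] + c[i];
}

void test_v8u16_pext_h(unsigned int *a, unsigned short *b)
{
    int i;
    for (i = 0; i < 256; i++)
        a[i] = b[i];
}

/* Byte add must wrap modulo 256. Returns 1 on success, 0 on mismatch. */
int check_padd(void)
{
    static unsigned char a[256] __attribute__((aligned(16)));
    static unsigned char b[256] __attribute__((aligned(16)));
    static unsigned char c[256] __attribute__((aligned(16)));
    int i;

    for (i = 0; i < 256; i++) {
        a[i] = 0;
        b[i] = (unsigned char)i;
        c[i] = 200;
    }
    test_v16u8_padd(a, b, c);
    for (i = 0; i < 256; i++)
        if (a[i] != (unsigned char)((i + 200) & 0xFF))
            return 0;
    return 1;
}

/* 16-bit to 32-bit widening must zero the upper half of every word. */
int check_pext_h(void)
{
    static unsigned int   a[256] __attribute__((aligned(16)));
    static unsigned short b[256] __attribute__((aligned(16)));
    int i;

    for (i = 0; i < 256; i++) {
        a[i] = 0xFFFFFFFFu;                  /* poison the destination */
        b[i] = (unsigned short)(0xFF00u + i);
    }
    test_v8u16_pext_h(a, b);
    for (i = 0; i < 256; i++)
        if (a[i] != 0xFF00u + (unsigned int)i)
            return 0;
    return 1;
}
```

Both helpers return 1 when the output matches. Poisoning the destination with 0xFFFFFFFF is meant to catch a widening that leaves stale upper bits behind.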

A lot is working already. All autovectorization tests are here:
https://gitlab.com/ps2max/testing/test_compiler/-/blob/gcc-output-compare/test_autovec.c

And the generated assembly output of those tests is here:
https://gitlab.com/ps2max/testing/test_compiler/-/blob/gcc-output-compare/test_autovec.S

Unfortunately, sometimes gcc will also crash during compilation. See all the commented-out lines in test_autovec.c, they all fail. This is probably because gcc fails to allocate a needed register, or it fails to move the result of an operation somewhere else.
And there are still missing and/or broken features that are not yet ported correctly, like compare operators.
 
Awesome progress!

Combined with newlib, this is going to be great!

This should certainly help developers port stuff more easily and quickly, and also compile some newer things with relative ease that would otherwise need A LOT of work (at least in some cases) to backport to the old tools!

I wonder how much old apps would need to be adapted, to compile on it!
Some "old" emulators and tools would probably benefit from the updates and auto-vectorization to MMI!

It's great to have the GCC do that automatically and spare some instructions (hence CPU-Cycles), to do a "job" 4 times with one instruction.


That's really awesome!!!
 
Why do you think it is the wrong register? From this file, $t0 was always register 8.
From what I remember, they renamed some registers, like $t0-$t3 to $a4-$a7. Old code that used the register names would not compile with the new assembler, but code that used the register numbers directly is unaffected. This does not affect the assembly files assembled with GCC itself, with as_reg_compat.h included.
 
Wouldn't it be possible to "translate" the renamed registers, either ahead of time in the source-files, or during compilation (based on a script or whatever)?
 
Great progress on the work on auto vectorization. I've been meaning to look in the MIPS MSA patches, but didn't really get to it, as I've been busy doing other stuff in my free time.

I wonder if the code generation will be fast enough for blending functions (or if tables in scratchpad are faster)... Guess I'll try it out whenever the time comes.
 
I found and solved the issue by using the debugger from PCSX2. I searched in binutils/gas and found the following:
https://gitlab.com/ps2max/toolchain/binutils/-/blob/ee-toolchain-gcc9/gas/config/tc-mips.c#L2788
Code:
#define N32N64_SYMBOLIC_REGISTER_NAMES \
    {"$a4",	RTYPE_GP | 8},  \
    {"$a5",	RTYPE_GP | 9},  \
    {"$a6",	RTYPE_GP | 10}, \
    {"$a7",	RTYPE_GP | 11}, \
    {"$ta0",	RTYPE_GP | 8},  /* alias for $a4 */ \
    {"$ta1",	RTYPE_GP | 9},  /* alias for $a5 */ \
    {"$ta2",	RTYPE_GP | 10}, /* alias for $a6 */ \
    {"$ta3",	RTYPE_GP | 11}, /* alias for $a7 */ \
    {"$t0",	RTYPE_GP | 12}, \
    {"$t1",	RTYPE_GP | 13}, \
    {"$t2",	RTYPE_GP | 14}, \
    {"$t3",	RTYPE_GP | 15}

#define O32_SYMBOLIC_REGISTER_NAMES \
    {"$t0",	RTYPE_GP | 8},  \
    {"$t1",	RTYPE_GP | 9},  \
    {"$t2",	RTYPE_GP | 10}, \
    {"$t3",	RTYPE_GP | 11}, \
    {"$t4",	RTYPE_GP | 12}, \
    {"$t5",	RTYPE_GP | 13}, \
    {"$t6",	RTYPE_GP | 14}, \
    {"$t7",	RTYPE_GP | 15}, \
    {"$ta0",	RTYPE_GP | 12}, /* alias for $t4 */ \
    {"$ta1",	RTYPE_GP | 13}, /* alias for $t5 */ \
    {"$ta2",	RTYPE_GP | 14}, /* alias for $t6 */ \
    {"$ta3",	RTYPE_GP | 15}  /* alias for $t7 */
As you can see "$t0" is mapped to a different register, depending on the ABI used. The failing code was compiled by gas directly, without using as_reg_compat.h:
https://gitlab.com/ps2max/ps2sdk/-/blob/ee-toolchain-gcc9/ee/kernel/src/srcfile/src/dispatch.s
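To make the difference concrete, here is a small C sketch that models the two tables from the excerpt above and looks a name up under each one. The struct and function names (reg_name, reg_number) are made up for illustration, this is not how gas is structured internally, and the alias entries ($ta0-$ta3) are left out for brevity.

```c
#include <stddef.h>
#include <string.h>

struct reg_name {
    const char *name;
    int         num;
};

/* Entries copied from N32N64_SYMBOLIC_REGISTER_NAMES (aliases omitted). */
static const struct reg_name n32_names[] = {
    {"$a4", 8},  {"$a5", 9},  {"$a6", 10}, {"$a7", 11},
    {"$t0", 12}, {"$t1", 13}, {"$t2", 14}, {"$t3", 15},
};

/* Entries copied from O32_SYMBOLIC_REGISTER_NAMES (aliases omitted). */
static const struct reg_name o32_names[] = {
    {"$t0", 8},  {"$t1", 9},  {"$t2", 10}, {"$t3", 11},
    {"$t4", 12}, {"$t5", 13}, {"$t6", 14}, {"$t7", 15},
};

/* Returns the GPR number for a symbolic name, or -1 if the name is
 * not in the table for the chosen naming scheme. */
int reg_number(int use_n32, const char *name)
{
    const struct reg_name *tab = use_n32 ? n32_names : o32_names;
    size_t n = use_n32 ? sizeof n32_names / sizeof n32_names[0]
                       : sizeof o32_names / sizeof o32_names[0];
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return tab[i].num;
    return -1;
}
```

With these tables, reg_number(0, "$t0") gives 8 while reg_number(1, "$t0") gives 12. That is exactly the mismatch that bites hand-written assembly reading its fifth argument from $t0. Using the raw register number ($8) sidesteps the naming question entirely.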


Perhaps as_reg_compat.h needs to be updated too.
 
GCC 10 has been branched, and a release should be coming later.

For the time being, I will continue to maintain my toolchain (mainly for IOP support).
If you are working on stuff not related to the IOP, I recommend that you use the "ee-toolchain-gcc9" branch of gitlab/ps2max/ps2dev-repo instead.
 
well... actually...

I've started tracking the upstream master branches of binutils, gcc and newlib. I update daily, and right now, this is what my stable ee compiler tells me:
Code:
mips64r5900el-ps2-elf-gcc -v
Using built-in specs.
COLLECT_GCC=mips64r5900el-ps2-elf-gcc
COLLECT_LTO_WRAPPER=/home/rgaiser/dev/ps2max-ee-toolchain-gcc9/ps2dev/ee/libexec/gcc/mips64r5900el-ps2-elf/11.0.0/lto-wrapper
Target: mips64r5900el-ps2-elf
Configured with: ../../ee/gcc/configure --prefix=/home/rgaiser/dev/ps2max-ee-toolchain-gcc9/ps2dev/ee --target=mips64r5900el-ps2-elf --enable-languages=c,c++ --with-float=hard --with-newlib --disable-nls --disable-shared --disable-libssp --disable-libmudflap --disable-threads --disable-libgomp --disable-libquadmath --disable-target-libiberty --disable-target-zlib --without-ppl --without-cloog --with-headers=/home/rgaiser/dev/ps2max-ee-toolchain-gcc9/ps2dev/ee/mips64r5900el-ps2-elf/include --disable-libada --disable-libatomic --disable-multilib --enable-cxx-flags=-G0
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 11.0.0 20200430 (experimental) (GCC)

Feel free to clone or pull from this branch, but keep in mind that I rebase (and force push) almost daily.
 
Compiled using gcc master (early version 11):
Screenshot from 2020-05-05 14-29-08.png


I found the background issue, but I don't understand what's going wrong. The workaround is here:
https://gitlab.com/ps2max/Open-PS2-Loader/-/commit/0a6bdb8ac97e36d1d75042855cfc855dd4620800
I've replaced the VU0 code with C code.

It looks like the vu0 assembly is somehow different, but I don't see any obvious changes between "__GNUC__ > 3" and "__GNUC__ <= 3". Perhaps disassembling will tell the difference between the 2...

USB seems to work, but there's still issues with rebooting the IOP. This freezes opl, so I've disabled the IOP reboots, but that also means games won't boot.
 
Have you compared the assembled code with the original? The compiler may still rearrange the code (there's no .noreorder directive and it's declared as just "asm" instead of "asm volatile"). I think it could be the compiler reusing some registers, since this asm block is missing the part about clobbering used registers.
 
I tested with adding volatile, but it doesn't make a difference. I briefly tried to fill in the clobber list, but I couldn't figure out the right syntax for the vu0 registers.

I'll leave this workaround as-is so I can focus on other things.
 
Is your GCC aware of the VU registers? If not, then I suppose it shouldn't be a problem. Are you able to write the clobbered register list for just the GPRs? It looks like it's using $2.
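As a rough illustration of a GPR-only clobber list, here is a sketch of an asm block that uses $2 as a scratch register and tells the compiler about it. The function name add_via_scratch is made up, and the non-MIPS branch exists only so the snippet also builds on a host PC. Treat the MIPS branch as a sketch under the assumption that GCC accepts "$2" as a clobber name on this target.

```c
/* Adds two values, going through $2 as an explicit scratch register. */
static unsigned int add_via_scratch(unsigned int a, unsigned int b)
{
#if defined(__mips__)
    unsigned int out;

    __asm__ volatile (
        ".set\tpush\n\t"
        ".set\tnoreorder\n\t"
        "addu\t$2, %1, %2\n\t"   /* $2 is named explicitly here...       */
        "move\t%0, $2\n\t"
        ".set\tpop\n\t"
        : "=r" (out)             /* output operand                       */
        : "r" (a), "r" (b)       /* input operands                       */
        : "$2");                 /* ...so it is listed as clobbered      */
    return out;
#else
    /* Host fallback so the example compiles and runs off-target. */
    return a + b;
#endif
}
```

Without the "$2" entry the compiler is free to keep a live value in $2 across the asm block, which would then be overwritten silently.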
Other than explicitly using one of these registers, perhaps we could leave it to the compiler to allocate a register. Maybe that's even better.
For example (from ee_kmode_enter, mask is a temporary variable that is referred to as virtual register %1):
Code:
{
   u32 status, mask;

   __asm__ volatile (
     ".set\tpush\n\t"     \
     ".set\tnoreorder\n\t"     \
     ...
     "li\t%1, 0xffffffe7\n\t"   \
     ...
     ".set\tpop\n\t" : "=r" (status), "=r" (mask));
 
If you can upload the compiled ELFs, one with the C code and one with the inline asm, I can tell you what is going on.

Or if you have one compiled with a version of gcc where the inline asm works that would be preferred over the c one.

I compiled the function using gcc version 3.2.3 and then decompiled it in ghidra and got the following results.

Code:
void VU0MixVec(VU_VECTOR *a,VU_VECTOR *b,float t,VU_VECTOR *res)
{
  float fVar1;
  float y1;
  float z1;
  float w1;
  float y2;
  float z2;
  float w2;
   
  y1 = a->y;
  z1 = a->z;
  w1 = a->w;
  y2 = b->y;
  z2 = b->z;
  w2 = b->w;
  fVar1 = 1.00000000 - t;
  res->x = a->x * t + b->x * fVar1;
  res->y = y1 * t + y2 * fVar1;
  res->z = z1 * t + z2 * fVar1;
  res->w = w1 * t + w2 * fVar1;
  return;
}
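Cleaned up, the decompiled output above is a plain linear interpolation. Here is a hand-written sketch of the same math, assuming VU_VECTOR is four consecutive floats (x, y, z, w). The typedef and the name mix_vec_ref are assumptions for this example and not the actual OPL code.

```c
/* Assumed layout: four consecutive floats, as the decompilation suggests. */
typedef struct {
    float x, y, z, w;
} VU_VECTOR;

/* res = a * t + b * (1 - t), component by component.
 * The result is built in a temporary so res may alias a or b. */
void mix_vec_ref(const VU_VECTOR *a, const VU_VECTOR *b, float t,
                 VU_VECTOR *res)
{
    const float s = 1.0f - t;
    VU_VECTOR out;

    out.x = a->x * t + b->x * s;
    out.y = a->y * t + b->y * s;
    out.z = a->z * t + b->z * s;
    out.w = a->w * t + b->w * s;
    *res = out;
}
```

With t = 1 the result is a, with t = 0 it is b, and t = 0.5 gives the midpoint. Comparing the inline asm build against a reference like this for a few inputs could show which component goes wrong.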
 