PS2 Julian's various PS2 projects (Worklog)

It looks like SDL2 has simplified the implementation required for each API backend of SDL_Renderer.

I'd like to add PS2 support to SDL2, so that is another thing I may look into once I'm less busy after this month.

So basically, some things I would like to finish up after this month:
* Arcade module RE/reimplementation
* Fix libmpeg2 for new toolchain
* Implement DVRP interface changes/additions for the newer version of the system software
* CDVDFSV/CDVDMAN reimplementation
* Fix SMS for new toolchain
* Fix external flash ROM driver
* Multi threaded PSF2 player
* POSIX/SDL2 port of libsd
* udpbd write support
* Move PCSX2 MSBuild build system to CMake

Probably won't finish all of them in a month, but we'll see...

Another thing... I'll be exhibiting at Virtual Market 2021. Please see the event website for details. https://winter2021.vket.com/
 
Looks like ARM is adding instructions for memset/memcpy/memmove:
https://community.arm.com/arm-commu.../arm-a-profile-architecture-developments-2021


WebAssembly also has instructions for that family of functions:
https://github.com/WebAssembly/bulk.../proposals/bulk-memory-operations/Overview.md

> I've been looking at perf profiles for wasm unity benchmark a bit recently and see that some of the hottest functions are doing memcpy or memset like things. If this is any indication of normal wasm code patterns, I think we could see significant improvement with an intrinsic so it may be worth prioritizing.

I've been thinking about seeing if operations like memcpy/memmove and memset could be offloaded from the EE or IOP, maybe using the DMA controller?

For IOP, looks like the "OTC" channel 6 of DMAC can be used to copy from memory to memory.
For EE, it doesn't look like there is a channel that can be used to copy from memory to memory, so it would probably need a temporary buffer. Maybe the SPR channels (8 and 9) would be best for this.

For an overlapping forward memmove, you would probably want to set the addresses to the end of each memory block and set CHCR for a negative step.
For memset, you would probably need to use the CPU to fill in the initial value, then just use chain tags to extend the filled region further.

Probably not going to be all that useful for that much complexity, because the CPU cannot access main memory at the same time as the DMAC...
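
As a rough software model of the two ideas above (picking the copy direction for an overlapping memmove, and growing a memset from a small CPU-written seed), here is a minimal Python sketch. It is not DMAC code: `block_copy`, `dma_memmove`, `dma_memset` and the `seed` parameter are hypothetical names, and a plain loop stands in for the transfers that CHCR and chain tags would describe on real hardware.

```python
def block_copy(mem, dst, src, n, step):
    """Model one DMA-style transfer that moves a byte at a time.

    step=+1 walks upward from the first byte of each block,
    step=-1 starts at the last byte and walks downward.
    """
    indices = range(n) if step > 0 else range(n - 1, -1, -1)
    for i in indices:
        mem[dst + i] = mem[src + i]


def dma_memmove(mem, dst, src, n):
    # If the destination starts inside the source block, an upward copy
    # would overwrite bytes before they are read, so copy from the end.
    step = -1 if src < dst < src + n else +1
    block_copy(mem, dst, src, n, step)


def dma_memset(mem, dst, value, n, seed=4):
    if n <= 0:
        return
    # The "CPU" writes the first few bytes by hand...
    filled = max(1, min(seed, n))
    for i in range(filled):
        mem[dst + i] = value
    # ...then every transfer copies the already-filled region onto the
    # bytes right after it, doubling the filled length each time.
    while filled < n:
        chunk = min(filled, n - filled)
        block_copy(mem, dst + filled, dst, chunk, +1)
        filled += chunk
```

With a 4-byte seed, a 21-byte fill takes three copies (4, 8, then 5 bytes), so the number of transfers grows roughly with the logarithm of the length rather than with the length itself.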
 
I've been thinking a bit about how to make udpbd easier to use.
At the moment, you need to build a disk image every time.
However, qemu-nbd and nbdkit can expose a folder as a FAT32 disk image over NBD using VVFAT or the floppy plugin, so you do not need to manually repack it into a FAT32 image, since the image is generated on the fly.

Also, this would provide a potential gateway to usage of SMBv3.
udpbd client on PS2 -> udpbd server thunking to nbd client -> qemu-nbd or nbdkit -> VVFAT or floppy plugin -> SMBv3 share


Also, thinking along those lines, I wonder if PS2 could also do 1080i video streaming. Something like Moonlight but for PS2.
You could have a nice GUI shown on the PS2 without needing to deal with memory restrictions or the specific APIs used, so you can use SDL, Qt, GTK, WinForms, etc. which are familiar APIs to draw the interface.
Maybe you could even stream PS1 games in 240p, so you can enjoy the image quality on your CRT without the bugs or restrictions of the current emulation solutions.
 
Darko replied back with the source code of GPU Recorder v2.0. I uploaded it to Internet Archive:

https://archive.org/download/gpurecorder20src/Recorder20.zip

---

I'm slowly finishing up some other stuff before doing anything new. The stuff that requires debugging will be low priority until I have some more time, probably in March or June.

* CDVDFSV/CDVDMAN reimplementation
* Arcade module RE/reimplementation

* FMCB Installer signing isolation
* Fix SMS for new toolchain
* Fix libmpeg2 for new toolchain
* udpbd write support
* Fix external flash ROM driver
* Multi threaded PSF2 player
* POSIX/SDL2 port of libsd
 
So I've been thinking about how spuRecorder saves SPU2 log data to a file, and then that data could be replayed in another program.
This might actually fit in well with my multi threaded PSF2 player. One thread generates the data while another thread consumes it.
Also, to keep things minimal, I could have e.g. a PSF2 player just generate the log data instead of directly outputting PCM data, then have another program that reads the log data and outputs PCM data.
I could also have that other program possibly interface to libsd -> SPU2 emulation -> PCM data, and then replace the SPU2 emulation with real SPU2.
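
A minimal sketch of that producer/consumer split, in Python. The log record layout `(tick, register, value)` and the register numbers in the usage example are made up for illustration; this is not spuRecorder's actual format, and a dictionary stands in for whatever would consume the writes (SPU2 emulation or real SPU2).

```python
import queue
import threading


def produce_log(events, log_queue):
    """Producer thread body: push log records, then a sentinel."""
    for event in events:
        log_queue.put(event)      # blocks while the queue is full
    log_queue.put(None)           # sentinel: no more records


def consume_log(log_queue, registers):
    """Consumer thread body: apply each register write in order."""
    while True:
        event = log_queue.get()
        if event is None:
            break
        _tick, reg, value = event
        registers[reg] = value


def replay(events, depth=64):
    """Run the producer and consumer concurrently; return final state."""
    log_queue = queue.Queue(maxsize=depth)   # bounded, so memory stays flat
    registers = {}
    consumer = threading.Thread(target=consume_log,
                                args=(log_queue, registers))
    consumer.start()
    produce_log(events, log_queue)
    consumer.join()
    return registers
```

For example, `replay([(0, 0x1A0, 0x3FFF), (10, 0x1A2, 0x1000), (20, 0x1A0, 0)])` leaves the last value written to each register. Because the queue is the only thing the two sides share, the consumer could be swapped for a file writer or a different backend without touching the producer.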

One of my old projects is an SPC (SNES audio format) to IT (Impulse Tracker) converter (spc2it). It would log into the IT file as it was playing. However, the limitation of this approach is that it could not analyze the sequence data ahead of time, so the sequence data would end up too fast, which meant that it was mostly useless. Most people have instead been using it as a tool to extract the samples used in SNES games.
I'm thinking that the SPC player, instead of converting to IT, could convert to the SPU2 log data. Then it would fit nicely into the SPU2 player infrastructure mentioned above.
 
I'm thinking about simplifying the MMI patch to get rid of the SA register / funnel shift and hi/lo register related code generation (at least for now). After that, I'll probably attempt to get it into ps2toolchain (along with the .iopmod section insertion by linker script) once I have some time to debug.

Once the GCC MMI patch is implemented, you would be able to use builtin intrinsics and auto vectorization, reducing the need for inline assembly and making code more readable. Also, code in inline asm cannot be optimized further by the compiler, but intrinsics and autovectorized code can be, for example by turning a divide by a constant into a multiply and bitshift, or turning a summation loop into its closed-form formula.
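
Both of those transformations can be checked by hand. A small Python sketch follows; the function names are made up for illustration, and `0xAAAAAAAB` is the usual reciprocal constant for unsigned 32-bit division by 3 (it equals (2^33 + 1) / 3). The exact code a given compiler emits may differ.

```python
def div3_mul_shift(x):
    """Unsigned 32-bit x // 3 done as one multiply and one right shift."""
    if not 0 <= x < 2**32:
        raise ValueError("x must fit in an unsigned 32-bit integer")
    return (x * 0xAAAAAAAB) >> 33


def sum_loop(n):
    """1 + 2 + ... + n written the way it would appear in source code."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_formula(n):
    """The closed form an optimizer can substitute for the loop above."""
    return n * (n + 1) // 2
```

The compiler is free to make these substitutions because it can see the arithmetic. An inline asm block is opaque to it, so the divide or the loop inside one stays exactly as written.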
 
Another idea… Video streaming using UDP and IPU (and possibly some additional compression).
Maybe a Moonlight proxy so you could use an existing streaming setup.

Could very likely be low latency so you could play at 240p like native without the restriction of the PS2's memory or speed.

When I finish fixing up libmpeg and udpbd I'll see how feasible it is.
 
THX for mentioning it here!

IMO 1024x1024x30Hz is probably possible without much lag, but 1920x540@60Hz+y-shifted even/odd (1080i) would probably require some optimizations!

I think it would be possible to retrieve it via network, draw the frame on the EE side and use the multi-passing of HD-Modes to transfer it to the GS.

IMO this project can actually be realized in a fast way! The VLC and transfer-stuff is probably going to take more time than the drawing and showing stuff on screen.

Pad-commands would probably also not be that big of a problem.
It's really the continuous stream of audio and video which would require some work I think...
 
For the video streaming using UDP and IPU, I plan to call it "UDPIPU".

Basically how it would work is
UDPIPU client on PS2 -> UDPIPU server proxy running Moonlight -> Nvidia GameStream server

But no code has been written at this point. I still need to fix up libmpeg first.

----

I'm thinking about making some improvements to the STABS debug info script. I think I can insert local variables into registers, but it would need to generate a Python script since the decompiler API is only accessible through C and Python. Also, I think this means that it can't be automated since it relies on UI functions, but I would need to investigate it more.
 
It would be interesting if it could work without the middleman, but even with it that's some "viral worthy stuff" like FDVDB became!
 
I partially rewrote libmpeg to use less inline assembly.
Now what is left is to convert the vector instructions to use intrinsics whenever the GCC MMI patch has been implemented.
After that, I will debug it.
I may also replace the funnel shift with something else, since managing the "sa" register requires quite invasive changes and I'd like to avoid that.

x86 has a similar instruction to funnel shift, which would be "double precision shift" (e.g. SHRD/SHLD).
It appears that funnel shift wasn't implemented for some reason in MMX/SSE/AVX etc.
Related reading: https://stackoverflow.com/questions/39276634/simd-versions-of-shld-shrd-instructions
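
For reference, the operation itself is simple to describe. Here is a minimal Python sketch of a funnel shift right, assuming a generic register width; `funnel_shift_right` and its parameters are names chosen for illustration, and the default `width` of 64 is only an example. With `hi` as the source and `lo` as the destination, it corresponds to what SHRD computes on x86.

```python
def funnel_shift_right(hi, lo, shift, width=64):
    """Concatenate hi:lo into a double-width value, shift it right by
    `shift` bits, and keep the low `width` bits of the result."""
    if not 0 <= shift <= width:
        raise ValueError("shift must be between 0 and width")
    mask = (1 << width) - 1
    combined = ((hi & mask) << width) | (lo & mask)
    return (combined >> shift) & mask
```

A shift of 0 returns `lo` unchanged and a shift of `width` returns `hi`, which is why the operation is handy for reading unaligned data out of two adjacent aligned words.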

I wonder why they bothered to implement the "sa" register in the r5900 instruction set just for the funnel shift related instructions...

---

I partially wrote UDPBD write support, but nothing debugged yet.
To make it a bit more reliable, I'm thinking of adding an acknowledgement after each fragment write. It won't be as fast as reads, but at least it will be more reliable.
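
The acknowledgement idea amounts to a stop-and-wait scheme. A minimal Python sketch follows, with no real networking: `write_with_acks`, `LossyDisk`, the 512-byte fragment size and the retry limit are all assumptions for illustration, not the actual UDPBD wire format.

```python
def write_with_acks(data, send_fragment, fragment_size=512, max_retries=5):
    """Send one fragment at a time and only move on once it has been
    acknowledged; resend when no acknowledgement arrives.

    `send_fragment(offset, fragment)` returns True when an ack came back.
    Returns the total number of fragments sent, resends included.
    """
    sends = 0
    for offset in range(0, len(data), fragment_size):
        fragment = data[offset:offset + fragment_size]
        for _attempt in range(max_retries):
            sends += 1
            if send_fragment(offset, fragment):
                break
        else:
            raise IOError(f"no acknowledgement for offset {offset}")
    return sends


class LossyDisk:
    """Stand-in for the server side that loses every Nth fragment."""

    def __init__(self, size, drop_every=3):
        self.storage = bytearray(size)
        self.drop_every = drop_every
        self.seen = 0

    def receive(self, offset, fragment):
        self.seen += 1
        if self.seen % self.drop_every == 0:
            return False          # fragment lost, so no ack is sent
        self.storage[offset:offset + len(fragment)] = fragment
        return True
```

Passing `LossyDisk(...).receive` as `send_fragment` shows the trade-off: every fragment costs a round trip, which is why this would be slower than reads, but a lost fragment only costs one resend instead of corrupting the write.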

---

Once I finish the local variable type setter from .stab section debug info, I think I'll be able to finish reversing various things.
 
I somewhat figured out how to set local variable names in the decompiler from Python. It doesn't look like the API exposes the names the way the UI does, so I did some workarounds to calculate them...
Also, it appears that the interface in the UI for changing the type is not very user friendly from the Python side... First, you need to use an API to get the type, then you find the variable, then you call the change method on the variable with that type...

But here's the kicker... The API doesn't appear to work correctly on primitive types. If you call it with "int", you get the object for "const long double" back. ???

I'll probably just end up hardcoding the primitive types...

---

There are multiple modules that mostly have the same code but differ in some places. I wanted to see if there was a tool that could diff more than 2 sets of source code and add preprocessor statements automatically (#if #elif #endif etc) or see if I could implement one myself.

Well, there is a patent on this by Lucent Technologies: US 20060225040

There is also an OSS program to do this: https://github.com/Quuxplusone/difdef
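
To give an idea of what such a tool does, here is a minimal two-variant sketch in Python built on the standard `difflib` module. It only handles two inputs, and `merge_with_ifdef` and the `VARIANT_B` macro are made-up names. It is not how difdef works internally, which I have not checked.

```python
import difflib


def merge_with_ifdef(a_lines, b_lines, macro="VARIANT_B"):
    """Merge two variants of a source file into one, wrapping the
    parts that differ in preprocessor conditionals."""
    out = []
    matcher = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a_lines[a0:a1])        # shared code, emitted once
            continue
        only_a, only_b = a_lines[a0:a1], b_lines[b0:b1]
        if only_b:
            out.append(f"#ifdef {macro}")
            out.extend(only_b)
            if only_a:
                out.append("#else")
                out.extend(only_a)
        else:
            out.append(f"#ifndef {macro}")
            out.extend(only_a)
        out.append("#endif")
    return out
```

Extending this to more than two variants is the hard part, since each region then needs an `#if`/`#elif` chain listing which variants contain it.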
 