PS3 Fault finding YLOD with the SYSCON - First steps and Error reporting

Nice, I made a version which prints everything "pretty".

Thanks for sharing your update! I've just merged your final changes with the psl1ght version, and added the "save log to USB" code.
If a USB drive is connected to the PS3 when the .self is run, the output log is saved to /dev_usb00x/sm_error.log.
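
For anyone curious how a "save log to USB" step can work, here is a minimal sketch in plain C. It is not the code from the repository: the helper names usb_log_path and open_usb_log are made up for illustration, and it simply assumes the drive shows up as /dev_usb000, /dev_usb001 and so on, as the /dev_usb00x path above suggests.

```c
#include <stdio.h>

/* Hypothetical helper: formats "/dev_usbNNN/sm_error.log" for mount index n.
   Returns 0 on success, -1 if the index or the buffer is unsuitable. */
int usb_log_path(char *buf, size_t len, int n)
{
    if (n < 0 || n > 999)
        return -1;
    int written = snprintf(buf, len, "/dev_usb%03d/sm_error.log", n);
    return (written > 0 && (size_t)written < len) ? 0 : -1;
}

/* Hypothetical helper: tries the first max_ports mount points and returns
   the first log file that opens for writing, or NULL if none does. */
FILE *open_usb_log(int max_ports)
{
    char path[64];
    for (int n = 0; n < max_ports; n++) {
        if (usb_log_path(path, sizeof path, n) != 0)
            continue;
        FILE *fp = fopen(path, "w");
        if (fp != NULL)
            return fp;
    }
    return NULL;
}
```

If open_usb_log returns NULL, the caller can fall back to printing on screen only.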

(also, some debugging info is sent on UDP port 9999)

Github: https://github.com/bucanero/psl1ghtv2_ports/tree/master/sm_error_log
 
I'm thinking that you could add another syscall for identification purposes:
sys_ss_get_console_id... to get the IDPS... but only print bytes 7 and 8 of it (to get the Product sub-code).
This product sub-code identifier is important but is unknown for superslims :/
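
A rough sketch of the "only print bytes 7 and 8" part, assuming the 16-byte IDPS has already been read into a buffer (the syscall itself is not shown). The function name format_product_sub_code is hypothetical, and it assumes "bytes 7 and 8" is counted from 1, i.e. array indexes 6 and 7.

```c
#include <stdint.h>
#include <stdio.h>

#define IDPS_LEN 16

/* Hypothetical helper: writes only the two product sub-code bytes as four
   hex digits, so the rest of the IDPS never reaches the log.
   Assumption: "bytes 7 and 8" is 1-based, i.e. idps[6] and idps[7]. */
int format_product_sub_code(const uint8_t idps[IDPS_LEN], char *out, size_t len)
{
    int n = snprintf(out, len, "%02X%02X", idps[6], idps[7]);
    return (n == 4 && (size_t)n < len) ? 0 : -1;
}
```

Printing just these two bytes keeps the unique part of the console ID out of any log that gets shared.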

And while you are at it, I guess a timestamp from sys_time_get_current_time could be handy; it can be used as the suffix of the log filename.
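
As an illustration of the timestamp suffix idea, here is a small sketch using the standard C time functions rather than sys_time_get_current_time, so it can be tried on a PC. The function name log_name_with_timestamp and the sm_error_YYYYMMDD_HHMMSS.log pattern are my own assumptions, not something the tool already does.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: builds "sm_error_YYYYMMDD_HHMMSS.log" from a Unix
   timestamp interpreted as UTC. Returns 0 on success, -1 on failure. */
int log_name_with_timestamp(char *buf, size_t len, time_t now)
{
    struct tm *utc = gmtime(&now);
    if (utc == NULL)
        return -1;

    char stamp[32];
    if (strftime(stamp, sizeof stamp, "%Y%m%d_%H%M%S", utc) == 0)
        return -1;

    int n = snprintf(buf, len, "sm_error_%s.log", stamp);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

Called with time(NULL), this gives every run its own file instead of overwriting the previous log.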

---------
Edit:
There is a typo in this line (a missing "r" in the word "Fimware"). And I think it will look better this way, adding the words ROM and RAM (the goal is to indicate what the last value means).
Code:
fprintf(fp, "Syscon Firmware Version: %04lX.%016lX(ROM) %016lX(RAM)\n", soft_id, patch_id_rom, patch_id_ram);

And i would add the word "build" in the line where it prints the system firmware version
Code:
printf("Firmware Version: %01x.%02x (build: %d)\n", hwinfo.firmware_version_high, hwinfo.firmware_version_low >> 4, hwinfo.firmware_build);
 
After a few days of testing that COK-002 I had converted to a 40 nm RSX with a modchip, it was not good enough because it could not run games, although it was fine on the XMB, where I left it for many hours. After a few months on the shelf, when I wanted to run a few tests, it did not work anymore. I ran the UART diagnostic to get the errors: only a few 3034 and some 4002, which I think were left over from tests when the modchip was not installed correctly, and one 1002. I powered it on a few more times and only 1002 showed up on UART, so I changed the NECs.
Now everything runs fine; in game the CPU is at 65 and the RSX at 50.
At first I thought it was a defective CPU, since it would YLOD the moment a game started, with only 3034 and 4002 logged and nothing related to 1002. I would have changed the caps if I had seen 1002 in the first place.
At least I have a unit for testing, and from now on, whenever I touch a phat unit or see NEC caps, I will replace them when the CPU/RSX are out, even if the board starts and works, because it is safer that way. They are really a pain.
This case convinced me to do it.
4003af4675d6c3a4d164bc9979178258.jpg
cabe5f5ca92a98a00d8c2bf8693c5242.jpg
 
Can I ask why you thought your CPU was bad?
Because in reality it wasn't bad, right?
Congratulations.
You had the famous "shutdown under load" problem that is associated with error 80 1002.

Also, because you have 40nm RSX in a board built for 90nm, it's hard to believe that RSX power delivery was causing the problem.
More probably CPU power fail.
But still the error is 1002. (Not 1001)

Did you measure the voltage coming to the RSX?
Also, do you remember whether the shutdown happened in PS2 games too?
In PS2 mode the RSX is idle (less work than in XMB) but the CPU is working hard, so this can help show whether the power problem is with the RSX or the CPU.

If XMB and a PS2 game are OK but a PS3 game shuts down, it is most likely the RSX.
If XMB is OK but a PS2 game shuts down too, that's a sign the CPU is the one with the problem.
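
The rule of thumb above can be written down as a tiny decision function. This is only a sketch of that reasoning; the enum and function names are made up, and the "unknown" case for a console that is not even stable on XMB is my own addition, since the rule doesn't cover it.

```c
#include <stdbool.h>

typedef enum {
    SUSPECT_NONE,     /* all three tests ran without a shutdown */
    SUSPECT_RSX,      /* XMB and PS2 OK, PS3 game shuts down */
    SUSPECT_CPU,      /* XMB OK, PS2 game already shuts down */
    SUSPECT_UNKNOWN   /* not stable on XMB: the rule does not apply */
} suspect_t;

/* Hypothetical helper: each flag is true when that test ran with no shutdown. */
suspect_t suspect_from_load_tests(bool xmb_ok, bool ps2_ok, bool ps3_ok)
{
    if (!xmb_ok)
        return SUSPECT_UNKNOWN;
    if (!ps2_ok)
        return SUSPECT_CPU;   /* RSX is idle in PS2 mode, CPU is loaded */
    if (!ps3_ok)
        return SUSPECT_RSX;   /* only the RSX-heavy test fails */
    return SUSPECT_NONE;
}
```

It is a hint about where to measure first, not a verdict on which chip is bad.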

In any case, it is not necessarily a good idea to go for brute force and replace everything.

Cheers
 
This board was reballed twice before the UART era, about 2 years ago. It behaved the same with two different 90nm RSX chips. When I started with UART I did not have any 1002 and thought it was a defective CPU, even though it still measured 2.7 ohms. At the time I saw 1.18 V, so close to good enough. All this time I thought it was the CPU, after all the RSX swaps and reballing the CPU twice. I'm confident about doing a reball, but not sure if an IC is still at its full potential. I see strange values: I had one slim 2500 with 1.6 ohms and it works fine. The common 3 ohm figure cannot be very specific.
 
Yes... It's always difficult to know if a chip is good or bad inside. The resistance can only tell you if there's a short inside or not. Not much more.
But now you can be happy then.

I was just curious about the voltage coming to the new RSX.
You didn't make any other modifications to the voltage regulators or anything, right?
So the voltage is still about 1.2 V, as if it were a 90nm?
 
I didn't modify that; I was just curious about gaming. I will reply later with the voltage. I have another slim SUR board with GLOD after a reball, errors 3034 and 4402, and no AV/HDMI image. I can hear the beeps for recovery. I'm going to exchange both, since I'm not sure which IC is failing without logging an error. I cleared the old errors to get new ones: nothing, all ff.
Those 3034 and 4402 were old errors; now it is only ff, and it can boot to recovery from what I hear. No random shutdown or anything unusual. I don't have an analog multimeter to test CLK.
 
Try the command "hdmi help"; it displays a list of the other hdmi commands you can use.
Then use some of the hdmi commands from that list that are intended to display info from the HDMI chip. You can ask for the ID of the HDMI chip, its status, etc...
If the HDMI is dead, I guess all those commands are going to return errors.
 
Documenting this here: a friend brought me a PS3 COK-002 from GameStop with a label saying "corrupt harddisk", but.........
- 1 slow loading menu, would not load
- 2 HDD replaced: almost loads but slow, controllers slow to sync over Bluetooth, freezes after some minutes, garbled pixelated image on restart
- 3 disassembled (an internal plastic part was broken, had to break a screw), did the syscon mod and read the errlog:
upload_2021-6-4_17-2-37.png

- a0901001 = BE VRAM power fail (possible NEC replacement, as advised in the 1st post), so I replaced:
> 1 4x470uf BE side, test: same random problems as mentioned above
> 1 4x470uf RSX side, test: same random problems as mentioned above
> changed the PSU for an APS-226 (it had the power-hungry ZSRXXXXX), test
> lastly, upgraded the firmware to 4.87, test
Boot is stable now, but only after some seconds or 1 minute:
upload_2021-6-4_17-1-29.png

becount shows 302 days (pretty used). Does anyone have experience with this error? RSX almost dead? Reballing required?
I also did maintenance: RSX delid, thermal paste, cleaned the motherboard, and checked temps in syscon (BE = 63-73, RSX = 50). Syscon doesn't show any other error. Thanks.
 
I've seen that kind of RSX problem...
Yes, even interfering with bluetooth controller input, low framerate and eventually those greenish "matrix style" artifacts. Unfortunately nothing to do with capacitors as you noticed.

Edit: Btw In my case, the machine was supposedly damaged in shipping. If the guy was telling the truth and was working fine before he shipped, then I guess there's a very high probability of a reball fixing it. And I always believed him. But they say I'm a naive guy hehehe. (The damage was very obvious at least externally, but the thing was otherwise pristine, sealed and barely used)

Sadly you are yet another person that was misled by the first post here about error 1001. You can't assume you have a capacitor problem just by seeing error 1001 in the log. You need to catch it with bringup, watch it happen ideally in the form of a "shutdown under heavy load YLOD".

Otherwise it's pretty meaningless to see it in the error log.
I've asked @db260179 to edit this way too many times already. I guess he's been waiting for more people to come like you.
Even "true" things in theory can be misleading in practice.
 

The power-on state stages give an indication of how badly a component has failed.

Ironically, I have received a DIA-002 board with the following:

ofst[ 84]:err_code:0xa0801002, clock:0x26d6f228 2020/08/24 21:48:24
ofst[ 88]:err_code:0xa0801001, clock:0xffffffff
ofst[ 92]:err_code:0xa0801002, clock:0xffffffff
ofst[ 96]:err_code:0xa0801002, clock:0xffffffff
ofst[100]:err_code:0xa0801002, clock:0xffffffff
ofst[104]:err_code:0xa0801002, clock:0xffffffff
ofst[108]:err_code:0xa0002120, clock:0xffffffff

$ lasterrlog
lasterrlog
Last Error Code:0xa0002120, Time:0xffffffff
[mullion]$
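
For reading a log like the one above, here is a hedged sketch of a parser for the "ofst[...]" lines. The field split (0x80 as the step and 0x1002 as the code) follows the "80 1002" way these errors are quoted in this thread, and the clock epoch of 2000-01-01 UTC is an assumption inferred from the single dated entry above (0x26d6f228 next to 2020/08/24 21:48:24). All names are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Unix time of 2000-01-01 00:00:00 UTC. The epoch is an assumption
   inferred from the one dated entry in the log above. */
#define SYSCON_EPOCH_UNIX 946684800L

typedef struct {
    unsigned offset;    /* value inside ofst[...] */
    uint32_t err_code;  /* full 32-bit code, e.g. 0xa0801002 */
    uint32_t clock;     /* 0xffffffff when no time was recorded */
} errlog_entry;

/* Parses one errlog line; returns 0 on success, -1 if it does not match. */
int parse_errlog_line(const char *line, errlog_entry *e)
{
    unsigned ofs, code, clk;
    if (sscanf(line, "ofst[%u]:err_code:0x%x, clock:0x%x", &ofs, &code, &clk) != 3)
        return -1;
    e->offset = ofs;
    e->err_code = (uint32_t)code;
    e->clock = (uint32_t)clk;
    return 0;
}

/* Assumed layout: 0xa0801002 -> step 0x80, code 0x1002. */
unsigned errlog_step(uint32_t err_code) { return (err_code >> 16) & 0xFFu; }
unsigned errlog_code(uint32_t err_code) { return err_code & 0xFFFFu; }

/* Formats the clock as "YYYY/MM/DD HH:MM:SS"; returns -1 for 0xffffffff. */
int format_errlog_clock(uint32_t clock, char *buf, size_t len)
{
    if (clock == 0xFFFFFFFFu)
        return -1;
    time_t t = (time_t)SYSCON_EPOCH_UNIX + (time_t)clock;
    struct tm *utc = gmtime(&t);
    if (utc == NULL)
        return -1;
    return strftime(buf, len, "%Y/%m/%d %H:%M:%S", utc) > 0 ? 0 : -1;
}
```

Entries with clock 0xffffffff carry no usable time, so they cannot be ordered by date, only by their offset in the log.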

This indicates I (possibly) have two issues: a faulty HDMI decoder, or something related to the power components (power ICs, NEC/Tokins...).

I believe you are correct in saying that 1001 is a symptom and not a cause (it is related to the CELL power-on process); 1002 is the RSX code.

I'm going to reorganise my error info and just label these codes as "possible issue".

I'll update with my findings and how I diagnosed the fault.
 
Yeah... Actually the same could be said of many errors. Sometimes can be a side effect of an unrelated problem. Sometimes it's not even a real problem.
1001 is just the worst offender, because it's so commonly appearing as a non-error. But diligent troubleshooting is always required. Simply bringup can already go a long way.

I think even this 2120 error you have can be a non-error. This time you may have real RSX power fail... And this power fail can cause the RSX to not complete the handshake with the HDMI IC or whatever. Resulting in error 2120 as a side effect.

Probably if you solve the RSX power fail, the 2120 will go away too.
(Maybe this 2120 will not even appear if you simply unplug HDMI cable)

Of course this is all assuming that you actually have a RSX power fail. This is why bringup can be trusted more than the simple log.

Check your voltages coming to the RSX too. I have a board with abnormal RSX voltage and similar errors. But in my case I'm pretty sure it's a problem with the RSX itself. Not necessarily the power circuit.

Your board does look more like a simple power delivery problem.
 
I checked my soldering of my tx, rx and ground points and they seem to be fine, but when I try to enter "AUTH" it keeps saying "Auth1 response invalid". Anyone got any ideas?
 
Did you try swapping rx and tx around?
It's the first thing to try. Some adapters are labeled differently, but normally the Rx (Receive) of the syscon should connect to the Tx (Transmit) of the adapter and so on. It's not dangerous to reverse them since it's just a signal.

Also the syscon needs to be in working order and getting power. Standby power from PSU.
Remember also to use the appropriate CXR or SW mode depending on your syscon.
Lastly I hope your adapter is 3.3v.

 

Using the hdmi commands, I've established that the HDMI decoder is working and picking up the EDID reading.

Going down the list, the voltage readings all check out. I'm currently looking at the SEM-001 schematics and a working DIA-002 board.

The IC2501 regulator is not getting anything on pin 6 (HDMI INT)? Hmmm
 
Here you go ;)
As you can see scversion doesn't return the full Patch ID, but in combination with the Soft ID it's enough to identify it.
 
