PS3 Fault finding YLOD with the SYSCON - First steps and Error reporting

I've found a picture of this board with resistances (only on the back side) and they all seem perfect. I wonder if it is a problem with mounting pressure. I've only tried this trick with CECHG consoles, but the clamp screws from CECHC, CECHG, and CECHK consoles seem to differ slightly in thread length.

I don't remember seeing 2102 (RSX Error) associated with BGA/Bump defects, but a YLOD after anything to do with changing mounting pressure is a sure sign of BGA/Bump defects. If the pads you used are non-conductive (no shorts) and nothing got knocked off, that only leaves one explanation.

Thinking critically about this, most voltages and signals are connected to the RSX/CELL through traces that have 2 points of contact: the BGA and the source. A break can occur at either end or in the middle. In any case, the same error would be generated. So, if it's responding to pressure changes, the break is at the BGA, not at the source or in the trace.
 
Alright, I just did the measurements, and I even doubted my own capabilities, but basically I got 1.22V on the CELL and 1.20V on the RSX. Judging from what you said and from what the schematics say, the CELL should be 1.0V.

[Attached image: 20211119_121446.jpg]

That's where I probed
[Attached image: 20211119_122909.jpg]

The CELL (ignore the minus sign; I just had my probes swapped)

[Attached image: 20211119_122951.jpg]

RSX seems to be in order.

I'm pretty sure I did my measurements right. Also keep in mind that I still have 6 caps installed on the other side of the board... What do you think?
In case you were wondering, the CPU routinely measures 1.2-1.3v. So your measurements seem fine.
 
OK, no problem, but the first thing to do on those units is delid both chips. People must understand that on every PS3 where it is possible, both should be delidded. Models from the 2100 to the 3000 series are a pain. On the 3000 I can only delid the RSX off the board, with an old-type shaving blade. The RSX is a pain to delid from the 2100 onward. A CELL with grey silicone around it is easy to delid. CELLs with white silicone are bonded very tightly, and those will most often die first if people don't service their units on time, every 2 years.
White-silicone models are not the ones to delid.

So you're suggesting that I delid the CELL and RSX, give them a good clean (maybe reflow them once more without the heat spreader on them) and see if that works? What about the CELL's VDDC line that read 1.2V instead of 1.0V, any ideas on that?
 
Considering that before I reflowed the chips the console would occasionally turn on, if only for a few minutes, I'm now starting to doubt whether the solder balls under the chips actually melted; maybe I just barely warmed them up. I think I'll give it another try, but this time delidded.
 
I think the IHS provides stability to the IC and helps keep it straight during reflow, but that probably doesn't matter if your equipment can heat both sides evenly without a big difference between the top and bottom sides.

Be sure to use a temp probe to know the reflow temperature and, before stopping, nudge the chip just enough to see it spring back into place. That will tell you it flowed for sure. But be careful not to knock the chip and merge solder balls. It's easier to accomplish this with something to hold your top heater (depends on your setup). And don't forget to clean the BGA THOROUGHLY, both before and after!

I spray electronics contact cleaner and let it run underneath the BGA, with the board tilted at an angle, until it comes out the other side. Soak up the dissolved flux residue as it comes out, so it doesn't run all over the board and dry underneath something else. I turn the board 90 degrees and repeat. After a full revolution of that, I switch to 99% IPA in a spray bottle, then use compressed electronics duster to blow out all the remaining IPA from underneath the ICs. Then 2 hours at 100C to dry the board (and drive out the water absorbed by the MB from humidity). This prevents popcorning and squeezing bumps out from the underfill. It also preps the pads and BGA as best you can for a reflow. Be sure to apply plenty of flux to 3 sides of the chip (leave the Tokin side free). This allows the liquid to flow under and air to escape, so a bubble doesn't prevent the liquid from wicking under.

Beyond that, good luck. The most important part is to minimize the time at reflow temps (above 150C), but it's all for nothing if you don't actually reflow the chip, so don't stop before you are sure the BGA flowed (by nudging it).
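
If you log your temp probe during the reflow, the "time above 150C" figure can be checked afterwards. Here is a minimal Python sketch of that arithmetic. The sample readings are made up for illustration and the function name is hypothetical; only the 150C threshold comes from the post above.

```python
def seconds_above(samples, threshold_c=150.0):
    """Total time (s) the probe read at or above threshold_c.

    samples: list of (time_s, temp_c) pairs in increasing time order.
    Threshold crossings between two readings are estimated by
    linear interpolation.
    """
    total = 0.0
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        hot0, hot1 = c0 >= threshold_c, c1 >= threshold_c
        if hot0 and hot1:
            total += t1 - t0
        elif hot0 != hot1:
            # Point where the straight line between the readings crosses the threshold
            t_cross = t0 + (threshold_c - c0) * (t1 - t0) / (c1 - c0)
            total += (t_cross - t0) if hot0 else (t1 - t_cross)
    return total


# Hypothetical probe log: (seconds since start, degrees C)
profile = [(0, 25), (60, 100), (120, 150), (180, 200),
           (210, 215), (240, 150), (300, 80)]

time_hot = seconds_above(profile)
peak = max(temp for _, temp in profile)
print(f"time at/above 150C: {time_hot:.0f} s, peak: {peak} C")
```

This only tells you how long the probe spot was hot. It does not tell you whether the balls flowed, so the nudge test is still needed.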
 
I suggest reflowing with the IHS on top, then delidding. You may kill the IC without the right thermal profile. It's best to have the IHS on top before a reflow/reball.
I also desolder with the IHS on top, then resolder the RSX without the IHS, as 205 isn't as dangerous as 225~230. The CELL I desolder and resolder with the IHS on, then delid. Done like this, they all work for me.
 
Alright, thanks a lot for the tip. I'm going to make sure I do this right, so I'll have to do a shopping round before I start. At this point it's really fine if the console doesn't revive; at least it was worth the try. I'll probably update you on this when I get it done.
Again thank you for the help and support, I appreciate it.
 
When you mentioned this I thought of a physically cracked die, but that does not appear to be the case with these chips. When I got it back into the case, applied medium to somewhat high pressure with my hand all over the RSX area, and monitored the syscon output, it always went from state 0103 to 0303: power failure and YLOD. I also tested the thermal pads' resistance with the multimeter leads 1mm apart from each other and got absolutely nothing. It seems like I have a perfect board to harvest the syscon (and ideally the RSX) from, and some time in the future create a 65nm COK-002 board :)
 
Got a fresh CECHE console. This is the syscon dump. If I'm correct, Tokin?

>$ ERRLOG GET 00
00000000 A0093004 292AC8FA
>$ ERRLOG GET 01
00000000 A0093004 292AC8F8
>$ ERRLOG GET 02
00000000 A0093004 292AC8F5
>$ ERRLOG GET 03
00000000 A0093004 292AC8F3
>$ ERRLOG GET 04
00000000 A0093004 292AC8F1
>$ ERRLOG GET 05
00000000 A0093004 292AC8EE
>$ ERRLOG GET 06
00000000 A0093004 292AC8EA
>$ ERRLOG GET 07
00000000 A0093004 28934022
>$ ERRLOG GET 08
00000000 A0093004 28933E38
>$ ERRLOG GET 09
00000000 A0093004 28933E32
>$ ERRLOG GET 0A
00000000 A0093004 28933E2B
>$ ERRLOG GET 0B
00000000 A0093004 28933E1A
>$ ERRLOG GET 0C
00000000 A0093004 26E836A5
>$ ERRLOG GET 0D
00000000 A0093004 26E836A3
>$ ERRLOG GET 0E
00000000 A0093004 26E8369F
>$ ERRLOG GET 0F
00000000 A0093004 26E8369B
>$ ERRLOG GET 10
00000000 A0093004 26E8363C
>$ ERRLOG GET 11
00000000 A0093004 26E8361E
>$ ERRLOG GET 12
00000000 A0093004 26E83616
>$ ERRLOG GET 13
00000000 A0093004 26E835FC
>$ ERRLOG GET 14
00000000 A0093004 26E835E9
>$ ERRLOG GET 15
00000000 A0093004 26E8359A
>$ ERRLOG GET 16
00000000 A0901001 1B2686E1
>$ ERRLOG GET 17
00000000 A0801004 1B05C5F5
>$ ERRLOG GET 18
00000000 A0901001 1AE6F12C
>$ ERRLOG GET 19
00000000 A0801004 1AE6A532
>$ ERRLOG GET 1A
00000000 A0801001 1AE6A243
>$ ERRLOG GET 1B
00000000 A0802022 1ADF48F9
>$ ERRLOG GET 1C
00000000 A0802022 1ADE289A
>$ ERRLOG GET 1D
00000000 A0901001 1ADE126A
>$ ERRLOG GET 1E
00000000 A0801004 1ADC8888
 
Possibly. I did generate 3004 when removing Tokins, yes. So currently I think this error will occur in the case of a PWR failure on the main core voltage of the GPU, for example if the filtering capacitors (NEC/TOKINs) are severely damaged. There are other SMDs in that filter, so it could be related to them as well. But we don't know what else can cause it, or more specifically what it's measuring to trigger the error.
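
For anyone wanting to tally a dump like the one above instead of reading it by eye, here is a rough Python sketch. It takes the last four hex digits of the middle word as the error code, which matches how codes like 3004 are referred to in this thread. Treating the byte after the A0 prefix as a separate "step" field is my assumption, and the third word is kept as an opaque number (in the dump above it gets smaller as the index goes up). The function name is made up.

```python
import re
from collections import Counter

# One log entry: three 8-digit hex words, e.g. "00000000 A0093004 292AC8FA"
ENTRY = re.compile(r"^([0-9A-Fa-f]{8})\s+([0-9A-Fa-f]{8})\s+([0-9A-Fa-f]{8})$")


def parse_errlog(text):
    """Parse pasted 'ERRLOG GET nn' output into a list of dicts."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if not m:
            continue  # skips the ">$ ERRLOG GET nn" prompt lines
        _first, code, third = m.groups()
        entries.append({
            "raw": code.upper(),
            "step": code[2:4].upper(),   # ASSUMPTION: byte after the A0 prefix
            "error": code[4:].upper(),   # last four digits, e.g. "3004"
            "third_word": int(third, 16),  # meaning not decoded here
        })
    return entries


# A few entries copied from the dump above
sample = """\
>$ ERRLOG GET 00
00000000 A0093004 292AC8FA
>$ ERRLOG GET 16
00000000 A0901001 1B2686E1
>$ ERRLOG GET 17
00000000 A0801004 1B05C5F5
>$ ERRLOG GET 1B
00000000 A0802022 1ADF48F9
"""

entries = parse_errlog(sample)
counts = Counter(e["error"] for e in entries)
for error, n in counts.most_common():
    print(error, n)
```

Run on the full dump, a tally like this makes it obvious at a glance that the recent entries are all 3004 while the older ones are a mix of other codes.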
 
I need some help. I have wired up the serial debug port on my DIA-002 board based PS3. When I run the Sysconreader program (Link: https://github.com/db260179/ps3syscon/tree/master/SysconReader) on Windows I can successfully auth the system, but when I try to read the codes the output log does not make any sense. I have copied the log below. I am using a PL2303-based serial adapter with a special driver to get it to work under Windows (search for PL2303_64bit_Installer.exe if you want to try the driver that I am using). I'm not sure if this is part of my problem. Also, I just have the main PS3 PCB and power supply connected to each other and nothing else attached (fan, hard drive, disk drive, Wi-Fi board). The issue with the unit is that it tries to turn on but then goes to a flashing red light with 3 beeps (I can make a video of it if you want). I know it is not overheating, as the thermal paste has already been replaced and the issue happens within a few seconds of it trying to turn on. Got any ideas what may be wrong?

===================================
ERR 00: 00000000 FFFFFFFF FFFFFFFF
ERR 01: 00000000 FFFFFFFF FFFFFFFF
ERR 02: 00000000 FFFFFFFF FFFFFFFF
ERR 03: 00000000 FFFFFFFF FFFFFFFF
ERR 04: 00000000 FFFFFFFF FFFFFFFF
ERR 05: 00000000 FFFFFFFF FFFFFFFF
ERR 06: 00000000 FFFFFFFF FFFFFFFF
ERR 07: 00000000 FFFFFFFF FFFFFFFF
ERR 08: 00000000 FFFFFFFF FFFFFFFF
ERR 09: 00000000 FFFFFFFF FFFFFFFF
ERR 10: 00000000 FFFFFFFF FFFFFFFF
ERR 11: 00000000 FFFFFFFF FFFFFFFF
ERR 12: 00000000 FFFFFFFF FFFFFFFF
ERR 13: 00000000 FFFFFFFF FFFFFFFF
ERR 14: 00000000 FFFFFFFF FFFFFFFF
ERR 15: 00000000 FFFFFFFF FFFFFFFF
ERR 16: 00000000 FFFFFFFF FFFFFFFF
ERR 17: 00000000 FFFFFFFF FFFFFFFF
ERR 18: 00000000 FFFFFFFF FFFFFFFF
ERR 19: 00000000 FFFFFFFF FFFFFFFF
===================================
 
Replaced the NEC on the bottom by the RSX and this bad boy decided to boot! One out of the 3 I've been working on so far came back to life.
 
At this point, I would try the script manually instead of using the GUI. You can learn how in my SYSCON tutorial.

I assume you mean it only happens after attempting to boot, not before. I ask because, before we get too far down the "what could it be" path: to gain internal SYSCON access on Mullion syscons you have to change a bit in the EEPROM to enable access. If you did this but did not fix the checksum afterwards, that will cause a checksum mismatch. The console will beep 3 times and flash red, except the 3 beeps will occur as soon as you flip the power on at the back rocker. But it sounds like you mean it's trying to power on, not beeping in standby. Maybe just confirm.

Is there a YLOD? Does it flash yellow once before beeping and flashing red? If so, then it's definitely supposed to be registering an error log. So if it is, and you're not getting it from the GUI, try manually.

I wouldn't rule out overheating. A console with dried out paste between the die/IHS can overheat in seconds, even with fresh paste between the IHS/HS. But you should be getting a 1200 error if that were the case.
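
Before digging further, it can help to confirm that every slot in a GUI log really is blank rather than eyeballing it. A small Python sketch follows. It assumes the "ERR nn: a b c" row layout shown in the log above, and it assumes that a middle word of FFFFFFFF means nothing was read or logged for that slot. That reading of FFFFFFFF is my assumption, not something confirmed in this thread, and the function name is made up.

```python
import re

# Row layout as printed in the log above: "ERR 00: 00000000 FFFFFFFF FFFFFFFF"
ROW = re.compile(
    r"^ERR\s+(\w+):\s+([0-9A-Fa-f]{8})\s+([0-9A-Fa-f]{8})\s+([0-9A-Fa-f]{8})$"
)


def blank_slots(log_text):
    """Return (blank, total) slot counts for a pasted GUI error log."""
    blank = total = 0
    for line in log_text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # separator lines and anything else
        total += 1
        # ASSUMPTION: an all-F code word marks a slot with no usable entry
        if m.group(3).upper() == "FFFFFFFF":
            blank += 1
    return blank, total


sample = """\
===================================
ERR 00: 00000000 FFFFFFFF FFFFFFFF
ERR 01: 00000000 FFFFFFFF FFFFFFFF
ERR 02: 00000000 A0093004 292AC8FA
===================================
"""

blank, total = blank_slots(sample)
print(f"{blank} of {total} slots blank")
```

If every slot comes back blank on a console that visibly YLODs, that points at the read path (adapter, driver, tool) rather than at the console having nothing to report, which is why trying the commands manually is the next step.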
 
@RIP-Felix Thanks for the quick response. I guess I should have been clearer in my message: it gives me the 3 beeps after I press the power button (power light goes green, yellow, then flashing red; I think this is the YLOD sequence). I have managed to get it into internal mode and even tried another command line based script (Link: https://github.com/mbcrump/PS3) to see if that would make any difference, and it did not (it still gave me the same FFFFFFFF errors in the command line when I ran the errlog command). I did try the bringup and powerstate commands and got this output:

> bringup
bringup
[SSM] state: 0000 -> 0101
Bringup Mode #0 (0xFF)
[SSM] ssmCb_OnStartingBePowOn() called.
[SSM] First Boot.
[SSM] Bringup mode : syspm_stat=00000000/00000000
[POWSEQ] PowerSeq_Setup called.
[SSM] state: 0101 -> 0201
[POWSEQ] AV Backend Setup
[SSM] state: 0201 -> 0102
[SSM] state: 0102 -> 0202
[SSM] state: 0202 -> 0103
[SSM] state: 0103 -> 0203
[SSM] ssmCb_BeforeBeOn() called.
[SSM] state: 0203 -> 0104
Psbd_SbTransMode_Half:0x20e7
> powerstate
[SSM] state: 0104 -> 0204
[SSM] state: 0204 -> 0105
[SSM] state: 0105 -> 0400
(PowerOn State)
[SERV NVS] READ CMD

Boot Loader SE Version 2.3.5 (Build ID: 3034,32025, Build Data: 2008-05-12_15:29:27)
Copyright(C) 2007 Sony Computer Entertainment Inc.All Rights Reserved.
[SERV SETCFG] XDR (CH0,CH1) ASSERT
[SERV SETCFG] XDR (CH0,CH1) DEASSERT
[SERV NVS] READ CMD
[ERROR]: 0xb0000004 lv0 authentication fail
[SERV NVS] WRITE CMD
[SERV NVS] WRITE CMD
[SSM] *** FATAL ERROR requested by OS ***
[SSM] state: 0400 -> 0700
[POWSEQ] AV Backend Letup
[SSM] ssmCb_AfterBeOn() called.
[SSM] Shutdown mode : syspm_stat=00000000/00000000
[POWSEQ] PowerSeq_Letup called.
[SSM] state: 0700 -> 0600
(PowerOff State) (Fatal)
powerstate
ATA Power : OFF
PCI Power : OFF
RSX Power : OFF
XDR Power : OFF
Eurus Power : OFF
SB Power : OFF
RSX Thermal Sensor : UNAVAILABLE
BE Thermal Sensor : UNAVAILABLE

This was with no other components connected to the motherboard except for the power supply and heatsink (no fan, HDD, ODD, or Wi-Fi). Not sure if that is normal or not. I don't like the authentication fail message. I do have an oscilloscope, a multimeter, a soldering iron, and a hot air gun, so if you need me to try anything don't be afraid to ask.
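
In case it helps anyone comparing bringup logs between boards, here is a rough Python sketch that pulls the "[SSM] state: X -> Y" transitions and any error lines out of a pasted log. The line formats are taken from the output above. The function name is made up, and the sketch does not try to say what each state number means.

```python
import re

TRANSITION = re.compile(r"\[SSM\] state: ([0-9A-Fa-f]{4}) -> ([0-9A-Fa-f]{4})")


def summarize_bringup(log_text):
    """Return (states, errors): the state path and any error/fatal lines."""
    states, errors = [], []
    for line in log_text.splitlines():
        m = TRANSITION.search(line)
        if m:
            src, dst = m.groups()
            if not states:
                states.append(src)  # starting state of the first transition
            states.append(dst)
        elif "[ERROR]" in line or "FATAL" in line:
            errors.append(line.strip())
    return states, errors


# Excerpt copied from the log above
sample = """\
[SSM] state: 0105 -> 0400
(PowerOn State)
[ERROR]: 0xb0000004 lv0 authentication fail
[SSM] *** FATAL ERROR requested by OS ***
[SSM] state: 0400 -> 0700
[SSM] state: 0700 -> 0600
(PowerOff State) (Fatal)
"""

states, errors = summarize_bringup(sample)
print(" -> ".join(states))
for e in errors:
    print("!", e)
```

The useful part is the last state reached before the error lines: in my log the board gets all the way to 0400 (PowerOn State) before the lv0 failure, whereas the pressure-sensitive board mentioned earlier in the thread stops at 0103 -> 0303.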
 