PS3 Frankenstein PHAT PS3: CECHA with 40nm RSX

Octal450 · Sep 13, 2022

Agreed, unless there is tangible proof that it is a BGA issue which can be corrected.

Kind Regards,
Josh

RIP-Felix · Sep 13, 2022

Pacorretaco said:
...
Most notably internal shorts have been found (the opposite of "broken bumps") that open again with some heat. Often in the data lines, between each other or to ground.
Electromigration is another thing that affects all chips in the world, but I am not too sure how it behaves with heat.

The silicon die is a sandwich of microscopic traces separated from one another by a dielectric (electrical insulator). High current density passing through these traces gradually displaces the metal over time. The displaced metal can break through the dielectric layers and cause an internal short, or a void can form in the trace itself and cause an open line. Pretty standard fare actually.

Here's an overview...

This one is detailed, but more advanced...

So that's the internal short/open fault side of the equation. If a chip gets that far along, it basically wore out from use as expected. One thing to keep in mind is that electromigration began to become a problem around the 90nm node size, perhaps a bit earlier. Strategies to mitigate it were being developed, but may have been in the early stages - not yet standard or effective.

The VRAM could possibly be affected by the heat as well, or could be something related to moisture, who knows...

If the VRAM shorted it would cause a different error. And if it died, it would cause a GLOD. So it's not very relevant to the 3034 errors we look for when diagnosing suspected bump failures.

In my experience the pressure test usually doesn't do anything. But heat does. Someone I spoke with on Discord had a 3034 today and tried the pressure test. It did nothing. Then he tried some hot air at 90C and it began working. It's just one console, but if you take a console at random that is experiencing a YLOD, chances are it's got a 3034. And the fact that his didn't respond to the pressure test could mean a non-response is not rare at all, perhaps even likely. If I were to wager, I would estimate the pressure test does nothing in 60-80% of 3034s.

I like Josh's explanation, because it gives us a simple test and a qualitative way to know whether there is a BGA defect that a reball is likely to fix. And if it doesn't work, then the possibility that the chip is dead and needs replacing is higher. So even though I agree it's not 100%, it's an easy way to make a decision about the best way to repair the console on the first try.

You could reball and see; worst case it fails and needs replacing then. But if you can save yourself (and the customer) the inconvenience, it's a useful test.

squeept · Sep 15, 2022

1.) I stopped doing any pressure tests a while ago because I found it to be inconclusive across hundreds of consoles. (obligatory "HEAT TEST IS MEANINGLESS!")

2.) I have some diagnostic spreadsheets of possible failures that I haven't shown on the syscon thread yet because I was either waiting on a new CPU stencil or threw them to the back of the queue because I don't like wasting time replacing an RSX multiple times to check if I killed chips. Life is hectic these days, so if anyone is doing data collection from those posts, it's gonna be an ecological fallacy for a while so... be aware.

3.) I'll need to add model numbers for RSX to my sheets and get another data point or three, but it seems there is at least one model of RSX where the 40nm syscon changes on one line are in the wrong category (IHS or no IHS). I am slightly dyslexic and I love beer, so I'll need to verify that once or twice more. I'll definitely be uploading a new spreadsheet if anyone uses it. I want to now track exact part numbers on RSX swaps, and I'm starting to think the combined arms offense with bittraining errors recorded might be useful information.

Pacorretaco · Sep 15, 2022

RIP-Felix said:
The silicon die is a sandwich of microscopic traces separated from one another by a dielectric (electrical insulator). High current density passing through these traces gradually displaces the metal over time. The displaced metal can break through the dielectric layers and cause an internal short, or a void can form in the trace itself and cause an open line. Pretty standard fare actually.
Here's an overview...

The thing with electromigration is that it is a general phenomenon that affects all the chips in the world. And as the second video explicitly says, the problem becomes worse at smaller manufacturing nodes.
So it may be a good explanation, but we still have the mystery of why it would be in the 90nm RSX specifically. By that rule the 65nm, 40nm and 28nm should be more and more affected, not less... A similar question would apply to the CPUs as well...
Also, it is not that clear how it would react to heat or pressure. Who knows, maybe it could, or maybe not, if it had more to do with current. But that is mostly speculation... All these things are just difficult to know.

RIP-Felix said:
In my experience the pressure test usually doesn't do anything.

And yeah, that's the problem with the pressure test... It doesn't always work, and when it doesn't work it is inconclusive...
However it does work sometimes, that's the thing. I'll admit that most of the times I've seen it work, it was more commonly a case of GLOD/artifacting. But I have also seen it in cases of 3034 as well. Sometimes the test is gentle enough that it doesn't look like a test at all... It is just disassembly and reassembly and things like that... Which could be done inadvertently by many people.

It happened to me before, and many other people. I believe you also mentioned it the other day.

RIP-Felix said:
I just went over my spreadsheet of SYSCON errorlogs and found 7 consoles that had 3034 in which people applied a pressure test to them, as Josh did above. 4 of them booted. 3 of them did not respond at all.

Then there's the fact that not many people pay attention to this at all, or do any similar kind of tests.
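To put a rough number on how little a sample of 7 consoles can tell us, here is a minimal sketch that computes an approximate 95% Wilson score interval for the 4-out-of-7 figure quoted above. The function name is made up for illustration, and the calculation assumes the 7 consoles behave like independent random samples, which a spreadsheet of self-reported cases may not be.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Approximate Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)
    ) / denom
    return centre - half_width, centre + half_width


# 4 of the 7 consoles with a 3034 booted after the pressure test.
low, high = wilson_interval(4, 7)
print(f"booted after pressure test: {4 / 7:.0%} (interval {low:.0%} to {high:.0%})")
```

The interval comes out very wide (roughly a quarter to more than four fifths), so this sample alone cannot confirm or rule out the 60-80% "does nothing" estimate.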

squeept said:
1.) I stopped doing any pressure tests a while ago because I found it to be inconclusive across hundreds of consoles. (obligatory "HEAT TEST IS MEANINGLESS!")

Anything you say is interesting because not many people have the experience and long track record that you do.
I agree about the heat test by itself being inconclusive because it affects too many things at once.
And the "pressure tests" don't always work or don't always say anything useful. But sometimes they do?
I found an old video of yours. The example is exaggerated but is still great:

Maybe this was from the old days before the Frankenstein, but the stories from the early days are still very interesting to me.
So back then you just reballed by default, right? Regardless of pressure tests, or did you have your own kind of tests or something?

squeept · Sep 15, 2022

@Pacorretaco yep, just like the heat test - I eventually concluded (just my personal guess on this one) that it could be any type of mechanical reconnection, so it didn't really help narrow things down to the BGA specifically. I remember taking that video, and it was meant more as a dig towards people that were telling me it was ridiculous that mechanical connections could be impacted so easily.

And yes, until a few years ago I would just reball the RSX as part of the diagnosis instead of using it as a fix when properly narrowed down. I didn't have any unique information or testing that I was acting on, I was just willing to put in the work because they're expensive systems.

My criteria for reballing versus swapping in a new chip changes frequently these days as I get more information under my belt from my diagnostic sheets, but basically I'll only reball the original 90nm back if the chip ohms look really good AND I see clearly oxidized or damaged pads.
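For anyone tracking this in a script or spreadsheet, that rule of thumb could be sketched roughly as below. The function name and the two yes/no inputs are made up for illustration, and what counts as "really good" ohms is left to the tech's own judgment; it is a sketch of the stated criteria, not a complete diagnostic.

```python
def choose_rsx_repair(chip_ohms_good: bool, pads_oxidized_or_damaged: bool) -> str:
    """Hypothetical helper mirroring the rule of thumb in the post.

    Reball the original 90nm chip only when BOTH conditions hold;
    otherwise swap in a replacement chip.
    """
    if chip_ohms_good and pads_oxidized_or_damaged:
        return "reball original 90nm"
    return "swap in replacement chip"


print(choose_rsx_repair(chip_ohms_good=True, pads_oxidized_or_damaged=True))
print(choose_rsx_repair(chip_ohms_good=True, pads_oxidized_or_damaged=False))
```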

We've come a long way, baby.

sandungas · Sep 15, 2022

I just read the last 10 pages of the thread and I don't see any mention of an important detail related to the bumps

If we ignore the 4 "dummy" components at the corners, we could say the whole RSX is "floating" on top of the BGA balls... but the die is not "floating" on top of the bumps, because it has resin all around it

One of the functions of that resin is to preserve the distance between the bottom surface of the die and the top surface of the substrate/interposer (it prevents the bumps from being squeezed, because the resin is solid)
Let's say... the resin forms a solid bond between 3 materials: fiberglass, silicon, and the resin itself
The expansion/contraction characteristics of that resin should match the others... but I guess each of them has its own characteristics... so the resin should be something intermediate

Anyway... the point is that the resin is not sensitive to pressure, especially because the die is made of silicon. It has an atomic structure close to glass/metal, so it should have very low flexibility; it is the kind of material that is going to "crack" instead of bending
That's probably why the overheat tests could help to diagnose the bumpgate, but the pressure test does nothing to the bumps

Victor Hugo Alvarez · Sep 15, 2022

sandungas said:
I just read the last 10 pages of the thread and I don't see any mention of an important detail related to the bumps

If we ignore the 4 "dummy" components at the corners, we could say the whole RSX is "floating" on top of the BGA balls... but the die is not "floating" on top of the bumps, because it has resin all around it

One of the functions of that resin is to preserve the distance between the bottom surface of the die and the top surface of the substrate/interposer (it prevents the bumps from being squeezed, because the resin is solid)
Let's say... the resin forms a solid bond between 3 materials: fiberglass, silicon, and the resin itself
The expansion/contraction characteristics of that resin should match the others... but I guess each of them has its own characteristics... so the resin should be something intermediate

Anyway... the point is that the resin is not sensitive to pressure, especially because the die is made of silicon. It has an atomic structure close to glass/metal, so it should have very low flexibility; it is the kind of material that is going to "crack" instead of bending
That's probably why the overheat tests could help to diagnose the bumpgate, but the pressure test does nothing to the bumps

That's exactly what I said to Felix, but nobody listened to what I said. I wrote that the glue on the silicon is solid as titanium, so I don't think slight pressure, or any pressure, could "fix" the bumps.

Well, at least @Pacorretaco listened to me.

Octal450 · Sep 16, 2022

I would say I agree mostly.

For sure the pressure test is not 100%, but when identifying a failed BGA it is fairly good at providing a positive result (the BGA has failed), but not very reliable at providing a negative result (the BGA is OK).

So unless I get a failed BGA result, I just replace the chip.
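One way to picture why a positive result is informative while a negative one is not: a minimal sketch using Bayes' rule. Every number in it is a made-up assumption for illustration (50% prior chance of a BGA fault, a test that catches 30% of real BGA faults and wrongly flags 5% of healthy BGAs); nobody in this thread has measured these rates.

```python
def posterior_bga_fault(prior, sensitivity, specificity, test_positive):
    """Probability of a BGA fault after one pressure test (Bayes' rule).

    prior:        assumed chance of a BGA fault before testing
    sensitivity:  assumed chance the test reacts when the BGA IS faulty
    specificity:  assumed chance the test stays quiet when the BGA is fine
    """
    if test_positive:
        true_hit = sensitivity * prior
        false_hit = (1 - specificity) * (1 - prior)
    else:
        true_hit = (1 - sensitivity) * prior
        false_hit = specificity * (1 - prior)
    return true_hit / (true_hit + false_hit)


# Hypothetical values, chosen only to illustrate the asymmetry.
PRIOR, SENSITIVITY, SPECIFICITY = 0.5, 0.3, 0.95

after_positive = posterior_bga_fault(PRIOR, SENSITIVITY, SPECIFICITY, True)
after_negative = posterior_bga_fault(PRIOR, SENSITIVITY, SPECIFICITY, False)
print(f"BGA fault likely after a positive test: {after_positive:.0%}")
print(f"BGA fault likely after a negative test: {after_negative:.0%}")
```

With these assumed rates a positive test moves the estimate a long way up, while a negative test barely moves it down from the starting 50%, which matches the "trust a positive, don't trust a negative" approach.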

Reballing is not a good form of diagnosis unless you get bad resistances and such, because the heat from reballing can temporarily make it work again. I had one of my 10 testing systems work for a year after reball before the code returned.

I will say, performing the pressure test on a PS3 "accurately" (note the quotes!) is very hard, because of the cooler design. It's much easier and more reliable on X360 and laptop/GPU card devices. Perhaps I will take some old PC heatsinks to do it and some brace behind the chips, but still it probably is not as reliable...

It's a can of worms. But I maintain that when in doubt - replace the chip and provide the customer a lasting fix.

Kind Regards,
Josh

DeadEnd · Sep 16, 2022

truemaster said:
OK guys, looks like we have two different opinions about the 90nm GPU. Myself, Felix and DeadEnd strongly believe that the BGA chip should be replaced. The other side believes the reball is a permanent fix, and they have the right to believe so; it's not a crime. But here's a neutral and logical question: if your PS3 has a 3034 (GPU error), what would you do? Reball the chip, or change it for a working one? And since there is a way to solder on a newer GPU that produces less heat, consumes less power, and that the phat heatsink can keep twice as cool compared to the slim, why not change it for a 40nm, for peace of mind? Reballing the same GPU can be a double-edged sword, and replacement with a known working one or a newer one would be my choice, not only for the PS3 but for anything with BGA issues, as long as said BGA chip is not married to the motherboard like the CPU of the PS3 and Xbox 360. Really, continuing to argue is pointless. Let's accept that we have different opinions and be done with it.

Don't waste your time, some of the people here are impossible to reason with.

Pacorretaco said:
But that "peace of mind" may be due to fear of something that is not that well understood...
Peace of mind which didn't work so well for you in particular, when your board failed shortly after anyway... After the 40nm RSX replacement from DeadEnd.

What a ridiculous example and attempt to show me in a bad light. The issue I faced had nothing to do with the 40nm. I have admitted multiple times I took on something that was harder than I had expected without professional equipment. It may come to you as a surprise, but reballing/swapping is not a simple procedure. But hey, you would know all about that. You are a reballer after all, or are you?

Truemaster's board developed a temperature sensor error and possibly could have been corrected by a new sensor, but luckily Victor helped out and went even further to repair it. In any case, there are multiple examples of boards working fine after the swap, so this in no way discredits the 40nm's performance. But it matters very little because you are a cheater and it is just yet another excuse for you to propel your points. Just like you cheated when you provided screenshots of a 90nm running unbelievably cool, only for us to find out you drilled holes into the case.

Pacorretaco · Sep 16, 2022

DeadEnd said:
Don't waste your time, some of the people here are impossible to reason with.

I hope not...

DeadEnd said:
What a ridiculous example and attempt to show me in a bad light. The issue I faced had nothing to do with the 40nm. I have admitted multiple times I took on something that was harder than I had expected without professional equipment. It may come to you as a surprise, but reballing/swapping is not a simple procedure. But hey, you would know all about that. You are a reballer after all, or are you?

Truemaster's board developed a temperature sensor error and possibly could have been corrected by a new sensor, but luckily Victor helped out and went even further to repair it. In any case, there are multiple examples of boards working fine after the swap, so this in no way discredits the 40nm's performance. But it matters very little because you are a cheater and it is just yet another excuse for you to propel your points. Just like you cheated when you provided screenshots of a 90nm running unbelievably cool, only for us to find out you drilled holes into the case.

May be a good time to take the hate goggles off, friend, and maybe make the moderators happy in the process.

That's not what I said. I didn't put the quality of your work into question. You did. In fact I am sure you did the best you could. Besides, it was a success, wasn't it?
If it failed shortly after, it was despite your best efforts and despite the superiority of the new 40nm chip, which was successfully installed by you. Surely we can agree on that.

That was the point. That the "superiority" of the chip is not the determining factor for the life of the machine. So being paranoid about it specifically may be foolish, since it is not so easy to predict what will fail first anyway.
In this case, the 40nm did not get the chance to matter, and nobody knows if the board would have lasted shorter or longer with the 90nm.
These are 16 year old machines that are incredibly complex. Everyone should be happy if they work at all. And when they don't, we can try and fix what is broken.

DeadEnd · Sep 16, 2022

Pacorretaco said:
I hope not...

May be a good time to take the hate goggles off, friend, and maybe make the moderators happy in the process.

That's not what I said. I didn't put the quality of your work into question. You did. In fact I am sure you did the best you could. Besides, it was a success, wasn't it?
If it failed shortly after, it was despite your best efforts and despite the superiority of the new 40nm chip, which was successfully installed by you. Surely we can agree on that.

That was the point. That the "superiority" of the chip is not the determining factor for the life of the machine. So being paranoid about it specifically may be foolish, since it is not so easy to predict what will fail first anyway.
In this case, the 40nm did not get the chance to matter, and nobody knows if the board would have lasted shorter or longer with the 90nm.
These are 16 year old machines that are incredibly complex. Everyone should be happy if they work at all. And when they don't, we can try and fix what is broken.

If moderators were here, you would be blocked already.

Spare me your explanations and fact-twisting. You knew exactly what you were saying and you wanted to take a jab at me. The chip did not fail, the TEMPERATURE SENSOR failed due to several reballs. And even if there was an issue with the chip itself, it's because I had to reball it several times due to balls merging. The chip (and the sensor) were already probably compromised by then. He explained it to you several times himself. THAT DOES NOT COUNT AS AN EXAMPLE OF 40NM FAILURE, IT WAS A TRIAL RUN. WHY ARE YOU IGNORING THAT FACT?

RIP-Felix · Sep 16, 2022

Pacorretaco said:
These are 16 year old machines that are incredibly complex. Everyone should be happy if they work at all. And when they dont, we can try and fix what is broken.

Usually what's broken is the GPU. So when he replaced it with a 40nm GPU, he did exactly that - "fixed what's broken." And did so using the method that has the best chance of working long term.

That something else failed later is not surprising, when we (ghetto reballers) are still learning. @truemaster knew that when he sent the board to @DeadEnd. He willingly entered into that risk, having been told he should not expect it to work (or last).

We have already established the ratios of common problems. GPUs are foremost. Tokins are up there and likely to become a larger proportion as they age and the defective GPUs all die out and get replaced with reliable ones. After that there is the odd IC: clock generators, fuses, SMDs... or the good ole PSU and Blu-ray drive laser.

sandungas · Sep 16, 2022

What are you discussing anyway? It is obvious the best repair possible is to replace the RSX with a 40nm one, because it is more efficient in general, has probably seen less use, is less prone to fail according to the bumpgate theory, and reduces the chance of needing several reballs that could wear the motherboard...

But don't forget the saying "the customer is always right". And don't take the word "customer" as a commercial transaction only, because sometimes the customer is yourself or a friend

Also, it could happen that you like electronics and have some money to spend on some reballing "toys", and you are doing it for fun and learning and/or as a personal challenge
Also, it could happen that you bought a couple of PS3s for 10€ and 8€, so the worst that could happen by trying the 90nm replacement is to lose 18€... imo a good investment for a hobby

I mean... there are many factors that could justify doing ghetto repairs

Pacorretaco · Sep 16, 2022

If you really want to know my opinion...

DeadEnd said:
If moderators were here, you would be blocked already.

Spare me your pathetic explanations and fact-twisting. You knew exactly what you were saying and you wanted to take a jab at me. The chip did not fail, the TEMPERATURE SENSOR failed due to several reballs. And even if there was an issue with the chip itself, it's because I had to reball it several times due to balls merging. The chip (and the sensor) were already probably compromised by then. He explained it to you several times himself. THAT DOES NOT COUNT AS AN EXAMPLE OF 40NM FAILURE, IT WAS A TRIAL RUN. WHY ARE YOU IGNORING THAT FACT?

That's a fallacy. Boards don't normally fail "later"... "because of too many reballs".
But hey, maybe you were right. However, if anything, I'd say it failed for the opposite reason. Not enough reballs, not enough practice. You hadn't reballed a single COK before, but you tried to do a swap straight away to "do it better". That's why it was so difficult and you found so many problems...

But OK, maybe that wasn't a good example because it was the first... It was a trial and whatnot...
Do you really want to talk about the second one? Oh, it failed too. So did the third one and the fourth one and who knows how many more... Even the one from your video...
Paranoia and hate don't help with fixing, only with destroying.

And hey, I am not just being a hater like you. I don't hate the 40nm and I never said it could be a bad thing. Didn't I also help to make it possible in one way or another? Wasn't I here from the beginning too, even before you?

I was just pointing out facts, and trying to make peace...
If yours isn't a good example but you still want an example... What about the original Frankenstein from SONY, from lcferrum?
The first one we heard about. Surely they did it properly? But not only did it have problems, the problems were RSX related too, even with the 40nm.
(The very first post, first few sentences of this 120 page thread... Not like I had to dig...)

lcferrum said:
Hello, guys!

Long story short. Recently I bought a CECHA00 straight from Japan. It arrived in a pretty bad state: the console had been tampered with (seals broken), the original 60gb HDD was switched for a 20gb one (presumably because the seller just didn't want to share his game collection), it was probably dropped on the floor (several plastic bolt holes are torn out or stripped), and it sometimes gives a YLOD/GLOD.

Edit: Lastly, what's even your point, besides trying to attack me for some reason? Why can't there just be peace?

RIP-Felix · Sep 16, 2022

He said it was dropped. That's a common way for a BGA to break. So it had nothing to do with the Frankenstein mod. That worked. If you drop the PS3 down a staircase, don't blame GPU defects. Blame the pizza grease on your fingers.

Pacorretaco · Sep 16, 2022

RIP-Felix said:
He said it was dropped. That's a common way for a BGA to break. So it had nothing to do with the Frankenstein mod. That worked. If you drop the PS3 down a staircase, don't blame GPU defects. Blame the pizza grease on your fingers.

Your point is? Go on with the fights?
Where did I blame GPU defects?
And why does the RSX always break "from the drops" and never the CPU or something else?

RIP-Felix · Sep 16, 2022

Pacorretaco said:
Your point is?
Where did I blame GPU defects?

My point is the HW didn't fail. The user broke it.

Pacorretaco said:
If yours isn't a good example but you still want an example... What about the original Frankenstein from SONY, from lcferrum?
The first one we heard about. Surely they did it properly? But not only did it have problems, the problems were RSX related too, even with the 40nm.

Just saying that's not a good example either.

You keep making the case that consoles are complex and anything can break. That's true of any electronic device. It doesn't mean we can't figure out how to fix them. Sure, there are hard cases we haven't been able to fix, like my "no PS2" console. But I haven't given up on it because it's pointlessly complex, or a futile venture because something else will just break and I'll be chasing broken components until I have replaced everything. I and all repair techs operate under the assumption that there are a finite number of components "prone to failure", and after replacing them with a reliable component, it'll be good to go for as long as possible. But not forever.

Pacorretaco · Sep 16, 2022

RIP-Felix said:
My point is the HW didn't fail. The user broke it.
Just saying that's not a good example either.

You keep making the case that consoles are complex and anything can break. That's true of any electronic device. It doesn't mean we can't figure out how to fix them. Sure, there are hard cases we haven't been able to fix, like my "no PS2" console. But I haven't given up on it because it's pointlessly complex, or a futile venture because something else will just break and I'll be chasing broken components until I have replaced everything. I and all repair techs operate under the assumption that there are a finite number of components "prone to failure", and after replacing them with a reliable component, it'll be good to go for as long as possible. But not forever.

I am not discouraging anybody or making anybody think of "giving up". If anything, the opposite.
I am simply a proponent of fixing what is broken. What could be so wrong with that?

But yeah, you make a good reminder of the other Frankenstein cases that got PS2 problems. Maybe we could focus on that challenge now, for example, instead of silly "fights"?
You had a board like that, Victor had a couple, Botakompong too and I believe Computer booter as well.
Regardless of whether they are "examples" or not... That clearly is an open front that deserves our attention. I don't think those were "dropped", but who knows...
Nobody said to give up. Just fix what is broken. If PS2 is broken in those...

RIP-Felix · Sep 16, 2022

About fixing only what's currently broken...

In general that advice sounds good. But what if there are other components prone to fail? If you offer a warranty and it reduces customer dissatisfaction to replace the tokins preemptively, is that such a bad thing?

I have said that tokins are a better cap. But that's when they work. But they are aged, and aging. A user I've been conversing with on Discord recently had a console where he previously fixed only what was wrong, and then it later developed bad tokins (we think). Had he preemptively replaced them at the same time as the original issue, he wouldn't have had to disassemble, repaste, clean, all of that to get down to the motherboard again. It's an added inconvenience, especially when you are satisfied it's fixed and then are emotionally done repairing and ready to enjoy the console... just to have it YLOD in the middle of a game! That's the kind of thing that'll make people give up and buy a slim... believing that the BC models are hopeless.

And if you do that to a customer, they'll be less likely to be satisfied.

So yeah, we delid and repaste. Preventative maintenance. Adjust fan curves. Dust. Keep the console out of enclosed cabinets. If we replace the GPU and the tokins at the same time it may make the repairs last longer for the end user. Which also means fewer returns and bad ratings for repair techs.

I know some techs around here replace, or used to replace, tokins at the same time as a reball/frankie. Vyktor reballs the CELL at the same time just to preempt that too.

So we accept that repair doesn't just mean to literally fix only what's broken. The old axiom, "if it ain't broke don't fix it", is not a rule. It's general advice to not break your currently working console to fix what hasn't broken yet. Once it has, or is in trouble (overheating), you can decide to break it to make it last as long as possible.

I wouldn't recommend replacing a 90nm RSX because your CELL is overheating. Or replacing good tokins at that time. But if you are reballing, taking tokins off or reballing both CPU and GPU at the same time makes some kind of sense. But it depends on your circumstances, level of proficiency, etc.

There is no "this is the only right way." It's up to each person to decide what the best course of action is for them.

squeept · Sep 16, 2022

Pacorretaco said:
That's a fallacy. Boards don't normally fail "later"... "because of too many reballs"

Every component has a MTBF directly affected by heat. Every reflow cycle has a small chance at popcorning/delaminating something no matter how careful you are, and you may not hear it even if you're listening for it. And we may not know when a GPU is on the way out and we killed it by swapping, so it could just need another new GPU.

I know a lot of things we talk about here are specifically about getting your own unit working no matter what so the waters are muddied by arguing from two different standpoints, but from the perspective of both turning a profit AND providing a warranty, you stop after a few cycles.
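As a rough illustration of why the risk adds up over repeated rework, here is a minimal sketch. The 5% per-cycle figure is purely a made-up assumption, and the formula treats each reflow as an independent roll of the dice; real heat damage is probably cumulative, so this likely understates the later cycles.

```python
def risk_after_cycles(per_cycle_risk, cycles):
    """Chance that at least one of `cycles` reflows damages something.

    Assumes every cycle carries the same, independent risk:
    1 - (1 - p) ** n
    """
    if not 0 <= per_cycle_risk <= 1:
        raise ValueError("per_cycle_risk must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must not be negative")
    return 1 - (1 - per_cycle_risk) ** cycles


HYPOTHETICAL_RISK = 0.05  # assumed value, not a measured failure rate

for n in (1, 3, 5, 10):
    print(f"{n:2d} reflow cycles: {risk_after_cycles(HYPOTHETICAL_RISK, n):.1%}")
```

Even a small per-cycle chance compounds quickly, which is one way to see why it makes sense to stop after a few cycles on a board you have to warranty.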

edit: and to @RIP-Felix's point, I still think the TOKIN crowd is a little nuts. But that all started like 2 years ago, and I give a 2 year warranty on Franks, so that's 4 extra years since I went apeshit on everyone while it does seem to be getting worse. Plus I am literally finding it easier to replace them than to answer a hundred questions about why I didn't replace them, so I just rip them off these days (the thing I was trying to avoid by trying to stamp out the misinformation early) on the more expensive configurations. Seems like it helps justify the cost to customers as well.

edit 2: Also, in my experience, heat KILLS the TOKIN caps, so if you've reworked a board like whoa, you really should rip them off at this point.
