Maybe Google shouldn't strip non-ASCII characters when training AI
64 comments
@sil @KevinMarks that episode had a plot I wish could have lasted 2-3 episodes to really flesh out the slow shrinking of the known universe. Such a cool story.
@KevinMarks I'm made of perhaps 1026 of them, leaving 52 to 56 of them left for everyone else.
@profdc9 @KevinMarks "the universe is made out of 1082 atoms, and I'm made of 1026 of them" is a level of egocentric I never thought possible.
@dragoniff2 @profdc9 @KevinMarks "If there's anything in here more important than my ego, I want it caught and killed at once!" - Zaphod Beeblebrox in THHGTTG by Douglas Adams.
But ^ is an ASCII character, so it may be that it was originally superscript markup. Even so, that should not force removal of valid UTF-8.
@SpaceLifeForm @KevinMarks It usually uses the sup HTML tag unless they're doing weird things now.
@KevinMarks swear if I hear Trump complaining that the Haitian immigrants are taking all the atoms imma lose my shit
@KevinMarks Speaking as someone who’s at least 4 million times worse than the average person at estimation, this seems good enough to me!
@KevinMarks Google's Gemini doesn't have an issue with it. Probably a problem with the output, not the training?
@KevinMarks i have the impression this is not correct: the universe is just 6 atoms, we are just very good at sharing them
@KevinMarks I asked Google for some prime numbers and it gave me 4 prime numbers and the letter "Q"...
@Crazypedia @KevinMarks Q isn't a prime number, but I'm not so confident about ℚ ...
@KevinMarks the bleeding edge in computing, with the full power of a neural network of fifteen and a half neurons. The finest in the world.
@KevinMarks How can there be 1.5 billion Chinese if the universe only got about 1078 atoms? I'll tell you what, that whole China business is some CIA Conspiracy!
Engineers: Accurate to two decimal places
Physicists: Accurate to an order of magnitude
Astrophysicists: Accurate to an order of magnitude in the exponent
@failedLyndonLaRouchite @KevinMarks
@KevinMarks @nellie_m It’s pretty close to correct if you use base 10^27 (1000 in that base = 10^81 in decimal). However, “unlikely to be too far off the mark” is the best you’re likely to get, since no one I’ve met has been able to remember all the different digits required to write any given number in that base.
@KevinMarks This is unacceptable. People use Google to find information. The results produced by Google themselves should not be so erroneous that the feature lacks all reason for existing.
@KevinMarks
@KevinMarks @glennf I’ve always leaned towards 1078; however, I’m willing to go as high as 1080. 1082, though, is just absurd.
@KevinMarks And companies are falling over themselves to get on this AI bandwagon, and boast about how it is already being used to handle important decisions such as people's taxes? Brave new world!
@KevinMarks Well, that would simplify the fuck out of the traveling salesman problem.
@KevinMarks I have a uBlock filter that hides “Slop Overview” from my search results. Clearly it’s a filter worth keeping.
@KevinMarks That particular one's not just Google or AI. The number of times I've seen that kind of error in newspapers is just silly.
@KevinMarks