@simevidas dang not unicode aware :')
Following up with that, as I was thinking of some examples of what I mean... Take kanji, for example. 漢字 is 2 characters, but it's 6 bytes, so is the length 2 or 6? Or the phrase "Góða nótt" in Icelandic. It's 9 characters (counting the space in the middle), but it's 12 bytes. So, should this fail the maxlength check, if the maxlength is 10?
@ramsey @jkt @simevidas length is 2 characters, size is 6 bytes when encoded in utf8 I believe?
@f4grx @jkt @simevidas The size is always 6 bytes, but yes, when encoded in utf-8, the length is 2 characters.
@ramsey @jkt @simevidas bytes assume an encoding. Codepoints vs. grapheme clusters is the distinction in experience, I guess.
@johannes @jkt @simevidas I thought it would be the other way around. The same grouping of bytes could represent different codepoints, based on the encoding.
@ramsey @jkt @simevidas yes, but working on bytes means that the encoding has to be carried through the different layers, and it might cut utf-8 sequences apart (assuming utf-8 being the default encoding). With either codepoints or grapheme clusters you at least get some valid (while not always sensible) result.
@jkt @simevidas In this case, Safari is the one that's Unicode aware. The other browsers are treating maxlength as the number of bytes rather than the number of characters.
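For anyone who wants to check the numbers in this thread, here is a rough sketch in Python (not what any browser actually runs). The `lengths` helper is a made-up name for illustration. It counts the same string three ways: code points, UTF-8 bytes, and UTF-16 code units. Grapheme clusters are left out because Python's standard library has no segmenter for them.

```python
def lengths(text: str) -> dict:
    """Measure one string three ways (hypothetical helper, for illustration only)."""
    return {
        # Python's len() on a str counts Unicode code points.
        "code_points": len(text),
        # The byte count depends on the chosen encoding; UTF-8 is assumed here.
        "utf8_bytes": len(text.encode("utf-8")),
        # UTF-16 code units: two bytes each, no byte-order mark with "utf-16-le".
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
    }


kanji = "\u6f22\u5b57"                  # 漢字
icelandic = "G\u00f3\u00f0a n\u00f3tt"  # Góða nótt
emoji = "\U0001F600"                    # a single emoji outside the Basic Multilingual Plane

for sample in (kanji, icelandic, emoji):
    print(lengths(sample))
```

With these inputs the kanji come out as 2 code points and 6 UTF-8 bytes, and the Icelandic phrase as 9 code points and 12 UTF-8 bytes, matching the counts above. The emoji is where the three measures all disagree: 1 code point, 4 UTF-8 bytes, 2 UTF-16 code units. So whether a maxlength of 10 passes depends entirely on which unit is being counted.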