7 comments
Boo Ramsey 🧛🏻‍♂️🧟‍♂️👻🎃

@jkt @simevidas In this case, Safari is the one that’s Unicode aware. The other browsers are treating maxlength as the number of bytes rather than the number of characters. 🙂

Boo Ramsey 🧛🏻‍♂️🧟‍♂️👻🎃

@jkt @simevidas

Following up with that, as I was thinking of some examples of what I mean...

Take kanji, for example. 漢字 is 2 characters, but it's 6 bytes, so is the length 2 or 6?

Or the phrase "Góða nótt" in Icelandic. It's 9 characters (counting the space in the middle), but it's 12 bytes. So, should this fail the maxlength check, if the maxlength is 10?
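The two counts in this post can be checked with a minimal Python sketch. It assumes UTF-8 as the byte encoding and precomposed characters (so one code point per visible character). `lengths` and `fits` are hypothetical helper names used only for illustration; they are not how any browser implements maxlength.

```python
def lengths(text):
    """Return (code points, UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))

def fits(text, maxlength, unit):
    """Hypothetical maxlength check; unit is 'chars' or 'bytes'."""
    chars, size = lengths(text)
    return (chars if unit == "chars" else size) <= maxlength

kanji = "\u6f22\u5b57"               # the two kanji from the post
phrase = "G\u00f3\u00f0a n\u00f3tt"  # the Icelandic phrase from the post

print(lengths(kanji))             # (2, 6)
print(lengths(phrase))            # (9, 12)
print(fits(phrase, 10, "chars"))  # True
print(fits(phrase, 10, "bytes"))  # False
```

With a limit of 10, the Icelandic phrase passes when counted in characters and fails when counted in UTF-8 bytes, which is exactly the ambiguity being asked about.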

f4grx Sebastien (OLD ACCOUNT)

@ramsey @jkt @simevidas length is 2 characters, size is 6 bytes when encoded in UTF-8, I believe?

Johannes ✔️

@ramsey @jkt @simevidas bytes assume an encoding. Codepoints vs. grapheme clusters is the distinction in experience, I guess.
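To make the code point vs. grapheme cluster distinction concrete, here is a small Python sketch using only the standard library. The sample strings are my own illustrations, not taken from the thread's tests. Python's `len` counts code points; the standard library has no grapheme cluster segmentation, so the "one visible character" claim in the comments rests on how these sequences are defined to render, not on anything the code computes.

```python
import unicodedata

# Each of these is meant to display as one visible character,
# but is built from several code points.
decomposed_e = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT
vampire = "\U0001F9DB\U0001F3FB\u200D\u2642\uFE0F"  # emoji, skin tone, ZWJ, male sign, variation selector

def code_points(text):
    """Number of Unicode code points (what len() counts in Python 3)."""
    return len(text)

def utf16_units(text):
    """Number of UTF-16 code units (two bytes each)."""
    return len(text.encode("utf-16-le")) // 2

print(code_points(decomposed_e))                                # 2
print(code_points(unicodedata.normalize("NFC", decomposed_e)))  # 1
print(code_points(vampire), utf16_units(vampire))               # 5 7
```

So the same visible character can be 1 or 2 code points depending on normalization, and a single emoji can be 5 code points or 7 UTF-16 code units. A length limit that is meant to match what the user sees would need grapheme cluster segmentation on top of this.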

Boo Ramsey 🧛🏻‍♂️🧟‍♂️👻🎃

@johannes @jkt @simevidas I thought it would be the other way around. The same grouping of bytes could represent different codepoints, based on the encoding.
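The point that the same bytes can stand for different code points is easy to demonstrate. The sketch below decodes one two-byte sequence under three encodings; the byte values and the helper name `decode_to_code_points` are chosen for illustration only.

```python
def decode_to_code_points(raw, encoding):
    """Decode raw bytes and list the resulting code points as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in raw.decode(encoding)]

raw = bytes([0xC3, 0xB3])  # the same two bytes every time

print(decode_to_code_points(raw, "utf-8"))      # ['U+00F3']
print(decode_to_code_points(raw, "latin-1"))    # ['U+00C3', 'U+00B3']
print(decode_to_code_points(raw, "utf-16-le"))  # ['U+B3C3']
```

The byte count is always 2, but the number and identity of the code points depend entirely on which encoding is assumed.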

Johannes ✔️

@ramsey @jkt @simevidas yes, but working on bytes means that the encoding has to be carried through the different layers, and cutting at a byte limit might split UTF-8 sequences apart (assuming UTF-8 is the default encoding)

With either codepoints or grapheme clusters you at least get some valid (while not always sensible) result.
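A rough Python sketch of that trade-off, assuming UTF-8 and using hypothetical helper names: cutting at a byte limit can leave an undecodable fragment, while cutting at a code point limit always yields a valid string, though it may strip a combining accent.

```python
def truncate_bytes(text, limit):
    """Cut the UTF-8 encoding of text at a byte limit (may split a sequence)."""
    return text.encode("utf-8")[:limit]

def truncate_code_points(text, limit):
    """Cut text at a code point limit (always valid, not always sensible)."""
    return text[:limit]

def is_valid_utf8(data):
    """Report whether the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

phrase = "G\u00f3\u00f0a n\u00f3tt"  # the Icelandic phrase from earlier in the thread

print(is_valid_utf8(truncate_bytes(phrase, 2)))  # False: the 2-byte second letter is cut in half
print(is_valid_utf8(truncate_code_points(phrase, 2).encode("utf-8")))  # True
print(truncate_code_points("e\u0301", 1) == "e")  # True: valid, but the accent is gone
```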
