IMO the text used for formatting/markdown shouldn't count towards the character limit in user bios
I made a new bio recently and have been tweaking it for a while. I hit the 2000 character cap and had to mess with some of the formatting and wording to make it fit.
Thing is, you don't read markdown formatting. My bio has quite a bit of formatting and the text with the formatting is ~1960 characters, but the text you read is only ~885 characters (according to word counter), or less than half that. I feel like that's not how it should work.
I'm not necessarily opposed to increasing the limit somewhat, but most of the purpose of putting a limit on it is to prevent the length of people's bios from getting out of control by restricting them to a certain amount of "material" to work with. Your bio is already over a full screen height on my monitor, and you feel like you should be able to make it more like 2.5x that long, which would be... an extremely large bio.
Allowing longer bios wouldn't be harmful or anything, I just don't know if we really want them to turn into massive compilations of things like interesting links, where you click on a user page and there's basically a personal Wikipedia page in their sidebar. The intention is closer to "some relatively brief information about yourself".
We could always consider doing something separate too, like giving each user their own wiki page that they could keep random info/stuff in, and you could link to that from your bio. I'm not sure if that would be useful for many people though.
(I'm not gonna answer the rest of your comment because I feel I might repeat myself.)
Personally I feel @Emerald_Knight's idea of a separate limit for formatting characters alongside the normal limit is a good idea, mainly because it satisfies my wish of formatting characters being treated separately. (And that just letting formatting characters go up to infinity is pretty abusable.)
What do you think of that? If you like that or find it OK, what do you think the limits should be? I say 1500 characters for visible text and markdown text seems like a good bet.
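To make the two-limit idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the names `RAW_LIMIT`, `VISIBLE_LIMIT`, `visible_text` and `bio_fits` are made up, the 1500/1500 values are just one reading of the suggestion above, and the regex stripping is a crude stand-in for actually rendering the markdown. It is not how Tildes does anything.

```python
import re

# Hypothetical caps, taken from one reading of the suggestion above.
RAW_LIMIT = 1500      # cap on the stored markdown source
VISIBLE_LIMIT = 1500  # cap on the text a reader actually sees


def visible_text(markdown: str) -> str:
    """Very rough approximation of rendered text; not a real markdown parser."""
    # [label](url) -> label
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", markdown)
    # Drop leading heading markers such as "## "
    text = re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)
    # Drop emphasis, inline-code and strikethrough markers
    text = re.sub(r"[*_`~]", "", text)
    return text


def bio_fits(markdown: str) -> bool:
    """True when both the raw source and the visible text are within their caps."""
    return (
        len(markdown) <= RAW_LIMIT
        and len(visible_text(markdown)) <= VISIBLE_LIMIT
    )


sample = "**Hi**, I like [music](https://example.com/a-very-long-url)."
print(len(sample), len(visible_text(sample)))
```

With both caps in place, a bio padded with huge amounts of formatting still gets rejected by the raw limit, while a lightly formatted one is only judged on what people actually read.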
Small correction: I never actually suggested a separate limit for formatting characters, so I'm being incorrectly credited. That's 100% your idea, merely inspired by me perhaps not communicating my thoughts as well as I'd intended. I do appreciate that you made sure to give credit where you believed it was due, though, so thank you for being considerate. But yeah, you own this idea, not me :)
From a purely technical standpoint, a limit to formatting characters is still very much a necessity. Even if you use a DBMS that has non-fixed storage like MongoDB, you would still be opening yourself up to users potentially filling up the entirety of a 16MB document size limit by including an arbitrarily large amount of formatting characters. That can lead to a host of potential (if perhaps unlikely) problems, among them being any of Tildes' typically very tech literate users having a really bad day and inconveniencing other users with (again, under a MongoDB system) a 16MB chunk of data (painful for mobile users with very limited data) or potentially combining the 16MB document size with a DDoS attack to flood memory utilization.
Limits are a necessity. What those limits should be is a matter for debate, and workarounds for larger limits can certainly be put in place, such as @Deimos' mention of personal wiki pages, but reasonable limits are essential for ensuring stability and for one of Tildes' goals: keeping the site lightweight.
Wow, that's an impressively in-depth bio... Just out of curiosity, what else would you like to include?
Not op. I want to populate it with links of stuff I like here and elsewhere.
After reading @rish's comment, adding some things I like (the 2 that come to mind are quotes and music) is definitely something I would do.
That being said, I wasn't really thinking about adding anything when I made the post, just not needing to "crunch" my markdown so my bio can fit under the 2000 character limit, even though the actual text you will be reading is almost unchanged.
Using my table as an example
Preformatted text, shortened
Preformatted text, not shortened
Formatted text, shortened
I wish, Am I Rig-Male.
Formatted text, not shortened
I wish, amirit-Male.
That's quite the impressive bio, @kuromantis. :)
p.s. Gitlab issue for this suggestion:
https://gitlab.com/tildes/tildes/-/issues/711
I agree with the OP and would add that non-ASCII characters most probably count as more than one character to the limit. (Edit: according to the responses, it is characters that count, not bytes, so a multi-byte Unicode character counts as one character; still, markup tags should not count.)
Is it feasible to re-implement user bios as a comment hack to some hidden topic, since comments can be much longer than 2000 characters?
why, what's the benefit? a character is a character isn't it?
Not a benefit, just a (probable) fact of how Unicode is stored in binary. In an encoding like UTF-8, most Unicode characters are stored as a byte with a tag saying “look at the next byte too”; these can be chained so that a single character takes up several bytes in a system designed to store only one byte per character. Either Tildes has a naive character counter that doesn’t understand Unicode, or the database uses fixed-size storage.
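For anyone curious what that "look at the next byte too" tagging looks like, here is a small Python sketch, assuming UTF-8 as the encoding. The helper name `utf8_bits` is made up for this example. In UTF-8, a single-byte character starts with a `0` bit, a lead byte starts with `110`, `1110` or `11110` depending on how many bytes follow, and every continuation byte starts with `10`.

```python
def utf8_bits(ch: str) -> list[str]:
    """Return each UTF-8 byte of a single character as an 8-bit binary string."""
    return [format(b, "08b") for b in ch.encode("utf-8")]


samples = [
    "a",           # plain ASCII letter
    "\u00e9",      # e with acute accent
    "\u20ac",      # euro sign
    "\U0001F600",  # grinning face emoji
]

for ch in samples:
    # len(ch) counts characters (code points); the encoded length counts bytes.
    print(repr(ch), len(ch), len(ch.encode("utf-8")), utf8_bits(ch))
```

Each of these is one character as far as `len()` is concerned, but they take 1, 2, 3 and 4 bytes respectively once encoded.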
The cool thing is, we can look at the Tildes source code to find out!
The bio info is stored in the user model, and its restriction can be found here. It shows that it's using Postgres' built-in length function to determine that.
What does Postgres' documentation say? Found here:
Well, darn.
...what does Postgres itself say?
Neat. For most intents and purposes, it seems like unicode characters count as 1 in length.
Just wanted to mention that the database check constraint is more of a fallback that makes sure overly long data doesn't end up in the column somehow, but the main check is here, just from the max_length on that field declaration. The "schemas" use the Python marshmallow library and are used to validate/sanitize data.
Python checks Unicode character length properly too though, so it's the same result in the end.
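As a quick illustration of what "checks Unicode character length properly" means in Python, here is a minimal sketch. It is plain Python rather than the actual marshmallow schema, and `MAX_BIO_LENGTH` and `bio_within_limit` are made-up names. One caveat: `len()` counts code points, so an emoji that is assembled from several code points counts as more than one.

```python
MAX_BIO_LENGTH = 2000  # the cap discussed in this thread


def bio_within_limit(bio: str) -> bool:
    """Mimic a max-length check that counts code points, as Python's len() does."""
    return len(bio) <= MAX_BIO_LENGTH


# 2000 accented letters: 2000 characters, but 4000 bytes in UTF-8.
accented = "\u00e9" * 2000

# A "family" emoji: three emoji glued together by two zero-width joiners,
# so it displays as one symbol but is made of 5 code points.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

print(bio_within_limit(accented), len(accented.encode("utf-8")))
print(len(family))
```

So a multi-byte character does not eat extra space from the limit, but a few composite emoji can count as more than one character each.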
Those are implementation details that should be hidden from the user though, right? Same way that formatting shouldn't count towards the limit, it'd be similarly confusing for users to say "a character is a character unless it's most emoji or some other multi byte character."