Handling text reposts (recurring subjects)
While browsing over the past little while, I have noticed that I am starting to see "text reposts".
I did a quick search and saw that link reposts have been discussed in the past, but I didn't find anything about text posts.
To make it clearer what I mean, here is a recent example:
- In my feed today: https://tildes.net/~tech/16nj/second_brain_software_of_choice
- A very similar thread last week: https://tildes.net/~creative/1639/what_do_you_use_for_note_taking_writing
Generally speaking, I don't mind reposts; with "ask" topics in particular, new insights can be gained over time and different people might give different answers. At the same time, I do think the landscape around note-taking software hasn't changed drastically in a week.
To be clear, I am not saying that the OP of the most recent topic did anything wrong either. Even if you remember to check if a question has been asked before (I ironically almost forgot myself in this case) you might not find it.
But I am wondering if more could be done to surface previous discussions. Not specifically to prevent these sorts of reposts, but also to surface potentially valuable information from previous discussions.
Something that does come to mind is having a mechanism that uses the title someone is typing as (part of) a search query in the same space. Matching topics could then be shown before submission.
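A minimal sketch of the type-ahead idea, assuming a plain keyword approach: the draft title is compared against existing titles in the same group by word overlap (Jaccard similarity), and the best matches are returned for display before submission. The function names, the tiny stop-word list, and the scoring are hypothetical and not part of Tildes; the site's existing search would likely be the more practical starting point.

```python
import re

# Hypothetical stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"a", "an", "and", "do", "for", "in", "is", "of", "on", "or", "the", "to", "what", "you"}


def tokenize(title: str) -> set[str]:
    """Lowercase the title, split on non-alphanumeric characters, drop stop words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return {w for w in words if w not in STOP_WORDS}


def suggest_similar_topics(draft_title: str, existing_titles: list[str], k: int = 5) -> list[tuple[str, float]]:
    """Return up to k existing titles with nonzero word overlap, best match first."""
    draft = tokenize(draft_title)
    scored = []
    for title in existing_titles:
        words = tokenize(title)
        union = draft | words
        if not union:
            continue  # both titles were empty after stop-word removal
        score = len(draft & words) / len(union)  # Jaccard similarity
        if score > 0:
            scored.append((title, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    existing = [
        "What do you use for note taking / writing?",
        "Favourite text editors",
        "Second brain software of choice",
    ]
    for title, score in suggest_similar_topics("Note taking software recommendations", existing):
        print(f"{score:.2f}  {title}")
```

Notably, the two example threads above ("Second brain software of choice" and "What do you use for note taking / writing") share no words at all, so pure keyword matching would miss that exact pair; that is one argument for the semantic-similarity approaches discussed further down.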
Or, if we care more about making previous discussions accessible, give the OP the option after submission to also link to previous topics on the subject. Interlinked topics are something that would be interesting to explore: basically borrowing the "other discussions" tab idea, but making it "similar discussions".
I'm curious to see what others think about it.
I think we do what the internet has done for 30 years now. Either it gets interaction and people have a discussion or people recognize it is a repost and they ignore it. I see no reason to make a big deal out of it.
To be clear, I don't see reposts in themselves as a problem. I just think that if a subject has been discussed before, it might be valuable to surface that discussion again, either to a potential OP or as additional information in a post.
Anyone who finds it can explicitly link to the previous discussion, maybe? Ideally, that would either prompt people to figure out the differences between the posts or serve as prior work. For example, I can see a substantial difference between mere note taking and "second brain", which makes me think of a Zettelkasten level of organization. Posting possible dupes and then clarifying whether or not they are dupes, and why, is informative for the topic by itself.
Possibly, although that does require people to remember previous discussions, which is why I was leaning towards an automated solution. That said, an automated approach also has problems with false positives and, as you point out, with different contexts.
I think that's what tags are supposed to do, but there's a balancing act there. On the one hand, having a limited number of curated tags limits the users' ability to tie topics together. On the other, giving users the ability to create and apply their own tags often leads to certain users shoehorning in every tag they can fit in the hopes of getting eyeballs, even if the tags don't really apply.
Tags certainly help in narrowing down a subject, but I feel that they are still very broadly applicable, so I am not sure they would help to the degree I am thinking of.
I really like the idea of a specific link to other similar topics; it could, for example, go in the sidebar as part of the "meta" info about the post. I think an analogous and helpful thing would be for recurring.weekly posts to have a link to the previous week, in the same way that old posts have a link to the current post.
I think automating this may not be the best first step; if I were to envision how to implement this, I would suggest:
I think you have identified something very interesting, though, which is that sometimes the discussion has happened already, and some people will just kind of skip over the newer topic; I know there are things that I just haven't participated in because I had already read or commented on the earlier discussion. That doesn't make those "reposts" a problem; it just means that it would be great to be able to share some of that still-green information with people having the conversation again.
Fair enough. The steps you sketch out also do seem like a feasible path.
Just to expand a bit on why I was leaning towards automation:
I also realized that I project-managered your point and started breaking it down into tasks, when what I really meant was: I think your idea is a good one, and linking to related submissions would be really helpful for people, especially in your example. Even if the threads had been a month apart, the note-taking app space wouldn't have changed much in that time.
The only reason I am hesitant about automation is that there's a single developer, and I don't know if automation of this sort is going to be high on Deimos' list. Relying on the power of crowds is something that I think Tildes already does very well, so if we can tap into that, it might be a way to get the feature you want with a minimal amount of development. But I think all your other points are good ones as well, so making automation the eventual goal makes a lot of sense.
No worries, I totally get your perspective and do agree that crowdsourcing it probably is the best path forward for now.
In the long term, I still do think it is worthwhile to see if part of it can also be automated for the reasons I outlined.
edit:
I just had a look at the source code. I had not realized that so few people contribute on a regular basis, or that development in general has been a bit slow. I want to say that I am going to check out the code and take a look at it, but realistically, at the moment, that is probably more wishful thinking than anything else.
Getting a working setup is difficult: I had one on my previous laptop, but was not able to get it working on my M1 MBP. I think it's important to understand that the site is currently set up explicitly for how Deimos runs things, which I think is smart, but it means that it's not necessarily easy to get involved and make changes.
Edit: here's a thread that's about self-hosting, but it also touches on getting a working setup and some previous threads on the matter.
I finally got around to running Tildes locally for development, and it was much easier than I had thought. As it turns out, there are good instructions at https://docs.tildes.net/development/initial-setup, and for me the Vagrant setup just worked.
Reading up on it, it seems that VirtualBox was previously not available for M1, but it is now, so you might be able to get it running again. I am not sure whether the VM image will work, though.
Also, to be fair to the OP in that particular case, the earlier topic was posted in a different group, which they might not be subscribed to.
I feel like you didn't really read what I wrote? That, or I did a poor job of explaining. As I said in a different reply, I am not looking to prevent reposts in themselves. I am saying that previous discussions might also be valuable, and I was thinking about ways to surface them in the process.
In the case of two almost identical topics that both sparked discussion, this sounds like a case where topic merging would be appropriate.
The way I would see such a feature working, a user with sufficient permissions could select the two similar topics, resulting in a new topic that both originals redirect to, containing the comments of both. The user would also choose which topic header stays as the header; the other one would be added right below it in a collapsed summary box. Both authors of the original topics would be marked as OP. The vote count would be the number of distinct users who voted for either topic, so those who voted for both are counted only once. If one topic is too old to have its vote information stored, the merged topic gets the votes of the other one, and if both are, it starts at 0.
As for a "similar discussions" tab, I think it's a good idea as well; however, I believe it would be more fitting to use it to link to the other iterations of a recurring thread, and to use merging for the case above.
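The merge rules above are specific enough to sketch. This is a minimal, hypothetical model (the class and function names are not from the Tildes codebase): authors are combined, comments from both topics are interleaved by time, and the vote count is the size of the union of the two voter sets, with `None` standing in for a topic whose individual votes are no longer stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    author: str
    posted_at: int  # hypothetical: Unix timestamp
    text: str


@dataclass
class Topic:
    title: str
    authors: list[str]
    comments: list[Comment]
    voters: Optional[set[str]]  # None = too old, individual votes no longer stored


@dataclass
class MergedTopic:
    header: Topic     # the topic chosen to stay as the header
    collapsed: Topic  # shown right below, in a collapsed summary box
    authors: list[str]
    comments: list[Comment]
    voters: set[str]

    @property
    def vote_count(self) -> int:
        return len(self.voters)


def merge_topics(header: Topic, other: Topic) -> MergedTopic:
    # Both original authors are marked as OP (deduplicated, order preserved).
    authors = list(dict.fromkeys(header.authors + other.authors))
    # Combine both comment threads in chronological order.
    comments = sorted(header.comments + other.comments, key=lambda c: c.posted_at)
    # Union of voters, so anyone who voted on both topics is counted once.
    # If only one topic still has vote data, its voters are used; if neither does,
    # set().union() with no arguments gives an empty set, i.e. a count of 0.
    known = [v for v in (header.voters, other.voters) if v is not None]
    voters = set().union(*known)
    return MergedTopic(header, other, authors, comments, voters)
```

Using a set union rather than adding the two counts is what guarantees the "counted only once" rule without any extra bookkeeping.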
We don't need to be rigorous about duplicates like in an issue tracker, but it helps to do a search before posting. It's better to do it from the front page since it might be in a different group.
If the old discussion has died off and it's pretty long already, it might be better to start a new topic anyway to continue it. It helps to put a link to the previous topic in the introduction, so people know you've seen it and they can look at the previous topic if they like. Sometimes you can even make it explicit in the title that it's part 2 (or whatever).
If you notice a duplicate and it wasn't linked, it seems nice to post a polite comment linking to it.
This library, https://sbert.net/, is super easy to use (it benefits from, but doesn't need, a GPU) and is very capable of detecting similarity between short texts. We could run the title or the first few hundred tokens of text through the encode method (see the first example on that page), then show the submitter the top 10 closest submissions, and they might decide not to submit. Vector similarity is fast enough (particularly if you use Faiss, Pinecone or something like that) that you could compare a new submission to all the existing submissions.
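To make the retrieval step concrete: with sentence-transformers, a `SentenceTransformer` model's `encode` method turns each title into a vector, and the library's `util.semantic_search` helper can do the top-k lookup. The sketch below does that lookup by hand in plain Python so it runs without the library or a model download; the vectors are made-up toy values standing in for real embeddings, and the topic IDs are hypothetical.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query: list[float], corpus: dict[str, list[float]], k: int = 10) -> list[tuple[str, float]]:
    """Brute-force search: score every stored topic vector, keep the k highest."""
    scores = ((topic_id, cosine_similarity(query, vec)) for topic_id, vec in corpus.items())
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real model embeddings.
    corpus = {
        "second-brain-software": [0.9, 0.1, 0.0],
        "note-taking-tools": [0.8, 0.3, 0.1],
        "unrelated-topic": [0.0, 0.2, 0.9],
    }
    draft = [0.85, 0.2, 0.05]
    print(top_k_similar(draft, corpus, k=2))
```

Brute force like this is linear in the number of topics per query, which is manageable for a site of modest size; an index such as Faiss is what you would reach for as the corpus grows.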
I would not want to use it on an automated basis at all, because it is not calibrated: you can't really say that a certain similarity score is the right threshold for deciding that things are too similar.
That library also supports cross-encoders (https://sbert.net/examples/applications/cross-encoder/README.html), which treat the problem of "Is text A similar to text B?" as a classification problem and return a calibrated probability score that would be more appropriate for automated use. It is slower, so you might take the top 10 from the embedding search and use the cross-encoder to test those.
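The retrieve-then-re-rank idea can be sketched as a second stage: the candidates are assumed to be the top 10 (topic_id, text) pairs from the embedding search, and each one is re-scored against the new submission with a pairwise scorer. In sentence-transformers that role would be played by a `CrossEncoder` model's `predict` method, which takes a list of text pairs and returns one score per pair; here the scorer is passed in as a plain function so the sketch runs on its own. The `threshold` value and the toy scorer in the demo are hypothetical placeholders, not a real model.

```python
from typing import Callable, Sequence

# A pair scorer takes (text_a, text_b) pairs and returns one score per pair.
PairScorer = Callable[[Sequence[tuple[str, str]]], Sequence[float]]


def rerank(
    new_text: str,
    candidates: list[tuple[str, str]],
    pair_scorer: PairScorer,
    threshold: float,
) -> list[tuple[str, float]]:
    """Re-score (topic_id, text) candidates against new_text; keep those at or above threshold."""
    pairs = [(new_text, text) for _, text in candidates]
    scores = pair_scorer(pairs)
    ranked = sorted(
        ((topic_id, float(score)) for (topic_id, _), score in zip(candidates, scores)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [(topic_id, score) for topic_id, score in ranked if score >= threshold]


if __name__ == "__main__":
    def toy_scorer(pairs):
        # Hypothetical stand-in for a cross-encoder: fraction of shared lowercase words.
        out = []
        for a, b in pairs:
            wa, wb = set(a.lower().split()), set(b.lower().split())
            out.append(len(wa & wb) / max(len(wa | wb), 1))
        return out

    candidates = [("t1", "note taking apps"), ("t2", "weekly music thread")]
    print(rerank("apps for note taking", candidates, toy_scorer, threshold=0.5))
```

Only the small candidate set goes through the slower pairwise scorer, which is the point of splitting the work into two stages.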
Here’s an idea that could actually make “reposts” a valuable part of the discussion.
Existing posts can have an “expand on this question” button.
If you are reading a discussion on Tildes and after reading the responses, you realize that there was a better question that needs to be asked, you can click a button to make an entirely new post that is linked to the original. The original post will have a spot to display the follow up posts and the follow up posts will link to the parent discussion.
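A minimal sketch of the data the "expand on this question" button would need, assuming each follow-up is an ordinary top-level post that stores its parent's ID while the parent keeps a list of its follow-ups. The class and method names are hypothetical; walking the parent links back to the original question is one way the FAQ-style collection could be assembled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Post:
    post_id: str
    title: str
    parent_id: Optional[str] = None                         # discussion this post expands on
    followup_ids: list[str] = field(default_factory=list)  # posts that expand on this one


class PostStore:
    def __init__(self) -> None:
        self.posts: dict[str, Post] = {}

    def add(self, post: Post) -> Post:
        if post.post_id in self.posts:
            raise ValueError(f"duplicate post id: {post.post_id}")
        self.posts[post.post_id] = post
        return post

    def expand_on(self, parent_id: str, post_id: str, title: str) -> Post:
        """Create a new top-level post linked to an existing one, in both directions."""
        parent = self.posts[parent_id]  # KeyError if the original post doesn't exist
        child = self.add(Post(post_id, title, parent_id=parent_id))
        parent.followup_ids.append(post_id)
        return child

    def thread_root(self, post_id: str) -> Post:
        """Follow parent links back to the original question."""
        post = self.posts[post_id]
        while post.parent_id is not None:
            post = self.posts[post.parent_id]
        return post
```

Storing the link in both directions keeps rendering cheap: the original post lists its follow-ups without a search, and each follow-up points back to its parent directly. Rejecting duplicate IDs also means a chain of follow-ups can never loop back on itself.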
This is also different than continuing a discussion in the comments because it will have the same visibility as a new post instead of a new comment lost in an old post.
This system will encourage users to not ask the exact same question, but instead to read the original question and reopen the discussion if they think it is appropriate.
This is also different than the tag system because it will group around a central discussion instead of the more general topics that tags combine.
I could see this kind of system creating an FAQ style collection of posts for many topics that will make it easier for users to explore different niches in depth.