A script that posts topics to Tildes, designed to be run on a schedule.
So yeah, this took a bit longer to get out than it should have, but that's because my summer classes are eating up all my free time.
Tildes Automated Posting Script, or TAPS, is a Python script that, when properly configured, will post a topic to Tildes under the account credentials, to the group, with the title, comment, link, and tags that you set. I created it because every Monday and Friday around 11:00 AM I post a topic to ~talk, but that can be a problem for someone who is forgetful like me, so I wrote a program that posts the topic for me, and now I can just run it on a schedule with something like cron.
The documentation should explain enough for you to get started with it, but I should have time tonight to answer questions and discuss feedback or suggestions.
Some features I might add in the future:
- [Done] An argument that posts all the topics defined in config.py instead of having to name them individually
- Check that topics defined in config.py have all the necessary values and fail if they don't
- Check that the username and password variables are set, fail if they aren't
- Check that the link, title, or tags of a topic will be accepted by Tildes and fail if not
- Add an "interactive mode" where the program just prompts the user to answer a couple of questions then posts a topic using the provided answers
- Add the ability to post comments to topics (maybe)
- Add a config option that waits a certain amount of time between posting topics to avoid the rate limit
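A rough sketch of how the validation and rate-limit items in this list could look. The layout of config.py is not shown in this thread, so the topic keys used here (`group`, `title`, `tags`, `link`, `comment`) and every function name are hypothetical, not what TAPS actually uses.

```python
import time

# Hypothetical required keys; the real config.py layout may differ.
REQUIRED_KEYS = ("group", "title", "tags")


def validate_topic(name, topic):
    """Return a list of problems found in one topic definition."""
    problems = []
    for key in REQUIRED_KEYS:
        if not topic.get(key):
            problems.append(f"{name}: missing '{key}'")
    # A topic needs something to post: a link, a text body, or both.
    if not topic.get("link") and not topic.get("comment"):
        problems.append(f"{name}: needs a 'link' or a 'comment'")
    return problems


def validate_config(username, password, topics):
    """Check the credentials and every topic; an empty list means all is well."""
    problems = []
    if not username:
        problems.append("username is not set")
    if not password:
        problems.append("password is not set")
    for name, topic in topics.items():
        problems.extend(validate_topic(name, topic))
    return problems


def post_all(topics, post_func, delay_seconds=0, sleep=time.sleep):
    """Post every topic, pausing between posts (but not after the last one)."""
    names = list(topics)
    for index, name in enumerate(names):
        post_func(name, topics[name])
        if delay_seconds and index < len(names) - 1:
            sleep(delay_seconds)
```

The script could call `validate_config` before opening a browser at all and exit with the list of problems if it is not empty. Passing `sleep` in as a parameter just makes the delay easy to test without actually waiting.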
Some quick thoughts about this approach to automating interactions with Tildes:
- I'd much rather drive a text-based browser, but I haven't found anything (yet) with Python bindings
- I'd much rather just send HTTP requests, but I don't know how to do that (yet)
- I'd much rather Tildes just have an API, but I don't know how to build one (yet)
Neat, nice work!
Doing this with Selenium and a (headless or not) browser is really excessive though. You should be able to do it with Requests pretty easily. I'm not sure if there's some extra complexity that's not coming to mind, but I think it should just involve:
- GET https://tildes.net/login and get the CSRF token value out of the HTML (it's in both a <meta> tag and a hidden input in the form).
- POST to https://tildes.net/login with the login info and that CSRF token (as
- POST to https://tildes.net/~whatever_group/topics with that cookie data, the same CSRF token, and all the data you want to use for the new topic (title, link/markdown, tags).
That should be it, I think.
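For anyone following along, here is a minimal sketch of those three steps with Requests. The form field names (`csrf_token`, `username`, `password`, `title`, `markdown`, `link`, `tags`) and the comma-separated tag format are assumptions, so check them against the real forms before relying on this. Whether the server wants extra headers is not covered here.

```python
from html.parser import HTMLParser

BASE_URL = "https://tildes.net"


class CsrfTokenParser(HTMLParser):
    """Collect the value of the hidden CSRF input, if the page has one."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # "csrf_token" is an assumed field name; check the login form's HTML.
        if tag == "input" and attrs.get("name") == "csrf_token":
            self.token = attrs.get("value")


def extract_csrf_token(html):
    """Return the CSRF token found in the page, or raise ValueError."""
    parser = CsrfTokenParser()
    parser.feed(html)
    if parser.token is None:
        raise ValueError("no CSRF token found in page")
    return parser.token


def post_topic(username, password, group, title, tags, markdown="", link=""):
    """Log in and post one topic; `group` is given without the leading ~."""
    import requests  # third-party (pip install requests); only needed here

    # A Session keeps the login cookie between requests.
    with requests.Session() as session:
        # Step 1: fetch the login page and pull out the CSRF token.
        login_page = session.get(f"{BASE_URL}/login")
        login_page.raise_for_status()
        token = extract_csrf_token(login_page.text)

        # Step 2: log in with the credentials and that token.
        response = session.post(
            f"{BASE_URL}/login",
            data={"csrf_token": token, "username": username, "password": password},
        )
        response.raise_for_status()

        # Step 3: post the topic with the same session and token.
        response = session.post(
            f"{BASE_URL}/~{group}/topics",
            data={
                "csrf_token": token,
                "title": title,
                "markdown": markdown,
                "link": link,
                "tags": ",".join(tags),  # assumed format
            },
        )
        response.raise_for_status()
        return response
```

The token parsing is kept in its own function so it can be tried on a saved copy of the login page without touching the live site.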
Thanks for the links! Requests is where I'm going to look next.
In my head, I liked the idea of driving a browser and I think it's alright for a first attempt (Selenium was the first thing that was recommended to me), but it takes work to configure it correctly and some things are very finicky (like Chrome headless). Plus, I thought dealing with HTTP requests was going to be very complicated. However, judging from those links, it doesn't look as bad as I thought it would be. It's also something I'm going to need to learn anyways.
I know better than to give an estimate of when I might start working on a different approach to this, but in two weeks I have a five-day respite from class that should give me a lot of free time.
If you're new to it, HTTP requests can seem pretty complicated. However, there are a ton of libraries available that simplify the standard kinds of requests, handling POST data, cookies, etc. Every language has several, and at least one favorite. Axios is my preferred JS one.
That said, working through the details of how an HTTP request is assembled, sent, and resolved is a good learning exercise. The protocol is pretty straightforward, especially for basic things like "go get this html".
Using requests, I had to take some extra steps to post something (though it was pretty easy to figure out going from your description):
CC @hungariantoast :D
Ah, right. The Referer header would be necessary too, because I have an origin check to help prevent CSRF: https://github.com/OWASP/CheatSheetSeries/blob/master/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.md#user-content-verifying-origin-with-standard-headers
I may want to drop that check though, since it seems like it causes issues for people that have disabled sending referrer info: https://gitlab.com/tildes/tildes/issues/128
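On the client side, a script can send that header itself. A small sketch, assuming the check only needs the Referer to point at a page on the same site. The helper name is made up for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://tildes.net"


def headers_with_referer(path, base_url=BASE_URL):
    """Build request headers whose Referer points at a page on the same site."""
    return {"Referer": urljoin(base_url, path)}


# Usage with Requests (not executed here):
#   session.post(url, data=form, headers=headers_with_referer("/login"))
```

Requests accepts a `headers` dictionary on each call, so the result can be passed straight through.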
What are the most significant risks of a CSRF attack for Tildes? Most likely stolen account credentials, possibly posting/editing in spam of some kind, or just general trolling (e.g. deleting all of the user's posts). Stealing account credentials would just be simple phishing and wouldn't require a CSRF attack (unless the intention is to act as a proxy and remain undetected), so we can probably strike that out. The big question, then, is how risky do we consider the insertion of spam or automation of trolling behaviors like destroying post history? Since this sort of attack could theoretically be triggered by simply visiting a malicious website while logged in to Tildes, I would say it's a fairly significant risk. Probably not likely, but certainly not something we would want to disregard.
The good news is that, to my knowledge, the Referer header is considered "forbidden" and thus not possible to modify programmatically in the browser (at least not in any browser using modern security standards). With that in mind, I have a recommendation: if possible, whitelist both the case where the header is provided and matches the expected value, and the case where the header is not provided or is empty.
This would provide maximum flexibility while limiting the risks to phishing (already a possibility), unofficial app installs (also already a possibility), and opting out of browser security features, all of which are squarely the fault of the user and not of lax security on the part of Tildes.
I'm not sure how feasible/workable this is with Tildes' architecture as I haven't had the time to really dig into it, but it may be an option worth exploring.
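To make the suggestion concrete, here is a sketch of that kind of check. It is not taken from the Tildes codebase. The function name and the hard-coded origin are assumptions for illustration only.

```python
from urllib.parse import urlsplit

# Assumed expected origin: scheme and host of the site.
ALLOWED_ORIGIN = ("https", "tildes.net")


def referer_is_acceptable(referer):
    """Accept a missing or empty Referer, or one whose scheme and host match."""
    if not referer:
        return True
    parts = urlsplit(referer)
    # Compare the parsed host, not a string prefix, so that a host like
    # tildes.net.evil.example does not slip through.
    return (parts.scheme, parts.hostname) == ALLOWED_ORIGIN
```

A request with a Referer from any other origin is rejected. A request with no Referer at all is let through, which is the trade-off described above.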
As a separate question, I know you were testing on the live site in ~test because you make suspicious spikes on my graphs like this (okay and also you told me you were doing it), but have you tried setting up a dev version locally with Vagrant?
It should work exactly the same for testing scripts like this except that you point them to your local dev instance instead of https://tildes.net, and that way you could probably be a little less worried about doing something harmful to the real site.
Edit: actually, it might need a little more than that because some libraries might not be happy about the self-signed SSL certificate that gets used on the development version, but it's usually just an argument or something to say that you don't care about the certificate being valid.
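With Requests, that argument is `verify`. A minimal sketch that turns certificate checking off only for a local dev instance. The dev address used in the comment (`https://localhost:4443`) and the helper name are assumptions.

```python
from urllib.parse import urlsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def request_options(url):
    """Return keyword arguments to pass to Requests calls for this URL."""
    if urlsplit(url).hostname in LOCAL_HOSTS:
        # Self-signed certificate on the dev box: skip verification there only.
        return {"verify": False}
    return {"verify": True}


# Usage with Requests (not executed here); the dev URL is an assumption:
#   url = "https://localhost:4443/login"
#   requests.get(url, **request_options(url))
```

Keeping verification on for every other host means the same script never silently skips the check against the real site.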
I do have an environment set up on my laptop, but I don't have one set up on my desktop, which is what I wrote most of this on, and where I did all the testing for the Windows side of things. In the future I'll try to keep things contained to a dev environment though.
This strikes me as something that really ought to be built right in to Tildes someday, along with a simple calendar to help manage it. Good future project when it gets to the point where lots of groups are busy enough to have their own teams moderating and mod tools really get started. Nice work. :D
I totally forgot to add a gitlab issue when @Lionirdeadman first submitted that though.... but I have done so now:
Where is the user's password stored, and who has access to it?
Just to repeat what others have said, the user would enter their username and password into this config.py file after they download the code onto their own computer. So their credentials live in there, but no one else will ever see them unless they do something catastrophically wrong.
Looks like the password and username live in the config.py file, based on the GitLab repo. As long as you kept your cloned copy of the repository local and didn't push it back to GitLab, the plaintext password would only be accessible to you on your local machine (no one could just grab it off of GitLab or anything).
config.py should ideally be added to the repo's .gitignore file so that nobody risks making this mistake.