Coding Noob Needs Help/Guidance on Small Project
Hi,
There's a certain site which hosts media files and has a player that depends on a lot of third-party resources to play, while browsers have native support for those file types. Those 3rd-party resources are often blocked by ad blockers and I have no desire to white-list them. I would like to extract the direct link to the media file and make it playable on my custom web page.
The link to the media file is present in the page source of each page, always on the same line. It's not anchored in HTML but present in the JavaScript for the player, like so:
$(document).ready(function(){
  $("#jquery_jplayer_1").jPlayer({
    ready: function () {
      $(this).jPlayer("setMedia", {
        [ext]: "https://[domain]/[filename.ext]"
      });
    },
In this example it's on line #5. [ext] = the file extension.
I want to build the following:
- A web page with a form with a single input field meant to receive links from that specific file host
- [Something] that extracts the file link from the source of the host's page
- Present the linked file as playable in an embedded native player
So far I've managed to create a form with an input box and a submit button, but it doesn't do anything yet. What is the best way to build the actual functionality? I know HTML/CSS. I have some rudimentary understanding of JavaScript/jQuery and Python3, so those would be my preferred tools.
For those worried about piracy: The files in question are not copyrighted and I'm not looking to make copies. I just want to make them playable. This is for personal use.
Thank you for reading this far. Any and all advice is welcome!
There are a lot of folks on this site that know a lot more than I do, but you're basically combining several elements here:
You have #1. Below is a link that explains how to play the video using HTML5, so that's #3. You'll need to work with others or on your own to come up with the code for the submit button that actually goes and gets the URL for the media. That's as much time as I have today (but maybe I can come back to this...).
W3 does a good job of explaining how to play the video using HTML5:
https://www.w3schools.com/html/html5_video.asp
NOTE: To everyone, please don't rip me for referring someone to w3. They're much improved from where they were 10 years ago.
EDIT: The examples on that page assume you want to play mp4 video files. If you want to do something else, please provide more info :)
Thank you for your reply. It's the bit between step 2 and 3 as you listed them that I most need help with. If I were to build it in Python3, I've read about how I could use BeautifulSoup or html.parser and re (regular expressions) to extract the file URL. But how to get the input from the form into that Python script and get the output of that script back on the HTML page remains out of my reach :)
There's audio support too! https://www.w3schools.com/html/html5_audio.asp
The easiest way would probably be to make it a Flask web app. Create a Flask route that calls your script. Provide the URL as a query or POST parameter.
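To make that a bit more concrete, here is a rough sketch of such a route, not a finished implementation. It assumes Flask is installed. The names `ALLOWED_HOST`, `fetch_page`, `find_media_url`, the `/extract` path and the `link` query parameter are all made up for illustration, and the host name is a placeholder for the real file host.

```python
import re
import urllib.request
from urllib.parse import urlparse

from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumption: the only host whose pages this app is willing to fetch.
ALLOWED_HOST = "media.example.com"


def fetch_page(link):
    """Download the host page's source (stands in for 'your script')."""
    with urllib.request.urlopen(link, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def find_media_url(page_source):
    """Return the first quoted https URL that follows a `key:`, or None."""
    match = re.search(r'\w+\s*:\s*"(https://[^"\s]+)"', page_source)
    return match.group(1) if match else None


@app.route("/extract")
def extract():
    # The link arrives as a query parameter: /extract?link=https://...
    link = request.args.get("link", "")
    parsed = urlparse(link)
    if parsed.scheme != "https" or parsed.hostname != ALLOWED_HOST:
        return jsonify(error="link must be an https URL on the expected host"), 400
    media_url = find_media_url(fetch_page(link))
    if media_url is None:
        return jsonify(error="no media URL found"), 404
    return jsonify(media_url=media_url)
```

Checking the host before fetching keeps the route from being used to fetch arbitrary pages. The JSON response could then be picked up by the jQuery form with an Ajax call.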
The most straightforward way to get at the resource URL is probably to use a regex. People will, wisely, tell you that you shouldn't try to parse HTML with a regex, but in this case, the data you're looking for (the URL) has a pretty specific form, and any higher level approach is probably going to be more trouble than it's worth.
A full URL spec-compliant regex is also going to be a rabbit hole, but you're (presumably) only interested in a fairly restricted set of URLs, all from the same host (or a small set) and having the same extension.
Let me know if you need any help understanding how it works. There are always a few quirks between implementations in different languages, but they're pretty fun to learn and a great tool to have in your toolbox.
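For anyone following along in Python, a restricted pattern of this kind might look roughly like the sketch below. The extension list, the `MEDIA_RE` and `extract_media_url` names, and the sample host are assumptions for illustration; the real page would dictate the actual values.

```python
import re
from typing import Optional

# Assumption: the media key is one of these extensions, followed by a
# double-quoted https URL, as in the jPlayer "setMedia" snippet above.
MEDIA_RE = re.compile(r'\b(?P<ext>m4a|mp3|oga|wav)\s*:\s*"(?P<url>https://[^"\s]+)"')


def extract_media_url(page_source: str) -> Optional[str]:
    """Return the first media URL found in the page source, or None."""
    match = MEDIA_RE.search(page_source)
    return match.group("url") if match else None


# Made-up page fragment mirroring the structure described in the question.
sample = '''$(document).ready(function(){
$("#jquery_jplayer_1").jPlayer({
ready: function () {
$(this).jPlayer("setMedia", {
m4a: "https://media.example.com/clip.m4a"
});
},'''

print(extract_media_url(sample))
```

Searching the whole source rather than relying on a fixed line number means the pattern should keep working if the host adds or removes lines above the player script.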
I managed to figure it out!
Now I need to find a way to get the input from the HTML5/jQuery form into link and get the output of result onto that page to play it with <audio controls><source src="[result output]" type="audio/mp4"></audio>
Thank you for sending me in the right direction.
Glad I could help! I love regular expressions (probably too much for my own good), so I'm always happy to help someone get started.
You may already have some recommendations for simple Python web frameworks, and I'm not super familiar with the territory, but if it were me I'd probably start with something minimal like Flask.
You'll want to set up a route to display the main page with the form, and a second one that the browser will POST to when the user submits the form.
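A minimal sketch of that two-route layout, assuming Flask is installed. The `/play` path, the `PAGE` template and the `resolve_media_url` helper are hypothetical names. The helper here only echoes its input so the example runs offline; in a real app it would fetch the host page and extract the media URL.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# One template serves both routes; the player only appears once a URL is known.
# render_template_string autoescapes values, so media_url is safe in the attribute.
PAGE = """<!doctype html>
<title>Player</title>
<form method="post" action="/play">
  <input type="url" name="link" placeholder="Link to the host page" required>
  <button type="submit">Play</button>
</form>
{% if media_url %}
<audio controls>
  <source src="{{ media_url }}" type="audio/mp4">
</audio>
{% endif %}
"""


def resolve_media_url(link):
    """Hypothetical placeholder: a real version would fetch `link` and
    extract the media URL. Here it just returns its input unchanged."""
    return link


@app.route("/", methods=["GET"])
def index():
    # First route: show the empty form.
    return render_template_string(PAGE, media_url=None)


@app.route("/play", methods=["POST"])
def play():
    # Second route: the browser POSTs the form here.
    link = request.form.get("link", "").strip()
    media_url = resolve_media_url(link) if link else None
    return render_template_string(PAGE, media_url=media_url)
```

With this shape no JavaScript is needed on the page at all: the form submits normally and the server sends back the same page with the player filled in.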
Thank you for your advice. I'll dig into it before bothering you with any questions :)
This is just basic HTML for the form, and something to catch it on the backend. If you want to use Python, for the simplest approach you might want to look at a Flask application which you can then host on PythonAnywhere or similar.
I assume that there's something like BeautifulSoup for JS, which would let you parse the code to extract what you want. I doubt BS can extract from JS itself, but that's the kind of thing you want. From a quick search there seem to be a few options, but I don't want to suggest any because I have no experience with them.
Simple enough to embed something like that into a template within the Flask site. However, I don't have nearly enough info to provide a suggestion for the player itself, so I'll leave that up to you.
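On the parsing point, one middle road that needs nothing beyond the standard library is to let html.parser pull out the contents of the script tags and then run a small regex over only the script that mentions "setMedia". This is a sketch under assumptions: the `ScriptCollector` and `media_url_from_html` names and the sample page are invented, and it treats the JavaScript as plain text rather than parsing it.

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def media_url_from_html(html_text):
    """Return the first https URL after "setMedia" in any inline script."""
    parser = ScriptCollector()
    parser.feed(html_text)
    parser.close()
    for script in parser.scripts:
        position = script.find("setMedia")
        if position == -1:
            continue
        match = re.search(r'"(https://[^"\s]+)"', script[position:])
        if match:
            return match.group(1)
    return None


# Made-up page for illustration only.
sample = """<html><head>
<script src="https://cdn.example.com/jquery.js"></script>
<script>
$(this).jPlayer("setMedia", {
m4a: "https://media.example.com/clip.m4a"
});
</script>
</head><body></body></html>"""

print(media_url_from_html(sample))
```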
This is the road I'm going to take. Thank you for your advice!
No worries! I use python/Flask a bit myself (though I'm by no means an expert), so if you run into any problems let me know and I might be able to point you in the right direction.
I know you want help, and not judgement, but effectively you are asking for help to skim bandwidth while denying revenue. That's a dick move. The minimum you should do here is rehost it, and if you don't have the rights to do that, don't consume the video.
Wow, you're making a lot of assumptions!
If you're planning on judging and lecturing people, make sure you have the facts first. Start with asking questions.
Now, do you have anything constructive to add to help me solve my problem?
I do. I would ask them directly if they would be willing to expose an API to consume the media directly. I wouldn't try to parse or scrape the JS. It's fragile and flaky.
How about I'll continue forming my own judgements and opinions based on the information I have to hand, and you can continue begging for help.
ಠ_ಠ
https://docs.tildes.net/overall-goals#the-golden-rule
Sure. I'll consider myself advised.
I'd also advise the OP that the easiest way to demonstrate the accuracy of their claims would be to point to the site with free, hosted, and copyright-free videos that is absolutely fine with the type of scraping the OP wants to implement. I suggested what they were doing was bad, not that they were bad. The fact that got a huffy denial rather than the obvious objective refutation (a link to the site in question) leaves me confident that I am not wrong, but I am open to the fact that I might be.
The raison d'être for this project is protecting my privacy. Why would I publicly disclose something that I wish to keep private - so much so that I'm willing to learn how to code to do it?
You are essentially saying that because I wish to keep something private, I must be doing something wrong. Yes, that presses one of my buttons because I'm a privacy activist; one of the main reasons I'm here on Tildes is to support a project that takes a privacy-by-design approach. It's not a place where I expected to encounter a 'nothing to hide, nothing to fear' mentality. Perhaps I should've known better.
If you want to keep assuming bad faith and bad intentions that's fine. I'm not going to be bullied into disclosing anything more than what I already have.
I didn't say any of the things that you are imputing to me. I am not trying to bully you into doing anything. I merely laid out my standard. You don't have to meet it or agree with it. Other people on this site certainly do not agree with me. That's fine.