6 votes

Passing the Same Parameters to Multiple Functions

Tags: python, design

12 comments

  1. [5]
    arghdos
    (edited )
    Link
    Why not: def distances(sentence, window=3, scale=True): pass def co_occurence_matrix(sentences, **kwargs): # ... whatever distances(sentences, **kwargs) I tend to leave any argument that will not...

    Why not:

    def distances(sentence, window=3, scale=True):
      pass
    
    def co_occurence_matrix(sentences, **kwargs):
      # ... whatever
      distances(sentences, **kwargs)
    

    I tend to leave any argument that will not be used in a function as unnamed. This is often useful, e.g., if you had other distances overloads, typically leaving a note that all **kwargs are passed to the distances function in the documentation.

    1 vote
    1. [4]
      onyxleopard
      (edited )
      Link Parent
      I tend to dislike using **kwargs in this way for the following reasons: It does not adhere to the "explicit is better than implicit" principle. That is to say, using **kwargs in this way...

      I tend to dislike using **kwargs in this way for the following reasons:

      1. It does not adhere to the "explicit is better than implicit" principle. That is to say, using **kwargs in this way implicitly couples the implementations of distances and co_occurrence_matrix in a way that makes me uncomfortable.
      2. It is much more cumbersome to document and has the effect of forcing a reader to look at the implementation of co_occurrence_matrix, rather than the function declaration to understand how to call it. That is to say, **kwargs hides the self-documenting nature of explicit keyword arguments.

      But, I’m open to being convinced otherwise (and liberal use of **kwargs is something I see a lot in Python).

      1 vote
      1. [3]
        arghdos
        Link Parent
        Yes — agreed that the documentation part is suboptimal in these situations. You could change: def distances(sentence, window=3, scale=True): pass def co_occurence_matrix(sentences,dist_func): #...

        Yes — agreed that the documentation part is suboptimal in these situations. You could change:

        def distances(sentence, window=3, scale=True):
          pass
        
        def co_occurence_matrix(sentences,dist_func):
          # ... whatever
          dist_func(sentence)
        
        co_occurence_matrix(sentences, lambda sentence: distance(sentence, window=my_window, scale=do_scale)
        
        1 vote
        1. [2]
          onyxleopard
          (edited )
          Link Parent
          Parameterizing co_occurrence_matrix to take a distance function is a nice idea and definitely useful as there are many possible distance functions that may be of interest, and this is why I...

          Parameterizing co_occurrence_matrix to take a distance function is a nice idea and definitely useful as there are many possible distance functions that may be of interest, and this is why I wondered if a closure would be preferable here. I tend to avoid using anonymous functions, so I’d probably do it like this:

          def distances(sentence, window=3, scale=True):
            ...
          
          def co_occurrence_matrix(sentences, distance_f):
            ...
            for sentence in sentences:
              for (word1, word2), distance in distance_f(sentence).items():
            ...
          
          # define concrete, named distance function
          def scaled_trigram_distance(sentence):
            return distances(sentence, window=3, scale=True)
          
          # call with the concrete, named distance function
          index, matrix = co_occurrence_matrix(sentences, scaled_trigram_distance)
          

          Thanks for the suggestion!

          Edit: I wonder if every time I pass the same parameters to nested functions, it should be a clue that I really ought to be passing the inner function to the outer function instead?

          1 vote
          1. arghdos
            Link Parent
            Heh, yes the lambda was primarily cause I was typing on my phone :)

            Heh, yes the lambda was primarily cause I was typing on my phone :)

  2. [7]
    onyxleopard
    (edited )
    Link
    I linked to a gist with some code I recently wrote in order to answer a StackOverflow question about computing a "scaled" (i.e., distance-normalized) word co-occurrence matrix, which is a useful...

    I linked to a gist with some code I recently wrote in order to answer a StackOverflow question about computing a "scaled" (i.e., distance-normalized) word co-occurrence matrix, which is a useful computation in computational linguistics. As I was writing this bit of code, I noticed a design decision that I frequently encounter, and I was wondering if the way I landed here (which is a decision I’ve made many, many times in the past as well) is a good practice or not.

    Specifically, I wrote a distances function that, given a sentence can compute the distances between the words in the sentence, with the following parameters:

    def distances(sentence, window=3, scale=True):
        
    

    This is an intentional abstraction, though the implementation could probably be optimized further instead of iterating over all pair-wise combinations of tokens in a sentence and filtering by distance. But that’s not the focus of my question. My question is about the window and scale parameters.

    I use these same parameters in the ultimate co_occurrence_matrix function, which calls distances and passes the keyword parameters along:

    def co_occurrence_matrix(sentences, window=3, scale=True):
        
        for sentence in sentences:
            for (word1, word2), distance in distances(
                sentence,
                window=window, # this keyword parameter gets passed along
                scale=scale # this one too
            ).items():
        
    

    Instead of defining distances as a top-level function, Python would allow for defining distances in the scope of co_occurrence_matrix, like this:

    def co_occurrence_matrix(sentences, window=3, scale=True):
        def distances(sentence):
            distances = {}
            for pair in combinations(enumerate(sentence), 2):
                (i, word1), (j, word2) = pair
                word1, word2 = sorted((word1, word2))
                if (word1, word2) not in distances:
                    distances[(word1, word2)] = 0.0
                distance = abs(i - j)
                if distance <= window:
                    distances[(word1, word2)] += (1.0 / distance) if scale else 1.0
            return distances
        vocab = vocabulary(sentences)
        shape = len(vocab), len(vocab)
        matrix = np.zeros(shape)
        index = {word: i for (i, word) in vocab.items()}
        for sentence in sentences:
            for (word1, word2), distance in distances(sentence).items(): # no repeated parameters
                cell = index[word1], index[word2]
                matrix[cell] += distance
                # maintain symmetry
                if word1 != word2:
                    inverse_cell = index[word2], index[word1]
                    matrix[inverse_cell] += distance
        return index, matrix
    

    This is possibly cleaner in the sense that the window and scale parameters are not repeated (adhering to the DRY principle) but it also means the distances function is not fully abstracted, and if one wanted to reuse distances for other purposes it is not feasible.

    Is this alternative better than the way I’ve implemented it in the gist? Is there another option, such as a closure, that would be preferable to both options I’ve laid out? I’ve encountered this situation of repeating parameters to many functions quite often, and sometimes I wonder if this is a bad habit.

    Thanks for your input!

    1. [2]
      Nodja
      Link Parent
      The only change I would make would be to change def distances(sentence) to def distances(sentence, window=window, scale=scale), this makes it explicit what data the function is using. When reading...

      The only change I would make would be to change def distances(sentence) to def distances(sentence, window=window, scale=scale), this makes it explicit what data the function is using. When reading functions I like to know what kind of data they're working on, and using "pseudo-globals" like that can be surprising, even to yourself down the line.

      2 votes
      1. onyxleopard
        Link Parent
        I was explicitly contrasting passing only sentence to distances with the linked gist where I implemented distances in its own scope outside of the other function. You’re suggesting that it would...

        I was explicitly contrasting passing only sentence to distances with the linked gist where I implemented distances in its own scope outside of the other function. You’re suggesting that it would be preferable to keep distances within the scope of co_occurrence_matrix, but keep the explicit keyword arguments as well? This isn’t an option I had considered, and I don’t think I see the benefit of this (whereas, keeping distances in the scope where window and scale are available, they don’t need to explicitly passed). My unease with my implementation is the repetition of the window and scale arguments, and your suggestion does not alleviate my unease. 😆

        2 votes
    2. [2]
      skybrian
      Link Parent
      Defining an inner function seems like a clean way to do it, assuming you never want to call distances from anywhere else. When you read the definition of distances, the only other variables in...

      Defining an inner function seems like a clean way to do it, assuming you never want to call distances from anywhere else. When you read the definition of distances, the only other variables in scope are the outer function's parameters, so that's simple enough.

      It does mean that sentences is in scope when it doesn't need to be, but that doesn't seem like a big deal. If the outer function took many more parameters that the inner function doesn't use, moving the function outside might be better.

      You would also run into a problem if you wanted to test the distances function separately, in which case it would need to be moved outside.

      (By the way, you misspelled occurrence in the example.)

      1 vote
      1. onyxleopard
        Link Parent
        Yes, thanks, I fixed that in the gist.

        (By the way, you misspelled occurrence in the example.)

        Yes, thanks, I fixed that in the gist.

        1 vote
    3. [2]
      smores
      Link Parent
      I don’t think this is necessarily the right thing to do in this particular case, but I think often in scenarios where the same data is being passed around to multiple functions, object oriented...

      I don’t think this is necessarily the right thing to do in this particular case, but I think often in scenarios where the same data is being passed around to multiple functions, object oriented programming can sometimes clean up that data management. For example, creating a class that is constructed with window and scale arguments and has these two functions as class methods.

      Again, I’m this case where the parameters are only used in two functions, a class might be overkill, but it might start to make sense if the problem was a bit larger!

      1 vote
      1. onyxleopard
        Link Parent
        Yes, I see the case for creating an object to reuse the scale and window values—object-oriented programming is just not my style. I am more interested in other options such as what @arghdos...

        Yes, I see the case for creating an object to reuse the scale and window values—object-oriented programming is just not my style.

        I am more interested in other options such as what @arghdos suggested here, but thanks for pointing this out.

        1 vote