Why not: def distances(sentence, window=3, scale=True): pass def co_occurence_matrix(sentences, **kwargs): # ... whatever distances(sentences, **kwargs) I tend to leave any argument that will not...
I tend to leave any argument that will not be used in a function as unnamed. This is often useful, e.g., if you had other distances overloads, typically leaving a note that all **kwargs are passed to the distances function in the documentation.
I tend to dislike using **kwargs in this way for the following reasons: It does not adhere to the "explicit is better than implicit" principle. That is to say, using **kwargs in this way...
I tend to dislike using **kwargs in this way for the following reasons:
It does not adhere to the "explicit is better than implicit" principle. That is to say, using **kwargs in this way implicitly couples the implementations of distances and co_occurrence_matrix in a way that makes me uncomfortable.
It is much more cumbersome to document and has the effect of forcing a reader to look at the implementation of co_occurrence_matrix, rather than the function declaration to understand how to call it. That is to say, **kwargs hides the self-documenting nature of explicit keyword arguments.
But, I’m open to being convinced otherwise (and liberal use of **kwargs is something I see a lot in Python).
Yes — agreed that the documentation part is suboptimal in these situations. You could change: def distances(sentence, window=3, scale=True): pass def co_occurence_matrix(sentences,dist_func): #...
Yes — agreed that the documentation part is suboptimal in these situations. You could change:
Parameterizing co_occurrence_matrix to take a distance function is a nice idea and definitely useful as there are many possible distance functions that may be of interest, and this is why I...
Parameterizing co_occurrence_matrix to take a distance function is a nice idea and definitely useful as there are many possible distance functions that may be of interest, and this is why I wondered if a closure would be preferable here. I tend to avoid using anonymous functions, so I’d probably do it like this:
defdistances(sentence,window=3,scale=True):...defco_occurrence_matrix(sentences,distance_f):...forsentenceinsentences:for(word1,word2),distanceindistance_f(sentence).items():...# define concrete, named distance functiondefscaled_trigram_distance(sentence):returndistances(sentence,window=3,scale=True)# call with the concrete, named distance functionindex,matrix=co_occurrence_matrix(sentences,scaled_trigram_distance)
Thanks for the suggestion!
Edit: I wonder if every time I pass the same parameters to nested functions, it should be a clue that I really ought to be passing the inner function to the outer function instead?
I linked to a gist with some code I recently wrote in order to answer a StackOverflow question about computing a "scaled" (i.e., distance-normalized) word co-occurrence matrix, which is a useful...
I linked to a gist with some code I recently wrote in order to answer a StackOverflow question about computing a "scaled" (i.e., distance-normalized) word co-occurrence matrix, which is a useful computation in computational linguistics. As I was writing this bit of code, I noticed a design decision that I frequently encounter, and I was wondering if the way I landed here (which is a decision I’ve made many, many times in the past as well) is a good practice or not.
Specifically, I wrote a distances function that, given a sentence can compute the distances between the words in the sentence, with the following parameters:
defdistances(sentence,window=3,scale=True):…
This is an intentional abstraction, though the implementation could probably be optimized further instead of iterating over all pair-wise combinations of tokens in a sentence and filtering by distance. But that’s not the focus of my question. My question is about the window and scale parameters.
I use these same parameters in the ultimate co_occurrence_matrix function, which calls distances and passes the keyword parameters along:
defco_occurrence_matrix(sentences,window=3,scale=True):…forsentenceinsentences:for(word1,word2),distanceindistances(sentence,window=window,# this keyword parameter gets passed alongscale=scale# this one too).items():…
Instead of defining distances as a top-level function, Python would allow for defining distances in the scope of co_occurrence_matrix, like this:
defco_occurrence_matrix(sentences,window=3,scale=True):defdistances(sentence):distances={}forpairincombinations(enumerate(sentence),2):(i,word1),(j,word2)=pairword1,word2=sorted((word1,word2))if(word1,word2)notindistances:distances[(word1,word2)]=0.0distance=abs(i-j)ifdistance<=window:distances[(word1,word2)]+=(1.0/distance)ifscaleelse1.0returndistancesvocab=vocabulary(sentences)shape=len(vocab),len(vocab)matrix=np.zeros(shape)index={word:ifor(i,word)invocab.items()}forsentenceinsentences:for(word1,word2),distanceindistances(sentence).items():# no repeated parameterscell=index[word1],index[word2]matrix[cell]+=distance# maintain symmetryifword1!=word2:inverse_cell=index[word2],index[word1]matrix[inverse_cell]+=distancereturnindex,matrix
This is possibly cleaner in the sense that the window and scale parameters are not repeated (adhering to the DRY principle) but it also means the distances function is not fully abstracted, and if one wanted to reuse distances for other purposes it is not feasible.
Is this alternative better than the way I’ve implemented it in the gist? Is there another option, such as a closure, that would be preferable to both options I’ve laid out? I’ve encountered this situation of repeating parameters to many functions quite often, and sometimes I wonder if this is a bad habit.
The only change I would make would be to change def distances(sentence) to def distances(sentence, window=window, scale=scale), this makes it explicit what data the function is using. When reading...
The only change I would make would be to change def distances(sentence) to def distances(sentence, window=window, scale=scale), this makes it explicit what data the function is using. When reading functions I like to know what kind of data they're working on, and using "pseudo-globals" like that can be surprising, even to yourself down the line.
I was explicitly contrasting passing only sentence to distances with the linked gist where I implemented distances in its own scope outside of the other function. You’re suggesting that it would...
I was explicitly contrasting passing only sentence to distances with the linked gist where I implemented distances in its own scope outside of the other function. You’re suggesting that it would be preferable to keep distances within the scope of co_occurrence_matrix, but keep the explicit keyword arguments as well? This isn’t an option I had considered, and I don’t think I see the benefit of this (whereas, keeping distances in the scope where window and scale are available, they don’t need to explicitly passed). My unease with my implementation is the repetition of the window and scale arguments, and your suggestion does not alleviate my unease. 😆
Defining an inner function seems like a clean way to do it, assuming you never want to call distances from anywhere else. When you read the definition of distances, the only other variables in...
Defining an inner function seems like a clean way to do it, assuming you never want to call distances from anywhere else. When you read the definition of distances, the only other variables in scope are the outer function's parameters, so that's simple enough.
It does mean that sentences is in scope when it doesn't need to be, but that doesn't seem like a big deal. If the outer function took many more parameters that the inner function doesn't use, moving the function outside might be better.
You would also run into a problem if you wanted to test the distances function separately, in which case it would need to be moved outside.
(By the way, you misspelled occurrence in the example.)
I don’t think this is necessarily the right thing to do in this particular case, but I think often in scenarios where the same data is being passed around to multiple functions, object oriented...
I don’t think this is necessarily the right thing to do in this particular case, but I think often in scenarios where the same data is being passed around to multiple functions, object oriented programming can sometimes clean up that data management. For example, creating a class that is constructed with window and scale arguments and has these two functions as class methods.
Again, I’m this case where the parameters are only used in two functions, a class might be overkill, but it might start to make sense if the problem was a bit larger!
Yes, I see the case for creating an object to reuse the scale and window values—object-oriented programming is just not my style. I am more interested in other options such as what @arghdos...
Yes, I see the case for creating an object to reuse the scale and window values—object-oriented programming is just not my style.
I am more interested in other options such as what @arghdos suggested here, but thanks for pointing this out.
Why not:
I tend to leave any argument that will not be used in a function as unnamed. This is often useful, e.g., if you had other
distances
overloads, typically leaving a note that all**kwargs
are passed to thedistances
function in the documentation.I tend to dislike using
**kwargs
in this way for the following reasons:**kwargs
in this way implicitly couples the implementations ofdistances
andco_occurrence_matrix
in a way that makes me uncomfortable.co_occurrence_matrix
, rather than the function declaration to understand how to call it. That is to say,**kwargs
hides the self-documenting nature of explicit keyword arguments.But, I’m open to being convinced otherwise (and liberal use of
**kwargs
is something I see a lot in Python).Yes — agreed that the documentation part is suboptimal in these situations. You could change:
Parameterizing
co_occurrence_matrix
to take a distance function is a nice idea and definitely useful as there are many possible distance functions that may be of interest, and this is why I wondered if a closure would be preferable here. I tend to avoid using anonymous functions, so I’d probably do it like this:Thanks for the suggestion!
Edit: I wonder if every time I pass the same parameters to nested functions, it should be a clue that I really ought to be passing the inner function to the outer function instead?
Heh, yes the lambda was primarily cause I was typing on my phone :)
I linked to a gist with some code I recently wrote in order to answer a StackOverflow question about computing a "scaled" (i.e., distance-normalized) word co-occurrence matrix, which is a useful computation in computational linguistics. As I was writing this bit of code, I noticed a design decision that I frequently encounter, and I was wondering if the way I landed here (which is a decision I’ve made many, many times in the past as well) is a good practice or not.
Specifically, I wrote a
distances
function that, given asentence
can compute the distances between the words in the sentence, with the following parameters:This is an intentional abstraction, though the implementation could probably be optimized further instead of iterating over all pair-wise combinations of tokens in a sentence and filtering by distance. But that’s not the focus of my question. My question is about the
window
andscale
parameters.I use these same parameters in the ultimate
co_occurrence_matrix
function, which callsdistances
and passes the keyword parameters along:Instead of defining
distances
as a top-level function, Python would allow for definingdistances
in the scope ofco_occurrence_matrix
, like this:This is possibly cleaner in the sense that the
window
andscale
parameters are not repeated (adhering to the DRY principle) but it also means thedistances
function is not fully abstracted, and if one wanted to reusedistances
for other purposes it is not feasible.Is this alternative better than the way I’ve implemented it in the gist? Is there another option, such as a closure, that would be preferable to both options I’ve laid out? I’ve encountered this situation of repeating parameters to many functions quite often, and sometimes I wonder if this is a bad habit.
Thanks for your input!
The only change I would make would be to change
def distances(sentence)
todef distances(sentence, window=window, scale=scale)
, this makes it explicit what data the function is using. When reading functions I like to know what kind of data they're working on, and using "pseudo-globals" like that can be surprising, even to yourself down the line.I was explicitly contrasting passing only
sentence
todistances
with the linked gist where I implementeddistances
in its own scope outside of the other function. You’re suggesting that it would be preferable to keepdistances
within the scope ofco_occurrence_matrix
, but keep the explicit keyword arguments as well? This isn’t an option I had considered, and I don’t think I see the benefit of this (whereas, keepingdistances
in the scope wherewindow
andscale
are available, they don’t need to explicitly passed). My unease with my implementation is the repetition of thewindow
andscale
arguments, and your suggestion does not alleviate my unease. 😆Defining an inner function seems like a clean way to do it, assuming you never want to call distances from anywhere else. When you read the definition of distances, the only other variables in scope are the outer function's parameters, so that's simple enough.
It does mean that sentences is in scope when it doesn't need to be, but that doesn't seem like a big deal. If the outer function took many more parameters that the inner function doesn't use, moving the function outside might be better.
You would also run into a problem if you wanted to test the distances function separately, in which case it would need to be moved outside.
(By the way, you misspelled occurrence in the example.)
Yes, thanks, I fixed that in the gist.
I don’t think this is necessarily the right thing to do in this particular case, but I think often in scenarios where the same data is being passed around to multiple functions, object oriented programming can sometimes clean up that data management. For example, creating a class that is constructed with window and scale arguments and has these two functions as class methods.
Again, I’m this case where the parameters are only used in two functions, a class might be overkill, but it might start to make sense if the problem was a bit larger!
Yes, I see the case for creating an object to reuse the scale and window values—object-oriented programming is just not my style.
I am more interested in other options such as what @arghdos suggested here, but thanks for pointing this out.