Week 10 - Interacting With Scikit-learn Maintainers

This week, we continued our contributions to scikit-learn. We were able to make pull requests for several issues, but not all of them have been replied. While waiting for the maintainers’ response, we have also found a long-term easy contribution and have planned to work on that.

Effective Interations with Maintainers

Speaking of interacting with maintainers, the pull request ENH RocCurveDisplay add option to plot chance level #25987 is definitely an important experience. Without having a channel like Discord, GitHub is the most important communication method. In fact, I had up to 63 conversations with the maintainers @glemaitre and @jeremiedbb in this PR.

This was about enhancing the ROCCurveDisplay class. Since the ROC curve and the PR curve are both within [0, 1] range, the author of the original issue mentioned forcing to scale the axes and making the border dotted. Another maintainer also mentioned adding a chance level line option to the curve. These were essentially the contents of my initial pull request. Then @glemaitre mentioned that those are too many different tasks for a single PR, so we decided only to focus on plotting the chance level line for specifically the ROC curve, during which he gave me a lot of precious suggestions.

First, he suggested that for backward compatibility, the chance level line option should have default value False. Also, he asked me to provide a chance_level_kw argument that allow users to customize the rendering of the chance level line instead of being forced to use the default style. He also helped me simplify the test cases, so that the tests can cover all the codes while being time-efficient.

After that, he also asked me to modify the examples, in which some are plotting the chance level lines manually. While adopting this new feature to the examples, I noticed that the chance level line could have been plotted multiple times, affecting the legend of the plot. Therefore, I implemented an O(n) check to avoid that, but @glemaitre later told me that specifying this limitation in the docstring is sufficient and we need not be so “smart”. He also used specific examples to demonstrate how users can use this feature correctly even in for loops without repetitively rendering the chance level line.

Many other things happend that benifit me a lot. I deepened my understanding of Test-Driven Development, improved my coding style, and most importantly, learned how to work as a team to develop the optimal solution. Since scikit-learn does not have any experimental release, this is especially important.

Some Other PRs

In the past week, I have also FIX’ed SequentialFeatureSelector throws IndexError when cv is a generator #25973. This is much simpler than the previously mentioned issue, but similarly, the maintainers suggested me in the issue the best solution for future maintenace and backward compatibility. Apart from the enhancement and the fix, I also fixed a docstring, which is essentially a one-line modification.

Finding “A Piece of Cake”

While many of the PRs of me and my teammates are waiting for review, we switched our searching strategy. Initially we sorted by “newest issue”, but most of the times there are not that many new issues ready. Therefore, we tried to sort by “most recently updated” and found a long-term issue Make automatic validation for all scikit-learn public functions #24862. It was about automating parameter validation via a newly-implemented decorator (not that new, actually half a year from now). The work is kind of easy and repetitive, but I believe such maintenance work are as important for an open source project. After discussion with the group, I made a list of functions that can be worked on, and we decided to start right away.

Summary

In this blog post, I have talked about my experience of interacting with the maintainers to work out best solutions for issues. I have also talked about the long-term issue that we are planning to work on. Finally, thank you very much for your meticulous reading and hope you enjoy it.

Written before or on April 2, 2023