June 10, 2009
My To-Do List
This is kind of a meta-post, a to-do list for the next week or so. What do I need to get a handle on before I can start looking through the Hadley Centre data? (Whenever that will be available; hopefully soon. The practical part of my brain is getting twitchy.) Each part below still needs links to papers and better summaries.
Code ownership: The summer students (Sarah and Ainsley) are trying to create a social network graph, or to suggest experts, based on code ownership. I should look more into the pitfalls that some of the poster presenters told me about, which are artifacts of the software development process. In particular, there are always configuration files and build scripts that would need special treatment.
Similarly, the research on bug assignment and expertise finding includes a few different ideas of what code ownership or expertise looks like. Metrics for ascertaining expertise include the number of changes, the number of lines changed, and a few others.
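To make the two metrics concrete for myself, here is a minimal sketch. It assumes the check-in history has already been flattened into (author, path, lines changed) tuples, which is a hypothetical format, and the list of file names to skip is only a placeholder for the "special treatment" that configuration files and build scripts need.

```python
from collections import defaultdict

# Hypothetical skip list: build scripts and config files distort ownership counts.
SKIP_NAMES = ("Makefile",)
SKIP_SUFFIXES = (".cfg", ".conf")

def is_source(path):
    """Return True for files that should count towards ownership."""
    name = path.rsplit("/", 1)[-1]
    return name not in SKIP_NAMES and not name.endswith(SKIP_SUFFIXES)

def ownership(changes):
    """Tally, per file, the number of changes and lines changed by each author.

    changes: iterable of (author, path, lines_changed) tuples (assumed format).
    Returns two nested dicts: commits[path][author] and lines[path][author].
    """
    commits = defaultdict(lambda: defaultdict(int))
    lines = defaultdict(lambda: defaultdict(int))
    for author, path, n in changes:
        if not is_source(path):
            continue
        commits[path][author] += 1
        lines[path][author] += n
    return commits, lines

def top_owner(per_file, path):
    """Author with the highest tally for a file (ties go to the first name alphabetically)."""
    authors = per_file[path]
    return max(sorted(authors), key=authors.get)
```

The two metrics can disagree: someone who makes many small fixes "owns" a file by change count, while someone who made one large rewrite owns it by lines changed. That disagreement is part of what I need to read up on.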
Tightly Coupled Code: This comes down to clustering. There are a few different approaches. At one end, you ignore the source code and cluster based on check-in time, check-in comments, etc. At the other, you cluster based entirely on a source code searching technique (LSI or other statistical methods; some dictionary-based methods may be less applicable to code written in FORTRAN). There are also methods based on static analysis, which means I want to evaluate the static analysis tools available for FORTRAN and their applicability.
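As a rough sketch of the "ignore the source code" end of that range: count how often two files show up in the same check-in, then group files that are linked by enough shared check-ins. The input format (one list of paths per check-in), the threshold, and the choice of simple connected components for the grouping are all my own assumptions here, not a method taken from any of the papers.

```python
from collections import Counter
from itertools import combinations

def cochange_counts(checkins):
    """Count, for each pair of files, how many check-ins touched both.

    checkins: iterable of lists of file paths, one list per check-in (assumed format).
    """
    counts = Counter()
    for files in checkins:
        for a, b in combinations(sorted(set(files)), 2):
            counts[(a, b)] += 1
    return counts

def cochange_clusters(counts, min_together=2):
    """Group files connected by pairs that changed together at least min_together times.

    Uses a small union-find; files with no strong link to any other file are left out.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in counts.items():
        if n >= min_together:
            parent[find(a)] = find(b)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())
```

Connected components are the bluntest possible grouping: one busy file can chain unrelated clusters together, so a real attempt would probably need a proper clustering algorithm on the same counts.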
Of course, there are also the configuration settings and their relationship to changes, as well as data files. Would it help to link each data file to the configuration settings that were checked in when the data file was first checked in? Would mining run records help?
Comparing the clusters formed by different methods would be both a way to validate them and a way to potentially identify smaller clusters of files.
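One standard way to put a number on that comparison is the Rand index: over all pairs of files, the fraction of pairs that both clusterings treat the same way (together in both, or apart in both). This is just one candidate measure I am picking for illustration, and the input format (a dict from file path to cluster label for each method) is an assumption.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of file pairs on which two clusterings agree.

    labels_a, labels_b: dicts mapping file path -> cluster label (assumed format).
    Only files present in both clusterings are compared.
    """
    files = sorted(set(labels_a) & set(labels_b))
    agree = total = 0
    for x, y in combinations(files, 2):
        same_a = labels_a[x] == labels_a[y]
        same_b = labels_b[x] == labels_b[y]
        agree += same_a == same_b
        total += 1
    return agree / total if total else 1.0

def shared_clusters(labels_a, labels_b):
    """Groups of files that land together under both clusterings."""
    groups = {}
    for f in set(labels_a) & set(labels_b):
        groups.setdefault((labels_a[f], labels_b[f]), []).append(f)
    return sorted(sorted(g) for g in groups.values())
```

The second function is the "smaller clusters" idea: files that end up together no matter which method is used form the intersections of the two sets of clusters.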
Personally, I like the idea of using a smaller unit of analysis than a file, maybe a line.
For displaying this coupling, I like something like software cartography (here) that I talked to Adrian Kuhn about. I can almost see how to extend the visualization for different clustering techniques, maybe.
Traceability from experiment to code to published results: I’m still grasping about for appropriate prior results on this one, but one thing I’d like to do is see if the tightly coupled clusters are associated with _something_ in the experimental world.
OK, that’s enough. I’ll try to tie the interesting papers and posters in soon.