Consider technical computing. Matlab is expensive but simple: one function per .m file – send a function inputs, get outputs. Python’s adherents claim that it can supplant Matlab for most scientific purposes. Reality, as usual, is more nuanced. Since Python supports objects, classes, namespaces, and a lot of other funky features, Python tools are chock full of them. Pick a package – numpy, scipy, matplotlib, or any of the ‘batteries included’ standard library. Just figuring out how to pass inputs to something and get outputs is hard, because that something may not be a function at all but an object with methods, a class, a module, or something else entirely. Documentation is often lacking, so expect multiple visits to StackOverflow, Usenet and Google Groups, and the mailing lists.
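The ambiguity is easy to demonstrate with the standard library’s own `inspect` module. A quick sketch – `json` here is just a stand-in for whatever package you’re exploring:

```python
import inspect
import json  # stand-in for any package you're trying to learn

# Bucket the package's public names so you know what you're dealing
# with before you try to call anything.
for name in dir(json):
    if name.startswith("_"):
        continue
    obj = getattr(json, name)
    if inspect.ismodule(obj):
        kind = "module"
    elif inspect.isclass(obj):
        kind = "class"
    elif inspect.isroutine(obj):
        kind = "function"
    else:
        kind = "something else"
    print(f"{name}: {kind}")
```

Even in a small package like `json`, the top level mixes functions (`loads`), classes (`JSONDecoder`), and whole submodules (`decoder`) under one namespace.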
I wrote some experimental Python spaghetti code, pyCustoms, to take a Python package, figure out which of its modules connect to which other modules, and then recursively list each module’s builtins, classes, functions, submodules, and a bunch of stuff falling into ‘none of the above.’ I also fed the results into graphviz to visualize the connections and perhaps gain some insight. It was one compromise after another, settling for ‘good enough’ when ‘ideal’ wasn’t convenient or possible. The firework-like graphviz output was fun to look at but not practically useful, given the amount of zooming and panning needed to see details – what you see is all you’ve got. I may use the plain text output from the pyCustoms algorithm in the future to get the lay of the land before studying a package in any detail.
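The core idea can be sketched in a few lines with `pkgutil` and `inspect`. This is not the actual pyCustoms code, just a minimal illustration of the approach (with the same ‘good enough’ compromise of skipping modules that refuse to import):

```python
import importlib
import inspect
import pkgutil

def list_contents(package_name):
    """Walk a package's immediate submodules and bucket each module's
    public names into classes, functions, modules, and other."""
    pkg = importlib.import_module(package_name)
    modules = [pkg]
    if hasattr(pkg, "__path__"):  # packages have __path__; plain modules don't
        for info in pkgutil.iter_modules(pkg.__path__):
            try:
                modules.append(importlib.import_module(f"{package_name}.{info.name}"))
            except ImportError:
                continue  # 'good enough': skip anything that won't import
    results = {}
    for mod in modules:
        buckets = {"classes": [], "functions": [], "modules": [], "other": []}
        for name in dir(mod):
            if name.startswith("_"):
                continue
            obj = getattr(mod, name)
            if inspect.isclass(obj):
                buckets["classes"].append(name)
            elif inspect.isroutine(obj):
                buckets["functions"].append(name)
            elif inspect.ismodule(obj):
                buckets["modules"].append(name)
            else:
                buckets["other"].append(name)
        results[mod.__name__] = buckets
    return results
```

Feeding the resulting dictionary into graphviz – one node per module, one edge per module-to-module reference – produces the firework diagrams below.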
The pyCustoms code is on GitHub in a Jupyter Notebook. Here are the graphviz outputs for numpy and matplotlib. Each image links to a PDF. Zooming and panning work better in a standalone PDF reader than in a typical browser PDF plugin. Right-clicking should permit downloading the files. I normally use the Skim PDF reader for Macs but was surprised to find that Acrobat DC did a better job with these graphics-intensive files.
The IPython Notebook has evolved into the Jupyter project. This free, open-source hook into many different programming languages simplifies some types of software experimentation. Jupyter’s advocates have attracted some generous institutional and foundational funding to develop the tool. The project has posted its winning proposal touting it as the “Engine of Collaborative Data Science” and ramming home the “computational narrative” as the means. Authors write notebooks with embedded data and code for a variety of audiences and interested readers can run computations for themselves.
It isn’t clear how this will work for complex algorithms that require a lot of computing power. Notebooks can be static presentations in those cases, but then they have no advantage over a conventional report. The current Notebook doesn’t have the tools for real software development or algorithm analysis, and savvy users recommend not relying on notebooks beyond certain limits. Variable inspection, debugging, and change control are all on the roadmap for the new JupyterLab, and the project’s claims can’t be assessed until we see how well these work. Every addition will require screen space, which will mean less space for the data and visualizations. It might in time be as convenient as the current (not-free) Matlab user interface, but it will take work to get there.
Yes, this is the funded scope – and if it already existed, they would be proposing something else. The Principal Investigators acknowledge that other notebook interfaces have been around for a long time but imply that cost and proprietary architectures have been the principal roadblocks to their impact. The Notebook metaphor itself is left unexamined, and that’s puzzling. There should be plenty of data (ha!) on how prior interfaces have or have not revolutionized the areas they claimed they were going to revolutionize. The proposal does devote detail to the enabling technologies, the support of large companies, and the future constituency.
But it is the word ‘narrative’ that gets my hackles up. It sounds disturbingly similar to ‘pitch,’ and the pitch culture is dangerous. People can be led down a bad path any number of ways – yellow journalism, Powerpoint, or just outright demagoguery. Groups can lie just as well as individuals, and Notebooks, like vaunted social media, can just as easily be co-opted for b.s. Data-driven decision-making is resurgent yet cyclical. It ebbs when the data don’t match the preconceptions – the internal narratives – of the ones with the money. We may, as a society, have gone past failsafe in handing over control to the unworthy.
Jonathan Touboul models a hipster system using statistical physics methods: a hipster looks at a representation of the people around him and, based on some interaction probability, decides whether or not to follow the herd. What’s the long-term result? It’s a delightful paper, full of analytic firepower, and not easy to follow.
Jake Vanderplas uses the powerful and friendly Python programming language to explore the problem through computer simulation. He finds “In other words, with enough hipsters around responding to delayed fashion trends, a plethora of facial hair and fixed gear bikes is a natural result.”
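The flavor of the result comes through even in a toy sketch. This is neither Touboul’s model nor Vanderplas’s code – just my own minimal assumption-laden version, where each agent reacts to the majority opinion as it stood a few steps ago, conformists copy it, and hipsters do the opposite:

```python
import random

def simulate(n_agents=500, hipster_frac=0.8, delay=5, steps=200, seed=42):
    """Toy anti-conformist dynamics with a delayed fashion signal.
    States are +1/-1; returns the net opinion over time."""
    rng = random.Random(seed)
    states = [rng.choice([-1, 1]) for _ in range(n_agents)]
    is_hipster = [rng.random() < hipster_frac for _ in range(n_agents)]
    history = [sum(states)]  # net opinion at each step
    for _ in range(steps):
        # everyone reacts to the majority as it was `delay` steps ago
        lag = history[max(0, len(history) - 1 - delay)]
        majority = 1 if lag >= 0 else -1
        for i in range(n_agents):
            # hipsters oppose the (delayed) trend; conformists follow it
            states[i] = -majority if is_hipster[i] else majority
        history.append(sum(states))
    return history

hist = simulate()
# With mostly hipsters and a delayed signal, the net opinion oscillates:
# everyone flips away from the trend in sync, creating a new trend to flee.
```

The synchronized flip-flopping is the point: trying to be different from a stale snapshot of the crowd makes the hipsters all the same.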