We should view LLMs as a potential on-ramp to computational methods that eases the learning curve and engages students who might otherwise be turned away.
In a few months, first-year Ph.D. students will be sitting in math and programming "boot camps". I want to argue that you should teach your students to program with AI assistance. Before I get into it, my sense is that a lot of social scientists have not yet had meaningful interactions with these chat bots beyond Twitter threads from AI hype men, AI skeptics, and business guru grifters promising to "10x your productivity with these 7 chatGPT prompts". If that’s you, I encourage you to close this out and fire up ChatGPT or Google's Bard. Describe a hypothetical data frame and ask it to generate some code for a plot in your preferred language using your preferred packages. Ask it to iterate on the plot and explain specific lines in the code. Then ask yourself whether this could be useful to you or to a student learning to program.
With that said, here are six reasons why you should teach your students to program with AI assistance.
Many are concerned that hallucinating chat bots will contribute to the proliferation of misinformation. I share this concern. However, programming is fundamentally different because the output of the model can be immediately validated. The code outputs the expected result, or it doesn’t. Sometimes models provide explanations of code that are a bit off, but this, too, can be checked immediately by testing whether the code produces the expected results.
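To make "immediately validated" concrete, here is a minimal sketch in Python. The function `group_means` is a hypothetical stand-in for something a chat bot might hand back, and the toy data and column names are invented for illustration. The idea is to run generated code on an example small enough to work out by hand and compare the two.

```python
from statistics import mean


def group_means(rows, group_key, value_key):
    """Hypothetical chat-bot-generated helper (an assumption for this sketch):
    average `value_key` within each level of `group_key`."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[group_key], []).append(row[value_key])
    return {group: mean(values) for group, values in grouped.items()}


# Invented toy data, small enough to check by hand.
toy = [
    {"party": "A", "turnout": 0.50},
    {"party": "A", "turnout": 0.70},
    {"party": "B", "turnout": 0.40},
]

result = group_means(toy, "party", "turnout")

# Hand-computed answers: A = (0.50 + 0.70) / 2 = 0.60, B = 0.40.
# Compare with a small tolerance because these are floating-point numbers.
assert abs(result["A"] - 0.60) < 1e-9
assert abs(result["B"] - 0.40) < 1e-9
```

If an assertion fails, the generated code is wrong for that case, whatever the accompanying explanation claimed. A passing check on one toy example does not prove the code correct in general, so it is worth teaching students to try a few awkward cases too.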
If your impression is that students will use LLMs to output code and avoid doing work, then you’re both overestimating the capabilities of the models and thinking too narrowly about how they can be used. LLMs do not usually produce perfect code on their first try. Even if they did, there will forever be a gap between the output or code that humans can envision and what they can precisely describe in natural language for a model to generate. Humans will still need to manually write and edit code on even somewhat trivial tasks for the foreseeable future. This will become apparent very quickly if you start using LLMs in your own workflow. AI assistance doesn’t mean your students won’t need to learn to program.
For more experienced programmers, the value of an LLM is in increasing the speed of code output. Your students aren’t experienced programmers, however, so some of the advantages they realize might not be obvious to you. Specifically, LLMs can be very good programming tutors: they can explain specific lines of code and programming concepts. "But what about hallucination?" you protest. Hallucination can certainly occur, but I think many more experienced programmers forget just how time-consuming some of the most basic and trivial problems can be for beginners. What’s a directory and how do I set it? What’s a delimiter? Why is R telling me that these two columns of numbers have different data types? Troubleshooting issues like this can take a long time for new programmers on their own. Chat bots will be able to reliably produce good answers to questions like this and explain specific lines of code in plain English.
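For readers who have forgotten what these beginner stumbling blocks look like, here is a small sketch in Python of the delimiter and data type questions. The file contents and column names are invented, and only the standard library is used.

```python
import csv
import io

# Invented file contents: two columns separated by semicolons, not commas.
raw = "district;votes\nNorth;1200\nSouth;950\n"

# Stumbling block 1: the default delimiter is a comma, so every line
# comes back as a single field instead of two.
wrong = list(csv.reader(io.StringIO(raw)))
assert wrong[0] == ["district;votes"]

# Telling the reader about the semicolon delimiter splits the columns.
right = list(csv.reader(io.StringIO(raw), delimiter=";"))
assert right[0] == ["district", "votes"]

# Stumbling block 2: the "numbers" are read as text, so they must be
# converted before doing arithmetic with them.
assert isinstance(right[1][1], str)
votes = [int(row[1]) for row in right[1:]]
total = sum(votes)
assert total == 2150
```

None of this is difficult once you know it, but a beginner can lose an afternoon to either problem. These are the kinds of questions a chat bot can answer on the spot.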
One thing I routinely do is pass working code to a chat bot and ask it to make my script more concise and adhere to best programming practices. This is often very informative and broadens my functional programming vocabulary. Teach your students to do this so that they can get more regular feedback on their work.
More often than not, researchers will use the methods they know rather than the methods that are best. We tend to get locked into a particular ecosystem, be it R, Python, or (god forbid) Stata. Often, this doesn’t matter very much, but in some cases it does. The chasm in the development of NLP methods between Python and R is a clear example. Somewhere out there, a grad student is fretting about how their text pre-processing is affecting their results when they could be using a transformer. Because they've never branched out from R, they don’t know that pre-processing is largely unnecessary for many tasks with modern language models, and they labor under the misconception that transformers are more complicated to implement than the bag-of-words approach they were taught.
All of this is to say that the less time students spend laboring over the trivial aspects of a programming language, the more they can focus on broadening their skill set. Being a good programmer in social science is not about memorizing a lot of functions and banging out scripts without referencing the documentation, although that helps. Rather, it’s more important to have a broad view of what tools and methods are available and the skills to figure out implementation when needed. AI programming assistants can go a long way toward breaking researchers out of confining software ecosystems and helping them broaden their skill set.
There isn’t much to this. They will use it anyway, so you might as well set them off on the right foot and teach them to use it productively. Teach them that it's not purely a code generator, but a tool to help them work faster and understand better. Teach them how to properly validate code, and what mistakes to watch out for.
In 2005, California attempted to streamline the tax filing process by implementing what it called "ReadyReturn". Residents would receive a tax return already filled out, sign off on it, and submit it to the state. Intuit, the maker of TurboTax, recognized that this made its product useless and successfully lobbied to kill ReadyReturn. Citizens, they argued, needed to do the hard work of filing their own taxes because it kept them aware of what the government was doing and where their tax dollars were going. This is a terrible argument, but it's the same argument many lean on to argue against using AI-generated code. Students do need to learn to write, edit, and understand code. Making programming more arduous is not a good way to accomplish this, though, and is only more likely to turn students away from computational methods. Rather, we should view LLMs as a potential on-ramp that eases the learning curve and engages students who might otherwise be turned away.