Jake Williams, PhD, assistant professor of information science, joined Drexel University in the fall of 2016. Prior to Drexel, he was a postdoctoral researcher and faculty instructor at the University of California, Berkeley. Williams received a PhD in mathematical science, an MS in applied mathematics, and a BA in physics from the University of Vermont. His research and teaching interests center on data science, computational social science, natural language processing, mathematics, machine learning, scientific programming, and algorithm design.
CCI: Some of your research so far has dealt with linguistics and using algorithms and code to process phrases and expressions. What got you interested in this line of work?
Jake Williams: I found linguistics at the start of my PhD program. I landed a research assistantship in a math lab that mostly studies computational social science, which means lots of network science, games and social dynamics studies, and text analysis.
One of their big projects at the time (and still today) was the Hedonometer (hedonometer.org), which applies word-based sentiment analysis to the whole of Twitter. You might think of it as representing the live state of "happiness" for Twitter's population. I was bothered by the method's naïveté: it takes words out of context. So I eventually wound up on a project aimed at developing methods for identifying the minimal independent units of meaning in text. For example, idioms should often be taken as a whole rather than broken down word by word. This turned out to be really hard, became a bit of an obsession, and ultimately led to some math deep enough to be published in physics journals and to form the basis of an applied math thesis. I suppose I've always had a penchant for bad jokes and wordplay, so in hindsight it's not too surprising that I got into computational linguistics.
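To make the "words out of context" problem concrete, here is a minimal sketch, assuming a tiny hypothetical lexicon (TOY_SCORES, with made-up values on a 1 to 9 happiness scale; these are not the Hedonometer's actual word scores) and a hand-supplied set of known phrases. It contrasts averaging scores word by word with a simple greedy segmentation that keeps a known idiom whole. This is only an illustration of the general idea, not Williams's method for discovering units of meaning, which does not rely on a predefined phrase list.

```python
# Hypothetical toy scores on a 1-9 happiness scale (illustrative values only).
TOY_SCORES = {"kick": 4.0, "the": 5.0, "bucket": 5.5, "kick the bucket": 1.5}


def average_score(units, scores):
    """Average the scores of units found in the lexicon; None if none are found."""
    found = [scores[u] for u in units if u in scores]
    return sum(found) / len(found) if found else None


def word_units(text):
    """Naive segmentation: every whitespace-separated word is its own unit."""
    return text.lower().split()


def phrase_units(text, phrases):
    """Greedy left-to-right segmentation that keeps known multiword phrases whole."""
    words = text.lower().split()
    max_len = max(len(p.split()) for p in phrases)
    units, i = [], 0
    while i < len(words):
        # Try the longest possible phrase first, falling back to a single word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in phrases:
                units.append(candidate)
                i += n
                break
    return units


text = "he did kick the bucket"
print(average_score(word_units(text), TOY_SCORES))  # about 4.83: near neutral
print(average_score(phrase_units(text, {"kick the bucket"}), TOY_SCORES))  # 1.5
```

Scored word by word, the sentence looks close to neutral; treated as a single unit, the idiom carries the strongly negative meaning it actually has.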
CCI: What are the most interesting applications of your research?
JW: I think people are probably most interested in the social science and machine learning research that I do, especially since it's often in the context of social media. Everyone's hooked on social media these days, and it feels like there are more and more protests going on. So when I tell people that some of my research focuses on detecting social action and events in social media, it really seems to strike a chord. I find this research incredibly interesting. There are a lot of exciting paths to explore for the benefit of people and communities, and not just as a surveillance tool. However, it's really hard to make progress in this area in the face of data access limitations. I'm not sure how many people realize that nearly all of the information they share online is commercially traded. Instead of informing socially beneficial research, most of what we share gets turned back on us through direct marketing. So I'm also very excited to develop new ways to access data for research, but we'll see where that goes.
CCI: Can you describe the research you’re working on currently?
JW: Right now, I'm taking a bit of a step back from developing machine learning and natural language processing (NLP) algorithms. Instead, I'm working on ways to collect more and better data, which will improve my algorithm development in the long run.
There's an adage in computer science, "garbage in, garbage out," that applies to data as input to algorithms as well. The problem extends beyond data quality, too: in many contexts, algorithms are hampered by a general lack of data. In one project, I'm teamed up with computer scientists, environmental engineers, and social scientists to build better crowdsourcing utilities. This project specifically focuses on improving water quality and the water delivery system in the Delaware River Basin by adding a social component to help monitor conditions and identify potential water-related problems. For me, this project has it all: I get to develop a point of data access, the social component will include lots of text-as-data, and the environmental focus holds possibilities for NLP and machine learning developments in identifying events, stressors, and conditions.
CCI: Why did you become a professor?
JW: I always knew I was interested in being a professor. I really like the creative aspects of research; there's a lot of freedom to follow your own interests and to get excited about your work. I have also taken great pleasure in education for a long time. I tutored and taught algebra and calculus throughout graduate school. It's really a wonderful thing when you're able to create a classroom environment where people feel safe enough to ask questions and come away enlightened.
CCI: What made you want to come to Drexel?
JW: Coming to Drexel was and is an incredible opportunity. Getting a job offer at a great school is a big deal in academia, and having it be in a large, exciting, vibrant, and historic city is just icing on the cake. The opportunity to focus on developing a curriculum and research program in data science has really made this a dream job. I get to do the research I love, shape an educational standard in an up-and-coming discipline, and, perhaps best of all, work in a department and College committed to my success through support and mentorship.