AI in Human Language for Social Good

Algorithms that read subtext and visuals

We live in an age in which online reviews build credibility, whether for products, restaurants, companies, or retailers. But what happens when companies talk up an otherwise poor product by devising a number of enticing fake reviews? And what if there were reputation-enhancement services that companies could hire to generate deceptive online reviews? Dr. Yejin Choi, Assistant Professor of Computer Science & Engineering at the University of Washington, develops natural language processing (NLP) algorithms to improve the quality of information on the Internet. In 2011, Dr. Choi and her collaborators were among the first to introduce algorithms for detecting deceptive reviews, work that has drawn much attention from the media, academia, and law enforcement. This awareness has visibly gained momentum: in September 2013, New York law enforcement announced that it had caught 19 firms providing fake reviews and fined them $350,000. As she continues her work on detecting the subliminal messages underlying online text, Dr. Choi is expanding her influence in exposing fake reviews, predicting the success of novels, and enhancing computer image search.

The human mind tends to focus on a few cues in wording, while computers can crawl through hundreds of pieces of evidence, large and small, simultaneously and draw correlations among them to make predictions. Devoid of cognitive bias, computers can be trained in a particular domain to analyze the distribution and choice of words people use, and so pick up subtle cues about the author's intent. Such is the power of NLP algorithms, where the science of artificial intelligence merges with that of human linguistics. With a talented team of Ph.D. students and collaborators across the nation, Dr. Choi harnesses this power to create statistical models that analyze the subtext behind writing styles and enable complex image searches.
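To make the idea of learning from word distributions concrete, here is a minimal sketch of a word-count classifier (multinomial Naive Bayes with add-one smoothing). It is an illustration only and not Dr. Choi's published method; the toy reviews, their labels, and the function names `train` and `predict` are all assumptions invented for this example.

```python
import math
from collections import Counter

# Invented toy data: (review text, label). Not taken from any real dataset.
TOY_REVIEWS = [
    ("my husband and i loved this amazing luxury hotel", "deceptive"),
    ("the most amazing experience of my life truly luxury", "deceptive"),
    ("room was small but the location near the station was good", "truthful"),
    ("bathroom was clean and the price was fair for the location", "truthful"),
]

def train(docs):
    """Count words per label, documents per label, and collect the vocabulary."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(text, word_counts, label_counts, vocab):
    """Return the label with the highest smoothed log-probability."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: how common this label is among the training documents.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen word/label pairs from zeroing the score.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TOY_REVIEWS)
guess_a = predict("amazing luxury experience", *model)
guess_b = predict("the location was good", *model)
```

Every word contributes a small piece of evidence, and the scores are summed rather than resting on a handful of salient cues. A realistic system would need far more data and richer features than this toy example.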

Current projects include:

  • From Language to the Mind: This project enables computers to read from language to the mind, identifying hidden thoughts and emotions based on diction and syntax. Dr. Choi’s work moves beyond detecting deception in online reviews; she is also working to improve public policy by predicting hygiene inspection outcomes from online reviews. When Los Angeles County began requiring restaurants to display their hygiene grades in 1998, revenues for grade A restaurants rose by 5 percent, which encouraged owners to maintain a clean environment, and hospitalizations for food-borne illnesses dropped by 20 percent. However, hygiene inspection grades are no longer as effective in keeping restaurants clean, because the rise of online restaurant reviews has radically changed the way customers choose where to eat. Online ratings are based on taste, service, and an individual customer's overall opinion, and they can easily overshadow hygiene grades. To this end, Dr. Choi works with collaborators to develop an algorithm that detects keywords in customer reviews that correlate with restaurant hygiene, with the goal of building a hygiene prediction system that customers can consult.
  • Predicting the Success of Novels: Whether the success of a literary work can be predicted is a question of keen interest to publishers and aspiring writers alike. With collaborators, Dr. Choi is building computer algorithms that assess writing styles and storylines, both to predict the success of novels and to offer feedback to writers. Statistical models make it possible to quantify thousands of books, and Dr. Choi has thus far been able to predict literary success with 84% accuracy. By scaling her algorithms to sort through large volumes of novels, Dr. Choi hopes to identify the key characteristics and stylistic elements that are more prominent in successful writing, and to pave the way for assisting writers and publishers.
  • From Language to the Visual World: Today, multi-modal content inundates the Internet with an abundance of images and videos. Visuals have become the most effective way to communicate with users. However, as the demand for visuals continues to grow, the web is becoming less accessible to people who are visually impaired. To provide equal web access, Dr. Choi is designing scalable new machine learning algorithms that translate visual information into non-visual information. This work will facilitate complex image caption generation, using expressive and dynamic language instead of a single keyword to describe and find an image. Ultimately, Dr. Choi’s research will aid all users around the nation in accurate image search.
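As a rough illustration of the keyword idea in the hygiene project above, the sketch below ranks words by how much more often they appear in reviews of restaurants that failed an inspection than in reviews of those that passed. It is a simplified stand-in and not the published system; the toy reviews, the pass/fail labels, and the name `keyword_log_ratio` are hypothetical.

```python
import math
from collections import Counter

# Invented toy data: (review text, whether the restaurant failed inspection).
TOY_HYGIENE_REVIEWS = [
    ("the floor was sticky and the table was dirty", True),
    ("dirty glasses and a sticky menu", True),
    ("friendly staff and a spotless dining room", False),
    ("the food was fresh and the staff was friendly", False),
]

def keyword_log_ratio(reviews):
    """Map each word to the log-ratio of its smoothed document frequency
    in failed-inspection reviews versus passed-inspection reviews."""
    fail_df, pass_df = Counter(), Counter()
    n_fail = n_pass = 0
    for text, failed in reviews:
        words = set(text.lower().split())  # count each word once per review
        if failed:
            fail_df.update(words)
            n_fail += 1
        else:
            pass_df.update(words)
            n_pass += 1
    scores = {}
    for word in set(fail_df) | set(pass_df):
        # Add-one smoothing avoids division by zero for one-sided words.
        p_fail = (fail_df[word] + 1) / (n_fail + 2)
        p_pass = (pass_df[word] + 1) / (n_pass + 2)
        scores[word] = math.log(p_fail / p_pass)
    return scores

hygiene_scores = keyword_log_ratio(TOY_HYGIENE_REVIEWS)
# Words with the largest positive scores lean toward failed inspections.
ranked = sorted(hygiene_scores, key=hygiene_scores.get, reverse=True)
```

Positive scores mark words that lean toward failed inspections, negative scores lean toward passes, and words that appear equally in both groups score zero. A production system would likely combine many such signals in a trained model rather than rely on a single ranking.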

Bio

Dr. Yejin Choi is an assistant professor in the Computer Science and Engineering Department at the University of Washington, and was previously an assistant professor in the Computer Science Department at Stony Brook University. She received her Ph.D. in Computer Science from Cornell University and her B.S. in Computer Science and Engineering from Seoul National University in Korea. She received the Marr Prize (best paper award) at ICCV 2013.

Dr. Choi grew up tomboyish and adventurous by nature, and the subconscious gender bias of social norms and her family's expectations often created tension in her decision-making. Many of her choices, such as competing in model airplane flying and backpacking in Europe, were met with backlash, and she continually sought outlets to express herself.

To pursue an avenue of creativity and breakthrough, Dr. Choi first came to the US to work for Microsoft as a software engineer after finishing her undergraduate studies in South Korea. There, her daily programming work entailed telling computers exactly what to do, and Dr. Choi began to wonder whether she could instead teach computers to learn what to do and how. To find answers, she started her Ph.D. at Cornell University, specializing in Natural Language Processing, a branch of Artificial Intelligence that focuses on human language. Rigorous research, she realized, could open up new possibilities for intelligent systems to understand and interact in human language. To this day, she continues to push the limits of what computers can do, and envisions a future robot that can read a recipe and prepare dinner for users, or an algorithm that reads online reviews and filters out the deceptive ones.

One of the few women in the male-dominated field of computer science, Dr. Choi has been attracting and empowering many female students with curious, creative, and courageous minds. The ultimate hope of Dr. Choi and her team of students is to develop NLP algorithms that can assist us in our everyday lives and contribute to society. When not seeking adventures in research, Dr. Choi enjoys underwater adventures through scuba diving.

For more information, visit http://homes.cs.washington.edu/~yejin/

Publications

TreeTalk: Composition and Compression of Trees for Image Descriptions

Polina Kuznetsova, Vicente Ordonez, Tamara Berg and Yejin Choi. Transactions of the Association for Computational Linguistics (TACL), 2014. (presented at EMNLP 2014)


Success with Style – Using Writing Style to Predict the Success of Novels

Vikas Ashok, Song Feng and Yejin Choi. Empirical Methods in Natural Language Processing (EMNLP), 2013.


Where Not to Eat? Improving Public Policy by Predicting Hygiene Inspections Using Online Reviews

Jun Seok Kang, Polina Kuznetsova, Michael Luca and Yejin Choi. Empirical Methods in Natural Language Processing (EMNLP), short, 2013.


From Large Scale Image Categorization to Entry-Level Categories

Vicente Ordonez, Jia Deng, Yejin Choi, Alexander C Berg, Tamara L Berg. International Conference on Computer Vision (ICCV), 2013.


Finding Deceptive Opinion Spam by Any Stretch of the Imagination

Myle Ott, Yejin Choi, Claire Cardie, and Jeffrey Hancock. Association for Computational Linguistics (ACL), 2011.


Awards

Marr Prize (Best Paper Award) at ICCV, 2013

International Conference on Computer Vision

Patents

System and method for determining deceptive opinion spam

M. Ott, Y. Choi, C. Cardie, J. Hancock. Patent pending: filed by Cornell University, 2011

U.S. Patent No. 20090112892 A1: "System and method for summarizing fine-grained opinions in text."

C. Cardie, Y. Choi, E. Breck, V. Stoyanov. Filed by Appinions, LLC, 2007.

U.S. Patent No. 20050066063: "Sparse caching for streaming media."

A. Grigorovitch, Y. Choi, T. Carvalho. Filed by Microsoft Corporation, 2003.

U.S. Patent No. 20040267503: "Midstream determination of varying bandwidth availability."

T. Batterberry, A. Grigorovitch, A. Klemets, J. Stewart, Y. Choi. Filed by Microsoft Corporation, 2003.

U.S. Patent No. 20030236905: "System and method for automatically recovering from failed network connections in streaming media scenarios."

Y. Choi, A. Grigorovitch, T. Batterberry. Filed by Microsoft Corporation, 2002.