Welcome to KDD-2013’s online program

Saturday, August 10
 

9:00am

Big Data Camp
Saturday August 10, 2013 9:00am - 5:00pm
Michigan A/B

5:00pm

Registration
Saturday August 10, 2013 5:00pm - 7:00pm
Chicago Prom
 
Sunday, August 11
 

7:30am

Registration
Sunday August 11, 2013 7:30am - 8:00am
Chicago Prom

8:10am

UrbComp: Keynote: Computational Urban Sciences: Emerging Opportunities
Speakers
Charlie Catlett

founding director of the Computation Institute's Urban Center for Computation and Data


Sunday August 11, 2013 8:10am - 9:05am
Chicago 8

9:00am

MDMKDD: Who is Repinning? Predicting a Brand's User Interactions using Social Media Retrieval
by Shantanu Singh, Yan Wang, Lei Ding

Sunday August 11, 2013 9:00am - 9:25am
Michigan B

9:00am

ODD: Keynote Presentation : Outlier Detection in Personalized Medicine - Raymond Ng
Abstract: Personalized medicine has been hailed as one of the main directions for medical research in this century. In the first half of the talk, we give an overview of our personalized medicine projects that use gene expression, proteomics, DNA and clinical features. In the second half, we give two applications where outlier detection is valuable for the success of our work. The first one focuses on identifying mislabeled patients, and the second one deals with quality control of microarrays.

Sunday August 11, 2013 9:00am - 9:30am
Missouri

9:00am

MLG: Evimaria Terzi : The Dynamics of Opinion Formation in Social Networks
The process of opinion formation through synthesis and contrast of different viewpoints has been the subject of many studies in economics and social sciences. Today, this process manifests itself also in online social networks and social media. The key characteristic of successful promotion campaigns is that they take into consideration such opinion-formation dynamics in order to create an overall favorable opinion about a specific information item, such as a person, a product, or an idea. In this talk, we will review models of opinion dynamics and give a game-theoretic viewpoint to the opinion-formation process. Moreover, we will formalize the campaign-design problem as the problem of identifying a set of target individuals whose positive opinion about an information item will maximize the overall positive opinion for the item in the social network. From the technical point of view, we will discuss different variants of such campaign-design problems and analyze their computational difficulties as well as their applicability in practical settings.

Sunday August 11, 2013 9:00am - 9:35am
Sheraton 1

9:00am

Tutorial: Algorithmic techniques for modeling and mining large graphs (AMAzING)
Abstract: Network science has emerged over the last years as an interdisciplinary area spanning traditional domains including mathematics, computer science, sociology, biology and economics. Since complexity in social, biological and economic systems, and more generally in complex systems, arises through pairwise interactions, there is a surging interest in understanding networks. In this tutorial, we will provide an in-depth presentation of the most popular random-graph models used for modeling real-world networks. We will then discuss efficient algorithmic techniques for mining large graphs, with emphasis on the problems of extracting graph sparsifiers, partitioning graphs into densely connected components, and finding dense subgraphs. We will motivate the problems we will discuss and the algorithms we will present with real-world applications. Our aim is to survey important results in the areas of modeling and mining large graphs, to uncover the intuition behind the key ideas, and to present future research directions.

Who Should Attend: The tutorial presents both classic and cutting-edge research topics on networks. We aim to go into depth for the following topics: random graphs, graph sparsifiers, graph partitioning, finding dense subgraphs and their applications. The tutorial will combine computer science rigor with real-world applications. It should be of theoretical and practical interest to the graph analysis community and a large part of the data mining community as well.

Prerequisites: Computer science background (B.Sc. or equivalent); familiarity with undergraduate-level concepts covered in probability and algorithm classes.

Bio: Dr. Alan Frieze is a professor in the Department of Mathematical Sciences at Carnegie Mellon University, Pittsburgh, United States. He graduated from the University of Oxford in 1966, and obtained his Ph.D. from the University of London in 1975. His research interests lie in combinatorics, discrete optimization and theoretical computer science. In 1991, Dr. Frieze received the Fulkerson Prize in Discrete Mathematics awarded by the American Mathematical Society and the Mathematical Programming Society. In 1997 he was a Guggenheim Fellow. In 2000, he received the IBM Faculty Partnership Award. In 2006 he jointly received (with Michael Krivelevich) the Professor Pazy Memorial Research Award from the United States-Israel Binational Science Foundation. In 2011 he was selected as a SIAM Fellow. In 2012 he was selected as an AMS Fellow.

Dr. Aristides Gionis is an associate professor in the Department of Information and Computer Science at Aalto University, Finland. Previously he was a senior research scientist at Yahoo! Research. He received his Ph.D. from the Computer Science department of Stanford University in 2003. He is currently serving as an associate editor of the Transactions on Knowledge and Data Engineering (TKDE). He has served on the program committees of numerous premier conferences, including as PC co-chair for WSDM 2013 and ECML PKDD 2010. His research interests include data mining, web mining, and algorithmic data analysis.

Dr. Charalampos Tsourakakis is an Aalto Science Fellow. He received his Ph.D. in Algorithms, Combinatorics and Optimization at Carnegie Mellon University. He holds a Diploma in Electrical and Computer Engineering from the National Technical University of Athens and a Master of Science from the Machine Learning Department at Carnegie Mellon University. His research interests include algorithm design, random graphs and data mining.


Sunday August 11, 2013 9:00am - 12:00pm
Chicago 6

9:00am

Tutorial: Big Data Analytics for Healthcare
Abstract: Large amounts of heterogeneous medical data have become available in various healthcare organizations (payers, providers, pharmaceuticals). These data could be an enabling resource for deriving insights for improving care delivery and reducing waste. The enormity and complexity of these datasets present great challenges in analyses and subsequent applications to a practical clinical environment. In this tutorial, we introduce the characteristics and related mining challenges of dealing with big medical data. Many of those insights come from the medical informatics community, which is highly related to data mining but focuses on biomedical specifics. We survey various related papers from data mining venues as well as medical informatics venues to share with the audience key problems and trends in healthcare analytics research, with applications including clinical text mining, predictive modeling, survival analysis, patient similarity, genetic data analysis, and public health. The tutorial will include several case studies dealing with some of the important healthcare applications.

Speaker bio for each presenter: Jimeng Sun is a research staff member at IBM TJ Watson Research Center. Dr. Sun graduated with a PhD in Computer Science from Carnegie Mellon University in the fall of 2007. His advisor was Prof. Christos Faloutsos. He studied in the Computer Science Department at Carnegie Mellon University from 2003 to 2007. His research focus is on healthcare analytics and informatics, large-scale data mining, graph mining, high dimensional data mining such as time series, matrices, and tensors (data cubes) and visual analytics. Dr. Sun received the ICDM best research paper award in 2007, the SDM best research paper award in 2007, and the KDD Dissertation runner-up award in 2008. For more details, one can refer to his personal homepage at http://www.dasfa.net/jimeng .

Chandan K. Reddy is an Assistant Professor in the Department of Computer Science at Wayne State University. He received his PhD from Cornell University and MS from Michigan State University. His primary research interests are in the areas of data mining and machine learning with applications to healthcare, bioinformatics, and social network analysis. His research is funded by the National Science Foundation, the National Institutes of Health, the Department of Transportation, and the Susan G. Komen for the Cure Foundation. He has published over 45 peer-reviewed articles in leading conferences and journals. He received the Best Application Paper Award at the ACM SIGKDD conference in 2010 and was a finalist of the INFORMS Franz Edelman Award Competition in 2011. He is a member of IEEE, ACM, and SIAM.


Sunday August 11, 2013 9:00am - 12:00pm
Chicago 10

9:00am

Tutorial: Mining Data from Mobile Devices: A Survey of Smart Sensing and Analytics
Abstract: Mobile connected devices, and smartphones in particular, are rapidly emerging as a dominant computing and sensing platform. This poses several unique opportunities for data collection and analysis, as well as new challenges. In this tutorial, we survey the state of the art in mining data from mobile devices across different application areas such as ads, healthcare, geo-social, public policy, etc. Our tutorial has three parts. In part one, we summarize data collection in terms of various sensing modalities. In part two, we present cross-cutting challenges such as real-time analysis and security, and we outline cross-cutting methods for mobile data mining such as network inference, streaming algorithms, etc. In the last part, we specifically overview emerging and fast-growing application areas, such as those noted above. In conclusion, we briefly highlight the opportunities for joint design of new data collection techniques and analysis methods, suggesting additional directions for future research.

Speaker bio for presenter 1: Spiros Papadimitriou is mainly interested in data mining for graphs and streaming data, clustering, time series, large-scale data processing, and mobile applications. His interests span from the very small (embedded devices, and sensors; Arduino) to the very large (large-scale data processing and analysis; Hadoop). He has published more than forty papers on these topics in refereed conferences and journals. He received the best paper award in SDM 2008, has three invited journal publications in best paper issues and several book chapters, and has filed multiple patents. He has also been invited to give keynote talks on graph and social network analysis (WAAMD 2008, and ADN 2009) and tutorials on time series stream mining (University of Maine Summer School, 2008) and large-scale analytics (Carnegie Mellon University, 2012). In the past, he has also developed and released a number of Android applications (including live-view mobile OCR, and web service clients) that have 50,000 downloads. He is currently an assistant professor at Rutgers University (MSIS-RBS). Prior to that, he was a research scientist at Google, and a research staff member at IBM Research. He was a Siebel scholarship recipient in 2005. He obtained his MSc and PhD degrees from Carnegie Mellon University.

Speaker bio for presenter 2: Tina Eliassi-Rad is an Associate Professor of Computer Science at Rutgers University. Before joining academia, she was a Member of Technical Staff and Principal Investigator at Lawrence Livermore National Laboratory. Tina earned her Ph.D. in Computer Sciences (with a minor in Mathematical Statistics) at the University of Wisconsin-Madison. Within data mining and machine learning, Tina's research has been applied to the World-Wide Web, text corpora, large-scale scientific simulation data, complex networks, and cyber situational awareness. She has published over 50 peer-reviewed papers (including a best paper runner-up award at ICDM'09 and a best interdisciplinary paper award at CIKM'12), and has given over 70 invited presentations. Tina is an action editor for the Data Mining and Knowledge Discovery Journal. In 2010, she received an Outstanding Mentor Award from the US DOE Office of Science and a Directorate Gold Award from Lawrence Livermore National Laboratory for work on cyber situational awareness. For more details, visit http://eliassi.org.


Sunday August 11, 2013 9:00am - 12:00pm
Chicago 7

9:00am

DMH: Workshop on Data Mining for Healthcare
Sunday August 11, 2013 9:00am - 5:00pm
Superior B

9:05am

UrbComp: Session 1 : Human Mobility
Exploring Human Movements in Singapore: A Comparative Analysis Based on Mobile Phone and Taxicab Usages (Full): Chaogui Kang, Stanislav Sobolevsky, Yu Liu, Carlo Ratti
A Review of Urban Computing for Mobile Phone Traces: Current Methods, Challenges and Opportunities (Full): Shan Jiang, Gaston Fiore, Yingxiang Yang, Joseph Ferreira, Emilio Frazzoli, Marta González
Daily travel behavior: Lessons from a week-long survey for the extraction of human mobility motifs related information: Christian Schneider, Christian Rudloff, Dietmar Bauer, Marta Gonzalez

Sunday August 11, 2013 9:05am - 10:10am
Chicago 8

9:10am

IDEA: Keynote 1 - Interactive Visual Analytics for High Dimensional Data
Bio: Prof. Haesun Park received her B.S. degree in Mathematics from Seoul National University, Seoul, Korea, in 1981, summa cum laude and with the University President's Medal for the top graduate, and her M.S. and Ph.D. degrees in Computer Science from Cornell University, Ithaca, NY, in 1985 and 1987, respectively. She has been a professor in the School of Computational Science and Engineering at the Georgia Institute of Technology, Atlanta, Georgia since 2005. Before joining Georgia Tech, she was on the faculty at the University of Minnesota, Twin Cities, and a program director at the National Science Foundation, Arlington, VA. She has published extensively in areas including numerical algorithms, data analysis, visual analytics, text mining, and parallel computing. She has been the director of the NSF/DHS FODAVA-Lead (Foundations of Data and Visual Analytics) center and executive director of the Center for Data Analytics at Georgia Tech. She has served on numerous editorial boards including IEEE Transactions on Pattern Analysis and Machine Intelligence, SIAM Journal on Matrix Analysis and Applications, and SIAM Journal on Scientific Computing, and has served as a conference co-chair for the SIAM International Conference on Data Mining in 2008 and 2009. In 2013, she was elected as a SIAM Fellow.

Abstract: Many modern data sets can be represented in high dimensional vector spaces and have benefited from computational methods that utilize advanced techniques from numerical linear algebra and optimization. Visual analytics approaches have contributed greatly to data understanding and analysis by combining automated algorithms with humans' quick visual perception and interaction. However, visual analytics targeting high dimensional large-scale data has been challenging due to low dimensional screen space with limited pixels to represent data. Among various computational techniques supporting visual analytics, dimension reduction and clustering have played essential roles by reducing the dimension and volume to visually manageable scales. In this talk, we present some of the key foundational methods for supervised dimension reduction such as linear discriminant analysis (LDA), dimension reduction and clustering/topic discovery by nonnegative matrix factorization (NMF), and visual spatial alignment for effective fusion and comparisons by Orthogonal Procrustes. We demonstrate how these methods can effectively support interactive visual analytic tasks that involve large-scale document and image data sets.

Speakers
Haesun Park

Georgia Tech, School of Computational Science


Sunday August 11, 2013 9:10am - 10:00am
Michigan A

9:10am

SNAKDD: Keynote Speech 1: From Social Networks to Heterogeneous Social and Information Networks: A Data Mining Perspective
ABSTRACT: Many people treat social networks as homogeneous networks, modeled mainly as networks of people. Actually, people and informational objects are interconnected, forming gigantic, interconnected, integrated social and information networks. By structuring these data objects into multiple types, such networks become semi-structured heterogeneous social and information networks. Most real world applications that handle big data, including interconnected social media and social networks, medical information systems, online e-commerce systems, or database systems, can be structured into typed, heterogeneous social and information networks. For example, in a medical care network, objects of multiple types, such as patients, doctors, diseases, medication, and links such as visits, diagnosis, and treatments are intertwined together, providing rich information and forming heterogeneous information networks. Effective analysis of large-scale heterogeneous social and information networks poses an interesting but critical challenge. In this talk, we present a set of data mining scenarios in heterogeneous social and information networks and show that mining typed, heterogeneous networks is a new and promising research frontier in data mining research. Departing from many existing network models that view data as homogeneous graphs or networks, the semi-structured heterogeneous information network model leverages the rich semantics of typed nodes and links in a network and can uncover surprisingly rich knowledge from interconnected data. This heterogeneous network modeling will lead to the discovery of a set of new principles and methodologies for mining interconnected data. The examples to be used in this discussion include (1) meta path-based similarity search, (2) rank-based clustering, (3) rank-based classification, (4) meta path-based link/relationship prediction, (5) relation strength-aware mining, as well as a few other recent developments. We will also point out some promising research directions and argue that mining heterogeneous information networks could be a key to social intelligence mining.

Speakers
Jiawei Han

Abel Bliss Professor of Computer Science, University of Illinois at Urbana-Champaign. He researches data mining, information network analysis, database systems, and data warehousing, with over 600 journal and conference publications. He has chaired or served on many program committees of international conferences, including PC co-chair for KDD, SDM, and ICDM conferences, and Americas Coordinator for VLDB conferences.


Sunday August 11, 2013 9:10am - 10:00am
Chicago 8

9:10am

WISDOM: Statistical Methods for Integration and Analysis of Opinionated Text Data (Chengxiang Zhai)
Bio: ChengXiang Zhai is an Associate Professor of Computer Science at the University of Illinois at Urbana-Champaign, where he is also affiliated with the Graduate School of Library and Information Science, Institute for Genomic Biology, and Department of Statistics. He received a Ph.D. in Computer Science from Nanjing University in 1990, and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. He worked at Clairvoyance Corp. as a Research Scientist and a Senior Research Scientist from 1997 to 2000. His research interests include information retrieval, text mining, natural language processing, machine learning, and biomedical informatics, in which he has published over 150 research papers. He is an Associate Editor of ACM Transactions on Information Systems, and Information Processing and Management, and serves on the editorial board of Information Retrieval Journal. He was a program co-chair of ACM CIKM 2004, NAACL HLT 2007, and ACM SIGIR 2009. He is an ACM Distinguished Scientist and a recipient of multiple best paper awards, the Alfred P. Sloan Research Fellowship, the IBM Faculty Award, the HP Innovation Research Program Award, and the Presidential Early Career Award for Scientists and Engineers (PECASE).

2013 Keynote (Statistical Methods for Integration and Analysis of Opinionated Text Data): Opinionated text data such as blogs, forum posts, product reviews and online comments are increasingly available on the Web. They are very useful sources of public opinion about virtually any topic. However, because the opinions are scattered and abundant, it is a significant challenge for users to collect all the opinions about a topic and digest them efficiently. In this talk, I will present a suite of general statistical text mining methods that can help users integrate, summarize and analyze scattered online opinions to obtain actionable knowledge for decision making. Specifically, I will first present approaches to integration of scattered opinions by aligning them to a well-structured article or relevant ontology. Second, I will discuss several techniques for generating a concise opinion summary that can reveal the major sentiments and opinion points buried in large amounts of opinionated text data. Finally, I will present probabilistic general models for analyzing review data in depth to discover latent aspect ratings and relative weights placed by reviewers on different aspects. These methods are completely general and can thus help users integrate and analyze large amounts of online opinionated text data on any topic in any natural language.

Sunday August 11, 2013 9:10am - 10:00am
Arkansas

9:25am

MDMKDD: Robust Detection of Hyper-local Events from Geotagged Social Media Data
Ke Xie, Chaolun Xia, Nir Grinberg, Raz Schwartz, Mor Naaman

Sunday August 11, 2013 9:25am - 9:50am
Michigan B

9:30am

ODD: Enhancing One-class Support Vector Machines for Unsupervised Anomaly Detection
by Mennatallah Amer, Markus Goldstein, Slim Abdennadher

Sunday August 11, 2013 9:30am
Missouri

9:30am

DMH: Automated Spot Type Identification on NAPPA Arrays
Robert Rivera, Jie Wang, Ji Qiu, Joshua LaBaer, Garrick Wallstrom

Sunday August 11, 2013 9:30am - 9:45am
Superior B

9:35am

BioKDD: Drug-Target Interaction Prediction for Drug Repurposing with Probabilistic Similarity Logic
by Shobeir Fakhraei, Louiqa Raschid and Lise Getoor

Sunday August 11, 2013 9:35am - 10:00am
Mississippi

9:35am

MLG: Spotlights A
Sunday August 11, 2013 9:35am - 10:00am
Sheraton 1

9:45am

DMH: Unravelling Communities of ALS Patients Using Network Mining
Andre V. Carreiro, Sara C. Madeira, Alexandre P. Francisco

Sunday August 11, 2013 9:45am - 9:50am
Superior B

9:45am

ODD: Systematic Construction of Anomaly Detection Benchmarks from Real Data
by Andrew Emmott, Shubhomoy Das, Thomas Dietterich, Alan Fern, Weng-Keen Wong

Sunday August 11, 2013 9:45am - 10:00am
Missouri

9:50am

DMH: Clustering Health Trajectories Using Hidden Markov Models
Shima Ghassempour, Federico Girosi

Sunday August 11, 2013 9:50am - 9:55am
Superior B

9:50am

MDMKDD: Towards Social Imagematics: Sentiment Analysis in Social Multimedia
Quanzeng You, Jiebo Luo

Sunday August 11, 2013 9:50am - 10:15am
Michigan B

9:55am

DMH: Exploring Preprocessing Techniques for Prediction of Risk of Readmission for Congestive Heart Failure Patients
Naren Meadem, Nele Verbiest, Kiyana Zolfaghar, Jayshree Agarwal, Si-Chi Chin, Senjuti Basu Roy, Ankur Teredesai

Sunday August 11, 2013 9:55am - 10:00am
Superior B

10:00am

BioKDD: Computational phenotype prediction of ionizing-radiation-resistant bacteria with a multiple-instance learning model
by Sabeur Aridhi, Haitham Sghaier, Mondher Maddouri and Engelbert Mephu Nguifo

Sunday August 11, 2013 10:00am - 10:25am
Mississippi

10:00am

Coffee Break
Sunday August 11, 2013 10:00am - 10:30am
Prom/Sheraton 5

10:00am

DMH: Invited Talk : Dr. Jonathan Silverstein, Vice President
Speakers
Jonathan Silverstein

Vice President, Clinical Research Informatics at NorthShore University HealthSystem


Sunday August 11, 2013 10:00am - 11:00am
Superior B

10:30am

Zips: Mining Compressing Sequential Patterns in Streams
by Hoang Thanh Lam, Toon Calders, Jie Yang, Fabian Moerchen and Dmitriy Fradkin

Sunday August 11, 2013 10:30am - 10:50am
Michigan A

10:30am

ODD: Keynote Presentation : Outlier Ensembles - Charu Aggarwal
Abstract: Ensemble analysis is a widely used meta-algorithm for many data mining problems such as classification and clustering. Numerous ensemble-based algorithms have been proposed in the literature for these problems. Compared to the clustering and classification problems, ensemble analysis has been studied in a limited way in the outlier detection literature. In some cases, ensemble analysis techniques have been implicitly used by many outlier analysis algorithms, but the approach is often buried deep in the algorithm and not formally recognized as a general-purpose meta-algorithm. This is in spite of the fact that this problem is rather important in the context of outlier analysis. This talk discusses the various methods which are used in the literature for outlier ensembles and the general principles by which such analysis can be made more effective. A discussion is also provided on how outlier ensembles relate to the ensemble techniques used commonly for other data mining problems.

Sunday August 11, 2013 10:30am - 11:00am
Missouri

10:30am

MLG: Sam Shah : Large-Scale Graph Mining for Recommendations
The availability and affordability of large-scale data processing is transforming graph mining into a core production use case, especially in the consumer web space. At LinkedIn, the largest professional online social network with 225 million members, a crucial characteristic is the use of static and temporal network features for many applications, particularly recommendations. These include "People You May Know", a link prediction system to find other members on the network; "Endorsements", a lightweight skill reputation product; "Related Searches", query recommendations in our search engine; and more. How do we perform this graph mining at scale? What are some of the challenges we face? Besides the social graph, what about other interesting, but potentially more complex and larger graphs? In this talk, I will illustrate several of LinkedIn's solutions in large scale graph mining.

Sunday August 11, 2013 10:30am - 11:05am
Sheraton 1

10:30am

SNAKDD: Keynote Speech 2: Challenges and Advances on Social Network Mining : Philip S. Yu
ABSTRACT: Mining social network data has become an important and active research topic in the last decade, with a wide variety of scientific and commercial applications. We first consider the survivability issue of communities. Among communities, we notice that some of them are magnetic to people. A magnet community is one that attracts significantly more of people's interest and attention than other communities on similar topics. We will study the magnet community identification problem. Next we consider the cascading effect of nodes in a network. This is sometimes referred to as the "too big to fail" problem in the financial world, describing certain financial institutions which are so large and so interconnected that their failure will be disastrous to the economy, and which therefore must be supported by government when they face difficulty. We call such high impact entities shakers. To discover shakers, we introduce the concept of a cascading graph to capture the causality relationships among evolving entities over some period of time, and then infer shakers from the graph. In a cascading graph, nodes represent entities and weighted links represent the causality effects. Finally, we consider how to capture anomalous behavior in a network. Specifically, we look into the spam review detection problem. Online reviews provide valuable information about products and services to consumers. However, spammers are joining the community trying to mislead readers by writing fake reviews. We propose a novel concept of a heterogeneous review graph to capture the relationships among reviewers, reviews and stores that the reviewers have reviewed. We explore how interactions between nodes in this graph can reveal the cause of spam and propose an iterative model to identify suspicious reviewers.

BIO: Philip S. Yu is currently a Distinguished Professor in the Department of Computer Science at the University of Illinois at Chicago and also holds the Wexler Chair in Information Technology. He spent most of his career at IBM Thomas J. Watson Research Center and was manager of the Software Tools and Techniques group. His research interests include data mining, privacy preserving data publishing, data stream, Internet applications and technologies, and database systems. Dr. Yu has published more than 740 papers in refereed journals and conferences. He holds or has applied for more than 300 US patents. Dr. Yu is a Fellow of the ACM and the IEEE. He is the Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data. He is on the steering committee of the IEEE Conference on Data Mining and ACM Conference on Information and Knowledge Management and was a member of the IEEE Data Engineering steering committee. He was the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering (2001-2004). He had also served as an associate editor of ACM Transactions on the Internet Technology and Knowledge and Information Systems. Dr. Yu received an IEEE Computer Society 2013 Technical Achievement Award for "pioneering and fundamentally innovative contributions to the scalable indexing, querying, searching, mining and anonymization of big data" and a Research Contributions

Speakers
Philip S. Yu

UIC Distinguished Professor and Wexler Chair in Information Technology, Department of Computer Science, University of Illinois at Chicago


Sunday August 11, 2013 10:30am - 11:30am
Chicago 8

10:30am

UrbComp: Session 2 : Social Behaviors and Urban Activities
A comparison of Foursquare and Instagram to the study of city dynamics and urban social behavior (Full): Thiago Silva, Pedro Vaz de Melo, Jussara Almeida, Juliana Salles, Antonio Loureiro
Inferring human activities from GPS tracks (Full): Chiara Renso, Barbara Furletti, Paolo Cintia, Laura Spinsanti
Understanding Urban Human Activity and Mobility Patterns Using Large-scale Location-based Data from Online Social Media: Samiul Hasan, Xianyuan Zhan, Satish Ukkusuri
On the Importance of Temporal Dynamics in Modeling Urban Activity: Ke Zhang, Qiuye Jin, Konstantinos Pelechrinis, Theodoros Lappas
Prediction of User Location Using the Radiation Model and Social Check-Ins: Alexey Tarasov, Felix Kling, Alexei Pozdnoukhov

Sunday August 11, 2013 10:30am - 12:00pm
Chicago 8

10:30am

WISDOM: Session I
Identifying Purpose Behind Electoral Tweets (Saif Mohammad, Svetlana Kiritchenko, and Joel Martin)
Combining Strengths, Emotions and Polarities for Boosting Twitter Sentiment Analysis (Felipe Bravo-Marquez, Marcelo Mendoza, and Barbara Poblete)
Modelling Political Disaffection from Twitter Data (Corrado Monti, Alessandro Rozza, Giovanni Zappella, Matteo Zignani, Adam Arvidsson, and Elanor Colleoni)

Sunday August 11, 2013 10:30am - 12:00pm
Arkansas

11:00am

DMH: Active Literature Discovery for Scoping Evidence Reviews: How Many Needles are There?
by Byron C. Wallace, Issa J. Dahabreh, Kelly H. Moran, Carla E. Brodley, Thomas A. Trikalinos

Sunday August 11, 2013 11:00am - 11:15am
Superior B

11:00am

ODD: Anomaly Detection on ITS Data via View Association
 by Junaidillah Fadlil, Hsing-Kuo Pao, Yuh-Jye Lee

Sunday August 11, 2013 11:00am - 11:15am
Missouri

11:00am

BioKDD: Keynote talk : Eric Schadt
The causal chain of events that leads to the development of complex diseases such as schizophrenia remains elusive. Such diseases are complex, resulting from the interplay of potentially hundreds (or thousands) of genetic loci and environmental factors. Genetic and environmental perturbations induce changes in the molecular interactions of cellular pathways whose collective effect may become clear through the organized structure of multiscale biological networks. We have developed a novel systems approach to study psychiatric disorders such as schizophrenia that models the global molecular, functional, and structural changes in the affected brain that in turn can lead us to the root causes of the disease. To characterize the molecular, cellular, and physiological systems associated with common human diseases, we constructed gene regulatory networks, functional and structural MRI-based networks, and high-content phenotypic networks, and then integrated these network models across all of the data modalities generated across multiple human cohorts comprising several thousand individuals. Because DNA variation was systematically assessed across all cohorts, it provides a common set of perturbations that can be leveraged not only to infer causal relationships among different molecular and higher-order traits, but also to help link networks at different scales (e.g., molecular and imaging) across cohorts. Through this integrative network-based approach, we rank-order the resulting network structures for relevance to different diseases, highlighting both known and novel biological pathways involved in disease pathogenesis and progression. We demonstrate that the causal network structures we construct from this big data integration exercise are a useful predictor of response to gene perturbations and present a novel framework to test models of the mechanisms underlying disease.
We further demonstrate that our approach can offer novel insights for drug discovery programs aimed at treating disease by screening our disease-associated networks against molecular signatures induced by marketed and novel compounds across a number of cell-based systems, including those derived from stem cells isolated from patients with disease.

Sunday August 11, 2013 11:00am - 12:00pm
Mississippi

11:05am

MLG: David Bader
Sunday August 11, 2013 11:05am - 11:40am
Sheraton 1

11:10am

One Click Mining: Interactive Local Pattern Discovery through Implicit Preference and Performance Learning
by Mario Boley, Bo Kang, Pavel Tokmakov, Michael Mampaey and Stefan Wrobel

Sunday August 11, 2013 11:10am - 11:30am
Michigan A

11:15am

DMH: Fast entity recognition in biomedical text
by Amy Siu, Dat Ba Nguyen, Gerhard Weikum

Sunday August 11, 2013 11:15am - 11:30am
Superior B

11:20am

MDMKDD: Invited Talk - Qiaozhu Mei
Speakers
Qiaozhu Mei

University of Michigan, Ann Arbor


Sunday August 11, 2013 11:20am - 12:00pm
Michigan B

11:30am

SNAKDD: Invited Talk: Living Analytics: Challenges and Opportunities
BIO: Ee-Peng Lim is a professor at the School of Information Systems of Singapore Management University (SMU). He received his Ph.D. from the University of Minnesota, Minneapolis in 1994 and his B.Sc. in Computer Science from the National University of Singapore. His research interests include social network and web mining, information integration, and digital libraries. He is the principal investigator and co-PI of several research projects funded by A*Star, National Research Foundation (NRF) of Singapore, and DSO National Labs. He is currently an Associate Editor of the ACM Transactions on Information Systems (TOIS), Information Processing and Management (IPM), Social Network Analysis and Mining, Journal of Web Engineering (JWE), IEEE Intelligent Systems, International Journal of Digital Libraries (IJDL) and International Journal of Data Warehousing and Mining (IJDWM). He was a member of the ACM Publications Board until December 2012. He serves on the Steering Committee of the International Conference on Asian Digital Libraries (ICADL), Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD), and International Conference on Social Informatics.

Sunday August 11, 2013 11:30am - 12:00pm
Chicago 8

11:40am

MLG: Spotlights B
Sunday August 11, 2013 11:40am - 12:00pm
Sheraton 1

11:45am

DMH: Electronic Patient Records
by Jing Zhao, Isak Karlsson, Lars Asker, Henrik Bostrom

Sunday August 11, 2013 11:45am - 11:50am
Superior B

11:50am

DMH: Improving Relationship Extraction from Clinical Notes by Sentence Classification
by Ehsan Emadzadeh, Azadeh Nikfarjam, Graciela Gonzalez

Sunday August 11, 2013 11:50am - 11:55am
Superior B

11:55am

DMH: Discussion - Session 1 and 2
Sunday August 11, 2013 11:55am - 12:00pm
Superior B

12:00pm

BigMine: Best Paper Award.
Sunday August 11, 2013 12:00pm - 12:10pm
Chicago 9

1:00pm

Exhibit Set-up
Sunday August 11, 2013 1:00pm - 8:00am
Sheraton 5

1:30pm

WISDOM: Session II
Enhancing Sentiment Extraction from Text by Means of Arguments (Lucas Carstens and Francesca Toni)
Evaluation of an Algorithm for Aspect-Based Opinion Mining Using a Lexicon-Based Approach (Florian Wogenstein, Johannes Drescher, Dirk Reinel, Sven Rill, and Jorg Scheidt)
Commonsense-Based Topic Modeling (Dheeraj Rajagopal, Daniel Olsher, Erik Cambria, and Kenneth Kwok)
Online Debate Summarization using Topic Directed Sentiment Analysis (Sarvesh Ranade, Jayant Gupta, Vasudeva Varma, and Radhika Mamidi)

Sunday August 11, 2013 1:30pm - 3:30pm
Arkansas

1:40pm

UrbComp: Session 3 : Mining Urban Traffic
Fast and Exact Network Trajectory Similarity Computation: A

Sunday August 11, 2013 1:40pm - 3:00pm

1:55pm

BioKDD: A Fast and Scalable Clustering-based Approach for Constructing Reliable Radiation Hybrid Maps
by Raed I. Seetan, Anne M. Denton, Omar Al-Azzam, Ajay Kumar, M. Javed Iqbal and Shahryar F. Kianian

Sunday August 11, 2013 1:55pm - 2:20pm
Mississippi

2:00pm

MLG: Tina Eliassi-Rad : Measuring Tie Strength in Implicit Social Networks
Given a set of people and a set of events attended by them, we address the problem of measuring connectedness or tie strength between each pair of persons. The underlying assumption is that attendance at mutual events gives an implicit social network between people. We take an axiomatic approach to this problem. Starting from a list of axioms, which a measure of tie strength must satisfy, we characterize functions that satisfy all the axioms. We then show that there is a range of tie-strength measures that satisfy this characterization. A measure of tie strength induces a ranking on the edges of the social network (and on the set of neighbors for every person). We show that for applications where the ranking, and not the absolute value of the tie strength, is the important aspect about the measure, the axioms are equivalent to a natural partial order. To settle on a particular measure, we must make a non-obvious decision about extending this partial order to a total order. This decision is best left to particular applications. We also classify existing tie-strength measures according to the axioms that they satisfy; and observe that none of the "self-referential" tie-strength measures satisfy the axioms. In our experiments, we demonstrate the efficacy of our approach; show the completeness and soundness of our axioms, and present Kendall Tau Rank Correlation between various tie-strength measures.

Sunday August 11, 2013 2:00pm - 2:35pm
Sheraton 1

2:00pm

SNAKDD: Session 1
Full Paper:
Finding Contexts of Social Influence in Online Social Networks : Jennifer H. Nguyen, Bo Hu, Stephan Günnemann and Martin Ester
ProfileRank: Finding Relevant Content and Influential Users based on Information Diffusion : Arlei Silva, Sara Guimarães, Wagner Meira Jr. and Mohammed Zaki
Network Flows and the Link Prediction Problem : Kanika Narang, Kristina Lerman and Ponnurangam Kumaraguru
Short Paper:
Twitter Volume Spikes: Analysis and Application in Stock Trading : Yuexin Mao, Wei Wei and Bing Wang
Analysis and Identification of Spamming Behaviors in Sina Weibo Microblog : Chengfeng Lin, Yi Zhou, Kai Chen, Jianhua He, Li Song and Xiaokang Yang
CUT: Community Update and Tracking in Dynamic Social Networks : Hao-Shang Ma and Jen-Wei Huang
Leveraging Candidate Popularity On Twitter To Predict Election Outcome : Manish Gaurav, Anoop Kumar, Amit Srivastava and Scott Miller

Sunday August 11, 2013 2:00pm - 3:30pm
Chicago 8

2:00pm

KDD Cup Workshop
Sunday August 11, 2013 2:00pm - 5:00pm
Missouri

2:00pm

Tutorial: Entity Resolution for Big Data
Abstract: Entity resolution (ER), the problem of extracting, matching and resolving entity mentions in structured and unstructured data, is a long-standing challenge in database management, information retrieval, machine learning, natural language processing and statistics. Accurate and fast entity resolution has huge practical implications in a wide variety of commercial, scientific and security domains. Despite the long history of work on entity resolution, there is still a surprising diversity of approaches, and lack of guiding theory. Meanwhile, in the age of big data, the need for high quality entity resolution is growing, as we are inundated with more and more data, all of which needs to be integrated, aligned and matched, before further utility can be extracted. In this tutorial, we bring together perspectives on entity resolution from a variety of fields, including databases, information retrieval, natural language processing and machine learning, to provide, in one setting, a survey of a large body of work. We discuss both the practical aspects and theoretical underpinnings of ER. We describe existing solutions, current challenges and open research problems. In addition to giving attendees a thorough understanding of existing ER models, algorithms and evaluation methods, the tutorial will cover important research topics such as scalable ER, active and lightly supervised ER, and query-driven ER. Lise Getoor is a professor in the Computer Science Department at the University of Maryland, College Park. Her primary research interests are in machine learning and reasoning with uncertainty, applied to structured and semi-structured data. She also works on data integration, social network analysis and visual analytics. 
She has six best paper awards, an NSF Career Award, has served as associate editor for the Machine Learning Journal, JAIR, and TKDD, is an elected member of the International Machine Learning Society board and AAAI Executive Council, was PC co-chair of ICML 2011, and has served on a variety of program committees including AAAI, ICML, IJCAI, ISWC, KDD, SIGMOD, UAI, VLDB, WSDM and WWW. She received her Ph.D. from Stanford University, her M.S. from UC Berkeley, and her B.S. from UC Santa Barbara. Ashwin Machanavajjhala is an Assistant Professor in the Department of Computer Science, Duke University. Previously, he was a Senior Research Scientist in the Knowledge Management group at Yahoo! Research. His primary research interests lie in data privacy, systems for massive data analytics, and statistical methods for information extraction and entity resolution. He is a recipient of the NSF CAREER award in 2013 and the ACM SIGMOD Jim Gray Dissertation Award Honorable Mention in 2008. He received his Ph.D. from Cornell University and a B.Tech in Computer Science and Engineering from the Indian Institute of Technology, Madras.


Sunday August 11, 2013 2:00pm - 5:00pm
Chicago 10

2:00pm

Tutorial: Network Sampling


Sunday August 11, 2013 2:00pm - 5:00pm
Chicago 6

2:00pm

Tutorial: The Dataminers Guide to Scalable Mixed-Membership and Nonparametric Bayesian Models - Dr Alex Smola, Dr Amr Ahmed
Abstract: Large amounts of data arise in a multitude of situations, ranging from bioinformatics to astronomy, manufacturing, and medical applications. For concreteness our tutorial focuses on data obtained in the context of the internet, such as user generated content (microblogs, e-mails, messages), behavioral data (locations, interactions, clicks, queries), and graphs. Due to the magnitude of this data, much of the challenge is to extract structure and interpretable models without the need for additional labels, i.e. to design effective unsupervised techniques. We present design patterns for hierarchical nonparametric Bayesian models, efficient inference algorithms, and modeling tools to describe salient aspects of the data. Dr. Amr Ahmed is a Research Scientist at Google. He received his PhD from Carnegie Mellon University in 2011. His thesis, "Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms," was awarded the prestigious ACM SIGKDD Doctoral Dissertation award in 2012. He spent a year as a Research Scientist at Yahoo! Research before joining Google. He has authored over 40 papers on topics that are core to this tutorial (including a best-paper runner-up award at WSDM 2012) and co-presented 3 tutorials at web and machine learning conferences. Dr. Alex Smola received his PhD from the University of Technology in Berlin in 1998. Subsequently he was research group leader and professor at the Australian National University and Senior Principal Researcher at National ICT Australia. From 2008 until 2012 he was Principal Research Scientist at Yahoo. He has been a visiting researcher at Google since 2012 and a full professor at the Machine Learning Department of Carnegie Mellon University since 2013. He has written over 180 papers (that won several best paper awards at ICML, WSDM and SIGIR) and authored or edited 5 books.
His work covers a broad range of subjects from statistical learning theory, convex optimization, and functional analysis to practical algorithms for scalable data classification, regression, clustering, and topic models. His recent work focuses on distributed, very large scale latent variable models for user profiling and content recommendation.

Sunday August 11, 2013 2:00pm - 5:00pm
Chicago 7

2:10pm

IDEA: Keynote 2 - Exploratory Text Analysis and The Middle Distance : Prof. Marti Hearst, UC Berkeley, School of Information
Bio: Dr. Marti Hearst is a professor in the UC Berkeley School of Information. She received BA, MS, and PhD degrees in Computer Science from UC Berkeley and was a Member of the Research Staff at Xerox PARC from 1994 to 1997. A primary focus of Dr. Hearst's research is user interfaces for search, and she is the author of the 2009 book Search User Interfaces. She has invented or participated in several well-known search interface projects including the Flamenco project that investigated and promoted the use of faceted metadata for collection navigation, TileBars query term visualization, BioText search over the bioscience literature, and Scatter/Gather clustering of search results. She has also researched extensively in computational linguistics and text mining with a focus on detecting semantic relations, and text segmentation including discourse boundaries and abbreviation recognition. Her more recent research interests include user interfaces for exploratory text analysis in the digital humanities and peer learning in MOOCs. Abstract: In this talk I will describe a project whose goal is to help scholars and analysts discover patterns and formulate and test hypotheses about the contents of text collections, midway between what humanities scholars call a traditional "close read" and the new "distant read" or "culturomics" approach. To this end, we describe a text analysis and discovery tool called WordSeer that allows for highly flexible "slicing and dicing" (hence "sliding") across a text collection. We illustrate the text sliding capabilities of the tool with two real-world case studies from the humanities and social sciences (the practice of literacy education, and U.S. perceptions of China and Japan over the last 30 years), showing how the tool has enabled scholars with no technical background to make new discoveries in these text collections. (Joint work with Aditi Muralidharan. Sponsored by NEH HK-50011.)

Sunday August 11, 2013 2:10pm - 3:00pm
Michigan A

2:30pm

DMH: Fraud Detection for Healthcare
by Hoda Eldardiry, Juan Liu, Ying Zhang, Markus Fromherz

Sunday August 11, 2013 2:30pm - 2:45pm
Superior B

2:35pm

MLG: Evgeniy Gabrilovich : Understanding the Web using Big Knowledge
Google's Knowledge Graph contains over half a billion entities and over 18 billion facts and connections. The Knowledge Graph can grow via human contributions, linking to existing knowledge repositories, and automatic acquisition of knowledge from the Internet. In this talk, we will discuss the frontiers of research in knowledge discovery on the Web. We will also discuss new functionalities that become possible due to deeper, knowledge-based text understanding, including proactively fetching relevant information and entity-based services.

Sunday August 11, 2013 2:35pm - 3:10pm
Sheraton 1

2:45pm

BioKDD: Invited talk 1: State-of-the-art in protein function prediction : Predrag Radivojac, Indiana University
Prof. Predrag Radivojac, Indiana University, will deliver a talk titled "State-of-the-art in protein function prediction." His summary of the talk: In this talk I will first describe the significance and computational problem formulation of protein function prediction. I will then present details of the first Critical Assessment of Functional Annotation (CAFA) experiment, where we evaluated the state of the art in the field. We provided evidence that modern methods significantly outperform simple BLAST alignments but that there is significant need and room for improvement. I will lay out possible avenues for improvements and accuracy assessment of function prediction proposed by my research group. Finally, I will briefly discuss the CAFA 2013-2014 challenge whose start is anticipated for Summer 2013.

Sunday August 11, 2013 2:45pm - 3:30pm
Mississippi

3:05pm

DMH: Discussion
Sunday August 11, 2013 3:05pm - 3:30pm
Superior B

3:10pm

MLG: Poster Session (and coffee break)
Sunday August 11, 2013 3:10pm - 4:00pm
Sheraton 1

3:15pm

BigMine: Concluding Remarks
Sunday August 11, 2013 3:15pm - 3:30pm
Chicago 9

3:30pm

Coffee Break
Sunday August 11, 2013 3:30pm - 4:00pm
Prom/Sheraton 5

3:30pm

UrbComp: Session 4 : Understanding Cities
Analyzing the Composition of Cities Using Spatial Clustering (Full) : Zechun Cao, Sujing Wang, Germain Forestier, Anne Puissant, Christoph Eick
Real-time Air Quality Monitoring Through Mobile Sensing in Metropolitan Areas : Srinivas Devarakonda, Parveen Sevusu, Hongzhang Liu, Ruilin Liu, Liviu Iftode, Badri Nath
Exploring venue-based city-to-city similarity measures : Daniel Preotiuc-Pietro, Justin Cranshaw, Tae Yano
Whose City of Tomorrow Is It? On Urban Computing, Utopianism, and Ethics : Justin Cranshaw

Sunday August 11, 2013 3:30pm - 4:40pm
Chicago 8

4:00pm

BigMine: Poster Session
Soft-CsGDT: Soft Cost-sensitive Gaussian Decision Tree for Cost-sensitive Classification of Data Streams : Ning Guo, Yanhua Yu, Meina Song, Junde Song and Yu Fu
Searching time series with Hadoop in an electric power company : Alice Bérard and Georges Hébrail
Long-memory time series ensembles for concept shift detection : Marcelo Mendoza, Felipe Bravo-Márquez, Bárbara Poblete and Daniel Gayo-Avello
Estimating Building Simulation Parameters via Bayesian Structure Learning : Richard Edwards, Joshua New and Lynne Parker
Solving Combinatorial Optimization Problems using Relaxed Linear Programming: A High Performance Computing Perspective : Chen Jin, Qiang Fu, Huahua Wang, Ankit Agrawal, William Hendrix, Wei-Keng Liao, Mostofa Patwary, Arindam Banerjee and Alok Choudhary
CAPRI: A Tool for Mining Complex Line Patterns in Large Log Data : Farhana Zulkernine, Patrick Martin, Wendy Powley, Sima Soltani, Serge Mankovksi and Mark Addleman
Direct Out-of-Memory Distributed Parallel Frequent Pattern Mining : Zheyi Rong and Jeroen De Knijf
TV Predictor: Personalized Program Recommendations to be displayed on SmartTVs : Christopher Krauss, Lars George and Stefan Arbanowski
Data-driven Study of Urban Infrastructure to Enable City-wide Ubiquitous Computing : Gautam S. Thakur, Pan Hui and Ahmed Helmy
Pushing Constraints into Data Streams : Andreia Silva and Claudia Antunes
Forecasting Building Occupancy Using Sensor Network Data : James Howard and William Hoff
Maintaining connected components for infinite graph streams : Jonathan Berry, Matthew Oster, Cynthia Phillips, Steven Plimpton and Timothy Shead
An Architecture for Detecting Events in Real-Time using Massive Heterogeneous Data Sources : George Valkanas, Dimitrios Gunopulos, Ioannis Boutsis and Vana Kalogeraki

Sunday August 11, 2013 4:00pm - 5:00pm
Chicago 9

4:00pm

SNAKDD: Session 2
Full Paper:
Epidemiological Modeling of News and Rumors on Twitter : Fang Jin, Edward Dougherty, Parang Saraf, Yang Cao and Naren Ramakrishnan
Modeling Direct and Indirect Influence across Heterogeneous Social Networks : Minkyoung Kim, David Newth and Peter Christen
Structure and Attributes Community Detection: Comparative Analysis of Composite, Ensemble and Selection Methods : Haithum Elhadi and Gady Agam
Short Paper:
Mixing Bandits: A Recipe for Improved Cold-Start Recommendations in a Social Network : Stéphane Caron and Smriti Bhagat
The User's Communication Patterns on A Mobile Social Network Site : Youngsoo Kim
Community Finding within the Community Set Space : Jerry Scripps and Christian Trefftz
Customized Reviews for Small User-Databases using Iterative SVD and Content Based Filtering : Jon Gregg and Nitin Jain

Sunday August 11, 2013 4:00pm - 5:00pm
Chicago 8

4:00pm

WISDOM: Session III
RBEM: A Rule Based Approach to Polarity Detection (Erik Tromp and Mykola Pechenizkiy)
Cross-lingual Polarity Detection with Machine Translation (Erkin Demirtas and Mykola Pechenizkiy)
Sentribute: Image Sentiment Analysis from a Mid-level Perspective (Jianbo Yuan, Quanzeng You, Sean Mcdonough, and Jiebo Luo)

Sunday August 11, 2013 4:00pm - 5:00pm
Arkansas

4:25pm

BioKDD: Invited talk 2: Systems Biology of Cellular Aging and Age-Related Degeneracies : Ananth Grama, Purdue University
Ananth Grama, Purdue University, will deliver a talk titled "Systems Biology of Cellular Aging and Age-Related Degeneracies." His summary of the talk: Cellular aging is a multi-factorial complex phenotype, characterized by the accumulation of damaged cellular components over the organism's life-span. The progression of aging depends on both the increasing rate of damage to DNA, RNA, proteins, and cellular organelles, and the gradual decline of the cellular defense mechanisms against stress. This can ultimately lead to a dysfunctional cell, with a higher risk factor for a number of diseases, including cancers, cardiovascular disease, and multiple neurodegenerative disorders. With a view to uncovering the pathways associated with aging, and their role in age-related degeneracies, we have developed a number of algorithms and statistical models that integrate and analyze disparate data over human and yeast interactomes. In this talk, we present two recent results: (i) we demonstrate the use of directed random walks in uncovering the downstream effectors of Target of Rapamycin (TOR), a highly conserved protein kinase that plays a key role in the aging process of various organisms; and (ii) we build tissue-specific networks for human cells and develop a complete framework for projecting these tissue-specific networks on to the yeast interactome. The goals of this effort are many-fold: strong alignments indicate tissues for which yeast is a good model organism (in terms of underlying biochemistry), alignments reveal specific pathways that are well conserved, and they serve as a first step in understanding the etiology of age-related degeneracies.

Sunday August 11, 2013 4:25pm - 5:00pm
Mississippi

4:25pm

MLG: David Gleich : Personalized PageRank based Community Detection
Personalized PageRank is a reasonably well-known technique for finding a community in a network starting from a single node. It works by approximating the stationary distribution of a resetting random walk and using that stationary distribution to estimate the presence of nearby cuts in the graph. I'll discuss recent work on how to use a personalized PageRank community to quickly estimate the sets of best conductance anywhere in the graph, as well as how to find a good set of seeds to cover the entire graph with personalized PageRank communities.
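
For readers unfamiliar with the technique named in this abstract, the following is a minimal textbook-style sketch, not the speaker's method. It approximates the stationary distribution of a resetting random walk by power iteration, then scans prefixes of the nodes ordered by degree-normalized score (a "sweep cut") and keeps the prefix with the lowest conductance. The function names, the reset probability `alpha=0.15`, the iteration count, and the toy graph are all assumptions chosen for illustration.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Power iteration for a random walk that resets to `seed` with probability alpha.

    `adj` maps each node to a list of neighbours (undirected, no isolated nodes).
    """
    p = {v: 0.0 for v in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[seed] += alpha  # p sums to 1, so the reset mass is exactly alpha
        for u in adj:
            share = (1.0 - alpha) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p


def conductance(adj, nodes):
    """Edges leaving `nodes`, divided by the smaller of the two side volumes."""
    inside = set(nodes)
    cut = sum(1 for u in inside for v in adj[u] if v not in inside)
    vol_in = sum(len(adj[u]) for u in inside)
    vol_out = sum(len(adj[u]) for u in adj) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 1.0


def sweep_cut(adj, p):
    """Order nodes by p[v] / degree(v) and return the lowest-conductance prefix."""
    order = sorted(adj, key=lambda v: p[v] / len(adj[v]), reverse=True)
    best, best_phi = None, float("inf")
    for k in range(1, len(order)):
        phi = conductance(adj, order[:k])
        if phi < best_phi:
            best, best_phi = set(order[:k]), phi
    return best, best_phi


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
scores = personalized_pagerank(adj, seed=0)
community, phi = sweep_cut(adj, scores)
print(sorted(community), phi)
```

On this toy graph the sweep recovers the triangle containing the seed, since cutting the single bridging edge gives the lowest conductance. Practical implementations typically use a local "push" approximation rather than full power iteration so that the cost depends on the community size and not on the whole graph.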

Sunday August 11, 2013 4:25pm - 5:00pm
Sheraton 1

4:40pm

UrbComp: Business meeting, Panel Discussion, and closing
Chair: Yu Zheng, Microsoft Research Asia
Panelists: Charlie Catlett, Steven E. Koonin, Ouri Wolfson

Sunday August 11, 2013 4:40pm - 5:00pm
Chicago 8

5:00pm

MLG: Wrapup
Sunday August 11, 2013 5:00pm - 5:05pm
Sheraton 1

6:00pm

Opening Ceremony
Sunday August 11, 2013 6:00pm - 8:00pm
Chicago 6/7
 
Monday, August 12
 

7:30am

Continental Breakfast
Monday August 12, 2013 7:30am - 9:00am
Prom/Sheraton 5

7:30am

Registration
Monday August 12, 2013 7:30am - 9:00am
Chicago Prom

8:00am

Women's Breakfast, sponsored by Microsoft
Speakers: Esin Saka, Applied Researcher, Bing Advertising, and Jan Pedersen, Partner Architect, Bing

Monday August 12, 2013 8:00am - 9:00am
Columbus A/B

8:30am

Opening Announcements
Monday August 12, 2013 8:30am - 8:45am
Chicago 6/7

8:45am

Keynote: Scale-out Beyond Map-Reduce

The amount of data being collected is growing at a staggering pace. The default is to capture and store any and all data, in anticipation of potential future strategic value, and vast amounts of data are being generated by instrumenting key customer and systems touchpoints. Until recently, data was gathered for well-defined objectives such as auditing, forensics, reporting and line-of-business operations; now, exploratory and predictive analysis is becoming ubiquitous. These differences in data scale and usage are leading to a new generation of data management and analytic systems, where the emphasis is on supporting a wide range of data to be stored uniformly and analyzed seamlessly using whatever techniques are most appropriate, including traditional tools like SQL and BI and newer tools for graph analytics and machine learning. These new systems use scale-out architectures for both data storage and computation.

Hadoop has become a key building block in the new generation of scale-out systems. Early versions of analytic tools over Hadoop, such as Hive and Pig for SQL-like queries, were implemented by translation into Map-Reduce computations. This approach has inherent limitations, and the emergence of resource managers such as YARN and Mesos has opened the door for newer analytic tools to bypass the Map-Reduce layer. This trend is especially significant for iterative computations such as graph analytics and machine learning, for which Map-Reduce is widely recognized to be a poor fit. In this talk, I will examine this architectural trend, and argue that resource managers are a first step in re-factoring the early implementations of Map-Reduce, and that more work is needed if we wish to support a variety of analytic tools on a common scale-out computational fabric. I will then present REEF, which runs on top of resource managers like YARN and provides support for task monitoring and restart, data movement and communications, and distributed state management. Finally, I will illustrate the value of using REEF to implement iterative algorithms for graph analytics and machine learning.

This is joint work with the CISL team at Microsoft.



Monday August 12, 2013 8:45am - 10:00am
Chicago 6/7

10:00am

Coffee Break
Monday August 12, 2013 10:00am - 10:30am
Prom/Sheraton 5

10:30am

Industry Government Session 1: Business
177 Assessing Team Strategy using Spatiotemporal Data
Authors: Patrick Lucey, Disney Research Pittsburgh; Dean Oliver, ESPN; Iain Matthews, Disney Research; Peter Carr, Disney Research; Joe Roth, Disney Research
141 Financing Lead Triggers: Empowering Sales Reps Through Knowledge Discovery and Fusion
Authors: Kareem Aggour, GE Global Research; Bethany Hoogs, GE Global Research
470 iHR: An Online Recruiting System for Xiamen Talent Service Center
Authors: Lei Li, Florida International University; Wenxing Hong, Xiamen University; Wenfu Pan, Xiamen Talent Service Center; Tao Li, Florida International University
271 Exploratory Analysis of Highly Heterogeneous Document Collections
Authors: Arun Maiya, Institute for Defense Analyses

Monday August 12, 2013 10:30am - 12:00pm
Michigan

10:30am

IPE 1 : Mining the digital universe of data to develop personalized cancer therapies - Eric Schadt
Abstract: The development of a personalized approach to medical care is now well recognized as an urgent priority. This approach is particularly important in oncology, where it is well understood that each cancer diagnosis is unique at the molecular level, arising from a particular and specific collection of genetic alterations. Furthermore, taking a personalized approach to oncology may expedite the treatment process, pre-empting therapeutic decisions based on fewer data in favor of treatments targeted to an individual's tumor. This directed course may be key to survival for many patients who are terminal or have failed standard therapies. I will discuss a personalized cancer therapy program we have initiated that involves DNA and RNA sequencing of a patient's tumor and germline DNA and the projection of high-dimensional features extracted from these data onto predictive network models constructed by integrating large-scale, high-dimensional data that exists for the patient's cancer type. From the causal network inference procedures to the ensemble-based classification methods, big data analytics is front and center for interpreting large-scale patient data in the context of the digital universe of information that exists for the patient's condition.

Bio: Dr. Eric Schadt is Chairman and Professor of the Department of Genetics and Genomic Sciences at the Icahn School of Medicine at Mount Sinai and the Director of the Institute for Genomics and Multiscale Biology at Mount Sinai. Previously, Dr. Schadt had been the Chief Scientific Officer at Pacific Biosciences, overseeing the scientific strategy for the company, including creating the vision for next-generation sequencing applications of the company's technology. Dr. Schadt is also a founding member of Sage Bionetworks, an open access genomics initiative designed to build and support databases and an accessible platform for creating innovative, dynamic models of disease. Dr. Schadt's current efforts at Mount Sinai involve the generation and integration of large-scale, high-dimensional molecular, cellular, and clinical data to build more predictive models of disease, a research direction motivated by the genomics and systems biology research he led at Merck to elucidate common human diseases and drug response using novel computational approaches applied to genetic and molecular profiling data. Dr. Schadt received his B.S. in applied mathematics/computer science from California Polytechnic State University, his M.A. in pure mathematics from UCD, and his Ph.D. in bio-mathematics from UCLA (requiring Ph.D. candidacy in molecular biology and mathematics).

Monday August 12, 2013 10:30am - 12:00pm
Superior

10:30am

IPE 1: To Buy or Not to Buy, That is the Question - Oren Etzioni
Abstract: Shopping can be decomposed into three basic questions: what, where, and when to buy? In this talk, I'll describe how we utilize advanced data-mining and text-mining techniques at Decide.com (and earlier at Farecast) to solve these problems for on-line shoppers. Our algorithms have predicted prices utilizing billions of data points, and ranked products based on millions of reviews.

Bio: Oren Etzioni received his PhD from Carnegie Mellon in 1991. He is the WRF Entrepreneurship Professor of Computer Science at the University of Washington. Oren is the author of over 200 technical papers, cited over 18,000 times. He received the NSF Young Investigator Award in 1993, and was selected as a AAAI Fellow a decade later. In 2007, he received the Robert S. Engelmore Memorial Award for long-standing technical and entrepreneurial contributions to Artificial Intelligence. Oren is the founder of three companies focused on increased transparency for shoppers. His first company, Netbot, was the first online comparison shopping company (acquired by Excite in 1997). His second company, Farecast, advised travelers when to buy their air tickets. Farecast was acquired by Microsoft in 2008 and became the foundation for Bing Travel. Decide.com, founded in 2010, utilizes cutting-edge data-mining methods to minimize buyer's remorse. In 2013, Oren was chosen as the Geek of the Year by a vote of the Seattle Tech. Community.

Monday August 12, 2013 10:30am - 12:00pm
Superior

10:30am

Research Session 1: Document and topic models
980 One Theme in All Views: Modeling Consensus Topics in Multiple Contexts
Authors: Jian Tang, Peking University; Ming Zhang; Qiaozhu Mei, University of Michigan

198 Representing Documents Through Their Readers
Authors: Khalid El-Arini, Carnegie Mellon University; Min Xu, Carnegie Mellon University; Emily Fox, University of Washington; Carlos Guestrin, University of Washington

811 Text-Based Measures of Document Diversity
Authors: Kevin Bache, University of California, Irvine; Padhraic Smyth, UC Irvine; David Newman, University of California, Irvine

528 Diversity Maximization Under Matroid Constraints
Authors: Zeinab Abbassi, Columbia University; Vahab Mirrokni, Google; Mayur Thakur, Google

Monday August 12, 2013 10:30am - 12:00pm
Chicago 8

10:30am

Research Session 2: Social media
611 Connecting Users across Social Media Sites: A Behavioral-Modeling Approach
Authors: Reza Zafarani, Arizona State University; Huan Liu, Arizona State University

722 Automatic selection of social media responses to news
Authors: Tadej Štajner, Jožef Stefan Institute; Bart Thomee, Yahoo! Research; Ana Maria Popescu, Yahoo! Labs; Marco Pennacchiotti, eBay Inc.; Alejandro Jaimes, Yahoo!

1041 Estimating Unbiased Sharer Reputation via Social Data Calibration
Authors: Jaewon Yang, Stanford University; Bee-Chung Chen, LinkedIn; Deepak Agarwal, LinkedIn

1045 Linking Named Entities in Tweets with Knowledge Base via User Interest Modeling
Authors: Wei Shen, Tsinghua University; Jianyong Wang, Tsinghua University; Ping Luo, HP Lab; Min Wang, Google Research

Monday August 12, 2013 10:30am - 12:00pm
Chicago 9

10:30am

Research Session 3: Big data frameworks
52 TurboGraph: A Fast Parallel Graph Engine Handling Billion-scale Graphs in a Single PC
Authors: Wook-Shin Han, POSTECH; Sangyeon Lee, POSTECH; Kyungyeol Park, POSTECH; Jeong-Hoon Lee, POSTECH; Min-Soo Kim, DGIST; Jinha Kim, POSTECH; Hwanjo Yu, POSTECH

142 A Probabilistic Framework for Big Data Pipelines
Authors: Karthik Raman, Cornell University; Adith Swaminathan, Cornell University; Thorsten Joachims, Cornell; Johannes Gehrke, Cornell University

914 Big Data Analytics with Small Footprint: Squaring the Cloud
Authors: John Canny, UC Berkeley; Huasha Zhao, UC Berkeley

Monday August 12, 2013 10:30am - 12:00pm
Chicago 10

12:00pm

Lunch
Monday August 12, 2013 12:00pm - 1:30pm
n/a

1:30pm

Keynote: The Online Revolution: Education for Everyone
In 2011, Stanford University offered three online courses, which anyone in the world could enroll in and take for free. Together, these three courses had enrollments of around 350,000 students, making this one of the largest experiments in online education ever performed. Since the beginning of 2012, we have transitioned this effort into a new venture, Coursera, a social entrepreneurship company whose mission is to make high-quality education accessible to everyone by allowing the best universities to offer courses to everyone around the world, for free. Coursera classes provide a real course experience to students, including video content, interactive exercises with meaningful feedback, using both auto-grading and peer-grading, and a rich peer-to-peer interaction around the course materials. Currently, Coursera has 62 university partners, and over 3 million students enrolled in its over 300 courses. These courses span a range of topics including computer science, business, medicine, science, humanities, social sciences, and more. In this talk, I’ll report on this far-reaching experiment in education, and why we believe this model can provide both an improved classroom experience for our on-campus students, via a flipped classroom model, as well as a meaningful learning experience for the millions of students around the world who would otherwise never have access to education of this quality.

Monday August 12, 2013 1:30pm - 2:45pm
Chicago 6/7

2:45pm

Break
Monday August 12, 2013 2:45pm - 3:00pm
n/a

3:00pm

Industry Government Session 2: Health and Safety
53 Modeling and Probabilistic Reasoning of Population Evacuation During Large-scale Disaster
Authors: Xuan Song*, The University of Tokyo; Quanshi Zhang, The University of Tokyo; Yoshihide Sekimoto, Teerayut Horanont, The University of Tokyo; Satoshi Ueyama; Ryosuke Shibasaki, The University of Tokyo

1146 Towards long-lead forecasting of extreme flood events: a data mining framework for precipitation cluster precursors identification
Authors: Dawei Wang, UMass Boston; Wei Ding, University of Massachusetts Boston; Kui Yu; Xindong Wu; Ping Chen, University of Houston; David Small, Tufts University; Shafiqul Islam, Tufts University

700 Knowledge Discovery from Massive Healthcare Claims Data
Authors: Varun Chandola, Oak Ridge National Laboratory

281 An Integrated Framework for Suicide Risk Prediction
Authors: Truyen Tran, Deakin University; Dinh Phung, Deakin University; Svetha Venkatesh, Deakin University; Wei Luo, Deakin University; Richard Harvey, Barwon Health; Michael Berk, Deakin University

973 Empirical Bayes Model to Combine Signals of Adverse Drug Reactions
Authors: Rave Harpaz, Stanford University; William DuMouchel; Paea LePendu; Nigam Shah

780 Gaussian Multiple Instance Learning Approach for Mapping the Slums of the World Using Very High Resolution Imagery
Authors: Ranga Vatsavai, Oak Ridge National Labs

Monday August 12, 2013 3:00pm - 4:30pm
Michigan

3:00pm

IPE 2 : Adaptive Adversaries: Building Systems to Fight Fraud and Cyber Intruders - Ari Gesher, Palantir
Statistical machine learning / knowledge discovery techniques tend to fail when faced with an adaptive adversary attempting to evade detection in the data. Humans do an excellent job of correctly spotting adaptive adversaries given a good way to digest the data. On the other hand, humans are glacially slow and error-prone when it comes to moving through very large volumes of data, a task best left to the machines. Fighting complex fraud and cyber-security threats requires a symbiosis between the computers and teams of human analysts. The computers use algorithmic analysis, heuristics, and/or statistical characterization to find interesting simple patterns in the data. These candidate events are then queued for in-depth human analysis in rich, expressive, interactive analysis environments. In this talk, we'll take a look at case studies of three different systems, using a partnership of automation and human analysis on large scale data to find the clandestine human behavior that these datasets hold, including a discussion of the backend systems architecture and a demo of the interactive analysis environment. The backend systems architecture is a mix of open source technologies, like Cassandra, Lucene, and Hadoop, and some new components that bind them all together. The interactive analysis environment allows seamless pivoting between semantic, geospatial, and temporal analysis with a powerful GUI interface that's usable by non-data scientists. The systems are real systems currently in use by commercial banks, pharmaceutical companies, and governments.

Monday August 12, 2013 3:00pm - 4:30pm
Superior

3:00pm

IPE 2: The business impact of deep learning
In the last year deep learning has gone from being a special purpose machine learning technique used mainly for image and speech recognition, to becoming a general purpose machine learning tool. This has broad implications for all organizations that rely on data analysis. It represents the latest development in a general trend towards more automated algorithms, and away from domain specific knowledge. For organizations that rely on domain expertise for their competitive advantage, this trend could be extremely disruptive. For start-ups interested in entering established markets, this trend could be a major opportunity. This talk will be a non-technical introduction to general-purpose deep learning, and its potential business impact.

Monday August 12, 2013 3:00pm - 4:30pm
Superior

3:00pm

Research Session 4: Graph mining
583 Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees
Authors: Charalampos Tsourakakis, Carnegie Mellon University; Francesco Bonchi, Yahoo! Research; Aristides Gionis, Aalto University; Francesco Gullo, Yahoo! Research; Maria Tsiarli, University of Pittsburgh

399 Guided Learning for Role Discovery (GLRD): Framework, Algorithms, and Applications
Authors: Sean Gilpin, U.C. Davis; Tina Eliassi-Rad; Ian Davidson, University of California Davis

1083 Redundancy-Aware Maximal Cliques
Authors: Jia Wang, Chinese University of Hong Kong; James Cheng, Chinese University of Hong Kong; Ada Wai-Chee Fu, Chinese University of Hong Kong

569 Selective Sampling on Graphs for Classification
Authors: Quanquan Gu, CS, UIUC; Charu Aggarwal, IBM Research; Jialu Liu, UIUC; Jiawei Han, University of Illinois at Urbana-Champaign

Monday August 12, 2013 3:00pm - 4:30pm
Chicago 8

3:00pm

Research Session 5: Classification
95 Density-Based Logistic Regression
Authors: Wenlin Chen, Yixin Chen, Washington University in St Louis; Yi Mao, Baolong Guo, Xidian University

627 MILS: Multi-Instance Learning from Multiple Information Sources
Authors: Dan Zhang, Purdue University; Jingrui He, Stevens Institute of Technology; Richard Lawrence, IBM Research

572 Querying Discriminative and Representative Samples for Batch Mode Active Learning
Authors: Zheng Wang, Arizona State University; Jieping Ye, Arizona State University

863 SVM_{pAUC}^{tight}: A New Support Vector Method for Optimizing Partial AUC Based on a Tight Convex Upper Bound
Authors: Harikrishna Narasimhan, Indian Institute of Science; Shivani Agarwal, Indian Institute of Science, Bangalore

Monday August 12, 2013 3:00pm - 4:30pm
Chicago 9

4:30pm

Coffee Break
Monday August 12, 2013 4:30pm - 5:00pm
Prom/Sheraton5

5:00pm

Industry Government Session 3: Experimentation
1108 Dependence and Uncertainty in Large Experiments: An Evaluation of Bootstrap Methods
Authors: Eytan Bakshy, Facebook; Dean Eckles, Facebook

998 Predictive Model Performance: Online and Offline Evaluations
Authors: Jeonghee Yi, Microsoft; Ye Chen, Microsoft; Jie Li, Microsoft; Swaraj Sett, Microsoft; Tak Yan, Microsoft

1075 Online Controlled Experiments at Large Scale
Authors: Ron Kohavi, Microsoft; Alex Deng, Microsoft; Brian Frasca, Microsoft; Toby Walker, Microsoft; Ya Xu, Microsoft; Nils Pohlmann, Microsoft

Monday August 12, 2013 5:00pm - 5:30pm
Michigan

5:00pm

Doctoral Dissertation Awards Session
Monday August 12, 2013 5:00pm - 6:10pm
Chicago 10

5:00pm

Industry Government Session 4: Social Good
The Eric and Wendy Schmidt Data Science for Social Good summer fellowship is a University of Chicago program where around 40 fellows from around the world are spending the summer working on machine learning and data science projects with social impact. For three months in Chicago they are working closely with governments and non-profits, applying their computer science and analytics skills, and learning from mentors with experience in industry and academia. The program is organized by Rayid Ghani, funded by the Schmidt Family Foundation, and led by a team from the Computation Institute and the Harris School of Public Policy at the University of Chicago.

This session will consist of short 3-minute presentations from each of the 12 fellowship projects, covering education, healthcare, energy, transportation, and public safety.

Monday August 12, 2013 5:00pm - 6:10pm
Superior

5:00pm

Research Session 6: Healthcare and bioinformatics
543 Succinct Interval-Splitting Tree for Scalable Similarity Search of Compound-Protein Pairs with Property Constraints
Authors: Yasuo Tabei, JST; Akihiro Kishimoto, IBM Research, Dublin; Masaaki Kotera, Kyoto University; Yoshihiro Yamanishi, Kyushu University

191 Multi-Source Learning with Block-wise Missing Data For Alzheimer's Disease Prediction
Authors: Shuo Xiang, Arizona State University; Lei Yuan, Arizona State University; Wei Fan, IBM Research; Yalin Wang; Paul Thompson; Jieping Ye, Arizona State University

395 Network Discovery via Constrained Tensor Analysis of fMRI Data
Authors: Ian Davidson, University of California Davis

Monday August 12, 2013 5:00pm - 6:10pm
Chicago 8

5:00pm

Research Session 7: Recommender systems
631 Learning to question: Leveraging user preferences for shopping advice
Authors: Mahashweta Das, Univ of Texas at Arlington; Gianmarco De Francisci Morales, Yahoo! Research; Aristides Gionis, Aalto University; Ingmar Weber, Qatar Computing Research Institute

444 Active Learning and Search on Low-Rank Matrices
Authors: Dougal Sutherland, Carnegie Mellon University; Barnabás Póczos, Carnegie Mellon University; Jeff Schneider

293 LCARS: A Location-Content-Aware Recommender System
Authors: Hongzhi Yin, Peking University; Yizhou Sun; Bin Cui, Peking University; Zhiting Hu; Ling Chen

Monday August 12, 2013 5:00pm - 6:10pm
Chicago 9

6:30pm

Poster Session
Monday August 12, 2013 6:30pm - 8:30pm
River B
 
Tuesday, August 13
 

7:30am

Continental Breakfast
Tuesday August 13, 2013 7:30am - 9:00am
Prom/Sheraton5

7:30am

Registration
Tuesday August 13, 2013 7:30am - 6:30pm
Chicago Prom

8:00am

Exhibits
Tuesday August 13, 2013 8:00am - 6:00pm
Sheraton 5

8:45am

Keynote: Optimization in Learning and Data Analysis
Optimization tools are vital to data analysis and learning. The optimization perspective has provided valuable insights, and optimization formulations have led to practical algorithms with good theoretical properties. In turn, the rich collection of problems in learning and data analysis is providing fresh perspectives on optimization algorithms and is driving new fundamental research in the area. We discuss research on several areas in this domain, including signal reconstruction, manifold learning, and regression / classification, describing in each case recent research in which optimization algorithms have been developed and applied successfully. A particular focus is asynchronous parallel algorithms for optimization and linear algebra, and their applications in data analysis and learning.


Tuesday August 13, 2013 8:45am - 10:00am
Chicago 6/7

10:00am

Coffee Break
Tuesday August 13, 2013 10:00am - 10:30am
Prom/Sheraton5

10:30am

Industry Government Session 5: Social
756 Using Co-visitation Networks For Classifying Non-Intentional Traffic
Authors: Ori Stitelman, M6d; Claudia Perlich, M6D; Brian Dalessandro, m6d; Rod Hook, m6d; Troy Raeder, Media6Degrees; Foster Provost, NYU

1279 Dynamic Memory Allocation Policies for Postings in Real-Time Twitter Search
Authors: Nima Asadi, Jimmy Lin*, University of Maryland; Michael Busch

247 Mining for Geographically Disperse Communities in Social Networks by Leveraging Distance Modularity
Authors: Paulo Shakarian, US Military Academy; Patrick Roos; Devon Callahan; Cory Kirk

930 Experience from Hosting a Corporate Prediction Market: Benefits beyond the Forecasts
Authors: Thomas Montgomery, Ford Motor Company; Paul Stieg, Ford Motor Company; Michael Cavaretta, Ford Motor Company; Paul Moraal, Ford Motor Company

Tuesday August 13, 2013 10:30am - 12:00pm
Michigan

10:30am

IPE 3 : Hadoop: A View from the Trenches
From its beginnings as a framework for building web crawlers for small-scale search engines to being one of the most promising technologies for building datacenter-scale distributed computing and storage platforms, Apache Hadoop has come far in the last seven years. In this talk I will reminisce about the early days of Hadoop, give an overview of the current state of the Hadoop ecosystem, and present some real-world use cases of this open source platform. I will conclude with some crystal gazing into the future of Hadoop and associated technologies.


Tuesday August 13, 2013 10:30am - 12:00pm
Superior

10:30am

IPE 3: Targeting and Influencing at Scale: From Presidential Elections to Social Good
If you're still recovering from the barrage of ads, news, emails, Facebook posts, and newspaper articles that were giving you the latest poll numbers, asking you to volunteer, donate money, and vote, this talk will give you a look behind the scenes on why you were seeing what you were seeing. I will talk about how machine learning and data mining along with randomized experiments were used to target and influence tens of millions of people. Beyond the presidential elections, these methodologies for targeting and influence have the power to solve big problems in education, healthcare, energy, transportation, and related areas. I will talk about some recent work we're doing at the University of Chicago Data Science for Social Good summer fellowship program working with non-profits and government organizations to tackle some of these challenges.

Tuesday August 13, 2013 10:30am - 12:00pm
Superior

10:30am

Research Session 10: Graph clustering
93 Flexible and Robust Co-regularized Multi-Domain Graph Clustering
Authors: Wei Cheng, UNC at Chapel Hill; Xiang Zhang, Case Western Reserve University; Patrick Sullivan, UNC at Chapel Hill; Wei Wang, University of California, Los Angeles

1186 Clustered Graph Randomization: Network Exposure to Multiple Universes
Authors: Johan Ugander, Cornell University; Brian Karrer, Facebook; Lars Backstrom, Facebook; Jon Kleinberg, Cornell

566 Social Influence Based Clustering of Heterogeneous Information Networks
Authors: Yang Zhou, Georgia Institute of Technology; Ling Liu, Georgia Institute of Technology

Tuesday August 13, 2013 10:30am - 12:00pm
Chicago 10

10:30am

Research Session 8: Temporal/social influence
425 Discovering Latent Influence in Online Social Activities via Shared Cascade Poisson Processes
Authors: Tomoharu Iwata, NTT Communication Science Laboratories; Amar Shah, University of Cambridge; Zoubin Ghahramani, Cambridge University

707 STRIP: Stream Learning of Influence Probabilities
Authors: Konstantin Kutzkov, University of Copenhagen; Albert Bifet, Yahoo! Research; Francesco Bonchi, Yahoo! Research; Aristides Gionis, Aalto University

18 Fast Structure Learning in Generalized Stochastic Processes with Latent Factors
Authors: Mohammad Taha Bahadori, University of Southern California; Yan Liu, University of Southern California; Eric Xing, CMU

Tuesday August 13, 2013 10:30am - 12:00pm
Chicago 8

10:30am

Research Session 9: Sparse learning
770 Robust Sparse Estimation of Multiresponse Regression and Inverse Covariance Matrix via the L2 distance
Authors: Aurelie Lozano, IBM Research; Huijing Jiang, IBM Research; Xinwei Deng, Virginia Tech

1167 Exact Sparse Recovery with L0 Projections
Authors: Ping Li, Cornell University; Cun-Hui Zhang, Rutgers University

251 Robust Principal Component Analysis via Capped Norms
Authors: Qian Sun, Arizona State University; Shuo Xiang, Arizona State University; Jieping Ye, Arizona State University

Tuesday August 13, 2013 10:30am - 12:00pm
Chicago 9

12:00pm

Business Lunch
Tuesday August 13, 2013 12:00pm - 1:30pm
River B

1:30pm

Panel
Tuesday August 13, 2013 1:30pm - 2:45pm
Chicago 6/7

3:00pm

Industry Government Session 6: Monitoring
547 Heat Pump Detection from Coarse Grained Smart Meter Data with Positive and Unlabeled Learning
Authors: Hongliang Fei*, IBM T.J. Watson Research; Younghun Kim, IBM Research; Sambit Sahu, IBM Research; Millind Naphade, IBM Research

242 Analysis of Advanced Meter Infrastructure Data of Water Consumption in Apartment Buildings
Authors: Einat Kermany, IBM; Hagai Michaelis, Arad Technologies; Dorit Baras, IBM; Hanna Mazzawi, IBM; Yehuda Naveh, IBM

590 A data mining driven risk profiling method for road asset management
Authors: Daniel Emerson, Queensland University of Technology; Richi Nayak, Queensland University of Technology; Justin Weligamage, Road asset management consultant

190 Improving Quality Control by Early Prediction of Manufacturing Outcomes
Authors: Sholom Weiss, IBM Research; Amit Dhurandhar, IBM TJ Watson; Robert Baseman, IBM Research

51 U-Air: When Urban Air Quality Inference Meets Big Data
Authors: Yu Zheng, Microsoft; Furui Liu, Microsoft Research Asia; Hsun-Ping Hsieh, Microsoft Research Asia

Tuesday August 13, 2013 3:00pm - 4:30pm
Michigan

3:00pm

Research Session 11: Scalable methods for big data
773 Comparing Apples to Oranges: A Scalable Solution with Heterogeneous Hashing
Authors: Mingdong Ou, Tsinghua University; Peng Cui, Tsinghua University; Fei Wang, IBM T. J. Watson Research Lab; Jun Wang, IBM Research

166 Fast and Scalable Polynomial Kernels via Explicit Feature Maps
Authors: Ninh Pham, IT University of Copenhagen; Rasmus Pagh, IT University of Copenhagen

434 Indexed Block Coordinate Descent for Large-Scale Linear Classification with Limited Memory
Authors: En-Hsu Yen, National Taiwan University; Chun-Fu Chang, National Taiwan University; Ting-Wei Lin, National Taiwan University; Shan-Wei Lin, National Taiwan University; Shou-De Lin, National Taiwan University

577 Recursive Regularization for Large-scale Classification with Hierarchical and Graphical Dependencies
Authors: Siddharth Gopal, CMU; Yiming Yang, CMU

Tuesday August 13, 2013 3:00pm - 4:30pm
Chicago 8

3:00pm

Research Session 12: Diffusion in social networks
1163 Confluence: Conformity Influence in Large Social Networks
Authors: Jie Tang, Tsinghua University; Sen Wu, Tsinghua University; Jimeng Sun, IBM Research

291 The Role of Information Diffusion in the Evolution of Social Networks
Authors: Lilian Weng, Indiana University; Jacob Ratkiewicz, Google Inc.; Nicola Perra, Northeastern University; Bruno Goncalves, Aix-Marseille Universite; Carlos Castillo, Qatar Computing Research Institute; Francesco Bonchi, Yahoo! Research; Rossano Schifanella, Universita degli Studi di Torino, Italy; Filippo Menczer, Indiana University; Alessandro Flammini, Indiana University

1006 Information Cascade at Group Scale
Authors: Milad Eftekhar, University of Toronto; Yashar Ganjali, University of Toronto; Nick Koudas, University of Toronto

120 Extracting Social Events for Learning Better Information Diffusion Models
Authors: Shuyang Lin, UIC; Fengjiao Wang, University of Illinois at Chicago; Qingbo Hu, University of Illinois at Chicago; Philip Yu, University of Illinois at Chicago

Tuesday August 13, 2013 3:00pm - 4:30pm
Chicago 9

3:00pm

Research Session 13: Time series and spatial data
341 Model Selection in Markovian Processes
Authors: Assaf Hallak, The Technion; Dotan Di-Castro, Technion; Shie Mannor, Technion

511 DTW-D: Time Series Semi-Supervised Learning from a Single Example
Authors: Yanping Chen, UCR; Bing Hu; Eamonn Keogh, University of California Riverside; Gustavo Batista

1287 Model-based Kernel for Efficient Time Series Analysis
Authors: Huanhuan Chen, University of Birmingham; Fengzhen Tang, University of Birmingham; Peter Tino, University of Birmingham; Xin Yao, University of Birmingham

131 Mining Lines in the Sand: On Trajectory Discovery From Untrustworthy Data in Cyber-Physical System
Authors: Lu-An Tang, UIUC; Xiao Yu, University of Illinois at Urbana-Champaign; Quanquan Gu, CS, UIUC; Jiawei Han, UIUC; Alice Leung, BBN; Thomas La Porta, PSU

Tuesday August 13, 2013 3:00pm - 4:30pm
Chicago 10

3:00pm

IPE 4 : Using Big Data to Solve Small Data Problems
The brief history of knowledge discovery is filled with products that promised to bring BI to the masses. But how do you build a product that truly bridges the gap between the conceptual simplicity of questions and answers and the structure needed to query traditional data stores?

In this talk, Chris Neumann will discuss how DataHero applied the principles of user-centric design and development over a year and a half to create a product with which more than 95% of new users can get answers on their first attempt. He'll demonstrate the process DataHero uses to determine the best combination of algorithms and user interface concepts needed to create intuitive solutions to potentially complex interactions, including:
- Determining the structure of files uploaded by users
- Accurately identifying data types within files
- Presenting users with an optimal visualization for any combination of data
- Helping users to ask questions of data when they don't know what to do

Chris will also talk about what it's like to start a Big Data company and how he applied lessons from his time as the first engineer at Aster Data Systems to DataHero.

Tuesday August 13, 2013 3:00pm - 6:00pm
Superior

3:00pm

IPE 4: Cyber Security: How Visual Analytics Unlock Insight
In the Cyber Security domain, we have been collecting big data for almost two decades. The volume and variety of our data is extremely large, but understanding and capturing the semantics of the data is even more of a challenge. Finding the needle in the proverbial haystack has been attempted from many different angles. In this talk we will have a look at what approaches have been explored, what has worked, and what has not. We will see that there is still a large amount of work to be done and data mining is going to play a central role. We'll try to motivate that in order to successfully find bad guys, we will have to embrace a solution that not only leverages clever data mining, but employs the right mix between human computer interfaces, data mining, and scalable data platforms. Traditionally, cyber security has had its challenges with data mining. We are different. We will explore how to adapt data mining algorithms to the security domain. Some approaches like predictive analytics are extremely hard, if not impossible. How would you predict the next cyber attack? Others need to be tailored to the security domain to make them work. Visualization and visual analytics seem to be extremely promising to solve cyber security issues. Situational awareness, large-scale data exploration, knowledge capture, and forensic investigations are four top use-cases we will discuss. Visualization alone, however, does not solve security problems. We need algorithms that support the visualizations, for example to reduce the amount of data, in both volume and semantics, so an analyst can deal with it.

Tuesday August 13, 2013 3:00pm - 6:00pm
Superior

4:30pm

Coffee Break
Tuesday August 13, 2013 4:30pm - 5:00pm
Prom/Sheraton5

5:00pm

Industry Government Session 7: Security
769 An Integrated Framework for Optimizing Automatic Monitoring Systems in Large IT Infrastructures
Authors: Liang Tang, Florida International University; Tao Li, Florida International University; Larisa Shwartz, IBM Research; Florian Pinel, IBM Research; Genady Grabarnik, St. John's University

960 Detecting Insider Threats in a Real Corporate Database of Computer Usage Activities
Authors: Ted Senator, SAIC; Henry Goldberg, SAIC; Alex Memory, William Young, SAIC; Bradley Rees, Robert Pierce, SAIC; Daniel Huang, SAIC; Matthew Reardon, SAIC; David Bader, Ga Tech; Edmond Chow, Ga Tech; Irfan Essa, Georgia Institute of Technology; Joshua Jones, Ga Tech; Vinay Bettadapura, Georgia Institute of Technology; Duen Horng Chau, Georgia Institute of Technology; Oded Green, Ga Tech; Oguz Kaya, Ga Tech; Anita Zakrzewska, Ga Tech; Erica Briscoe, GTRI; Rudolph Mappus IV, GTRI; Robert McColl, GTRI; Lora Weiss, GTRI; Thomas Dietterich, Oregon State University; Alan Fern, Oregon State University; Weng-Keen Wong, Oregon State University; Shubhomoy Das, Oregon State University; Andrew Emmott, Oregon State University; Jed Irvine, Oregon State University; Jay-Yoon Lee, CMU; Danai Koutra, Carnegie Mellon University; Christos Faloutsos, CMU; Daniel Corkill, University of Massachusetts; Lisa Friedland, University of Massachusetts; Amanda Gentzel, University of Massachusetts; David Jensen, Univ of Massachusetts Amherst

825 Efficiently Rewriting Large Multimedia Application Execution Traces with few Event Sequences
Authors: Christiane Kamdem Kengne, UJF/LIG; Leon Constantin Fopa; Alexandre Termier; Noha Ibrahim; Marie-Christine Rousset; Takashi Washio, Osaka University; Miguel Santana

1128 Discriminant Malware Distance Learning on Structural Information for Automated Malware Classification
Authors: Deguang Kong, University of Texas at Arlington; Guanhua Yan, Los Alamos National Laboratory

702 A Privacy Preserving Framework for Managing Vehicle Data in Road Pricing Systems
Authors: Huayu Wu, Institute for Infocomm Research; Wee Siong Ng, Institute for Infocomm Research; Kian-Lee Tan, National University of Singapore; Wei Wu, Institute for Infocomm Research; Shili Xiang, Institute for Infocomm Research; Mingqiang Xue, Institute for Infocomm Research

Tuesday August 13, 2013 5:00pm - 6:30pm
Michigan

5:00pm

IPE Panel Discussion: Death of the Expert? The Rise of Algorithms and Decline of Domain Experts
Machine learning algorithms used to require features to be carefully hand-created and filtered. The algorithms of yesteryear needed us to tell them about interactions, non-linearities, non-normal distributions, and so on, and if we added too many features to the model, we would over-fit and end up with something that was useless in practice. That meant that domain experts were vital in manipulating and filtering the data to create just the right set of inputs. But now that we have deep learning nets, ensembles of decision trees, and so forth, features are created automatically, and over-fitting is avoided even with huge numbers of features. Furthermore, these general-purpose algorithms have proven their worth in everything from video object tracking to speech recognition to automated drug discovery to natural language processing. So where does that leave the role of the domain expert? In this panel, we will discuss and debate where domain experts fit into this new world of general-purpose machine learning algorithms.

Moderator: Jeremy Howard, Kaggle
Panelists: Oren Etzioni, University of Washington; John Akred, Idibon; Robert Munro, Silicon Valley Data Science

Tuesday August 13, 2013 5:00pm - 6:30pm
Superior

5:00pm

Research Session 14: Unsupervised and topic learning
623 A General Bootstrap Performance Diagnostic
Authors: Ariel Kleiner; Ameet Talwalkar, UC Berkeley; Sameer Agarwal; Michael Jordan; Ion Stoica

897 Subsampling for Efficient and Effective Unsupervised Outlier Detection Ensembles
Authors: Arthur Zimek, University of Alberta; Matthew Gaudet, University of Alberta; Ricardo J. G. Campello, University of Alberta; Jörg Sander, University of Alberta

503 A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy
Authors: Chi Wang, University of Illinois; Marina Danilevsky, University of Illinois; Nihit Desai, University of Illinois at Urbana-Champaign; Yinan Zhang, University of Illinois at Urbana-Champaign; Phuong Nguyen, University of Illinois at Urbana-Champaign; Thrivikrama Taula, University of Illinois at Urbana-Champaign; Jiawei Han, University of Illinois at Urbana-Champaign

1199 Stochastic Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation
Authors: James Foulds, UC Irvine; Levi Boyles, UC Irvine; Christopher Dubois, UC Irvine; Padhraic Smyth, UC Irvine; Max Welling, University of Amsterdam

Tuesday August 13, 2013 5:00pm - 6:30pm
Chicago 8

5:00pm

Research Session 15: Social and information networks
571 WiseMarket: A New Paradigm for Managing Wisdom of Online Social Users
Authors: Chen Cao, HKUST; Yongxin Tong, HKUST; Lei Chen, HKUST; H.V. Jagadish, University of Michigan

295 Multi-Label Relational Neighbor Classification using Social Context Features
Authors: Xi Wang, University of Central Florida; Gita Sukthankar, University of Central Florida

1162 Scalable Text and Link Analysis with Mixed-Topic Link Models
Authors: Yaojia Zhu, University of New Mexico; Xiaoran Yan, University of New Mexico; Lise Getoor, The University of Maryland College Park; Cristopher Moore, Santa Fe Institute

730 Collaborative Boosting for Activity Classification in Microblogs
Authors: Yangqiu Song, HKUST; Zhengdong Lu, Huawei; Cane Wing-Ki Leung, Huawei; Qiang Yang, Hong Kong University of Science and Technology

Tuesday August 13, 2013 5:00pm - 6:30pm
Chicago 9

1:30pm

Best Papers [Joint Research and IG]
Wednesday August 14, 2013 1:30pm - 3:00pm
Chicago 6/7
 
Wednesday, August 14
 

7:30am

Registration
Wednesday August 14, 2013 7:30am - 3:00pm
Chicago Prom

8:00am

Exhibits
Wednesday August 14, 2013 8:00am - 12:00pm
Sheraton 5

8:45am

Keynote: Predicting the Present with Search Engine Data
Many businesses now have almost real-time data available about their operations. This data can be helpful in contemporaneous prediction (“nowcasting”) of various economic indicators. We illustrate how one can use Google search data to nowcast economic metrics of interest, and discuss some of the ramifications for research and policy. Our approach combines three Bayesian techniques: Kalman filtering, spike-and-slab regression, and model averaging. We use Kalman filtering to whiten the time series in question by removing the trend and seasonal behavior. Spike-and-slab regression is a Bayesian method for variable selection that works even in cases where the number of predictors is far larger than the number of observations. Finally, we use Markov Chain Monte Carlo methods to sample from the posterior distribution for our model; the final forecast is an average over thousands of draws from the posterior. An advantage of the Bayesian approach is that it allows us to specify informative priors that affect the number and type of predictors in a flexible way.

Wednesday August 14, 2013 8:45am - 10:00am
Chicago 6/7

10:00am

Coffee Break
Wednesday August 14, 2013 10:00am - 10:30am
Prom/Sheraton5

10:30am

Industry Government Session 8: Advertising and Search
483 Palette Power: Enabling Visual Search through Colors
Authors: Anurag Bhardwaj, eBay Research Labs; Atish Das Sarma, eBay Research Lab; Wei Di, eBay Research Labs; Raffay Hamid, eBay Research Labs; Robinson Piramuthu, eBay Research Labs; Neel Sundaresan, eBay Research

416 A Unified Search Federation System Based on Online User Feedback
Authors: Jie Luo, Yahoo! Labs; Sudarshan Lamkhede, Yahoo! Labs; Rochit Sapra; Evans Hsu, Yahoo!; Helen Song, Yahoo!; Yi Chang, Yahoo Labs

760 Scalable Supervised Dimensionality Reduction Using Clustering
Authors: Troy Raeder, Media6Degrees; Claudia Perlich, M6D; Brian Dalessandro, M6D; Ori Stitelman, M6D; Foster Provost, NYU Stern School of Business

473 Ad Click Prediction: a View from the Trenches
Authors: H. Brendan McMahan; Gary Holt, Google, Inc.; D. Sculley, Google

546 Why People Hate Your App: Making Sense of User Feedback in a Mobile App Store
Authors: Bin Fu, Carnegie Mellon University; Jialiu Lin, Carnegie Mellon University; Lei Li, UC Berkeley; Christos Faloutsos, CMU; Jason Hong, Carnegie Mellon University; Norman Sadeh-Koniecpol, Carnegie Mellon University

Wednesday August 14, 2013 10:30am - 12:00pm
Michigan

10:30am

Research Session 16: Graph mining and sampling
752 Optimal Algorithms for Network Inference
Authors: Bruno Abrahao, Cornell; Flavio Chierichetti, Sapienza University; Robert Kleinberg, Cornell; Alessandro Panconesi, Sapienza University of Rome

1024 Debiasing Social Wisdom
Authors: Abhimanyu Das, Microsoft; Sreenivas Gollapudi, Microsoft Research; Rina Panigrahy, Microsoft Research; Mahyar Salek, Microsoft

1148 Mining Discriminative Subgraphs from Global-state Networks
Authors: Sayan Ranu, IBM; Minh Hoang, UC Santa Barbara; Ambuj Singh, UC Santa Barbara

249 Approximate Graph Mining with Label Costs
Authors: Pranay Anchuri, RPI; Mohammed Zaki, Rensselaer Polytechnic Institute; Omer Barkol, HP Labs; Shahar Golan, HP Labs; Moshe Shamy, HP Labs

Wednesday August 14, 2013 10:30am - 12:00pm
Chicago 8

10:30am

Research Session 17: Rule and pattern mining
392 Summarizing Probabilistic Frequent Patterns: A Fast Approach
Authors: Chunyang Liu, UTS; Ling Chen; Chengqi Zhang, QCIS, University of Technology, Sydney

668 Mining High Utility Episodes in Complex Event Sequences
Authors: Cheng-Wei Wu, National Cheng Kung University; Yu Feng Lin, National Cheng Kung University, Taiwan, ROC; Philip Yu, University of Illinois; Vincent Tseng, National Cheng Kung University

245 Mining Frequent Graph Patterns with Differential Privacy
Authors: Entong Shen, North Carolina State Univ; Ting Yu, North Carolina State University

Wednesday August 14, 2013 10:30am - 12:00pm
Chicago 9

10:30am

Research Session 18: Web mining
240 Statistical Quality Estimation for General Crowdsourcing Tasks
Authors: Yukino Baba, The University of Tokyo; Hisashi Kashima, The University of Tokyo

1205 Exploring Consumer Psychology for Click Prediction in Sponsored Search
Authors: Taifeng Wang, Microsoft; Jiang Bian; Tie-Yan Liu, Microsoft Research

172 SiGMa: Simple Greedy Matching for Aligning Large Knowledge Bases
Authors: Simon Lacoste-Julien, INRIA / ENS; Konstantina Palla, University of Cambridge; Alex Davies, University of Cambridge; Gjergji Kasneci, Microsoft Research; Thore Graepel, Microsoft Research; Zoubin Ghahramani, Cambridge University

Wednesday August 14, 2013 10:30am - 12:00pm
Chicago 10

12:00pm

Lunch (Open Time)
Wednesday August 14, 2013 12:00pm - 1:30pm
Anywhere

3:00pm

Plenary Feedback, Concluding Session and Prizes
Wednesday August 14, 2013 3:00pm - 3:30pm
Chicago 6/7