KDD2013 has ended
Welcome to KDD-2013’s online program


Monday, August 12
 

10:30am

IPE 1: Mining the digital universe of data to develop personalized cancer therapies - Eric Schadt
Abstract: The development of a personalized approach to medical care is now well recognized as an urgent priority. This approach is particularly important in oncology, where it is well understood that each cancer diagnosis is unique at the molecular level, arising from a particular and specific collection of genetic alterations. Furthermore, taking a personalized approach to oncology may expedite the treatment process, pre-empting therapeutic decisions based on fewer data in favor of treatments targeted to an individual's tumor. This directed course may be key to survival for many patients who are terminal or have failed standard therapies. I will discuss a personalized cancer therapy program we have initiated that involves DNA and RNA sequencing of a patient's tumor and germline DNA and the projection of high-dimensional features extracted from these data onto predictive network models constructed by integrating large-scale, high-dimensional data that exist for the patient's cancer type. From the causal network inference procedures to the ensemble-based classification methods, big data analytics is front and center for interpreting large-scale patient data in the context of the digital universe of information that exists for the patient's condition. Bio: Dr. Eric Schadt is Chairman and Professor of the Department of Genetics and Genomic Sciences at the Icahn School of Medicine at Mount Sinai and the Director of the Institute for Genomics and Multiscale Biology at Mount Sinai. Previously, Dr. Schadt had been the Chief Scientific Officer at Pacific Biosciences, overseeing the scientific strategy for the company, including creating the vision for next-generation sequencing applications of the company's technology. Dr. Schadt is also a founding member of Sage Bionetworks, an open access genomics initiative designed to build and support databases and an accessible platform for creating innovative, dynamic models of disease. Dr. Schadt's current efforts at Mount Sinai involve the generation and integration of large-scale, high-dimensional molecular, cellular, and clinical data to build more predictive models of disease, a research direction motivated by the genomics and systems biology research he led at Merck to elucidate common human diseases and drug response using novel computational approaches applied to genetic and molecular profiling data. Dr. Schadt received his B.S. in applied mathematics/computer science from California Polytechnic State University, his M.A. in pure mathematics from UCD, and his Ph.D. in bio-mathematics from UCLA (requiring Ph.D. candidacy in molecular biology and mathematics).

Monday August 12, 2013 10:30am - 12:00pm
Superior

10:30am

IPE 1: To Buy or Not to Buy, That is the Question - Oren Etzioni
Abstract: Shopping can be decomposed into three basic questions: what, where, and when to buy? In this talk, I'll describe how we utilize advanced data-mining and text-mining techniques at Decide.com (and earlier at Farecast) to solve these problems for on-line shoppers. Our algorithms have predicted prices utilizing billions of data points, and ranked products based on millions of reviews. Bio: Oren Etzioni received his PhD from Carnegie Mellon in 1991. He is the WRF Entrepreneurship Professor of Computer Science at the University of Washington. Oren is the author of over 200 technical papers, cited over 18,000 times. He received the NSF Young Investigator Award in 1993, and was selected as a AAAI Fellow a decade later. In 2007, he received the Robert S. Engelmore Memorial Award for long-standing technical and entrepreneurial contributions to Artificial Intelligence. Oren is the founder of three companies focused on increased transparency for shoppers. His first company, Netbot, was the first online comparison shopping company (acquired by Excite in 1997). His second company, Farecast, advised travelers when to buy their air tickets. Farecast was acquired by Microsoft in 2008 and became the foundation for Bing Travel. Decide.com, founded in 2010, utilizes cutting-edge data-mining methods to minimize buyer's remorse. In 2013, Oren was chosen as the Geek of the Year by a vote of the Seattle tech community.

Monday August 12, 2013 10:30am - 12:00pm
Superior

3:00pm

IPE 2: Adaptive Adversaries: Building Systems to Fight Fraud and Cyber Intruders - Ari Gesher, Palantir
Statistical machine learning and knowledge discovery techniques tend to fail when faced with an adaptive adversary attempting to evade detection in the data. Humans do an excellent job of correctly spotting adaptive adversaries given a good way to digest the data. On the other hand, humans are glacially slow and error-prone when it comes to moving through very large volumes of data, a task best left to the machines. Fighting complex fraud and cyber-security threats requires a symbiosis between the computers and teams of human analysts. The computers use algorithmic analysis, heuristics, and/or statistical characterization to find interesting simple patterns in the data. These candidate events are then queued for in-depth human analysis in rich, expressive, interactive analysis environments. In this talk, we'll take a look at case studies of three different systems, using a partnership of automation and human analysis on large-scale data to find the clandestine human behavior that these datasets hold, including a discussion of the backend systems architecture and a demo of the interactive analysis environment. The backend systems architecture is a mix of open source technologies, like Cassandra, Lucene, and Hadoop, and some new components that bind them all together. The interactive analysis environment allows seamless pivoting between semantic, geospatial, and temporal analysis with a powerful GUI that's usable by non-data scientists. The systems are real systems currently in use by commercial banks, pharmaceutical companies, and governments.

Monday August 12, 2013 3:00pm - 4:30pm
Superior

3:00pm

IPE 2: The business impact of deep learning
In the last year, deep learning has gone from being a special-purpose machine learning technique used mainly for image and speech recognition to becoming a general-purpose machine learning tool. This has broad implications for all organizations that rely on data analysis. It represents the latest development in a general trend towards more automated algorithms, and away from domain-specific knowledge. For organizations that rely on domain expertise for their competitive advantage, this trend could be extremely disruptive. For start-ups interested in entering established markets, this trend could be a major opportunity. This talk will be a non-technical introduction to general-purpose deep learning and its potential business impact.

Monday August 12, 2013 3:00pm - 4:30pm
Superior
 
Tuesday, August 13
 

10:30am

IPE 3: Hadoop: A View from the Trenches
From its beginnings as a framework for building web crawlers for small-scale search engines to being one of the most promising technologies for building datacenter-scale distributed computing and storage platforms, Apache Hadoop has come far in the last seven years. In this talk I will reminisce about the early days of Hadoop, give an overview of the current state of the Hadoop ecosystem, and describe some real-world use cases of this open source platform. I will conclude with some crystal gazing into the future of Hadoop and associated technologies.


Tuesday August 13, 2013 10:30am - 12:00pm
Superior

10:30am

IPE 3: Targeting and Influencing at Scale: From Presidential Elections to Social Good
If you're still recovering from the barrage of ads, news, emails, Facebook posts, and newspaper articles that were giving you the latest poll numbers, asking you to volunteer, donate money, and vote, this talk will give you a look behind the scenes on why you were seeing what you were seeing. I will talk about how machine learning and data mining along with randomized experiments were used to target and influence tens of millions of people. Beyond the presidential elections, these methodologies for targeting and influence have the power to solve big problems in education, healthcare, energy, transportation, and related areas. I will talk about some recent work we're doing at the University of Chicago Data Science for Social Good summer fellowship program, working with non-profits and government organizations to tackle some of these challenges.

Tuesday August 13, 2013 10:30am - 12:00pm
Superior

3:00pm

IPE 4: Using Big Data to Solve Small Data Problems
The brief history of knowledge discovery is filled with products that promised to bring BI to the masses. But how do you build a product that truly bridges the gap between the conceptual simplicity of questions and answers and the structure needed to query traditional data stores?

In this talk, Chris Neumann will discuss how DataHero applied the principles of user-centric design and development over a year and a half to create a product with which more than 95% of new users can get answers on their first attempt. He'll demonstrate the process DataHero uses to determine the best combination of algorithms and user interface concepts needed to create intuitive solutions to potentially complex interactions, including:
- Determining the structure of files uploaded by users
- Accurately identifying data types within files
- Presenting users with an optimal visualization for any combination of data
- Helping users to ask questions of data when they don't know what to do

Chris will also talk about what it's like to start a Big Data company and how he applied lessons from his time as the first engineer at Aster Data Systems to DataHero.

Tuesday August 13, 2013 3:00pm - 6:00pm
Superior

3:00pm

IPE 4: Cyber Security: How Visual Analytics Unlock Insight
In the Cyber Security domain, we have been collecting big data for almost two decades. The volume and variety of our data are extremely large, but understanding and capturing the semantics of the data is even more of a challenge. Finding the needle in the proverbial haystack has been attempted from many different angles. In this talk we will have a look at what approaches have been explored, what has worked, and what has not. We will see that there is still a large amount of work to be done and that data mining is going to play a central role. We'll try to show that in order to successfully find bad guys, we will have to embrace a solution that not only leverages clever data mining, but employs the right mix of human-computer interfaces, data mining, and scalable data platforms. Traditionally, cyber security has had its challenges with data mining. We are different. We will explore how to adapt data mining algorithms to the security domain. Some approaches, like predictive analytics, are extremely hard, if not impossible. How would you predict the next cyber attack? Others need to be tailored to the security domain to make them work. Visualization and visual analytics seem to be extremely promising for solving cyber security issues. Situational awareness, large-scale data exploration, knowledge capture, and forensic investigations are four top use cases we will discuss. Visualization alone, however, does not solve security problems. We need algorithms that support the visualizations, for example to reduce the data, in both volume and semantics, to an amount an analyst can deal with.

Tuesday August 13, 2013 3:00pm - 6:00pm
Superior

5:00pm

IPE Panel Discussion: Death of the Expert? The Rise of Algorithms and Decline of Domain Experts
Abstract: Machine learning algorithms used to require features to be carefully hand created and filtered. The algorithms of yesteryear needed us to tell them about interactions, non-linearities, non-normal distributions, and so on, and if we added too many features to the model, we would over-fit and end up with something that was useless in practice. That meant that domain experts were vital in manipulating and filtering the data to create just the right set of inputs. But now that we have deep learning nets, ensembles of decision trees, and so forth, features are created automatically, and over-fitting is avoided even with huge numbers of features. Furthermore, these general-purpose algorithms have proven their worth in everything from video object tracking to speech recognition to automated drug discovery to natural language processing. So where does that leave the role of the domain expert? In this panel, we will discuss and debate where domain experts fit into this new world of general-purpose machine learning algorithms. Moderator: Jeremy Howard, Kaggle. Panelists: Oren Etzioni, University of Washington; John Akred, Idibon; Robert Munro, Silicon Valley Data Science.

Tuesday August 13, 2013 5:00pm - 6:30pm
Superior