DSW 2019

IEEE Signal Processing Society

    Minneapolis, MN

    June 2-5, 2019

    The second IEEE Data Science Workshop

Plenary Speakers

    Monday, June 3

    Morning

  • Vipin Kumar

    Regents Professor and William Norris Endowed Chair
    Computer Science and Engineering
    University of Minnesota
    USA
  • Big Data in Climate and Earth Sciences: Challenges and Opportunities for Data Science

    The climate and earth sciences have recently undergone a rapid transformation from a data-poor to a data-rich environment. In particular, massive amounts of data about the Earth and its environment are now continuously generated by a large number of Earth-observing satellites, as well as by physics-based earth system models running on large-scale computational platforms. These massive, information-rich datasets offer huge potential for understanding how the Earth's climate and ecosystem have been changing and how they are being impacted by human actions. This talk will discuss the challenges involved in analyzing these massive datasets, as well as the opportunities they present for advancing both machine learning and the science of climate change, in the context of monitoring the state of tropical forests and surface water on a global scale.

    Afternoon

  • Robert D. Nowak

    Nosbusch Professor in Engineering
    University of Wisconsin-Madison
    USA
  • Active Machine Learning: From Theory to Practice

    Machine learning has advanced considerably in recent years, but mostly in well-defined domains using huge amounts of human-labeled training data. Machines can recognize objects in images and translate text, but they must be trained with more images and text than a person can see in nearly a lifetime. Generating the necessary training data sets can require an enormous human effort. Active machine learning tackles this issue by designing learning algorithms that automatically and adaptively select the most informative data for labeling so that human time is not wasted on irrelevant or trivial examples. This lecture will cover theory, methods, and applications of active machine learning.
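
    To make the selection step concrete, here is a minimal sketch of pool-based uncertainty sampling, one common active-learning strategy; it is illustrative only, not a method from the lecture. It assumes scikit-learn, uses a synthetic dataset, and lets the ground-truth labels stand in for the human labeler.

        # Hedged sketch of pool-based uncertainty sampling (illustrative; not the
        # lecture's algorithms). The ground-truth labels y_true play the oracle.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression

        X, y_true = make_classification(n_samples=2000, n_features=20, random_state=0)
        rng = np.random.default_rng(0)
        labeled = list(rng.choice(len(X), size=10, replace=False))  # small seed set
        pool = [i for i in range(len(X)) if i not in labeled]

        model = LogisticRegression(max_iter=1000)
        for _ in range(50):                            # label budget: 50 queries
            model.fit(X[labeled], y_true[labeled])
            margin = np.abs(model.predict_proba(X[pool])[:, 1] - 0.5)
            query = pool.pop(int(np.argmin(margin)))   # least-certain pooled point
            labeled.append(query)                      # "ask the oracle" for its label

        print("accuracy on the full set:", model.score(X, y_true))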

    Tuesday, June 4

    Morning

  • Bin Yu

    Chancellor's Professor
    Statistics, Electrical Engineering and Computer Science
    University of California, Berkeley
    USA
  • PCS Workflow, Interpretable Machine Learning, and DeepTune

    In this talk, I'd like to discuss the intertwined importance and connections of three principles of data science: predictability, computability, and stability (PCS), along with the PCS workflow that is built on these three principles. I will also define interpretable machine learning (iML) through the PDR desiderata (predictive accuracy, descriptive accuracy, and relevancy) and discuss stability as a minimum requirement for interpretability. The principles and iML desiderata, PCS and PDR, will be demonstrated in the context of a collaborative project in neuroscience, DeepTune, for interpretable data results and testable hypothesis generation. If time allows, I will present proposed PCS inference, which includes perturbation intervals and PCS hypothesis testing. PCS inference uses prediction screening and takes into account both data and model perturbations. Last but not least, PCS documentation is proposed, based on R Markdown, iPython, or Jupyter Notebook, with publicly available, reproducible code and narratives to back up the human choices made throughout an analysis. (The PCS workflow and documentation are demonstrated in a genomics case study available on Zenodo.)
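
    As a toy illustration of the stability principle (my sketch under simple assumptions, not Yu's actual workflow): perturb the data by bootstrap resampling, refit a sparse regression each time, and report how much each coefficient moves across the perturbed fits, in the spirit of perturbation intervals. Assumes scikit-learn and synthetic data.

        # Hedged sketch of a data-perturbation stability check (illustrative only).
        import numpy as np
        from sklearn.linear_model import Lasso

        rng = np.random.default_rng(0)
        n, p = 200, 10
        X = rng.normal(size=(n, p))
        y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n)  # 2 real signals

        coefs = []
        for _ in range(100):                      # data perturbations
            idx = rng.integers(0, n, size=n)      # bootstrap resample
            coefs.append(Lasso(alpha=0.1).fit(X[idx], y[idx]).coef_)
        coefs = np.array(coefs)

        # "Perturbation intervals": spread of each coefficient across perturbed fits.
        lo, hi = np.percentile(coefs, [2.5, 97.5], axis=0)
        for j in range(p):
            print(f"beta[{j}]: [{lo[j]:+.2f}, {hi[j]:+.2f}]")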

    Afternoon

  • Rong Jin

    Principal Engineer
    Alibaba Group
    China
  • Optimization in Alibaba: Beyond Convexity

    In this talk, I will discuss recent developments in large-scale optimization that go beyond the conventional wisdom of convex optimization. I will specifically address three challenging problems that have found applications in many Alibaba businesses. In the first application, we study the problem of optimizing truncated loss functions, which are of particular importance when learning from heavy-tailed distributions. We show that, despite its non-convexity, under appropriate conditions a variant of gradient descent can efficiently find the global optimum. In the second application, we study how to find a local optimum in non-convex optimization. We show that, with the introduction of appropriate random perturbations, we can find a local optimum at a rate of O(1/gamma^3), where gamma defines the suboptimality; this significantly improves upon the results of existing studies. In the last application, we consider optimizing a continuous function over a discrete space comprising a huge number of data points. Special instances of this problem include approximate nearest neighbor search and learning a quantized neural network. The most intriguing result of our study is that this optimization problem becomes relatively easier when the size of the discrete space is sufficiently large. We provide both theoretical analysis and empirical studies.
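
    To convey the flavor of the second problem, below is a hedged sketch of perturbed gradient descent: when the gradient is small (a candidate saddle point), inject isotropic noise so the iterate can fall along a direction of negative curvature. The test function, step size, and noise scale are my illustrative choices, not those of the talk.

        # Hedged sketch of perturbed gradient descent for escaping saddle points.
        import numpy as np

        def perturbed_gd(grad, x0, steps=500, lr=0.01, noise=1e-2, grad_tol=1e-3, seed=0):
            rng = np.random.default_rng(seed)
            x = np.asarray(x0, dtype=float)
            for _ in range(steps):
                g = grad(x)
                if np.linalg.norm(g) < grad_tol:
                    # Near a stationary point: a random perturbation lets descent
                    # continue along negative-curvature directions of a saddle.
                    x = x + noise * rng.normal(size=x.shape)
                else:
                    x = x - lr * g
            return x

        # f(x, y) = x^2 - y^2 has a saddle at the origin; plain GD started there stalls.
        grad = lambda v: np.array([2 * v[0], -2 * v[1]])
        print(perturbed_gd(grad, [0.0, 0.0]))   # the noise kicks it off the saddle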

    Wednesday, June 5

    Morning

  • Xiao-Li Meng

    Whipple V. N. Jones Professor of Statistics
    Harvard University
    USA
  • Is it a Computing Algorithm or a Statistical Procedure: Can you tell or should you care?

    The line between computing algorithms and statistical procedures is becoming increasingly blurred, as practitioners are now typically given a black box that turns data into an “answer”. Is such a black box a computing algorithm or a statistical procedure? Does it matter that we know which is which? This talk reports my contemplations of such questions, which originated in my taking part in a project investigating the self-consistency principle introduced by Efron (1967). We will start with a simple regression problem to illustrate a self-consistency method, and the audience will be invited to contemplate whether it is a magical computing algorithm or a powerful statistical procedure. We will then discuss how such contemplations have played critical roles in developing the self-consistency principle into a “likelihood-free EM algorithm” for semi-/non-parametric estimation with incomplete data and under an arbitrary loss function, capable of addressing wavelet de-noising with irregularly spaced data as well as variable selection via LASSO-type methods with incomplete data. Throughout the talk, the audience will also be invited to consider a wide-open problem: how to formulate, in general, the trade-off between statistical efficiency and computational efficiency? (This talk is based on joint work with Thomas Lee and Zhan Li.)
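
    As a toy illustration of the self-consistency idea (my sketch, not the likelihood-free EM algorithm of the talk): for a regression with missing responses, impute the missing values from the current fit, refit, and repeat; the estimate is the fixed point of this impute-refit map.

        # Hedged sketch of a self-consistent impute-refit iteration (illustrative).
        import numpy as np

        rng = np.random.default_rng(0)
        n = 200
        x = rng.uniform(-1, 1, size=n)
        y = 1.5 * x + rng.normal(scale=0.3, size=n)
        observed = rng.random(n) > 0.4                    # ~40% of responses missing

        beta = 0.0
        for _ in range(50):
            y_fill = np.where(observed, y, beta * x)      # impute missing y from the fit
            beta_new = np.dot(x, y_fill) / np.dot(x, x)   # least-squares slope (no intercept)
            if abs(beta_new - beta) < 1e-10:              # fixed point = self-consistent
                break
            beta = beta_new

        print("self-consistent slope:", beta)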

    Afternoon

    (Shared with the Graph Signal Processing workshop)
  • Jennifer Neville

    Prof. of Computer Science and Statistics
    Purdue University
    USA
  • Towards Relational AI – The good, the bad, and the ugly of learning over networks

    In the last 20 years, there has been a great deal of research on machine learning methods for graphs, networks, and other types of relational data. By moving beyond the independence assumptions of more traditional ML methods, relational models are now able to successfully exploit the additional information that is often observed in relationships among entities. Specifically, network models are able to use relational information to improve predictions about user interests, behavior, and interactions, particularly when individual data is sparse. The tradeoff, however, is that the heterogeneity, partial observability, and interdependence of large-scale network data can make it difficult to develop efficient and unbiased methods, due to several algorithmic and statistical challenges. In this talk, I will discuss these issues while surveying several general approaches used for relational learning in large-scale social and information networks. In addition, reflecting on the movement toward pervasive use of these models in personalized online systems, I will discuss potential implications for privacy, the polarization of communities, and the spread of misinformation.
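
    One deliberately simple instance of exploiting relational structure, sketched below under my own assumptions rather than as any method from the talk: iterative label propagation on a graph, where a handful of labeled seed nodes inform predictions for all other nodes through repeated neighborhood averaging.

        # Hedged sketch of label propagation / collective inference (illustrative).
        import numpy as np

        rng = np.random.default_rng(0)
        n = 40
        A = (rng.random((n, n)) < 0.1).astype(float)      # random graph
        A = np.maximum(A, A.T)                            # make it undirected
        np.fill_diagonal(A, 0)

        labels = np.full(n, np.nan)                       # most labels unobserved
        seeds_idx = rng.choice(n, size=5, replace=False)
        labels[seeds_idx] = rng.integers(0, 2, size=5)    # a few labeled seed nodes

        seeds = ~np.isnan(labels)
        scores = np.where(seeds, labels, 0.5)             # unknown nodes start at 0.5
        deg = np.maximum(A.sum(axis=1), 1)                # avoid division by zero
        for _ in range(100):
            scores = A @ scores / deg                     # average neighbors' scores
            scores[seeds] = labels[seeds]                 # clamp observed labels
        print((scores > 0.5).astype(int))                 # propagated 0/1 predictions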
