DigitalCommons@Kennesaw State University

Doctor of Data Science and Analytics Dissertations

The Ph.D. in Data Science and Analytics is an advanced degree with a dual focus on application and research: students engage with real-world business problems, which inform and guide their research interests.

We launched the first formal PhD program in Data Science in 2015. Our program sits at the intersection of computer science, statistics, mathematics, and business. Our students engage in relevant research with faculty from across our eleven colleges. As one of the institutions on the forefront of the development of data science as an academic discipline, we are committed to developing the next generation of Data Science leaders, researchers, and educators. Culturally, we are committed to the discipline of Data Science, through ethical practices, attention to fairness, to a diverse student body, to academic excellence, and research which makes positive contributions to our local, regional, and global community.

Sherry Ni, Director, Ph.D. in Data Science and Analytics

This degree will train individuals to facilitate innovative research and to translate complex data, both structured and unstructured, into information that improves decision making. The curriculum places heavy emphasis on programming, data mining, statistical modeling, and the mathematical foundations that support these concepts. Importantly, the program also emphasizes communication skills, both oral and written, as well as application and tying results to business and research problems.

Dissertations from 2024

A Holistic and Collaborative Behavioral Health Detection Framework Using Sensitive Police Narratives, Martin Keagan Wynne Brown

Multi-Modality Transformer for E-Commerce: Inferring User Purchase Intention to Bridge the Query-Product Gap, Srivatsa Mallapragada

Dissertations from 2023

Quantification of Various Types of Biases in Large Language Models, Sudhashree Sayenju

Dissertations from 2022

Appley: Approximate Shapley Values for Model Explainability in Linear Time, Md Shafiul Alam

Ethical Analytics: A Framework for a Practically-Oriented Sub-Discipline of AI Ethics, Jonathan Boardman

Novel Instance-Level Weighted Loss Function for Imbalanced Learning, Trent Geisler

Debiasing Cyber Incidents – Correcting for Reporting Delays and Under-reporting, Seema Sangari

Dissertations from 2021

Integrated Machine Learning Approaches to Improve Classification performance and Feature Extraction Process for EEG Dataset, Mohammad Masum

A Distance-Based Clustering Framework for Categorical Time Series: A Case Study in Episodes of Care Healthcare Delivery System, Lauren Staples

Dissertations from 2020

A CREDIT ANALYSIS OF THE UNBANKED AND UNDERBANKED: AN ARGUMENT FOR ALTERNATIVE DATA, Edwin Baidoo

Quantitatively Motivated Model Development Framework: Downstream Analysis Effects of Normalization Strategies, Jessica M. Rudd

Data-driven Investment Decisions in P2P Lending: Strategies of Integrating Credit Scoring and Profit Scoring, Yan Wang

A Novel Penalized Log-likelihood Function for Class Imbalance Problem, Lili Zhang

ATTACK AND DEFENSE IN SECURITY ANALYTICS, Yiyun Zhou

Dissertations from 2019

One and Two-Step Estimation of Time Variant Parameters and Nonparametric Quantiles, Bogdan Gadidov

Biologically Interpretable, Integrative Deep Learning for Cancer Survival Analysis, Jie Hao

Deep Embedding Kernel, Linh Le

Ordinal HyperPlane Loss, Bob Vanderheyden

DigitalCommons@Kennesaw State University ISSN: 2576-6805

LIBRARIES | ARCH

Data Science Master's Theses

The Master of Science in Data Science program requires the successful completion of 12 courses to obtain a degree. These requirements cover six core courses, a leadership or project management course, two required courses corresponding to a declared specialization, two electives, and a capstone project or thesis. This collection contains a selection of master's theses and capstone projects by MSDS graduates.

Collection Details

List of items in this collection

Date Added
2022-06-15
2022-06-05
2020-06-16
2020-06-13
2019-11-26
2019-11-21
2019-06-23

Machine Learning - CMU

PhD Dissertations

[all are .pdf files].

The Neurodynamic Basis of Real World Face Perception Arish Alreja, 2024

Towards More Powerful Graph Representation Learning Lingxiao Zhao, 2024

Robust Machine Learning: Detection, Evaluation and Adaptation Under Distribution Shift Saurabh Garg, 2024

UNDERSTANDING, FORMALLY CHARACTERIZING, AND ROBUSTLY HANDLING REAL-WORLD DISTRIBUTION SHIFT Elan Rosenfeld, 2024

Representing Time: Towards Pragmatic Multivariate Time Series Modeling Cristian Ignacio Challu, 2024

Foundations of Multisensory Artificial Intelligence Paul Pu Liang, 2024

Advancing Model-Based Reinforcement Learning with Applications in Nuclear Fusion Ian Char, 2024

Learning Models that Match Jacob Tyo, 2024

Improving Human Integration across the Machine Learning Pipeline Charvi Rastogi, 2024

Reliable and Practical Machine Learning for Dynamic Healthcare Settings Helen Zhou, 2023

Automatic customization of large-scale spiking network models to neuronal population activity (unavailable) Shenghao Wu, 2023

Estimation of BVk functions from scattered data (unavailable) Addison J. Hu, 2023

Rethinking object categorization in computer vision (unavailable) Jayanth Koushik, 2023

Advances in Statistical Gene Networks Jinjin Tian, 2023

Post-hoc calibration without distributional assumptions Chirag Gupta, 2023

The Role of Noise, Proxies, and Dynamics in Algorithmic Fairness Nil-Jana Akpinar, 2023

Collaborative learning by leveraging siloed data Sebastian Caldas, 2023

Modeling Epidemiological Time Series Aaron Rumack, 2023

Human-Centered Machine Learning: A Statistical and Algorithmic Perspective Leqi Liu, 2023

Uncertainty Quantification under Distribution Shifts Aleksandr Podkopaev, 2023

Probabilistic Reinforcement Learning: Using Data to Define Desired Outcomes, and Inferring How to Get There Benjamin Eysenbach, 2023

Comparing Forecasters and Abstaining Classifiers Yo Joong Choe, 2023

Using Task Driven Methods to Uncover Representations of Human Vision and Semantics Aria Yuan Wang, 2023

Data-driven Decisions - An Anomaly Detection Perspective Shubhranshu Shekhar, 2023

Applied Mathematics of the Future Kin G. Olivares, 2023

METHODS AND APPLICATIONS OF EXPLAINABLE MACHINE LEARNING Joon Sik Kim, 2023

NEURAL REASONING FOR QUESTION ANSWERING Haitian Sun, 2023

Principled Machine Learning for Societally Consequential Decision Making Amanda Coston, 2023

Long term brain dynamics extend cognitive neuroscience to timescales relevant for health and physiology Maxwell B. Wang, 2023

Long term brain dynamics extend cognitive neuroscience to timescales relevant for health and physiology Darby M. Losey, 2023

Calibrated Conditional Density Models and Predictive Inference via Local Diagnostics David Zhao, 2023

Towards an Application-based Pipeline for Explainability Gregory Plumb, 2022

Objective Criteria for Explainable Machine Learning Chih-Kuan Yeh, 2022

Making Scientific Peer Review Scientific Ivan Stelmakh, 2022

Facets of regularization in high-dimensional learning: Cross-validation, risk monotonization, and model complexity Pratik Patil, 2022

Active Robot Perception using Programmable Light Curtains Siddharth Ancha, 2022

Strategies for Black-Box and Multi-Objective Optimization Biswajit Paria, 2022

Unifying State and Policy-Level Explanations for Reinforcement Learning Nicholay Topin, 2022

Sensor Fusion Frameworks for Nowcasting Maria Jahja, 2022

Equilibrium Approaches to Modern Deep Learning Shaojie Bai, 2022

Towards General Natural Language Understanding with Probabilistic Worldbuilding Abulhair Saparov, 2022

Applications of Point Process Modeling to Spiking Neurons (Unavailable) Yu Chen, 2021

Neural variability: structure, sources, control, and data augmentation Akash Umakantha, 2021

Structure and time course of neural population activity during learning Jay Hennig, 2021

Cross-view Learning with Limited Supervision Yao-Hung Hubert Tsai, 2021

Meta Reinforcement Learning through Memory Emilio Parisotto, 2021

Learning Embodied Agents with Scalably-Supervised Reinforcement Learning Lisa Lee, 2021

Learning to Predict and Make Decisions under Distribution Shift Yifan Wu, 2021

Statistical Game Theory Arun Sai Suggala, 2021

Towards Knowledge-capable AI: Agents that See, Speak, Act and Know Kenneth Marino, 2021

Learning and Reasoning with Fast Semidefinite Programming and Mixing Methods Po-Wei Wang, 2021

Bridging Language in Machines with Language in the Brain Mariya Toneva, 2021

Curriculum Learning Otilia Stretcu, 2021

Principles of Learning in Multitask Settings: A Probabilistic Perspective Maruan Al-Shedivat, 2021

Towards Robust and Resilient Machine Learning Adarsh Prasad, 2021

Towards Training AI Agents with All Types of Experiences: A Unified ML Formalism Zhiting Hu, 2021

Building Intelligent Autonomous Navigation Agents Devendra Chaplot, 2021

Learning to See by Moving: Self-supervising 3D Scene Representations for Perception, Control, and Visual Reasoning Hsiao-Yu Fish Tung, 2021

Statistical Astrophysics: From Extrasolar Planets to the Large-scale Structure of the Universe Collin Politsch, 2020

Causal Inference with Complex Data Structures and Non-Standard Effects Kwhangho Kim, 2020

Networks, Point Processes, and Networks of Point Processes Neil Spencer, 2020

Dissecting neural variability using population recordings, network models, and neurofeedback (Unavailable) Ryan Williamson, 2020

Predicting Health and Safety: Essays in Machine Learning for Decision Support in the Public Sector Dylan Fitzpatrick, 2020

Towards a Unified Framework for Learning and Reasoning Han Zhao, 2020

Learning DAGs with Continuous Optimization Xun Zheng, 2020

Machine Learning and Multiagent Preferences Ritesh Noothigattu, 2020

Learning and Decision Making from Diverse Forms of Information Yichong Xu, 2020

Towards Data-Efficient Machine Learning Qizhe Xie, 2020

Change modeling for understanding our world and the counterfactual one(s) William Herlands, 2020

Machine Learning in High-Stakes Settings: Risks and Opportunities Maria De-Arteaga, 2020

Data Decomposition for Constrained Visual Learning Calvin Murdock, 2020

Structured Sparse Regression Methods for Learning from High-Dimensional Genomic Data Micol Marchetti-Bowick, 2020

Towards Efficient Automated Machine Learning Liam Li, 2020

LEARNING COLLECTIONS OF FUNCTIONS Emmanouil Antonios Platanios, 2020

Provable, structured, and efficient methods for robustness of deep networks to adversarial examples Eric Wong, 2020

Reconstructing and Mining Signals: Algorithms and Applications Hyun Ah Song, 2020

Probabilistic Single Cell Lineage Tracing Chieh Lin, 2020

Graphical network modeling of phase coupling in brain activity (unavailable) Josue Orellana, 2019

Strategic Exploration in Reinforcement Learning - New Algorithms and Learning Guarantees Christoph Dann, 2019

Learning Generative Models using Transformations Chun-Liang Li, 2019

Estimating Probability Distributions and their Properties Shashank Singh, 2019

Post-Inference Methods for Scalable Probabilistic Modeling and Sequential Decision Making Willie Neiswanger, 2019

Accelerating Text-as-Data Research in Computational Social Science Dallas Card, 2019

Multi-view Relationships for Analytics and Inference Eric Lei, 2019

Information flow in networks based on nonstationary multivariate neural recordings Natalie Klein, 2019

Competitive Analysis for Machine Learning & Data Science Michael Spece, 2019

The When, Where and Why of Human Memory Retrieval Qiong Zhang, 2019

Towards Effective and Efficient Learning at Scale Adams Wei Yu, 2019

Towards Literate Artificial Intelligence Mrinmaya Sachan, 2019

Learning Gene Networks Underlying Clinical Phenotypes Under SNP Perturbations From Genome-Wide Data Calvin McCarter, 2019

Unified Models for Dynamical Systems Carlton Downey, 2019

Anytime Prediction and Learning for the Balance between Computation and Accuracy Hanzhang Hu, 2019

Statistical and Computational Properties of Some "User-Friendly" Methods for High-Dimensional Estimation Alnur Ali, 2019

Nonparametric Methods with Total Variation Type Regularization Veeranjaneyulu Sadhanala, 2019

New Advances in Sparse Learning, Deep Networks, and Adversarial Learning: Theory and Applications Hongyang Zhang, 2019

Gradient Descent for Non-convex Problems in Modern Machine Learning Simon Shaolei Du, 2019

Selective Data Acquisition in Learning and Decision Making Problems Yining Wang, 2019

Anomaly Detection in Graphs and Time Series: Algorithms and Applications Bryan Hooi, 2019

Neural dynamics and interactions in the human ventral visual pathway Yuanning Li, 2018

Tuning Hyperparameters without Grad Students: Scaling up Bandit Optimisation Kirthevasan Kandasamy, 2018

Teaching Machines to Classify from Natural Language Interactions Shashank Srivastava, 2018

Statistical Inference for Geometric Data Jisu Kim, 2018

Representation Learning @ Scale Manzil Zaheer, 2018

Diversity-promoting and Large-scale Machine Learning for Healthcare Pengtao Xie, 2018

Distribution and Histogram (DIsH) Learning Junier Oliva, 2018

Stress Detection for Keystroke Dynamics Shing-Hon Lau, 2018

Sublinear-Time Learning and Inference for High-Dimensional Models Enxu Yan, 2018

Neural population activity in the visual cortex: Statistical methods and application Benjamin Cowley, 2018

Efficient Methods for Prediction and Control in Partially Observable Environments Ahmed Hefny, 2018

Learning with Staleness Wei Dai, 2018

Statistical Approach for Functionally Validating Transcription Factor Bindings Using Population SNP and Gene Expression Data Jing Xiang, 2017

New Paradigms and Optimality Guarantees in Statistical Learning and Estimation Yu-Xiang Wang, 2017

Dynamic Question Ordering: Obtaining Useful Information While Reducing User Burden Kirstin Early, 2017

New Optimization Methods for Modern Machine Learning Sashank J. Reddi, 2017

Active Search with Complex Actions and Rewards Yifei Ma, 2017

Why Machine Learning Works George D. Montañez, 2017

Source-Space Analyses in MEG/EEG and Applications to Explore Spatio-temporal Neural Dynamics in Human Vision Ying Yang, 2017

Computational Tools for Identification and Analysis of Neuronal Population Activity Pengcheng Zhou, 2016

Expressive Collaborative Music Performance via Machine Learning Gus (Guangyu) Xia, 2016

Supervision Beyond Manual Annotations for Learning Visual Representations Carl Doersch, 2016

Exploring Weakly Labeled Data Across the Noise-Bias Spectrum Robert W. H. Fisher, 2016

Optimizing Optimization: Scalable Convex Programming with Proximal Operators Matt Wytock, 2016

Combining Neural Population Recordings: Theory and Application William Bishop, 2015

Discovering Compact and Informative Structures through Data Partitioning Madalina Fiterau-Brostean, 2015

Machine Learning in Space and Time Seth R. Flaxman, 2015

The Time and Location of Natural Reading Processes in the Brain Leila Wehbe, 2015

Shape-Constrained Estimation in High Dimensions Min Xu, 2015

Spectral Probabilistic Modeling and Applications to Natural Language Processing Ankur Parikh, 2015

Computational and Statistical Advances in Testing and Learning Aaditya Kumar Ramdas, 2015

Corpora and Cognition: The Semantic Composition of Adjectives and Nouns in the Human Brain Alona Fyshe, 2015

Learning Statistical Features of Scene Images Wooyoung Lee, 2014

Towards Scalable Analysis of Images and Videos Bin Zhao, 2014

Statistical Text Analysis for Social Science Brendan T. O'Connor, 2014

Modeling Large Social Networks in Context Qirong Ho, 2014

Semi-Cooperative Learning in Smart Grid Agents Prashant P. Reddy, 2013

On Learning from Collective Data Liang Xiong, 2013

Exploiting Non-sequence Data in Dynamic Model Learning Tzu-Kuo Huang, 2013

Mathematical Theories of Interaction with Oracles Liu Yang, 2013

Short-Sighted Probabilistic Planning Felipe W. Trevizan, 2013

Statistical Models and Algorithms for Studying Hand and Finger Kinematics and their Neural Mechanisms Lucia Castellanos, 2013

Approximation Algorithms and New Models for Clustering and Learning Pranjal Awasthi, 2013

Uncovering Structure in High-Dimensions: Networks and Multi-task Learning Problems Mladen Kolar, 2013

Learning with Sparsity: Structures, Optimization and Applications Xi Chen, 2013

GraphLab: A Distributed Abstraction for Large Scale Machine Learning Yucheng Low, 2013

Graph Structured Normal Means Inference James Sharpnack, 2013 (Joint Statistics & ML PhD)

Probabilistic Models for Collecting, Analyzing, and Modeling Expression Data Hai-Son Phuoc Le, 2013

Learning Large-Scale Conditional Random Fields Joseph K. Bradley, 2013

New Statistical Applications for Differential Privacy Rob Hall, 2013 (Joint Statistics & ML PhD)

Parallel and Distributed Systems for Probabilistic Reasoning Joseph Gonzalez, 2012

Spectral Approaches to Learning Predictive Representations Byron Boots, 2012

Attribute Learning using Joint Human and Machine Computation Edith L. M. Law, 2012

Statistical Methods for Studying Genetic Variation in Populations Suyash Shringarpure, 2012

Data Mining Meets HCI: Making Sense of Large Graphs Duen Horng (Polo) Chau, 2012

Learning with Limited Supervision by Input and Output Coding Yi Zhang, 2012

Target Sequence Clustering Benjamin Shih, 2011

Nonparametric Learning in High Dimensions Han Liu, 2010 (Joint Statistics & ML PhD)

Structural Analysis of Large Networks: Observations and Applications Mary McGlohon, 2010

Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy Brian D. Ziebart, 2010

Tractable Algorithms for Proximity Search on Large Graphs Purnamrita Sarkar, 2010

Rare Category Analysis Jingrui He, 2010

Coupled Semi-Supervised Learning Andrew Carlson, 2010

Fast Algorithms for Querying and Mining Large Graphs Hanghang Tong, 2009

Efficient Matrix Models for Relational Learning Ajit Paul Singh, 2009

Exploiting Domain and Task Regularities for Robust Named Entity Recognition Andrew O. Arnold, 2009

Theoretical Foundations of Active Learning Steve Hanneke, 2009

Generalized Learning Factors Analysis: Improving Cognitive Models with Machine Learning Hao Cen, 2009

Detecting Patterns of Anomalies Kaustav Das, 2009

Dynamics of Large Networks Jurij Leskovec, 2008

Computational Methods for Analyzing and Modeling Gene Regulation Dynamics Jason Ernst, 2008

Stacked Graphical Learning Zhenzhen Kou, 2007

Actively Learning Specific Function Properties with Applications to Statistical Inference Brent Bryan, 2007

Approximate Inference, Structure Learning and Feature Estimation in Markov Random Fields Pradeep Ravikumar, 2007

Scalable Graphical Models for Social Networks Anna Goldenberg, 2007

Measure Concentration of Strongly Mixing Processes with Applications Leonid Kontorovich, 2007

Tools for Graph Mining Deepayan Chakrabarti, 2005

Automatic Discovery of Latent Variable Models Ricardo Silva, 2005

Thesis/Capstone for Master's in Data Science | Northwestern SPS - Northwestern School of Professional Studies

Data Science

Capstone and Thesis Overview

Capstone and thesis are similar in that they both represent a culminating, scholarly effort of high quality. Both should clearly state a problem or issue to be addressed. Both will allow students to complete a larger project and produce a product or publication that can be highlighted on their resumes. Students should consider the factors below when deciding whether a capstone or thesis may be more appropriate to pursue.

A capstone is a practical or real-world project that can emphasize preparation for professional practice. A capstone is more appropriate if:

  • you don't necessarily need or want the experience of the research process or writing a big publication
  • you want more input on your project, from fellow students and instructors
  • you want more structure to your project, including assignment deadlines and due dates
  • you want to complete the project or graduate in a timely manner

A student can enroll in MSDS 498 Capstone in any term. However, capstone specialization courses can provide a unique student experience and may be offered only twice a year. 

A thesis is an academic-focused research project with broader applicability. A thesis is more appropriate if:

  • you want to get a PhD or other advanced degree and want the experience of the research process and writing for publication
  • you want to work individually with a specific faculty member who serves as your thesis adviser
  • you are more self-directed, are good at managing your own projects with very little supervision, and have a clear direction for your work
  • you have a project that requires more time to pursue

Students can enroll in MSDS 590 Thesis as long as there is an approved thesis project proposal, identified thesis adviser, and all other required documentation at least two weeks before the start of any term.

From Faculty Director, Thomas W. Miller, PhD

“Capstone projects and thesis research give students a chance to study topics of special interest to them. Students can highlight analytical skills developed in the program. Work on capstone and thesis research projects often leads to publications that students can highlight on their resumes.”

A thesis is an individual research project that usually takes two to four terms to complete. Capstone course sections, on the other hand, represent a one-term commitment.

Students need to evaluate their options prior to choosing a capstone course section because capstones vary widely from one instructor to the next. There are both general and specialization-focused capstone sections. Some capstone sections offer individual research projects, others offer team research projects, and a few give students a choice of individual or team projects.

Students should refer to the SPS Graduate Student Handbook for more information regarding registration for either MSDS 590 Thesis or MSDS 498 Capstone.

Capstone Experience

If students wish to engage with an outside organization to work on a project for capstone, they can refer to this checklist and lessons learned for some helpful tips.

Capstone Checklist

  • Start early — set aside a minimum of one to two months prior to the capstone quarter to determine the industry and modeling interests.
  • Networking — pitch your idea to potential organizations for projects and focus on the business benefits you can provide.
  • Permission request — make sure your final project can be shared with others in the course and the information can be made public.
  • Engagement — engage with the capstone professor prior to and immediately after getting the dataset to ensure appropriate scope for the 10 weeks.
  • Teambuilding — recruit team members who have similar interests for the type of project during the first week of the course.

Capstone Lessons Learned

  • Access to company data can take longer than expected; not having access before or at the start of the term can severely delay progress.
  • The project timeline should align with the coursework timeline as closely as possible.
  • Designate one point of contact (POC) for communication with the business to keep messages streamlined and to manage time with the organization more effectively.
  • Manage expectations on both sides: for the business, the work is pro bono; for students, it does not guarantee internship or job opportunities.
  • Data security or masking that is not completed in time can put the opportunity at risk entirely.

Publication of Work

Northwestern University Libraries offers an option for students to publish their master’s thesis or capstone in Arch, Northwestern’s open access research and data repository.

Benefits of publishing your thesis:

  • Your work will be indexed by search engines and discoverable by researchers around the world, extending your work’s impact beyond Northwestern
  • Your work will be assigned a Digital Object Identifier (DOI) to ensure perpetual online access and to facilitate scholarly citation
  • Your work will help accelerate discovery and increase knowledge in your subject domain by adding to the global corpus of public scholarly information

Get started:

  • Visit Arch online
  • Log in with your NetID
  • Describe your thesis: title, author, date, keywords, rights, license, subject, etc.
  • Upload your thesis or capstone PDF and any related supplemental files (data, code, images, presentations, documentation, etc.)
  • Select a visibility: Public, Northwestern-only, Embargo (i.e. delayed release)
  • Save your work to the repository

Your thesis manuscript or capstone report will then be published on the MSDS page, where you can also view other published work.

For questions or support in publishing your thesis or capstone, please contact [email protected].

  • DSpace@MIT Home
  • MIT Libraries

This collection of MIT Theses in DSpace contains selected theses and dissertations from all MIT departments. Please note that this is NOT a complete collection of MIT theses. To search all MIT theses, use MIT Libraries' catalog.

MIT's DSpace contains more than 58,000 theses completed at MIT dating as far back as the mid-1800s. Theses in this collection have been scanned by the MIT Libraries or submitted in electronic format by thesis authors. Since 2004 all new master's and Ph.D. theses are scanned and added to this collection after degrees are awarded.

MIT Theses are openly available to all readers. Please share how this access affects or benefits you. Your story matters.

If you have questions about MIT theses in DSpace, contact [email protected]. See also Access & Availability Questions or About MIT Theses in DSpace.

If you are a recent MIT graduate, your thesis will be added to DSpace within 3-6 months after your graduation date. Please email [email protected] with any questions.

Permissions

MIT Theses may be protected by copyright. Please refer to the MIT Libraries Permissions Policy for permission information. Note that the copyright holder for most MIT theses is identified on the title page of the thesis.

Theses by Department

  • Comparative Media Studies
  • Computation for Design and Optimization
  • Computational and Systems Biology
  • Department of Aeronautics and Astronautics
  • Department of Architecture
  • Department of Biological Engineering
  • Department of Biology
  • Department of Brain and Cognitive Sciences
  • Department of Chemical Engineering
  • Department of Chemistry
  • Department of Civil and Environmental Engineering
  • Department of Earth, Atmospheric, and Planetary Sciences
  • Department of Economics
  • Department of Electrical Engineering and Computer Sciences
  • Department of Humanities
  • Department of Linguistics and Philosophy
  • Department of Materials Science and Engineering
  • Department of Mathematics
  • Department of Mechanical Engineering
  • Department of Nuclear Science and Engineering
  • Department of Ocean Engineering
  • Department of Physics
  • Department of Political Science
  • Department of Urban Studies and Planning
  • Engineering Systems Division
  • Harvard-MIT Program of Health Sciences and Technology
  • Institute for Data, Systems, and Society
  • Media Arts & Sciences
  • Operations Research Center
  • Program in Real Estate Development
  • Program in Writing and Humanistic Studies
  • Science, Technology & Society
  • Science Writing
  • Sloan School of Management
  • Supply Chain Management
  • System Design & Management
  • Technology and Policy Program

Collections in this community

  • Doctoral Theses
  • Graduate Theses
  • Undergraduate Theses

Recent Submissions

  • An Approach to Fault Management Design for the Proposed Mars Sample Return EDL and Ascent Phase Architectures
  • Silicon Photomultipliers as Free Space Optical Communication Sensors
  • Study of Cavity Geometry to Improve Optical Quality of Windows in Hypersonic Flow

The Art and Science of Data Analysis

Physical Description

vii, 31 pages

Creation Information

Daita, Ananda Rohit. May 2018.

This thesis is part of the collection entitled: UNT Theses and Dissertations and was provided by the UNT Libraries to the UNT Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 2380 times, with 8 in the last month. More information about this thesis can be viewed below.

People and organizations associated with either the creation of this thesis or its content.

  • Daita, Ananda Rohit
  • Namuduri, Kamesh (Major Professor)

Committee Members

  • Guturu, Parthasarathy

Publisher

  • University of North Texas
  • Publisher Info: www.unt.edu
  • Place of Publication: Denton, Texas

Rights Holder

For guidance see Citations, Rights, Re-Use.

Provided By

UNT Libraries

The UNT Libraries serve the university and community by providing access to physical and online collections, fostering information literacy, supporting academic research, and much, much more.

Descriptive information to help identify this thesis. Follow the links below to find similar items on the Digital Library.

Degree Information

  • Name: Master of Science
  • Level: Master's
  • Department: Department of Electrical Engineering
  • College: College of Engineering
  • Discipline: Electrical Engineering
  • PublicationType: Master's Thesis
  • Grantor: University of North Texas

This thesis aims to utilize data analysis and predictive modeling techniques and apply them in different domains for gaining insights. The topics were chosen keeping the same in mind. Analysis of customer interests is a crucial factor in present marketing trends and hence we worked on Twitter data, which is a significant part of digital marketing. Neuroscience, especially psychological behavior, is an important research area. We chose eye tracking data, based on which we differentiated human concentration while watching controllable (video game) videos and uncontrollable (sports) videos. Currently, cities are using data analysis to become smart cities. We worked on the City of Lewisville emergency services data and predicted the vehicle-accident-prone areas for development of precautionary measures in those areas.

  • data analysis
  • digital marketing
  • eye tracking
  • smart cities

Library of Congress Subject Headings

  • Eye tracking.
  • Internet marketing.
  • Quantitative research.
  • Smart cities.
  • Thesis or Dissertation

Unique identifying numbers for this thesis in the Digital Library or other systems.

  • Accession or Local Control No : submission_1135
  • Archival Resource Key : ark:/67531/metadc1157624

Collections

This thesis is part of the following collection of related materials.

UNT Theses and Dissertations

Theses and dissertations represent a wealth of scholarly and artistic content created by masters and doctoral students in the degree-seeking process. Some ETDs in this collection are restricted to use by the UNT community .

What responsibilities do I have when using this thesis?

Digital Files

  • 39 image files available in multiple sizes
  • 1 file (.pdf)
  • Metadata API: descriptive and downloadable metadata available in other formats

Dates and time periods associated with this thesis.

Creation Date

Added to the UNT Digital Library

  • June 6, 2018, 1:19 p.m.

Description Last Updated

  • March 17, 2021, 11:06 a.m.


Links for robots.

Helpful links in machine-readable formats.

Archival Resource Key (ARK)

  • ERC Record: /ark:/67531/metadc1157624/?
  • Persistence Statement: /ark:/67531/metadc1157624/??

International Image Interoperability Framework (IIIF)

  • IIIF Manifest: /ark:/67531/metadc1157624/manifest/

Metadata Formats

  • UNTL Format: /ark:/67531/metadc1157624/metadata.untl.xml
  • DC RDF: /ark:/67531/metadc1157624/metadata.dc.rdf
  • DC XML: /ark:/67531/metadc1157624/metadata.dc.xml
  • OAI_DC : /oai/?verb=GetRecord&metadataPrefix=oai_dc&identifier=info:ark/67531/metadc1157624
  • METS : /ark:/67531/metadc1157624/metadata.mets.xml
  • OpenSearch Document: /ark:/67531/metadc1157624/opensearch.xml
  • Thumbnail: /ark:/67531/metadc1157624/thumbnail/
  • Small Image: /ark:/67531/metadc1157624/small/
  • In-text: /ark:/67531/metadc1157624/urls.txt
  • Usage Stats: /stats/stats.json?ark=ark:/67531/metadc1157624
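The relative paths listed above all hang off the record's Archival Resource Key, so absolute URLs can be assembled mechanically. The following is a minimal sketch, assuming the paths resolve against the host given in the citation on this page (https://digital.library.unt.edu); the helper name `metadata_url` and the dictionary keys are hypothetical and chosen only for illustration. It builds strings and makes no network requests.

```python
# Assumption: the relative paths in the list above resolve against this host,
# which is the one that appears in the record's citation.
BASE_URL = "https://digital.library.unt.edu"
ARK = "ark:/67531/metadc1157624"

# Path suffixes copied from the "Metadata Formats" and IIIF entries above.
# The dictionary keys are illustrative labels, not official format names.
METADATA_SUFFIXES = {
    "untl": "metadata.untl.xml",
    "dc_rdf": "metadata.dc.rdf",
    "dc_xml": "metadata.dc.xml",
    "mets": "metadata.mets.xml",
    "iiif_manifest": "manifest/",
}


def metadata_url(fmt: str, ark: str = ARK, base: str = BASE_URL) -> str:
    """Build the absolute URL for one metadata format (hypothetical helper)."""
    suffix = METADATA_SUFFIXES[fmt]  # raises KeyError for an unknown label
    return f"{base.rstrip('/')}/{ark}/{suffix}"


if __name__ == "__main__":
    for label in METADATA_SUFFIXES:
        print(label, metadata_url(label))
```

Keeping the ARK and the suffix separate means the same helper could be pointed at another record by passing a different `ark` value, provided that record follows the same path layout.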

Daita, Ananda Rohit. The Art and Science of Data Analysis, thesis, May 2018; Denton, Texas. (https://digital.library.unt.edu/ark:/67531/metadc1157624/ : accessed July 3, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu.

data science thesis pdf

BSc/MSc Thesis

Our research group offers various interesting topics for a BSc or MSc thesis, the latter in both Computer Science and Scientific Computing. These topics are typically closely related to ongoing research projects (see our Research Page and Publications). Below, we outline the basic procedure you should follow when planning to do a thesis in our group. Please read the following carefully! You might also want to take a quick look at past topics students covered in their theses. Please also note that we currently cannot accommodate all requests for thesis advising, as we are already advising numerous MSc and BSc theses in the current semester as well as in the upcoming summer semester 2024.

Requirements

A key requirement is that you have taken some advanced courses offered by our group. These include Data Science for Text Analytics or Complex Network Analysis (ICNA) and the more recent master-level class on Natural Language Processing with Transformers (INLPT). Students should also have some background in machine learning, ideally in combination with NLP. We also strongly recommend that, prior to starting a thesis (especially a BSc thesis) in our group, you do an advanced software practical to become familiar with the data and tools we use in many of our projects. Most students do this in the semester before they officially start their thesis. Further requirements include:

  • very good programming experience with Python (strongly preferred, including frameworks such as pandas and NumPy)
  • solid background in statistics and linear algebra
  • (optionally) experience with machine learning frameworks such as PyTorch
  • (optionally) experience with NLP frameworks such as spaCy, gensim, LangChain
  • (optionally) experience with OpenSearch or Elasticsearch
  • experience with tools such as GitHub and Docker

It is also advantageous if you have taken some graduate courses in the areas of efficient algorithms (e.g., IEA1) and, in particular, machine learning (e.g., IML, IFML or IAI). Being familiar with frameworks like scikit-learn, Keras or PyTorch is advantageous.

If you have only taken the undergraduate course Introduction to Databases (IDB) and none of the other courses above, it is unlikely that we can accommodate your request.

Also make sure that you are familiar with the examination regulations ("Prüfungsordnung") that apply to your program of study.

Getting in Contact

Prior to getting in contact with us you should, of course, read this page in its entirety. If you think your interests and expertise are a good fit for our group and research activities, send an email to Prof. Michael Gertz with the subject "Anfrage BSc Arbeit" or "Anfrage MSc Arbeit" and include the following information:

  • your current transcript (as PDF). You can download this from the LSF.
  • information about your field of application ("Anwendungsfach"), in particular the courses you have taken
  • your programming experience and projects you worked on
  • areas of interest based on the research conducted in our group
  • any other information you think might strengthen your request

We will then review this information and get back to you to schedule an in-person appointment to discuss further details.

Thesis Expose

Once we agree on a topic for your thesis, and before you officially register for it, we would like to get an idea of how you approach scientific research and whether you are able to do scientific writing. For this, we require that you write an expose of your planned thesis research (see, e.g., here or here). This document is about 4-6 pages long and has to include a description of

  • the context of your project and research
  • problem statement(s)
  • objectives and planned approaches
  • related work
  • milestones towards a timely completion of the thesis

Especially for the related work, it is important that you get a good overview early on in your thesis project; of course, your advisor will give you some starting points. Most of the time, such an expose becomes an integral part of the introductory chapter of your thesis, so no time or effort is wasted. The expose needs to be submitted to your advisor on schedule (which you arrange with your advisor), who will then discuss the expose with you and coordinate the next steps. Occasionally we also have students give a 10-15 minute presentation of their research plan in front of the members of our group in order to get further ideas, comments, suggestions, and pointers on their thesis.

Official Registration

In agreement with your advisor, after you have submitted an expose of good quality, you plan an official start date for the thesis. For this, please fill out the form suitable for your program of study:

  • For registering a bachelor's thesis, see here.
  • For officially registering your master's thesis, see here.
  • Registration form for an MSc thesis in Scientific Computing (please see Mrs. Kiesel to obtain a form).

Hand in this form to Prof. Michael Gertz, who will then turn in the signed form.

Thesis Research and Advising

  • Here are some hints on grammar and style we maintain locally.
  • Some easy, purely syntactic hints on writing good research papers (from Prof. Felix Naumann)
  • Dos and don'ts, Universität Heidelberg, Prof. Dr. Anette Frank
  • Leitfaden zur Abfassung wissenschaftlicher Arbeiten, Ruhr-Universität Bochum, Katarina Klein
  • Leitfaden zur Abfassung wissenschaftlicher Arbeiten, TU Dresden, Maria Lieber

In addition, you can find a detailed description of how to write a seminar paper using our template for seminar papers. The hints in this template might also be crucial when you are writing a thesis: [seminar template .zip] [report sample pdf] [slides english pdf] [slides german pdf]

Also feel free to ask us for copies of BSc/MSc theses that students completed in our group in the past.

Thesis Template

  • Thesis template [.zip]; see a sample PDF here.

Thesis Presentation

  • English LaTeX-Beamer template for the presentation: template [.zip], sample PDF
  • German LaTeX-Beamer template for the presentation: template [.zip], sample PDF


COMMENTS

  1. PDF Master Thesis: Data Science and Marketing Analytics

    Erasmus School of Economics. Master Thesis: Data Science and Marketing Analytics. Interpretable Machine Learning for Attribution Modeling. A Machine Learning Approach for Conversion Attribution in Digital Marketing Student name: Jordy Martodipoetro Student number: 454072 Supervisor: Dr. Kathrin Gruber Second assessor: Prof. Bas Donkers Date ...

  2. Computational and Data Sciences (PhD) Dissertations

    PDF. Machine Learning and Geostatistical Approaches for Discovery of Weather and Climate Events Related to El Niño Phenomena, Sachi Perera. PDF. Global to Glocal: A Confluence of Data Science and Earth Observations in the Advancement of the SDGs, Rejoice Thomas. Dissertations from 2023 PDF

  3. PDF Applied Data Science Master Thesis

    the association between variables. The first assumption of this method is that there exists a linear relationship (formula 7) between the predictor variables. Formula 7 - Multiple linear regression: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$, where $\beta_i$ are unknown constants, representing the model coefficients, and .

  4. PDF Reliable and Flexible Inference for High Dimensional Data

    This thesis contains three self-contained chapters that adjust different aspects of high dimensional analysis. Chapter 1. A catalytic prior distribution is designed to stabilize a high-dimensional "working model" by shrinking it toward a "simplified model." The shrinkage is achieved by supplementing the observed data with a small amount of "synthetic

  5. PDF Harvard University

    Harvard University

  6. PDF Thesis topics for the master thesis Data Science and Business Analytics

    thesis is an exploration by well-motivated simulation scenarios. (3) Find/collect an appropriate set of data to illustrate the method. The context of the data should be explained, as well as a discussion of the results and an interpretation for the context of the data. Main reference: A. Fisher, C. Rudin, F. Dominici (2019).

  7. PDF DSpace@MIT Home

    DSpace@MIT Home

  8. PDF The Data Science Machine: Emulating Human Intelligence in Data Science

    8-1 Three different data science competitions held during the period of 2014-2015. On the left is the data model for KDD Cup 2014, at the bottom center is the data model for IJCAI, and on the right is the data model of KDD Cup 2015. A total of 906 teams took part in these competitions. We note that two out of three competitions are

  9. PDF Optimization-based Modeling in Investment and Data Science a

    Thesis Outline This dissertation is organized into five parts. Part 1 gives a high-level overview of the content of this dissertation. Part 2 (Chapter 2) is the modern revisit of the classical idea of Kelly gambling using distributional robust ... views on mathematics, on data science, and furthermore, on life choices. His unconditional ...

  10. PDF University of Washington

    University of Washington

  11. PDF Investigating the Impact of Big Data Analytics on Supply Chain

    Thesis Title: Investigating the Impact of Big Data Analytics on Supply Chain Operations: Case Studies from the UK Private Sector. A thesis submitted for the degree of Doctor of Philosophy by Ruaa Hasan, Brunel Business School, Brunel University London, 2021.

  12. Doctor of Data Science and Analytics Dissertations

    The Ph.D. in Data Science and Analytics is an advanced degree with a dual focus of application and research - where students will engage in real world business problems, which will inform and guide their research interests. We launched the first formal PhD program in Data Science in 2015.

  13. Data Science Masters Theses // Arch : Northwestern University

    Data Science Masters Theses. The Master of Science in Data Science program requires the successful completion of 12 courses to obtain a degree. These requirements cover six core courses, a leadership or project management course, two required courses corresponding to a declared specialization, two electives, and a capstone project or thesis.

  14. PDF Customer Segmentation and Targeting by Data Science Methods

    Inwook Moon. Customer Segmentation and Targeting by Data Science Methods. Year: 2020. Number of pages: 45. The objective of this thesis is to perform a segmentation analysis as well as to classify target segment members using given survey data. By analyzing this customer survey data, the purpose of this research is to confirm the.

  15. PDF Data Science for Small Businesses

    Data Science for Small Businesses, by Aveesha Sharma. A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science. Approved April 2016 by the Graduate Supervisory Committee: Arbi Ghazarian, Chair; Ashraf Gaffar; Srividya Bansal.

  16. PhD Dissertations

    PhD Dissertations [All are .pdf files] Probabilistic Reinforcement Learning: Using Data to Define Desired Outcomes, and Inferring How to Get There Benjamin Eysenbach, 2023. Data-driven Decisions - An Anomaly Detection Perspective Shubhranshu Shekhar, 2023. METHODS AND APPLICATIONS OF EXPLAINABLE MACHINE LEARNING Joon Sik Kim, 2023. Applied Mathematics of the Future Kin G. Olivares, 2023

  17. PDF Masters Thesis for Data Science

    Masters Thesis for Data Science. Author: Rana Muhammad Ahmad. Matricola: 1734354. A New Approach to Piping Engineering Data Control and Management in the EPC Company During the Engineering Phase. Acknowledgments

  18. Thesis/Capstone for Master's in Data Science

    Upload your thesis or capstone PDF and any related supplemental files (data, code, images, presentations, documentation, etc.) Select a visibility: Public, Northwestern-only, Embargo (i.e. delayed release) Save your work to the repository; Your thesis manuscript or capstone report will then be published on the MSDS page.

  19. MIT Theses

    MIT's DSpace contains more than 58,000 theses completed at MIT dating as far back as the mid 1800's. Theses in this collection have been scanned by the MIT Libraries or submitted in electronic format by thesis authors. Since 2004 all new Masters and Ph.D. theses are scanned and added to this collection after degrees are awarded.

  20. The Art and Science of Data Analysis

    This thesis aims to utilize data analysis and predictive modeling techniques and apply them in different domains for gaining insights. The topics were chosen keeping the same in mind. Analysis of customer interests is a crucial factor in present marketing trends and hence we worked on twitter data which is a significant part of digital marketing. Neuroscience, especially psychological behavior ...

  21. (PDF) Top 20 Data Science Research Topics and Areas For the 2020-2030

    CART decision tree methodology, classification trees, regression trees, iterative dichotomiser, C4.5, C5.5, decision stump, conditional decision tree, M5, etc. 9. Logistic regression ...

  22. BSc/MSc Thesis

    Thesis template [.zip]; see a sample PDF here. Thesis Presentation Once you have submitted your thesis to the respective examination office (Mrs. Sopka for Computer Science, Mrs. Kiesel for Scientific Computing), together with your advisor, you schedule the presentation of your thesis. Once we have determined a date and time (in the case of a ...