### Current and Past Student Projects

#### African Easterly Waves in Current and Future Climates

##### Victoria Dollar, RTG Seminar Project, Fall 2017, Spring 2018, *Mentor: Moustaoui*

African Easterly Wave (AEW) activity during the most recent decade (2008-2015) is reported and analyzed, and the same methodology is applied to projections for a decade at the end of the century (2090-2099). The data are obtained from assimilated analyses of the National Centers for Environmental Prediction (NCEP) and from climate projections of the Community Earth System Model (CESM). The power spectral density, computed with the multi-taper spectral analysis method and averaged over West Africa and over both decades, shows the dominance of waves with periods of 3-5 days. The spectrum of AEWs in the future climate shifts toward lower frequencies. Idealized simulations support the role of jet intensity in wave activity.
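The multi-taper estimate averages several periodograms of the same series, each computed with a different orthogonal taper, which lowers the variance of the spectral estimate. The pure-Python sketch below assumes sine tapers (the Riedel-Sidorenko family) as a simpler stand-in for the Slepian (DPSS) tapers of Thomson's method, and uses a synthetic 4-day wave sampled every 6 hours; it is not the project's NCEP/CESM processing, and all names are illustrative.

```python
import cmath
import math


def sine_tapers(n, k):
    """Return k orthonormal sine tapers of length n (an assumed stand-in for DPSS tapers)."""
    scale = math.sqrt(2.0 / (n + 1))
    return [
        [scale * math.sin(math.pi * j * (t + 1) / (n + 1)) for t in range(n)]
        for j in range(1, k + 1)
    ]


def multitaper_psd(x, dt, k=3):
    """Average k tapered periodograms; return (frequencies, psd) for bins 1..n/2."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]  # remove the mean so it does not leak into low-frequency bins
    tapers = sine_tapers(n, k)
    freqs, psd = [], []
    for m in range(1, n // 2 + 1):
        power = 0.0
        for h in tapers:
            # Direct DFT of the tapered series at frequency bin m.
            s = sum(h[t] * x[t] * cmath.exp(-2j * math.pi * m * t / n) for t in range(n))
            power += abs(s) ** 2
        freqs.append(m / (n * dt))  # cycles per day when dt is in days
        psd.append(dt * power / k)
    return freqs, psd


# Hypothetical test signal: a 4-day wave sampled every 6 hours for 64 days.
dt = 0.25
series = [math.sin(2 * math.pi * t * dt / 4.0) for t in range(256)]
freqs, psd = multitaper_psd(series, dt)
peak_period = 1.0 / freqs[psd.index(max(psd))]  # in days
```

With k = 3 tapers the peak is smeared over a few neighboring bins, the usual trade of frequency resolution for lower variance in multi-taper estimation.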

#### A Multi-resolution Approach for Superparamagnetic Relaxometry Data

##### Miandra Ellis, RTG Seminar Project, Spring 2018, *Mentor: Renaut*

Superparamagnetic Relaxometry (SPMR) is a novel technique that uses antigen-bound nanoparticles to assist in early cancer detection. A challenge in translating this technique to mainstream clinical applications is reconstructing the bound-particle signal. The primary focus of this semester's work was to determine whether a multi-resolution approach can accurately reconstruct the signal, including the position and magnitude of a source. By reducing the search space, we aimed for a less computationally intensive method. Our results suggest that the multi-resolution approach is promising for accurately localizing the bound particles.

#### Patterns of dropouts and the role of socio-demographic and perception factors for middle school students

##### Bechir Amdouni, RTG Seminar Project, Spring 2018, *Mentor: Mubayi*

Numerous studies have found an impact of gender, race, socioeconomic status (SES), school achievement, school engagement, and academic ability on academic achievement. Few have examined more than one factor at a time, and none has combined all of these factors to study their joint impact on academic achievement. In this paper, we first analyzed data gathered at multiple time points using ordered multinomial logistic regression (OMLR) to identify the main factors of academic achievement. Second, we built a discrete time Markov chain (DTMC) model using the findings from the OMLR.
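The DTMC step can be illustrated by estimating a transition matrix from students' observed state sequences by maximum likelihood (row-normalized transition counts) and then propagating a starting distribution forward in time. The state labels and sequences below are hypothetical placeholders, and the sketch does not reproduce the OMLR step that selected the factors.

```python
from collections import defaultdict


def estimate_transition_matrix(sequences, states):
    """MLE of a DTMC transition matrix: P[i][j] = n_ij / sum_j n_ij."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    matrix = {}
    for a in states:
        total = sum(counts[a].values())
        if total == 0:
            # No observed exits: treat the state as absorbing (an assumption).
            matrix[a] = {b: (1.0 if b == a else 0.0) for b in states}
        else:
            matrix[a] = {b: counts[a][b] / total for b in states}
    return matrix


def propagate(dist, matrix, steps):
    """Distribution after `steps` transitions: p_next[j] = sum_i p[i] * P[i][j]."""
    for _ in range(steps):
        dist = {j: sum(dist[i] * matrix[i][j] for i in matrix) for j in matrix}
    return dist


# Hypothetical achievement states observed at three time points for two students.
states = ["low", "high"]
observed = [["high", "high", "low"], ["high", "low", "low"]]
P = estimate_transition_matrix(observed, states)
after_two = propagate({"low": 0.0, "high": 1.0}, P, 2)
```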

#### Optimal Sampling for Polynomial Data Fitting on Complex Regions

##### Tony Liu, RTG Seminar Project, Spring 2017, Fall 2017, Spring 2018, *Mentor: Platte*

It is well known that polynomial interpolation at equispaced points in one dimension is unstable. In contrast, Chebyshev nodes in one dimension give stable and highly accurate polynomial interpolation. On complex regions in higher dimensions, optimal interpolation points are not well understood. The goals of this project are to find nearly optimal sampling points in one- and two-dimensional domains for interpolation, least-squares fitting, and finite difference approximations. The optimality of the sampling points is assessed using the Lebesgue constant.
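The Lebesgue constant of a node set is the maximum over the domain of the Lebesgue function, the sum over j of |l_j(x)|, where l_j are the Lagrange basis polynomials; it bounds how much interpolation can amplify errors in the data. A minimal one-dimensional sketch, which estimates the maximum on a fine grid (so it approximates from below), contrasts equispaced and Chebyshev points on [-1, 1]; the degree and grid resolution are arbitrary choices.

```python
import math


def lebesgue_constant(nodes, samples=1001):
    """Estimate max over x in [-1, 1] of sum_j |l_j(x)| by sampling x on a uniform grid."""
    best = 0.0
    for i in range(samples):
        x = -1.0 + 2.0 * i / (samples - 1)
        total = 0.0
        for j, xj in enumerate(nodes):
            lj = 1.0  # Lagrange basis polynomial l_j evaluated at x
            for m, xm in enumerate(nodes):
                if m != j:
                    lj *= (x - xm) / (xj - xm)
            total += abs(lj)
        best = max(best, total)
    return best


n = 15  # polynomial degree, i.e. n + 1 nodes
equispaced = [-1.0 + 2.0 * j / n for j in range(n + 1)]
chebyshev = [math.cos(math.pi * j / n) for j in range(n + 1)]  # Chebyshev extrema
lam_equi = lebesgue_constant(equispaced)
lam_cheb = lebesgue_constant(chebyshev)
```

The equispaced constant grows exponentially with n, while the Chebyshev constant grows only logarithmically, which is the one-dimensional behavior the project seeks to reproduce on complex regions.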

#### Nonparametric Subsampling for Big Data

##### Abigael Nachtsheim, RTG Seminar Project, Spring 2018, *Mentor: Stufken*

The desire to build predictive models based on datasets with tens of millions of observations is not uncommon today. However, with large datasets, standard statistical methods for analysis and model building can become infeasible due to computational limitations. One approach is to take a subsample from the full dataset. Standard statistical methods can then be applied to build predictive models using only the subdata. Existing approaches to data reduction often rely on the assumption that the full data follow a specified model (Wang et al., 2017). However, such assumptions are not always applicable, particularly in the big data context. We explore two new methods of subdata selection that do not require model assumptions. These proposed approaches use k-means clustering and space-filling designs in an attempt to spread the subdata uniformly throughout the region of the full data. We perform a simulation study and an analysis of real data to investigate the efficacy of the predictive models that result from these methods.
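One way to read the k-means idea is to cluster the covariates into k groups and keep, from each cluster, the observation closest to its centroid, so that the subdata spread over the region of the full data. The sketch below is a plain Lloyd's-algorithm implementation under that assumed selection rule; the project's exact rule, its cluster count, and the function names are not taken from the source, and the space-filling-design variant is not shown.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_subdata(points, k, iters=10, seed=0):
    """Return indices of representatives: the member nearest each k-means centroid."""
    rng = random.Random(seed)
    dim = len(points[0])
    centers = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(idx)
        # Update step: move each non-empty center to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centers[c] = tuple(
                    sum(points[i][d] for i in members) / len(members) for d in range(dim)
                )
    # One representative per non-empty cluster.
    return [
        min(members, key=lambda i: sq_dist(points[i], centers[c]))
        for c, members in enumerate(clusters)
        if members
    ]


gen = random.Random(1)
full = [(gen.random(), gen.random()) for _ in range(1000)]
subdata_idx = kmeans_subdata(full, k=20)
```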

#### Modeling motor-cargo complexes through particle filtering and the EM algorithm

##### Lauren Crow, RTG Seminar Project, Fall 2017, *Mentor: Fricks*

The movement of proteins is a biophysical process involving transient binding of particles to a microtubule. Specifically, different types of motors aid in the transport of cargo such as vesicles and organelles. The movement is modeled as a series of switches, governed by a Poisson process, between two possible states: random diffusion and directed Brownian movement. From observed data corrupted by assumed Gaussian error, the true movement of the cargo and the regime switches are estimated. The estimates are based on the stochastic Expectation-Maximization (EM) algorithm, which combines a particle filter with maximum likelihood estimation. The method is first tested in a simulation study and then applied to real data.
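The filtering half of this approach can be sketched with a bootstrap particle filter for an assumed discretized model: each particle carries a position and a regime (0 = diffusive, 1 = directed with velocity v), the regime flips with probability switch_p per time step (a discrete stand-in for Poisson switching), and observations add Gaussian noise with standard deviation obs_sd. All parameter values and names are hypothetical, and the EM/maximum likelihood step that would estimate the parameters is not shown.

```python
import math
import random


def simulate(T, v, diff_sd, switch_p, obs_sd, rng):
    """Generate hypothetical true positions, regimes, and noisy observations."""
    x, regime = 0.0, 0
    xs, regimes, ys = [], [], []
    for _ in range(T):
        if rng.random() < switch_p:
            regime = 1 - regime
        x += v * regime + rng.gauss(0.0, diff_sd)
        xs.append(x)
        regimes.append(regime)
        ys.append(x + rng.gauss(0.0, obs_sd))
    return xs, regimes, ys


def particle_filter(ys, v, diff_sd, switch_p, obs_sd, n_particles, rng):
    """Bootstrap filter: returns filtered mean position and P(directed) per step."""
    parts = [(0.0, 0) for _ in range(n_particles)]
    means, p_directed = [], []
    for y in ys:
        # Propagate every particle through the assumed switching dynamics.
        moved = []
        for x, r in parts:
            if rng.random() < switch_p:
                r = 1 - r
            moved.append((x + v * r + rng.gauss(0.0, diff_sd), r))
        # Gaussian observation likelihood; subtract the max log-weight for stability.
        logw = [-0.5 * ((y - x) / obs_sd) ** 2 for x, _ in moved]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        means.append(sum(wi * x for wi, (x, _) in zip(w, moved)))
        p_directed.append(sum(wi for wi, (_, r) in zip(w, moved) if r == 1))
        # Multinomial resampling.
        parts = rng.choices(moved, weights=w, k=n_particles)
    return means, p_directed


rng = random.Random(42)
params = dict(v=1.0, diff_sd=0.2, switch_p=0.05, obs_sd=0.5)
true_x, true_r, obs = simulate(200, rng=rng, **params)
est_x, p_dir = particle_filter(obs, n_particles=300, rng=rng, **params)
```

In a stochastic EM loop, filtered or smoothed trajectories like these would feed the M-step that re-estimates v, the diffusion scale, and the switching rate.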

#### Methods for Handling Imbalanced Datasets

##### Miandra Ellis, RTG Seminar Project, Fall 2017, *Mentor: Swanson*

Motivated by a comparison between classifiers built from balanced and imbalanced datasets, this project aimed to address issues caused by imbalanced training data when using the soft-margin Support Vector Machine. Oversampling and the Synthetic Minority Oversampling Technique (SMOTE) were used to balance the training dataset, illustrating how these resampling techniques can alleviate problems arising from imbalance. We conclude that both resampling approaches can increase the specificity of a classifier.
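SMOTE creates synthetic minority examples by interpolating between a minority sample and one of its k nearest minority-class neighbors, whereas plain oversampling duplicates existing minority points. A minimal pure-Python sketch of both (function names, k, and the toy data are illustrative assumptions, not the project's code):

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points on segments between minority neighbors."""
    rng = random.Random(seed)

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbors of the chosen point (excluding itself).
        others = [j for j in range(len(minority)) if j != i]
        neighbors = sorted(others, key=lambda j: sq_dist(base, minority[j]))[:k]
        nb = minority[rng.choice(neighbors)]
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, nb)))
    return synthetic


def random_oversample(minority, n_new, seed=0):
    """Plain oversampling: duplicate randomly chosen minority points."""
    rng = random.Random(seed)
    return [rng.choice(minority) for _ in range(n_new)]


gen = random.Random(3)
minority = [(gen.gauss(2.0, 0.5), gen.gauss(2.0, 0.5)) for _ in range(20)]
new_points = smote(minority, n_new=80)
copies = random_oversample(minority, n_new=80)
```

Because each synthetic point lies between two real minority points, SMOTE fills in the minority region instead of stacking repeated copies on the same locations.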

#### Assorted Methods for Predicting Superior Soybean Varieties

##### Camille Moyer, RTG Seminar Project, Fall 2017, *Mentor: Armbruster*

While genetic modification of soybeans has allowed farmers to increase their yields over the years, models for predicting which genetic strain will be most successful in particular regions have fallen behind. This project uses three methods to construct viable prediction models for newly created soybean varieties: clustering methods, Kalman filtering, and parenclitic networks.

#### Augmenting Definitive Screening Designs for Prediction Under Second-Order Models

##### Abigael Nachtsheim, RTG Seminar Project, Fall 2017, *Mentor: Stufken*

Jones and Nachtsheim (2011) introduced a class of three-level screening designs called definitive screening designs (DSDs). The structure of these designs results in the statistical independence of main effects and two-factor interactions, the absence of complete confounding among two-factor interactions, and the ability to estimate all quadratic effects. In this paper, we explore the construction of a series of augmented designs, moving from the starting DSD to designs comparable in sample size to central composite designs. We perform a simulation study that computes the predictive mean squared error of each design to determine the number of augmented runs needed to fit the correct second-order model effectively.
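One known way to build a DSD for an even number m of factors is to fold over a conference matrix C (an m × m matrix with zero diagonal, ±1 off-diagonal entries, and C^T C = (m - 1) I) and append a center run, giving 2m + 1 runs. The fold-over is what makes every main-effect column orthogonal to every two-factor interaction column. The sketch below assumes m = 6 and uses the Paley construction with the prime q = 5; it shows only a starting design, not the augmentation strategy studied in the project, and the function names are illustrative.

```python
from itertools import combinations


def legendre(a, q):
    """Quadratic character mod prime q: 0, +1 (residue), or -1 (non-residue)."""
    a %= q
    if a == 0:
        return 0
    return 1 if pow(a, (q - 1) // 2, q) == 1 else -1


def paley_conference(q):
    """Symmetric conference matrix of order q + 1 for a prime q with q % 4 == 1."""
    size = q + 1
    C = [[0] * size for _ in range(size)]
    for j in range(1, size):
        C[0][j] = C[j][0] = 1
    for i in range(q):
        for j in range(q):
            C[i + 1][j + 1] = legendre(j - i, q)
    return C


def definitive_screening_design(q):
    """Rows of C, rows of -C, and one center run (all zeros)."""
    C = paley_conference(q)
    return C + [[-v for v in row] for row in C] + [[0] * len(C)]


def max_abs_main_2fi_inner_product(D):
    """Largest |sum over runs of x_a * x_b * x_c| for main effect a and interaction b*c."""
    m = len(D[0])
    return max(
        abs(sum(row[a] * row[b] * row[c] for row in D))
        for a in range(m)
        for b, c in combinations(range(m), 2)
    )


D = definitive_screening_design(5)  # 13 runs, 6 factors
m = len(D[0])
```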

#### Regularization with Shot Noise: A Bayesian Approach

##### Joe Sadow, RTG Seminar Project, Fall 2017, *Mentor: Sanders*

In this paper, we consider inverse problems in the presence of Poisson noise. A probabilistic treatment of the noisy regularization problem allows a more comprehensive quantification of uncertainty. The Bayesian framework for optimization is explored by adding data-oriented terms to the image reconstruction problem, and the results are compared with classical function-space optimization techniques. The reconstruction approach is described and implemented for image data containing Poisson noise, a situation relevant to many particle-counting imaging problems.
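Under Poisson noise, the data misfit is the negative log-likelihood, the sum over i of (Af)_i - b_i log (Af)_i, rather than a squared norm. A standard way to minimize it over nonnegative f is the Richardson-Lucy (EM) iteration f ← f · A^T(b / Af) / A^T 1, which does not increase this objective at any step. The sketch below is that maximum-likelihood iteration only (no prior or regularization term), with a hypothetical 1-D blur and synthetic, noise-free counts; it is offered as a baseline for a Bayesian formulation, not as the project's method.

```python
import math


def blur_matrix(n):
    """Hypothetical 1-D blur: each pixel spreads 0.25/0.5/0.25 to its neighbors."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j, w in ((i - 1, 0.25), (i, 0.5), (i + 1, 0.25)):
            if 0 <= j < n:
                A[i][j] = w
    return A


def matvec(A, f):
    return [sum(a * x for a, x in zip(row, f)) for row in A]


def rmatvec(A, y):
    """Compute A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def poisson_nll(A, f, b):
    """Negative Poisson log-likelihood, dropping the constant log(b!) terms."""
    return sum(m - bi * math.log(m) for m, bi in zip(matvec(A, f), b))


def richardson_lucy(A, b, iters=50):
    f = [1.0] * len(A[0])                 # positive starting image
    norm = rmatvec(A, [1.0] * len(A))     # A^T 1 (column sums of A)
    for _ in range(iters):
        ratio = [bi / max(m, 1e-12) for bi, m in zip(b, matvec(A, f))]
        back = rmatvec(A, ratio)
        f = [fj * bj / nj for fj, bj, nj in zip(f, back, norm)]
    return f


n = 16
A = blur_matrix(n)
truth = [2.0 + (10.0 if 5 <= j <= 8 else 0.0) for j in range(n)]
counts = [round(m) for m in matvec(A, truth)]  # synthetic integer counts, no random noise
f_hat = richardson_lucy(A, counts)
```

The multiplicative update keeps the estimate nonnegative, which matches the physics of particle counts; a Bayesian version would add a log-prior term to the objective.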

#### Image processing tools for energy dispersive X-ray (EDX) imaging

##### Michael Byrne, RTG Seminar Project, Fall 2017, *Mentor: Sanders*

Energy dispersive X-ray (EDX) spectroscopy is a technique used to determine the chemical composition of a sample. The sample is exposed to an excitation energy, triggering atomic reactions that result in X-ray emission. The number of emitted X-rays is recorded at each energy level, and the result is a spectrum with peaks for different elements at particular energy levels. From a series of spectra, an image of the density of each element in the sample can be recovered. While EDX spectroscopy offers the power to resolve the density of each element in the sample, generating an image for each element is nontrivial. In this paper, we explore image processing tools, such as low-pass filters and principal component analysis, that can be used to produce improved images from EDX spectroscopy data. Once we understand how these tools affect the resulting images, we hope to implement more advanced image reconstruction tools to improve image formation.
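A low-pass filter suppresses pixel-to-pixel counting noise in an element map at the cost of some spatial resolution. As a minimal sketch, assuming an element map stored as a list of rows, a 3×3 mean filter (averaging only over neighbors that exist at the borders) looks like this; the filter size and test images are arbitrary choices, and principal component analysis is not shown.

```python
def mean_filter_3x3(img):
    """Replace each pixel by the mean of its 3x3 neighborhood (clipped at borders)."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [
                img[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ]
            out[r][c] = sum(vals) / len(vals)
    return out


flat = [[5.0] * 6 for _ in range(6)]
checker = [[1.0 if (r + c) % 2 == 0 else -1.0 for c in range(6)] for r in range(6)]
smoothed_flat = mean_filter_3x3(flat)        # a constant map passes through unchanged
smoothed_checker = mean_filter_3x3(checker)  # the highest-frequency pattern is damped
```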

#### Function Approximation on Spherical Domains

##### Genesis Islas, RTG Seminar Project, Spring 2017, *Mentor: Platte*

This project investigates a gridding technique for function approximation on a spherical domain, motivated by problems that arise in atmospheric research. The goal is to study the discretization based on the cubed-sphere domain decomposition. This method decomposes the sphere into six identical regions, in which uniformly distributed nodes on the faces of the cube map onto nearly uniformly distributed nodes on the sphere. We contrast this with the latitude-longitude discretization, in which the change of coordinates completely destroys the uniformity of the node distribution and results in oversampling near the poles. The effect of using different sampling distributions for function approximation is explored.
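The contrast can be quantified by comparing nearest-neighbor spacings. The sketch below assumes an equidistant gnomonic cubed sphere (cell-centered uniform nodes on each face of the cube [-1, 1]^3, projected radially onto the unit sphere) and a cell-centered latitude-longitude grid with the same number of nodes, and reports the ratio of the largest to the smallest nearest-neighbor great-circle distance; the resolutions and names are arbitrary choices.

```python
import math


def normalize(v):
    r = math.sqrt(sum(c * c for c in v))
    return tuple(c / r for c in v)


def cubed_sphere(m):
    """m x m cell-centered nodes on each of the 6 cube faces, projected to the sphere."""
    ticks = [-1.0 + (2 * i + 1) / m for i in range(m)]
    pts = []
    for axis in range(3):
        for sign in (-1.0, 1.0):
            for u in ticks:
                for v in ticks:
                    p = [u, v]
                    p.insert(axis, sign)  # fix one coordinate to +/-1 (the face)
                    pts.append(normalize(p))
    return pts


def lat_lon(n_lat, n_lon):
    """Cell-centered latitude-longitude nodes on the unit sphere."""
    pts = []
    for i in range(n_lat):
        lat = -math.pi / 2 + (i + 0.5) * math.pi / n_lat
        for j in range(n_lon):
            lon = (j + 0.5) * 2 * math.pi / n_lon
            pts.append((math.cos(lat) * math.cos(lon),
                        math.cos(lat) * math.sin(lon),
                        math.sin(lat)))
    return pts


def spacing_ratio(pts):
    """max / min over nodes of the great-circle distance to the nearest other node."""
    nn = []
    for i, p in enumerate(pts):
        best = math.pi
        for j, q in enumerate(pts):
            if i != j:
                dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
                best = min(best, math.acos(dot))
        nn.append(best)
    return max(nn) / min(nn)


cube = cubed_sphere(8)   # 6 * 64 = 384 nodes
grid = lat_lon(16, 24)   # 384 nodes
cube_ratio = spacing_ratio(cube)
grid_ratio = spacing_ratio(grid)
```

On the latitude-longitude grid the east-west spacing shrinks with the cosine of latitude, so the ratio is dominated by the polar rows; on the cubed sphere the spacing varies only between face centers and cube corners.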


#### Tomography and Sampling

##### Joe Sadow, RTG Seminar Project, Spring 2017, *Mentor: Sanders*

The purpose of this project is to motivate and develop the general tomographic imaging problem. The Radon transform and its close relationship with the classical Fourier transform are established. The inverse problem is defined, along with an exploration of related iterative reconstruction schemes. The optimal use of sampling patterns is also explored.
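For a discrete image, a simple (if crude) approximation of the Radon transform is pixel-driven: each pixel's value is added to the detector bin nearest to s = x cos θ + y sin θ. The sketch below assumes a detector bin width of one pixel and a toy image; it shows that every projection preserves the total mass, and that the θ = 0 projection is just the column sums, whose 1-D DFT equals the k_y = 0 slice of the image's 2-D DFT (the Fourier slice theorem in its simplest discrete case). Names and sizes are illustrative.

```python
import cmath
import math


def radon(img, angles):
    """Pixel-driven parallel-beam projections of a square image (list of rows)."""
    n = len(img)
    pad = math.ceil(0.21 * n) + 1  # margin so rotated pixels stay on the detector
    n_det = n + 2 * pad
    c_img, c_det = (n - 1) / 2.0, (n_det - 1) / 2.0
    sinogram = []
    for theta in angles:
        ct, st = math.cos(theta), math.sin(theta)
        proj = [0.0] * n_det
        for y in range(n):
            for x in range(n):
                s = (x - c_img) * ct + (y - c_img) * st
                proj[int(math.floor(s + c_det + 0.5))] += img[y][x]
        sinogram.append(proj)
    return sinogram, pad


def dft(v):
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


img = [[float((x * 3 + y * 5) % 7) for x in range(8)] for y in range(8)]
sino, pad = radon(img, [0.0, math.pi / 6, math.pi / 4, math.pi / 2])
col_sums = sino[0][pad:pad + 8]      # theta = 0 projection
col_spectrum = dft(col_sums)
```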


#### Model Selection and Data with Asymmetric Distribution Testing Using the IBOSS Approach

##### John Stockton, RTG Seminar Project, Fall 2016 and Spring 2017, *Mentor: Stufken*

With the increasing need to analyze data sets with potentially billions of entries and thousands of predictor variables, many methods have been proposed for studying these so-called “big data” sets efficiently, including the recently proposed Information-Based Optimal Subdata Selection (IBOSS) method. Preliminary studies have shown the advantages of this method over previously introduced methods, such as uniform sampling and leverage-based sampling, in terms of the linear regression model fitted to the subdata selected by each method, using a variety of simulated data sets and some real data sets. In Fall 2016, I conducted preliminary studies on the distribution of simulated data sets and found that the process succeeds when the distribution used to generate the covariates is roughly symmetric; in all cases, however, the responses were generated from a linear model, and a linear model was fit to the subdata. This naturally raises the question of how well the IBOSS algorithm performs in basic model selection. In this project, I study how model selection performs with IBOSS when two-factor interaction terms are involved. Additionally, I explore the effects that skewed predictor data have on subdata selection methods.
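For linear regression with p covariates, the IBOSS selection rule goes through the covariates one at a time and, for each, takes the r = k/(2p) remaining observations with the smallest values and the r with the largest values of that covariate. A minimal sketch of that rule (it sorts, where a large-scale version would use linear-time partial selection; the function name and the toy data, including the skewed second covariate, are assumptions):

```python
import random


def iboss(X, k):
    """Indices of 2 * p * (k // (2p)) rows chosen by the IBOSS extreme-value rule."""
    p = len(X[0])
    r = k // (2 * p)
    if r < 1:
        raise ValueError("k must be at least 2 * (number of covariates)")
    remaining = set(range(len(X)))
    chosen = []
    for j in range(p):
        order = sorted(remaining, key=lambda i: X[i][j])
        picked = order[:r] + order[-r:]  # r smallest and r largest of covariate j
        chosen.extend(picked)
        remaining.difference_update(picked)
    return chosen


gen = random.Random(7)
X = [(gen.gauss(0.0, 1.0), gen.expovariate(1.0)) for _ in range(1000)]  # second column skewed
sub = iboss(X, k=40)
```

With a right-skewed covariate, the r "smallest" points cluster tightly near the lower bound while the r "largest" spread far into the tail, which is one way skewness can change what the selected subdata look like.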


#### Leverage Subsampling in Multivariate-Multinormally-Distributed Data

##### Lauren Crow, RTG Seminar Project, Fall 2016 and Spring 2017, *Mentors: Stufken and Cochran*

Big data analysis has been on the rise, and with it the need for new research methods. One area of focus is subdata selection. This project discusses and compares several subdata selection methods: basic leverage sampling (BLEV), shrinkage leverage sampling (SLEV), unweighted leverage sampling (LEVUNW), and uniform sampling (UNIF). An in-depth comparison using mean squared error on simulated data as the criterion showed that the unweighted leverage sampling method gave the most accurate estimates of the true parameters among these four methods, making leverage-based subsampling a valuable approach to modeling big data. However, this was established only under the assumption of an ordinary linear model with one response and independent, normally distributed errors. To see whether the results hold in other circumstances, three new models involving multivariate-multinormally-distributed data were proposed. Each of the three models had ten parameters and two responses, although they could be generalized to more responses or a different number of parameters. In the first, the errors were independent and identically distributed. In the second, the errors remained independent but had a different variance for each response. Finally, the third model had different levels of dependence among the errors, causing correlation among both the errors and the responses. Leverage sampling performed well on multivariate data with and without the assumption of independent and identically distributed errors, with unweighted leverage sampling consistently performing best. That is, the previous results extend to these new types of models. Although the methods were implemented on data of manageable size, they can be applied to much larger multivariate systems and to real rather than simulated data.
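For a design matrix X, the leverage of row i is h_ii = x_i^T (X^T X)^{-1} x_i, and the leverages sum to the number of columns p. Leverage sampling draws subdata rows with probability h_ii / p; BLEV then refits with weights inversely proportional to those probabilities, whereas LEVUNW fits ordinary least squares to the sampled rows without reweighting. The sketch below covers only a single covariate plus intercept (p = 2), where h_ii = 1/n + (x_i - x̄)^2 / Σ_j (x_j - x̄)^2; the multivariate, two-response models of the project are not shown, and the data and names are hypothetical.

```python
import random


def leverages(x):
    """Leverage scores for the design matrix with rows (1, x_i)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    return [1.0 / n + (v - xbar) ** 2 / sxx for v in x]


def ols_line(xs, ys):
    """Ordinary least squares intercept and slope."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    slope = sum((a - xm) * (b - ym) for a, b in zip(xs, ys)) / sum((a - xm) ** 2 for a in xs)
    return ym - slope * xm, slope


def levunw(x, y, r, seed=0):
    """Sample r rows with probability h_ii / p (with replacement), then unweighted OLS."""
    rng = random.Random(seed)
    h = leverages(x)
    idx = rng.choices(range(len(x)), weights=h, k=r)  # weights need not be normalized
    return ols_line([x[i] for i in idx], [y[i] for i in idx])


gen = random.Random(5)
x = [gen.gauss(0.0, 1.0) for _ in range(2000)]
y = [1.0 + 2.0 * v + gen.gauss(0.0, 0.1) for v in x]
h = leverages(x)
intercept, slope = levunw(x, y, r=100)
```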


#### Nonuniform Fast Fourier Transforms

##### Tony Liu, RTG Seminar Project, Fall 2016, *Mentors: Sanders and Platte*

The Fast Fourier Transform (FFT) allows efficient computation of the Discrete Fourier Transform (DFT), which decomposes a set of values into its frequency components. The FFT and its inverse are widely used in science, engineering, mathematics, and medicine. The FFT reduces the computational cost of the DFT from O(n^2) to O(n log n); however, the FFT requires data that are uniformly spaced in both the time and frequency domains. In many applications, samples are nonuniform and repeated Fourier transforms are required. To overcome these computational limitations, nonuniform FFTs (NUFFTs) are often used, and in recent years a number of algorithms have been developed for this type of problem. These NUFFTs combine interpolation with the traditional FFT applied on an oversampled uniform grid. This project covers the basics of the Fourier transform and the DFT, the derivation of the FFT, the motivation for NUFFTs, and the derivation of one NUFFT algorithm.
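The O(n log n) cost comes from splitting a length-n DFT into DFTs of its even- and odd-indexed halves and recursing. The pure-Python sketch below (power-of-two lengths only) implements that radix-2 recursion, together with a direct O(nM) nonuniform DFT, F_k = Σ_j c_j exp(-2πi k t_j) for sample locations t_j in [0, 1), which is the quantity a NUFFT approximates quickly. The interpolation/gridding step of an actual NUFFT is not shown, and the names are illustrative.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dft(x):
    """Direct O(n^2) DFT: X_k = sum_t x_t exp(-2 pi i k t / n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def nudft(c, t, n_freq):
    """Direct nonuniform DFT: F_k = sum_j c_j exp(-2 pi i k t_j), k = 0..n_freq-1."""
    return [sum(cj * cmath.exp(-2j * cmath.pi * k * tj) for cj, tj in zip(c, t))
            for k in range(n_freq)]


signal = [complex(v, -v / 2) for v in (1.0, 3.0, -2.0, 0.5, 4.0, 0.0, -1.0, 2.5)]
fast, slow = fft(signal), dft(signal)
uniform_t = [j / len(signal) for j in range(len(signal))]
via_nudft = nudft(signal, uniform_t, len(signal))  # reduces to the DFT on a uniform grid
```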


#### Signal Reconstruction using Least Absolute Errors

##### Genesis Islas, RTG Seminar Project, Fall 2016, *Mentors: Sanders and Platte*

This project compares the l1 and l2 norms for reconstructing a signal from noisy measurements. Suppose the unknown f is an n×1 vector that we would like to recover from a given data vector b, where Af + e = b. Here, A is m×n and e is an unknown vector of errors. Then f can be approximated by solving the minimization problem min_f ||Af − b||. A popular choice is least squares, which minimizes the l2 norm. However, least squares can perform poorly when a few of the errors have large magnitude. This motivates solving the minimization problem with the l1 norm instead. It has been shown that if certain conditions on both A and e are met, solving the minimization problem with the l1 norm is equivalent to solving it with l0. In this project, we explore numerical examples that illustrate the effectiveness of recovering a signal using the l1 norm.
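A few gross errors are enough to separate the two norms. The sketch below fits a line by least squares (l2) and by least absolute errors (l1), the latter via iteratively reweighted least squares with weights 1/max(|r_i|, eps); IRLS is one standard way to approximate the l1 solution, not necessarily the solver used in the project, and the data are synthetic.

```python
def weighted_line_fit(xs, ys, ws):
    """Minimize sum_i w_i (y_i - a - b x_i)^2; returns (a, b)."""
    sw = sum(ws)
    xm = sum(w * x for w, x in zip(ws, xs)) / sw
    ym = sum(w * y for w, y in zip(ws, ys)) / sw
    b = sum(w * (x - xm) * (y - ym) for w, x, y in zip(ws, xs, ys)) / sum(
        w * (x - xm) ** 2 for w, x in zip(ws, xs)
    )
    return ym - b * xm, b


def l1_line_fit(xs, ys, iters=50, eps=1e-8):
    """Approximate argmin sum_i |y_i - a - b x_i| by IRLS, starting from least squares."""
    a, b = weighted_line_fit(xs, ys, [1.0] * len(xs))
    for _ in range(iters):
        ws = [1.0 / max(abs(y - a - b * x), eps) for x, y in zip(xs, ys)]
        a, b = weighted_line_fit(xs, ys, ws)
    return a, b


xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x for x in xs]
for i in (3, 10, 16):  # a few large errors
    ys[i] += 50.0
l2_fit = weighted_line_fit(xs, ys, [1.0] * len(xs))
l1_fit = l1_line_fit(xs, ys)
```

The least-squares intercept is pulled far above the true value of 1 by the three outliers, while the l1 fit essentially ignores them and recovers the underlying line.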


#### Deep Learning on 3D Geometries

##### Hope Yao, RTG Seminar Project, Fall 2016, *Mentor: Sanders*

This project extended the traditional 2D convolutional neural network to 3D. Fourier-domain convolution is investigated to handle the increased computational cost of the extra dimension. Numerical results show that our model achieves a nine percent test error on the ModelNet10 dataset, which is comparable to the best result reported in 2015.
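Fourier convolution rests on the convolution theorem: a circular convolution in space is a pointwise product in frequency, so a forward transform of each volume, a product, and one inverse transform replace the sliding-window sum. The pure-Python sketch below checks this on small 4×4×4 volumes using a naive 3-D DFT; the savings in practice come from using an FFT instead, and the sizes and names are arbitrary assumptions, with no neural network involved.

```python
import cmath
import itertools
import random


def dft3(vol, inverse=False):
    """Naive 3-D DFT of an n x n x n nested list (inverse includes the 1/n^3 factor)."""
    n = len(vol)
    sign = 1 if inverse else -1
    idx = list(itertools.product(range(n), repeat=3))
    out = [[[0j] * n for _ in range(n)] for _ in range(n)]
    for k in idx:
        total = 0j
        for x in idx:
            phase = sum(a * b for a, b in zip(k, x)) / n
            total += vol[x[0]][x[1]][x[2]] * cmath.exp(sign * 2j * cmath.pi * phase)
        out[k[0]][k[1]][k[2]] = total / n ** 3 if inverse else total
    return out


def circular_conv3(a, b):
    """Direct circular convolution: c[x] = sum_y a[y] * b[(x - y) mod n]."""
    n = len(a)
    idx = list(itertools.product(range(n), repeat=3))
    c = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for x in idx:
        c[x[0]][x[1]][x[2]] = sum(
            a[y[0]][y[1]][y[2]]
            * b[(x[0] - y[0]) % n][(x[1] - y[1]) % n][(x[2] - y[2]) % n]
            for y in idx
        )
    return c


def fourier_conv3(a, b):
    """Same convolution via the convolution theorem: IDFT(DFT(a) * DFT(b))."""
    n = len(a)
    fa, fb = dft3(a), dft3(b)
    prod = [[[fa[i][j][k] * fb[i][j][k] for k in range(n)] for j in range(n)] for i in range(n)]
    return dft3(prod, inverse=True)


gen = random.Random(0)
a = [[[gen.random() for _ in range(4)] for _ in range(4)] for _ in range(4)]
b = [[[gen.random() for _ in range(4)] for _ in range(4)] for _ in range(4)]
direct = circular_conv3(a, b)
spectral = fourier_conv3(a, b)
```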


#### Bootstrapping in the Context of Big Data

##### Shantrue John Chang, RTG Seminar Project, Fall 2016, *Mentor: Cochran*

The bootstrap provides a simple but powerful way of assessing the quality of estimators (“assessors”). However, when working with big or massive data sets, most computers cannot keep up with the computationally demanding bootstrap procedure. Variants of the bootstrap have been developed to reduce this computational cost. This project explores the Bag of Little Bootstraps, a bootstrap technique proposed for big and massive data.
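The Bag of Little Bootstraps draws s small subsets of size b = n^γ (with γ between 0.5 and 1), and within each subset simulates r full-size resamples of size n by drawing multinomial counts over the b points, so each estimate touches only b distinct values. The per-subset quality assessments (here, the standard error of the mean) are then averaged. A pure-Python sketch for the sample mean, with arbitrary choices of s, r, γ, and data:

```python
import random
import statistics


def blb_standard_error(data, s=5, r=30, gamma=0.6, seed=0):
    """Bag of Little Bootstraps estimate of the standard error of the mean."""
    rng = random.Random(seed)
    n = len(data)
    b = int(n ** gamma)
    per_subset = []
    for _ in range(s):
        subset = rng.sample(data, b)  # subset drawn without replacement
        estimates = []
        for _ in range(r):
            # Multinomial counts: a size-n resample supported on the b subset points.
            counts = [0] * b
            for i in rng.choices(range(b), k=n):
                counts[i] += 1
            estimates.append(sum(c * v for c, v in zip(counts, subset)) / n)
        per_subset.append(statistics.stdev(estimates))
    return sum(per_subset) / s


gen = random.Random(11)
sample = [gen.gauss(0.0, 1.0) for _ in range(2000)]
se_blb = blb_standard_error(sample)
se_formula = statistics.stdev(sample) / len(sample) ** 0.5
```

For the mean, the textbook formula gives a direct check on the BLB value; the point of BLB is that the same procedure applies to estimators with no closed-form standard error.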