This is a graduate-level seminar on astrophysical data analysis. Class time will be split between lectures introducing key concepts and hacks using the AstroML Python package provided by the textbook authors.

Instructor: Prof. Eric Gawiser, Serin 303W, 848-445-8874, gawiser@physics.rutgers.edu
Lecture | Date | Topic | Text Chapter | Hacks & Discussions |
---|---|---|---|---|
1 | Sep 9 | Intro | 1, Appendices | Syllabus preferences and term project design. How to determine whether a set of points follows a correlation. Installation of Python and the AstroML package. |
2 | Sep 16 | Algorithms and computational efficiency | 2 | Read in both versions of the SDSS Stripe 82 standard star catalog; reproduce textbook Fig. 1.6 for both versions of that catalog; note and explain the differences between the versions. Produce Figs. 1.9 and 1.10; tune the contours/colorbars to best represent the data; can you develop an even better visualization of these data? |
3 | Sep 23 | Review of probability and statistics | 3 | Choose your favorite type of tree and use it to find the nearest neighbor of the star located farthest toward the lower right of Fig. 1.6; compare the run-time to brute force. Use Bayes' rule to solve the N=3 version of the Monty Hall problem for rules of Type I (host never reveals a car) and Type II (host chooses randomly which door to open); perform a Monte Carlo simulation to check your derivation for each type of game. |
4 | Sep 30 | Frequentist vs. Bayesian approaches | 4A | Choose a panel of Fig. 3.23; predict $\sigma_x$, $\sigma_y$, $\sigma_{xy}$, $\sigma_1$, $\sigma_2$, and $\alpha$ by eye; fit a bivariate Gaussian distribution and compare your results to those predictions; plot the residuals perpendicular to the major axis of the ellipse; find a distribution that provides a decent fit to these residuals and report its best-fit parameters; now repeat all of this for Fig. 1.6. |
5 | Oct 7 | Classical statistical inference | 4B | Term project pitches. Reproduce one of the panels of Fig. 3.24; determine how many resamplings are needed to see the difference between the no-outlier and outlier cases; now use 10% outliers: how many resamplings are needed? |
6 | Oct 14 | Bayesian statistical inference | 5A | Discussion of term project roles for advisors and students. Figure out what went wrong for $\sigma_G^*$ in the left panel of Fig. 4.4; what could be done to avoid this? Produce the right panel of Fig. 4.4, and see if you can improve the $\sigma_G$ behavior. |
7 | Oct 21 | Markov Chain Monte Carlo | 5B | Reproduce Fig. 5.6; simplify the error model to one you think is realistic for astronomical data; describe how the distribution and Gaussian fits change; can you develop an error model that makes the two Gaussian fits agree well? |
8 | Oct 28 | Density estimation | 6 | Mid-course evaluations. Reproduce Fig. 5.26; note whether your MCMC parameter contours match precisely, and explain why or why not; now conduct a test for convergence. |
9 | Nov 4 | Dimensionality Reduction | 7A | Ch. 6 features visualizations of the SDSS Great Wall in Figs. 6.3, 6.4, 6.7, and 6.15; can you improve on these? Options include Epanechnikov and cosine kernels, varying h, varying K, Kth-nearest vs. all-K-nearest neighbors, and color vs. B/W display. |
10 | Nov 11 | Data Mining | 7B | Reproduce the right panel of Fig. 6.17. How do run-time and performance vary if you switch from the Landy-Szalay (L-S) estimator (Eq. 6.45) to the naive estimator (Eq. 6.44)? What if you sub-sample the galaxies? Do the error bars change as you would expect if you increase the number of bootstrap samples? |
11 | Nov 25 | Regression | 8A | Use the SDSS spectra of Section 1.5.5 to reproduce Fig. 7.4. Now pick your favorite method (column) and see how the results change as (where possible) you toggle each of normalization, whitening, and mean-subtraction. Can you do anything to make the results more physically meaningful? |
12 | Dec 2 | Model Fitting | 8B, Hogg, Bovy & Lang 2010 | Determine if the Lyman Alpha Emitters from Vargas+14 and Hagen+14 lie above the z=2 Star Formation Rate-Stellar Mass correlation reported by Kurczynski+16 |
13 | Dec 9 | Classification | 9 | Presentations I. Plot the simulated supernova data of Fig. 8.2. Choose a regression method for which cross-validation is applicable. Use cross-validation to optimize the "hyperparameters" of the method. To the extent possible, predict the distribution of future data. |
14 | Dec 11, 12-3 PM | Time series | 10 | Presentations II. Discussion of modifying this course for future offerings. Several figures in Chapter 9 (starting with Fig. 9.3) illustrate the multi-color classification of RR Lyrae vs. main-sequence stars. Develop a rough-but-realistic figure of merit for the S/N of some measurement that includes terms for contamination and incompleteness. Use it to select a preferred classification method for RR Lyrae vs. main-sequence stars, and use training and cross-validation as appropriate to predict the value of the figure of merit that will be achieved with similar future data. |
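
For the Lecture 1 question of how to decide whether a set of points follows a correlation, one standard option is a permutation test on Pearson's correlation coefficient. The sketch below uses only NumPy on synthetic data; the function names are illustrative and are not part of AstroML. Rank-based statistics (Spearman, Kendall) can be dropped into the same permutation loop.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

def permutation_test(x, y, n_perm=5000, seed=0):
    """Return (r, p): observed r and a two-sided permutation p-value.

    Shuffling y destroys any x-y association while keeping both marginal
    distributions, so the shuffled r values sample the no-correlation null.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    r_obs = pearson_r(x, y)
    r_null = np.array([pearson_r(x, rng.permutation(y)) for _ in range(n_perm)])
    # The +1 terms count the observed arrangement itself, so p is never exactly 0
    p = (1 + np.sum(np.abs(r_null) >= abs(r_obs))) / (n_perm + 1)
    return r_obs, p

# Synthetic example data (not course data): one correlated y, one independent y
rng = np.random.default_rng(42)
x = rng.normal(size=100)
y_corr = 0.5 * x + rng.normal(size=100)
y_null = rng.normal(size=100)
print(permutation_test(x, y_corr))
print(permutation_test(x, y_null))
```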
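
For the Lecture 3 tree hack, a k-d tree is one possible choice. This pure-NumPy sketch builds a median-split k-d tree and checks its nearest-neighbor answer against brute force. The points are a synthetic stand-in for Fig. 1.6, and taking the "lower-right" star as the one that maximizes x - y is an assumption, not the figure's definition.

```python
import time
import numpy as np

def build_kdtree(points, idx=None, depth=0, leaf_size=16):
    """Recursively split point indices at the median of alternating coordinates."""
    if idx is None:
        idx = np.arange(len(points))
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    axis = depth % points.shape[1]
    order = idx[np.argsort(points[idx, axis])]
    mid = len(order) // 2
    return {
        "axis": axis,
        "split": points[order[mid], axis],  # left: coord <= split, right: coord >= split
        "left": build_kdtree(points, order[:mid], depth + 1, leaf_size),
        "right": build_kdtree(points, order[mid:], depth + 1, leaf_size),
    }

def kdtree_nearest(tree, points, q, exclude=-1, best=(np.inf, -1)):
    """Return (squared distance, index) of the point nearest q, skipping index `exclude`."""
    if "leaf" in tree:
        leaf = tree["leaf"][tree["leaf"] != exclude]
        if len(leaf):
            d2 = np.sum((points[leaf] - q) ** 2, axis=1)
            j = np.argmin(d2)
            if d2[j] < best[0]:
                best = (d2[j], leaf[j])
        return best
    diff = q[tree["axis"]] - tree["split"]
    near, far = (tree["left"], tree["right"]) if diff < 0 else (tree["right"], tree["left"])
    best = kdtree_nearest(near, points, q, exclude, best)
    # Visit the far side only if the splitting plane is closer than the current best match
    if diff**2 < best[0]:
        best = kdtree_nearest(far, points, q, exclude, best)
    return best

def brute_nearest(points, q, exclude=-1):
    """Nearest neighbor by computing every distance."""
    d2 = np.sum((points - q) ** 2, axis=1)
    if exclude >= 0:
        d2[exclude] = np.inf
    return d2.min(), int(np.argmin(d2))

rng = np.random.default_rng(0)
pts = rng.normal(size=(50_000, 2))                    # hypothetical color-color points
target = int(np.argmax(pts[:, 0] - pts[:, 1]))        # assumed "lower-right" star
q = pts[target]

t0 = time.perf_counter()
tree = build_kdtree(pts)
t_build = time.perf_counter() - t0
t0 = time.perf_counter()
d_tree, i_tree = kdtree_nearest(tree, pts, q, exclude=target)
t_tree = time.perf_counter() - t0
t0 = time.perf_counter()
d_brute, i_brute = brute_nearest(pts, q, exclude=target)
t_brute = time.perf_counter() - t0
print(i_tree, i_brute, f"build {t_build:.3f}s, tree {t_tree:.5f}s, brute {t_brute:.5f}s")
```

In pure Python a single tree query can be slower than one vectorized brute-force pass; the tree pays off when many queries reuse one build, which is the regime that compiled implementations such as `scipy.spatial.cKDTree` target.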
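
For the Lecture 3 Monty Hall hack, Bayes' rule gives a 2/3 probability of winning by switching under the Type I rules, and 1/2 under the Type II rules once a goat has been revealed. A Monte Carlo check, as a minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def monty_hall(n_games=100_000, host_type="I", seed=0):
    """Return (P(win | stay), P(win | switch)) estimated by simulation.

    Type I: the host knowingly opens a goat door among the two unpicked doors.
    Type II: the host opens one of the two unpicked doors at random; games in
    which the car is revealed are discarded, i.e. we condition on seeing a goat.
    """
    rng = np.random.default_rng(seed)
    car = rng.integers(0, 3, n_games)
    pick = rng.integers(0, 3, n_games)
    coin = rng.integers(0, 2, n_games)  # host's random choice when two doors qualify
    stay = switch = kept = 0
    for c, p, k in zip(car, pick, coin):
        unpicked = [d for d in range(3) if d != p]
        options = [d for d in unpicked if d != c] if host_type == "I" else unpicked
        opened = options[k % len(options)]
        if opened == c:
            continue  # only possible for Type II
        kept += 1
        other = next(d for d in unpicked if d != opened)
        stay += int(p == c)
        switch += int(other == c)
    return stay / kept, switch / kept

for host_type in ("I", "II"):
    print(host_type, monty_hall(host_type=host_type))
```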
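
For the Lecture 4 hack, the bivariate Gaussian parameters follow from the sample covariance: with $\sigma_{xy}$ the covariance, $\tan 2\alpha = 2\sigma_{xy}/(\sigma_x^2-\sigma_y^2)$ and $\sigma_{1,2}^2 = (\sigma_x^2+\sigma_y^2)/2 \pm \sqrt{((\sigma_x^2-\sigma_y^2)/2)^2+\sigma_{xy}^2}$. The sketch below applies these to synthetic data with known parameters and computes the residuals perpendicular to the major axis; fitting a distribution to those residuals is left as the exercise.

```python
import numpy as np

def bivariate_params(x, y):
    """sigma_x, sigma_y, sigma_xy, sigma_1, sigma_2, alpha from the sample covariance.

    sigma_xy is the covariance; alpha is the angle of the major axis measured
    from the x axis, so tan(2 alpha) = 2 sigma_xy / (sigma_x^2 - sigma_y^2).
    """
    cov = np.cov(x, y, bias=True)                 # maximum-likelihood (1/N) covariance
    sx2, sy2, sxy = cov[0, 0], cov[1, 1], cov[0, 1]
    alpha = 0.5 * np.arctan2(2 * sxy, sx2 - sy2)  # arctan2 picks the major, not minor, axis
    mid = 0.5 * (sx2 + sy2)
    half_diff = np.sqrt((0.5 * (sx2 - sy2)) ** 2 + sxy**2)
    s1, s2 = np.sqrt(mid + half_diff), np.sqrt(mid - half_diff)
    return np.sqrt(sx2), np.sqrt(sy2), sxy, s1, s2, alpha

def perpendicular_residuals(x, y, alpha):
    """Signed distance of each point from the major axis through the centroid."""
    return -(x - x.mean()) * np.sin(alpha) + (y - y.mean()) * np.cos(alpha)

# Synthetic test set with known sigma_1 = 2, sigma_2 = 0.5, alpha = 30 degrees
rng = np.random.default_rng(1)
a_true = np.radians(30.0)
rot = np.array([[np.cos(a_true), -np.sin(a_true)], [np.sin(a_true), np.cos(a_true)]])
cov_true = rot @ np.diag([2.0**2, 0.5**2]) @ rot.T
x, y = rng.multivariate_normal([0.0, 0.0], cov_true, size=10_000).T

sx, sy, sxy, s1, s2, alpha = bivariate_params(x, y)
resid = perpendicular_residuals(x, y, alpha)
print(f"sigma_1={s1:.3f} sigma_2={s2:.3f} alpha={np.degrees(alpha):.1f} deg")
print(f"std of perpendicular residuals = {resid.std():.3f}")  # equals sigma_2 by construction
```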
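
For the Lecture 8 convergence test, one widely used diagnostic is the Gelman-Rubin statistic $\hat{R}$, which compares within-chain and between-chain variance across independently started chains; values near 1 suggest (but do not prove) convergence. The sketch runs a toy random-walk Metropolis sampler on a standard normal target, not the Fig. 5.26 model, from deliberately dispersed starting points.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for an (m_chains, n_steps) array."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

def metropolis(log_prob, x0, n_steps, step, rng):
    """1-D random-walk Metropolis sampler with Gaussian proposals."""
    chain = np.empty(n_steps)
    x, lp = x0, log_prob(x0)
    for i in range(n_steps):
        prop = x + step * rng.normal()
        lp_prop = log_prob(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis acceptance rule
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

def log_prob(x):
    """Toy target: log of an unnormalized standard normal density."""
    return -0.5 * x**2

rng = np.random.default_rng(3)
chains = np.array([metropolis(log_prob, x0, 20_000, 1.0, rng)
                   for x0 in (-10.0, -5.0, 5.0, 10.0)])
print("first 50 steps:", gelman_rubin(chains[:, :50]))
print("second half:   ", gelman_rubin(chains[:, 10_000:]))
```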
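
For the Lecture 9 visualization hack, kernel shape and bandwidth $h$ can be explored with a small kernel density estimator. This NumPy sketch evaluates 2-D Gaussian, Epanechnikov, and cosine kernels on a grid. The data are synthetic rather than the Great Wall sample, and the radial normalization constants are derived here for two dimensions (not copied from the textbook); the printed integral serves as a check on them.

```python
import numpy as np

# 2-D radially symmetric kernels K(u), u = distance / h, each normalized so
# that the integral of K(|r| / h) / h**2 over the plane equals 1.
KERNELS = {
    "gaussian": lambda u: np.exp(-0.5 * u**2) / (2 * np.pi),
    "epanechnikov": lambda u: np.where(u < 1, 2 / np.pi * (1 - u**2), 0.0),
    "cosine": lambda u: np.where(u < 1, np.cos(0.5 * np.pi * u) / (4 - 8 / np.pi), 0.0),
}

def kde_2d(data, grid, h, kernel="epanechnikov"):
    """Density at each grid point: (1 / (N h^2)) * sum_i K(|x - x_i| / h)."""
    d = np.sqrt(((grid[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1))
    return KERNELS[kernel](d / h).sum(axis=1) / (len(data) * h**2)

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))  # synthetic stand-in for the galaxy positions
xs = np.linspace(-5, 5, 121)
gx, gy = np.meshgrid(xs, xs)
grid = np.column_stack([gx.ravel(), gy.ravel()])
cell = (xs[1] - xs[0]) ** 2       # area of one grid cell
for name in KERNELS:
    for h in (0.25, 0.5, 1.0):
        dens = kde_2d(data, grid, h, name)
        print(f"{name:12s} h={h:4.2f} integral={dens.sum() * cell:.3f} peak={dens.max():.3f}")
```

Reshaping `dens` to `gx.shape` gives an image suitable for a color or grayscale display.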
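
For the Lecture 10 hack, both estimators can be written with normalized pair counts DD, DR, and RR: naive $\hat\xi = DD/RR - 1$ and Landy-Szalay $\hat\xi = (DD - 2DR + RR)/RR$. Check the textbook's Eqs. 6.44 and 6.45 for its exact normalization conventions. The brute-force sketch below uses synthetic uniform and clustered 2-D points rather than the SDSS galaxies, and its $O(N^2)$ memory use limits it to small samples.

```python
import numpy as np

def pair_fraction(a, b, bins, same):
    """Normalized histogram of pair separations between point sets a and b."""
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    if same:
        d = d[np.triu_indices(len(a), k=1)]  # count each distinct pair once
        n_pairs = len(a) * (len(a) - 1) / 2
    else:
        d = d.ravel()
        n_pairs = len(a) * len(b)
    return np.histogram(d, bins=bins)[0] / n_pairs

def two_point(data, randoms, bins):
    """Return (naive, Landy-Szalay) estimates of xi(r) from normalized pair counts."""
    dd = pair_fraction(data, data, bins, same=True)
    rr = pair_fraction(randoms, randoms, bins, same=True)
    dr = pair_fraction(data, randoms, bins, same=False)
    naive = dd / rr - 1
    landy_szalay = (dd - 2 * dr + rr) / rr
    return naive, landy_szalay

rng = np.random.default_rng(7)
bins = np.linspace(0.01, 0.2, 6)
randoms = rng.uniform(0, 1, size=(1000, 2))
uniform = rng.uniform(0, 1, size=(1000, 2))  # unclustered synthetic "galaxies"
centers = rng.uniform(0.05, 0.95, size=(50, 2))
clustered = np.repeat(centers, 20, axis=0) + rng.normal(0, 0.01, size=(1000, 2))
for label, pts in (("uniform", uniform), ("clustered", clustered)):
    naive, ls = two_point(pts, randoms, bins)
    print(label, np.round(naive, 3), np.round(ls, 3))
```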
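
For the Lecture 11 hack, the effect of each preprocessing toggle can be explored with an SVD-based PCA. In this sketch the "normalization" (unit-norm spectra) and "whitening" (unit-variance coefficients) conventions are assumptions that may differ from the textbook's, and the spectra are synthetic continuum-plus-emission-line toys rather than SDSS data.

```python
import numpy as np

def pca(X, subtract_mean=True, normalize=False, whiten=False, n_components=3):
    """SVD-based PCA with three preprocessing toggles.

    normalize:     scale each spectrum (row) to unit L2 norm (assumed convention)
    subtract_mean: remove the mean spectrum before the SVD
    whiten:        rescale projection coefficients to unit variance per component
    """
    X = np.asarray(X, dtype=float)
    if normalize:
        X = X / np.linalg.norm(X, axis=1, keepdims=True)
    mean = X.mean(axis=0) if subtract_mean else np.zeros(X.shape[1])
    Xc = X - mean
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]  # eigenspectra, one per row
    coeffs = Xc @ components.T      # projection of each spectrum
    if whiten:
        coeffs = coeffs / (S[:n_components] / np.sqrt(len(X) - 1))
    # Fraction of the total sum of squares (a variance fraction only if mean-subtracted)
    ratio = S**2 / np.sum(S**2)
    return components, coeffs, ratio[:n_components], mean

rng = np.random.default_rng(5)
wave = np.linspace(4000, 7000, 300)  # hypothetical wavelength grid (Angstrom)

def gaussian_line(center, width):
    return np.exp(-0.5 * ((wave - center) / width) ** 2)

n = 500
spectra = (rng.uniform(0.5, 2.0, (n, 1)) * (wave / 5500.0) ** -1
           + rng.uniform(0.0, 1.0, (n, 1)) * gaussian_line(6563, 5)
           + rng.uniform(0.0, 0.5, (n, 1)) * gaussian_line(4861, 5)
           + rng.normal(0, 0.02, (n, wave.size)))
for subtract_mean in (True, False):
    for normalize in (False, True):
        _, _, ratio, _ = pca(spectra, subtract_mean, normalize)
        print(f"mean-subtracted={subtract_mean} normalized={normalize}", np.round(ratio, 3))
```

Without mean subtraction the first component typically tracks the mean spectrum, which is one reason the resulting eigenspectra can be harder to interpret physically.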
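
For the Lecture 13 hack, polynomial regression is one method where K-fold cross-validation applies directly, with the polynomial degree as the hyperparameter. The sketch uses synthetic data with a known quadratic trend instead of the Fig. 8.2 supernovae, and treats the cross-validated RMS residual as a rough Gaussian width for future data, which assumes homoscedastic errors.

```python
import numpy as np
from numpy.polynomial import Polynomial

def kfold_mse(x, y, degree, n_folds=5, seed=0):
    """Mean squared prediction error of a polynomial fit under K-fold cross-validation."""
    rng = np.random.default_rng(seed)  # same seed -> same folds for every degree
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    errors = []
    for test_idx in folds:
        train = np.ones(len(x), dtype=bool)
        train[test_idx] = False
        fit = Polynomial.fit(x[train], y[train], degree)  # rescales x internally
        errors.append(np.mean((y[test_idx] - fit(x[test_idx])) ** 2))
    return np.mean(errors)

# Synthetic data with a known quadratic trend and noise sigma = 0.1
rng = np.random.default_rng(11)
z = rng.uniform(0, 1, 100)
y = 1 + 2 * z - 3 * z**2 + rng.normal(0, 0.1, z.size)
degrees = np.arange(0, 9)
cv = np.array([kfold_mse(z, y, d) for d in degrees])
best = int(degrees[np.argmin(cv)])
sigma_pred = np.sqrt(cv[best])  # rough scatter expected for future data
print(best, np.round(cv, 4), round(float(sigma_pred), 3))
```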
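
For the Lecture 14 hack, one rough figure of merit treats correctly selected RR Lyrae as signal and every selected object as a source of noise, giving $S/N = TP/\sqrt{TP+FP} = \sqrt{c\,N\,(1-f)}$, where $c$ is completeness, $f$ is contamination, and $N$ is the number of true RR Lyrae. This form is an assumption, not the textbook's. The sketch scans the threshold of a synthetic classifier score with hypothetical class sizes.

```python
import numpy as np

def completeness_contamination(truth, selected):
    """Completeness = TP / (TP + FN); contamination = FP / (TP + FP)."""
    truth = np.asarray(truth, dtype=bool)
    selected = np.asarray(selected, dtype=bool)
    tp = np.sum(truth & selected)
    fp = np.sum(~truth & selected)
    fn = np.sum(truth & ~selected)
    completeness = tp / (tp + fn)
    contamination = fp / (tp + fp) if tp + fp > 0 else 0.0
    return completeness, contamination

def figure_of_merit(completeness, contamination, n_true):
    """Hypothetical S/N proxy: TP / sqrt(TP + FP) rewritten in c and f."""
    return np.sqrt(completeness * n_true * (1 - contamination))

rng = np.random.default_rng(2)
truth = np.r_[np.ones(300, bool), np.zeros(30_000, bool)]  # hypothetical class sizes
score = np.r_[rng.normal(2.0, 1.0, 300), rng.normal(0.0, 1.0, 30_000)]
thresholds = np.linspace(0, 4, 41)
fom = []
for t in thresholds:
    comp, cont = completeness_contamination(truth, score > t)
    fom.append(figure_of_merit(comp, cont, truth.sum()))
best_t = thresholds[int(np.argmax(fom))]
print(best_t, max(fom))
```

Evaluating the chosen threshold on held-out data, rather than on the data used to pick it, gives a less optimistic prediction for future surveys.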

Student | Advisor | Topic |
---|---|---|
Charlotte | Yssa | Gaussian Process Fitting of Supernovae |
Sabrina | Charlotte | LIGO Data-Mining |
Adam | Sabrina | Measuring Simulated Gas Distributions beyond the Density PDF |
Amir | Adam | Machine-Learning Additional Factors for Photometric Redshifts |
Angkun | Angkun | Machine Learning to find Fractal Patterns in Density-of-State Diagrams |
Yssa | Amir | Generating Mock Kepler-2 Datasets to Find Transients |

Last revised September 2, 2019