Casey’s Technical Ramblings

This is a place where I attempt to form coherent thoughts about current technology, computer science, math and the general things happening on the Internet.

A feed of the most recent posts is available.

Recent Posts

24 Oct 2014

Data Science and Hadoop: Part 5, Benford's Law Analysis

Benford's Law analysis of healthcare payment data with Spark using Median Absolute Deviation.

24 Oct 2014

Data Science and Hadoop: Part 4, Outlier Analysis

Outlier analysis of healthcare payment data with Spark using Median Absolute Deviation.

24 Oct 2014

Data Science and Hadoop: Part 3, Basic Structural Analysis

Basic structural analysis of healthcare payment data using Spark SQL and Python.

24 Oct 2014

Data Science and Hadoop: Part 2, Data Overview and Preprocessing

Part 2 of a series of analyses of healthcare financial data with PySpark: a data overview and preprocessing of the Centers for Medicare & Medicaid Services (CMS) Open Payments data.

24 Oct 2014

Data Science and Hadoop: Impressions and Example

A discussion of the challenges of doing Data Science projects with Hadoop.

07 May 2014

Making Sense of Political Texts with NLP

Clustering senatorial speeches from 2008 by topic using t-distributed stochastic neighbor embedding (t-SNE) and latent Dirichlet allocation (LDA).

08 Apr 2014

Spark for Data Science: A Case Study

An analysis of which Unix commands appear together more than random chance would suggest.

20 Mar 2012

Better News through Computational Political Science

I recently gave a talk on an NLP project that I worked on for Kent's ACM.

20 Mar 2012

Hadoop Best Practices

I recently gave a talk on Hadoop best practices at the Cleveland Hadoop User Group.