Archives

This is the complete archive of posts from my technical blog in reverse chronological order.

22 Jan 2018

A Blockchain Story Told Through The Eyes of Two Users

Cryptocurrencies are generating a lot of hype right now. Blockchain-based systems provide a huge amount of transparent detail about how a currency is actually used. Despite this, analysis has largely relied on traditional security analysis, which ignores most of the available data in favor of simpler metrics that treat the system as a black box. Here, I look at two deeper analytics that tell a story of how a cryptocurrency is actually used, which may be of interest to blockchain developers and investors alike.

04 Dec 2015

Word2Vec with Non-Textual Data

I'm going to describe some of the challenges of understanding data, then go into some technical detail on how to borrow scalable unsupervised learning from natural language processing and couple it with a very nice data visualization technique to reveal the natural organization and arrangement of the data.

24 Oct 2014

Data Science and Hadoop: Impressions and Example

A discussion of the challenges of doing Data Science projects with Hadoop.

24 Oct 2014

Data Science and Hadoop: Part 5, Benford's Law Analysis

Benford's Law analysis of healthcare payment data with Spark using Median Absolute Divergence.

24 Oct 2014

Data Science and Hadoop: Part 4, Outlier Analysis

Outlier analysis of healthcare payment data with Spark using Median Absolute Deviation.

24 Oct 2014

Data Science and Hadoop: Part 3, Basic Structural Analysis

Basic structural analysis of healthcare payment data using Spark SQL and Python.

24 Oct 2014

Data Science and Hadoop: Part 2, Data Overview and Preprocessing

Part 1 of a series of analyses of healthcare financial data with PySpark. Data overview and preprocessing for the Centers for Medicare and Medicaid Services Open Payments data.

07 May 2014

Making Sense of Political Texts with NLP

Clustering senatorial speeches from 2008 by topic using t-distributed stochastic neighbor embedding (t-SNE) and latent Dirichlet allocation (LDA).

08 Apr 2014

Spark for Data Science: A Case Study

An analysis of which Unix commands appear together more than random chance would suggest.

20 Mar 2012

Better News through Computational Political Science

I recently gave a talk on an NLP project that I worked on for Kent's ACM

20 Mar 2012

Hadoop Best Practices

I recently gave a talk at the Cleveland Hadoop User Group on Hadoop Best Practices