Insider’s Guide to Apache Spark

Posted: November 9, 2015 in Projects

I’d like to announce my new technology guide, An Insider’s Guide to Apache Spark, published on behalf of insideBIGDATA and sponsored by industry analytics leader TIBCO. The guide is a useful new resource directed toward enterprise thought leaders who wish to gain strategic insights into this exciting new computing framework. Apache Spark is one of the most exciting and widely adopted open-source projects, and its in-memory clusters are driving new opportunities for application development as well as increased uptake of IT infrastructure.

The guide includes the following topics:

  • An overview of Spark
  • Why Spark is so hot
  • Looking at Spark through a Hadoop lens
  • Spark SQL
  • The TIBCO–Spark connection

You can download my new Spark guide HERE.

I’m also taking part in an upcoming TIBCO webinar on Nov. 17 at 1:30pm ET. Click HERE to register.


I was pleased to accept an invitation from Melange Live CEO Tom Keefer to exhibit at his first annual event, “where fashion meets technology.” The two-day conference was held on Sept. 16-17 at The New Mart in the heart of LA’s fashion district. The show featured the latest and greatest innovators in this up-and-coming space. I had a blast at my booth (see picture), which was front and center, and I met all sorts of interesting people, including a hot new startup accelerator in the Downtown LA area. As a data scientist, I was encouraged by all the new technologies coming out, most of which depend on using data for greater insights.

My long affiliation with LA’s preeminent fashion mart, The New Mart, has been a fruitful one. This collection of over 70 high-end fashion showrooms is managed by a forward-thinking team that allowed me to apply statistical learning methods to increase the reach of their many clothing lines through social media data sources. I built some cool technology that yields a weekly “Fashion top 10,” which serves to drive The New Mart’s social media effort. By coupling sentiment analysis with data sources like Twitter, Facebook, Instagram and fashion blogs, the team can spread brand awareness in a strategic and focused manner.

insideBIGDATA Guide to Retail

Posted: September 9, 2015 in Projects, Uncategorized

I’d like to announce the availability of a new technology guide that I was contracted to research, develop and write: “insideBIGDATA Guide to Retail,” sponsored by Dell and Intel. This guide is directed toward line-of-business leaders and enterprise technologists, with a focus on the big data opportunities for retailers and how Dell can help them get started. The guide will also serve as a resource for retailers that are farther along the big data path and have more advanced technology requirements.

I was excited about writing this guide since I spend a lot of my time as a practicing data scientist in the fashion industry where I build machine learning solutions to enhance brand awareness.

You can download a copy of the guide HERE.

I’m very proud (and relieved) to announce that my book project, more than a year in the making, is finally done! “Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R” will be available later this year from Technics Publications. The book provides an introduction to the entire data science process, highlighting the ways that machine learning can be used to solve business problems. Both supervised and unsupervised statistical learning techniques are included. The R statistical programming language is used throughout. Here is the table of contents:


Chapter 1: Machine Learning Overview

Chapter 2: Data Access

Chapter 3: Data Munging

Chapter 4: Exploratory Data Analysis

Chapter 5: Regression

Chapter 6: Classification

Chapter 7: Evaluating Model Performance

Chapter 8: Unsupervised Learning

The book is perfect for newbies just entering the data science field who wish to quickly get up to speed with the technology. I plan to use the book for the introductory courses I teach for corporations and universities. You can pre-order the book on Amazon HERE.


I’m pleased to announce that I was contracted to research, develop and write a new technology guide, “insideBIGDATA Guide to Scientific Research,” sponsored by Dell and Intel. The goal of this guide is to provide a road map for scientific researchers wishing to capitalize on the rapid growth of big data technology for collecting, transforming, analyzing, and visualizing large scientific data sets.

I was particularly excited about writing this guide since, in a previous life, I was a researcher in the data analysis effort for a large-scale astrophysics project.

You can download a copy of the guide HERE.

Hadoop Summit 2015

Posted: May 18, 2015 in Events, Uncategorized

I am pleased to report that I will be attending the upcoming Hadoop Summit 2015 in San Jose on June 9-11. I’ll be the guest of Hortonworks (host of the show and provider of one of the leading Hadoop distributions) and will be covering the conference for insideBIGDATA. Check out insideBIGDATA’s new Hadoop 101 learning channel, where I will publish many of the new presentations I find at the Hadoop Summit.