Sunday, August 16, 2015
I am often traveling between Sydney and Melbourne, so feel free to contact me regarding our Analytics consulting services.
ADMA 'Advancing Analytics'
Analytics8 sponsored this year's 'Advancing Analytics' forum held by the Australian Association for Data-Driven Marketing and Advertising (ADMA). It was a good event, with some interesting speakers discussing their work.
I also wrote some LinkedIn posts, available here:
Knowledge Transfer Sessions
I will also soon be hosting and delivering monthly 'Knowledge Transfer Sessions' aimed at students and young analytics practitioners. The objective is to review typical data analytics challenges with a focus upon the practical details, the thought processes involved, and any pitfalls encountered. It is just my attempt to help mentor others and to address the skills gap and the demand for data science / analytics practitioners.
ADMA will be involved, and we're currently planning the format. More details to follow shortly.
Sunday, September 8, 2013
o Using national census data
o Store level summary analysis
o Sales data summarised to a customer level
o Customer Segmentation
o Time series forecast of spend for each customer
o Next best product
o Churn propensity
o Price Elasticity
o Product Cannibalisation & Halo
o Price Sensitivity
- Loss of expected revenue, not number of customers churned
A common peculiarity in many organisations is that churn is defined simply as the number of customers lost. Sometimes the analysis may include or segment churn by a spend category (high value vs low value etc), but rarely does the churn model actually predict how much money is lost.
Furthermore, the volume of behavioural/transactional data within a retail organisation is smaller than in other industries. For example, a typical telecom in Australia would collect 200 million records a day, which may be more than a typical leading retail organisation collects in a year (in terms of behavioural/transactional records).
For an industry like telecommunications or banking there is an almost monthly contractual behaviour that can be analysed fairly easily. Examples are the number of calls made and received, or the monthly pay check going into your bank account and the purchase transactions that debit your account. These happen very frequently and are often forced into cycles (week, month). Conversely, in retail you may have weeks or months between voluntary visits, and spend in any one visit may differ depending upon seasonality or the customer. A customer churn definition in retail must recognise the flexible RFM (recency, frequency, monetary value) nature of the customer. In the limited time I had available I didn’t look into approaches to analyse customers differently depending upon their RFM behaviour. In the data I was analysing the majority of customers visited at least quarterly, so to keep things simple I aggregated customer spend to a quarterly summary. I then forecast spend individually for each customer for the subsequent three quarters using a structural time series modelling approach (unobserved components model). The model had a hierarchy up to customer segment, so that it could also provide forecast spend by customer segment.
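The quarterly aggregation step can be sketched roughly as follows. This is a minimal illustration. It assumes transactions arrive as hypothetical `(customer_id, date, amount)` tuples, and that a quarter with no visits should count as zero spend when the series is later forecast.

```python
from collections import defaultdict
from datetime import date


def quarter_key(d: date) -> tuple:
    """Map a date to a (year, quarter) tuple, e.g. 2013-08-20 -> (2013, 3)."""
    return d.year, (d.month - 1) // 3 + 1


def quarterly_spend(transactions):
    """Sum spend per customer per calendar quarter.

    transactions: iterable of (customer_id, date, amount) tuples (hypothetical layout).
    Returns {customer_id: {(year, quarter): total_spend}}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for customer_id, when, amount in transactions:
        totals[customer_id][quarter_key(when)] += amount
    return {cid: dict(quarters) for cid, quarters in totals.items()}


def dense_series(customer_quarters, start, end):
    """Spend for every quarter from start to end inclusive, filling quarters without visits with 0.0."""
    series = []
    year, q = start
    while (year, q) <= end:
        series.append(customer_quarters.get((year, q), 0.0))
        year, q = (year + 1, 1) if q == 4 else (year, q + 1)
    return series


tx = [("c1", date(2013, 1, 5), 10.0), ("c1", date(2013, 2, 10), 5.0), ("c1", date(2013, 8, 1), 7.5)]
print(quarterly_spend(tx)["c1"])  # {(2013, 1): 15.0, (2013, 3): 7.5}
```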
The forecast of spend could then be used as a measure of the potential loss of revenue. If the forecast was lower than the customer’s previous average behaviour, then it suggested decreasing spend and potential ‘churn’. The difference between the customer’s forecast and rolling average could easily be used as a loss of expected revenue (aka. a churn score). Marketing activities can then focus upon reducing forecast revenue loss through changes in the RFM behaviour of each customer.
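The churn score idea (rolling average minus forecast) can be sketched as shown here. This is not the structural time series model described above. As a plainly named stand-in, it uses simple exponential smoothing, which gives a flat forecast at the last smoothed level. The 4-quarter window, `alpha=0.5` and the 3-quarter horizon are illustrative assumptions.

```python
def ses_forecast(series, alpha=0.5, horizon=3):
    """Flat forecast from simple exponential smoothing (a stand-in for a structural model)."""
    if not series:
        raise ValueError("series must contain at least one quarter")
    level = series[0]
    for value in series[1:]:
        # Move the level part of the way (alpha) towards each new observation.
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon


def churn_score(series, window=4, alpha=0.5, horizon=3):
    """Expected revenue loss: recent average spend minus forecast spend, summed over the horizon.

    Positive scores mean forecast spend is below the customer's recent average;
    negative scores indicate growing spend.
    """
    recent = series[-window:]
    rolling_avg = sum(recent) / len(recent)
    return sum(rolling_avg - f for f in ses_forecast(series, alpha, horizon))


print(churn_score([100.0, 100.0, 40.0, 20.0]))  # 60.0
```

Ranking customers by this score puts the largest expected dollar losses first, rather than simply the customers most likely to leave.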
2) Market Basket Analysis
- Representation of product and price used in basket analysis.
In a typical grocery retail environment a product can be on the shelves and sold to customers for many years. The product doesn’t change, and simple market basket analysis can be very valuable. But what if the product will only exist for 3 months and never repeat? How are associations of new products predicted? What if products are always new?
In some retail environments a product may only be on the shelves for a few months and may never repeat. For example, take the situation in a Kids Clothes department: there may be a Marvel Avengers t-shirt with a specific stock keeping unit (SKU) code. Within a few months the stock is sold and the product is replaced with a new t-shirt, perhaps Ironman :)
In these situations where a product code is frequently changing, market basket analysis applied directly to the products will have limited long-term applicability and success, because any basket associations that existed last month or last year may not match any existing products. Solutions to this might involve either matching all new products to equivalent old ones, or applying the basket analysis to some form of representation of a product.
Without going into too many details (for fear of losing some intellectual property and giving away too many tricks...) I used a simple approach to group products based upon their price. Within each product/sku grouping (for example, Kids T-Shirts) there will be many different product codes that change over time. Some products will be priced at the bottom scale, others priced the highest for that product category (ignoring discounts). Any new product can be matched to the old simply by being in the same price range.
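The author's actual grouping is deliberately not disclosed. As a generic illustration only, a price-band representation could look like the sketch below. It assumes three equal-width bands between the category's minimum and maximum regular (non-discounted) price, and all function and label names are hypothetical. The example uses the pleated dress price range quoted further down ($6.01 to $22.40).

```python
BAND_LABELS = ("low", "mid", "high")


def price_band(price, category_min, category_max, n_bands=3):
    """Return a band index 0..n_bands-1 (0 = cheapest) using equal-width price bands."""
    if category_max <= category_min:
        return 0
    width = (category_max - category_min) / n_bands
    band = int((price - category_min) // width)
    # Clamp so the category maximum (and any out-of-range price) stays in a valid band.
    return min(max(band, 0), n_bands - 1)


def band_item(category, price, category_min, category_max):
    """Replace a short-lived SKU with a durable 'category|band' token for basket analysis."""
    return f"{category}|{BAND_LABELS[price_band(price, category_min, category_max)]}"


print(band_item("Pleated Dress", 15.40, 6.01, 22.40))  # Pleated Dress|mid
```

Any new SKU priced within a category's range maps onto an existing token, so associations learned last season still apply.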
Ok, so now we have basket analysis that can be applied not upon products, but upon ‘low value product category A’ and ‘high value product category B’. This may seem like a subtle difference, but it is something I haven’t heard of or seen retail vendors do yet, even the leaders (Tesco, Walmart etc). Market basket associations are often reported as “Customers that bought Milk also bought Bread”. Honestly, my first thoughts in response when I hear this are “so the f##k what...”. It is one of those common examples of analysis that doesn’t shout out business value. Simply having an association doesn’t help the retail organisation better manage the internal politics of cross-department promotions. Nor does it even help them understand whether sales of a low-priced or promoted item are associated with sales of other low- or high-priced items, for example.
In short, my point is I built associations that differentiated by price and products, rather than just associate products.
Doing this I enabled basket associations that, for example, highlighted the increase in $ sales that occurs in kids footwear (full price) whenever there is a promotion (cheaper price) in Kids Clothes.
One specific product (Children’s Wear -> Girls 7-10 -> Pleated Dress).
Product price varies from $6.01 to $22.40, with an average (mean) price of $15.40.
Performing market basket analysis upon products including their price is a subtle difference. It enables analysis of associations that might only occur when one or more products are reduced in price, which can then help a retail organisation better design and optimise promotions.
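Once items are represented as price-band tokens, ordinary pairwise association metrics apply unchanged. The following is a minimal sketch of support, confidence and lift over item pairs. The baskets are made-up illustrations, and the token names follow the hypothetical `category|band` convention rather than any real retailer's data.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Support, confidence and lift for every rule A -> B over item pairs.

    baskets: list of sets of items, e.g. {"Kids Clothes|low", "Kids Footwear|high"}.
    Returns {(A, B): (support, confidence, lift)}.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules


baskets = [
    {"Kids Clothes|low", "Kids Footwear|high"},
    {"Kids Clothes|low", "Kids Footwear|high"},
    {"Kids Clothes|high"},
    {"Kids Footwear|low"},
]
print(pair_rules(baskets)[("Kids Clothes|low", "Kids Footwear|high")])  # (0.5, 1.0, 2.0)
```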
Sunday, November 27, 2011
This specific post is to help promote the launch of the new IAPA website and increase focus on Analytics in Australia (and Sydney, where I am normally based). The topic of this post is something that has been at the forefront of my mind and seems to be a central theme of many of the projects I have been working on recently. It is certainly a current problem for many Marketing/Customer Analytics departments. So here are a few thoughts and comments on 'big data'. Apologies for typos; it is mostly written piecemeal on my iPhone during short 5-minute breaks...
So, below is a series of my most recent observations from Analytics projects I have been involved with that encountered, or required resolving, 'big data' problems:
After a week of design and preliminary work I began to consider ways to optimise the performance of my queries and computations, so I asked about the server specifications. I assumed it would be some big server with dozens of processors, but unfortunately what I was connecting to was a dual-core 4GB desktop PC under an Analyst's desk...
So, what is the best way to account for outliers, skewed distributions, poor data sparsity, or highly likely erroneous data features? Well, an approach (that I am not keen on) taken by some is to apply several variable transformations indiscriminately to all 'raw' variables and subsequently let a variable selection process pick the best input variables for propensity modelling etc. Now combine this with data that represents transposed time series, so one variable represents a value in 'month1' and the next variable the same value dimension in 'month2', etc. The result can easily exceed 20,000 variables (by, say, 10 million customers...). It is true there are variable selection methods that handle 20,000 variables quite well. However, the metadata and processing needed to create those datasets is often significant, and the whole process often incurs excessive costs in terms of time to delivery of results.
An additional problem that may arise when you start working with many thousands of variables is that variable names need to be easily understood and interpretable. The last thing a data miner wants to do is spend hours working out what those transformed and selected important variables in the propensity model actually mean and represent in the raw data.
Which leads me to my next point...
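The combinations multiply quickly. For example, 100 hypothetical raw variables × 12 months × 17 transformations already gives 20,400 columns. The sketch below shows that multiplication on a tiny scale. It also shows one way to keep generated names interpretable: every name follows a documented `<raw>_<transform>_m<month>` pattern and gets a description in a data dictionary. The variables, transformations and naming pattern are all hypothetical.

```python
from itertools import product

# Hypothetical raw variables, transformations and months, for illustration only.
RAW_VARIABLES = {"spend": "total spend ($)", "calls": "number of outbound calls"}
TRANSFORMS = {"raw": "value of", "log": "log(1 + x) of", "rank": "percentile rank of"}
MONTHS = range(1, 13)


def generate_variables(raw_variables, transforms, months):
    """Return a data dictionary {variable_name: description} for every combination.

    Names follow the documented pattern <raw>_<transform>_m<month>, so each generated
    column can be traced back to the raw data without guesswork.
    """
    dictionary = {}
    for (raw, raw_desc), (tf, tf_desc), month in product(
        raw_variables.items(), transforms.items(), months
    ):
        dictionary[f"{raw}_{tf}_m{month}"] = f"{tf_desc} {raw_desc} in month {month}"
    return dictionary


data_dictionary = generate_variables(RAW_VARIABLES, TRANSFORMS, MONTHS)
print(len(data_dictionary))             # 72 (2 raw x 3 transforms x 12 months)
print(data_dictionary["spend_log_m3"])  # log(1 + x) of total spend ($) in month 3
```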
- Variable / Data Understanding
One of the core skills of a good data miner is the ability to understand and translate complex data in order to solve business problems.
As organisations obtain more data it is not just about more records. Often the data reveals new subtle operational details and customer behaviors not previously known, or completely new sources of data (Facebook, social chat, location based services etc). This in turn often requires extended knowledge of the business and operational systems so that the correct data warehouse values, variable manipulations and selections can be made.
An analyst is expected to understand most parts of an organisation's data at a level of detail most individuals in the organisation are not concerned with, and this is often a monumental task.
As an example of 'big data' bad practice, I've encountered verbose variable names that immediately require truncation (due to IT / variable name length limits), others that make understanding the value or meaning of the variable difficult, and naming conventions that are undocumented. For example, "number_of_broken_promises" is one of the funniest long variable names I've seen. Others, such as "ccxs_ytdspd_m1_pct", can be guessed when you have the business context but definitely require detailed documentation or a key.
- Diverse Skillsets
Sunday, February 13, 2011
Many Android handsets now have near field communications (NFC) technology. According to some reputable sources (http://www.bloomberg.com/news/2011-01-25/apple-plans-service-that-lets-iphone-users-pay-with-handsets.html) the 5th generation of iPhone will also include NFC, which amongst other things can allow users to pay for goods and services just like they currently do with their credit card.
Many iPhone users already buy songs and applications from iTunes, which has made it a significant global billing platform, and it provides a notable proportion of revenue for Apple (4.1% of its total quarterly earnings for Q1, see http://www.fiercemobilecontent.com/story/apples-itunes-revenues-top-11-billion-q1/2011-01-19).
If iPhone, iPad and iPod users adopt widespread use of NFC for purchase of everyday groceries and general retail goods, then iTunes could quickly carve a huge slice out of the VISA and Mastercard revenue stream.
The use of smart phone applications for communication (such as Facebook and Twitter) has already taken significant chunks out of telcos’ revenues from traditional voice communication. As smart devices and apps further empower users, telcos face a greater danger of becoming a dumb pipe. In my opinion there is an opportunity for NFC to enable telcos to develop a closer relationship with customers and act as the information conduit (rather than Google or Apple).
With varying degrees of success, telcos currently perform a lot of data mining to understand usage patterns, household demographics, forecasting of network demand etc. Much of this analysis is marketing focused, with an objective to gain new customers, retain customers, and/or get them to spend more. Most importantly for data miners, these marketing activities usually involve intelligently processing very large amounts of data. There are a lot of parallels with the data mining performed by VISA and Mastercard, so you would think that telcos might have the infrastructure and experience to play in the area of credit cards.
Some telcos are able to provide single billing, whereby the entire household has a single bill for multiple mobile services, wireless broadband, fixed/land telephony, cable TV etc. Suppose a telco already has the rating system to charge for usage of high transaction telephony services, and also provides a single unified household billing platform. Then incorporating the purchase of retail goods and an NFC system should not be a challenge for that telco. In my experience I’ve not seen VISA or many retail banks offer a single bill for your household purchases across multiple individuals and products. This capability places telcos head and shoulders above banks and credit card companies in the customer experience stakes.
Most developed countries have 3G or better mobile networks, which, when combined with smart phones, can easily pin-point the location of a customer. If telcos used NFC to process and learn each customer’s (or household’s) purchase habits and preferences, then there is no reason why they couldn’t recommend products and offers for shopping centres or stores in your immediate vicinity in real time. The additional revenue opportunities might even be able to cover the cost of moderate telephony usage, so customers could get a mobile plan subsidised by advertising and purchase revenue. For example, the telco would develop the trusted relationship with the customer, and many retailers could pay a commission to target specific customer segments, or individuals in the vicinity that buy similar products. Retailers wouldn’t need to implement their own loyalty cards to identify customers. They could simply get summarised information about who shops at their stores, how often, share of wallet etc from the telco company that manages the relationship with the customer. I would relish the opportunity to analyse *that* kind of data!
Granted there are a lot of challenges, but the fantasy of Minority Report might not be that unrealistic…
Thursday, October 28, 2010
The business problem to solve was generating customer insight for businesses with loans, taking into account each client business' financial health and business loan repayment risk.
The first thing we concentrated on was tax payments. The data I had access to contained typical finance account monthly summaries (eg. balance at close of month, total $ of transactions etc) but also two years of detailed transactional history of all outgoing and inbound money transfers/payments (eg. including tax payments made by many thousands of businesses). We examined two years of summary data and also all transactions for only those money transfers/payments that involved the account number belonging to the tax man.
The core idea was to understand each business's tax payments over time in order to get an accurate view of its financial health. Obviously this would have great importance in predicting future loan repayments or the likelihood of future financial problems. One main objective was to understand if tax payment behavior differed significantly between customers, and a secondary consideration was the risk profiles of any subgroups or segments that could be identified.
It was a quick preliminary investigation (less than two weeks work) so I tackled the problem very simplistically to meet deadlines.
For the majority of client businesses tax payments occur quarterly or monthly, so I first summarized the data to a quarterly aggregation, for example:
As you can see above, each customer could have many records (actually it was a maximum of 8, one for each quarter over a two year period), each record showing the account balance at the end of the quarter and the net sum of payments made to (or from!) the tax man.
Then I created two offset copies of Tax Payments, one being the previous record (Lag) and the other being the subsequent record (Lead), like so:
I then simply scaled the data so that everything was between 0 and 1 by using:
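The lag/lead step can be sketched like this. The sketch assumes rows are dicts with hypothetical keys `customer_id`, `quarter` (an integer 1..8) and `tax`. It also assumes no quarter is missing for a customer; with a gap, the "previous record" would not be the previous quarter.

```python
from itertools import groupby
from operator import itemgetter


def add_lag_lead(rows):
    """Attach the previous (lag) and next (lead) quarter's tax payment within each customer.

    The first quarter has no lag and the last quarter has no lead, so those are None.
    """
    ordered = sorted(rows, key=itemgetter("customer_id", "quarter"))
    result = []
    for _, group in groupby(ordered, key=itemgetter("customer_id")):
        group = list(group)
        for i, row in enumerate(group):
            enriched = dict(row)
            enriched["tax_lag"] = group[i - 1]["tax"] if i > 0 else None
            enriched["tax_lead"] = group[i + 1]["tax"] if i + 1 < len(group) else None
            result.append(enriched)
    return result
```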
(X – (minimum of X)) / ((maximum of X) - (minimum of X))
Here X is one of the variables representing quarterly account balance or tax payments, and the minimum and maximum are taken within each Customer ID.
For example, the raw data here:
was rescaled to:
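A sketch of the per-customer min-max scaling follows, using the same hypothetical row layout (dicts with a `customer_id` key). The formula divides by zero when all of a customer's values are identical; returning 0.0 in that case is an arbitrary choice made here, not something the original analysis specifies.

```python
from collections import defaultdict


def minmax_within_customer(rows, field):
    """Add '<field>_scaled' = (x - min) / (max - min), with min and max taken per customer."""
    values = defaultdict(list)
    for row in rows:
        values[row["customer_id"]].append(row[field])
    bounds = {cid: (min(v), max(v)) for cid, v in values.items()}
    scaled = []
    for row in rows:
        lo, hi = bounds[row["customer_id"]]
        new_row = dict(row)
        # Constant series would divide by zero; map them to 0.0 (arbitrary choice).
        new_row[field + "_scaled"] = 0.0 if hi == lo else (row[field] - lo) / (hi - lo)
        scaled.append(new_row)
    return scaled
```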
I rescaled all the raw balance and tax payment variables this way so that I could later run a Pearson’s correlation and k-means clustering, and also easily graph the data on the same axis (to directly compare balance and tax payments). Some business customers had very large account balances but small tax payments.
For example, I could eventually generate a line chart like this showing a specific business’ relationship between balance (dotted line) and tax payments (bold red line):
I then ran a simple Pearson’s correlation with the variable ‘Balance’ correlated against the 3 tax payment variables (original, lag, and lead), grouped by Customer ID. This would output three correlation scores. The first is for the original (account balance and tax payments in the same quarter). The second is for the correlation between the current account balance and the previous quarter’s tax payments. The third is for the current account balance and the next quarter’s tax payments.
My thought process was to use the highest correlation score (along with balance and tax payment amounts as described below) to build k-means clusters to segment the customer base. Hopefully the segments would reflect, amongst other things, the strongest relationship between account balance and tax payments.
I joined the correlation outputs to the data and then flipped/transposed and summarized the data so that each quarter was a new column for balance and tax payments, creating a very wide, summarized data set. For example:
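The three per-customer correlations and the selection of the strongest one could be sketched as below. Pairs with a missing lag or lead value are dropped before correlating. "Highest" is interpreted here as the largest absolute value, which is an assumption: it lets a strong negative relationship win, in line with the later 'negative correlation' / 'positive correlation' encoding.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of paired values, skipping pairs where either side is None."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant series
    return sxy / sqrt(sxx * syy)


def strongest_correlation(balance, tax, tax_lag, tax_lead):
    """Correlate balance with same-quarter, lagged and lead tax; return (label, score) of the strongest."""
    scores = {
        "original": pearson(balance, tax),
        "lag": pearson(balance, tax_lag),
        "lead": pearson(balance, tax_lead),
    }
    valid = {label: r for label, r in scores.items() if r is not None}
    if not valid:
        return None, None
    best = max(valid, key=lambda label: abs(valid[label]))
    return best, valid[best]
```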
…also including the correlation, lag, lead and original value variables in the single record per customer…
Now I had a dataset with a nice single record per customer, and I concentrated on representing the growth or decline in tax payments over the 2-year period. I did this quite simply by converting the raw payments into percentages (of the sum of each customer’s payments over the two years). In some cases a high proportion of the customer’s payments occurred many months ago, which represents a decline in recent quarters.
I then built a K-means model using inputs such as:
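The transpose-and-percentage step can be sketched as follows, again with hypothetical field names (`customer_id`, `quarter` numbered from 1, `tax`). The output is one wide record per customer. Net payments can be negative (refunds from the tax man), so shares may fall outside 0-100%, and a customer whose payments sum to zero gets `None` shares.

```python
from collections import defaultdict


def wide_payment_profile(rows, n_quarters=8):
    """One record per customer: raw tax per quarter plus each quarter's % of the customer's total."""
    payments = defaultdict(lambda: [0.0] * n_quarters)
    for row in rows:
        payments[row["customer_id"]][row["quarter"] - 1] += row["tax"]
    profiles = {}
    for cid, values in payments.items():
        total = sum(values)
        record = {f"tax_q{i + 1}": v for i, v in enumerate(values)}
        for i, v in enumerate(values):
            # Share of the two-year total; undefined when the total is zero.
            record[f"tax_pct_q{i + 1}"] = None if total == 0 else 100.0 * v / total
        profiles[cid] = record
    return profiles
```

A customer whose shares are concentrated in the early quarters shows the "decline in recent quarters" pattern described above.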
- the highest correlation score (of the three per customer) and categorical encoding of the correlations (eg. ‘negative correlation’ / ‘positive correlation’, ‘lag’ / ‘lead’ etc)
- Data manipulated payment sums
- Variables representing growth or decline in payments over time.
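For readers unfamiliar with k-means, here is a minimal standard-library version of Lloyd's algorithm. Each customer would be one numeric vector built from inputs like those listed, after the rescaling step, so no single input dominates the distances. Seeding with the first k points is a simplification; production tools typically use smarter initialisation and several restarts.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's k-means on lists of floats; the first k points seed the centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids
```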
The segments that were generated have proved to perform very well. Many features of the client business that were not used in the segmentation (eg number of accounts per client, and risk propensity) could be distinguished quite clearly by each segment.
When I examined the incidence of risk (failure or problems repaying a business loan) for a three month period (also with a three month gap) I found some segments had almost double the risk propensity.
Timeline described below:
As you can see, there was a very small number of risk outcomes (just 204 in three months), but each of these is very high value, so any lift in risk prediction is beneficial. I hate working with such small samples, but sometimes you get given lemons….
Suppose I built five clusters; here’s an example summary of the type of results I managed to get:
Where ‘Risk Index’ is simply calculated as:
(‘% Of Total Risk’ – ‘% Of Client Count’ ) / ‘% Of Client Count’
So, this shows that cluster 5 has a 67.91% higher propensity to be a bad risk than the entire base (well, in the analysis…). Conversely, cluster 2 is much less (-70%) likely to be a bad risk than the average customer.
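The Risk Index formula translates directly into code. The counts in the example are made up purely to show the arithmetic. A value of 0.5 means a cluster holds 50% more of the risk outcomes than its share of clients would suggest.

```python
def risk_index(cluster_risk_counts, cluster_client_counts):
    """Risk index per cluster: (% of total risk - % of client count) / % of client count."""
    total_risk = sum(cluster_risk_counts.values())
    total_clients = sum(cluster_client_counts.values())
    index = {}
    for cluster, clients in cluster_client_counts.items():
        pct_risk = cluster_risk_counts.get(cluster, 0) / total_risk
        pct_clients = clients / total_clients
        index[cluster] = (pct_risk - pct_clients) / pct_clients
    return index


# Hypothetical counts: two equal-sized clusters sharing 20 risk outcomes 15 / 5.
print(risk_index({1: 15, 2: 5}, {1: 500, 2: 500}))  # {1: 0.5, 2: -0.5}
```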
Maybe not your typical financial risk model….
Wednesday, June 16, 2010
See their current newsletter:
I'll make the presentation as vendor neutral and informative as possible (but obviously I can't discuss details of any previous confidential work by myself or SAS).
If you are in Melbourne on Wednesday 23rd June, then feel free to book and attend the presentation. As with all IAPA events it is free and a great opportunity to 'social network' :) with others interested in analysis and data mining.
I hope to see you there!
Wednesday, April 14, 2010
- baby No.2 due in 5 months
- starting a new job
- which means lots of work finalising and handing over data mining projects at my current employer (Optus)
- lots of new stuff to read and learn at the new employer (SAS)
Thursday, March 11, 2010
This news seemed to slip past the major national newspapers, which is quite surprising as it is likely to involve significant amounts of money. To be honest I’m not concerned with the consequences, but as a data miner it does interest me how data *is* used, and how it *could* be used.
As technology advances I’m certain the general public will see more examples of invasions of personal privacy and breaches of data confidentiality that enable organisations to gain the upper hand (unless or until they are caught). Keep it honest people!
Wednesday, December 16, 2009
There are lots of parallels to IAPA (Institute of Analytics Professionals of Australia http://www.iapa.org.au/), but the audience seemed to be more hands-on analysts. Being hosted at Google, it attracted quite a few web-based analysts too.
The next meet-up is 11th February 2010. I'll be there having a chat and a few beers.
Tuesday, November 24, 2009
A number of data miners have presented findings based upon simple ensembles that use the mean prediction of a number of models. I was surprised that some form of weighting isn’t commonly used, and that a simple mean average of multiple models could yield such an improvement in the global predictive power. It kinda reminds me of the Gestalt theory phrase "The whole is greater than the sum of the parts". It’s got me thinking about when it is best not to share predictive power. What if one model is the best? There are also a ton of considerations regarding scalability and the trade-off between additional processing, added business value, and practicality (don’t mention random forests to me..), but let’s pretend those don’t exist for the purpose of this discussion :)
So this has got me wondering whether ensembles work best in situations where there are clearly different sub-populations of customers. For example, Netflix is in the retail space, with many customers that rent the same popular blockbuster movies, and a moderate number of customers that rent rarer (or far more diverse, ie. long tail) movies. I haven’t looked at the Netflix data, so I’m guessing that most customers don’t have hundreds of transactions, and that generalising the correct behaviour of the masses to specific customers is therefore important. Netflix data on any specific customer could be quite scant (in terms of rents/transactions). In other industries such as telecom there are parallels; customers can also be differentiated by the nature of communication (voice calls, SMS, data consumption etc), just like types of movies. Telecom is mostly about quantity though (customer x used to make a lot of calls etc). More importantly, there is a huge amount of data about each customer, often with many hundreds of transactions per customer. There is therefore relatively less reliance upon the behaviour of the masses (although it helps a lot) to understand any specific customer.
Following this logic, I’m thinking that ensembles are great at reducing the error of incorrectly applying insights derived from the generalised masses to those weirdos that rent obscure sci-fi movies! Combining models that explain sub-populations very well makes sense, but what if you don’t have many sub-populations (or can identify and model their behaviour with one model)?
But you may shout "hey, what about the KDD Cup?". Yes, the recent KDD Cup challenge (anonymous featureless telecom data from Orange) was also won by an ensemble of over a thousand models created by IBM Research. I'd like to have had some information about what the hundreds of columns represented; this might have helped me better understand the Orange data and build more insightful and better-performing models. Aren't ensemble models used in this way simply a brute force approach to overlearn the data? I'd also really like to know how the performance of the winning entry tracks over the subsequent months for Orange.
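For contrast, here is a sketch of the simple mean ensemble next to a weighted one. Each model contributes a list of scores for the same customers. Where the weights come from (for example, each model's validation performance) is an assumption left open here.

```python
def mean_ensemble(predictions):
    """Unweighted average of several models' scores for each customer.

    predictions: list of per-model score lists, all the same length.
    """
    return [sum(scores) / len(scores) for scores in zip(*predictions)]


def weighted_ensemble(predictions, weights):
    """Weighted average of model scores; weights need not sum to 1."""
    total = sum(weights)
    return [sum(w * s for w, s in zip(weights, scores)) / total for scores in zip(*predictions)]
```

Setting all weights equal reproduces the mean ensemble. Putting all the weight on one model answers the "what if one model is the best?" case.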
Well, I haven’t had a lot of success in using ensemble models in the telecom data I work with, and I’m hoping it is more a reflection of the data than any ineptitude on my part. I’ve tried simply building multiple models on the entire dataset and averaging the scores, but this doesn’t generate much additional improvement (granted on already good models, and I already combine K-means and Neural Nets on the whole base). During my free time I’m just starting to try splitting the entire customer base into dozens of small sub-populations and building a Neural Net model on each, then combining the results and seeing if that yields an improvement. It’ll take a while.
Tuesday, November 3, 2009
Other bloggers have reviewed PAW and encapsulate my views perfectly. For example see some of James Taylor’s blog entries http://jtonedm.com/tag/predictive-analytics-world
James also provides a short overview of my presentation at PAW http://jtonedm.com/2009/10/20/know-your-customers-by-knowing-who-they-know-paw
My presentation at PAW was 35 minutes followed by 10 minutes for questions. I think I over-ran a little because I was very stretched to fit all the content in. For me the problem of data mining is a data manipulation one. I usually spend all my time building a comprehensive customer focused dataset, and usually a simple back-propagation neural network gives great results. I tried to convey that in my presentation, and as James points out I am able to do all my data analysis within a Teradata data warehouse (all my data analysis and model scoring runs as SQL) which isn't common. I'm definitely a believer that more data conquers better algorithms, although that doesn't necessarily mean more rows (girth is important too :))
Sunday, November 1, 2009
Rightly so, Dean pointed out that neural nets can actually be built perfectly well on unbalanced data. The problem is that when the Neural Net determines a categorical outcome it must know the incidence (probability) of that outcome. By default Clementine will simply take the output neuron values. If the value is above 0.5 the prediction will be true, else if the output neuron value is below 0.5 the category outcome will be false. This is why in Clementine you need to balance the categorical outcome to roughly 50%/50% when you build the neural net model. In the case of multiple categorical values, it is the highest output neuron value which becomes the prediction.
But there is a simple solution!
It is something I have always done out of habit because it has proved to generate better models, and I find a decimal score more useful. Being a cautious individual (and at the time a bit jet lagged) I wanted to double check first, but simply by converting a categorical outcome into a numeric range you will avoid this problem.
In situations where you have a binary categorical outcome (say, churn yes/no, or response yes/no etc) then in Clementine you can use a Derive (flag) node to create alternative outcome values. In a Derive (flag) node simply change the true outcome to 1.0 and the false outcome to 0.0.
By changing the categorical outcome values to a decimal range outcome between 0.0 and 1.0, the Neural Network model will instead expose the output neuron values, and the Clementine output score will be a decimal range from 0.0 to 1.0. The distribution of this score should also closely match the incidence of the outcome in the data used to build the model. In my analysis I cannot use all the data because I have too many records. Instead, I often build models on fairly unbalanced data and simply use the score, sorted / ranked, to determine which customers to contact first. I subsequently use the lift metric and the incidence of actual outcomes in sub-populations of predicted high scoring customers. I rarely try to create a categorical 'true' or 'false' outcome, so didn't give it much thought until now.
If you want to create an incidence matrix that simply shows how many 'true' or 'false' outcomes the model achieves, use a different cut-off. Instead of using a Neural Net score of 0.5 to determine the true or false outcome, you simply use the probability of the outcome in the data used to build the model. For example, if I *build* my neural net using data balanced as 250,000 false outcomes and 10,000 true outcomes, then my cut-off neural network score should be roughly 0.04 (10,000 / 260,000 ≈ 0.038). If my neural network score exceeds 0.04 then I predict true, else if my neural network score is below 0.04 then I predict false. A simple Derive node can be used to do this.
If you have a categorical output with multiple values (say, 5 products, or 7 spend bands etc) then you can use a Set-To-Flag node in a similar way to create many new fields, each with a value of either 0.0 or 1.0. Make *all* new set-to-flag fields outputs and the Neural Network will create a decimal score for each output field. This is essentially exposing the raw output neuron values, which you can then use in many ways similar to above (or use all output scores in a rough 'fuzzy' logic way as I have in the past:).
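The cut-off logic is plain arithmetic, so here it is outside Clementine as a small Python illustration (not Clementine stream code). The function names are hypothetical.

```python
def incidence_cutoff(n_true, n_false):
    """Cut-off score = incidence of the 'true' outcome in the data used to build the model."""
    return n_true / (n_true + n_false)


def classify(scores, cutoff):
    """Predict True when the network's decimal output exceeds the cut-off."""
    return [score > cutoff for score in scores]


cutoff = incidence_cutoff(n_true=10_000, n_false=250_000)
print(round(cutoff, 4))                     # 0.0385
print(classify([0.01, 0.05, 0.5], cutoff))  # [False, True, True]
```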
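Conceptually, the Set-To-Flag step is one-hot encoding of the target, and the multi-class prediction is the category with the highest output score. Below is a plain Python sketch of both ideas. It does not use any Clementine API, and the spend band names are hypothetical.

```python
def set_to_flags(values, categories):
    """Mimic a Set-to-Flag step: one 0.0/1.0 target column per category."""
    return [{f"flag_{c}": 1.0 if v == c else 0.0 for c in categories} for v in values]


def top_category(output_scores):
    """Multi-class prediction: the category whose output score is highest."""
    return max(output_scores, key=output_scores.get)


print(set_to_flags(["gold"], ["bronze", "silver", "gold"]))
# [{'flag_bronze': 0.0, 'flag_silver': 0.0, 'flag_gold': 1.0}]
print(top_category({"bronze": 0.1, "silver": 0.7, "gold": 0.2}))  # silver
```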
I posted a small example stream on the kdkeys Clementine forum http://www.kdkeys.net/forums/70/ShowForum.aspx
Just change the file suffix from .zip to .str and open the Clementine stream file. It was created using version 12.0, but should work in some older versions.
I hope this makes sense. Feel free to post a comment if elaboration is needed!
Monday, October 12, 2009
I'm presenting how leveraging the social interactions of the Optus mobile/cellphone customer base has enabled unparalleled insights into customers and prospects.
In my opinion the presenters and topics being discussed are interesting and worth attending. These conferences are the few events where industry analysts congregate and discuss their work.
I will probably have a few meetings and activities lined up, but I'm always happy to chat over a few beers. If you are there feel free to say 'hi'. I'm in Washington for 4 days, then taking a few days holiday with family in New York.
Sunday, September 13, 2009
I decided to post some churn model outcomes after reading a post by the enigmatic Zyxo on his (or maybe her :)) blog: http://zyxo.wordpress.com/2009/08/29/data-mining-for-marketing-campaigns-interpretation-of-lift/
I'd like to know if the models rate well :)
I'd love to see reports of the performance of any predictive classification models (anything like churn models) you've been working on, but I realise that is unlikely... For like-minded data miners a simple lift chart might suffice.
The availability of data will greatly influence your ability to identify and predict churn (for the purpose of this post churn is defined as when good fare-paying customers voluntarily leave). In this case churn outcome incidence is approx 0.5% per month, where the total population shown in each chart is a few million.
Below are two pictures of recent churn model Lift charts I built. Both models use the previous three months call summary data and the previous month's social group analysis data to predict a churn event occurring in the subsequent month. Models are validated against real unseen historical data.
I'm assuming you know what a lift chart is. Basically, it shows the magnitude increase in the proportions of your target outcome (in this case churn) within small sub-groups of your total population. Sub-groups are rank/sorted by propensity. For example, in the first chart we obtain 10 times more churn in the top 1% of our customers we suspected of churning using the predictive model.
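For anyone who wants to reproduce lift figures like these on their own data, here is a minimal sketch, assuming you have a propensity score and an actual churn flag (1 = churned) for every customer in the validation month. The function name and the toy data are hypothetical, not the code behind the charts in this post.

```python
from typing import Sequence


def lift_at_depth(scores: Sequence[float], churned: Sequence[int], depth: float) -> float:
    """Lift within the top `depth` fraction of customers, ranked by descending score.

    lift = (churn rate inside the top group) / (churn rate across the whole population)
    """
    if not 0 < depth <= 1:
        raise ValueError("depth must be in (0, 1]")
    n = len(scores)
    overall_rate = sum(churned) / n
    if overall_rate == 0:
        raise ValueError("no churners in the population")
    # Rank customers by model propensity, highest first
    ranked = sorted(zip(scores, churned), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(n * depth))
    top_rate = sum(outcome for _, outcome in ranked[:top_n]) / top_n
    return top_rate / overall_rate


# Toy example: 100 customers, 5 churners, 3 of them among the 10 highest scores
scores = [100 - i for i in range(100)]
churned = [1 if i in (0, 1, 2, 50, 80) else 0 for i in range(100)]
print(round(lift_at_depth(scores, churned, 0.10), 2))  # 6.0
```

Evaluating the same function at several depths (1%, 5%, 10% ...) gives the points of a lift chart.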
The first model is built for a customer base of prepaid (purchase recharge credit prior to use) mobile customers, where the main sources of data are usage and social network analysis.
The second model is postpaid (usage is subsequently billed to customer) mobile customers, where contract information and billing are additionally available. Obviously contracts commit customers for specified periods of time, so act as very 'predictive' inputs for any model.
- first churn model lift
- second churn model lift
Both charts show our model's lift in blue and the best possible result in dotted red. For the first model we obtain a lift of approximately 6 or 7 for the top 5% of the population (where the best possible outcome would be 20, i.e. 100 / 5 = 20).
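The 'best possible' line follows from simple arithmetic. A minimal sketch, assuming lift is defined as the churn rate in the selected group divided by the overall churn rate; the function name is hypothetical:

```python
def max_possible_lift(depth: float, base_rate: float) -> float:
    """Lift a perfect model would achieve when selecting the top `depth` fraction.

    A perfect model ranks every churner first:
      - if depth >= base_rate, the top group contains all churners,
        so lift = (base_rate / depth) / base_rate = 1 / depth
      - if depth < base_rate, the top group is 100% churners,
        so lift = 1 / base_rate
    """
    return 1.0 / max(depth, base_rate)


best = max_possible_lift(0.05, 0.005)  # top 5% at ~0.5% monthly churn
print(round(best, 6))       # 20.0
print(round(10 / best, 6))  # 0.5, i.e. "halfway to perfection"
```

Note that the ceiling only depends on the churn incidence when the selected group is smaller than the churner population.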
The second model is significantly better, with our model able to obtain a lift of approximately 10 for the top 5% of population (half way to perfection :)
I mention lift at 5% of the population because this gives us a reasonable mailing size and catches a large number of subsequent churners.
Obviously I can't discuss the analysis itself in any depth. I'm just curious what people's first impressions of the lift are. I think it's good, but I could be delusional! And just to confirm, it is real and validated against unseen data.
Tuesday, July 21, 2009
Below is a summary and short review of the books that are sitting on my desk at work...
- from left to right;
I got a free copy because I contributed to some of the industry examples. I'm even quoted in it! I found the book very useful and would recommend it for any marketing analyst. It talks about ROI and measuring every type of marketing event and customer interaction. Lots of case studies, which I always like. No detail in terms of data analysis itself, but plenty of food for ideas.
- Advances In Knowledge Discovery And Data Mining (editors Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, Ramasamy Uthurusamy)
I first bought this for Rakesh Agrawal's article on Association Rules (Apriori in Clementine), but also found John Elder's Statistical Perspective on Knowledge Discovery very informative. It provides a great concise history of data mining.
- Data Mining Using SAS Applications (author George Fernandez)
I bought this hoping to get a different opinion or learn something new (compared to the SPSS Clementine User Guide I have far too much experience of...). I thought: maybe SAS analysts have a better way to do a specific type of data handling, or follow an alternative thought pattern to accomplish a goal. Sadly I was disappointed. Like many data mining books, it spends hundreds of pages describing algorithms and expert options for refining your model building, and less than 10 pages on data transformations and/or data cleaning. Those 10 pages are well written though. Not worth the purchase in my view. I only hope SAS analysts have better books out there.
- Data Mining Techniques (authors Michael Berry and Gordon Linoff)
Being written by practitioners means a lot. This is the one book I often re-read, just in case I missed something the previous time :) Maybe it's because it is very applicable to my role as an analyst in the marketing department of a telecommunications carrier, but I find this book invaluable. Lots of case studies. 100 pages of background and practical tips before it even reaches 'algorithms' is good in my view, and when you do reach the algorithms they are described very well, in practical terms, as techniques (rather than as a laborious stats class, and I didn't do stats at university). I find the whole book a joy to read. A must for every data miner.
- Data Mining. A Tutorial Based Primer (authors Richard Roiger and Michael Geatz)
Whilst going through a phase of keen hobby programming in VB.NET I tried my hand at writing a neural net, decision tree etc. from scratch. I found this book really helpful since it goes through every detail a programmer would need to implement their own data mining code in Excel. I work with huge amounts of data, so the thought of doing data mining in Excel makes me giggle (maybe that's a bad thing...), but the principles of data manipulation, cleaning and prediction can easily be applied in Excel. If you really want to understand how algorithms work and build your own, then this book is very useful for that purpose.
- Data Mining. Introductory and Advanced Topics
If you did spend several years studying mathematics or statistics then this book would probably act as a great reference and reminder of how algorithms work.
It's very academic, and sometimes that's useful. I think there's one line in there somewhere that mentions data cleaning or data transformations as being an industry thing... It is also quite a hard, heavy book, so it could be useful to rest stuff on.
- Data Mining. Practical Machine Learning Tools and Techniques (Ian Witten and Eibe Frank)
This is a classic example of bait advertising that some authors should be jailed for. On page 52 of this book the authors write:
"Preparing input for a data mining investigation usually consumes the bulk of the effort invested in the entire data mining process. Although this book is not really about the problems of data preparation, we want to give you a feeling for the issues involved...." Fuck me, it's not a data mining book then, is it? Not only that, they actually use the term "Practical" in the title. Clearly it is not practical at all if it involves absolutely zero data manipulation. If I ever meet one of these authors I will slap them in the face and demand my money back... Oh, and over half the book is a damn Weka user guide.
- The Elements Of Statistical Learning. Data Mining, Inference and Prediction (authors Trevor Hastie, Robert Tibshirani, Jerome Friedman)
Very heavy on the stats and squiggly equations (which take me ages to make sense of) but quite well written because I usually manage to understand it. Explains the algorithms stuff very well. I don't refer to it much and only read a few chapters in depth, but it was worth the purchase.
- The Science Of Superheroes (authors Lois Gresh and Robert Weinberg).
Not everything is about data mining. There's a whole world out there, and just maybe it includes superheroes with laser beams shooting out of their eyes. It's a soft-core science book discussing concepts such as faster-than-light speed, cosmic rays, genetically engineered hulks, flying without wings, and black holes, and how it all relates to real-life superheroes (if they existed). Really good geeky material.
- Data Preparation For Data Mining (author Dorian Pyle)
A good book, and like "Data Mining Techniques" it clearly covers topics with a practical understanding (no 'real-world' case studies though). Where it differs is that this book has a stronger academic or statistics focus. I didn't get a sense that the examples would always relate to large real-world data sets, and many methods I use (for example frequency binning) were not mentioned at all because they have no statistical basis. Here's the problem: this is a great data mining book, but only for the statistics in practical data mining. It is a book I frequently refer to and would recommend, although I'd like to see stuff added that *isn't* based on statistics.
- The Essence Of Databases (author F. Rolland)
Database 101 for dummies. It describes database schemas and relational concepts, gives tons of SQL examples for queries and data transformations, describes object-oriented databases etc.
Essential stuff for anyone querying a corporate data warehouse. It reads easily and is recommended.
- Data Mining. Concepts, Models, Methods, and Algorithms (author Mehmed Kantardzic)
Another 'list all the algorithms I know' book. I'll be honest; I only quickly flicked through it hoping to see some case studies or something new. It seemed good, but didn't seem to have anything to set it apart from any other algorithms book.
- Statistics Explained. Basic concepts and methods. (authors R. Kapadia and G. Andersson)
Just in case I forget what a t-test is. Has lots of pictures :)
- Clementine User Guides (author: many at SPSS; if memory serves me, Clay Helberg did a fair chunk of it). When I was at SPSS I had a small part to play in these. I provided some examples and proofread where possible. I've been using Clementine daily for over a decade, but still refer to the user guides occasionally. I find them useful, but they could benefit from some new examples to take advantage of the many new features that have been added in recent years.
Monday, June 15, 2009
Last year, at the Asia Teradata User Group in Beijing, I presented some generic data mining that was being performed at Optus (mostly simple churn analysis and behavioural segmentation). I also had a few meetings with analysts from some Chinese telcos about how relatively simple data analysis can scale up to many millions of customers and billions of rows of data.
This year I'll be presenting at the US Teradata User Conference some of the more advanced analysis that I've recently done, notably surrounding social network analysis in the mobile customer base on large amounts of data (several billions of rows). I'm hoping to be able to quote some actual business outcomes and put up some $ numbers.
The US 2009 Teradata User Group Conference & Expo, October 18–22, 2009, at the Gaylord National Resort.
I'll be presenting on Wednesday 21st October 2009 in Maryland D on the Business Track. Judging from the large number of presentations, I'm guessing it will be a much smaller and more personal room than the conference hall of 1000+ people I was in last year in Beijing :)
Feel free to say hi and ask lots of questions if you see me there. I might have one free evening for a few beers if anyone wants.
Wednesday, May 20, 2009
The podcasts discuss customer insights and the data mining analyses that are performed. We then discussed social network analysis and how linking customers by social calling groups helps predict customer actions (such as churn or acquisition of an iPhone handset). TCRM is a Teradata tool I am not familiar with, but my colleagues do use it for campaign delivery, and it can perform trigger-based campaigns (such as sending a retention offer to the other members of a social group when one member of that group churns).
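The selection logic behind such a trigger-based campaign could be sketched like this, assuming each customer has already been assigned to one social group by an SNA step. The function and its inputs are hypothetical; this is not how TCRM implements it, just an illustration of the rule "when a member churns, the remaining members of that group become retention targets".

```python
from collections import defaultdict


def retention_targets(group_of: dict[str, str], churned_now: set[str]) -> set[str]:
    """Customers who should receive a retention offer because someone in their group churned.

    group_of: hypothetical mapping of customer id -> social group id
    churned_now: customer ids that churned in the latest period
    """
    members: dict[str, set[str]] = defaultdict(set)
    for customer, group in group_of.items():
        members[group].add(customer)

    targets: set[str] = set()
    for customer in churned_now:
        group = group_of.get(customer)
        if group is None:
            continue  # churner was not part of any social group
        # Everyone else in the churner's group, excluding anyone who has also churned
        targets |= members[group] - churned_now
    return targets


groups = {"a": "g1", "b": "g1", "c": "g1", "d": "g2", "e": "g2"}
print(sorted(retention_targets(groups, {"a"})))  # ['b', 'c']
```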
I'm very fortunate that I am occasionally permitted to present my work. One of my main arguments for doing this is that I get peer review and feedback from other data miners, and an idea whether the analytics we do is 'better than most'.
So, I beg you! Please let me know either way: if this stuff is good or bad I need to know (especially if you work in telco).
- - - - - - - - -
Enhancing Customer Knowledge and Retention at Optus
In This Podcast
Optus is an Australian telecommunications carrier that uses analytics to increase customer retention. The data being analyzed comes from call centers, mobile phone call details, census geo-demographic data, and a history of customer behavior. Teradata CRM and the data warehouse environment from Teradata is key to Optus’ success with reliably identifying customers that might churn and offering marketing campaigns that are relevant and timely. Optus saw a 20% reduction in churn.
Social Networking Analysis at Optus
In This Podcast
Tim Manns from Optus discusses how the company uses detailed network data from its Teradata system to look at calling behavior. With 40% of the Australian telecommunications market, the company cross-references each customer with every other customer, groups them together based on who they communicate with, looks at the behavior of the group, and can then predict next steps and target those groups with appropriate products and services.
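The actual Optus method isn't described in these summaries, so the following is only a deliberately simple stand-in for "grouping customers based on who they communicate with": link two customers when they exchange at least a (hypothetical) minimum number of calls, and treat each connected component as a group, found with union-find. On real data with billions of call records, components like these tend to chain together into very large groups, so community detection or per-customer 'circle of friends' definitions would likely be needed instead.

```python
from collections import Counter


def social_groups(calls: list[tuple[str, str]], min_calls: int = 3) -> dict[str, str]:
    """Link customers who call each other at least `min_calls` times (both directions
    combined) and return customer id -> group id, where a group is a connected component."""
    # Combine A->B and B->A into one unordered pair count
    pair_counts = Counter(tuple(sorted(pair)) for pair in calls)

    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), count in pair_counts.items():
        if count >= min_calls:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two groups

    # Customers without a strong enough link never enter `parent`, so they are not grouped
    return {customer: find(customer) for customer in list(parent)}


calls = [("a", "b")] * 3 + [("b", "c"), ("c", "b"), ("c", "b")] + [("d", "e")]
groups = social_groups(calls)
print(groups["a"] == groups["b"] == groups["c"], "d" in groups)  # True False
```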
Monday, May 4, 2009
See a recent news article;
As a Data Miner for a telecommunications provider I frequently use network data in my analysis: for example, how many calls the customer makes, at what time of day, and whether they communicate using voice or SMS. I examine data pertaining to *customers* only.
Telecommunications companies often provision services wholesale for another company. This 'wholesale recipient' company will pay for the use of the network, but manage all other activities such as marketing, customer accounts and billing. In these cases, although the telecommunications company is responsible for supplying the network service and ensuring calls are successfully established (and likely stores data about these calls), it doesn't own the call data for that customer (who belongs to the 'wholesale recipient' company). Make sense? Data that pertains to the actions of someone who is not a customer of that telecommunications company must be treated with the utmost caution.
Every data miner must be aware of data privacy laws, and in many countries failure to adhere to these laws attracts heavy financial penalties for the organisation and individuals involved. In Australia, breaching some privacy laws could even potentially involve two years' jail time.
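One practical consequence is to filter call records down to your own customers before any analysis starts. A minimal sketch, assuming a hypothetical CallRecord type and a set of service numbers that belong to your own retail (not wholesale) customer base. It only covers the wholesale case described here; what may be done with the other party's number on each call is a separate privacy question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    caller: str   # originating service number
    callee: str   # receiving service number
    seconds: int  # call duration


def own_customer_calls(cdrs: list[CallRecord], retail_customers: set[str]) -> list[CallRecord]:
    """Keep only calls originated by our own retail customers.

    Calls by a wholesale recipient's customers also cross our network, but that
    call data belongs to the other company, so it is excluded before any analysis.
    """
    return [cdr for cdr in cdrs if cdr.caller in retail_customers]
```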
Recently Telstra, an Australian telecommunications company (and the former incumbent), was found guilty of serious breaches of data privacy. For the 130-page publicly accessible transcript see:
I'm guessing that the significant legal costs, and the years it has taken to get this result, are prohibitive for many telcos, so they let it slide. Optus didn't.
Basically, the bit that caught my eye was on item 108 (yes, I speed read the whole thing...). It is legal jargon and reads;
"Telstra asserted that total traffic travelling across its network belonged to Telstra. Optus submitted that whether it belonged to Telstra is not the question posed by cl 15.1 of the Access Agreement. The question under cl 15.1 is whether Telstra owed an obligation under that clause with respect to traffic information recorded by Telstra of communications by Optus customers on the Telstra network because that information was Confidential Information of Optus. The definition of Confidential Information identifies what is the Confidential Information of Optus. Once a CCR records information in relation to a call made by an Optus customer, that information becomes the Confidential Information of Optus because it falls within the definition of ‘Confidential Information’. "
The first sentence is shocking. In plain English it suggests that Telstra treats all network calling data as its own, and freely uses information about calls made by anyone on that network as it sees fit. That includes calls made by customers of wholesale or competitor companies on its network. In the case of wholesale on fixed line (landline) networks, Telstra will know the address and likely also the name of the customer. In the early days Optus had little choice but to use some of Telstra's fixed line infrastructure, often the last bits of copper wire that reach a household. Information about this usage was passed to executives and board members so that they knew customer size and market share by age, geography etc. It is also highly likely (although difficult to prove) that the Telstra retail arm used the data for marketing activities and sent direct communications to those customers. Anti-competitive to say the least...
One of the short conclusions of the legal findings was:
"For the foregoing reasons, I find that Telstra has used traffic information of Optus, or Communication Information of Optus for the purposes of the Access Agreement, both in the preparation of market share reports and in distributing those reports among Telstra personnel. I also find that such information is Confidential Information of Optus for the purposes of the Access Agreement, or is otherwise subject to the requirements of confidentiality in cl 15 of the Access Agreement, by force of cl 10 of that agreement. I also find that neither such use of such information nor its disclosure for such purposes is permitted by the Access Agreement."
I guess the information here is probably too deep in 'telco land', but hopefully it's clear enough to understand the gravity of this. I've known this type of stuff was being done by some telcos for a long time, but I'm shocked it was so brazen.
Knowing the big difference between what we (as Data Miners) are 'able to do' with insights and personal information (particularly in mobile telecommunications) and what we 'should do' is very important. Years ago the industry passed the early developmental stage of storing data; in recent years we have learned how to understand the data and convert it into useful insights. I still think that many data miners don't realise how important it is (now more than ever before) that we act responsibly in the use of the personal information we obtain from 'our' data.
Wednesday, April 22, 2009
I'm quite proud of the social network analysis (SNA) that I'd first completed months ago. It is refreshed each month (the data warehouse load is too high to run it daily or weekly as I would like). I've been tracking its performance, and am continually surprised.
The trouble is that my colleagues are having difficulty understanding how they can use it to formalise customer communications, so I decided to try a different approach than graphs and pie charts etc.
Instead I thought I might try something humorous, hence Dilbert to the rescue! I have created a dozen or so custom Dilbert slides, each providing some info about a customer insight made available by the SNA along with a humorous conclusion to that insight. I'll pass these around the department in a series of daily emails.
Here is one example (I had to change the project nickname to "SNA" for this blog);
Monday, April 6, 2009
SPSS have gone for new product names, including changing Clementine to PASW. I'm more interested in the new features and bug fixes than buzzwords. I'll hopefully be getting the new version shortly and will let you know if Clementine 13 (aka Predictive Analytics SoftWare Modeler) adds value.
Monday, March 30, 2009
For more info see;
I am not able to download the data at work (security / download limits), so I might have to try this at home. I haven't even seen the data yet. I'm hoping it's transactional CDRs and not in some summarised form (which it sounds like it is).
I don't have a lot of free time so I might not get around to submitting an entry, but if I do, these are some of the data preparation steps and issues I'd consider:
- handle outliers
If the data is real-world then you can guarantee that some values will be at least a thousand times bigger than anything else. A log transform might not work, so try a trimmed mean or frequency binning as a method to handle outliers.
- missing values
The KDD guide suggests that missing or undetermined values were converted into zero. Consider changing this. Many algorithms will treat zero very differently from a null, so you might get better results by treating these zeros as nulls.
- percentage comparisons
If a customer can make voice or SMS calls, what is the split between them (e.g. 30% voice vs 70% SMS)? If there are only voice calls, then consider splitting by time of day or peak vs off-peak as percentages. The use of percentages helps remove differences of scale between high and low quantity customers. If telephony usage covers a number of days or weeks, then consider a similar metric that shows increased or decreased usage over time.
- social networking analysis
If the data is raw transactional CDRs (call detail records) then give serious consideration to performing a basic social network analysis. Even if all you can manage is to identify a circle of friends for each customer, this may have a big impact upon identification of high-churn individuals or up-sell opportunities.
- not all churn is equal
Rank customers by usage and scale the rank to a score from zero (low) to 1.0 (high rank). No telco should still be treating every churn as an equal loss. It's not! The loss of a highly valuable customer (high rank) is worse than that of a low-spend customer (low rank). Develop a model to handle this and argue your reasons for why treating all churn the same is a fool's folly. This is difficult if you have no spend information or history of usage over multiple billing cycles.
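Four of the preparation steps above can be sketched in plain Python, assuming each input is a simple list of per-customer values. The function names are hypothetical, equal-frequency binning is used as the 'frequency binning' outlier treatment, and whether zeros really mean 'missing' must be checked against the data documentation before converting them.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def frequency_bins(values: Sequence[float], n_bins: int = 10) -> list[int]:
    """Equal-frequency binning: bins 0..n_bins-1 each hold roughly the same number
    of customers, so an extreme outlier simply lands in the top bin."""
    if not values:
        return []
    ranked = sorted(values)
    n = len(ranked)
    # Lower edge of bins 1..n_bins-1, taken at evenly spaced positions in the sorted data
    edges = [ranked[(i * n) // n_bins] for i in range(1, n_bins)]
    return [bisect_right(edges, v) for v in values]


def zeros_to_missing(values: Sequence[float]) -> list[Optional[float]]:
    """Turn zeros back into nulls; only apply to fields where zero is known to mean 'missing'."""
    return [None if v == 0 else v for v in values]


def usage_shares(voice: int, sms: int) -> tuple[Optional[float], Optional[float]]:
    """Voice and SMS as proportions of all calls, removing differences of scale."""
    total = voice + sms
    if total == 0:
        return None, None
    return voice / total, sms / total


def value_weights(spend: Sequence[float]) -> list[float]:
    """Rank customers by spend and scale the rank from 0.0 (lowest) to 1.0 (highest).
    Ties are broken by input order."""
    n = len(spend)
    if n <= 1:
        return [1.0] * n
    order = sorted(range(n), key=lambda i: spend[i])
    weights = [0.0] * n
    for rank, i in enumerate(order):
        weights[i] = rank / (n - 1)
    return weights


print(frequency_bins([1, 2, 3, 4, 5, 6, 7, 8, 9, 10000], n_bins=5))  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
print(usage_shares(30, 70))  # (0.3, 0.7)
print(value_weights([10, 50, 30]))  # [0.0, 1.0, 0.5]
```

The 0 to 1.0 value weights are one possible input to a value-aware churn model, for example as case weights when training; they are not the only way to stop treating all churn as an equal loss.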
Hope this helps
Good luck everyone!
Friday, March 27, 2009
I've been asked to present at Uniscon 2009. One of the professors involved at the University of Western Sydney is a relative of an analyst I work with and asked me to present. I usually find academic conferences are snooze city, but they promised me free beer and I live in Sydney anyway, so I can get home to see the baby before the night's end. I hope I'm just one of many industry people there and that it proves to be an insightful event.
I won't be presenting my work projects. I will be presenting from a personal perspective as an industry data miner (I don't have enough time to prepare my presentation and get legal approval from work), so I'll be discussing generic topics instead of describing recent data mining projects and quoting numbers or factual business outcomes.
I suspect a large part of my attendance is to drive some enthusiasm and make the students interested in data mining and aware of what challenges you face in data mining roles.
If you are attending then feel free to say 'hi'.
For info on the conference see;
wider website http://www.uniscon2009.org/
Below was the presentation title and abstract I threw together (now just have to write it...). There is a social networking analysis (SNA) element to it (because that's what I'm focused on at the moment).
Know your customers. Know who they know they know, and who they don't.
Tim's presentation will describe some of the types of marketing analysis a typical telecommunications company might do, including social network analysis (SNA, which is a hot topic right now). He also elaborates on the technical and practical side of data mining, and what business impacts data mining may have.
More importantly the presentation will help answer questions such as;
- What skills are required for Data Mining?
- What problems are commonly faced during Data Mining projects?
- And just what is this Data Mining stuff all about anyway?
Thursday, March 26, 2009
Survey Link: www.RexerAnalytics.com/Data-Miner-Survey-Intro2.html
Access Code: TM42P
If you frequently conduct data analysis on large amounts of data (i.e. data mining!) then I urge you to participate.
Wednesday, March 11, 2009
1) becoming a daddy
-> lots of fun!
2) the recent announcement of a merger between the telcos Vodafone and Hutchison.
-> pain in the arse!
For info see
Australia's population is approximately 20 million, which is pretty small, and there were four players in the mobile service provider market (in probable order of market share): Telstra, Optus, Vodafone, Three.
The announcement that Vodafone and Three are merging reduces this to three players, which reshapes the Australian landscape to closely match many other countries with mature telecommunications markets. Most such markets have a few players and, in the current economic climate, it's not surprising that there will be mergers and therefore consolidation of customers into larger groups.
As a result of the merger, the competitors (Telstra & Optus) will have to review their strategies and probably re-examine customer analysis. Lots of work for us Data Miners...
Tuesday, March 3, 2009
A significant part of the vendor solution is the ability to manage many (we're talking hundreds of) data mining models (predictive, clustering etc).
In my group we do not have many data mining models, maybe a dozen, that we run on a weekly or monthly basis. Each model is quite comprehensive and will score the entire customer base (or near to it) for a specific outcome (churn, up-sell, cross-sell, acquisition, inactivity, credit risk, etc). We can subsequently select sub-populations from the customer base for targeted communications based upon the score or outcome of any single model or a combination of models, or any criteria taken from customer information.
I'm not entirely sure why you would want hundreds of models in a Telco (or similar) space. Any selection criteria applied to specific customers (say, by age, or gender, or state, or spend) before modelling will of course force a biased sample that feeds into the model and affects its inherent nature. Once this type of selective sampling is performed you can't easily track the corresponding model over time *if* the sampled sub-population ever changes (which is likely, because people do get older, move house, or change spend etc). For this reason I can't understand why someone would want or have many models. It makes perfect sense in Retail (for example a model for each product, or association rules for product recommendations), but not many models that apply to sub-populations of your customer base.
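Selecting a campaign audience from a few whole-base models might look like the sketch below, assuming each customer already carries a score from each model. The record type, field names and thresholds are hypothetical; the point is that sub-populations are chosen after scoring, so the models themselves never need to be rebuilt per sub-group.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoredCustomer:
    customer_id: str
    churn_score: float   # hypothetical churn model propensity, 0..1
    upsell_score: float  # hypothetical up-sell model propensity, 0..1
    state: str


def select_targets(customers: list[ScoredCustomer], min_upsell: float, max_churn: float,
                   state: Optional[str] = None) -> list[str]:
    """Pick an up-sell audience from whole-base scores: likely to buy, unlikely to churn,
    optionally restricted to one state."""
    return [
        c.customer_id
        for c in customers
        if c.upsell_score >= min_upsell
        and c.churn_score <= max_churn
        and (state is None or c.state == state)
    ]


base = [
    ScoredCustomer("a", 0.10, 0.8, "NSW"),
    ScoredCustomer("b", 0.70, 0.9, "NSW"),
    ScoredCustomer("c", 0.20, 0.6, "VIC"),
]
print(select_targets(base, min_upsell=0.5, max_churn=0.3))  # ['a', 'c']
```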
Am I missing something here? If you are working with a few products or services and a large customer base why would you prefer many models over a few?
Comments please :)