Tuesday, November 3, 2009

Predictive Analytics World (PAW) was a great event

I found this year’s PAW in Washington a great success. Although I was only able to attend for one day (the day I presented), the handful of varied presentations I did see were very informative and stimulated lots of ideas for my own data mining in the telecommunications industry. PAW is clearly an event run by, and aimed at, industry practitioners. The emphasis of the presentations was on lessons learnt, implementation and business outcomes. I strongly recommend attending PAW if you get the chance.

Other bloggers have reviewed PAW and encapsulate my views perfectly. For example, see some of James Taylor’s blog entries: http://jtonedm.com/tag/predictive-analytics-world

James also provides a short overview of my presentation at PAW: http://jtonedm.com/2009/10/20/know-your-customers-by-knowing-who-they-know-paw

My presentation at PAW was 35 minutes, followed by 10 minutes for questions. I think I over-ran a little because I was very stretched to fit all the content in. For me, the problem of data mining is a data manipulation one. I usually spend all my time building a comprehensive customer-focused dataset, and then a simple back-propagation neural network usually gives great results. I tried to convey that in my presentation and, as James points out, I am able to do all my data analysis within a Teradata data warehouse (all my data analysis and model scoring runs as SQL), which isn't common. I'm definitely a believer that more data conquers better algorithms, although that doesn't necessarily mean more rows (girth is important too :))
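To illustrate why scoring a simple back-propagation network can live in plain SQL, here is a minimal sketch of the forward pass for a network with one hidden layer. It is written in Python for readability and is not the code from the presentation; the function names, the single hidden layer and the logistic activation are all assumptions made for the example. The point is only that scoring needs nothing beyond multiplication, addition and an exponential.

```python
import math


def sigmoid(x):
    # Logistic activation (an assumption; other activations are possible).
    return 1.0 / (1.0 + math.exp(-x))


def score(inputs, hidden_weights, hidden_bias, output_weights, output_bias):
    """Forward pass of a hypothetical one-hidden-layer network.

    inputs         : list of numeric customer features
    hidden_weights : one list of weights per hidden node
    hidden_bias    : one bias per hidden node
    output_weights : one weight per hidden node
    output_bias    : single bias for the output node
    """
    # Each hidden node is a weighted sum of the inputs pushed through the sigmoid.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_bias)
    ]
    # The output node is a weighted sum of the hidden nodes, again through the sigmoid.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)
```

Once the weights have been trained and are fixed, each hidden node becomes one arithmetic expression over the customer columns, so the whole score can be written as nested expressions in a single query.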


Manuel Martin said...

I agree, more data is often a faster path to improved performance than a better algorithm, as long as it is the right data: quality, meaningful data.

Sandro Saitta said...

Hi Tim,

Thanks for the PAW feedback. BTW, do you have some information about your telco project that you can share? I'm also working on a telco project right now (see on my blog).


Tim Manns said...

Hi Sandro,

Here is a recent post on a Teradata site which has information;

I'd recommend at least using 'circle of friends' type information where possible. Of course, using call counts, the time of day calls are made, outbound vs inbound, voice vs SMS, etc. as inputs is valuable in itself.
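As a rough illustration of those inputs, the sketch below builds one feature row per customer from a list of call records. The record layout (`Call`), the feature names and the 18:00 evening cut-off are hypothetical choices for the example, and "circle of friends" is reduced here to a simple count of distinct contacts, which is only one of many ways to use that information.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    caller: str  # hypothetical customer identifier
    callee: str
    hour: int    # hour of day the call started, 0-23
    kind: str    # must be "voice" or "sms"; anything else raises KeyError below


def new_features():
    return {"outbound": 0, "inbound": 0, "voice_out": 0, "sms_out": 0,
            "evening_out": 0, "contacts": set()}


def customer_features(calls, evening_starts=18):
    """Aggregate call records into one feature row per customer."""
    raw = defaultdict(new_features)
    for call in calls:
        # Outbound side: counts by type and time of day.
        out = raw[call.caller]
        out["outbound"] += 1
        out[call.kind + "_out"] += 1
        if call.hour >= evening_starts:
            out["evening_out"] += 1
        out["contacts"].add(call.callee)

        # Inbound side: the callee also gains a contact.
        inc = raw[call.callee]
        inc["inbound"] += 1
        inc["contacts"].add(call.caller)

    result = {}
    for customer, feats in raw.items():
        row = {k: v for k, v in feats.items() if k != "contacts"}
        # Size of the circle of friends: distinct people called or calling.
        row["circle_size"] = len(feats["contacts"])
        result[customer] = row
    return result
```

In a data warehouse the same aggregation would typically be a GROUP BY over the call detail table; the Python version just makes the individual features easy to see.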

I do a lot of data preparation, but I use fairly simple back-propagation neural nets to score the base (everything in SQL).