Monday, June 15, 2009

See you at the US Teradata User Conference 2009

Quick post because I'm swamped with work...

Last year, at the Asia Teradata User Group in Beijing, I presented some generic data mining that was being performed at Optus (mostly simple churn analysis and behavioural segmentation). I also had a few meetings with analysts from some Chinese telcos about how relatively simple data analysis can scale up to many millions of customers and billions of rows of data.

This year I'll be presenting some of the more advanced analysis I've done recently at the US Teradata User Conference, notably social network analysis of our mobile customer base using large amounts of data (several billion rows). I'm hoping to be able to quote some actual business outcomes and put up some dollar figures.

The US 2009 Teradata User Group Conference & Expo runs October 18–22, 2009, at the Gaylord National Resort.

I'll be presenting on Wednesday 21st October 2009 in Maryland D on the Business Track. Judging from the large number of presentations, I'm guessing it will be a much smaller and more personal room than the 1000+ seat conference hall I was in last year in Beijing :)

Feel free to say hi and ask lots of questions if you see me there. I might have one free evening for a few beers if anyone wants.


Rick Burgs said...

is that an emoticon in your session title?

Tim Manns said...

yeah, I can't believe they put it in! I laughed...

One part of my presentation will be on how Teradata is able to handle the processing load of data mining tasks such as social networking analysis.

Although I developed our in-house solution using SPSS Clementine, the whole project is a series of SQL queries that run on the Teradata warehouse and involve processing billions of rows of transaction-level data.
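The queries themselves aren't shown here, but the basic first step of collapsing transaction-level call records into calling relationships is a GROUP BY. The sketch below uses Python's built-in sqlite3 as a small stand-in for the Teradata warehouse; the table name `cdr`, its columns and the function name are hypothetical assumptions, not the actual schema or code.

```python
import sqlite3


def build_relationships(cdr_rows):
    """Collapse transaction-level call rows into one row per (caller, callee) pair.

    cdr_rows: iterable of (caller, callee, duration_sec) tuples for one month.
    The table and column names are hypothetical, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE cdr (caller TEXT, callee TEXT, duration_sec INTEGER)")
        conn.executemany("INSERT INTO cdr VALUES (?, ?, ?)", cdr_rows)
        return conn.execute(
            """
            SELECT caller, callee, COUNT(*) AS n_calls, SUM(duration_sec) AS total_sec
            FROM cdr
            WHERE caller <> callee          -- ignore self-calls
            GROUP BY caller, callee
            ORDER BY caller, callee
            """
        ).fetchall()
    finally:
        conn.close()


cdr = [("A", "B", 60), ("A", "B", 120), ("A", "C", 30), ("B", "A", 45)]
print(build_relationships(cdr))
# [('A', 'B', 2, 180), ('A', 'C', 1, 30), ('B', 'A', 1, 45)]
```

On a real warehouse the same aggregation would run in-database over the full month of calls, so only the much smaller relationship table leaves the database.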

I'll try to keep the techie stuff to a minimum and keep it business focused. I'm trying to make sure I have some success stories and examples for the presentation, although keeping 'intellectual property' disclosure to a minimum and obtaining executive approval is always tricky.

Rick Burgs said...

Tim, I struggle with your usage of the phrase "social networking analysis". Typical SNA applications make deliberate attempts to view the system as an actual network. In mobile telecommunications, I would expect an SNA initiative to make some attempt to learn the structure of these networks, discover how causal influences move through the network, and understand how the network changes over time. It doesn't seem like you're doing that... granted, I don't know every detail of your project, but based on what I've read in your blog, what you're doing would not be considered true SNA.

Tim Manns said...

I made a previous post in Sep '08 that describes the SNA I've developed:

You're right to be concerned, because at the moment what I've developed is only used to identify groups of individuals associated by communication (eg. in a small world network). It could do so much more, but small steps for the marketing team.

There is a ton of summarised information that describes the nature of the communication in a calling relationship, and attributes of the relationship itself (relative to all the other relationships either customer might have).

We measure immediate relationships (no degrees of separation) from a target customer. Although I could look at customers separated by one or more degrees, I haven't found much benefit in doing so yet (except for fun).

Creating simple social associations (millions of small world networks) lets me quantify the importance of each relationship for any individual at the same fixed point in time (all communications in a month).
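As a rough illustration of quantifying how important a relationship is to each party, the sketch below merges both call directions into one undirected pair and expresses each pair's minutes as a share of each customer's total. The (caller, callee, minutes) input, the use of minutes as the measure, and the function names are assumptions for illustration; the actual summarised attributes aren't described in this post.

```python
from collections import defaultdict


def pair_weights(calls):
    """Merge both call directions into one undirected pair: (a, b) -> total minutes."""
    weights = defaultdict(float)
    for caller, callee, minutes in calls:
        if caller == callee:
            continue  # skip self-calls
        pair = tuple(sorted((caller, callee)))
        weights[pair] += minutes
    return dict(weights)


def relationship_importance(weights):
    """For each pair (a, b), return (share of a's minutes, share of b's minutes)."""
    totals = defaultdict(float)
    for (a, b), w in weights.items():
        totals[a] += w
        totals[b] += w
    return {
        (a, b): (w / totals[a], w / totals[b])
        for (a, b), w in weights.items()
        if totals[a] > 0 and totals[b] > 0  # avoid dividing by zero for silent pairs
    }


calls = [("A", "B", 30), ("B", "A", 10), ("A", "C", 60), ("C", "D", 20)]
importance = relationship_importance(pair_weights(calls))
print(importance[("A", "B")])  # (0.4, 1.0)
```

The same pair can matter very differently to each side: here the A-B relationship is 40% of A's minutes but all of B's.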

I also repeat the analysis monthly and track social groups over time (I keep a rolling 3 months). So if one customer leaves (churns), we can see that other customers in immediate association are also far more likely to leave at the same time, and also the following month (and any linked to those customers, and so on...). I don't see a strong behavioural influence where there is no direct association (eg. a degree or two of separation seems to bear little influence on customer behaviour).
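One simple way to check the "churn follows churn" effect described above is to compare next-month churn rates for customers who had a churned immediate neighbour with those who didn't. This is a hypothetical sketch, not the method used at Optus; the inputs (a list of relationship pairs plus two sets of churned customer IDs) and the function name are assumptions.

```python
from collections import defaultdict


def churn_lift(pairs, churned_this_month, churned_next_month):
    """Return (next-month churn rate among customers with a churned immediate neighbour,
    next-month churn rate among everyone else still in the network)."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)

    exposed, unexposed = [], []
    for customer, neighbours in adj.items():
        if customer in churned_this_month:
            continue  # already gone, so not at risk next month
        group = exposed if neighbours & churned_this_month else unexposed
        group.append(customer in churned_next_month)

    def rate(outcomes):
        return sum(outcomes) / len(outcomes) if outcomes else 0.0

    return rate(exposed), rate(unexposed)


pairs = [("A", "B"), ("B", "C"), ("D", "E"), ("E", "F")]
print(churn_lift(pairs, {"A"}, {"B", "E"}))  # (1.0, 0.25)
```

Only immediate neighbours count as "exposed" here, which matches the observation that a degree or two of separation carries little influence.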

I do combine/overlay networks, so I can identify social groups that fulfil any specific criteria, for example where every customer communicates with every other customer (ie a fully interconnected group). I can also identify customers within a social group that elicit certain behaviour, such as call stimulation (ie anyone connected with this customer calls them more than they call anyone else in any network they are connected to).
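The two overlay criteria above can be sketched as simple set checks. `is_fully_connected` tests whether a group is a clique; `call_stimulators` is one possible reading of "call stimulation" (every customer who calls X sends more minutes to X than to any other contact, with ties counting as no favourite). Both function names and the minute-based definition are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def is_fully_connected(group, pairs):
    """True if every member of `group` is linked to every other member (a clique)."""
    linked = {frozenset(p) for p in pairs}
    return all(frozenset((a, b)) in linked for a, b in combinations(group, 2))


def call_stimulators(minutes):
    """minutes: dict (caller, callee) -> minutes.

    Flags customers for whom every caller sends them more minutes than to anyone else.
    """
    outgoing = defaultdict(dict)
    for (caller, callee), m in minutes.items():
        outgoing[caller][callee] = m

    top = {}  # caller -> single favourite contact (absent if tied)
    for caller, contacts in outgoing.items():
        ranked = sorted(contacts.values(), reverse=True)
        if len(ranked) == 1 or ranked[0] > ranked[1]:
            top[caller] = max(contacts, key=contacts.get)

    callers_of = defaultdict(set)
    for caller, callee in minutes:
        callers_of[callee].add(caller)

    return {x for x, callers in callers_of.items() if all(top.get(c) == x for c in callers)}


pairs = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(is_fully_connected(["A", "B", "C"], pairs))  # True
minutes = {("A", "X"): 50, ("A", "B"): 10, ("B", "X"): 30, ("B", "A"): 5, ("X", "A"): 20}
print(call_stimulators(minutes))  # {'X'}
```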

I definitely see hubs (individual customers who are highly connected), and for a laugh I did link myself to someone famous on TV (it wasn't Kevin Bacon though...) with only four degrees of separation. I doubt we will run those types of queries against millions of customers, though, and I'm struggling to think of why we would want or need to.
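Finding the degrees of separation between two customers is a breadth-first search, and spotting hubs is a matter of ranking customers by their number of immediate neighbours. A minimal sketch, assuming the network fits in memory as a dict of neighbour sets (unlike the warehouse-scale SQL described in the post); the function names are hypothetical.

```python
from collections import deque


def degrees_of_separation(adj, source, target):
    """Breadth-first search over adj (customer -> set of neighbours).

    Returns the number of hops from source to target, or None if unreachable.
    """
    if source == target:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in adj.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


def top_hubs(adj, k):
    """The k customers with the most immediate neighbours."""
    return sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:k]


adj = {
    "me": {"a"},
    "a": {"me", "b"},
    "b": {"a", "c"},
    "c": {"b", "celeb"},
    "celeb": {"c"},
}
print(degrees_of_separation(adj, "me", "celeb"))  # 4
```

Because the frontier grows with every hop, this kind of search is cheap for a one-off curiosity but expensive to run for every pair across millions of customers.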

I agree it is quite a simplistic use of SNA, since I am not concerned with the structure of the network. I don't have a strong stats background, and I tried to get to grips with SNA while I was developing my analysis. If I had to summarise what I have done in a 30-second ice-breaker statement, it would be something like this:
"We have created millions of small-world social networks, in which an individual is the focal point of each network, and which together form a scale-free network."
- it looks scale-free if I plot examples...
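A common informal check for "looks scale-free" is that the degree distribution falls on a roughly straight, downward-sloping line on a log-log plot. The sketch below computes the distribution and a least-squares slope on log-log axes; it is a crude heuristic rather than a rigorous statistical test, and the function names are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(adj):
    """Map degree -> number of customers with that many immediate neighbours."""
    return Counter(len(neighbours) for neighbours in adj.values())


def loglog_slope(dist):
    """Least-squares slope of log(count) against log(degree).

    A roughly straight line with a negative slope is what 'looks scale-free'
    on a plot; this is a crude heuristic, not a goodness-of-fit test.
    """
    points = [(math.log(d), math.log(c)) for d, c in dist.items() if d > 0 and c > 0]
    if len(points) < 2:
        raise ValueError("need at least two distinct non-zero degrees")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# An idealised power law: count = 10000 / degree**2
print(loglog_slope({1: 10000, 10: 100, 100: 1}))  # approximately -2.0
```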

Does that help?



martini said...

Hi Tim, I'm not sure where to post on the data mining with SPSS Clementine forum, so I'll post here.
I'm a Clementine newbie and am completely lost as to why I can't aggregate a file. It seems simple enough, but the output doesn't look completely correct (compared with doing it in SPSS).

Any advice or guidance would be greatly appreciated!

Tim Manns said...

I added a post to

I do check the forum, so it's best to post questions there.



Sonamine said...

Hi Tim, using just immediate neighbors does seem to cover most of the variation that a carrier would be interested in.

One interesting aspect of SNA is simulating how an event propagates through the network. IBM researchers have created a simple churn diffusion simulation that terminates under certain conditions; the churners it identified were then checked against the test data set.
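The IBM model isn't described in detail here, so the sketch below is a generic independent-cascade style simulation, not the researchers' method: each newly churned customer gets one chance to trigger churn in each neighbour with an assumed per-relationship probability, and a run stops when a round produces no new churners. The probabilities and function names are hypothetical; in practice the probabilities would have to be estimated from past churn.

```python
import random


def simulate_cascade(prob, seeds, rng):
    """One run of an independent-cascade style spread.

    prob: dict customer -> dict neighbour -> probability that this customer's
          churn triggers the neighbour's churn (assumed values).
    seeds: customers who churned at the start.
    """
    churned = set(seeds)
    frontier = list(churned)
    while frontier:  # terminates when a round adds no new churners
        newly_churned = []
        for customer in frontier:
            for neighbour, p in prob.get(customer, {}).items():
                if neighbour not in churned and rng.random() < p:
                    churned.add(neighbour)
                    newly_churned.append(neighbour)
        frontier = newly_churned
    return churned


def churn_frequency(prob, seeds, runs=1000, seed=0):
    """Fraction of simulated runs in which each customer ends up churning."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(runs):
        for customer in simulate_cascade(prob, seeds, rng):
            counts[customer] = counts.get(customer, 0) + 1
    return {customer: n / runs for customer, n in counts.items()}


prob = {"A": {"B": 0.6, "C": 0.1}, "B": {"D": 0.5}}
print(churn_frequency(prob, ["A"], runs=2000, seed=42))
```

The per-customer frequencies could then be ranked and compared with who actually churned in a holdout month, in the spirit of the test-set check described above.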

Have you used simulation yet?


Tim Manns said...

Hi Sonamine,

re: "Have you used simulation yet?"

- Interesting idea. I tried filtering the relationships used for degrees of separation by the frequency and % of communication. My thought was that by keeping only relationships of sufficient 'strength' we might see an effect more clearly. For example, suppose we had two calling relationships between immediate neighbors that share one individual (so three people, with two of them connected only through the third). In cases where a relationship accounted for over 30% of the communication, maybe there was more chance of influence between the two unconnected individuals (ie a 'stronger' word of mouth effect), but it doesn't seem to happen. A simple direct association combined with detailed call analysis is giving us the best results so far.
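The "strong relationship" filter described above can be sketched as follows: a pair counts as strong if its minutes make up more than 30% of either customer's total, and the function returns every (a, b, c) where a and c are each strongly linked to b but have no direct relationship of their own. Treating "either side above 30%" as the rule, using minutes as the measure, and the function name are all assumptions about how the threshold might be applied.

```python
from collections import defaultdict
from itertools import combinations


def strong_two_hop_triples(weights, threshold=0.3):
    """weights: dict (a, b) -> minutes for an undirected calling relationship.

    Returns (a, middle, c) triples where a-middle and middle-c are both strong
    but a and c never communicate directly.
    """
    totals = defaultdict(float)
    adj = defaultdict(set)
    for (a, b), w in weights.items():
        totals[a] += w
        totals[b] += w
        adj[a].add(b)
        adj[b].add(a)

    strong = defaultdict(set)
    for (a, b), w in weights.items():
        if w <= 0:
            continue  # no communication, so never a strong tie
        if w / totals[a] > threshold or w / totals[b] > threshold:
            strong[a].add(b)
            strong[b].add(a)

    triples = set()
    for middle, neighbours in strong.items():
        for a, c in combinations(sorted(neighbours), 2):
            if c not in adj[a]:
                triples.add((a, middle, c))
    return triples


weights = {("A", "B"): 50, ("B", "C"): 50, ("A", "D"): 50, ("C", "E"): 50}
print(sorted(strong_two_hop_triples(weights)))
# [('A', 'B', 'C'), ('B', 'A', 'D'), ('B', 'C', 'E')]
```

Churn among the two outer customers of each triple could then be compared with churn among unrelated customers, which is the test that didn't show a meaningful effect.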

For several months I have examined real outcomes and tested the predictive ability of our models that use social affinity (immediate neighbors) and call analysis. It's working very well, so I strongly recommend it.
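As a hypothetical example of the kind of social-affinity inputs such a model might use, the sketch below derives two features per remaining customer from immediate neighbours only: how many of them churned, and what share of the customer's minutes went to them. The feature definitions and function name are assumptions, not the actual model inputs.

```python
from collections import defaultdict


def social_churn_features(weights, churned):
    """weights: dict (a, b) -> minutes for an undirected calling relationship.
    churned: set of customers who left in the observation month.

    Returns customer -> (number of churned immediate neighbours,
                         share of the customer's minutes spent with them).
    """
    totals = defaultdict(float)
    churned_minutes = defaultdict(float)
    churned_count = defaultdict(int)
    for (a, b), w in weights.items():
        for me, other in ((a, b), (b, a)):  # look at the pair from both sides
            totals[me] += w
            if other in churned:
                churned_minutes[me] += w
                churned_count[me] += 1
    return {
        c: (churned_count[c], churned_minutes[c] / totals[c] if totals[c] else 0.0)
        for c in totals
        if c not in churned  # churned customers are no longer scored
    }


weights = {("A", "B"): 30, ("A", "C"): 70, ("B", "C"): 10}
print(social_churn_features(weights, {"C"}))  # {'A': (1, 0.7), 'B': (1, 0.25)}
```

These columns would sit alongside the usual call-analysis variables in the model's input table.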