Comments on Blog by Tim Manns (data mining blog): Tips for the KDD challenge :)

2009-04-13T17:22:00.000-07:00

I added a comment to the KDD Cup forum:

http://213.56.130.10/board/viewtopic.php?id=11

It says:

"Hello KDD Cup,

I'd just like to give some feedback regarding the KDD challenge this year.

I think the challenge is a great way to motivate students and newcomers to data mining, thus increasing the number of potential candidates for the many data mining roles out there. If this is your aim, I believe you have succeeded. Unfortunately, I don't think the challenge will attract industry or experienced data miners.

I work for a telco and perform data mining every day to solve exactly the same problems described in the challenge. I nearly always use call detail records (CDRs), and if the data set were composed of CDRs I could demonstrate many different methods in my submission for the KDD challenge. Data mining is about taking detailed, unwieldy data and transforming and processing it into a form that is easier to understand and that performs better in modelling. That sometimes includes building predictive models. It is widely accepted that the majority of the effort and time is required for data preparation, and any data mining challenge must involve this. It concerns me that the KDD Cup omitted this fact. In my submission I would have used social network analysis, non-standard methods of data summarisation, and the creation of predictively powerful input variables.

I have had problems downloading the data, so I asked about the details of the data. It is my understanding that the data is not CDRs and has already been summarised into a specific format. Therefore the KDD challenge does not require the full scope of a data mining analyst, but is instead a statistical and predictive modelling accuracy exercise.
Because the summarised data restricts the challenge to simply building a predictive model, I will not be submitting an application.

Please give my comments consideration in your next challenge if you are keen to involve industry and established data miners. As it stands, KDD are further distancing themselves from industry (ironic considering it's from a telco...)."

Tim Manns (https://www.blogger.com/profile/17405266346372888597)

2009-04-02T13:53:00.000-07:00

Bloody hell, when I read re: "var1,var2...var15000" my heart sank!

It's stuff like this that makes me wonder whether most people at conferences like KDD realise data mining is more than applying math to data in a database (or big text file).

My other concern was whether the data was transactional call detail records (CDRs) or in some summarised form (which would also ruin the fun of doing the KDD for me).

I'm disheartened :(

Tim Manns (https://www.blogger.com/profile/17405266346372888597)

2009-04-02T04:45:00.000-07:00

I agree: very disappointing that this is a pure statistical exercise with no business content.

Allan Engelhardt (http://www.cybaea.net/)

2009-04-01T23:32:00.000-07:00

I downloaded the data sets yesterday and just confirmed with the organizers that all variables have been renamed as var1,var2...var15000.
Without the real variable names, this will be a pure mathematical/statistical problem.

For example, take your "percentage comparisons": you can't do that with this data. What about variable selection, variable correlations... all these will be decided solely by statistical results.

Suddenly, I am no longer interested in KDD09, as the business meaning/relevance of the data is simply not there. There is a VERY STRONG possibility that lots of people will come up with a model that's very accurate but has little or no business benefit.

Datalligence (https://www.blogger.com/profile/16461960582799657275)