- DistributedDataMining: New WUs and MD5 download errors
11.12.2011 09:44 Uhr
Currently, I am generating new workunits for our medical application. It turned out that I was a bit careless, and as a consequence many members are now getting MD5 download errors. The reason is quite simple. During WU generation, files are copied to the server. Sometimes these files already exist, because we use the same data files for multiple workunits. In these cases we just change the learning parameters or the experiment setup. Anyway, overwriting these existing files leads to problems, because the old and the new file have different MD5 checksums, and hence all WUs that are related to the old files error out with an MD5 download error. About 30,000 WUs are affected. I’ll take care of it, but it might take a while to identify the failing WUs. There is a chance that the dDM members error out the affected old WUs in the meantime. In that case, the new WUs should work fine. Lesson learned: never generate new WUs if you are in a hurry; in the end it takes much more time and causes preventable trouble. I am sorry for the inconvenience! Best regards, Nico
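The overwrite problem described above can be avoided by comparing checksums before a file is copied into the download directory. The following Python snippet is only a minimal sketch of that idea, not the script actually used for WU generation; the function names `md5_of` and `stage_input_file` and the directory layout are assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_input_file(source: Path, download_dir: Path) -> str:
    """Copy source into download_dir without ever changing an existing file.

    Returns "copied" if the file was new and "unchanged" if an identical
    file was already there. Raises FileExistsError if a file with the same
    name but a different MD5 exists, because overwriting it would break
    every workunit that still references the old checksum.
    """
    target = download_dir / source.name  # hypothetical layout: flat directory
    if not target.exists():
        shutil.copy2(source, target)
        return "copied"
    if md5_of(target) == md5_of(source):
        return "unchanged"
    raise FileExistsError(
        f"{target.name} already exists with a different MD5; "
        "give the new file a new name instead of overwriting"
    )
```

With a check like this, a changed data file has to be staged under a new name, so workunits that still point to the old file keep downloading content that matches their recorded checksum.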
- DistributedDataMining: Let dDM benefit from your Christmas shopping
07.12.2011 16:16 Uhr
Hi there, As you know, the dDM project does not get financial support from universities, research institutes or private commercial organizations. Thus, we depend on private help to keep the project and our research running. Since the dDM website became part of the Amazon Associates Program, the dDM project can benefit from your Christmas shopping at Amazon’s online shop. Amazon rewards dDM with up to 6% of any issued gift certificate and up to 10% of each sold store item. If you intend to buy some Christmas gifts at Amazon, please remember dDM and follow the URL provided on this page. The provided URL links to the closest Amazon store in your region and contains our dDM partner ID. In case the wrong store is chosen, please select the right one from the given country list. The dDM project gets a reward for each item bought after using these links. As always, all Amazon rewards and PayPal donations are used to pay for the server rent, maintenance and internet traffic. Any remainder is used to finance our scientific publications or conference presentations. Perhaps we can even buy some new hardware in order to replace the aged project server. Thank you in advance for all of your generous support! Best regards and have a nice Christmas season! Nico
- DistributedDataMining: New application version 1.35 for Medical Data Analysis
05.12.2011 13:22 Uhr
Recently, I’ve released version 1.35 of our medical data analysis application. I am happy to announce that this version solves the problem of never-ending workunits. The new version has been out for almost a week, and so far there haven’t been any workunits that had to be killed manually by our dDM members. Best regards, Nico
- DistributedDataMining: New application version 1.34 for Medical Data Analysis
29.11.2011 12:00 Uhr
Today, I’ve released a new version of our medical application. Version 1.34 should be able to detect the Java location automatically, so it shouldn’t be necessary to adapt the PATH variable manually. This change was necessary because it seems that the PATH variable we used until now was changed by an automatic Windows update. As a consequence, our wrapper couldn’t find Java and all workunits failed. Some users might have received an email notification and the suggestion to install Java even if Java was already installed on their computers. I am sorry for the confusion and the inconvenience caused by these circumstances. The new version uses the variables JAVA_HOME, JRE_HOME and JDK_HOME, which are usually set during Java installation. I did some tests and it worked fine here. Let’s see how it works out there. Any feedback and error reports are welcome in the forum related to the medical application. Regards, Nico
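To illustrate the lookup described above, here is a minimal Python sketch that searches the three environment variables for a Java executable. It is not the wrapper’s actual code: the function name `find_java`, the order in which the variables are tried, the `bin` subdirectory convention and the final fallback to PATH are all assumptions.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional

# Variables named in the announcement; the search order is an assumption.
JAVA_ENV_VARS = ("JAVA_HOME", "JRE_HOME", "JDK_HOME")


def find_java(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the path of a Java executable, or None if none is found.

    Each variable is expected to point to an installation directory that
    contains bin/java (bin/java.exe on Windows). If none of them does,
    the function falls back to searching PATH (assumed behaviour).
    """
    env = os.environ if env is None else env
    exe_name = "java.exe" if os.name == "nt" else "java"
    for name in JAVA_ENV_VARS:
        home = env.get(name)
        if not home:
            continue  # variable not set or empty
        candidate = Path(home) / "bin" / exe_name
        if candidate.is_file():
            return str(candidate)
    return shutil.which("java", path=env.get("PATH"))
```

Passing the environment as a parameter keeps the function easy to test; called without arguments, it reads the real process environment.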
- DistributedDataMining: Update on Medical Application
29.08.2011 12:38 Uhr
Hi there, Here’s a quick update on our medical application: the latest results look really good! Lots of interesting relationships between the analyzed input variables were revealed and relevant research questions could be answered. Parts of these results will be presented at a scientific conference in November. Expect more details on that soon! We’ve learned a lot from the results regarding the use of parameter optimization and feature selection – which now allows us to create much more efficient (and hopefully even more reliable) experiments. For example, we’re planning to apply sophisticated search methods from the field of evolutionary computation combined with a more statistically sound validation approach. For that purpose, we developed a new version of our application, which will run the new experiments much faster and will take over quite a lot of the data analysis steps that were done manually before. This new version is being tested extensively at the moment and will be available soon. But we’re also thinking about implementing completely new ideas and exploring different research directions. For example, one potential project will be focusing on agent-based modeling in the field of simulating the emergence of biological and social phenomena. Well, that’s it for now. Thanks again for your support! Best, Daniel
- DistributedDataMining: Pending credit
17.08.2011 10:32 Uhr
As you might have noticed, we had some minor problems regarding pending credits in the past. Today, I’ve fixed this issue. The problem was caused by a safety mechanism that is used to prevent benchmark cheating. The algorithm marks suspicious workunits for further inspection, and the affected workunits stayed pending until I had checked them manually. Unfortunately, there was a slight error in the cheating detection heuristic, and as a result many more workunits than necessary were marked. Today, I found the problem and corrected the code. At first glance, there shouldn’t be any more unnecessarily pending workunits.
- DistributedDataMining: New application version 5.01 for Time Series Analysis Application
16.08.2011 14:39 Uhr
Today, I’ve released version 5.01 of the Time Series Analysis Application. The new version uses the same wrapper technology that is already in use for the medical application. It overcomes several problems and decreases the fraction of failing workunits. So far, the new version supports 32-bit and 64-bit Linux systems only. Versions for Windows will be published soon. As always, comments and error reports are welcome in the forum.
- DistributedDataMining: Website translations powered by Google
11.08.2011 18:22 Uhr
Hi there, recently some users suggested providing this website in different languages. Unfortunately, a translation into other languages would be a huge effort that I can’t manage on my own. Therefore, I looked for a simple solution and found Google Translate. Today, I’ve added this nice feature to our dDM website in order to translate the content into different languages. From now on, you will find a Translate section in the top right corner. I hope this motivates new users to join the dDM community and helps them to understand what we are trying to achieve. Any comments are welcome in the website issues forum. Best regards, Nico
- DistributedDataMining: Publication announcement
21.07.2011 08:50 Uhr
Recently, we were notified that our latest scientific paper, titled DenGraph-HO: Density-based Hierarchical Community Detection for Explorative Visual Network Analysis, has been accepted. In the context of our Social Network Analysis sub-project we developed the new DenGraph-HO algorithm, which is able to detect hierarchical communities in social networks. The paper will be presented at the Thirty-first SGAI International Conference on Artificial Intelligence (AI-2011) in December 2011. After the presentation it will be published in the conference proceedings and on the dDM website. Without the contribution of our dDM members this work wouldn’t have been possible. Thank you for your support.
- DistributedDataMining: Shorter workunits for our medical application
06.06.2011 08:13 Uhr
Hi there, as already announced by Daniel, we are going to continue our research in the field of Medical Data Analysis. Last night, I generated about 100,000 new workunits for the medical application. We took into account the results of our latest poll and adapted our workunit structure. This time, the WU runtime is significantly shorter. Depending on your CPU a single experiment needs between 30 and 90 minutes. During the next few days it might happen that the estimated WU runtime is far away from reality. We simply don’t have enough information to provide a valid estimation. Please, let the WU finish anyway. The more valid WUs we get the better our estimation for the following units will be. I guess the issue will be solved in two or three days. Thanks for your support.
- DistributedDataMining: Website changes
05.06.2011 17:47 Uhr
Recently, I made some changes to our dDM website in order to honour the efforts of our dDM members:
- The Member of the day
- and the Donators for our server infrastructure
are prominently presented on the website. I’d like to thank all members for their contribution to the dDM project.
- DistributedDataMining: State of affairs
01.06.2011 13:54 Uhr
Hi there, This is Daniel, the guy who’s (more or less) responsible for the Medical Data Analysis application you’ve been all working on so impressively in the last months. First of all, I’d like to take this opportunity to greatly thank all of you for your massive support and the computational power you’re generously donating to this project! So far, thousands of experiments were successfully conducted with your help that would have taken me ages to do on my own. I really appreciate your work, bringing forward the whole project big time! Currently, I’m in the process of analyzing this huge amount of results you provided in order to make sense of it scientifically, since I’m working on a publication on that topic. As you can imagine, the analysis part will take some time – but a first glimpse at the data already revealed promising things. After having explored the data in a more “horizontal” manner up to now (investigating different combinations of parameters and configurations), in the next weeks I’d like to continue “vertically” with the experiments (testing the best combinations identified so far at increasing level of detail, allowing for more profound conclusions about the validity of our results). I’m sure that this will give really interesting insights. So I hope you are all set for the next round of the project and to get some numbers crunched… Best, Daniel
- DistributedDataMining: New workunits for the Time Series Analysis Application
10.05.2011 12:23 Uhr
Some of you might have noticed that there are again some workunits for the time series analysis application. During the weekend I’ve generated about 35 thousand WUs. These WUs were processed before and got cancelled for different reasons, mainly because the client didn’t have Java installed; in some cases there was too little available memory. After these old WUs are finished, a new batch of WUs will be available. For the new batch we will use the same wrapper technology that is already in use for our medical application. In addition, we are going to integrate a newer version of the open source data mining suite RapidMiner.
- DistributedDataMining: Faster validation and less pending credit
02.05.2011 11:35 Uhr
Some members might remember that we had some problems regarding the CPU time counting in versions prior to 1.21 of the medical data analysis application. Since all WUs assigned to these faulty versions have been sent back to the dDM server, I decided to soften the safety mechanism that was responsible for plausibility checks. So far, a heuristic was used in order to hold back suspicious results for a manual check. As a consequence, the validation of affected results was delayed. From now on, the heuristic is less strict and the number of results that have to be checked by me will decrease significantly. As a result, the validation will be faster and we will have less pending credit.
- DistributedDataMining: New application version 1.23 for Medical Data Analysis
27.04.2011 14:44 Uhr
Today, I’ve released Version 1.23 of the Medical Data Analysis Application. Supported operating systems are Windows and Linux. Besides some bug fixes and minor changes of the error logging mechanism, the overall performance was improved by reducing the communication between boincclient and java. Comments or error reports are welcome in the Forum.
- DistributedDataMining: Server problems
17.03.2011 23:26 Uhr
Recently, we had some serious server troubles and the project went offline for a couple of hours. So far, I don’t know what exactly happened. I’ll look into it. As far as I can say, we didn’t lose any data and the database is consistent. It was not necessary to restore the latest backup.
- DistributedDataMining: Wrong CPU-time counting in Medical Data Analysis
02.03.2011 22:44 Uhr
We are facing a problem regarding CPU-time counting. In some cases, the CPU time for a WU is not counted correctly. The BOINC manager then reports hundreds or thousands of CPU hours, and consequently the credit is much too high. This problem was briefly discussed in our dDM forum and I am working on solving this issue. In fact, I’ve recently released a couple of new application versions. It’s quite hard to find the error because it appears rarely and all my local tests work perfectly. In the meantime I’ve activated a safety mechanism: suspicious WUs are not credited automatically but are checked manually. From time to time, a WU with a wrong CPU time gets credit anyway. This happens because the safety mechanism only uses heuristics to find faulty WUs and doesn’t work in all cases. The credits of all affected WUs will be corrected at once, as soon as I’ve found the cause of the error. The latest version is 1.18. It should handle suspending/resuming correctly and also contains some changes in the CPU-time counting parts.
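The post does not say which heuristics the safety mechanism uses. Purely as an illustration of what such a plausibility check could look like, the Python sketch below flags a result if its reported CPU time is negative, exceeds an absolute cap, or is clearly larger than the elapsed wall-clock time. The class `ReportedResult`, the thresholds and the assumption of a single-threaded application are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReportedResult:
    """Hypothetical record of what a client reports for one workunit."""
    name: str
    cpu_seconds: float
    elapsed_seconds: float  # wall-clock run time, assumed to be available


def is_suspicious(result: ReportedResult,
                  max_cpu_hours: float = 48.0,
                  slack: float = 1.1) -> bool:
    """Return True if the result should be held back for a manual check.

    Assumed rules (not taken from the dDM server code):
    - negative CPU time is never plausible,
    - CPU time above max_cpu_hours is treated as a counting error,
    - a single-threaded job cannot use much more CPU time than
      wall-clock time, so cpu > elapsed * slack is flagged.
    """
    if result.cpu_seconds < 0:
        return True
    if result.cpu_seconds > max_cpu_hours * 3600:
        return True
    return result.cpu_seconds > result.elapsed_seconds * slack
```

A check of this kind remains a heuristic: a wrong CPU time that stays below all thresholds still passes, which matches the observation above that some affected WUs get credit anyway.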
- DistributedDataMining: Errors in Medical Data Analysis – Application Version 1.10
15.02.2011 23:04 Uhr
Recently, I’ve released version 1.10 in order to overcome the resume/suspend problem. Even though we made some progress regarding suspending/resuming, other problems have occurred:
- Due to extensive logging, the error log file exceeds the upper size limit in some cases. The affected WUs won’t be uploaded to the dDM server and are marked as failures. I am going to grant the credit anyway.
- During suspending/resuming, CPU time is counted twice. As a result, the reported run times are way too high. A safety mechanism that I implemented on the server a couple of months ago gets activated and puts the uploaded WU on hold. In the web interface these WUs appear as ‘Pending’. I have to figure out how to handle this situation and how to correct the CPU time.
Good news: I’ve found the error that is responsible for the double CPU time counting. In addition, I’ve decreased the number of messages in the error log file. It shouldn’t exceed its limits any longer. The new version will be 1.11. I am going to release it today.
- DistributedDataMining: New application version for Medical Data Analysis
10.02.2011 02:42 Uhr
Today, I’ve released version 1.08 of our Medical Data Analysis Application. There are some minor changes:
- Improved error logging and handling
- Corrected CPU time counting
- Suspending/Resuming under Windows
- DistributedDataMining: New application for Medical Data Analysis
14.01.2011 23:31 Uhr
As already announced here, we are continuing Daniel’s research in the field of Medical Data Analysis. For this purpose, we’ve implemented a new and more flexible Java wrapper. Now that our tests are finished, a new application for laryngeal high-speed video classification is available for Windows and Linux operating systems. Please report any noticeable problems via the forum.
- DistributedDataMining: Team Challenge of The Knights Who Say Ni!
07.01.2011 22:17 Uhr
Recently, the team The Knights Who Say Ni! started a team challenge on our DistributedDataMining project. The challenge was originally announced here. So far, 15 team members are participating in order to support our research and to increase their team credit. Today, our special thanks go to the team The Knights Who Say Ni! for supporting our research. Due to the expected higher server load, our dDM project might suffer from performance losses. We are constantly working on overcoming these issues. Please report any noticeable problems via the forum. We would like to emphasize the correct project URL http://www.distributeddatamining.org/DistributedDataMining and the need for Java. Further information for new dDM members can be found here.
- DistributedDataMining: New poll about your preferred WU runtime
20.12.2010 04:14 Uhr
As mentioned before, we are planning to continue our research in Medical Data Analysis. Our latest tests are promising and we’ve already released a small number of Linux test WUs to the public. The characteristics of the data and the new features of our latest RapidMiner wrapper make it possible to determine the runtime of the new workunits in advance. In order to find out what runtime our dDM members prefer, we’ve started a new poll. Please vote for your preferred WU runtime and help us to meet your demands.
- DistributedDataMining: Press report about dDM
18.12.2010 06:52 Uhr
Recently, the popular German newspaper Handelsblatt briefly reported on the dDM project: http://www.handelsblatt.com/technologie/it-internet/verteiltes-rechnen-wenn-der-eigene-rechner-zur-alien-falle-wird;2680330;10#bgStart Besides SETI@home, Einstein@home and other well-known BOINC projects, our DistributedDataMining project is listed as one of the most popular Distributed Computing projects. We are proud of the publicity and the appreciation.
- DistributedDataMining: Book announcement
15.12.2010 12:27 Uhr
Today, it’s a great pleasure to announce the latest book by Dr. Daniel Voigt: Objective Analysis and Classification of Vocal Fold Dynamics from Laryngeal High-Speed Recordings. Aachen: Shaker Verlag GmbH; 2010. Daniel’s work in the field of Medical Data Analysis on laryngeal high-speed video classification was partially powered by the dDM project. As usual, the results of our efforts are publicly available: Daniel published his PhD thesis as a book at Shaker. Currently, we are planning to continue our research in this area. As soon as our final tests are finished, a new medical application will be available for all dDM members. Congratulations to Daniel and special thanks to the dDM community for supporting our research!
- DistributedDataMining: Team Challenge of L'Alliance Francophone
16.09.2010 08:26 Uhr
The team L’Alliance Francophone runs a team challenge from September 17th to October 1st on our dDM project. The challenge includes the whole team and was originally announced here. Today, our special thanks go to the team L’Alliance Francophone for supporting our research. Due to the expected higher server load, our dDM project might suffer from performance losses. We are constantly working on overcoming these issues. Please report any noticeable problems via the Number Crunching Forum. We would like to emphasize the correct project URL http://www.distributeddatamining.org/DistributedDataMining and the need for Java. Further information for new dDM members can be found here.
- DistributedDataMining: Aborted WUs and granted credit
30.08.2010 20:27 Uhr
Today, I’ve noticed that about 30 WUs got cancelled recently due to an unknown server problem. Because of the long runtime of these WUs, I added the complete credit to the related user accounts anyway. Now, I am going to have a look into the problem in order to fix it. Sorry for the inconvenience.
- DistributedDataMining: More RAM for dDM servers
30.08.2010 08:53 Uhr
Recently, I’ve noticed some performance problems with the dDM database. In some cases these problems might lead to slow or delayed connections of the BOINC clients. The worst-case scenario was as follows: clients couldn’t connect to the server and waited for one hour (to reduce the load) before trying again. In order to speed up the database, I’ve added more RAM to the servers. This includes the frontend server that is responsible for the websites and the backend server that hosts the dDM database. Let’s see if it helps to normalize the situation.
- DistributedDataMining: New images for the Boinc Manager
27.08.2010 17:00 Uhr
Today, I’ve added two new images that are shown in the simple view of the BOINC Manager. I’ve chosen these images in order to symbolize the Stock Price Prediction Application. The first one is a coloured version of the SPP logo that is also used on the website. The second one is less abstract and shows a stock price diagram. I hope you like the new images, even if most dDM members prefer the advanced view in the BM.
- DistributedDataMining: New AppVersion 4.30: 32bit for Win and Linux + 64bit for linux
01.08.2010 16:01 Uhr
Today, I’ve released AppVersion 4.30 for Linux and Windows systems. It’s the first time that dDM has released a 64-bit version for Linux. Besides that, there are no big changes, just small bug fixes. The most notable new feature addresses the problem of multiple Java processes that remain after a crash of the Java wrapper: every time a Java process is created, the wrapper stores the Java process ID in the checkpoint file. After a restart of the wrapper (for whatever reason), it checks whether the old Java process (based on the stored PID) is still running. This way we can prevent two Java processes from doing the same work and consuming double the CPU time.
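The PID bookkeeping described above can be sketched in a few lines of Python. This is not the wrapper’s real implementation: the JSON checkpoint format, the key `java_pid` and the function names are assumptions, and the liveness check via `os.kill(pid, 0)` only works on POSIX systems such as Linux (on Windows it must not be used, because there `os.kill` terminates the process).

```python
import json
import os
from pathlib import Path
from typing import Optional


def pid_is_running(pid: int) -> bool:
    """POSIX only: signal 0 checks for existence without sending a signal."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


def save_child_pid(checkpoint: Path, pid: int) -> None:
    """Store the PID of the freshly started Java process (assumed format)."""
    checkpoint.write_text(json.dumps({"java_pid": pid}))


def surviving_child_pid(checkpoint: Path) -> Optional[int]:
    """Return the stored PID if that process is still alive, else None."""
    if not checkpoint.exists():
        return None
    pid = json.loads(checkpoint.read_text()).get("java_pid")
    if isinstance(pid, int) and pid_is_running(pid):
        return pid
    return None
```

After a restart, a wrapper built along these lines would call `surviving_child_pid` first and only start a new Java process if it returns None. One caveat of this approach is that the operating system may reuse a PID for an unrelated process, so a more robust check would also compare the process name or start time.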
- DistributedDataMining: Suggestions for website improvements
24.07.2010 07:02 Uhr
Our new website has been online for several days, and dDM members could participate in a poll about the new layout. Most of the participants like the new website, but 20% don’t like the new layout. Therefore, I’d like to invite you to post your comments, feedback and suggestions on how we could further improve our new website. Please use the Forum Website Issues to share your ideas.