
[Apologies for multiple postings]

-------------------------------------------------------------------------
Authorship Identification of SOurce COde (AI-SOCO)
Website: https://sites.google.com/view/ai-soco-2020/

To be organized at FIRE 2020 (http://fire.irsi.res.in/fire/2020/home)
10 - 13 December
Virtual Conference
-------------------------------------------------------------------------

--------------------------
Task Description:
--------------------------

General authorship identification is essential for detecting the deceptive misuse of others' content and for exposing the owners of anonymous harmful content. Both are done by revealing the author of that content.

Authorship Identification of SOurce COde (AI-SOCO) focuses on uncovering the author who wrote a given piece of code. This helps address cheating in academic, work, and open source environments. It can also help identify the authors of malware worldwide.

The dataset is composed of source codes collected from the open submissions in the Codeforces online judge. Codeforces is an online judge that hosts competitive programming contests, each consisting of multiple problems to be solved by the participants. A Codeforces participant solves a problem by writing a solution in any of the programming languages available on the website and then submitting it through the website. The result of a solution can be correct (accepted) or incorrect (wrong answer, time limit exceeded, etc.).

For our dataset, we selected 1000 users and collected 100 source codes from each one, so the total number of source codes is 100,000. All collected source codes are correct and written in the C++ programming language. For each user, all collected source codes are from unique problems.

Given the predefined set of source codes and their authors, task participants should build systems that can identify the author of any new, previously unseen source code written by one of the authors in that predefined list.

Full task description can be found at: https://sites.google.com/view/ai-soco-2020/

------------
Timeline
------------

8th June - Open track websites
8th June - Training and development data release
31st July - Test data release
7th September - Run submission deadline
20th September - Results declared
31st October - Working notes and overview papers due (tentative)
10th-13th December - FIRE 2020

----------------
Organizers
----------------

Ali Fadel, Jordan University of Science and Technology, Jordan
Husam Musleh, Jordan University of Science and Technology, Jordan
Ibraheem Tuffaha, Jordan University of Science and Technology, Jordan
Mahmoud Al-Ayyoub, Jordan University of Science and Technology, Jordan
Yaser Jararweh, Duquesne University, USA
Elhadj Benkhelifa, Staffordshire University, UK
Paolo Rosso, Universitat Politècnica de València, Spain

For regular updates, subscribe to our mailing list: ai-soco-fire@googlegroups.com

Regards,
Organizers of the Authorship Identification of SOurce COde (AI-SOCO) Task