Title: Managing Big Data Storage: Approaches and Challenges
M. Azharul Hasan, Ph. D
Department of Computer Science and Engineering (CSE)
Khulna University of Engineering & Technology (KUET)
The popularity and advancement of computing have reached a level that has led to the generation and maintenance of very large data sets, called Big Data. Big Data is evolving with great velocity, large volume, and great diversity across different sectors of industry. Such an amplification of data has called into question the capabilities of existing data storage and management systems, because the storage requirements for this data are complex and demand a holistic approach to mitigate its challenges. The major concerns are storing the huge volume of data in a distributed manner and retrieving information from it. In this talk, recent developments in, and the major concerns of, storage systems for large-volume, high-velocity data will be discussed.
K. M. Azharul Hasan received his B.Sc. (Engg.) degree from Khulna University, Bangladesh in 1999 and his M.E. from the Asian Institute of Technology (AIT), Thailand in 2002, both in Computer Science. He received his Ph.D. from the Graduate School of Engineering, University of Fukui, Japan in 2006. His research interests lie in the areas of big data technologies, high-performance computing, information retrieval and text processing. He is currently a professor in the Department of Computer Science and Engineering at Khulna University of Engineering and Technology and has been working with structured and text data for the last 20 years. He has proven academic and research skills and aims to continue his career as an academician, especially in teaching, research and academic curricula development. Over the years, Dr. Hasan has received many prizes and awards, including a university gold medal and best paper awards from leading conferences. He is a member of the IEEE and a Fellow of the IEB.
Title: Big Data and Machine Learning in Natural Language Processing and Health Care System
Mohammad Nurul Huda, Ph. D
Professor and Director of MSCSE Program
Department of Computer Science and Engineering (CSE)
United International University (UIU), Bangladesh
This talk describes the importance of Big Data and Machine Learning in Natural Language Processing (NLP) and health care systems. Today, around 80% of all data is available in raw form. Big Data comes from information stored in large organizations and enterprises; examples include information about employees, company purchases, sales records, business transactions, historical organizational records, social media, etc. Although human language is too ambiguous and unstructured to be interpreted directly by computers, NLP allows this large amount of unstructured data to be harnessed, revealing patterns inside the data and giving a better understanding of the information it contains. Using Big Data, NLP can solve significant problems of the business world, for example in retail, healthcare, stock exchanges and financial institutions. On the other hand, Machine Learning plays a significant role in NLP, for example in Neural Machine Translation, end-to-end models in Speech Recognition, and doctors' handwritten prescription recognition using Convolutional Neural Networks (CNNs).
Dr. Mohammad Nurul Huda is a Professor and the Director of the MSCSE Program in the Department of Computer Science and Engineering (CSE), United International University (UIU), Bangladesh. He completed his PhD on Automatic Speech Recognition (ASR) at Toyohashi University of Technology, Aichi, Japan. He graduated from the Computer Science and Engineering (CSE) department of Bangladesh University of Engineering and Technology (BUET).
Dr. Huda is one of the Senior Directors at eGeneration Ltd, a leading software company in the area of Natural Language Processing (NLP), Machine Learning (ML), Artificial Intelligence (AI), Blockchain, Data Engineering, Data Science, System Integration, IT Consultancy, Digital Platforms & Training, etc. He is working in eGeneration Ltd. as an AI and NLP expert.
He has more than 22 years of experience in university teaching and deep expertise in text and spoken language technologies for Bangla (widely known as Bengali), Japanese and English. He especially works on Machine Learning, Speech Recognition, Speech Synthesis, Speech Analysis, Speaker Recognition, Question Answering Systems, Bangla Chatbots, Machine Translation, Sentiment Analysis, Bangla Spell and Grammar Checking, International Phonetic Alphabet (IPA) Conversion, Font Conversion, Bangla Morphology, POS Tagging, Bangla Universal Networking Language (UNL), Bangla Similarity Measures, Bangla Document Classification, Artificial Intelligence, Computational Linguistics, Pattern Classification, and Natural Language Processing (NLP). He has more than 140 international research articles in the related fields (http://cse.uiu.ac.bd/profiles/mnh/ : Publications), of which more than 78 are SCOPUS-indexed.
Professor Huda has also worked as a consulting expert on the Smart Irrigation System (a World Bank project under the Renewable Energy Lab, UIU, Bangladesh) and on Anubad, an English to Bengali Machine Translation project sponsored by UIU, and served as Principal Investigator of Smart Reception (an Innovation Fund project of the ICT Ministry, Bangladesh).
In addition, Dr. Huda authored the ICT textbook of the higher secondary academic syllabus for college students, which was approved by the National Curriculum and Textbook Board. Moreover, he has shared his expertise at numerous major international NLP and ML conferences organized at home and abroad. Recently, he was invited to the UNESCO Headquarters in Paris, France for the Language Technology for All (LT4All) conference, to which NLP experts from 88 countries were invited; Dr. Huda was the only NLP expert representing Bangladesh. Professor Huda is one of the most prominent NLP researchers in Bangladesh and has worldwide recognition.
Title: Natural Language Interactions with Visualizations for Exploring Large Datasets
Enamul Hoque Prince, Ph. D
School of Information Technology
York University, Canada
The inception of the World Wide Web, the rise of social media, and the recent emergence of big data have led to rapidly growing information spaces. We can consider this phenomenon a double-edged sword: while the abundance of data opens up great opportunities for significant discoveries, the information overload problem poses critical challenges for both understanding and presentation.
In this talk, I will describe how we can address the challenges of the information overload problem using an interdisciplinary lens, combining information visualization and human-computer interaction with natural language processing. I will first present the design and evaluation of a set of visual analytics tools for exploring and understanding online conversations by combining topic modeling and sentiment analysis with visualization techniques. I will then discuss how natural language can be used as an input modality to facilitate interaction with visual analytics. Finally, I will present some ongoing work on building automatic data storytelling tools and creating accessible data visualizations.
Enamul Hoque Prince is an Assistant Professor in the School of Information Technology at York University. Before joining York University, he was a postdoctoral fellow in the HCI group at Stanford University. His research addresses the challenges of the information overload problem using an interdisciplinary lens, combining information visualization and human-computer interaction with natural language processing. More specifically, he focuses on devising novel visual analytics techniques to explore large datasets, as well as understanding the utility and potential trade-offs of such techniques from real users' perspectives.
Enamul completed his Ph.D. in Computer Science from the University of British Columbia. He has conducted research on visual analytics at Tableau Software and the Qatar Computing Research Institute. His work has appeared in top journals and conferences including IEEE Transactions on Visualization and Computer Graphics, ACM Transactions on Interactive Intelligent Systems, ACM CHI Conference on Human Factors in Computing Systems, ACM User Interface Software and Technology Symposium, ACM Intelligent User Interfaces, EuroVis and ACL. His research has been funded by NSERC Canada, National Research Council Canada, and York University among others.
Title: Security Threats and Countermeasures in Health Records using Machine Learning
ASM Kayes, Ph. D
Discipline Coordinator for the Bachelor of Cybersecurity Program
Department of Computer Science and Information Technology
La Trobe University, Australia
Security threats such as data breaches in important health systems are increasing in number and severity. Highly public systems that host personal data are very likely to be targeted, and consequently a lack of trust in large public health systems is emerging. Security threats occur for various reasons, such as lack of awareness of cybersecurity and privacy risks, human error, organised cybercrime and attacks, and unauthorised access due to the lack of appropriate access controls; they can cost lives and many billions of dollars to economies such as Australia's, in addition to reputational cost. They can also act as a deterrent to widespread use of important IT systems, as seen in Australia's costly $1.5bn, opt-out My Health Record (MHR) system. The number of security threats involving the MHR system has risen by 12% year-on-year for the last two financial years, with 37 cases reported in 2018-19 and 42 cases in 2019-20. Such cyber incidents (e.g., data breaches and other security threats) are estimated to cost the Australian economy up to $29bn per year, and 86% of data breaches are financially motivated. The consequences of security threats are increasing in severity as IT systems, such as health and financial systems, become more central to business and digital society. A significant disruption of digital infrastructure resulting from a major data breach would cost the economy a further $30bn, or 1.5% of Australia's GDP. The COVID-19 data breach landscape is also severe: COVID-19-related terms such as "pandemic" and "corona", and COVID-19-themed attacks, have shown up in threat indicators. In the health sector, 55% of data breaches in Australia have occurred due to human behavioural factors, compared to 35% across all other sectors.
The UK has experienced similar data breach cases, with a recent report revealing that nearly 150,000 British patients have had their confidential data used without consent in one of the UK's important GP IT systems, the National Health Service (NHS) digital system. In this talk, I will give some insights into these data breaches in large public systems and discuss their consequences. I will also talk about the main challenges to successfully preventing data breaches in large public health systems. According to my research investigations, no practical security and access control solutions exist in the literature that can be directly applied to safeguard public health systems against these security threats. I will highlight a novel, machine learning-based access control framework to safeguard our large IT systems by enabling data-driven, multi-factor access control mechanisms with dynamic policies. The outcomes may yield new access control theories, models and software prototypes to detect security threats, protect patient privacy and reduce the costs of data breaches in large health systems.
Dr. ASM Kayes is currently the Discipline Coordinator for the Bachelor of Cybersecurity program in the Department of Computer Science and Information Technology at La Trobe University. He is also the coordinator for the Master of Cybersecurity subjects. Recently, he was elected to the La Trobe University Academic Board, 2020-2021. Dr Kayes is a cybersecurity expert with many years of research experience. He is passionate about safeguarding data, systems and people from cyber-attacks. He has broad contemporary knowledge of cybersecurity and deep research and technical skills in data privacy and security. He is currently supervising many PhD students at La Trobe and Victoria Universities.
Dr Kayes has collaborated with many distinguished scientists within Australia and internationally, including from the USA and the UK. His research outputs have been published in top-tier journals and conferences, such as Future Generation Computer Systems, Information Systems, The Oxford Computer Journal, CAiSE, ICSOC, WISE and IEEE TrustCom. He is recognised as an established researcher in the areas of data privacy, security and access control; his proposed security and access control models have been cited over 300 times in the last two years. He has collaborated with industry partners and has been awarded over $415k in funding for cybersecurity, privacy, machine learning and deep learning R&D projects.
Over the years, Dr Kayes has received many prizes and awards, including best paper awards from leading conferences. He has been a member of the IEEE since 2014, dedicated to serving the research community and IEEE young professionals, and is also a member of the Australian Computer Society. He has served as Academic Editor and Guest Editor for several renowned journals, such as Sensors, Security and Communication Networks, and Journal of Sensor and Actuator Networks, and as a reviewer for major IEEE Transactions and other top-tier journals such as IEEE Transactions on Industrial Informatics, Information Fusion, Future Generation Computer Systems and IEEE Systems Journal. He has served as Program Vice Chair of research tracks and on technical program committees of top IEEE and other conferences, such as IEEE AINA, IEEE TrustCom, and CoopIS. As Guest Editor, he has edited several security topics and special issues, such as "Security Threats and Countermeasures in Cyber-Physical Systems", "Security Breaches and Access Control" and "IoT and Artificial Intelligence Approaches to Defeat COVID-19 Outbreak".
Title: Optimization of Ordered Problems in the Era of BIG DATA – Applications, Challenges and Algorithms
Md. Saiful Islam, Ph. D
Lecturer, School of Information and Communication Technology
Griffith University, Australia
Maintaining population diversity is critical to the performance of a Genetic Algorithm (GA), and applying appropriate strategies to maintain this diversity is an active and ongoing research topic. Given that the goal in an optimisation problem is to find a global optimum in the solution space, it is important for a GA to maintain a balance between exploring the solution space and exploiting known solutions. Without sufficient exploration, the GA is likely to miss promising areas of the fitness landscape, and without effective exploitation, the GA will be unable to improve on good-quality solutions that it has already found. How best to manage this balance is an open research problem. Furthermore, ordered optimisation problems pose a set of constraints distinct from those of general problems. While these constraints can determine the validity of a solution, depending on the constraints they can also determine that two seemingly different solutions are, in fact, the same solution ordered differently. As such, it is important for an approach that aims to actively monitor and maintain diversity to consider these constraints and how they affect the relationships between nodes in a GA's solution. We address these concerns with a sequence-wise approach to measuring population diversity in ordered problems. This allows mechanisms that introduce diversity to a GA's population to determine the similarity between solutions by the similarity of their sequences. In our proposed frameworks, we demonstrate how this approach improves a GA's ability to find better-quality solutions. The approach is further expanded to maintain population diversity across parallel populations in an island-model configuration, where each island population is encouraged to search in a different area of the solution space by maximising the distance between islands. This approach is demonstrated in several configurations on the CPU and the GPU.
The robustness of these frameworks is demonstrated through extensive benchmark testing on several ordered problems.
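The sequence-wise view described in the abstract can be illustrated with a minimal sketch. Assuming a TSP-style ordered problem, two tours that share all edges are the same cyclic solution regardless of rotation or direction; the function names below are illustrative, not the frameworks' actual code:

```python
from itertools import combinations

def edge_set(tour):
    """Undirected edges of a cyclic tour; rotating or reversing the
    tour leaves this set unchanged, so equivalent orderings collapse."""
    n = len(tour)
    return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}

def sequence_distance(a, b):
    """Fraction of edges the two tours do not share (0.0 = same cycle)."""
    ea, eb = edge_set(a), edge_set(b)
    return 1.0 - len(ea & eb) / len(ea)

def population_diversity(population):
    """Mean pairwise sequence distance over the whole population."""
    pairs = list(combinations(population, 2))
    return sum(sequence_distance(a, b) for a, b in pairs) / len(pairs)
```

A diversity-preserving mechanism can then treat, say, tours `[0, 1, 2, 3]` and `[1, 2, 3, 0]` as one solution (distance 0) rather than two distinct individuals, and an island model can maximise this distance between island populations.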
Dr. Md. Saiful Islam is a Lecturer in the School of Information and Communication Technology, Griffith University, Australia. Before joining Griffith, he was a Research Fellow in the School of Engineering and Mathematical Sciences, La Trobe University, Australia from May 2016 to January 2017, and a Research Associate in the Department of Computer Science and Software Engineering at Swinburne University of Technology from November 2013 to April 2016. He completed his Ph.D. in Computer Science and Software Engineering at Swinburne University of Technology, Australia in February 2014, and received his BSc (Hons) and MS degrees in Computer Science and Engineering from the University of Dhaka, Bangladesh, in 2005 and 2007, respectively. He received the Faculty of Information and Communication Technologies Dean's Award for Research Excellence (2nd Prize) at Swinburne University of Technology in 2013, and best paper awards at ACM SSDBM 2017 (as first author), DASFAA 2019 (as co-author) and ADMA 2019 (as co-author). He has published more than 50 research papers in prestigious computer science journals such as The VLDB Journal, IEEE TKDE, Elsevier Information Systems, Journal of Systems and Software, Future Generation Computer Systems, Journal of Network and Computer Applications, Springer World Wide Web and MDPI Sensors, and conferences such as IEEE ICDE, ACM CIKM, ACM SSDBM, IJCNN, IEEE CEC, DASFAA and WISE. He is a regular reviewer for top journals such as IEEE TKDE, The VLDB Journal, IEEE TFS, IEEE TPDS, IEEE TII, Elsevier Knowledge-Based Systems and Future Generation Computer Systems. He has been a program committee member and associate reviewer for many top computer science conferences, such as IEEE ICDE, SIGMOD, SSDBM, DASFAA and ADMA, and a senior program committee member for APWeb-WAIM 2020.
His current research interests are in the areas of database usability, spatial and graph data management, artificial intelligence, deep learning, health informatics, human-in-the-loop and big data analytics.
Title: A Lightweight Speaker Recognition System Using Timbre Properties
Firoz Mridha, Ph. D
Department of Computer Science and Engineering
Bangladesh University of Business and Technology, Dhaka, Bangladesh
Speaker recognition is an active research area with notable applications in biometric security and authentication systems. Many well-performing models currently exist in the speaker recognition domain. However, most advanced models implement deep neural network (DNN) architectures that require GPU support for real-time recognition, making them unsuitable for low-end devices. Instead of implementing a DNN, we propose a lightweight text-independent speaker recognition model based on a random forest classifier. The proposed model extracts timbral properties from human speech, which are then classified using a random forest. Timbre refers to the fundamental properties of sound that allow listeners to discriminate among sounds. Our prototype uses the seven most actively studied timbre properties, namely boominess, brightness, depth, hardness, roughness, sharpness, and warmth, as the features of our speaker recognition model. In the recognition system, we use 0.2-second time frames for each prediction. We extract the seven timbral features using random forest (RF) regressors, each of which generates a single value indicating the scale of one timbral feature. A single RF classifier then accumulates the seven scales and generates the target output. We experimented with our speaker recognition system on speaker verification and speaker identification tasks. As early research work on extracting new speech features, our architecture has some advantages and drawbacks. In the speaker identification phase, the method achieves a maximum accuracy of 78%; however, its accuracy starts to decrease as the number of speakers in the test data increases. In the speaker verification phase, by contrast, the model maintains a mean accuracy of 87%, with an equal error rate (EER) of 0.24.
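The two-stage design described in the abstract, seven RF regressors each producing one timbral scale and a single RF classifier accumulating them, might be sketched as follows. This is a minimal illustration assuming scikit-learn; the data is synthetic (real input would be acoustic features of 0.2-second frames) and all names are placeholders, not the authors' actual code:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

TIMBRAL = ["boominess", "brightness", "depth", "hardness",
           "roughness", "sharpness", "warmth"]

rng = np.random.default_rng(0)
X_frames = rng.normal(size=(300, 40))            # per-frame acoustic features (synthetic)
y_scales = rng.normal(size=(300, len(TIMBRAL)))  # per-property timbral scales (synthetic)
y_speaker = rng.integers(0, 5, size=300)         # speaker labels for 5 speakers

# Stage 1: one regressor per timbral property, each emitting a single scale.
regressors = {}
for i, name in enumerate(TIMBRAL):
    reg = RandomForestRegressor(n_estimators=30, random_state=0)
    reg.fit(X_frames, y_scales[:, i])
    regressors[name] = reg

def timbre_vector(frames):
    """Stack the seven predicted timbral scales into one feature vector per frame."""
    return np.column_stack([regressors[n].predict(frames) for n in TIMBRAL])

# Stage 2: a single classifier accumulates the seven scales per frame.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(timbre_vector(X_frames), y_speaker)
pred = clf.predict(timbre_vector(X_frames[:10]))
```

The point of the design is that the classifier sees only a seven-dimensional timbre vector per 0.2-second frame, which keeps inference cheap enough for CPU-only, low-end devices.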
Dr. M. Firoz Mridha is currently working as an Associate Professor in the Department of Computer Science and Engineering of the Bangladesh University of Business and Technology. He also worked as a faculty member of the CSE department at the University of Asia Pacific and as graduate coordinator from 2012 to 2019. He received his Ph.D. in AI/ML from Jahangirnagar University (JU) in 2017. He joined the Department of Computer Science and Engineering, Stamford University Bangladesh as a Lecturer in June 2007, was promoted to Senior Lecturer in the same department in October 2010 and to Assistant Professor in October 2011, and then joined UAP as an Assistant Professor in May 2012. His research experience, within both academia and industry, has resulted in over 80 journal and conference publications. He is the author of "AutoEmbedder: A semi-supervised DNN embedding system for clustering". His research interests include Artificial Intelligence (AI), Machine Learning, Deep Learning, Natural Language Processing (NLP) and Big Data analysis. He has served as a program committee member at several international conferences/workshops and is serving as an Associate Editor of several journals.
Title: Bangladesh National Digital Architecture: Task Accomplishment, Achievement and Utilization
Tanimul Bari
Senior Technical Specialist
CIRT project, Bangladesh Computer Council
The Government of Bangladesh has pledged to transform the country into 'Digital Bangladesh', an integral part of the Government's Vision 2021, and is pursuing a vigorous e-government program spanning the entire public sector. The Bangladesh National Digital Architecture (BNDA) is formulated based on TOGAF with a view to providing a blueprint for the automation of government offices and seamless information exchange among government agencies. The BNDA work is intended to improve the basic conditions for efficient and coherent public ICT use; it is expected to optimize the value of the government's ICT investments and reduce the risks of individual projects. The work includes drafting general guidelines and principles for building ICT systems in the public sector, and disseminating standards for data interchange. BNDA work is carried out in close cooperation between the public sector, enterprises and academia. This presentation intends to shed light on the BNDA work from the perspectives of 1) architecture frameworks and methodologies, 2) governance structure, 3) architecture principles and standards, 4) implementations and service integrations, and 5) shared services/platforms.
Mr. Tanimul Bari is a Senior Technical Specialist in the e-Government component of the government-funded project "Strengthening of BGD e-Gov CIRT" under the Bangladesh Computer Council (BCC), where he has been working since July 2015. He is the team leader of the National Enterprise Architecture (NEA) and e-Government Interoperability Framework (e-GIF) development team for the Government of Bangladesh. He advises government organizations on designing and architecting information systems so that they are aligned with their business needs and the seamless flow of information between different information systems is ensured. Mr. Bari has experience designing a wide variety of high-exposure, complex, mission-critical systems and services for many public and private sector organizations in Bangladesh. He is a top-rated Information Technology (IT) professional and has received the highest peer recognition for ethical standards and managerial and technical ability. He received a Bachelor's degree in Computer Science from the Bangladesh University of Engineering and Technology (BUET) in 2003. He has extensive knowledge of Blockchain technology and platforms (IBM Hyperledger/Ethereum). Mr. Bari is also a Certified ISO 27001 Lead Auditor and a Certified COBIT Foundation professional. He has visited several countries and participated in several national/international conferences, workshops and seminars.
Title: Bangla Speech Corpus Preparation for Robust LVCSR (Large Vocabulary Continuous Speech Recognition) System
Shafkat Kibria
Ph.D. Research Fellow, Dept. of CSE, SUST
Assistant Professor, Dept. of CSE, Leading University
Speaker variability caused by accent, gender, age, speech rate, and the realization of phones is a key element in robust speech recognition. Bengali or Bangla (বাংলা /baŋla/) is the language of over 262 million people, of whom more than 178 million live in Bangladesh; the rest live in West Bengal (a state of India) and other countries of the world. Bangladeshi Standard Bangla differs in phonetic context from the Kolkata (capital of West Bengal) Standard. This language is representative of the standard variety widely spoken in Dhaka and other urban areas of Bangladesh. There are also some extremely deviant dialects in Bangladesh, such as Chittagong, Chakma in the Chittagong Hills, Sylhet, Rajbangshi in Rangpur, and Hajong in Mymensingh and Sylhet. An individual speaker's regional affiliation is reflected by his or her accent. Our previous study examined both male and female speakers from the Sylhet region, which has one of the most deviant dialects of Bangla, and speakers from different districts of the north-western and middle parts of Bangladesh. The correlations between these two groups of single-accent Bangla speakers were studied through analysis of accent acoustic features such as fundamental frequency (pitch), formant frequencies and pitch slope. Accent-related changes in speech affect ASR performance, as observed in our previous study, raising the need for accent-specific acoustic models to handle speakers from highly deviant dialects, and for considering accent-affected speaker variability when developing corpora for a robust ASR system in Bangladeshi Bangla. Here, we discuss the common issues and activities related to the development of a large speech corpus named সুবাক্য (Subak.ko). This corpus is designed by considering speakers from both highly and less deviant dialects of Bangladesh, as well as standard colloquial Bangladeshi Bangla.
Our corpus has been evaluated and compared with the "Large Bengali ASR training data", the largest publicly available corpus for Bangladeshi Bangla, released by Google. We present the strengths and shortcomings of both Google's corpus and the Subak.ko corpus.
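One of the accent acoustic features analysed above, pitch slope, can be sketched as a least-squares fit over the voiced part of an F0 contour. This is a simplified illustration assuming numpy, not the study's exact procedure; the frame shift and the convention of marking unvoiced frames as 0 are assumptions:

```python
import numpy as np

def pitch_slope(f0, frame_shift=0.01):
    """Least-squares slope (Hz per second) of the voiced part of an F0
    contour sampled every `frame_shift` seconds; unvoiced frames are 0."""
    f0 = np.asarray(f0, dtype=float)
    t = np.arange(len(f0)) * frame_shift
    voiced = f0 > 0                      # keep only voiced frames
    slope, _intercept = np.polyfit(t[voiced], f0[voiced], 1)
    return slope

# A contour rising 1 Hz per 10 ms frame corresponds to about 100 Hz/s.
rising = [100.0 + i for i in range(20)]
```

Comparing per-utterance slopes (together with mean pitch and formant frequencies) across speaker groups is one simple way to quantify accent differences of the kind the study describes.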
Mr. Shafkat Kibria was born in Howapara, Sylhet, Bangladesh in 1978. He received the B.S. degree in computer science from East West University, Dhaka, in 2001 and the M.S. degree in computer science from Umeå University, Umeå, Sweden in 2005. He is currently working as an Assistant Professor in the Dept. of CSE, Leading University, Sylhet. Besides that, he is pursuing the Ph.D. degree in computer science at Shahjalal University of Science and Technology (SUST), Sylhet, Bangladesh.
From September 2006 to October 2007, he was a Lecturer at Sylhet International University, Sylhet, Bangladesh. From 2007 to 2008, he worked as a Ph.D. Research Fellow on an EU project (DustBot) at the AASS Research Center, Örebro, Sweden. In 2011, he joined Manarat International University (MIU), Dhaka as a Lecturer; from 2016 to February 2020 he was an Assistant Professor with the Dept. of CSE, MIU, Dhaka, during which time he was on study leave. Since 2016, he has been working as a Ph.D. Research Fellow on the HEQEP project "Development of Multi-Platform Speech and Language Processing Software for Bangla" (CP3888 – https://sustbanglaresearch.org/) at the Dept. of CSE, SUST. His research interests include Speech Recognition, Accent Analysis, Bangla Speech to Text, Bangla Natural Language Processing, Human Robot Interaction (HRI), Machine Learning systems (behavior-based robotics), Mobile Robotics (path planning, outdoor robotics), Human Computer Interaction (HCI), Financial Computation & Analysis for Decision Making, and Embedded-System-based Smart & Ubiquitous Environments.
Title: Creating a Large Phonetically Balanced Speech Corpus for Bangla Text-to-Speech
Arif Ahmad
Ph.D. Research Fellow, Dept. of CSE, SUST
Assistant Professor & Head (acting), Dept. of CSE, Leading University.
Any research using deep learning tools needs a good dataset. Modern speech processing research, such as speech recognition and speech synthesis, is being conducted with various deep learning techniques. Here we present a "phonetically balanced speech corpus" for Bangla text-to-speech (TTS) synthesis. Preparing a good speech corpus for speech synthesis is a challenging task. Only a few resource-rich languages (e.g. English) have a publicly available large TTS corpus. For Bangla TTS, no such large corpus is publicly available. Google has released a public Bangla TTS corpus that contains around 3 hours of speech. We have prepared a "clean" TTS corpus of 30 hours, containing more than 17,000 sentences. It is a "phonetically balanced" corpus, meaning that it contains all possible Bangla phonetic units in sufficient amounts. We have used this corpus to build a "Bangla Statistical Parametric Speech Synthesizer" and obtained satisfactory results compared to the existing Bangla TTS systems.
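The property of being "phonetically balanced", every phonetic unit present in sufficient amounts, can be verified with a simple coverage check over phone-level transcripts. The sketch below is an editorial illustration with a toy phone inventory, not Bangla data or the corpus's actual tooling:

```python
from collections import Counter

def phone_coverage(transcripts, inventory, min_count=1):
    """Count phone occurrences over phone-level transcripts and report
    which units of the inventory fall below the target count."""
    counts = Counter(p for utt in transcripts for p in utt)
    under = {p: counts[p] for p in inventory if counts[p] < min_count}
    covered = 1.0 - len(under) / len(inventory)
    return counts, under, covered

# Toy example with a hypothetical three-phone inventory.
inventory = ["a", "k", "l"]
transcripts = [["k", "a", "l", "a"], ["a", "k"]]
counts, under, covered = phone_coverage(transcripts, inventory, min_count=2)
```

In practice, sentence selection for a balanced corpus iterates a check like this: sentences are added greedily until every unit in the full phonetic inventory clears the minimum count.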
Mr. Arif Ahmad was born in Sylhet, Bangladesh in 1985. He received his B.Sc. degree in Computer Science and Information Technology from the Islamic University of Technology (IUT), Dhaka in 2009. He is currently pursuing his Ph.D. in Bangla Text-to-Speech Synthesis in the Department of Computer Science and Engineering of Shahjalal University of Science and Technology (SUST), Sylhet, Bangladesh. In January 2010, he joined Leading University (Sylhet) as a Lecturer, where he is currently serving as an Assistant Professor and the Head (acting) of the department. His research interests include Natural Language Processing, Digital Speech Processing, Machine Learning, and Artificial Intelligence.