“Having a companion devoted to listening to your recitations, marking your mistakes and giving you feedback would be a game changer in Muslim households.” We speak to the masterminds behind Tarteel, which is set to become the world’s largest dataset of Quranic recitations.
Seated cross-legged on a green carpet, Muslim children and new converts used to learn the Quran by heart and recite it to a sheikh or preacher presiding over the kuttab, a traditional Quranic school. In the Middle East, this had been the norm for centuries until the 1920s, when Turkey abolished this medieval educational method and established the more institutionalised, secular model we know today as schools. Throughout the Middle East, Christianity and Islam are now taught in schools, but kuttabs have survived in rural towns and villages, especially in Saudi Arabia and Egypt.
Fast-forward to the 21st century. In July 2018, Abdel-Latif Abdel-Fattah, a software engineer at Twitter, organised Muslim Hacks 1.0 in California. The two-day hackathon invited California’s Muslims to build a product that would help their community. There, Abdel-Fattah teamed up with his friend Abu-Bakar Abid, a PhD student at Stanford University, to work on a machine learning project called Tarteel, which would essentially act as a remote kuttab, except that the sheikh would be a bot.
“The Quran is a significant part of my life, as it is for many Muslims,” Abid tells Startup Scene. “I grew up as a Muslim in the States; reading the Quran, memorising it, studying Tajweed [the rules of Quranic pronunciation] with a teacher. As I got older, one of the things I recognised is that not everyone has access to teachers who can review their memorisation and teach them Tajweed.” That realisation sparked a conversation that went on for years before they started executing the project. Their reasoning was that if they could build software that understands what a person is reciting, it could open up many possibilities. If a user’s Tajweed is inaccurate, for example, Tarteel could offer tips and suggestions.
“To build something like this, we wanted to use machine learning (ML), and to use ML well, we need a lot of data,” Abid explains. “If you go on YouTube, you’ll find a lot of preachers reciting the Quran; this is not the kind of data that we wanted.” Abid and Abdel-Fattah wanted to build their dataset from recitations by ordinary people, because a model trained on such recordings would be better at recognising everyday recitations, which are rarely as polished as those recorded in studios.
“So, the initial seed of the project was to collect high-quality data from ordinary people reciting the Quran,” says Abid.
The team behind Tarteel has set out to build the world’s first public dataset of Quranic recitations by ordinary people. They divided the project into three phases: collecting data, training an ML model, and deploying that model in applications. The third phase is where the business model will come in, as the team explores ways to monetise the app. “Having a companion devoted to listening to your recitations, marking your mistakes and giving you feedback would be a game changer in Muslim households,” Abid anticipates.
“At this point, we don’t have a business model,” says Abid. “We want to work on a solution to the inaccessibility of Quran teachers. When we find a solution to this problem, we will figure out ways to generate revenue.”
Their current goal is to collect 50,000 recitations of verses; so far, they are 9,987 recitations short of that target. Based on similar models, they believe this quantity is enough to train an ML model that can at least recognise which words and letters are being recited. That, in turn, would enable a first generation of applications, which could then produce more data for Tarteel to use.
When they first started gathering recorded Quranic verses, almost all of the recitations were recorded by men, and most came from younger reciters who were tech-savvy enough to use the app. “So, a big challenge for us so far is reaching a more diverse group of contributors in terms of age, gender and ethnicity,” Abdel-Fattah says. “We’ve pushed hard on this and recently brought someone onto the team to focus on recruiting female translators. Now the proportion of female translators has jumped to 25 percent of the team. We are still trying to improve our numbers, because the downstream app we’re envisioning is ultimately meant to be accessible to everyone.”
But efforts to bring women on board have raised a number of societal challenges. “We did get some privacy concerns raised by women; they were uncomfortable with the idea of just anybody having access to their voice recordings,” Abdel-Fattah adds. “So, we learned that we needed to balance privacy considerations against the availability of recordings.” To address that concern, the team did not release any recordings publicly until it had reached a critical threshold of 10,000, a large enough mass for the data to be aggregated and effectively anonymised.
“On the other hand, one of the great things is that in the past few years, ML has made tremendous advances in speech recognition,” says Abdel-Fattah. “The challenge, though, is that for these techniques to work for users worldwide, you need lots and lots of data. And that’s not even enough: you also need data that is representative of your final user base, which is where Tarteel’s data collection phase becomes important. That is why we decided to create a database from scratch, and that’s not an easy task.”