When one hears the time period machine learning, what instantly comes to thoughts is one thing difficult and onerous to perceive. We sat down with Kathleen Siminyu, a machine learning fellow at Mozilla Foundation to higher perceive what her journey and what the job entails that is what she had to say.
Tell us about your self?
My identify is Kathleen Siminyu and I’m a machine learning fellow at Mozilla Foundation. Professionally, I’m a Natural Language Processing researcher. At Mozilla, I’m working on the widespread voice dataset which is actually a platform that permits language communities to construct datasets and one of many languages therein is Kiswahili which is what I work on.
How did get began in your profession path?
For my undergraduate diploma I studied math and pc science at J.Okay.U.A.T and whereas I used to be in 4th yr, I made a decision to enterprise into a area which utilized each. I did my analysis and took place data science which encompassed each math and pc science. My 4th yr challenge was on data science which helped me after I completed my diploma as I used to be in a position to embrace it in my portfolio.
Other than the diploma, I began to do on-line programs on platforms like Edex and Coursera which had been associated to data science. I did this to enhance my information in addition to increase my C.V as a result of at the top of the day I used to be not a educated data scientist.
My first job was at Africa’s Talking and I did a lot of learning on the job. At first, my function extra about offering metrics like how a lot airtime was offered and issues like that slightly than data science. However, I managed to automate most of those processes which implies that I had extra time to focus on my passions that’s data engineering.
During this time, I got here to understand that there was a want for African language tooling or sources and that I.T was not the place I the place would have the ability to comply with that curiosity. This meant that I’ve to enterprise again to academia, at this time I discovered analysis communities who had been constructing NLP for African languages. This is what has actually contributed to my learning journey due to that incontrovertible fact that in Africa there are only a few tutorial establishments that provide levels on data science and synthetic intelligence. However, there are grassroot communities who’re nurturing this expertise and I’m a product of such.
What is Kiswahili Common Voice dataset and what are its advantages?
A Kiswahili widespread voice dataset is actually a dataset for speech recognition often known as speech to textual content. It is mainly a process which includes turning audio into textual content. One of its makes use of is in captioning for Tv/movies and a few convention platforms like Zoom.
The dataset in itself begins with us gathering textual content and that are then damaged down at sentence stage after which despatched to individuals. At Mozilla, we crowdsource the audio side as properly and while you go onto our platform and signal your self up as a contributor, you’ll begin receiving sentences and you’ll report your self saying these sentences out loud.
So a dataset for speech recognition or transcription is actually a textual content accompanied by audio of what is within the textual content. That is the data that you’d fill into your machine learning algorithm or mannequin for it to begin learning how to transcribe Kiswahili textual content. This is as a result of it is then in a position to do a mapping a phrase to the respective sound.
It is essential as a result of the datasets can be utilized to develop finish person merchandise. The transcriptions can be utilized on platforms like Zoom or Google Meet.
Other than Luhya and Kiswahili which different language have you ever labored on?
Earlier on in my analysis days, I labored on a process generally known as machine translation which is analogous to Google translate whereby you possibly can sort in English and it offers you a French translation. During this time, I labored on a number of Kenyan languages similar to Kamba, Kikuyu and Luo. When it comes to speech recognition, I began with the Luhya language and and now I’m presently working on Kiswahili.
What are your future plans?
When I’m executed with Kiswahili, I would really like to proceed with my work on Kenyan languages. During my work I’ve to understand that the translations often pivot on English, say a Kamba-English or Luhya-English translation. I’m pondering of fixing this pivot to Kiswahili such that we will have a Kiswahili-Kamba or Kiswahili-Luhya translation mannequin. This is extra so as a result of we’ve over 200 Million Kiswahili audio system worldwide.
https://hapakenya.com/2022/08/02/kathleen-siminyu-a-machine-learning-fellow-at-mozilla-on-what-it-takes-to-become-a-data-scientist/