How to Get Started in Data Analytics – Insights from Experience

Blog post on November 2021 Community Call by Geoffrey Gone

Data analytics is one of the most promising skills in technology and has proved useful across fields as varied as health, education, agriculture, construction, transportation, telecommunications, retail, banking, and finance. How does one harness this skill to create impact on the African continent? Global Lab Network hosted its November 2021 Community Call to discuss how individuals can get started with data science. The speakers were Derek Degbedzui, Founder and Researcher with M.A.I.L.S-Connect, and Emmanuel Sekyi, Tech Lead at PaalUp.

Cross-section of the participants on the call

Derek opened the discussion by noting that we still struggle to understand what data offers the world, which is why it needs to be studied if it is to help solve problems. In its simplest sense, he said, data analytics gives us autonomy or control over the data sources we have gathered. He stressed how useful data is and compared the traditional research process in academia with practice in the real world, emphasising the role of modern data science in making broader and better decisions.

Derek further identified the failure of systems to anticipate future problems as a challenge for data science. Using a ranking system for a newsfeed as an example, he demonstrated the thinking process that should go into building a system; the major steps were setting goals and objectives and then prioritizing them. He also posed a thought-provoking question: “At what level should Africans be comfortable to say that we can use analytics to solve our problems and that they wouldn’t pose any challenge to us in the future?” His answer was that Africans should engage in more research and that our applications and solutions should be explained to the public through engagement.

The second speaker, Emmanuel, took a data-driven approach to describing what beginners need to consider or learn to break into data science. He applied the data science process itself (gathering data, in this case published articles, followed by analysis and interpretation) to seek advice from the entire data science community. Using the public dev.to API and the hashtags #datascience, #dataanalytics, and #beginners, he gathered 1 million data points relating to machine learning and data science. The data was then cleaned and insights were extracted. The common themes from the analysis were to learn how to build a model and to learn the Python programming language along with pandas, NumPy, and Matplotlib.
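For readers who want to try something similar, here is a minimal sketch of that gather, clean, and analyse loop. It is not Emmanuel's actual code, which was not shared in this post. It assumes the public dev.to articles endpoint with its `tag`, `page`, and `per_page` query parameters; the keyword list, the function names, and the User-Agent string are hypothetical choices made for illustration, and the sketch only looks at each article's title and description rather than the full text.

```python
import json
import re
from collections import Counter
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://dev.to/api/articles"  # public dev.to articles endpoint
TAGS = ["datascience", "dataanalytics", "beginners"]
# Hypothetical keyword list, chosen for illustration only.
KEYWORDS = ["python", "pandas", "numpy", "matplotlib", "sql", "model"]


def build_url(tag, page=1, per_page=100):
    """Build the request URL for one page of articles carrying a tag."""
    query = urlencode({"tag": tag, "page": page, "per_page": per_page})
    return f"{API_URL}?{query}"


def fetch_articles(tag, pages=1):
    """Download article metadata for a tag (requires network access)."""
    articles = []
    for page in range(1, pages + 1):
        # The User-Agent value is an arbitrary placeholder (assumption).
        request = Request(build_url(tag, page),
                          headers={"User-Agent": "community-call-demo"})
        with urlopen(request) as response:
            articles.extend(json.load(response))
    return articles


def clean(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_keywords(articles, keywords=KEYWORDS):
    """Count how many articles mention each keyword in title or description."""
    counts = Counter()
    for article in articles:
        title = article.get("title") or ""
        description = article.get("description") or ""
        tokens = set(clean(f"{title} {description}"))
        counts.update(k for k in keywords if k in tokens)
    return counts
```

To run it against live data, you could call `fetch_articles(tag)` for each entry in `TAGS`, combine the results, and pass them to `count_keywords(...)`. Counting each keyword once per article, rather than once per occurrence, keeps a single long post from dominating the result.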

Emmanuel went on to explore the skills a beginner needs and cautioned against taking too many online courses. He explained that courses alone will not make you a data scientist; it is by building things that you become well suited to work in the data science industry. He grouped the relevant skills into four categories:

  1. Getting data (data scraping/crawling, working with APIs, SQL)
  2. Asking good questions (unambiguous questions, questions whose answers are testable, what-if questions)
  3. Programming languages (Python)
  4. Mathematics (Linear Algebra, Basic Calculus, Basic Statistics)

He also emphasized that a common mistake is to think that technical skills are the most important thing in data science, a view he disagrees with. In his opinion, the best data scientist is someone with domain knowledge who learns data skills to apply in whatever domain they work in. Emmanuel concluded his presentation by saying that a good data scientist should have core domain knowledge (46%), the ability to ask good questions (23%), programming skills (15%), and knowledge of mathematics (15%).

The conversation continued with questions, comments, and contributions from participants. Deborah Dormah Kanubala, a lecturer and researcher at Academic City University College, spoke about the importance of ethics in data science and research. She stressed that machine learning engineers should not focus only on model performance but should also ensure that their models are not biased or discriminatory towards any marginalized group and do not give wrong results for people who are underrepresented in the dataset. She added that engineers should pay attention to the source of the data and how it was collected, to ensure that it is a good representation of what they want to use it for.
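One simple way to start acting on this advice is to look beyond a single overall accuracy score and check how a model performs for each group in the data. The sketch below is an illustration only, not something presented on the call: the function names are hypothetical, and it assumes you already have true labels, model predictions, and a group label for every record.

```python
from collections import Counter, defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: fraction of correct predictions} for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def group_shares(groups):
    """Return each group's share of the dataset, to spot underrepresentation."""
    counts = Counter(groups)
    size = len(groups)
    return {group: count / size for group, count in counts.items()}
```

A large gap between groups in `accuracy_by_group`, especially for a group with a small share in `group_shares`, is a signal to revisit how the data was collected. Per-group accuracy is only one of many possible fairness checks and does not by itself show that a model is unbiased.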

Another participant, Soh, asked Emmanuel to explain how machine learning relates to data analytics. Emmanuel answered that data science is the science of working with data, so anything that involves working with any form of data falls under it. The term data analytics is mostly used to describe the application of data in industry or the analysis of data for insight, while machine learning is the application of mathematical methods to get insight from data. Put simply, data science is a broad umbrella under which data analytics and machine learning fall.

Another participant, Harry, asked about policies on the use of data on the continent, particularly for people building health tech solutions. Derek answered that although real-world data collection raises issues such as privacy and regulatory concerns, we do not necessarily need policy frameworks in order to know the limits within which to operate. He explained that any team working on a data project mostly relies on the advice of the technical person or the domain expert to provide the guiding principles, and that failing to have the domain expert provide guidance could be detrimental. Stakeholder engagement can also help establish the limits within which to operate.

Deborah added that the country has no well-defined policy or framework for data usage comparable to the General Data Protection Regulation (GDPR), which is strictly adhered to in Europe. She noted that some researchers use randomly generated data, which needs no regulation, and that users of publicly available data should cite its source. Kizito, another participant, added that the constitution is a framework of all frameworks, so we do not need an act specifically for data protection or regulation, since the constitution already has sufficient provisions to protect individual rights. Therefore, if a data company uses your data wrongly, you can go to court and seek redress.

In conclusion, Emmanuel advised participants to build things that only they can build, that is, things in their own domain of expertise. Derek closed by urging us to always remind ourselves of the call to duty: we should use every opportunity we have to benefit others and solve problems.

Watch the full discussion in the video below and check out our Community Call playlist for videos of our previous discussions. Our final event for the year, Science Cafe, will be held on 17 December 2021. Mr Leo Ayerakwa will make a presentation on The Role of R&D in Ghana’s National Development, followed by an expert-led discussion on the topic. You’re welcome to join our Facebook group or follow our Twitter feed to get quick updates on this and future events/activities.
