Meet the winners of the Data Engineering Championship – Analytics India Magazine


MachineHack recently concluded the Data Engineering Championship, a hiring hackathon for data scientists and data engineers, organised in association with Publicis Sapient, iMerit, USEReady, Tiger Analytics and The Math Company.
The hackathon was part of the Data Engineering Summit (DES) 2022, presented by Google Cloud and organised by Analytics India Magazine, and drew over 700 registrations. The winners stood a chance to present their solution approach at DES 2022 and got an opportunity to land an interview with one of the leading analytics organisations.
You can read more about the dataset here.
Here are the solution approaches of the winners who secured the top three positions in the Data Engineering Championship.
Rathinaraj became interested in predictive analytics in 2017. He took Coursera and Udemy courses in statistics, exploratory data analysis (EDA), machine learning, data science and deep learning to improve his skills. In addition, he has participated in a slew of ML hackathons on different platforms to test and build his knowledge.
Approach
The participants were given airport details along with a weather information dataset. Columns such as ‘DATE’, ‘LOW’, ‘HIGH’ and ‘TIMESTAMP’ could be imputed with a constant value. Missing records in the year column can be imputed with 2020, since every other record had 2020 as the year. The same holds for the month column, which can be imputed with 01, since every other record had 01 (Jan). The main challenges in the dataset were:
Data missingness
In the dataset of airport details with weather information, 20 per cent of the data is missing in several columns. The bar chart below shows the count of non-missing records for each column.
The formulas for computing the required columns can be easily derived from other dependent columns. For missing records in those dependent columns, he used imputation based on a group-by mean.
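The two imputation ideas described above, a constant fill for the year and month columns and a group-by mean for dependent columns, can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the winner's code: the toy rows, the `AIRPORT` grouping key and the helper names `impute_constant` and `impute_group_mean` are hypothetical, and the article does not say which columns were used for grouping.

```python
from collections import defaultdict
from statistics import mean

# Toy rows; the AIRPORT key and all values are illustrative assumptions.
rows = [
    {"YEAR": 2020, "MONTH": 1, "AIRPORT": "A", "AWND": 10.0},
    {"YEAR": None, "MONTH": 1, "AIRPORT": "A", "AWND": None},
    {"YEAR": 2020, "MONTH": None, "AIRPORT": "A", "AWND": 14.0},
    {"YEAR": 2020, "MONTH": 1, "AIRPORT": "B", "AWND": 4.0},
    {"YEAR": None, "MONTH": 1, "AIRPORT": "B", "AWND": 6.0},
    {"YEAR": 2020, "MONTH": 1, "AIRPORT": "B", "AWND": None},
]

def impute_constant(rows, column):
    """Fill gaps with the single value seen in the column's non-missing rows."""
    observed = {r[column] for r in rows if r[column] is not None}
    if len(observed) != 1:
        raise ValueError(f"{column} is not constant: {sorted(observed)}")
    constant = observed.pop()
    for r in rows:
        if r[column] is None:
            r[column] = constant

def impute_group_mean(rows, column, by):
    """Fill gaps with the mean of the column within each group of `by`."""
    groups = defaultdict(list)
    for r in rows:
        if r[column] is not None:
            groups[r[by]].append(r[column])
    means = {key: mean(values) for key, values in groups.items()}
    for r in rows:
        if r[column] is None:
            r[column] = means[r[by]]  # KeyError if a group has no observed value

impute_constant(rows, "YEAR")
impute_constant(rows, "MONTH")
impute_group_mean(rows, "AWND", by="AIRPORT")
```

The constant fill deliberately raises an error if the column turns out not to be constant, so the shortcut is only applied where the data actually supports it.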
Formula column with uncertainty
The definition of the WIND_CHILL column given in the competition was “the perceived temperature due to the cooling effect of wind blowing”. Rathinaraj used information from TMAX (maximum temperature), AWND (maximum wind speed of the day), SNOW and the time of day when the flight departs. He combined this information to calculate the WIND_CHILL column, which ranges from 0 to 80 degrees Fahrenheit. The WIND_CHILL column is vital for getting the best score in the competition, as a wrong calculation can increase the mean absolute error by an amount in the same range (0 to 80).
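The article does not give the formula Rathinaraj used, and his version also drew on SNOW and the departure time. As a rough point of reference only, the sketch below uses the standard US National Weather Service wind chill formula, which takes just air temperature (°F) and wind speed (mph), together with a mean absolute error helper to show how a wrong calculation is penalised. The function names and sample values are hypothetical, and passing the temperature through unchanged outside the formula's valid range is an assumption.

```python
def nws_wind_chill_f(temp_f, wind_mph):
    """NWS wind chill in degrees F; defined for temp <= 50 F and wind > 3 mph."""
    if temp_f > 50 or wind_mph <= 3:
        # Assumption: outside the formula's range, return the air temperature.
        return temp_f
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v

def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical comparison: expected values vs. a calculation with one bad row.
expected = [21.0, 60.0, 35.0]
calculated = [nws_wind_chill_f(30, 10), nws_wind_chill_f(60, 20), 0.0]
error = mean_absolute_error(expected, calculated)
```

Because the column spans roughly 0 to 80, a single badly computed row like the last one above can contribute tens of degrees to the error, which is why the article singles this column out.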
Rathinaraj feels that MachineHack provides participants with different domains of the ML and Data Engineering competition. “Participating in the competition helps me to become more knowledgeable. After the competition ends, I always spend time exploring the top-ranked achiever’s solution approach and codes,” he adds.
Jeena worked as an embedded systems engineer for nine years in Mumbai. For the last five years, she has been working at a courier company in Singapore, where she maintains the in-house ERP system, built on the .NET Framework and a SQL database, and analyses the available data to identify trends in sales, operations, customer service, etc.
“I started analysing the data with the limited knowledge I had, and my interest in data analysis started here, and hence I decided to gain in-depth knowledge in this field. So, in July 2021, I enrolled in a data science online course. After spending 12 months in the course, I studied supervised and unsupervised machine learning and time series. Then, I moved on to deep learning and NLP.”
Approach
Jeena’s approach to the problem included the following steps:
“Solving hackathons helped put into practice the knowledge I gained from the theory, which was a huge confidence booster for me,” concluded Jeena.
Suresh has always been passionate about data science and curious about understanding its connection with real-world business use cases. “This curiosity enabled me to spend additional effort during the day and the weekends to learn more about it from the internet, which eventually created a pathway to knowing about the hackathon events happening across the globe in the data science space,” he said.
Approach
Suresh says the use case was to calculate wind chill, airline seat distribution, snow ratio and a few other useful pieces of information, along with the date and timestamp, from an airport data dump, to help airline companies plan their trips. The dump contained about 200k rows and 26 columns with various information (such as wind speed, latitude, longitude, snowfall, flight ID, etc.).
He followed these steps:
“I was delighted to be part of this hackathon event conducted by MachineHack, which helped me to improve my analytical and problem-solving skills. Moreover, the rules and guidelines set by MachineHack for such events helped in intuiting my competitive skills to keep myself in the top three positions every day on the leaderboard,” Suresh adds.
© Analytics India Magazine Pvt Ltd 2022