Kevin Dsouza is the Lead Data Engineer at Chegg Inc., where he turns raw data into insights that inform strategic decisions. He builds data pipelines and optimizes data workflows using Python, SQL, and UNIX shell scripting. His database experience spans Oracle, Microsoft SQL Server, PostgreSQL, and MongoDB, which lets him manage and manipulate data across platforms.
One of Kevin's key projects was designing and implementing a Databricks pipeline that ingests traffic data from the Similarweb APIs. The pipeline captures desktop and mobile unique visitors as well as overall visit counts, merges these metrics, and stores them in a Redshift table each month. The resulting table supports the BI team's competitor analysis, providing insights that inform marketing strategies and product development.
Kevin also works with big data technologies such as Apache Hadoop, Spark, and Airflow, which he uses to streamline data processing and keep data highly available for analytics. His experience with cloud platforms like Google Cloud Platform (GCP) and Amazon Web Services (AWS) supports his work architecting scalable data solutions for Chegg Inc. As a leader in data engineering, Kevin continues to drive innovation and efficiency in support of data-driven decision-making at Chegg in the education technology sector.