Published in Data Hackers · 4 days ago · Member-only
Introduction to tmap for Data Visualization and Analysis
A brief introduction to the R package tmap for geospatial data analysis. Introduction: Not every data scientist will have to decide between Python or R. I constantly see discussions about this subject, some of them annoying, others quite funny. But the truth is that there are, and always will be, many Data Scientists…
Tmap · 11 min read

Published in Towards Data Science · 6 days ago · Member-only
Introducing tmap for Visualization and Data Analysis
A brief introduction to the tmap library in R for geospatial data exploration. Introduction: Not every Data Scientist will have to decide between Python or R. I constantly see discussions around that matter, some of them annoying, some quite funny. But the truth is that there are, and there will be, many Data Scientists who can use both languages because they won’t need to…
Tmap · 11 min read

Published in Data Hackers · Jan 19, 2024 · Member-only
Creating a Data Pipeline from Scratch
A beginner Data Engineering project, end to end. Introduction: Data Engineering has always been an area of interest to me, but I never had time to create a project because I need to split my time among many things like work, family, and everything else that needs my time and attention. …
Data Engineering · 13 min read

Jan 8, 2024 · Member-only
Creating a Data Pipeline from Scratch
A Basic Data Engineering Project, end to end. Introduction: Data Engineering was always an area of interest to me, but I never really had time to create a project because I need to split my time among many things like work, family, and everything else that needs my time and attention. So, I…
Data Engineering · 13 min read

Published in Towards Data Science · Dec 22, 2023 · Member-only
Ranking Diamonds with PCA in PySpark
The challenges of running Principal Component Analysis in PySpark. Introduction: Here we go with another post about PySpark. I have been enjoying writing about this subject, as I feel we lack good blog posts about PySpark, especially when it comes to Machine Learning in MLlib (by the way, that is Spark’s Machine Learning Library)…
Principal Component · 8 min read

Published in Towards Data Science · Dec 12, 2023 · Member-only
Best Data Wrangling Functions in PySpark
Learn the most helpful functions for wrangling Big Data with PySpark. Introduction: I work with PySpark in Databricks on a daily basis. My work as a Data Scientist requires me to deal with large amounts of data in many different tables. Many times, it is a challenging job. As much as the Extract, Transform and Load (ETL) process sounds like something simple…
Pyspark · 7 min read

Published in Towards Data Science · Dec 8, 2023 · Member-only
How Reliable Is a Ratio?
Learn how to assess how reliable a ratio really is using Empirical Bayes Analysis in Python. Introduction: One of my references in the Data Science field is Julia Silge. In her Tidy Tuesday videos, she always makes a code-along type of video teaching/showing a given technique, helping other analysts upskill and incorporate it into their repertoire. Last Tuesday, the topic was Empirical Bayes (her blog…
Bayesian Statistics · 9 min read
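The teaser above only names the technique. One common way to apply Empirical Bayes to ratios (successes out of trials) is beta-binomial shrinkage: estimate a Beta(alpha, beta) prior from all observed ratios, then report each ratio's posterior mean, (successes + alpha) / (trials + alpha + beta). The sketch below is a hedged illustration, not the article's code: the function name `empirical_bayes_shrink` is hypothetical, and it fits the prior with a simple method-of-moments estimate instead of maximum likelihood. That is a simplification, since the spread of the raw ratios also includes sampling noise, so the fitted prior is only approximate.

```python
import numpy as np


def empirical_bayes_shrink(successes, trials):
    """Shrink raw ratios toward a Beta prior estimated from the data.

    Hypothetical helper: the prior is fitted by method of moments on the raw
    ratios (a simplification; a maximum-likelihood fit is also common).
    """
    successes = np.asarray(successes, dtype=float)
    trials = np.asarray(trials, dtype=float)
    raw = successes / trials

    # Moments of the observed ratios (population variance, ddof=0)
    mean, var = raw.mean(), raw.var()
    if var <= 0 or var >= mean * (1 - mean):
        raise ValueError("Method-of-moments Beta fit needs 0 < var < mean * (1 - mean)")

    # Method-of-moments Beta fit: alpha + beta = mean * (1 - mean) / var - 1
    total = mean * (1 - mean) / var - 1
    alpha, beta = mean * total, (1 - mean) * total

    # Posterior mean of each ratio under the Beta(alpha, beta) prior
    shrunk = (successes + alpha) / (trials + alpha + beta)
    return shrunk, alpha, beta


shrunk, alpha, beta = empirical_bayes_shrink([1, 30, 300], [2, 100, 1000])
print(shrunk, alpha, beta)
```

In this example, the ratio built from only two trials (1 out of 2) is pulled strongly toward the overall mean, while 300 out of 1000 barely moves. That difference in shrinkage is what indicates how much each raw ratio can be trusted.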

Published in Towards Data Science · Nov 4, 2023 · Member-only
Introduction to Logistic Regression in PySpark
Tutorial to run your first classification model in Databricks. Introduction: Big Data. Large datasets. Cloud… Those words are everywhere, following us around and in the thoughts of clients, interviewers, managers and directors. …
Logistic Regression · 10 min read

Oct 16, 2023
Pascal’s Triangle II Problem in Leet Code
A proposed solution for a programming problem from Leet Code. Introduction: If you have ever participated in a coding interview, you probably faced brain teaser problems from websites like leetcode.com. I have been looking at those and trying to solve some of them. Let me tell you why: I want to…
Pascal Triangle · 5 min read
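The teaser above does not show the proposed solution. As a hedged sketch, not necessarily the author's approach, and assuming the standard statement of Pascal's Triangle II (return the 0-indexed `rowIndex`-th row), one common technique keeps a single list and updates it in place, using O(rowIndex) extra space. The function name `get_row` is an illustrative choice.

```python
from typing import List


def get_row(row_index: int) -> List[int]:
    """Return the row_index-th (0-indexed) row of Pascal's triangle."""
    row = [1]
    for _ in range(row_index):
        # Grow the row by one trailing 1, then update interior entries
        # right to left so each previous-row value is read before it is overwritten.
        row.append(1)
        for j in range(len(row) - 2, 0, -1):
            row[j] += row[j - 1]
    return row


print(get_row(4))  # [1, 4, 6, 4, 1]
```

Iterating right to left is the design choice that makes the single-list update work: going left to right would add an already-updated value to its neighbor.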

Published in Towards Data Science · Oct 6, 2023 · Member-only
A Simple Guide to Understand the apply() Functions in R
Learn how to use these helpful functions once and for all. Introduction: I will start this post by saying that I work daily with both R and Python. Honestly, I find the way the apply functions are used in Python easier and more intuitive. Thinking about the reason behind that, I believe it is because there aren’t many options in Python…
Rstats · 8 min read