Using statistics in real-life problems. A use case of confidence intervals.

Recently I’ve been brushing up on my statistics skills, as a good part of Data Science and ML builds on those concepts.

The study subject this time was confidence intervals.

A confidence interval is a range of numbers that is likely to contain the true mean of the population the data was sampled from.

For example: if my 95% confidence interval goes from 1 to 3, it does not mean that 95% of the values in the dataset fall between 1 and 3. It means that if I repeated the sampling many times and built an interval from each sample, about 95% of those intervals would contain the true mean.
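As a rough sketch of the idea, the snippet below computes a 95% confidence interval for the mean of a small sample using only the Python standard library. The sample values and the function name `mean_confidence_interval` are made up for illustration, and the interval uses a normal approximation (a t-distribution would be the more careful choice for a sample this small).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) for the sample mean, using a normal approximation."""
    n = len(sample)
    center = mean(sample)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    std_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std_error
    return center - margin, center + margin


# Hypothetical measurements, not taken from the article.
sample = [1.8, 2.1, 2.4, 1.9, 2.0, 2.3, 1.7, 2.2, 2.0, 2.1]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Asking for a higher confidence level, such as `confidence=0.99`, widens the interval: being more certain of capturing the true mean costs precision.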

School Bus Use Case

Well, most of the people…


A guide to the basics and essentials of working with dates and times in Python.

Working with datetime objects in Python can be tricky. Believe me, I’ve been there.

I could point to at least a dozen times when I spent several minutes searching for the right code snippets to transform or format dates in Python. And although it is stressful, it shouldn’t be complicated.

The fact is that it isn’t complicated, but it does have its tricks. You must understand the type of object you’re dealing with before planning your transformation; otherwise, you will keep generating…


Learn the essential code snippets to deal with datetime in Python.

Working with datetime objects in Python can be tricky. Believe me, been there, done that.

I could point to at least a dozen times when I spent many minutes searching for the right code snippets to transform or format dates in Python. That can be stressful, but it shouldn’t be complicated.

You must understand the type of object you’re dealing with before you plan your transformation.

The fact is that it is not complicated, but it is tricky. You must understand the type of object you’re dealing with before…
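To illustrate why the object type matters, here is a small sketch using the standard `datetime` module. The timestamp string and the formats are made up for the example; the point is only that a date stored as text must be parsed into a `datetime` before it can be reformatted.

```python
from datetime import datetime

# A date that arrives as plain text (hypothetical value).
raw = "2021-08-15 14:30:00"

# Step 1: check what you have, then parse the string into a datetime object.
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
print(type(raw).__name__, "->", type(parsed).__name__)  # str -> datetime

# Step 2: only a datetime object can be formatted with strftime.
formatted = parsed.strftime("%d/%m/%Y")
print(formatted)  # 15/08/2021
```

Calling `strftime` directly on `raw` would fail, because a `str` has no such method. Knowing the type first tells you whether you need `strptime` (text to datetime) or `strftime` (datetime to text).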


An end-to-end Data Science project of a regression model to predict car prices.

The Project

  • Programming language: Python
  • Algorithm: Random Forest Regression (supervised learning)
  • Goal: My goal for this project was to create a model to estimate car prices in Brazil. The model was deployed to a web app built with Streamlit, which could, for example, help a store estimate the prices of cars it intends to buy.

The Dataset

This dataset was scraped from the internet in August 2021. It contains the following variables:

  • Automobile: Car manufacturer and model.
  • Description: Description with engine, fuel, gearbox.
  • Automatic: Binary…


Learn the basics of the openpyxl library and boost your productivity.

Introduction

openpyxl is a Python library that lets us read and write Excel files with simple code, helping people improve their performance at work.

When I say improve performance, allow me to explain why: I have more than 13 years in the IT industry and I can say that many of them were spent in Excel spreadsheets and workbooks, as well as writing long VBA code to automate some tasks.

With the openpyxl library, we can use Python to process all the data (such as manipulation and exploration) if we have a lot of data, and we can also create scripts that will perform the same…


Learn the basics of the openpyxl package and create scripts to improve your daily job.

Introduction

openpyxl is a Python library that lets you read and write Excel files with simple code, helping people improve their performance at work.

When I say improve performance, allow me to explain why: I have 13+ years in the IT industry and I can say that many of those were spent behind Excel sheets and workbooks, as well as writing long VBA code to automate some tasks.

With the openpyxl library we can use Python to process all the data (like manipulation, explorations) if…


Use the value_counts() method from Pandas with bins to quickly cut your dataset into groups.

Intro

value_counts() is a well-known method from Pandas, but it also has its tricks.

One of the most interesting I found recently is the ability to cut the dataset into bins using the bins parameter. It is something most people don’t know.

Values and Cut together

The use of it is really straightforward. Let’s load a dataset and see how it works.

import pandas as pd
import seaborn as sns
# Dataset
df = sns.load_dataset('tips')

Now, the cut. We’re cutting the tips column into 4 bins to understand how…
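A minimal sketch of what `bins` does, assuming pandas is installed. To avoid downloading the tips dataset, it uses a small made-up Series instead; the values are for illustration only.

```python
import pandas as pd

# Hypothetical tip amounts, standing in for a numeric column of a real dataset.
tips = pd.Series([1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 9.0], name="tip")

# bins=4 splits the range of the values into 4 equal-width intervals
# and counts how many observations fall into each one.
binned = tips.value_counts(bins=4)
print(binned)
```

The index of the result holds the intervals instead of the individual values, so a single call shows how the observations spread across the range.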


Learn how to deploy a Web App using share from Streamlit.

In the last post, I showed you how to build your first app.

Now it is time to deploy it and have more material to build your portfolio.

Preparing for Deploy

Deploying is just as easy as creating an app. All you need is a GitHub account and a requirements file.

Begin by creating a requirements.txt file. To do that, look at the imports section of your code and find the installed version of each library you use.
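One way to look up those versions, sketched here with the standard library's `importlib.metadata`, is to ask Python which version of each package is installed. The helper name `pinned_requirements` and the package list are assumptions for illustration; replace the list with the packages your own app imports.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Build 'name==version' lines for the packages installed in this environment."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Package is not installed here, so there is no version to pin.
            continue
    return lines


# Example list only; use the distribution names of the libraries you import.
packages = ["seaborn", "numpy"]
print("\n".join(pinned_requirements(packages)))
```

The printed lines can be pasted into requirements.txt. Note that the name used by pip is not always the name used in the import statement, so check each one.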

In our example case, I need the versions used for these 5 libraries.

import seaborn as sns
import numpy as…


Learn the code to create each basic feature and how to deploy your web app in minutes.

Introduction

Streamlit (https://streamlit.io/) was launched in October 2019 by former Google and Zoox employees Adrien Treuille, Thiago Teixeira and Amanda Kelly.

Their idea was to help data scientists and data engineers quickly create a web application to show the results of their work. The opportunity arose because the founders saw that many could not afford to spend a large amount of time building something, sometimes in a programming language different from the project’s. In Adrien’s words:

“(…) we’re giving…


Plot a heat-mapped correlation matrix in just a couple of lines of code using Pandas.

Linear regression is one of the most popular machine learning algorithms these days.

It is also probably the first model you create when you’re learning ML and Data Science. Because it is simple to understand and apply, it opens the door to start “predicting” numbers.

OK. With those initial thoughts out of the way, let’s connect it to our subject: correlation.

Linear models, like linear regression, are based on the correlation measure.

Correlation, what is it?

Correlation measures the strength of the linear relationship between two variables. This, in simpler words, means that…
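To make the definition concrete, here is a small plain-Python sketch of the Pearson correlation coefficient. It is not the Pandas heat map the article goes on to build; the function name `pearson` and the numbers are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How the two variables move together around their means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # How much each variable moves on its own.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: hours studied and test scores.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
print(round(pearson(hours, score), 3))  # close to 1: strong positive relationship

# A perfectly decreasing straight line gives -1.0.
print(pearson(hours, [10, 8, 6, 4, 2]))  # -1.0
```

Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate a weak one.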

Gustavo Santos

Data Scientist. I extract insights from data to help people and companies make better, data-driven decisions.
