Using statistics in real-life problems: a use case of confidence intervals.
Recently I've been brushing up on my statistics skills, since a good part of Data Science and ML flows through these concepts.
This time, the subject of study was confidence intervals.
A confidence interval is a range of values that is likely to contain the true mean of the population the data comes from.
For example: if my confidence interval is between 1 and 3 with 95% confidence, it really means that 95% of the time I pick a number from that dataset, it will be between 1 and 3.
Well, most people…
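The preview stops before the explanation, so the following is not the author's code. It is a minimal sketch, assuming a normally distributed population with made-up parameters and a normal-approximation (z) interval rather than a t interval. It illustrates the standard frequentist reading of "95% confidence": about 95% of intervals built this way contain the true mean, while a much smaller share of individual values falls inside any single interval.

```python
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation CI for the mean: mean +/- z * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / n ** 0.5       # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * sem, mean + z * sem


random.seed(42)
# Hypothetical population and sample size, chosen only for illustration
TRUE_MEAN, SD, N, TRIALS = 2.0, 1.0, 100, 2000

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    low, high = mean_confidence_interval(sample)
    hits += low <= TRUE_MEAN <= high  # did this interval capture the true mean?
coverage = hits / TRIALS

# Share of individual values from the last sample that fall inside its interval
inside_share = sum(low <= x <= high for x in sample) / N

print(f"intervals containing the true mean: {coverage:.3f}")
print(f"individual values inside the last interval: {inside_share:.3f}")
```

The coverage should land close to 0.95 (slightly below, because the z interval is a bit narrower than the t interval for small samples), while the share of individual points inside one interval is far lower.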
A handbook with the basic essentials for working with dates and times in Python.
Working with datetime objects in Python can be complicated. Believe me, I've been there.
I could point to at least a dozen times when I spent several minutes searching for the right code snippets to transform or format dates in Python. And, although it is stressful, it shouldn't be complicated.
The fact is that it is not complicated, but it has its tricks. You must understand the type of object you are dealing with before planning your transformation; otherwise, you will keep getting…
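A minimal sketch with the standard library (the date string is a made-up example) showing the objects that are easy to confuse, a plain string, a date, and a datetime, and how to convert between them. One trap worth knowing: datetime is a subclass of date, so an isinstance check against date also matches datetime values.

```python
from datetime import date, datetime

raw = "2021-08-15 14:30"                              # a plain string, not a date yet
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M")     # str -> datetime
just_date = parsed.date()                             # datetime -> date (drops the time)
back_to_dt = datetime.combine(just_date, datetime.min.time())  # date -> datetime at midnight

for obj in (raw, parsed, just_date, back_to_dt):
    print(type(obj).__name__, repr(obj))

# datetime inherits from date, so this prints True
print(isinstance(parsed, date))
```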
Learn the essential code snippets to deal with datetime in Python.
datetime objects in Python can be tricky. Believe me, been there, done that.
I could point to at least a dozen times I spent many minutes searching for the right code snippets to transform or format dates in Python. That can be stressful, but, I mean, it shouldn't be complicated.
The fact is that it is not complicated, but it is tricky. You must understand the type of object you're dealing with before you plan your transformation…
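Once you know you are holding a datetime, formatting and arithmetic are straightforward. A short standard-library sketch; the timestamp is a made-up example:

```python
from datetime import datetime, timedelta

ts = datetime.fromisoformat("2021-08-15T14:30:00")  # ISO 8601 string -> datetime

print(ts.strftime("%d/%m/%Y"))        # 15/08/2021
print(ts.strftime("%Y-%m-%d %H:%M"))  # 2021-08-15 14:30

next_week = ts + timedelta(days=7)    # datetime + timedelta -> datetime
print(next_week.isoformat())          # 2021-08-22T14:30:00

elapsed = next_week - ts              # datetime - datetime -> timedelta
print(elapsed.days, elapsed.total_seconds())  # 7 604800.0
```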
An end-to-end Data Science project of a regression model to predict car prices.
This dataset was scraped from the internet in August 2021. It contains the following variables:
openpyxl is a Python library that lets us read and write Excel files using simple code, allowing people to improve their work performance.
When I say improve performance, let me explain why: I have more than 13 years in the IT industry, and I can say that many of them were spent on Excel spreadsheets and workbooks, as well as writing long VBA code to automate some tasks.
With the openpyxl library, we can use Python to process all the data (such as manipulation and exploration) when we have a lot of it, and we can also create scripts that will perform the same…
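As an illustration of replacing a small VBA task, here is a minimal sketch with made-up sales data and a hypothetical file name. It writes a sheet with openpyxl, adds an Excel SUM formula, and saves the workbook. openpyxl stores the formula text as-is; Excel calculates it when the file is opened.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.active
ws.title = "Sales"
ws.append(["Region", "Amount"])  # header row (row 1)
for row in [("North", 120), ("South", 80), ("East", 100)]:  # rows 2-4
    ws.append(row)

# Formula stored as text; openpyxl does not evaluate it
ws["B5"] = "=SUM(B2:B4)"

# Hypothetical output path in a temporary folder
path = os.path.join(tempfile.mkdtemp(), "sales.xlsx")
wb.save(path)

reloaded = load_workbook(path)["Sales"]
print(reloaded["B5"].value)  # =SUM(B2:B4)
```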
Learn the basics of the openpyxl package and create scripts to improve your daily work.
openpyxl is a Python library that allows you to read and write Excel files with simple code, enabling people to improve their work performance.
When I say improve performance, allow me to explain why: I have 13+ years in the IT industry, and many of those were spent behind Excel sheets and workbooks, as well as writing long VBA code to automate some tasks.
With the openpyxl library, we can use Python to process all the data (manipulation, exploration, and so on) if…
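A minimal sketch of the processing side, using made-up order data and a hypothetical file name: it builds a small workbook so the example is self-contained, then reads the rows back with iter_rows(values_only=True) and aggregates revenue per product in plain Python, the kind of loop that often ends up written in VBA.

```python
import os
import tempfile
from collections import defaultdict

from openpyxl import Workbook, load_workbook

# Build a small example workbook (hypothetical data)
path = os.path.join(tempfile.mkdtemp(), "orders.xlsx")
wb = Workbook()
ws = wb.active
ws.append(["Product", "Quantity", "Unit price"])
ws.append(["Pen", 10, 1.5])
ws.append(["Notebook", 3, 4.0])
ws.append(["Pen", 5, 1.5])
wb.save(path)

# Read it back, skipping the header row, and sum revenue per product
sheet = load_workbook(path).active
revenue = defaultdict(float)
for product, qty, price in sheet.iter_rows(min_row=2, values_only=True):
    revenue[product] += qty * price

print(dict(revenue))  # {'Pen': 22.5, 'Notebook': 12.0}
```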
Use the value_counts() method from Pandas with bins to quickly cut your dataset into groups.
value_counts() is a well-known Pandas method, but it also has its tricks.
One of the most interesting ones I found recently is the ability to cut the dataset into bins using the bins parameter. It is something most people don't know.
Using it is really straightforward. Let's load a dataset and see how it works.
import pandas as pd
import seaborn as sns

# Dataset
df = sns.load_dataset('tips')
Now, the cut. We're cutting the tip column into 4 bins to understand how…
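seaborn downloads the tips dataset from the internet, so to keep this sketch self-contained it uses a small made-up Series named tip instead. With bins=4, value_counts() splits the range between the minimum and maximum into 4 equal-width intervals and counts the values in each; sort=False keeps the intervals in order instead of sorting by count. pd.cut() with the same number of bins gives the same grouping.

```python
import pandas as pd

# Small illustrative series (made-up values, not the seaborn tips data)
tips = pd.Series([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0, 10.0], name="tip")

# 4 equal-width bins over [1.0, 10.0], listed in interval order
counts = tips.value_counts(bins=4, sort=False)
print(counts)  # 5, 3, 1 and 1 values per bin

# Equivalent grouping with pd.cut
same_counts = pd.cut(tips, bins=4).value_counts(sort=False)
print(same_counts)
```

On the real dataset, the same call would be df['tip'].value_counts(bins=4, sort=False).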
Learn how to deploy a web app using Streamlit's share service.
In my last post, I showed you how to build your first app.
Now it is time to deploy it and have more material to build your portfolio.
Deploying is just as easy as creating an app. All you need is a GitHub account and a requirements file.
Begin by creating a requirements.txt file. To do that, look at the #imports section of your code: you will need the version of each library.
In our example, I need the versions of these 5 libraries.
import seaborn as sns
import numpy as…
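The full import list is cut off in this preview, so the package names below are assumptions: replace them with the libraries your app actually imports. This minimal sketch uses the standard library's importlib.metadata to look up installed versions and write pinned lines to requirements.txt. Note that the names must be pip distribution names (for example scikit-learn), not import names (sklearn).

```python
from importlib.metadata import PackageNotFoundError, version


def requirement_lines(packages):
    """Return 'name==version' for each installed package, or a comment if missing."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(f"# {name}: not installed")
    return lines


if __name__ == "__main__":
    # Hypothetical list of the app's dependencies
    packages = ["streamlit", "pandas", "numpy", "seaborn", "matplotlib"]
    content = "\n".join(requirement_lines(packages)) + "\n"
    with open("requirements.txt", "w", encoding="utf-8") as f:
        f.write(content)
    print(content)
```

An alternative is pip freeze > requirements.txt, but that pins every package in the environment, not only the ones the app imports.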
Learn the code to create each basic feature and how to deploy your web app in minutes.
Streamlit (https://streamlit.io/) was launched in October 2019 by former Google and Zoox employees Adrien Treuille, Thiago Teixeira and Amanda Kelly.
Their idea was to help data scientists and data engineers quickly create a web application to show the results of their work. The opportunity came up because the founders noticed that many of them could not spend a lot of time building something, sometimes in a programming language different from the project's. In Adrien's words:
“(…) we’re giving…
Plot a correlation matrix as a heat map in just a couple of lines of code using Pandas.
Linear regression is one of the most popular machine learning algorithms these days.
It is also probably the first model you create when you're learning ML and Data Science. Because it is simple to understand and apply, it opens the door to "predicting" numbers.
OK. Those were some initial thoughts off the top of my head; now let's connect them to our subject: correlation.
Linear models, like linear regression, rely on correlation.
Correlation measures the strength of the linear relationship between two variables. This, in simpler words, means that…
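A minimal sketch with made-up columns: Pearson correlation computed by hand next to pandas' DataFrame.corr(), which uses Pearson by default. The y_quadratic column is fully determined by x yet has zero correlation with it, because correlation only captures linear relationships. The last lines check that a simple linear regression slope equals r multiplied by std(y)/std(x), one concrete sense in which linear regression is tied to correlation.

```python
import numpy as np
import pandas as pd


def pearson(x, y):
    """Pearson r: covariance of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y_linear": [2, 4, 6, 8, 10],    # y = 2x: perfect positive linear relationship
    "y_inverse": [10, 8, 6, 4, 2],   # perfect negative linear relationship
    "y_quadratic": [4, 1, 0, 1, 4],  # y = (x - 3)^2: related to x, but not linearly
})

corr = df.corr()  # Pearson correlation matrix
print(corr.round(2))
print(pearson(df["x"], df["y_quadratic"]))  # 0.0

# Simple linear regression slope equals r * std(y) / std(x)
r = corr.loc["x", "y_linear"]
slope_from_r = r * df["y_linear"].std() / df["x"].std()
slope_fit = np.polyfit(df["x"], df["y_linear"], 1)[0]
print(slope_from_r, slope_fit)
```

In a notebook, one common pandas-only way to show the matrix as a heat map is corr.style.background_gradient(cmap="coolwarm"), which needs matplotlib and jinja2 installed.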