I've recently been working on a personal Exploratory Data Analysis (EDA) project. It's easy to think, "Let's just process the whole dataset, shall we?" When I get stuck, I ask experts. One expert told me that I could use sampling. I was like, "What?" A million questions popped into my head: How can sampling possibly work? Why would the average of a random sample be close to the average of the whole population in the dataset?
What is sampling, really?
My EDA project is about housing prices in the UK from 1995 to 2017, and the dataset has 22 million rows in total. Anyone would have a hard time trying to process the whole dataset.
That's why we need sampling, a statistical method. A sample is a portion of the total population, chosen with the aim of representing the full population.
My confusion was not about the sample size but about the randomness. I'm still trying to understand what is actually going on.
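To see for myself why a random sample can stand in for the whole population, a small Python sketch helps. It does not use the real housing data: the prices below are synthetic, and the function name, the population and sample sizes, the log-normal shape and the seed are all assumptions made up for illustration. It only shows the general idea that the mean of a simple random sample tends to land close to the mean of the population it was drawn from.

```python
import random
import statistics


def sample_mean_demo(population_size=200_000, sample_size=2_000, seed=42):
    """Compare the mean of a synthetic population with the mean of a random sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable

    # Synthetic, right-skewed "prices" (log-normal). This is an assumption
    # for illustration, NOT the real UK housing dataset.
    population = [rng.lognormvariate(12, 0.5) for _ in range(population_size)]

    # Simple random sample without replacement: every row is equally likely.
    sample = rng.sample(population, sample_size)

    return statistics.fmean(population), statistics.fmean(sample)


population_mean, sample_mean = sample_mean_demo()
relative_gap = abs(sample_mean - population_mean) / population_mean

print(f"population mean: {population_mean:,.0f}")
print(f"sample mean:     {sample_mean:,.0f}")
print(f"relative gap:    {relative_gap:.2%}")
```

The sample here is only 1% of the rows, yet the two averages should come out close. The randomness is what makes this work: because every row has the same chance of being picked, the sample is not systematically tilted toward cheap or expensive houses.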
To be honest with you, I think my English needs to improve. Somehow, I have no idea what my weaknesses are. The problem is that my brain is stuck in certain patterns. I hired a guy who can speak 11 languages, and he told me that I need to improve in every area of my English.
Let's be honest. There won't be any improvement if I keep writing without feedback. Plus, there won't be any improvement if I don't act on that feedback. It's such a classic dilemma. Honestly, I want to try the Benjamin Franklin method.
Benjamin Franklin loved reading books when he was a kid. His father noticed his love for books and sent him to a print shop as an apprentice. Later on he became a statesman, an inventor and, most importantly, a thinker who influenced the whole of American history. A big reason is that he was really good at writing, a skill he practiced in the print shop.
What he did was collect some of the best writing from issues of The Spectator, a British culture and politics magazine, and reverse engineer the prose.
He took some of the papers, and made short hints of the sentiment in each sentence, laid them by a few days, and then, without looking at the book, tried to complete the papers again, by expressing each hinted sentiment at length, and as fully as it had been expressed before, in any suitable words that should come to hand.
To summarize, he took notes at the sentence level, set them aside for a few days (my preference would be 2 days, just a guess), and then tried to recreate the sentences from his own head, without looking at the originals.
After that, he compared his version of the Spectator pieces with the originals, discovered some of his faults and corrected them. But he found that he needed a stock of words, or a readiness in recollecting and using them.
What I find interesting is that I should do something similar. Right now my day consists of coding, almost all day. Once I get into the flow, I hardly ever feel that I'm not making progress. However, the progress is quite limited.
Benjamin Franklin found that his vocabulary was lacking and his prose was too plain. So he improved his method. Instead of taking straightforward notes on the articles he was trying to recreate, he turned them into poems.
It's such a good method. But I wonder whether he had other things to work on.
He did: he would also jumble his notes and, some weeks later, try to put them back into the best order. This is called the arrangement of thoughts. I did this using PowerPoint. It felt weird to copy and paste each sentence into PowerPoint, and it doesn't actually help me understand English.
I don't read a lot, and I spend most of my time coding. I should make some changes.