Machine Learning
This post lists some of the most interesting machine learning questions, which may be useful when preparing for interviews in the machine learning field.
00:00:00 (00) Introduction to the video / Speaker / ML / Agenda - (1) Core concepts (2) Python-based questions (3) Scenario-based questions
00:03:07 (01) How can the concept of ML be explained to a school-going kid?
00:04:28 (02) What are the types of ML? Supervised / Unsupervised / Reinforcement learning (hit and try; reward and penalty) / Semi-supervised
00:09:04 (03) What is your favorite algorithm, and how would you explain it?
00:09:46 (04) Difference between deep learning and ML
00:11:29 (05) Difference between classification and regression
00:12:46 (06) What do you mean by selection bias?
00:13:39 (07) Difference between precision and recall
00:17:03 (08) Explain True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN)
00:18:42 (09) What is a confusion matrix? Used for summarizing the performance of a classification algorithm
00:20:47 (10) Difference between inductive and deductive learning
00:22:20 (11) Difference between KNN and k-means clustering. Supervised vs. unsupervised; the K in KNN is the number of neighbours, and in k-means it is the number of clusters
00:23:53 (12) What is a ROC curve and what does it represent? Receiver operating characteristic: a plot of true positive rate vs. false positive rate
00:26:53 (13) Difference between Type I and Type II errors. Type I is a False Positive (FP) and Type II is a False Negative (FN)
00:28:13 (14) Is it better to have too many FPs or too many FNs?
00:30:47 (15) Which is more important to you: model accuracy or model performance? Model accuracy is part of model performance
00:31:48 (16) Difference between Gini impurity and entropy in a decision tree
00:33:19 (17) Difference between entropy and information gain. Information gain improves as the nodes get purer
00:34:40 (18) What is overfitting? How do you ensure you are not overfitting with a model? More data / ensembling models / simpler models / adding regularization
00:37:50 (19) Explain ensemble learning techniques in ML. Bagging / Boosting
00:41:32 (20) What are bagging and boosting in ML?
00:44:49 (21) How would you screen for outliers, and how do you handle them?
00:47:56 (22) What are collinearity and multicollinearity?
00:48:54 (23) What are eigenvectors and eigenvalues?
00:51:33 (24) What is A/B testing?
00:52:55 (25) What is cluster sampling?
00:53:51 (26) Running a binary classification tree is simple. But do you know how the tree decides which variable to split on at the root node and its succeeding child nodes?
00:56:18 (27) (01) Name a few Python libraries used for data analysis and scientific computation
00:58:58 (28) (02) Which library would you prefer for plotting in Python: Seaborn, Matplotlib, or Bokeh?
01:00:32 (29) (03) How are NumPy and SciPy related to each other?
01:01:28 (30) (04) Main difference between a Pandas Series and a single-column DataFrame in Python
01:02:35 (31) (05) How can you handle duplicate values of a variable in a dataset in Python?
01:03:16 (32) (06) Write a basic ML program that imports any dataset and checks the accuracy on it using any classifier
01:07:46 (33) (01) You are given a dataset consisting of variables with more than 30% missing values. Let's say that out of 50 variables, 8 have more than 30% missing values. How will you deal with them?
01:09:42 (34) (02) Write a SQL query that makes recommendations using the pages that your friends liked. Assume you have two tables: a two-column table of users and their friends, and a two-column table of users and the pages they like. It should not recommend pages you already like
01:12:00 (35) (03) There is a game where you are asked to roll two fair six-sided dice. If the sum of the values on the dice equals seven, you win $21. However, you must pay $5 to play each time you roll both dice. Do you play the game? Also, if the player plays 6 times, what is the probability of making money?
01:15:06 (36) (04) We have two options for serving ads within the newsfeed: (1) out of every 25 stories, 1 will be an ad; (2) every story has a 4% chance of being an ad. For each option, what is the expected number of ads shown in 100 news stories? If we go with option 2, what is the chance the user will be shown a single ad in 100 stories? What about no ads at all?
01:18:31 (37) (05) How would you predict who will renew their subscription next month? What data would you need to solve this? What analysis would you do? Would you build predictive models? If so, which?
01:22:04 (38) (06) How do you map nicknames to real names?
01:23:34 (39) (07) A jar has 1000 coins, of which 999 are fair and 1 is double-headed. Pick a coin at random and toss it 10 times. Given that you see 10 heads, what is the probability that the next toss of that coin s
01:28:02 (40) (08) Suppose you are given a dataset which has missing values spread along 1 standard deviation from the median. What percentage of the data would remain unaffected, and why?
01:28:53 (41) (09) You are given a cancer detection dataset. Suppose that when you build a classification model you achieve an accuracy of 96%. Why shouldn't you be happy with your model's performance? What can you do about it?
01:31:48 (42) (10) You are working on a time series dataset. Your manager has asked you to build a high-accuracy model. You start with a decision tree algorithm, since you know it works fairly well on all kinds of data. Later you try a time series regression model and get higher accuracy than with the earlier model. Can this happen?
01:33:16 (43) (11) Suppose you found that your model is suffering from low bias and high variance. Which algorithm do you think could tackle the situation, and why?
01:36:02 (44) (12) You are given a dataset. It contains many variables, some of which are highly correlated, and you know about it. Your manager has asked you to run PCA. Would you remove the correlated variables first?
01:37:21 (45) (13) You are asked to build a multiple regression model, but your model's R-squared is not as good as you want it to be. To improve it, you remove the intercept term, and your model's R-squared goes from 0.3 to 0.8. Is this possible? How?
01:39:10 (46) (14) You are asked to build a random forest model with 1000 trees. During training you got a training error of 0.00, but on testing the validation error was 34.23. What is going on? Haven't you trained the model perfectly?
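Questions (07) to (13) all build on the four confusion-matrix counts. As a rough illustration (not taken from the video), the sketch below computes precision, recall (the true positive rate on the ROC curve), the false positive rate, and accuracy from TP, FP, FN and TN. The function name and the example counts are hypothetical, chosen only to show how the quantities relate.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common metrics from confusion-matrix counts.

    tp/fp/fn/tn are counts of true positives, false positives (Type I
    errors), false negatives (Type II errors) and true negatives.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # of predicted positives, how many are right
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # true positive rate (ROC y-axis)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0         # false positive rate (ROC x-axis)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "false_positive_rate": fpr,
        "accuracy": accuracy,
    }


# Hypothetical, imbalanced example: 60 actual positives out of 1000 cases.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=930)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is 0.97 while recall is only about 0.67, which is one reason a high accuracy figure alone, as in scenario question (09), may not be reassuring on an imbalanced dataset.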
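For the dice game in scenario question (03), one possible way to check the arithmetic is to enumerate the 36 outcomes exactly. This is a sketch under an assumption the question does not spell out: the $5 fee is paid on every play and is not refunded, and the $21 is the gross payout on a win. The names `PRIZE`, `COST` and `prob_profit` are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

PRIZE = 21  # dollars received when the two dice sum to seven
COST = 5    # dollars paid for every play (assumed non-refundable)

# Enumerate all 36 equally likely outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))
p_win = Fraction(sum(1 for a, b in outcomes if a + b == 7), len(outcomes))

# Expected net result of a single play.
expected_net_per_play = p_win * PRIZE - COST


def prob_profit(n_plays):
    """Probability that total winnings exceed total fees after n_plays."""
    total_cost = n_plays * COST
    return sum(
        comb(n_plays, k) * p_win**k * (1 - p_win) ** (n_plays - k)
        for k in range(n_plays + 1)
        if k * PRIZE > total_cost
    )


p_profit_6 = prob_profit(6)
print(p_win, expected_net_per_play, p_profit_6, float(p_profit_6))
```

Under these assumptions the chance of rolling a seven is 1/6, the expected result is a loss of $1.50 per play, and over 6 plays (costing $30) the player needs at least two wins to come out ahead, which happens with probability 12281/46656, roughly 26%.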
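The ad-serving question in scenario question (04) can be approached in a similar way. The sketch below assumes that "a single ad" means exactly one ad, and that under option 2 each story is an ad independently of the others, so the number of ads follows a binomial distribution. The helper `binomial_pmf` is a hypothetical name introduced here.

```python
from math import comb

N_STORIES = 100
P_AD = 0.04  # option 2: each story is independently an ad with 4% chance

# Option 1 is deterministic: exactly one ad in every block of 25 stories.
expected_option_1 = N_STORIES // 25

# Option 2: expected value of a binomial(n, p) count is n * p.
expected_option_2 = N_STORIES * P_AD


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_exactly_one = binomial_pmf(1, N_STORIES, P_AD)
p_none = binomial_pmf(0, N_STORIES, P_AD)

print(expected_option_1, expected_option_2)
print(f"P(exactly one ad) = {p_exactly_one:.4f}")
print(f"P(no ads)         = {p_none:.4f}")
```

Both options give an expected 4 ads per 100 stories; the difference is in the spread. Under option 2, with the independence assumption above, a user sees exactly one ad about 7% of the time and no ads at all a little under 2% of the time.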