Stanford University – CS229: Machine Learning by Andrew Ng – Lecture Notes – Parameter Learning

The following notes represent a complete, stand-alone interpretation of Stanford's machine learning course presented by Professor Andrew Ng and originally posted on the ml-class.org website during the fall 2011 semester.

Supervised learning. Let's start by talking about a few examples of supervised learning problems. We are given a training set of examples {(x(i), y(i)); i = 1, ..., n}, where the x(i)'s are the input variables (living area, in our housing example), also called input features, and y(i) is the output or target variable that we are trying to predict. The "(i)" superscript notation is simply an index into the training set, and has nothing to do with exponentiation. Our goal is to learn a function h : X → Y so that h(x) is a good predictor of the corresponding value of y; the function h is called a hypothesis, X denotes the space of input values, and Y the space of output values.

When the target variable is continuous, as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict whether a dwelling is a house or an apartment), we call it a classification problem. In classification, given x(i), the corresponding y(i) is also called the label for the example; 0 is called the negative class and 1 the positive class, and they are sometimes also denoted by the symbols "−" and "+". For instance, if we are trying to build a spam classifier for email, then x(i) may be some features of a piece of email, and y may be 1 if it is a piece of spam mail, and 0 otherwise.

A note on notation: we write "a := b" to denote an operation (in a computer program) in which we set the value of a variable a to be equal to the value of b; in other words, this operation overwrites a with the value of b. In contrast, we will write "a = b" when we are asserting a statement of fact, that the value of a is equal to the value of b.
Linear regression. Suppose we have a dataset giving the living areas and prices of 47 houses from Portland, Oregon:

    Living area (feet²)    Price (1000$s)
    2400                   369
    1416                   232
    3000                   540
    ...                    ...

Given data like this, how can we learn to predict the prices of other houses in Portland, as a function of the size of their living areas? To perform supervised learning, we must decide how to represent the hypothesis h. As an initial choice, let's approximate y as a linear function of x: hθ(x) = θTx, where the θj's are the parameters (also called weights). Given a training set, how do we pick, or learn, the parameters θ? One reasonable method is to make h(x) close to y, at least for the training examples we have. To formalize this, we define a cost function that measures, for each value of the θ's, how close the hθ(x(i))'s are to the corresponding y(i)'s:

    J(θ) = (1/2) Σi (hθ(x(i)) − y(i))².

This is the least-squares cost function that gives rise to the ordinary least squares regression model. We want to choose θ to minimize J(θ). One way to do so is the gradient descent algorithm, which starts with some initial θ and repeatedly takes a step in the direction of steepest decrease of J:

    θj := θj − α ∂J(θ)/∂θj    (simultaneously for all j),

where α is called the learning rate. Each update changes θ to make J(θ) smaller, until hopefully we converge to a value of θ that minimizes J(θ).
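The gradient descent loop described above can be sketched in NumPy. This is a minimal sketch, not code from the notes: the function name and the 1/m scaling of the step (used only to decouple the learning rate from the training-set size) are my own choices.

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, n_iters=5000):
    """Minimize the least-squares cost J(theta) by batch gradient descent.

    X: (m, d) design matrix (prepend a column of ones for the intercept).
    y: (m,) vector of targets.
    """
    m, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        error = X @ theta - y        # h_theta(x^(i)) - y^(i), for all i
        grad = X.T @ error           # gradient of J(theta)
        theta -= alpha * grad / m    # step in the direction of steepest decrease
    return theta
```

Each iteration looks at the entire training set before taking a step, which is exactly the "batch" variant discussed below.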
For a single training example, working out the partial derivative of J gives the update rule:

    θj := θj + α (y(i) − hθ(x(i))) x(i)j.

This rule is called the LMS update rule (LMS stands for "least mean squares"), and it has several properties that seem natural and intuitive. For instance, the magnitude of the update is proportional to the error term (y(i) − hθ(x(i))); thus, if we encounter a training example on which our prediction nearly matches the actual value of y(i), the parameters change very little, whereas a large error causes a larger change.

There are two ways to modify this method for a training set of more than one example. The first, batch gradient descent, sums the per-example gradients, so every step looks at every example in the entire training set. The second alternative repeatedly runs through the training set, and each time it encounters a training example, it updates the parameters according to the gradient of the error with respect to that single training example only; this algorithm is called stochastic gradient descent. Often, stochastic gradient descent gets θ "close" to the minimum much faster than batch gradient descent. (Note, however, that it may never "converge" to the minimum, and the parameters θ may merely oscillate around the minimum of J(θ); but in practice most of the values near the minimum will be reasonably good approximations to the true minimum.) For these reasons, particularly when the training set is large, stochastic gradient descent is often preferred over batch gradient descent.
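The stochastic variant applies the LMS rule one example at a time. Again a minimal sketch under the same conventions as before (the function name, the per-epoch shuffling, and the fixed learning rate are my own choices, not from the notes):

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, n_epochs=200, seed=0):
    """Run the LMS update on one training example at a time."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_epochs):
        for i in rng.permutation(m):        # run through the training set
            error = y[i] - X[i] @ theta     # error on example i only
            theta += alpha * error * X[i]   # LMS update for example i
    return theta
```

With a fixed α the iterates may keep oscillating around the minimum, matching the caveat above; slowly decreasing α toward zero is one way to make the parameters settle.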
The normal equations. Gradient descent gives one way of minimizing J. A second way performs the minimization explicitly and without resorting to an iterative algorithm: we minimize J by explicitly taking its derivatives with respect to the θj's, and setting them to zero. To do this without writing reams of algebra, we use some calculus with matrices. For a function f : Rn×d → R mapping from n-by-d matrices to the reals, we define the derivative of f with respect to a matrix to be the matrix of partial derivatives. Given a training set, define the design matrix X to be the n-by-d matrix (actually n-by-(d+1), if we include the intercept term) that contains the training examples' input values in its rows, and let ~y be the vector of target values. Using facts such as aTb = bTa, setting ∇θJ(θ) = 0 yields, in closed form, the value of θ that minimizes J(θ):

    θ = (XTX)−1 XT~y.

(If the number of linearly independent examples is fewer than the number of features, or if the features are not linearly independent, then XTX is not invertible; it is possible to "fix" the situation with additional techniques, which we skip here for the sake of simplicity.)
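The closed-form solution can be sketched in one line of NumPy. Rather than forming the inverse explicitly, this sketch (function name my own) uses a least-squares solver, which computes the same θ but stays well-behaved when XTX is ill-conditioned:

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares: theta = (X^T X)^{-1} X^T y.

    np.linalg.lstsq solves the same normal equations without an
    explicit matrix inverse.
    """
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta
```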
If we run this on our housing data with the number of bedrooms included as one of the input features as well, we get θ0 = 89.60, θ1 = 0.1392, θ2 = −8.738.

Probabilistic interpretation. When faced with a regression problem, why might linear regression, and specifically why might the least-squares cost function J, be a reasonable choice? In this section, we will give a set of probabilistic assumptions, under which least-squares regression is derived as a very natural algorithm. Let us assume that the target variables and the inputs are related via

    y(i) = θTx(i) + ǫ(i),

where ǫ(i) is an error term that captures either unmodeled effects (such as features relevant to predicting housing price that we left out of the regression) or random noise. Let us further assume that the ǫ(i)'s are distributed IID according to a Gaussian distribution (also called a Normal distribution) with mean zero and variance σ². Given X (the design matrix, which contains all the x(i)'s) and θ, we can write down the distribution of the y(i)'s. The notation "p(y(i)|x(i); θ)" indicates that this is the distribution of y(i) given x(i), parameterized by θ. When we view this quantity as a function of θ, we will instead call it the likelihood function L(θ). Note that by the independence assumption on the ǫ(i)'s (and hence also the y(i)'s given the x(i)'s), the likelihood is the product of the individual conditional probabilities. The principle of maximum likelihood says that we should choose θ so as to make the data as high probability as possible, i.e., we should choose θ to maximize L(θ). Instead of maximizing L(θ), we can also maximize any strictly increasing function of L(θ); in particular, the derivations will be a bit simpler if we instead maximize the log likelihood ℓ(θ).
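Writing out the log likelihood under the Gaussian noise assumption makes the connection to least squares explicit:

```latex
\begin{aligned}
\ell(\theta) &= \log \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}
   \exp\!\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right) \\
 &= n \log \frac{1}{\sqrt{2\pi}\,\sigma}
   - \frac{1}{\sigma^2} \cdot \frac{1}{2}\sum_{i=1}^{n}
     \left(y^{(i)} - \theta^T x^{(i)}\right)^2 .
\end{aligned}
```

Hence, maximizing ℓ(θ) gives the same answer as minimizing (1/2) Σi (y(i) − θTx(i))², which we recognize as J(θ), our original least-squares cost function. To summarize: under the previous probabilistic assumptions on the data, least-squares regression corresponds to finding the maximum likelihood estimate of θ (and note that the answer does not depend on σ²).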
Locally weighted linear regression. We now digress briefly to talk about the locally weighted linear regression (LWR) algorithm which, assuming there is sufficient training data, makes the choice of features less critical. In LWR, to make a prediction at a query point x, we fit θ to minimize Σi w(i)(y(i) − θTx(i))², where the w(i)'s are non-negative weights. Intuitively, if w(i) is large for a particular value of i, then in picking θ, we'll try hard to make (y(i) − θTx(i))² small; if w(i) is small, then that error term will be pretty much ignored in the fit. A fairly standard choice for the weights is w(i) = exp(−(x(i) − x)²/(2τ²)). Note that the weights depend on the particular point x at which we are trying to evaluate the prediction: if |x(i) − x| is small, then w(i) is close to 1; and if |x(i) − x| is large, then w(i) is small.

The (unweighted) linear regression algorithm that we saw earlier is known as a parametric learning algorithm, because it has a fixed, finite number of parameters (the θi's), which are fit to the data; once we've fit the θi's, we no longer need to keep the training data around to make future predictions. In contrast, locally weighted linear regression is the first example we're seeing of a non-parametric algorithm: to make predictions we need to keep the entire training set around, and the amount of information we need to represent the hypothesis h grows linearly with the size of the training set. (You will get to play with some of the properties of the LWR algorithm yourself in the homework.)
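A minimal sketch of an LWR prediction at a single query point, assuming 1-D inputs with an intercept column already added to X (the function name and the weighted normal-equations solve are my own choices, not from the notes):

```python
import numpy as np

def lwr_predict(x_query, X, y, tau=0.5):
    """Locally weighted linear regression prediction at one query point.

    Weights w^(i) = exp(-(x^(i) - x)^2 / (2 tau^2)), where column 1 of X
    holds the raw input and column 0 the intercept. We then solve the
    weighted normal equations theta = (X^T W X)^{-1} X^T W y.
    """
    w = np.exp(-((X[:, 1] - x_query[1]) ** 2) / (2 * tau ** 2))
    XtW = X.T * w                              # broadcasts w over columns
    theta = np.linalg.solve(XtW @ X, XtW @ y)  # fit theta locally
    return theta @ x_query
```

Note that θ is re-fit for every query point, which is exactly why the training set must be kept around: the hypothesis is never summarized by one fixed parameter vector.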
Classification and logistic regression. Let's now talk about the classification problem. This is just like the regression problem, except that the values y we want to predict take on only a small number of discrete values. For now, we will focus on the binary classification problem, in which y can take on only two values, 0 and 1. We could ignore the fact that y is discrete-valued and use our old linear regression algorithm to try to predict y given x; however, it is easy to construct examples where this method performs very poorly. Intuitively, it also doesn't make sense for hθ(x) to take values larger than 1 or smaller than 0 when we know that y ∈ {0, 1}. To fix this, logistic regression changes the form of the hypotheses, modeling p(y|x; θ) as hθ(x) = g(θTx), where g(z) = 1/(1 + e−z) is the sigmoid (or logistic) function; this "forces" the output to lie between 0 and 1.

So, given the logistic regression model, how do we fit θ for it? Following how we saw least-squares regression could be derived as the maximum likelihood estimator under a set of assumptions, let's endow our classification model with a set of probabilistic assumptions, and then fit the parameters via maximum likelihood. Let us assume that P(y = 1|x; θ) = hθ(x) and P(y = 0|x; θ) = 1 − hθ(x). Assuming the training examples were generated independently, we can then write down the likelihood of the parameters, and as before it will be easier to maximize the log likelihood ℓ(θ). Our updates will therefore be given by θ := θ + α∇θℓ(θ); note the positive rather than negative sign in the update formula, since we're maximizing, rather than minimizing, a function now.
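Working out ∇θℓ for logistic regression gives an update of the same form as the LMS rule, with hθ now the sigmoid of θTx. A minimal sketch of the resulting gradient ascent loop (function names and the 1/m step scaling are my own choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_ascent(X, y, alpha=0.5, n_iters=2000):
    """Fit logistic regression by gradient ascent on the log likelihood.

    The gradient works out to X^T (y - g(X theta)), so each step moves
    theta in the direction that increases l(theta).
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta += alpha * X.T @ (y - sigmoid(X @ theta)) / X.shape[0]
    return theta
```

After fitting, sigmoid(X @ theta) > 0.5 can serve as the decision rule for predicting y = 1.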
Newton's method. Returning to logistic regression, let's now talk about a different algorithm for maximizing ℓ(θ). Consider Newton's method for finding a zero of a function f : R → R; it repeatedly performs the update θ := θ − f(θ)/f′(θ). This has a natural interpretation: we approximate f by a linear function that is tangent to f at the current guess θ, solve for where that line equals zero, and let the next guess be that value of θ. In the example from the original figures, one iteration updates θ to about 1.8, and after a few more iterations, we rapidly approach θ = 1.3.

Newton's method gives a way of getting to f(θ) = 0. What if we want to use it to maximize some function ℓ? The maxima of ℓ correspond to points where its first derivative ℓ′(θ) is zero. So, by letting f(θ) = ℓ′(θ), we can use the same algorithm to maximize ℓ, and we obtain the update rule θ := θ − ℓ′(θ)/ℓ″(θ). Lastly, in our logistic regression setting, θ is vector-valued, so we need to generalize Newton's method to this setting. The generalization of Newton's method to this multidimensional setting (also called the Newton-Raphson method) is given by

    θ := θ − H−1∇θℓ(θ).

Here, ∇θℓ(θ) is, as usual, the vector of partial derivatives of ℓ(θ) with respect to the θi's; and H is a d-by-d matrix (actually (d+1)-by-(d+1), assuming we include the intercept term) called the Hessian, whose entries are the second partial derivatives of ℓ(θ).

Newton's method typically enjoys faster convergence than (batch) gradient descent, and requires many fewer iterations to get very close to the minimum. One iteration of Newton's method can, however, be more expensive than one iteration of gradient descent, since it requires finding and inverting a d-by-d Hessian; but so long as d is not too large, it is usually much faster overall.
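A sketch of the Newton-Raphson update applied to the logistic regression log likelihood, using the standard closed forms ∇θℓ = XT(y − h) and H = −XT diag(h(1 − h)) X (function names are my own; no step-size damping is included):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_newton(X, y, n_iters=10):
    """Maximize the logistic log likelihood with theta := theta - H^{-1} grad."""
    m, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (y - h)                 # gradient of l(theta)
        H = -(X.T * (h * (1 - h))) @ X       # Hessian of l(theta)
        theta -= np.linalg.solve(H, grad)    # Newton-Raphson step
    return theta
```

On small, well-scaled problems a handful of iterations typically drives the gradient to machine precision, illustrating the fast convergence claimed above; each iteration, however, solves a d-by-d linear system.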
Generalized Linear Models: the exponential family. The regression example and the classification example we have seen are both special cases of a broader family of models, called Generalized Linear Models (GLMs).⁵ To work our way up to GLMs, we will begin by defining exponential family distributions. We say that a class of distributions is in the exponential family if it can be written in the form

    p(y; η) = b(y) exp(ηTT(y) − a(η)).    (3)

Here, η is called the natural parameter (also called the canonical parameter) of the distribution; T(y) is the sufficient statistic (for the distributions we consider, it will often be the case that T(y) = y); and a(η) is the log partition function. The quantity e−a(η) essentially plays the role of a normalization constant, that makes sure the distribution p(y; η) sums/integrates over y to 1. A fixed choice of T, a and b defines a family of distributions parameterized by η; as we vary η, we get different distributions within this family.

We now show that the Bernoulli and the Gaussian distributions are examples of exponential family distributions. The Bernoulli distribution with mean φ specifies a distribution over y ∈ {0, 1}; the class of Bernoulli distributions, i.e., the ones obtained by varying φ, is in the exponential family: there is a choice of T, a and b so that Equation (3) becomes exactly the class of Bernoulli distributions. A similar derivation shows that the Gaussian (with fixed variance) is also in the exponential family.

⁵ The presentation of the material in this section takes inspiration from Michael I. Jordan, Learning in graphical models (unpublished book draft), and also McCullagh and Nelder, Generalized Linear Models.
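As a worked example, the Bernoulli distribution can be rewritten in the form of Equation (3):

```latex
\begin{aligned}
p(y;\phi) &= \phi^{y}(1-\phi)^{1-y} \\
          &= \exp\!\big( y\log\phi + (1-y)\log(1-\phi) \big) \\
          &= \exp\!\Big( \log\!\Big(\tfrac{\phi}{1-\phi}\Big)\, y
             + \log(1-\phi) \Big).
\end{aligned}
```

Thus the natural parameter is η = log(φ/(1 − φ)), with T(y) = y, a(η) = −log(1 − φ) = log(1 + e^η), and b(y) = 1. Inverting the expression for η gives φ = 1/(1 + e^{−η}), which is the familiar sigmoid function; this connection will reappear when we derive logistic regression as a GLM.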
Constructing GLMs. Suppose you would like to predict some random variable y as a function of x. To derive a GLM for this problem, we will make the following three assumptions about the conditional distribution of y given x and about our model:

1. y | x; θ ~ ExponentialFamily(η); i.e., given x and θ, the distribution of y follows some exponential family distribution, with natural parameter η.
2. Given x, our goal is to predict the expected value of T(y). In most of our examples, we will have T(y) = y, so this means we would like the prediction h(x) output by our learned hypothesis to satisfy h(x) = E[y|x].
3. The natural parameter η and the inputs x are related linearly: η = θTx.

These assumptions allow us to derive an elegant class of learning algorithms. In the Gaussian case, the GLM recipe recovers ordinary least squares: h(x) = E[y|x; θ] = μ = η = θTx. In the Bernoulli case, it recovers logistic regression: h(x) = E[y|x; θ] = φ = 1/(1 + e−η) = 1/(1 + e−θTx), which finally explains where the sigmoid function comes from. The function mapping the natural parameter to the distribution's mean is called the canonical response function. More generally, the same recipe applies to any distribution in the exponential family; the multinomial distribution, for example, yields an algorithm for the multiple-class case, and the learning algorithm for every model in the GLM family can be derived in the same way once the output distribution is chosen.
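For any exponential-family output with T(y) = y and the canonical response function h, the stochastic maximum-likelihood update takes the same form as the LMS rule, θ := θ + α(y(i) − h(θTx(i)))x(i). A minimal sketch of this shared recipe (the function name and its parameterization are my own, not from the notes):

```python
import numpy as np

def glm_sgd(X, y, response, alpha=0.05, n_epochs=200, seed=0):
    """Stochastic-gradient fitting of a GLM with canonical response `response`.

    response = identity  -> ordinary least squares (Gaussian output);
    response = sigmoid   -> logistic regression (Bernoulli output).
    The update theta += alpha * (y_i - h(theta^T x_i)) * x_i is identical
    in both cases; only the response function changes.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for i in rng.permutation(len(y)):
            theta += alpha * (y[i] - response(X[i] @ theta)) * X[i]
    return theta
```

For instance, `glm_sgd(X, y, response=lambda z: z)` reproduces least-squares regression, while passing a sigmoid reproduces the logistic regression update; this shared form is one of the payoffs of the GLM view.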
