Information gain ratio

Information Gain Calculation

Let $Attr$ be the set of all attributes and $Ex$ the set of all training examples. For $x\in Ex$ and $a\in Attr$, $value(x,a)$ denotes the value of example $x$ for attribute $a$, $values(a)$ denotes the set of possible values of $a$, and $H$ denotes the entropy. The information gain for an attribute $a\in Attr$ is defined as follows:

$IG(Ex,a)=H(Ex)-\sum_{v\in values(a)}\frac{|\{x\in Ex|value(x,a)=v\}|}{|Ex|}\cdot H(\{x\in Ex|value(x,a)=v\})$
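A minimal Python sketch of this formula, offered as an illustration rather than a reference implementation. It assumes that each example is a dict, that the class label is stored under a separate target key, and that $H$ is the base-2 Shannon entropy of the class labels (the article leaves the entropy's base and argument implicit). The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H in bits of a list of class labels.

    Uses p * log2(1/p), which equals -p * log2(p) but avoids returning -0.0.
    """
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def information_gain(examples, attr, target):
    """IG(Ex, a) = H(Ex) - sum_v |S_v|/|Ex| * H(S_v),
    where S_v = {x in Ex | value(x, a) = v}.

    Assumption: each example is a dict, `attr` is the attribute key and
    `target` is the key of the class label.
    """
    n = len(examples)
    remainder = 0.0
    # Only values that actually occur in Ex matter: an empty S_v has weight 0.
    for v in {x[attr] for x in examples}:
        subset = [x[target] for x in examples if x[attr] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy([x[target] for x in examples]) - remainder


# Hypothetical toy data (not from the article).
data = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "yes"},
    {"outlook": "rain", "play": "yes"},
]
print(round(information_gain(data, "outlook", "play"), 3))  # 0.311
```

Iterating only over the values observed in $Ex$ gives the same result as summing over all of $values(a)$, because a value with no examples contributes a weight of 0.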

The information gain equals the total entropy $H(Ex)$ if each value of the attribute uniquely determines the classification (the value of the target attribute). In this case every subset $\{x\in Ex|value(x,a)=v\}$ contains examples of a single class, so each weighted entropy subtracted from the total entropy is 0.
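A small worked instance, assuming a hypothetical set $Ex$ of four examples with classes $+,+,-,-$ and an attribute $a$ whose value is $p$ exactly for the two $+$ examples and $q$ for the two $-$ examples (entropy taken in base 2):

```latex
H(Ex) = -\tfrac{2}{4}\log_2\tfrac{2}{4} - \tfrac{2}{4}\log_2\tfrac{2}{4} = 1
H(\{x\in Ex|value(x,a)=p\}) = H(\{x\in Ex|value(x,a)=q\}) = 0
IG(Ex,a) = 1 - \left(\tfrac{2}{4}\cdot 0 + \tfrac{2}{4}\cdot 0\right) = 1 = H(Ex)
```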

Intrinsic Value Calculation

$IV(Ex,a)=-\sum_{v\in values(a)}\frac{|\{x\in Ex|value(x,a)=v\}|}{|Ex|}\cdot\log_2\left(\frac{|\{x\in Ex|value(x,a)=v\}|}{|Ex|}\right)$
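The intrinsic value depends only on how the attribute splits the examples, not on their classes. A minimal Python sketch, assuming examples are dicts keyed by attribute name (the function name and data are hypothetical):

```python
from collections import Counter
from math import log2


def intrinsic_value(examples, attr):
    """IV(Ex, a) = -sum_v |S_v|/|Ex| * log2(|S_v|/|Ex|).

    This is the entropy of how the examples are split by the attribute's
    values; the class label plays no role. Assumption: examples are dicts.
    """
    n = len(examples)
    counts = Counter(x[attr] for x in examples)
    # (c/n) * log2(n/c) == -(c/n) * log2(c/n); values absent from Ex add 0.
    return sum((c / n) * log2(n / c) for c in counts.values())


# Hypothetical attributes: two evenly used values vs. four distinct values.
two_values = [{"a": v} for v in ("x", "x", "y", "y")]
four_values = [{"a": v} for v in ("w", "x", "y", "z")]
print(intrinsic_value(two_values, "a"))   # 1.0
print(intrinsic_value(four_values, "a"))  # 2.0
```

An attribute that takes a different value for each of $n$ examples has $IV = \log_2 n$, so the intrinsic value grows with the number of distinct values.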

Information Gain Ratio Calculation

$IGR(Ex,a)=IG(Ex,a)/IV(Ex,a)$
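Putting the pieces together, a self-contained Python sketch of the gain ratio. It assumes dict-shaped examples and base-2 entropy, and all names are hypothetical. How to handle $IV = 0$ (an attribute with a single value in $Ex$, where the ratio is undefined) is not specified here, so returning 0.0 in that case is an assumption of this sketch.

```python
from collections import Counter
from math import log2


def entropy(labels):
    # Shannon entropy in bits, as sum of p * log2(1/p).
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def information_gain(examples, attr, target):
    # IG(Ex, a): H(Ex) minus the weighted entropy of each subset S_v.
    n = len(examples)
    remainder = 0.0
    for v in {x[attr] for x in examples}:
        subset = [x[target] for x in examples if x[attr] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy([x[target] for x in examples]) - remainder


def intrinsic_value(examples, attr):
    # IV(Ex, a) is the entropy of the attribute's own value distribution.
    return entropy([x[attr] for x in examples])


def gain_ratio(examples, attr, target):
    """IGR(Ex, a) = IG(Ex, a) / IV(Ex, a).

    Assumption: IV is 0 when the attribute has a single value in Ex, making
    the ratio undefined; this sketch returns 0.0 in that case.
    """
    iv = intrinsic_value(examples, attr)
    if iv == 0:
        return 0.0
    return information_gain(examples, attr, target) / iv


# Hypothetical toy data (not from the article).
data = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "yes"},
    {"outlook": "rain", "play": "yes"},
]
print(round(gain_ratio(data, "outlook", "play"), 3))  # 0.384
```

Computing $IV$ as the entropy of the attribute values lets it reuse `entropy()`. The zero-$IV$ convention is one possible choice and is not necessarily what a given decision-tree implementation does.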

Advantages

The information gain ratio biases the decision tree against attributes with a large number of distinct values. This addresses a drawback of plain information gain: when applied to attributes that can take on many distinct values, information gain may fit the training set too closely.

For example, suppose that we are building a decision tree for data describing a business's customers. Information gain is often used to decide which attributes are the most relevant, so that they can be tested near the root of the tree. One of the input attributes might be the customer's credit card number. This attribute has a high information gain because it uniquely identifies each customer, but we do not want to include it in the decision tree: deciding how to treat a customer based on their credit card number is unlikely to generalize to customers we haven't seen before. Because such an attribute also has a large intrinsic value, dividing by that value lowers its gain ratio.
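To see the effect numerically, assume a hypothetical training set of 8 customers, 4 in each class (so $H(Ex) = 1$ bit). Compare the card number, which is unique per customer, with a hypothetical attribute $segment$ that has two values, each shared by 4 customers of the same class:

```latex
IG(Ex,card) = 1 - 8\cdot\tfrac{1}{8}\cdot 0 = 1, \quad
IV(Ex,card) = -8\cdot\tfrac{1}{8}\log_2\tfrac{1}{8} = 3, \quad
IGR(Ex,card) = \tfrac{1}{3}

IG(Ex,segment) = 1 - 2\cdot\tfrac{4}{8}\cdot 0 = 1, \quad
IV(Ex,segment) = -2\cdot\tfrac{4}{8}\log_2\tfrac{4}{8} = 1, \quad
IGR(Ex,segment) = 1
```

Both attributes reach the maximal information gain on this training set, so information gain alone cannot tell them apart. The gain ratio prefers $segment$ because the card number's split into 8 singleton subsets carries a larger intrinsic value.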

Retrieved from "https://en.wikipedia.org/w/index.php?title=Information_gain_ratio&oldid=252319156"