

What is Skewed Data?
Skewness is a measure of the asymmetry of a probability distribution that would ideally be symmetric, and it is given by the third standardized moment. In simple words, skewness measures how much a random variable's probability distribution departs from the normal distribution.
When the two sides of a distribution are not equal mirror images of each other, the data is known as Skewed Data; the distribution is not symmetrical. To quickly see whether data is skewed, we can use a histogram.
[Figure: A Skewed Histogram]
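Following the definition above, the skewness of a sample can be estimated as its third standardized moment. Below is a minimal Python sketch under the assumption that NumPy and SciPy are available; the small data set is purely illustrative:

```python
# Sample skewness as the third standardized moment (illustrative data).
import numpy as np
from scipy import stats

data = np.array([2, 3, 3, 4, 4, 4, 5, 5, 6, 18])  # one large value stretches the right tail

mean = data.mean()
std = data.std()                                   # population standard deviation (ddof=0)
skew_manual = np.mean((data - mean) ** 3) / std ** 3

print(skew_manual)         # positive -> right-skewed
print(stats.skew(data))    # SciPy's default (biased) sample skewness gives the same value
```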
Types of Skewness
The normal distribution is a probability distribution with no skewness. Apart from this, there are two types of skewness:
Positive Skewness
Negative Skewness
Positive Skewness
A positively skewed distribution (often referred to as Right-Skewed) is a distribution in which most values are concentrated on the left side of the graph while the right tail is longer. A positively skewed distribution is the mirror image of a negatively skewed distribution.
[Figure: A Positively Skewed Curve]
Unlike normally distributed data, where all measures of central tendency (mean, median, and mode) are equal, in positively skewed data these measures pull apart. The general relationship between the measures of central tendency in a positively skewed distribution can be expressed using the following inequality:
Mean > Median > Mode
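As a quick illustration of this ordering, here is a short sketch using only Python's standard library and a small hypothetical right-skewed data set:

```python
# A hypothetical right-skewed data set: the long right tail pulls the mean
# above the median, and the median above the mode.
import statistics

scores = [1, 1, 1, 2, 2, 3, 4, 5, 11]

print(statistics.mean(scores))    # 3.33... (pulled up by the large value 11)
print(statistics.median(scores))  # 2
print(statistics.mode(scores))    # 1
```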
Negative Skewness
A negatively skewed distribution (often referred to as Left-Skewed) is a distribution in which most values are concentrated on the right side of the graph while the left tail is longer.
[Figure: A Negatively Skewed Curve]
Again, unlike normally distributed data, where all measures of central tendency (mean, median, and mode) are equal, in negatively skewed data these measures pull apart. The general relationship between the measures of central tendency in a negatively skewed distribution can be expressed using the following inequality:
Mode > Median > Mean
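Mirroring the earlier sketch, a small hypothetical left-skewed data set shows the reversed ordering:

```python
# A hypothetical left-skewed data set: the long left tail pulls the mean
# below the median and the mode.
import statistics

marks = [1, 7, 8, 9, 10, 10, 11, 11, 11]

print(statistics.mode(marks))    # 11
print(statistics.median(marks))  # 10
print(statistics.mean(marks))    # 8.66... (pulled down by the low value 1)
```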
How to Find Skewness of Data?
One measure of skewness is obtained by subtracting the mode from the mean and then dividing the difference by the Standard Deviation of the data. This is called Pearson's first coefficient of skewness. Dividing by the Standard Deviation makes the measure a dimensionless quantity. This also explains why data skewed to the right has positive skewness: if the data set is skewed to the right, the mean is greater than the mode, so subtracting the mode from the mean gives a positive number. A similar argument shows why data skewed to the left has negative skewness.
Pearson's second coefficient of skewness is also used to calculate the asymmetry of a data set. For this value, we subtract the median from the mean, multiply the difference by 3, and then divide it by the Standard Deviation.
Note: Pearson's first coefficient of skewness is useful when the data shows a strong, clearly defined mode. Pearson's second coefficient is preferable when the data has a weak mode or several modes, as it does not rely on the mode as a measure of central tendency.
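For concreteness, here is a minimal Python sketch of both coefficients; NumPy is assumed to be available and the data set is purely illustrative:

```python
# Pearson's first and second coefficients of skewness on an illustrative data set.
import numpy as np
from statistics import mode, median

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 18]

mean = np.mean(data)
sd = np.std(data)                         # population standard deviation

sk1 = (mean - mode(data)) / sd            # first coefficient: (Mean - Mode) / SD
sk2 = 3 * (mean - median(data)) / sd      # second coefficient: 3(Mean - Median) / SD

print(sk1, sk2)                           # both positive for this right-skewed data
```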
Uses of Skewed Data
Skewed data arises quite naturally in various contexts. Incomes are skewed to the right because the mean can be significantly influenced by even a few people making millions of dollars, while there are no negative incomes. Similarly, data on a product's lifetime, such as that of a light bulb brand, is skewed to the right: zero is the smallest a lifetime can be, and a few long-lasting light bulbs give the data a positive skew.
What is Skewness in Statistics?
In statistics, skewness is the degree of asymmetry found in a probability distribution. Distributions can exhibit right (positive) skewness or left (negative) skewness to varying degrees. A normal distribution (bell curve) exhibits zero skewness.
Conclusion
In a statistical distribution, data is considered skewed when the curve appears stretched or distorted either to the left or to the right. The graph of a normal distribution is symmetric, implying that there are just as many data values on the left side of the median as on the right side.
FAQs on Skewness
1. What is skewness in statistics?
In statistics, skewness is a measure of the asymmetry or lopsidedness of a probability distribution. A perfectly symmetrical distribution, like a normal distribution or 'bell curve', has a skewness of zero. If a distribution is not symmetrical, it is considered skewed, meaning its data points are not evenly distributed around the mean.
2. What are the different types of skewness and how do they affect the mean, median, and mode?
There are three main cases of skewness, defined by the relationship between the mean, median, and mode:
- Symmetrical Distribution (Zero Skewness): The data is evenly distributed. Here, the Mean = Median = Mode.
- Positive Skewness (Right-Skewed): The distribution has a long tail on the right side. This happens because a few high-value outliers pull the mean to the right. In this case, the relationship is Mean > Median > Mode.
- Negative Skewness (Left-Skewed): The distribution has a long tail on the left side, caused by a few low-value outliers. The relationship here is Mode > Median > Mean.
3. How is skewness measured or calculated?
Skewness can be calculated using several methods. One of the most common methods taught in the CBSE syllabus is Karl Pearson's Coefficient of Skewness. It has two main formulas:
- First Coefficient: This is used when the mode is clearly defined. The formula is Sk = (Mean - Mode) / Standard Deviation.
- Second Coefficient (Empirical Formula): This is more reliable when the mode is ill-defined or the data has multiple modes. The formula is Sk = 3(Mean - Median) / Standard Deviation.
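As a worked example with hypothetical values, suppose a data set has Mean = 30, Median = 26 and Standard Deviation = 10. The second coefficient then gives Sk = 3(30 - 26) / 10 = 1.2, indicating a positively skewed distribution.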
4. Why is understanding skewness important in real-world data analysis?
Understanding skewness is crucial for several reasons. It provides insights that measures of central tendency (like the mean) alone cannot. It helps in:
- Accurate Data Interpretation: It reveals the underlying shape of the data. For example, a positively skewed income distribution tells us that most people earn less than the average income.
- Risk Assessment: In finance, the skewness of investment returns helps in assessing risk. A negatively skewed return suggests a higher probability of large losses.
- Model Selection: Many statistical models and machine learning algorithms assume a normal distribution. If the data is highly skewed, it may need to be transformed before a model can be accurately applied.
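As a brief illustration of the last point, one common remedy for strong positive skew is a log transformation. The sketch below assumes NumPy and SciPy are available and uses a synthetic lognormal sample purely for demonstration:

```python
# Reducing positive skew with a log transformation (synthetic lognormal sample).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=10, sigma=1, size=10_000)   # heavily right-skewed

print(stats.skew(incomes))            # large positive skewness
print(stats.skew(np.log(incomes)))    # close to zero after the log transform
```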
5. What is the key difference between skewness and kurtosis?
While both skewness and kurtosis describe the shape of a distribution, they measure different characteristics. The key difference is:
- Skewness measures the degree of asymmetry of a distribution. It tells us whether the data is concentrated on one side and if the distribution leans to the left or right.
- Kurtosis measures the 'tailedness' or the sharpness of the peak of a distribution. It indicates how much data is in the tails versus the center, telling us if the distribution is flat or sharply peaked compared to a normal distribution.
6. How does a real-world example like scores in an easy exam demonstrate negative skewness?
Scores from a very easy exam are a classic example of a negatively skewed (left-skewed) distribution. Here's why:
- Most students find the exam easy and score very high marks, creating a large cluster of data points at the higher end (the right side of the graph).
- A few students who did not prepare well will score low marks. These low scores are outliers that create a long tail to the left.
- Because of these low-score outliers, the mean score is pulled down below the median score, which is a hallmark of negative skewness.
7. Is it possible for the coefficient of skewness to be greater than 1 or less than -1?
Yes, it is possible, though not very common in typical datasets. While many moderately skewed distributions have a Pearson's coefficient of skewness between -1 and +1, this is a practical guideline, not a strict mathematical rule. For this specific coefficient, values are generally expected to lie within a range of -3 to +3. A value greater than 1 (e.g., 1.5) or less than -1 (e.g., -1.5) simply indicates a very high degree of skewness in the data.

















