Python Problems
Problem 1:
• Load the Boston dataset by using Pandas read_csv().
• Remove column zero (the tag for this column is ‘Unnamed: 0’).
• Remove the column tagged ‘dis’ and join the two parts of the dataframe (to the left and right of
the column ‘dis’) back together in a new dataframe called df2.
• Calculate the mean of the column ‘age’ and add it to df2 as a new column, with the mean value
repeated for all rows.
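One possible approach to Problem 1 is sketched below. It is a minimal sketch rather than a reference solution: the inline CSV text is a small made-up stand-in for the Boston file (the real file path is not given here), and the names SAMPLE_CSV, left, right, and age_mean are assumptions chosen for illustration.

```python
import io

import pandas as pd

# Tiny made-up stand-in for the Boston CSV (values are illustrative only).
# For the real exercise, pass the path of your own file to pd.read_csv()
# instead of the StringIO object.
SAMPLE_CSV = """,crim,age,dis,rad,medv
1,0.1,60.0,4.0,1,24.0
2,0.2,80.0,5.0,2,21.6
3,0.3,70.0,6.0,3,34.7
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Column zero has an empty header, so read_csv labels it 'Unnamed: 0'.
df = df.drop(columns="Unnamed: 0")

# Split the frame to the left and right of 'dis', then join the two parts.
pos = df.columns.get_loc("dis")
left = df.iloc[:, :pos]
right = df.iloc[:, pos + 1:]
df2 = pd.concat([left, right], axis=1)

# Assigning a scalar broadcasts it, so the mean is repeated for all rows.
df2["age_mean"] = df2["age"].mean()

print(df2)
```

Calling df.drop(columns="dis") would remove the column in one step; the split-and-concat form is used here only because the problem asks for the two parts to be joined back together.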
Problem 2:
• Generate a vector of 1000 random numbers between 0 and 100.
• Plot a histogram of these numbers with number of bins equal to 10.
• Calculate the average of these numbers by using the NumPy mean() function.
• Draw a red line in the y direction at the mean value on the histogram plot to show the
location of the mean.
• Make two matrices as follows and perform matrix multiplication:
    [ 3  6 ]
    [ 4  9 ]   *   [  4  12  21 ]
    [ 1  5 ]       [ 23  15  −4 ]
• Take the transpose of the first matrix and multiply it by itself. What is the relationship between
the resulting matrix and the original matrix?
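The steps of Problem 2 could be sketched as follows. Several choices here are assumptions, not part of the problem statement: the numbers are drawn from a uniform distribution, the seed is fixed only for repeatability, the matrices are named A and B, and "multiply it by itself" is read as the transpose of the first matrix times the first matrix (the transpose times the transpose is not defined for a 3-by-2 matrix).

```python
import matplotlib.pyplot as plt
import numpy as np

# Assumed: uniform distribution on [0, 100) and a fixed seed.
rng = np.random.default_rng(seed=0)
values = rng.uniform(0, 100, size=1000)

# Histogram with 10 bins.
fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(values, bins=10)

# Mean of the numbers, marked with a vertical red line.
mean_value = np.mean(values)
ax.axvline(mean_value, color="red")
ax.set_xlabel("value")
ax.set_ylabel("count")

# The two matrices: A is 3x2, B is 2x3, so A @ B is 3x3.
A = np.array([[3, 6],
              [4, 9],
              [1, 5]])
B = np.array([[4, 12, 21],
              [23, 15, -4]])
product = A @ B
print(product)

# Transpose of A times A gives a 2x2 matrix.
gram = A.T @ A
print(gram)
print("symmetric:", np.array_equal(gram, gram.T))
```

In an interactive session, plt.show() displays the figure. The last print is one way to start exploring the final question: the 2-by-2 result equals its own transpose.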
Problem 3:
Generate an array of 10000 normally distributed samples. The mean of the
distribution is 10 and the standard deviation is 3.
1. Plot the samples vs. their index.
2. Draw a green line at the mean value with a thickness of 2.
3. Draw red dashed lines at mean ± 2 × standard deviation with a thickness of 1.
4. Draw the histogram of this distribution and draw solid magenta lines at the mean and at
mean ± 2 × standard deviation.
5. Using the data you have generated, calculate the percentage of the samples that fall between
the two lines at mean ± 2 × standard deviation and print it as an output.
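A minimal sketch of the five steps is given below. It assumes the lines are placed using the stated distribution parameters (10 and 3) and not the sample estimates; the seed, the 50 histogram bins, the side-by-side layout, and all variable names are likewise illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

MEAN, STD, N = 10, 3, 10_000

# Assumed: fixed seed, only so the run is repeatable.
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=MEAN, scale=STD, size=N)

lower = MEAN - 2 * STD
upper = MEAN + 2 * STD

fig, (ax_idx, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# 1. Samples vs. their index.
ax_idx.plot(np.arange(N), samples, ".", markersize=1)
ax_idx.set_xlabel("index")
ax_idx.set_ylabel("sample value")

# 2. Green line at the mean, thickness 2.
ax_idx.axhline(MEAN, color="green", linewidth=2)

# 3. Red dashed lines at mean +/- 2 standard deviations, thickness 1.
for y in (lower, upper):
    ax_idx.axhline(y, color="red", linestyle="--", linewidth=1)

# 4. Histogram with solid magenta lines at the same three positions.
ax_hist.hist(samples, bins=50)
for x in (lower, MEAN, upper):
    ax_hist.axvline(x, color="magenta", linestyle="-")
ax_hist.set_xlabel("sample value")
ax_hist.set_ylabel("count")

# 5. Percentage of samples between the two outer lines.
inside = (samples >= lower) & (samples <= upper)
percentage = 100 * inside.mean()
print(f"{percentage:.2f}% of samples lie within mean +/- 2 standard deviations")
```

For a normal distribution the printed value should land close to the theoretical figure of about 95.45%, with small run-to-run variation if the seed is changed. In an interactive session, plt.show() displays the figure.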