CMPF104: Data Cleaning and Preprocessing: Data Science and Data Analytics: Programming For Foundation In Engineering, Assignment, UNITEN, Malaysia
| University | Universiti Tenaga Nasional (UNITEN) |
| Subject | CMPF104: Programming For Foundation In Engineering |
Data Science and Data Analytics
Download the dataset from BRIGHTEN. If your student ID ends with an odd number, select the Concrete_Data_A dataset; if it ends with an even number, select the Concrete_Data_B dataset. Use Python attributes, functions and libraries to solve the following problems.
a) Data Cleaning and Preprocessing:
- Use Pandas to load the dataset. Name the dataframe as concrete_df_XXX.
- Remove ‘Number’ column using .drop() function and visualize the first ten (10) rows of the data.
- Handle any missing values by dropping or replacing the empty cells. Check for missing values using functions like .info() or .isnull().sum()
- Convert the dataframe to an array using the .to_numpy() function.
- Split the data into train and test sets of 80% and 20%, respectively. Name the sets train_data_XXX and test_data_XXX.
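The steps in part (a) can be sketched as follows. This is only an illustration under stated assumptions: the BRIGHTEN file is not reproduced here, so a small synthetic dataframe stands in for it, and the column names other than ‘Number’ are hypothetical. Filling missing values with the column median is one of the two options the brief allows, and the shuffle-then-slice split is one simple way to obtain the 80/20 division.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): the real dataset would be loaded from the
# BRIGHTEN file, e.g. with pd.read_csv(...) or pd.read_excel(...).
# Column names other than 'Number' are hypothetical.
rng = np.random.default_rng(0)
n = 50
concrete_df_XXX = pd.DataFrame({
    "Number": np.arange(1, n + 1),
    "Cement": rng.uniform(100, 500, n),
    "Water": rng.uniform(120, 250, n),
    "Strength": rng.uniform(10, 80, n),
})
concrete_df_XXX.loc[[3, 17], "Water"] = np.nan  # simulate empty cells

# Remove the 'Number' column and look at the first ten rows.
concrete_df_XXX = concrete_df_XXX.drop(columns="Number")
print(concrete_df_XXX.head(10))

# Count missing values per column, then replace them with the column median.
# (concrete_df_XXX.dropna() would drop the affected rows instead.)
print(concrete_df_XXX.isnull().sum())
concrete_df_XXX = concrete_df_XXX.fillna(concrete_df_XXX.median())

# Convert the dataframe to a NumPy array.
data = concrete_df_XXX.to_numpy()

# Shuffle the rows, then take the first 80% as train and the rest as test.
shuffled = rng.permutation(data)
split = int(0.8 * len(shuffled))
train_data_XXX = shuffled[:split]
test_data_XXX = shuffled[split:]
print(train_data_XXX.shape, test_data_XXX.shape)
```

If scikit-learn is available, its train_test_split function with test_size=0.2 would be an alternative to the manual shuffle and slice.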
b) Data Analysis:
- Calculate the correlation between the variables in the dataframe.
- Utilize NumPy and Pandas to calculate summary statistics of the data such as maximum, minimum, standard deviation, average, median and mode of each category.
- Use Pandas functions like .describe() for an overview of summary statistics and apply NumPy functions for specific calculations.
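A minimal sketch of part (b), assuming a cleaned dataframe with numeric columns only. The values and column names below are made up for illustration and are not taken from the assignment dataset. NumPy has no built-in mode function, so the mode is taken from Pandas here; when a column has several modes, only the smallest is kept.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the cleaned concrete_df_XXX dataframe.
concrete_df_XXX = pd.DataFrame({
    "Cement": [540.0, 332.5, 198.6, 266.0, 380.0, 266.0],
    "Water": [162.0, 228.0, 192.0, 228.0, 228.0, 192.0],
    "Strength": [79.99, 40.27, 44.30, 45.85, 43.70, 47.03],
})

# Pairwise correlation between variables (Pearson by default).
corr = concrete_df_XXX.corr()
print(corr)

# Overview of summary statistics from Pandas.
print(concrete_df_XXX.describe())

# Specific statistics with NumPy, computed column by column (axis=0).
data = concrete_df_XXX.to_numpy()
summary = pd.DataFrame({
    "max": np.max(data, axis=0),
    "min": np.min(data, axis=0),
    "std": np.std(data, axis=0, ddof=1),  # sample std, matches .describe()
    "mean": np.mean(data, axis=0),
    "median": np.median(data, axis=0),
    # First row of DataFrame.mode() holds the smallest mode of each column.
    "mode": concrete_df_XXX.mode().iloc[0].to_numpy(),
}, index=concrete_df_XXX.columns)
print(summary)
```

Note that np.std defaults to the population standard deviation (ddof=0); ddof=1 is passed so the result agrees with the value reported by .describe().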
c) Visualization:
- Use Matplotlib to create visualizations such as line plots for train and test data across all categories.
- Generate histogram plots and box plots for all variables.
- Ensure that the visualizations are clear, informative, and aesthetically pleasing.
- Customize your plots by adding titles, labels and legends.
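One possible layout for the plots in part (c) is sketched below. The data is randomly generated and the column names are hypothetical; in the assignment, the arrays would be the train_data_XXX and test_data_XXX sets from part (a). The Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open plot windows
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data (assumption): 50 rows, 3 hypothetical variables, split 80/20.
rng = np.random.default_rng(1)
columns = ["Cement", "Water", "Strength"]
data = rng.uniform(10, 500, size=(50, len(columns)))
train_data_XXX, test_data_XXX = data[:40], data[40:]

# Line plots: one subplot per variable, train and test on a shared sample axis.
fig_line, axes = plt.subplots(len(columns), 1, figsize=(8, 8), sharex=True)
test_index = np.arange(len(test_data_XXX)) + len(train_data_XXX)
for i, (ax, name) in enumerate(zip(axes, columns)):
    ax.plot(train_data_XXX[:, i], label="Train")
    ax.plot(test_index, test_data_XXX[:, i], label="Test")
    ax.set_ylabel(name)
    ax.legend()
axes[0].set_title("Train and test data by variable")
axes[-1].set_xlabel("Sample index")
fig_line.tight_layout()

# Histograms: one subplot per variable.
fig_hist, axes_h = plt.subplots(1, len(columns), figsize=(12, 3))
for i, (ax, name) in enumerate(zip(axes_h, columns)):
    ax.hist(data[:, i], bins=10, edgecolor="black")
    ax.set_title(f"Histogram of {name}")
    ax.set_xlabel(name)
    ax.set_ylabel("Frequency")
fig_hist.tight_layout()

# Box plots: one box per column of the 2-D array.
fig_box, ax_box = plt.subplots(figsize=(6, 4))
ax_box.boxplot(data)
ax_box.set_xticks(range(1, len(columns) + 1))
ax_box.set_xticklabels(columns)
ax_box.set_title("Box plots of all variables")
ax_box.set_ylabel("Value")
fig_box.tight_layout()

# Call fig_line.savefig("lines.png") etc. to write the figures to disk,
# or plt.show() when using an interactive backend.
```

Variables with very different scales may be easier to read with one box plot per subplot rather than a shared axis.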