CMPF104: Data Cleaning and Preprocessing: Data Science and Data Analytics: Programming For Foundation In Engineering, Assignment, UNITEN, Malaysia
| University | Universiti Tenaga Nasional (UNITEN) |
| Subject | CMPF104: Programming For Foundation In Engineering |
Data Science and Data Analytics
Download the dataset from BRIGHTEN. If your student ID ends with an odd number, select the Concrete_Data_A dataset; if it ends with an even number, select the Concrete_Data_B dataset. Use Python attributes, functions, and libraries to solve the following problems.
a) Data Cleaning and Preprocessing:
- Use Pandas to load the dataset. Name the dataframe as concrete_df_XXX.
- Remove the ‘Number’ column using the .drop() function and visualize the first ten (10) rows of the data.
- Handle any missing values by dropping or replacing the empty cells. Check for missing values using functions like .info() or .isnull().sum().
- Convert the dataframe to an array using the to_numpy() function.
- Divide the data into two sets, 80% for train data and 20% for test data. Name the datasets train_data_XXX and test_data_XXX.
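The steps in part a) can be sketched roughly as follows. This is a minimal illustration, not a model answer: the file name and format in the commented `read_csv` line, the column names other than ‘Number’, the synthetic stand-in data, the choice to fill missing values with the column mean, and the decision to shuffle before splitting are all assumptions that the brief does not specify.

```python
import numpy as np
import pandas as pd

# Stand-in data so the sketch runs without the BRIGHTEN download.
# With the real file you would load it instead, for example:
# concrete_df_XXX = pd.read_csv("Concrete_Data_A.csv")  # file name and format assumed
rng = np.random.default_rng(0)
n_rows = 50
concrete_df_XXX = pd.DataFrame({
    "Number": np.arange(1, n_rows + 1),
    "Cement": rng.uniform(100, 500, n_rows),    # hypothetical column
    "Water": rng.uniform(120, 250, n_rows),     # hypothetical column
    "Strength": rng.uniform(10, 80, n_rows),    # hypothetical column
})
concrete_df_XXX.loc[[3, 17], "Water"] = np.nan  # inject missing cells for the demo

# Remove the 'Number' column and show the first ten rows.
concrete_df_XXX = concrete_df_XXX.drop(columns=["Number"])
print(concrete_df_XXX.head(10))

# Count missing values per column, then replace them with the column mean.
# (Dropping the rows with .dropna() is the other option the brief allows.)
print(concrete_df_XXX.isnull().sum())
concrete_df_XXX = concrete_df_XXX.fillna(concrete_df_XXX.mean(numeric_only=True))

# Convert the cleaned dataframe to a NumPy array.
data_array = concrete_df_XXX.to_numpy()

# 80/20 split: shuffle the row indices, then cut at the 80% mark.
shuffled = rng.permutation(len(data_array))
split_at = int(0.8 * len(data_array))
train_data_XXX = data_array[shuffled[:split_at]]
test_data_XXX = data_array[shuffled[split_at:]]
print(train_data_XXX.shape, test_data_XXX.shape)
```

If the row order of the dataset matters, slicing `data_array[:split_at]` and `data_array[split_at:]` without shuffling gives the same 80/20 proportions.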
b) Data Analysis:
- Calculate the correlation between the variables in the dataframe.
- Utilize NumPy and Pandas to calculate summary statistics of the data such as maximum, minimum, standard deviation, average, median and mode of each category.
- Use Pandas functions like .describe() for an overview of summary statistics and apply NumPy functions for specific calculations.
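One possible way to approach part b) is sketched below. The column names and the synthetic data are placeholders (assumptions), standing in for the cleaned dataframe from part a). NumPy itself has no mode function, so the mode is taken from Pandas here; where a column has several tied modes, only the first is kept.

```python
import numpy as np
import pandas as pd

# Placeholder data standing in for the cleaned concrete dataframe (assumed columns).
rng = np.random.default_rng(1)
concrete_df_XXX = pd.DataFrame({
    "Cement": rng.uniform(100, 500, 50).round(0),
    "Water": rng.uniform(120, 250, 50).round(0),
    "Strength": rng.uniform(10, 80, 50).round(0),
})

# Pairwise correlation between all variables (Pearson is the Pandas default).
corr_matrix = concrete_df_XXX.corr()
print(corr_matrix)

# Quick overview: count, mean, std, min, quartiles, and max per column.
print(concrete_df_XXX.describe())

# Specific statistics computed with NumPy, one value per column (axis=0).
data_array = concrete_df_XXX.to_numpy()
summary = pd.DataFrame({
    "max": np.max(data_array, axis=0),
    "min": np.min(data_array, axis=0),
    "std": np.std(data_array, axis=0, ddof=1),  # ddof=1: sample std, as Pandas uses
    "mean": np.mean(data_array, axis=0),
    "median": np.median(data_array, axis=0),
}, index=concrete_df_XXX.columns)

# Mode from Pandas; .mode() returns every tied mode, so take the first row.
summary["mode"] = concrete_df_XXX.mode().iloc[0]
print(summary)
```

Note that `np.std` defaults to the population standard deviation (`ddof=0`), while `.describe()` reports the sample standard deviation, so the two only agree when `ddof=1` is passed.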
c) Visualization:
- Use Matplotlib to create visualizations such as line plots for train and test data across all categories.
- Generate histogram plots and box plots for all variables.
- Ensure that the visualizations are clear, informative, and aesthetically pleasing.
- Customize your plots by adding titles, labels and legends.
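The plots in part c) could be produced along these lines. This is a sketch under assumptions: the column names, the synthetic array, and the output file names are illustrative, and the train and test arrays stand in for the ones created in part a). The figures are saved to PNG files with a non-interactive backend so the script runs anywhere.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data standing in for the arrays from part a) (assumed columns).
rng = np.random.default_rng(2)
columns = ["Cement", "Water", "Strength"]
data_array = rng.uniform([100, 120, 10], [500, 250, 80], size=(50, 3))
train_data_XXX, test_data_XXX = data_array[:40], data_array[40:]

# Line plots: one subplot per variable, train and test drawn together.
fig_line, line_axes = plt.subplots(len(columns), 1, figsize=(8, 9))
for idx, (ax, name) in enumerate(zip(line_axes, columns)):
    ax.plot(train_data_XXX[:, idx], label="Train")
    ax.plot(test_data_XXX[:, idx], label="Test")
    ax.set_title(f"{name}: train vs. test")
    ax.set_xlabel("Sample index")
    ax.set_ylabel(name)
    ax.legend()
fig_line.tight_layout()
fig_line.savefig("line_plots.png")

# Histograms: distribution of each variable over the full dataset.
fig_hist, hist_axes = plt.subplots(1, len(columns), figsize=(12, 4))
for idx, (ax, name) in enumerate(zip(hist_axes, columns)):
    ax.hist(data_array[:, idx], bins=10, edgecolor="black")
    ax.set_title(f"Histogram of {name}")
    ax.set_xlabel(name)
    ax.set_ylabel("Frequency")
fig_hist.tight_layout()
fig_hist.savefig("histograms.png")

# Box plots: one box per variable (each column of the 2-D array).
fig_box, box_ax = plt.subplots(figsize=(8, 5))
box_ax.boxplot(data_array)
box_ax.set_xticklabels(columns)
box_ax.set_title("Box plots of all variables")
box_ax.set_ylabel("Value")
fig_box.tight_layout()
fig_box.savefig("box_plots.png")
```

When variables have very different scales, a separate box plot per variable (as done for the histograms) may read more clearly than one shared axis.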