CMPF104: Data Cleaning and Preprocessing: Data Science and Data Analytics: Programming For Foundation In Engineering, Assignment, UNITEN, Malaysia
| University | Universiti Tenaga Nasional (UNITEN) |
|---|---|
| Subject | CMPF104: Programming For Foundation In Engineering |
Data Science and Data Analytics
Download the dataset from BRIGHTEN. If your student ID ends with an odd number, select the Concrete_Data_A dataset; if it ends with an even number, select the Concrete_Data_B dataset. Use Python attributes, functions and libraries to solve the following problems.
a) Data Cleaning and Preprocessing:
- Use Pandas to load the dataset. Name the DataFrame concrete_df_XXX.
- Remove the ‘Number’ column using the .drop() function and display the first ten (10) rows of the data.
- Handle any missing values by dropping or replacing the empty cells. Check for missing values using functions such as .info() or .isnull().sum().
- Convert the DataFrame to an array using the .to_numpy() function.
- Divide the data into two sets: 80% for training and 20% for testing. Name them train_data_XXX and test_data_XXX, respectively.
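The following is a minimal sketch of part (a). It assumes the dataset is a CSV file containing a 'Number' column and numeric feature columns. The BRIGHTEN file is not reproduced here, so the sketch reads a small inline CSV with hypothetical column names (Cement, Water, Age, Strength) and toy values. In practice you would pass the downloaded file's path to pd.read_csv instead. Missing cells are filled with the column mean, which is one of the two options the brief allows. Rows are shuffled with a fixed seed before the 80/20 split.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the BRIGHTEN file: column names and values are toy
# data for illustration. With the real file you would call something like
# pd.read_csv("Concrete_Data_A.csv") (file name and format assumed).
CSV_TEXT = """Number,Cement,Water,Age,Strength
1,540.0,162.0,28,79.99
2,540.0,162.0,28,61.89
3,332.5,228.0,270,40.27
4,332.5,,365,41.05
5,198.6,192.0,360,44.30
6,266.0,228.0,90,47.03
7,380.0,228.0,365,43.70
8,380.0,228.0,28,36.45
9,,228.0,28,45.85
10,475.0,228.0,28,39.29
"""

# Load the data; replace XXX with your own identifier as the brief requires.
concrete_df_XXX = pd.read_csv(io.StringIO(CSV_TEXT))

# Remove the 'Number' column, then display the first ten rows.
concrete_df_XXX = concrete_df_XXX.drop(columns=["Number"])
print(concrete_df_XXX.head(10))

# Inspect missing values: .info() shows non-null counts, .isnull().sum() per column.
concrete_df_XXX.info()
print(concrete_df_XXX.isnull().sum())

# Replace each empty cell with the mean of its column.
concrete_df_XXX = concrete_df_XXX.fillna(concrete_df_XXX.mean())

# Convert the DataFrame to a NumPy array.
data = concrete_df_XXX.to_numpy()

# Shuffle the rows reproducibly, then split 80% train / 20% test.
rng = np.random.default_rng(seed=42)
shuffled = data[rng.permutation(len(data))]
split = int(0.8 * len(shuffled))
train_data_XXX = shuffled[:split]
test_data_XXX = shuffled[split:]
print(train_data_XXX.shape, test_data_XXX.shape)
```

Shuffling before the split prevents the test set from containing only the last rows in file order. If a plain sequential split is expected, use data[:split] and data[split:] instead. The train_test_split function in scikit-learn is another common option.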
b) Data Analysis:
- Calculate the correlation between the variables in the DataFrame.
- Use NumPy and Pandas to calculate summary statistics of the data, such as the maximum, minimum, standard deviation, average, median and mode of each category.
- Use Pandas functions such as .describe() for an overview of the summary statistics, and apply NumPy functions for specific calculations.
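One way to cover part (b) is sketched below. It runs on a small hypothetical DataFrame that stands in for the cleaned concrete_df_XXX, and its column names and values are illustrative only. The .corr() method gives the Pearson correlation matrix and .describe() gives the overview. NumPy computes each statistic column by column with axis=0. NumPy has no built-in mode function, so the mode comes from the Pandas .mode() method.

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned data standing in for concrete_df_XXX; the column
# names and values are illustrative only.
concrete_df_XXX = pd.DataFrame({
    "Cement": [540.0, 540.0, 332.5, 332.5, 198.6, 266.0, 380.0, 380.0],
    "Water": [162.0, 162.0, 228.0, 228.0, 192.0, 228.0, 228.0, 228.0],
    "Age": [28, 28, 270, 365, 360, 90, 365, 28],
    "Strength": [79.99, 61.89, 40.27, 41.05, 44.30, 47.03, 43.70, 36.45],
})

# Pairwise (Pearson) correlation between all numeric columns.
corr = concrete_df_XXX.corr()
print(corr.round(2))

# Pandas overview: count, mean, std, min, quartiles (50% = median), max.
print(concrete_df_XXX.describe())

# Specific statistics per column with NumPy; axis=0 works down each column.
values = concrete_df_XXX.to_numpy(dtype=float)
summary = pd.DataFrame({
    "max": np.max(values, axis=0),
    "min": np.min(values, axis=0),
    "std": np.std(values, axis=0, ddof=1),  # sample std, as in .describe()
    "mean": np.mean(values, axis=0),
    "median": np.median(values, axis=0),
    # .mode() lists tied modes in ascending order; row 0 is the smallest.
    "mode": concrete_df_XXX.mode().iloc[0].to_numpy(),
}, index=concrete_df_XXX.columns)
print(summary)
```

By default, np.std returns the population standard deviation (ddof=0), whereas .describe() reports the sample standard deviation (ddof=1). Passing ddof=1 makes the two agree.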
c) Visualization:
- Use Matplotlib to create visualizations such as line plots of the train and test data across all categories.
- Generate histograms and box plots for all variables.
- Ensure that the visualizations are clear, informative and aesthetically pleasing.
- Customize your plots by adding titles, labels and legends.
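Below is a sketch of part (c). It assumes train_data_XXX and test_data_XXX are 2-D NumPy arrays with one column per variable. Random arrays and hypothetical column names stand in for the real split. Each line-plot panel shows one variable, with the test samples drawn after the training samples on a shared index axis. The histograms and box plots use the training data.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for train_data_XXX and test_data_XXX: column names
# and random values are illustrative only.
columns = ["Cement", "Water", "Age", "Strength"]
rng = np.random.default_rng(seed=0)
train_data_XXX = rng.uniform(10, 500, size=(40, len(columns)))
test_data_XXX = rng.uniform(10, 500, size=(10, len(columns)))

# Line plots: one panel per variable; test samples follow the training samples.
train_x = np.arange(len(train_data_XXX))
test_x = np.arange(len(train_data_XXX), len(train_data_XXX) + len(test_data_XXX))
fig_line, axes = plt.subplots(len(columns), 1, figsize=(8, 10), sharex=True)
for i, (ax, name) in enumerate(zip(axes, columns)):
    ax.plot(train_x, train_data_XXX[:, i], color="tab:blue", label="Train")
    ax.plot(test_x, test_data_XXX[:, i], color="tab:orange", label="Test")
    ax.set_title(f"{name}: train and test data")
    ax.set_ylabel(name)
    ax.legend(loc="upper right")
axes[-1].set_xlabel("Sample index")
fig_line.tight_layout()

# Histograms: distribution of each variable in the training data.
fig_hist, axes = plt.subplots(1, len(columns), figsize=(14, 3.5))
for i, (ax, name) in enumerate(zip(axes, columns)):
    ax.hist(train_data_XXX[:, i], bins=10, color="tab:green",
            edgecolor="black", label=name)
    ax.set_title(f"Histogram of {name}")
    ax.set_xlabel(name)
    ax.set_ylabel("Frequency")
    ax.legend()
fig_hist.tight_layout()

# Box plots: one box per variable (drawn at x = 1..n) on a shared value axis.
fig_box, ax = plt.subplots(figsize=(8, 4))
ax.boxplot(train_data_XXX)
ax.set_xticks(range(1, len(columns) + 1))
ax.set_xticklabels(columns)
ax.set_title("Box plots of all variables (training data)")
ax.set_xlabel("Variable")
ax.set_ylabel("Value")
fig_box.tight_layout()

# In a notebook or interactive session, call plt.show() to display the figures.
```

Concrete variables can differ greatly in scale, such as age in days versus cement content, so a single shared box-plot axis may compress some boxes. Drawing one box-plot subplot per variable is an alternative.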