This week’s discussion topic allows you to create and clean your own dataset, similar to what you will be required to do for your Term Research Project. In Week 4, you will complete the first two sections of the Word document attached below (Introduction and Purpose, Definition of Variables). You will also insert your formatted descriptive statistics table into the third section. Once you have completed these sections, copy and paste them into the text box provided in the discussion post, and attach your formatted dataset along with your completed Word document. To begin, go to the World Bank DataBank and access the Millennium Development Goals database using this link:
https://databank.worldbank.org/source/millennium-development-goals
Choose at least 5 independent variables (predictors) and 1 dependent variable (outcome). For your dependent variable, choose GNI per capita, Atlas method (current US$). You may choose any other variables as predictors, but note that not all countries are surveyed for all variables. You will have missing data, as some countries may not have observations for the specific variables that you choose. You will have to delete any country that has too much missing data for any of your variables, so choose your variables carefully. Your initial dataset will need a sample size of 100 or greater for each variable. To increase the number of countries with acceptable observations, set the databank to generate an average value for the span of 2006-2015 rather than output for any one specific year. There are many variables in this databank series, so try to be unique and capture items of interest to you. If you choose exactly the same variables as someone else, I may require you to change at least one. If your sample size is large enough, I may select certain cases for you to analyze so that your results will be unique. I will approve final datasets in the order in which they are posted. You are not required to include any references in your descriptions or definitions; the World Bank dataset that you attach will be sufficient as your reference. Also keep in mind that you will not be graded on your hypotheses, so you do not have to spend hours trying to build a perfect prediction model. You will be graded on your ability to download, clean, and present data. The descriptions and hypothesized relationships are primarily there so that I can determine whether you know how to perform the analysis and read the results.
BUS 5253 Week 4 Discussion Walk-Through Fall 2024 W2
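The DataBank produces the 2006-2015 average for you, so nothing here replaces that step. As a rough illustration of what a 10-year average means for one country and one series, the Python sketch below averages whichever years have a reported value. Two assumptions are loud here: that missing years show up as ".." (or are simply absent) in the export, and that they are skipped rather than counted as zero. The helper name `ten_year_average` and the example values are hypothetical.

```python
from statistics import mean


# Hypothetical helper: average one country's yearly values for a single series.
# Assumption: missing years appear as ".." (or are absent) and are skipped.
def ten_year_average(yearly, start=2006, end=2015, missing=".."):
    """Return the mean of the available values for start..end inclusive, or None."""
    values = [
        float(yearly[year])
        for year in range(start, end + 1)
        if year in yearly and yearly[year] != missing
    ]
    return mean(values) if values else None


# Made-up example values, not real World Bank data.
example = {2006: "10", 2007: "..", 2008: "14", 2015: "12"}
print(ten_year_average(example))  # 12.0
```

A country with no reported value in any year of the window gets `None`, which is exactly the kind of gap that listwise deletion removes later.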
This week’s discussion topic allows you to begin the first of 3 steps in completing your Term Research Project. This assignment builds on the one you began in Week 4, and you may use the same variables and basic hypotheses you created last week unless I specifically tell you to make a change. As such, it may be in your best interest to wait until you receive your grade for the Week 4 discussion before you begin this one.
Note that I generally will not make you change anything unless your original assignment was incorrect, was exactly like another student’s, or I predict that your dataset will not be sufficient to meet the instructions given for the Term Project. To meet the final criteria, your dataset must contain at least 75 complete observations with no missing data or outliers, all observations should be from individual countries, the observations should represent the average value from 2006-2015, and the final dataset must not contain any correlation values above .7 (or below -.7).
In Week 5, you will complete the first two sections of the Word document attached below (Introduction and Purpose, Definition of Variables). Much of this follows the same instructions as in Week 4, and you are not required to write a literature review or go into great detail. You are not being graded on how well your hypothesized predictions are researched. I only need you to be clear and direct so that I can grade how well you test your hypotheses in the final term project. Then insert a graphic to further describe your hypothesized relationships and regression design.
The new material for this week begins in Section 3, where you will describe your sample and explain your final dataset. Using the same parameters as last week, download a dataset from the World Bank Millennium Development Goals database containing at least 5 continuous independent variables and 1 dependent variable (GNI per capita, Atlas method, unless otherwise instructed). Set the parameters to ensure that no country is included more than once and that each observation represents a 10-year average for the span of 2006-2015.
https://databank.worldbank.org/source/millennium-development-goals
Use the Data Analysis Toolpak in Microsoft Excel to create a table for your descriptive statistics. Be sure to format the table so that it is neat, easy to read, and easily inserted into a Word document. I recommend creating shorter variable names to make your charts, tables, and graphs easier to manage. I showed you how to do this in the Week 4 discussion video, and I will revisit those instructions this week. Create a separate tab for your original data with shortened names, as explained in the Week 4 video. Following the instructions in the Week 4 and Week 5 videos, create a descriptive statistics table using the Data Analysis Toolpak for your first entry in Section 3. Be sure that every variable has at least 100 observations.
Next, identify all missing values and use listwise deletion to treat missing data (purge any observation with missing values for any variable using the Filter function in Excel). Copy your filtered data into a new tab labeled “No Missing” using the Paste Values command. Create descriptive statistics for the second entry in Section 3, following the directions in the Week 4 and Week 5 videos to ensure that your data are formatted correctly. Be sure to document how many countries were removed (if your sample size is below 75, you must replace variables with too many missing values and start over).
Next, you will treat outliers. For simplicity, your treatment for extreme outliers will be to remove every observation that lies more than 3 standard deviations from the mean on any variable, and you are only required to use one iteration. Use your No Missing tab to identify and filter outliers using the mean and standard deviation of each variable. I will show you how to do this relatively quickly in this week’s discussion video. Once you complete this step, copy your filtered data into a new tab labeled “No Outliers”.
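The two cleaning passes above can be mirrored in a short Python sketch, which may help you confirm the counts you report from Excel. It is a cross-check only; your graded work must be done with Excel's Filter function and the Toolpak. The function names and the dict-per-country layout are hypothetical. The sketch assumes the sample standard deviation (Excel's STDEV.S), and it computes each variable's mean and SD once, on the No Missing data, so a value that would look extreme after the first removal is deliberately left alone (the one-iteration rule).

```python
from statistics import mean, stdev


def listwise_delete(rows, variables):
    """Keep only rows that have a value for every variable (the "No Missing" tab)."""
    return [row for row in rows if all(row[v] is not None for v in variables)]


def remove_outliers(rows, variables, k=3.0):
    """Single pass: drop any row with a value more than k sample SDs from its mean.

    Means and SDs are computed once on the rows passed in (the "No Missing"
    data) and are not recomputed afterward, matching the one-iteration rule.
    """
    bounds = {}
    for v in variables:
        values = [row[v] for row in rows]
        m, s = mean(values), stdev(values)  # stdev() is the sample SD, like STDEV.S
        bounds[v] = (m - k * s, m + k * s)
    return [
        row
        for row in rows
        if all(bounds[v][0] <= row[v] <= bounds[v][1] for v in variables)
    ]


# Made-up data: 29 typical countries, 1 extreme value, 1 missing value.
raw = [{"x": 10.0} for _ in range(29)] + [{"x": 1000.0}, {"x": None}]
no_missing = listwise_delete(raw, ["x"])
no_outliers = remove_outliers(no_missing, ["x"])
print(len(raw), len(raw) - len(no_missing), len(no_missing) - len(no_outliers))
# 31 1 1
```

The three printed numbers are the ones your write-up asks for: the initial count, the countries purged for missing data, and the countries purged as outliers.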
Ensure that your sample size is still at least 75 and create a descriptive statistics table for your third entry in Section 3. If your sample size after purging missing data and outliers is less than 75, replace one or more variables, then start over. If your sample size is still at least 75 and you have no extremely high correlation pairs, this will be the descriptive statistics table that you use for the final term project.
The final step for this week is to create a correlation table for the variables in your project and insert it at the end of the descriptive statistics section. You will not be graded on whether your correlations are in range. Your only objective in this step is to provide a neat, easy-to-read table showing that your data are suitable for a multiple regression equation, which you will learn to perform in Week 6. At this point, you only have to show that all pairwise correlations are between -.7 and .7; you will learn why next week. To do this, use the Data Analysis Toolpak to create a complete correlation table, format the data and numbers to look neat and easy to read (I recommend using short variable names as mentioned above and rounding all correlations to 3 decimal places), and identify any correlations with an absolute value greater than .7. If any variables are too highly correlated, you will have to remove and replace at least one of them and start over from the beginning. This is why we are doing the correlation table early and then completing the written variable and sample descriptions a week later: I do not want you to have to complete the written requirements for these sections multiple times.
BUS 5253 Week 5 Intro and Discussion Fa2024W2 (YouTube)
This week’s discussion topic allows you to continue with the second step in completing your Term Research Project. In Week 6, you will complete the Quantitative Results section, which should include your descriptive statistics as well as your correlation matrix from last week. If you selected your sample carefully last week, this should not take much time or effort. Once you have a suitable sample free of outliers and correlation issues, you can test your hypothesized relationships by performing a multiple regression analysis on the same sample data.
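For readers who want to see what the regression computes, the Python sketch below fits an intercept plus one coefficient per predictor by ordinary least squares (solving the normal equations) and reports R². It is a minimal illustration for cross-checking, not a substitute for the Toolpak's Regression tool, which also reports standard errors, t statistics, p-values, adjusted R², and an ANOVA table that this sketch omits. The function names and example data are hypothetical, and the collinearity cutoff of 1e-12 is an arbitrary assumption.

```python
def ols(y, X):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    y: list of n outcomes; X: list of n rows, each holding k predictor values.
    Returns [b0, b1, ..., bk], found by solving the normal equations (X'X)b = X'y.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(A[0])
    XtX = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    M = [XtX[i] + [Xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda idx: abs(M[idx][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # M is now diagonal, so each coefficient is its right-hand side over its pivot
    return [M[i][p] / M[i][i] for i in range(p)]


def r_squared(y, X, coefs):
    """Share of the variance in y explained by the fitted equation."""
    fitted = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], row)) for row in X]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up data generated exactly as y = 1 + 2*x1 + 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
coefs = ols(y, X)
print([round(b, 6) for b in coefs], round(r_squared(y, X, coefs), 6))
```

If two predictors are nearly collinear, X'X becomes close to singular and the coefficients become unstable, which is one reason the assignment screens out correlations above .7 before the regression.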
The Week 6 video, along with your textbook assignments for this week, should enable you to perform this calculation with ease. In the video, I go through all steps, from downloading the data through the descriptives, creating the correlation matrix, and testing the hypotheses. As such, the video is a bit repetitive. If you are confident about your work in Weeks 4 and 5, you should be able to skip ahead to the final 15 or 20 minutes to complete the regression.
For this assignment, complete the Week 6 Word document. Complete all calculations using the Excel Data Analysis Toolpak and save all your work on separate tabs as shown in the Week 6 Discussion Video. You may continue with the Week 5 documents and simply rename them, as long as your last name is in the file name and all sections for this week are complete. You should also copy your Word document and paste it into the Canvas text box to make it easier for other students to review and comment. Be sure to upload both documents (the Word document and the Excel workbook). If you have trouble uploading or inserting both documents, simply respond to your own post and attach them there. You should complete and post these sections, and comment on at least 2 other posts, by November 24.
The original discussion board grading rubric does not match this assignment well, so some of you may have noticed that I have not really been using it. Generally, you get full credit as long as your work is neat, you make an honest effort, you deliver the correct files in the requested format, and you interact with other students. The full instructions from the previous weeks are listed below:
First, if I notified you that you need to change your data selection, select new variables accordingly. You should also clean up any mistakes you noticed in the written document if you have time; it will make the final week easier for you. Please keep all of your quantitative work in ONE Excel workbook that you download from the World Bank. I need to see the original data that you download, your cleaned data, the metadata included in the download, and your calculations. This allows me to help you if you get off track. Next, document the total number of countries included for all series in your dataset. You do not have to list this number for each series, since each may be different; just count the number of observations in your initial table. Then filter out the missing data as I showed you in the videos. Once you have applied listwise deletion to all missing data, note how many countries were removed. Follow this with your treatment for outliers. For simplicity, you might want to address only observations more than 3 standard deviations from the mean. Note how many outliers were removed and how the sample was affected. Make sure that you retain at least 75 observations for your final sample size. If not, replace variables that have a lot of missing data with series that have higher response rates.
***Please note that you should not list every single country; I don’t want to read that any more than you want to write it. A simple statement like this one will be fine: “The measurement items were obtained from the World Bank with 162 countries included in the dataset. Outliers and missing items were treated using listwise deletion. A total of 26 countries were purged due to missing data, reducing the sample size to 136.” ***Make sure that you retain at least 75 observations for your final sample size. If not, find variables with a lot of missing data and replace them with series that have higher response rates.
Finally, do the calculations for your correlation matrix in the Results section. Write up that first paragraph and include the correlation matrix as a table for this week’s discussion. From here, you can go ahead and run your regression if you want, but you can wait until the final paper to format and write up the results and provide a brief discussion.
***You will want to do your calculations for the descriptives and your correlation matrix before you start writing all of this up. Make sure that you do not have to change variables due to low response rates or multicollinearity. You don’t want to have to write all of this multiple times.
For this project, you are required to download real data from the World Bank database and use multiple regression analysis to determine the impact of the Millennium Development Goals on national wealth. This assignment will require you to choose five independent variables and one dependent variable as directed in the attached Word document. You will define your variables, state hypotheses, collect data, apply treatments for outliers and missing data, provide descriptive statistics including a correlation matrix, and regress the dependent variable onto the predictors using Microsoft Excel (all calculations must be done using Excel functions and the Data Analysis Toolpak). In addition to performing the quantitative analysis for each step, you will be required to write the introduction, define the variables, build the arguments for your hypotheses, and explain your sample. To begin, go to the World Bank DataBank and access the Millennium Development Goals database using this link:
https://databank.worldbank.org/source/millennium-development-goals
Choose at least 5 independent variables (predictors) and 1 dependent variable (outcome). For your dependent variable, choose Gross National Income (GNI) per capita, Atlas method (current US$). You may choose any other variables as predictors (except for other measures of wealth, such as GDP), but note that not all countries are surveyed for all variables. Your final sample must include at least 75 countries after cleaning the data. Data must be properly treated for missing items and outliers, and all correlations must be weaker than .7 in absolute value (they must lie between -.7 and .7) to receive any credit for the quantitative or results sections. Your final submission must include the Excel workbook with your original data and all analyses, in addition to the Word document that serves as your written report. ***Please note that since most of your data and explanations will come from the World Bank downloads and the accompanying metadata, you are not required to provide additional references. However, ALL written text should be in your own words.
***You must submit both the Word document and the Excel file to receive any credit for this assignment. The Excel file should contain all original data, metadata, and calculations. All work should be included and easy to find in the same Excel workbook downloaded from the Millennium Development Goals database. All calculations should be made using Excel and the Data Analysis Toolpak. ***Finally, please do not try to fool me by altering your numbers in any way, whether in Excel or in the Word document. Falsifying any numerical value in any portion of this assignment will result in an automatic zero for the project. Several students have already found this out the hard way. You are graded only on your correct journey through the process, not on the success of your predicted hypotheses. There is absolutely nothing to gain by altering numbers.
BUS 5253 Week 7 Intro and Term Project Walk-through Video