IMPORTANT: Updated on 2025-11-07 21:37. Please check back often for changes.

CP2403 / CP3413 — Assignment Part 2 Portfolio Analytics Edition

Weighting: 25% • Trimester: TR3'25 • Due: End of Week 8 (Sunday, 16 Nov 2025)

Read everything on this page carefully before you start. The tickers shown are examples — you may use them or others, but state your rationale.

Quick Navigation

Scenario

Analyse ~36 months of daily market data for a small portfolio (1–3 assets) and one benchmark index. Clean data, build returns, run ANOVA and Chi‑Square tests, and fit linear/multiple regressions to produce concise, decision‑ready conclusions.

Data Sources & Preparation

Update (Nov 2025):

Deprecated caret-index symbols (e.g., ^GSPC, ^NDX, ^STI) and futures tickers have been removed to avoid data issues. Use ETF benchmarks instead — SPY (S&P 500), QQQ (NASDAQ‑100), and IWM (Russell 2000).SPY (S&P 500), QQQ (NASDAQ‑100), and IWM (Russell 2000). Google Sheets and yfinance fully support these.

Ticker selection rules (to encourage diversity)

Path A — Google Sheets (no code)

Path B — Python (yfinance) — recommended for more choices

import yfinance as yf
assets = ["XXX","YYY"]        # examples: ["NVDA","AMZN","META"]
bench  = ["SPY"]            # examples: ["SPY","QQQ","STI"] or futures ["ES=F","NQ=F"]
tickers = assets + bench
df = yf.download(tickers, period="3y", interval="1d", auto_adjust=True).Close
df.to_csv("prices_3y.csv")
Macro drivers (REQUIRED in Task 4 — choose 1–2):
  • Volatility proxy: VIXY (ETF) — use daily % change
  • Oil: (WTI futures) — use daily % return
  • Rates proxy: IEF (7–10Y US Treasuries ETF) — use daily % return
  • Dollar proxy: UUP (USD ETF) — use daily % return
Rule: State each proxy and why it’s relevant to your asset.
Pre‑processing note (state this in your report): how you aligned trading calendars, removed NAs, handled outliers, and ensured consistent date ranges for assets vs benchmark.

Finance Crash Course (fast, practical)

Returns (daily) & why we use them

Return = % change from one day to the next. Simple return: (P_t - P_{t-1})/P_{t-1}. Returns let us compare different assets fairly and run statistics. Log return: ln(P_t/P_{t-1}) — adds better time‑additivity.

Benchmark, α, β (sensitivity)

A benchmark is the market yardstick (e.g., SPY, QQQ; futures:). β is slope on the benchmark; α is intercept (average return unexplained by the benchmark).

Volatility & clustering

Rolling 20‑day standard deviation as a volatility proxy. Markets show volatility clustering: calm periods, then stormy periods.

ANOVA & Chi‑Square

ANOVA tests if mean returns differ across groups (assumes similar variances and approx. normal residuals). Chi‑Square tests independence; expected counts in each cell should be about ≥5.

Regression essentials

Report coefficients (α, β’s), standard errors, t‑stats, p‑values, confidence intervals, R² and Adjusted R². Compare models with AIC (or BIC). Beware overfitting; consider a simple train/test split or out‑of‑sample check.

Motivation — Why this matters for your life

Short version: learning to analyse markets is a practical life skill. You won’t just pass a subject—you’ll build a system for smarter saving, investing, and decision‑making.

3 stories to keep in mind

What you’ll take away

Treat this like a starter kit for your future self. The sooner you build these habits, the more time compounding has to work for you.

Quick starter snippets

Copy‑paste these into your notebooks as needed. They align with the Tasks and rubric.

Data pull (~36 months with yfinance)

import pandas as pd, yfinance as yf
TICKER = "QQQ"  # change me
end = pd.Timestamp.today().normalize()
start = end - pd.DateOffset(months=36)
df = yf.download(TICKER, start=start, end=end, interval="1d", auto_adjust=True)[["Close"]].reset_index()
df = df.sort_values("Date").rename(columns={"Close":"Close"})
df["Return"] = df["Close"].pct_change()
df.to_csv("prices_clean.csv", index=False)

ANOVA (Task 1)

import pandas as pd, statsmodels.api as sm
from statsmodels.formula.api import ols
df = pd.read_csv("prices_clean.csv", parse_dates=["Date"]).sort_values("Date").dropna()
df["Weekday"] = df["Date"].dt.day_name()
m = ols("Return ~ C(Weekday)", data=df).fit()
anova = sm.stats.anova_lm(m, typ=2)
# eta-squared (η²) = SS_between / SS_total
eta2 = anova["sum_sq"]["C(Weekday)"] / anova["sum_sq"].sum()
print(anova, "\n
      

Chi‑Square (Task 2)

import pandas as pd, numpy as np, scipy.stats as st
df = pd.read_csv("prices_clean.csv", parse_dates=["Date"]).sort_values("Date")
df["UpDown"] = np.where(df["Return"]>0,"UP","DOWN")
df["Vol20"] = df["Return"].rolling(20).std()
q = df["Vol20"].quantile([0.33,0.66])
def reg(v):
    if pd.isna(v): return None
    if v<=q.iloc[0]: return "Low"
    if v<=q.iloc[1]: return "Med"
    return "High"
df["Regime"] = df["Vol20"].apply(reg)
ct = pd.crosstab(df["UpDown"], df["Regime"]).fillna(0)
chi2, p, dof, exp = st.chi2_contingency(ct)
n = ct.values.sum()
phi2 = chi2 / n
r, k = ct.shape
print(ct, "\nchi2=",chi2," p=",p," CramerV=",cramers_v)

Linear regression (Task 3) + 95% CI

import pandas as pd, statsmodels.api as sm
df = pd.read_csv("prices_clean.csv", parse_dates=["Date"]).sort_values("Date")
df["Lag1"] = df["Return"].shift(1)
d = df.dropna()
X = sm.add_constant(d["Lag1"]); y = d["Return"]
m = sm.OLS(y, X).fit()
print(m.summary())
print("\n95% CI for beta:", m.conf_int().loc["Lag1"].tolist())

Multiple regression (Task 4) — compare specs

import pandas as pd, numpy as np, statsmodels.api as sm
df = pd.read_csv("prices_clean.csv", parse_dates=["Date"]).sort_values("Date")
df["Lag1"] = df["Return"].shift(1); df["Lag2"] = df["Return"].shift(2); df["Vol20"] = df["Return"].rolling(20).std()
d = df.dropna()
def fit(cols):
    X = sm.add_constant(d[cols]); y = d["Return"]
    return sm.OLS(y, X).fit()
m1 = fit(["Lag1","Lag2"]); m2 = fit(["Lag1","Lag2","Vol20"])
print("M1:", dict(adjR2=m1.rsquared_adj, AIC=m1.aic, BIC=m1.bic))
print("M2:", dict(adjR2=m2.rsquared_adj, AIC=m2.aic, BIC=m2.bic))

Polynomial regression (Task 5) — degree 2 (keep it simple)

import pandas as pd, statsmodels.api as sm
df = pd.read_csv("prices_clean.csv", parse_dates=["Date"]).dropna().sort_values("Date")
# Predict asset return using a quadratic of benchmark return
df["BenchRet"] = df["Return"].shift(0)  # or use benchmark return if you merged it
X = pd.DataFrame({"BenchRet": df["BenchRet"], "BenchRet2": df["BenchRet"]**2}).dropna()
y = df.loc[X.index, "Return"]
X = sm.add_constant(X)
m = sm.OLS(y, X).fit()
print(m.summary())

Logistic regression (Task 6) — classify Up vs Down

import pandas as pd, numpy as np, statsmodels.api as sm
df = pd.read_csv("prices_clean.csv", parse_dates=["Date"]).sort_values("Date")
df["Up"] = (df["Return"] > 0).astype(int)
df["Lag1"] = df["Return"].shift(1)
d = df.dropna(subset=["Up","Lag1"])
X = sm.add_constant(d[["Lag1"]])  # you may add one macro predictor if available
y = d["Up"]
logit = sm.Logit(y, X).fit(disp=False)
print(logit.summary())
pred = (logit.predict(X) >= 0.5).astype(int)
acc = (pred == y).mean()
print("Accuracy:", round(float(acc),3))

Hypothesis Playbook (use for Tasks 1–2 and wherever relevant)

Formulating H0 and H1 (templates)
  • Define population/time-frame: “daily returns over the last ~36 months”.
  • Task 1 (ANOVA) — H0: mean return is the same across all groups; H1: at least one group’s mean differs.
  • Task 2 (Chi‑Square) — H0: direction (Up/Down) is independent of the factor; H1: they are dependent.
  • Regression — H0: chosen coefficient = 0; H1: coefficient ≠ 0 (two‑tailed unless justified).
Significance level, decision, reporting
  • Set α = 0.05 (unless justified).
  • Decision rule: if p ≤ α → reject H0; else fail to reject H0.
  • Provide the relevant test statistic(s), p‑value(s), and a plain‑English sentence for a manager audience. (Effect sizes are not required for Part 2.)
Logistic regression (classification) notes
  • Outcome: Up (1) vs Down (0). Predictors: e.g., Lag1 return (and optionally one macro).
  • Report coefficient signs and a simple performance metric (e.g., accuracy). No ROC/AUC required.
  • Watch for class imbalance; if heavily imbalanced, note it briefly.
Assumptions checklist
  • ANOVA: approx. normal residuals; similar variances (Levene); independent observations.
  • Chi‑Square: expected counts ≥ 5 in most cells; mutually exclusive categories.
  • Regression: linearity, independent errors, homoskedasticity, approx. normal residuals.

Variables You Will Construct

  • Daily return: (P_t - P_{t-1}) / P_{t-1} or log‑return
  • Up/Down label: Up if return > 0 else Down
  • Day‑of‑Week (DoW): Monday … Friday
  • Regime: Bull vs Bear via 50‑SMA vs 200‑SMA
  • Volatility proxy: rolling 20‑day stdev of returns
  • Momentum: prior 20‑day return
  • Volume z‑score: (Volume − mean)/stdev across a rolling window

Tasks

Task 1 — ANOVA Module 5

  • Factor (choose one): Day‑of‑Week; or Regime (benchmark 50‑SMA above/below 200‑SMA); or Volatility terciles.
  • Response: Daily return of one chosen asset.
  • Deliverables: ANOVA table, diagnostics, 1‑paragraph investor takeaway.

Task 2 — Chi‑Square (Independence) Module 6

  • Is Up/Down independent of Day‑of‑Week (or Regime)? Provide contingency table, expected counts, χ² & p‑value; interpret.

Task 3 — Simple Linear Regression Module 7

  • Model: AssetReturn = α + β × BenchmarkReturn + ε. Report α, β, R²; check residuals; explain β (sensitivity).
  • Benchmark examples: SPY, QQQ.

Task 4 — Multiple Regression (REQUIRES 1–2 macro drivers) Modules 7–8

  • Response: Portfolio return (equal‑weight of your assets) or a single asset.
  • Predictors: Benchmark return plus 1–2 macro drivers chosen from the list in Data (ΔVIX, oil return, Δ10Y, USD return).
  • Fit ≥3 model variants; select “best” via R²/Adj‑R²/AIC; justify choice; write a short manager memo.

Task 5 — Polynomial Regression (degree 2) Module 9

  • Augment a simple regression with a quadratic term (degree 2) to capture mild curvature.
  • Model: Return = α + β₁·Benchmark + β₂·Benchmark² + ε (keep predictors minimal).
  • Deliverables: coefficients, p‑values, R²/Adj‑R², and a chart overlaying the fitted curve on the scatter.

Task 6 — Logistic Regression (Up vs Down) Module 10

  • Classify daily direction (Up=1, Down=0) using a simple logistic model.
  • Predictors: Lag1 return (and optionally one macro driver). Keep it simple.
  • Deliverables: coefficient signs, a confusion matrix (2×2), and accuracy. No ROC/AUC required.
Weighting & Marks: Internal 100-point rubric maps to 25% course weight.
T1–T4: 20 pts each (5% each). T5–T6: 10 pts each (2.5% each). Optional Bonus: up to +5 pts (does not exceed 100/25%).

Guiding Questions (submit answers in a Word file)

Answer these Q1–Q14 succinctly in a Word document; name it per the convention below.

  1. Choice & rationale
    Q1. Which assets (1–3) did you pick and why are they economically related?
    Q2. Which benchmark fits best and why not the alternatives listed?
  2. Data hygiene
    Q3. Show how you aligned dates and removed NAs when assets/benchmarks had different trading days.
    Q4. Did you use returns/changes (not levels) for regressions? Show one row example.
  3. Task 1 — ANOVA
    Q5. Which factor did you choose (DoW, Regime, or Vol‑terciles) and why is it sensible for your asset?
    Q6. What does the ANOVA say and what is the single‑sentence “so what?”
  4. Task 2 — Chi‑Square
    Q7. Build the Up/Down vs chosen factor table. Are counts adequate in each cell?
    Q8. What does χ² and the p‑value tell you? State your interpretation.
  5. Task 3 — Simple regression
    Q9. What are α, β, and R²? Provide a one‑sentence PM interpretation for β and R².
    Q10. Show one diagnostic (residual plot or normal QQ) and comment.
  6. Task 4 — Multiple regression (REQUIRES 1–2 macros)
    Q11. Which macro drivers did you choose (ΔVIX, oil return, Δ10Y, USD return) and why do they matter for your assets?
    Q12. Compare three model variants (benchmark‑only vs +macro1 vs +macro1+macro2). Which wins and why (R²/Adj‑R²/AIC)?
    Q13. What would change your conclusion (regime change, out‑of‑sample check)?
  7. Task 5 — Polynomial regression
    Q15. Why is a quadratic term plausible for your asset/benchmark relation? Show the fitted curve.
    Q16. Did the quadratic term materially improve Adj‑R² or AIC? Briefly justify.
  8. Task 6 — Logistic regression
    Q17. What predictors did you use and why? Report the confusion matrix and accuracy.
    Q18. How would misclassification risk affect a simple trading decision?
  9. Communication
    Q14. What’s your final manager memo (3–4 bullets) that a non‑technical audience can use?

Visualisation Requirements

  • Task 1: box/violin + mean markers per group
  • Task 2: stacked bar or mosaic plot
  • Task 3: scatter with fitted line + residual plot
  • Task 4: coefficient table/plot + residual diagnostics

Always label axes/legends with tickers, units, and date range.

Delivery Template (copy into a Word file)

Use the following outline verbatim in your Word document. Keep the headings and order. Replace ALL CAPS placeholders with your details. Submit the completed Word file to Blackboard.

Title: CP2403/CP3413 — Assignment Part 2 (Portfolio Analytics Edition)

STUDENT NAME: FIRSTNAME LASTNAME
STUDENT ID: XXXXXXXXX
PRACTICAL GROUP: A / B / C
DATE: YYYY-MM-DD

1. CHOICES & RATIONALE
1.1 Assets (1–3): TICKER1, TICKER2, ...
    Rationale (1–2 sentences): ...
1.2 Benchmark (ETF): SPY / QQQ / IWM
    Rationale (1 sentence): ...

2. DATA ACQUISITION & HYGIENE
2.1 Source & Window: Google Sheets / yfinance, ~36 months ending TODAY
2.2 Cleaning Steps: aligned trading days, removed NAs, outliers handling (if any)
2.3 Files Provided: prices CSV(s) attached (filenames)

3. TASK 1 — ANOVA (Day-of-Week / Regime / Vol-terciles)
3.1 Hypotheses (H0/H1): ...
3.2 Method & Diagnostics (1–2 lines): ...
3.3 Results (F, p-value): ...
3.4 Investor Takeaway (1 paragraph): ...

4. TASK 2 — Chi‑Square (Independence)
4.1 Table (Up/Down vs Factor): (paste table or screenshot)
4.2 Result (χ², p-value): ...
4.3 Interpretation (1–2 lines): ...

5. TASK 3 — Simple Linear Regression
5.1 Model: AssetReturn = α + β × BenchmarkReturn + ε
5.2 Results (α, β, R²): ...
5.3 Diagnostic (1 plot + 1 sentence): ...
5.4 Interpretation for PM (1–2 lines): ...

6. TASK 4 — Multiple Regression (with 1–2 macro drivers)
6.1 Predictors used: Benchmark + MACRO1 (+ MACRO2)
6.2 Model Comparison (≥3 specs): brief table with Adj-R² / AIC
6.3 Chosen Model & Why: ...
6.4 Manager Memo (3–4 bullets): ...

7.  TASK 5 — Polynomial Regression (degree 2)
7.1 Why quadratic is plausible (1–2 lines): ...
7.2 Results (βs, Adj‑R²/AIC) + fitted curve screenshot: ...
7.3 Takeaway (1–2 lines): ...

8.  TASK 6 — Logistic Regression (Up vs Down)
8.1 Predictors: Lag1 (+ optional macro)
8.2 Results: coefficients, confusion matrix (2×2), accuracy
8.3 Takeaway (1–2 lines): ...

9. VISUALS
9.1 T1 figure(s): ...
9.2 T2 figure(s): ...
9.3 T3 figure(s): ...
9.4 T4 figure(s): ...
9.5 T5/T6: ...

10. REPRODUCIBILITY CHECKLIST
- I included the CSV(s) used and named files per convention.
- All charts label axes/units/date range.
- I answered Q1–Q14 (and Q15–Q18 if applicable) clearly.
- I stated any assumptions or limitations briefly.

APPENDIX A — Key Tables/Printouts
APPENDIX B — Code snippets (optional)

File Naming (enforce exactly)

  • Main answers (this template): task-answers-FirstNameLastName-StudentID.docx
  • Task 1: task1-ANOVA-FirstNameLastName-StudentID.docx
  • Task 2: task2-ChiSquare-FirstNameLastName-StudentID.docx
  • Task 3: task3-LinearReg-FirstNameLastName-StudentID.docx
  • Task 4: task4-MultipleReg-FirstNameLastName-StudentID.docx
  • Task 5 : task5-PolynomialReg-FirstNameLastName-StudentID.docx
  • Task 6 : task6-LogisticReg-FirstNameLastName-StudentID.docx
Grading Alignment (what markers look for)
  • Correctness of method & results reported for each task.
  • Clarity: concise 1–2 line interpretations; manager memo quality.
  • Reproducibility: CSV(s) provided, naming followed, charts labelled.
  • Professionalism: formatting consistent with this template.

Submission & Naming

Submit to LearnJCU. Include your CSV(s) used. If you hosted data on GitHub (optional), paste the CSV link into your Word file for reproducibility. Deliverables:

  • Word (answers): responses to Q1–Q14 — name: task-answers-FirstNameLastName-StudentID.docx
  • Task 1: task1-ANOVA-FirstNameLastName-StudentID.docx (and ...ipynb if used)
  • Task 2: task2-ChiSquare-FirstNameLastName-StudentID.docx (and ...ipynb if used)
  • Task 3: task3-LinearReg-FirstNameLastName-StudentID.docx (and ...ipynb if used)
  • Task 4: task4-MultipleReg-FirstNameLastName-StudentID.docx (and ...ipynb if used)
  • Task 5: task5-PolynomialReg-FirstNameLastName-StudentID.docx (and ...ipynb if used)
  • Task 6: task6-LogisticReg-FirstNameLastName-StudentID.docx (and ...ipynb if used)

Glossary

Core finance & analysis terms (for this brief)

DoW (Day‑of‑Week)

Weekday (Mon–Fri) for each row; used to group/compare returns.

Regime

Market state, e.g., Bull vs Bear via 50‑SMA vs 200‑SMA on the benchmark.

Remove NAs

Drop/fill rows with missing values after aligning dates across tickers.

Return vs Log‑return

Simple: (Pₜ − Pₜ₋₁)/Pₜ₋₁; Log: ln(Pₜ/Pₜ₋₁) — additive over time.

α (Alpha)

Intercept: average return unexplained by the benchmark; daily units.

β (Beta)

Sensitivity to the benchmark; slope of asset on benchmark returns.

Residuals

Errors (actual − fitted). Random residuals suggest model adequacy.

Heteroskedasticity

Residual variance changes with level/fitted value (fan‑shape in plots).

Autocorrelation

Residuals correlated over time; check with Durbin‑Watson or ACF plot.

Multicollinearity

Predictors highly correlated; inflates SEs. Check VIFs; drop/transform.

R² / Adjusted R²

Explained variance; adjusted penalises extra predictors.

AIC / BIC

Model scores that penalise complexity; lower is better (same data).

t‑stat / p‑value

Signal‑to‑noise and its significance probability for a coefficient.

Confidence Interval

Range that plausibly contains the true coefficient value (e.g., 95%).

QQ Plot

Graph to assess if residuals are approximately normal.

Leverage & Influence

Points that can sway the fit; inspect Cook’s distance.

Standardisation

Scale variables (z‑score) for comparability of coefficients.

Winsorise / Outliers

Cap extreme values to reduce undue influence; report if used.

Stationarity

Statistical properties don’t change over time; returns are closer than prices.

Overfitting

Model fits noise; keep parsimonious, validate out‑of‑sample.

Statistics Glossary (what to report for each test)

Report the key test statistic(s), p‑value(s), fit metrics where relevant, and a short, manager‑friendly conclusion. Effect sizes are not required for Part 2.

ANOVA (one‑way)

  • Hypothesis test: F‑test of mean differences across groups.
  • Report: F‑statistic, p‑value, brief interpretation; include basic diagnostics (residuals, variance similarity).

Chi‑Square (independence)

  • Hypothesis test: Pearson’s χ² on a contingency table; H0: variables are independent.
  • Report: contingency table, χ², p‑value, and a 1–2 sentence interpretation; ensure expected counts are adequate.

Regression (OLS)

  • Hypothesis tests: coefficient t‑tests (H0: β=0), overall F‑test.
  • Report: coefficients, standard errors, t‑stats, p‑values, R²/Adj‑R², 95% CI for key β’s.
  • Diagnostics: linearity, homoskedasticity, residual normality (QQ‑plot).

Copy‑ready code snippets

Reminder: Examples on this page are only examples. You may use them or others — just justify your choices briefly.

© 2025 Professor Dr. Tan. All rights reserved.

Bonus Exploration (Optional, up to +5 pts)

Advanced measures not required in Weeks 5–10. Attempt only if you wish to go beyond core requirements.

  • For χ², compute and discuss Cramér’s V.
  • For ANOVA, compute and discuss η² or ω².
  • For logistic, explore a ROC/AUC sketch or threshold tuning. (Optional)

Bonus points cannot push total above the Part 2 maximum (25%).