feat: add GO enrichment analysis page for ProteomicsLFQ results #7
base: main
📝 Walkthrough

This PR adds a new Proteomics LFQ results page with GO enrichment analysis. Users can view protein abundance data and run an enrichment analysis on significant proteins using Fisher's exact test. The p-value and log2FC thresholds are configurable, and results are visualized for the biological process (BP), cellular component (CC), and molecular function (MF) categories.
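The "filter significant proteins" step can be sketched as follows. This is a minimal illustration, not the PR's code. The `"p-value"` and `"log2FC"` column names come from the refactor suggestion in this review. The following are all assumptions:

- the `"protein"` column;
- the strict `<` / inclusive `>=` comparisons;
- treating every protein with both statistics as the background.

```python
import pandas as pd


def split_foreground_background(df, p_cutoff=0.05, fc_cutoff=1.0):
    """Return (foreground, background) sets of protein IDs.

    Assumed layout: columns "protein", "p-value" and "log2FC".
    """
    # Only proteins with both statistics can be classified, so they form the background.
    tested = df.dropna(subset=["p-value", "log2FC"])
    # Significant = small p-value AND large absolute fold change (either direction).
    significant = (tested["p-value"] < p_cutoff) & (tested["log2FC"].abs() >= fc_cutoff)
    foreground = set(tested.loc[significant, "protein"])
    background = set(tested["protein"])
    return foreground, background


# Hypothetical example data
df = pd.DataFrame({
    "protein": ["P1", "P2", "P3", "P4"],
    "p-value": [0.01, 0.20, 0.03, None],
    "log2FC": [2.0, 3.0, -1.5, 4.0],
})
fg, bg = split_foreground_background(df)
print(sorted(fg), sorted(bg))  # ['P1', 'P3'] ['P1', 'P2', 'P3']
```

Because the fold-change check uses the absolute value, both up- and down-regulated proteins enter the foreground.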
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Module as Proteomics LFQ Module
    participant MyGene as MyGene.info API
    participant Fisher as Fisher's Exact Test
    participant Viz as Visualization Engine
    User->>Module: Load page & set p-value/log2FC cutoffs
    Module->>Module: Filter significant proteins
    Module->>MyGene: Query GO annotations for UniProt accessions
    MyGene-->>Module: Return GO annotations
    Module->>Module: Build background & foreground GO sets
    Module->>Fisher: Compute enrichment per GO term
    Fisher-->>Module: Return p-values & statistics
    Module->>Module: Aggregate results by category (BP/CC/MF)
    Module->>Viz: Render top 15 GO terms (bar plots)
    Viz-->>User: Display enrichment results & tables
```
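The per-term test in the diagram can be sketched with `scipy.stats.fisher_exact` on a 2×2 table: foreground vs. the rest of the background, annotated vs. not annotated. This is a sketch under stated assumptions, and the following choices are illustrative rather than taken from the PR:

- the input shape (a dict from protein ID to a set of GO terms);
- the one-sided `alternative="greater"` test;
- testing only terms that occur in the foreground;
- the helper name `go_enrichment_sketch`.

No multiple-testing correction is applied.

```python
from collections import defaultdict

from scipy.stats import fisher_exact


def go_enrichment_sketch(protein_terms, foreground, top_n=15):
    """Rank GO terms by a one-sided Fisher's exact test.

    protein_terms: dict mapping every background protein ID to its set of GO terms.
    foreground: set of significant protein IDs.
    Returns (term, fg_count, bg_count, odds_ratio, p_value) tuples, smallest p first.
    """
    fg = foreground & set(protein_terms)
    n_fg, n_bg = len(fg), len(protein_terms)
    fg_counts, bg_counts = defaultdict(int), defaultdict(int)
    for protein, terms in protein_terms.items():
        for term in terms:
            bg_counts[term] += 1
            if protein in fg:
                fg_counts[term] += 1
    results = []
    for term, k in fg_counts.items():  # only terms seen in the foreground
        K = bg_counts[term]
        # Rows: foreground / rest of background; columns: annotated / not annotated.
        table = [[k, n_fg - k], [K - k, (n_bg - n_fg) - (K - k)]]
        odds_ratio, p_value = fisher_exact(table, alternative="greater")
        results.append((term, k, K, odds_ratio, p_value))
    results.sort(key=lambda row: row[4])
    return results[:top_n]


# Hypothetical toy data: six background proteins, two of them significant
protein_terms = {
    "P1": {"GO:A", "GO:B"}, "P2": {"GO:A"}, "P3": {"GO:B"},
    "P4": {"GO:B"}, "P5": {"GO:C"}, "P6": set(),
}
top = go_enrichment_sketch(protein_terms, {"P1", "P2"})
for term, k, K, _, p in top:
    print(term, k, K, round(p, 4))
```

Running this sketch per category (BP, CC, MF) with `top_n=15` would mirror the "top 15 GO terms" step in the diagram.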
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@content/results_proteomicslfq.py`:
- Line 8: The import list includes an unused symbol, `ttest_ind`. Remove it from the statement `from scipy.stats import ttest_ind, fisher_exact` and import only the function that is used (`fisher_exact`), so the file no longer imports unused symbols.
- Around lines 59-64: Replace the broad `except Exception` in `get_clean_uniprot` with specific exception types that reflect possible failures (e.g., `IndexError` and `TypeError`), so unrelated errors are not masked. Change the handler to `except (IndexError, TypeError):` (or a similar specific tuple) and keep returning `None` in those cases, while letting other exceptions propagate.
🧹 Nitpick comments (6)
requirements.txt (1)
152-152: Consider constraining the `mygene` version for reproducibility.
The dependency is added without a version constraint, which could lead to unexpected behavior if the upstream API changes. This is consistent with other manually added dependencies in this file. However, requiring a known working version would improve build reproducibility, either as a lower bound (e.g., `mygene>=3.2.2`) or as an exact pin (`mygene==3.2.2`).

content/results_proteomicslfq.py (5)
50-51: Consider using `@st.fragment` for the GO enrichment workflow.
Per coding guidelines, interactive UI updates should use the `@st.fragment` decorator to avoid full page reloads. The enrichment analysis triggered by the button could benefit from being wrapped in a fragment function.

Example refactor approach

```python
@st.fragment
def run_enrichment_analysis(analysis_df, p_cutoff, fc_cutoff):
    # Move the enrichment logic (lines 52-141) into this function
    ...

# Then call it conditionally
if st.button("Run GO Enrichment"):
    run_enrichment_analysis(
        pivot_df.dropna(subset=["p-value", "log2FC"]).copy(), p_cutoff, fc_cutoff
    )
```

As per coding guidelines: "Use `@st.fragment` decorator for interactive UI updates without full page reloads".
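79-80: Use idiomatic pandas filtering for the boolean column.
The comparison `!= True` works but is non-idiomatic. For pandas boolean columns, prefer the bitwise negation operator `~`. The explicit `.astype(bool)` ensures that `~` operates on a boolean dtype even if the column is still object dtype after `fillna`.

Proposed fix

```diff
 if "notfound" in res.columns:
-    res = res[res["notfound"] != True]
+    res = res[~res["notfound"].fillna(False).astype(bool)]
```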
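The sketch below checks that the rewritten filter keeps the same rows as the original comparison. It uses a hypothetical MyGene-style result in which `notfound` is `True` for unmatched queries and missing (`NaN`) otherwise, which gives the column object dtype.

The explicit `.astype(bool)` matters because recent pandas versions deprecate the automatic downcasting of object columns in `fillna`. On a column that stays object dtype, `~` may be applied element-wise as Python's integer inversion (`~True == -2`) rather than as logical negation.

```python
import numpy as np
import pandas as pd

# Hypothetical MyGene-style output: True for unmatched queries, NaN for matched ones.
res = pd.DataFrame({
    "query": ["P1", "P2", "P3"],
    "notfound": [np.nan, True, np.nan],
})

original = res[res["notfound"] != True]  # noqa: E712  (the comparison being replaced)
# fillna(False) fills the gaps; astype(bool) guarantees a real bool dtype before ~.
proposed = res[~res["notfound"].fillna(False).astype(bool)]

print(proposed["query"].tolist())
```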
90-91: Lambda captures loop variable: potential late-binding issue.
This is safe here because `.apply()` executes immediately. Still, capturing `go_type` in a lambda within a loop is a code smell that could cause bugs if the code is refactored.

Proposed fix using default argument binding

```diff
 for go_type in ["BP", "CC", "MF"]:
-    res[f"{go_type}_terms"] = res["go"].apply(lambda x: extract_go_terms(x, go_type))
+    res[f"{go_type}_terms"] = res["go"].apply(lambda x, gt=go_type: extract_go_terms(x, gt))
```
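The late-binding pitfall only appears when the lambdas are called after the loop has finished, which `.apply()` does not do. The self-contained illustration below uses hypothetical `make_labelers_*` helpers.

```python
def make_labelers_late(categories):
    # Each lambda looks up `cat` when it is called, after the loop has finished.
    return [lambda term: f"{cat}:{term}" for cat in categories]


def make_labelers_bound(categories):
    # The default argument captures the value of `cat` at definition time.
    return [lambda term, cat=cat: f"{cat}:{term}" for cat in categories]


late = [label("x") for label in make_labelers_late(["BP", "CC", "MF"])]
bound = [label("x") for label in make_labelers_bound(["BP", "CC", "MF"])]
print(late)   # ['MF:x', 'MF:x', 'MF:x']
print(bound)  # ['BP:x', 'CC:x', 'MF:x']
```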
128-128: Add `strict=True` to `zip()` for safer iteration.
With `strict=True`, `zip()` raises `ValueError` when the iterables have different lengths, which catches potential bugs early. The `strict` parameter requires Python 3.10 or later.

Proposed fix

```diff
-for tab, go_type in zip([bp_tab, cc_tab, mf_tab], ["BP", "CC", "MF"]):
+for tab, go_type in zip([bp_tab, cc_tab, mf_tab], ["BP", "CC", "MF"], strict=True):
```
140-141: Consider logging the full exception for debugging.
Catching exceptions broadly for UI robustness is acceptable, but logging the traceback would help when debugging production issues.

Proposed fix

```diff
+import logging
+
+logger = logging.getLogger(__name__)
+
 # ... at the exception handler:
 except Exception as e:
+    logger.exception("GO enrichment failed")
     st.error(f"GO enrichment failed: {e}")
```
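The sketch below shows what `logger.exception` adds: it logs at `ERROR` level and attaches the active exception's traceback (`exc_info`) to the record. The in-memory handler, the logger name and the simulated failure are all hypothetical. In the page itself, `st.error(...)` would display the message.

```python
import logging


class ListHandler(logging.Handler):
    """Keeps log records in memory so the example can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("go_enrichment_demo")
handler = ListHandler()
logger.addHandler(handler)
logger.propagate = False  # keep the demo from printing via the root logger


def run_enrichment():
    raise RuntimeError("MyGene.info request failed")  # simulated failure


try:
    run_enrichment()
except Exception as e:
    logger.exception("GO enrichment failed")  # ERROR level + traceback
    message = f"GO enrichment failed: {e}"     # what st.error(...) would show

record = handler.records[0]
print(record.levelname, record.exc_info[0].__name__, message)
```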
```python
import plotly.express as px
import mygene
from collections import defaultdict
from scipy.stats import ttest_ind, fisher_exact
```
Remove unused import `ttest_ind`.
The `ttest_ind` function is imported but never used in this file.

Proposed fix

```diff
-from scipy.stats import ttest_ind, fisher_exact
+from scipy.stats import fisher_exact
```
```python
def get_clean_uniprot(name):
    try:
        parts = str(name).split("|")
        return parts[1] if len(parts) >= 2 else parts[0]
    except Exception:
        return None
```
Use specific exception types instead of a blanket `except Exception`.
The broad exception handler could mask unexpected errors, so catch specific exceptions instead. Note that `str.split()` always returns at least one element, so the indexing here cannot raise `IndexError`.
Proposed fix

```diff
 def get_clean_uniprot(name):
     try:
         parts = str(name).split("|")
         return parts[1] if len(parts) >= 2 else parts[0]
-    except Exception:
+    except (IndexError, TypeError, AttributeError):
         return None
```

🧰 Tools
🪛 Ruff (0.14.14)
[warning] 63-63: Do not catch blind exception: `Exception` (BLE001)
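An alternative sketch avoids the exception handler entirely. It assumes inputs are UniProt FASTA-style headers such as `sp|P02768|ALBU_HUMAN` or bare accessions. Since `str.split()` always returns at least one element, neither branch can raise.

The more practical gap is missing values: `str(None)` is `"None"` and `str(float("nan"))` is `"nan"`, so the original function returns those strings rather than `None`. The name `get_clean_uniprot_sketch` marks this as a hypothetical variant, not the PR's function.

```python
import math


def get_clean_uniprot_sketch(name):
    """Extract the accession from 'db|ACCESSION|ENTRY_NAME', or return a bare ID as-is.

    Missing values (None or float NaN) map to None instead of "None"/"nan".
    """
    if name is None or (isinstance(name, float) and math.isnan(name)):
        return None
    parts = str(name).split("|")
    # str.split always returns at least one element, so parts[0] is always valid.
    return parts[1] if len(parts) >= 2 else parts[0]


print(get_clean_uniprot_sketch("sp|P02768|ALBU_HUMAN"))  # P02768
print(get_clean_uniprot_sketch("P02768"))                # P02768
print(get_clean_uniprot_sketch(float("nan")))            # None
```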
This PR adds a new GO Enrichment Analysis page for ProteomicsLFQ results.
The page allows users to perform GO term enrichment (BP, CC, MF) based on protein-level differential abundance results.
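For reference, per-category term extraction can be sketched as below. This assumes MyGene.info's `go` field layout: a dict keyed by `BP`, `CC` and `MF`, each holding one annotation dict or a list of them with `id` and `term` keys. The helper name `go_terms_for_category` is hypothetical and is not the PR's `extract_go_terms`.

```python
def go_terms_for_category(go_field, go_type):
    """Return {(go_id, term_name), ...} for one category ("BP", "CC" or "MF").

    Assumes a MyGene.info-style `go` value; anything that is not a dict
    (e.g. NaN for proteins without annotations) yields an empty set.
    """
    if not isinstance(go_field, dict):
        return set()
    entries = go_field.get(go_type, [])
    if isinstance(entries, dict):  # a single annotation may not be wrapped in a list
        entries = [entries]
    return {
        (entry["id"], entry["term"])
        for entry in entries
        if isinstance(entry, dict) and "id" in entry and "term" in entry
    }


go = {
    "BP": [{"id": "GO:0006915", "term": "apoptotic process"}],
    "CC": {"id": "GO:0005737", "term": "cytoplasm"},
}
for go_type in ["BP", "CC", "MF"]:
    print(go_type, go_terms_for_category(go, go_type))
```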
Summary by CodeRabbit
New Features
Chores