{"slug":"airscholar-e2e-data-engineering","name":"E2e Data Engineering","description":"An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.","tagline":"An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.","category":"data-analysis","tags":["data-pipeline","automation","apache-airflow","apache-kafka","apache-spark","apache-zookeeper","big-data","cassandra","containerization","data-engineering"],"author":{"name":"airscholar","verified":false,"url":"https://github.com/airscholar"},"attribution":{"status":"community_indexed","statusLabel":"Community indexed","shortLabel":"COMMUNITY INDEXED","sourceLabel":"GitHub star discovery","sourceDetail":"airscholar/e2e-data-engineering","creatorName":"airscholar","creatorUrl":"https://github.com/airscholar","sourceUrl":"https://github.com/airscholar/e2e-data-engineering","indexedBy":"OpenAgentSkill community index","claimUrl":"https://www.openagentskill.com/skills/airscholar-e2e-data-engineering#claim-this-skill","claimCta":"Claim this skill","trustNote":"This listing was indexed from public sources and is not marked official until a maintainer claim is approved.","publicNote":"Attribution links to the public repository or creator profile. Creators can claim the listing to update ownership signals."},"stats":{"stars":331,"forks":150,"downloads":0,"rating":0,"review_count":0,"quality_score":36.35},"quality":{"score":46,"tier":"review","label":"Needs review","summary":"Inspect the repository carefully before adding it to an agent workflow.","signals":[{"label":"GitHub stars","value":"331","tone":"neutral"},{"label":"Freshness","value":"1y ago","tone":"warning"},{"label":"Install ready","value":"Yes","tone":"positive"},{"label":"License","value":"Unknown","tone":"neutral"}],"warnings":["Repository looks stale"]},"trust":{"version":"trust-score-v3","score":72,"tier":"strong","label":"Strong shortlist","summary":"Good trust signals with a few areas worth checking before rollout.","recommendedAction":"Test in a sandbox workflow and compare its install path with close alternatives.","dimensions":[{"id":"github_adoption","label":"GitHub adoption","score":62,"weight":0.13,"status":"info","detail":"331 GitHub stars"},{"id":"repo_activity","label":"Stars/forks activity","score":62,"weight":0.08,"status":"info","detail":"331 stars, 150 forks; issue activity unavailable in current metadata"},{"id":"maintenance","label":"Recent maintenance","score":38,"weight":0.14,"status":"fail","detail":"1y since push"},{"id":"license","label":"License clarity","score":42,"weight":0.09,"status":"warn","detail":"Unknown"},{"id":"documentation","label":"README/SKILL.md completeness","score":90,"weight":0.14,"status":"pass","detail":"Metadata includes enough usage and workflow context"},{"id":"dependency_risk","label":"Dependency/runtime risk","score":80,"weight":0.12,"status":"info","detail":"external package install surface"},{"id":"installability","label":"Install availability","score":92,"weight":0.1,"status":"pass","detail":"npx skills add airscholar/e2e-data-engineering"},{"id":"install_safety","label":"Install command safety","score":92,"weight":0.1,"status":"pass","detail":"standard package or runtime install path"},{"id":"permission_surface","label":"Permission surface","score":86,"weight":0.07,"status":"pass","detail":"filesystem or document access"},{"id":"repository","label":"Repository evidence","score":86,"weight":0.04,"status":"pass","detail":"https://github.com/airscholar/e2e-data-engineering"},{"id":"review_status","label":"Review status","score":88,"weight":0.05,"status":"pass","detail":"AI review data available"}],"checks":[{"status":"info","label":"GitHub adoption","detail":"331 GitHub stars"},{"status":"info","label":"Stars/forks activity","detail":"331 stars, 150 forks; issue activity unavailable in current metadata"},{"status":"fail","label":"Recent maintenance","detail":"1y since push"},{"status":"warn","label":"License clarity","detail":"Unknown"},{"status":"pass","label":"README/SKILL.md completeness","detail":"Metadata includes enough usage and workflow context"},{"status":"info","label":"Dependency/runtime risk","detail":"external package install surface"},{"status":"pass","label":"Install availability","detail":"npx skills add airscholar/e2e-data-engineering"},{"status":"pass","label":"Install command safety","detail":"standard package or runtime install path"},{"status":"pass","label":"Permission surface","detail":"filesystem or document access"},{"status":"pass","label":"Repository evidence","detail":"https://github.com/airscholar/e2e-data-engineering"},{"status":"pass","label":"Review status","detail":"AI review data available"},{"status":"warn","label":"Ownership","detail":"No approved owner claim yet"},{"status":"info","label":"OpenAgentSkill usage","detail":"No local usage activity yet"}],"strengths":["AI review approved","Install path is available","Repository evidence is available","Install command has no obvious high-risk pattern"],"warnings":["License is unclear","Repository looks stale","Quality score needs review","Recent maintenance: 1y since push","License clarity: Unknown"],"evidence":{"stars":"331 GitHub stars","repoActivity":"331 stars, 150 forks","lastPushed":"1y since push","license":"Unknown","repository":"https://github.com/airscholar/e2e-data-engineering","install":"npx skills add airscholar/e2e-data-engineering","installSafety":"standard package or runtime install path","permissionSurface":"filesystem or document access","documentation":"Strong README/SKILL.md context"},"installReadiness":{"ready":true,"command":"npx skills add airscholar/e2e-data-engineering","policy":"human_review_before_install","label":"Human review before install","notes":["Install path is available","Repository evidence is available","License is unclear","1y since push"]},"agentCompatibility":["Python","Data Pipeline","Codex","Claude Code","Cursor","OpenAgentSkill CLI"],"riskSummary":{"level":"medium","label":"Review before production","notes":["License is unclear","Repository looks stale","Quality score needs review","Recent maintenance: 1y since push","License clarity: Unknown"]}},"safety":{"score":48,"level":"avoid_auto_install","label":"Avoid automatic install","safety_tier":{"tier":"experimental","label":"Experimental","badge":"EXPERIMENTAL","summary":"Sparse or mixed signals. Useful for discovery, but not for autonomous installation.","recommended_action":"Test manually in an isolated workspace and compare against safer alternatives.","auto_install_policy":"review","reasons":["License is unclear","48/100 agent safety score"]},"auto_install_allowed":false,"human_review_required":true,"blocked":false,"audit_risk":"needs_review","permission_hints":[{"id":"network","label":"Network access","reason":"Skill likely fetches remote pages, APIs, repositories, or external services.","severity":"medium"},{"id":"filesystem","label":"Filesystem access","reason":"Skill may read or write project files, documents, generated artifacts, or local workspace state.","severity":"medium"}],"policy_warnings":["License is unclear"],"constraints_applied":{"max_risk":"medium","needs_install_command":true,"min_stars":0}},"safety_gate":{"tier":"experimental","label":"Experimental","badge":"EXPERIMENTAL","auto_install_policy":"review","auto_install_allowed":false,"human_review_required":true,"blocked":false,"recommended_action":"Test manually in an isolated workspace and compare against safer alternatives.","reasons":["License is unclear","48/100 agent safety score"]},"supply_profile":{"track":{"slug":"data","label":"Data, BI, and analytics","shortLabel":"Data","description":"CSV, SQL, notebooks, dashboards, data pipelines, BI, ETL, and spreadsheet analysis."},"scenario":{"label":"Coding agents","description":"I need a coding agent that can understand a repository, edit code, and review pull requests.","useCases":[{"slug":"coding-agents","title":"Coding agents"},{"slug":"rag-knowledge","title":"RAG and knowledge"},{"slug":"browser-automation","title":"Browser automation"}]},"applicableAgents":["Claude Code","CLI","Codex","Cursor","Python"],"install":{"ready":true,"command":"npx skills add airscholar/e2e-data-engineering","primaryTarget":"CLI","targetCount":4},"githubQuality":{"stars":331,"starsLabel":"331","forks":150,"license":"Unknown","qualityScore":46,"trustScore":72,"auditScore":64},"maintenance":{"status":"stale","label":"1y since push","daysSincePush":488,"lastPushedAt":"2025-02-14T13:17:06+00:00"},"risk":{"level":"needs_review","label":"Needs review","requiresReview":true,"notes":["License is unclear","Repository appears stale","Repository looks stale","Quality score needs review","Recent maintenance: 1y since push"]},"coverageTags":["Data","Coding agents","data-analysis","data-pipeline","automation","apache-airflow","apache-kafka","apache-spark"]},"audit":{"audit_score":64,"risk_level":"needs_review","risk_label":"Needs review","warnings":["License is unclear","Repository appears stale","Repository looks stale","Quality score needs review","Recent maintenance: 1y since push"]},"decision":{"readiness_score":36,"readiness_label":"Needs manual review","headline":"Needs validation for Coding agents","role":"Needs validation","primary_fit":"Coding agents","best_for":["Coding agents workflows","Claude Code teams","builders willing to evaluate younger projects"],"risks":["Repository looks stale","No OpenAgentSkill engagement data yet"],"next_steps":["Install it in a sandbox agent and run one Coding agents task end to end.","Compare output quality, latency, and failure behavior against at least one alternative.","Promote it into production only after reviewing repository permissions, license, and maintenance signals."]},"platforms":["Python","Data Pipeline","Claude Code"],"use_cases":[{"slug":"coding-agents","title":"Coding agents","url":"https://www.openagentskill.com/use-cases/coding-agents"},{"slug":"rag-knowledge","title":"RAG and knowledge","url":"https://www.openagentskill.com/use-cases/rag-knowledge"},{"slug":"browser-automation","title":"Browser automation","url":"https://www.openagentskill.com/use-cases/browser-automation"},{"slug":"workflow-automation","title":"Workflow automation","url":"https://www.openagentskill.com/use-cases/workflow-automation"}],"install":"npx skills add airscholar/e2e-data-engineering","install_targets":[{"id":"openagentskill-cli","label":"CLI","title":"OpenAgentSkill CLI","kind":"command","value":"npx skills add airscholar/e2e-data-engineering","description":"Use the registry command when your workflow supports the OpenAgentSkill installer.","copyLabel":"Copy command"},{"id":"codex","label":"Codex","title":"Codex install prompt","kind":"agent-prompt","value":"Install the \"E2e Data Engineering\" agent skill from https://github.com/airscholar/e2e-data-engineering. Read its SKILL.md or equivalent instructions first, install only the files needed for this workspace, and summarize any required setup before using it. Skill purpose: An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.","description":"Give Codex a repo-aware install prompt when the skill is not available through a local CLI.","copyLabel":"Copy prompt"},{"id":"claude-code","label":"Claude Code","title":"Claude Code skill prompt","kind":"agent-prompt","value":"Add \"E2e Data Engineering\" as a Claude Code skill from https://github.com/airscholar/e2e-data-engineering. Inspect the skill instructions, place the reusable skill files in the appropriate local skills location for this project, and report the activation steps. Skill purpose: An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.","description":"Use this prompt to ask Claude Code to add the skill and explain the local activation steps.","copyLabel":"Copy prompt"},{"id":"cursor","label":"Cursor","title":"Cursor rule prompt","kind":"agent-prompt","value":"Turn \"E2e Data Engineering\" from https://github.com/airscholar/e2e-data-engineering into a reusable Cursor project rule or agent instruction. Preserve the core workflow, adapt paths to this repo, and keep the rule scoped to tasks where it is relevant. Skill purpose: An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.","description":"Use this when installing as Cursor project rules or reusable agent instructions.","copyLabel":"Copy prompt"}],"repository":"https://github.com/airscholar/e2e-data-engineering","github_repo":"airscholar/e2e-data-engineering","version":"1.0.0","license":"Unknown","updated_at":"2026-06-16T09:53:13.996707+00:00","canonical_key":"airscholar/e2e-data-engineering","recommendation_reasons":["Install handoff is available","Repository freshness signal is available"],"urls":{"web":"https://www.openagentskill.com/skills/airscholar-e2e-data-engineering","api":"https://www.openagentskill.com/api/agent/skills/airscholar-e2e-data-engineering","install_api":"https://www.openagentskill.com/api/skills/airscholar-e2e-data-engineering/install","audit":"https://www.openagentskill.com/skills/airscholar-e2e-data-engineering/audit","repository":"https://github.com/airscholar/e2e-data-engineering"},"meta":{"endpoint":"/api/registry/manifest/{slug}","canonical_agent_endpoint":"/api/agent/skills/airscholar-e2e-data-engineering","agent_friendly":true,"api_version":"1.0","generated_at":"2026-06-17T18:48:45.364Z"}}