Data Lineage Tracking for AI: Complete Guide | Quiz
By Eyal Doron / December 6, 2025 / 1 minute of reading

1. Why does the article say lineage adds minimal overhead despite concerns?
   1. Lineage requires no resources at all
   2. Async metadata capture and proper tooling minimize impact, while the cost of missing lineage far exceeds implementation
   3. Overhead concerns only apply to real-time systems
   4. Only large enterprises need to worry about overhead
Correct answer: 2.
Why: With async metadata capture and proper tooling, lineage adds minimal performance impact. The cost of missing lineage during an incident or audit far exceeds the implementation overhead.
Context: This addresses the misconception that lineage adds too much overhead.
Remember: Async capture plus proper tooling equals minimal impact.

2. Why is the misconception that we can add lineage later dangerous?
   1. Lineage can easily be added at any time
   2. You cannot reconstruct transformation history from final outputs, so lineage must be built from the start
   3. Final outputs contain all transformation history
   4. Retrofitting lineage takes only a few hours
Correct answer: 2.
Why: Retrofitting lineage is extremely difficult because you cannot reconstruct transformation history from final outputs. The article advises building lineage tracking from the start.
Context: This is one of four common misconceptions the article addresses.
Remember: Cannot reconstruct history from outputs.

3. What does the EU AI Act require regarding training data according to the article?
   1. Training data documentation for high-risk systems and demonstrable traceability requirements
   2. Documentation is optional for all risk levels
   3. No documentation is required for any AI systems
   4. Only the model output needs to be documented
Correct answer: 1.
Why: The EU AI Act requires training data documentation for high-risk AI systems, demonstrating what data trained the model and its characteristics, plus traceability requirements.
Context: Lineage is the technical foundation for meeting these regulatory requirements.
Remember: Document training data plus demonstrate traceability.

4. How does lineage support GDPR right to erasure according to the article?
   1. Erasure only requires deleting the original source data
   2. Lineage shows which models were trained on a person's data, enabling accurate deletion compliance
   3. Lineage automatically deletes data when requested
   4. GDPR does not apply to AI training data
Correct answer: 2.
Why: If someone requests deletion, you need to know which models were trained on their data. Lineage answers this question, and without it you cannot comply accurately.
Context: Right to erasure creates complex challenges for AI that only lineage can address.
Remember: Deletion requests require knowing which models used the data.

5. What is the critical link for backward lineage according to the article?
   1. Database foreign keys
   2. Network connection between servers
   3. Model-to-data linkage connecting each trained model to its training dataset versions
   4. API authentication tokens
Correct answer: 3.
Why: Model-to-data linkage explicitly connects each trained model to its training dataset versions. Without it you cannot trace a prediction back to its training data.
Context: Dataset version identification assigns unique identifiers to training data snapshots.
Remember: No model-to-data link equals no backward traceability.

6. What metadata should be captured during the data collection stage?
   1. Only metadata required by the AI model
   2. Source system identification, collection timestamps, and consent and permission metadata
   3. Just the database connection string
   4. Only the file size and format
Correct answer: 2.
Why: The article specifies capturing source system identification (which database or API), collection timestamps (when data was extracted), and consent and permission metadata (legal basis for use).
Context: This metadata becomes critical for GDPR compliance.
Remember: Source, Timestamp, Consent.

7. Why does feature engineering obscure data origins according to the article?
   1. Features are stored in different databases than source data
   2. Feature engineering deletes the original data
   3. Derived features like ratios and aggregations create indirect connections to dozens of underlying data points
   4. Engineering transforms data into unreadable formats
Correct answer: 3.
Why: When you derive new features such as ratios, aggregations, and embeddings, the connection to the original data becomes indirect. A customer_risk_score might derive from dozens of underlying data points.
Context: This is one of several factors that make AI lineage harder than traditional data lineage.
Remember: Derived features hide their sources.

8. What three critical questions does lineage answer according to the article?
   1. How much, how fast, how accurate
   2. Who accessed, when accessed, why accessed
   3. What data, what transformations, what model version
   4. Where stored, when backed up, who owns it
Correct answer: 3.
Why: The article states that lineage answers three questions: what data, what transformations, and what model version. If you cannot answer all three, you have a lineage gap.
Context: These questions form the foundation of traceability from prediction back to source.
Remember: What data, What transformations, What model version.
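
The collection metadata from question 6, the model-to-data linkage from question 5, and the erasure lookup from question 4 can be sketched together in a few lines of Python. This is only an illustrative in-memory sketch, not the article's implementation: every class, field, and identifier below (SourceRecord, DatasetVersion, LineageRegistry, "crm_db", "risk-model-v3", the subject IDs) is a hypothetical name chosen for the example, and a real system would persist this metadata in a lineage store rather than in dictionaries.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib


@dataclass(frozen=True)
class SourceRecord:
    source_system: str      # which database or API the data came from
    collected_at: datetime  # when the data was extracted
    legal_basis: str        # consent / permission metadata
    subject_ids: frozenset  # people whose data is in this extract


@dataclass(frozen=True)
class DatasetVersion:
    version_id: str         # unique identifier for this training snapshot
    sources: tuple          # SourceRecord entries feeding the snapshot
    transformations: tuple  # ordered names of the steps applied


@dataclass
class LineageRegistry:
    datasets: dict = field(default_factory=dict)           # version_id -> DatasetVersion
    model_to_datasets: dict = field(default_factory=dict)  # model version -> [version_id]

    def register_dataset(self, sources, transformations):
        # Assumption: the ID is derived from lineage metadata only, not from
        # the data content itself, to keep the sketch short.
        parts = [f"{s.source_system}@{s.collected_at.isoformat()}" for s in sources]
        parts += list(transformations)
        version_id = hashlib.sha256("|".join(parts).encode()).hexdigest()[:12]
        version = DatasetVersion(version_id, tuple(sources), tuple(transformations))
        self.datasets[version_id] = version
        return version

    def link_model(self, model_version, dataset_version_id):
        # The model-to-data linkage: refuse links to unknown snapshots.
        if dataset_version_id not in self.datasets:
            raise KeyError(f"unknown dataset version: {dataset_version_id}")
        self.model_to_datasets.setdefault(model_version, []).append(dataset_version_id)

    def trace_back(self, model_version):
        # Backward lineage: model version -> dataset versions it was trained on.
        return [self.datasets[v] for v in self.model_to_datasets.get(model_version, [])]

    def models_trained_on(self, subject_id):
        # Erasure lookup: which model versions used this person's data?
        return sorted(
            model
            for model, version_ids in self.model_to_datasets.items()
            if any(
                subject_id in src.subject_ids
                for v in version_ids
                for src in self.datasets[v].sources
            )
        )


# Hypothetical usage: one CRM extract feeds one snapshot, which trains one model.
registry = LineageRegistry()
crm = SourceRecord(
    source_system="crm_db",
    collected_at=datetime(2025, 1, 15, tzinfo=timezone.utc),
    legal_basis="consent",
    subject_ids=frozenset({"user-17", "user-42"}),
)
snapshot = registry.register_dataset([crm], ["deduplicate", "compute_customer_risk_score"])
registry.link_model("risk-model-v3", snapshot.version_id)
```

With this in place, registry.trace_back("risk-model-v3") returns the snapshot along with its sources and transformation steps, and registry.models_trained_on("user-42") lists the model versions affected by a deletion request. Without the link_model step neither lookup can return anything, which is the lineage gap the quiz describes.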