Data Lineage Tracking for AI: Complete Guide | Quiz
By Eyal Doron / December 6, 2025 / 1 minute of reading

1. Why is the misconception that "we can add lineage later" dangerous?
   1. Lineage can easily be added at any time
   2. You cannot reconstruct transformation history from final outputs, so lineage must be built from the start
   3. Final outputs contain all transformation history
   4. Retrofitting lineage takes only a few hours
Correct answer: 2
Why: Retrofitting lineage is extremely difficult because you cannot reconstruct transformation history from final outputs. The article advises building lineage tracking from the start.
Context: This is one of four common misconceptions the article addresses.
Remember: Cannot reconstruct history from outputs.

2. What does the EU AI Act require regarding training data according to the article?
   1. Documentation is optional for all risk levels
   2. No documentation is required for any AI systems
   3. Training data documentation for high-risk systems and demonstrable traceability requirements
   4. Only the model output needs to be documented
Correct answer: 3
Why: The EU AI Act requires training data documentation for high-risk AI systems, demonstrating what data trained the model and its characteristics, plus traceability requirements.
Context: Lineage is the technical foundation for meeting these regulatory requirements.
Remember: Document training data plus demonstrate traceability.

3. How does lineage support the GDPR right to erasure according to the article?
   1. Erasure only requires deleting the original source data
   2. GDPR does not apply to AI training data
   3. Lineage shows which models were trained on a person's data, enabling accurate deletion compliance
   4. Lineage automatically deletes data when requested
Correct answer: 3
Why: If someone requests deletion, you need to know which models were trained on their data. Lineage answers this question; without it you cannot comply accurately.
Context: Right to erasure creates complex challenges for AI that only lineage can address.
Remember: Deletion requests require knowing which models used the data.

4. Why is transformation code versioning essential according to the article?
   1. Capturing the Git hash lets you know exactly which code version processed the data
   2. It is only needed for compliance audits
   3. It makes the code run faster
   4. It reduces storage costs
Correct answer: 1
Why: Capturing the Git hash of the cleaning script lets you know exactly which code version processed the data, enabling reproducibility.
Context: This is part of documenting every transformation applied to raw data during preparation.
Remember: Git hash equals reproducible transformations.

5. What metadata should be captured during the data collection stage?
   1. Only metadata required by the AI model
   2. Just the database connection string
   3. Source system identification, collection timestamps, and consent and permission metadata
   4. Only the file size and format
Correct answer: 3
Why: The article specifies capturing source system identification (which database or API), collection timestamps (when data was extracted), and consent and permission metadata (legal basis for use).
Context: This metadata becomes critical for GDPR compliance.
Remember: Source, Timestamp, Consent.

6. Why does feature engineering obscure data origins according to the article?
   1. Feature engineering deletes the original data
   2. Engineering transforms data into unreadable formats
   3. Derived features like ratios and aggregations create indirect connections to dozens of underlying data points
   4. Features are stored in different databases than source data
Correct answer: 3
Why: When you derive new features such as ratios, aggregations, and embeddings, the connection to the original data becomes indirect. A customer_risk_score might derive from dozens of underlying data points.
Context: This is one of several factors that make AI lineage harder than traditional data lineage.
Remember: Derived features hide their sources.

7. What three critical questions does lineage answer according to the article?
   1. What data, what transformations, what model version
   2. Who accessed, when accessed, why accessed
   3. Where stored, when backed up, who owns it
   4. How much, how fast, how accurate
Correct answer: 1
Why: The article states that lineage answers what data, what transformations, and what model version. If you cannot answer all three, you have a lineage gap.
Context: These questions form the foundation of traceability from prediction back to source.
Remember: What data, What transformations, What model version.

8. According to the article, what analogy best describes data lineage for AI?
   1. A backup system that stores copies of all data
   2. A family tree for your data showing origin, transformations, and destination
   3. A firewall that protects data from unauthorized access
   4. An encryption system that secures data at rest
Correct answer: 2
Why: The article describes data lineage as a family tree for your data, showing where data came from, what happened to it along the way, and where it ended up.
Context: This is also compared to chain-of-custody for your AI pipeline, documenting every transformation.
Remember: Family tree plus chain-of-custody for data.
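
Several of the quiz answers describe things a lineage record has to hold: source, timestamp, and consent metadata from collection; the Git hash of the transformation code; and a link from each model version back to the people whose data it used. The Python sketch below is one way to picture that, not the article's implementation. Every class, function, and field name (SourceRecord, Transformation, ModelLineage, models_trained_on, subject_id, and so on) is hypothetical, and all sample values are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceRecord:
    """Collection-stage metadata: source, timestamp, consent."""
    subject_id: str          # hypothetical identifier for the data subject
    source_system: str       # which database or API the data came from
    collected_at: datetime   # when the data was extracted
    legal_basis: str         # consent and permission metadata


@dataclass(frozen=True)
class Transformation:
    """One preparation step, pinned to the code version that ran it."""
    name: str
    git_hash: str            # e.g. the output of `git rev-parse HEAD`


@dataclass
class ModelLineage:
    """Links a model version to its data and transformations."""
    model_version: str
    sources: list[SourceRecord] = field(default_factory=list)
    transformations: list[Transformation] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Return which of the three lineage questions cannot be answered."""
        missing = []
        if not self.sources:
            missing.append("what data")
        if not self.transformations:
            missing.append("what transformations")
        if not self.model_version:
            missing.append("what model version")
        return missing


def models_trained_on(subject_id: str, registry: list[ModelLineage]) -> list[str]:
    """Answer the erasure question: which models used this person's data?"""
    return [
        m.model_version
        for m in registry
        if any(s.subject_id == subject_id for s in m.sources)
    ]


# Illustrative data only; every value below is made up.
alice = SourceRecord("subject-001", "crm_db",
                     datetime(2025, 1, 15, tzinfo=timezone.utc), "consent")
bob = SourceRecord("subject-002", "billing_api",
                   datetime(2025, 2, 1, tzinfo=timezone.utc), "contract")
clean = Transformation("clean_records", "3f2a9c1")

registry = [
    ModelLineage("risk-model-v1", [alice, bob], [clean]),
    ModelLineage("risk-model-v2", [bob], [clean]),
    ModelLineage("churn-model-v1", [alice], []),  # no transformations recorded
]

print(models_trained_on("subject-001", registry))  # ['risk-model-v1', 'churn-model-v1']
print(registry[2].gaps())                          # ['what transformations']
```

In this sketch the Git hash is passed in as a plain string so the example runs without a repository. In a real pipeline it would be captured when the transformation runs, since it cannot be recovered from the output afterwards.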