Offer Description
Dissertation Framework
This dissertation is situated within the urgent context of modernizing textile manufacturing processes, a sector critical to the global economy, yet traditionally slow to adopt data-driven methodologies. Textile defect detection plays a pivotal role in reducing waste, increasing production efficiency, and ensuring high product quality. Smartex, as a leader in deploying artificial intelligence and computer vision technologies to address these challenges, offers an ideal environment to explore advanced solutions in visual data management and annotation.
The relevance of this topic is twofold. First, as computer vision models proliferate in industrial applications, the bottleneck of high-quality labeled data becomes increasingly pronounced; improving labeling workflows can have a substantial impact on both data quality and model performance. Second, scalable image similarity search tools remain underexplored in this specific industrial context, particularly systems tailored for large, ever-expanding repositories of manufacturing imagery. Addressing these gaps promises not only operational gains for Smartex but also methodological advances applicable across Industry 4.0.
The main objective of this work is to design and prototype an efficient, scalable image similarity search tool that directly integrates with Smartex’s data labeling pipeline. This tool aims to accelerate the data annotation process, enhance the consistency and quality of labeled datasets, and facilitate advanced workflows such as active learning by allowing rapid retrieval of visually similar cases, including rare or edge-case defects.
Data availability is robust for this project: Smartex possesses large-scale, real-world image datasets sourced from continuous monitoring of textile production lines. This data, encompassing a wide array of defect types and production scenarios, enables thorough experimentation and evaluation of the proposed similarity search systems within realistic operational constraints.
Internship Plan
Month 1: Orientation & Research
– Onboarding: Understand Smartex’s vision, existing image handling infrastructure, and data labeling workflow.
– Technology Familiarization: Survey current internal tools, evaluate available image datasets.
– Background Reading: Study relevant literature and technology for image retrieval and embeddings.
Month 2: Design & Prototyping
– Requirement Gathering: Meet with Smartex’s data and labeling teams to clarify needs.
– System Architecture: Draft data flow and overall system design.
– Embedding Research: Evaluate multiple embedding models and vector search backends (e.g., FAISS as an indexing library; Pinecone and Milvus as vector databases).
Month 3: Ingestion & Embedding Pipelines
– Image Ingestion Development: Build automated ingestion pipeline for image data.
– Embedding Extraction: Integrate, test, and compare multiple embedding strategies; set up automated extraction and storage.
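One way to keep the comparison of embedding strategies manageable is to hide each strategy behind a common interface, so the rest of the pipeline does not change when a model is swapped. The following is a minimal sketch under stated assumptions, not Smartex's implementation: images are represented as flat lists of grayscale pixel values, and the two embedders (`histogram_embedder`, `mean_std_embedder`), the `EMBEDDERS` registry, and `extract_all` are hypothetical placeholders standing in for real learned models.

```python
import math
from typing import Callable, Dict, List, Sequence

# Assumption: an "image" is a non-empty flat sequence of grayscale pixels in [0, 255].
Image = Sequence[int]
Embedder = Callable[[Image], List[float]]


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def histogram_embedder(image: Image, bins: int = 4) -> List[float]:
    """Toy strategy: normalized intensity histogram with `bins` buckets."""
    counts = [0.0] * bins
    for pixel in image:
        counts[min(pixel * bins // 256, bins - 1)] += 1.0
    return l2_normalize(counts)


def mean_std_embedder(image: Image) -> List[float]:
    """Toy strategy: normalized (mean, standard deviation) of pixel intensity."""
    n = len(image)
    mean = sum(image) / n
    variance = sum((p - mean) ** 2 for p in image) / n
    return l2_normalize([mean / 255.0, math.sqrt(variance) / 255.0])


# Hypothetical registry: a real model would be added here under a new name.
EMBEDDERS: Dict[str, Embedder] = {
    "histogram": histogram_embedder,
    "mean_std": mean_std_embedder,
}


def extract_all(images: Dict[str, Image], strategy: str) -> Dict[str, List[float]]:
    """Embed every image with the chosen strategy, keyed by image id."""
    embed = EMBEDDERS[strategy]
    return {image_id: embed(pixels) for image_id, pixels in images.items()}
```

With this shape, comparing strategies amounts to calling `extract_all` once per registry key on the same image set and storing each result separately.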
Month 4: Vector DB Integration & Search Prototyping
– Vector DB Deployment: Choose and set up the vector database, and index the image embeddings.
– Search Implementation: Develop and optimize core similarity search functionality.
– UI/API Prototyping: Develop initial endpoints or user interfaces for internal testers.
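Before committing to a specific vector database, an exact brute-force search can serve as a correctness baseline for the core similarity functionality. The sketch below assumes embeddings are already available as plain lists of floats keyed by image id; `cosine_similarity`, `top_k_similar`, and the example ids are illustrative names, and a production system would replace the linear scan with an approximate index.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Exact search: score every stored vector and keep the k most similar."""
    scored = (
        (cosine_similarity(query, vec), image_id)
        for image_id, vec in index.items()
    )
    best = heapq.nlargest(k, scored)  # sorted by score, highest first
    return [(image_id, score) for score, image_id in best]


# Hypothetical usage with made-up ids and 2-D embeddings.
example_index = {
    "hole_01": [1.0, 0.0],
    "hole_02": [0.9, 0.1],
    "stain_01": [0.0, 1.0],
}
example_hits = top_k_similar([1.0, 0.0], example_index, k=2)
```

The linear scan costs one similarity computation per stored image, which is why it is suited to validation and small datasets rather than the full production repository.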
Month 5: System Iteration & User Testing
– Scaling & Optimization: Harden ingestion and search for production data scales.
– User Trials: Onboard data labeling team for user feedback, iterate as needed.
– Performance Benchmarks: Empirically evaluate system performance and labeler productivity.
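One metric that could be used for the search-quality part of these benchmarks is recall@k: the fraction of the true top-k neighbors (from an exact search) that an approximate index also returns. This is a suggested metric rather than one named in the plan, and the function names below are hypothetical; the sketch assumes both result lists are ranked image ids.

```python
from typing import Dict, List


def recall_at_k(approx: List[str], exact: List[str], k: int) -> float:
    """Share of the exact top-k ids that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact[:k])
    if not truth:
        return 0.0
    return len(truth & set(approx[:k])) / len(truth)


def mean_recall_at_k(
    approx_results: Dict[str, List[str]],
    exact_results: Dict[str, List[str]],
    k: int,
) -> float:
    """Average recall@k over all queries present in `exact_results`."""
    scores = [
        recall_at_k(approx_results.get(query_id, []), exact_ids, k)
        for query_id, exact_ids in exact_results.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Labeler productivity would need a separate measurement, for example annotations completed per hour with and without the tool, which this sketch does not cover.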
Month 6: Documentation & Knowledge Transfer
– Documentation: Write up the system architecture, usage guides, and maintenance protocols.
– Dissertation Writing: Complete thesis chapters and assemble deliverables.
– Presentation & Handover: Present to Smartex stakeholders; assist in planning production roll-out if appropriate.
Expected Main Goals
– Build a powerful image similarity search tool
– Streamline and accelerate data labeling
– Integrate robustly with a vector database
– Support multiple, interchangeable embedding models
– Improve label quality and consistency
– Seamless integration with Smartex workflow
– Advance industry knowledge
– Foster a data-centric culture