Tirsus Online Magazin

Optimizing Chatbot Answers: Chunking, Fine-Tuning, and Hybrid Retrieval for RAG

In RAG, retrieval is a decisive factor for good chatbot answers. Many factors influence retrieval quality. The article examines the most important ones.

Retrieval Augmented Generation – Proof of Concept

Retrieval Augmented Generation works! The article provides a proof of concept, using an example with several open-source models.

Summarizing Texts Automatically with LangChain

LangChain summarizes even long texts with ease. With one trick, it also works in German.

Semantic Search and Distance Metrics

Today, semantic search relies on vector embeddings. The article shows how the choice of metric poses a challenge in vector search.

Chain of Thought Prompt Pattern

The chain-of-thought prompt pattern gives us insight into how a large language model (LLM) arrives at its conclusion.

Prompts and GPTs – Programming Without a Programming Language

A look into the future: building apps without a programming language in the OpenAI GPT Store.

Sentence Embeddings for Vector Databases

Semantic retrieval is based on vectors and vector databases. Its most prominent application is RAG (Retrieval Augmented Generation).
How do you turn a text into a vector? This article examines several options.

Prompt Engineering: Personas and Repetition

With skillfully written prompts, we can give a chatbot the same task repeatedly. Personas ensure the desired writing style.

How LLMs are Transforming Enterprise Search

Exploring the shift from traditional enterprise search to AI-driven conversational systems with ChatGPT and Retrieval Augmented Generation (RAG).

LSM-Trees

Log-structured merge (LSM) trees are an innovative way of organizing and storing data that is particularly well suited to write-intensive scenarios such as distributed OLTP databases.

B-Trees as a Database Index

B-trees are a central component for optimization in many database systems.

How Culture Influences Your Agile Effort

This article identifies different types of team culture and explains their impact on agile efforts.

Connecting Virtual Machines – Tutorial

Build your own laptop cloud and dive into the world of distributed computing.

Data in the Cloud – the Trend Becomes a Pull

The cloud is tempting for large volumes of data. The trend does not stop at data engineering. There is a lot to consider and to plan for the long term.

Building an Enterprise Search

Building an enterprise search platform is more of a data integration project. An affinity for natural language and search engine technologies is required.

We Agile Developers Lost our Customers

It’s convenient to think up our own requirements. Four misconceptions about the customers of agile developers.

Does Surveillance Increase Remote Employee Productivity?

Employee surveillance is on the rise, as the BBC reported. The article noted that “More than half of companies with over $750m (£574m) in annual revenue used ‘non-traditional’ monitoring techniques on staff…”

ChatGPT for Agile Software Development

Tools like ChatGPT might be the next boost in agile software development, provided certain rules are observed.

The Data Lakehouse – the Next Evolutionary Stage

Data lakehouses open up unexpected possibilities for data storage and data analysis in the cloud, even for very large volumes and in real time.

Swapping Variables in Python and Java

The values of two variables are to be swapped. One recipe works in every programming language. Python offers a particularly simple option.
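
The teaser does not spell out the two approaches, so the following is only a minimal sketch of what they probably look like: a swap through a temporary variable (the recipe that carries over to other languages) and Python's tuple unpacking. The function names `swap_with_temp` and `swap_with_tuple` are illustrative and not taken from the article.

```python
def swap_with_temp(a, b):
    """Swap via a temporary variable; this recipe carries over to most languages."""
    temp = a
    a = b
    b = temp
    return a, b


def swap_with_tuple(a, b):
    """Swap via Python's tuple packing and unpacking, without a helper variable."""
    a, b = b, a
    return a, b


x, y = 1, 2
x, y = swap_with_tuple(x, y)
print(x, y)  # 2 1
```

Both functions return the same result; the tuple form simply removes the need for the helper variable.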

Are machine learning systems too biased for Agile teams?

Are machine learning systems too biased to be helpful in selecting good candidates for your Agile team? Reflections on real needs and bias.

The CAP Theorem

Anyone who deals with processing and analyzing large volumes of data (aka big data) is confronted with the CAP theorem every day.

Data Engineering Lifecycle

Data is the gold of the 21st century. But it is the data engineer who makes data analysis possible, and with it the mining of that data gold.

Documenting Program Code Efficiently

Carefully chosen, descriptive identifiers are an important element of documentation and provide context for the program code.

Truth Tables and Boolean Logic

The theoretical background of truth tables and Boolean logic, plus practical tips for applying them in programming.

Ordering Guarantee in Apache Kafka

This blog post shows the limitations of guaranteed ordering in event hubs in real-time big data.

Interesting for You

Python One-Liners with for and if

One-liners with for and if reduce the number of statements. The blog post explains them with examples.
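
As a rough illustration of the idea (the list `numbers` and the squaring task are assumptions chosen for this sketch, not the article's own example), a loop with a condition can be collapsed into a single list comprehension:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Loop version: collect the squares of the even numbers.
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n * n)

# One-liner: the same for and if, written as a list comprehension.
squares_one_liner = [n * n for n in numbers if n % 2 == 0]

print(squares_one_liner)  # [4, 16, 36]
```

The comprehension replaces four statements with one while producing the same list.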

Is My Code Correct?

Every programming task has several correct solutions. How can beginners be sure that their solution is correct? The article presents quality criteria for good online training for beginners.

MapReduce – Functional Programming for Big Data Analysis

From functional programming with map and reduce in Python, through MapReduce, to MapReduce for big data analysis with SQL: the article explains it all with examples.
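
A minimal sketch of the functional building blocks mentioned here, using Python's built-in `map` and `functools.reduce`. The word list and the length-summing task are assumptions made for illustration; they are not the article's example and say nothing about distributed MapReduce itself.

```python
from functools import reduce

words = ["big", "data", "analysis"]

# Map step: transform each element independently of the others.
lengths = list(map(len, words))

# Reduce step: fold the mapped values into a single result.
total = reduce(lambda acc, n: acc + n, lengths, 0)

print(lengths, total)  # [3, 4, 8] 15
```

Because the map step treats every element independently, it is the part that can be spread across machines; the reduce step then combines the partial results.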

Time in Big Data Stream Processing

Where is data stream processing used? What infrastructure does it require, and which tools exist? This article presents some fundamental challenges and concepts.

Unit Tests in the Programming Course

Be sure that your code not only runs, but also runs correctly. Unit tests help, even in a programming course.

The Assignment Operator

The assignment operator, in many programming languages the equals sign, confuses many programming beginners because it means something different from the equals sign in mathematics.

3 Things To Keep in Mind When Working Remotely as a Team

How many so-called online teams are just a bunch of individuals, each working on their own tasks? In your opinion, what are the three most important things for working together successfully online?

Long-Time Favorites

Defining and Initializing Python Lists

Big Data Training with Minimal Infrastructure

What is the minimum infrastructure needed to become familiar with big data technologies? This article gives answers in the form of an FAQ, with particular attention to distributed computing and horizontal scalability.

Streaming with Window Operations in Apache Spark

The APIs for big data stream analytics are getting simpler and simpler. Real-time analyses are even possible with SQL, using window operations. With the DataFrames of Apache Spark Structured Streaming, these are quick to write.

Evaluation: When Is Big Data Worth It?

The question of when big data pays off needs a nuanced answer. The blog post presents a set of basic evaluation criteria for deciding whether it is worthwhile to rely on horizontally scalable tools, which do not only work for very large volumes of data.

10 Tools for Real-Time Analysis of Apache Kafka Topics

The data in Apache Kafka topics holds a treasure trove of information. The blog post presents 10 real-time analytics tools.

Big Data – Definition for the 2020s

The article examines specific challenges of real-time analysis in big data stream processing, particularly with regard to the factor of time.

Leader Election Using Apache ZooKeeper as an Example

Apache ZooKeeper is a battle-tested coordination service for distributed computer systems. ZooKeeper is used in a wide variety of systems. As a service for services, it stays out of sight.

Big Data as a Programming Paradigm

The siren call of innovative big data technologies is loud. Cool APIs appear easy to use. The blog post draws comparisons between big data technologies and conventional programming paradigms.

Big Data Training with Raspberry Pi

It is quite astonishing that big data technologies also work on tiny machines like the Raspberry Pi. Having always worked with well-equipped computers, I was tempted by the experiment of getting the big data software to run under minimal conditions.

The result is astounding: the latency is much lower than originally expected. And this is how the first experiment works.

Big Data Training in the Laptop Cloud with VirtualBox

Do you need more infrastructure than a good laptop to become familiar with big data technologies? This article gives answers in the form of an FAQ, with particular attention to distributed computing and horizontal scalability.

Working Successfully in Virtual Teams

Non-team-players are better at virtual work. Do you agree with this statement? Are virtual workers rather introverts, avoiding contact with other people? If this is true, can virtual teams ever be successful?

Trade-Offs in Virtual Teams

Raise your hand if you have never worked in a virtual team! Which one is your category? This blog post summarizes research on virtual teams in pre-Corona times.

EBook Tutorial: Cluster of Virtual Machines
