<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Analytics Archives - Shubham Sonar</title>
	<atom:link href="https://shubhamsonar.com/category/data-analytics/feed/" rel="self" type="application/rss+xml" />
	<link>https://shubhamsonar.com/category/data-analytics/</link>
	<description>11x certified Salesforce System Architect, Developer and an independent Appexchange ISV partner.</description>
	<lastBuildDate>Sun, 15 Mar 2026 21:07:30 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://shubhamsonar.com/wp-content/uploads/2025/12/cropped-siteIcon-32x32.png</url>
	<title>Data Analytics Archives - Shubham Sonar</title>
	<link>https://shubhamsonar.com/category/data-analytics/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Google Search &#8211; &#8220;NO&#8221; AI MODE</title>
		<link>https://shubhamsonar.com/google-search-no-ai-mode/</link>
					<comments>https://shubhamsonar.com/google-search-no-ai-mode/#respond</comments>
		
		<dc:creator><![CDATA[Shubham]]></dc:creator>
		<pubDate>Sun, 15 Mar 2026 20:54:14 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[Data Analytics]]></category>
		<category><![CDATA[#documentation]]></category>
		<category><![CDATA[#tutorial]]></category>
		<guid isPermaLink="false">https://shubhamsonar.com/?p=4468</guid>

					<description><![CDATA[<p>I was trying to learn more about text-to-speech (TTS) processes and the solutions available on the internet, so as usual I started my research with a regular Google search. As always, it was suffocating to see that most of the results were about the LLM space. I get it: this happens because these topics are trending. But as [&#8230;]</p>
<p>The post <a href="https://shubhamsonar.com/google-search-no-ai-mode/">Google Search &#8211; &#8220;NO&#8221; AI MODE</a> appeared first on <a href="https://shubhamsonar.com">Shubham Sonar</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-group alignwide is-layout-flow wp-block-group-is-layout-flow">
<p>I was trying to learn more about text-to-speech (TTS) processes and the solutions available on the internet, so as usual I started my research with a regular Google search. As always, it was suffocating to see that most of the results were about the LLM space.</p>



<p>I get it: this happens because these topics are trending. But as a user I find it super frustrating. I did not want results about AI models when my search query was simply TTS.</p>



<p>To validate this, I asked Google AI Mode whether &#8220;TTS&#8221; can only be implemented using AI models. The answer, for sure: <strong>NO</strong></p>



<p>I see this as a problem. Maybe you are like me and want to consume AI content only when you are specifically searching for it. So here is a Google &#8220;NO AI MODE&#8221; search query template, which I crafted with the help of Gemini [the right use of AI]:</p>



<p>To use this, simply replace <strong>[YOUR SEARCH KEYWORDS HERE] </strong>with whatever you are actually looking for, and paste the entire block into Google Search:</p>



<p class="is-style-default has-text-color has-link-color has-small-font-size wp-elements-085c704360ed4efb00bdd23a5a28184b" style="color:#636363"><code><strong>[YOUR SEARCH KEYWORDS HERE]</strong> -ai -"artificial intelligence" -llm -"large language model" -chatgpt -"chat gpt" -openai -claude -anthropic -gemini -bard -copilot -llama -huggingface -midjourney -"stable diffusion" -dall-e -dalle -deepseek -grok -mistral -cursor -"vibe coding" -"prompt engineering" -genai -"generative ai" -"machine learning" -"neural network" -"ai agent" -agentic -langchain -llamaindex -"rag" -"retrieval augmented generation"</code></p>
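<p>If you would rather generate the block than edit it by hand, the template can be sketched in a few lines of Python. This is only a minimal sketch: the function name <strong>build_no_ai_query</strong> and the shortened exclusion list are my own illustrative assumptions, not part of any Google tooling.</p>

```python
def build_no_ai_query(keywords, excluded_terms):
    """Return the keywords followed by one -term exclusion per excluded term."""
    parts = [keywords.strip()]
    for term in excluded_terms:
        term = term.strip().strip('"')
        if not term:
            continue
        # Multi-word phrases are quoted so the whole phrase is excluded.
        if " " in term:
            parts.append(f'-"{term}"')
        else:
            parts.append(f"-{term}")
    return " ".join(parts)


# A short, illustrative subset of the exclusion list from the template above.
EXCLUDED = ["ai", "llm", "chatgpt", "large language model", "generative ai"]

print(build_no_ai_query("offline tts engine", EXCLUDED))
# offline tts engine -ai -llm -chatgpt -"large language model" -"generative ai"
```

<p>Paste the printed line into Google Search exactly as you would paste the hand-written template.</p>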



<p>Further, I have broken the full exclusion list down into <strong>modular templates</strong>. Pick the template that best matches the <em>type</em> of search you are doing.</p>



<ol class="wp-block-list">
<li><strong>The &#8220;Dev &amp; Coding&#8221; Purge</strong>
<ul class="wp-block-list">
<li><code>[YOUR SEARCH] -ai -llm -chatgpt -cursor -windsurf -aider -copilot -devin -roocode -zed -"vibe coding" -"claude code" -agentic -"prompt engineering" -replit</code></li>
</ul>
</li>



<li><strong>The &#8220;Modern AI Models &amp; Companies&#8221; Purge</strong>
<ul class="wp-block-list">
<li><code>[YOUR SEARCH] -ai -"artificial intelligence" -llm -openai -chatgpt -gpt -anthropic -claude -google -gemini -meta -llama -xai -grok -deepseek -mistral -qwen -genai</code></li>
</ul>
</li>



<li><strong>The &#8220;2025/2026 Tech Jargon &amp; Hype&#8221; Purge</strong>
<ul class="wp-block-list">
<li><code>[YOUR SEARCH] -ai -llm -genai -"generative ai" -agentic -slop -workslop -promptslop -"reasoning model" -lrm -"world model" -"machine learning" -"neural network" -rag -geo</code></li>
</ul>
</li>



<li><strong>The &#8220;Creative &amp; Visual Art&#8221; Purge</strong>
<ul class="wp-block-list">
<li><code>[YOUR SEARCH] -ai -generated -midjourney -"stable diffusion" -dalle -dall-e -runway -sora -pika -prompts -prompting -synth -"generative fill"</code></li>
</ul>
</li>
</ol>
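<p>The modular templates above can also be combined programmatically. The sketch below is a rough illustration only: the <strong>TEMPLATES</strong> dictionary holds shortened versions of two of the lists, and the names <strong>exclusions</strong> and <strong>search_url</strong> are hypothetical helpers I made up for this example. It merges the chosen templates without duplicates and URL-encodes the result into a Google Search link.</p>

```python
from urllib.parse import urlencode

# Shortened, illustrative versions of two of the modular templates above.
TEMPLATES = {
    "dev": ["ai", "llm", "chatgpt", "cursor", "copilot", "vibe coding"],
    "art": ["ai", "generated", "midjourney", "stable diffusion", "dalle"],
}


def exclusions(*template_names):
    """Merge the chosen templates, dropping duplicates but keeping order."""
    merged = []
    for name in template_names:
        for term in TEMPLATES[name]:
            if term not in merged:
                merged.append(term)
    return merged


def search_url(keywords, *template_names):
    """Return a Google Search URL for the keywords minus the excluded terms."""
    negatives = [
        f'-"{term}"' if " " in term else f"-{term}"
        for term in exclusions(*template_names)
    ]
    query = " ".join([keywords] + negatives)
    return "https://www.google.com/search?" + urlencode({"q": query})


print(search_url("tts engine", "dev"))
```

<p>Passing more than one template name, for example <strong>search_url("tts engine", "dev", "art")</strong>, merges both lists and keeps the shared <strong>-ai</strong> term only once.</p>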



<p>You can also ask an AI model, or AI Mode itself, to generate an exclusion list tailored to your needs.</p>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://shubhamsonar.com/google-search-no-ai-mode/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>How to read Parquet file format?</title>
		<link>https://shubhamsonar.com/how-to-read-parquet-file-format/</link>
					<comments>https://shubhamsonar.com/how-to-read-parquet-file-format/#comments</comments>
		
		<dc:creator><![CDATA[Shubham]]></dc:creator>
		<pubDate>Sat, 11 May 2024 17:53:06 +0000</pubDate>
				<category><![CDATA[AWS]]></category>
		<category><![CDATA[Data Analytics]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[#documentation]]></category>
		<category><![CDATA[#tutorial]]></category>
		<guid isPermaLink="false">https://shubhamsonar.com/?p=1212</guid>

					<description><![CDATA[<p>Whether you work in data analysis, research, data export/import, big data or ML, this article can be a good starting point for getting to know the Parquet file format. What is Parquet? Parquet is an open-source file format, maintained as an Apache Software Foundation project, built to store and process big data efficiently (initially for the Hadoop ecosystem). It uses concepts [&#8230;]</p>
<p>The post <a href="https://shubhamsonar.com/how-to-read-parquet-file-format/">How to read Parquet file format?</a> appeared first on <a href="https://shubhamsonar.com">Shubham Sonar</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-group alignwide has-global-padding is-layout-constrained wp-block-group-is-layout-constrained">
<div class="wp-block-columns alignwide is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<p>Whether you work in data analysis, research, data export/import, big data or ML, this article can be a good starting point for getting to know the Parquet file format.</p>



<h3 class="wp-block-heading"><strong>What is </strong>Parquet<strong>?</strong></h3>



<p><a href="https://parquet.apache.org/" target="_blank" rel="noreferrer noopener">Parquet</a> is an open-source file format, maintained as an Apache Software Foundation project, built to store and process big data efficiently (initially for the Hadoop ecosystem). It uses concepts from <a href="https://research.google/pubs/dremel-interactive-analysis-of-web-scale-datasets-2/" target="_blank" rel="noreferrer noopener">Dremel</a> to create a column-based file format, which lets large datasets be stored with efficient compression. CSV (Comma-Separated Values) files store each record as a row of comma-separated fields. Parquet, in contrast, stores data column by column, which gives better compression and makes analysis more efficient.</p>
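<p>To see why grouping values by column helps compression, here is a toy illustration in plain Python. It is not Parquet&#8217;s actual on-disk layout. The three records are made up, and <strong>run_length_encode</strong> is a hypothetical helper that only mimics, in simplified form, the run-length style of encoding that Parquet supports.</p>

```python
from itertools import groupby

# Three hypothetical records, stored row by row as a CSV file would.
rows = [
    ("alice", "IN", 30),
    ("bob", "IN", 25),
    ("carol", "IN", 41),
]
names = ("name", "country", "age")

# Column-oriented view: all values of one field sit next to each other.
columns = {name: list(values) for name, values in zip(names, zip(*rows))}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


print(columns["country"])                     # ['IN', 'IN', 'IN']
print(run_length_encode(columns["country"]))  # [('IN', 3)]
```

<p>In the row layout the repeated country value is interleaved with names and ages. In the column layout the repeats sit side by side, so they collapse into a single pair.</p>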



<p>AWS stores your RDS export backups in S3 in Parquet format. Although the RDS (Create from S3) feature can read this backup &amp; restore the database, in many cases you might still need to read these Parquet files without spinning up an RDS DB instance (encryption is not considered in this article).</p>



<p>Key things to know about Parquet format:</p>



<ol class="wp-block-list">
<li>It is independent of data model, data processing framework and programming language.</li>



<li>Parquet files are immutable in nature. You can&#8217;t update a Parquet file in place; to change the data you write a new file.</li>



<li>It&#8217;s a column-based file format.</li>
</ol>



<p>Let&#8217;s see how we can interact with Parquet format.</p>



<h3 class="wp-block-heading"><strong>What is </strong>Apache Spark<strong>?</strong></h3>



<p><a href="https://spark.apache.org/docs/latest/index.html">Apache Spark</a> is a large-scale data analytics engine that is available for multiple programming languages. It offers a set of tools for use cases ranging from data analysis to ML, and it makes efficient use of the Parquet format for data analysis.</p>



<h3 class="wp-block-heading"><strong>Introducing </strong>PySpark</h3>



<p>We will be using <a href="https://spark.apache.org/docs/latest/api/python/index.html">PySpark</a>, the Python API for Apache Spark, which lets us interact with Parquet through the <strong>pyspark.sql</strong> module. This is just one of its features. There is much more to it, but I prefer to <a href="https://en.wikipedia.org/wiki/KISS_principle" target="_blank" rel="noreferrer noopener">KISS</a> for now.</p>



<p>Below are some CLI commands for a quick hands-on. (Side note: make sure you are in a fresh Python virtual environment, have a Java runtime installed because Spark runs on the JVM, and have a Parquet file ready.)</p>



<ol class="wp-block-list">
<li>Activate your <a href="https://docs.python.org/3/tutorial/venv.html" target="_blank" rel="noreferrer noopener">python virtual environment</a>.</li>



<li>Install Spark SQL for Python:<br><mark style="background-color:#111111;color:#20ff00" class="has-inline-color">pip install pyspark[sql]</mark></li>



<li>Enter Python CLI session:<br><mark style="background-color:#111111;color:#20ff00" class="has-inline-color">python</mark></li>



<li>Create a SparkSession (This opens a Spark session so that we can connect and interact with your Parquet file in Python):<br><mark style="background-color:#111111;color:#20ff00" class="has-inline-color">from pyspark.sql import SparkSession<br>spark = SparkSession.builder.getOrCreate()</mark></li>



<li>Now that the Spark session is created, load the Parquet file as a Spark DataFrame to interact with:<br><mark style="background-color:#111111;color:#20ff00" class="has-inline-color">df = spark.read.load('PATH/TO/PARQUET_FILE.parquet')</mark></li>



<li>To show records from the Parquet file (only the first 20 rows by default):<br><mark style="background-color:#111111;color:#20ff00" class="has-inline-color">df.show()</mark></li>



<li>To show only specific columns from the Parquet file:<br><mark style="background-color:#111111;color:#20ff00" class="has-inline-color">df.select('column1Header', 'column2Header').show()</mark></li>



<li>To print the schema of your Parquet file:<br><mark style="background-color:#111111;color:#20ff00" class="has-inline-color">df.printSchema()</mark></li>



<li>To print all column names:<br><mark style="background-color:#111111;color:#20ff00" class="has-inline-color">df.columns</mark></li>



<li>To write the DataFrame data to CSV (Note: Spark creates the given folder and writes one or more <strong>part-*.csv</strong> files inside it, rather than a single file named after your Parquet file):<br><mark style="background-color:#111111;color:#20ff00" class="has-inline-color">df.write.csv('FolderName')</mark></li>
</ol>



<p>You can also run SQL queries to analyse the data &amp; create more CSV/Parquet files from the output.</p>
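<p>As a rough sketch of the SQL route, the snippet below registers a DataFrame as a temporary view, runs a query against it and writes the result out as CSV. The view name <strong>records</strong>, the column name <strong>status</strong> and the helper names <strong>top_values_sql</strong> and <strong>export_top_values</strong> are all assumptions made for illustration, so adapt them to your own file. PySpark is imported inside the function so the query-building helper still works where Spark is not installed.</p>

```python
def top_values_sql(view_name, column, limit=10):
    """Build a query counting rows per distinct value of one column."""
    return (
        f"SELECT {column}, COUNT(*) AS row_count FROM {view_name} "
        f"GROUP BY {column} ORDER BY row_count DESC LIMIT {int(limit)}"
    )


def export_top_values(parquet_path, column, out_dir):
    """Run the query against a Parquet file and save the result as CSV."""
    # Imported here so the helper above still works without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.parquet(parquet_path)

    # Register the DataFrame under a name that SQL statements can refer to.
    df.createOrReplaceTempView("records")
    result = spark.sql(top_values_sql("records", column))

    result.show()
    # header=True writes the column names as the first line of each CSV part.
    result.write.mode("overwrite").csv(out_dir, header=True)
    return result


print(top_values_sql("records", "status", limit=5))
```

<p>Calling <strong>export_top_values('PATH/TO/PARQUET_FILE.parquet', 'status', 'FolderName')</strong> would then show the counts and write them into the folder. Swap <strong>.csv(...)</strong> for <strong>.parquet(out_dir)</strong> if you want Parquet output instead.</p>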



<p><strong>Note</strong>: DataFrames in Spark can also come from other data sources, not just from your Parquet file. Because Spark is an analysis engine, this gives us the ability to mix and match, process and analyse data, and create our own Parquet files for further consumption. With Spark you can also merge and process multiple DataFrames. Exporting to CSV/Parquet is just one part of what Spark does.</p>



<p>Have a nice day <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1fad0.png" alt="🫐" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>



</div>
</div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://shubhamsonar.com/how-to-read-parquet-file-format/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
	</channel>
</rss>
